Citation

BibTeX format

@inproceedings{Altuncu:2018,
author = {Altuncu, T and Yaliraki, SN and Barahona, M},
title = {Content-driven, unsupervised clustering of news articles through multiscale graph partitioning},
url = {http://arxiv.org/abs/1808.01175v1},
year = {2018}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB  - The explosion in the amount of news and journalistic content being generated across the globe, coupled with extended and instantaneous access to information through online media, makes it difficult and time-consuming to monitor news developments and opinion formation in real time. There is an increasing need for tools that can pre-process, analyse and classify raw text to extract interpretable content; specifically, identifying topics and content-driven groupings of articles. We present here such a methodology that brings together powerful vector embeddings from Natural Language Processing with tools from Graph Theory that exploit diffusive dynamics on graphs to reveal natural partitions across scales. Our framework uses a recent deep neural network text analysis methodology (Doc2vec) to represent text in vector form and then applies a multi-scale community detection method (Markov Stability) to partition a similarity graph of document vectors. The method allows us to obtain clusters of documents with similar content, at different levels of resolution, in an unsupervised manner. We showcase our approach with the analysis of a corpus of 9,000 news articles published by Vox Media over one year. Our results show consistent groupings of documents according to content without a priori assumptions about the number or type of clusters to be found. The multilevel clustering reveals a quasi-hierarchy of topics and subtopics with increased intelligibility and improved topic coherence as compared to external taxonomy services and standard topic detection methods.
AU  - Altuncu,T
AU  - Yaliraki,SN
AU  - Barahona,M
PY  - 2018///
TI  - Content-driven, unsupervised clustering of news articles through multiscale graph partitioning
UR  - http://arxiv.org/abs/1808.01175v1
ER  - 