Extractive summarization using complex networks and syntactic dependency
Contributor(s) |
UNIVERSIDADE DE SÃO PAULO |
---|---|
Date(s) |
29/10/2013
29/10/2013
02/08/2013
|
Abstract |
The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used and therefore there is ample opportunity to enhance the statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, a better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method the Rouge score (an automatic evaluation method used in summarization) was 0.5089, which is the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable to produce good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the Rouge score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and treating text in general. (C) 2011 Elsevier B.V. All rights reserved. FAPESP; CNPq (Brazil) |
Identifier |
PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, AMSTERDAM, v. 391, n. 4, supl. 1, Part 3, pp. 1855-1864, FEB 15, 2012 0378-4371 http://www.producao.usp.br/handle/BDPI/36427 10.1016/j.physa.2011.10.015 |
Language(s) |
eng |
Publisher |
ELSEVIER SCIENCE BV AMSTERDAM |
Relation |
PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS |
Rights |
closedAccess Copyright ELSEVIER SCIENCE BV |
Keywords | #SUMMARIZATION #COMPLEX NETWORKS #DIVERSITY METRICS #ENTROPY #SYNTACTICAL DEPENDENCY #COMMUNITY STRUCTURE #CENTRALITY #LANGUAGE #WORLD #PHYSICS, MULTIDISCIPLINARY |
Type |
article original article publishedVersion |