Extractive summarization using complex networks and syntactic dependency


Author(s): Amancio, Diego R.; Nunes, Maria G. V.; Oliveira Junior, Osvaldo Novais de; Costa, Luciano da Fontoura
Contributor(s)

UNIVERSIDADE DE SÃO PAULO

Date(s)

29/10/2013

29/10/2013

02/08/2013

Abstract

The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Moreover, only a few complex network metrics have been used so far, so there is ample opportunity to enhance these statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ, for the first time, the betweenness, vulnerability and diversity metrics to analyze written texts in Brazilian Portuguese. Strategies based on diversity metrics achieve better automatic summarization performance than previous work employing complex networks. With an optimized method, the ROUGE score (an automatic evaluation measure used in summarization) was 0.5089, the best value ever achieved for an extractive summarizer for Brazilian Portuguese using statistical methods based on complex networks. Furthermore, the diversity metric detects keywords with high precision, which is why we believe it is suitable for producing good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the ROUGE score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and for treating text in general. (C) 2011 Elsevier B.V. All rights reserved.

FAPESP

CNPq (Brazil)

Identifier

PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, AMSTERDAM, v. 391, n. 4, suppl. 1, Part 3, pp. 1855-1864, FEB 15, 2012

0378-4371

http://www.producao.usp.br/handle/BDPI/36427

10.1016/j.physa.2011.10.015

http://dx.doi.org/10.1016/j.physa.2011.10.015

Language(s)

eng

Publisher

ELSEVIER SCIENCE BV

AMSTERDAM

Relation

PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS

Rights

closedAccess

Copyright ELSEVIER SCIENCE BV

Keywords #SUMMARIZATION #COMPLEX NETWORKS #DIVERSITY METRICS #ENTROPY #SYNTACTICAL DEPENDENCY #COMMUNITY STRUCTURE #CENTRALITY #LANGUAGE #WORLD #PHYSICS, MULTIDISCIPLINARY

Type

article

original article

publishedVersion