Tackling redundancy in text summarization through different levels of language analysis


Author(s): Lloret, Elena; Palomar, Manuel
Contributor(s)

Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos

Procesamiento del Lenguaje y Sistemas de Información (GPLSI)

Date(s)

08/09/2014

08/09/2014

01/09/2013

Abstract

One of the main challenges in text summarization is detecting redundant information. This paper presents a detailed analysis of three methods for achieving this goal. The proposed methods rely on different levels of language analysis: lexical, syntactic and semantic. They are also analyzed for their ability to detect relevance in texts. The results show that semantic-based methods are able to detect up to 90% of redundancy, compared to only 19% for lexical-based ones. This is also reflected in the quality of the generated summaries: better summaries are obtained when syntactic- or semantic-based approaches are used to remove redundancy.

This research has been funded by the Spanish Government under the project TEXT-MESS 2.0 (TIN2009-13391-C04-01). It has also been supported by the Conselleria d'Educació, Generalitat Valenciana (grant nos. PROMETEO/2009/119 and ACOMP/2010/286).

Identifier

Computer Standards & Interfaces. 2013, 35(5): 507-518. doi:10.1016/j.csi.2012.08.001

0920-5489 (Print)

1872-7018 (Online)

http://hdl.handle.net/10045/40116

10.1016/j.csi.2012.08.001

Language(s)

eng

Publisher

Elsevier

Relation

http://dx.doi.org/10.1016/j.csi.2012.08.001

Rights

info:eu-repo/semantics/restrictedAccess

Keywords

#Text summarization #Redundancy detection #Natural language processing #Information access #Lenguajes y Sistemas Informáticos

Type

info:eu-repo/semantics/article