The role of statistical and semantic features in single-document extractive summarization


Author(s): Vodolazova, Tatiana; Lloret, Elena; Muñoz, Rafael; Palomar, Manuel
Contributor(s)

Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos

Procesamiento del Lenguaje y Sistemas de Información (GPLSI)

Date(s)

26/03/2014

26/03/2014

10/04/2013

Abstract

This paper reports further results from ongoing research into the impact of a range of commonly used statistical and semantic features on extractive text summarization. The features examined include word frequency, inverse sentence and term frequencies, stopword filtering, word senses, resolved anaphora and textual entailment. The results show the relative importance of each feature and the limitations of the available tools. Inverse sentence frequency combined with term frequency yields almost the same results as term frequency combined with stopword filtering, which in turn proved to be a highly competitive baseline. To improve the suboptimal results of anaphora resolution, the system was extended with a second anaphora resolution module. The paper also describes first attempts at an internal document data representation.
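To make the statistical features named in the abstract concrete, the following is a minimal sketch of sentence scoring with term frequency, inverse sentence frequency and stopword filtering. It is not the authors' system: the tiny stopword list, the function names (`tokenize`, `score_sentences`, `summarize`), the choice to multiply TF by ISF (by analogy with TF-IDF) and the averaging over sentence length are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much larger).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "is", "are", "to", "it"}


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def score_sentences(sentences):
    """Score each sentence by the mean TF x ISF weight of its tokens.

    TF is the term's frequency in the whole document; ISF is
    log(N / number of sentences containing the term). Combining them
    as a product is an assumption of this sketch.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    tf = Counter(w for toks in tokenized for w in toks)
    sf = Counter(w for toks in tokenized for w in set(toks))
    scores = []
    for toks in tokenized:
        if not toks:
            # A sentence made only of stopwords carries no weight.
            scores.append(0.0)
            continue
        total = sum(tf[w] * math.log(n / sf[w]) for w in toks)
        scores.append(total / len(toks))
    return scores


def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In this sketch a term that occurs in every sentence receives an ISF of zero, so it contributes nothing to the score; this is one way to see why ISF and stopword filtering can play similar roles.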

Identifier

Artificial Intelligence Research. 2013, 2(3): 35-44. doi:10.5430/air.v2n3p35

1927-6974 (Print)

1927-6982 (Online)

http://hdl.handle.net/10045/36345

10.5430/air.v2n3p35

Language(s)

eng

Publisher

Sciedu Press

Relation

http://dx.doi.org/10.5430/air.v2n3p35

Rights

This work is licensed under a Creative Commons Attribution 3.0 License

info:eu-repo/semantics/openAccess

Keywords

#Extractive text summarization #Semantics #Statistics #Coreference resolution #Lenguajes y Sistemas Informáticos

Type

info:eu-repo/semantics/article