8 resultados para vignette in-text

em Universidad de Alicante


Relevância:

100.00% 100.00%

Publicador:

Resumo:

One of the main challenges to be addressed in text summarization concerns the detection of redundant information. This paper presents a detailed analysis of three methods for achieving such goal. The proposed methods rely on different levels of language analysis: lexical, syntactic and semantic. Moreover, they are also analyzed for detecting relevance in texts. The results show that semantic-based methods are able to detect up to 90% of redundancy, compared to only the 19% of lexical-based ones. This is also reflected in the quality of the generated summaries, obtaining better summaries when employing syntactic- or semantic-based approaches to remove redundancy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the past years, an important volume of research in Natural Language Processing has concentrated on the development of automatic systems to deal with affect in text. The different approaches considered dealt mostly with explicit expressions of emotion, at word level. Nevertheless, expressions of emotion are often implicit, inferrable from situations that have an affective meaning. Dealing with this phenomenon requires automatic systems to have “knowledge” on the situation, and the concepts it describes and their interaction, to be able to “judge” it, in the same manner as a person would. This necessity motivated us to develop the EmotiNet knowledge base — a resource for the detection of emotion from text based on commonsense knowledge on concepts, their interaction and their affective consequence. In this article, we briefly present the process undergone to build EmotiNet and subsequently propose methods to extend the knowledge it contains. We further on analyse the performance of implicit affect detection using this resource. We compare the results obtained with EmotiNet to the use of alternative methods for affect detection. Following the evaluations, we conclude that the structure and content of EmotiNet are appropriate to address the automatic treatment of implicitly expressed affect, that the knowledge it contains can be easily extended and that overall, methods employing EmotiNet obtain better results than traditional emotion detection approaches.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a module for the prediction of emotions in text chats in Spanish, oriented to its use in specific-domain text-to-speech systems. A general overview of the system is given, and the results of some evaluations carried out with two corpora of real chat messages are described. These results seem to indicate that this system offers a performance similar to other systems described in the literature, for a more complex task than other systems (identification of emotions and emotional intensity in the chat domain).

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this paper we present a complete system for the treatment of both geographical and temporal dimensions in text and its application to information retrieval. This system has been evaluated in both the GeoTime task of the 8th and 9th NTCIR workshop in the years 2010 and 2011 respectively, making it possible to compare the system to contemporary approaches to the topic. In order to participate in this task we have added the temporal dimension to our GIR system. The system proposed here has a modular architecture in order to add or modify features. In the development of this system, we have followed a QA-based approach as well as multi-search engines to improve the system performance.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Proyecto emergente centrado en la desambiguación de topónimos y la detección del foco geográfico en el texto. La finalidad es mejorar el rendimiento de los sistemas de recuperación de información geográfica. Se describen los problemas abordados, la hipótesis de trabajo, las tareas a realizar y los objetivos parciales alcanzados.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

El foco geográfico de un documento identifica el lugar o lugares en los que se centra el contenido del texto. En este trabajo se presenta una aproximación basada en corpus para la detección del foco geográfico en el texto. Frente a otras aproximaciones que se centran en el uso de información puramente geográfica para la detección del foco, nuestra propuesta emplea toda la información textual existente en los documentos del corpus de trabajo, partiendo de la hipótesis de que la aparición de determinados personajes, eventos, fechas e incluso términos comunes, pueden resultar fundamentales para esta tarea. Para validar nuestra hipótesis, se ha realizado un estudio sobre un corpus de noticias geolocalizadas que tuvieron lugar entre los años 2008 y 2011. Esta distribución temporal nos ha permitido, además, analizar la evolución del rendimiento del clasificador y de los términos más representativos de diferentes localidades a lo largo del tiempo.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The great amount of text produced every day in the Web turned it as one of the main sources for obtaining linguistic corpora, that are further analyzed with Natural Language Processing techniques. On a global scale, languages such as Portuguese - official in 9 countries - appear on the Web in several varieties, with lexical, morphological and syntactic (among others) differences. Besides, a unified spelling system for Portuguese has been recently approved, and its implementation process has already started in some countries. However, it will last several years, so different varieties and spelling systems coexist. Since PoS-taggers for Portuguese are specifically built for a particular variety, this work analyzes different training corpora and lexica combinations aimed at building a model with high-precision annotation in several varieties and spelling systems of this language. Moreover, this paper presents different dictionaries of the new orthography (Spelling Agreement) as well as a new freely available testing corpus, containing different varieties and textual typologies.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In recent years, Twitter has become one of the most important microblogging services of the Web 2.0. Among the possible uses it allows, it can be employed for communicating and broadcasting information in real time. The goal of this research is to analyze the task of automatic tweet generation from a text summarization perspective in the context of the journalism genre. To achieve this, different state-of-the-art summarizers are selected and employed for producing multi-lingual tweets in two languages (English and Spanish). A wide experimental framework is proposed, comprising the creation of a new corpus, the generation of the automatic tweets, and their assessment through a quantitative and a qualitative evaluation, where informativeness, indicativeness and interest are key criteria that should be ensured in the proposed context. From the results obtained, it was observed that although the original tweets were considered as model tweets with respect to their informativeness, they were not among the most interesting ones from a human viewpoint. Therefore, relying only on these tweets may not be the ideal way to communicate news through Twitter, especially if a more personalized and catchy way of reporting news wants to be performed. In contrast, we showed that recent text summarization techniques may be more appropriate, reflecting a balance between indicativeness and interest, even if their content was different from the tweets delivered by the news providers.