5 results for automated lexical analysis
at Universidad de Alicante
Abstract:
Preliminary research demonstrated the relevance of the EmotiBlog annotated corpus as a Machine Learning resource for detecting subjective data. In this paper we compare EmotiBlog with the JRC Quotes corpus in order to check the robustness of its annotation. We concentrate on its coarse-grained labels and carry out extensive Machine Learning experiments, including some that incorporate lexical resources. The results are similar to those obtained with the JRC Quotes corpus, which supports the validity of EmotiBlog as a resource for the sentiment analysis (SA) task.
Abstract:
EmotiBlog is a corpus labelled with the annotation schema of the same name, designed for detecting subjectivity in new textual genres. Preliminary research demonstrated its relevance as a Machine Learning (ML) resource for detecting opinionated data. In this paper we compare EmotiBlog with the JRC corpus in order to check the robustness of the EmotiBlog annotation. For this research we concentrate on its coarse-grained labels. We carry out extensive ML experiments, including some that incorporate lexical resources. The results are similar to those obtained with the JRC corpus, which supports the validity of EmotiBlog as a resource for the sentiment analysis (SA) task.
Abstract:
One of the main challenges to be addressed in text summarization is the detection of redundant information. This paper presents a detailed analysis of three methods for achieving this goal. The proposed methods rely on different levels of language analysis: lexical, syntactic and semantic. They are also analyzed for detecting relevance in texts. The results show that semantic-based methods are able to detect up to 90% of redundancy, compared with only 19% for lexical-based ones. This is also reflected in the quality of the generated summaries: better summaries are obtained when syntactic- or semantic-based approaches are used to remove redundancy.
Abstract:
Automated human behaviour analysis has been, and still remains, a challenging problem. It has been approached from different points of view, from the recognition of primitive actions to the recognition of human interaction. This paper focuses on trajectory analysis, which allows a simple high-level understanding of complex human behaviour. It proposes a novel representation of trajectory data, called the Activity Description Vector (ADV), based on the number of occurrences of a person at a specific point of the scenario and the local movements performed there. The ADV is calculated for each cell into which the scenario is spatially sampled, yielding a cue for different clustering methods. The ADV representation has been tested as the input to several classic classifiers and compared with other approaches on sequences from the CAVIAR dataset, obtaining high accuracy in recognising the behaviour of people in a shopping centre.
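The per-cell idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `activity_description_vectors`, the `cell_size` parameter, the choice of four movement directions (up, down, left, right), and the rule of incrementing a counter by one per step are all assumptions made for illustration. The abstract only states that the vector combines occurrence counts with local movements for each cell of the sampled scenario.

```python
from collections import defaultdict


def activity_description_vectors(trajectory, cell_size):
    """Build a hypothetical per-cell descriptor from one trajectory.

    trajectory: list of (x, y) positions ordered in time.
    cell_size:  side length of the square grid cells (assumed parameter).
    Returns a dict mapping (col, row) to a dict of counters.
    """
    cells = defaultdict(
        lambda: {"count": 0, "up": 0, "down": 0, "left": 0, "right": 0}
    )
    for i, (x, y) in enumerate(trajectory):
        # Assign the current position to a grid cell.
        key = (int(x // cell_size), int(y // cell_size))
        cells[key]["count"] += 1

        # Record the local movement leaving this position, if any.
        if i + 1 < len(trajectory):
            next_x, next_y = trajectory[i + 1]
            dx, dy = next_x - x, next_y - y
            if dx > 0:
                cells[key]["right"] += 1
            elif dx < 0:
                cells[key]["left"] += 1
            if dy > 0:
                cells[key]["up"] += 1
            elif dy < 0:
                cells[key]["down"] += 1
    return dict(cells)


if __name__ == "__main__":
    path = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5)]
    for cell, vector in sorted(activity_description_vectors(path, 1.0).items()):
        print(cell, vector)
```

Flattening the counters of all cells into a single vector would give one fixed-length input per trajectory, which is the kind of input a standard classifier or clustering method expects.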
Abstract:
Geographical proximity to, and socioeconomic dependence on, the United States brought about a deep-rooted anglicization of Cuban Spanish lexis and social strata, especially throughout the Neocolonial period (1902–1959). This study is based on a review of a renowned newspaper of that time, Diario de la Marina, and the corresponding compilation of a corpus of English-induced loanwords. Diario de la Marina particularly targeted the upper social class, and only crónicas sociales (society-page columns) and print advertising were examined because of their fully descriptive texts, which encoded the ruling-class ideology and consumerism. The findings show that there was a high number of lexical and cultural anglicisms in the sociolect in question, and that sociolinguistic anglicization was openly embraced by the upper socioeconomic stratum, serving as a distinguishing sign of sophistication and social stratification. Likewise, a number of the anglicisms collected, particularly those related to social events, are no longer used in contemporary Cuban Spanish, which suggests a major semantic shift in this sociolect after 1959.