10 results for Automatic merging of lexical resources
at Universidad de Alicante
Abstract:
In this paper we present an automatic system for extracting syntactic-semantic patterns, applied to the development of multilingual processing tools. To achieve optimal methods for the automatic treatment of more than one language, we propose the use of syntactic-semantic patterns. These patterns are formed by a verbal head and its main arguments, and they are aligned across languages. The system extracts and aligns the patterns from two manually annotated corpora, and we evaluate the main linguistic problems that must be dealt with in the alignment process.
Abstract:
This paper addresses the problem of the automatic recognition and classification of temporal expressions and events in human language. Efficacy in these tasks is crucial if the broader task of temporal information processing is to be successfully performed. We analyze whether the application of semantic knowledge to these tasks improves the performance of current approaches. We therefore present and evaluate a data-driven approach as part of a system: TIPSem. Our approach uses lexical semantics and semantic roles as additional information to extend classical approaches, which are principally based on morphosyntax. The results obtained for English show that semantic knowledge aids in temporal expression and event recognition, achieving an error reduction of 59% and 21% respectively, while in classification the contribution is limited. From the analysis of the results it may be concluded that the application of semantic knowledge leads to more general models and aids in the recognition of temporal entities that are ambiguous at shallower language analysis levels. We also discovered that lexical semantics and semantic roles have complementary advantages, and that it is useful to combine them. Finally, we carried out the same analysis for Spanish. The results obtained show comparable advantages. This supports the hypothesis that applying the proposed semantic knowledge may be useful for different languages.
Abstract:
In this paper we present the enrichment of the Integration of Semantic Resources based on WordNet (ISR-WN Enriched). This new proposal improves on the previous one, in which several semantic resources such as SUMO, WordNet Domains and WordNet Affects were related, by adding other semantic resources such as Semantic Classes and SentiWordNet. First, the paper describes the architecture of the proposal and explains the particularities of each integrated resource. We then analyze some problems related to the mappings between different versions and how we solve them. Moreover, we show the advantages that this kind of tool can provide to different Natural Language Processing applications. In this regard, we demonstrate that integrating semantic resources provides a multidimensional view in the analysis of natural language.
Abstract:
Preliminary research demonstrated the relevance of the EmotiBlog annotated corpus as a Machine Learning resource for detecting subjective data. In this paper we compare EmotiBlog with the JRC Quotes corpus in order to check the robustness of its annotation. We concentrate on its coarse-grained labels and carry out extensive Machine Learning experiments, also including lexical resources. The results are similar to those obtained with the JRC Quotes corpus, demonstrating the validity of EmotiBlog as a resource for the SA task.
Abstract:
This paper describes a module for predicting emotions in Spanish text chats, intended for use in domain-specific text-to-speech systems. A general overview of the system is given, and the results of some evaluations carried out with two corpora of real chat messages are described. These results seem to indicate that the system performs similarly to other systems described in the literature, while addressing a more complex task (identification of emotions and emotional intensity in the chat domain).
Abstract:
The aim of this study was to explore the experience of service providers in Spain regarding their daily professional encounters with battered immigrant women and their perception of this group’s help-seeking process and the eventual abandonment of the same. Twenty-nine in-depth interviews and four focus group discussions were conducted with a total of 43 professionals involved in providing support to battered immigrant women. We interviewed social workers, psychologists, intercultural mediators, judges, lawyers, and public health professionals from Spain. Through qualitative content analysis, four categories emerged: (a) frustration with the victim’s decision to abandon the help-seeking process, (b) ambivalent positions regarding differences between immigrant and Spanish women, (c) difficulties in the migratory process that may hinder the help-seeking process, and (d) criticisms regarding the inefficiency of existing resources. The four categories were cross-cut by an overarching theme: helping immigrant women not to abandon the help-seeking process as a chronicle of anticipated failure. The main reasons that emerged for abandoning the help-seeking process involved structural factors such as economic dependence, loss of social support after leaving their country of origin, and limited knowledge about available resources. The professionals perceived their encounters with battered immigrant women to be frustrating and unproductive because they felt that they had few resources to back them up. They felt that despite the existence of public policies targeting intimate partner violence (IPV) and immigration in Spain, the resources dedicated to tackling gender-based violence were insufficient to meet battered immigrant women’s needs. Professionals should be trained both in the problem of IPV and in providing support to the immigrant population.
Abstract:
EmotiBlog is a corpus labelled with the annotation schema of the same name, designed for detecting subjectivity in new textual genres. Preliminary research demonstrated its relevance as a Machine Learning resource for detecting opinionated data. In this paper we compare EmotiBlog with the JRC corpus in order to check the robustness of the EmotiBlog annotation. For this research we concentrate on its coarse-grained labels. We carry out extensive ML experiments, also including lexical resources. The results are similar to those obtained with the JRC corpus, demonstrating the validity of EmotiBlog as a resource for the SA task.
Abstract:
In the chemical textile domain, experts have to analyse chemical components and substances that might be harmful when used in clothing and textiles. Part of this analysis is performed by searching for opinions and reports that people have expressed about these products on the Social Web. However, this type of information is less frequent on the Internet for this domain than for others, so detecting and classifying it is difficult and time-consuming. Consequently, problems associated with the use of chemical substances in textiles may not be detected early enough and could lead to health problems such as allergies or burns. In this paper, we propose a framework able to detect, retrieve, and classify subjective sentences related to the chemical textile domain, which could be integrated into a wider health surveillance system. We also describe the creation of several datasets with opinions from this domain, the experiments performed using machine learning techniques and different lexical resources such as WordNet, and an evaluation focusing on sentiment classification and complaint detection (i.e., negativity). Despite the challenges involved in this domain, our approach obtains promising results, with an F-score of 65% for polarity classification and 82% for complaint detection.
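For readers unfamiliar with the metric, the F-score figures reported above can be read as the harmonic mean of precision and recall. The following is a minimal sketch, assuming the balanced F1 variant (beta = 1) and a binary "complaint" label; the function names, label strings, and toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def count_outcomes(gold: Sequence[str], predicted: Sequence[str],
                   positive: str = "complaint") -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one label."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    return tp, fp, fn


def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the usual F1 (an assumption here)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy, made-up labels purely for illustration (not data from the paper).
gold = ["complaint", "complaint", "other", "other", "complaint"]
pred = ["complaint", "other", "complaint", "other", "complaint"]

tp, fp, fn = count_outcomes(gold, pred)
score = f_score(tp, fp, fn)
print(f"F1 = {score:.3f}")
```

Whether the reported scores are F1 or another F-beta variant, and whether they are micro- or macro-averaged, is not stated in the abstract, so the sketch should be read only as an illustration of the general definition.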
Abstract:
This article is the English version of “Terminología y traducción económica francés-español: evaluación de recursos terminológicos en el ámbito contable” by Daniel Gallego Hernández. It was not published in the print version of MonTI for reasons of space. The online version of MonTI does not suffer from these limitations, and this is our way of promoting plurilingualism.
Abstract:
The aim of this paper is to describe the use that professional translators make of corpora as translation resources. First, we briefly review the literature on translation practitioners’ use of corpora in the contexts of both translation training and professional translation. Then we present our survey-based study, analyse the uptake of corpora among Spanish translators and describe the use of this kind of translation resource. The results show that even if corpora are not as frequently used as other kinds of resources, such as dictionaries, there are professional translators who do use corpora, in a variety of ways, in their work. Additionally, non-users do not seem entirely sceptical about corpora. Against that backdrop, translator trainers are invited to continue to report on how corpora can be used as translation resources.