969 results for linguistic corpora
Abstract:
For references, please quote the full paper as published in the above journal.
Abstract:
The exponential growth of subjective information in the framework of Web 2.0 has led to the need to create Natural Language Processing tools able to analyse and process such data for multiple practical applications. These tools require training on specifically annotated corpora, whose level of detail must be fine enough to capture the phenomena involved. This paper presents EmotiBlog, a fine-grained annotation scheme for subjectivity. We show how it is built and demonstrate the benefits it brings to the systems that use it for training, through experiments we carried out on opinion mining and emotion detection. We employ corpora of different textual genres: a set of annotated reported speech extracted from news articles, the set of news titles annotated with polarity and emotion from SemEval 2007 (Task 14), and ISEAR, a corpus of real-life self-expressed emotion. We also show how the model built from the EmotiBlog annotations can be enhanced with external resources. The results demonstrate that EmotiBlog, through its structure and annotation paradigm, offers high-quality training data for systems dealing with both opinion mining and emotion detection.
Abstract:
This paper presents the automatic extension to other languages of TERSEO, a knowledge-based system for the recognition and normalization of temporal expressions originally developed for Spanish. TERSEO was first extended to English through the automatic translation of the temporal expressions. Then, an improved porting process was applied to Italian, where the automatic translation of the temporal expressions from English and from Spanish was combined with the extraction of new expressions from an Italian annotated corpus. Experimental results demonstrate how, while still adhering to the rule-based paradigm, the development of automatic rule translation procedures allowed us to minimize the effort required for porting to new languages. Relying on such procedures, and without any manual effort or previous knowledge of the target language, TERSEO recognizes and normalizes temporal expressions in Italian with good results (72% precision and 83% recall for recognition).
Abstract:
In this paper we present an automatic system for the extraction of syntactic-semantic patterns applied to the development of multilingual processing tools. In order to achieve optimum methods for the automatic treatment of more than one language, we propose the use of syntactic-semantic patterns. These patterns are formed by a verbal head and its main arguments, and they are aligned across languages. The system extracts and aligns these patterns from two manually annotated corpora, and we evaluate the main linguistic problems that must be dealt with in the alignment process.
Abstract:
There is no question nowadays as to the international and powerful status of English on a global scale and, consequently, as to its presence in non-English-speaking countries at different levels. Linguistically speaking, English is one of the languages that have most influenced Spanish throughout its history, especially since the late 1960s. In this study, the impact of English on Spanish is considered in the language of sports; in particular, sports Anglicisms and false Anglicisms are analysed. Due attention is paid to the different forms that an Anglicism may adopt and to which of those forms are more widely accepted or rejected by prescriptivists and speakers at large, in the light of a contrastive analysis of their appearance in the Nuevo diccionario de anglicismos, the Diccionario de la Real Academia Española and the Corpus de Referencia del Español Actual.
Abstract:
Conference interpreters must carry out documentation work before, during and after the events at which they provide their services, regardless of their extralinguistic subcompetence. Unfortunately, few methodological proposals have been put forward to enable these professionals to perform this task systematically. In this article, we review some of the works that have addressed the options available to interpreters for meeting their information needs. Having noted this scarcity of proposals, we present, in a case study, a methodological approach to this documentation work, based on the compilation of ad hoc parallel corpora and on terminology extraction in the form of glossaries.
Abstract:
Linguistic systems are the human tools for understanding reality. But is it possible to attain this reality? Is the reality that we perceive just a fragmented reality of which we are part? The work that the authors present in this paper is an attempt to address this question from an epistemological and philosophical-linguistic point of view.
Abstract:
Reality contains information (significant) that becomes significances in the mind of the observer. Language is the human instrument to understand reality. But is it possible to attain this reality? Is there an absolute reality, as certain philosophical schools tell us? The reality that we perceive, is it just a fragmented reality of which we are part? The work that the authors present is an attempt to address this question from an epistemological, linguistic and logical-mathematical point of view.
Abstract:
The aim of this paper is to describe the use that professional translators make of corpora as translation resources. First, we briefly review the literature on translation practitioners’ use of corpora in the contexts of both translation training and professional translation. Then we present our survey-based study, analyse the uptake of corpora among Spanish translators and describe the use of this kind of translation resource. The results show that even if corpora are not as frequently used as other kinds of resources, such as dictionaries, there are professional translators who do use corpora, in a variety of ways, in their work. Additionally, non-users do not seem entirely sceptical about corpora. Against that backdrop, translator trainers are invited to continue to report on how corpora can be used as translation resources.
Abstract:
The aim of this paper is to evaluate the efficacy of the application WebBootCaT for creating specialised corpora automatically, investigating the translation of articles of association from Italian into English. The first section reflects on the relevant literature and argues for the utility of corpora for translators. The second section discusses the methodology employed, and the third section analyses the results obtained and comments on how language professionals could exploit the application to the full. The fourth section provides a few concrete usage examples of the corpora thus built, and concludes that WebBootCaT is a genuinely powerful tool that professional translators could adopt in order to save time and improve their translations in the long term.
Abstract:
When analysing software metrics, users find that visualisation tools lack support for (1) the detection of patterns within metrics; and (2) enabling analysis of software corpora. In this paper we present Explora, a visualisation tool designed for the simultaneous analysis of multiple metrics of systems in software corpora. Explora incorporates a novel lightweight visualisation technique called PolyGrid that promotes the detection of graphical patterns. We present an example where we analyse the relation of subtype polymorphism with inheritance and invocation in corpora of Smalltalk and Java systems and find that (1) subtype polymorphism is more likely to be found in large hierarchies; (2) as class hierarchies grow horizontally, they also do so vertically; and (3) in polymorphic hierarchies the length of the name of the classes is orthogonal to the cardinality of the call sites.