26 results for Corpus Linguistics
at Universidad de Alicante
Abstract:
This paper presents the methodology used to compile an economics corpus and identify its terminology in order to create a glossary for use in translator training. First, it briefly reviews the literature on corpus compilation and corpus exploitation for terminological purposes. Second, it presents the methodology itself, together with a series of activities aimed at the acquisition of specialised knowledge in economics. The results show that the techniques used to detect terms and automatically extract term candidates, although not fully suited to the specific needs of this work, are useful and can even complement one another. The proposed activities, for their part, can likewise be combined with other types of activities and adapted to the teaching context.
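The abstract does not say which extraction technique was used, so the following is only a minimal sketch of one common way to obtain term candidates: ranking words by how much more frequent they are in a specialised corpus than in a general reference corpus. The function names, the smoothing and the toy sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens, including Spanish accented letters.
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def term_candidates(specialized, reference, min_freq=2, top_n=10):
    """Rank words by relative frequency in the specialised corpus
    divided by (smoothed) relative frequency in the reference corpus."""
    spec = Counter(tokenize(specialized))
    ref = Counter(tokenize(reference))
    spec_total = sum(spec.values())
    ref_total = sum(ref.values())
    scores = {}
    for word, freq in spec.items():
        if freq < min_freq:
            continue  # ignore rare words, which give unstable ratios
        spec_rel = freq / spec_total
        # Add-one smoothing so words missing from the reference do not divide by zero.
        ref_rel = (ref[word] + 1) / (ref_total + len(ref) + 1)
        scores[word] = spec_rel / ref_rel
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy, made-up corpora (hypothetical data, not from the paper).
specialized = "el tipo de interés sube y el tipo de cambio baja; la inflación sube"
reference = "el perro y el gato de la casa; el niño y la niña"
print(term_candidates(specialized, reference))
```

In this toy run, function words such as "el" and "de" sink to the bottom of the ranking because they are also frequent in the reference text. Real extractors typically add part-of-speech patterns and multi-word units on top of such a score.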
Abstract:
A study of the semantic evolution of enze, enza and of the phraseological units in which it takes part, from the earliest written attestations to contemporary usage in Catalan. The evolution of the other Romance derivatives of Latin INDEX, -ICIS is also taken into account. A cognitively oriented analysis is applied, and the study is grounded in the use of textual corpora (both old and contemporary), which include the literary and grammatical work of Enric Valor.
Abstract:
The reprise evidential conditional (REC) is uncommon in present-day Catalan: it is restricted to journalistic language and to some very formal genres (such as academic or legal language), and it is not present in spontaneous discourse. On the one hand, it has been described among the rather new modality values of the conditional. On the other, the normative tradition has tended to reject it as a gallicism, or to describe it as an unsuitable neologism. Thanks to extraction from text corpora, we find, surprisingly, that the REC is attested in Catalan from the beginning of the fourteenth century to the contemporary age, with semantic and pragmatic nuances and varying evidence of grammaticalization. Owing to the current interest in evidentiality, the REC has been widely studied in French, Italian and Portuguese, with a focus mainly on its contemporary uses and less on the diachronic process that could explain the origin of this value. In line with this research, which we began by studying the epistemic and evidential future in Catalan, our aim is to describe: a) the pragmatic context that could have been the starting point of the REC in the thirteenth century, before we find indisputable attestations of this use; b) the path of semantic change followed by the conditional from a ‘future in the past’ tense to the acquisition of epistemic and evidential values; and c) the role played by invited inferences, subjectification and intersubjectification in this change.
Abstract:
The general aim of the Araknion project is to provide Spanish and Catalan with a basic infrastructure of linguistic resources for the semantic processing of corpora, whether of oral or written origin, within the framework of Web 2.0.
Abstract:
The importance of new textual genres such as blogs or forum entries is growing in parallel with the evolution of the Social Web. This paper presents two corpora of blog posts, in English and in Spanish, annotated according to the EmotiBlog annotation scheme. Furthermore, we created 20 factual and opinionated questions for each language, together with the Gold Standard for their answers in the corpus. The purpose of our work is to study the challenges involved in a mixed fact and opinion question answering setting by comparing the performance of two Question Answering (QA) systems in that setting. The first one is open domain, while the second one is opinion-oriented. We evaluate the two systems separately in both languages and propose possible solutions to improve QA systems that have to process mixed questions.
Abstract:
The exponential growth of subjective information in the framework of Web 2.0 has led to the need to create Natural Language Processing tools able to analyse and process such data for multiple practical applications. These tools require training on specifically annotated corpora, whose level of detail must be fine enough to capture the phenomena involved. This paper presents EmotiBlog, a fine-grained annotation scheme for subjectivity. We show how it is built and demonstrate the benefits it brings to the systems that use it for training, through the experiments we carried out on opinion mining and emotion detection. We employ corpora of different textual genres: a set of annotated reported speech extracted from news articles, the set of news titles annotated with polarity and emotion from SemEval 2007 (Task 14), and ISEAR, a corpus of real-life self-expressed emotion. We also show how the model built from the EmotiBlog annotations can be enhanced with external resources. The results demonstrate that EmotiBlog, through its structure and annotation paradigm, offers high-quality training data for systems dealing with both opinion mining and emotion detection.
Abstract:
In this paper we present a method to automatically identify linguistic contexts which contain possible causes of emotions or emotional states in Italian newspaper articles (La Repubblica Corpus). Our methodology is based on the interplay between relevant linguistic patterns and an incremental repository of common-sense knowledge on emotional states and emotion-eliciting situations. Our approach has been evaluated against manually annotated data. The results obtained so far are satisfactory and support the validity of the proposed methodology.
Abstract:
This paper presents the automatic extension to other languages of TERSEO, a knowledge-based system for the recognition and normalization of temporal expressions originally developed for Spanish. TERSEO was first extended to English through the automatic translation of the temporal expressions. Then, an improved porting process was applied to Italian, where the automatic translation of the temporal expressions from English and from Spanish was combined with the extraction of new expressions from an Italian annotated corpus. Experimental results demonstrate how, while still adhering to the rule-based paradigm, the development of automatic rule translation procedures allowed us to minimize the effort required for porting to new languages. Relying on such procedures, and without any manual effort or previous knowledge of the target language, TERSEO recognizes and normalizes temporal expressions in Italian with good results (72% precision and 83% recall for recognition).
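For readers unfamiliar with the recognition metrics quoted above, the following is a minimal sketch of how precision and recall are computed from a set of predicted and gold-standard expressions. The span tuples are hypothetical placeholders, not data from the TERSEO evaluation, and the F1 helper is included only to show how the two reported figures combine.

```python
def precision_recall(predicted, gold):
    """Precision = correct predictions / all predictions;
    recall = correct predictions / all gold items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1(precision, recall):
    # Harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (start, end) character spans of temporal expressions.
gold = {(0, 9), (15, 24), (30, 38)}
predicted = {(0, 9), (15, 24), (40, 45), (50, 55)}

p, r = precision_recall(predicted, gold)
print(p, r)

# Harmonic mean implied by the figures reported in the abstract.
print(round(f1(0.72, 0.83), 3))
```

This sketch counts only exact span matches. Evaluations of temporal expression recognition often also report partial-match scores, which the abstract does not detail.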
Abstract:
This paper presents a multi-layered Question Answering (Q.A.) architecture suitable for enhancing current Q.A. capabilities with the possibility of processing complex questions, that is, questions whose answers need to be gathered from pieces of factual information scattered across different documents. Specifically, we have designed a layer oriented to processing the different types of temporal questions. Complex temporal questions are first decomposed into simpler ones, according to the temporal relationships expressed in the original question. In the same way, the answers to each simple question are re-composed, fulfilling the temporal restrictions of the original complex question. Using this architecture, a Temporal Q.A. system has been developed. In this paper, we focus on explaining the first part of the process: the decomposition of the complex questions. Furthermore, the system has been evaluated with the TERQAS question corpus of 112 temporal questions. For the task of question splitting, our system achieved 85% precision and 71% recall.
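The decomposition step described above can be illustrated with a deliberately simple sketch that splits a question at the first temporal signal word. This is a toy approximation under stated assumptions, not the system's actual algorithm: the signal list, the function name `decompose` and the example question are all hypothetical.

```python
import re

# Assumed list of temporal signal words; the real system's inventory is not given.
TEMPORAL_SIGNALS = ("before", "after", "while", "during", "when")


def decompose(question):
    """Split a complex temporal question into a main question, the temporal
    signal, and the event whose time must be resolved first."""
    text = question.strip().rstrip("?")
    pattern = r"\s+(" + "|".join(TEMPORAL_SIGNALS) + r")\s+"
    # The capture group makes re.split return [before, signal, after].
    parts = re.split(pattern, text, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) != 3:
        # No embedded temporal signal: treat the question as simple.
        return {"main": question.strip(), "signal": None, "sub_event": None}
    main, signal, sub_event = parts
    return {"main": main + "?", "signal": signal.lower(), "sub_event": sub_event}


print(decompose("Who was president of Spain before Aznar won the elections?"))
print(decompose("When was Picasso born?"))
```

A question-initial "When" is not treated as a signal because the pattern requires whitespace on both sides. Recomposing the answers would then mean filtering the candidate answers to the main question by the date found for the sub-event, according to the signal.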
Abstract:
IARG-AnCora aims to annotate, with thematic roles, the implicit arguments of deverbal nominalizations in the AnCora corpus. These corpora will serve as the basis for automatic semantic role labelling systems based on machine learning techniques. Semantic analysers are basic components in current language technology applications, in which the goal is to foster a deeper understanding of text in order to perform higher-level inferences and thus obtain qualitative improvements in the results.
Abstract:
This paper examines both theoretical and practical issues related to conversion. A fairly detailed characterization is provided of the 5329 instances identified in a 300,000-word corpus of American English written in the late 1990s. The examples are grouped according to the type of conversion involved. Frequency and the internal structure of words are also considered and compared with the results obtained by earlier scholars. In spite of the limitations that a corpus study imposes, the conclusions suggest that any item, regardless of its morphological structure, may undergo conversion, and that this may happen in any register. Moreover, conversion seems to be an important source of new items in present-day American English.
Abstract:
The geographic focus of a document identifies the place or places on which the content of the text centres. This paper presents a corpus-based approach to detecting geographic focus in text. Unlike other approaches that rely on purely geographic information to detect the focus, our proposal uses all the textual information available in the documents of the working corpus, on the hypothesis that the occurrence of particular people, events, dates and even common terms can be crucial for this task. To validate our hypothesis, we carried out a study on a corpus of geolocated news items about events that took place between 2008 and 2011. This temporal distribution also allowed us to analyse how the performance of the classifier and the most representative terms of different localities evolve over time.
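The abstract does not name the classifier used, so the following is only a sketch of the general idea that every word in a document, not just place names, can serve as evidence of its geographic focus. It uses a small multinomial Naive Bayes model written from scratch. The training snippets and labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (text, place) pairs. Returns the counts needed for prediction."""
    word_counts = defaultdict(Counter)  # place -> word frequencies
    place_counts = Counter()            # place -> number of documents
    vocab = set()
    for text, place in docs:
        tokens = text.lower().split()
        word_counts[place].update(tokens)
        place_counts[place] += 1
        vocab.update(tokens)
    return word_counts, place_counts, vocab


def predict(text, model):
    """Return the place with the highest log-probability for the text."""
    word_counts, place_counts, vocab = model
    total_docs = sum(place_counts.values())
    best_place, best_score = None, -math.inf
    for place, n_docs in place_counts.items():
        score = math.log(n_docs / total_docs)  # prior for this place
        total_words = sum(word_counts[place].values())
        for token in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing the probability.
            count = word_counts[place][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_place, best_score = place, score
    return best_place


# Hypothetical training snippets (not from the paper's news corpus).
docs = [
    ("hogueras fiesta playa postiguet", "Alicante"),
    ("fallas fiesta mascletà ayuntamiento", "Valencia"),
    ("hogueras castillo santa bárbara", "Alicante"),
    ("fallas ciudad artes ciencias", "Valencia"),
]
model = train(docs)
print(predict("fiesta hogueras", model))
```

Here the common noun "hogueras" tips the decision even though the input contains no place name, which is the kind of non-geographic evidence the hypothesis relies on.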
Abstract:
This paper describes a stage in the COMENEGO project, which is creating comparable corpora of business texts in order to distribute them among translation practitioners, so that they can use this resource when translating economic, business or financial texts. This stage consists of a discursive analysis of a pilot specialised corpus initially compiled in French and Spanish. Its textual resources are classified into different categories, which need to be confirmed so that they can be useful when included in the virtual platform that will allow users to exploit the corpus and filter their searches according to their specific needs. The aim of this paper is to propose a discursive analysis approach based on the concept of ‘metadiscourse’ (Hyland, 2005).
Abstract:
Little importance seems to have been given to general language in the literature on economic translation, even though it can in fact pose problems in translation, at least in the context of translator training. In this article we deal with the Spanish-French translation behaviour of prepositional phrases (locutions prépositionnelles). We first consider the conceptual problems of this linguistic phenomenon and then identify and classify the phrases recorded in our corpus. Finally, we comment on their translations. The results can be taken into account, among other things, in the teaching of translation.
Metadiscourse and translation in business language: a corpus-based study (French-Spanish)
Abstract:
In this article we study the concept of metadiscourse, which can be understood, in essence, as the set of rhetorical elements used according to the objectives of the communication. Our aim is to establish, on the one hand, the metadiscursive scheme characteristic of chairmen's messages or letters in companies' annual reports and, on the other hand, the French-Spanish translation behaviour of these microtextual elements. The results show that these texts have their own metadiscursive scheme and that translators tend to respect its structure, although they introduce new types. The results can also be taken into account in the teaching of translation and of business language.