967 resultados para Comparable corpora
Resumo:
Software corpora facilitate reproducibility of analyses, however, static analysis for an entire corpus still requires considerable effort, often duplicated unnecessarily by multiple users. Moreover, most corpora are designed for single languages increasing the effort for cross-language analysis. To address these aspects we propose Pangea, an infrastructure allowing fast development of static analyses on multi-language corpora. Pangea uses language-independent meta-models stored as object model snapshots that can be directly loaded into memory and queried without any parsing overhead. To reduce the effort of performing static analyses, Pangea provides out-of-the box support for: creating and refining analyses in a dedicated environment, deploying an analysis on an entire corpus, using a runner that supports parallel execution, and exporting results in various formats. In this tool demonstration we introduce Pangea and provide several usage scenarios that illustrate how it reduces the cost of analysis.
Resumo:
Discourse connectives are lexical items indicating coherence relations between discourse segments. Even though many languages possess a whole range of connectives, important divergences exist cross-linguistically in the number of connectives that are used to express a given relation. For this reason, connectives are not easily paired with a univocal translation equivalent across languages. This paper is a first attempt to design a reliable method to annotate the meaning of discourse connectives cross-linguistically using corpus data. We present the methodological choices made to reach this aim and report three annotation experiments using the framework of the Penn Discourse Tree Bank.
La evidencia y la investigación con corpora discursivos : Ideología, interdiscurso, problematización
La evidencia y la investigación con corpora discursivos : Ideología, interdiscurso, problematización
La evidencia y la investigación con corpora discursivos : Ideología, interdiscurso, problematización
Resumo:
El objetivo de este Proyecto Fin de Carrera es abordar el análisis del capítulo de conclusiones de tesis de ingeniería de telecomunicación a partir de un corpus comparable en inglés y español. A través del léxico podrán conocerse las expresiones típicas y la estructura de capítulo de conclusiones, tanto en inglés como en español. Para empezar este Proyecto, se ha compilado los corpus que se quieren comparar, en total se ha digitalizado tres corpus, uno con 24 conclusiones de tesis doctorales en español, otro con el mismo número de capítulos de conclusiones de tesis doctorales en inglés (PhD) y por último un corpus de conclusiones de tesis de fin de máster y de grado. El primer análisis que se ha realizado es el de la estructura de las conclusiones a partir de los títulos y subtítulos del capítulo. Se han comparado los títulos más utilizados y se han comentado las coincidencias y diferencias entre los corpus. La estructura vista a través de los subtítulos, se ha comparado con la propuesta por la autora Glasman-Deal (2011) en trabajos académicos de investigación, principalmente en artículos de investigación. La siguiente parte del Proyecto se ha centrado en el estudio del léxico, para ello nos hemos ayudado de la herramienta informática Wordsmith tools de la que se han explicado sus herramientas y funciones más útiles para este trabajo entre ellas el plot, que informa número de archivos en la que aparece cada palabra en el corpus. Las palabras con mayor plot son las más usadas por todos los doctorandos cuando escriben el capítulo de conclusiones .Se han elaborado unas pirámides donde se han colocado las palabras propias del género académico de las tesis por orden de uso. Las más usadas, con mayor plot, en la base y según se asciende aparecen las que tienen menor plot, con el fin de ver de una forma gráfica el peso que tiene cada palabra en el corpus. El siguiente paso del análisis del léxico ha tenido el objetivo de diferenciar los contextos de uso de las palabras incluidas en las pirámides. Se ha diferenciado entre los usos de las palabras dependiendo de su denotación académica o técnica. Esta comparación ha permitido comprobar que dentro del mismo corpus un sustantivo como contribuciones tiene connotación positiva o negativa dependiendo del contexto. Con los ejemplos aportados por los corpus se proporciona una base para el análisis lingüístico, centrado en los sustantivos, en este trabajo. Para finalizar el Proyecto, se ha implementado una base de datos con los resultados obtenidos del análisis de los sustantivos en la que se pueden ver las palabras que corresponden a cada nivel de la pirámide y ejemplos del uso de estas palabras. The aim of this Project is to analyze the concluding chapter of PhD thesis in the field of telecommunication engineering by means of a comparable corpus in English and Spanish. Through the lexis we will be able to capture useful expressions and the typical structure of the chapter in these specialized thesis, either in English and Spanish. To start with, three corpora have been compiled. The first one consists of 24 concluding chapters of PhD thesis in Spanish; the second, is made up of the same number of chapters of PhD thesis in the English language; and finally, 24 further chapters of Master and Degree thesis in English were digitalized and prepared for lexis analysis. Second, the study of the structure of the chapter of conclusions has been carried out. In this part the most common titles in the chapter of conclusions have been analysed and compared so as to find differences and similarities between the two languages compared. Moreover, the structure found through the subtitles in the conclusions of the thesis has been compared with the structure proposed by Glasman-Deal (2011) in her book Science Research Writing. Third, the study has been focused on the lexis of each corpus. These corpora have been treated with a lexis analyser called Wordsmith tools. The variables of frequency and plot have been applied to withdraw the most widely used nouns from the list of all the words found in any of the corpus. A pyramidal structure has been designed in order to show the academic or gender nouns - the ones usually found in the concluding chapter of thesis – nouns with a higher plot in the corpus. Two different types of context have been found for these nouns: technical and academic denotation. To show the difference in use of these nouns, arranged examples of contexts are given for each of the words studied. Finally, a database has been implemented to arrange the results of the lexis study. In this database the most significant examples of each noun are shown.
Resumo:
Sign.: A-P4, Q2
Resumo:
For the act of membrane fusion, there are two competing, mutually exclusive molecular models that differ in the structure of the initial pore, the pathway for ionic continuity between formerly separated volumes. Because biological “fusion pores” can be as small as ionic channels or gap junctions, one model posits a proteinaceous initial fusion pore. Because biological fusion pore conductance varies widely, another model proposes a lipidic initial pore. We have found pore opening and flickering during the fusion of protein-free phospholipid vesicles with planar phospholipid bilayers. Fusion pore formation appears to follow the coalescence of contacting monolayers to create a zone of hemifusion where continuity between the two adherent membranes is lipidic, but not aqueous. Hypotonic stress, causing tension in the vesicle membrane, promotes complete fusion. Pores closed soon after opening (flickering), and the distribution of fusion pore conductance appears similar to the distribution of initial fusion pores in biological fusion. Because small flickering pores can form in the absence of protein, the existence of small pores in biological fusion cannot be an argument in support of models based on proteinaceous pores. Rather, these results support the model of a lipidic fusion pore developing within a hemifused contact site.
Resumo:
El proyecto Araknion tiene como objetivo general dotar al español y al catalán de una infraestructura básica de recursos lingüísticos para el procesamiento semántico de corpus en el marco de la Web 2.0 sean de origen oral o escrito.
Resumo:
For references, please quote the full paper as published in the above journal.
Resumo:
This paper presents the automatic extension to other languages of TERSEO, a knowledge-based system for the recognition and normalization of temporal expressions originally developed for Spanish. TERSEO was first extended to English through the automatic translation of the temporal expressions. Then, an improved porting process was applied to Italian, where the automatic translation of the temporal expressions from English and from Spanish was combined with the extraction of new expressions from an Italian annotated corpus. Experimental results demonstrate how, while still adhering to the rule-based paradigm, the development of automatic rule translation procedures allowed us to minimize the effort required for porting to new languages. Relying on such procedures, and without any manual effort or previous knowledge of the target language, TERSEO recognizes and normalizes temporal expressions in Italian with good results (72% precision and 83% recall for recognition).
Resumo:
There is no question nowadays as to the international and powerful status of English at a global scale and, consequently, as to its presence in non-English speaking countries at different levels. Linguistically speaking, English is one of the languages which have mostly influenced Spanish throughout its history and especially from the late 1960s. In this study, the impact of English on Spanish is considered in the language of sports; particularly, sports Anglicisms and false Anglicisms are analysed. Due attention is paid to the different forms that an Anglicism may adopt and to which of those forms are more widely accepted or rejected by prescriptivists and speakers at large, in the light of a contrastive analysis of their appearance in the Nuevo diccionario de anglicismos, the Diccionario de la Real Academia Española and the Corpus de Referencia del Español Actual.
Resumo:
The aim of this paper is to describe the use that professional translators make of corpora as translation resources. First, we briefly review the literature on translation practitioners’ use of corpora in the contexts of both translation training and professional translation. Then we present our survey-based study, analyse the uptake of corpora among Spanish translators and describe the use of this kind of translation resource. The results show that even if corpora are not as frequently used as other kinds of resources, such as dictionaries, there are professional translators who do use corpora, in a variety of ways, in their work. Additionally, non-users do not seem entirely sceptical about corpora. Against that backdrop, translator trainers are invited to continue to report on how corpora can be used as translation resources.