959 results for Corpus-based Translation Studies
Abstract:
This paper describes methods and results for the annotation of two discourse-level phenomena, connectives and pronouns, over a multilingual parallel corpus. Excerpts from Europarl in English and French have been annotated with disambiguation information for connectives and pronouns, for about 3600 tokens. This data is then used in several ways: for cross-linguistic studies, for training automatic disambiguation software, and ultimately for training and testing discourse-aware statistical machine translation systems. The paper presents the annotation procedures and their results in detail, and overviews the first systems trained on the annotated resources and their use for machine translation.
Abstract:
This paper describes a preprocessing module for improving the performance of a Spanish into Spanish Sign Language (Lengua de Signos Española: LSE) translation system when dealing with sparse training data. This preprocessing module replaces Spanish words with associated tags. The list of Spanish words (vocabulary) and associated tags used by this module is computed automatically by selecting, for every Spanish word, the sign with the highest probability of being its translation. This automatic tag extraction has been compared to a manual strategy, achieving almost the same improvement. In this analysis, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not assigned to any sign. The preprocessing module has been incorporated into two well-known statistical translation architectures: a phrase-based system and a Statistical Finite State Transducer (SFST). This system has been developed for a specific application domain: the renewal of Identity Documents and Driver's License. To evaluate the system, a parallel corpus made up of 4080 Spanish sentences and their LSE translations has been used. The evaluation results revealed a significant performance improvement when including this preprocessing module. In the phrase-based system, the proposed module increased BLEU (Bilingual Evaluation Understudy) from 73.8% to 81.0% and the human evaluation score from 0.64 to 0.83. In the case of SFST, BLEU increased from 70.6% to 78.4% and the human evaluation score from 0.65 to 0.82.
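The tag-replacement step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_prob` threshold, the `<NR>` filler tag and the toy alignment pairs are all assumptions made for illustration, and dropping versus tagging non-relevant words are only two guesses at the "several alternatives" the abstract mentions.

```python
from collections import Counter, defaultdict

def build_tag_table(aligned_pairs, min_prob=0.5):
    """Map each Spanish word to its most probable sign.

    aligned_pairs: iterable of (spanish_word, sign) pairs, assumed to come
    from some word-alignment step that is not shown here (hypothetical input).
    """
    counts = defaultdict(Counter)
    for word, sign in aligned_pairs:
        counts[word][sign] += 1
    table = {}
    for word, sign_counts in counts.items():
        sign, n = sign_counts.most_common(1)[0]
        # Keep the word only if its best sign is probable enough;
        # otherwise the word is treated as non-relevant.
        if n / sum(sign_counts.values()) >= min_prob:
            table[word] = sign
    return table

def preprocess(sentence, table, drop_non_relevant=True, filler="<NR>"):
    """Replace Spanish words with tags; drop or mark non-relevant words."""
    out = []
    for word in sentence.lower().split():
        if word in table:
            out.append(table[word])
        elif not drop_non_relevant:
            out.append(filler)  # assumed alternative: keep a generic tag
    return out

# Toy alignment data, invented for this example.
pairs = [("renovar", "RENOVAR"), ("renovar", "RENOVAR"),
         ("dni", "DNI"), ("el", "DNI"), ("el", "ESTE")]
table = build_tag_table(pairs, min_prob=0.6)
print(preprocess("Quiero renovar el DNI", table))
```

The preprocessed token sequence, rather than the raw Spanish sentence, would then be passed to the translation system.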
Abstract:
Based on the initial collection of prepositions, prepositional phrases, terms with dependent prepositions and phrasal verbs used in technical texts, compiled in previous projects of the Department of Applied Linguistics to Science and Technology, the aim of this project is to complete, organize, update and give visibility to this initial information. Following an exhaustive process of verification, unification, classification and, where necessary, extension of the existing information, the resulting list is used to build a glossary of terms with prepositions. The ultimate objective of this project is to make this glossary available to users through an online query on the ILLLab webpage (http://illlab.euitt.upm.es/wordpress/), which is run by the Department of Applied Linguistics to Science and Technology. To include up-to-date examples from technical texts in the glossary, a linguistic corpus of technical texts has been compiled from different issues of the online edition of IEEE Spectrum magazine published between 2009 and 2012. The aim of this collection is to provide different examples of use in technical text for the terms included in the glossary, so that real examples of use of the terms consulted can be accessed quickly and easily, in order to clarify doubts regarding their use, meaning or translation into Spanish and, where appropriate, to facilitate learning.
All this information, both the list of terms with prepositions and the sentences from the compiled corpus, is incorporated into a database hosted on the ILLLab website itself. Through a search form on that page, the user can retrieve all the collected terms that match the search criteria entered. The user can perform two main types of search: by preposition or by full term. Additionally, the search can be global (across all terms in the glossary) or partial (within only one of the categories into which the terms have been divided according to their grammatical function). Finally, usage statistics for the collected terms across the texts in the corpus are presented, so that a ranking of those appearing most frequently in technical text can be established.
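The two search modes (by preposition or by full term) and the two scopes (global or restricted to one category) can be sketched as follows. The `Entry` fields, the category names and the sample entries are hypothetical; the abstract does not describe the actual database schema or query form of the ILLLab site.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    term: str         # full term, e.g. "depend on"
    preposition: str  # preposition contained in the term
    category: str     # grammatical category (names assumed)

# Invented sample entries, for illustration only.
GLOSSARY = [
    Entry("depend on", "on", "dependent preposition"),
    Entry("according to", "to", "prepositional phrase"),
    Entry("switch on", "on", "phrasal verb"),
]

def search(entries, query, by="preposition", category=None):
    """Return entries matching the query.

    by: "preposition" or "term" (the two main search types).
    category: None for a global search, or one category name
    for a partial search.
    """
    if by not in ("preposition", "term"):
        raise ValueError("by must be 'preposition' or 'term'")
    query = query.strip().lower()
    results = []
    for entry in entries:
        if category is not None and entry.category != category:
            continue
        value = entry.preposition if by == "preposition" else entry.term
        if value.lower() == query:
            results.append(entry)
    return results

print([e.term for e in search(GLOSSARY, "on")])
print([e.term for e in search(GLOSSARY, "on", category="phrasal verb")])
```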
Abstract:
R2RML is used to specify transformations of data available in relational databases into materialised or virtual RDF datasets. SPARQL queries evaluated against virtual datasets are translated into SQL queries according to the R2RML mappings, so that they can be evaluated over the underlying relational database engines. In this paper we describe an extension of a well-known algorithm for SPARQL to SQL translation, originally formalised for RDBMS-backed triple stores, that takes into account R2RML mappings. We present the result of our implementation using queries from a synthetic benchmark and from three real use cases, and show that SPARQL queries can be in general evaluated as fast as the SQL queries that would have been generated by SQL experts if no R2RML mappings had been used.
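The basic idea of translating a SPARQL triple pattern into SQL through a mapping can be sketched as below. This is a heavily simplified illustration, not the algorithm extended in the paper: the `MAPPING` dictionary is a hypothetical stand-in for real R2RML triples maps (which use IRI templates, joins and term types), and the predicate IRI, table name and column names are invented.

```python
# Hypothetical, simplified stand-in for an R2RML mapping:
# each predicate IRI maps to one table and two columns.
MAPPING = {
    "http://example.org/name": {
        "table": "person",
        "subject_col": "id",
        "object_col": "name",
    },
}

def triple_pattern_to_sql(subject, predicate, obj, mapping=MAPPING):
    """Translate one triple pattern into a parameterised SQL query.

    Terms starting with "?" are SPARQL variables and become projected
    columns; other terms are constants and become WHERE conditions.
    Returns (sql_string, parameter_list).
    """
    m = mapping[predicate]
    select, where, params = [], [], []
    for term, col in ((subject, m["subject_col"]), (obj, m["object_col"])):
        if term.startswith("?"):
            select.append(f"{col} AS {term[1:]}")
        else:
            where.append(f"{col} = ?")
            params.append(term)
    # If no variable is projected, select a constant to test for existence.
    sql = f"SELECT {', '.join(select) or '1'} FROM {m['table']}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

print(triple_pattern_to_sql("?s", "http://example.org/name", "?n"))
print(triple_pattern_to_sql("?s", "http://example.org/name", "Alice"))
```

Because the generated query touches only the mapped table and columns, it can be handed directly to the underlying relational engine, which is the property the abstract relies on when comparing against hand-written SQL.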
Abstract:
We have used Mössbauer and electron paramagnetic resonance (EPR) spectroscopy to study a heme-N-alkylated derivative of chloroperoxidase (CPO) prepared by mechanism-based inactivation with allylbenzene and hydrogen peroxide. The freshly prepared inactivated enzyme (“green CPO”) displayed a nearly pure low-spin ferric EPR signal with g = 1.94, 2.15, 2.31. The Mössbauer spectrum of the same species recorded at 4.2 K showed magnetic hyperfine splittings, which could be simulated in terms of a spin Hamiltonian with a complete set of hyperfine parameters in the slow spin fluctuation limit. The EPR spectrum of green CPO was simulated using a three-term crystal field model including g-strain. The best-fit parameters implied a very strong octahedral field in which the three ²T₂ levels of the (3d)⁵ configuration in green CPO were lowest in energy, followed by a quartet. In native CPO, the ⁶A₁ states follow the ²T₂ ground state doublet. The alkene-mediated inactivation of CPO is spontaneously reversible. Warming of a sample of green CPO to 22°C for increasing times before freezing revealed slow conversion of the novel EPR species to two further spin S = ½ ferric species. One of these species displayed g = 1.82, 2.25, 2.60 indistinguishable from native CPO. By subtracting spectral components due to native and green CPO, a third species with g = 1.86, 2.24, 2.50 could be generated. The EPR spectrum of this “quasi-native CPO,” which appears at intermediate times during the reactivation, was simulated using best-fit parameters similar to those used for native CPO.
Abstract:
Objectives: To compare countries in western Europe with respect to class differences in mortality from specific causes of death and to assess the contributions these causes make to class differences in total mortality.
Abstract:
This paper describes a stage in the COMENEGO project, which is creating comparable corpora of business texts in order to distribute them among translation practitioners so that they can use this resource when translating economic, business or financial texts. This stage consists of a discursive analysis of a pilot specialised corpus initially compiled in French and Spanish. Its textual resources are classified into different categories, which need to be confirmed so that they are useful once included in the virtual platform that will allow users to exploit the corpus and filter their searches according to their specific needs. The aim of this paper is to propose a discursive analysis approach based on the concept of ‘metadiscourse’ (Hyland, 2005).
Abstract:
Statistical machine translation (SMT) is an approach to Machine Translation (MT) that uses statistical models whose parameter estimation is based on the analysis of existing human translations (contained in bilingual corpora). From a translation student’s standpoint, this dissertation aims to explain how a phrase-based SMT system works, to determine the role of the statistical models it uses in the translation process and to assess the quality of the translations, provided that the system is trained with in-domain, good-quality corpora. To that end, a phrase-based SMT system based on Moses has been trained and subsequently used for the English to Spanish translation of two texts related in topic to the training data. Finally, the quality of the output texts produced by the system has been assessed through a quantitative evaluation carried out with three different automatic evaluation measures and a qualitative evaluation based on the Multidimensional Quality Metrics (MQM).
Abstract:
2nd ed.
Abstract:
From the Introduction. In the aftermath of the EU’s enlargement towards Central and Eastern Europe, many scholars and observers of European integration were proclaiming that the French-German “engine” of Europe had come to an end. The political legitimacy of French-German initiatives was contested by coalitions of smaller member states and the ‘new Europe’ was calling for new leadership dynamics. However, the experience of the Eurozone debt crisis provided dramatic evidence that no alternative to the Franco-German partnership has yet emerged in the enlarged EU. In a time of existential crisis, Franco-German initiatives appear to have remained the basic dynamic of integration. However, unlike in the past, agreements on steps forward have proven to be particularly difficult. This is largely due to these countries’ contrasting political economic policy ideas, cultures, and practices....the paper analyses the ideational ‘frames’ of the two leaders while tracing their discursive interactions against changing background conditions, from the triggering of the European debt crisis by Greece in October 2009 until the last measures taken in 2012 before the French Presidential elections. The empirical analysis is based on a systematic corpus of press conferences and media interviews by Nicolas Sarkozy and Angela Merkel after European summits. It is complemented by a number of press interviews (including some given by their respective Finance Ministers) and important speeches in that same period of time.
Abstract:
"P 106"--P. [4] of cover.