997 results for Parallel corpora


Relevance:

60.00%

Publisher:

Abstract:

This monograph is a comparative study of past time reference in Spanish and Russian. The ambition is to present a functional perspective on how both languages systemically express temporal and aspectual information. The verb naturally attracts the main attention of the thesis, and the focus is almost exclusively on verbs in the indicative mood. The definition of the parameters of Time and Aspect plays an important part in the present dissertation. Particular emphasis is placed on the elaboration and testing of the ‘ABC’ model, which represents a graphic definition of verbal aspect as a grammatical category. Another important issue is the distinction between aspect and Aktionsart; these concepts are closely related but operate on different functional levels. The analysis is essentially based on linguistic material from parallel corpora constructed for this purpose. The material is first treated statistically in order to create a starting point for the qualitative part of the analysis. The three main areas of investigation are: (1) the relation between the simple past tenses, pretérito and imperfecto, in Spanish and their imperfective and perfective counterparts in Russian; (2) the relation between the compound tenses in Spanish and the Russian verbal system, an analysis that also comprises a critique of the traditional interpretation of the aspectual contents of the compound tenses; and (3) the usage of alternative strategies in both languages, where the focus is widened to include verbal periphrasis, non-finite verb forms and subordination. The results of the analysis demonstrate that verbal aspect, according to the definition represented by the ‘ABC’ model, works as a grammatical category in both Spanish and Russian. It is also shown that there are systemic differences in the manifestation of this functional category in the two languages.
Another important result is that neither the compound tenses nor the progressive express verbal aspect, at least not in the narrow sense of the word; rather, they represent different verbal functions related to aspect.

Relevance:

60.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

60.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

60.00%

Publisher:

Abstract:

Following the internationalization of contemporary higher education, academic institutions based in non-English-speaking countries are increasingly urged to produce content in English to address prospective international students and personnel, as well as to increase their attractiveness. The demand for English translations in the institutional academic domain is consequently increasing at a rate exceeding the capacity of the translation profession. Resources for assisting non-native authors and translators in the production of appropriate texts in L2 are therefore required to help academic institutions and professionals streamline their translation workload. These resources include: (i) parallel corpora to train machine translation systems and multilingual authoring tools; and (ii) translation memories for computer-aided translation tools. The purpose of this study is to create and evaluate reference resources like the ones mentioned in (i) and (ii) through the automatic sentence alignment of a large set of Italian and English as a Lingua Franca (ELF) institutional academic texts that are given as equivalent but are not necessarily parallel (i.e. translated). In this framework, a set of alignment algorithms and alignment tools is examined in order to identify the most profitable one(s) in terms of accuracy and time- and cost-effectiveness. To determine the text pairs to align, a sample is selected according to document length similarity (in characters) and subsequently evaluated in terms of degree of noisiness/parallelism, alignment accuracy and content leverageability. The results of these analyses serve as the basis for the creation of an aligned bilingual corpus of academic course descriptions, which is ultimately used to create a translation memory in TMX format.
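The two mechanical steps named in this abstract, selecting text pairs by character-length similarity and exporting aligned segments as TMX, can be illustrated with a minimal Python sketch. It is not the study's pipeline: the function names, the 0.8 threshold, and the header values are assumptions made for illustration, and the sentence alignment step itself is omitted.

```python
import xml.etree.ElementTree as ET

# Qualified name that ElementTree serializes as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def length_ratio(a: str, b: str) -> float:
    """Shorter length divided by longer length, in characters (1.0 = equal)."""
    longer = max(len(a), len(b))
    return 1.0 if longer == 0 else min(len(a), len(b)) / longer


def select_pairs(pairs, threshold=0.8):
    """Keep document pairs whose lengths are similar enough.

    The threshold is a hypothetical value, not one reported in the abstract.
    """
    return [(it, en) for it, en in pairs if length_ratio(it, en) >= threshold]


def to_tmx(segments, src="it", tgt="en"):
    """Serialize already-aligned (source, target) segments as a TMX 1.4 string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "en",
        "srclang": src,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source_text, target_text in segments:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src, source_text), (tgt, target_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    # Invented toy documents, for illustration only.
    docs = [("Descrizione del corso", "Course description"),
            ("Un testo molto più lungo dell'altro", "Short")]
    print(select_pairs(docs))
    print(to_tmx([("Descrizione del corso", "Course description")]))
```

A length-ratio filter is only a cheap first pass; the abstract notes that the selected sample is then evaluated for noisiness, alignment accuracy and leverageability.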

Relevance:

60.00%

Publisher:

Abstract:

The purpose of this dissertation is to compile a bilingual Italian-Russian culinary glossary based on the culinary terms drawn from Artusi's cookbook "La scienza in cucina e l'arte di mangiar bene" ("The Science of Cooking and the Art of Fine Dining") and to propose Russian equivalents, drawing also on the partial Russian translation of the book. The dissertation consists of an introduction, four chapters, conclusions and a bibliography. The introduction presents an overview of the entire dissertation. The first chapter presents Pellegrino Artusi, with a brief account of his life, and outlines the vicissitudes and international success of his book and its arrival in Russia. The second chapter is devoted to the terminological research and the preparatory stages of the glossary: it focuses on the creation and use of the comparable and parallel corpora built ad hoc for the glossary and describes the programs used to select the terminology. Using the partial Russian translation of the book (by I. Alekberova) provided by Casa Artusi, it also explains the choice of Italian terms compared with those found in the Russian translation. The third chapter presents the structure of the bilingual culinary glossary, followed by the glossary itself. Each entry contains the term, its grammatical category and its definition in both languages, followed in most but not all cases by collocations or usage examples, synonyms and additional notes. The fourth chapter analyses the compilation of the glossary: it discusses the problems encountered, presents the solutions found with concrete examples, and comments on the entries not included in the glossary. It is followed by the final conclusions of the dissertation. The last part contains a bibliography listing all the resources used for the dissertation, followed by the webliography.

Relevance:

60.00%

Publisher:

Abstract:

Europarl is a large multilingual corpus containing the minutes of the debates at the European Parliament. This article presents a method to extract different corpora from Europarl: monolingual and multilingual comparable corpora, as well as parallel corpora. Using state-of-the-art measures of homogeneity, we show that these corpora are very similar. In addition, we argue that they present many advantages for research in various fields of linguistics and translation studies, and we also discuss some of their limitations. We conclude by reviewing a number of previous studies that made use of these corpora, emphasizing in each case the possibilities offered by Europarl.

Relevance:

60.00%

Publisher:

Abstract:

The various meanings of discourse connectives like while and however are difficult to identify and annotate, even for trained human annotators. This problem is all the more important because connectives are salient textual markers of cohesion and need to be correctly interpreted for many NLP applications. In this paper, we suggest an alternative route to a reliable annotation of connectives, making use of the information provided by their translations in large parallel corpora. This method thus replaces the difficult explicit reasoning involved in traditional sense annotation with an empirical clustering of the senses emerging from the translations. We argue that this method has the advantage of providing more reliable reference data than traditional sense annotation. In addition, its simplicity allows for the rapid creation of large annotated datasets.
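The idea of letting translations separate the senses of a connective can be illustrated with a small Python sketch. It only groups occurrences by the literal translation of the connective, which is a simplification of the clustering the abstract refers to. The function names and the English-French toy data are invented for illustration and do not come from the paper.

```python
from collections import Counter, defaultdict


def group_by_translation(occurrences):
    """Group source sentences by the normalized translation of the connective.

    `occurrences` is an iterable of (source_sentence, translated_connective).
    """
    groups = defaultdict(list)
    for sentence, translation in occurrences:
        groups[translation.strip().lower()].append(sentence)
    return dict(groups)


def translation_counts(occurrences):
    """Count how often each translation of the connective occurs."""
    return Counter(t.strip().lower() for _, t in occurrences)


# Invented toy data: occurrences of English "while" with a French rendering.
TOY = [
    ("She read while he cooked.", "pendant que"),
    ("He stayed, while she left for good.", "alors que"),
    ("While I was waiting, it started to rain.", "Pendant que"),
]

if __name__ == "__main__":
    for translation, sentences in group_by_translation(TOY).items():
        print(translation, len(sentences))
```

In this toy data the temporal uses of "while" fall under one rendering and the contrastive use under another, so the translation acts as an indirect sense label; real data would require merging translations that express the same sense.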

Relevance:

60.00%

Publisher:

Abstract:

Conference interpreters must carry out documentation work before, during and after the events at which they provide their services, regardless of their extralinguistic subcompetence. Unfortunately, few methodological proposals have been put forward to enable these professionals to perform this task systematically. In this article, we review some of the studies that have addressed the options available to interpreters for meeting their information needs. Having noted this scarcity of proposals, we present, in a case study, a methodological approach to this documentation work, based on the compilation of ad hoc parallel corpora and terminology extraction in the form of glossaries.

Relevance:

30.00%

Publisher:

Abstract:

We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarantees reliable bigram estimates.

Relevance:

30.00%

Publisher:

Abstract:

Pós-graduação em Estudos Linguísticos - IBILCE

Relevance:

30.00%

Publisher:

Abstract:

Software corpora facilitate reproducibility of analyses; however, static analysis of an entire corpus still requires considerable effort, often duplicated unnecessarily by multiple users. Moreover, most corpora are designed for a single language, increasing the effort required for cross-language analysis. To address these issues, we propose Pangea, an infrastructure allowing fast development of static analyses on multi-language corpora. Pangea uses language-independent meta-models stored as object model snapshots that can be directly loaded into memory and queried without any parsing overhead. To reduce the effort of performing static analyses, Pangea provides out-of-the-box support for: creating and refining analyses in a dedicated environment, deploying an analysis on an entire corpus, using a runner that supports parallel execution, and exporting results in various formats. In this tool demonstration we introduce Pangea and provide several usage scenarios that illustrate how it reduces the cost of analysis.