542 results for corpora allata


Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this article we reflect on the role of corpora in the observation and analysis of phenomena of a natural language, as well as in the creation of new resources for linguistic exploration that information technologies have been enabling and making more effective.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The automatic acquisition of lexical associations from corpora is a crucial issue for Natural Language Processing. A lexical association is a recurrent combination of words that co-occur more often than expected by chance in a given domain. In fact, lexical associations define linguistic phenomena such as idioms, collocations or compound words. Because the sense of a lexical association is not compositional, identifying such associations is fundamental for analysis and synthesis that take into account all the subtleties of the language. In this report, we introduce a new statistically-based architecture that extracts contiguous and non-contiguous lexical associations from naturally occurring texts. For that purpose, three new concepts have been defined: the positional N-gram model, the Mutual Expectation and the GenLocalMaxs algorithm. The initial text is first transformed into a set of positional N-grams, i.e. ordered vectors of simple lexical units. Then, an association measure, the Mutual Expectation, evaluates the degree of cohesion of each positional N-gram, and lexical associations are selected by identifying local maximum values of Mutual Expectation. Great efforts have also been made to evaluate our methodology. For that purpose, we have proposed the normalisation of five well-known association measures and shown that both the Mutual Expectation and the GenLocalMaxs algorithm yield significant improvements compared to existing methodologies.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Project work submitted in fulfilment of the requirements for the degree of Master in English Didactics

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Thesis submitted in fulfilment of the requirements for the degree of Doctor in Linguistics (Lexicology, Lexicography and Terminology), and thesis submitted in fulfilment of the requirements for the degree of Doctor in Philology and Portuguese Language at the Faculdade de Filosofia, Letras e Ciências Humanas of the Universidade de São Paulo

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The business sphere is a multilingual world where foreign language communication skills are crucial in international relations. Employers therefore look for business professionals with a high level of linguistic competence. Language proficiency increases the chances of successful negotiation among partners. Two main obstacles create barriers in formal communication in a foreign language: lack of knowledge of specific linguistic structures or terminology, and frequent transitions from one language to another. This paper contributes to the quest for quick access to a wide range of English, Spanish and Russian online databases that provide authentic language samples. Their application may improve communication skills and facilitate preparation for business discourse.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The objective of the PANACEA ICT-2007.2.2 EU project is to build a platform that automates the stages involved in the acquisition, production, updating and maintenance of the large language resources required by, among others, MT systems. The development of a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web is one of the most innovative building blocks of PANACEA. The CAC, which is the first stage in the PANACEA pipeline for building Language Resources, adopts an efficient and distributed methodology to crawl for web documents with rich textual content in specific languages and predefined domains. The CAC includes modules that can acquire parallel data from sites with in-domain content available in more than one language. In order to extrinsically evaluate the CAC methodology, we have conducted several experiments that used crawled parallel corpora for the identification and extraction of parallel sentences using sentence alignment. The corpora were then successfully used for domain adaptation of Machine Translation Systems.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The aim of this report is to present the application of a series of proposals on transcription, tagging and coding to two corpora: the bilingual corpus LC (La Canonja (Catalan-Spanish)) and the trilingual corpus CSCD (Code-switching as Communicative Design (Catalan-Spanish-English)). These proposals, which constitute the contribution of the IULA-LIPPS team (Language Interaction in Plurilingual and Plurilectal Speakers) to the coding manual of the LIDES system (Language Interaction Database Exchange System), adopted by the European LIPPS group, may be useful for transcribing, tagging and coding data from typologically close and distant languages.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This research investigates the phenomenon of translationese in two monolingual comparable corpora of original and translated Catalan texts. Translationese has been defined as the dialect, sub-language or code of translated language. This study aims to give empirical evidence of translation universals regardless of the source language. Traditionally, research conducted on translation strategies has been mainly intuition-based. Computational Linguistics and Natural Language Processing techniques provide reliable information on lexical frequencies and on morphological and syntactic distribution in corpora. Therefore, they have been applied to observe which translation strategies occur in these corpora. Results seem to support the simplification, interference and explicitation hypotheses, whereas no sign of normalization has been detected with the methodology used. The data collected and the resources created for identifying lexical, morphological and syntactic patterns of translations can be useful for Translation Studies teachers, scholars and students: teachers will have more tools to help students avoid reproducing translationese patterns. The resources developed will help in detecting non-genuine or inadequate structures in the target language, which may improve the stylistic quality of translations. Translation professionals can also take advantage of these resources to improve their translation quality.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

An interesting case

Relevance:

20.00% 20.00%

Publisher:

Abstract:

CoCo is a collaborative web interface for the compilation of linguistic resources. In this demo we present one of its possible applications: paraphrase acquisition.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Dedicatio: Johan Parmen Timm [ruots. pr.].

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Sheets: 1 leaf without signature mark, A4 B3.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We introduce a new tool for correcting OCR errors in a repository of cultural materials. The poster is aimed at all who are interested in digital humanities and who might find our tool useful. The poster will focus on the OCR correction tool and on the background processes. We have started a project on materials published in Finno-Ugric languages in the Soviet Union in the 1920s and 1930s. The materials are digitised in Russia. As they arrive, we publish them in DSpace (fennougrica.kansalliskirjasto.fi). For research purposes, the results of the OCR must be corrected manually. For this we have built a new tool. Although similar tools exist, we found in-house development necessary in order to serve the researchers' needs. The tool enables exporting the corrected text as required by the researchers. It makes it possible to distribute the correction tasks and their supervision. After a supervisor has approved a text as finalised, the new version of the work will replace the old one in DSpace. The project has benefitted the small language communities, opened channels for cooperation in Russia, and increased our capabilities in digital humanities. The OCR correction tool will be available to others.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper analyses some recent studies that explore the potential advantages and effective use of a method for teaching English as a foreign language and for teaching English linguistics.