559 results for Corpora


Relevance: 10.00%

Abstract:

Statistical machine translation (SMT) is an approach to machine translation (MT) that uses statistical models whose parameters are estimated from existing human translations (contained in bilingual corpora). From a translation student's standpoint, this dissertation aims to explain how a phrase-based SMT system works, to determine the role of the statistical models it uses in the translation process, and to assess the quality of its translations when the system is trained on good-quality in-domain corpora. To that end, a phrase-based SMT system built with Moses was trained and then used to translate two texts from English into Spanish, both on a topic related to the training data. Finally, the quality of the system's output was assessed through a quantitative evaluation using three different automatic evaluation measures and a qualitative evaluation based on the Multidimensional Quality Metrics (MQM).
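In phrase-based SMT, phrase translation probabilities are commonly estimated by relative frequency over phrase pairs extracted from a word-aligned bilingual corpus. Candidate translations are then ranked with a log-linear combination of feature scores, such as the translation model and the language model. The Python sketch below illustrates only these two ideas on invented toy data. It is not Moses, and the feature names and weights are hypothetical.

```python
import math
from collections import Counter


def estimate_phrase_table(phrase_pairs):
    """Estimate p(target | source) by relative frequency over extracted phrase pairs."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(src for src, _ in phrase_pairs)
    return {
        (src, tgt): count / source_counts[src]
        for (src, tgt), count in pair_counts.items()
    }


def log_linear_score(features, weights):
    """Score a candidate as a weighted sum of log feature values (log-linear model)."""
    return sum(weights[name] * math.log(value) for name, value in features.items())


# Toy English-Spanish phrase pairs, invented for illustration.
pairs = [
    ("the house", "la casa"),
    ("the house", "la casa"),
    ("the house", "la vivienda"),
    ("green", "verde"),
]
table = estimate_phrase_table(pairs)  # p("la casa" | "the house") is 2/3

# Hypothetical features for one candidate: a translation-model probability from the
# table and a made-up language-model probability. The weights are also made up.
features = {"tm": table[("the house", "la casa")], "lm": 0.2}
weights = {"tm": 1.0, "lm": 0.5}
score = log_linear_score(features, weights)
```

In a real system, the weights are typically tuned on a held-out development set rather than fixed by hand.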

Relevance: 10.00%

Abstract:

This thesis project stems from an agreement between the Terminology Laboratory of the School of Languages and Literatures, Translation and Interpreting in Forlì and the Food and Agriculture Organization of the United Nations (FAO). Under the agreement, degree theses in terminology are written in collaboration with FAO's Meeting Programming and Documentation Service (CPAM). This work carries out terminological research in English and Russian in the field of animal genetic resources, specifically animal biotechnology and animal health. It also creates a bilingual English-Russian terminology database that can be used in computer-assisted translation tools. The first chapter gives an overview of FAO, its history, its missions and its objectives. The second chapter deals with FAO's Animal Production and Health Division, which is responsible for animal genetic resources and for animal biotechnology and health: the domain and subdomain of this thesis project. The third chapter covers special languages, their characteristics, their relationship with general language, and the text types of the corpora used for the terminological research. The fourth chapter discusses terminology and terminological work, illustrating approaches and schools of thought in modern terminology and briefly describing the distinctive features of biotechnology terminology. Finally, the fifth chapter presents the whole terminological research project, from the preparatory phase to the revision phase. It focuses in particular on how the term records were compiled and converted to create the bilingual database. The appendix at the end of the thesis contains the concept maps, the term records in the Microsoft Excel file, and the bilingual English-Russian terminology database.
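The abstract mentions converting spreadsheet term records into a bilingual database usable in computer-assisted translation tools. One common interchange format for this is TBX (ISO 30042), although the thesis does not say which format it used. The Python sketch below assumes the Excel records have been exported to CSV with hypothetical columns `en`, `ru` and an optional `definition_en`. It builds a minimal TBX-style tree that has not been validated against the TBX schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree serialises this namespace with the reserved "xml" prefix (xml:lang).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def records_to_tbx(csv_text):
    """Build a minimal TBX-style tree from bilingual term records.

    Assumes hypothetical CSV columns: en, ru and an optional definition_en.
    """
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: "en"})
    header = ET.SubElement(martif, "martifHeader")
    source_desc = ET.SubElement(ET.SubElement(header, "fileDesc"), "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Converted from spreadsheet term records"
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")

    # One termEntry per record, with one langSet per language.
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        entry = ET.SubElement(body, "termEntry", {"id": f"c{i}"})
        for lang in ("en", "ru"):
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            # A definition, when present, is placed before the term group.
            if lang == "en" and row.get("definition_en"):
                descrip = ET.SubElement(lang_set, "descrip", {"type": "definition"})
                descrip.text = row["definition_en"]
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = row[lang]
    return martif


# Toy records for illustration (definitions left empty).
sample = (
    "en,ru,definition_en\n"
    "animal genetic resources,генетические ресурсы животных,\n"
    "animal health,здоровье животных,\n"
)
tree = records_to_tbx(sample)
xml_text = ET.tostring(tree, encoding="unicode")
```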

Relevance: 10.00%

Abstract:

The study deals with fluency and analyses some of the aspects that define it (silent pauses, filled pauses, discourse markers, reformulations). The frequency and duration of these phenomena are analysed in two corpora of oral productions by two groups of English speakers. The first group consists of Italian students on the Intercultural Linguistic Mediation degree course at the School of Languages, Literatures, Interpreting and Translation in Forlì, University of Bologna. The second consists of British participants in a radio programme. Comparing the oral productions of learners of English with those of native-speaker public speakers was considered useful. An attempt was made to balance the two corpora by gender. Praat was used to identify the form and duration of the variables, and NoteTab Light was used to annotate the corpora. The results show that the greatest differences between the two groups lie in the duration of silent pauses and in the frequency, duration and variety of sounds of filled pauses. Added syllables, lengthened syllables and reformulations also differ markedly. Added syllables appear typical of the Italian students' speech, since most Italian words end in a vowel sound. A gender effect also emerged: in both corpora, female speakers use the fluency variables examined more than male speakers do. This research and future studies could inform English teaching modules that treat fluency as a primary factor of linguistic competence. Chapter 1 introduces the study. Chapter 2 presents the state of the art on the topic. Chapter 3 presents the methodology of the study. Chapter 4 illustrates and discusses the results. Chapter 5 presents concluding remarks and future prospects for English teaching and research.
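The study measures the frequency and duration of fluency phenomena. The Python sketch below shows one way such measurements could be summarised once intervals have been annotated, for instance on a Praat TextGrid tier. The `(label, start, end)` tuple format, the labels `SP` and `FP`, and the normalisation per minute of speech are all assumptions, not the study's documented procedure.

```python
from collections import defaultdict


def pause_stats(intervals, speech_duration_s):
    """Per-label frequency (per minute of speech) and mean duration in seconds.

    intervals: list of (label, start_s, end_s) tuples. The labels "SP" (silent
    pause) and "FP" (filled pause) are hypothetical annotation codes.
    """
    durations = defaultdict(list)
    for label, start, end in intervals:
        durations[label].append(end - start)
    minutes = speech_duration_s / 60
    return {
        label: {
            "per_minute": len(ds) / minutes,
            "mean_duration_s": sum(ds) / len(ds),
        }
        for label, ds in durations.items()
    }


# Toy annotation of one minute of speech, invented for illustration.
intervals = [("SP", 1.0, 1.5), ("FP", 3.0, 3.25), ("SP", 10.0, 10.75)]
stats = pause_stats(intervals, speech_duration_s=60.0)
```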

Relevance: 10.00%

Abstract:

This work introduces a field that has so far received little attention in terminological research. I became acquainted with karting recently and wanted to learn more about it, so I looked for a way to combine two worlds: terminology for translators, and tracks and engines. Translators must understand the field they work in, so this thesis aims to provide a tool for terminological research in Italian and German. The first chapter introduces the origins of the sport, its development, its arrival in Italy and its current categories. The next part covers terminology and special languages, focusing on the special language of automotive technology. The third chapter, on methodology and analysis, describes the practical part of the research: the construction of the corpora and of the concept systems. The concluding chapter presents the terminology database, whose term records are included in the appendix.

Relevance: 10.00%

Abstract:

Contains, with its own half-title: Hermanni Boerhaave ... De mercurio experimenta ...

Relevance: 10.00%

Abstract:

This study investigates the expression of epistemic modality in a corpus of Ghanaian Pidgin English (GhaPE). The epistemic expressions are identified manually and then distinguished from one another by grammatical status and by the epistemic and evidential notions they indicate. Seven different elements are found: one pre-verbal marker, one adverb, two particles and three complement-taking predicates. In line with existing research, the results indicate that distinguishing the usage properties of individual modal expressions may require subdividing them by evidential as well as epistemic meaning. Moreover, a functional parallel is demonstrated between the GhaPE particle abi, the Swedish modal particle väl and the Spanish adverbs a lo mejor and igual. All of them simultaneously express epistemic probability and ask the hearer for confirmation. Finally, contrary to previous accounts, the results suggest that the pre-verbal marker fit may indicate epistemic possibility without a preceding irrealis marker go. Future research should use larger corpora to arrive at a fuller understanding of both individual modal categories and their interrelations.

Relevance: 10.00%

Abstract:

The goals of this study are to determine the relationship between synaptogenesis and morphogenesis in the mushroom body calyx of the honeybee Apis mellifera. The study also aims to find out how the microglomerular structure characteristic of the mature calyx is established during metamorphosis. We show that synaptogenesis in the calycal neuropile of the mushroom body starts in early metamorphosis (stages P1-P3), before the microglomerular structure of the neuropile is established. This initial step of synaptogenesis is characterized by the rare occurrence of distinct synaptic contacts. Massive synaptogenesis starts at stage P5 and coincides with the formation of microglomeruli. These are structural units of the calyx composed of centrally located presynaptic boutons surrounded by spiny postsynaptic endings. Microglomeruli are assembled in one of two ways. Fine postsynaptic processes may accumulate around pre-existing presynaptic boutons, or thin neurites of presynaptic neurons may grow into premicroglomeruli, tightly packed groups of spiny endings. During late pupal stages (P8-P9), new synapses and microglomeruli are probably still being added. Most of the synaptic appositions formed at these stages are made by boutons (putative extrinsic mushroom body neurons) onto small postsynaptic profiles that do not exhibit presynaptic specializations (putative intrinsic mushroom body neurons). Synapses between presynaptic boutons, which are characteristic of the adult calyx, first appear at stage P8 but remain rare toward the end of metamorphosis. Our observations are consistent with the hypothesis that most of the synapses established during metamorphosis provide the structural basis for afferent information flow to the calyces. Local synaptic circuitry, by contrast, probably matures after adult emergence.

Relevance: 10.00%

Abstract:

To boost their marketing campaigns, many organizations turn to social media tools hosted on the Internet. They seek higher productivity by adopting automated message-replication systems, or even direct-access features, to insert persuasive messages into discussion forums in online communities. This shows a certain insensitivity in handling communication in a potentially promising medium that calls for a different interpretation before any action is taken. Such messages frequently risk repercussions, which makes it essential to pay closer attention to how these communication flows, especially persuasive ones, are designed and targeted. Accordingly, this work proposes a study of online communities that first identifies the factors leading to their formation. It then analyses and interprets their structure and communication flows in order to reveal their aggregating elements. Following the methodological precepts adopted, the aim was to show that these components allow analyses that help adapt relationship strategies. Such analyses make possible actions that belong to the marketing communication process with these communities. The methodology involved a structural analysis of the network using software such as UCINET, integrated with NetDraw. It also analysed the communication flows, which formed the corpus, using the WordSmith Tools suite. A network formed in a community hosted on orkut provided the corpus for the lexical analyses, through the collection of the content of its thematic forums. The results characterized not only the existence of the social network itself but also its potential for relationships. They did so by interpreting the dialogic flows of its aggregating elements through visual (graphs), statistical and lexical resources.
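The study used UCINET and NetDraw for network structure and WordSmith Tools for lexical analysis. The Python sketch below only illustrates two measures of the kind involved: normalised degree centrality over an undirected reply network, and a word-frequency list over forum posts. The abstract does not say which measures were computed. The reply-edge format, user names and posts are hypothetical toy data.

```python
import re
from collections import Counter


def degree_centrality(edges):
    """Normalised degree centrality for an undirected graph given as (u, v) pairs."""
    neighbours = {}
    for u, v in edges:
        if u == v:  # ignore self-replies
            continue
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    n = len(neighbours)
    # Degree divided by the maximum possible degree (n - 1).
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def word_frequencies(texts):
    """Case-folded word frequency list, similar in spirit to a concordancer's word list."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


# Hypothetical reply network (who replied to whom) and forum posts.
replies = [("ana", "bruno"), ("ana", "carla"), ("bruno", "carla"), ("ana", "davi")]
posts = ["Adoro este produto", "este produto é caro"]
cent = degree_centrality(replies)
freqs = word_frequencies(posts)
```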

Relevance: 10.00%

Abstract:

The use of ontologies as knowledge representations is widespread, but until recently their construction has been entirely manual. In this paper we argue for the use of text corpora and automated natural language processing methods to construct ontologies. We delineate the challenges and present criteria for selecting appropriate methods. We distinguish three major steps in ontology building: associating terms, constructing hierarchies and labelling relations. A number of methods are presented for these purposes, but we conclude that data sparsity is still a major challenge. We argue for the use of resources external to the domain-specific corpus.
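The first step named above, associating terms, is often approached with corpus co-occurrence statistics. The abstract does not name the methods the paper compares. The Python sketch below uses pointwise mutual information (PMI) over sentence-level co-occurrence as one illustrative association measure, on invented toy sentences. PMI estimates become unreliable when counts are small, which is one concrete form of the data-sparsity problem.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(sentences, terms):
    """Pointwise mutual information between term pairs, using sentence co-occurrence.

    sentences: list of token lists; terms: set of candidate terms (single tokens here).
    """
    n = len(sentences)
    term_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        present = sorted(terms.intersection(tokens))
        term_counts.update(present)
        pair_counts.update(combinations(present, 2))
    # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), probabilities per sentence.
    return {
        (a, b): math.log2((c / n) / ((term_counts[a] / n) * (term_counts[b] / n)))
        for (a, b), c in pair_counts.items()
    }


# Toy tokenised sentences and candidate terms, invented for illustration.
sentences = [
    ["protein", "binds", "receptor"],
    ["protein", "receptor", "complex"],
    ["gene", "expression"],
    ["gene", "protein"],
]
scores = pmi_scores(sentences, {"protein", "receptor", "gene"})
```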

Relevance: 10.00%

Abstract:

This paper describes part of the corpus collection effort under way in the EC-funded Companions project. The project is collecting substantial quantities of dialogue, a large part of which focuses on reminiscing about photographs. The texts are in English and Czech. We describe the context and objectives for which this dialogue corpus is being collected and the methodology being used, and we make observations on the resulting data. The corpora will be made available to the wider research community through the Companions project website.

Relevance: 10.00%

Abstract:

Automatic Term Recognition (ATR) is a fundamental processing step preceding more complex tasks such as semantic search and ontology learning. Of the many methodologies available in the literature, only a few can handle both single-word and multi-word terms. In this paper we compare five such algorithms and propose a combined approach that uses a voting mechanism. We evaluated the six approaches on two different corpora. The voting algorithm performs best on one corpus (a collection of Wikipedia texts) but less well on the Genia corpus (a standard life-science corpus). This indicates that the choice and design of the corpus have a major impact on the evaluation of term recognition algorithms. Our experiments also showed that single-word terms can be just as important and make up a fairly large proportion of terms in certain domains. As a result, algorithms that ignore single-word terms may cause problems for tasks built on top of ATR. Effective ATR systems also need to take into account both the unstructured text and its structured aspects. Information extraction techniques therefore need to be integrated into the term recognition process.
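The abstract does not specify how the voting mechanism combines the five algorithms. The Python sketch below shows one simple rank-based scheme: sum reciprocal ranks across each algorithm's ranked term list, then re-sort. The ranked lists are invented and mix single-word and multi-word terms. The scheme itself is an assumption, not the paper's method.

```python
from collections import defaultdict


def vote(ranked_lists):
    """Combine ranked term lists by summing reciprocal ranks (one possible voting scheme).

    Terms absent from a list simply receive no vote from it. Ties are broken
    alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, term in enumerate(ranking, start=1):
            scores[term] += 1.0 / rank
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical outputs of three term-recognition algorithms (best term first).
lists = [
    ["gene expression", "protein", "cell"],
    ["protein", "gene expression", "t cell"],
    ["protein", "cell", "gene expression"],
]
ranking = vote(lists)
```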

Relevance: 10.00%

Abstract:

This paper is based on a transcribed French television corpus consisting of two news bulletins, two chat shows and one literary programme recorded in February 2003. It examines the claim that the passé simple (PS) may still be used in prepared oral discourse (Pfister 1974). The corpus does not support that use on television, but it does seem to suggest a shift from temporal to aspectual features in French television talk: perfective presentation prevails over past presentation. This trend would need to be confirmed with a larger television corpus and tested in other types of oral discourse and on written corpora.

Relevance: 10.00%

Abstract:

In this study, we investigate crosslinguistic patterns in the alternation between UM, a hesitation marker consisting of a neutral vowel followed by a final labial nasal, and UH, a hesitation marker consisting of a neutral vowel in an open syllable. Based on a quantitative analysis of a range of spoken and written corpora, we identify clear and consistent patterns of change in the use of these forms in various Germanic languages (English, Dutch, German, Norwegian, Danish, Faroese) and dialects (American English, British English), with the use of UM increasing over time relative to the use of UH. We also find that this pattern of change is generally led by women and more educated speakers. Finally, we propose a series of possible explanations for this surprising change in hesitation marker usage that is currently taking place across Germanic languages.
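The change described above is measured as the use of UM relative to UH. The Python sketch below computes that proportion, UM / (UM + UH), for texts grouped by period. The spelling variants matched, the period labels and the toy texts are assumptions. Real corpora use varied transcription conventions (British English transcripts often use "er" and "erm"), which this sketch ignores.

```python
import re


def um_share(text):
    """Proportion of UM among all UM/UH tokens in a text, or None if neither occurs.

    Matches only spellings like 'um'/'umm' and 'uh'/'uhh' (an assumption).
    """
    tokens = re.findall(r"\b(u+m+|u+h+)\b", text.lower())
    um = sum(1 for t in tokens if t.endswith("m"))
    total = len(tokens)
    return um / total if total else None


# Hypothetical transcripts grouped by period, invented for illustration.
period_texts = {
    "1990s": "so uh I think uh we um",
    "2010s": "um well um I uh",
}
shares = {period: um_share(text) for period, text in period_texts.items()}
```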

Relevance: 10.00%

Abstract:

Information technology has increased the speed of communication between nations and multiplied its media. It has brought the world closer, but it has also created new challenges for translation: how we think about it, how we carry it out and how we teach it. Translation and Information Technology brings together experts in computational linguistics, machine translation, translation education and translation studies. They discuss how these new technologies work, and how electronic tools such as the internet, bilingual corpora and computer software affect translator education and the practice of translation. They also address the conceptual gaps raised by the interface between human and machine.

Relevance: 10.00%

Abstract:

Corpus linguistics is a young discipline. The earliest work was done in the 1960s, but corpora only began to be widely used by lexicographers and linguists in the late 1980s, by language teachers in the late 1990s, and by language students only very recently. This course in corpus linguistics was held at the Departamento de Lingüística Aplicada, E.T.S.I. de Minas, Universidad Politécnica de Madrid, from 15 to 19 June 1998. About 45 teachers registered for the course: 30% had PhDs in linguistics, 20% in literature, and the rest were doctoral candidates or qualified English teachers. The course was designed to introduce the use of corpora and other computational resources in teaching and research, with special reference to scientific and technological discourse in English. Each participant had a computer networked with the lecturer's machine, whose display could be projected onto a large screen. Application programs were loaded onto the central server, and telnet and a web browser were available. COBUILD gave us permission to access the 323-million-word Bank of English corpus, Mike Scott allowed us to use his WordSmith Tools software, and Tim Johns gave us a copy of his MicroConcord program.
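Concordancers such as MicroConcord and WordSmith Tools display keyword-in-context (KWIC) lines, the basic view used when introducing corpora in teaching. The Python sketch below is a minimal illustration of the idea, not how either program works. The tokenisation, the context width and the right-aligned layout are assumptions.

```python
import re


def kwic(text, keyword, width=3):
    """Key Word In Context lines: `width` tokens either side of each keyword hit."""
    # Words and individual punctuation marks become separate tokens.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so keywords line up in a column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


text = "The corpus was large. A corpus helps teachers and a corpus helps students."
lines = kwic(text, "corpus", width=2)
```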