894 results for unknown words


Relevance:

70.00%

Abstract:

Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. The translation process in SMT is carried out by acquiring translation rules automatically from parallel corpora. However, for many language pairs (e.g. Malayalam-English), parallel corpora are available only in very limited quantities. Therefore, for these language pairs a large portion of the phrases encountered at run-time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated into English using conventional phrase-based statistical machine translation systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated, and the phrase translation table of the translation model is searched for a match for these word variants. A vocabulary filter is used to choose the best among the translations of these word variants by finding the unigram count. A match for the OOV suffix is also looked up in the phrase entries, and the target translations are filtered out. The filtered phrases are then structured, and the SMT translation model is extended by adding the OOV word with its new phrase translations. Manual evaluation shows that the number of OOV words in the input is reduced considerably.
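
The variant-generation and vocabulary-filter steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the suffix list, the phrase table, the unigram counts, and the function names `split_root_suffix` and `translate_oov` are all hypothetical, and the source tokens are artificial strings rather than real Malayalam. The suffix lookup and phrase-structuring steps are left out.

```python
from collections import Counter

def split_root_suffix(word, suffixes):
    """Strip the longest matching suffix; return (root, suffix)."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""  # no known suffix: treat the whole word as the root

def translate_oov(word, suffixes, phrase_table, target_unigrams):
    """Generate inflected variants of the OOV root, look them up in the
    phrase table, and keep the translation with the highest unigram count."""
    root, _suffix = split_root_suffix(word, suffixes)
    variants = [root] + [root + s for s in suffixes]
    candidates = []
    for variant in variants:
        candidates.extend(phrase_table.get(variant, []))
    if not candidates:
        return None  # still untranslatable
    # Vocabulary filter (assumed form): prefer the most frequent target word.
    return max(candidates, key=lambda t: target_unigrams[t])

# Toy data, invented for illustration only.
SUFFIXES = ["ni", "ro", "ta"]
PHRASE_TABLE = {"kata": ["house"], "katani": ["houses", "homes"]}
TARGET_UNIGRAMS = Counter({"house": 50, "houses": 20, "homes": 5})

# "kataro" is not in the phrase table, but its root "kata" and the
# variant "katani" are, so a translation can still be proposed.
best = translate_oov("kataro", SUFFIXES, PHRASE_TABLE, TARGET_UNIGRAMS)
```

With this toy data, `best` is `"house"`, because it has the largest unigram count among the candidates collected from the variants. A real system would take its candidates from the trained phrase table and its counts from the target-side language model.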

Relevance:

70.00%

Abstract:

Previous investigations comparing auditory event-related potentials (ERPs) to words whose meanings infants did or did not comprehend found bilateral differences in brain activity to known versus unknown words in 13-month-old infants, in contrast with unilateral, left hemisphere, differences in activity in 20-month-old infants. We explore two alternative explanations for these findings. Changes in hemispheric specialization may result from a qualitative shift in the way infants process known words between 13 and 20 months. Alternatively, hemispheric specialization may arise from increased familiarity with the individual words tested. We contrasted these two explanations by measuring ERPs from 20-month-old infants with high and low production scores, for novel words they had just learned. A bilateral distribution of ERP differences was observed in both groups of infants, though the difference was larger in the left hemisphere for the high producers. These findings suggest that word familiarity is an important factor in determining the distribution of brain regions involved in word learning. An emerging left hemispheric specialization may reflect increased efficiency in the manner in which infants process familiar and novel words. (c) 2004 Elsevier Inc. All rights reserved.

Relevance:

60.00%

Abstract:

Language software applications encounter new words, e.g., acronyms, technical terminology, names or compounds of such words. In order to add new words to a lexicon, we need to indicate their inflectional paradigm. We present a new generally applicable method for creating an entry generator, i.e. a paradigm guesser, for finite-state transducer lexicons. As a guesser tends to produce numerous suggestions, it is important that the correct suggestions be among the first few candidates. We prove some formal properties of the method and evaluate it on Finnish, English and Swedish full-scale transducer lexicons. We use the open-source Helsinki Finite-State Technology to create finite-state transducer lexicons from existing lexical resources and automatically derive guessers for unknown words. The method has a recall of 82-87% and a precision of 71-76% for the three test languages. The model needs no external corpus and can therefore serve as a baseline.
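
As a rough illustration of what a paradigm guesser does, the Python sketch below ranks candidate paradigms for an unknown word by the longest word-final string it shares with existing lexicon entries. This is a plain-Python stand-in, not the finite-state method of the abstract and not an HFST API; the function names, the paradigm labels, and the toy lexicon are assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_suffix_index(lexicon, max_len=4):
    """Map every word-final string (up to max_len characters) to the
    paradigms of the lexicon entries that end with it."""
    index = defaultdict(Counter)
    for lemma, paradigm in lexicon:
        for n in range(1, min(max_len, len(lemma)) + 1):
            index[lemma[-n:]][paradigm] += 1
    return index

def guess_paradigms(word, index, max_len=4):
    """Return candidate paradigms, best first: longer matching endings
    outrank shorter ones, and frequent paradigms outrank rare ones."""
    ranked = []
    for n in range(min(max_len, len(word)), 0, -1):
        for paradigm, _count in index.get(word[-n:], Counter()).most_common():
            if paradigm not in ranked:
                ranked.append(paradigm)
    return ranked

# Toy lexicon with hypothetical paradigm labels.
LEXICON = [
    ("walk", "VERB_REGULAR"), ("talk", "VERB_REGULAR"),
    ("city", "NOUN_Y_IES"), ("party", "NOUN_Y_IES"),
    ("fly", "VERB_Y_IES"),
    ("cat", "NOUN_S"), ("hat", "NOUN_S"),
]
INDEX = build_suffix_index(LEXICON)

suggestions = guess_paradigms("kitty", INDEX)
# suggestions == ["NOUN_Y_IES", "VERB_Y_IES"]
```

For "kitty", the ending "ty" is shared only with "city" and "party", so `NOUN_Y_IES` is suggested first; `VERB_Y_IES` follows from the shorter ending "y". Keeping the correct paradigm among the first few candidates is the property the abstract stresses.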

Relevance:

60.00%

Abstract:

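Targeting small lexicons, this paper proposes a disambiguation method based on a rule-statistical model and a word-weighting algorithm for recognizing out-of-vocabulary words. High-frequency ambiguous characters are acquired by learning from a large corpus and used as ambiguity markers. The rule-statistical model classifies and processes the context of the marked characters, and the remaining text is segmented by forward or backward dynamic maximum matching. For runs of consecutive single characters, the word-weighting algorithm decides whether they form an out-of-vocabulary multi-character word. In experimental tests, the system achieved a precision of 98.88% and a recall of 98.32%.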
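
Forward maximum matching, one of the segmentation steps this abstract names, can be sketched in a few lines of Python. The sketch assumes a fixed maximum word length and a toy lexicon; the function names `forward_max_match` and `single_char_runs` are hypothetical. The paper's rule-statistical disambiguation and its word-weighting formula are not given in the abstract, so the second function only collects the single-character runs that such a weighting step would score.

```python
def forward_max_match(text, lexicon, max_word_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon word, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for n in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                tokens.append(piece)
                i += n
                break
    return tokens

def single_char_runs(tokens):
    """Join runs of two or more consecutive single-character tokens.
    These runs are candidates for out-of-vocabulary words."""
    runs, current = [], []
    for tok in tokens:
        if len(tok) == 1:
            current.append(tok)
            continue
        if len(current) > 1:
            runs.append("".join(current))
        current = []
    if len(current) > 1:
        runs.append("".join(current))
    return runs

# Latin-letter toy example: "zq" is not in the lexicon.
tokens = forward_max_match("thecatzqsat", {"the", "cat", "sat"})
# tokens == ["the", "cat", "z", "q", "sat"]
candidates = single_char_runs(tokens)
# candidates == ["zq"]

# A well-known ambiguous string: greedy matching takes the three-character
# word first and strands a single character after it.
ambiguous = forward_max_match("研究生命起源", {"研究", "研究生", "生命", "起源"})
# ambiguous == ["研究生", "命", "起源"]
```

The last example shows why maximum matching alone is not enough: the greedy choice yields 研究生 / 命 / 起源 rather than 研究 / 生命 / 起源, which is the kind of ambiguity the abstract's disambiguation step is meant to resolve.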

Relevance:

60.00%

Abstract:

It has been argued that experienced learners develop high levels of metalinguistic awareness (MLA), which facilitates their learning of subsequent languages (e.g., Singleton & Aronin, 2007). In addition, researchers in third language acquisition emphasize the positive influence of previously learned languages on the formal learning of a foreign language (e.g., Cenoz & Gorter, 2015), and propose abandoning the traditional view, which focused on interference as the source of learners' errors, in favor of a broader and more positive view of the interaction between languages. Typological similarity and proficiency in the source language have been shown to influence all types of transfer (e.g., Ringbom, 1987, 2007). However, a methodological challenge remains: both determining that relevant use of a target language results from crosslinguistic influence (e.g., Falk & Bardel, 2010) and establishing the crucial role of MLA in the conscious activation of related words or constructions across different languages. The present study aimed to meet this twofold challenge by using think-aloud protocols (TAPs) to examine positive transfer from English (L2) to German (L3) among French-speaking Quebecers after five weeks of formal L3 instruction. Participants completed a translation task developed for the purposes of this study. The 42 items were selected on the basis of similarity and imageability judgments as well as word frequency, taken from a study of German-English cognates (Friel & Kennison, 2001). Participants had to think aloud while translating unknown words from German (L3) into French (L1). Positive transfer was operationalized as correct translations that were based on an English cognate.
MLA was measured using the THAM (Test d’habiletés métalinguistiques) (Pinto & El Euch, 2015) as well as through analysis of the TAPs. English proficiency levels were established on the basis of the Michigan Test (Corrigan et al., 1979), while levels of exposure to, and interest in, the German language and culture were measured with a questionnaire. A fine-grained analysis of the TAPs revealed inter- and intra-individual variability in the conscious activation of L2 vocabulary, while allowing distinct levels of awareness to be identified. Two independent logistic regression models identified the two dimensions of MLA as predictors of positive transfer. The first model, in which the THAM was the sole measure of MLA, identified this reflective dimension as the main predictor, followed by English proficiency, while none of the other independent variables could predict positive transfer from English. In the second model, which included both the THAM and the TAPs as complementary measures of MLA, the applied dimension of MLA, as measured by the TAPs, was by far the main predictor, followed by the reflective dimension, as measured by the THAM, while English proficiency was no longer among the factors with a significant influence on the response variable. Although verbalization may have influenced performance to some extent, our observations highlight the valuable contribution of introspective data as a complement to results based on purely linguistic characteristics of transfer. Our analyses underline the complexity of metalinguistic processes and individual strategies, reflecting a dynamic perspective on multilingualism (e.g., Jessner, 2008).

Relevance:

60.00%

Abstract:

In an increasingly multilingual world, the English language has kept a marked predominance as a global language. In many countries, English is the primary choice for foreign language learning. There is a long history of research in English language learning. The same applies to research in reading. A main interest since the 1970s has been the reading strategy defined as inferencing, or guessing the meaning of unknown words from context. Inferencing has been widely researched; however, the results and conclusions seem to be mixed. While some agree that inferencing is a useful strategy, others doubt its usefulness. Nevertheless, most of the research seems to agree that cultural background affects comprehension and inferencing. While most of these studies have been done with texts and contexts created by the researchers, little has been done using natural prose. The present study will attempt to further clarify the process of inferencing and the effects of the text’s cultural context and the linguistic background of the reader, using a text that has not been created by the researcher. The participants of the study are 40 international students from Turku, Finland. Their linguistic background was obtained through a questionnaire and proved to be diverse. Think-aloud protocols were performed to investigate their inferencing process and find connections between their inferences, comments, the text, and their linguistic background. The results show that some inferences were made based on the participants’ world knowledge, experience, other languages, and English language knowledge; other inferences and comments were made based on the text, its use of language and vocabulary, and the few cues provided by the author.
The results from the present study and previous research seem to show that: 1) linguistic background is a source of information for inferencing but is not a major source; 2) the cultural context of the text affected the inferences made by the participants according to their closeness to or distance from it.

Relevance:

60.00%

Abstract:

Many children in the United States begin kindergarten unprepared to converse in the academic language surrounding instruction, putting them at greater risk for later language and reading difficulties. Importantly, correlational research has shown there are certain experiences prior to kindergarten that foster the oral language skills needed to understand and produce academic language. The focus of this dissertation was on increasing one of these experiences: parent-child conversations about abstract and non-present concepts, known as decontextualized language (DL). Decontextualized language involves talking about non-present concepts such as events that happened in the past or future, or abstract discussions such as providing explanations or defining unknown words. As caregivers’ decontextualized language input to children aged three to five is consistently correlated with kindergarten oral language skills and later reading achievement, it is surprising no experimental research has been done to establish this relation causally. The study described in this dissertation filled this literature gap by designing, implementing, and evaluating a decontextualized language training program for parents of 4-year-old children (N=30). After obtaining an initial measure of decontextualized language, parents were randomly assigned to a control condition or training condition, the latter of which educated parents about decontextualized language and why it is important. All parents then audio-recorded four mealtime conversations over the next month, which were transcribed and reliably coded for decontextualized language. Results indicated that trained parents boosted their DL from roughly 17 percent of their total utterances at baseline to approximately 50 percent by the mid-point of the study, and remained at these boosted levels throughout the duration of the study. 
Children’s DL was also boosted by similar margins, but no improvement in children’s oral language skills was observed, measured prior to, and one month following training. Further, exploratory analyses pointed to parents’ initial use of DL and their theories of the malleability of intelligence (i.e., growth mindsets) as moderators of training gains. Altogether, these findings are a first step in establishing DL as a viable strategy for giving children the oral language skills needed to begin kindergarten ready to succeed in the classroom.
