960 results for Word Sense Disambiguation, WSD, Natural Language Processing
Abstract:
The human language-learning ability persists throughout life, indicating considerable flexibility at the cognitive and neural level. This ability spans from expanding the vocabulary in the mother tongue to the acquisition of a new language with its lexicon and grammar. The present thesis consists of five studies that tap both of these aspects of adult language learning by using magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI) during language processing and language learning tasks. The thesis shows that learning novel phonological word forms, either in the native tongue or when exposed to a foreign phonology, activates the brain in similar ways. The results also show that novel native words readily become integrated in the mental lexicon. Several studies in the thesis highlight the left temporal cortex as an important brain region in learning and accessing phonological forms. Incidental learning of foreign phonological word forms was reflected in functionally distinct temporal lobe areas that, respectively, reflected short-term memory processes and more stable learning that persisted to the next day. In a study where explicitly trained items were tracked for ten months, it was found that enhanced naming-related temporal and frontal activation one week after learning was predictive of good long-term memory. The results suggest that memory maintenance is an active process that depends on mechanisms of reconsolidation, and that these processes vary considerably between individuals. The thesis puts special emphasis on studying language learning in the context of language production. The neural foundation of language production has been studied considerably less than that of perceptive language, especially on the sentence level. A well-known paradigm in language production studies is picture naming, also used as a clinical tool in neuropsychology.
This thesis shows that accessing the meaning and phonological form of a depicted object are subserved by different neural implementations. Moreover, a comparison between action and object naming from identical images indicated that the grammatical class of the retrieved word (verb, noun) is less important than the visual content of the image. In the present thesis, the picture naming was further modified into a novel paradigm in order to probe sentence-level speech production in a newly learned miniature language. Neural activity related to grammatical processing did not differ between the novel language and the mother tongue, but stronger neural activation for the novel language was observed during the planning of the upcoming output, likely related to more demanding lexical retrieval and short-term memory. In sum, the thesis aimed at examining language learning by combining different linguistic domains, such as phonology, semantics, and grammar, in a dynamic description of language processing in the human brain.
Abstract:
The aim of this thesis is to study the behavioural and neural correlates of cross-linguistic transfer (CLT) in second-language (L2) learning. Given what is known about the influence of linguistic distance on CLT (Paradis, 1987, 2004; Odlin, 1989, 2004, 2005; Gollan, 2005; Ringbom, 2007), we used functional magnetic resonance imaging to examine the facilitating effect of phonological similarity between linguistically close languages (Spanish-French) and linguistically distant languages (Persian-French). Study I reports the results obtained for linguistically close languages (Spanish-French), whereas Study II concerns linguistically distant languages (Persian-French). Study III then reports the changes in functional connectivity within the language network (Price, 2010) and within the additional control network involved in second-language processing (Abutalebi & Green, 2007) during the learning of a linguistically distant language (Persian-French). The results of the fMRI analyses based on the general linear model in bilinguals of linguistically close languages (French-Spanish) show that the processing of words that are phonologically similar in the two languages (cognates and clangs) relies on a neural network shared by the native language (L1) and the L2, whereas the processing of phonologically distant words (non-clang-non-cognates) activates structures involved in working memory and attention. However, in bilinguals whose L1 and L2 are linguistically distant (French-Persian), even words that are phonologically similar across the languages (cognates and clangs) activate regions known to be involved in attention and cognitive control.
Moreover, phonologically distant words (non-clang-non-cognates) activate regions usually associated with working memory and executive functions. Thus, the cross-linguistic distance between L1 and L2 modulates cognitive load according to the degree of phonological similarity between items in L1 and L2. Structures supporting executive processing are recruited to compensate for the cognitive demands. As L2 proficiency increases and language tasks therefore require less effort, the demand for cognitive resources decreases. As previously reported (Majerus et al., 2008; Prat et al., 2007; Veroude et al., 2010; Dodel et al., 2005; Coynel et al., 2009), the results of the functional connectivity analyses show that after training the integration value (functional connectivity) decreases, since there is less information flow. The results of this research contribute to a better understanding of the neurocognitive and brain-plasticity aspects of CLT and of the impact of linguistic distance on language learning. These results have implications for L2 learning strategies, L2 teaching methods, and the development of therapeutic approaches for bilingual patients with language disorders.
Abstract:
Word sense disambiguation is the task of determining which sense of a word is intended from its context. Previous methods have found the lack of training data and the restrictiveness of dictionaries' choices of senses to be major stumbling blocks. A robust novel algorithm is presented that uses multiple dictionaries, the Internet, clustering and triangulation to attempt to discern the most useful senses of a given word and learn how they can be disambiguated. The algorithm is explained, and some promising sample results are given.
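To make the task concrete, here is a minimal sketch of gloss-overlap disambiguation (a simplified Lesk-style heuristic) that pools evidence from more than one dictionary. It is not the algorithm of the abstract above, whose clustering and triangulation steps are not described here. The two toy dictionaries, the sense labels, the stopword list, and the function names are all hypothetical, and the sketch assumes senses are already aligned across dictionaries.

```python
import re

# Hypothetical toy glosses standing in for two dictionaries (not real dictionary data).
# Assumption: sense labels ("finance", "river") are already aligned across dictionaries.
DICT_A = {
    "bank": {
        "finance": "an institution that accepts deposits and lends money",
        "river": "the sloping land beside a river or stream",
    }
}
DICT_B = {
    "bank": {
        "finance": "a business that keeps money for customers and offers loans",
        "river": "the raised ground along the edge of a river or lake water",
    }
}

STOPWORDS = {"a", "an", "the", "of", "and", "or", "that", "for", "to",
             "in", "on", "by", "at", "beside", "along"}


def tokens(text):
    """Lowercase word set of a text, without stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(word, context, dictionaries):
    """Return (best_sense, scores), scoring each sense by gloss/context word overlap
    summed over all dictionaries. best_sense is None if the word is unknown."""
    context_tokens = tokens(context) - {word}
    scores = {}
    for dictionary in dictionaries:
        for sense, gloss in dictionary.get(word, {}).items():
            overlap = len(context_tokens & tokens(gloss))
            scores[sense] = scores.get(sense, 0) + overlap
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores


sense, scores = disambiguate(
    "bank", "She deposited money at the bank to repay her loans", [DICT_A, DICT_B]
)
print(sense, scores)
```

Summing overlaps across dictionaries is the simplest way to combine sources. A real system would also need to handle ties, inflected forms, and senses that one dictionary splits and another merges.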
Abstract:
This cross-sectional study examines the role of L1-L2 differences and structural distance in the processing of gender and number agreement by English-speaking learners of Spanish at three different levels of proficiency. Preliminary results show that differences between the L1 and L2 impact L2 development, as sensitivity to gender agreement violations, as opposed to number agreement violations, emerges only in learners at advanced levels of proficiency. Results also show that the establishment of agreement dependencies is impacted by the structural distance between the agreeing elements for native speakers and for learners at intermediate and advanced levels of proficiency, but not for learners at a low level of proficiency. The overall pattern of results suggests that the linguistic factors examined here impact development but do not constrain ultimate attainment; for advanced learners, results suggest that second language processing is qualitatively similar to native processing.
Abstract:
The goal of the Evidence Algorithm research programme is the development of an open system of automated proving that can accumulate mathematical knowledge and prove theorems in the context of a self-contained mathematical text. A first version of such a system, the System for Automated Deduction (SAD), has now been implemented in software. SAD has the following main features: mathematical texts are formalized in a specific formal language that is close to the natural language of mathematical publications; proof search is based on special sequent-type calculi that formalize a natural reasoning style, such as the application of definitions and auxiliary propositions. These calculi also allow equality handling to be separated from deduction, which makes it possible to integrate logical reasoning with symbolic calculation.
Abstract:
This study investigated whether there are differences in the Speech-Evoked Auditory Brainstem Response among children with Typical Development (TD), (Central) Auditory Processing Disorder ((C)APD), and Language Impairment (LI). The speech-evoked Auditory Brainstem Response (ABR) was tested in 57 children (ages 6-12). The children were placed into three groups: TD (n = 18), (C)APD (n = 18) and LI (n = 21). Speech-evoked ABRs were elicited using the five-formant syllable /da/. Three dimensions were defined for analysis: timing, harmonics, and pitch. A comparative analysis of the responses of typically developing children and children with (C)APD and LI revealed abnormal encoding of the speech acoustic features that are characteristic of speech perception in children with (C)APD and LI, although the two groups differed in their abnormalities. While the children with (C)APD might have had greater difficulty distinguishing stimuli based on timing cues, the children with LI had the additional difficulty of distinguishing speech harmonics, which are important to the identification of speech sounds. These data suggest that an inefficient representation of crucial components of speech sounds may contribute to the difficulties with language processing found in children with LI. Furthermore, these findings may indicate that the neural processes mediated by the auditory brainstem differ among children with auditory processing and speech-language disorders. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
This research tests the hypothesis that knowledge of derivational morphology facilitates vocabulary acquisition in beginning adult second language learners. Participants were monolingual English-speaking college students aged 18 years and older enrolled in introductory Spanish courses. Knowledge of Spanish derivational morphology was tested through the use of a forced-choice translation task. Spanish lexical knowledge was measured by a translation task using direct translation (English word) primes and conceptual (picture) primes. A 2x2x2 mixed factor ANOVA examined the relationships between morphological knowledge (strong, moderate), error type (form-based, conceptual), and prime type (direct translation, picture). The results are consistent with the existence of a relationship between knowledge of derivational morphology and acquisition of second language vocabulary. Participants made more conceptually-based errors than form-based errors, F(1,22) = 7.744, p = .011. This result is consistent with Clahsen & Felser’s (2006) and Ullman’s (2004) models of second language processing. Additionally, participants with Strong morphological knowledge made fewer errors on the lexical knowledge task than participants with Moderate morphological knowledge, t(23) = -2.656, p = .014. I suggest future directions to clarify the relationship between morphological knowledge and lexical development in adult second language learners.
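The p-values reported above can be checked from the test statistics and degrees of freedom alone. The sketch below uses only the Python standard library and integrates the Student t density numerically with Simpson's rule. The function names are illustrative and do not come from the study. It relies on the standard identity that an F statistic with one numerator degree of freedom is the square of a t statistic with the same error degrees of freedom.

```python
import math


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_from_t(t, df, steps=2000):
    """Two-sided p-value for a t statistic, via Simpson's rule on [0, |t|].

    steps must be even for Simpson's rule.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t|)
    return 1 - 2 * area           # both tails


def p_from_f_1df(f, df_error):
    """p-value for F(1, df_error), using F = t^2 when the numerator df is 1."""
    return two_sided_p_from_t(math.sqrt(f), df_error)


# Statistics as reported in the abstract.
print(round(p_from_f_1df(7.744, 22), 3))         # compare with the reported p = .011
print(round(two_sided_p_from_t(-2.656, 23), 3))  # compare with the reported p = .014
```

Both values should come out close to the reported ones. With SciPy available, `scipy.stats.t.sf` and `scipy.stats.f.sf` would do the same job more directly.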
Abstract:
Brain processing of grammatical word class was studied analyzing event-related potential (ERP) brain fields. Normal subjects observed a randomized sequence of single German nouns and verbs on a computer screen, while 20-channel ERP field map series were recorded separately for both word classes. Spatial microstate analysis was applied, based on the observation that series of ERP maps consist of epochs of quasi-stable map landscapes and based on the rationale that different map landscapes must have been generated by different neural generators and thus suggest different brain functions. Space-oriented segmentation of the mean map series identified nine successive, different functional microstates, i.e., steps of brain information processing characterized by quasi-stable map landscapes. In the microstate from 116 to 172 msec, noun-related maps differed significantly from verb-related maps along the left–right axis. The results indicate that different neural populations represent different grammatical word classes in language processing, in agreement with clinical observations. This word class differentiation as revealed by the spatial–temporal organization of neural activity occurred at a time after word input compatible with speed of reading.
Abstract:
In his influential article about the evolution of the Web, Berners-Lee [1] envisions a Semantic Web in which humans and computers alike are capable of understanding and processing information. This vision is yet to materialize. The main obstacle for the Semantic Web vision is that in today's Web, meaning is most often rooted not in formal semantics but in natural language and, in the sense of semiology, does not emerge before interpretation and processing. Yet an automated form of interpretation and processing can be tackled by precisiating raw natural language. To do that, Web agents extract fuzzy grassroots ontologies through induction from existing Web content. Inductive fuzzy grassroots ontologies thus constitute organically evolved knowledge bases that resemble automated gradual thesauri, which allow precisiating natural language [2]. The Web agents' underlying dynamic, self-organizing, and best-effort induction enables a sub-syntactical, bottom-up learning of semiotic associations. Thus, knowledge is induced from the users' natural use of language in mutual Web interactions, and stored in a gradual, thesauri-like lexical-world knowledge database as a top-level ontology, eventually allowing a form of computing with words [3]. Since the objects of computation in computing with words are words, phrases and propositions drawn from natural languages, it proves to be a practical notion for yielding emergent semantics for the Semantic Web. In the end, an improved understanding by computers should, on the one hand, upgrade human-computer interaction on the Web and, on the other hand, allow an initial version of human-intelligence amplification through the Web.
Abstract:
The goal of the present thesis was to investigate the production of code-switched utterances in bilinguals’ speech production. This study investigates the availability of grammatical-category information during bilingual language processing. The specific aim is to examine the processes involved in the production of Persian-English bilingual compound verbs (BCVs). A bilingual compound verb is formed when the nominal constituent of a compound verb is replaced by an item from the other language. In the present cases of BCVs the nominal constituents are replaced by a verb from the other language. The main question addressed is how a lexical element corresponding to a verb node can be placed in a slot that corresponds to a noun lemma. This study also investigates how the production of BCVs might be captured within a model of BCVs and how such a model may be integrated within incremental network models of speech production. In the present study, both naturalistic and experimental data were used to investigate the processes involved in the production of BCVs. In the first part of the present study, I collected 2298 minutes of a popular Iranian TV program and found 962 code-switched utterances. In 83 (8%) of the switched cases, insertions occurred within the Persian compound verb structure, hence, resulting in BCVs. As to the second part of my work, a picture-word interference experiment was conducted. This study addressed whether in the case of the production of Persian-English BCVs, English verbs compete with the corresponding Persian compound verbs as a whole, or whether English verbs compete with the nominal constituents of Persian compound verbs only. Persian-English bilinguals named pictures depicting actions in 4 conditions in Persian (L1). In condition 1, participants named pictures of action using the whole Persian compound verb in the context of its English equivalent distractor verb. 
In condition 2, only the nominal constituent was produced in the presence of the light verb of the target Persian compound verb and in the context of a semantically closely related English distractor verb. In condition 3, the whole Persian compound verb was produced in the context of a semantically unrelated English distractor verb. In condition 4, only the nominal constituent was produced in the presence of the light verb of the target Persian compound verb and in the context of a semantically unrelated English distractor verb. The main effect of linguistic unit was significant by participants and items. Naming latencies were longer in the nominal linguistic unit compared to the compound verb (CV) linguistic unit. That is, participants were slower to produce the nominal constituent of compound verbs in the context of a semantically closely related English distractor verb compared to producing the whole compound verbs in the context of a semantically closely related English distractor verb. The three-way interaction between version of the experiment (CV and nominal versions), linguistic unit (nominal and CV linguistic units), and relation (semantically related and unrelated distractor words) was significant by participants. In both versions, naming latencies were longer in the semantically related nominal linguistic unit compared to the response latencies in the semantically related CV linguistic unit. In both versions, naming latencies were longer in the semantically related nominal linguistic unit compared to response latencies in the semantically unrelated nominal linguistic unit. 
Both the analysis of the naturalistic data and the results of the experiment revealed that in the case of the production of the nominal constituent of BCVs, a verb from the other language may compete with a noun from the base language, suggesting that grammatical category does not necessarily provide a constraint on lexical access during the production of the nominal constituent of BCVs. There was a minimal context in condition 2 (the nominal linguistic unit) in which the nominal constituent was produced in the presence of its corresponding light verb. The results suggest that generating words within a context may not guarantee that the effect of grammatical class becomes available. A model is proposed in order to characterize the processes involved in the production of BCVs. Implications for models of bilingual language production are discussed.
Abstract:
This paper describes the design, development and field evaluation of a machine translation system from Spanish to Spanish Sign Language (LSE: Lengua de Signos Española). The developed system focuses on helping Deaf people when they want to renew their driver's license. The system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). For the natural language translator, three technological approaches have been implemented and evaluated: an example-based strategy, a rule-based translation method and a statistical translator. For the final version, the implemented language translator combines all the alternatives into a hierarchical structure. This paper includes a detailed description of the field evaluation. This evaluation was carried out in the Local Traffic Office in Toledo and involved real government employees and Deaf people. The evaluation includes objective measurements from the system and subjective information from questionnaires. The paper details the main problems found and discusses how to solve them (some of them specific to LSE).
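One way to read "combines all the alternatives into a hierarchical structure" is as a fallback chain, where each translator is tried in turn and the first one that produces an output wins. The sketch below illustrates that reading only. The ordering (example-based, then rule-based, then statistical), the toy example memory and lexicon, the sign glosses, and every function name are assumptions made for illustration and do not come from the paper.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A translator maps a Spanish word sequence to a sign-gloss sequence, or None if it cannot.
Translator = Callable[[List[str]], Optional[List[str]]]

# Hypothetical toy resources (not from the paper).
EXAMPLE_MEMORY = {("buenos", "días"): ["BUENOS_DIAS"]}
RULE_LEXICON = {"renovar": "RENOVAR", "carné": "CARNET"}


def example_based(words: List[str]) -> Optional[List[str]]:
    """Return a stored translation only when the whole sentence is a known example."""
    return EXAMPLE_MEMORY.get(tuple(words))


def rule_based(words: List[str]) -> Optional[List[str]]:
    """Translate word by word, but only when every word is covered by the lexicon."""
    if all(w in RULE_LEXICON for w in words):
        return [RULE_LEXICON[w] for w in words]
    return None


def statistical_stub(words: List[str]) -> Optional[List[str]]:
    """Stand-in for a statistical translator: always answers, here by upper-casing."""
    return [w.upper() for w in words]


def translate(words: List[str],
              stages: Sequence[Tuple[str, Translator]]) -> Tuple[str, List[str]]:
    """Try each stage in order and return (stage_name, signs) from the first that succeeds."""
    for name, stage in stages:
        signs = stage(words)
        if signs is not None:
            return name, signs
    raise ValueError("no translator produced an output")


STAGES = [("example", example_based), ("rule", rule_based), ("statistical", statistical_stub)]

print(translate(["buenos", "días"], STAGES))
print(translate(["renovar", "carné"], STAGES))
print(translate(["hola"], STAGES))
```

Putting the most precise but least general method first keeps exact matches reliable and leaves the broadest method as the last resort. A real system would more likely decide on confidence scores than on a bare success or failure.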
Abstract:
This paper describes the application of language translation technologies for generating bus information in Spanish Sign Language (LSE: Lengua de Signos Española). In this work, two main systems have been developed: the first for translating text messages from information panels and the second for translating spoken Spanish into natural conversations at the information point of the bus company. Both systems are made up of a natural language translator (for converting a word sequence into a sequence of LSE signs), and a 3D avatar animation module (for playing back the signs). For the natural language translator, two technological approaches have been analyzed and integrated: an example-based strategy and a statistical translator. When translating spoken utterances, it is also necessary to incorporate a speech recognizer for decoding the spoken utterance into a word sequence, prior to the language translation module. This paper includes a detailed description of the field evaluation carried out in this domain. This evaluation was carried out at the customer information office in Madrid and involved both real bus company employees and deaf people. The evaluation includes objective measurements from the system and information from questionnaires. In the field evaluation, the whole translation presents an SER (Sign Error Rate) of less than 10% and a BLEU score greater than 90%.
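The abstract does not define how SER is computed. A common convention, assumed here by analogy with word error rate, is the edit distance (substitutions, insertions, deletions) between the system's sign sequence and a reference sequence, divided by the reference length. The sketch below follows that assumption. The sign glosses are made up for illustration and the function names are hypothetical.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two sign sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))      # distances from an empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]                         # deleting the first i reference signs
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,              # deletion of r
                cur[j - 1] + 1,           # insertion of h
                prev[j - 1] + (r != h),   # match or substitution
            ))
        prev = cur
    return prev[-1]


def sign_error_rate(ref: List[str], hyp: List[str]) -> float:
    """SER in percent, assuming the WER-style definition: edits / reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Made-up glosses: the hypothesis drops one of four reference signs.
reference = ["AUTOBUS", "LLEGAR", "CINCO", "MINUTO"]
hypothesis = ["AUTOBUS", "CINCO", "MINUTO"]
print(sign_error_rate(reference, hypothesis))  # one deletion out of four signs
```

Under this definition SER can exceed 100% when the hypothesis contains many insertions, which is one reason it is usually reported alongside a score such as BLEU.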
Abstract:
Recently, experts and practitioners in language resources have started recognizing the benefits of the linked data (LD) paradigm for the representation and exploitation of linguistic data on the Web. The adoption of the LD principles is leading to an emerging ecosystem of multilingual open resources that make up the Linguistic Linked Open Data Cloud, in which datasets of linguistic data are interconnected and represented following common vocabularies, which facilitates linguistic information discovery, integration and access. In order to contribute to this initiative, this paper summarizes several key aspects of the representation of linguistic information as linked data from a practical perspective. The main goal of this document is to provide the basic ideas and tools for migrating language resources (lexicons, corpora, etc.) to LD on the Web and for developing some useful NLP tasks with them (e.g., word sense disambiguation). This material was the basis of a tutorial given at the EKAW’14 conference, which is also reported in the paper.
Abstract:
Cerebral organization during sentence processing in English and in American Sign Language (ASL) was characterized by employing functional magnetic resonance imaging (fMRI) at 4 T. Effects of deafness, age of language acquisition, and bilingualism were assessed by comparing results from (i) normally hearing, monolingual, native speakers of English, (ii) congenitally, genetically deaf, native signers of ASL who learned English late and through the visual modality, and (iii) normally hearing bilinguals who were native signers of ASL and speakers of English. All groups, hearing and deaf, processing their native language, English or ASL, displayed strong and repeated activation within classical language areas of the left hemisphere. Deaf subjects reading English did not display activation in these regions. These results suggest that the early acquisition of a natural language is important in the expression of the strong bias for these areas to mediate language, independently of the form of the language. In addition, native signers, hearing and deaf, displayed extensive activation of homologous areas within the right hemisphere, indicating that the specific processing requirements of the language also in part determine the organization of the language systems of the brain.