38 results for Word Sense Disambiguation, WSD, Natural Language Processing
at Université de Lausanne, Switzerland
Abstract:
In the past, research in ontology learning from text has mainly focused on entity recognition, taxonomy induction and relation extraction. In this work we approach a challenging research issue: detecting semantic frames from texts and using them to encode web ontologies. We exploit a new generation Natural Language Processing technology for frame detection, and we enrich the frames acquired so far with argument restrictions provided by a super-sense tagger and domain specializations. The results are encoded according to a Linguistic MetaModel, which allows a complete translation of lexical resources and data acquired from text, enabling custom transformations of the enriched frames into modular ontology components.
Abstract:
The long term goal of this research is to develop a program able to produce an automatic segmentation and categorization of textual sequences into discourse types. In this preliminary contribution, we present the construction of an algorithm which takes a segmented text as input and attempts to produce a categorization of sequences, such as narrative, argumentative, descriptive and so on. Also, this work aims at investigating a possible convergence between the typological approach developed in particular in the field of text and discourse analysis in French by Adam (2008) and Bronckart (1997) and unsupervised statistical learning.
Abstract:
Since its creation, the Internet has permeated our daily life. The web is omnipresent for communication, research and organization. This exploitation has resulted in the rapid development of the Internet. Nowadays, the Internet is the biggest container of resources. Information databases such as Wikipedia, Dmoz and the open data available on the net hold great informational potential for mankind. Easy and free web access is one of the major features characterizing Internet culture. Ten years ago, the web was completely dominated by English. Today, the web community is no longer only English speaking but is becoming a genuinely multilingual community. The availability of content is intertwined with the availability of logical organizations (ontologies) for which multilinguality plays a fundamental role. In this work we introduce a very high-level logical organization fully based on semiotic assumptions. We thus present the theoretical foundations as well as the ontology itself, named Linguistic Meta-Model. The most important feature of Linguistic Meta-Model is its ability to support the representation of different knowledge sources developed according to different underlying semiotic theories. This is possible because most knowledge representation schemata, either formal or informal, can be put into the context of the so-called semiotic triangle. In order to show the main characteristics of Linguistic Meta-Model from a practical point of view, we developed VIKI (Virtual Intelligence for Knowledge Induction). VIKI is a work-in-progress system aiming at exploiting the Linguistic Meta-Model structure for knowledge expansion. It is a modular system in which each module accomplishes a natural language processing task, from terminology extraction to knowledge retrieval. 
VIKI is a supporting system to Linguistic Meta-Model and its main task is to give some empirical evidence regarding the use of Linguistic Meta-Model without claiming to be thorough.
Abstract:
BACKGROUND: Molecular interaction information is a key resource in modern biomedical research. Publicly available data have previously been provided in a broad array of diverse formats, making access to these data very difficult. The publication and wide implementation of the Human Proteome Organisation Proteomics Standards Initiative Molecular Interactions (HUPO PSI-MI) format in 2004 was a major step towards the establishment of a single, unified format by which molecular interactions should be presented, but focused purely on protein-protein interactions. RESULTS: The HUPO-PSI has further developed the PSI-MI XML schema to enable the description of interactions between a wider range of molecular types, for example nucleic acids, chemical entities, and molecular complexes. Extensive details about each supported molecular interaction can now be captured, including the biological role of each molecule within that interaction, detailed description of interacting domains, and the kinetic parameters of the interaction. The format is supported by data management and analysis tools and has been adopted by major interaction data providers. Additionally, a simpler, tab-delimited format MITAB2.5 has been developed for the benefit of users who require only minimal information in an easy-to-access configuration. CONCLUSION: The PSI-MI XML2.5 and MITAB2.5 formats have been jointly developed by interaction data producers and providers from both the academic and commercial sectors, and are already widely implemented and well supported by an active development community. PSI-MI XML2.5 enables the description of highly detailed molecular interaction data and facilitates data exchange between databases and users without loss of information. MITAB2.5 is a simpler format appropriate for fast Perl parsing or loading into Microsoft Excel.
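To make the tab-delimited idea concrete, here is a minimal Python sketch of reading MITAB2.5-style rows. It is an illustration under stated assumptions, not a reference parser: the short column names are hypothetical labels chosen here for the 15 columns of the published MITAB 2.5 layout, the function name `parse_mitab25` is invented, the sample row uses placeholder identifiers, and splitting cells on `|` ignores the possibility of a `|` inside a quoted name.

```python
# Hypothetical short names for the 15 MITAB 2.5 columns (an assumption of this sketch).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_dbs",
    "interaction_ids", "confidence",
]


def parse_mitab25(text):
    """Parse MITAB2.5-style text into a list of dicts mapping column -> list of values."""
    records = []
    for line in text.splitlines():
        # Skip blank lines and comment/header lines starting with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        cells = line.split("\t")
        if len(cells) < len(MITAB25_COLUMNS):
            raise ValueError(f"expected {len(MITAB25_COLUMNS)} columns, got {len(cells)}")
        record = {}
        for name, cell in zip(MITAB25_COLUMNS, cells):
            # '-' marks an empty cell; '|' separates multiple values in one cell.
            record[name] = [] if cell == "-" else cell.split("|")
        records.append(record)
    return records


# Invented sample row with placeholder identifiers, for illustration only.
SAMPLE = "\t".join([
    "uniprotkb:P12345", "uniprotkb:Q67890", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "-", "pubmed:0000000",
    "taxid:9606", "taxid:9606", "-", "-", "-", "-",
])

records = parse_mitab25(SAMPLE)
print(records[0]["id_a"], records[0]["id_b"])
```

Each cell becomes a list so that single-valued and multi-valued columns can be handled uniformly by downstream code.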
Abstract:
Language is typically a function of the left hemisphere but the right hemisphere is also essential in some healthy individuals and patients. This inter-subject variability necessitates the localization of language function, at the individual level, prior to neurosurgical intervention. Such assessments are typically made by comparing left and right hemisphere language function to determine "language lateralization" using clinical tests or fMRI. Here, we show that language function needs to be assessed at the region and hemisphere specific level, because laterality measures can be misleading. Using fMRI data from 82 healthy participants, we investigated the degree to which activation for a semantic word matching task was lateralized in 50 different brain regions and across the entire cortex. This revealed two novel findings. First, the degree to which language is lateralized across brain regions and between subjects was primarily driven by differences in right hemisphere activation rather than differences in left hemisphere activation. Second, we found that healthy subjects who have relatively high left lateralization in the angular gyrus also have relatively low left lateralization in the ventral precentral gyrus. These findings illustrate spatial heterogeneity in language lateralization that is lost when global laterality measures are considered. It is likely that the complex spatial variability we observed in healthy controls is more exaggerated in patients with brain damage. We therefore highlight the importance of investigating within hemisphere regional variations in fMRI activation, prior to neuro-surgical intervention, to determine how each hemisphere and each region contributes to language processing. Hum Brain Mapp, 2010. © 2010 Wiley-Liss, Inc.
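The abstract does not state which laterality measure was used. As a hedged illustration, the sketch below uses a commonly cited laterality index, LI = (L - R) / (L + R), where L and R are non-negative activation measures for the left and right hemisphere. The function name and the numbers are assumptions made for this example only; they show how the index can change while left-hemisphere activation stays fixed.

```python
def laterality_index(left, right):
    """Return (left - right) / (left + right) for non-negative activation measures."""
    total = left + right
    if total == 0:
        raise ValueError("laterality index is undefined when left + right is zero")
    return (left - right) / total


# Same left-hemisphere activation, different right-hemisphere activation (invented values).
strongly_left = laterality_index(8.0, 2.0)
weakly_left = laterality_index(8.0, 6.0)
print(strongly_left, weakly_left)
```

Because the index collapses two values into one, computing it per region rather than once per hemisphere keeps the regional variation that a single global figure would hide.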
Abstract:
In the field of perception, learning is constrained by a functional architecture made up of distributed, highly specialized cortical areas. In visual disorders of cerebral origin, the learning of a hemianopic or agnosic patient will be limited by the patient's residual perceptual abilities, but a visual recognition deficit that appears perceptual in nature may also be associated with an alteration of representations in long-term memory. Distinct neural networks for sound recognition (temporal cortex) and for sound localization (parietal cortex) have been described in humans. The study of brain-damaged patients confirms the role of spatial cues in explicit auditory processing of the "where" and in implicit discrimination of the "what". This organization, similar to what has been described in the visual modality, is thought to facilitate perceptual learning. More generally, implicit learning underlies a large part of our knowledge about the world by making us sensitive, without our awareness, to the rules and regularities of our environment. It is thought to be involved in cognitive development, the formation of emotional reactions, and the young child's learning of the mother tongue. The unconscious character of this learning is confirmed by the study of serial reaction times in amnesic patients acquiring an artificial grammar. Its assessment could be decisive in rehabilitative care. [In the field of perception, learning is formed by a distributed functional architecture of very specialized cortical areas. For example, capacities of learning in patients with visual deficits - hemianopia or visual agnosia - from cerebral lesions are limited by perceptual abilities. 
Moreover, a visual deficit linked to abnormal perception may be associated with an alteration of representations in long term (semantic) memory. Furthermore, perception and memory traces rely on parallel processing. This has been recently demonstrated for human audition. Activation studies in normal subjects and psychophysical investigations in patients with focal hemispheric lesions have shown that auditory information relevant to sound recognition and that relevant to sound localisation are processed in parallel, anatomically distinct cortical networks, often referred to as the "What" and "Where" processing streams. Parallel processing may appear counterintuitive from the point of view of a unified perception of the auditory world, but there are advantages, such as rapidity of processing within a single stream, its adaptability in perceptual learning or ease of multisensory interactions. More generally, implicit learning mechanisms are responsible for the non-conscious acquisition of a great part of our knowledge about the world, using our sensitivity to the rules and regularities structuring our environment. Implicit learning is involved in cognitive development, in the generation of emotional processing and in the acquisition of natural language. Preserved implicit learning abilities have been shown in amnesic patients with paradigms like serial reaction time and artificial grammar learning tasks, confirming that implicit learning mechanisms are not sustained by the cognitive processes and the brain structures that are damaged in amnesia. From a clinical perspective, the assessment of implicit learning abilities in amnesic patients could be critical for building adapted neuropsychological rehabilitation programs.]
Abstract:
Introduction. In autism and schizophrenia, attenuated/atypical functional hemispheric asymmetry and theory of mind impairments have been reported, suggesting common underlying neuroscientific correlates. We here investigated whether impaired theory of mind performance is associated with attenuated/atypical hemispheric asymmetry. An association may explain the co-occurrence of both dysfunctions in psychiatric populations. Methods. Healthy participants (n = 129) performed a left hemisphere (lateralised lexical decision task) and right hemisphere (lateralised face decision task) dominant task as well as a visual cartoon task to assess theory of mind performance. Results. Linear regression analyses revealed inconsistent associations between theory of mind performance and functional hemisphere asymmetry: enhanced theory of mind performance was only associated with (1) faster right hemisphere language processing, and (2) reduced right hemisphere dominance for face processing (men only). Conclusions. The majority of non-significant findings suggest that theory of mind and functional hemispheric asymmetry are unrelated. Instead of "overinterpreting" the two significant results, discrepancies in the previous literature relating to the problem of the theory of mind concept, the variety of tasks, and the lack of normative data are discussed. We also suggest how future studies could explore a possible link between hemispheric asymmetry and theory of mind.
Abstract:
Background: Language processing abnormalities and inhibition difficulties are hallmark features of schizophrenia. The objective of this study is to assess the blood oxygenation level-dependent (BOLD) response at two different stages of the illness and compare the frontal activity between adolescents and adults with schizophrenia. Methods: Ten adults with schizophrenia (mean age 31.5 years) and six psychotic adolescents with schizophrenic symptoms (mean age 16.2 years) underwent functional magnetic resonance imaging while performing two frontal tasks. Regional activation is compared in the bilateral frontal areas during a covert verbal fluency task (letter version) and a Stroop task (inhibition task). Results: Preliminary results show poorer task performance and less frontal cortex activation during both tasks in the adult group of patients with schizophrenia. In the adolescent patient group, fMRI analysis shows significant and larger activity in the left frontal operculum (Broca's area) in the verbal fluency task and greater activity in the medium cingulate during the inhibition phase of the Stroop task. Conclusions: These preliminary findings suggest a decrease of frontal activity in the course of the illness. We assume that schizophrenia contributes to frontal brain activity reduction.
Abstract:
SUMMARY This thesis deals with the development of algorithmic methods for automatically discovering the morphological structure of the words in a corpus. It considers in particular the case of languages approaching the introflectional type, such as Arabic or Hebrew. The linguistic tradition describes the morphology of these languages in terms of discontinuous units: consonantal roots and vocalic patterns. This kind of structure is a challenge for current machine learning systems, which generally operate on continuous units. The strategy adopted here is to treat the problem as a sequence of two subproblems. The first is phonological: the symbols (phonemes, letters) of the corpus are divided into two groups corresponding as closely as possible to phonetic consonants and vowels. The second is morphological and builds on the results of the first: the inventory of roots and patterns in the corpus is established and their rules of combination are determined. The scope and limits of an approach based on two hypotheses are examined: (i) the distinction between consonants and vowels can be inferred from their tendency to alternate in the spoken chain; (ii) roots and patterns can be identified, respectively, with the sequences of consonants and vowels discovered previously. The proposed algorithm uses a purely distributional method to partition the symbols of the corpus. It then applies analogical principles to identify a set of strong candidates for the status of root or pattern, and to enlarge this set progressively. This extension is subject to an evaluation procedure based on the minimum description length principle, in the spirit of LINGUISTICA (Goldsmith, 2001). 
The algorithm is implemented as a computer program named ARABICA, and evaluated on a corpus of Arabic nouns with respect to its ability to describe the plural system. This study shows that complex linguistic structures can be discovered while making only a minimum of a priori assumptions about the phenomena considered. It illustrates the possible synergy between learning mechanisms operating at distinct levels of linguistic description, and seeks to determine when and why this cooperation fails. It concludes that the tension between the universality of the consonant-vowel distinction and the specificity of root-and-pattern structuring is crucial for explaining the strengths and weaknesses of such an approach. ABSTRACT This dissertation is concerned with the development of algorithmic methods for the unsupervised learning of natural language morphology, using a symbolically transcribed wordlist. It focuses on the case of languages approaching the introflectional type, such as Arabic or Hebrew. The morphology of such languages is traditionally described in terms of discontinuous units: consonantal roots and vocalic patterns. Inferring this kind of structure is a challenging task for current unsupervised learning systems, which generally operate with continuous units. In this study, the problem of learning root-and-pattern morphology is divided into a phonological and a morphological subproblem. The phonological component of the analysis seeks to partition the symbols of a corpus (phonemes, letters) into two subsets that correspond well with the phonetic definition of consonants and vowels; building around this result, the morphological component attempts to establish the list of roots and patterns in the corpus, and to infer the rules that govern their combinations. 
We assess the extent to which this can be done on the basis of two hypotheses: (i) the distinction between consonants and vowels can be learned by observing their tendency to alternate in speech; (ii) roots and patterns can be identified as sequences of the previously discovered consonants and vowels respectively. The proposed algorithm uses a purely distributional method for partitioning symbols. Then it applies analogical principles to identify a preliminary set of reliable roots and patterns, and gradually enlarge it. This extension process is guided by an evaluation procedure based on the minimum description length principle, in line with the approach to morphological learning embodied in LINGUISTICA (Goldsmith, 2001). The algorithm is implemented as a computer program named ARABICA; it is evaluated with regard to its ability to account for the system of plural formation in a corpus of Arabic nouns. This thesis shows that complex linguistic structures can be discovered without recourse to a rich set of a priori hypotheses about the phenomena under consideration. It illustrates the possible synergy between learning mechanisms operating at distinct levels of linguistic description, and attempts to determine where and why such a cooperation fails. It concludes that the tension between the universality of the consonant-vowel distinction and the specificity of root-and-pattern structure is crucial for understanding the advantages and weaknesses of this approach.
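Hypothesis (i), that consonants and vowels can be told apart by their tendency to alternate, can be illustrated with a classic distributional technique, Sukhotin's algorithm. The abstract does not say that ARABICA uses this particular method, so the Python sketch below is only an example of the general idea. The function name `partition_symbols` and the toy word list are invented for illustration.

```python
from collections import defaultdict


def partition_symbols(words):
    """Split symbols into (vowels, consonants) with Sukhotin's algorithm."""
    adjacency = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:  # a symbol next to itself gives no alternation evidence
                adjacency[a][b] += 1
                adjacency[b][a] += 1

    # Every symbol starts as a consonant; its score is its total adjacency count.
    scores = {s: sum(adjacency[s].values()) for s in symbols}
    vowels = set()
    while True:
        remaining = [s for s in symbols if s not in vowels]
        if not remaining:
            break
        # Pick the highest-scoring remaining symbol (ties broken by symbol order).
        best = max(remaining, key=lambda s: (scores[s], s))
        if scores[best] <= 0:
            break
        vowels.add(best)
        # Symbols that often sit next to the new vowel become less vowel-like.
        for s in remaining:
            if s != best:
                scores[s] -= 2 * adjacency[s][best]
    return vowels, symbols - vowels


# Toy word list, invented for this example.
vowels, consonants = partition_symbols(["baba", "dada", "bada", "bibi", "didi"])
print(sorted(vowels), sorted(consonants))
```

On this toy input the symbols `a` and `i` are labelled vowels and `b` and `d` consonants, using nothing but adjacency counts. Real corpora are noisier, and the partition found may only approximate the phonetic classes.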
Abstract:
Background: Language processing abnormalities and executive difficulties are hallmark features of schizophrenia. The objective of this study is to assess the blood oxygenation level-dependent (BOLD) response at two different stages of the illness (i.e. comparison between adolescents and adults with schizophrenic symptoms) during a fluency task. Methods: BOLD responses during a covert verbal fluency task were compared between 11 psychotic adolescents with schizophrenic symptoms (mean age 16.9 years) and 14 adults with schizophrenia (mean age 33.4 years). fMRI data were analyzed with the standard SPM5 routines. Results: First, the expected activation network was found for each group separately. Secondly, adolescents showed greater activation in the left rolandic operculum (BA 48), left angular gyrus (BA 39) and right hippocampus compared to adults. Thirdly, adults demonstrated greater activation in the pre-supplementary motor area (BA 6) and in the precentral area (BA 4) compared to adolescents. Conclusions: The adolescents seemed to recruit a verbal network (Broca and Wernicke) and memory abilities to perform a fluency task. In contrast, adults seemed to recruit more executive function abilities to perform a similar task. Despite the evolution of schizophrenia, which is known to have a deleterious influence on prefrontal cortex development, adult patients seemed to be able to recruit such areas to perform a verbal fluency / executive function task.
Abstract:
Tobacco use is positively associated with severity of symptoms along the schizophrenia spectrum. Accordingly it could be argued that neuropsychological performance, formerly thought to be modulated by schizotypy, is actually modulated by drug use or an interaction of drug use and schizotypy. We tested whether habitual cigarette smokers as compared to non-smokers would show a neuropsychological profile similar to that observed along the schizophrenia spectrum and, if so, whether smoking status or nicotine dependence would be more significant modulators of behavior than schizotypy. Because hemispheric dominance has been found to be attenuated along the schizophrenia spectrum, 40 right-handed male students (20 non-smokers) performed lateralized left- (lexical decisions) and right- (facial decision task) hemisphere dominant tasks. All individuals completed self-report measures of schizotypy and nicotine dependence. Schizotypy predicted laterality in addition to smoking status: While positive schizotypy (Unusual Experiences) was unrelated to hemispheric performance, Cognitive Disorganization predicted reduced left hemisphere dominant language functions. These latter findings suggest that Cognitive Disorganization should be regarded separately as a potentially important mediator of thought disorganization and language processing. Additionally, increasing nicotine dependence among smokers predicted a right hemisphere shift of function in both tasks that supports the role of the right hemisphere in compulsive/impulsive behavior.
Abstract:
Autism is a neurodevelopmental disorder characterized by deficits in social interaction and social communication, as well as by the presence of repetitive and stereotyped behaviors and interests. Brodmann areas 44 and 45 in the inferior frontal cortex, which are involved in language processing, imitation function, and sociality processing networks, have been implicated in this complex disorder. Using a stereologic approach, this study aims to explore the presence of neuropathological differences in areas 44 and 45 in patients with autism compared to age- and hemisphere-matched controls. Based on previous evidence in the fusiform gyrus, we expected to find a decrease in the number and size of pyramidal neurons as well as an increase in volume of layers III, V, and VI in patients with autism. We observed significantly smaller pyramidal neurons in patients with autism compared to controls, although there was no difference in pyramidal neuron numbers or layer volumes. The reduced pyramidal neuron size suggests that a certain degree of dysfunction of areas 44 and 45 plays a role in the pathology of autism. Our results also support previous studies that have shown specific cellular neuropathology in autism with regionally specific reduction in neuron size, and provide further evidence for the possible involvement of the mirror neuron system, as well as impairment of neuronal networks relevant to communication and social behaviors, in this disorder.