8 results for Natural language

at Université de Lausanne, Switzerland


Relevance:

60.00% 60.00%

Publisher:

Abstract:

The long-term goal of this research is to develop a program able to automatically segment textual sequences and categorize them into discourse types. In this preliminary contribution, we present the construction of an algorithm which takes a segmented text as input and attempts to categorize its sequences as narrative, argumentative, descriptive and so on. This work also aims to investigate a possible convergence between unsupervised statistical learning and the typological approach developed in particular in the field of text and discourse analysis in French by Adam (2008) and Bronckart (1997).

Relevance:

60.00% 60.00%

Publisher:

Abstract:

In the past, research in ontology learning from text has mainly focused on entity recognition, taxonomy induction and relation extraction. In this work we approach a challenging research issue: detecting semantic frames from texts and using them to encode web ontologies. We exploit a new generation Natural Language Processing technology for frame detection, and we enrich the frames acquired so far with argument restrictions provided by a super-sense tagger and domain specializations. The results are encoded according to a Linguistic MetaModel, which allows a complete translation of lexical resources and data acquired from text, enabling custom transformations of the enriched frames into modular ontology components.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Since its creation, the Internet has permeated our daily life. The web is omnipresent for communication, research and organization. This exploitation has resulted in the rapid development of the Internet. Nowadays, the Internet is the biggest container of resources. Information databases such as Wikipedia, Dmoz and the open data available on the net hold great informational potential for mankind. Easy and free web access is one of the major features characterizing the Internet culture. Ten years ago, the web was completely dominated by English. Today, the web community is no longer only English-speaking but is becoming a genuinely multilingual community. The availability of content is intertwined with the availability of logical organizations (ontologies) for which multilinguality plays a fundamental role. In this work we introduce a very high-level logical organization fully based on semiotic assumptions. We thus present the theoretical foundations as well as the ontology itself, named Linguistic Meta-Model. The most important feature of Linguistic Meta-Model is its ability to support the representation of different knowledge sources developed according to different underlying semiotic theories. This is possible because most knowledge representation schemata, either formal or informal, can be put into the context of the so-called semiotic triangle. In order to show the main characteristics of Linguistic Meta-Model from a practical point of view, we developed VIKI (Virtual Intelligence for Knowledge Induction). VIKI is a work-in-progress system aiming at exploiting the Linguistic Meta-Model structure for knowledge expansion. It is a modular system in which each module accomplishes a natural language processing task, from terminology extraction to knowledge retrieval.
VIKI is a supporting system to Linguistic Meta-Model and its main task is to give some empirical evidence regarding the use of Linguistic Meta-Model without claiming to be thorough.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

BACKGROUND: Molecular interaction information is a key resource in modern biomedical research. Publicly available data have previously been provided in a broad array of diverse formats, making access to them very difficult. The publication and wide implementation of the Human Proteome Organisation Proteomics Standards Initiative Molecular Interactions (HUPO PSI-MI) format in 2004 was a major step towards the establishment of a single, unified format by which molecular interactions should be presented, but focused purely on protein-protein interactions. RESULTS: The HUPO-PSI has further developed the PSI-MI XML schema to enable the description of interactions between a wider range of molecular types, for example nucleic acids, chemical entities, and molecular complexes. Extensive details about each supported molecular interaction can now be captured, including the biological role of each molecule within that interaction, detailed description of interacting domains, and the kinetic parameters of the interaction. The format is supported by data management and analysis tools and has been adopted by major interaction data providers. Additionally, a simpler, tab-delimited format, MITAB2.5, has been developed for the benefit of users who require only minimal information in an easy-to-access configuration. CONCLUSION: The PSI-MI XML2.5 and MITAB2.5 formats have been jointly developed by interaction data producers and providers from both the academic and commercial sector, and are already widely implemented and well supported by an active development community. PSI-MI XML2.5 enables the description of highly detailed molecular interaction data and facilitates data exchange between databases and users without loss of information. MITAB2.5 is a simpler format appropriate for fast Perl parsing or loading into Microsoft Excel.
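To illustrate why a tab-delimited format lends itself to quick scripted parsing, here is a minimal Python sketch of a MITAB2.5 line reader. It assumes the commonly described 15-column layout, with "|" separating multiple values and "-" marking an empty field. The column names, the function name `parse_mitab25_line`, and the sample record are hypothetical labels chosen for this example, not taken from the abstract or from any official tool.

```python
# Hypothetical short names for the 15 MITAB2.5 columns (assumed order).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_databases",
    "interaction_ids", "confidence_scores",
]


def parse_mitab25_line(line):
    """Split one MITAB2.5 line into a dict mapping column name to a list of values."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(MITAB25_COLUMNS):
        raise ValueError(
            f"expected {len(MITAB25_COLUMNS)} columns, got {len(fields)}"
        )
    record = {}
    for name, raw in zip(MITAB25_COLUMNS, fields):
        # "-" is treated as an empty field. Splitting on "|" is naive:
        # it assumes no quoted value contains a literal pipe character.
        record[name] = [] if raw == "-" else raw.split("|")
    return record


# A made-up record for illustration only; the identifiers are not real entries.
sample = "\t".join([
    "uniprotkb:P00001", "uniprotkb:P00002", "-", "-",
    "uniprotkb:GENE1(gene name)", "-",
    'psi-mi:"MI:0000"(example method)', "Doe et al. (2007)",
    "pubmed:00000000", "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0000"(example type)', 'psi-mi:"MI:0000"(example db)',
    "exampledb:EX-1|exampledb:EX-2", "-",
])
record = parse_mitab25_line(sample)
```

A dictionary of lists keeps the sketch close to the flat file, and `record["interaction_ids"]` then holds both identifiers from the sample. A production reader would also need to handle header lines and quoted values.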

Relevance:

60.00% 60.00%

Publisher:

Abstract:

SUMMARY This thesis concerns the development of algorithmic methods for automatically discovering the morphological structure of the words in a corpus. It considers in particular the case of languages approaching the introflectional type, such as Arabic or Hebrew. Linguistic tradition describes the morphology of these languages in terms of discontinuous units: consonantal roots and vocalic patterns. This kind of structure poses a challenge for current machine learning systems, which generally operate with continuous units. The strategy adopted here is to treat the problem as a sequence of two subproblems. The first is phonological: it consists in dividing the symbols (phonemes, letters) of the corpus into two groups corresponding as closely as possible to phonetic consonants and vowels. The second is morphological and builds on the results of the first: it consists in establishing the inventory of roots and patterns in the corpus and determining their rules of combination. We examine the scope and limits of an approach based on two hypotheses: (i) the distinction between consonants and vowels can be inferred from their tendency to alternate in the spoken chain; (ii) roots and patterns can be identified with the previously discovered sequences of consonants and vowels respectively. The proposed algorithm uses a purely distributional method to partition the symbols of the corpus. It then applies analogical principles to identify a set of strong candidates for root or pattern status, and to enlarge this set progressively. This extension is subject to an evaluation procedure based on the minimum description length principle, in the spirit of LINGUISTICA (Goldsmith, 2001).
The algorithm is implemented as a computer program named ARABICA, and evaluated on a corpus of Arabic nouns with respect to its ability to describe the plural system. This study shows that complex linguistic structures can be discovered while making only a minimum of a priori assumptions about the phenomena under consideration. It illustrates the possible synergy between learning mechanisms operating at distinct levels of linguistic description, and seeks to determine when and why this cooperation fails. It concludes that the tension between the universality of the consonant-vowel distinction and the specificity of root-and-pattern structure is crucial for explaining the strengths and weaknesses of such an approach. ABSTRACT This dissertation is concerned with the development of algorithmic methods for the unsupervised learning of natural language morphology, using a symbolically transcribed wordlist. It focuses on the case of languages approaching the introflectional type, such as Arabic or Hebrew. The morphology of such languages is traditionally described in terms of discontinuous units: consonantal roots and vocalic patterns. Inferring this kind of structure is a challenging task for current unsupervised learning systems, which generally operate with continuous units. In this study, the problem of learning root-and-pattern morphology is divided into a phonological and a morphological subproblem. The phonological component of the analysis seeks to partition the symbols of a corpus (phonemes, letters) into two subsets that correspond well with the phonetic definition of consonants and vowels; building around this result, the morphological component attempts to establish the list of roots and patterns in the corpus, and to infer the rules that govern their combinations.
We assess the extent to which this can be done on the basis of two hypotheses: (i) the distinction between consonants and vowels can be learned by observing their tendency to alternate in speech; (ii) roots and patterns can be identified as sequences of the previously discovered consonants and vowels respectively. The proposed algorithm uses a purely distributional method for partitioning symbols. Then it applies analogical principles to identify a preliminary set of reliable roots and patterns, and gradually enlarge it. This extension process is guided by an evaluation procedure based on the minimum description length principle, in line with the approach to morphological learning embodied in LINGUISTICA (Goldsmith, 2001). The algorithm is implemented as a computer program named ARABICA; it is evaluated with regard to its ability to account for the system of plural formation in a corpus of Arabic nouns. This thesis shows that complex linguistic structures can be discovered without recourse to a rich set of a priori hypotheses about the phenomena under consideration. It illustrates the possible synergy between learning mechanisms operating at distinct levels of linguistic description, and attempts to determine where and why such a cooperation fails. It concludes that the tension between the universality of the consonant-vowel distinction and the specificity of root-and-pattern structure is crucial for understanding the advantages and weaknesses of this approach.
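The two hypotheses can be made concrete with a small Python sketch. The abstract does not say which distributional method ARABICA uses, so this example substitutes Sukhotin's classic algorithm, which separates vowels from consonants purely from their tendency to alternate. It then reads the root as the consonant sequence of a word and the pattern as its vowel sequence, with "_" marking consonant slots. The function names and the toy word list are assumptions made for illustration, and the analogical extension and minimum description length evaluation steps are not covered.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return the symbols that Sukhotin's algorithm classifies as vowels."""
    # Symmetric counts of how often two distinct symbols are adjacent.
    adjacency = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for left, right in zip(word, word[1:]):
            if left != right:  # identical neighbours are ignored
                adjacency[left][right] += 1
                adjacency[right][left] += 1

    # Every symbol starts as a consonant, scored by its total adjacency count.
    sums = {s: sum(adjacency[s].values()) for s in symbols}
    vowels = set()
    while True:
        consonants = [s for s in symbols if s not in vowels]
        if not consonants:
            break
        # Pick the highest-scoring remaining symbol (ties broken by symbol).
        best = max(consonants, key=lambda s: (sums[s], s))
        if sums[best] <= 0:
            break
        vowels.add(best)
        # Symbols that often neighbour the new vowel become less vowel-like.
        for s in consonants:
            if s != best:
                sums[s] -= 2 * adjacency[s][best]
    return vowels


def split_root_pattern(word, vowels):
    """Hypothesis (ii): root = consonant sequence, pattern = vowel sequence."""
    root = "".join(ch for ch in word if ch not in vowels)
    pattern = "".join(ch if ch in vowels else "_" for ch in word)
    return root, pattern


# Toy transliterated word list, chosen only to exercise the code.
toy_words = ["kataba", "kitab", "kutub"]
toy_vowels = sukhotin_vowels(toy_words)
toy_analysis = {w: split_root_pattern(w, toy_vowels) for w in toy_words}
```

On this toy list the three words share the consonant sequence `ktb` and differ only in their vowel pattern, which is the kind of regularity the morphological component is meant to exploit. Real corpora are much noisier, which is where the evaluation procedure described above comes in.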

Relevance:

60.00% 60.00%

Publisher:

Abstract:

In the domain of perception, learning is constrained by the presence of a functional architecture made up of distributed and highly specialized cortical areas. In the domain of visual disorders of cerebral origin, the learning of a hemianopic or agnosic patient will be limited by his or her residual perceptual capacities, but a visual recognition deficit of an apparently perceptual nature may also be associated with an alteration of representations in long-term memory. Distinct neural networks for the recognition of sounds (temporal cortex) and for their localization (parietal cortex) have been described in humans. The study of brain-damaged patients confirms the role of spatial cues in explicit auditory processing of the "where" and in implicit discrimination of the "what". This organization, similar to what has been described in the visual modality, would facilitate perceptual learning. More generally, implicit learning underlies a large part of our knowledge about the world by making us sensitive, without our awareness, to the rules and regularities of our environment. It is thought to be involved in cognitive development, the formation of emotional reactions, and the young child's learning of his or her mother tongue. The unconscious character of this learning is confirmed by the study of serial reaction times of amnesic patients in the acquisition of an artificial grammar. Its assessment could be decisive in rehabilitative care. [In the field of perception, learning is formed by a distributed functional architecture of very specialized cortical areas. For example, capacities of learning in patients with visual deficits - hemianopia or visual agnosia - from cerebral lesions are limited by perceptual abilities.
Moreover, a visual deficit linked to abnormal perception may be associated with an alteration of representations in long-term (semantic) memory. Furthermore, perception and memory traces rely on parallel processing. This has been recently demonstrated for human audition. Activation studies in normal subjects and psychophysical investigations in patients with focal hemispheric lesions have shown that auditory information relevant to sound recognition and that relevant to sound localisation are processed in parallel, anatomically distinct cortical networks, often referred to as the "What" and "Where" processing streams. Parallel processing may appear counterintuitive from the point of view of a unified perception of the auditory world, but there are advantages, such as rapidity of processing within a single stream, its adaptability in perceptual learning or facility of multisensory interactions. More generally, implicit learning mechanisms are responsible for the non-conscious acquisition of a great part of our knowledge about the world, using our sensitivity to the rules and regularities structuring our environment. Implicit learning is involved in cognitive development, in the generation of emotional processing and in the acquisition of natural language. Preserved implicit learning abilities have been shown in amnesic patients with paradigms like serial reaction time and artificial grammar learning tasks, confirming that implicit learning mechanisms are not sustained by the cognitive processes and the brain structures that are damaged in amnesia. From a clinical perspective, the assessment of implicit learning abilities in amnesic patients could be critical for building adapted neuropsychological rehabilitation programs.]

Relevance:

30.00% 30.00%

Publisher:

Abstract:

As a constantly evolving set of complex biotechnologies, medically assisted procreation (MAP) jeopardises a category that seems to be taken for granted: that of 'natural'. What is 'natural' or not when MAP is used to procreate? What are the boundaries between a 'natural' and a 'non-natural' fertilisation? Drawing upon a dialogical approach to language and cognition, our study examined the semantic field of the category 'natural' as expressed in interviews between a psychiatrist and seven couples who resorted to MAP and had to decide whether to keep their frozen pre-embryonic cells (zygotes) for further procreation or to allow them to be destroyed. We examined how these couples evoked the category 'natural' and showed that in their argumentation, the category 'natural' encompassed a wide variety of phenomena, which shifted the boundaries between the 'natural' and 'non-natural'. In so doing, the couples 'renaturalised' MAP, normalised it, moved the boundaries between what is legitimate or not, and showed their accountability. Hence, reference to the category 'natural' seemed to act both as an argumentative and a psychological resource in the elaboration of the person's experience in resorting to MAP.