960 results for Word Sense Disambiguation, WSD, Natural Language Processing


Relevance: 100.00%

Publisher:

Abstract:

The relation between mathematics and natural language can be viewed from several aspects; it is not based on any single one. In this article, the authors address the role of the Subject facing Reality through language. Perception is defined and a mathematical theory of the perceptual field is proposed. The distinction between purely expressive language and purely informative language is considered false: the subject is expressed in the communication of any message and, conversely, purely expressive language, such as an exclamation, carries some information. To study the relation between language and reality, the function of ostensibility is defined and propositions are divided into ostensive and estimative ones.

Relevance: 100.00%

Publisher:

Abstract:

In this work we present a semantic framework suitable for use as a support tool for recommender systems. Our purpose is to use the semantic information provided by a set of integrated resources to enrich texts by conducting different NLP tasks: WSD, domain classification, semantic similarity and sentiment analysis. After obtaining this textual semantic enrichment, we are able to recommend similar content or even to rate texts along different dimensions. First, we describe the main characteristics of the integrated semantic resources, with an exhaustive evaluation. Next, we demonstrate the usefulness of our resource in different NLP tasks and campaigns. Moreover, we present a combination of different NLP approaches that provides enough knowledge to be used as a support tool for recommender systems. Finally, we illustrate a case study with information related to movies and TV series to demonstrate that our framework works properly.
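A minimal sketch of the recommendation step described above: once texts are enriched with semantic labels, similar content can be ranked by cosine similarity over label counts. The labels and titles below are illustrative assumptions; the framework's actual enrichment pipeline is far richer than raw label counting.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(target_labels, catalogue):
    """Rank (title, labels) pairs by similarity to the target's labels."""
    tv = Counter(target_labels)
    return sorted(catalogue,
                  key=lambda item: cosine(tv, Counter(item[1])),
                  reverse=True)

# Hypothetical enriched items, e.g. movies tagged with semantic labels.
catalogue = [("Movie A", ["thriller", "crime", "berlin"]),
             ("Movie B", ["comedy", "romance"])]
ranked = recommend(["thriller", "crime", "madrid"], catalogue)
```

In this toy example "Movie A" ranks first because it shares two of three labels with the target.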

Relevance: 100.00%

Publisher:

Abstract:

Research on semantic processing has focused mainly on isolated units in language, which does not reflect the complexity of language. In order to understand how semantic information is processed in a wider context, the first goal of this thesis was to determine whether Swedish pre-school children are able to comprehend semantic context and whether that context is semantically built up over time. The second goal was to investigate how the brain distributes attentional resources in terms of brain activation amplitude and processing type. Swedish pre-school children were tested in a dichotic listening task with longer children's narratives. The development of the event-related potential N400 component and its amplitude were used to investigate both goals. The decrease of the N400 in the attended and unattended channels indicated semantic comprehension and that semantic context was built up over time. The attended stimulus received more resources, was processed in more of a top-down manner and displayed a prominent N400 amplitude in contrast to the unattended stimulus. The N400 and the late positivity were more complex than expected, since endings of utterances longer than nine words were not accounted for. More research on wider linguistic context is needed in order to understand how the human brain comprehends natural language.

Relevance: 100.00%

Publisher:

Abstract:

In the last decade we have seen an exponential growth of functional imaging studies investigating multiple aspects of language processing. These studies have sparked interest in applying some of the paradigms to various clinically relevant questions, such as the identification of the cortical regions mediating language function in surgical candidates for refractory epilepsy. Here we present data from a group of adult control participants in order to investigate the potential of using frequency-specific spectral power changes in MEG activation patterns to establish lateralisation of language function using expressive language tasks. In addition, we report on a paediatric patient whose language function was assessed before and after a left hemisphere amygdalo-hippocampectomy. Our verb generation task produced left hemisphere decreases in beta-band power accompanied by right hemisphere increases in low beta-band power in the majority of the control group, a previously unreported phenomenon. This pattern of spectral power was also found in the patient's post-surgery data, though not her pre-surgery data. Comparison of pre- and post-operative results also provided some evidence of reorganisation in language-related cortex both inter- and intra-hemispherically following surgery. The differences were not limited to changes in localisation of language-specific cortex but also included changes in the spectral and temporal profile of frontal brain regions during verb generation. While further investigation is required to establish concordance with invasive measures, our data suggest that the methods described may serve as a reliable lateralisation marker for clinical assessment. Furthermore, our findings highlight the potential utility of MEG for the investigation of cortical language functioning in both healthy development and pathology.

Relevance: 100.00%

Publisher:

Abstract:

For more than forty years, research has been ongoing into the use of the computer in the processing of natural language. During this period methods have evolved, with various parsing techniques and grammars coming to prominence. Problems still exist, not least in the field of Machine Translation. However, one of the successes in this field is the translation of sublanguage. The present work reports on Deterministic Parsing, a relatively new parsing technique, and its application to the sublanguage of an aircraft maintenance manual for Machine Translation. The aim has been to investigate the practicability of using Deterministic Parsers in the analysis stage of a Machine Translation system. Machine Translation, Sublanguage and Parsing are described in general terms, with a review of Deterministic Parsing systems pertinent to this research presented in detail. The interaction between Machine Translation, Sublanguage and Parsing, including Deterministic Parsing, is also highlighted. Two types of Deterministic Parser have been investigated: a Marcus-type parser, based on the basic design of the original Deterministic Parser (Marcus, 1980), and an LR-type Deterministic Parser for natural language, based on the LR parsing algorithm. In total, four Deterministic Parsers have been built and are described in the thesis. Two of the Deterministic Parsers are prototypes from which the remaining two parsers, to be used on sublanguage, have been developed. This thesis reports the results of parsing by the prototypes, a Marcus-type parser and an LR-type parser which have a similar grammatical and linguistic range to the original Marcus parser. The Marcus-type parser uses a grammar of production rules, whereas the LR-type parser employs a Definite Clause Grammar (DCG).
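The shift-reduce mechanism underlying LR-style parsing can be sketched in a few lines. This is a greedy reduce-eager toy, not a table-driven LR parser with lookahead, and the three-rule grammar and pre-tagged input are illustrative assumptions, not the thesis's aircraft-manual grammar.

```python
# Toy grammar: each rule maps an RHS tuple of categories to an LHS category.
GRAMMAR = [
    (("NP", "VP"), "S"),
    (("Det", "N"), "NP"),
    (("V", "NP"), "VP"),
]

def try_reduce(stack):
    """Reduce the top of the stack by the first matching rule, if any."""
    for rhs, lhs in GRAMMAR:
        n = len(rhs)
        if tuple(stack[-n:]) == rhs:
            del stack[-n:]
            stack.append(lhs)
            return True
    return False

def shift_reduce(tags):
    """Parse a pre-tagged sentence; a full parse leaves just ['S']."""
    stack, buffer = [], list(tags)
    while True:
        if try_reduce(stack):
            continue                    # reduce whenever possible
        if buffer:
            stack.append(buffer.pop(0)) # otherwise shift the next tag
        else:
            return stack                # neither action applies: done
```

For example, `shift_reduce(["Det", "N", "V", "Det", "N"])` ("the mechanic checks the valve", tagged) reduces all the way to `["S"]`; a sentence fragment leaves a partial stack instead.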

Relevance: 100.00%

Publisher:

Abstract:

Over recent years, evidence has been accumulating in favour of the importance of long-term information as a variable which can affect the success of short-term recall. Lexicality, word frequency, imagery and meaning have all been shown to augment short-term recall performance. Two competing theories as to the causes of this long-term memory influence are outlined and tested in this thesis. The first approach is the order-encoding account, which ascribes the effect to the usage of resources at encoding, hypothesising that word lists which require less effort to process will benefit from increased levels of order encoding, in turn enhancing recall success. The alternative view, trace redintegration theory, suggests that order is automatically encoded phonologically, and that long-term information can only influence the interpretation of the resultant memory trace. The free recall experiments reported here attempted to determine the importance of order encoding as a facilitatory framework and to determine the locus of the effects of long-term information in free recall. Experiments 1 and 2 examined the effects of word frequency and semantic categorisation over a filled delay, and Experiments 3 and 4 did the same for immediate recall. Free recall was improved by both long-term factors tested. Order information was not used over a short filled delay, but was evident in immediate recall. Furthermore, it was found that both long-term factors increased the amount of order information retained. Experiment 5 induced an order-encoding effect over a filled delay, leaving a picture of short-term processes which are closely associated with long-term processes, and which fit conceptions of short-term memory as part of language processes rather better than either the encoding or the retrieval-based models. Experiments 6 and 7 aimed to determine to what extent phonological processes were responsible for the pattern of results observed. Articulatory suppression affected the encoding of order information where speech rate had no direct influence, suggesting that ease of lexical access is the most important factor in the influence of long-term memory on immediate recall tasks. The evidence presented in this thesis does not offer complete support for either the retrieval-based account or the order-encoding account of long-term influence. Instead, the evidence sits best with models that are based upon language processing. The path urged for future research is to find ways in which this diffuse model can be better specified, and which can take account of the versatility of the human brain.

Relevance: 100.00%

Publisher:

Abstract:

Models are central tools for modern scientists and decision makers, and there are many existing frameworks to support their creation, execution and composition. Many frameworks are based on proprietary interfaces and do not lend themselves to the integration of models from diverse disciplines. Web-based systems, or systems based on web services, such as Taverna and Kepler, allow composition of models based on standard web service technologies. At the same time, the Open Geospatial Consortium has been developing its own service stack, which includes the Web Processing Service, designed to facilitate the execution of geospatial processes, including complex environmental models. The current Open Geospatial Consortium service stack employs Extensible Markup Language as a default data exchange standard, and widely used encodings such as JavaScript Object Notation can often only be used when incorporated within Extensible Markup Language. Similarly, no successful engagement of the Web Processing Service standard with the well-supported technologies of Simple Object Access Protocol and Web Services Description Language has been seen. In this paper we propose a pure Simple Object Access Protocol/Web Services Description Language processing service which addresses some of the issues with the Web Processing Service specification and brings us closer to achieving a degree of interoperability between geospatial models, and thus realising the vision of a useful 'model web'.
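To make the SOAP side concrete, the sketch below builds the kind of SOAP 1.1 request envelope a pure SOAP/WSDL processing service would accept. The operation name `ExecuteModel`, the service namespace and the parameters are hypothetical illustrations, not part of the proposed specification; only the Envelope/Body structure follows the SOAP standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope
SVC_NS = "http://example.org/model-service"             # hypothetical service

def build_execute_request(model_id, params):
    """Build a SOAP request invoking a hypothetical ExecuteModel operation."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}ExecuteModel")
    ET.SubElement(op, f"{{{SVC_NS}}}ModelId").text = model_id
    for name, value in params.items():
        p = ET.SubElement(op, f"{{{SVC_NS}}}Parameter", attrib={"name": name})
        p.text = str(value)
    return ET.tostring(env, encoding="unicode")

# e.g. request a run of a hypothetical rainfall-runoff model
request_xml = build_execute_request("runoff-v1", {"rainfall": 12.5})
```

In practice the message body and types would be dictated by the service's WSDL description, which is precisely what lets generic SOAP tooling compose such models.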

Relevance: 100.00%

Publisher:

Abstract:

Natural language understanding (NLU) aims to map sentences to their semantic meaning representations. Statistical approaches to NLU normally require fully annotated training data where each sentence is paired with its word-level semantic annotations. In this paper, we propose a novel learning framework which trains Hidden Markov Support Vector Machines (HM-SVMs) without the use of expensive fully annotated data. In particular, our learning approach takes as input a training set of sentences labelled with abstract semantic annotations encoding underlying embedded structural relations, and automatically induces derivation rules that map sentences to their semantic meaning representations. The proposed approach has been tested on the DARPA Communicator data and achieved 93.18% in F-measure, which outperforms the previously proposed approaches of training the hidden vector state model or conditional random fields from unaligned data, with relative error reduction rates of 43.3% and 10.6% respectively.
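Sequence models of this family share a common decoding step: Viterbi search for the highest-scoring tag sequence. Below is a generic Viterbi sketch; the tiny score tables and the flight-domain tags are made-up illustrations, not the paper's learned HM-SVM weights, and the actual training procedure is far more involved.

```python
def viterbi(obs, states, emit, trans):
    """Return the highest-scoring state sequence for `obs`.

    `emit[s]` maps a word to its (log-)score under state s; unseen words
    get a large penalty. `trans[p][s]` scores the transition p -> s.
    """
    best = {s: emit[s].get(obs[0], -100.0) for s in states}
    back = []
    for word in obs[1:]:
        ptr, nxt = {}, {}
        for s in states:
            prev, score = max(((p, best[p] + trans[p][s]) for p in states),
                              key=lambda x: x[1])
            ptr[s] = prev
            nxt[s] = score + emit[s].get(word, -100.0)
        back.append(ptr)
        best = nxt
    # Follow back-pointers from the best final state.
    last = max(best, key=best.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Hypothetical flight-domain scores in the spirit of the Communicator data.
states = ["CITY", "O"]
emit = {"CITY": {"boston": 0.0}, "O": {"flights": 0.0, "to": 0.0}}
trans = {"CITY": {"CITY": 0.0, "O": 0.0}, "O": {"CITY": 0.0, "O": 0.0}}
```

With these scores, decoding "flights to boston" tags only the final word as a city mention.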

Relevance: 100.00%

Publisher:

Abstract:

The semantic web vision is one in which rich, ontology-based semantic markup will become widely available. The availability of semantic markup on the web opens the way to novel, sophisticated forms of question answering. AquaLog is a portable question-answering system which takes queries expressed in natural language and an ontology as input, and returns answers drawn from one or more knowledge bases (KBs). We say that AquaLog is portable because the configuration time required to customize the system for a particular ontology is negligible. AquaLog presents an elegant solution in which different strategies are combined together in a novel way. It makes use of the GATE NLP platform, string metric algorithms, WordNet and a novel ontology-based relation similarity service to make sense of user queries with respect to the target KB. Moreover, it also includes a learning component, which ensures that the performance of the system improves over time, in response to the particular community jargon used by end users.
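One of the ingredients named above, string metric matching, can be sketched concisely. The Levenshtein distance below is a standard metric of this kind, shown here only to illustrate how a misspelt query term can be mapped to an ontology label; the labels are hypothetical and this is not AquaLog's actual matching pipeline.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def closest_label(term, labels):
    """Pick the ontology label nearest to a (possibly misspelt) query term."""
    return min(labels, key=lambda lbl: levenshtein(term.lower(), lbl.lower()))
```

For instance, matching the misspelt query term "proffessor" against hypothetical labels `["Professor", "Project", "PhDStudent"]` selects "Professor".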

Relevance: 100.00%

Publisher:

Abstract:

The semantic web (SW) vision is one in which rich, ontology-based semantic markup will become widely available. The availability of semantic markup on the web opens the way to novel, sophisticated forms of question answering. AquaLog is a portable question-answering system which takes queries expressed in natural language (NL) and an ontology as input, and returns answers drawn from one or more knowledge bases (KBs). AquaLog presents an elegant solution in which different strategies are combined together in a novel way. AquaLog's novel ontology-based relation similarity service makes sense of user queries.

Relevance: 100.00%

Publisher:

Abstract:

Information can be expressed in many ways according to the different capacities of humans to perceive it. Current systems deal with multimedia, multiformat and multiplatform content, but another "multi" is still pending to guarantee global access to information: multilinguality. Different languages imply different replications of the systems according to the language in question. No solution appears to bridge the gap between the human representation (natural language) and a system-oriented representation. In 1997 the United Nations University defined a language to support effective multilingualism on the Internet. In this paper, we describe this language and its possible applications, beyond multilingual services, as a possible future standard for language-independent applications.

Relevance: 100.00%

Publisher:

Abstract:

* This paper was prepared under the program of fundamental scientific research of the Presidium of the Russian Academy of Sciences «Mathematical simulation and intellectual systems», within the project "Theoretical foundation of the intellectual systems based on ontologies for intellectual support of scientific researches".

Relevance: 100.00%

Publisher:

Abstract:

This article briefly reviews multilingual language resources for Bulgarian developed in the framework of several international projects: the first-ever annotated Bulgarian MTE digital lexical resources, a Bulgarian-Polish corpus, a Bulgarian-Slovak parallel and aligned corpus, and a Bulgarian-Polish-Lithuanian corpus. These resources constitute a valuable multilingual dataset for language engineering research and development for the Bulgarian language. The multilingual corpora are large repositories of language data with an important role in preserving and supporting the world's cultural heritage, because natural language is an outstanding part of human cultural values and collective memory, and a bridge between cultures.

Relevance: 100.00%

Publisher:

Abstract:

The article briefly reviews a bilingual Slovak-Bulgarian/Bulgarian-Slovak parallel and aligned corpus. The corpus was collected and developed as a result of collaboration within a joint research project between the Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, and the Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences. Multilingual corpora are large repositories of language data with an important role in preserving and supporting the world's cultural heritage, because natural language is an outstanding part of human cultural values and collective memory, and a bridge between cultures. This bilingual corpus will be widely applicable to contrastive studies of the two Slavic languages, and will also be a useful resource for language engineering research and development, especially in machine translation.

Relevance: 100.00%

Publisher:

Abstract:

This paper introduces a quantitative method for identifying newly emerging word forms in large time-stamped corpora of natural language, and then describes an analysis of lexical emergence in American social media using this method, based on a multi-billion-word corpus of Tweets collected between October 2013 and November 2014. In total, 29 emerging word forms, which represent various semantic classes, grammatical parts of speech, and word-formation processes, were identified through this analysis. These 29 forms are then examined from various perspectives in order to begin to better understand the process of lexical emergence.
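The core quantitative idea, tracking relative frequency across time slices, can be sketched as follows. The monotonic-rise test, the growth threshold and the toy counts are illustrative assumptions, not the paper's actual selection criteria, and forms absent from the first slice would need separate handling.

```python
def relative_freq(counts, totals):
    """Per-slice relative frequency of one word form in a corpus."""
    return [c / t for c, t in zip(counts, totals)]

def is_emerging(counts, totals, growth=5.0):
    """Flag a form whose relative frequency rises monotonically across
    time slices and ends at least `growth` times its initial value."""
    f = relative_freq(counts, totals)
    rising = all(a <= b for a, b in zip(f, f[1:]))
    return rising and f[0] > 0 and f[-1] / f[0] >= growth

# Toy monthly data: raw counts for one form, and total tokens per month.
totals = [1_000_000] * 4
emerging_counts = [2, 5, 9, 20]    # steadily rising: candidate form
stable_counts = [50, 48, 52, 49]   # fluctuating: established form
```

Applied to the toy data, the rising form is flagged while the stable one is not; the paper's method works at a much larger scale over a multi-billion-word Twitter corpus.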