979 resultados para Natural Language


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Quantitative databases are limited to information identified as important by their creators, while databases containing natural language are limited by our ability to analyze large unstructured bodies of text. Leximancer is a tool that uses semantic mapping to develop concept maps from natural language. We have applied Leximancer to educational based pathology case notes to demonstrate how real patient records or databases of case studies could be analyzed to identify unique relationships. We then discuss how such analysis could be used to conduct quantitative analysis from databases such as the Coronary Heart Disease Database.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Automatic ontology building is a vital issue in many fields where they are currently built manually. This paper presents a user-centred methodology for ontology construction based on the use of Machine Learning and Natural Language Processing. In our approach, the user selects a corpus of texts and sketches a preliminary ontology (or selects an existing one) for a domain with a preliminary vocabulary associated to the elements in the ontology (lexicalisations). Examples of sentences involving such lexicalisation (e.g. ISA relation) in the corpus are automatically retrieved by the system. Retrieved examples are validated by the user and used by an adaptive Information Extraction system to generate patterns that discover other lexicalisations of the same objects in the ontology, possibly identifying new concepts or relations. New instances are added to the existing ontology or used to tune it. This process is repeated until a satisfactory ontology is obtained. The methodology largely automates the ontology construction process and the output is an ontology with an associated trained leaner to be used for further ontology modifications.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

With this paper, we propose a set of techniques to largely automate the process of KA, by using technologies based on Information Extraction (IE) , Information Retrieval and Natural Language Processing. We aim to reduce all the impeding factors mention above and thereby contribute to the wider utility of the knowledge management tools. In particular we intend to reduce the introspection of knowledge engineers or the extended elicitations of knowledge from experts by extensive textual analysis using a variety of methods and tools, as texts are largely available and in them - we believe - lies most of an organization's memory.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The fundamental failure of current approaches to ontology learning is to view it as single pipeline with one or more specific inputs and a single static output. In this paper, we present a novel approach to ontology learning which takes an iterative view of knowledge acquisition for ontologies. Our approach is founded on three open-ended resources: a set of texts, a set of learning patterns and a set of ontological triples, and the system seeks to maintain these in equilibrium. As events occur which disturb this equilibrium, actions are triggered to re-establish a balance between the resources. We present a gold standard based evaluation of the final output of the system, the intermediate output showing the iterative process and a comparison of performance using different seed input. The results are comparable to existing performance in the literature.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The use of ontologies as representations of knowledge is widespread but their construction, until recently, has been entirely manual. We argue in this paper for the use of text corpora and automated natural language processing methods for the construction of ontologies. We delineate the challenges and present criteria for the selection of appropriate methods. We distinguish three ma jor steps in ontology building: associating terms, constructing hierarchies and labelling relations. A number of methods are presented for these purposes but we conclude that the issue of data-sparsity still is a ma jor challenge. We argue for the use of resources external tot he domain specific corpus.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Yorick Wilks is a central figure in the fields of Natural Language Processing and Artificial Intelligence. His influence extends to many areas and includes contributions to Machines Translation, word sense disambiguation, dialogue modeling and Information Extraction. This book celebrates the work of Yorick Wilks in the form of a selection of his papers which are intended to reflect the range and depth of his work. The volume accompanies a Festschrift which celebrates his contribution to the fields of Computational Linguistics and Artificial Intelligence. The papers include early work carried out at Cambridge University, descriptions of groundbreaking work on Machine Translation and Preference Semantics as well as more recent works on belief modeling and computational semantics. The selected papers reflect Yorick’s contribution to both practical and theoretical aspects of automatic language processing.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Corpora—large collections of written and/or spoken text stored and accessed electronically—provide the means of investigating language that is of growing importance academically and professionally. Corpora are now routinely used in the following fields: •the production of dictionaries and other reference materials; •the development of aids to translation; •language teaching materials; •the investigation of ideologies and cultural assumptions; •natural language processing; and •the investigation of all aspects of linguistic behaviour, including vocabulary, grammar and pragmatics.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

For more than forty years, research has been on going in the use of the computer in the processing of natural language. During this period methods have evolved, with various parsing techniques and grammars coming to prominence. Problems still exist, not least in the field of Machine Translation. However, one of the successes in this field is the translation of sublanguage. The present work reports Deterministic Parsing, a relatively new parsing technique, and its application to the sublanguage of an aircraft maintenance manual for Machine Translation. The aim has been to investigate the practicability of using Deterministic Parsers in the analysis stage of a Machine Translation system. Machine Translation, Sublanguage and parsing are described in general terms with a review of Deterministic parsing systems, pertinent to this research, being presented in detail. The interaction between machine Translation, Sublanguage and Parsing, including Deterministic parsing, is also highlighted. Two types of Deterministic Parser have been investigated, a Marcus-type parser, based on the basic design of the original Deterministic parser (Marcus, 1980) and an LR-type Deterministic Parser for natural language, based on the LR parsing algorithm. In total, four Deterministic Parsers have been built and are described in the thesis. Two of the Deterministic Parsers are prototypes from which the remaining two parsers to be used on sublanguage have been developed. This thesis reports the results of parsing by the prototypes, a Marcus-type parser and an LR-type parser which have a similar grammatical and linguistic range to the original Marcus parser. The Marcus-type parser uses a grammar of production rules, whereas the LR-type parser employs a Definite Clause Grammar(DGC).

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Investigation of the different approaches used by Expert Systems researchers to solve problems in the domain of Mechanical Design and Expert Systems was carried out. The techniques used for conventional formal logic programming were compared with those used when applying Expert Systems concepts. A literature survey of design processes was also conducted with a view to adopting a suitable model of the design process. A model, comprising a variation on two established ones, was developed and applied to a problem within what are described as class 3 design tasks. The research explored the application of these concepts to Mechanical Engineering Design problems and their implementation on a microcomputer using an Expert System building tool. It was necessary to explore the use of Expert Systems in this manner so as to bridge the gap between their use as a control structure and for detailed analytical design. The former application is well researched into and this thesis discusses the latter. Some Expert System building tools available to the author at the beginning of his work were evaluated specifically for their suitability for Mechanical Engineering design problems. Microsynics was found to be the most suitable on which to implement a design problem because of its simple but powerful Semantic Net Knowledge Representation structure and the ability to use other types of representation schemes. Two major implementations were carried out. The first involved a design program for a Helical compression spring and the second a gearpair system design. Two concepts were proposed in the thesis for the modelling and implementation of design systems involving many equations. The method proposed enables equation manipulation and analysis using a combination of frames, semantic nets and production rules. The use of semantic nets for purposes other than for psychology and natural language interpretation, is quite new and represents one of the major contributions to knowledge by the author. The development of a purpose built shell program for this type of design problems was recommended as an extension of the research. Microsynics may usefully be used as a platform for this development.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper proposes a novel framework of incorporating protein-protein interactions (PPI) ontology knowledge into PPI extraction from biomedical literature in order to address the emerging challenges of deep natural language understanding. It is built upon the existing work on relation extraction using the Hidden Vector State (HVS) model. The HVS model belongs to the category of statistical learning methods. It can be trained directly from un-annotated data in a constrained way whilst at the same time being able to capture the underlying named entity relationships. However, it is difficult to incorporate background knowledge or non-local information into the HVS model. This paper proposes to represent the HVS model as a conditionally trained undirected graphical model in which non-local features derived from PPI ontology through inference would be easily incorporated. The seamless fusion of ontology inference with statistical learning produces a new paradigm to information extraction.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Text classification is essential for narrowing down the number of documents relevant to a particular topic for further pursual, especially when searching through large biomedical databases. Protein-protein interactions are an example of such a topic with databases being devoted specifically to them. This paper proposed a semi-supervised learning algorithm via local learning with class priors (LL-CP) for biomedical text classification where unlabeled data points are classified in a vector space based on their proximity to labeled nodes. The algorithm has been evaluated on a corpus of biomedical documents to identify abstracts containing information about protein-protein interactions with promising results. Experimental results show that LL-CP outperforms the traditional semisupervised learning algorithms such as SVMand it also performs better than local learning without incorporating class priors.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper summarizes the scientific work presented at the 32nd European Conference on Information Retrieval. It demonstrates that information retrieval (IR) as a research area continues to thrive with progress being made in three complementary sub-fields, namely IR theory and formal methods together with indexing and query representation issues, furthermore Web IR as a primary application area and finally research into evaluation methods and metrics. It is the combination of these areas that gives IR its solid scientific foundations. The paper also illustrates that significant progress has been made in other areas of IR. The keynote speakers addressed three such subject fields, social search engines using personalization and recommendation technologies, the renewed interest in applying natural language processing to IR, and multimedia IR as another fast-growing area.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents a comparative study of three closely related Bayesian models for unsupervised document level sentiment classification, namely, the latent sentiment model (LSM), the joint sentiment-topic (JST) model, and the Reverse-JST model. Extensive experiments have been conducted on two corpora, the movie review dataset and the multi-domain sentiment dataset. It has been found that while all the three models achieve either better or comparable performance on these two corpora when compared to the existing unsupervised sentiment classification approaches, both JST and Reverse-JST are able to extract sentiment-oriented topics. In addition, Reverse-JST always performs worse than JST suggesting that the JST model is more appropriate for joint sentiment topic detection.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Natural language understanding (NLU) aims to map sentences to their semantic mean representations. Statistical approaches to NLU normally require fully-annotated training data where each sentence is paired with its word-level semantic annotations. In this paper, we propose a novel learning framework which trains the Hidden Markov Support Vector Machines (HM-SVMs) without the use of expensive fully-annotated data. In particular, our learning approach takes as input a training set of sentences labeled with abstract semantic annotations encoding underlying embedded structural relations and automatically induces derivation rules that map sentences to their semantic meaning representations. The proposed approach has been tested on the DARPA Communicator Data and achieved 93.18% in F-measure, which outperforms the previously proposed approaches of training the hidden vector state model or conditional random fields from unaligned data, with a relative error reduction rate of 43.3% and 10.6% being achieved.