877 resultados para Natural Language
Resumo:
The use of ontologies as representations of knowledge is widespread but their construction, until recently, has been entirely manual. We argue in this paper for the use of text corpora and automated natural language processing methods for the construction of ontologies. We delineate the challenges and present criteria for the selection of appropriate methods. We distinguish three ma jor steps in ontology building: associating terms, constructing hierarchies and labelling relations. A number of methods are presented for these purposes but we conclude that the issue of data-sparsity still is a ma jor challenge. We argue for the use of resources external tot he domain specific corpus.
Resumo:
Yorick Wilks is a central figure in the fields of Natural Language Processing and Artificial Intelligence. His influence extends to many areas and includes contributions to Machines Translation, word sense disambiguation, dialogue modeling and Information Extraction. This book celebrates the work of Yorick Wilks in the form of a selection of his papers which are intended to reflect the range and depth of his work. The volume accompanies a Festschrift which celebrates his contribution to the fields of Computational Linguistics and Artificial Intelligence. The papers include early work carried out at Cambridge University, descriptions of groundbreaking work on Machine Translation and Preference Semantics as well as more recent works on belief modeling and computational semantics. The selected papers reflect Yorick’s contribution to both practical and theoretical aspects of automatic language processing.
Resumo:
Corpora—large collections of written and/or spoken text stored and accessed electronically—provide the means of investigating language that is of growing importance academically and professionally. Corpora are now routinely used in the following fields: •the production of dictionaries and other reference materials; •the development of aids to translation; •language teaching materials; •the investigation of ideologies and cultural assumptions; •natural language processing; and •the investigation of all aspects of linguistic behaviour, including vocabulary, grammar and pragmatics.
Resumo:
For more than forty years, research has been on going in the use of the computer in the processing of natural language. During this period methods have evolved, with various parsing techniques and grammars coming to prominence. Problems still exist, not least in the field of Machine Translation. However, one of the successes in this field is the translation of sublanguage. The present work reports Deterministic Parsing, a relatively new parsing technique, and its application to the sublanguage of an aircraft maintenance manual for Machine Translation. The aim has been to investigate the practicability of using Deterministic Parsers in the analysis stage of a Machine Translation system. Machine Translation, Sublanguage and parsing are described in general terms with a review of Deterministic parsing systems, pertinent to this research, being presented in detail. The interaction between machine Translation, Sublanguage and Parsing, including Deterministic parsing, is also highlighted. Two types of Deterministic Parser have been investigated, a Marcus-type parser, based on the basic design of the original Deterministic parser (Marcus, 1980) and an LR-type Deterministic Parser for natural language, based on the LR parsing algorithm. In total, four Deterministic Parsers have been built and are described in the thesis. Two of the Deterministic Parsers are prototypes from which the remaining two parsers to be used on sublanguage have been developed. This thesis reports the results of parsing by the prototypes, a Marcus-type parser and an LR-type parser which have a similar grammatical and linguistic range to the original Marcus parser. The Marcus-type parser uses a grammar of production rules, whereas the LR-type parser employs a Definite Clause Grammar(DGC).
Resumo:
Investigation of the different approaches used by Expert Systems researchers to solve problems in the domain of Mechanical Design and Expert Systems was carried out. The techniques used for conventional formal logic programming were compared with those used when applying Expert Systems concepts. A literature survey of design processes was also conducted with a view to adopting a suitable model of the design process. A model, comprising a variation on two established ones, was developed and applied to a problem within what are described as class 3 design tasks. The research explored the application of these concepts to Mechanical Engineering Design problems and their implementation on a microcomputer using an Expert System building tool. It was necessary to explore the use of Expert Systems in this manner so as to bridge the gap between their use as a control structure and for detailed analytical design. The former application is well researched into and this thesis discusses the latter. Some Expert System building tools available to the author at the beginning of his work were evaluated specifically for their suitability for Mechanical Engineering design problems. Microsynics was found to be the most suitable on which to implement a design problem because of its simple but powerful Semantic Net Knowledge Representation structure and the ability to use other types of representation schemes. Two major implementations were carried out. The first involved a design program for a Helical compression spring and the second a gearpair system design. Two concepts were proposed in the thesis for the modelling and implementation of design systems involving many equations. The method proposed enables equation manipulation and analysis using a combination of frames, semantic nets and production rules. The use of semantic nets for purposes other than for psychology and natural language interpretation, is quite new and represents one of the major contributions to knowledge by the author. The development of a purpose built shell program for this type of design problems was recommended as an extension of the research. Microsynics may usefully be used as a platform for this development.
Resumo:
This paper proposes a novel framework of incorporating protein-protein interactions (PPI) ontology knowledge into PPI extraction from biomedical literature in order to address the emerging challenges of deep natural language understanding. It is built upon the existing work on relation extraction using the Hidden Vector State (HVS) model. The HVS model belongs to the category of statistical learning methods. It can be trained directly from un-annotated data in a constrained way whilst at the same time being able to capture the underlying named entity relationships. However, it is difficult to incorporate background knowledge or non-local information into the HVS model. This paper proposes to represent the HVS model as a conditionally trained undirected graphical model in which non-local features derived from PPI ontology through inference would be easily incorporated. The seamless fusion of ontology inference with statistical learning produces a new paradigm to information extraction.
Resumo:
Text classification is essential for narrowing down the number of documents relevant to a particular topic for further pursual, especially when searching through large biomedical databases. Protein-protein interactions are an example of such a topic with databases being devoted specifically to them. This paper proposed a semi-supervised learning algorithm via local learning with class priors (LL-CP) for biomedical text classification where unlabeled data points are classified in a vector space based on their proximity to labeled nodes. The algorithm has been evaluated on a corpus of biomedical documents to identify abstracts containing information about protein-protein interactions with promising results. Experimental results show that LL-CP outperforms the traditional semisupervised learning algorithms such as SVMand it also performs better than local learning without incorporating class priors.
Resumo:
This paper summarizes the scientific work presented at the 32nd European Conference on Information Retrieval. It demonstrates that information retrieval (IR) as a research area continues to thrive with progress being made in three complementary sub-fields, namely IR theory and formal methods together with indexing and query representation issues, furthermore Web IR as a primary application area and finally research into evaluation methods and metrics. It is the combination of these areas that gives IR its solid scientific foundations. The paper also illustrates that significant progress has been made in other areas of IR. The keynote speakers addressed three such subject fields, social search engines using personalization and recommendation technologies, the renewed interest in applying natural language processing to IR, and multimedia IR as another fast-growing area.
Resumo:
This paper presents a comparative study of three closely related Bayesian models for unsupervised document level sentiment classification, namely, the latent sentiment model (LSM), the joint sentiment-topic (JST) model, and the Reverse-JST model. Extensive experiments have been conducted on two corpora, the movie review dataset and the multi-domain sentiment dataset. It has been found that while all the three models achieve either better or comparable performance on these two corpora when compared to the existing unsupervised sentiment classification approaches, both JST and Reverse-JST are able to extract sentiment-oriented topics. In addition, Reverse-JST always performs worse than JST suggesting that the JST model is more appropriate for joint sentiment topic detection.
Resumo:
Natural language understanding (NLU) aims to map sentences to their semantic mean representations. Statistical approaches to NLU normally require fully-annotated training data where each sentence is paired with its word-level semantic annotations. In this paper, we propose a novel learning framework which trains the Hidden Markov Support Vector Machines (HM-SVMs) without the use of expensive fully-annotated data. In particular, our learning approach takes as input a training set of sentences labeled with abstract semantic annotations encoding underlying embedded structural relations and automatically induces derivation rules that map sentences to their semantic meaning representations. The proposed approach has been tested on the DARPA Communicator Data and achieved 93.18% in F-measure, which outperforms the previously proposed approaches of training the hidden vector state model or conditional random fields from unaligned data, with a relative error reduction rate of 43.3% and 10.6% being achieved.
Resumo:
All aspects of the concept of collocation – the phenomenon whereby words naturally tend to occur in the company of a restricted set of other words – are covered in this book. It deals in detail with the history of the word collocation, the concepts associated with it and its use in a linguistic context. The authors show the practical means by which the collocational behaviour of words can be explored using illustrative computer programs and examine applications in teaching, lexicography and natural language processing that use collocation in formation. The book investigates the place that collocation occupies in theories of language and provides a thoroughly comprehensive and up-to-date survey of the current position of collocation in language studies and applied linguistics. This text presents a comprehensive description of collocation, covering both the theoretical and practical background and the implications and applications of the concept as language model and analytical tool. It provides a definitive survey of currently available techniques and a detailed description of their implementation.
Resumo:
In this demonstration, we will present a semantic environment called the K-Box. The K-Box supports the lightweight integration of knowledge tools, with a focus on semantic tools, but with the flexibility to integrate natural language and conventional tools. We discuss the implementation of the framework, and two existing applications, including details of a new application for developers of semantic workflows. The demonstration will be of interest to developers and researchers of ontology-based knowledge management systems, and semantic desktops, and to analysts working with cross-media information. © 2011 ACM.
Resumo:
Linked Data semantic sources, in particular DBpedia, can be used to answer many user queries. PowerAqua is an open multi-ontology Question Answering (QA) system for the Semantic Web (SW). However, the emergence of Linked Data, characterized by its openness, heterogeneity and scale, introduces a new dimension to the Semantic Web scenario, in which exploiting the relevant information to extract answers for Natural Language (NL) user queries is a major challenge. In this paper we discuss the issues and lessons learned from our experience of integrating PowerAqua as a front-end for DBpedia and a subset of Linked Data sources. As such, we go one step beyond the state of the art on end-users interfaces for Linked Data by introducing mapping and fusion techniques needed to translate a user query by means of multiple sources. Our first informal experiments probe whether, in fact, it is feasible to obtain answers to user queries by composing information across semantic sources and Linked Data, even in its current form, where the strength of Linked Data is more a by-product of its size than its quality. We believe our experiences can be extrapolated to a variety of end-user applications that wish to scale, open up, exploit and re-use what possibly is the greatest wealth of data about everything in the history of Artificial Intelligence. © 2010 Springer-Verlag.
Resumo:
PowerAqua is a Question Answering system, which takes as input a natural language query and is able to return answers drawn from relevant semantic resources found anywhere on the Semantic Web. In this paper we provide two novel contributions: First, we detail a new component of the system, the Triple Similarity Service, which is able to match queries effectively to triples found in different ontologies on the Semantic Web. Second, we provide a first evaluation of the system, which in addition to providing data about PowerAqua's competence, also gives us important insights into the issues related to using the Semantic Web as the target answer set in Question Answering. In particular, we show that, despite the problems related to the noisy and incomplete conceptualizations, which can be found on the Semantic Web, good results can already be obtained.