55 resultados para natural language processing

em Aston University Research Archive


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main argument of this paper is that Natural Language Processing (NLP) does, and will continue to, underlie the Semantic Web (SW), including its initial construction from unstructured sources like the World Wide Web (WWW), whether its advocates realise this or not. Chiefly, we argue, such NLP activity is the only way up to a defensible notion of meaning at conceptual levels (in the original SW diagram) based on lower level empirical computations over usage. Our aim is definitely not to claim logic-bad, NLP-good in any simple-minded way, but to argue that the SW will be a fascinating interaction of these two methodologies, again like the WWW (which has been basically a field for statistical NLP research) but with deeper content. Only NLP technologies (and chiefly information extraction) will be able to provide the requisite RDF knowledge stores for the SW from existing unstructured text databases in the WWW, and in the vast quantities needed. There is no alternative at this point, since a wholly or mostly hand-crafted SW is also unthinkable, as is a SW built from scratch and without reference to the WWW. We also assume that, whatever the limitations on current SW representational power we have drawn attention to here, the SW will continue to grow in a distributed manner so as to serve the needs of scientists, even if it is not perfect. The WWW has already shown how an imperfect artefact can become indispensable.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Procedural knowledge is the knowledge required to perform certain tasks, and forms an important part of expertise. A major source of procedural knowledge is natural language instructions. While these readable instructions have been useful learning resources for human, they are not interpretable by machines. Automatically acquiring procedural knowledge in machine interpretable formats from instructions has become an increasingly popular research topic due to their potential applications in process automation. However, it has been insufficiently addressed. This paper presents an approach and an implemented system to assist users to automatically acquire procedural knowledge in structured forms from instructions. We introduce a generic semantic representation of procedures for analysing instructions, using which natural language techniques are applied to automatically extract structured procedures from instructions. The method is evaluated in three domains to justify the generality of the proposed semantic representation as well as the effectiveness of the implemented automatic system.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper aims to identify the communication goal(s) of a user's information-seeking query out of a finite set of within-domain goals in natural language queries. It proposes using Tree-Augmented Naive Bayes networks (TANs) for goal detection. The problem is formulated as N binary decisions, and each is performed by a TAN. Comparative study has been carried out to compare the performance with Naive Bayes, fully-connected TANs, and multi-layer neural networks. Experimental results show that TANs consistently give better results when tested on the ATIS and DARPA Communicator corpora.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An implementation of a Lexical Functional Grammar (LFG) natural language front-end to a database is presented, and its capabilities demonstrated by reference to a set of queries used in the Chat-80 system. The potential of LFG for such applications is explored. Other grammars previously used for this purpose are briefly reviewed and contrasted with LFG. The basic LFG formalism is fully described, both as to its syntax and semantics, and the deficiencies of the latter for database access application shown. Other current LFG implementations are reviewed and contrasted with the LFG implementation developed here specifically for database access. The implementation described here allows a natural language interface to a specific Prolog database to be produced from a set of grammar rule and lexical specifications in an LFG-like notation. In addition to this the interface system uses a simple database description to compile metadata about the database for later use in planning the execution of queries. Extensions to LFG's semantic component are shown to be necessary to produce a satisfactory functional analysis and semantic output for querying a database. A diverse set of natural language constructs are analysed using LFG and the derivation of Prolog queries from the F-structure output of LFG is illustrated. The functional description produced from LFG is proposed as sufficient for resolving many problems of quantification and attachment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Humans are especially good at taking another's perspective-representing what others might be thinking or experiencing. This "mentalizing" capacity is apparent in everyday human interactions and conversations. We investigated its neural basis using magnetoencephalography. We focused on whether mentalizing was engaged spontaneously and routinely to understand an utterance's meaning or largely on-demand, to restore "common ground" when expectations were violated. Participants conversed with 1 of 2 confederate speakers and established tacit agreements about objects' names. In a subsequent "test" phase, some of these agreements were violated by either the same or a different speaker. Our analysis of the neural processing of test phase utterances revealed recruitment of neural circuits associated with language (temporal cortex), episodic memory (e.g., medial temporal lobe), and mentalizing (temporo-parietal junction and ventromedial prefrontal cortex). Theta oscillations (3-7 Hz) were modulated most prominently, and we observed phase coupling between functionally distinct neural circuits. The episodic memory and language circuits were recruited in anticipation of upcoming referring expressions, suggesting that context-sensitive predictions were spontaneously generated. In contrast, the mentalizing areas were recruited on-demand, as a means for detecting and resolving perceived pragmatic anomalies, with little evidence they were activated to make partner-specific predictions about upcoming linguistic utterances.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Natural language understanding is to specify a computational model that maps sentences to their semantic mean representation. In this paper, we propose a novel framework to train the statistical models without using expensive fully annotated data. In particular, the input of our framework is a set of sentences labeled with abstract semantic annotations. These annotations encode the underlying embedded semantic structural relations without explicit word/semantic tag alignment. The proposed framework can automatically induce derivation rules that map sentences to their semantic meaning representations. The learning framework is applied on two statistical models, the conditional random fields (CRFs) and the hidden Markov support vector machines (HM-SVMs). Our experimental results on the DARPA communicator data show that both CRFs and HM-SVMs outperform the baseline approach, previously proposed hidden vector state (HVS) model which is also trained on abstract semantic annotations. In addition, the proposed framework shows superior performance than two other baseline approaches, a hybrid framework combining HVS and HM-SVMs and discriminative training of HVS, with a relative error reduction rate of about 25% and 15% being achieved in F-measure.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This investigation is grounded within the concept of embodied cognition where the mind is considered to be part of a biological system. A first year undergraduate Mechanical Engineering cohort of students was tasked with explaining the behaviour of three balls of different masses being rolled down a ramp. The explanations given by the students highlighted the cognitive conflict between the everyday interpretation of the word energy and its mathematical use. The results showed that even after many years of schooling, students found it challenging to interpret the mathematics they had learned and relied upon pseudo-scientific notions to account for the behaviour of the balls.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Automatic ontology building is a vital issue in many fields where they are currently built manually. This paper presents a user-centred methodology for ontology construction based on the use of Machine Learning and Natural Language Processing. In our approach, the user selects a corpus of texts and sketches a preliminary ontology (or selects an existing one) for a domain with a preliminary vocabulary associated to the elements in the ontology (lexicalisations). Examples of sentences involving such lexicalisation (e.g. ISA relation) in the corpus are automatically retrieved by the system. Retrieved examples are validated by the user and used by an adaptive Information Extraction system to generate patterns that discover other lexicalisations of the same objects in the ontology, possibly identifying new concepts or relations. New instances are added to the existing ontology or used to tune it. This process is repeated until a satisfactory ontology is obtained. The methodology largely automates the ontology construction process and the output is an ontology with an associated trained leaner to be used for further ontology modifications.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With this paper, we propose a set of techniques to largely automate the process of KA, by using technologies based on Information Extraction (IE) , Information Retrieval and Natural Language Processing. We aim to reduce all the impeding factors mention above and thereby contribute to the wider utility of the knowledge management tools. In particular we intend to reduce the introspection of knowledge engineers or the extended elicitations of knowledge from experts by extensive textual analysis using a variety of methods and tools, as texts are largely available and in them - we believe - lies most of an organization's memory.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The fundamental failure of current approaches to ontology learning is to view it as single pipeline with one or more specific inputs and a single static output. In this paper, we present a novel approach to ontology learning which takes an iterative view of knowledge acquisition for ontologies. Our approach is founded on three open-ended resources: a set of texts, a set of learning patterns and a set of ontological triples, and the system seeks to maintain these in equilibrium. As events occur which disturb this equilibrium, actions are triggered to re-establish a balance between the resources. We present a gold standard based evaluation of the final output of the system, the intermediate output showing the iterative process and a comparison of performance using different seed input. The results are comparable to existing performance in the literature.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of ontologies as representations of knowledge is widespread but their construction, until recently, has been entirely manual. We argue in this paper for the use of text corpora and automated natural language processing methods for the construction of ontologies. We delineate the challenges and present criteria for the selection of appropriate methods. We distinguish three ma jor steps in ontology building: associating terms, constructing hierarchies and labelling relations. A number of methods are presented for these purposes but we conclude that the issue of data-sparsity still is a ma jor challenge. We argue for the use of resources external tot he domain specific corpus.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Yorick Wilks is a central figure in the fields of Natural Language Processing and Artificial Intelligence. His influence extends to many areas and includes contributions to Machines Translation, word sense disambiguation, dialogue modeling and Information Extraction. This book celebrates the work of Yorick Wilks in the form of a selection of his papers which are intended to reflect the range and depth of his work. The volume accompanies a Festschrift which celebrates his contribution to the fields of Computational Linguistics and Artificial Intelligence. The papers include early work carried out at Cambridge University, descriptions of groundbreaking work on Machine Translation and Preference Semantics as well as more recent works on belief modeling and computational semantics. The selected papers reflect Yorick’s contribution to both practical and theoretical aspects of automatic language processing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Corpora—large collections of written and/or spoken text stored and accessed electronically—provide the means of investigating language that is of growing importance academically and professionally. Corpora are now routinely used in the following fields: •the production of dictionaries and other reference materials; •the development of aids to translation; •language teaching materials; •the investigation of ideologies and cultural assumptions; •natural language processing; and •the investigation of all aspects of linguistic behaviour, including vocabulary, grammar and pragmatics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Text classification is essential for narrowing down the number of documents relevant to a particular topic for further pursual, especially when searching through large biomedical databases. Protein-protein interactions are an example of such a topic with databases being devoted specifically to them. This paper proposed a semi-supervised learning algorithm via local learning with class priors (LL-CP) for biomedical text classification where unlabeled data points are classified in a vector space based on their proximity to labeled nodes. The algorithm has been evaluated on a corpus of biomedical documents to identify abstracts containing information about protein-protein interactions with promising results. Experimental results show that LL-CP outperforms the traditional semisupervised learning algorithms such as SVMand it also performs better than local learning without incorporating class priors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper summarizes the scientific work presented at the 32nd European Conference on Information Retrieval. It demonstrates that information retrieval (IR) as a research area continues to thrive with progress being made in three complementary sub-fields, namely IR theory and formal methods together with indexing and query representation issues, furthermore Web IR as a primary application area and finally research into evaluation methods and metrics. It is the combination of these areas that gives IR its solid scientific foundations. The paper also illustrates that significant progress has been made in other areas of IR. The keynote speakers addressed three such subject fields, social search engines using personalization and recommendation technologies, the renewed interest in applying natural language processing to IR, and multimedia IR as another fast-growing area.