967 resultados para Natural language techniques, Semantic spaces, Random projection, Documents


Relevância:

100.00% 100.00%

Publicador:

Resumo:

While semantic search technologies have been proven to work well in specific domains, they still have to confront two main challenges to scale up to the Web in its entirety. In this work we address this issue with a novel semantic search system that a) provides the user with the capability to query Semantic Web information using natural language, by means of an ontology-based Question Answering (QA) system [14] and b) complements the specific answers retrieved during the QA process with a ranked list of documents from the Web [3]. Our results show that ontology-based semantic search capabilities can be used to complement and enhance keyword search technologies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The given work is devoted to development of the computer-aided system of semantic text analysis of a technical specification. The purpose of this work is to increase efficiency of software engineering based on automation of semantic text analysis of a technical specification. In work it is offered and investigated the model of the analysis of the text of the technical project is submitted, the attribute grammar of a technical specification, intended for formalization of limited Russian is constructed with the purpose of analysis of offers of text of a technical specification, style features of the technical project as class of documents are considered, recommendations on preparation of text of a technical specification for the automated processing are formulated. The computer-aided system of semantic text analysis of a technical specification is considered. This system consists of the following subsystems: preliminary text processing, the syntactic and semantic analysis and construction of software models, storage of documents and interface.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The given work is devoted to development of the computer-aided system of semantic text analysis of a technical specification. The purpose of this work is to increase efficiency of software engineering based on automation of semantic text analysis of a technical specification. In work it is offered and investigated a technique of the text analysis of a technical specification is submitted, the expanded fuzzy attribute grammar of a technical specification, intended for formalization of limited Russian language is constructed with the purpose of analysis of offers of text of a technical specification, style features of the technical specification as class of documents are considered, recommendations on preparation of text of a technical specification for the automated processing are formulated. The computer-aided system of semantic text analysis of a technical specification is considered. This system consist of the following subsystems: preliminary text processing, the syntactic and semantic analysis and construction of software models, storage of documents and interface.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Question Answering systems that resort to the Semantic Web as a knowledge base can go well beyond the usual matching words in documents and, preferably, find a precise answer, without requiring user help to interpret the documents returned. In this paper, the authors introduce a Dialogue Manager that, through the analysis of the question and the type of expected answer, provides accurate answers to the questions posed in Natural Language. The Dialogue Manager not only represents the semantics of the questions, but also represents the structure of the discourse, including the user intentions and the questions context, adding the ability to deal with multiple answers and providing justified answers. The authors’ system performance is evaluated by comparing with similar question answering systems. Although the test suite is slight dimension, the results obtained are very promising.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Electrotécnica e de Computadores

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Desarrollo de un sistema capaz de procesar consultas en lenguaje natural introducidas por el usuario mediante el teclado. El sistema es capaz de responder a consultas en castellano, relacionadas con un dominio de aplicación representado mediante una base de datos relacional.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the past, research in ontology learning from text has mainly focused on entity recognition, taxonomy induction and relation extraction. In this work we approach a challenging research issue: detecting semantic frames from texts and using them to encode web ontologies. We exploit a new generation Natural Language Processing technology for frame detection, and we enrich the frames acquired so far with argument restrictions provided by a super-sense tagger and domain specializations. The results are encoded according to a Linguistic MetaModel, which allows a complete translation of lexical resources and data acquired from text, enabling custom transformations of the enriched frames into modular ontology components.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thèse numérisée par la Division de la gestion de documents et des archives de l'Université de Montréal

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The goal of the work reported here is to capture the commonsense knowledge of non-expert human contributors. Achieving this goal will enable more intelligent human-computer interfaces and pave the way for computers to reason about our world. In the domain of natural language processing, it will provide the world knowledge much needed for semantic processing of natural language. To acquire knowledge from contributors not trained in knowledge engineering, I take the following four steps: (i) develop a knowledge representation (KR) model for simple assertions in natural language, (ii) introduce cumulative analogy, a class of nearest-neighbor based analogical reasoning algorithms over this representation, (iii) argue that cumulative analogy is well suited for knowledge acquisition (KA) based on a theoretical analysis of effectiveness of KA with this approach, and (iv) test the KR model and the effectiveness of the cumulative analogy algorithms empirically. To investigate effectiveness of cumulative analogy for KA empirically, Learner, an open source system for KA by cumulative analogy has been implemented, deployed, and evaluated. (The site "1001 Questions," is available at http://teach-computers.org/learner.html). Learner acquires assertion-level knowledge by constructing shallow semantic analogies between a KA topic and its nearest neighbors and posing these analogies as natural language questions to human contributors. Suppose, for example, that based on the knowledge about "newspapers" already present in the knowledge base, Learner judges "newspaper" to be similar to "book" and "magazine." Further suppose that assertions "books contain information" and "magazines contain information" are also already in the knowledge base. Then Learner will use cumulative analogy from the similar topics to ask humans whether "newspapers contain information." Because similarity between topics is computed based on what is already known about them, Learner exhibits bootstrapping behavior --- the quality of its questions improves as it gathers more knowledge. By summing evidence for and against posing any given question, Learner also exhibits noise tolerance, limiting the effect of incorrect similarities. The KA power of shallow semantic analogy from nearest neighbors is one of the main findings of this thesis. I perform an analysis of commonsense knowledge collected by another research effort that did not rely on analogical reasoning and demonstrate that indeed there is sufficient amount of correlation in the knowledge base to motivate using cumulative analogy from nearest neighbors as a KA method. Empirically, evaluating the percentages of questions answered affirmatively, negatively and judged to be nonsensical in the cumulative analogy case compares favorably with the baseline, no-similarity case that relies on random objects rather than nearest neighbors. Of the questions generated by cumulative analogy, contributors answered 45% affirmatively, 28% negatively and marked 13% as nonsensical; in the control, no-similarity case 8% of questions were answered affirmatively, 60% negatively and 26% were marked as nonsensical.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper is about the use of natural language to communicate with computers. Most researches that have pursued this goal consider only requests expressed in English. A way to facilitate the use of several languages in natural language systems is by using an interlingua. An interlingua is an intermediary representation for natural language information that can be processed by machines. We propose to convert natural language requests into an interlingua [universal networking language (UNL)] and to execute these requests using software components. In order to achieve this goal, we propose OntoMap, an ontology-based architecture to perform the semantic mapping between UNL sentences and software components. OntoMap also performs component search and retrieval based on semantic information formalized in ontologies and rules.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capturing shallow linguistic information. Complex background knowledge, such as semantic relationships, are typically either not used, or used in specialised manner, due to the limitations of the feature-based modelling techniques used. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems could be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system as a method to construct a set of features using semantic, syntactic and lexical information. This feature-set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. In our investigation we examine one-shot and incremental approaches to feature-set construction applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks; while the bilingual WSD task consists of 7 highly ambiguous verbs in translating from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that simply use shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provide a way for making substantial progress in the field of WSD.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the increasing production of information from e-government initiatives, there is also the need to transform a large volume of unstructured data into useful information for society. All this information should be easily accessible and made available in a meaningful and effective way in order to achieve semantic interoperability in electronic government services, which is a challenge to be pursued by governments round the world. Our aim is to discuss the context of e-Government Big Data and to present a framework to promote semantic interoperability through automatic generation of ontologies from unstructured information found in the Internet. We propose the use of fuzzy mechanisms to deal with natural language terms and present some related works found in this area. The results achieved in this study are based on the architectural definition and major components and requirements in order to compose the proposed framework. With this, it is possible to take advantage of the large volume of information generated from e-Government initiatives and use it to benefit society.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., written in an unknown alphabet) is compatible with a natural language and to which language it could belong. The approach is based on three types of statistical measurements, i.e. obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts where text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript which could be helpful in the effort of deciphering it. Because we were able to identify statistical measurements that are more dependent on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications.