960 resultados para Word Sense Disambguaion, WSD, Natural Language Processing
Resumo:
The Semantic Web (SW) offers an opportunity to develop novel, sophisticated forms of question answering (QA). Specifically, the availability of distributed semantic markup on a large scale opens the way to QA systems which can make use of such semantic information to provide precise, formally derived answers to questions. At the same time the distributed, heterogeneous, large-scale nature of the semantic information introduces significant challenges. In this paper we describe the design of a QA system, PowerAqua, designed to exploit semantic markup on the web to provide answers to questions posed in natural language. PowerAqua does not assume that the user has any prior information about the semantic resources. The system takes as input a natural language query, translates it into a set of logical queries, which are then answered by consulting and aggregating information derived from multiple heterogeneous semantic sources.
Resumo:
The management and sharing of complex data, information and knowledge is a fundamental and growing concern in the Water and other Industries for a variety of reasons. For example, risks and uncertainties associated with climate, and other changes require knowledge to prepare for a range of future scenarios and potential extreme events. Formal ways in which knowledge can be established and managed can help deliver efficiencies on acquisition, structuring and filtering to provide only the essential aspects of the knowledge really needed. Ontologies are a key technology for this knowledge management. The construction of ontologies is a considerable overhead on any knowledge management programme. Hence current computer science research is investigating generating ontologies automatically from documents using text mining and natural language techniques. As an example of this, results from application of the Text2Onto tool to stakeholder documents for a project on sustainable water cycle management in new developments are presented. It is concluded that by adopting ontological representations sooner, rather than later in an analytical process, decision makers will be able to make better use of highly knowledgeable systems containing automated services to ensure that sustainability considerations are included.
Resumo:
The project “Reference in Discourse” deals with the selection of a specific object from a visual scene in a natural language situation. The goal of this research is to explain this everyday discourse reference task in terms of a concept generation process based on subconceptual visual and verbal information. The system OINC (Object Identification in Natural Communicators) aims at solving this problem in a psychologically adequate way. The system’s difficulties occurring with incomplete and deviant descriptions correspond to the data from experiments with human subjects. The results of these experiments are reported.
Resumo:
The management and sharing of complex data, information and knowledge is a fundamental and growing concern in the Water and other Industries for a variety of reasons. For example, risks and uncertainties associated with climate, and other changes require knowledge to prepare for a range of future scenarios and potential extreme events. Formal ways in which knowledge can be established and managed can help deliver efficiencies on acquisition, structuring and filtering to provide only the essential aspects of the knowledge really needed. Ontologies are a key technology for this knowledge management. The construction of ontologies is a considerable overhead on any knowledge management programme. Hence current computer science research is investigating generating ontologies automatically from documents using text mining and natural language techniques. As an example of this, results from application of the Text2Onto tool to stakeholder documents for a project on sustainable water cycle management in new developments are presented. It is concluded that by adopting ontological representations sooner, rather than later in an analytical process, decision makers will be able to make better use of highly knowledgeable systems containing automated services to ensure that sustainability considerations are included. © 2010 The authors.
Resumo:
The paper presents an approach to extraction of facts from texts of documents. This approach is based on using knowledge about the subject domain, specialized dictionary and the schemes of facts that describe fact structures taking into consideration both semantic and syntactic compatibility of elements of facts. Actually extracted facts combine into one structure the dictionary lexical objects found in the text and match them against concepts of subject domain ontology.
Resumo:
* This work was financially supported by RFBF-04-01-00858.
Resumo:
The biggest threat to any business is a lack of timely and accurate information. Without all the facts, businesses are pressured to make critical decisions and assess risks and opportunities based largely on guesswork, sometimes resulting in financial losses and missed opportunities. The meteoric rise of Databases (DB) appears to confirm the adage that “information is power”, but the stark reality is that information is useless if one has no way to find what one needs to know. It is more accurate perhaps to state that, “the ability to find information is power”. In this paper we show how Instantaneous Database Access System (IDAS) can make a crucial difference by pulling data together and allowing users to summarise information quickly from all areas of a business organisation.
Resumo:
An automated cognitive approach for the design of Information Systems is presented. It is supposed to be used at the very beginning of the design process, between the stages of requirements determination and analysis, including the stage of analysis. In the context of the approach used either UML or ERD notations may be used for model representation. The approach provides the opportunity of using natural language text documents as a source of knowledge for automated problem domain model generation. It also simplifies the process of modelling by assisting the human user during the whole period of working upon the model (using UML or ERD notations).
Resumo:
Projects solutions reuse methodology is offered for software development. The main idea consists in connection of the system objective with the situation using the entities which describe the condition of the system in the process of the objective statement. Every situation is associated with one or several design solutions, which can be used at the development. Based on this connection the situation representing language has been created, it lets to express a problem situation using a natural language describe. The similarity measure has been built to compare situations, it is based on the similarity coefficients with adding the absent part weight.
Resumo:
The paper deals with methods of choice in the INTERNET of natural-language textual fragments relevant to a given theme. Relevancy is estimated on the basis of semantic analysis of sentences. Recognition of syntactic and semantic connections between words of the text is carried out by the analysis of combinations of inflections and prepositions, without use of categories and rules of traditional grammar. Choice in the INTERNET of the thematic information is organized cyclically with automatic forming of the new key at every cycle when addressing to the INTERNET.
Resumo:
Systems analysis (SA) is widely used in complex and vague problem solving. Initial stages of SA are analysis of problems and purposes to obtain problems/purposes of smaller complexity and vagueness that are combined into hierarchical structures of problems(SP)/purposes(PS). Managers have to be sure the PS and the purpose realizing system (PRS) that can achieve the PS-purposes are adequate to the problem to be solved. However, usually SP/PS are not substantiated well enough, because their development is based on a collective expertise in which logic of natural language and expert estimation methods are used. That is why scientific foundations of SA are not supposed to have been completely formed. The structure-and-purpose approach to SA based on a logic-and-linguistic simulation of problems/purposes analysis is a step towards formalization of the initial stages of SA to improve adequacy of their results, and also towards increasing quality of SA as a whole. Managers of industrial organizing systems using the approach eliminate logical errors in SP/PS at early stages of planning and so they will be able to find better decisions of complex and vague problems.
Resumo:
Relatively little research on dialect variation has been based on corpora of naturally occurring language. Instead, dialect variation has been studied based primarily on language elicited through questionnaires and interviews. Eliciting dialect data has several advantages, including allowing for dialectologists to select individual informants, control the communicative situation in which language is collected, elicit rare forms directly, and make high-quality audio recordings. Although far less common, a corpus-based approach to data collection also has several advantages, including allowing for dialectologists to collect large amounts of data from a large number of informants, observe dialect variation across a range of communicative situations, and analyze quantitative linguistic variation in large samples of natural language. Although both approaches allow for dialect variation to be observed, they provide different perspectives on language variation and change. The corpus- based approach to dialectology has therefore produced a number of new findings, many of which challenge traditional assumptions about the nature of dialect variation. Most important, this research has shown that dialect variation involves a wider range of linguistic variables and exists across a wider range of language varieties than has previously been assumed. The goal of this chapter is to introduce this emerging approach to dialectology. The first part of this chapter reviews the growing body of research that analyzes dialect variation in corpora, including research on variation across nations, regions, genders, ages, and classes, in both speech and writing, and from both a synchronic and diachronic perspective, with a focus on dialect variation in the English language. Although collections of language data elicited through interviews and questionnaires are now commonly referred to as corpora in sociolinguistics and dialectology (e.g. see Bauer 2002; Tagliamonte 2006; Kretzschmar et al. 2006; D'Arcy 2011), this review focuses on corpora of naturally occurring texts and discourse. The second part of this chapter presents the results of an analysis of variation in not contraction across region, gender, and time in a corpus of American English letters to the editor in order to exemplify a corpus-based approach to dialectology.
Resumo:
The first study of its kind, Regional Variation in Written American English takes a corpus-based approach to map over a hundred grammatical alternation variables across the United States. A multivariate spatial analysis of these maps shows that grammatical alternation variables follow a relatively small number of common regional patterns in American English, which can be explained based on both linguistic and extra-linguistic factors. Based on this rigorous analysis of extensive data, Grieve identifies five primary modern American dialect regions, demonstrating that regional variation is far more pervasive and complex in natural language than is generally assumed. The wealth of maps and data and the groundbreaking implications of this volume make it essential reading for students and researchers in linguistics, English language, geography, computer science, sociology and communication studies.
Resumo:
The Everglades Online Thesaurus is a structured vocabulary of concepts and terms relating to the south Florida environment. Designed as an information management tool for both researchers and metadata creators, the Thesaurus is intended to improve information retrieval across the many disparate information systems, databases, and web sites that provide Everglades-related information. The vocabulary provided by the Everglades Online Thesaurus expresses each relevant concept using a single ‘preferred term’, whereas in natural language many terms may exist to express that same concept. In this way, the Thesaurus offers the possibility of standardizing the terminology used to describe Everglades-related information — an important factor in predictable and successful resource discovery.
Resumo:
There is a growing societal need to address the increasing prevalence of behavioral health issues, such as obesity, alcohol or drug use, and general lack of treatment adherence for a variety of health problems. The statistics, worldwide and in the USA, are daunting. Excessive alcohol use is the third leading preventable cause of death in the United States (with 79,000 deaths annually), and is responsible for a wide range of health and social problems. On the positive side though, these behavioral health issues (and associated possible diseases) can often be prevented with relatively simple lifestyle changes, such as losing weight with a diet and/or physical exercise, or learning how to reduce alcohol consumption. Medicine has therefore started to move toward finding ways of preventively promoting wellness, rather than solely treating already established illness. Evidence-based patient-centered Brief Motivational Interviewing (BMI) interven- tions have been found particularly effective in helping people find intrinsic motivation to change problem behaviors after short counseling sessions, and to maintain healthy lifestyles over the long-term. Lack of locally available personnel well-trained in BMI, however, often limits access to successful interventions for people in need. To fill this accessibility gap, Computer-Based Interventions (CBIs) have started to emerge. Success of the CBIs, however, critically relies on insuring engagement and retention of CBI users so that they remain motivated to use these systems and come back to use them over the long term as necessary. Because of their text-only interfaces, current CBIs can therefore only express limited empathy and rapport, which are the most important factors of health interventions. Fortunately, in the last decade, computer science research has progressed in the design of simulated human characters with anthropomorphic communicative abilities. Virtual characters interact using humans’ innate communication modalities, such as facial expressions, body language, speech, and natural language understanding. By advancing research in Artificial Intelligence (AI), we can improve the ability of artificial agents to help us solve CBI problems. To facilitate successful communication and social interaction between artificial agents and human partners, it is essential that aspects of human social behavior, especially empathy and rapport, be considered when designing human-computer interfaces. Hence, the goal of the present dissertation is to provide a computational model of rapport to enhance an artificial agent’s social behavior, and to provide an experimental tool for the psychological theories shaping the model. Parts of this thesis were already published in [LYL+12, AYL12, AL13, ALYR13, LAYR13, YALR13, ALY14].