938 results for Question-Answering System
Abstract:
The treatment of factual data has been widely studied in different areas of Natural Language Processing (NLP). However, processing subjective information still poses important challenges. This paper presents research aimed at assessing techniques that have been suggested as appropriate in the context of subjective Opinion Question Answering (OQA). We evaluate the performance of an OQA system with these new components and propose methods to optimally tackle the issues encountered. We assess the impact of including additional resources and processes with the purpose of improving the system performance on two distinct blog datasets. The improvements obtained for the different combinations of tools are statistically significant. We thus conclude that the proposed approach is adequate for the OQA task, offering a good strategy to deal with opinionated questions.
Abstract:
In this paper we present a complete system for the treatment of both geographical and temporal dimensions in text and its application to information retrieval. This system has been evaluated in the GeoTime task of both the 8th and 9th NTCIR workshops, held in 2010 and 2011 respectively, making it possible to compare the system to contemporary approaches to the topic. In order to participate in this task we have added the temporal dimension to our GIR system. The system proposed here has a modular architecture so that features can be added or modified. In developing this system, we have followed a QA-based approach and used multiple search engines to improve the system performance.
Abstract:
Linked Data semantic sources, in particular DBpedia, can be used to answer many user queries. PowerAqua is an open multi-ontology Question Answering (QA) system for the Semantic Web (SW). However, the emergence of Linked Data, characterized by its openness, heterogeneity and scale, introduces a new dimension to the Semantic Web scenario, in which exploiting the relevant information to extract answers for Natural Language (NL) user queries is a major challenge. In this paper we discuss the issues and lessons learned from our experience of integrating PowerAqua as a front-end for DBpedia and a subset of Linked Data sources. As such, we go one step beyond the state of the art on end-user interfaces for Linked Data by introducing mapping and fusion techniques needed to translate a user query by means of multiple sources. Our first informal experiments probe whether, in fact, it is feasible to obtain answers to user queries by composing information across semantic sources and Linked Data, even in its current form, where the strength of Linked Data is more a by-product of its size than its quality. We believe our experiences can be extrapolated to a variety of end-user applications that wish to scale, open up, exploit and re-use what possibly is the greatest wealth of data about everything in the history of Artificial Intelligence. © 2010 Springer-Verlag.
Abstract:
In this paper, we describe a voting mechanism for accurate named entity (NE) translation in English–Chinese question answering (QA). This mechanism involves translations from three different sources: machine translation, online encyclopaedia, and web documents. The translation with the highest number of votes is selected. We evaluated this approach using the test collection, topics and assessment results from the NTCIR-8 evaluation forum. This mechanism achieved 95% accuracy in NE translation and 0.3756 MAP in English–Chinese cross-lingual information retrieval of QA.
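The voting step described in this abstract can be illustrated with a minimal sketch. It assumes each of the three sources contributes one candidate translation and one vote; the function name, the source keys and the tie-breaking rule are hypothetical, since the abstract does not specify them.

```python
from collections import Counter


def vote_translation(candidates_by_source):
    """Return the candidate translation proposed by the most sources.

    candidates_by_source maps a source name to that source's candidate
    translation, or None when the source returned nothing.
    """
    votes = Counter(c for c in candidates_by_source.values() if c)
    if not votes:
        return None
    # Ties fall back to first-seen order. This is an assumption of the
    # sketch; the abstract does not say how ties are resolved.
    return votes.most_common(1)[0][0]


# Hypothetical candidates for the named entity "Obama".
candidates = {
    "machine_translation": "奥巴马",
    "online_encyclopaedia": "欧巴马",
    "web_documents": "奥巴马",
}
print(vote_translation(candidates))
```

Two of the three sources agree here, so the majority candidate is returned.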
Abstract:
Collaborative question answering (cQA) portals such as Yahoo! Answers allow users, as askers or answer authors, to communicate and exchange information through the asking and answering of questions in the network. In their current set-up, answers to a question are arranged in chronological order. For effective information retrieval, it would be advantageous to have the users’ answers ranked according to their quality. This paper proposes a novel approach to evaluating and ranking the users’ answers and recommending the top-n quality answers to information seekers. The proposed approach is based on a user-reputation method which assigns a score to an answer reflecting its answer author’s reputation level in the network. The proposed approach is evaluated on a dataset collected from a live cQA, namely, Yahoo! Answers. To compare the results obtained by the non-content-based user-reputation method, experiments were also conducted with several content-based methods that assign a score to an answer reflecting its content quality. Various combinations of non-content and content-based scores were also used in comparing results. Empirical analysis shows that the proposed method is able to rank the users’ answers and recommend the top-n answers with good accuracy. Results of the proposed method outperform the content-based methods, various combinations, and the results obtained by the popular link analysis method, HITS.
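As a rough illustration of ranking answers by author reputation and optionally blending in a content-based score, consider the sketch below. The reputation values, the [0, 1] scales and the linear blend are all assumptions made for the example; the abstract does not state how the scores are computed or combined.

```python
def rank_answers(answers, reputation, content_score=None, weight=1.0, top_n=3):
    """Return the ids of the top_n answers, best first.

    answers: list of (answer_id, author) pairs
    reputation: dict author -> reputation score in [0, 1] (hypothetical scale)
    content_score: optional dict answer_id -> content quality score in [0, 1]
    weight: share given to reputation; 1.0 means reputation only
    """
    def score(item):
        answer_id, author = item
        rep = reputation.get(author, 0.0)
        if content_score is None:
            return rep
        # Linear blend of the two signals (an assumption of this sketch).
        return weight * rep + (1.0 - weight) * content_score.get(answer_id, 0.0)

    ranked = sorted(answers, key=score, reverse=True)
    return [answer_id for answer_id, _ in ranked[:top_n]]


# Hypothetical data: three answers to one question.
answers = [("a1", "ann"), ("a2", "bob"), ("a3", "cy")]
reputation = {"ann": 0.2, "bob": 0.9, "cy": 0.5}
content = {"a1": 1.0, "a2": 0.1, "a3": 0.4}

print(rank_answers(answers, reputation, top_n=2))
print(rank_answers(answers, reputation, content, weight=0.5, top_n=2))
```

With reputation alone, the answer by the highest-reputation author comes first; giving content half the weight moves the answer with the strongest content score to the top.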
Abstract:
The importance of the new textual genres such as blogs or forum entries is growing in parallel with the evolution of the Social Web. This paper presents two corpora of blog posts in English and in Spanish, annotated according to the EmotiBlog annotation scheme. Furthermore, we created 20 factual and opinionated questions for each language and also the Gold Standard for their answers in the corpus. The purpose of our work is to study the challenges involved in a mixed fact and opinion question answering setting by comparing the performance of two Question Answering (QA) systems in that setting. The first one is open domain, while the second one is opinion-oriented. We evaluate the two systems separately in both languages and propose possible solutions to improve QA systems that have to process mixed questions.
Abstract:
The development of Web 2.0 led to the birth of new textual genres such as blogs, reviews or forum entries. The increasing number of such texts and the highly diverse topics they discuss make blogs a rich source for analysis. This paper presents a comparative study on open domain and opinion QA systems. A collection of opinion and mixed fact-opinion questions in English is defined and two Question Answering systems are employed to retrieve the answers to these queries. The first one is generic, while the second is specific to emotions. We comparatively evaluate and analyze the systems’ results, concluding that opinion Question Answering requires the use of specific resources and methods.
Abstract:
Currently there is an overwhelming number of scientific publications in Life Sciences, especially in Genetics and Biotechnology. This huge amount of information is structured in corporate Data Warehouses (DW) or in Biological Databases (e.g. UniProt, RCSB Protein Data Bank, CEREALAB or GenBank), whose main drawback is the cost of updating them, which makes them easily obsolete. However, these Databases are the main tool for enterprises when they want to update their internal information, for example when a plant breeder enterprise needs to enrich its genetic information (internal structured Database) with recently discovered genes related to specific phenotypic traits (external unstructured data) in order to choose the desired parentals for breeding programs. In this paper, we propose to complement the internal information with external data from the Web using Question Answering (QA) techniques. We go a step further by providing a complete framework for integrating unstructured and structured information by combining traditional Databases and DW architectures with QA systems. The great advantage of our framework is that decision makers can instantaneously compare internal data with external data from competitors, thereby allowing quick strategic decisions based on richer data.
Abstract:
With the recent rapid growth of the Semantic Web (SW), the processes of searching and querying content that is both massive in scale and heterogeneous have become increasingly challenging. User-friendly interfaces, which can support end users in querying and exploring this novel and diverse, structured information space, are needed to make the vision of the SW a reality. We present a survey on ontology-based Question Answering (QA), which has emerged in recent years to exploit the opportunities offered by structured semantic information on the Web. First, we provide a comprehensive perspective by analyzing the general background and history of the QA research field, from influential works from the artificial intelligence and database communities developed in the 1970s and later decades, through open domain QA stimulated by the QA track in TREC since 1999, to the latest commercial semantic QA solutions, before tackling the current state of the art in open user-friendly interfaces for the SW. Second, we examine the potential of this technology to go beyond the current state of the art to support end-users in reusing and querying the SW content. We conclude our review with an outlook for this novel research area, focusing in particular on the R&D directions that need to be pursued to realize the goal of efficient and competent retrieval and integration of answers from large scale, heterogeneous, and continuously evolving semantic sources.
Abstract:
PowerAqua is a Question Answering system, which takes as input a natural language query and is able to return answers drawn from relevant semantic resources found anywhere on the Semantic Web. In this paper we provide two novel contributions: First, we detail a new component of the system, the Triple Similarity Service, which is able to match queries effectively to triples found in different ontologies on the Semantic Web. Second, we provide a first evaluation of the system, which in addition to providing data about PowerAqua's competence, also gives us important insights into the issues related to using the Semantic Web as the target answer set in Question Answering. In particular, we show that, despite the problems related to the noisy and incomplete conceptualizations, which can be found on the Semantic Web, good results can already be obtained.
Abstract:
SMS (Short Message Service) is now a hugely popular and very powerful business communication technology for mobile phones. In order to respond correctly to a free-form factual question given a large collection of texts, one needs to understand the question at a level that allows determining some of the constraints the question imposes on a possible answer. These constraints may include a semantic classification of the sought-after answer and may even suggest using different strategies when looking for and verifying a candidate answer. In this paper we focus on various attempts to overcome the major contradiction between the technical limitations of the SMS standard and the large amount of information found for a possible answer.
Abstract:
The value of Question Answering (Q&A) communities is dependent on members of the community finding the questions they are most willing and able to answer. This can be difficult in communities with a high volume of questions. Much previous work has attempted to address this problem by recommending questions similar to those already answered. However, this approach disregards the question selection behaviour of the answerers and how it is affected by factors such as question recency and reputation. In this paper, we identify the parameters that correlate with such behaviour by analysing the users' answering patterns in a Q&A community. We then generate a model to predict which question a user is most likely to answer next. We train Learning to Rank (LTR) models to predict question selections using various user, question and thread feature sets. We show that answering behaviour can be predicted with a high level of success, and highlight the particular features that influence users' question selections.
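To make the idea of learning to rank question selections concrete, here is a minimal pairwise sketch. It is not the model used in the paper: a simple perceptron-style linear ranker stands in for the unspecified LTR models, and the two features (question recency and asker reputation) and all data values are hypothetical.

```python
def train_pairwise(sessions, n_features, epochs=20, lr=0.1):
    """Learn linear weights so each chosen question outscores the others.

    sessions: list of (chosen_vector, [other_vectors]) pairs, one per
    observed selection by a user.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for chosen, others in sessions:
            for other in others:
                diff = [c - o for c, o in zip(chosen, other)]
                margin = sum(wi * di for wi, di in zip(w, diff))
                if margin <= 0:
                    # Misranked pair: move the weights toward the chosen item.
                    w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank(w, questions):
    """Order question ids by descending linear score."""
    def score(qid):
        return sum(wi * xi for wi, xi in zip(w, questions[qid]))
    return sorted(questions, key=score, reverse=True)


# Hypothetical features: [recency, asker_reputation], both in [0, 1].
sessions = [
    ([0.9, 0.2], [[0.1, 0.8], [0.3, 0.3]]),
    ([0.8, 0.5], [[0.2, 0.9]]),
]
weights = train_pairwise(sessions, n_features=2)

candidates = {"q_old": [0.05, 0.9], "q_new": [0.95, 0.1]}
print(rank(weights, candidates))
```

In this toy data the user always picks the most recent question, so the learned weights favour recency and the newer candidate is ranked first.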
Abstract:
Taking the three basic systems of Yes/No particles, the group looked at the relative deep and surface structures, and asked what types of systems are present in the Georgian, Polish and Armenian languages. The choice of languages was of particular interest as the Caucasian and Indo-European languages usually have different question-answering systems, but Georgian (Caucasian) and Polish (Indo-European) in fact share the same system. The Armenian language is Indo-European, but the country is situated in the southern Caucasus, on Georgia's southern border, making it worth analysing Armenian in comparison with Georgian (from the point of view of language interference) and with Polish (as two related languages). The group identified two different deep structures, tracing the occurrence of these in different languages, and showed that one is more natural in the majority of languages. They found no correspondence between related languages and their question-answer systems and demonstrated that languages in the same typological class may show different systems, as with Georgian and the North Caucasian languages. It became clear that Georgian, Armenian and Polish all have an agree/disagree question-answering system defined by the same deep structure. From this they conclude that the linguistic mentalities of Georgians, Armenians and Poles are more oriented to the communicative act. At the same time the Yes/No system, in which a positive particle stands for a positive answer and a negative particle for a negative answer, also functions in these languages, indicating that the second deep structure identified also functions alongside the first.
Abstract:
Mobile advertising is a rapidly growing sector providing brands and marketing agencies the opportunity to connect with consumers beyond traditional and digital media and instead communicate directly on their mobile phones. Mobile advertising will be intrinsically linked with mobile search, which has migrated from the internet to mobile devices and is identified as an area of potential growth. The results of mobile searches show that, as a general rule, such search results exceed 160 characters; a dialog is therefore required to deliver the relevant portion of a response to the mobile user. In this paper we focus initially on mobile search and mobile advert creation, and later on the mechanism of interaction between the user’s request, the search results, advertising and dialog.
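The 160-character constraint mentioned in this abstract can be illustrated with a short sketch that splits a long search result into message-sized segments. It only shows the length constraint, not the dialog mechanism the paper describes; the function name is hypothetical, and the sketch counts plain characters without considering message encodings.

```python
import textwrap

SMS_LIMIT = 160  # character limit cited in the abstract


def split_for_sms(text, limit=SMS_LIMIT):
    """Break text into word-aligned segments of at most `limit` characters."""
    return textwrap.wrap(text, width=limit)


# Hypothetical search result that is too long for a single message.
result = "word " * 100
segments = split_for_sms(result)
for number, segment in enumerate(segments, start=1):
    print(number, len(segment))
```

A dialog could then send the first segment and deliver further segments only when the user asks for more.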