890 resultados para Computational linguistics


Relevância:

60.00% 60.00%

Publicador:

Resumo:

In the chemical textile domain experts have to analyse chemical components and substances that might be harmful for their usage in clothing and textiles. Part of this analysis is performed searching opinions and reports people have expressed concerning these products in the Social Web. However, this type of information on the Internet is not as frequent for this domain as for others, so its detection and classification is difficult and time-consuming. Consequently, problems associated to the use of chemical substances in textiles may not be detected early enough, and could lead to health problems, such as allergies or burns. In this paper, we propose a framework able to detect, retrieve, and classify subjective sentences related to the chemical textile domain, that could be integrated into a wider health surveillance system. We also describe the creation of several datasets with opinions from this domain, the experiments performed using machine learning techniques and different lexical resources such as WordNet, and the evaluation focusing on the sentiment classification, and complaint detection (i.e., negativity). Despite the challenges involved in this domain, our approach obtains promising results with an F-score of 65% for polarity classification and 82% for complaint detection.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we present a complete system for the treatment of both geographical and temporal dimensions in text and its application to information retrieval. This system has been evaluated in both the GeoTime task of the 8th and 9th NTCIR workshop in the years 2010 and 2011 respectively, making it possible to compare the system to contemporary approaches to the topic. In order to participate in this task we have added the temporal dimension to our GIR system. The system proposed here has a modular architecture in order to add or modify features. In the development of this system, we have followed a QA-based approach as well as multi-search engines to improve the system performance.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we describe Fénix, a data model for exchanging information between Natural Language Processing applications. The format proposed is intended to be flexible enough to cover both current and future data structures employed in the field of Computational Linguistics. The Fénix architecture is divided into four separate layers: conceptual, logical, persistence and physical. This division provides a simple interface to abstract the users from low-level implementation details, such as programming languages and data storage employed, allowing them to focus in the concepts and processes to be modelled. The Fénix architecture is accompanied by a set of programming libraries to facilitate the access and manipulation of the structures created in this framework. We will also show how this architecture has been already successfully applied in different research projects.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In the article relevance of system development for subject search using computational linguistics is considered. The basic principles of system functioning are defined. The principle of grammar development for information retrieval from the partially structured text in a natural language is considered. The ranging principle of results of information search is defined.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Thesis (M.S.)--Illinois.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Includes bibliography.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The structure and infrastructure of the Mexican technical literature was determined. A representative database of technical articles was extracted from the Science Citation Index for the year 2002, with each article containing at least one author with a Mexican address. Many different manual and statistical clustering methods were used to identify the structure of the technical literature (especially the science and technology core competencies). One of the pervasive technical topics identified from the clustering, thin films research, was analyzed further using bibliometrics, in order to identify the infrastructure of this technology. Published by Elsevier Inc.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The use of ontologies as representations of knowledge is widespread but their construction, until recently, has been entirely manual. We argue in this paper for the use of text corpora and automated natural language processing methods for the construction of ontologies. We delineate the challenges and present criteria for the selection of appropriate methods. We distinguish three ma jor steps in ontology building: associating terms, constructing hierarchies and labelling relations. A number of methods are presented for these purposes but we conclude that the issue of data-sparsity still is a ma jor challenge. We argue for the use of resources external tot he domain specific corpus.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Yorick Wilks is a central figure in the fields of Natural Language Processing and Artificial Intelligence. His influence extends to many areas and includes contributions to Machines Translation, word sense disambiguation, dialogue modeling and Information Extraction. This book celebrates the work of Yorick Wilks in the form of a selection of his papers which are intended to reflect the range and depth of his work. The volume accompanies a Festschrift which celebrates his contribution to the fields of Computational Linguistics and Artificial Intelligence. The papers include early work carried out at Cambridge University, descriptions of groundbreaking work on Machine Translation and Preference Semantics as well as more recent works on belief modeling and computational semantics. The selected papers reflect Yorick’s contribution to both practical and theoretical aspects of automatic language processing.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Information technology has increased both the speed and medium of communication between nations. It has brought the world closer, but it has also created new challenges for translation — how we think about it, how we carry it out and how we teach it. Translation and Information Technology has brought together experts in computational linguistics, machine translation, translation education, and translation studies to discuss how these new technologies work, the effect of electronic tools, such as the internet, bilingual corpora, and computer software, on translator education and the practice of translation, as well as the conceptual gaps raised by the interface of human and machine.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This is the first published edition of John Sinclair, Susan Jones and Robert Daley's research on collocation undertaken in 1970. The unpublished report was circulated amongst a small group of academics and was enormously influential, sparking a growth of interest in collocation amongst researchers in linguistics. Collocation was first viewed as important in computational linguistics in the work of Harold Palmer in Japan. Later M.A.K. Halliday and John Sinclair published on collocation in the 1960s. English Collocation Studies is a report on empirical research into collocation, devised by Halliday with Sinclair acting as the Principal Investigator and editor of the resultant OSTI report. The present edition contains an introduction by Professor Wolfgang Teubert based on his interview with John Sinclair. The introduction assesses the extent to which the findings of the original research have developed in the intervening years, and how some of the techniques mentioned in the report were implemented in the COBUILD project at Birmingham University in the 1980s.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and the hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. The HM-SVMs combine the advantages of the hidden Markov models and the support vector machines. By employing a modified K-means clustering method, a small set of most representative sentences can be automatically selected from an un-annotated corpus. These sentences together with their abstract annotations are used to train an HVS model which could be subsequently applied on the whole corpus to generate semantic parsing results. The most confident semantic parsing results are selected to generate a fully-annotated corpus which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator Data. Experimental results show that an improvement over the baseline HVS parser has been observed using the hybrid framework. When compared with the HM-SVMs trained from the fully-annotated corpus, the hybrid framework gave a comparable performance with only a small set of lightly annotated sentences. © 2008. Licensed under the Creative Commons.