988 resultados para Named entity recognition


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In named entity recognition (NER) for biomedical literature, approaches based on combined classifiers have demonstrated great performance improvement compared to a single (best) classifier. This is mainly owed to sufficient level of diversity exhibited among classifiers, which is a selective property of classifier set. Given a large number of classifiers, how to select different classifiers to put into a classifier-ensemble is a crucial issue of multiple classifier-ensemble design. With this observation in mind, we proposed a generic genetic classifier-ensemble method for the classifier selection in biomedical NER. Various diversity measures and majority voting are considered, and disjoint feature subsets are selected to construct individual classifiers. A basic type of individual classifier – Support Vector Machine (SVM) classifier is adopted as SVM-classifier committee. A multi-objective Genetic algorithm (GA) is employed as the classifier selector to facilitate the ensemble classifier to improve the overall sample classification accuracy. The proposed approach is tested on the benchmark dataset – GENIA version 3.02 corpus, and compared with both individual best SVM classifier and SVM-classifier ensemble algorithm as well as other machine learning methods such as CRF, HMM and MEMM. The results show that the proposed approach outperforms other classification algorithms and can be a useful method for the biomedical NER problem.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Named entity recognition (NER) is an essential step in the process of information extraction within text mining. This paper proposes a technique to extract drug named entities from unstructured and informal medical text using a hybrid model of lexicon-based and rule-based techniques. In the proposed model, a lexicon is first used as the initial step to detect drug named entities. Inference rules are then deployed to further extract undetected drug names. The designed rules employ part of speech tags and morphological features for drug name detection. The proposed hybrid model is evaluated using a benchmark data set from the i2b2 2009 medication challenge, and is able to achieve an f-score of 66.97%.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Objective : The objective of this paper is to formulate an extended segment representation (SR) technique to enhance named entity recognition (NER) in medical applications.

Methods : An extension to the IOBES (Inside/Outside/Begin/End/Single) SR technique is formulated. In the proposed extension, a new class is assigned to words that do not belong to a named entity (NE) in one context but appear as an NE in other contexts. Ambiguity in such cases can negatively affect the results of classification-based NER techniques. Assigning a separate class to words that can potentially cause ambiguity in NER allows a classifier to detect NEs more accurately; therefore increasing classification accuracy.

Results : The proposed SR technique is evaluated using the i2b2 2010 medical challenge data set with eight different classifiers. Each classifier is trained separately to extract three different medical NEs, namely treatment, problem, and test. From the three experimental results, the extended SR technique is able to improve the average F1-measure results pertaining to seven out of eight classifiers. The kNN classifier shows an average reduction of 0.18% across three experiments, while the C4.5 classifier records an average improvement of 9.33%.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An accurate Named Entity Recognition (NER) is important for knowledge discovery in text mining. This paper proposes an ensemble machine learning approach to recognise Named Entities (NEs) from unstructured and informal medical text. Specifically, Conditional Random Field (CRF) and Maximum Entropy (ME) classifiers are applied individually to the test data set from the i2b2 2010 medication challenge. Each classifier is trained using a different set of features. The first set focuses on the contextual features of the data, while the second concentrates on the linguistic features of each word. The results of the two classifiers are then combined. The proposed approach achieves an f-score of 81.8%, showing a considerable improvement over the results from CRF and ME classifiers individually which achieve f-scores of 76% and 66.3% for the same data set, respectively.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Named Entity Recognition (NER) is a crucial step in text mining. This paper proposes a new graph-based technique for representing unstructured medical text. The new representation is used to extract discriminative features that are able to enhance the NER performance. To evaluate the usefulness of the proposed graph-based technique, the i2b2 medication challenge data set is used. Specifically, the 'treatment' named entities are extracted for evaluation using six different classifiers. The F-measure results of five classifiers are enhanced, with an average improvement of up to 26% in performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Actualmente existe una gran cantidad de empresas ofreciendo servicios para el análisis de contenido y minería de datos de las redes sociales con el objetivo de realizar análisis de opiniones y gestión de la reputación. Un alto porcentaje de pequeñas y medianas empresas (pymes) ofrecen soluciones específicas a un sector o dominio industrial. Sin embargo, la adquisición de la necesaria tecnología básica para ofrecer tales servicios es demasiado compleja y constituye un sobrecoste demasiado alto para sus limitados recursos. El objetivo del proyecto europeo OpeNER es la reutilización y desarrollo de componentes y recursos para el procesamiento lingüístico que proporcione la tecnología necesaria para su uso industrial y/o académico.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Browsing constitutes an important part of the user information searching process on the Web. In this paper, we present a browser plug-in called ESpotter, which recognizes entities of various types on Web pages and highlights them according to their types to assist user browsing. ESpotter uses a range of standard named entity recognition techniques. In addition, a key new feature of ESpotter is that it addresses the problem of multiple domains on the Web by adapting lexicon and patterns to these domains.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The thesis introduces a set of machine learning techniques that enhance the extraction of Named Entities from informal and unstructured free text.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In questo lavoro si introducono i concetti di base di Natural Language Processing, soffermandosi su Information Extraction e analizzandone gli ambiti applicativi, le attività principali e la differenza rispetto a Information Retrieval. Successivamente si analizza il processo di Named Entity Recognition, focalizzando l’attenzione sulle principali problematiche di annotazione di testi e sui metodi per la valutazione della qualità dell’estrazione di entità. Infine si fornisce una panoramica della piattaforma software open-source di language processing GATE/ANNIE, descrivendone l’architettura e i suoi componenti principali, con approfondimenti sugli strumenti che GATE offre per l'approccio rule-based a Named Entity Recognition.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we describe a voting mechanism for accurate named entity (NE) translation in English–Chinese question answering (QA). This mechanism involves translations from three different sources: machine translation,online encyclopaedia, and web documents. The translation with the highest number of votes is selected. We evaluated this approach using test collection, topics and assessment results from the NTCIR-8 evaluation forum. This mechanism achieved 95% accuracy in NEs translation and 0.3756 MAP in English–Chinese cross-lingual information retrieval of QA.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mémoire numérisé par la Division de la gestion de documents et des archives de l'Université de Montréal

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This is a Named Entity Based Question Answering System for Malayalam Language. Although a vast amount of information is available today in digital form, no effective information access mechanism exists to provide humans with convenient information access. Information Retrieval and Question Answering systems are the two mechanisms available now for information access. Information systems typically return a long list of documents in response to a user’s query which are to be skimmed by the user to determine whether they contain an answer. But a Question Answering System allows the user to state his/her information need as a natural language question and receives most appropriate answer in a word or a sentence or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents which will be used in finding the answer to the question. Question Classification extracts useful information from the question to determine the type of the question and the way in which the question is to be answered. Various Machine Learning methods are used to tag the documents. Rule-Based Approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India with official language status in the state of Kerala. It is spoken by 40 million people. Malayalam is a morphologically rich agglutinative language and relatively of free word order. Also Malayalam has a productive morphology that allows the creation of complex words which are often highly ambiguous. Document tagging tools such as Parts-of-Speech Tagger, Phrase Chunker, Named Entity Tagger, and Compound Word Splitter are developed as a part of this research work. No such tools were available for Malayalam language. Finite State Transducer, High Order Conditional Random Field, Artificial Immunity System Principles, and Support Vector Machines are the techniques used for the design of these document preprocessing tools. This research work describes how the Named Entity is used to represent the documents. Single sentence questions are used to test the system. Overall Precision and Recall obtained are 88.5% and 85.9% respectively. This work can be extended in several directions. The coverage of non-factoid questions can be increased and also it can be extended to include open domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as the future enhancements