10 results for Named entity recognition

in Deakin Research Online - Australia


Relevance:

100.00%

Publisher:

Abstract:

In named entity recognition (NER) for biomedical literature, approaches based on combined classifiers have demonstrated substantial performance improvements over a single (best) classifier. This is mainly due to a sufficient level of diversity among the classifiers, which is a selective property of the classifier set. Given a large number of classifiers, how to select which classifiers to put into a classifier ensemble is a crucial issue in multiple-classifier ensemble design. With this observation in mind, we propose a generic genetic classifier-ensemble method for classifier selection in biomedical NER. Various diversity measures and majority voting are considered, and disjoint feature subsets are selected to construct individual classifiers. A basic type of individual classifier, the Support Vector Machine (SVM), is adopted to form the SVM-classifier committee. A multi-objective genetic algorithm (GA) is employed as the classifier selector to help the ensemble classifier improve overall sample classification accuracy. The proposed approach is tested on the benchmark GENIA version 3.02 corpus and compared with the individual best SVM classifier and the SVM-classifier ensemble algorithm, as well as other machine learning methods such as CRF, HMM and MEMM. The results show that the proposed approach outperforms the other classification algorithms and can be a useful method for the biomedical NER problem.
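The abstract names majority voting and diversity measures but does not give their exact form. As a minimal sketch only, the following shows token-level majority voting over several taggers and one common diversity measure, mean pairwise disagreement. The tagger outputs, label names, and function names are hypothetical, and the genetic selection step itself is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def majority_vote(predictions):
    """Combine per-token labels from several classifiers by majority vote.

    predictions: list of label sequences, one per classifier, all equal length.
    Ties go to the label seen first, i.e. the earliest classifier in the list.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


def disagreement(a, b):
    """Fraction of tokens on which two classifiers output different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_disagreement(predictions):
    """Average disagreement over all classifier pairs (higher = more diverse)."""
    pairs = list(combinations(predictions, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical outputs of three taggers on a four-token sentence.
clf_a = ["B-protein", "I-protein", "O", "O"]
clf_b = ["B-protein", "O", "O", "O"]
clf_c = ["B-protein", "I-protein", "O", "B-DNA"]

ensemble = majority_vote([clf_a, clf_b, clf_c])
diversity = mean_pairwise_disagreement([clf_a, clf_b, clf_c])
print(ensemble, round(diversity, 3))
```

A selector such as a genetic algorithm could score candidate subsets of classifiers with quantities like these, but how the paper weighs accuracy against diversity is not stated in the abstract.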

Relevance:

100.00%

Publisher:

Abstract:

Named entity recognition (NER) is an essential step in the process of information extraction within text mining. This paper proposes a technique to extract drug named entities from unstructured and informal medical text using a hybrid model of lexicon-based and rule-based techniques. In the proposed model, a lexicon is first used to detect drug named entities. Inference rules are then deployed to extract drug names that the lexicon missed. The designed rules employ part-of-speech tags and morphological features for drug name detection. The proposed hybrid model is evaluated using a benchmark data set from the i2b2 2009 medication challenge and achieves an f-score of 66.97%.
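The two-stage idea described above (lexicon lookup first, rules for what the lexicon misses) can be sketched as follows. This is an illustration under stated assumptions, not the paper's rule set: the toy lexicon and the suffix pattern are hypothetical, and the part-of-speech features mentioned in the abstract are omitted to keep the example self-contained.

```python
import re

# Hypothetical toy lexicon; a real system would load a drug dictionary.
LEXICON = {"aspirin", "metformin"}

# Hypothetical morphological rule: lowercase word ending in a drug-like suffix.
SUFFIX_RULE = re.compile(r"^[a-z]+(cillin|statin|pril|olol)$")


def detect_drugs(tokens):
    """Return (index, token, source) for each token flagged as a drug name."""
    found = []
    for i, tok in enumerate(tokens):
        word = tok.lower()
        if word in LEXICON:
            # Stage 1: exact lexicon match.
            found.append((i, tok, "lexicon"))
        elif SUFFIX_RULE.match(word):
            # Stage 2: rule fires only when the lexicon did not match.
            found.append((i, tok, "rule"))
    return found


tokens = "Pt started on Metformin and atorvastatin daily".split()
hits = detect_drugs(tokens)
print(hits)
```

Running the lexicon first keeps the rules from overriding known entries, and the rules then only need to cover names the lexicon lacks.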

Relevance:

100.00%

Publisher:

Abstract:

Objective: The objective of this paper is to formulate an extended segment representation (SR) technique to enhance named entity recognition (NER) in medical applications.

Methods: An extension to the IOBES (Inside/Outside/Begin/End/Single) SR technique is formulated. In the proposed extension, a new class is assigned to words that do not belong to a named entity (NE) in one context but appear as an NE in other contexts. Ambiguity in such cases can negatively affect the results of classification-based NER techniques. Assigning a separate class to words that can potentially cause ambiguity allows a classifier to detect NEs more accurately, thereby increasing classification accuracy.

Results: The proposed SR technique is evaluated using the i2b2 2010 medical challenge data set with eight different classifiers. Each classifier is trained separately to extract three different medical NEs, namely treatment, problem, and test. Across the three experiments, the extended SR technique improves the average F1-measure for seven of the eight classifiers. The kNN classifier shows an average reduction of 0.18% across the three experiments, while the C4.5 classifier records an average improvement of 9.33%.
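To make the segment representation concrete, here is a minimal sketch of standard IOBES tagging plus one possible reading of the extension. The abstract does not name the new class or say how ambiguous words are identified, so the tag `A` and the `entity_vocab` lookup are assumptions made for illustration.

```python
def iobes_tags(n_tokens, spans):
    """Standard IOBES tags from entity spans given as (start, end_exclusive, type)."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"S-{etype}"          # single-token entity
        else:
            tags[start] = f"B-{etype}"          # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"          # interior tokens
            tags[end - 1] = f"E-{etype}"        # last token
    return tags


def mark_ambiguous(tokens, tags, entity_vocab, extra_tag="A"):
    """Hypothetical extension: relabel O tokens that occur inside entities elsewhere.

    entity_vocab is assumed to hold lowercase words seen inside some entity
    in the training data; extra_tag is a made-up name for the new class.
    """
    return [
        extra_tag if tag == "O" and tok.lower() in entity_vocab else tag
        for tok, tag in zip(tokens, tags)
    ]


tokens = ["Cold", "weather", "worsened", "her", "common", "cold"]
base = iobes_tags(len(tokens), [(4, 6, "problem")])
extended = mark_ambiguous(tokens, base, entity_vocab={"common", "cold"})
print(base)
print(extended)
```

In this toy sentence the first "Cold" is not part of an entity, but the same word ends the entity "common cold", so it receives the separate class instead of plain `O`.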

Relevance:

100.00%

Publisher:

Abstract:

Accurate Named Entity Recognition (NER) is important for knowledge discovery in text mining. This paper proposes an ensemble machine learning approach to recognise Named Entities (NEs) in unstructured and informal medical text. Specifically, Conditional Random Field (CRF) and Maximum Entropy (ME) classifiers are applied individually to the test data set from the i2b2 2010 medication challenge. Each classifier is trained using a different set of features. The first set focuses on the contextual features of the data, while the second concentrates on the linguistic features of each word. The results of the two classifiers are then combined. The proposed approach achieves an f-score of 81.8%, a considerable improvement over the individual CRF and ME classifiers, which achieve f-scores of 76% and 66.3%, respectively, on the same data set.
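The f-scores compared above are the harmonic mean of precision and recall. As a minimal sketch, assuming entity-level scoring with exact span and type match (the challenge's official scoring rules may differ), the computation looks like this; the gold and predicted spans are hypothetical.

```python
def prf1(gold, predicted):
    """Precision, recall and F1 over sets of (start, end, type) entity spans."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # exact span-and-type matches
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: the second prediction has the wrong type.
gold = {(0, 2, "problem"), (5, 6, "treatment"), (8, 10, "test")}
pred = {(0, 2, "problem"), (5, 6, "test"), (8, 10, "test")}

precision, recall, f1 = prf1(gold, pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```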

Relevance:

100.00%

Publisher:

Abstract:

Named Entity Recognition (NER) is a crucial step in text mining. This paper proposes a new graph-based technique for representing unstructured medical text. The new representation is used to extract discriminative features that are able to enhance the NER performance. To evaluate the usefulness of the proposed graph-based technique, the i2b2 medication challenge data set is used. Specifically, the 'treatment' named entities are extracted for evaluation using six different classifiers. The F-measure results of five classifiers are enhanced, with an average improvement of up to 26% in performance.

Relevance:

100.00%

Publisher:

Abstract:

The thesis introduces a set of machine learning techniques that enhance the extraction of Named Entities from informal and unstructured free text.

Relevance:

30.00%

Publisher:

Abstract:

Most face recognition (FR) algorithms require the face images to satisfy certain restrictions on aspects such as view angle, illumination, and occlusion. What is needed in general, however, are techniques that can recognize any face image recognizable by human beings. This paper provides one potential solution to this problem. A method named Individual Discriminative Subspace (IDS) is proposed for robust face recognition under uncontrolled conditions. An IDS is a subspace in which only the images of one particular person converge around the origin, while those of other people scatter. Each IDS can therefore be used to distinguish one individual from others. There is no restriction on the face images fed into the algorithm, which makes it practical for real-life applications. In the experiments, IDS is tested on two large face databases with extensive variations and performs significantly better than 12 existing FR techniques.

Relevance:

30.00%

Publisher:

Abstract:

Face images taken under uncontrolled conditions usually exhibit many kinds of variation, such as changes in pose, illumination, and expression. Most previous work on face recognition (FR) focuses on particular variations and usually assumes the absence of others. Instead of such a "divide and conquer" strategy, this paper attempts to directly address face recognition under uncontrolled conditions. The key is the individual stable space (ISS), which expresses only personal characteristics. A neural network named ISNN is proposed to map a raw face image into the ISS. Three ISS-based algorithms are then designed for FR under uncontrolled conditions. There are no restrictions on the images fed into these algorithms. Moreover, unlike many other FR techniques, they do not require any extra training information, such as the view angle. These advantages make them practical to implement under uncontrolled conditions. The proposed algorithms are tested on three large face databases with vast variations and achieve superior performance compared with 12 other existing FR techniques.

Relevance:

30.00%

Publisher:

Abstract:

Two-dimensional Principal Component Analysis (2DPCA) is a robust method for face recognition. Much recent research shows that 2DPCA is more reliable than the well-known PCA method in recognising human faces. However, in many cases, this method tends to overfit the sample data. In this paper, we propose a novel method named random subspace two-dimensional PCA (RS-2DPCA), which combines the 2DPCA method with the random subspace (RS) technique. RS-2DPCA inherits the advantages of both 2DPCA and the RS technique, and can thus avoid the overfitting problem and achieve high recognition accuracy. Experimental results on three benchmark face data sets (the ORL database, the Yale face database and the extended Yale face database B) confirm our hypothesis that RS-2DPCA is superior to 2DPCA itself.