929 results for ensemble classifiers


Relevance: 40.00%

Abstract:

The aim of the present study is to define an optimally performing computer-aided diagnosis (CAD) architecture for the classification of liver tissue from non-enhanced computed tomography (CT) images into normal liver (C1), hepatic cyst (C2), hemangioma (C3), and hepatocellular carcinoma (C4). To this end, various CAD architectures, based on texture features and ensembles of classifiers (ECs), are comparatively assessed.

Relevance: 30.00%

Abstract:

In the first part of this paper we reviewed the fingerprint classification literature from two different perspectives: feature extraction and classifier learning. In trying to answer the question of which of the reviewed methods would perform best in a real implementation, we ended up in a discussion that showed how difficult this question is to answer. No previous comparison exists in the literature, and comparisons among papers are made with different experimental frameworks. Moreover, published methods are difficult to implement because their descriptions and parameters lack detail and no source code is shared. For this reason, in this paper we carry out a deep experimental study following the proposed double perspective. To do so, we have carefully implemented some of the most relevant feature extraction methods according to the explanations found in the corresponding papers, and we have tested their performance with different classifiers, including the specific proposals made by the authors. Our aim is to develop an objective experimental study in a common framework, which has not been done before and which can serve as a baseline for future work on the topic. In this way, we test not only their quality but also their reusability by other researchers, and we are able to indicate which proposals could be considered for future developments. Furthermore, we show that combining different feature extraction models in an ensemble can lead to superior performance, significantly improving on the results obtained by individual models.

Relevance: 30.00%

Abstract:

In many domains when we have several competing classifiers available we want to synthesize them or some of them to get a more accurate classifier by a combination function. In this paper we propose a ‘class-indifferent’ method for combining classifier decisions represented by evidential structures called triplet and quartet, using Dempster's rule of combination. This method is unique in that it distinguishes important elements from the trivial ones in representing classifier decisions, makes use of more information than others in calculating the support for class labels and provides a practical way to apply the theoretically appealing Dempster–Shafer theory of evidence to the problem of ensemble learning. We present a formalism for modelling classifier decisions as triplet mass functions and we establish a range of formulae for combining these mass functions in order to arrive at a consensus decision. In addition we carry out a comparative study with the alternatives of simplet and dichotomous structure and also compare two combination methods, Dempster's rule and majority voting, over the UCI benchmark data, to demonstrate the advantage our approach offers. (A continuation of the work in this area that was published in IEEE Trans on KDE, and conferences)
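
The combination step can be illustrated with a minimal Python sketch of Dempster's rule applied to two mass functions. The triplet layout used here (mass on the top class, mass on the second class, and the remainder on the whole frame of discernment) is an assumption made for illustration; the abstract does not define the structure, and the function name `dempster_combine` and the numbers are hypothetical.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two focal sets.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Disjoint focal sets contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    norm = 1.0 - conflict
    return {focal: mass / norm for focal, mass in combined.items()}

# Assumed triplet layout: top class, second class, whole frame (illustrative numbers).
frame = frozenset({"a", "b", "c"})
m1 = {frozenset({"a"}): 0.6, frozenset({"b"}): 0.3, frame: 0.1}
m2 = {frozenset({"b"}): 0.5, frozenset({"a"}): 0.3, frame: 0.2}

fused = dempster_combine(m1, m2)
decision = max(fused, key=fused.get)  # focal set with the largest combined mass
```

Unlike majority voting, which only counts each classifier's top label, this keeps the second choice and the residual uncertainty of every classifier in play until the final decision.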

Relevance: 30.00%

Abstract:

Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks.
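
A rough Python sketch of the classification step described above follows. It makes several simplifying assumptions that are not taken from the abstract: each collection holds one centroid per activity (instead of clusters found by a clustering algorithm), distance is Euclidean, and the per-collection decisions are merged by majority vote. All names and the toy binary sensor data are hypothetical.

```python
import math
from collections import Counter

def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_collection(X, y, features):
    """Build one collection: a centroid per activity, using only `features`."""
    by_label = {}
    for row, label in zip(X, y):
        by_label.setdefault(label, []).append([row[f] for f in features])
    return features, {label: centroid(rows) for label, rows in by_label.items()}

def predict(collections, x):
    """Assign x to its closest cluster in each collection, then majority-vote."""
    votes = []
    for features, centroids in collections:
        projected = [x[f] for f in features]
        votes.append(min(centroids, key=lambda lab: math.dist(projected, centroids[lab])))
    return Counter(votes).most_common(1)[0][0]

# Toy binary sensor readings (four sensors) and their activities.
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
y = ["cook", "cook", "sleep", "sleep"]
collections = [fit_collection(X, y, subset) for subset in ([0, 1], [2, 3], [0, 3])]

pred_cook = predict(collections, [1, 1, 0, 0])
pred_sleep = predict(collections, [0, 0, 1, 1])
```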

Relevance: 30.00%

Abstract:

One of the most popular techniques of generating classifier ensembles is known as stacking which is based on a meta-learning approach. In this paper, we introduce an alternative method to stacking which is based on cluster analysis. Similar to stacking, instances from a validation set are initially classified by all base classifiers. The output of each classifier is subsequently considered as a new attribute of the instance. Following this, a validation set is divided into clusters according to the new attributes and a small subset of the original attributes of the instances. For each cluster, we find its centroid and calculate its class label. The collection of centroids is considered as a meta-classifier. Experimental results show that the new method outperformed all benchmark methods, namely Majority Voting, Stacking J48, Stacking LR, AdaBoost J48, and Random Forest, in 12 out of 22 data sets. The proposed method has two advantageous properties: it is very robust to relatively small training sets and it can be applied in semi-supervised learning problems. We provide a theoretical investigation regarding the proposed method. This demonstrates that for the method to be successful, the base classifiers applied in the ensemble should have greater than 50% accuracy levels.
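
The procedure can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the clustering algorithm is taken to be a small k-means with deterministic initialisation, base-classifier outputs are encoded as 0/1 and mixed directly with the kept attributes, and each cluster's class label is the majority label of its members. The base classifiers are hypothetical threshold rules.

```python
import math
from collections import Counter

def kmeans(points, k, iters=20):
    """Tiny k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(i)
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*(points[i] for i in members))]
    return centroids, groups

def meta_features(x, base_classifiers, kept):
    """Base-classifier outputs plus a small subset of the original attributes."""
    return [clf(x) for clf in base_classifiers] + [x[i] for i in kept]

def fit_meta(X_val, y_val, base_classifiers, kept, k):
    """Cluster the validation set; return (centroid, label) pairs."""
    points = [meta_features(x, base_classifiers, kept) for x in X_val]
    centroids, groups = kmeans(points, k)
    labelled = []
    for center, members in zip(centroids, groups):
        if members:
            label = Counter(y_val[i] for i in members).most_common(1)[0][0]
            labelled.append((center, label))
    return labelled

def predict_meta(labelled, x, base_classifiers, kept):
    """Classify x with the label of its nearest centroid."""
    point = meta_features(x, base_classifiers, kept)
    return min(labelled, key=lambda pair: math.dist(point, pair[0]))[1]

# Hypothetical base classifiers: simple threshold rules on one attribute.
base = [lambda x: int(x[0] > 0.5), lambda x: int(x[0] > 0.3), lambda x: int(x[0] > 0.85)]
X_val = [[0.1], [0.2], [0.9], [0.8]]
y_val = [0, 0, 1, 1]
meta = fit_meta(X_val, y_val, base, kept=[0], k=2)

high = predict_meta(meta, [0.95], base, kept=[0])
low = predict_meta(meta, [0.05], base, kept=[0])
```

Because the meta-classifier is only a set of labelled centroids, unlabelled instances could in principle take part in the clustering step, which is one way to read the abstract's remark about semi-supervised problems.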

Relevance: 30.00%

Abstract:

Generally classifiers tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism suffers like any ensemble learner from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest sized datasets may impose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.

Relevance: 30.00%

Abstract:

The induction of classification rules from previously unseen examples is one of the most important data mining tasks in science as well as commercial applications. In order to reduce the influence of noise in the data, ensemble learners are often applied. However, most ensemble learners are based on decision tree classifiers, which are affected by noise. The Random Prism classifier has recently been proposed as an alternative to the popular Random Forests classifier, which is based on decision trees. Random Prism is based on the Prism family of algorithms, which is more robust to noise. However, like most ensemble classification approaches, Random Prism does not scale well to large training data. This paper presents a thorough discussion of Random Prism and a recently proposed parallel version of it called Parallel Random Prism. Parallel Random Prism is based on the MapReduce programming paradigm. The paper provides, for the first time, a novel theoretical analysis of the proposed technique and an in-depth experimental study, which show that Parallel Random Prism scales well with a large number of training examples, a large number of data features and a large number of processors. The expressiveness of the decision rules that our technique produces makes it a natural choice for Big Data applications where informed decision making increases the user’s trust in the system.

Relevance: 30.00%

Abstract:

In content-based image retrieval, learning from users’ feedback can be considered a one-class classification problem. However, the OCIB method proposed in [1] is only a one-mode method and therefore cannot deal with multiple interest regions. In addition, it requires a pre-specified radius, which is usually unavailable in real-world applications. This paper overcomes these two problems by introducing ensemble learning into the OCIB method: with Bagging, we construct a group of one-class classifiers that emphasize various parts of the data set; this is followed by a rank aggregation step in which results from different parameter settings are incorporated into a single final ranking list. The experimental results show that the proposed I-OCIB method outperforms OCIB for image retrieval applications.
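
The bagging-plus-rank-aggregation idea can be sketched in Python as follows. This is not the OCIB or I-OCIB algorithm: the one-class model is replaced by a plain centroid of the bootstrap sample, candidates are scored by Euclidean distance, the sketch aggregates over bootstrap models only (not over parameter settings), and aggregation by summed rank is an assumption. The function name and data are hypothetical.

```python
import math
import random

def bagged_ranking(positives, candidates, n_models=5, seed=0):
    """Rank candidates by summed rank over one-class models fit on bootstrap samples."""
    rng = random.Random(seed)
    rank_sums = [0] * len(candidates)
    for _ in range(n_models):
        # Bootstrap: resample the positive feedback examples with replacement.
        sample = [rng.choice(positives) for _ in positives]
        # Stand-in one-class model: the centroid of the sample.
        center = [sum(col) / len(sample) for col in zip(*sample)]
        order = sorted(range(len(candidates)),
                       key=lambda i: math.dist(candidates[i], center))
        for rank, i in enumerate(order):
            rank_sums[i] += rank
    # A lower summed rank means the candidate was consistently closer.
    return sorted(range(len(candidates)), key=lambda i: rank_sums[i])

# Feature vectors of images the user marked as relevant, and images to rank.
positives = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
candidates = [(5.0, 5.0), (0.05, 0.05), (2.0, 2.0)]
ranking = bagged_ranking(positives, candidates)
```

Aggregating ranks instead of raw scores avoids having to fix a single acceptance radius, which is the limitation the abstract attributes to the original method.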

Relevance: 30.00%

Abstract:

Ensemble learning, which combines the decisions of multiple weak classifiers to form an output, has recently emerged as an effective identification method. This paper presents a road-sign identification system based upon the ensemble learning approach. The system assigns the regions of interest extracted from the scene to the road-sign groups that they belong to. A large road-sign image dataset is formed and used to train and test the system. Fifteen groups of road signs are chosen for identification. Five experiments are performed and the results are presented and discussed.

Relevance: 30.00%

Abstract:

This paper presents a triple-random ensemble learning method for handling multi-label classification problems. The proposed method integrates and develops the concepts of the random subspace, bagging and random k-label sets ensemble learning methods to form an approach for classifying multi-label data. It applies the random subspace method to the feature space, the label space and the instance space. The devised subset selection procedure is executed iteratively. Each multi-label classifier is trained using the randomly selected subsets. At the end of the iteration, optimal parameters are selected and the ensemble of MLC classifiers is constructed. The proposed method is implemented and its performance compared against that of popular multi-label classification methods. The experimental results reveal that the proposed method outperforms the examined counterparts on most occasions when tested on six small to larger multi-label datasets from different domains. This demonstrates that the developed method possesses general applicability for various multi-label classification problems.
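
The three random selections can be sketched in Python as below. The sampling choices are assumptions made for illustration (a bootstrap over instances, a fixed fraction of features without replacement, and k labels without replacement); the abstract does not give the actual subset sizes or the parameter selection procedure, and the function name is hypothetical.

```python
import random

def triple_random_subsets(n_instances, n_features, n_labels, k_labels,
                          feature_frac=0.5, seed=0):
    """Draw one random subset each of instance, feature and label indices."""
    rng = random.Random(seed)
    # Instance space: bagging-style bootstrap sample (with replacement).
    instances = [rng.randrange(n_instances) for _ in range(n_instances)]
    # Feature space: random subspace (without replacement).
    n_kept = max(1, int(feature_frac * n_features))
    features = sorted(rng.sample(range(n_features), n_kept))
    # Label space: a random k-label set (without replacement).
    labels = sorted(rng.sample(range(n_labels), k_labels))
    return instances, features, labels

# One ensemble member would be trained on these indices; an ensemble repeats
# the draw with a different seed for each member.
inst, feats, labs = triple_random_subsets(100, 20, 6, k_labels=3)
```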

Relevance: 30.00%

Abstract:

Background: Feature selection techniques are critical to the analysis of high-dimensional datasets. This is especially true in gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. In addition to essential objectives such as reducing data noise, reducing data redundancy, improving sample classification accuracy and improving model generalization, feature selection also helps biologists to focus on the selected genes to further validate their biological hypotheses.
Results: In this paper we describe an improved hybrid system for gene selection. It is based on a recently proposed genetic ensemble (GE) system. To enhance the generalization property of the selected genes or gene subsets and to overcome the overfitting problem of the GE system, we devised a mapping strategy to fuse the goodness information of each gene provided by multiple filtering algorithms. This information is then used for the initialization and mutation operations of the genetic ensemble system.
Conclusion: We used four benchmark microarray datasets (including both binary-class and multi-class classification problems) for proof of concept and model evaluation. The experimental results indicate that the proposed multi-filter enhanced genetic ensemble (MF-GE) system is able to improve sample classification accuracy, generate more compact gene subsets, and converge to the selection results more quickly. The MF-GE system is very flexible, as various combinations of multiple filters and classifiers can be incorporated based on the data characteristics and the user preferences.

Relevance: 30.00%

Abstract:

In named entity recognition (NER) for biomedical literature, approaches based on combined classifiers have demonstrated great performance improvements compared to a single (best) classifier. This is mainly due to a sufficient level of diversity among the classifiers, which is a selective property of the classifier set. Given a large number of classifiers, how to select the classifiers to put into a classifier ensemble is a crucial issue in multiple classifier-ensemble design. With this observation in mind, we propose a generic genetic classifier-ensemble method for classifier selection in biomedical NER. Various diversity measures and majority voting are considered, and disjoint feature subsets are selected to construct individual classifiers. A basic type of individual classifier, the Support Vector Machine (SVM), is adopted to form an SVM-classifier committee. A multi-objective genetic algorithm (GA) is employed as the classifier selector to help the ensemble classifier improve the overall sample classification accuracy. The proposed approach is tested on a benchmark dataset, the GENIA version 3.02 corpus, and compared with both the individual best SVM classifier and the SVM-classifier ensemble algorithm, as well as other machine learning methods such as CRF, HMM and MEMM. The results show that the proposed approach outperforms the other classification algorithms and can be a useful method for the biomedical NER problem.

Relevance: 30.00%

Abstract:

Accurate Named Entity Recognition (NER) is important for knowledge discovery in text mining. This paper proposes an ensemble machine learning approach to recognise Named Entities (NEs) in unstructured and informal medical text. Specifically, Conditional Random Field (CRF) and Maximum Entropy (ME) classifiers are applied individually to the test data set from the i2b2 2010 medication challenge. Each classifier is trained using a different set of features. The first set focuses on the contextual features of the data, while the second concentrates on the linguistic features of each word. The results of the two classifiers are then combined. The proposed approach achieves an f-score of 81.8%, a considerable improvement over the CRF and ME classifiers individually, which achieve f-scores of 76% and 66.3%, respectively, on the same data set.

Relevance: 30.00%

Abstract:

This paper investigates the problem of minimizing data transfer between different data centers of the cloud during the neurological diagnostics of cardiac autonomic neuropathy (CAN). This problem has never been considered in the literature before. All classifiers previously considered for the diagnostics of CAN assume complete access to all data, which would lead to an enormous burden of data transfer during training if such classifiers were deployed in the cloud. We introduce a new model of clustering-based multi-layer distributed ensembles (CBMLDE). It is designed to eliminate the need to transfer data between different data centers for training of the classifiers. We conducted experiments utilizing a dataset derived from the extensive DiScRi database. Our comprehensive tests have determined the best combinations of options for setting up CBMLDE classifiers. The results demonstrate that CBMLDE classifiers not only completely eliminate the need for patient data transfer, but also significantly outperform all base classifiers and simpler counterpart models in all cloud frameworks.