911 resultados para classifier combination
Resumo:
Peer-reviewed
Resumo:
In this paper we present a multi-stage classifier for magnetic resonance spectra of human brain tumours which is being developed as part of a decision support system for radiologists. The basic idea is to decompose a complex classification scheme into a sequence of classifiers, each specialising in different classes of tumours and trying to reproducepart of the WHO classification hierarchy. Each stage uses a particular set of classification features, which are selected using a combination of classical statistical analysis, splitting performance and previous knowledge.Classifiers with different behaviour are combined using a simple voting scheme in order to extract different error patterns: LDA, decision trees and the k-NN classifier. A special label named "unknown¿ is used when the outcomes of the different classifiers disagree. Cascading is alsoused to incorporate class distances computed using LDA into decision trees. Both cascading and voting are effective tools to improve classification accuracy. Experiments also show that it is possible to extract useful information from the classification process itself in order to helpusers (clinicians and radiologists) to make more accurate predictions and reduce the number of possible classification mistakes.
Resumo:
In this paper we present a novel approach for multispectral image contextual classification by combining iterative combinatorial optimization algorithms. The pixel-wise decision rule is defined using a Bayesian approach to combine two MRF models: a Gaussian Markov Random Field (GMRF) for the observations (likelihood) and a Potts model for the a priori knowledge, to regularize the solution in the presence of noisy data. Hence, the classification problem is stated according to a Maximum a Posteriori (MAP) framework. In order to approximate the MAP solution we apply several combinatorial optimization methods using multiple simultaneous initializations, making the solution less sensitive to the initial conditions and reducing both computational cost and time in comparison to Simulated Annealing, often unfeasible in many real image processing applications. Markov Random Field model parameters are estimated by Maximum Pseudo-Likelihood (MPL) approach, avoiding manual adjustments in the choice of the regularization parameters. Asymptotic evaluations assess the accuracy of the proposed parameter estimation procedure. To test and evaluate the proposed classification method, we adopt metrics for quantitative performance assessment (Cohen`s Kappa coefficient), allowing a robust and accurate statistical analysis. The obtained results clearly show that combining sub-optimal contextual algorithms significantly improves the classification performance, indicating the effectiveness of the proposed methodology. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
A organização automática de mensagens de correio electrónico é um desafio actual na área da aprendizagem automática. O número excessivo de mensagens afecta cada vez mais utilizadores, especialmente os que usam o correio electrónico como ferramenta de comunicação e trabalho. Esta tese aborda o problema da organização automática de mensagens de correio electrónico propondo uma solução que tem como objectivo a etiquetagem automática de mensagens. A etiquetagem automática é feita com recurso às pastas de correio electrónico anteriormente criadas pelos utilizadores, tratando-as como etiquetas, e à sugestão de múltiplas etiquetas para cada mensagem (top-N). São estudadas várias técnicas de aprendizagem e os vários campos que compõe uma mensagem de correio electrónico são analisados de forma a determinar a sua adequação como elementos de classificação. O foco deste trabalho recai sobre os campos textuais (o assunto e o corpo das mensagens), estudando-se diferentes formas de representação, selecção de características e algoritmos de classificação. É ainda efectuada a avaliação dos campos de participantes através de algoritmos de classificação que os representam usando o modelo vectorial ou como um grafo. Os vários campos são combinados para classificação utilizando a técnica de combinação de classificadores Votação por Maioria. Os testes são efectuados com um subconjunto de mensagens de correio electrónico da Enron e um conjunto de dados privados disponibilizados pelo Institute for Systems and Technologies of Information, Control and Communication (INSTICC). Estes conjuntos são analisados de forma a perceber as características dos dados. A avaliação do sistema é realizada através da percentagem de acerto dos classificadores. Os resultados obtidos apresentam melhorias significativas em comparação com os trabalhos relacionados.
Resumo:
In systems that combine the outputs of classification methods (combination systems), such as ensembles and multi-agent systems, one of the main constraints is that the base components (classifiers or agents) should be diverse among themselves. In other words, there is clearly no accuracy gain in a system that is composed of a set of identical base components. One way of increasing diversity is through the use of feature selection or data distribution methods in combination systems. In this work, an investigation of the impact of using data distribution methods among the components of combination systems will be performed. In this investigation, different methods of data distribution will be used and an analysis of the combination systems, using several different configurations, will be performed. As a result of this analysis, it is aimed to detect which combination systems are more suitable to use feature distribution among the components
Resumo:
Multi-classifier systems, also known as ensembles, have been widely used to solve several problems, because they, often, present better performance than the individual classifiers that form these systems. But, in order to do so, it s necessary that the base classifiers to be as accurate as diverse among themselves this is also known as diversity/accuracy dilemma. Given its importance, some works have investigate the ensembles behavior in context of this dilemma. However, the majority of them address homogenous ensemble, i.e., ensembles composed only of the same type of classifiers. Thus, motivated by this limitation, this thesis, using genetic algorithms, performs a detailed study on the dilemma diversity/accuracy for heterogeneous ensembles
Resumo:
Committees of classifiers may be used to improve the accuracy of classification systems, in other words, different classifiers used to solve the same problem can be combined for creating a system of greater accuracy, called committees of classifiers. To that this to succeed is necessary that the classifiers make mistakes on different objects of the problem so that the errors of a classifier are ignored by the others correct classifiers when applying the method of combination of the committee. The characteristic of classifiers of err on different objects is called diversity. However, most measures of diversity could not describe this importance. Recently, were proposed two measures of the diversity (good and bad diversity) with the aim of helping to generate more accurate committees. This paper performs an experimental analysis of these measures applied directly on the building of the committees of classifiers. The method of construction adopted is modeled as a search problem by the set of characteristics of the databases of the problem and the best set of committee members in order to find the committee of classifiers to produce the most accurate classification. This problem is solved by metaheuristic optimization techniques, in their mono and multi-objective versions. Analyzes are performed to verify if use or add the measures of good diversity and bad diversity in the optimization objectives creates more accurate committees. Thus, the contribution of this study is to determine whether the measures of good diversity and bad diversity can be used in mono-objective and multi-objective optimization techniques as optimization objectives for building committees of classifiers more accurate than those built by the same process, but using only the accuracy classification as objective of optimization
Resumo:
The combination of the synthetic minority oversampling technique (SMOTE) and the radial basis function (RBF) classifier is proposed to deal with classification for imbalanced two-class data. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, the SMOTE is applied to generate synthetic instances for the positive class to balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier structure and the parameters of RBF kernels are determined using a particle swarm optimization algorithm based on the criterion of minimizing the leave-one-out misclassification rate. The experimental results on both simulated and real imbalanced data sets are presented to demonstrate the effectiveness of our proposed algorithm.
Resumo:
The Optimum-Path Forest (OPF) classifier is a recent and promising method for pattern recognition, with a fast training algorithm and good accuracy results. Therefore, the investigation of a combining method for this kind of classifier can be important for many applications. In this paper we report a fast method to combine OPF-based classifiers trained with disjoint training subsets. Given a fixed number of subsets, the algorithm chooses random samples, without replacement, from the original training set. Each subset accuracy is improved by a learning procedure. The final decision is given by majority vote. Experiments with simulated and real data sets showed that the proposed combining method is more efficient and effective than naive approach provided some conditions. It was also showed that OPF training step runs faster for a series of small subsets than for the whole training set. The combining scheme was also designed to support parallel or distributed processing, speeding up the procedure even more. © 2011 Springer-Verlag.
Resumo:
In the last decade, local image features have been widely used in robot visual localization. In order to assess image similarity, a strategy exploiting these features compares raw descriptors extracted from the current image with those in the models of places. This paper addresses the ensuing step in this process, where a combining function must be used to aggregate results and assign each place a score. Casting the problem in the multiple classifier systems framework, in this paper we compare several candidate combiners with respect to their performance in the visual localization task. For this evaluation, we selected the most popular methods in the class of non-trained combiners, namely the sum rule and product rule. A deeper insight into the potential of these combiners is provided through a discriminativity analysis involving the algebraic rules and two extensions of these methods: the threshold, as well as the weighted modifications. In addition, a voting method, previously used in robot visual localization, is assessed. Furthermore, we address the process of constructing a model of the environment by describing how the model granularity impacts upon performance. All combiners are tested on a visual localization task, carried out on a public dataset. It is experimentally demonstrated that the sum rule extensions globally achieve the best performance, confirming the general agreement on the robustness of this rule in other classification problems. The voting method, whilst competitive with the product rule in its standard form, is shown to be outperformed by its modified versions.
Resumo:
Several real problems involve the classification of data into categories or classes. Given a data set containing data whose classes are known, Machine Learning algorithms can be employed for the induction of a classifier able to predict the class of new data from the same domain, performing the desired discrimination. Some learning techniques are originally conceived for the solution of problems with only two classes, also named binary classification problems. However, many problems require the discrimination of examples into more than two categories or classes. This paper presents a survey on the main strategies for the generalization of binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.
Resumo:
The efficiency in image classification tasks can be improved using combined information provided by several sources, such as shape, color, and texture visual properties. Although many works proposed to combine different feature vectors, we model the descriptor combination as an optimization problem to be addressed by evolutionary-based techniques, which compute distances between samples that maximize their separability in the feature space. The robustness of the proposed technique is assessed by the Optimum-Path Forest classifier. Experiments showed that the proposed methodology can outperform individual information provided by single descriptors in well-known public datasets. © 2012 IEEE.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Low cost RGB-D cameras such as the Microsoft’s Kinect or the Asus’s Xtion Pro are completely changing the computer vision world, as they are being successfully used in several applications and research areas. Depth data are particularly attractive and suitable for applications based on moving objects detection through foreground/background segmentation approaches; the RGB-D applications proposed in literature employ, in general, state of the art foreground/background segmentation techniques based on the depth information without taking into account the color information. The novel approach that we propose is based on a combination of classifiers that allows improving background subtraction accuracy with respect to state of the art algorithms by jointly considering color and depth data. In particular, the combination of classifiers is based on a weighted average that allows to adaptively modifying the support of each classifier in the ensemble by considering foreground detections in the previous frames and the depth and color edges. In this way, it is possible to reduce false detections due to critical issues that can not be tackled by the individual classifiers such as: shadows and illumination changes, color and depth camouflage, moved background objects and noisy depth measurements. Moreover, we propose, for the best of the author’s knowledge, the first publicly available RGB-D benchmark dataset with hand-labeled ground truth of several challenging scenarios to test background/foreground segmentation algorithms.