855 resultados para optimal feature selection
Resumo:
Background: The electroencephalogram (EEG) may be described by a large number of different feature types and automated feature selection methods are needed in order to reliably identify features which correlate with continuous independent variables. New method: A method is presented for the automated identification of features that differentiate two or more groups inneurologicaldatasets basedupona spectraldecompositionofthe feature set. Furthermore, the method is able to identify features that relate to continuous independent variables. Results: The proposed method is first evaluated on synthetic EEG datasets and observed to reliably identify the correct features. The method is then applied to EEG recorded during a music listening task and is observed to automatically identify neural correlates of music tempo changes similar to neural correlates identified in a previous study. Finally,the method is applied to identify neural correlates of music-induced affective states. The identified neural correlates reside primarily over the frontal cortex and are consistent with widely reported neural correlates of emotions. Comparison with existing methods: The proposed method is compared to the state-of-the-art methods of canonical correlation analysis and common spatial patterns, in order to identify features differentiating synthetic event-related potentials of different amplitudes and is observed to exhibit greater performance as the number of unique groups in the dataset increases. Conclusions: The proposed method is able to identify neural correlates of continuous variables in EEG datasets and is shown to outperform canonical correlation analysis and common spatial patterns.
Resumo:
Parkinson is a neurodegenerative disease, in which tremor is the main symptom. This paper investigates the use of different classification methods to identify tremors experienced by Parkinsonian patients.Some previous research has focussed tremor analysis on external body signals (e.g., electromyography, accelerometer signals, etc.). Our advantage is that we have access to sub-cortical data, which facilitates the applicability of the obtained results into real medical devices since we are dealing with brain signals directly. Local field potentials (LFP) were recorded in the subthalamic nucleus of 7 Parkinsonian patients through the implanted electrodes of a deep brain stimulation (DBS) device prior to its internalization. Measured LFP signals were preprocessed by means of splinting, down sampling, filtering, normalization and rec-tification. Then, feature extraction was conducted through a multi-level decomposition via a wavelettrans form. Finally, artificial intelligence techniques were applied to feature selection, clustering of tremor types, and tremor detection.The key contribution of this paper is to present initial results which indicate, to a high degree of certainty, that there appear to be two distinct subgroups of patients within the group-1 of patients according to the Consensus Statement of the Movement Disorder Society on Tremor. Such results may well lead to different resultant treatments for the patients involved, depending on how their tremor has been classified. Moreover, we propose a new approach for demand driven stimulation, in which tremor detection is also based on the subtype of tremor the patient has. Applying this knowledge to the tremor detection problem, it can be concluded that the results improve when patient clustering is applied prior to detection.
Resumo:
The design of translation invariant and locally defined binary image operators over large windows is made difficult by decreased statistical precision and increased training time. We present a complete framework for the application of stacked design, a recently proposed technique to create two-stage operators that circumvents that difficulty. We propose a novel algorithm, based on Information Theory, to find groups of pixels that should be used together to predict the Output Value. We employ this algorithm to automate the process of creating a set of first-level operators that are later combined in a global operator. We also propose a principled way to guide this combination, by using feature selection and model comparison. Experimental results Show that the proposed framework leads to better results than single stage design. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
In a global economy, manufacturers mainly compete with cost efficiency of production, as the price of raw materials are similar worldwide. Heavy industry has two big issues to deal with. On the one hand there is lots of data which needs to be analyzed in an effective manner, and on the other hand making big improvements via investments in cooperate structure or new machinery is neither economically nor physically viable. Machine learning offers a promising way for manufacturers to address both these problems as they are in an excellent position to employ learning techniques with their massive resource of historical production data. However, choosing modelling a strategy in this setting is far from trivial and this is the objective of this article. The article investigates characteristics of the most popular classifiers used in industry today. Support Vector Machines, Multilayer Perceptron, Decision Trees, Random Forests, and the meta-algorithms Bagging and Boosting are mainly investigated in this work. Lessons from real-world implementations of these learners are also provided together with future directions when different learners are expected to perform well. The importance of feature selection and relevant selection methods in an industrial setting are further investigated. Performance metrics have also been discussed for the sake of completion.
Resumo:
Visual attention is a very important task in autonomous robotics, but, because of its complexity, the processing time required is significant. We propose an architecture for feature selection using foveated images that is guided by visual attention tasks and that reduces the processing time required to perform these tasks. Our system can be applied in bottom-up or top-down visual attention. The foveated model determines which scales are to be used on the feature extraction algorithm. The system is able to discard features that are not extremely necessary for the tasks, thus, reducing the processing time. If the fovea is correctly placed, then it is possible to reduce the processing time without compromising the quality of the tasks outputs. The distance of the fovea from the object is also analyzed. If the visual system loses the tracking in top-down attention, basic strategies of fovea placement can be applied. Experiments have shown that it is possible to reduce up to 60% the processing time with this approach. To validate the method, we tested it with the feature algorithm known as Speeded Up Robust Features (SURF), one of the most efficient approaches for feature extraction. With the proposed architecture, we can accomplish real time requirements of robotics vision, mainly to be applied in autonomous robotics
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Traditional applications of feature selection in areas such as data mining, machine learning and pattern recognition aim to improve the accuracy and to reduce the computational cost of the model. It is done through the removal of redundant, irrelevant or noisy data, finding a representative subset of data that reduces its dimensionality without loss of performance. With the development of research in ensemble of classifiers and the verification that this type of model has better performance than the individual models, if the base classifiers are diverse, comes a new field of application to the research of feature selection. In this new field, it is desired to find diverse subsets of features for the construction of base classifiers for the ensemble systems. This work proposes an approach that maximizes the diversity of the ensembles by selecting subsets of features using a model independent of the learning algorithm and with low computational cost. This is done using bio-inspired metaheuristics with evaluation filter-based criteria
Resumo:
In systems that combine the outputs of classification methods (combination systems), such as ensembles and multi-agent systems, one of the main constraints is that the base components (classifiers or agents) should be diverse among themselves. In other words, there is clearly no accuracy gain in a system that is composed of a set of identical base components. One way of increasing diversity is through the use of feature selection or data distribution methods in combination systems. In this work, an investigation of the impact of using data distribution methods among the components of combination systems will be performed. In this investigation, different methods of data distribution will be used and an analysis of the combination systems, using several different configurations, will be performed. As a result of this analysis, it is aimed to detect which combination systems are more suitable to use feature distribution among the components
Resumo:
The objective of the researches in artificial intelligence is to qualify the computer to execute functions that are performed by humans using knowledge and reasoning. This work was developed in the area of machine learning, that it s the study branch of artificial intelligence, being related to the project and development of algorithms and techniques capable to allow the computational learning. The objective of this work is analyzing a feature selection method for ensemble systems. The proposed method is inserted into the filter approach of feature selection method, it s using the variance and Spearman correlation to rank the feature and using the reward and punishment strategies to measure the feature importance for the identification of the classes. For each ensemble, several different configuration were used, which varied from hybrid (homogeneous) to non-hybrid (heterogeneous) structures of ensemble. They were submitted to five combining methods (voting, sum, sum weight, multiLayer Perceptron and naïve Bayes) which were applied in six distinct database (real and artificial). The classifiers applied during the experiments were k- nearest neighbor, multiLayer Perceptron, naïve Bayes and decision tree. Finally, the performance of ensemble was analyzed comparatively, using none feature selection method, using a filter approach (original) feature selection method and the proposed method. To do this comparison, a statistical test was applied, which demonstrate that there was a significant improvement in the precision of the ensembles
Resumo:
Classifier ensembles are systems composed of a set of individual classifiers and a combination module, which is responsible for providing the final output of the system. In the design of these systems, diversity is considered as one of the main aspects to be taken into account since there is no gain in combining identical classification methods. The ideal situation is a set of individual classifiers with uncorrelated errors. In other words, the individual classifiers should be diverse among themselves. One way of increasing diversity is to provide different datasets (patterns and/or attributes) for the individual classifiers. The diversity is increased because the individual classifiers will perform the same task (classification of the same input patterns) but they will be built using different subsets of patterns and/or attributes. The majority of the papers using feature selection for ensembles address the homogenous structures of ensemble, i.e., ensembles composed only of the same type of classifiers. In this investigation, two approaches of genetic algorithms (single and multi-objective) will be used to guide the distribution of the features among the classifiers in the context of homogenous and heterogeneous ensembles. The experiments will be divided into two phases that use a filter approach of feature selection guided by genetic algorithm
Resumo:
Although non-technical losses automatic identification has been massively studied, the problem of selecting the most representative features in order to boost the identification accuracy has not attracted much attention in this context. In this paper, we focus on this problem applying a novel feature selection algorithm based on Particle Swarm Optimization and Optimum-Path Forest. The results demonstrated that this method can improve the classification accuracy of possible frauds up to 49% in some datasets composed by industrial and commercial profiles. © 2011 IEEE.
Resumo:
Musical genre classification has been paramount in the last years, mainly in large multimedia datasets, in which new songs and genres can be added at every moment by anyone. In this context, we have seen the growing of musical recommendation systems, which can improve the benefits for several applications, such as social networks and collective musical libraries. In this work, we have introduced a recent machine learning technique named Optimum-Path Forest (OPF) for musical genre classification, which has been demonstrated to be similar to the state-of-the-art pattern recognition techniques, but much faster for some applications. Experiments in two public datasets were conducted against Support Vector Machines and a Bayesian classifier to show the validity of our work. In addition, we have executed an experiment using very recent hybrid feature selection techniques based on OPF to speed up feature extraction process. © 2011 International Society for Music Information Retrieval.
Resumo:
Pós-graduação em Engenharia Elétrica - FEIS
Resumo:
Para compor um sistema de Reconhecimento Automático de Voz, pode ser utilizada uma tarefa chamada Classificação Fonética, onde a partir de uma amostra de voz decide-se qual fonema foi emitido por um interlocutor. Para facilitar a classificação e realçar as características mais marcantes dos fonemas, normalmente, as amostras de voz são pré- processadas através de um fronl-en'L Um fron:-end, geralmente, extrai um conjunto de parâmetros para cada amostra de voz. Após este processamento, estes parâmetros são insendos em um algoritmo classificador que (já devidamente treinado) procurará decidir qual o fonema emitido. Existe uma tendência de que quanto maior a quantidade de parâmetros utilizados no sistema, melhor será a taxa de acertos na classificação. A contrapartida para esta tendência é o maior custo computacional envolvido. A técnica de Seleção de Parâmetros tem como função mostrar quais os parâmetros mais relevantes (ou mais utilizados) em uma tarefa de classificação, possibilitando, assim, descobrir quais os parâmetros redundantes, que trazem pouca (ou nenhuma) contribuição à tarefa de classificação. A proposta deste trabalho é aplicar o classificador SVM à classificação fonética, utilizando a base de dados TIMIT, e descobrir os parâmetros mais relevantes na classificação, aplicando a técnica Boosting de Seleção de Parâmetros.
Resumo:
Sistemas de reconhecimento e síntese de voz são constituídos por módulos que dependem da língua e, enquanto existem muitos recursos públicos para alguns idiomas (p.e. Inglês e Japonês), os recursos para Português Brasileiro (PB) ainda são escassos. Outro aspecto é que, para um grande número de tarefas, a taxa de erro dos sistemas de reconhecimento de voz atuais ainda é elevada, quando comparada à obtida por seres humanos. Assim, apesar do sucesso das cadeias escondidas de Markov (HMM), é necessária a pesquisa por novos métodos. Este trabalho tem como motivação esses dois fatos e se divide em duas partes. A primeira descreve o desenvolvimento de recursos e ferramentas livres para reconhecimento e síntese de voz em PB, consistindo de bases de dados de áudio e texto, um dicionário fonético, um conversor grafema-fone, um separador silábico e modelos acústico e de linguagem. Todos os recursos construídos encontram-se publicamente disponíveis e, junto com uma interface de programação proposta, têm sido usados para o desenvolvimento de várias novas aplicações em tempo-real, incluindo um módulo de reconhecimento de voz para a suíte de aplicativos para escritório OpenOffice.org. São apresentados testes de desempenho dos sistemas desenvolvidos. Os recursos aqui produzidos e disponibilizados facilitam a adoção da tecnologia de voz para PB por outros grupos de pesquisa, desenvolvedores e pela indústria. A segunda parte do trabalho apresenta um novo método para reavaliar (rescoring) o resultado do reconhecimento baseado em HMMs, o qual é organizado em uma estrutura de dados do tipo lattice. Mais especificamente, o sistema utiliza classificadores discriminativos que buscam diminuir a confusão entre pares de fones. Para cada um desses problemas binários, são usadas técnicas de seleção automática de parâmetros para escolher a representaçãao paramétrica mais adequada para o problema em questão.