571 results for classifiers
Abstract:
Sentiment analysis over Twitter offers organisations a fast and effective way to monitor the public's feelings towards their brand, business, directors, etc. A wide range of features and methods for training sentiment classifiers for Twitter datasets have been researched in recent years, with varying results. In this paper, we introduce a novel approach of adding semantics as additional features into the training set for sentiment analysis. For each entity extracted from tweets (e.g. iPhone), we add its semantic concept (e.g. Apple product) as an additional feature, and measure the correlation of the representative concept with negative/positive sentiment. We apply this approach to predict sentiment for three different Twitter datasets. Our results show an average increase in F-measure (the harmonic mean of precision and recall) for identifying both negative and positive sentiment of around 6.5% and 4.8% over the unigram and part-of-speech baselines, respectively. We also compare against an approach based on sentiment-bearing topic analysis, and find that semantic features produce better recall and F-measure when classifying negative sentiment, and better precision but lower recall and F-measure in positive sentiment classification.
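The feature-augmentation idea can be sketched as follows. The sketch assumes a hypothetical, hard-coded entity-to-concept table (`ENTITY_CONCEPTS`); in practice that mapping would come from an entity extraction service. The sketch also replaces the paper's correlation measure with a plain "fraction of positive tweets per concept", which is an illustrative simplification.

```python
from collections import Counter

# Hypothetical entity -> semantic concept lookup (an assumption for illustration;
# a real system would query an entity extraction / concept tagging service).
ENTITY_CONCEPTS = {
    "iphone": "apple_product",
    "ipad": "apple_product",
    "obama": "person",
}

def unigram_features(tweet: str) -> Counter:
    """Baseline bag-of-words unigram counts."""
    return Counter(tweet.lower().split())

def semantic_features(tweet: str) -> Counter:
    """Unigrams plus one extra CONCEPT=... feature per recognised entity."""
    feats = unigram_features(tweet)
    for token in tweet.lower().split():
        concept = ENTITY_CONCEPTS.get(token)
        if concept is not None:
            feats["CONCEPT=" + concept] += 1
    return feats

def concept_polarity(labelled):
    """Fraction of positive tweets among those mentioning each concept
    (a simple stand-in for a concept/sentiment correlation measure)."""
    pos, total = Counter(), Counter()
    for tweet, label in labelled:
        concepts = {f for f in semantic_features(tweet) if f.startswith("CONCEPT=")}
        for c in concepts:
            total[c] += 1
            if label == "positive":
                pos[c] += 1
    return {c: pos[c] / total[c] for c in total}

print(semantic_features("Love my new iPhone"))
```

Because both "iPhone" and "iPad" map to the same concept, the classifier can share sentiment evidence between entities that would otherwise be unrelated unigrams.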
Abstract:
There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment. A wide range of speech signal processing algorithms (dysphonia measures) that aim to predict PD symptom severity from speech signals have recently been introduced. In this paper, we test how accurately these novel algorithms can discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. We then select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map these feature subsets to a binary classification response using two statistical classifiers: random forests and support vector machines. Using an existing database of 263 samples from 43 subjects, we demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy with only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the classifiers' ability to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD.
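The abstract does not name its four feature selection algorithms. The sketch below uses a generic filter method instead: each feature is ranked by the absolute Pearson correlation between its values and the binary PD/control label, and the top k are kept. This is an illustrative stand-in, not one of the algorithms used in the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def select_top_k(X, y, k):
    """Indices of the k features (columns of X) most correlated with label y."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])

# Toy data: feature 0 tracks the label, feature 1 is constant, feature 2 is noise.
X = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, 1))
```

A filter like this ignores redundancy between features. That is one reason wrapper or redundancy-aware selectors are often preferred when a very small ("parsimonious") subset is the goal.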
Abstract:
Objectives: Recently, pattern recognition approaches have been used to classify patterns of brain activity elicited by sensory or cognitive processes. In the clinical context, these approaches have mainly been applied to classify groups of individuals based on structural magnetic resonance imaging (MRI) data. Only a few studies have applied similar methods to functional MRI (fMRI) data. Methods: We used a novel analytic framework to examine the extent to which unipolar and bipolar depressed individuals differed in how well their patterns of neural activity for happy and neutral faces could be discriminated. We used data from 18 currently depressed individuals with bipolar I disorder (BD) and 18 currently depressed individuals with recurrent unipolar depression (UD), matched on depression severity, age, and illness duration, and 18 age- and gender ratio-matched healthy comparison subjects (HC). fMRI data were analyzed using a general linear model and Gaussian process classifiers. Results: The overall accuracy for discriminating between patterns of neural activity for happy versus neutral faces was lower in both patient groups than in HC. The predictive probabilities for intense and mild happy faces were higher in HC than in BD, and those for mild happy faces were higher in HC than in UD (all p < 0.001). Interestingly, the predictive probability for intense happy faces was significantly higher in UD than in BD (p = 0.03). Conclusions: These results indicate that patterns of whole-brain neural activity to intense happy faces were significantly less distinct from those for neutral faces in BD than in either HC or UD. These findings suggest that pattern recognition approaches can be used to identify abnormal brain activity patterns in patient populations, and that they are promising clinical tools for discriminating between patients with different psychiatric illnesses.
Abstract:
Background - Bipolar disorder (BD) is one of the leading causes of disability worldwide. Patients are further disadvantaged by diagnostic delays of between 5 and 10 years. We applied Gaussian process classifiers (GPCs) to structural magnetic resonance imaging (sMRI) data to evaluate the feasibility of using pattern recognition techniques for the diagnostic classification of patients with BD. Method - GPCs were applied to gray matter (GM) and white matter (WM) sMRI data derived from two independent samples of patients with BD (cohort 1: n = 26; cohort 2: n = 14). Within each cohort, patients were matched on age, sex and IQ to an equal number of healthy controls. Results - The diagnostic accuracy of the GPC for GM was 73% in cohort 1 and 72% in cohort 2; the sensitivity and specificity of the GM classification were respectively 69% and 77% in cohort 1 and 64% and 79% in cohort 2. The diagnostic accuracy of the GPC for WM was 69% in cohort 1 and 78% in cohort 2; the sensitivity and specificity of the WM classification were both 69% in cohort 1, and 71% and 86% respectively in cohort 2. In both samples, GM and WM clusters discriminating between patients and controls were localized within cortical and subcortical structures implicated in BD. Conclusions - Our results demonstrate the predictive value of neuroanatomical data in discriminating patients with BD from healthy individuals. The overlap between discriminative networks and regions implicated in the pathophysiology of BD supports the biological plausibility of the classifiers.
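Sensitivity, specificity and accuracy all come from the same confusion matrix. The abstract reports only percentages, so the counts below are an assumption: they were chosen to be consistent with the cohort 1 grey-matter figures (26 patients and 26 matched controls).

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: patients classified correctly/incorrectly
    tn/fp: controls classified correctly/incorrectly
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Illustrative counts (assumed, not taken from the paper) for 26 patients + 26 controls.
sens, spec, acc = diagnostic_metrics(tp=18, fn=8, tn=20, fp=6)
print(f"{sens:.0%} {spec:.0%} {acc:.0%}")  # 69% 77% 73%
```

With balanced groups, accuracy is simply the mean of sensitivity and specificity. This gives a quick consistency check on reported triples.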
Abstract:
Combining the results of classifiers has shown much promise in machine learning generally. However, published work on combining text categorizers suggests that, for this particular application, improvements in performance are hard to attain. Exploratory research using a simple voting system is presented and discussed in the light of a probabilistic model that was originally developed for safety-critical software. It was found that typical categorization approaches produce predictions that are too similar for their combination to be effective, since they tend to fail on the same records. Further experiments with two less orthodox categorizers are also presented; these suggest that combining text categorizers can be successful, provided the essential element of ‘difference’ is considered.
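A simple voting combiner and a crude measure of "failing on the same records" can be sketched as follows. The coincident-failure rate is only an empirical proxy for the diversity the abstract discusses. It is not the safety-critical-software probabilistic model itself, and the labels below are toy data.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one label list per classifier; returns the combined labels.
    Ties go to the label seen first (Counter.most_common ordering)."""
    combined = []
    for labels in zip(*predictions):
        label, _ = Counter(labels).most_common(1)[0]
        combined.append(label)
    return combined

def coincident_failure_rate(predictions, truth):
    """Fraction of records on which every classifier is wrong."""
    all_wrong = sum(
        all(p != t for p in labels)
        for labels, t in zip(zip(*predictions), truth)
    )
    return all_wrong / len(truth)

truth = ["a", "b", "a", "b"]
c1 = ["a", "b", "b", "b"]
c2 = ["a", "a", "b", "b"]
c3 = ["a", "b", "a", "a"]
print(majority_vote([c1, c2, c3]))
print(coincident_failure_rate([c1, c2, c3], truth))
```

If all classifiers err on the same records, no voting scheme can recover those records. A low coincident-failure rate is therefore a precondition for combination to help.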
Abstract:
MOTIVATION: G protein-coupled receptors (GPCRs) play an important role in many physiological systems by transducing an extracellular signal into an intracellular response. Over 50% of all marketed drugs are targeted towards a GPCR. There is considerable interest in developing an algorithm that could effectively predict the function of a GPCR from its primary sequence. Such an algorithm is useful not only in identifying novel GPCR sequences but also in characterizing the interrelationships between known GPCRs. RESULTS: An alignment-free approach to GPCR classification has been developed using techniques drawn from data mining and proteochemometrics. A dataset of over 8000 sequences was constructed to train the algorithm; this is one of the largest GPCR datasets currently available. A predictive algorithm was developed based on the simplest reasonable numerical representation of the protein's physicochemical properties. A selective top-down approach was developed, which uses a hierarchical classifier to assign sequences to subdivisions within the GPCR hierarchy. The predictive performance of the algorithm was assessed against several standard data mining classifiers and further validated against Support Vector Machine-based GPCR prediction servers. The selective top-down approach achieves significantly higher accuracy than standard data mining methods in almost all cases.
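A selective top-down classifier can be sketched under some assumptions. Each internal node of the hierarchy has several candidate local classifiers, and the one with the best validation accuracy at that node is kept. Prediction then walks from the root to a leaf. The toy hierarchy, the features and the lambda classifiers are all hypothetical, and the sketch does not attempt the paper's physicochemical sequence encoding.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A node classifier maps a feature vector to the name of a child node.
NodeClassifier = Callable[[Sequence[float]], str]
Labelled = List[Tuple[Sequence[float], str]]

def accuracy(clf: NodeClassifier, data: Labelled) -> float:
    return sum(clf(x) == y for x, y in data) / len(data)

def select_per_node(candidates: Dict[str, List[NodeClassifier]],
                    validation: Dict[str, Labelled]) -> Dict[str, NodeClassifier]:
    """For every internal node, keep the candidate with the best validation accuracy."""
    return {node: max(clfs, key=lambda c: accuracy(c, validation[node]))
            for node, clfs in candidates.items()}

def classify_top_down(x: Sequence[float], selected: Dict[str, NodeClassifier],
                      root: str = "GPCR") -> List[str]:
    """Descend from the root; each node's chosen classifier picks the next child."""
    path: List[str] = []
    node = root
    while node in selected:
        node = selected[node](x)
        path.append(node)
    return path

# Toy two-level hierarchy with hypothetical features and classifiers.
candidates = {
    "GPCR": [lambda x: "ClassB", lambda x: "ClassA" if x[0] > 0 else "ClassB"],
    "ClassA": [lambda x: "Peptide", lambda x: "Amine" if x[1] > 0 else "Peptide"],
}
validation = {
    "GPCR": [([1.0, 0.0], "ClassA"), ([-1.0, 0.0], "ClassB")],
    "ClassA": [([1.0, 1.0], "Amine"), ([1.0, -1.0], "Peptide")],
}
selected = select_per_node(candidates, validation)
print(classify_top_down([1.0, 1.0], selected))
```

The "selective" element lets different levels of the hierarchy use different learners. A known drawback of any top-down scheme is that an error near the root cannot be corrected further down.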
Abstract:
Rotation invariance is important for an iris recognition system, since changes in head orientation and binocular vergence may cause eye rotation. Conventional iris recognition methods cannot achieve true rotation invariance. They achieve only approximate rotation invariance, either by rotating the feature vector before matching or by unwrapping the iris ring at different initial angles. These methods increase computational complexity, and when the rotation exceeds a certain range, their error rates may increase substantially. To solve this problem, this paper proposes a new rotation-invariant approach to iris feature extraction based on non-separable wavelets. First, a bank of non-separable orthogonal wavelet filters is used to capture the characteristics of the iris. Second, a Markov random field method is used to capture rotation-invariant iris features. Finally, two-class kernel Fisher classifiers are adopted for classification. Experimental results on public iris databases show that the proposed approach has a low error rate and achieves true rotation invariance. © 2010.
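The conventional workaround described above, rotating the feature vector before matching, can be illustrated with binary iris codes. The sketch takes the minimum normalized Hamming distance over a limited range of circular shifts. It shows the approximate approach the paper improves upon, not the proposed wavelet/Markov random field method, and the 8-bit codes are toy data.

```python
def hamming(a, b):
    """Normalized Hamming distance between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def shift_invariant_distance(code_a, code_b, max_shift):
    """Smallest Hamming distance over circular shifts of code_b in
    [-max_shift, max_shift]; rotations beyond that range are not compensated."""
    n = len(code_b)
    best = 1.0
    for s in range(-max_shift, max_shift + 1):
        shifted = code_b[s % n:] + code_b[:s % n]
        best = min(best, hamming(code_a, shifted))
    return best

a = [1, 0, 0, 1, 1, 0, 1, 0]
b = a[-2:] + a[:-2]          # the same code, rotated by two positions
print(shift_invariant_distance(a, b, max_shift=2))  # rotation within range
print(shift_invariant_distance(a, b, max_shift=1))  # rotation out of range
```

The second call shows the failure mode mentioned in the abstract: once the actual rotation exceeds the searched range, the distance between two codes from the same eye jumps. Widening the range raises matching cost.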
Abstract:
Today, due to globalization, data sets keep growing in size, which makes knowledge discovery necessary. Knowledge discovery typically takes the form of association rules, classification rules, clustering, discovery of frequent episodes, and deviation detection. Building fast and accurate classifiers for large databases is an important task in data mining. There is growing evidence that integrating classification and association rule mining can produce more accurate classifiers than approaches based on heuristic, greedy search such as decision tree induction. Emerging associative classification algorithms have shown good promise in producing accurate classifiers. In this paper we focus on the performance of associative classification and present a parallel model for classifier building. Some parallel and distributed algorithms have been proposed for decision tree induction, but so far no such work has been reported for associative classification.
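The sketch below shows one way associative classification can be data-parallel. Each data partition is counted independently, as separate workers might do. The counts are then merged, rules that meet minimum support and confidence are kept, and a record takes the label of the highest-ranked matching rule (a CBA-style ranking by confidence, then support, then rule length). It is a sequential simulation under those assumptions, not the paper's parallel model.

```python
from collections import Counter
from itertools import combinations

def count_partition(records, max_len=2):
    """Count (body, class) and body occurrences within one data partition."""
    rule_counts, body_counts = Counter(), Counter()
    for items, label in records:
        for k in range(1, max_len + 1):
            for body in combinations(sorted(items), k):
                body_counts[body] += 1
                rule_counts[(body, label)] += 1
    return rule_counts, body_counts

def mine_rules(partitions, min_support, min_confidence, max_len=2):
    """Merge per-partition counts and keep class association rules."""
    rule_counts, body_counts, n = Counter(), Counter(), 0
    for part in partitions:  # each partition could be counted on its own worker
        r, b = count_partition(part, max_len)
        rule_counts.update(r)
        body_counts.update(b)
        n += len(part)
    rules = []
    for (body, label), count in rule_counts.items():
        support, confidence = count / n, count / body_counts[body]
        if support >= min_support and confidence >= min_confidence:
            rules.append((confidence, support, body, label))
    rules.sort(key=lambda r: (-r[0], -r[1], len(r[2])))
    return rules

def classify(items, rules, default):
    """Label of the first (highest-ranked) rule whose body is contained in items."""
    for _, _, body, label in rules:
        if set(body) <= set(items):
            return label
    return default

part1 = [({"a", "b"}, "yes"), ({"a"}, "yes"), ({"c"}, "no")]
part2 = [({"b", "c"}, "no"), ({"a", "c"}, "yes")]
rules = mine_rules([part1, part2], min_support=0.2, min_confidence=0.6)
print(classify({"a", "c"}, rules, default="no"))
```

Support counting dominates the cost of rule generation, and counts from disjoint partitions can simply be added. That is why partition-and-merge is a natural starting point for parallelizing this step.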
Abstract:
Electrocardiography (ECG) has recently been proposed as a biometric trait for identification purposes. Intra-individual variations of the ECG might affect identification performance. These variations are mainly due to heart rate variability (HRV). In particular, HRV causes changes in the QT intervals along the ECG waveforms. This work aims to analyse the influence of seven QT interval correction methods (based on population models) on the performance of ECG-fiducial-based identification systems. In addition, we also considered the influence of training set size, classifier, classifier ensemble, and the number of consecutive heartbeats in a majority voting scheme. The ECG signals used in this study were collected from thirty-nine subjects in the Physionet open access database. Public domain software was used for fiducial point detection. Results suggested that QT correction is indeed required to improve performance. However, there is no clear choice among the seven QT correction approaches explored (identification rates between 0.97 and 0.99). Multilayer Perceptron and Support Vector Machine classifiers seemed to generalize better, in terms of classification performance, than Decision Tree-based classifiers. No similarly strong influence was observed for the training set size or for the number of consecutive heartbeats in the majority voting scheme.
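Majority voting over consecutive heartbeats turns per-beat identity predictions into one decision per group of beats. The abstract does not specify how beats are grouped, so this sketch assumes non-overlapping windows of `window` beats and discards any incomplete trailing window. The subject IDs are hypothetical.

```python
from collections import Counter

def identify_by_majority(beat_predictions, window):
    """One identity decision per non-overlapping window of consecutive beats.
    Ties go to the identity seen first within the window."""
    decisions = []
    for start in range(0, len(beat_predictions) - window + 1, window):
        votes = Counter(beat_predictions[start:start + window])
        decisions.append(votes.most_common(1)[0][0])
    return decisions

# Hypothetical per-beat classifier outputs for one recording.
preds = ["s1", "s1", "s2", "s3", "s3", "s1", "s2"]
print(identify_by_majority(preds, window=3))
```

Larger windows smooth out occasional misclassified beats but need longer recordings per decision. That trade-off is what the study varied.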
Abstract:
In this report we summarize the state of the art in speech emotion recognition from the signal processing point of view. On the basis of multi-corpus experiments with machine-learning classifiers, we observe that existing supervised machine learning approaches lead to database-dependent classifiers. These classifiers cannot be applied to multi-language speech emotion recognition without additional training, because they discriminate the emotion classes according to the training language used. As experimental results show that humans can perform language-independent categorisation, we drew a parallel between machine recognition and the cognitive process and tried to identify the sources of these divergent results. The analysis suggests that the main difference is that speech perception allows the extraction of language-independent features, even though language-dependent features are incorporated at all levels of the speech signal and play a strongly discriminative role in human perception. Based on several results in related domains, we further suggest that the cognitive process of emotion recognition is based on categorisation, assisted by a hierarchical structure of emotional categories that exists in the cognitive space of all humans. We propose a strategy for developing language-independent machine emotion recognition, based on identifying language-independent speech features and using additional information from visual (expression) features.
Abstract:
This paper addresses the task of learning classifiers from streams of labelled data. In this setting, the underlying concepts may change over time. The paper studies two mechanisms developed for dealing with changing concepts, both based on the time-window idea. The first forgets gradually, by assigning each example a weight that decreases over time. The second uses a statistical test to detect concept changes and then optimizes the size of the time window, aiming to maximise classification accuracy on new examples. Both methods are general and can be used with any learning algorithm. The objectives of the experiments were to compare the mechanisms and to explore whether they can be combined to achieve a synergetic effect. Results from experiments with three basic learning algorithms (kNN, ID3 and NBC) on four datasets are reported and discussed.
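Gradual forgetting can be combined with kNN, one of the three learners mentioned, by weighting each neighbour's vote by the age of the example. The abstract does not give the paper's forgetting function. This sketch assumes an exponential decay in which the weight halves every `half_life` examples.

```python
import math
from collections import defaultdict

def forgetting_weights(n, half_life):
    """Weights for n examples in arrival order (oldest first): the newest
    example gets 1.0 and the weight halves every `half_life` examples back."""
    return [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]

def weighted_knn(stream, query, k, half_life):
    """kNN on a stream of (features, label) pairs, oldest first, where each
    neighbour's vote is scaled by its forgetting weight."""
    weights = forgetting_weights(len(stream), half_life)
    scored = sorted(
        (math.dist(x, query), w, y) for (x, y), w in zip(stream, weights)
    )
    votes = defaultdict(float)
    for _, w, y in scored[:k]:
        votes[y] += w
    return max(votes, key=votes.get)

# The same input is labelled "A" under the old concept and "B" after a drift.
stream = [((0.0,), "A")] * 3 + [((0.0,), "B")] * 2
print(weighted_knn(stream, (0.0,), k=5, half_life=1.0))   # fast forgetting
print(weighted_knn(stream, (0.0,), k=5, half_life=1e9))   # almost no forgetting
```

Fast forgetting adapts quickly after a drift but wastes data when the concept is stable. This motivates the second mechanism, which uses a change test to resize the window only when needed.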
Abstract:
Bayesian algorithms set a limit on the performance that learning algorithms can achieve. Natural selection should guide the evolution of information processing systems towards those limits. What can we learn from this evolution, and what properties do the intermediate stages have? While this question is too general to permit a definitive answer, progress can be made by restricting the class of information processing systems under study. We present analytical and numerical results for the evolution of on-line algorithms for learning from examples in neural network classifiers, with or without a hidden layer. The analytical results are obtained by solving a variational problem to determine the learning algorithm that leads to maximum generalization ability. Simulations using evolutionary programming, for programs that implement learning algorithms, confirm and extend these results. The principal result is not just that evolution proceeds towards the Bayesian limit: the limit is in fact essentially reached. In addition, we find that evolution is driven by the discovery of useful structures, or combinations of variables and operators. Across different runs, the temporal order in which such combinations are discovered is always the same: combinations that signal the surprise brought by an example always arise before combinations that serve to gauge the performance of the learning algorithm. These latter structures can be used to implement annealing schedules. The temporal ordering can also be understood analytically by performing the functional optimization in restricted functional spaces. We also present data suggesting that these traits appear in the same temporal order in biological systems. © 2006 American Institute of Physics.
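The setting of on-line learning from examples can be made concrete with the standard teacher-student perceptron. A student perceptron learns from labels produced by a random teacher. Its generalization error is ε = arccos(R)/π, where R is the normalized overlap between student and teacher weights. The sketch uses the simple Hebbian rule, a baseline far from the Bayes-optimal on-line algorithm studied in the paper. Input dimension, example counts and seeds are arbitrary choices.

```python
import math
import random

def hebbian_generalization_error(n_inputs, n_examples, seed=0):
    """Train a student perceptron on-line with the Hebbian rule on labels from a
    random teacher perceptron; return eps = arccos(R) / pi."""
    rng = random.Random(seed)
    teacher = [rng.gauss(0, 1) for _ in range(n_inputs)]
    student = [0.0] * n_inputs
    for _ in range(n_examples):
        x = [rng.gauss(0, 1) for _ in range(n_inputs)]
        label = 1.0 if sum(t * xi for t, xi in zip(teacher, x)) >= 0 else -1.0
        # Hebbian update: move the student weights along label * input.
        student = [s + label * xi / math.sqrt(n_inputs) for s, xi in zip(student, x)]
    dot = sum(s * t for s, t in zip(student, teacher))
    norm = math.sqrt(sum(s * s for s in student)) * math.sqrt(sum(t * t for t in teacher))
    overlap = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(overlap) / math.pi

print(hebbian_generalization_error(50, 5, seed=1))
print(hebbian_generalization_error(50, 1000, seed=1))
```

The Hebbian rule weights every example equally. Better on-line rules modulate each update by how "surprising" the example is to the current student, which matches the kind of structure the abstract reports evolution discovering first.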
Abstract:
The problem of recognition on a finite set of events is considered. The generalization ability of classifiers for this problem is studied within the Bayesian approach. A method is proposed for specifying a non-uniform prior distribution over recognition tasks; it takes into account the assumed degree of overlap between classes. The results of the analysis are applied to the pruning of classification trees.
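Bayesian reasoning can drive tree pruning in a simple way. The sketch estimates each leaf's misclassification probability as the posterior mean under a symmetric Dirichlet(α) prior over class frequencies. It then collapses a node's children into one leaf whenever that does not increase the expected number of errors. The symmetric prior is a simplifying assumption: the paper's contribution is precisely a non-uniform prior that reflects class overlap, which this sketch does not reproduce.

```python
def expected_error(counts, alpha=1.0):
    """Posterior-mean error of a leaf predicting its majority class, under a
    symmetric Dirichlet(alpha) prior over the class frequencies."""
    n = sum(counts)
    k = len(counts)
    return 1.0 - (max(counts) + alpha) / (n + k * alpha)

def should_prune(children_counts, alpha=1.0):
    """True if merging the children into a single leaf does not increase the
    expected number of misclassified training-size examples."""
    merged = [sum(c) for c in zip(*children_counts)]
    leaf_errors = sum(merged) * expected_error(merged, alpha)
    subtree_errors = sum(sum(c) * expected_error(c, alpha) for c in children_counts)
    return leaf_errors <= subtree_errors

print(should_prune([[5, 0], [4, 1]]))    # split barely helps: prune
print(should_prune([[10, 0], [0, 10]]))  # split separates classes: keep
```

A stronger prior (larger α) penalizes small leaves more heavily and so prunes more aggressively. A non-uniform prior changes this penalty depending on how much the classes are believed to overlap.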
Abstract:
When remote sensing imagery is combined with statistical classifiers to obtain categorical thematic maps, data about the spatial distribution of the error and uncertainty of the resulting maps are not usually provided. This paper describes, in the context of the GeoViQua FP7 project, feasible approaches for classification methods that involve several steps, such as hybrid classifiers. For both “per pixel” and “per polygon” strategies, the proposal relies on the available ground truth, which is used to model the spatial distribution of the errors. The results allow the classification success to be mapped with a very high level of reliability (R² > 0.94), giving users sound knowledge of the accuracy in every area of the map.
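The abstract does not describe the interpolation used to map classification success. The sketch below swaps in inverse distance weighting (IDW), a simple and common spatial interpolator. Ground-truth checks, each recorded as correct or incorrect, are spread over the map to give a local accuracy estimate at any location. The coordinates and the power parameter are illustrative assumptions.

```python
import math

def idw_accuracy(ground_truth, x, y, power=2.0):
    """Local classification-accuracy estimate at (x, y).

    ground_truth: list of (gx, gy, correct) tuples, where `correct` says whether
    the classified map agreed with the reference label at that point.
    """
    num = den = 0.0
    for gx, gy, correct in ground_truth:
        d = math.hypot(x - gx, y - gy)
        if d == 0.0:
            return 1.0 if correct else 0.0  # exactly on a reference point
        w = 1.0 / d ** power
        num += w * (1.0 if correct else 0.0)
        den += w
    return num / den

checks = [(0.0, 0.0, True), (10.0, 0.0, False)]
print(idw_accuracy(checks, 5.0, 0.0))  # midway between a hit and a miss
print(idw_accuracy(checks, 1.0, 0.0))  # close to a correct check
```

IDW ignores class and terrain context. Methods that model error as a function of covariates, such as the classifier's own confidence, can give more reliable accuracy surfaces when ground truth is sparse.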
Abstract:
Identification of humans via ECG is being increasingly studied because it can offer several advantages over traditional biometric identification techniques. However, difficulties arise because of heart rate variability. In this study we analysed the influence of QT interval correction on the performance of an identification system based on temporal and amplitude features of the ECG. In particular, we tested MLP, Naive Bayes and 3-NN classifiers on the Fantasia database. Results indicate that QT correction can significantly improve overall system performance. © 2013 IEEE.
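QT correction rescales the measured QT interval to the value expected at a heart rate of 60 beats/min (RR = 1 s). The four population-model formulas below (Bazett, Fridericia, Framingham, Hodges) are widely used. The abstracts do not list which correction methods were compared, so these are illustrative examples and not necessarily the ones studied. QT is in milliseconds and RR in seconds.

```python
import math

def qtc_bazett(qt_ms, rr_s):
    """Bazett: QTc = QT / sqrt(RR)."""
    return qt_ms / math.sqrt(rr_s)

def qtc_fridericia(qt_ms, rr_s):
    """Fridericia: QTc = QT / RR^(1/3)."""
    return qt_ms / rr_s ** (1.0 / 3.0)

def qtc_framingham(qt_ms, rr_s):
    """Framingham: QTc = QT + 0.154 s * (1 - RR), expressed in ms."""
    return qt_ms + 154.0 * (1.0 - rr_s)

def qtc_hodges(qt_ms, rr_s):
    """Hodges: QTc = QT + 1.75 * (HR - 60), with HR in beats/min."""
    heart_rate = 60.0 / rr_s
    return qt_ms + 1.75 * (heart_rate - 60.0)

# A 400 ms QT measured at 80 beats/min (RR = 0.75 s) under each model.
for fn in (qtc_bazett, qtc_fridericia, qtc_framingham, qtc_hodges):
    print(fn.__name__, round(fn(400.0, 0.75), 1))
```

All four formulas agree at RR = 1 s and diverge as heart rate moves away from 60 beats/min. This is consistent with the finding that correcting QT helps identification while the choice among formulas matters little.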