442 resultados para SVM
Resumo:
Digital collections are growing exponentially in size as the information age takes a firm grip on all aspects of society. As a result Information Retrieval (IR) has become an increasingly important area of research. It promises to provide new and more effective ways for users to find information relevant to their search intentions. Document clustering is one of the many tools in the IR toolbox and is far from being perfected. It groups documents that share common features. This grouping allows a user to quickly identify relevant information. If these groups are misleading then valuable information can accidentally be ignored. There- fore, the study and analysis of the quality of document clustering is important. With more and more digital information available, the performance of these algorithms is also of interest. An algorithm with a time complexity of O(n2) can quickly become impractical when clustering a corpus containing millions of documents. Therefore, the investigation of algorithms and data structures to perform clustering in an efficient manner is vital to its success as an IR tool. Document classification is another tool frequently used in the IR field. It predicts categories of new documents based on an existing database of (doc- ument, category) pairs. Support Vector Machines (SVM) have been found to be effective when classifying text documents. As the algorithms for classifica- tion are both efficient and of high quality, the largest gains can be made from improvements to representation. Document representations are vital for both clustering and classification. Representations exploit the content and structure of documents. Dimensionality reduction can improve the effectiveness of existing representations in terms of quality and run-time performance. Research into these areas is another way to improve the efficiency and quality of clustering and classification results. Evaluating document clustering is a difficult task. Intrinsic measures of quality such as distortion only indicate how well an algorithm minimised a sim- ilarity function in a particular vector space. Intrinsic comparisons are inherently limited by the given representation and are not comparable between different representations. Extrinsic measures of quality compare a clustering solution to a “ground truth” solution. This allows comparison between different approaches. As the “ground truth” is created by humans it can suffer from the fact that not every human interprets a topic in the same manner. Whether a document belongs to a particular topic or not can be subjective.
Resumo:
This paper presents an extended study on the implementation of support vector machine(SVM) based speaker verification in systems that employ continuous progressive model adaptation using the weight-based factor analysis model. The weight-based factor analysis model compensates for session variations in unsupervised scenarios by incorporating trial confidence measures in the general statistics used in the inter-session variability modelling process. Employing weight-based factor analysis in Gaussian mixture models (GMM) was recently found to provide significant performance gains to unsupervised classification. Further improvements in performance were found through the integration of SVM-based classification in the system by means of GMM supervectors. This study focuses particularly on the way in which a client is represented in the SVM kernel space using single and multiple target supervectors. Experimental results indicate that training client SVMs using a single target supervector maximises performance while exhibiting a certain robustness to the inclusion of impostor training data in the model. Furthermore, the inclusion of low-scoring target trials in the adaptation process is investigated where they were found to significantly aid performance.
Resumo:
In this paper, we presented an automatic system for precise urban road model reconstruction based on aerial images with high spatial resolution. The proposed approach consists of two steps: i) road surface detection and ii) road pavement marking extraction. In the first step, support vector machine (SVM) was utilized to classify the images into two categories: road and non-road. In the second step, road lane markings are further extracted on the generated road surface based on 2D Gabor filters. The experiments using several pan-sharpened aerial images of Brisbane, Queensland have validated the proposed method.
Resumo:
The theory of nonlinear dyamic systems provides some new methods to handle complex systems. Chaos theory offers new concepts, algorithms and methods for processing, enhancing and analyzing the measured signals. In recent years, researchers are applying the concepts from this theory to bio-signal analysis. In this work, the complex dynamics of the bio-signals such as electrocardiogram (ECG) and electroencephalogram (EEG) are analyzed using the tools of nonlinear systems theory. In the modern industrialized countries every year several hundred thousands of people die due to sudden cardiac death. The Electrocardiogram (ECG) is an important biosignal representing the sum total of millions of cardiac cell depolarization potentials. It contains important insight into the state of health and nature of the disease afflicting the heart. Heart rate variability (HRV) refers to the regulation of the sinoatrial node, the natural pacemaker of the heart by the sympathetic and parasympathetic branches of the autonomic nervous system. Heart rate variability analysis is an important tool to observe the heart's ability to respond to normal regulatory impulses that affect its rhythm. A computerbased intelligent system for analysis of cardiac states is very useful in diagnostics and disease management. Like many bio-signals, HRV signals are non-linear in nature. Higher order spectral analysis (HOS) is known to be a good tool for the analysis of non-linear systems and provides good noise immunity. In this work, we studied the HOS of the HRV signals of normal heartbeat and four classes of arrhythmia. This thesis presents some general characteristics for each of these classes of HRV signals in the bispectrum and bicoherence plots. Several features were extracted from the HOS and subjected an Analysis of Variance (ANOVA) test. The results are very promising for cardiac arrhythmia classification with a number of features yielding a p-value < 0.02 in the ANOVA test. An automated intelligent system for the identification of cardiac health is very useful in healthcare technology. In this work, seven features were extracted from the heart rate signals using HOS and fed to a support vector machine (SVM) for classification. The performance evaluation protocol in this thesis uses 330 subjects consisting of five different kinds of cardiac disease conditions. The classifier achieved a sensitivity of 90% and a specificity of 89%. This system is ready to run on larger data sets. In EEG analysis, the search for hidden information for identification of seizures has a long history. Epilepsy is a pathological condition characterized by spontaneous and unforeseeable occurrence of seizures, during which the perception or behavior of patients is disturbed. An automatic early detection of the seizure onsets would help the patients and observers to take appropriate precautions. Various methods have been proposed to predict the onset of seizures based on EEG recordings. The use of nonlinear features motivated by the higher order spectra (HOS) has been reported to be a promising approach to differentiate between normal, background (pre-ictal) and epileptic EEG signals. In this work, these features are used to train both a Gaussian mixture model (GMM) classifier and a Support Vector Machine (SVM) classifier. Results show that the classifiers were able to achieve 93.11% and 92.67% classification accuracy, respectively, with selected HOS based features. About 2 hours of EEG recordings from 10 patients were used in this study. This thesis introduces unique bispectrum and bicoherence plots for various cardiac conditions and for normal, background and epileptic EEG signals. These plots reveal distinct patterns. The patterns are useful for visual interpretation by those without a deep understanding of spectral analysis such as medical practitioners. It includes original contributions in extracting features from HRV and EEG signals using HOS and entropy, in analyzing the statistical properties of such features on real data and in automated classification using these features with GMM and SVM classifiers.
Resumo:
The use of appropriate features to characterize an output class or object is critical for all classification problems. This paper evaluates the capability of several spectral and texture features for object-based vegetation classification at the species level using airborne high resolution multispectral imagery. Image-objects as the basic classification unit were generated through image segmentation. Statistical moments extracted from original spectral bands and vegetation index image are used as feature descriptors for image objects (i.e. tree crowns). Several state-of-art texture descriptors such as Gray-Level Co-Occurrence Matrix (GLCM), Local Binary Patterns (LBP) and its extensions are also extracted for comparison purpose. Support Vector Machine (SVM) is employed for classification in the object-feature space. The experimental results showed that incorporating spectral vegetation indices can improve the classification accuracy and obtained better results than in original spectral bands, and using moments of Ratio Vegetation Index obtained the highest average classification accuracy in our experiment. The experiments also indicate that the spectral moment features also outperform or can at least compare with the state-of-art texture descriptors in terms of classification accuracy.
Resumo:
A good object representation or object descriptor is one of the key issues in object based image analysis. To effectively fuse color and texture as a unified descriptor at object level, this paper presents a novel method for feature fusion. Color histogram and the uniform local binary patterns are extracted from arbitrary-shaped image-objects, and kernel principal component analysis (kernel PCA) is employed to find nonlinear relationships of the extracted color and texture features. The maximum likelihood approach is used to estimate the intrinsic dimensionality, which is then used as a criterion for automatic selection of optimal feature set from the fused feature. The proposed method is evaluated using SVM as the benchmark classifier and is applied to object-based vegetation species classification using high spatial resolution aerial imagery. Experimental results demonstrate that great improvement can be achieved by using proposed feature fusion method.
Resumo:
This document outlines the system submitted by the Speech and Audio Research Laboratory at the Queensland University of Technology (QUT) for the Speaker Identity Verification: Application task of EVALITA 2009. This competitive submission consisted of a score-level fusion of three component systems; a joint-factor analysis GMM system and two SVM systems using GLDS and GMM supervector kernels. Development evaluation and post-submission results are presented in this study, demonstrating the effectiveness of this fused system approach. This study highlights the challenges associated with system calibration from limited development data and that mismatch between training and testing conditions continues to be a major source of error in speaker verification technology.
Resumo:
A significant proportion of the cost of software development is due to software testing and maintenance. This is in part the result of the inevitable imperfections due to human error, lack of quality during the design and coding of software, and the increasing need to reduce faults to improve customer satisfaction in a competitive marketplace. Given the cost and importance of removing errors improvements in fault detection and removal can be of significant benefit. The earlier in the development process faults can be found, the less it costs to correct them and the less likely other faults are to develop. This research aims to make the testing process more efficient and effective by identifying those software modules most likely to contain faults, allowing testing efforts to be carefully targeted. This is done with the use of machine learning algorithms which use examples of fault prone and not fault prone modules to develop predictive models of quality. In order to learn the numerical mapping between module and classification, a module is represented in terms of software metrics. A difficulty in this sort of problem is sourcing software engineering data of adequate quality. In this work, data is obtained from two sources, the NASA Metrics Data Program, and the open source Eclipse project. Feature selection before learning is applied, and in this area a number of different feature selection methods are applied to find which work best. Two machine learning algorithms are applied to the data - Naive Bayes and the Support Vector Machine - and predictive results are compared to those of previous efforts and found to be superior on selected data sets and comparable on others. In addition, a new classification method is proposed, Rank Sum, in which a ranking abstraction is laid over bin densities for each class, and a classification is determined based on the sum of ranks over features. A novel extension of this method is also described based on an observed polarising of points by class when rank sum is applied to training data to convert it into 2D rank sum space. SVM is applied to this transformed data to produce models the parameters of which can be set according to trade-off curves to obtain a particular performance trade-off.
Resumo:
Multilevel converters are used in high power and high voltage applications due to their attractive benefits in generating high quality output voltage. Increasing the number of voltage levels can lead to a reduction in lower order harmonics. Various modulation and control techniques are introduced for multilevel converters like Space Vector Modulation (SVM), Sinusoidal Pulse Width Modulation (SPWM) and Harmonic Elimination (HE) methods. Multilevel converters may have a DC link with equal or unequal DC voltages. In this paper a new modulation technique based on harmonic elimination method is proposed for those multilevel converters that have unequal DC link voltages. This new technique has better effect on output voltage quality and less Total Harmonic Distortion (THD) than other modulation techniques. In order to verify the proposed modulation technique, MATLAB simulations are carried out for a single-phase diode-clamped inverter.
Resumo:
The Electrocardiogram (ECG) is an important bio-signal representing the sum total of millions of cardiac cell depolarization potentials. It contains important insight into the state of health and nature of the disease afflicting the heart. Heart rate variability (HRV) refers to the regulation of the sinoatrial node, the natural pacemaker of the heart by the sympathetic and parasympathetic branches of the autonomic nervous system. The HRV signal can be used as a base signal to observe the heart's functioning. These signals are non-linear and non-stationary in nature. So, higher order spectral (HOS) analysis, which is more suitable for non-linear systems and is robust to noise, was used. An automated intelligent system for the identification of cardiac health is very useful in healthcare technology. In this work, we have extracted seven features from the heart rate signals using HOS and fed them to a support vector machine (SVM) for classification. Our performance evaluation protocol uses 330 subjects consisting of five different kinds of cardiac disease conditions. We demonstrate a sensitivity of 90% for the classifier with a specificity of 87.93%. Our system is ready to run on larger data sets.
Resumo:
The use of appropriate features to characterise an output class or object is critical for all classification problems. In order to find optimal feature descriptors for vegetation species classification in a power line corridor monitoring application, this article evaluates the capability of several spectral and texture features. A new idea of spectral–texture feature descriptor is proposed by incorporating spectral vegetation indices in statistical moment features. The proposed method is evaluated against several classic texture feature descriptors. Object-based classification method is used and a support vector machine is employed as the benchmark classifier. Individual tree crowns are first detected and segmented from aerial images and different feature vectors are extracted to represent each tree crown. The experimental results showed that the proposed spectral moment features outperform or can at least compare with the state-of-the-art texture descriptors in terms of classification accuracy. A comprehensive quantitative evaluation using receiver operating characteristic space analysis further demonstrates the strength of the proposed feature descriptors.
Resumo:
The support vector machine (SVM) has played an important role in bringing certain themes to the fore in computationally oriented statistics. However, it is important to place the SVM in context as but one member of a class of closely related algorithms for nonlinear classification. As we discuss, several of the “open problems” identified by the authors have in fact been the subject of a significant literature, a literature that may have been missed because it has been aimed not only at the SVM but at a broader family of algorithms. Keeping the broader class of algorithms in mind also helps to make clear that the SVM involves certain specific algorithmic choices, some of which have favorable consequences and others of which have unfavorable consequences—both in theory and in practice. The broader context helps to clarify the ties of the SVM to the surrounding statistical literature.
Resumo:
Log-linear and maximum-margin models are two commonly-used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be efficiently applied to problems such as sequence learning or natural language parsing. We perform extensive evaluation of the algorithms, comparing them to L-BFGS and stochastic gradient descent for log-linear models, and to SVM-Struct for max-margin models. The algorithms are applied to a multi-class problem as well as to a more complex large-scale parsing task. In all these settings, the EG algorithms presented here outperform the other methods.
Resumo:
Background The majority of peptide bonds in proteins are found to occur in the trans conformation. However, for proline residues, a considerable fraction of Prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications towards the understanding of protein structure and function. Results In this paper, we propose a new approach to predict the proline cis/trans isomerization in proteins using support vector machine (SVM). The preliminary results indicated that using Radial Basis Function (RBF) kernels could lead to better prediction performance than that of polynomial and linear kernel functions. We used single sequence information of different local window sizes, amino acid compositions of different local sequences, multiple sequence alignment obtained from PSI-BLAST and the secondary structure information predicted by PSIPRED. We explored these different sequence encoding schemes in order to investigate their effects on the prediction performance. The training and testing of this approach was performed on a newly enlarged dataset of 2424 non-homologous proteins determined by X-Ray diffraction method using 5-fold cross-validation. Selecting the window size 11 provided the best performance for determining the proline cis/trans isomerization based on the single amino acid sequence. It was found that using multiple sequence alignments in the form of PSI-BLAST profiles could significantly improve the prediction performance, the prediction accuracy increased from 62.8% with single sequence to 69.8% and Matthews Correlation Coefficient (MCC) improved from 0.26 with single local sequence to 0.40. Furthermore, if coupled with the predicted secondary structure information by PSIPRED, our method yielded a prediction accuracy of 71.5% and MCC of 0.43, 9% and 0.17 higher than the accuracy achieved based on the singe sequence information, respectively. Conclusion A new method has been developed to predict the proline cis/trans isomerization in proteins based on support vector machine, which used the single amino acid sequence with different local window sizes, the amino acid compositions of local sequence flanking centered proline residues, the position-specific scoring matrices (PSSMs) extracted by PSI-BLAST and the predicted secondary structures generated by PSIPRED. The successful application of SVM approach in this study reinforced that SVM is a powerful tool in predicting proline cis/trans isomerization in proteins and biological sequence analysis.
Resumo:
Epilepsy is characterized by the spontaneous and seemingly unforeseeable occurrence of seizures, during which the perception or behavior of patients is disturbed. An automatic system that detects seizure onsets would allow patients or the people near them to take appropriate precautions, and could provide more insight into this phenomenon. Various methods have been proposed to predict the onset of seizures based on EEG recordings. The use of nonlinear features motivated by the higher order spectra (HOS) has been reported to be a promising approach to differentiate between normal, background (pre-ictal) and epileptic EEG signals. In this work, we made a comparative study of the performance of Gaussian mixture model (GMM) and Support Vector Machine (SVM) classifiers using the features derived from HOS and from the power spectrum. Results show that the selected HOS based features achieve 93.11% classification accuracy compared to 88.78% with features derived from the power spectrum for a GMM classifier. The SVM classifier achieves an improvement from 86.89% with features based on the power spectrum to 92.56% with features based on the bispectrum.