442 results for SVM
Abstract:
The problem of impostor dataset selection for GMM-based speaker verification is addressed through the recently proposed data-driven background dataset refinement technique. The SVM-based refinement technique selects from a candidate impostor dataset those examples that are most frequently selected as support vectors when training a set of SVMs on a development corpus. This study demonstrates the versatility of dataset refinement in the task of selecting suitable impostor datasets for use in GMM-based speaker verification. The use of refined Z- and T-norm datasets provided performance gains of 15% in EER in the NIST 2006 SRE over the use of heuristically selected datasets. The refined datasets were shown to generalise well to the unseen data of the NIST 2008 SRE.
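The selection rule described above (keep the candidates most often chosen as support vectors) can be sketched in a few lines. This is an illustrative sketch only: the function name `refine_background`, the example IDs, and the tie-breaking rule are assumptions, and the abstract does not say whether the published method applies any further weighting or normalisation.

```python
from collections import Counter


def refine_background(support_vector_ids_per_svm, keep):
    """Rank candidate impostor examples by how often they were selected as
    support vectors, and return the `keep` most frequently selected IDs."""
    counts = Counter()
    for sv_ids in support_vector_ids_per_svm:
        counts.update(set(sv_ids))  # count each example at most once per SVM
    # Sort by descending count; ties are broken by ID (an assumption made
    # here only to keep the result deterministic).
    ranked = sorted(counts, key=lambda ex: (-counts[ex], ex))
    return ranked[:keep]


# Hypothetical support-vector IDs from three SVMs trained on a development corpus.
sv_sets = [["imp1", "imp3"], ["imp3", "imp4"], ["imp1", "imp3", "imp2"]]
refined = refine_background(sv_sets, keep=2)
```

Here `imp3` is selected by all three SVMs and `imp1` by two, so they form the refined dataset of size two.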
Abstract:
A data-driven background dataset refinement technique was recently proposed for SVM-based speaker verification. This method selects a refined SVM background dataset from a set of candidate impostor examples after individually ranking examples by their relevance. This paper extends this technique to the refinement of the T-norm dataset for SVM-based speaker verification. The independent refinement of the background and T-norm datasets provides a means of investigating the sensitivity of SVM-based speaker verification performance to the selection of each of these datasets. Using refined datasets provided improvements of 13% in min. DCF and 9% in EER over the full set of impostor examples on the 2006 SRE corpus, with the majority of these gains due to refinement of the T-norm dataset. Similar trends were observed for the unseen data of the NIST 2008 SRE.
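For readers unfamiliar with T-norm, the following minimal sketch shows the standard test normalisation step that the T-norm dataset feeds into: the raw score of a test utterance is shifted and scaled by the statistics of the same utterance scored against a cohort of impostor models. The function name and the use of the sample standard deviation are assumptions for illustration; the abstract does not specify these details.

```python
from statistics import mean, stdev


def t_norm(raw_score, cohort_scores):
    """Normalise a verification score using the scores that the same test
    utterance obtained against a cohort of impostor (T-norm) models."""
    mu = mean(cohort_scores)
    sigma = stdev(cohort_scores)  # sample standard deviation (assumption)
    return (raw_score - mu) / sigma


# Hypothetical cohort scores for one test utterance.
normalised = t_norm(2.0, [-1.0, 0.0, 1.0])
```

Because the cohort models determine both statistics, the choice of T-norm dataset directly affects every normalised score, which is why its refinement can matter.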
Abstract:
This paper presents Scatter Difference Nuisance Attribute Projection (SD-NAP) as an enhancement to NAP for SVM-based speaker verification. While standard NAP may inadvertently remove desirable speaker variability, SD-NAP explicitly de-emphasises this variability by incorporating a weighted version of the between-class scatter into the NAP optimisation criterion. Experimental evaluation of SD-NAP with a variety of SVM systems on the 2006 and 2008 NIST SRE corpora demonstrates that SD-NAP provides improved verification performance over standard NAP in most cases, particularly at the EER operating point.
Abstract:
Automatic recognition of people is an active field of research with important forensic and security applications. In these applications, it is not always possible for the subject to be in close proximity to the system. Voice represents a human behavioural trait which can be used to recognise people in such situations. Automatic Speaker Verification (ASV) is the process of verifying a person's identity through the analysis of their speech and enables recognition of a subject at a distance over a telephone channel (wired or wireless). A significant amount of research has focussed on the application of Gaussian mixture model (GMM) techniques to speaker verification systems providing state-of-the-art performance. GMMs are a type of generative classifier trained to model the probability distribution of the features used to represent a speaker. Recently introduced to the field of ASV research is the support vector machine (SVM). An SVM is a discriminative classifier requiring examples from both positive and negative classes to train a speaker model. The SVM is based on margin maximisation whereby a hyperplane attempts to separate classes in a high dimensional space. SVMs applied to the task of speaker verification have shown high potential, particularly when used to complement current GMM-based techniques in hybrid systems. This work aims to improve the performance of ASV systems using novel and innovative SVM-based techniques. Research was divided into three main themes: session variability compensation for SVMs; unsupervised model adaptation; and impostor dataset selection. The first theme investigated the differences between the GMM and SVM domains for the modelling of session variability, an aspect crucial for robust speaker verification. Techniques developed to improve the robustness of GMM-based classification were shown to bring about similar benefits to discriminative SVM classification through their integration in the hybrid GMM mean supervector SVM classifier.
Further, the domains for the modelling of session variation were contrasted to find a number of common factors; however, the SVM domain consistently provided marginally better session variation compensation. Minimal complementary information was found between the techniques due to the similarities in how they achieved their objectives. The second theme saw the proposal of a novel model for the purpose of session variation compensation in ASV systems. Continuous progressive model adaptation attempts to improve speaker models by retraining them after exploiting all encountered test utterances during normal use of the system. The introduction of the weight-based factor analysis model provided significant performance improvements of over 60% in an unsupervised scenario. SVM-based classification was then integrated into the progressive system, providing further benefits in performance over the GMM counterpart. Analysis demonstrated that SVMs also hold several characteristics beneficial to the task of unsupervised model adaptation, prompting further research in the area. In pursuing the final theme, an innovative background dataset selection technique was developed. This technique selects the most appropriate subset of examples from a large and diverse set of candidate impostor observations for use as the SVM background by exploiting the SVM training process. This selection was performed on a per-observation basis so as to overcome the shortcoming of the traditional heuristic-based approach to dataset selection. Results demonstrate that the approach provides performance improvements over both the complete candidate dataset and the best heuristically selected dataset whilst being only a fraction of the size. The refined dataset was also shown to generalise well to unseen corpora and to be highly applicable to the selection of impostor cohorts required in alternate techniques for speaker verification.
Abstract:
The recently proposed data-driven background dataset refinement technique provides a means of selecting an informative background for support vector machine (SVM)-based speaker verification systems. This paper investigates the characteristics of the impostor examples in such highly informative background datasets. Data-driven dataset refinement individually evaluates the suitability of candidate impostor examples for the SVM background prior to selecting the highest-ranking examples as a refined background dataset. Further, the characteristics of the refined dataset were analysed to investigate the desired traits of an informative SVM background. The most informative examples of the refined dataset were found to consist of large amounts of active speech and distinctive language characteristics. The data-driven refinement technique was shown to filter the set of candidate impostor examples to produce a more dispersed representation of the impostor population in the SVM kernel space, thereby reducing the number of redundant and less-informative examples in the background dataset. Furthermore, data-driven refinement was shown to provide performance gains when applied to the difficult task of refining a small candidate dataset that was mismatched to the evaluation conditions.
Abstract:
The ability to accurately predict the remaining useful life of machine components is critical for continuous machine operation, and can also improve productivity and enhance system safety. In condition-based maintenance (CBM), maintenance is performed based on information collected through condition monitoring and an assessment of the machine health. Effective diagnostics and prognostics are important aspects of CBM for maintenance engineers to schedule a repair and to acquire replacement components before the components actually fail. All machine components are subjected to degradation processes in real environments and they have certain failure characteristics which can be related to the operating conditions. This paper describes a technique for accurate assessment of the remnant life of machines based on health state probability estimation and involving historical knowledge embedded in the closed loop diagnostics and prognostics systems. The technique uses a Support Vector Machine (SVM) classifier as a tool for estimating health state probability of machine degradation, which can affect the accuracy of prediction. To validate the feasibility of the proposed model, real life historical data from bearings of High Pressure Liquefied Natural Gas (HP-LNG) pumps were analysed and used to obtain the optimal prediction of remaining useful life. The results obtained were very encouraging and showed that the proposed prognostic system based on health state probability estimation has the potential to be used as an estimation tool for remnant life prediction in industrial machinery.
Abstract:
Membrane proteins play important roles in many biochemical processes and are also attractive targets of drug discovery for various diseases. The elucidation of membrane protein types provides clues for understanding the structure and function of proteins. Recently we developed a novel system for predicting protein subnuclear localizations. In this paper, we propose a simplified version of our system for predicting membrane protein types directly from primary protein structures, which incorporates amino acid classifications and physicochemical properties into a general form of pseudo-amino acid composition. In this simplified system, we design a two-stage multi-class support vector machine combined with a two-step optimal feature selection process, which proves very effective in our experiments. The performance of the present method is evaluated on two benchmark datasets consisting of five types of membrane proteins. The overall accuracies of prediction for five types are 93.25% and 96.61% via the jackknife test and independent dataset test, respectively. These results indicate that our method is effective and valuable for predicting membrane protein types. A web server for the proposed method is available at http://www.juemengt.com/jcc/memty_page.php
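As background for the feature representation mentioned above, the sketch below computes the plain 20-dimensional amino acid composition of a sequence, which is the baseline that pseudo-amino acid composition extends with additional terms. It does not reproduce the paper's amino acid classifications or physicochemical features; the function name and the handling of non-standard residues (ignored) are assumptions.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`,
    ordered as in AMINO_ACIDS. Non-standard symbols are ignored."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


# Toy sequence, not taken from any benchmark dataset.
composition = amino_acid_composition("AACD")
```

A vector like this, once extended with sequence-order or property-based terms, is the kind of fixed-length input an SVM classifier can consume.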
Abstract:
This paper presents an effective classification method based on Support Vector Machines (SVM) in the context of activity recognition. Local features that capture both spatial and temporal information in activity videos have made significant progress recently. Efficient and effective features, feature representation and classification play a crucial role in activity recognition. For classification, SVMs are popularly used because of their simplicity and efficiency; however, the common multi-class SVM approaches suffer from limitations, including easily confused classes and computational inefficiency. We propose using a binary tree SVM to address the shortcomings of multi-class SVMs in activity recognition. We propose constructing a binary tree using Gaussian Mixture Models (GMM), where activities are repeatedly allocated to subnodes until every newly created node contains only one activity. Then, for each internal node a separate SVM is learned to classify activities, which significantly reduces the training time and increases the speed of testing compared to the popular 'one-against-the-rest' multi-class SVM classifier. Experiments carried out on the challenging and complex Hollywood dataset demonstrate performance comparable to the baseline bag-of-features method.
Abstract:
XML has emerged as a medium for interoperability over the Internet. As the number of documents published in the form of XML is increasing, there is a need for selective dissemination of XML documents based on user interests. In the proposed technique, a combination of the Self Adaptive Migration Model Genetic Algorithm (SAMGA)[5] and a multi-class Support Vector Machine (SVM) is used to learn a user model. Based on the feedback from the users, the system automatically adapts to the user's preferences and interests. The user model and a similarity metric are used for selective dissemination of a continuous stream of XML documents. Experimental evaluations performed over a wide range of XML documents indicate that the proposed approach significantly improves the performance of the selective dissemination task, with respect to accuracy and efficiency.
Abstract:
This paper aims at evaluating the methods of multiclass support vector machines (SVMs) for effective use in distance relay coordination. Also, it describes a strategy of supportive systems to aid the conventional protection philosophy in combating situations where protection systems have maloperated and/or information is missing, and to provide selective and secure coordination. SVMs have considerable potential as zone classifiers of distance relay coordination. This typically requires a multiclass SVM classifier to effectively analyze/build the underlying concept between reach of different zones and the apparent impedance trajectory during fault. Several methods have been proposed for multiclass classification where typically several binary SVM classifiers are combined together. Some authors have extended binary SVM classification to one-step single optimization operation considering all classes at once. In this paper, one-step multiclass classification, one-against-all, and one-against-one multiclass methods are compared for their performance with respect to accuracy, number of iterations, number of support vectors, training time, and testing time. The performance analysis of these three methods is presented on three data sets belonging to training and testing patterns of three supportive systems for a region and part of a network, which is an equivalent 526-bus system of the practical Indian Western grid.
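To make the difference between the one-against-all and one-against-one schemes concrete, the sketch below counts the binary classifiers each scheme needs and shows majority voting over pairwise decisions. The binary decision function here is a hypothetical stand-in (nearest of two made-up class centres), not a trained SVM, and the zone names are illustrative only.

```python
from collections import Counter
from itertools import combinations


def num_binary_classifiers(n_classes):
    """Number of binary classifiers required by each decomposition scheme."""
    return {
        "one_against_all": n_classes,
        "one_against_one": n_classes * (n_classes - 1) // 2,
    }


def one_against_one_predict(classes, pairwise_winner, x):
    """Predict by majority vote over all class pairs.
    `pairwise_winner(a, b, x)` must return either a or b."""
    votes = Counter(pairwise_winner(a, b, x) for a, b in combinations(classes, 2))
    # max() keeps the first maximum, so ties go to the earlier class in `classes`.
    return max(classes, key=lambda c: votes[c])


# Hypothetical stand-in for a trained binary SVM: pick the closer class centre.
centres = {"zone1": 1.0, "zone2": 2.0, "zone3": 3.0}


def nearest_centre(a, b, x):
    return a if abs(x - centres[a]) <= abs(x - centres[b]) else b


predicted = one_against_one_predict(list(centres), nearest_centre, 2.9)
```

With three zones both schemes need three binary classifiers, but the one-against-one count grows quadratically with the number of classes, which is one reason training and testing times differ between the methods.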
Abstract:
Due to its wide applicability, semi-supervised learning is an attractive method for using unlabeled data in classification. In this work, we present a semi-supervised support vector classifier that is designed using a quasi-Newton method for nonsmooth convex functions. The proposed algorithm is suitable for dealing with a very large number of examples and features. Numerical experiments on various benchmark datasets showed that the proposed algorithm is fast and gives improved generalization performance over the existing methods. Further, a non-linear semi-supervised SVM has been proposed based on a multiple label switching scheme. This non-linear semi-supervised SVM is found to converge faster and to improve generalization performance on several benchmark datasets. © 2010 Elsevier Ltd. All rights reserved.
Abstract:
The following topics were dealt with: document analysis and recognition; multimedia document processing; character recognition; document image processing; cheque processing; form processing; music processing; document segmentation; electronic documents; character classification; handwritten character recognition; information retrieval; postal automation; font recognition; Indian language OCR; handwriting recognition; performance evaluation; graphics recognition; oriental character recognition; and word recognition
Abstract:
The generalization performance of the SVM classifier depends mainly on the VC dimension and the dimensionality of the data. By reducing the VC dimension of the SVM classifier, its generalization performance is expected to increase. In the present paper, we argue that the VC dimension of the SVM classifier can be reduced by applying bootstrapping and dimensionality reduction techniques. Experimental results showed that bootstrapping the original data and bootstrapping the projected (dimensionally reduced) data improved the performance of the SVM classifier.
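The bootstrapping step referred to above is, in its generic form, resampling the training set with replacement. The following minimal sketch shows only that generic step; how the paper combines bootstrap samples with SVM training or dimensionality reduction is not stated in the abstract, and the function name is an assumption.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from `data` uniformly at random, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


# A seeded generator makes the illustration repeatable.
rng = random.Random(0)
original = [(0.1, +1), (0.4, +1), (0.6, -1), (0.9, -1)]  # toy (feature, label) pairs
resampled = bootstrap_sample(original, rng)
```

Each resampled set has the same size as the original but typically repeats some examples and omits others.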
Abstract:
Support vector machines (SVM) are a popular class of supervised models in machine learning. The associated compute-intensive learning algorithm limits their use in real-time applications. This paper presents a fully scalable architecture of a coprocessor, which can compute multiple rows of the kernel matrix in parallel. Further, we propose an extended variant of the popular decomposition technique, sequential minimal optimization, which we call the hybrid working set (HWS) algorithm, to effectively utilize the benefits of cached kernel columns and the parallel computational power of the coprocessor. The coprocessor is implemented on a Xilinx Virtex 7 field-programmable gate array-based VC707 board and achieves a speedup of up to 25x for kernel computation over single-threaded computation on an Intel Core i5. An application speedup of up to 15x over the software implementation of LIBSVM and a speedup of up to 23x over SVMLight are achieved using the HWS algorithm in unison with the coprocessor. The reduction in the number of iterations and the sensitivity of the optimization time to variation in cache size using the HWS algorithm are also shown.
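The unit of work that such a coprocessor accelerates is a row of the kernel matrix. The sketch below computes one row in plain Python, assuming an RBF kernel; the kernel choice, the function name, and the toy data are assumptions, since the abstract does not name the kernel used.

```python
import math


def rbf_kernel_row(i, samples, gamma):
    """Return row i of the kernel matrix, K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    xi = samples[i]
    return [
        math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
        for xj in samples
    ]


# Toy two-dimensional samples.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
row0 = rbf_kernel_row(0, samples, gamma=1.0)
```

Each row depends only on the samples and the row index, so different rows can be computed independently, which is what makes row-level parallelism possible. The kernel is symmetric, so a cached row can also serve as the corresponding column.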