813 resultados para SUPPORT VECTOR MACHINES
Resumo:
Support vector machine (SVM) is a powerful technique for data classification. Despite of its good theoretic foundations and high classification accuracy, normal SVM is not suitable for classification of large data sets, because the training complexity of SVM is highly dependent on the size of data set. This paper presents a novel SVM classification approach for large data sets by using minimum enclosing ball clustering. After the training data are partitioned by the proposed clustering method, the centers of the clusters are used for the first time SVM classification. Then we use the clusters whose centers are support vectors or those clusters which have different classes to perform the second time SVM classification. In this stage most data are removed. Several experimental results show that the approach proposed in this paper has good classification accuracy compared with classic SVM while the training is significantly faster than several other SVM classifiers.
Resumo:
This paper proposes a new hierarchical learning structure, namely the holistic triple learning (HTL), for extending the binary support vector machine (SVM) to multi-classification problems. For an N-class problem, a HTL constructs a decision tree up to a depth of A leaf node of the decision tree is allowed to be placed with a holistic triple learning unit whose generalisation abilities are assessed and approved. Meanwhile, the remaining nodes in the decision tree each accommodate a standard binary SVM classifier. The holistic triple classifier is a regression model trained on three classes, whose training algorithm is originated from a recently proposed implementation technique, namely the least-squares support vector machine (LS-SVM). A major novelty with the holistic triple classifier is the reduced number of support vectors in the solution. For the resultant HTL-SVM, an upper bound of the generalisation error can be obtained. The time complexity of training the HTL-SVM is analysed, and is shown to be comparable to that of training the one-versus-one (1-vs.-1) SVM, particularly on small-scale datasets. Empirical studies show that the proposed HTL-SVM achieves competitive classification accuracy with a reduced number of support vectors compared to the popular 1-vs-1 alternative.
Resumo:
Nonlinear principal component analysis (PCA) based on neural networks has drawn significant attention as a monitoring tool for complex nonlinear processes, but there remains a difficulty with determining the optimal network topology. This paper exploits the advantages of the Fast Recursive Algorithm, where the number of nodes, the location of centres, and the weights between the hidden layer and the output layer can be identified simultaneously for the radial basis function (RBF) networks. The topology problem for the nonlinear PCA based on neural networks can thus be solved. Another problem with nonlinear PCA is that the derived nonlinear scores may not be statistically independent or follow a simple parametric distribution. This hinders its applications in process monitoring since the simplicity of applying predetermined probability distribution functions is lost. This paper proposes the use of a support vector data description and shows that transforming the nonlinear principal components into a feature space allows a simple statistical inference. Results from both simulated and industrial data confirm the efficacy of the proposed method for solving nonlinear principal component problems, compared with linear PCA and kernel PCA.
Resumo:
Retinopathy of prematurity (ROP) is a rare disease in which retinal blood vessels of premature infants fail to develop normally, and is one of the major causes of childhood blindness throughout the world. The Discrete Conditional Phase-type (DC-Ph) model consists of two components, the conditional component measuring the inter-relationships between covariates and the survival component which models the survival distribution using a Coxian phase-type distribution. This paper expands the DC-Ph models by introducing a support vector machine (SVM), in the role of the conditional component. The SVM is capable of classifying multiple outcomes and is used to identify the infant's risk of developing ROP. Class imbalance makes predicting rare events difficult. A new class decomposition technique, which deals with the problem of multiclass imbalance, is introduced. Based on the SVM classification, the length of stay in the neonatal ward is modelled using a 5, 8 or 9 phase Coxian distribution.
Resumo:
The peptides derived from envelope proteins have been shown to inhibit the protein-protein interactions in the virus membrane fusion process and thus have a great potential to be developed into effective antiviral therapies. There are three types of envelope proteins each exhibiting distinct structure folds. Although the exact fusion mechanism remains elusive, it was suggested that the three classes of viral fusion proteins share a similar mechanism of membrane fusion. The common mechanism of action makes it possible to correlate the properties of self-derived peptide inhibitors with their activities. Here we developed a support vector machine model using sequence-based statistical scores of self-derived peptide inhibitors as input features to correlate with their activities. The model displayed 92% prediction accuracy with the Matthew’s correlation coefficient of 0.84, obviously superior to those using physicochemical properties and amino acid decomposition as input. The predictive support vector machine model for self- derived peptides of envelope proteins would be useful in development of antiviral peptide inhibitors targeting the virus fusion process.
Resumo:
The algorithmic approach to data modelling has developed rapidly these last years, in particular methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming at providing the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is here learned automatically from data, providing the optimum mixture of short and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide efficient means to model local anomalies that may typically arise in situations at an early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge on the possible existence of such short-scale patterns. This is a possible limitation of the method for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs137 activity given the measurements taken in the region of Briansk following the Chernobyl accident.
Resumo:
This paper presents the application of wavelet processing in the domain of handwritten character recognition. To attain high recognition rate, robust feature extractors and powerful classifiers that are invariant to degree of variability of human writing are needed. The proposed scheme consists of two stages: a feature extraction stage, which is based on Haar wavelet transform and a classification stage that uses support vector machine classifier. Experimental results show that the proposed method is effective