802 resultados para Support Vector Machine


Relevância:

80.00% 80.00%

Publicador:

Resumo:

We derive a mean field algorithm for binary classification with Gaussian processes which is based on the TAP approach originally proposed in Statistical Physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error which is computed with no extra computational cost. We show that from the TAP approach, it is possible to derive both a simpler 'naive' mean field theory and support vector machines (SVM) as limiting cases. For both mean field algorithms and support vectors machines, simulation results for three small benchmark data sets are presented. They show 1. that one may get state of the art performance by using the leave-one-out estimator for model selection and 2. the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The latter result is a taken as a strong support for the internal consistency of the mean field approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this chapter, we elaborate on the well-known relationship between Gaussian processes (GP) and Support Vector Machines (SVM). Secondly, we present approximate solutions for two computational problems arising in GP and SVM. The first one is the calculation of the posterior mean for GP classifiers using a `naive' mean field approach. The second one is a leave-one-out estimator for the generalization error of SVM based on a linear response method. Simulation results on a benchmark dataset show similar performances for the GP mean field algorithm and the SVM algorithm. The approximate leave-one-out estimator is found to be in very good agreement with the exact leave-one-out error.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We apply methods of Statistical Mechanics to study the generalization performance of Support vector Machines in large data spaces.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and the hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. The HM-SVMs combine the advantages of the hidden Markov models and the support vector machines. By employing a modified K-means clustering method, a small set of most representative sentences can be automatically selected from an un-annotated corpus. These sentences together with their abstract annotations are used to train an HVS model which could be subsequently applied on the whole corpus to generate semantic parsing results. The most confident semantic parsing results are selected to generate a fully-annotated corpus which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator Data. Experimental results show that an improvement over the baseline HVS parser has been observed using the hybrid framework. When compared with the HM-SVMs trained from the fully-annotated corpus, the hybrid framework gave a comparable performance with only a small set of lightly annotated sentences. © 2008. Licensed under the Creative Commons.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Web APIs have gained increasing popularity in recent Web service technology development owing to its simplicity of technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and the relevant documentations on the Web is still a challenging task even with the best resources available on the Web. In this paper we cast the problem of detecting the Web API documentations as a text classification problem of classifying a given Web page as Web API associated or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA) which offers a generic probabilistic framework for automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information such as labelled features automatically learned from data that can effectively help improving classification performance. Extensive experiments on our Web APIs documentation dataset shows that the feaLDA model outperforms three strong supervised baselines including naive Bayes, support vector machines, and the maximum entropy model, by over 3% in classification accuracy. In addition, feaLDA also gives superior performance when compared against other existing supervised topic models.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment. Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to predict PD symptom severity using speech signals have been introduced. In this paper, we test how accurately these novel algorithms can be used to discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. Then, we select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map these feature subsets to a binary classification response using two statistical classifiers: random forests and support vector machines. We use an existing database consisting of 263 samples from 43 subjects, and demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy using only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the ability of the classifiers to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The binding between peptide epitopes and major histocompatibility complex (MHC) proteins is a major event in the cellular immune response. Accurate prediction of the binding between short peptides and class I or class II MHC molecules is an important task in immunoinformatics. SVRMHC which is a novel method to model peptide-MHC binding affinities based on support rector machine regression (SVR) is described in this chapter. SVRMHC is among a small handful of quantitative modeling methods that make predictions about precise binding affinities between a peptide and an MHC molecule. As a kernel-based learning method, SVRMHC has rendered models with demonstrated appealing performance in the practice of modeling peptide-MHC binding.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Natural language understanding is to specify a computational model that maps sentences to their semantic mean representation. In this paper, we propose a novel framework to train the statistical models without using expensive fully annotated data. In particular, the input of our framework is a set of sentences labeled with abstract semantic annotations. These annotations encode the underlying embedded semantic structural relations without explicit word/semantic tag alignment. The proposed framework can automatically induce derivation rules that map sentences to their semantic meaning representations. The learning framework is applied on two statistical models, the conditional random fields (CRFs) and the hidden Markov support vector machines (HM-SVMs). Our experimental results on the DARPA communicator data show that both CRFs and HM-SVMs outperform the baseline approach, previously proposed hidden vector state (HVS) model which is also trained on abstract semantic annotations. In addition, the proposed framework shows superior performance than two other baseline approaches, a hybrid framework combining HVS and HM-SVMs and discriminative training of HVS, with a relative error reduction rate of about 25% and 15% being achieved in F-measure.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

2000 Mathematics Subject Classification: 62H30, 62J20, 62P12, 68T99

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most of the existing computational approaches employed only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of the support vector machines (SVM) classifier and ensemble learning. Results on the PDNA-62 and the PDNA-224 datasets demonstrate that features extracted from spatial context provide more information than those from sequence context and the combination of them gives more performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that the interactions between binding sites next to each other are important for protein-DNA recognition and their binding ability. The comparison between our proposed PDNAsite method and the existing methods indicate that PDNAsite outperforms most of the existing methods and is a useful tool for DNA-binding site identification. A web-server of our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is made available for free public accessible to the biological research community.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This research is to establish new optimization methods for pattern recognition and classification of different white blood cells in actual patient data to enhance the process of diagnosis. Beckman-Coulter Corporation supplied flow cytometry data of numerous patients that are used as training sets to exploit the different physiological characteristics of the different samples provided. The methods of Support Vector Machines (SVM) and Artificial Neural Networks (ANN) were used as promising pattern classification techniques to identify different white blood cell samples and provide information to medical doctors in the form of diagnostic references for the specific disease states, leukemia. The obtained results prove that when a neural network classifier is well configured and trained with cross-validation, it can perform better than support vector classifiers alone for this type of data. Furthermore, a new unsupervised learning algorithm---Density based Adaptive Window Clustering algorithm (DAWC) was designed to process large volumes of data for finding location of high data cluster in real-time. It reduces the computational load to ∼O(N) number of computations, and thus making the algorithm more attractive and faster than current hierarchical algorithms.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Traffic incidents are a major source of traffic congestion on freeways. Freeway traffic diversion using pre-planned alternate routes has been used as a strategy to reduce traffic delays due to major traffic incidents. However, it is not always beneficial to divert traffic when an incident occurs. Route diversion may adversely impact traffic on the alternate routes and may not result in an overall benefit. This dissertation research attempts to apply Artificial Neural Network (ANN) and Support Vector Regression (SVR) techniques to predict the percent of delay reduction from route diversion to help determine whether traffic should be diverted under given conditions. The DYNASMART-P mesoscopic traffic simulation model was applied to generate simulated data that were used to develop the ANN and SVR models. A sample network that comes with the DYNASMART-P package was used as the base simulation network. A combination of different levels of incident duration, capacity lost, percent of drivers diverted, VMS (variable message sign) messaging duration, and network congestion was simulated to represent different incident scenarios. The resulting percent of delay reduction, average speed, and queue length from each scenario were extracted from the simulation output. The ANN and SVR models were then calibrated for percent of delay reduction as a function of all of the simulated input and output variables. The results show that both the calibrated ANN and SVR models, when applied to the same location used to generate the calibration data, were able to predict delay reduction with a relatively high accuracy in terms of mean square error (MSE) and regression correlation. It was also found that the performance of the ANN model was superior to that of the SVR model. Likewise, when the models were applied to a new location, only the ANN model could produce comparatively good delay reduction predictions under high network congestion level.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This dissertation established a state-of-the-art programming tool for designing and training artificial neural networks (ANNs) and showed its applicability to brain research. The developed tool, called NeuralStudio, allows users without programming skills to conduct studies based on ANNs in a powerful and very user friendly interface. A series of unique features has been implemented in NeuralStudio, such as ROC analysis, cross-validation, network averaging, topology optimization, and optimization of the activation function’s slopes. It also included a Support Vector Machines module for comparison purposes. Once the tool was fully developed, it was applied to two studies in brain research. In the first study, the goal was to create and train an ANN to detect epileptic seizures from subdural EEG. This analysis involved extracting features from the spectral power in the gamma frequencies. In the second application, a unique method was devised to link EEG recordings to epileptic and nonepileptic subjects. The contribution of this method consisted of developing a descriptor matrix that can be used to represent any EEG file regarding its duration and the number of electrodes. The first study showed that the inter-electrode mean of the spectral power in the gamma frequencies and its duration above a specific threshold performs better than the other frequencies in seizure detection, exhibiting an accuracy of 95.90%, a sensitivity of 92.59%, and a specificity of 96.84%. The second study yielded that Hjorth’s parameter activity is sufficient to accurately relate EEG to epileptic and non-epileptic subjects. After testing, accuracy, sensitivity and specificity of the classifier were all above 0.9667. Statistical tests measured the superiority of activity at over 99.99 % certainty. It was demonstrated that (1) the spectral power in the gamma frequencies is highly effective in locating seizures from EEG and (2) activity can be used to link EEG recordings to epileptic and non-epileptic subjects. These two studies required high computational load and could be addressed thanks to NeuralStudio. From a medical perspective, both methods proved the merits of NeuralStudio in brain research applications. For its outstanding features, NeuralStudio has been recently awarded a patent (US patent No. 7502763).

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Voice communication systems such as Voice-over IP (VoIP), Public Switched Telephone Networks, and Mobile Telephone Networks, are an integral means of human tele-interaction. These systems pose distinctive challenges due to their unique characteristics such as low volume, burstiness and stringent delay/loss requirements across heterogeneous underlying network technologies. Effective quality evaluation methodologies are important for system development and refinement, particularly by adopting user feedback based measurement. Presently, most of the evaluation models are system-centric (Quality of Service or QoS-based), which questioned us to explore a user-centric (Quality of Experience or QoE-based) approach as a step towards the human-centric paradigm of system design. We research an affect-based QoE evaluation framework which attempts to capture users' perception while they are engaged in voice communication. Our modular approach consists of feature extraction from multiple information sources including various affective cues and different classification procedures such as Support Vector Machines (SVM) and k-Nearest Neighbor (kNN). The experimental study is illustrated in depth with detailed analysis of results. The evidences collected provide the potential feasibility of our approach for QoE evaluation and suggest the consideration of human affective attributes in modeling user experience.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This dissertation established a state-of-the-art programming tool for designing and training artificial neural networks (ANNs) and showed its applicability to brain research. The developed tool, called NeuralStudio, allows users without programming skills to conduct studies based on ANNs in a powerful and very user friendly interface. A series of unique features has been implemented in NeuralStudio, such as ROC analysis, cross-validation, network averaging, topology optimization, and optimization of the activation function’s slopes. It also included a Support Vector Machines module for comparison purposes. Once the tool was fully developed, it was applied to two studies in brain research. In the first study, the goal was to create and train an ANN to detect epileptic seizures from subdural EEG. This analysis involved extracting features from the spectral power in the gamma frequencies. In the second application, a unique method was devised to link EEG recordings to epileptic and non-epileptic subjects. The contribution of this method consisted of developing a descriptor matrix that can be used to represent any EEG file regarding its duration and the number of electrodes. The first study showed that the inter-electrode mean of the spectral power in the gamma frequencies and its duration above a specific threshold performs better than the other frequencies in seizure detection, exhibiting an accuracy of 95.90%, a sensitivity of 92.59%, and a specificity of 96.84%. The second study yielded that Hjorth’s parameter activity is sufficient to accurately relate EEG to epileptic and non-epileptic subjects. After testing, accuracy, sensitivity and specificity of the classifier were all above 0.9667. Statistical tests measured the superiority of activity at over 99.99 % certainty. It was demonstrated that 1) the spectral power in the gamma frequencies is highly effective in locating seizures from EEG and 2) activity can be used to link EEG recordings to epileptic and non-epileptic subjects. These two studies required high computational load and could be addressed thanks to NeuralStudio. From a medical perspective, both methods proved the merits of NeuralStudio in brain research applications. For its outstanding features, NeuralStudio has been recently awarded a patent (US patent No. 7502763).