887 resultados para Support Vector machines


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tese de mestrado, Bioinformática e Biologia Computacional (Bioinformática), Universidade de Lisboa, Faculdade de Ciências, 2016

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We have used microarray gene expression pro. ling and machine learning to predict the presence of BRAF mutations in a panel of 61 melanoma cell lines. The BRAF gene was found to be mutated in 42 samples (69%) and intragenic mutations of the NRAS gene were detected in seven samples (11%). No cell line carried mutations of both genes. Using support vector machines, we have built a classifier that differentiates between melanoma cell lines based on BRAF mutation status. As few as 83 genes are able to discriminate between BRAF mutant and BRAF wild-type samples with clear separation observed using hierarchical clustering. Multidimensional scaling was used to visualize the relationship between a BRAF mutation signature and that of a generalized mitogen-activated protein kinase ( MAPK) activation ( either BRAF or NRAS mutation) in the context of the discriminating gene list. We observed that samples carrying NRAS mutations lie somewhere between those with or without BRAF mutations. These observations suggest that there are gene-specific mutation signals in addition to a common MAPK activation that result from the pleiotropic effects of either BRAF or NRAS on other signaling pathways, leading to measurably different transcriptional changes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: Targeting peptides direct nascent proteins to their specific subcellular compartment. Knowledge of targeting signals enables informed drug design and reliable annotation of gene products. However, due to the low similarity of such sequences and the dynamical nature of the sorting process, the computational prediction of subcellular localization of proteins is challenging. Results: We contrast the use of feed forward models as employed by the popular TargetP/SignalP predictors with a sequence-biased recurrent network model. The models are evaluated in terms of performance at the residue level and at the sequence level, and demonstrate that recurrent networks improve the overall prediction performance. Compared to the original results reported for TargetP, an ensemble of the tested models increases the accuracy by 6 and 5% on non-plant and plant data, respectively.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background. The factors behind the reemergence of severe, invasive group A streptococcal (GAS) diseases are unclear, but it could be caused by altered genetic endowment in these organisms. However, data from previous studies assessing the association between single genetic factors and invasive disease are often conflicting, suggesting that other, as-yet unidentified factors are necessary for the development of this class of disease. Methods. In this study, we used a targeted GAS virulence microarray containing 226 GAS genes to determine the virulence gene repertoires of 68 GAS isolates (42 associated with invasive disease and 28 associated with noninvasive disease) collected in a defined geographic location during a contiguous time period. We then employed 3 advanced machine learning methods (genetic algorithm neural network, support vector machines, and classification trees) to identify genes with an increased association with invasive disease. Results. Virulence gene profiles of individual GAS isolates varied extensively among these geographically and temporally related strains. Using genetic algorithm neural network analysis, we identified 3 genes with a marginal overrepresentation in invasive disease isolates. Significantly, 2 of these genes, ssa and mf4, encoded superantigens but were only present in a restricted set of GAS M-types. The third gene, spa, was found in variable distributions in all M-types in the study. Conclusions. Our comprehensive analysis of GAS virulence profiles provides strong evidence for the incongruent relationships among any of the 226 genes represented on the array and the overall propensity of GAS to cause invasive disease, underscoring the pathogenic complexity of these diseases, as well as the importance of multiple bacteria and/ or host factors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this study, we propose a novel method to predict the solvent accessible surface areas of transmembrane residues. For both transmembrane alpha-helix and beta-barrel residues, the correlation coefficients between the predicted and observed accessible surface areas are around 0.65. On the basis of predicted accessible surface areas, residues exposed to the lipid environment or buried inside a protein can be identified by using certain cutoff thresholds. We have extensively examined our approach based on different definitions of accessible surface areas and a variety of sets of control parameters. Given that experimentally determining the structures of membrane proteins is very difficult and membrane proteins are actually abundant in nature, our approach is useful for theoretically modeling membrane protein tertiary structures, particularly for modeling the assembly of transmembrane domains. This approach can be used to annotate the membrane proteins in proteomes to provide extra structural and functional information.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An emerging issue in the field of astronomy is the integration, management and utilization of databases from around the world to facilitate scientific discovery. In this paper, we investigate application of the machine learning techniques of support vector machines and neural networks to the problem of amalgamating catalogues of galaxies as objects from two disparate data sources: radio and optical. Formulating this as a classification problem presents several challenges, including dealing with a highly unbalanced data set. Unlike the conventional approach to the problem (which is based on a likelihood ratio) machine learning does not require density estimation and is shown here to provide a significant improvement in performance. We also report some experiments that explore the importance of the radio and optical data features for the matching problem.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fast Classification (FC) networks were inspired by a biologically plausible mechanism for short term memory where learning occurs instantaneously. Both weights and the topology for an FC network are mapped directly from the training samples by using a prescriptive training scheme. Only two presentations of the training data are required to train an FC network. Compared with iterative learning algorithms such as Back-propagation (which may require many hundreds of presentations of the training data), the training of FC networks is extremely fast and learning convergence is always guaranteed. Thus FC networks may be suitable for applications where real-time classification is needed. In this paper, the FC networks are applied for the real-time extraction of gene expressions for Chlamydia microarray data. Both the classification performance and learning time of the FC networks are compared with the Multi-Layer Proceptron (MLP) networks and support-vector-machines (SVM) in the same classification task. The FC networks are shown to have extremely fast learning time and comparable classification accuracy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Machine learning techniques for prediction and rule extraction from artificial neural network methods are used. The hypothesis that market sentiment and IPO specific attributes are equally responsible for first-day IPO returns in the US stock market is tested. Machine learning methods used are Bayesian classifications, support vector machines, decision tree techniques, rule learners and artificial neural networks. The outcomes of the research are predictions and rules associated With first-day returns of technology IPOs. The hypothesis that first-day returns of technology IPOs are equally determined by IPO specific and market sentiment is rejected. Instead lower yielding IPOs are determined by IPO specific and market sentiment attributes, while higher yielding IPOs are largely dependent on IPO specific attributes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Support vector machines (SVMs) have recently emerged as a powerful technique for solving problems in pattern classification and regression. Best performance is obtained from the SVM its parameters have their values optimally set. In practice, good parameter settings are usually obtained by a lengthy process of trial and error. This paper describes the use of genetic algorithm to evolve these parameter settings for an application in mobile robotics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In emergency situations, where time for blood transfusion is reduced, the O negative blood type (the universal donor) is administrated. However, sometimes even the universal donor can cause transfusion reactions that can be fatal to the patient. As commercial systems do not allow fast results and are not suitable for emergency situations, this paper presents the steps considered for the development and validation of a prototype, able to determine blood type compatibilities, even in emergency situations. Thus it is possible, using the developed system, to administer a compatible blood type, since the first blood unit transfused. In order to increase the system’s reliability, this prototype uses different approaches to classify blood types, the first of which is based on Decision Trees and the second one based on support vector machines. The features used to evaluate these classifiers are the standard deviation values, histogram, Histogram of Oriented Gradients and fast Fourier transform, computed on different regions of interest. The main characteristics of the presented prototype are small size, lightweight, easy transportation, ease of use, fast results, high reliability and low cost. These features are perfectly suited for emergency scenarios, where the prototype is expected to be used.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We derive a mean field algorithm for binary classification with Gaussian processes which is based on the TAP approach originally proposed in Statistical Physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error which is computed with no extra computational cost. We show that from the TAP approach, it is possible to derive both a simpler 'naive' mean field theory and support vector machines (SVM) as limiting cases. For both mean field algorithms and support vectors machines, simulation results for three small benchmark data sets are presented. They show 1. that one may get state of the art performance by using the leave-one-out estimator for model selection and 2. the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The latter result is a taken as a strong support for the internal consistency of the mean field approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this chapter, we elaborate on the well-known relationship between Gaussian processes (GP) and Support Vector Machines (SVM). Secondly, we present approximate solutions for two computational problems arising in GP and SVM. The first one is the calculation of the posterior mean for GP classifiers using a `naive' mean field approach. The second one is a leave-one-out estimator for the generalization error of SVM based on a linear response method. Simulation results on a benchmark dataset show similar performances for the GP mean field algorithm and the SVM algorithm. The approximate leave-one-out estimator is found to be in very good agreement with the exact leave-one-out error.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We apply methods of Statistical Mechanics to study the generalization performance of Support vector Machines in large data spaces.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and the hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. The HM-SVMs combine the advantages of the hidden Markov models and the support vector machines. By employing a modified K-means clustering method, a small set of most representative sentences can be automatically selected from an un-annotated corpus. These sentences together with their abstract annotations are used to train an HVS model which could be subsequently applied on the whole corpus to generate semantic parsing results. The most confident semantic parsing results are selected to generate a fully-annotated corpus which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator Data. Experimental results show that an improvement over the baseline HVS parser has been observed using the hybrid framework. When compared with the HM-SVMs trained from the fully-annotated corpus, the hybrid framework gave a comparable performance with only a small set of lightly annotated sentences. © 2008. Licensed under the Creative Commons.