873 results for Support Vector Machines and Naive Bayes Classifier


Relevance: 100.00%

Abstract:

We derive a mean field algorithm for binary classification with Gaussian processes, based on the TAP approach originally proposed in the statistical physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error, computed at no extra computational cost. We show that from the TAP approach it is possible to derive both a simpler 'naive' mean field theory and support vector machines (SVMs) as limiting cases. For both the mean field algorithms and support vector machines, simulation results on three small benchmark data sets are presented. They show (1) that one may obtain state-of-the-art performance by using the leave-one-out estimator for model selection, and (2) that the built-in leave-one-out estimators are extremely precise when compared with the exact leave-one-out estimate. The latter result is taken as strong support for the internal consistency of the mean field approach.
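
For concreteness, here is a minimal sketch (not from the paper) of the exact leave-one-out model selection that the built-in mean field estimator approximates, using an off-the-shelf SVM; the dataset and hyperparameter grid are illustrative placeholders:

```python
# Exact leave-one-out (LOO) error for SVM model selection: the expensive
# baseline that the paper's built-in mean field LOO estimator approximates.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X, y = X[:100], y[:100]            # keep it small: LOO refits the model n times

best_gamma, best_loo = None, -np.inf
for gamma in [1e-4, 1e-3, 1e-2]:   # hypothetical hyperparameter grid
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0)
    # LOO accuracy: n refits, each leaving exactly one sample out for testing.
    loo_acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
    if loo_acc > best_loo:
        best_gamma, best_loo = gamma, loo_acc

print(f"selected gamma={best_gamma}, LOO accuracy={best_loo:.3f}")
```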

Relevance: 100.00%

Abstract:

Background - The binding between peptide epitopes and major histocompatibility complex (MHC) proteins is an important event in the cellular immune response. Accurate prediction of the binding between short peptides and MHC molecules has long been a principal challenge for immunoinformatics. Recently, the modeling of MHC-peptide binding has come to emphasize quantitative predictions: instead of categorizing peptides as "binders" or "non-binders", or as "strong binders" and "weak binders", recent methods seek to predict precise binding affinities. Results - We developed a quantitative support vector machine regression (SVR) approach, called SVRMHC, to model peptide-MHC binding affinities. As a non-linear method, SVRMHC was able to generate models that outperformed existing linear models, such as the "additive method". By adopting a new "11-factor encoding" scheme, SVRMHC takes into account similarities in the physicochemical properties of the amino acids constituting the input peptides. When applied to MHC-peptide binding data for three mouse class I MHC alleles, the SVRMHC models produced more accurate predictions than those published previously. Furthermore, comparisons based on Receiver Operating Characteristic (ROC) analysis indicated that SVRMHC outperformed several prominent methods in identifying strongly binding peptides. Conclusion - As a method with demonstrated performance in the quantitative modeling of MHC-peptide binding and in identifying strong binders, SVRMHC is a promising immunoinformatics tool with considerable future potential.
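
As a rough illustration of the SVR setup described above, the sketch below encodes peptides by concatenating per-residue physicochemical values and fits an off-the-shelf SVR. The two-property encoding is a stand-in for the paper's 11-factor scheme, and the peptides and affinities are fabricated placeholders:

```python
# SVR on numerically encoded peptides: each residue contributes a short
# vector of physicochemical properties, concatenated into one feature vector.
import numpy as np
from sklearn.svm import SVR

PROPS = {  # toy (hydrophobicity, volume) values; not the paper's 11 factors
    "A": (1.8, 88.6), "G": (-0.4, 60.1), "L": (3.8, 166.7),
    "K": (-3.9, 168.6), "F": (2.8, 189.9), "Y": (-1.3, 193.6),
}

def encode(peptide):
    # Concatenate per-residue property vectors into one fixed-length vector.
    return np.concatenate([PROPS[aa] for aa in peptide])

peptides = ["AGLKFY", "LLGKAY", "KKAGFY", "FYLLAG"]   # placeholder 6-mers
affinity = np.array([6.2, 7.1, 5.0, 7.8])             # placeholder log-affinities

X = np.array([encode(p) for p in peptides])
model = SVR(kernel="rbf", C=10.0).fit(X, affinity)
print(model.predict(encode("AGLLKY")[None, :]))       # predict a new peptide
```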

Relevance: 100.00%

Abstract:

Data fluctuation across multiple measurements in Laser-Induced Breakdown Spectroscopy (LIBS) greatly affects the accuracy of quantitative analysis. A new LIBS quantitative analysis method based on a Robust Least Squares Support Vector Machine (RLS-SVM) regression model is proposed. The usual way to enhance analysis accuracy is to improve the quality and consistency of the emission signal, for example by averaging the spectral signals or standardizing the spectra over a number of laser shots. The proposed method focuses instead on enhancing the robustness of the quantitative analysis regression model. The RLS-SVM regression model originates from the Weighted Least Squares Support Vector Machine (WLS-SVM) but has an improved segmented weighting function and residual error calculation based on the statistical distribution of the measured spectral data. Through the improved segmented weighting function, information from spectral data within the normal distribution is retained in the regression model, while information from outliers is restrained or removed. Copper concentration analysis experiments on 16 certified standard brass samples were carried out. The average relative standard deviation obtained from the RLS-SVM model was 3.06%, and the root mean square error was 1.537%. The experimental results showed that the proposed method achieved better prediction accuracy and better modeling robustness than quantitative analysis methods based on Partial Least Squares (PLS) regression, the standard Support Vector Machine (SVM) and the WLS-SVM. It was also demonstrated that the improved weighting function offered better overall model robustness and convergence speed than the four known weighting functions.
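
To make the idea of a segmented weighting function concrete, the sketch below implements classic WLS-SVM-style weights: full weight for residuals consistent with the bulk distribution, a linear taper for moderate outliers, and near-zero weight beyond. It is a generic stand-in (following the well-known Suykens recipe), not the paper's improved function:

```python
# Segmented weighting of residuals from a first (unweighted) LS-SVM fit:
# inliers keep weight 1, moderate outliers are down-weighted linearly,
# and gross outliers are effectively removed from the re-fit.
import numpy as np

def segmented_weights(residuals, c1=2.5, c2=3.0, floor=1e-4):
    q75, q25 = np.percentile(residuals, [75, 25])
    s = (q75 - q25) / 1.349            # robust scale estimate from the IQR
    z = np.abs(residuals) / s          # standardized residual magnitude
    w = np.ones_like(z)
    mid = (z > c1) & (z <= c2)
    w[mid] = (c2 - z[mid]) / (c2 - c1) # linear taper for moderate outliers
    w[z > c2] = floor                  # suppress gross outliers
    return w

res = np.array([0.1, -0.3, 0.2, 2.9, -5.0])  # toy residuals from a first fit
print(segmented_weights(res))                # last two points get ~zero weight
```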

Relevance: 100.00%

Abstract:

This thesis studies survival analysis techniques that deal with censoring in order to produce predictive tools for estimating the risk of endovascular aortic aneurysm repair (EVAR) re-intervention. Censoring indicates that some patients do not continue follow-up, so their outcome class is unknown. Existing methods for dealing with censoring have drawbacks and cannot handle the high censoring of the two EVAR datasets collected. This thesis therefore presents a new solution to high censoring by modifying an approach that was previously incapable of differentiating between risk groups of aortic complications. Feature selection (FS) becomes complicated under censoring. Most survival FS methods depend on Cox's model; however, machine learning classifiers (MLCs) are preferred here. Few methods have adopted MLCs to perform survival FS, and those that have cannot be used with high censoring. This thesis proposes two FS methods that use MLCs to evaluate features, both relying on the new solution to deal with censoring. They combine factor analysis with a greedy stepwise FS search that allows eliminated features to re-enter the FS process. The first FS method searches for the best neural network configuration and subset of features. The second combines support vector machines, neural networks, and K-nearest-neighbour classifiers using simple and weighted majority voting to construct a multiple classifier system (MCS) that improves on the performance of the individual classifiers. It also presents a new hybrid FS process that uses the MCS as a wrapper method and merges it with an iterated feature-ranking filter method to further reduce the feature set. The proposed techniques outperformed FS methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in terms of the log-rank test's p-values, sensitivity, and concordance. This shows that the proposed techniques are more powerful in correctly predicting the risk of re-intervention, enabling doctors to set an appropriate future observation plan for each patient.
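
A minimal sketch of the voting idea behind the MCS, combining SVM, neural network and K-nearest-neighbour base classifiers by simple and weighted majority voting; the data, hyperparameters and vote weights are illustrative placeholders, and the thesis additionally wraps such an ensemble inside its feature selection process:

```python
# Multiple classifier system via majority voting over heterogeneous learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

base = [("svm", SVC(probability=True)),
        ("nn", MLPClassifier(max_iter=2000, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5))]

simple = VotingClassifier(base, voting="hard")       # simple majority vote
weighted = VotingClassifier(base, voting="soft",
                            weights=[2, 1, 1])       # weighted probability vote

for name, clf in [("simple", simple), ("weighted", weighted)]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name} majority voting accuracy: {acc:.3f}")
```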

Relevance: 100.00%

Abstract:

This dissertation established a state-of-the-art programming tool for designing and training artificial neural networks (ANNs) and showed its applicability to brain research. The developed tool, called NeuralStudio, allows users without programming skills to conduct ANN-based studies through a powerful and very user-friendly interface. A series of unique features has been implemented in NeuralStudio, such as ROC analysis, cross-validation, network averaging, topology optimization, and optimization of the activation functions' slopes. It also includes a Support Vector Machines module for comparison purposes. Once the tool was fully developed, it was applied to two studies in brain research. In the first study, the goal was to create and train an ANN to detect epileptic seizures from subdural EEG. This analysis involved extracting features from the spectral power in the gamma frequencies. In the second application, a unique method was devised to link EEG recordings to epileptic and non-epileptic subjects. The contribution of this method was a descriptor matrix that can represent any EEG file regardless of its duration and number of electrodes. The first study showed that the inter-electrode mean of the spectral power in the gamma frequencies, together with its duration above a specific threshold, performs better than the other frequencies in seizure detection, exhibiting an accuracy of 95.90%, a sensitivity of 92.59%, and a specificity of 96.84%. The second study showed that Hjorth's activity parameter is sufficient to accurately relate EEG recordings to epileptic and non-epileptic subjects. After testing, the accuracy, sensitivity and specificity of the classifier were all above 0.9667, and statistical tests confirmed the superiority of activity with over 99.99% certainty. It was demonstrated that (1) the spectral power in the gamma frequencies is highly effective in locating seizures in EEG and (2) activity can be used to link EEG recordings to epileptic and non-epileptic subjects. Both studies involved a high computational load that could be addressed thanks to NeuralStudio, and from a medical perspective both methods proved its merits in brain research applications. For its outstanding features, NeuralStudio was recently awarded a patent (US patent No. 7502763).
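
Since the second study hinges on Hjorth's activity parameter, here is a small illustrative sketch of the Hjorth parameters for a single EEG channel (activity is simply the signal variance); the signal below is synthetic, not data from the study:

```python
# Hjorth parameters of a signal: activity (variance), mobility and complexity.
import numpy as np

def hjorth(x):
    dx = np.diff(x)                               # first derivative (discrete)
    ddx = np.diff(dx)                             # second derivative
    activity = np.var(x)                          # Hjorth activity
    mobility = np.sqrt(np.var(dx) / np.var(x))    # Hjorth mobility
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

rng = np.random.default_rng(0)
eeg = rng.normal(size=2560)                       # placeholder 10-s EEG epoch
print(hjorth(eeg))
```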

Relevance: 100.00%

Abstract:

We experimentally demonstrate a 7-dB reduction of the nonlinearity penalty in 40-Gb/s CO-OFDM over 2000 km using support vector machine regression-based equalization. Simulations of WDM CO-OFDM show up to a 12-dB improvement in Q-factor compared with linear equalization.
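
A conceptual sketch of what SVM-regression equalization looks like: two SVRs learn to map distorted received I/Q samples back to the transmitted constellation, one per quadrature. The cubic distortion used as the "channel" below is a toy stand-in for fibre nonlinearity, not a CO-OFDM simulation:

```python
# SVR-based nonlinear equalization of a toy distorted QPSK constellation.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
tx = rng.choice([-1, 1], size=(2000, 2))            # QPSK symbols (I, Q)
z = tx[:, 0] + 1j * tx[:, 1]
z_rx = (z + 0.15j * z * np.abs(z) ** 2              # toy cubic distortion
        + 0.1 * (rng.normal(size=z.size) + 1j * rng.normal(size=z.size)))
rx = np.column_stack([z_rx.real, z_rx.imag])        # received I/Q features

eq_i = SVR(kernel="rbf").fit(rx[:1000], tx[:1000, 0])  # train on "pilot" half
eq_q = SVR(kernel="rbf").fit(rx[:1000], tx[:1000, 1])

i_hat = np.sign(eq_i.predict(rx[1000:]))            # hard decisions
q_hat = np.sign(eq_q.predict(rx[1000:]))
ser = np.mean((i_hat != tx[1000:, 0]) | (q_hat != tx[1000:, 1]))
print(f"symbol error rate after SVR equalization: {ser:.4f}")
```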

Relevance: 100.00%

Abstract:

Brain injury due to lack of oxygen or impaired blood flow around the time of birth may cause long-term neurological dysfunction or, in severe cases, death. Treatment needs to be initiated as soon as possible and tailored to the nature of the injury to achieve the best outcomes. The electroencephalogram (EEG) currently provides the best insight into neurological activity, but its interpretation presents a formidable challenge even for neurophysiologists, and such expertise is not widely available, particularly around the clock in a typical busy Neonatal Intensive Care Unit (NICU). An automated computerized system for detecting and grading the severity of brain injuries could therefore be of great help to medical staff in diagnosing and initiating treatment on time. In this study, automated systems for detecting neonatal seizures and grading the severity of Hypoxic-Ischemic Encephalopathy (HIE) using EEG and heart rate (HR) signals are presented. It is well known that EEG and HR signals carry a great deal of contextual and temporal information when examined at longer time scales. Systems developed in the past exploited this information either at a very early stage, before any intelligent processing block, or at a very late stage, where the presence of such information is much reduced. This work focuses on developing a system that incorporates the contextual information at the middle (classifier) level. This is achieved by using dynamic classifiers that process sequences of feature vectors rather than one feature vector at a time.
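
As an illustration of a dynamic classifier operating on sequences of feature vectors, the sketch below fits one hidden Markov model per class and labels a recording by likelihood. It assumes the hmmlearn package and synthetic features, and is a generic stand-in rather than the classifier used in this study:

```python
# One HMM per class; an unseen feature sequence is assigned to the class
# whose model gives it the higher log-likelihood.
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed dependency: pip install hmmlearn

rng = np.random.default_rng(0)
seizure = rng.normal(loc=2.0, size=(500, 4))      # toy per-epoch EEG features
background = rng.normal(loc=0.0, size=(500, 4))

hmm_sz = GaussianHMM(n_components=3, random_state=0).fit(seizure)
hmm_bg = GaussianHMM(n_components=3, random_state=0).fit(background)

test = rng.normal(loc=2.0, size=(100, 4))         # unseen feature sequence
label = "seizure" if hmm_sz.score(test) > hmm_bg.score(test) else "background"
print(label)
```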

Relevance: 100.00%

Abstract:

SQL Injection Attack (SQLIA) remains a technique used by computer network intruders to pilfer an organisation's confidential data. An intruder re-crafts a web form's input and the query strings used in web requests with malicious intent to compromise the confidential data stored in the back-end database. The database is the most valuable data source, and intruders are therefore relentless in evolving new techniques to bypass the signature-based solutions currently provided by Web Application Firewalls (WAFs) to mitigate SQLIA. There is consequently a need for an automated, scalable methodology for pre-processing SQLIA features into a form fit for a supervised learning model. However, obtaining a ready-made, scalable dataset whose items are feature-engineered into numerical attributes, suitable for training Artificial Neural Network (ANN) and Machine Learning (ML) models, is a known obstacle to applying artificial intelligence against ever-evolving novel SQLIA signatures. The proposed approach applies a numerical-attribute encoding ontology to encode features (both legitimate web requests and SQLIAs) as numerical data items, so as to extract a scalable dataset for input to a supervised learning model, moving towards an ML SQLIA detection and prevention model. For the numerical encoding of features, the proposed model explores a hybrid of static and dynamic pattern matching by implementing a Non-Deterministic Finite Automaton (NFA), combined with a proxy and a SQL parser Application Programming Interface (API) to intercept and parse web requests in transit to the back-end database. As part of the solution, web requests deemed at the proxy to contain injected query strings are excluded from reaching the target back-end database. This paper evaluates the performance metrics of a dataset obtained by this numerical encoding of the feature ontology in Microsoft Azure Machine Learning (MAML) studio, using a Two-Class Support Vector Machine (TCSVM) binary classifier. This methodology then forms the subject of the empirical evaluation.
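
The overall pipeline shape can be sketched as follows: request strings are encoded as numerical attribute vectors and fed to a two-class SVM. The token-count encoding is a crude stand-in for the ontology/NFA-based encoding described above, the requests are fabricated examples, and the paper itself trains its TCSVM in MAML studio rather than scikit-learn:

```python
# Encode web-request strings as numerical vectors, then train a binary SVM.
import re
from sklearn.svm import SVC

TOKENS = ["'", "--", ";", " or ", " union ", " select ", "="]  # assumed cues

def encode(request: str):
    r = request.lower()
    # Count occurrences of each suspicious token in the request string.
    return [len(re.findall(re.escape(t), r)) for t in TOKENS]

requests = ["id=42", "name=alice", "id=1' or '1'='1", "q=1; drop table users--"]
labels = [0, 0, 1, 1]                    # 0 = legitimate, 1 = SQLIA

clf = SVC(kernel="linear").fit([encode(r) for r in requests], labels)
print(clf.predict([encode("id=7"), encode("id=0' union select pass--")]))
```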

Relevance: 100.00%

Abstract:

Biology is now a "big data science", thanks to technological advances that allow the characterization of the whole macromolecular content of a cell or a collection of cells. This opens interesting perspectives, but only a small portion of these data can be experimentally characterized, hence the demand for accurate and efficient computational tools for the automatic annotation of biological molecules. This is even more true for membrane proteins, on which my research project focused, leading to the development of two machine-learning-based methods: BetAware-Deep and SVMyr. BetAware-Deep is a tool for the detection and topology prediction of transmembrane beta-barrel proteins found in Gram-negative bacteria. These proteins are involved in many biological processes and are prime candidates as drug targets. BetAware-Deep exploits the combination of a deep learning framework (a bidirectional long short-term memory network) and a probabilistic graphical model (a grammatical-restrained hidden conditional random field). Moreover, it introduces a modified formulation of the hydrophobic moment, designed to include evolutionary information. BetAware-Deep outperformed all available methods in topology prediction and reported high scores in the detection task. Glycine myristoylation in eukaryotes is the binding of a myristic acid to an N-terminal glycine. SVMyr is a fast method based on support vector machines designed to predict this modification in datasets of proteomic scale. It takes octapeptides as input and exploits computational scores derived from experimental examples together with mean physicochemical features. SVMyr outperformed all available methods for co-translational myristoylation prediction and, as a unique feature, also allows the prediction of post-translational myristoylation. Both tools were designed with the best practices for developing machine-learning-based tools outlined by the bioinformatics community in mind, and both are available via user-friendly web servers. All this makes them valuable tools for filling the gap between sequential and annotated data.
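
A minimal sketch of the SVMyr-style setup: a fixed-length numerical encoding of an N-terminal octapeptide classified by an SVM. One-hot encoding stands in for the paper's experimentally derived scores and physicochemical features, and the training peptides and labels are fabricated:

```python
# Classify N-terminal octapeptides (myristoylated or not) with an SVM.
import numpy as np
from sklearn.svm import SVC

AA = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(octapeptide: str) -> np.ndarray:
    # 8 positions x 20 amino acids = 160 binary features.
    v = np.zeros((8, len(AA)))
    for i, aa in enumerate(octapeptide):
        v[i, AA.index(aa)] = 1.0
    return v.ravel()

peps = ["GNAASKKS", "GCVLSAEE", "MKTAYIAK", "MAHHHHHH"]  # placeholder peptides
labels = [1, 1, 0, 0]                    # 1 = myristoylated (fabricated)

clf = SVC(kernel="rbf", C=1.0).fit([one_hot(p) for p in peps], labels)
print(clf.predict([one_hot("GSSKSKPK")]))
```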

Relevance: 100.00%

Abstract:

Every year, autochthonous cases of Plasmodium vivax malaria occur in low-endemicity areas of the Vale do Ribeira in the south-eastern part of the Atlantic Forest, state of São Paulo, where Anopheles cruzii and Anopheles bellator are considered the primary vectors. However, other species in the subgenus Nyssorhynchus of Anopheles (e.g., Anopheles marajoara) are abundant and may participate in the dynamics of malaria transmission in that region. The objectives of the present study were to assess the spatial distribution of An. cruzii, An. bellator and An. marajoara and to associate the presence of these species with malaria cases in the municipalities of the Vale do Ribeira. Potential habitat suitability modelling was applied both to determine the spatial distribution of An. cruzii, An. bellator and An. marajoara and to estimate the density of each species. Poisson regression was then used to associate malaria cases with the estimated vector densities. As a result, An. cruzii was correlated with the forested slopes of the Serra do Mar, An. bellator with the coastal plain, and An. marajoara with deforested areas. Moreover, both An. marajoara and An. cruzii were positively associated with malaria cases. Considering that An. marajoara has been demonstrated to be a primary vector of human Plasmodium in rural areas of the state of Amapá, more attention should be given to this species in the deforested areas of the Atlantic Forest, where it might act as a secondary vector.
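
The Poisson regression step can be sketched as follows; the per-municipality densities and case counts below are fabricated placeholders for illustration only:

```python
# Poisson regression of malaria case counts on estimated vector densities.
import numpy as np
import statsmodels.api as sm

# toy rows: estimated densities of [An. cruzii, An. bellator, An. marajoara]
dens = np.array([[0.8, 0.1, 0.2], [0.6, 0.4, 0.1], [0.2, 0.2, 0.9],
                 [0.1, 0.7, 0.8], [0.9, 0.1, 0.6], [0.3, 0.3, 0.3]])
cases = np.array([5, 3, 7, 6, 9, 2])     # malaria cases per municipality

X = sm.add_constant(dens)                # intercept + three density covariates
model = sm.GLM(cases, X, family=sm.families.Poisson()).fit()
print(model.summary())                   # positive coefficients suggest association
```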

Relevance: 100.00%

Abstract:

BACKGROUND: Traditionally, epidemiologists have considered electrification to be a positive factor. In fact, electrification and plumbing are typical initiatives that represent the integration of an isolated population into modern society, ensuring the control of pathogens and promoting public health. Nonetheless, electrification is always accompanied by night lighting, which attracts insect vectors and changes people's behavior. Although this may lead to new modes of infection and increased transmission of insect-borne diseases, epidemiologists rarely consider the role of night lighting in their surveys. OBJECTIVE: We reviewed the epidemiological evidence concerning the role of lighting in the spread of vector-borne diseases to encourage other researchers to consider it in future studies. DISCUSSION: We present three vector-borne infectious diseases (Chagas disease, leishmaniasis, and malaria) and discuss evidence suggesting that the use of artificial lighting results in behavioral changes among human populations, in the prevalence of vector species, and in the modes of transmission. CONCLUSION: Despite a surprising lack of studies, the existing evidence supports our hypothesis that artificial lighting leads to a higher risk of infection from vector-borne diseases. We believe this is related not only to the simple attraction of traditional vectors to light sources but also to changes in the behavior of both humans and insects that result in new modes of disease transmission. Considering the ongoing expansion of night lighting in developing countries, additional research on this subject is urgently needed.

Relevance: 100.00%

Abstract:

This paper describes the modeling of a weed infestation risk inference system that implements a collaborative inference scheme based on rules extracted from two Bayesian network classifiers. The first Bayesian classifier infers a categorical value for weed-crop competitiveness using, as input, categorical variables for the total weed density and the corresponding proportions of narrow- and broad-leaved weeds. The inferred competitiveness values, along with three other categorical variables extracted from estimated maps of weed seed production and weed coverage, are then used as input to a second Bayesian network classifier to infer categorical values for the risk of infestation. Weed biomass and yield loss data samples are used to learn, in a supervised fashion, the probability relationships among the nodes of the first and second Bayesian classifiers, respectively. For comparison purposes, two types of Bayesian network structure are considered: an expert-based Bayesian classifier and a naive Bayes classifier. The inference system focuses on knowledge interpretation by translating a Bayesian classifier into a set of classification rules. The results obtained for risk inference in a corn-crop field are presented and discussed.
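
For the naive Bayes variant, here is a minimal sketch with categorical inputs (competitiveness, seed production, coverage) predicting a categorical risk level. The category codes and training rows are fabricated, whereas the paper learns its networks from weed biomass and yield loss samples:

```python
# Naive Bayes over categorical variables for infestation-risk inference.
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# columns: competitiveness, seed production, weed coverage (0=low,1=med,2=high)
X = np.array([[0, 0, 0], [0, 1, 0], [1, 1, 1], [2, 1, 2], [2, 2, 2], [1, 0, 1]])
y = np.array([0, 0, 1, 2, 2, 1])         # risk of infestation (0=low..2=high)

clf = CategoricalNB().fit(X, y)
print(clf.predict([[2, 2, 1]]))          # infer risk for a new field cell
print(clf.predict_proba([[2, 2, 1]]))    # class posteriors behind the rule
```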

Relevance: 100.00%

Abstract:

In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type, containing expression data on very many (possibly thousands of) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance either because the rule is tested on tissue samples that were used in the first instance to select the genes, or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not re-performed in training the rule at each stage of the cross-validation. We describe how, in practice, the selection bias can be assessed and corrected for by performing either a cross-validation or a bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation and, for the bootstrap, the so-called .632+ bootstrap error estimate, which is designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when the selection bias is corrected for, the cross-validated error is no longer zero for a subset of only a few genes.
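
The recommended correction can be sketched directly: placing gene selection inside a cross-validation pipeline makes the selection external to each held-out fold, so the held-out samples never influence which genes are chosen. The data dimensions below are placeholders for a typical microarray setting:

```python
# External 10-fold cross-validation with gene selection inside each fold.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))          # 60 tissue samples, 2000 genes
y = rng.integers(0, 2, size=60)          # labels independent of X (pure noise)

pipe = Pipeline([("select", SelectKBest(f_classif, k=10)),  # refit per fold
                 ("clf", SVC(kernel="linear"))])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
acc = cross_val_score(pipe, X, y, cv=cv).mean()
# With pure-noise data the externally cross-validated accuracy stays near 50%;
# selecting the 10 genes on all samples first would look misleadingly good.
print(f"10-fold external CV accuracy: {acc:.3f}")
```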