991 resultados para class prediction


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Soil information is needed for managing the agricultural environment. The aim of this study was to apply artificial neural networks (ANNs) for the prediction of soil classes using orbital remote sensing products, terrain attributes derived from a digital elevation model and local geology information as data sources. This approach to digital soil mapping was evaluated in an area with a high degree of lithologic diversity in the Serra do Mar. The neural network simulator used in this study was JavaNNS and the backpropagation learning algorithm. For soil class prediction, different combinations of the selected discriminant variables were tested: elevation, declivity, aspect, curvature, curvature plan, curvature profile, topographic index, solar radiation, LS topographic factor, local geology information, and clay mineral indices, iron oxides and the normalized difference vegetation index (NDVI) derived from an image of a Landsat-7 Enhanced Thematic Mapper Plus (ETM+) sensor. With the tested sets, best results were obtained when all discriminant variables were associated with geological information (overall accuracy 93.2 - 95.6 %, Kappa index 0.924 - 0.951, for set 13). Excluding the variable profile curvature (set 12), overall accuracy ranged from 93.9 to 95.4 % and the Kappa index from 0.932 to 0.948. The maps based on the neural network classifier were consistent and similar to conventional soil maps drawn for the study area, although with more spatial details. The results show the potential of ANNs for soil class prediction in mountainous areas with lithological diversity.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

SUMMARY: A top scoring pair (TSP) classifier consists of a pair of variables whose relative ordering can be used for accurately predicting the class label of a sample. This classification rule has the advantage of being easily interpretable and more robust against technical variations in data, as those due to different microarray platforms. Here we describe a parallel implementation of this classifier which significantly reduces the training time, and a number of extensions, including a multi-class approach, which has the potential of improving the classification performance. AVAILABILITY AND IMPLEMENTATION: Full C++ source code and R package Rgtsp are freely available from http://lausanne.isb-sib.ch/~vpopovic/research/. The implementation relies on existing OpenMP libraries.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Objective: To develop a model to predict the bleeding source and identify the cohort amongst patients with acute gastrointestinal bleeding (GIB) who require urgent intervention, including endoscopy. Patients with acute GIB, an unpredictable event, are most commonly evaluated and managed by non-gastroenterologists. Rapid and consistently reliable risk stratification of patients with acute GIB for urgent endoscopy may potentially improve outcomes amongst such patients by targeting scarce health-care resources to those who need it the most. Design and methods: Using ICD-9 codes for acute GIB, 189 patients with acute GIB and all. available data variables required to develop and test models were identified from a hospital medical records database. Data on 122 patients was utilized for development of the model and on 67 patients utilized to perform comparative analysis of the models. Clinical data such as presenting signs and symptoms, demographic data, presence of co-morbidities, laboratory data and corresponding endoscopic diagnosis and outcomes were collected. Clinical data and endoscopic diagnosis collected for each patient was utilized to retrospectively ascertain optimal management for each patient. Clinical presentations and corresponding treatment was utilized as training examples. Eight mathematical models including artificial neural network (ANN), support vector machine (SVM), k-nearest neighbor, linear discriminant analysis (LDA), shrunken centroid (SC), random forest (RF), logistic regression, and boosting were trained and tested. The performance of these models was compared using standard statistical analysis and ROC curves. Results: Overall the random forest model best predicted the source, need for resuscitation, and disposition with accuracies of approximately 80% or higher (accuracy for endoscopy was greater than 75%). The area under ROC curve for RF was greater than 0.85, indicating excellent performance by the random forest model Conclusion: While most mathematical models are effective as a decision support system for evaluation and management of patients with acute GIB, in our testing, the RF model consistently demonstrated the best performance. Amongst patients presenting with acute GIB, mathematical models may facilitate the identification of the source of GIB, need for intervention and allow optimization of care and healthcare resource allocation; these however require further validation. (c) 2007 Elsevier B.V. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Background: There are compelling economic and environmental reasons to reduce our reliance on inorganic phosphate (Pi) fertilisers. Better management of Pi fertiliser applications is one option to improve the efficiency of Pi fertiliser use, whilst maintaining crop yields. Application rates of Pi fertilisers are traditionally determined from analyses of soil or plant tissues. Alternatively, diagnostic genes with altered expression under Pi limiting conditions that suggest a physiological requirement for Pi fertilisation, could be used to manage Pifertiliser applications, and might be more precise than indirect measurements of soil or tissue samples. Results: We grew potato (Solanum tuberosum L.) plants hydroponically, under glasshouse conditions, to control their nutrient status accurately. Samples of total leaf RNA taken periodically after Pi was removed from the nutrient solution were labelled and hybridised to potato oligonucleotide arrays. A total of 1,659 genes were significantly differentially expressed following Pi withdrawal. These included genes that encode proteins involved in lipid, protein, and carbohydrate metabolism, characteristic of Pi deficient leaves and included potential novel roles for genes encoding patatin like proteins in potatoes. The array data were analysed using a support vector machine algorithm to identify groups of genes that could predict the Pi status of the crop. These groups of diagnostic genes were tested using field grown potatoes that had either been fertilised or unfertilised. A group of 200 genes could correctly predict the Pi status of field grown potatoes. Conclusions: This paper provides a proof-of-concept demonstration for using microarrays and class prediction tools to predict the Pi status of a field grown potato crop. There is potential to develop this technology for other biotic and abiotic stresses in field grown crops. Ultimately, a better understanding of crop stresses may improve our management of the crop, improving the sustainability of agriculture.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Il presente lavoro di tesi si inserisce nell’ambito della classificazione di dati ad alta dimensionalità, sviluppando un algoritmo basato sul metodo della Discriminant Analysis. Esso classifica i campioni attraverso le variabili prese a coppie formando un network a partire da quelle che hanno una performance sufficientemente elevata. Successivamente, l’algoritmo si avvale di proprietà topologiche dei network (in particolare la ricerca di subnetwork e misure di centralità di singoli nodi) per ottenere varie signature (sottoinsiemi delle variabili iniziali) con performance ottimali di classificazione e caratterizzate da una bassa dimensionalità (dell’ordine di 101, inferiore di almeno un fattore 103 rispetto alle variabili di partenza nei problemi trattati). Per fare ciò, l’algoritmo comprende una parte di definizione del network e un’altra di selezione e riduzione della signature, calcolando ad ogni passaggio la nuova capacità di classificazione operando test di cross-validazione (k-fold o leave- one-out). Considerato l’alto numero di variabili coinvolte nei problemi trattati – dell’ordine di 104 – l’algoritmo è stato necessariamente implementato su High-Performance Computer, con lo sviluppo in parallelo delle parti più onerose del codice C++, nella fattispecie il calcolo vero e proprio del di- scriminante e il sorting finale dei risultati. L’applicazione qui studiata è a dati high-throughput in ambito genetico, riguardanti l’espressione genica a livello cellulare, settore in cui i database frequentemente sono costituiti da un numero elevato di variabili (104 −105) a fronte di un basso numero di campioni (101 −102). In campo medico-clinico, la determinazione di signature a bassa dimensionalità per la discriminazione e classificazione di campioni (e.g. sano/malato, responder/not-responder, ecc.) è un problema di fondamentale importanza, ad esempio per la messa a punto di strategie terapeutiche personalizzate per specifici sottogruppi di pazienti attraverso la realizzazione di kit diagnostici per l’analisi di profili di espressione applicabili su larga scala. L’analisi effettuata in questa tesi su vari tipi di dati reali mostra che il metodo proposto, anche in confronto ad altri metodi esistenti basati o me- no sull’approccio a network, fornisce performance ottime, tenendo conto del fatto che il metodo produce signature con elevate performance di classifica- zione e contemporaneamente mantenendo molto ridotto il numero di variabili utilizzate per questo scopo.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We examine the occurrence of the ≈300 known protein folds in different groups of organisms. To do this, we characterize a large fraction of the currently known protein sequences (≈140,000) in structural terms, by matching them to known structures via sequence comparison (or by secondary-structure class prediction for those without structural homologues). Overall, we find that an appreciable fraction of the known folds are present in each of the major groups of organisms (e.g., bacteria and eukaryotes share 156 of 275 folds), and most of the common folds are associated with many families of nonhomologous sequences (i.e., >10 sequence families for each common fold). However, different groups of organisms have characteristically distinct distributions of folds. So, for instance, some of the most common folds in vertebrates, such as globins or zinc fingers, are rare or absent in bacteria. Many of these differences in fold usage are biologically reasonable, such as the folds of metabolic enzymes being common in bacteria and those associated with extracellular transport and communication being common in animals. They also have important implications for database-based methods for fold recognition, suggesting that an unknown sequence from a plant is more likely to have a certain fold (e.g., a TIM barrel) than an unknown sequence from an animal.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Motivation: Prediction methods for identifying binding peptides could minimize the number of peptides required to be synthesized and assayed, and thereby facilitate the identification of potential T-cell epitopes. We developed a bioinformatic method for the prediction of peptide binding to MHC class II molecules. Results: Experimental binding data and expert knowledge of anchor positions and binding motifs were combined with an evolutionary algorithm (EA) and an artificial neural network (ANN): binding data extraction --> peptide alignment --> ANN training and classification. This method, termed PERUN, was implemented for the prediction of peptides that bind to HLA-DR4(B1*0401). The respective positive predictive values of PERUN predictions of high-, moderate-, low- and zero-affinity binder-a were assessed as 0.8, 0.7, 0.5 and 0.8 by cross-validation, and 1.0, 0.8, 0.3 and 0.7 by experimental binding. This illustrates the synergy between experimentation and computer modeling, and its application to the identification of potential immunotheraaeutic peptides.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Promiscuous T-cell epitopes make ideal targets for vaccine development. We report here a computational system, multipred, for the prediction of peptide binding to the HLA-A2 supertype. It combines a novel representation of peptide/MHC interactions with a hidden Markov model as the prediction algorithm. multipred is both sensitive and specific, and demonstrates high accuracy of peptide-binding predictions for HLA-A*0201, *0204, and *0205 alleles, good accuracy for *0206 allele, and marginal accuracy for *0203 allele. multipred replaces earlier requirements for individual prediction models for each HLA allelic variant and simplifies computational aspects of peptide-binding prediction. Preliminary testing indicates that multipred can predict peptide binding to HLA-A2 supertype molecules with high accuracy, including those allelic variants for which no experimental binding data are currently available.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

An ab initio structure prediction approach adapted to the peptide-major histocompatibility complex (MHC) class I system is presented. Based on structure comparisons of a large set of peptide-MHC class I complexes, a molecular dynamics protocol is proposed using simulated annealing (SA) cycles to sample the conformational space of the peptide in its fixed MHC environment. A set of 14 peptide-human leukocyte antigen (HLA) A0201 and 27 peptide-non-HLA A0201 complexes for which X-ray structures are available is used to test the accuracy of the prediction method. For each complex, 1000 peptide conformers are obtained from the SA sampling. A graph theory clustering algorithm based on heavy atom root-mean-square deviation (RMSD) values is applied to the sampled conformers. The clusters are ranked using cluster size, mean effective or conformational free energies, with solvation free energies computed using Generalized Born MV 2 (GB-MV2) and Poisson-Boltzmann (PB) continuum models. The final conformation is chosen as the center of the best-ranked cluster. With conformational free energies, the overall prediction success is 83% using a 1.00 Angstroms crystal RMSD criterion for main-chain atoms, and 76% using a 1.50 Angstroms RMSD criterion for heavy atoms. The prediction success is even higher for the set of 14 peptide-HLA A0201 complexes: 100% of the peptides have main-chain RMSD values < or =1.00 Angstroms and 93% of the peptides have heavy atom RMSD values < or =1.50 Angstroms. This structure prediction method can be applied to complexes of natural or modified antigenic peptides in their MHC environment with the aim to perform rational structure-based optimizations of tumor vaccines.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We present a method for predicting protein folding class based on global protein chain description and a voting process. Selection of the best descriptors was achieved by a computer-simulated neural network trained on a data base consisting of 83 folding classes. Protein-chain descriptors include overall composition, transition, and distribution of amino acid attributes, such as relative hydrophobicity, predicted secondary structure, and predicted solvent exposure. Cross-validation testing was performed on 15 of the largest classes. The test shows that proteins were assigned to the correct class (correct positive prediction) with an average accuracy of 71.7%, whereas the inverse prediction of proteins as not belonging to a particular class (correct negative prediction) was 90-95% accurate. When tested on 254 structures used in this study, the top two predictions contained the correct class in 91% of the cases.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Motivation: While processing of MHC class II antigens for presentation to helper T-cells is essential for normal immune response, it is also implicated in the pathogenesis of autoimmune disorders and hypersensitivity reactions. Sequence-based computational techniques for predicting HLA-DQ binding peptides have encountered limited success, with few prediction techniques developed using three-dimensional models. Methods: We describe a structure-based prediction model for modeling peptide-DQ3.2 beta complexes. We have developed a rapid and accurate protocol for docking candidate peptides into the DQ3.2 beta receptor and a scoring function to discriminate binders from the background. The scoring function was rigorously trained, tested and validated using experimentally verified DQ3.2 beta binding and non-binding peptides obtained from biochemical and functional studies. Results: Our model predicts DQ3.2 beta binding peptides with high accuracy [area under the receiver operating characteristic (ROC) curve A(ROC) > 0.90], compared with experimental data. We investigated the binding patterns of DQ3.2 beta peptides and illustrate that several registers exist within a candidate binding peptide. Further analysis reveals that peptides with multiple registers occur predominantly for high-affinity binders.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Vaccines are the greatest single instrument of prophylaxis against infectious diseases, with immeasurable benefits to human wellbeing. The accurate and reliable prediction of peptide-MHC binding is fundamental to the robust identification of T-cell epitopes and thus the successful design of peptide- and protein-based vaccines. The prediction of MHC class II peptide binding has hitherto proved recalcitrant and refractory. Here we illustrate the utility of existing computational tools for in silico prediction of peptides binding to class II MHCs. Most of the methods, tested in the present study, detect more than the half of the true binders in the top 5% of all possible nonamers generated from one protein. This number increases in the top 10% and 15% and then does not change significantly. For the top 15% the identified binders approach 86%. In terms of lab work this means 85% less expenditure on materials, labour and time. We show that while existing caveats are well founded, nonetheless use of computational models of class II binding can still offer viable help to the work of the immunologist and vaccinologist.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

MHC class II proteins bind oligopeptide fragments derived from proteolysis of pathogen antigens, presenting them at the cell surface for recognition by CD4+ T cells. Human MHC class II alleles are grouped into three loci: HLA-DP, HLA-DQ and HLA-DR. In contrast to HLA-DR and HLA-DQ, HLA-DP proteins have not been studied extensively, as they have been viewed as less important in immune responses than DRs and DQs. However, it is now known that HLA-DP alleles are associated with many autoimmune diseases. Quite recently, the X-ray structure of the HLA-DP2 molecule (DPA*0103, DPB1*0201) in complex with a self-peptide derived from the HLA-DR a-chain has been determined. In the present study, we applied a validated molecular docking protocol to a library of 247 modelled peptide-DP2 complexes, seeking to assess the contribution made by each of the 20 naturally occurred amino acids at each of the nine binding core peptide positions and the four flanking residues (two on both sides).

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Motivation: T-cell epitope identification is a critical immunoinformatic problem within vaccine design. To be an epitope, a peptide must bind an MHC protein. Results: Here, we present EpiTOP, the first server predicting MHC class II binding based on proteochemometrics, a QSAR approach for ligands binding to several related proteins. EpiTOP uses a quantitative matrix to predict binding to 12 HLA-DRB1 alleles. It identifies 89% of known epitopes within the top 20% of predicted binders, reducing laboratory labour, materials and time by 80%. EpiTOP is easy to use, gives comprehensive quantitative predictions and will be expanded and updated with new quantitative matrices over time.