785 results for TOPOLOGY PREDICTION
Abstract:
A number of state-of-the-art protein structure prediction servers have been developed by researchers working in the Bioinformatics Unit at University College London. The popular PSIPRED server allows users to perform secondary structure prediction, transmembrane topology prediction and protein fold recognition. More recent servers include DISOPRED for the prediction of protein dynamic disorder and DomPred for domain boundary prediction.
Abstract:
The PSIPRED protein structure prediction server allows users to submit a protein sequence, perform a prediction of their choice and receive the results of the prediction both textually via e-mail and graphically via the web. The user may select one of three prediction methods to apply to their sequence: PSIPRED, a highly accurate secondary structure prediction method; MEMSAT 2, a new version of a widely used transmembrane topology prediction method; or GenTHREADER, a sequence profile based fold recognition method.
Abstract:
Proteins of many different types, with diverse functions, are essential for living organisms. An important class is represented by transmembrane proteins, which are specifically adapted for insertion into biological membranes and perform very important functions in the cell, such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins that is largely under-represented in structure databases because experimental structure determination is extremely difficult. For this reason, computational tools that can predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented, based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried out genome-wide detection on the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching a very high PPV (92%) and MCC (0.82). Secondly, a method was introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated on a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. Results show that the model correctly predicts the topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performance of the proposed approach was comparable or superior to the current state of the art in TMBB topology prediction.
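PPV and MCC, the two detection scores quoted above, follow standard confusion-matrix definitions: PPV = TP / (TP + FP) and MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch of both; the counts in the usage line are hypothetical and are not taken from the thesis.

```python
import math


def ppv_and_mcc(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (PPV, MCC) from binary confusion-matrix counts."""
    # PPV (precision): fraction of predicted positives that are real positives.
    ppv = tp / (tp + fp) if tp + fp else 0.0
    # MCC: correlation between predicted and true labels, ranging from -1 to 1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return ppv, mcc


# Hypothetical counts for a proteome scan (illustration only).
ppv, mcc = ppv_and_mcc(tp=46, fp=4, tn=900, fn=10)
print(f"PPV={ppv:.2f}  MCC={mcc:.2f}")
```

MCC is usually preferred to accuracy for proteome-wide detection because TMBBs are a small minority class, so a predictor that labels everything "non-TMBB" scores high accuracy but an MCC of 0.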
Abstract:
Membrane proteins, which constitute approximately 20% of most genomes, are poorly tractable targets for experimental structure determination, thus analysis by prediction and modelling makes an important contribution to their on-going study. Membrane proteins form two main classes: alpha helical and beta barrel trans-membrane proteins. By using a method based on Bayesian Networks, which provides a flexible and powerful framework for statistical inference, we addressed alpha-helical topology prediction. This method has accuracies of 77.4% for prokaryotic proteins and 61.4% for eukaryotic proteins. The method described here represents an important advance in the computational determination of membrane protein topology and offers a useful, and complementary, tool for the analysis of membrane proteins for a range of applications.
Abstract:
Membrane proteins, which constitute approximately 20% of most genomes, form two main classes: alpha helical and beta barrel transmembrane proteins. Using methods based on Bayesian Networks, a powerful approach for statistical inference, we have sought to address beta-barrel topology prediction. The beta-barrel topology predictor reports individual strand accuracies of 88.6%. The method outlined here represents a potentially important advance in the computational determination of membrane protein topology.
Abstract:
Protein structure prediction is a cornerstone of bioinformatics research. Membrane proteins require their own prediction methods due to their intrinsically different composition. A variety of tools exist for topology prediction of membrane proteins, many of them available on the Internet. The server described in this paper, BPROMPT (Bayesian PRediction Of Membrane Protein Topology), uses a Bayesian Belief Network to combine the results of other prediction methods, providing a more accurate consensus prediction. Topology predictions with accuracies of 70% for prokaryotes and 53% for eukaryotes were achieved. BPROMPT can be accessed at http://www.jenner.ac.uk/BPROMPT.
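As an illustration of how a Bayesian network can merge several per-residue predictions into a consensus, the sketch below uses the simplest such structure, naive Bayes: the true label is the parent of each predictor's output, and predictors are assumed conditionally independent. This is not BPROMPT's actual network, and the prior, sensitivities and specificities are hypothetical placeholders.

```python
def consensus_posterior(calls, sensitivities, specificities, prior_tm=0.3):
    """Posterior P(residue is transmembrane | calls) under a naive Bayes model.

    calls[i] is True if predictor i labels the residue as transmembrane (TM).
    sensitivities[i] = P(predictor i says TM | residue is TM)
    specificities[i] = P(predictor i says non-TM | residue is not TM)
    prior_tm is a hypothetical prior fraction of TM residues.
    """
    joint_tm, joint_non = prior_tm, 1.0 - prior_tm
    for says_tm, sens, spec in zip(calls, sensitivities, specificities):
        # Likelihood of this predictor's call under each hypothesis.
        joint_tm *= sens if says_tm else (1.0 - sens)
        joint_non *= (1.0 - spec) if says_tm else spec
    return joint_tm / (joint_tm + joint_non)


# Three hypothetical predictors with assumed accuracies.
sens = [0.9, 0.8, 0.7]
spec = [0.9, 0.85, 0.8]
p = consensus_posterior([True, True, False], sens, spec)
print(f"P(TM) = {p:.3f}")
```

Thresholding the posterior at 0.5 for every residue yields a consensus TM/non-TM labeling. A full Bayesian belief network would also allow for dependencies between predictors that share training data, which naive Bayes ignores.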
Abstract:
Spodoptera frugiperda beta-1,3-glucanase (SLam) was purified from larval midgut. It has a molecular mass of 37.5 kDa and an alkaline pH optimum of 9.0. It is active against beta-1,3-glucan (laminarin) but cannot hydrolyze yeast beta-1,3-1,6-glucan or other polysaccharides. The enzyme is an endoglucanase with low processivity (0.4) and is not inhibited by high substrate concentrations. In contrast to other digestive beta-1,3-glucanases from insects, SLam is unable to lyse Saccharomyces cerevisiae cells. The cDNA encoding SLam was cloned and sequenced, showing that the protein belongs to glycosyl hydrolase family 16, like other insect glucanases and glucan-binding proteins. Multiple sequence alignment of beta-1,3-glucanases and beta-glucan-binding proteins supports the assumption that the beta-1,3-glucanase gene duplicated in the ancestor of mollusks and arthropods. One copy gave rise to the derived beta-1,3-glucanases through the loss of an extended N-terminal region, and the beta-glucan-binding proteins arose through the loss of the catalytic residues. Homology modeling of SLam suggests that E228 may affect the ionization of the catalytic residues, thus shifting the enzyme's pH optimum. SLam antiserum reacts with a single protein in the insect midgut. Immunocytolocalization shows that the enzyme is present in secretory vesicles and in the glycocalyx of columnar cells. (C) 2010 Elsevier Ltd. All rights reserved.
Abstract:
The overwhelming amount and unprecedented speed of publication in the biomedical domain make it difficult for life science researchers to acquire and maintain a broad view of the field and to gather all the information relevant to their research. In response to this problem, the BioNLP (Biomedical Natural Language Processing) research community has emerged. It strives to assist life science researchers by developing modern natural language processing (NLP), information extraction (IE) and information retrieval (IR) methods that can be applied at large scale to scan the whole publicly available biomedical literature, extract and aggregate the information found within it, and automatically normalize the variability of natural language statements. Among the different tasks, biomedical event extraction has recently received much attention within the BioNLP community. Biomedical event extraction is the identification of biological processes and interactions described in the biomedical literature and their representation as a set of recursive event structures. The 2009–2013 series of BioNLP Shared Tasks on Event Extraction has given rise to a number of event extraction systems, several of which have been applied at large scale (the full set of PubMed abstracts and PubMed Central Open Access full-text articles), leading to the creation of massive biomedical event databases, each containing millions of events. Since the top-ranking event extraction systems are based on machine-learning approaches and are trained on the narrow-domain, carefully selected Shared Task training data, their performance drops when they are faced with the topically highly varied PubMed and PubMed Central documents. Specifically, false-positive predictions by these systems generate incorrect biomolecular events, which end users then notice.
This thesis proposes a novel post-processing approach, combining supervised and unsupervised learning techniques, that can automatically identify and filter out a considerable proportion of incorrect events from large-scale event databases, thus increasing the overall credibility of those databases. The second part of this thesis is dedicated to a system we developed for hypothesis generation from large-scale event databases, which is able to discover novel biomolecular interactions among genes/gene products. We cast the hypothesis generation problem as supervised network topology prediction, i.e., predicting new edges in the network, as well as the types and directions of these edges, using a set of features that can be extracted from large biomedical event networks. Both routine machine learning evaluation and manual evaluation suggest that the problem is indeed learnable. This work won the Best Paper Award at the 5th International Symposium on Languages in Biology and Medicine (LBM 2013).
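Casting hypothesis generation as edge prediction means scoring node pairs that are not yet connected. The sketch below computes three classic topological pair features (common neighbors, Jaccard coefficient, preferential attachment) and ranks candidate pairs by one of them. It is only an illustration: the thesis's actual feature set is not listed in this abstract, edge types and directions are ignored here, and the gene names are placeholders.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from (u, v) pairs; edge types and directions are ignored."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def pair_features(graph, u, v):
    """Topological features describing a candidate edge (u, v)."""
    nu, nv = graph[u], graph[v]
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "preferential_attachment": len(nu) * len(nv),
    }


def candidate_pairs(graph):
    """Yield every unordered node pair that is not yet connected."""
    for u, v in combinations(sorted(graph), 2):
        if v not in graph[u]:
            yield u, v


# Hypothetical gene/gene-product network (placeholder names).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
graph = build_graph(edges)
ranked = sorted(candidate_pairs(graph),
                key=lambda p: pair_features(graph, *p)["common_neighbors"],
                reverse=True)
print(ranked[0], pair_features(graph, *ranked[0]))
```

In a supervised setting, feature vectors for known edges (positives) and sampled non-edges (negatives) would train a classifier. Separate classifiers, or multi-class labels, could then predict edge type and direction.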
Abstract:
Biology is now a “Big Data Science” thanks to technological advances that allow the characterization of the whole macromolecular content of a cell or a collection of cells. This opens interesting perspectives, but only a small portion of these data can be characterized experimentally. Hence the demand for accurate and efficient computational tools for the automatic annotation of biological molecules. This is even more true for membrane proteins, the focus of my research project, which led to the development of two machine learning-based methods: BetAware-Deep and SVMyr. BetAware-Deep is a tool for the detection and topology prediction of transmembrane beta-barrel proteins found in Gram-negative bacteria. These proteins are involved in many biological processes and are primary candidates as drug targets. BetAware-Deep combines a deep learning framework (bidirectional long short-term memory) with a probabilistic graphical model (grammatical-restrained hidden conditional random field). Moreover, it introduces a modified formulation of the hydrophobic moment, designed to include evolutionary information. BetAware-Deep outperformed all available methods in topology prediction and achieved high scores in the detection task. Glycine myristoylation in eukaryotes is the attachment of a myristic acid to an N-terminal glycine. SVMyr is a fast method based on support vector machines, designed to predict this modification in proteome-scale datasets. It takes octapeptides as input and exploits computational scores derived from experimental examples and mean physicochemical features. SVMyr outperformed all available methods for co-translational myristoylation prediction. In addition, it uniquely allows the prediction of post-translational myristoylation. Both tools described here were designed following the best practices for developing machine learning-based tools outlined by the bioinformatics community.
Moreover, they are available via user-friendly web servers. All this makes them valuable tools for filling the gap between sequence data and annotated data.
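For reference, the classic (Eisenberg) hydrophobic moment of a segment is the magnitude of the vector sum of residue hydrophobicities h_n placed at successive angles n·δ, that is |Σ h_n e^{i n δ}|. The sketch below implements only this classic definition, not BetAware-Deep's modified, evolution-aware formulation. The Kyte-Doolittle scale is an assumed choice. δ = 180° models an idealized β-strand whose side chains alternate faces, and δ = 100° is the usual α-helix value.

```python
import math

# Kyte-Doolittle hydropathy scale (assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq: str, angle_deg: float) -> float:
    """Classic hydrophobic moment: |sum_n h_n * exp(i * n * delta)|."""
    delta = math.radians(angle_deg)
    sin_sum = sum(KD[aa] * math.sin(n * delta) for n, aa in enumerate(seq))
    cos_sum = sum(KD[aa] * math.cos(n * delta) for n, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum)


# An amphipathic strand (alternating hydrophobic/polar) versus a uniform one.
print(hydrophobic_moment("LSLSLSL", 180.0))
print(hydrophobic_moment("LLLLLLL", 180.0))
```

The alternating segment scores much higher. This alternation is the pattern expected for membrane-facing β-strands, whose lipid-exposed and barrel-interior faces alternate.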
Abstract:
The identification of genes essential for survival is important for understanding the minimal requirements for cellular life and for drug design. As experimental studies aimed at building a catalog of essential genes for a given organism are time-consuming and laborious, a computational approach that could predict gene essentiality with high accuracy would be of great value. We present here a novel computational approach, called NTPGE (Network Topology-based Prediction of Gene Essentiality), that relies on the network topology features of a gene to estimate its essentiality. The first step of NTPGE is to construct the integrated molecular network for a given organism, comprising protein physical, metabolic and transcriptional regulation interactions. The second step consists of training a decision-tree-based machine-learning algorithm on known essential and non-essential genes of the organism of interest, using the network topology information of each of these genes as learning attributes. Finally, the resulting decision-tree classifier is applied to the set of genes of this organism to estimate the essentiality of each gene. We applied the NTPGE approach to discover the essential genes in Escherichia coli and then assessed its performance. (C) 2007 Elsevier B.V. All rights reserved.
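The first two NTPGE steps (merging interaction types into one network, then describing each gene by topological attributes) can be sketched as follows. This is a minimal illustration under stated assumptions: all edges are treated as undirected (although regulatory edges are directed), and only two example features are computed, degree and the local clustering coefficient. The actual NTPGE feature set is not given in the abstract, and the gene names and edges are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_network(*edge_lists):
    """Merge several interaction lists into one undirected network (self-loops dropped)."""
    graph = defaultdict(set)
    for edges in edge_lists:
        for a, b in edges:
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def topology_features(graph):
    """Degree and local clustering coefficient for every node."""
    feats = {}
    for node, nbrs in graph.items():
        k = len(nbrs)
        # Number of edges among the node's neighbours.
        links = sum(1 for x, y in combinations(nbrs, 2) if y in graph[x])
        clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        feats[node] = {"degree": k, "clustering": clustering}
    return feats


# Hypothetical interaction sets for a toy organism.
physical = [("g1", "g2"), ("g2", "g3")]
metabolic = [("g1", "g3")]
regulatory = [("g3", "g4")]
features = topology_features(build_network(physical, metabolic, regulatory))
print(features)
```

These per-gene feature vectors, labelled essential or non-essential for known genes, are the kind of input a decision-tree learner (for example scikit-learn's `DecisionTreeClassifier`) would be trained on in the second step.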
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Prediction of Oncogenic Interactions and Cancer-Related Signaling Networks Based on Network Topology
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Identification, prediction, and control of a system are engineering subjects, regardless of the nature of the system. Here, the weekly number of recorded dengue fever cases in the city of Rio de Janeiro, Brazil, during 2007 is used to identify SIS (susceptible-infective-susceptible) and SIR (susceptible-infective-removed) models formulated as cellular automata (CA). In the identification process, a genetic algorithm (GA) is used to find the probabilities of the state transition S → I that can reproduce the 2007 historical series in the CA lattice. These probabilities depend on the number of infective neighbors. Time-varying and time-invariant probabilities, three different lattice sizes, and two kinds of coupling topology among the cells are considered. These epidemiological models, built by combining CA and GA, are then used to predict the number of cases in 2008. Such models can be useful for forecasting and controlling the spread of this infectious disease.
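The SIS cellular automaton described above can be sketched as a synchronous lattice update in which a susceptible cell becomes infective with a probability that depends on its number of infective neighbors. Several details here are assumptions and not taken from the paper: the 8-cell Moore neighborhood, periodic boundaries, a constant recovery probability, and the probability values themselves. In the paper, a genetic algorithm would search for the transition probabilities that make the simulated weekly counts match the 2007 series. That fitting loop is not shown.

```python
import random

S, I = 0, 1  # cell states: susceptible, infective


def step(grid, p_infect, p_recover, rng):
    """One synchronous update of an SIS cellular automaton on a periodic square lattice.

    p_infect[k] is the probability that a susceptible cell with k infective
    neighbours (Moore neighbourhood, k = 0..8) becomes infective.
    p_recover is the probability that an infective cell becomes susceptible again.
    """
    n = len(grid)
    new = [row[:] for row in grid]
    for r in range(n):
        for c in range(n):
            if grid[r][c] == S:
                k = sum(grid[(r + dr) % n][(c + dc) % n] == I
                        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                        if (dr, dc) != (0, 0))
                if rng.random() < p_infect[k]:
                    new[r][c] = I
            elif rng.random() < p_recover:
                new[r][c] = S
    return new


# Illustrative run: one infective cell at the centre of a 20 x 20 lattice.
rng = random.Random(42)
n = 20
grid = [[S] * n for _ in range(n)]
grid[n // 2][n // 2] = I
p_infect = [1 - (1 - 0.15) ** k for k in range(9)]  # hypothetical, not GA-fitted values
weekly_cases = []
for week in range(10):
    grid = step(grid, p_infect, p_recover=0.3, rng=rng)
    weekly_cases.append(sum(map(sum, grid)))  # number of infective cells
print(weekly_cases)
```

A GA fitness function for this model could be, for example, the squared error between `weekly_cases` and the recorded weekly counts. An SIR variant would add a third, absorbing "removed" state in place of the I → S transition.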
Abstract:
We report the first steps of a collaborative project between the University of Queensland, Polyflow, Michelin, SK Chemicals, and RMIT University on the simulation, validation and application of a recently introduced constitutive model designed to describe branched polymers. Although much progress has been made in predicting the complex flow behaviour of many polymers, in particular linear ones, it is sometimes difficult to predict shear thinning and extensional strain hardening simultaneously with traditional constitutive models. Recently, a new viscoelastic model based on molecular topology was proposed by McLeish and Larson (1998). We explore the predictive power of a differential multi-mode version of this pom-pom model for the flow behaviour of two commercial polymer melts: a (long-chain branched) low-density polyethylene (LDPE) and a (linear) high-density polyethylene (HDPE). The model responses are compared with elongational recovery experiments published by Langouche and Debbaut (1999), and with start-up of simple shear flow and stress relaxation after simple and reverse step strain experiments carried out in our laboratory.
Abstract:
Membrane proteins are a large and important class of proteins. They are responsible for several key functions in a living cell, e.g. transport of nutrients and ions, cell-cell signaling, and cell-cell adhesion. Despite their importance, it has not been possible to study their structure and organization in much detail because 3D structures are difficult to obtain. In this thesis, theoretical studies of membrane protein sequences and structures have been carried out by analyzing existing experimental data. The data come from several sources, including sequence databases, genome sequencing projects, and 3D structures. Prediction of membrane-spanning regions by hydrophobicity analysis is a key technique used in several of the studies. A novel method for this is also presented and compared with other methods. The primary questions addressed in the thesis are: What properties are common to all membrane proteins? What is the overall architecture of a membrane protein? What properties govern integration into the membrane? How many membrane proteins are there, and how are they distributed in different organisms? Several of the findings have since been supported by experiments. An analysis of the large family of G-protein coupled receptors pinpoints differences in loop length and amino acid composition between proteins with and without a signal peptide, as well as differences between extra- and intracellular loops. Known 3D structures of membrane proteins have been studied in terms of hydrophobicity, distribution of secondary structure and amino acid types, position-specific residue variability, and differences between loops and membrane-spanning regions. An analysis of several fully and partially sequenced genomes from eukaryotes, prokaryotes, and archaea has been carried out.
Several differences in membrane protein content between organisms were found, the most important being the total number of membrane proteins and the distribution of membrane proteins with a given number of transmembrane segments. Of the properties found to be similar in all organisms, the most obvious is the bias in the distribution of positive charges between the extra- and intracellular loops. Finally, an analysis of homologues of membrane proteins with known topology uncovered two related multi-spanning proteins with opposite predicted orientations. The predicted topologies were verified experimentally, providing a first example of "divergent topology evolution".
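Two ideas from this abstract can be combined in one sketch: hydrophobicity analysis to locate membrane-spanning segments, and the positive-charge bias between loops to orient them. The code uses the classic Kyte-Doolittle sliding-window approach, with the conventional 19-residue window and 1.6 threshold, and a simple K+R count ("positive-inside" rule). It is not the novel method presented in the thesis. The example sequence is hypothetical and idealized.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    return [sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def predict_tm_segments(seq, window=19, threshold=1.6):
    """(first, last) residue indices covered by runs of windows above threshold."""
    segments, start = [], None
    profile = hydropathy_profile(seq, window)
    for i, h in enumerate(profile):
        if h >= threshold and start is None:
            start = i
        elif h < threshold and start is not None:
            segments.append((start, i - 1 + window - 1))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window - 1))
    return segments


def orientation_by_positive_inside(seq, segments):
    """'N-in' if the N-terminal-side loops carry at least as many K+R as the others.

    Loops alternate sides of the membrane, so loops 0, 2, 4, ... share a side.
    """
    cuts = [-1] + [x for seg in segments for x in seg] + [len(seq)]
    loops = [seq[cuts[i] + 1:cuts[i + 1]] for i in range(0, len(cuts), 2)]
    positives = [loop.count("K") + loop.count("R") for loop in loops]
    n_side, other_side = sum(positives[0::2]), sum(positives[1::2])
    return "N-in" if n_side >= other_side else "N-out"


n_loop = "MKRKKRDSSKRGSEKRNQSD"  # hypothetical K/R-rich N-terminal tail
core = "L" * 20                  # idealized hydrophobic membrane-spanning stretch
c_loop = "DESGSDENQSDGTESDNQEG"  # hypothetical C-terminal tail without K/R
seq = n_loop + core + c_loop
segs = predict_tm_segments(seq)
print(segs, orientation_by_positive_inside(seq, segs))
```

Because each reported segment spans whole windows, its boundaries can extend a few residues beyond the truly hydrophobic stretch. Practical predictors refine boundaries and model loop lengths explicitly.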