954 results for Protein structure prediction





Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time-consuming using in vitro and/or in vivo methods; consequently, the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein–ligand binding sites and protein biochemical functions offer a practical alternative. The characterisation of protein–ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein–ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, and we discuss the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems.





Protein–ligand binding site prediction methods aim to predict, from amino acid sequence, protein–ligand interactions, putative ligands, and ligand binding site residues using either sequence information, structural information, or a combination of both. In silico characterization of protein–ligand interactions has become extremely important to help determine a protein's functionality, as in vivo-based functional elucidation is unable to keep pace with the current growth of sequence databases. Additionally, in vitro biochemical functional elucidation is time-consuming, costly, and may not be feasible for large-scale analysis, such as drug discovery. Thus, in silico prediction of protein–ligand interactions must be utilized to aid in functional elucidation. Here, we briefly discuss protein function prediction, prediction of protein–ligand interactions, and the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and Continuous Automated Model EvaluatiOn (CAMEO) competitions, along with their role in shaping the field. We also discuss, in detail, our cutting-edge web-server method, FunFOLD, for the structurally informed prediction of protein–ligand interactions. Furthermore, we provide a step-by-step guide on using the FunFOLD web server and the FunFOLD3 downloadable application, along with some real-world examples where the FunFOLD methods have been used to aid functional elucidation.





The PilZ protein was originally identified as necessary for type IV pilus (T4P) biogenesis. Since then, a large and diverse family of bacterial PilZ homology domains has been identified, some of which have been implicated in signaling pathways that control important processes, including motility, virulence and biofilm formation. Furthermore, many PilZ homology domains, though not PilZ itself, have been shown to bind the important bacterial second messenger bis(3'→5')-cyclic diGMP (c-diGMP). The crystal structures of the PilZ orthologs from Xanthomonas axonopodis pv citri (PilZ(XAC1133), this work) and from Xanthomonas campestris pv campestris (XC1028) present significant structural differences to other PilZ homologs that explain their failure to bind c-diGMP. NMR analysis of PilZ(XAC1133) shows that these structural differences are maintained in solution. In spite of their emerging importance in bacterial signaling, the means by which PilZ proteins regulate specific processes is not clear. In this study, we show that PilZ(XAC1133) binds to PilB, an ATPase required for T4P polymerization, and to the EAL domain of FimX(XAC2398), which regulates T4P biogenesis and localization in other bacterial species. These interactions were confirmed in NMR, two-hybrid and far-Western blot assays and are the first interactions observed between any PilZ domain and a target protein. While we were unable to detect phosphodiesterase activity for FimX(XAC2398) in vitro, we show that it binds c-diGMP both in the presence and in the absence of PilZ(XAC1133). Site-directed mutagenesis studies for conserved and exposed residues suggest that PilZ(XAC1133) interactions with FimX(XAC2398) and PilB(XAC3239) are mediated through a hydrophobic surface and an unstructured C-terminal extension conserved only in PilZ orthologs. The FimX-PilZ-PilB interactions involve a full set of "degenerate" GGDEF, EAL and PilZ domains and provide the first evidence of the means by which PilZ orthologs and FimX interact directly with the T4P machinery. (C) 2009 Elsevier Ltd. All rights reserved.





Many problems in chemistry depend on the ability to identify the global minimum or maximum of a function. Examples include applications in chemometrics, optimization of reaction or operating conditions, and non-linear least-squares analysis. This paper presents the results of applying a new method of deterministic global optimization, called the cutting angle method (CAM), to the prediction of molecular geometries. CAM is shown to be competitive with other global optimization techniques on several benchmark molecular conformation problems. CAM is a general method that can also be applied to other computational problems involving global minima, global maxima or finding the roots of nonlinear equations.





Molecular geometry, the three-dimensional arrangement of atoms in space, is a major factor determining the properties and reactivity of molecules, biomolecules and macromolecules. Stable molecular conformations can be computed by locating minima on the potential energy surface (PES). This is a very challenging global optimization problem because of the extremely large number of shallow local minima and the complicated landscape of the PES. This paper illustrates the mathematical and computational challenges on one important instance of the problem, the computation of the molecular geometry of oligopeptides, and proposes the use of the Extended Cutting Angle Method (ECAM) to solve it.

ECAM is a deterministic global optimization technique that computes tight lower bounds on the values of the objective function and fathoms those parts of the domain where the global minimum cannot reside. As with any domain partitioning scheme, its challenge is the extremely large partition of the domain required for accurate lower bounds. We address this challenge by providing an efficient combinatorial algorithm for calculating the lower bounds, and by combining ECAM with a local optimization method, while preserving the deterministic character of ECAM.
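
The idea of computing lower bounds and discarding (fathoming) regions that cannot contain the global minimum can be illustrated in one dimension. The sketch below is not ECAM or CAM; it is the simpler, classical Piyavskii-Shubert scheme, shown only as an assumption-laden illustration. The function name `lipschitz_minimize` and its parameters are hypothetical, and the caller is assumed to know a valid Lipschitz constant for the objective.

```python
import heapq


def lipschitz_minimize(f, lo, hi, lipschitz, tol=1e-4, max_iter=10000):
    """Piyavskii-Shubert style search for the global minimum of f on [lo, hi].

    Assumes |f(x) - f(y)| <= lipschitz * |x - y| on the interval.
    Returns (best_x, best_value) for the best point evaluated.
    """
    def bound(a, fa, b, fb):
        # Lowest point of the saw-tooth underestimate of f on [a, b]:
        # the two cones fa - L(x - a) and fb - L(b - x) intersect here.
        x = 0.5 * (a + b) + (fa - fb) / (2.0 * lipschitz)
        value = 0.5 * (fa + fb) - 0.5 * lipschitz * (b - a)
        return value, x

    f_lo, f_hi = f(lo), f(hi)
    best_x, best_val = (lo, f_lo) if f_lo <= f_hi else (hi, f_hi)
    value, x = bound(lo, f_lo, hi, f_hi)
    heap = [(value, x, lo, f_lo, hi, f_hi)]  # ordered by lower bound

    for _ in range(max_iter):
        if not heap:
            break  # every remaining region has been fathomed
        value, x, a, fa, b, fb = heapq.heappop(heap)
        if best_val - value <= tol:
            break  # smallest lower bound is within tol of the incumbent
        fx = f(x)
        if fx < best_val:
            best_x, best_val = x, fx
        # Split the interval at x and keep only sub-intervals whose
        # lower bound could still improve on the incumbent.
        for left, f_left, right, f_right in ((a, fa, x, fx), (x, fx, b, fb)):
            sub_value, sub_x = bound(left, f_left, right, f_right)
            if sub_value < best_val - tol:
                heapq.heappush(
                    heap, (sub_value, sub_x, left, f_left, right, f_right)
                )
    return best_x, best_val
```

Because the search is driven entirely by the bounds, it is deterministic: the same objective, interval and constant always produce the same sequence of evaluations. A high-dimensional potential energy surface needs a far more elaborate partitioning scheme than this one-dimensional toy.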





As one of the primary substances in a living organism, protein defines the character of each cell by interacting with the cellular environment to promote the cell's growth and function [1]. Previous studies in proteomics indicate that the functions of different proteins can be assigned based on protein structures [2,3]. Knowledge of protein structures gives us an overview of protein fold space and helps in understanding the evolutionary principles behind structure. By observing the architectures and topologies of protein families, biological processes can be investigated more directly, with much higher resolution and finer detail. For this reason, the analysis of proteins, their structures and their interactions with other materials is emerging as an important problem in bioinformatics. However, determining protein structures experimentally is expensive and time-consuming, which at present makes scientists largely dependent on sequence, rather than on more general structure, to infer the function of a protein. Data mining technology has therefore been introduced into this area to provide more efficient data processing and knowledge discovery approaches.

Unlike many data mining applications, which lack available data, the study of protein structure determination and protein interactions can draw on a vast amount of biologically relevant information, such as the Protein Data Bank (PDB) [4], the Structural Classification of Proteins (SCOP) database [5], the CATH database [6], UniProt [7], and others. The difficulty of predicting protein structures, especially 3D structures, and the interactions between proteins, as shown in Figure 6.1, lies in the computational complexity of the data. Although a large number of approaches have been developed to determine protein structures, such as ab initio modelling [8], homology modelling [9] and threading [10], more efficient and reliable methods are still greatly needed.

In this chapter, we will introduce a state-of-the-art data mining technique, graph mining, which is good at defining and discovering interesting structural patterns in graphical data sets, and take advantage of its expressive power to study protein structures, including protein structure prediction and comparison, and protein-protein interaction (PPI). The current graph pattern mining methods will be described, and typical algorithms will be presented, together with their applications in the protein structure analysis.

The rest of the chapter is organized as follows: Section 6.2 gives a brief introduction to the fundamentals of proteins, the publicly accessible protein data resources and the current research status of protein analysis; Section 6.3 turns to one of the state-of-the-art data mining methods, graph mining; Section 6.4 surveys several existing works from the past decade that apply advanced graph mining methods to protein structure analysis; finally, Section 6.5 concludes and outlines potential further work.





Background: Current approaches to predicting protein functions from a protein-protein interaction (PPI) dataset are based on the assumption that the available functions of some proteins (a.k.a. annotated proteins) determine the functions of the proteins whose functions are not yet known (a.k.a. un-annotated proteins). Protein function prediction is therefore treated as a mono-directed, one-off procedure, i.e. from annotated proteins to un-annotated proteins. However, the interactions between proteins are mutual rather than static and mono-directed, even though the functions of some proteins are currently unknown. This means that when we use a similarity-based approach to predict the functions of un-annotated proteins, those proteins, once their functions are predicted, will affect the similarities between proteins, which in turn will affect the prediction results. In other words, function prediction is a dynamic and mutual procedure. This dynamic feature of protein interactions, however, has not been considered in existing prediction algorithms.

Results: In this paper, we propose a new prediction approach that predicts protein functions iteratively. This iterative approach incorporates the dynamic and mutual features of protein-protein interactions, as well as the local and global semantic influence of protein functions, into the prediction. To guarantee that functions can be predicted iteratively, we propose a new protein similarity measure derived from protein functions. We adapt new evaluation metrics to evaluate the prediction quality of our algorithm and other similar algorithms. Experiments on real PPI datasets were conducted to evaluate the effectiveness of the proposed approach in predicting unknown protein functions.

Conclusions: The iterative approach is more likely to reflect the real biological relationships between proteins when predicting functions. A proper definition of protein similarity from protein functions is the key to predicting functions iteratively. The evaluation results demonstrated that, in most cases, the iterative approach outperformed non-iterative ones, with higher prediction quality in terms of precision, recall and F-value.
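
The difference between a one-off and an iterative procedure can be sketched with a toy neighbour-voting scheme. This is not the similarity measure or the algorithm proposed in the paper, which the abstract does not specify; `iterative_predict`, its arguments and the voting rule are hypothetical and chosen only to show how predictions made in one round can feed the next.

```python
from collections import Counter


def iterative_predict(neighbors, annotations, max_rounds=10):
    """Toy iterative function prediction on a PPI graph.

    neighbors:   dict protein -> iterable of interacting proteins
    annotations: dict protein -> set of known functions
    Returns a dict protein -> set of functions (known or predicted).
    """
    labels = {p: set(fs) for p, fs in annotations.items()}
    for _ in range(max_rounds):
        newly_predicted = {}
        for protein in neighbors:
            if protein in labels:
                continue  # already annotated or predicted
            # Count how often each function occurs among labelled neighbours.
            votes = Counter()
            for nb in neighbors[protein]:
                for fn in labels.get(nb, ()):
                    votes[fn] += 1
            if votes:
                top = max(votes.values())
                newly_predicted[protein] = {
                    fn for fn, count in votes.items() if count == top
                }
        if not newly_predicted:
            break  # nothing changed, so further rounds cannot help
        # Predictions from this round become evidence for the next one.
        labels.update(newly_predicted)
    return labels
```

With `max_rounds=1` the sketch behaves like a one-off procedure that only labels direct neighbours of annotated proteins; with more rounds, earlier predictions influence later ones, which is the dynamic the abstract argues for.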





Expressed Sequence Tags (ESTs) are short DNA sequences generated by sequencing the transcribed cDNAs derived from expressed genes. They can provide significant functional, structural and evolutionary information and thus are a primary resource for gene discovery. EST annotation refers to the analysis of unknown ESTs, which can be performed by database similarity search for possible identities and by database search for functional prediction of translation products. This kind of annotation typically consists of a series of repetitive tasks that should be automated, customizable and amenable to distributed computing resources. Furthermore, EST data should be processed efficiently on a high performance computing platform. In this paper, we describe an EST annotator, EST-PACHPC, which has been developed to harness HPC resources, potentially from Grid and Cloud systems, for high throughput EST annotation. The performance analysis of EST-PACHPC has shown that it provides a substantial performance gain in EST annotation.





Wool fabrics, ultrasonically treated for different time durations, were analysed by Fourier transform infrared (FTIR) spectroscopy, differential scanning calorimetry (DSC), and thermo-gravimetric analysis (TGA), in comparison with wool that had not been ultrasonically treated. Fabric tensile and thermal properties were measured in addition to the fibre microstructure analysis. Wool protein chains in the macrofibrils were shown to be rearranged into a more regular and less flexible structure as a result of the ultrasonic treatment. Prolonged ultrasonic treatment, however, significantly reduced both fabric tenacity and extensibility. Ultrasonically treated wool was found to have less mass loss and a higher thermal degradation temperature than both untreated wool and wool subjected to prolonged treatment. DSC analysis showed that while ultrasonic treatment has little effect on fibre crystallinity, an appropriate treatment can provide wool with increased water absorption.





The availability of large amounts of protein-protein interaction (PPI) data makes it feasible to use computational approaches to predict protein functions. Existing computational approaches are based on exploiting the known function information of annotated proteins in the PPI data to predict the functions of un-annotated proteins. However, these approaches treat the prediction domain (i.e. the set of proteins from which the functions are predicted) as unchangeable during the prediction procedure. As a result, valuable information may be overwhelmed by the unavoidable noise in the PPI data, which in turn distorts the prediction results. In this paper, we propose a novel method to dynamically predict protein functions from PPI data. Our method regards function prediction as a dynamic process of finding a suitable prediction domain, from which representative functions are selected to predict the functions of un-annotated proteins. Our method exploits the topological structural information of a PPI network and the semantic relationship between protein functions to measure the relationship between proteins, dynamically select a suitable prediction domain and predict functions. The evaluation on real PPI datasets demonstrated the effectiveness of our proposed method, which generated better prediction results.





In recent years, significant effort has been given to predicting protein functions from protein interaction data generated from high throughput techniques. However, predicting protein functions correctly and reliably still remains a challenge. Recently, many computational methods have been proposed for predicting protein functions. Among these methods, clustering based methods are the most promising. The existing methods, however, mainly focus on protein relationship modeling and the prediction algorithms that statically predict functions from the clusters that are related to the unannotated proteins. In fact, the clustering itself is a dynamic process and the function prediction should take this dynamic feature of clustering into consideration. Unfortunately, this dynamic feature of clustering is ignored in the existing prediction methods. In this paper, we propose an innovative progressive clustering based prediction method to trace the functions of relevant annotated proteins across all clusters that are generated through the progressive clustering of proteins. A set of prediction criteria is proposed to predict functions of unannotated proteins from all relevant clusters and traced functions. The method was evaluated on real protein interaction datasets and the results demonstrated the effectiveness of the proposed method compared with representative existing methods.





In this work, genetic algorithm concepts, along with a rotamer library for protein side chains, are used to optimize the tertiary structure of the hydrophobic core of Cytochrome b(562), starting from the known PDB structure of its backbone, which is kept fixed while the side chains of the hydrophobic core are allowed to adopt the conformations present in the rotamer library. The atoms of the side chains forming the core interact via van der Waals energy. Besides the prediction of the native core structure, a set of different amino acid sequences for this core is also suggested. These new cores are compared with the native one in terms of their volumes, van der Waals energy values and the numbers of contacts made by the side chains forming the cores. This paper proves that genetic algorithms are efficient for designing new sequences for the protein core. (C) 2007 Elsevier B.V. All rights reserved.
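
The search over rotamer choices with a fixed backbone can be sketched as a genetic algorithm whose individuals are lists of rotamer indices, one per core position. The code below is a minimal illustration, not the authors' implementation: `genetic_search`, its parameters, the elitist selection and the one-point crossover are all assumptions, and `energy` stands in for whatever scoring function (van der Waals energy in the paper) the caller supplies.

```python
import random


def genetic_search(n_sites, n_rotamers, energy, pop_size=30, generations=60,
                   mutation_rate=0.1, seed=0):
    """Search for a low-energy assignment of one rotamer index per site.

    energy: callable taking a list of n_sites rotamer indices and returning
            a number to minimise (hypothetical stand-in for a real force field).
    Requires n_sites >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    population = [
        [rng.randrange(n_rotamers) for _ in range(n_sites)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=energy)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_sites)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: occasionally swap in a random rotamer at a site.
            child = [
                rng.randrange(n_rotamers) if rng.random() < mutation_rate else gene
                for gene in child
            ]
            children.append(child)
        population = parents + children
    return min(population, key=energy)
```

Because the best half of each generation is carried over unchanged, the best energy in the population never gets worse from one generation to the next. Designing new sequences, as the abstract describes, would additionally let each position vary its amino acid type, which this sketch does not attempt.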





The goal of this thesis work is to develop a computational method based on machine learning techniques for predicting the disulfide-bonding states of cysteine residues in proteins, which is a sub-problem of the larger and still unsolved problem of protein structure prediction. Improved prediction of the disulfide-bonding states of cysteine residues will help to constrain the three-dimensional (3D) space of the respective protein structure, and thus will eventually help in predicting the 3D structure of proteins. The results of this work will have direct implications for site-directed mutational studies of proteins, protein engineering and the problem of protein folding. We have used a combination of an Artificial Neural Network (ANN) and a Hidden Markov Model (HMM), the so-called Hidden Neural Network (HNN), as the machine learning technique for our prediction method. By using different global and local features of proteins (specifically profiles, parity of cysteine residues, average cysteine conservation, correlated mutation, sub-cellular localization, and signal peptide) as inputs, and by considering Eukaryotes and Prokaryotes separately, we have reached a remarkable accuracy of 94% on a cysteine basis for both the Eukaryotic and Prokaryotic datasets, and an accuracy of 90% and 93% on a protein basis for the Eukaryotic and Prokaryotic datasets respectively. These accuracies are the best reached so far by any existing prediction method; our method has thus outperformed all previously developed approaches and is therefore more reliable. The most interesting part of this thesis work is the difference in prediction performance between Eukaryotes and Prokaryotes at the basic level of input coding, when 'profile' information was given as input to our prediction method. One of the reasons we found for this is the difference in the amino acid composition of the local environment of bonded and free cysteine residues in Eukaryotes and Prokaryotes. Eukaryotic bonded cysteine examples have a 'symmetric-cysteine-rich' environment, whereas Prokaryotic bonded examples lack it.





Structure and folding of membrane proteins are important issues in molecular and cell biology. In this work, new approaches are developed to characterize the structure of folded, unfolded and partially folded membrane proteins. These approaches combine site-directed spin labeling and pulse EPR techniques. The major plant light harvesting complex LHCIIb was used as a model system. Measurements of longitudinal and transversal relaxation times of electron spins and of hyperfine couplings to neighboring nuclei by electron spin echo envelope modulation (ESEEM) provide complementary information about the local environment of a single spin label. Distances in the nanometer range between two spin labels can be determined by double electron-electron resonance (DEER). The results are analyzed in terms of the relative water accessibilities of different sites in LHCIIb and its geometry. They reveal conformational changes as a function of micelle composition. This arsenal of methods is used to study protein folding during LHCIIb self-assembly, and a spatially and temporally resolved folding model is proposed. The approaches developed here are potentially applicable to studying the structure and folding of any protein or other self-assembling structure, provided that site-directed spin labeling is feasible and the time scale of folding is accessible to freeze-quench techniques.