870 resultados para scoring functions


Relevância:

70.00% 70.00%

Publicador:

Resumo:

The identification of near native protein-protein complexes among a set of decoys remains highly challenging. A stategy for improving the success rate of near native detection is to enrich near native docking decoys in a small number of top ranked decoys. Recently, we found that a combination of three scoring functions (energy, conservation, and interface propensity) can predict the location of binding interface regions with reasonable accuracy. Here, these three scoring functions are modified and combined into a consensus scoring function called ENDES for enriching near native docking decoys. We found that all individual scores result in enrichment for the majority of 28 targets in ZDOCK2.3 decoy set and the 22 targets in Benchmark 2.0. Among the three scores, the interface propensity score yields the highest enrichment in both sets of protein complexes. When these scores are combined into the ENDES consensus score, a significant increase in enrichment of near-native structures is found. For example, when 2000 dock decoys are reduced to 200 decoys by ENDES, the fraction of near-native structures in docking decoys increases by a factor of about six in average. ENDES was implemented into a computer program that is available for download at http://sparks.informatics.iupui.edu.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A successful protein-protein docking study culminates in identification of decoys at top ranks with near-native quaternary structures. However, this task remains enigmatic because no generalized scoring functions exist that effectively infer decoys according to the similarity to near-native quaternary structures. Difficulties arise because of the highly irregular nature of the protein surface and the significant variation of the nonbonding and solvation energies based on the chemical composition of the protein-protein interface. In this work, we describe a novel method combining an interface-size filter, a regression model for geometric compatibility (based on two correlated surface and packing parameters), and normalized interaction energy (calculated from correlated nonbonded and solvation energies), to effectively rank decoys from a set of 10,000 decoys. Tests on 30 unbound binary protein-protein complexes show that in 16 cases we can identify at least one decoy in top three ranks having <= 10 angstrom backbone root mean square deviation from true binding geometry. Comparisons with other state-of-art methods confirm the improved ranking power of our method without the use of any experiment-guided restraints, evolutionary information, statistical propensities, or modified interaction energy equations. Tests on 118 less-difficult bound binary protein-protein complexes with <= 35% sequence redundancy at the interface showed that in 77% cases, at least 1 in 10,000 decoys were identified with <= 5 angstrom backbone root mean square deviation from true geometry at first rank. The work will promote the use of new concepts where correlations among parameters provide more robust scoring models. It will facilitate studies involving molecular interactions, including modeling of large macromolecular assemblies and protein structure prediction. (C) 2010 Wiley Periodicals, Inc. J Comput Chem 32: 787-796, 2011.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Membrane proteins are involved in a number of important biological functions. Yet, they are poorly understood from the structure and folding point of view. The external environment being drastically different from that of globular proteins, the intra-protein interactions in membrane proteins are also expected to be different. Hence, statistical potentials representing the features of inter-residue interactions based exclusively on the structures of membrane proteins are much needed. Currently, a reasonable number of structures are available, making it possible to undertake such an analysis on membrane proteins. In this study we have examined the inter-residue interaction propensities of amino acids in the membrane spanning regions of the alpha-helical membrane (HM) proteins. Recently we have shown that valuable information can be obtained on globular proteins by the evaluation of the pair-wise interactions of amino acids by classifying them into different structural environments, based on factors such as the secondary structure or the number of contacts that a residue can make. Here we have explored the possible ways of classifying the intra-protein environment of HM proteins and have developed scoring functions based on different classification schemes. On evaluation of different schemes, we find that the scheme which classifies amino acids to different intra-contact environment is the most promising one. Based on this classification scheme, we also redefine the hydrophobicity scale of amino acids in HM proteins.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Convergence of the vast sequence space of proteins into a highly restricted fold/conformational space suggests a simple yet unique underlying mechanism of protein folding that has been the subject of much debate in the last several decades. One of the major challenges related to the understanding of protein folding or in silico protein structure prediction is the discrimination of non-native structures/decoys from the native structure. Applications of knowledge-based potentials to attain this goal have been extensively reported in the literature. Also, scoring functions based on accessible surface area and amino acid neighbourhood considerations were used in discriminating the decoys from native structures. In this article, we have explored the potential of protein structure network (PSN) parameters to validate the native proteins against a large number of decoy structures generated by diverse methods. We are guided by two principles: (a) the PSNs capture the local properties from a global perspective and (b) inclusion of non-covalent interactions, at all-atom level, including the side-chain atoms, in the network construction accommodates the sequence dependent features. Several network parameters such as the size of the largest cluster, community size, clustering coefficient are evaluated and scored on the basis of the rank of the native structures and the Z-scores. The network analysis of decoy structures highlights the importance of the global properties contributing to the uniqueness of native structures. The analysis also exhibits that the network parameters can be used as metrics to identify the native structures and filter out non-native structures/decoys in a large number of data-sets; thus also has a potential to be used in the protein `structure prediction' problem.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Most of the biological processes are governed through specific protein-ligand interactions. Discerning different components that contribute toward a favorable protein-ligand interaction could contribute significantly toward better understanding protein function, rationalizing drug design and obtaining design principles for protein engineering. The Protein Data Bank (PDB) currently hosts the structure of similar to 68 000 protein-ligand complexes. Although several databases exist that classify proteins according to sequence and structure, a mere handful of them annotate and classify protein-ligand interactions and provide information on different attributes of molecular recognition. In this study, an exhaustive comparison of all the biologically relevant ligand-binding sites (84 846 sites) has been conducted using PocketMatch: a rapid, parallel, in-house algorithm. PocketMatch quantifies the similarity between binding sites based on structural descriptors and residue attributes. A similarity network was constructed using binding sites whose PocketMatch scores exceeded a high similarity threshold (0.80). The binding site similarity network was clustered into discrete sets of similar sites using the Markov clustering (MCL) algorithm. Furthermore, various computational tools have been used to study different attributes of interactions within the individual clusters. The attributes can be roughly divided into (i) binding site characteristics including pocket shape, nature of residues and interaction profiles with different kinds of atomic probes, (ii) atomic contacts consisting of various types of polar, hydrophobic and aromatic contacts along with binding site water molecules that could play crucial roles in protein-ligand interactions and (iii) binding energetics involved in interactions derived from scoring functions developed for docking. For each ligand-binding site in each protein in the PDB, site similarity information, clusters they belong to and description of site attributes are provided as a relational database-protein-ligand interaction clusters (PLIC).

Relevância:

60.00% 60.00%

Publicador:

Resumo:

How to refine a near-native structure to make it closer to its native conformation is an unsolved problem in protein-structure and protein-protein complex-structure prediction. In this article, we first test several scoring functions for selecting locally resampled near-native protein-protein docking conformations and then propose a computationally efficient protocol for structure refinement via local resampling and energy minimization. The proposed method employs a statistical energy function based on a Distance-scaled Ideal-gas REference state (DFIRE) as an initial filter and an empirical energy function EMPIRE (EMpirical Protein-InteRaction Energy) for optimization and re-ranking. Significant improvement of final top-1 ranked structures over initial near-native structures is observed in the ZDOCK 2.3 decoy set for Benchmark 1.0 (74% whose global rmsd reduced by 0.5 angstrom or more and only 7% increased by 0.5 angstrom or more). Less significant improvement is observed for Benchmark 2.0 (38% versus 33%). Possible reasons are discussed.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We propose a new characterization of protein structure based on the natural tetrahedral geometry of the β carbon and a new geometric measure of structural similarity, called visible volume. In our model, the side-chains are replaced by an ideal tetrahedron, the orientation of which is fixed with respect to the backbone and corresponds to the preferred rotamer directions. Visible volume is a measure of the non-occluded empty space surrounding each residue position after the side-chains have been removed. It is a robust, parameter-free, locally-computed quantity that accounts for many of the spatial constraints that are of relevance to the corresponding position in the native structure. When computing visible volume, we ignore the nature of both the residue observed at each site and the ones surrounding it. We focus instead on the space that, together, these residues could occupy. By doing so, we are able to quantify a new kind of invariance beyond the apparent variations in protein families, namely, the conservation of the physical space available at structurally equivalent positions for side-chain packing. Corresponding positions in native structures are likely to be of interest in protein structure prediction, protein design, and homology modeling. Visible volume is related to the degree of exposure of a residue position and to the actual rotamers in native proteins. In this article, we discuss the properties of this new measure, namely, its robustness with respect to both crystallographic uncertainties and naturally occurring variations in atomic coordinates, and the remarkable fact that it is essentially independent of the choice of the parameters used in calculating it. We also show how visible volume can be used to align protein structures, to identify structurally equivalent positions that are conserved in a family of proteins, and to single out positions in a protein that are likely to be of biological interest. These properties qualify visible volume as a powerful tool in a variety of applications, from the detailed analysis of protein structure to homology modeling, protein structural alignment, and the definition of better scoring functions for threading purposes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Accurate in silico models for the quantitative prediction of the activity of G protein-coupled receptor (GPCR) ligands would greatly facilitate the process of drug discovery and development. Several methodologies have been developed based on the properties of the ligands, the direct study of the receptor-ligand interactions, or a combination of both approaches. Ligand-based three-dimensional quantitative structure-activity relationships (3D-QSAR) techniques, not requiring knowledge of the receptor structure, have been historically the first to be applied to the prediction of the activity of GPCR ligands. They are generally endowed with robustness and good ranking ability; however they are highly dependent on training sets. Structure-based techniques generally do not provide the level of accuracy necessary to yield meaningful rankings when applied to GPCR homology models. However, they are essentially independent from training sets and have a sufficient level of accuracy to allow an effective discrimination between binders and nonbinders, thus qualifying as viable lead discovery tools. The combination of ligand and structure-based methodologies in the form of receptor-based 3D-QSAR and ligand and structure-based consensus models results in robust and accurate quantitative predictions. The contribution of the structure-based component to these combined approaches is expected to become more substantial and effective in the future, as more sophisticated scoring functions are developed and more detailed structural information on GPCRs is gathered.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Fusion process is known to be the initial step of viral infection and hence targeting the entry process is a promising strategy to design antiviral therapy. The self-inhibitory peptides derived from the enveloped (E) proteins function to inhibit the proteinprotein interactions in the membrane fusion step mediated by the viral E protein. Thus, they have the potential to be developed into effective antiviral therapy. Herein, we have developed a Monte Carlo-based computational method with the aim to identify and optimize potential peptide hits from the E proteins. The stability of the peptides, which indicates their potential to bind in situ to the E proteins, was evaluated by two different scoring functions, dipolar distance-scaled, finite, ideal-gas reference state and residue-specific all-atom probability discriminatory function. The method was applied to a-helical Class I HIV-1 gp41, beta-sheet Class II Dengue virus (DENV) type 2 E proteins, as well as Class III Herpes Simplex virus-1 (HSV-1) glycoprotein, a E protein with a mixture of a-helix and beta-sheet structural fold. The peptide hits identified are in line with the druggable regions where the self-inhibitory peptide inhibitors for the three classes of viral fusion proteins were derived. Several novel peptides were identified from either the hydrophobic regions or the functionally important regions on Class II DENV-2 E protein and Class III HSV-1 gB. They have potential to disrupt the proteinprotein interaction in the fusion process and may serve as starting points for the development of novel inhibitors for viral E proteins. 

Relevância:

60.00% 60.00%

Publicador:

Resumo:

L’avancée des infrastructures informatiques a permis l’émergence de la modélisation moléculaire. À cet effet, une multitude de modèles mathématiques sont aujourd’hui disponibles pour simuler différents systèmes chimiques. À l’aide de la modélisation moléculaire, différents types d’interactions chimiques ont été observés. À partir des systèmes les plus simples permettant l’utilisation de modèles quantiques rigoureux, une série d’approximations a été considérée pour rendre envisageable la simulation de systèmes moléculaires de plus en plus complexes. En premier lieu, la théorie de la fonctionnelle de densité dépendante du temps a été utilisée pour simuler les énergies d’excitation de molécules photoactives. De manière similaire, la DFT indépendante du temps a permis la simulation du pont hydrogène intramoléculaire de structures analogues au 1,3,5-triazapentadiène et la rationalisation de la stabilité des états de transition. Par la suite, la dynamique moléculaire et la mécanique moléculaire ont permis de simuler les interactions d’un trimère d’acide cholique et d’un pyrène dans différents solvants. Cette même méthodologie a été utilisée pour simuler les interactions d’un rotaxane-parapluie à l’interface d’un système biphasique. Finalement, l’arrimage moléculaire et les fonctions de score ont été utilisés pour simuler les interactions intermoléculaires entre une protéine et des milliers de candidats moléculaires. Les résultats ont permis de mettre en place une stratégie de développement d’un nouvel inhibiteur enzymatique.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Les facteurs de transcription sont des protéines spécialisées qui jouent un rôle important dans différents processus biologiques tel que la différenciation, le cycle cellulaire et la tumorigenèse. Ils régulent la transcription des gènes en se fixant sur des séquences d’ADN spécifiques (éléments cis-régulateurs). L’identification de ces éléments est une étape cruciale dans la compréhension des réseaux de régulation des gènes. Avec l’avènement des technologies de séquençage à haut débit, l’identification de tout les éléments fonctionnels dans les génomes, incluant gènes et éléments cis-régulateurs a connu une avancée considérable. Alors qu’on est arrivé à estimer le nombre de gènes chez différentes espèces, l’information sur les éléments qui contrôlent et orchestrent la régulation de ces gènes est encore mal définie. Grace aux techniques de ChIP-chip et de ChIP-séquençage il est possible d’identifier toutes les régions du génome qui sont liées par un facteur de transcription d’intérêt. Plusieurs approches computationnelles ont été développées pour prédire les sites fixés par les facteurs de transcription. Ces approches sont classées en deux catégories principales: les algorithmes énumératifs et probabilistes. Toutefois, plusieurs études ont montré que ces approches génèrent des taux élevés de faux négatifs et de faux positifs ce qui rend difficile l’interprétation des résultats et par conséquent leur validation expérimentale. Dans cette thèse, nous avons ciblé deux objectifs. Le premier objectif a été de développer une nouvelle approche pour la découverte des sites de fixation des facteurs de transcription à l’ADN (SAMD-ChIP) adaptée aux données de ChIP-chip et de ChIP-séquençage. Notre approche implémente un algorithme hybride qui combine les deux stratégies énumérative et probabiliste, afin d’exploiter les performances de chacune d’entre elles. Notre approche a montré ses performances, comparée aux outils de découvertes de motifs existants sur des jeux de données simulées et des jeux de données de ChIP-chip et de ChIP-séquençage. SAMD-ChIP présente aussi l’avantage d’exploiter les propriétés de distributions des sites liés par les facteurs de transcription autour du centre des régions liées afin de limiter la prédiction aux motifs qui sont enrichis dans une fenêtre de longueur fixe autour du centre de ces régions. Les facteurs de transcription agissent rarement seuls. Ils forment souvent des complexes pour interagir avec l’ADN pour réguler leurs gènes cibles. Ces interactions impliquent des facteurs de transcription dont les sites de fixation à l’ADN sont localisés proches les uns des autres ou bien médier par des boucles de chromatine. Notre deuxième objectif a été d’exploiter la proximité spatiale des sites liés par les facteurs de transcription dans les régions de ChIP-chip et de ChIP-séquençage pour développer une approche pour la prédiction des motifs composites (motifs composés par deux sites et séparés par un espacement de taille fixe). Nous avons testé ce module pour prédire la co-localisation entre les deux demi-sites ERE qui forment le site ERE, lié par le récepteur des œstrogènes ERα. Ce module a été incorporé à notre outil de découverte de motifs SAMD-ChIP.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this work, two different docking programs were used, AutoDock and FlexX, which use different types of scoring functions and searching methods. The docking poses of all quinone compounds studied stayed in the same region in the trypanothione reductase. This region is a hydrophobic pocket near to Phe396, Pro398 and Leu399 amino acid residues. The compounds studied displays a higher affinity in trypanothione reductase (TR) than glutathione reductase (GR), since only two out of 28 quinone compounds presented more favorable docking energy in the site of human enzyme. The interaction of quinone compounds with the TR enzyme is in agreement with other studies, which showed different binding sites from the ones formed by cysteines 52 and 58. To verify the results obtained by docking, we carried out a molecular dynamics simulation with the compounds that presented the highest and lowest docking energies. The results showed that the root mean square deviation (RMSD) between the initial and final pose were very small. In addition, the hydrogen bond pattern was conserved along the simulation. In the parasite enzyme, the amino acid residues Leu399, Met400 and Lys402 are replaced in the human enzyme by Met406, Tyr407 and Ala409, respectively. In view of the fact that Leu399 is an amino acid of the Z site, this difference could be explored to design selective inhibitors of TR.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Spatial prediction of hourly rainfall via radar calibration is addressed. The change of support problem (COSP), arising when the spatial supports of different data sources do not coincide, is faced in a non-Gaussian setting; in fact, hourly rainfall in Emilia-Romagna region, in Italy, is characterized by abundance of zero values and right-skeweness of the distribution of positive amounts. Rain gauge direct measurements on sparsely distributed locations and hourly cumulated radar grids are provided by the ARPA-SIMC Emilia-Romagna. We propose a three-stage Bayesian hierarchical model for radar calibration, exploiting rain gauges as reference measure. Rain probability and amounts are modeled via linear relationships with radar in the log scale; spatial correlated Gaussian effects capture the residual information. We employ a probit link for rainfall probability and Gamma distribution for rainfall positive amounts; the two steps are joined via a two-part semicontinuous model. Three model specifications differently addressing COSP are presented; in particular, a stochastic weighting of all radar pixels, driven by a latent Gaussian process defined on the grid, is employed. Estimation is performed via MCMC procedures implemented in C, linked to R software. Communication and evaluation of probabilistic, point and interval predictions is investigated. A non-randomized PIT histogram is proposed for correctly assessing calibration and coverage of two-part semicontinuous models. Predictions obtained with the different model specifications are evaluated via graphical tools (Reliability Plot, Sharpness Histogram, PIT Histogram, Brier Score Plot and Quantile Decomposition Plot), proper scoring rules (Brier Score, Continuous Rank Probability Score) and consistent scoring functions (Root Mean Square Error and Mean Absolute Error addressing the predictive mean and median, respectively). Calibration is reached and the inclusion of neighbouring information slightly improves predictions. All specifications outperform a benchmark model with incorrelated effects, confirming the relevance of spatial correlation for modeling rainfall probability and accumulation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Background Tools to explore large compound databases in search for analogs of query molecules provide a strategically important support in drug discovery to help identify available analogs of any given reference or hit compound by ligand based virtual screening (LBVS). We recently showed that large databases can be formatted for very fast searching with various 2D-fingerprints using the city-block distance as similarity measure, in particular a 2D-atom pair fingerprint (APfp) and the related category extended atom pair fingerprint (Xfp) which efficiently encode molecular shape and pharmacophores, but do not perceive stereochemistry. Here we investigated related 3D-atom pair fingerprints to enable rapid stereoselective searches in the ZINC database (23.2 million 3D structures). Results Molecular fingerprints counting atom pairs at increasing through-space distance intervals were designed using either all atoms (16-bit 3DAPfp) or different atom categories (80-bit 3DXfp). These 3D-fingerprints retrieved molecular shape and pharmacophore analogs (defined by OpenEye ROCS scoring functions) of 110,000 compounds from the Cambridge Structural Database with equal or better accuracy than the 2D-fingerprints APfp and Xfp, and showed comparable performance in recovering actives from decoys in the DUD database. LBVS by 3DXfp or 3DAPfp similarity was stereoselective and gave very different analogs when starting from different diastereomers of the same chiral drug. Results were also different from LBVS with the parent 2D-fingerprints Xfp or APfp. 3D- and 2D-fingerprints also gave very different results in LBVS of folded molecules where through-space distances between atom pairs are much shorter than topological distances. Conclusions 3DAPfp and 3DXfp are suitable for stereoselective searches for shape and pharmacophore analogs of query molecules in large databases. Web-browsers for searching ZINC by 3DAPfp and 3DXfp similarity are accessible at www.gdb.unibe.ch webcite and should provide useful assistance to drug discovery projects.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The discrimination of true oligomeric protein–protein contacts from nonspecific crystal contacts remains problematic. Criteria that have been used previously base the assignment of oligomeric state on consideration of the area of the interface and/or the results of scoring functions based on statistical potentials. Both techniques have a high success rate but fail in more than 10% of cases. More importantly, the oligomeric states of several proteins are incorrectly assigned by both methods. Here we test the hypothesis that true oligomeric contacts should be identifiable on the basis of an increased degree of conservation of the residues involved in the interface. By quantifying the degree of conservation of the interface and comparing it with that of the remainder of the protein surface, we develop a new criterion that provides a highly effective complement to existing methods.