18 results for Scoring

at Indian Institute of Science - Bangalore - India


Relevance:

20.00%

Publisher:

Abstract:

Classification of pharmacologic activity of a chemical compound is an essential step in any drug discovery process. We develop two new atom-centered fragment descriptors (vertex indices): one based solely on topological considerations without discriminating atom or bond types, and another based on topological and electronic features. We also assess their usefulness by devising a method to rank and classify molecules with regard to their antibacterial activity. The classification performance of our method is found to be superior to that of two previous studies on large heterogeneous data sets for hit finding and hit-to-lead studies, even though we use far fewer parameters. It is found that for hit finding studies topological features (simple graph) alone provide significant discriminating power, and for the hit-to-lead process a small but consistent improvement can be made by additionally including electronic features (colored graph). Our approach is simple, interpretable, and suitable for design of molecules as we do not use any physicochemical properties. The singular use of vertex index as descriptor, novel range based feature extraction, and rigorous statistical validation are the key elements of this study.

Relevance:

10.00%

Publisher:

Abstract:

The immune response against Salmonella is multi-faceted, involving both the innate and the adaptive immune system. The characterization of specific Salmonella antigens inducing immune response could critically contribute to the development of epitope based vaccines for Salmonella. We have tried to identify protective T cell epitopes of Salmonella, as cell mediated immunity conferred by CD8+ T cells is the most crucial component of protective immunity against Salmonella. Because secreted proteins are known to induce cell mediated immunity better than cell surface and cytosolic antigens, we have analyzed all the GenBank-annotated Salmonella pathogenicity island 1 and 2 secreted proteins of Salmonella enterica serovar Typhimurium (S. typhimurium) and S. enterica serovar Typhi (S. typhi). They were subjected to BIMAS and SYFPEITHI analysis to map MHC-I and MHC-II binding epitopes. The large set of possible T cell epitopes obtained from the two classes of secreted proteins was tabulated and, using a scoring system that considers the binding affinity and promiscuity of binding to more than one allele, SopB and SifB were chosen for experimental confirmation in a murine immunization model. The entire SopB and SifB genes were cloned into DNA vaccine vectors and administered along with live attenuated Salmonella. SopB vaccination reduced the bacterial burden of organs by about 5-fold on day 4 and day 8 after challenge with virulent Salmonella and proved to be a more efficient vaccination strategy than live attenuated bacteria alone.

Relevance:

10.00%

Publisher:

Abstract:

Query incentive networks capture the role of incentives in extracting information from decentralized information networks such as a social network. Several game theoretic models of query incentive networks have been proposed in the literature to study and characterize the dependence of the monetary reward required to extract the answer for a query on various factors such as the structure of the network, the level of difficulty of the query, and the required success probability. None of the existing models, however, captures the practical and important factor of quality of answers. In this paper, we develop a complete mechanism design based framework to incorporate the quality of answers in the monetization of query incentive networks. First, we extend the model of Kleinberg and Raghavan [2] to allow the nodes to modulate the incentive on the basis of the quality of the answer they receive. For this quality conscious model, we show the existence of a unique Nash equilibrium and study the impact of quality of answers on the growth rate of the initial reward with respect to the branching factor of the network. Next, we present two mechanisms, the direct comparison mechanism and the peer prediction mechanism, for truthful elicitation of quality from the agents. These mechanisms are based on scoring rules and cover different scenarios which may arise in query incentive networks. We show that the proposed quality elicitation mechanisms are incentive compatible and ex-ante budget balanced. We also derive conditions under which ex-post budget balance can be achieved by these mechanisms.

Relevance:

10.00%

Publisher:

Abstract:

Background: Protein phosphorylation is a generic way to regulate signal transduction pathways in all kingdoms of life. In many organisms, it is achieved by the large family of Ser/Thr/Tyr protein kinases which are traditionally classified into groups and subfamilies on the basis of the amino acid sequence of their catalytic domains. Many protein kinases are multidomain in nature but the diversity of the accessory domains and their organization are usually not taken into account while classifying kinases into groups or subfamilies. Methodology: Here, we present an approach which considers amino acid sequences of complete gene products, in order to suggest refinements in sets of pre-classified sequences. The strategy is based on alignment-free similarity scores and iterative Area Under the Curve (AUC) computation. Similarity scores are computed by detecting common patterns between two sequences and scoring them using a substitution matrix, with a consistent normalization scheme. This allows us to handle full-length sequences, and implicitly takes into account domain diversity and domain shuffling. We quantitatively validate our approach on a subset of 212 human protein kinases. We then employ it on the complete repertoire of human protein kinases and suggest a few qualitative refinements in the subfamily assignment stored in the KinG database, which is based on catalytic domains only. Based on our new measure, we delineate 37 cases of potential hybrid kinases: sequences for which classical classification based entirely on catalytic domains is inconsistent with the full-length similarity scores computed here, which implicitly consider the multi-domain nature and regions outside the catalytic kinase domain. We also provide some examples of hybrid kinases of the protozoan parasite Entamoeba histolytica. Conclusions: The implicit consideration of multi-domain architectures is a valuable inclusion to complement other classification schemes. The proposed algorithm may also be employed to classify other families of enzymes with multidomain architecture.

Relevance:

10.00%

Publisher:

Abstract:

A successful protein-protein docking study culminates in identification of decoys at top ranks with near-native quaternary structures. However, this task remains enigmatic because no generalized scoring functions exist that effectively rank decoys according to their similarity to near-native quaternary structures. Difficulties arise because of the highly irregular nature of the protein surface and the significant variation of the nonbonding and solvation energies based on the chemical composition of the protein-protein interface. In this work, we describe a novel method combining an interface-size filter, a regression model for geometric compatibility (based on two correlated surface and packing parameters), and normalized interaction energy (calculated from correlated nonbonded and solvation energies), to effectively rank decoys from a set of 10,000 decoys. Tests on 30 unbound binary protein-protein complexes show that in 16 cases we can identify at least one decoy in the top three ranks having <= 10 angstrom backbone root mean square deviation from the true binding geometry. Comparisons with other state-of-the-art methods confirm the improved ranking power of our method without the use of any experiment-guided restraints, evolutionary information, statistical propensities, or modified interaction energy equations. Tests on 118 less-difficult bound binary protein-protein complexes with <= 35% sequence redundancy at the interface showed that in 77% of cases, at least 1 in 10,000 decoys was identified with <= 5 angstrom backbone root mean square deviation from the true geometry at first rank. The work will promote the use of new concepts where correlations among parameters provide more robust scoring models. It will facilitate studies involving molecular interactions, including modeling of large macromolecular assemblies and protein structure prediction. (C) 2010 Wiley Periodicals, Inc. J Comput Chem 32: 787-796, 2011.

Relevance:

10.00%

Publisher:

Abstract:

Protein-protein docking programs typically perform four major tasks: (i) generation of docking poses, (ii) selection of a subset of poses, (iii) their structural refinement and (iv) scoring and ranking for the final assessment of the true quaternary structure. Although the tasks can be integrated or performed in a serial order, they are by nature modular, allowing an opportunity to substitute one algorithm with another. We have implemented two modular web services, (i) PRUNE: to select a subset of docking poses generated during sampling search (http://pallab.serc.iisc.ernet.in/prune) and (ii) PROBE: to refine, score and rank them (http://pallab.serc.iisc.ernet.in/probe). The former uses a new interface area based edge-scoring function to eliminate > 95% of the poses generated during docking search. In contrast to other multi-parameter-based screening functions, this single parameter based elimination reduces the computational time significantly, in addition to increasing the chances of selecting native-like models in the top rank list. The PROBE server performs ranking of pruned poses, after structure refinement and scoring using a regression model for geometric compatibility and normalized interaction energy. While web services similar to PROBE are rare, no web service akin to PRUNE has been described before. Both servers are publicly accessible and free for use.

Relevance:

10.00%

Publisher:

Abstract:

Membrane proteins are involved in a number of important biological functions. Yet, they are poorly understood from the structure and folding point of view. The external environment being drastically different from that of globular proteins, the intra-protein interactions in membrane proteins are also expected to be different. Hence, statistical potentials representing the features of inter-residue interactions based exclusively on the structures of membrane proteins are much needed. Currently, a reasonable number of structures are available, making it possible to undertake such an analysis on membrane proteins. In this study we have examined the inter-residue interaction propensities of amino acids in the membrane spanning regions of the alpha-helical membrane (HM) proteins. Recently we have shown that valuable information can be obtained on globular proteins by the evaluation of the pair-wise interactions of amino acids by classifying them into different structural environments, based on factors such as the secondary structure or the number of contacts that a residue can make. Here we have explored the possible ways of classifying the intra-protein environment of HM proteins and have developed scoring functions based on different classification schemes. On evaluation of different schemes, we find that the scheme which classifies amino acids to different intra-contact environment is the most promising one. Based on this classification scheme, we also redefine the hydrophobicity scale of amino acids in HM proteins.

Relevance:

10.00%

Publisher:

Abstract:

A fundamental task in bioinformatics involves a transfer of knowledge from one protein molecule onto another by way of recognizing similarities. Such similarities are obtained at different levels: that of sequence, whole fold, or important substructures. Comparison of binding sites is important to understand functional similarities among the proteins and also to understand drug cross-reactivities. Current methods in the literature have their own merits and demerits, warranting exploration of newer concepts and algorithms, especially for large-scale comparisons and for obtaining accurate residue-wise mappings. Here, we report the development of a new algorithm, PocketAlign, for obtaining structural superpositions of binding sites. The software is available as a web-service at http://proline.physics.iisc.ernet.in/pocketalign/. The algorithm encodes shape descriptors in the form of geometric perspectives, supplemented by chemical group classification. The shape descriptor considers several perspectives with each residue as the focus and captures the relative distribution of residues around it in a given site. Residue-wise pairings are computed by comparing the set of perspectives of the first site with that of the second, followed by a greedy approach that incrementally combines residue pairings into a mapping. The mappings in different frames are then evaluated by different metrics encoding the extent of alignment of individual geometric perspectives. Different initial seed alignments are computed, each subsequently extended by detecting consequential atomic alignments in a three-dimensional grid, and the best 500 are stored in a database. Alignments are then ranked, and the top scoring alignments reported, which are then streamed into Pymol for visualization and analyses. The method is validated for accuracy and sensitivity and benchmarked against existing methods. An advantage of PocketAlign, as compared to some of the existing tools available for binding site comparison in the literature, is that it explores different schemes for identifying an alignment and thus has a better potential to capture similarities in ligand recognition abilities. PocketAlign, by finding a detailed alignment of a pair of sites, provides insights as to why two sites are similar and which set of residues and atoms contribute to the similarity.

Relevance:

10.00%

Publisher:

Abstract:

Convergence of the vast sequence space of proteins into a highly restricted fold/conformational space suggests a simple yet unique underlying mechanism of protein folding that has been the subject of much debate in the last several decades. One of the major challenges related to the understanding of protein folding or in silico protein structure prediction is the discrimination of non-native structures/decoys from the native structure. Applications of knowledge-based potentials to attain this goal have been extensively reported in the literature. Scoring functions based on accessible surface area and amino acid neighbourhood considerations have also been used in discriminating the decoys from native structures. In this article, we have explored the potential of protein structure network (PSN) parameters to validate the native proteins against a large number of decoy structures generated by diverse methods. We are guided by two principles: (a) the PSNs capture the local properties from a global perspective and (b) inclusion of non-covalent interactions, at all-atom level, including the side-chain atoms, in the network construction accommodates the sequence dependent features. Several network parameters such as the size of the largest cluster, community size and clustering coefficient are evaluated and scored on the basis of the rank of the native structures and the Z-scores. The network analysis of decoy structures highlights the importance of the global properties contributing to the uniqueness of native structures. The analysis also shows that the network parameters can be used as metrics to identify the native structures and filter out non-native structures/decoys in a large number of data-sets; they thus also have the potential to be used in the protein 'structure prediction' problem.
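To illustrate what scoring a network parameter "on the basis of the rank of the native structures and the Z-scores" can look like, here is a minimal Python sketch. It is not the authors' code: the function names, the toy values, and the assumption that a larger parameter value is more native-like are all hypothetical.

```python
import statistics

def native_z_score(native_value, decoy_values):
    """Z-score of the native structure's parameter against the decoy distribution."""
    mean = statistics.mean(decoy_values)
    spread = statistics.pstdev(decoy_values)  # population standard deviation of the decoys
    return (native_value - mean) / spread

def native_rank(native_value, decoy_values):
    """Rank of the native structure (1 = best), assuming larger values are better."""
    return 1 + sum(1 for value in decoy_values if value > native_value)

# Hypothetical values of one network parameter (e.g. a clustering coefficient).
decoys = [0.40, 0.42, 0.38, 0.44, 0.36]
native = 0.55

z = native_z_score(native, decoys)
rank = native_rank(native, decoys)
print(f"rank={rank}, z={z:.2f}")
```

A parameter that consistently places the native structure at rank 1 with a large Z-score across many decoy sets would be a useful discriminator in this sense.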

Relevance:

10.00%

Publisher:

Abstract:

Over the past two decades, many ingenious efforts have been made in protein remote homology detection. Because homologous proteins often diversify extensively in sequence, it is challenging to demonstrate such relatedness through entirely sequence-driven searches. Here, we describe a computational method for the generation of 'protein-like' sequences that serves to bridge gaps in protein sequence space. Sequence profile information, as embodied in a position-specific scoring matrix of multiply aligned sequences of bona fide family members, serves as the starting point in this algorithm. The observed amino acid propensity and the selection of a random number dictate the selection of a residue for each position in the sequence. In a systematic manner, and by applying a 'roulette-wheel' selection approach at each position, we generate parent family-like sequences and thus facilitate an enlargement of sequence space around the family. When generated for a large number of families, we demonstrate that they expand the utility of natural intermediately related sequences in linking distant proteins. In 91% of the assessed examples, inclusion of designed sequences improved fold coverage by 5-10% over searches made in their absence. Furthermore, with several examples from proteins adopting folds such as TIM, globin, lipocalin and others, we demonstrate that the success of including designed sequences in a database positively sensitized methods such as PSI-BLAST and Cascade PSI-BLAST and is a promising opportunity for enormously improved remote homology recognition using sequence information alone.
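The roulette-wheel step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-column toy profile, the function names and the fixed random seed are hypothetical, and a real profile would carry propensities for all 20 amino acids at every aligned position.

```python
import random

def roulette_pick(propensities, rng):
    """Pick one residue with probability proportional to its propensity."""
    total = sum(propensities.values())
    threshold = rng.random() * total  # the random number "spins the wheel"
    running = 0.0
    for residue, weight in propensities.items():
        running += weight
        if threshold < running:
            return residue
    return residue  # guard against floating-point round-off

def generate_sequence(profile, rng):
    """Draw one residue per profile column to build a family-like sequence."""
    return "".join(roulette_pick(column, rng) for column in profile)

# Hypothetical per-position propensities (one dict per alignment column).
toy_profile = [
    {"A": 0.7, "G": 0.3},
    {"L": 1.0},
    {"K": 0.5, "R": 0.5},
]

rng = random.Random(0)
designed = [generate_sequence(toy_profile, rng) for _ in range(5)]
print(designed)
```

Residues with higher propensity occupy a larger slice of the wheel, so conserved positions stay conserved while variable positions are sampled in proportion to what the family alignment shows.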

Relevance:

10.00%

Publisher:

Abstract:

Information diffusion and influence maximization are important and extensively studied problems in social networks. Various models and algorithms have been proposed in the literature in the context of the influence maximization problem. A crucial assumption in all these studies is that the influence probabilities are known to the social planner. This assumption is unrealistic since the influence probabilities are usually private information of the individual agents and strategic agents may not reveal them truthfully. Moreover, the influence probabilities could vary significantly with the type of the information flowing in the network and the time at which the information is propagating in the network. In this paper, we use a mechanism design approach to elicit influence probabilities truthfully from the agents. Our main contribution is to design a scoring rule based mechanism in the context of the influencer-influencee model. In particular, we show the incentive compatibility of the mechanisms and propose a reverse weighted scoring rule based mechanism as an appropriate mechanism to use.
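As background on why scoring rules can elicit probabilities truthfully, the following Python sketch uses the standard quadratic (Brier-type) scoring rule. This is a stand-in chosen for illustration, not the reverse weighted scoring rule proposed in the paper; the function names and the value 0.3 are hypothetical.

```python
def quadratic_score(report, outcome):
    """Quadratic scoring rule: reward is higher when the report is close to the outcome (0 or 1)."""
    return 1.0 - (outcome - report) ** 2

def expected_score(report, true_prob):
    """Expected reward for a given report when the event truly occurs with probability true_prob."""
    return (true_prob * quadratic_score(report, 1)
            + (1.0 - true_prob) * quadratic_score(report, 0))

# An agent whose true influence probability is 0.3 considers every report on a grid.
true_prob = 0.3
grid = [i / 100 for i in range(101)]
best_report = max(grid, key=lambda q: expected_score(q, true_prob))
print(best_report)
```

Because the expected score is maximized by reporting the true probability, an agent has no incentive to misreport under this rule, which is the property that incentive compatible mechanisms of this kind build on.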

Relevance:

10.00%

Publisher:

Abstract:

Ferric uptake regulator (Fur) is a transcriptional regulator controlling the expression of genes involved in iron homeostasis and plays an important role in pathogenesis. Fur-regulated sRNAs/CDSs were found to have upstream Fur Binding Sites (FBS). We have constructed a Positional Weight Matrix from 100 known FBS (19 nt) and tracked the 'Orphan' FBSs. Possible Fur regulated sRNAs and CDSs were identified by comparing their genomic locations with the 'Orphan' FBSs identified. Thirty-eight 'novel' and all known Fur regulated sRNAs in nine proteobacteria were identified. In addition, we identified high scoring FBSs in the promoter regions of the 304 CDSs and 68 of them were involved in siderophore biosynthesis, iron-transporters, two-component system, starch/sugar metabolism, sulphur/methane metabolism, etc. The present study shows that the Fur regulator controls the expression of genes involved in diverse metabolic activities and it is not limited to iron metabolism alone. (C) 2012 Elsevier B.V. All rights reserved.
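Building a weight matrix from known sites and scanning a sequence for high scoring matches can be sketched as follows. This is a generic log-odds construction, not necessarily the one used in the study; the 6 nt toy sites (the study used 100 sites of 19 nt), the pseudocount, the uniform background and all function names are assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix: one dict of base -> score per site position."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background)
                    for base in BASES})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest scoring window in the sequence."""
    width = len(pwm)
    hits = [(score(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)

# Hypothetical aligned binding sites and a hypothetical promoter fragment.
known_sites = ["GATAAT", "GATAAT", "GATTAT", "GAAAAT"]
promoter = "CCCGATAATCCC"

pwm = build_pwm(known_sites)
top_score, start = best_hit(pwm, promoter)
print(start, round(top_score, 2))
```

In a genome-wide scan, windows scoring above a chosen cutoff would be kept as candidate sites and then compared with the locations of annotated sRNAs and CDSs.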

Relevance:

10.00%

Publisher:

Abstract:

Most biological processes are governed through specific protein-ligand interactions. Discerning different components that contribute toward a favorable protein-ligand interaction could contribute significantly toward better understanding protein function, rationalizing drug design and obtaining design principles for protein engineering. The Protein Data Bank (PDB) currently hosts the structures of ~68 000 protein-ligand complexes. Although several databases exist that classify proteins according to sequence and structure, a mere handful of them annotate and classify protein-ligand interactions and provide information on different attributes of molecular recognition. In this study, an exhaustive comparison of all the biologically relevant ligand-binding sites (84 846 sites) has been conducted using PocketMatch: a rapid, parallel, in-house algorithm. PocketMatch quantifies the similarity between binding sites based on structural descriptors and residue attributes. A similarity network was constructed using binding sites whose PocketMatch scores exceeded a high similarity threshold (0.80). The binding site similarity network was clustered into discrete sets of similar sites using the Markov clustering (MCL) algorithm. Furthermore, various computational tools have been used to study different attributes of interactions within the individual clusters. The attributes can be roughly divided into (i) binding site characteristics including pocket shape, nature of residues and interaction profiles with different kinds of atomic probes, (ii) atomic contacts consisting of various types of polar, hydrophobic and aromatic contacts along with binding site water molecules that could play crucial roles in protein-ligand interactions and (iii) binding energetics involved in interactions derived from scoring functions developed for docking. For each ligand-binding site in each protein in the PDB, site similarity information, the clusters it belongs to and a description of site attributes are provided as a relational database, protein-ligand interaction clusters (PLIC).

Relevance:

10.00%

Publisher:

Abstract:

m-AMSA, an established inhibitor of eukaryotic type II topoisomerases, exerts its cidal effect by binding to the enzyme-DNA complex thus inhibiting the DNA religation step. The molecule and its analogues have been successfully used as chemotherapeutic agents against different forms of cancer. After virtual screening using a homology model of the Mycobacterium tuberculosis topoisomerase I, we identified m-AMSA as a high scoring hit. We demonstrate that m-AMSA can inhibit the DNA relaxation activity of topoisomerase I from M. tuberculosis and Mycobacterium smegmatis. In a whole cell assay, m-AMSA inhibited the growth of both the mycobacteria. (C) 2014 Elsevier Inc. All rights reserved.

Relevance:

10.00%

Publisher:

Abstract:

The problem of bipartite ranking, where instances are labeled positive or negative and the goal is to learn a scoring function that minimizes the probability of mis-ranking a pair of positive and negative instances (or equivalently, that maximizes the area under the ROC curve), has been widely studied in recent years. A dominant theoretical and algorithmic framework for the problem has been to reduce bipartite ranking to pairwise classification; in particular, it is well known that the bipartite ranking regret can be formulated as a pairwise classification regret, which in turn can be upper bounded using usual regret bounds for classification problems. Recently, Kotlowski et al. (2011) showed regret bounds for bipartite ranking in terms of the regret associated with balanced versions of the standard (non-pairwise) logistic and exponential losses. In this paper, we show that such (non-pairwise) surrogate regret bounds for bipartite ranking can be obtained in terms of a broad class of proper (composite) losses that we term as strongly proper. Our proof technique is much simpler than that of Kotlowski et al. (2011), and relies on properties of proper (composite) losses as elucidated recently by Reid and Williamson (2010, 2011) and others. Our result yields explicit surrogate bounds (with no hidden balancing terms) in terms of a variety of strongly proper losses, including for example logistic, exponential, squared and squared hinge losses as special cases. An important consequence is that standard algorithms minimizing a (non-pairwise) strongly proper loss, such as logistic regression and boosting algorithms (assuming a universal function class and appropriate regularization), are in fact consistent for bipartite ranking; moreover, our results allow us to quantify the bipartite ranking regret in terms of the corresponding surrogate regret. We also obtain tighter surrogate bounds under certain low-noise conditions via a recent result of Clemencon and Robbiano (2011).
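The equivalence mentioned above, between minimizing the probability of mis-ranking a positive-negative pair and maximizing the area under the ROC curve, can be checked on a toy example. The Python sketch below counts correctly ordered pairs directly; the function name, the convention of counting ties as half, and the toy scores are assumptions for illustration.

```python
def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Hypothetical scores from some scoring function, with binary labels.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]

auc = pairwise_auc(scores, labels)
misranking_probability = 1.0 - auc
print(auc, misranking_probability)
```

Of the four positive-negative pairs here, only the pair (0.4, 0.8) is ordered wrongly, so the AUC is 0.75 and the empirical mis-ranking probability is 0.25.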