990 resultados para Banach Sequence Space
Resumo:
Over the past two decades, many ingenious efforts have been made in protein remote homology detection. Because homologous proteins often diversify extensively in sequence, it is challenging to demonstrate such relatedness through entirely sequence-driven searches. Here, we describe a computational method for the generation of `protein-like' sequences that serves to bridge gaps in protein sequence space. Sequence profile information, as embodied in a position-specific scoring matrix of multiply aligned sequences of bona fide family members, serves as the starting point in this algorithm. The observed amino acid propensity and the selection of a random number dictate the selection of a residue for each position in the sequence. In a systematic manner, and by applying a `roulette-wheel' selection approach at each position, we generate parent family-like sequences and thus facilitate an enlargement of sequence space around the family. When generated for a large number of families, we demonstrate that they expand the utility of natural intermediately related sequences in linking distant proteins. In 91% of the assessed examples, inclusion of designed sequences improved fold coverage by 5-10% over searches made in their absence. Furthermore, with several examples from proteins adopting folds such as TIM, globin, lipocalin and others, we demonstrate that the success of including designed sequences in a database positively sensitized methods such as PSI-BLAST and Cascade PSI-BLAST and is a promising opportunity for enormously improved remote homology recognition using sequence information alone.
Resumo:
Protein functional annotation relies on the identification of accurate relationships, sequence divergence being a key factor. This is especially evident when distant protein relationships are demonstrated only with three-dimensional structures. To address this challenge, we describe a computational approach to purposefully bridge gaps between related protein families through directed design of protein-like ``linker'' sequences. For this, we represented SCOP domain families, integrated with sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were achieved, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To analyze their ability to link proteins in homology searches, we used 3024 queries to search two databases, one containing only natural sequences and another one additionally containing designed sequences. Our results showed that augmented database searches showed up to 30% improvement in fold coverage for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the availability of designed sequences between other families within the fold established the sequence continuum to demonstrate 373 difficult relationships. Ultimately, as a practical and realistic extension, we demonstrate that such protein-like sequences can be ``plugged-into'' routine and generic sequence database searches to empower not only remote homology detection but also fold recognition. Our richly statistically supported findings show that complementary searches in both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold. (C) 2013 Elsevier Ltd. All rights reserved.
Resumo:
People with sequence-space synesthesia (SSS) report stable visuo-spatial forms corresponding to numbers, days, and months (amongst others). This type of synesthesia has intrigued scientists for over 130 years but the lack of an agreed upon tool for assessing it has held back research on this phenomenon. The present study builds on previous tests by measuring the consistency of spatial locations that is known to discriminate controls from synesthetes. We document, for the first time, the sensitivity and specificity of such a test and suggest a diagnostic cut-off point for discriminating between the groups based on the area bounded by different placement attempts with the same item.
Resumo:
2000 Mathematics Subject Classification: 45G15, 26A33, 32A55, 46E15.
Resumo:
The notion of optimization is inherent in protein design. A long linear chain of twenty types of amino acid residues are known to fold to a 3-D conformation that minimizes the combined inter-residue energy interactions. There are two distinct protein design problems, viz. predicting the folded structure from a given sequence of amino acid monomers (folding problem) and determining a sequence for a given folded structure (inverse folding problem). These two problems have much similarity to engineering structural analysis and structural optimization problems respectively. In the folding problem, a protein chain with a given sequence folds to a conformation, called a native state, which has a unique global minimum energy value when compared to all other unfolded conformations. This involves a search in the conformation space. This is somewhat akin to the principle of minimum potential energy that determines the deformed static equilibrium configuration of an elastic structure of given topology, shape, and size that is subjected to certain boundary conditions. In the inverse-folding problem, one has to design a sequence with some objectives (having a specific feature of the folded structure, docking with another protein, etc.) and constraints (sequence being fixed in some portion, a particular composition of amino acid types, etc.) while obtaining a sequence that would fold to the desired conformation satisfying the criteria of folding. This requires a search in the sequence space. This is similar to structural optimization in the design-variable space wherein a certain feature of structural response is optimized subject to some constraints while satisfying the governing static or dynamic equilibrium equations. Based on this similarity, in this work we apply the topology optimization methods to protein design, discuss modeling issues and present some initial results.
Resumo:
Determining the sequence of amino acid residues in a heteropolymer chain of a protein with a given conformation is a discrete combinatorial problem that is not generally amenable for gradient-based continuous optimization algorithms. In this paper we present a new approach to this problem using continuous models. In this modeling, continuous "state functions" are proposed to designate the type of each residue in the chain. Such a continuous model helps define a continuous sequence space in which a chosen criterion is optimized to find the most appropriate sequence. Searching a continuous sequence space using a deterministic optimization algorithm makes it possible to find the optimal sequences with much less computation than many other approaches. The computational efficiency of this method is further improved by combining it with a graph spectral method, which explicitly takes into account the topology of the desired conformation and also helps make the combined method more robust. The continuous modeling used here appears to have additional advantages in mimicking the folding pathways and in creating the energy landscapes that help find sequences with high stability and kinetic accessibility. To illustrate the new approach, a widely used simplifying assumption is made by considering only two types of residues: hydrophobic (H) and polar (P). Self-avoiding compact lattice models are used to validate the method with known results in the literature and data that can be practically obtained by exhaustive enumeration on a desktop computer. We also present examples of sequence design for the HP models of some real proteins, which are solved in less than five minutes on a single-processor desktop computer Some open issues and future extensions are noted.
Resumo:
NrichD
Resumo:
In the first half of this memoir we explore the interrelationships between the abstract theory of limit operators (see e.g. the recent monographs of Rabinovich, Roch and Silbermann (2004) and Lindner (2006)) and the concepts and results of the generalised collectively compact operator theory introduced by Chandler-Wilde and Zhang (2002). We build up to results obtained by applying this generalised collectively compact operator theory to the set of limit operators of an operator (its operator spectrum). In the second half of this memoir we study bounded linear operators on the generalised sequence space , where and is some complex Banach space. We make what seems to be a more complete study than hitherto of the connections between Fredholmness, invertibility, invertibility at infinity, and invertibility or injectivity of the set of limit operators, with some emphasis on the case when the operator is a locally compact perturbation of the identity. Especially, we obtain stronger results than previously known for the subtle limiting cases of and . Our tools in this study are the results from the first half of the memoir and an exploitation of the partial duality between and and its implications for bounded linear operators which are also continuous with respect to the weaker topology (the strict topology) introduced in the first half of the memoir. Results in this second half of the memoir include a new proof that injectivity of all limit operators (the classic Favard condition) implies invertibility for a general class of almost periodic operators, and characterisations of invertibility at infinity and Fredholmness for operators in the so-called Wiener algebra. In two final chapters our results are illustrated by and applied to concrete examples. Firstly, we study the spectra and essential spectra of discrete Schrödinger operators (both self-adjoint and non-self-adjoint), including operators with almost periodic and random potentials. In the final chapter we apply our results to integral operators on .
Resumo:
We investigate the operator associating with a function fєLp2π, 1
sequence of Fourier coefficients of ƒ with respect to a trigonometric gap system, as well as an operator from a modular space X ρs(ϕ) to the generalized Orlicz sequence space lϕ.
Resumo:
There are two main aims of the paper. The first one is to extend the criterion for the precompactness of sets in Banach function spaces to the setting of quasi-Banach function spaces. The second one is to extend the criterion for the precompactness of sets in the Lebesgue spaces $L_p(\Rn)$, $1 \leq p < \infty$, to the so-called power quasi-Banach function spaces.
These criteria are applied to establish compact embeddings of abstract Besov spaces into quasi-Banach function spaces. The results are illustrated on embeddings of Besov spaces $B^s_{p,q}(\Rn)$, $0spaces.
Resumo:
Using six kinds of lattice types (4×4 ,5×5 , and6×6 square lattices;3×3×3 cubic lattice; and2+3+4+3+2 and4+5+6+5+4 triangular lattices), three different size alphabets (HP ,HNUP , and 20 letters), and two energy functions, the designability of proteinstructures is calculated based on random samplings of structures and common biased sampling (CBS) of proteinsequence space. Then three quantities stability (average energy gap),foldability, and partnum of the structure, which are defined to elucidate the designability, are calculated. The authors find that whatever the type of lattice, alphabet size, and energy function used, there will be an emergence of highly designable (preferred) structure. For all cases considered, the local interactions reduce degeneracy and make the designability higher. The designability is sensitive to the lattice type, alphabet size, energy function, and sampling method of the sequence space. Compared with the random sampling method, both the CBS and the Metropolis Monte Carlo sampling methods make the designability higher. The correlation coefficients between the designability, stability, and foldability are mostly larger than 0.5, which demonstrate that they have strong correlation relationship. But the correlation relationship between the designability and the partnum is not so strong because the partnum is independent of the energy. The results are useful in practical use of the designability principle, such as to predict the proteintertiary structure.
Resumo:
In this paper, we present numerical evidence that supports the notion of minimization in the sequence space of proteins for a target conformation. We use the conformations of the real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with minimum energy. We use edge-weighted connectivity graph for ranking the residue sites with reduced amino acid alphabet and then use continuous optimization to obtain the energy-minimizing sequences. Our methods enable the computation of a lower bound as well as a tight upper bound for the energy of a given conformation. We validate our results by using three different inter-residue energy matrices for five proteins from protein data bank (PDB), and by comparing our energy-minimizing sequences with 80 million diverse sequences that are generated based on different considerations in each case. When we submitted some of our chosen energy-minimizing sequences to Basic Local Alignment Search Tool (BLAST), we obtained some sequences from non-redundant protein sequence database that are similar to ours with an E-value of the order of 10(-7). In summary, we conclude that proteins show a trend towards minimizing energy in the sequence space but do not seem to adopt the global energy-minimizing sequence. The reason for this could be either that the existing energy matrices are not able to accurately represent the inter-residue interactions in the context of the protein environment or that Nature does not push the optimization in the sequence space, once it is able to perform the function.
Resumo:
Background: Signal transduction events often involve transient, yet specific, interactions between structurally conserved protein domains and polypeptide sequences in target proteins. The identification and validation of these associating domains is crucial to understand signal transduction pathways that modulate different cellular or developmental processes. Bioinformatics strategies to extract and integrate information from diverse sources have been shown to facilitate the experimental design to understand complex biological events. These methods, primarily based on information from high-throughput experiments, have also led to the identification of new connections thus providing hypothetical models for cellular events. Such models, in turn, provide a framework for directing experimental efforts for validating the predicted molecular rationale for complex cellular processes. In this context, it is envisaged that the rational design of peptides for protein-peptide binding studies could substantially facilitate the experimental strategies to evaluate a predicted interaction. This rational design procedure involves the integration of protein-protein interaction data, gene ontology, physico-chemical calculations, domain-domain interaction data and information on functional sites or critical residues. Results: Here we describe an integrated approach called ``PeptideMine'' for the identification of peptides based on specific functional patterns present in the sequence of an interacting protein. This approach based on sequence searches in the interacting sequence space has been developed into a webserver, which can be used for the identification and analysis of peptides, peptide homologues or functional patterns from the interacting sequence space of a protein. To further facilitate experimental validation, the PeptideMine webserver also provides a list of physico-chemical parameters corresponding to the peptide to determine the feasibility of using the peptide for in vitro biochemical or biophysical studies. Conclusions: The strategy described here involves the integration of data and tools to identify potential interacting partners for a protein and design criteria for peptides based on desired biochemical properties. Alongside the search for interacting protein sequences using three different search programs, the server also provides the biochemical characteristics of candidate peptides to prune peptide sequences based on features that are most suited for a given experiment. The PeptideMine server is available at the URL: http://caps.ncbs.res.in/peptidemine
Resumo:
Convergence of the vast sequence space of proteins into a highly restricted fold/conformational space suggests a simple yet unique underlying mechanism of protein folding that has been the subject of much debate in the last several decades. One of the major challenges related to the understanding of protein folding or in silico protein structure prediction is the discrimination of non-native structures/decoys from the native structure. Applications of knowledge-based potentials to attain this goal have been extensively reported in the literature. Also, scoring functions based on accessible surface area and amino acid neighbourhood considerations were used in discriminating the decoys from native structures. In this article, we have explored the potential of protein structure network (PSN) parameters to validate the native proteins against a large number of decoy structures generated by diverse methods. We are guided by two principles: (a) the PSNs capture the local properties from a global perspective and (b) inclusion of non-covalent interactions, at all-atom level, including the side-chain atoms, in the network construction accommodates the sequence dependent features. Several network parameters such as the size of the largest cluster, community size, clustering coefficient are evaluated and scored on the basis of the rank of the native structures and the Z-scores. The network analysis of decoy structures highlights the importance of the global properties contributing to the uniqueness of native structures. The analysis also exhibits that the network parameters can be used as metrics to identify the native structures and filter out non-native structures/decoys in a large number of data-sets; thus also has a potential to be used in the protein `structure prediction' problem.