928 results for Secondary Structure Prediction


Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this dissertation, several candidate genes for Wilms tumor (WT), a tumor of the kidney, were identified and characterized. Because this early-childhood malignancy results from aberrant metanephrogenesis, the gene expression patterns of various human Wilms tumor and normal kidney tissues (adult and fetal kidney) were compared using the differential display technique, and the genes identified as differentially expressed were cloned and characterized. TM7SF1 is a novel gene whose transcription is switched on during metanephrogenesis. Based on structure predictions, the putative protein it encodes can probably be assigned to the family of G protein-coupled receptors. Its inferred function as a signaling molecule in kidney development, together with its localization in a WT locus (1q42-q43), makes TM7SF1 a promising candidate gene for WT. In addition, the prerequisites for functional tests that will allow further characterization of TM7SF1 were established (identification and cloning of the murine homolog, stably overexpressing WT cell lines, antibodies against the amino terminus of the putative protein). With TCF2, a further gene was identified whose product plays a role in processes of metanephrogenesis. The significant downregulation of TCF2 expression in the large majority of the WTs examined, its regulation by the WT1 gene product demonstrated in this work, and its genomic localization in an interval for the familial form of WT (FWT1 in 17q12-q21) show the potential of TCF2 as a candidate gene for FWT. Furthermore, GLI3 was identified as a gene strongly expressed in various WTs. Its product is a component of the sonic hedgehog signal transduction pathway, which is relevant to developmental biology and involved in various tumor diseases.
With FE7A3 and CDT151, two differentially expressed cDNAs were identified that represent parts of novel genes and that could be mapped to WT loci. Based on homology comparisons within the identified open reading frames, a possible role of the putative gene products in WT pathogenesis was worked out: as a cell adhesion molecule (FE7A3) and as a proliferation-associated transcription factor (CDT151), respectively. In addition to the comparative gene expression studies, a second approach analyzed the transcriptional regulation of the only Wilms tumor gene cloned to date (WT1). Using comparative reporter gene analyses in WT1-expressing and non-expressing cell lines, new regions relevant to the transcriptional regulation of WT1 were identified. Furthermore, the functional antagonism described for the transcription factors SP1 and SP3 at other promoters was investigated for WT1 expression and correlated, in gel retardation analyses, with the WT1 expression status of the above-mentioned cell lines.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Purple acid phosphatases are a family of binuclear metallohydrolases that have been identified in plants, animals and fungi. Only one isoform of ~35 kDa has been isolated from animals, where it is associated with bone resorption and microbial killing through its phosphatase activity and hydroxyl radical production, respectively. Using the sensitive PSI-BLAST search method, sequences representing new purple acid phosphatase-like proteins have been identified in mammals, insects and nematodes. These new putative isoforms are closely related to the ~55 kDa purple acid phosphatase characterized from plants. Secondary structure prediction of the new human isoform further confirms its similarity to a purple acid phosphatase from the red kidney bean. A structural model for the human enzyme was constructed based on the red kidney bean purple acid phosphatase structure. This model shows that the catalytic centre observed in other purple acid phosphatases is also present in this new isoform. These observations suggest that the sequences identified in this study represent a novel subfamily of plant-like purple acid phosphatases in animals and humans. (c) 2006 Elsevier B.V. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the three-dimensional structure of a protein from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. Results: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance: local sequence in the form of PSI-BLAST profiles; local sequence plus amino acid composition; local sequence plus molecular weight; local sequence plus secondary structure predicted by PSIPRED; local sequence plus molecular weight and amino acid composition; local sequence plus molecular weight and predicted secondary structure; and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55 and a root mean square error (RMSE) of 0.82, based on a well-defined dataset of 680 protein sequences.
Moreover, by incorporating global features such as molecular weight and amino acid composition, we could further improve the prediction performance to a CC of 0.57 and an RMSE of 0.79. In addition, incorporating the secondary structure predicted by PSIPRED was found to significantly improve the prediction performance and yielded the best prediction accuracy, with a CC of 0.60 and an RMSE of 0.78, which is at least comparable to the other existing methods. Conclusion: The SVR method shows a prediction performance competitive with, or at least comparable to, the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the view that support vector regression is a powerful tool for extracting protein sequence-structure relationships and for estimating protein structural profiles from amino acid sequences.
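
As a rough illustration of the quantities named in this abstract, the following Python sketch computes a simplified RWCO profile from residue coordinates, together with the two evaluation metrics it reports (Pearson CC and RMSE). The hard 8 Å distance cutoff, the unnormalized sum and all function names are assumptions made for illustration; the published RWCO definition may use a different contact function and normalization, and the SVR predictor itself is not reproduced here.

```python
import math


def residue_wise_contact_order(coords, cutoff=8.0):
    """Simplified RWCO: for each residue i, sum |i - j| over contacting residues j.

    Assumption: residues i and j are "in contact" when their representative
    atoms lie within `cutoff` angstroms. No normalization is applied.
    """
    n = len(coords)
    rwco = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= cutoff:
                rwco[i] += abs(i - j)
    return rwco


def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def rmse(xs, ys):
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Toy example: five residues on a straight line, 3.8 angstroms apart.
toy_coords = [(3.8 * k, 0.0, 0.0) for k in range(5)]
toy_rwco = residue_wise_contact_order(toy_coords)
```

In this toy chain each residue contacts only neighbours up to two positions away, so the central residue gets the largest value. In a folded protein, long-range contacts would dominate the sum.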

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this study, we propose a novel method to predict the solvent accessible surface areas of transmembrane residues. For both transmembrane alpha-helix and beta-barrel residues, the correlation coefficients between the predicted and observed accessible surface areas are around 0.65. On the basis of predicted accessible surface areas, residues exposed to the lipid environment or buried inside a protein can be identified by using certain cutoff thresholds. We have extensively examined our approach based on different definitions of accessible surface areas and a variety of sets of control parameters. Given that experimentally determining the structures of membrane proteins is very difficult and membrane proteins are actually abundant in nature, our approach is useful for theoretically modeling membrane protein tertiary structures, particularly for modeling the assembly of transmembrane domains. This approach can be used to annotate the membrane proteins in proteomes to provide extra structural and functional information.
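
The cutoff step mentioned in this abstract can be sketched as follows. The abstract does not state which thresholds were used, so the 0.25 relative-accessibility cutoff and the function name are purely hypothetical.

```python
def classify_exposure(predicted_rsa, threshold=0.25):
    """Label residues from predicted relative accessible surface areas.

    Assumption: values are relative accessibilities in [0, 1]; residues at or
    above `threshold` are called "exposed", the rest "buried".
    """
    return ["exposed" if value >= threshold else "buried" for value in predicted_rsa]


# Hypothetical predicted values for three transmembrane residues.
labels = classify_exposure([0.05, 0.25, 0.60])
```

Varying `threshold` trades off how many residues are assigned to the lipid-facing versus the buried class, which is presumably why the authors examined several parameter settings.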

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: Designing novel proteins with site-directed recombination has enormous prospects. By locating effective recombination sites for swapping sequence parts, the probability that hybrid sequences have the desired properties is increased dramatically. The prohibitive requirements for applying current tools led us to investigate machine learning to assist in finding useful recombination sites from amino acid sequence alone. Results: We present STAR, the Site Targeted Amino acid Recombination predictor, which produces a score indicating the structural disruption caused by recombination for each position in an amino acid sequence. Example predictions, contrasted with those of alternative tools, illustrate STAR's utility in determining useful recombination sites. Overall, the correlation coefficient between the output of the experimentally validated protein design algorithm SCHEMA and the prediction of STAR is very high (0.89). Conclusion: STAR allows the user to explore useful recombination sites in amino acid sequences with unknown structure and unknown evolutionary origin. The predictor service is available from http://pprowler.itee.uq.edu.au/star.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The cross-recognition of peptides by cytotoxic T lymphocytes is a key element in immunology and in particular in peptide-based immunotherapy. Here we develop three-dimensional (3D) quantitative structure-activity relationships (QSARs) to predict cross-recognition by Melan-A-specific cytotoxic T lymphocytes of peptides bound to HLA A*0201 (hereafter referred to as HLA A2). First, we predict the structure of a set of self- and pathogen-derived peptides bound to HLA A2 using a previously developed ab initio structure prediction approach [Fagerberg et al., J. Mol. Biol., 521-46 (2006)]. Second, shape and electrostatic energy calculations are performed on a 3D grid to produce similarity matrices which are combined with a genetic neural network method [So et al., J. Med. Chem., 4347-59 (1997)] to generate 3D-QSAR models. The models are extensively validated using several different approaches. During the model generation, the leave-one-out cross-validated correlation coefficient (q²) is used as the fitness criterion and all obtained models are evaluated based on their q² values. Moreover, the best model obtained for a partitioned data set is evaluated by its correlation coefficient (r = 0.92 for the external test set). The physical relevance of all models is tested using a functional dependence analysis and the robustness of the models obtained for the entire data set is confirmed using y-randomization. Finally, the validated models are tested for their utility in the setting of rational peptide design: their ability to discriminate between peptides that only contain side chain substitutions in a single secondary anchor position is evaluated. In addition, the predicted cross-recognition of the mono-substituted peptides is confirmed experimentally in chromium-release assays. 
These results underline the utility of 3D-QSARs in peptide mimetic design and suggest that the properties of the unbound epitope are sufficient to capture most of the information to determine the cross-recognition.
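
The leave-one-out cross-validated coefficient used as the fitness criterion in this abstract is commonly computed as one minus the ratio of the predictive residual sum of squares to the total sum of squares. The sketch below follows that common definition and, as a loudly labeled assumption, substitutes a one-variable least-squares line for the genetic neural network described in the abstract. Function names and data are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (stand-in model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_q2(xs, ys):
    """q^2 = 1 - PRESS / SS_total, refitting the model with each sample held out."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Train on every sample except i, then predict the held-out sample.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Hypothetical descriptor values and activities.
descriptor = [1.0, 2.0, 3.0, 4.0]
q2_perfect = leave_one_out_q2(descriptor, [2.0, 4.0, 6.0, 8.0])
q2_noisy = leave_one_out_q2(descriptor, [2.0, 4.5, 5.5, 8.0])
```

Because every prediction is made for a sample the model has not seen, q² is never higher than it would be for a perfect fit and can become negative for models that generalize poorly.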

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Streptococcus pyogenes infections remain a health problem in several countries due to poststreptococcal sequelae. We developed a vaccine epitope (StreptInCor) composed of 55 amino acid residues of the C-terminal portion of the M protein that encompasses both T and B cell protective epitopes. The nuclear magnetic resonance (NMR) structure of the StreptInCor peptide showed that the structure was composed of two microdomains linked by an 18-residue alpha-helix. A chemical stability study of the StreptInCor folding/unfolding process using far-UV circular dichroism showed that the structure was chemically stable with respect to pH and the concentration of urea. The T cell epitope is located in the first microdomain and encompasses 11 of the 18 alpha-helix residues, whereas the B cell epitope is in the second microdomain and showed no alpha-helical structure. The prediction of StreptInCor epitope binding to different HLA class II molecules was evaluated based on an analysis of the 55 residues and the theoretical possibilities for the processed peptides to fit into the P1, P4, P6, and P9 pockets in the groove of several HLA class II molecules. We observed 7 potential sites along the amino acid sequence of StreptInCor that were capable of recognizing HLA class II molecules (DRB1*, DRB3*, DRB4*, and DRB5*). StreptInCor overlapping peptides induced cellular and humoral immune responses in individuals bearing different HLA class II molecules, and StreptInCor could therefore be considered a universal vaccine epitope.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We describe the impact of subtype differences on the seroreactivity of linear antigenic epitopes in the envelope glycoprotein of HIV-1 isolates from different geographical locations. By computer analysis, we predicted potential antigenic sites of the envelope glycoprotein (gp120 and gp41) of this virus. For this purpose, after fetching the sequences of the proteins of interest from data banks, values of hydrophilicity, flexibility, accessibility, inverted hydrophobicity, and secondary structure were considered. We identified several potential antigenic epitopes in the envelope glycoprotein of a subtype B strain of HIV-1 (IIIB). Solid-phase peptide synthesis methods of Merrifield and Fmoc chemistry were used for synthesizing peptides. These synthetic peptides corresponded mainly to the C2, V3 and CD4 binding sites of gp120 and some parts of the ectodomain of gp41. The reactivity of these peptides was tested by ELISA against different HIV-1-positive sera from different locations in India. For two of these predicted epitopes, the corresponding Indian consensus sequences (LAIERYLKQQLLGWG and DIIGDIRQAHCNISEDKWNET) (subtype C) were also synthesized and their reactivity was tested by ELISA. These peptides also distinguished HIV-1-positive sera of Indians with C subtype infections from sera from HIV-negative subjects.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A number of new and newly improved methods for predicting protein structure developed by the Jones–University College London group were used to make predictions for the CASP6 experiment. Structures were predicted with a combination of fold recognition methods (mGenTHREADER, nFOLD, and THREADER) and a substantially enhanced version of FRAGFOLD, our fragment assembly method. Attempts at automatic domain parsing were made using DomPred and DomSSEA, which are based on a secondary structure parsing algorithm and additionally for DomPred, a simple local sequence alignment scoring function. Disorder prediction was carried out using a new SVM-based version of DISOPRED. Attempts were also made at domain docking and “microdomain” folding in order to build complete chain models for some targets.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Dynamically disordered regions appear to be relatively abundant in eukaryotic proteomes. The DISOPRED server allows users to submit a protein sequence, and returns a probability estimate of each residue in the sequence being disordered. The results are sent in both plain text and graphical formats, and the server can also supply predictions of secondary structure to provide further structural information.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The results of applying a fragment-based protein tertiary structure prediction method to the prediction of 14 CASP5 target domains are described. The method is based on the assembly of supersecondary structural fragments taken from highly resolved protein structures using a simulated annealing algorithm. A number of good predictions for proteins with novel folds were produced, although not always as the first model. For both of two fold recognition targets, FRAGFOLD produced the most accurate model, despite the fact that the predictions were not based on a template structure. Although clear progress has been made in improving FRAGFOLD since CASP4, the ranking of final models still seems to be the main problem that needs to be addressed before the next CASP experiment.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: NEP1-like proteins (NLPs) are a novel family of microbial elicitors of plant necrosis. Some NLPs induce a hypersensitive-like response in dicot plants, though the basis for this response remains unclear. In addition, the spatial structure and the role of these highly conserved proteins are not known. Results: We predict a 3D structure for the beta-rich section of the NLPs based on alignments, prediction tools and molecular dynamics. We calculated a consensus sequence from 42 NLP proteins, predicted its secondary structure and obtained a high-quality alignment of this structure and conserved residues with the two Cupin superfamily motifs. The conserved sequence GHRHDWE and several common residues in NLPs, especially some conserved histidines, closely match the two cupin motifs. Besides other common residues shared by dicot Auxin-Binding Proteins (ABPs) and NLPs, an additional conserved histidine found in all dicot ABPs was also found in all NLPs at the same position. Conclusion: We propose that the necrosis-inducing protein class belongs to the Cupin superfamily. Based on the 3D structure, we propose some possible functions for the NLPs.
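
The consensus step mentioned in this abstract can be illustrated with a simple majority vote over alignment columns. This is only a sketch: the abstract does not say how the consensus of the 42 sequences was derived, and the function name, the gap handling and the toy alignment below are assumptions.

```python
from collections import Counter


def consensus_sequence(aligned_seqs, gap="-"):
    """Majority-vote consensus of pre-aligned, equal-length sequences.

    Assumption: gaps are ignored when voting; a column made only of gaps
    yields a gap. Ties go to the residue seen first in the column.
    """
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("expected a non-empty list of equal-length aligned sequences")
    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(residue for residue in column if residue != gap)
        consensus.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(consensus)


# Toy alignment (hypothetical sequences, not the 42 NLPs from the study).
toy_consensus = consensus_sequence(["GHRHDWE", "GHRHDWE", "GHKHDWE"])
```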

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Membrane proteins are a large and important class of proteins. They are responsible for several of the key functions in a living cell, e.g. transport of nutrients and ions, cell-cell signaling, and cell-cell adhesion. Despite their importance, it has not been possible to study their structure and organization in much detail because of the difficulty of obtaining 3D structures. In this thesis, theoretical studies of membrane protein sequences and structures have been carried out by analyzing existing experimental data. The data come from several sources including sequence databases, genome sequencing projects, and 3D structures. Prediction of the membrane-spanning regions by hydrophobicity analysis is a key technique used in several of the studies. A novel method for this is also presented and compared to other methods. The primary questions addressed in the thesis are: What properties are common to all membrane proteins? What is the overall architecture of a membrane protein? What properties govern the integration into the membrane? How many membrane proteins are there and how are they distributed in different organisms? Several of the findings have now been backed up by experiments. An analysis of the large family of G-protein coupled receptors pinpoints differences in length and amino acid composition of loops between proteins with and without a signal peptide and also differences between extra- and intracellular loops. Known 3D structures of membrane proteins have been studied in terms of hydrophobicity, distribution of secondary structure and amino acid types, position-specific residue variability, and differences between loops and membrane-spanning regions. An analysis of several fully and partially sequenced genomes from eukaryotes, prokaryotes, and archaea has been carried out. 
Several differences in the membrane protein content between organisms were found, the most important being the total number of membrane proteins and the distribution of membrane proteins with a given number of transmembrane segments. Of the properties that were found to be similar in all organisms, the most obvious is the bias in the distribution of positive charges between the extra- and intracellular loops. Finally, an analysis of homologues to membrane proteins with known topology uncovered two related, multi-spanning proteins with opposite predicted orientations. The predicted topologies were verified experimentally, providing a first example of "divergent topology evolution".
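
Hydrophobicity analysis of the kind mentioned in this abstract is often done with a sliding-window average over a hydropathy scale. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a 1.6 threshold, which are conventional choices and not necessarily those of the thesis. The novel method presented there is not reproduced, and the function names are illustrative.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of each full window; entry k is centered on residue k + window // 2.

    Non-standard residue letters raise KeyError in this sketch.
    """
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def candidate_membrane_segments(sequence, window=19, threshold=1.6):
    """Return (first, last) 0-based window-center positions of runs above threshold."""
    half = window // 2
    profile = hydropathy_profile(sequence, window)
    segments, run_start = [], None
    for k, value in enumerate(profile):
        if value >= threshold and run_start is None:
            run_start = k
        elif value < threshold and run_start is not None:
            segments.append((run_start + half, k - 1 + half))
            run_start = None
    if run_start is not None:
        segments.append((run_start + half, len(profile) - 1 + half))
    return segments


# Toy sequence: a 25-residue leucine stretch flanked by aspartates.
toy_segments = candidate_membrane_segments("D" * 20 + "L" * 25 + "D" * 20)
```

The reported positions are window centers, so the hydrophobic stretch itself extends somewhat beyond them. Real predictors refine the segment boundaries and add topology rules such as the positive-charge bias described in the abstract.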

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The vast majority of known proteins have not yet been experimentally characterized and little is known about their function. The design and implementation of computational tools can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history and their association with other proteins. Knowledge of the three-dimensional (3D) structure of a protein can lead to a deep understanding of its mode of action and interaction, but currently the structures of <1% of sequences have been experimentally solved. For this reason, it has become urgent to develop new methods that are able to computationally extract relevant information from protein sequence and structure. The starting point of my work has been the study of the properties of contacts between protein residues, since they constrain protein folding and characterize different protein structures. Prediction of residue contacts in proteins is an interesting problem whose solution may be useful in protein folding recognition and de novo design. The prediction of these contacts requires the study of the protein inter-residue distances related to the specific type of amino acid pair that are encoded in the so-called contact map. An interesting new way of analyzing those structures emerged when network studies were introduced, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. In order to highlight constraints for the prediction of protein contact maps and for applications in the field of protein structure prediction and/or reconstruction from experimentally determined contact maps, I studied to what extent the characteristic path length and clustering coefficient of the protein contact network reveal characteristic features of protein contact maps. 
Provided that residue contacts are known for a protein sequence, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted motifs of secondary structure. In the second part of my work I focused on a particular protein structural motif, the coiled-coil, known to mediate a variety of fundamental biological interactions. Coiled-coils are found in a variety of structural forms and in a wide range of proteins including, for example, small units such as leucine zippers that drive the dimerization of many transcription factors or more complex structures such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil structural motif is estimated to account for 5-10% of the protein sequences in the various genomes. Given their biological importance, in my work I introduced a Hidden Markov Model (HMM) that exploits the evolutionary information derived from multiple sequence alignments, to predict coiled-coil regions and to discriminate coiled-coil sequences. The results indicate that the new HMM outperforms all the existing programs and can be adopted for the coiled-coil prediction and for large-scale genome annotation. Genome annotation is a key issue in modern computational biology, being the starting point towards the understanding of the complex processes involved in biological networks. The rapid growth in the number of protein sequences and structures available poses new fundamental problems that still deserve an interpretation. Nevertheless, these data are at the basis of the design of new strategies for tackling problems such as the prediction of protein structure and function. Experimental determination of the functions of all these proteins would be a hugely time-consuming and costly task and, in most instances, has not been carried out. As an example, currently, approximately only 20% of annotated proteins in the Homo sapiens genome have been experimentally characterized. 
A commonly adopted procedure for annotating protein sequences relies on "inheritance through homology", based on the notion that similar sequences share similar functions and structures. This procedure consists in the assignment of sequences to a specific group of functionally related sequences which have been grouped through clustering techniques. The clustering procedure is based on suitable similarity rules, since predicting protein structure and function from sequence largely depends on the value of sequence identity. However, additional levels of complexity are due to multi-domain proteins, to proteins that share common domains but do not necessarily share the same function, and to the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validated a system that contributes to sequence annotation by taking advantage of a validated transfer-through-inheritance procedure for molecular functions and structural templates. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints on sequence identity and coverage of the alignment. The adopted measure explicitly addresses the problem of multi-domain protein annotation and allows a fine-grained division of the whole set of proteomes used, which ensures cluster homogeneity in terms of sequence length. A high level of coverage of structure templates on the length of protein sequences within clusters ensures that multi-domain proteins, when present, can be templates for sequences of similar length. This annotation procedure includes the possibility of reliably transferring statistically validated functions and structures to sequences, considering the information available in the present databases of molecular functions and structures.
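
The two network measures named in the first part of this abstract, characteristic path length and clustering coefficient, can be computed on a residue contact graph as sketched below. The 8 Å cutoff, the averaging over reachable pairs only and all function names are assumptions for illustration; the thesis may define contacts and averages differently.

```python
import math
from collections import deque


def contact_graph(coords, cutoff=8.0):
    """Adjacency sets linking residues whose coordinates lie within `cutoff` angstroms."""
    adjacency = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def clustering_coefficient(adjacency):
    """Mean over nodes of the fraction of neighbor pairs that are themselves linked."""
    total = 0.0
    for neighbors in adjacency.values():
        k = len(neighbors)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute zero
        links = sum(1 for u in neighbors for v in neighbors if u < v and v in adjacency[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)


def characteristic_path_length(adjacency):
    """Mean shortest-path length over all reachable node pairs (breadth-first search)."""
    total, pairs = 0, 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs if pairs else float("inf")


# Toy graphs: a fully connected triangle and a three-node path.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
```

Small-world networks combine a high clustering coefficient with a short characteristic path length, which is the behavior the abstract attributes to protein contact networks.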

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In many application domains data can be naturally represented as graphs. When the application of analytical solutions to a given problem is infeasible, machine learning techniques can be a viable way to solve the problem. Classical machine learning techniques are defined for data represented in a vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results in terms of both computational complexity and predictive performance. Kernel methods avoid an explicit mapping to a vectorial form by relying on kernel functions, which informally are functions that calculate a similarity measure between two entities. However, the definition of good kernels for graphs is a challenging problem because of the difficulty of finding a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results in terms of both computation and classification on real-world datasets. The second contribution consists in making the application of learning algorithms to streams of graphs feasible. Moreover, we defined a principled way to manage memory. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods that consider the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods to this domain, obtaining state-of-the-art results.
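
To make the informal notion of a kernel function concrete, the sketch below computes a very simple similarity between two labeled graphs by comparing how often each vertex label occurs. This is explicitly not one of the DAG-based kernels proposed in the thesis; it is a deliberately minimal stand-in, and the function name and example labels are hypothetical.

```python
from collections import Counter


def label_histogram_kernel(vertex_labels_a, vertex_labels_b):
    """Similarity of two graphs as the dot product of their vertex-label counts.

    Assumption: each graph is reduced to the multiset of its vertex labels,
    so edge structure is ignored entirely in this toy kernel.
    """
    counts_a, counts_b = Counter(vertex_labels_a), Counter(vertex_labels_b)
    return sum(counts_a[label] * counts_b[label] for label in counts_a.keys() & counts_b.keys())


# Hypothetical vertex labels, e.g. nucleotides in two small RNA structure graphs.
similarity = label_histogram_kernel(["A", "A", "U"], ["A", "G", "U"])
```

A kernel this coarse is cheap but not very expressive, which illustrates the tradeoff between computational complexity and expressiveness that the abstract describes.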