976 results for PROTEIN SEQUENCES


Relevance: 60.00%

Abstract:

BACKGROUND: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. RESULTS: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. CONCLUSION: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
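The coding-nucleotide accuracy figures above can be illustrated with a minimal sketch. It assumes the convention common in gene-prediction evaluation, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP); the abstract does not state the formulas used, and the exon coordinates and function names below are hypothetical.

```python
def positions(exons):
    """Expand inclusive (start, end) exon intervals into a set of positions."""
    return {p for start, end in exons for p in range(start, end + 1)}


def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the coding-nucleotide level.

    Assumed convention (not taken from the abstract):
      sensitivity = TP / (TP + FN) = TP / annotated coding positions
      specificity = TP / (TP + FP) = TP / predicted coding positions
    """
    true_positives = len(annotated & predicted)
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical reference annotation and prediction for one locus.
annotated = positions([(100, 199), (300, 399)])
predicted = positions([(110, 199), (300, 409)])
sn, sp = nucleotide_accuracy(annotated, predicted)
```

Representing coding regions as sets of positions keeps the sketch short; interval arithmetic would be preferable for genome-scale data.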

Relevance: 60.00%

Abstract:

Dermatophytes are human and animal pathogenic fungi which cause cutaneous infections and grow exclusively in the stratum corneum, nails and hair. In a culture medium containing soy proteins as sole nitrogen source a substantial proteolytic activity was secreted by Trichophyton rubrum, Trichophyton mentagrophytes and Microsporum canis. This proteolytic activity was 55-75 % inhibited by o-phenanthroline, attesting that metalloproteases were secreted by all three species. Using a consensus probe constructed on previously characterized genes encoding metalloproteases (MEP) of the M36 fungalysin family in Aspergillus fumigatus, Aspergillus oryzae and M. canis, a five-member MEP family was isolated from genomic libraries of T. rubrum, T. mentagrophytes and M. canis. A phylogenetic analysis of genomic and protein sequences revealed a robust tree consisting of five main clades, each of them including a MEP sequence type from each dermatophyte species. Each MEP type was remarkably conserved across species (72-97 % amino acid sequence identity). The tree topology clearly indicated that the multiplication of MEP genes in dermatophytes occurred prior to species divergence. In culture medium containing soy proteins as a sole nitrogen source secreted Meps accounted for 19-36 % of total secreted protein extracts; characterization of protein bands by proteolysis and mass spectrometry revealed that the three dermatophyte species secreted two Meps (Mep3 and Mep4) encoded by orthologous genes.
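The amino acid sequence identity quoted above (72-97 %) can be computed from a pairwise alignment. The following is a minimal sketch, assuming identity is counted over alignment columns where neither sequence has a gap; other denominators are also in use, and the abstract does not say which one was applied. The sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments: 7 ungapped columns, 6 identical.
identity = percent_identity("MKT-LLVA", "MKTALIVA")
```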

Relevance: 60.00%

Abstract:

Bacterial classification is a long-standing problem for taxonomists, and species definition itself is constantly debated among specialists. The classification of strict intracellular bacteria such as members of the order Chlamydiales mainly relies on DNA- or protein-based phylogenetic reconstructions because these organisms exhibit few phenotypic differences and are difficult to culture. The availability of full genome sequences allows the comparison of the performance of conserved protein sequences to reconstruct Chlamydiales phylogeny. This approach permits the identification of markers that maximize the phylogenetic signal and the robustness of the inferred tree. In this study, a set of 424 core proteins was identified and concatenated to reconstruct a reference species tree. Although individual protein trees present variable topologies, we detected only a few cases of incongruence with the reference species tree, which were due to horizontal gene transfers. Detailed analysis of the phylogenetic information of individual protein sequences (i) showed that phylogenies based on single randomly chosen core proteins are not reliable and (ii) led to the identification of twenty taxonomically highly reliable proteins, allowing the reconstruction of a robust tree close to the reference species tree. We recommend using these protein sequences to precisely classify newly discovered isolates at the family, genus and species levels.
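The concatenation of core proteins described above can be sketched as follows. This is an illustrative outline only, not the authors' pipeline: it assumes each protein has already been aligned separately, and the function name, taxon labels and gap-filling of absent taxa are assumptions introduced here.

```python
def concatenate_alignments(alignments, taxa, missing="-"):
    """Join per-protein alignments into one concatenated alignment per taxon.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    taxa: ordered list of taxon names to include in the result.
    A taxon absent from an alignment is padded with gap characters.
    """
    parts = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must share a length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Two hypothetical protein alignments for two hypothetical taxa.
supermatrix = concatenate_alignments(
    [{"sp1": "MK-", "sp2": "MKA"}, {"sp1": "LV"}],
    taxa=["sp1", "sp2"],
)
```

For a strict core-protein set every taxon is present in every alignment, so the padding branch would not be exercised; it is included only to keep the sketch robust.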

Relevance: 60.00%

Abstract:

Modeling methods to derive 3D-structure of proteins have been recently developed. Protein homology-modeling, also known as comparative protein modeling, is nowadays the most accurate protein modeling method. This technique can produce useful models for about an order of magnitude more protein sequences than there have been structures determined by experiment in the same amount of time. All current protein homology-modeling methods consist of four sequential steps: fold assignment and template selection, template-target alignment, model building, and model evaluation. In this paper we discuss in some detail the protein-homology paradigm, its predictive power and its limitations.

Relevance: 60.00%

Abstract:

Weeds can act as important reservoirs for viruses. Solanum americanum (black nightshade) is a common weed in Brazil, and samples showing mosaic symptoms were collected from sweet pepper crops to verify the presence of viruses. One sample showed a mixed infection with Cucumber mosaic virus (CMV) and Potato virus Y (PVY), and one sample showed a single infection with PVY. Both virus species were transmitted by plant extract and caused mosaic in tomato (Solanum lycopersicum cv. Santa Clara), sweet pepper (Capsicum annuum cv. Magda), Nicotiana benthamiana and N. tabacum TNN, and local lesions on Chenopodium quinoa, C. murale and C. amaranticolor. The coat protein sequences of CMV and PVY found in S. americanum are phylogenetically more closely related to isolates from tomato. We conclude that S. americanum can act as a reservoir for different viruses during and between sweet pepper crop seasons.

Relevance: 60.00%

Abstract:

The target of any immunization is to activate and expand lymphocyte clones with the desired recognition specificity and the necessary effector functions. In gene, recombinant and peptide vaccines, the immunogen is a single protein or a small assembly of epitopes from antigenic proteins. Since most immune responses against protein and peptide antigens are T-cell dependent, the molecular target of such vaccines is to generate at least 50-100 complexes between the MHC molecule and the antigenic peptide per antigen-presenting cell, sensitizing a T cell population of appropriate clonal size and effector characteristics. Thus, the immunobiology of antigen recognition by T cells must be taken into account when designing new-generation peptide- or gene-based vaccines. Since T cell recognition is MHC-restricted, and given the wide polymorphism of the different MHC molecules, distinct epitopes may be recognized by different individuals in the population. Therefore, whether immunization will be effective in inducing a protective immune response covering the entire target population becomes an important question. Many pathogens have evolved molecular mechanisms to escape recognition by the immune system by variation of antigenic protein sequences. In this short review, we discuss several concepts related to the selection of amino acid sequences to be included in DNA and peptide vaccines.

Relevance: 60.00%

Abstract:

Natural protein sequences are the net result of the interplay between mutation, natural selection and stochastic drift over evolutionary time. Probabilistic models of molecular evolution that account for these factors have improved substantially in recent years. In particular, models have been proposed that explicitly incorporate protein structure and interdependencies between sites, along with statistical tools to evaluate the performance of these models. However, despite significant advances in this direction, only highly simplified representations of protein structure have been used so far. In this context, the general subject of this thesis is the modeling of the three-dimensional structure of proteins, taking into account the practical limitations imposed by the use of computationally very demanding phylogenetic methods. First, a general statistical method is presented for optimizing the parameters of a statistical potential (a pseudo-energy measuring sequence-structure compatibility). The functional form of the potential is then refined by increasing the level of detail in the structural description without adding to the computational cost. Several structural elements are explored: interactions between pairs of residues, solvent accessibility, main-chain conformation and flexibility. The potentials are then included in an evolutionary model, and their performance is evaluated in terms of statistical fit to real data and contrasted with standard evolutionary models. Finally, the resulting structurally constrained model is used to better understand the relationships between gene expression level and the selection and conservation of the corresponding protein sequence.

Relevance: 60.00%

Abstract:

BACKGROUND: In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. RESULTS: We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. CONCLUSION: This study clearly demonstrates the feasibility of carrying out on-demand, high-quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.

Relevance: 60.00%

Abstract:

Protein sequences from characterized type III secretion (TTS) systems were used as probes in silico to identify several TTS gene homologs in the genome sequence of Brucella suis biovar 1 strain 1330. Four of the genes, named flhB, fliP, fliR, and fliF on the basis of greatest homologies to known flagellar apparatus proteins, were targeted in PCR and hybridization assays to determine their distribution among other Brucella nomen species and biovars. The results indicated that flhB, fliP, fliR and fliF are present in Brucella melitensis, Brucella ovis, and Brucella suis biovars 1, 2 and 3. Similar homologs have been reported previously in Brucella abortus. Using RT-PCR assays, we were unable to detect any expression of these genes. It is not yet known whether the genes are the cryptic remnants of a flagellar system or are actively involved in a process contributing to pathogenicity or previously undetected motility, but they are distributed widely in Brucella and merit further study to determine their role.

Relevance: 60.00%

Abstract:

Musca domestica larvae display, in anterior and middle midgut contents, a proteolytic activity with a pH optimum of 3.0-3.5 and kinetic properties like those of cathepsin D. Three cDNAs coding for preprocathepsin D-like proteinases (ppCAD 1, ppCAD 2, ppCAD 3) were cloned from an M. domestica midgut cDNA library. The coded protein sequences included the signal peptide, propeptide and mature enzyme that has all conserved catalytic and substrate binding residues found in bovine lysosomal cathepsin D. Nevertheless, ppCAD 2 and ppCAD 3 lack the characteristic proline loop and glycosylation sites. A comparison among the sequences of cathepsin D-like enzymes from some vertebrates and those found in M. domestica and in the genomes of Aedes aegypti, Drosophila melanogaster, Tribolium castaneum, and Bombyx mori showed that only flies have enzymes lacking the proline loop (as defined by the motif: DxPxPx(G/A)P), thus resembling vertebrate pepsin. ppCAD 3 should correspond to the digestive cathepsin D-like proteinase (CAD) found in enzyme assays because: (1) it seems to be the most expressed CAD, based on the frequency of ESTs found; (2) the mRNA for CAD 3 is expressed only in the anterior and proximal middle midgut; (3) recombinant procathepsin D-like proteinase (pCAD 3), after auto-activation, has a pH optimum of 2.5-3.0 that is close to the luminal pH of M. domestica midgut; (4) immunoblots of proteins from different tissues revealed with anti-pCAD 3 serum were positive only in samples of anterior and middle midgut tissue and contents; (5) CAD 3 is localized with immunogold inside secretory vesicles and around microvilli in anterior and middle midgut cells. The data support the view that on adapting to deal with a bacteria-rich food in an acid midgut region, M. domestica digestive CAD resulted from the same archetypical gene as the intracellular cathepsin D, paralleling what happened with vertebrates. 
The lack of the proline loop may be somehow associated with the extracellular role of both pepsin and digestive CAD 3. (C) 2009 Elsevier Ltd. All rights reserved.
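The proline-loop motif quoted above, DxPxPx(G/A)P, can be searched for with a regular expression. The sketch below assumes that "x" stands for any single residue, as is conventional in motif notation; the test sequences are invented for illustration and are not real cathepsin D sequences.

```python
import re

# DxPxPx(G/A)P: "x" is assumed to mean any one residue, (G/A) means G or A.
PROLINE_LOOP = re.compile(r"D.P.P.[GA]P")


def find_proline_loops(sequence):
    """Return (start_index, matched_text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in PROLINE_LOOP.finditer(sequence)]


# Hypothetical sequences: one containing the motif, one lacking it.
with_loop = find_proline_loops("MKLDAPSPTGPWW")
without_loop = find_proline_loops("MKLDASSPTGPWW")
```

An empty result for a sequence would correspond to the "lacking the proline loop" case described in the abstract, under the stated reading of the motif.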

Relevance: 60.00%

Abstract:

Nowadays, classifying proteins into structural classes, which concerns the inference of patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason for this is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to obtain experimentally in the laboratory. Thus, this problem has drawn the attention of many researchers in Bioinformatics. Considering the great difference between the number of protein sequences already known and the number of three-dimensional structures determined experimentally, the demand for automated techniques for structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential to deal with this problem. In this work, ML techniques are used in the recognition of protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods were chosen because they represent different learning paradigms and have been widely used in the Bioinformatics literature. To improve the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multi-classification systems are used. Moreover, since the protein database used in this work presents the problem of imbalanced classes, artificial class-balancing techniques (Random Undersampling, Tomek Links, CNN, NCL and OSS) are used to minimize this problem. To evaluate the ML methods, a cross-validation procedure is applied, in which the accuracy of the classifiers is measured using the mean classification error rate on independent test sets. These means are compared pairwise by hypothesis tests to evaluate whether there is a statistically significant difference between them. 
With respect to the results obtained with the individual classifiers, Support Vector Machine presented the best accuracy. The multi-classification systems (homogeneous and heterogeneous) showed, in general, superior or similar performance compared with that achieved by the individual classifiers, especially Boosting with Decision Tree and StackingC with Linear Regression as meta-classifier. The Voting method, despite its simplicity, proved adequate for solving the problem presented in this work. The class-balancing techniques, on the other hand, did not produce a significant improvement in the global classification error. Nevertheless, the use of such techniques did improve the classification error for the minority class. In this context, the NCL technique proved to be more appropriate

Relevance: 60.00%

Abstract:

The isolate AF199 of Lettuce mosaic virus (LMV, genus Potyvirus) causes local lesions followed by systemic wilting and plant death in the lettuce cultivars Ithaca and Vanguard 75. Analysis of the phenotype of virus chimeras revealed that a region within the P1 protein coding region (nucleotides 112-386 in the viral genome) and/or another one within the CI protein coding region (nucleotides 5496-5855) are sufficient together to cause the lethal wilting in Ithaca, but not in Vanguard 75. This indicates that the determinants of this particular symptom are different in these two lettuce cultivars. The wilting phenotype was not directly correlated with differences in the deduced amino acid sequence of these two regions. Furthermore, transient expression of the LMV-AF199 proteins, separately or in combination, did not induce local necrosis or any other visible reaction in the plants. Together, these results suggest that the systemic wilting reaction might be due to RNA rather than protein sequences. (c) 2004 Elsevier B.V. All rights reserved.

Relevance: 60.00%

Abstract:

Phylogeny is one of the main activities of modern taxonomists and a way to reconstruct the history of life through comparative analysis of the sequences stored in genomes, with the aim of explaining their origin or evolution. Among the sequences with a high level of conservation are the repair genes, because they are important for maintaining genetic stability. Hence, variations in repair genes, such as the genes of nucleotide excision repair (NER), may indicate a possible gene transfer between species. This study aimed to examine the evolutionary history of the components of NER. For this, sequences of UVRA, UVRB, UVRC and XPB were obtained from GenBank by BLASTP, using 10-15 as the cutoff, to create a database. Phylogenetic studies were done using algorithms in the PAUP and BAYES programs and the PHYLIP package. Phylogenetic trees were built with protein sequences and with 16S ribosomal RNA sequences for comparative analysis by the parsimony, likelihood and Bayesian methods. The XPB tree shows that archaeal XPB helicases are similar to eukaryotic helicases. From these data, we infer that the eukaryotic nucleotide excision repair system appeared in Archaea. In the UVRA, UVRB and UVRC trees, a monophyletic group was found, formed by three species of the class Epsilonproteobacteria, three species of the class Mollicutes and archaebacteria of the classes Methanobacteria and Methanococci. This information is supported by a tree obtained with the concatenated UVRA, UVRB and UVRC proteins. Thus, although there are arguments in the literature defending the horizontal transfer of the uvrABC system from bacteria to archaebacteria, the analysis made in this study suggests that a vertical transfer occurred, from archaebacteria, of both sets of NER genes: uvrABC and XPs. 
According to parsimony, this is the best explanation, given the occurrence of monophyletic groups, the divergence times of the classes and the number of archaebacterial species with the uvrABC system

Relevance: 60.00%

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance: 60.00%

Abstract:

Within about 30 years the Brazilian buffalo (Bubalus bubalis) herd will reach approximately 50 million head as a result of the great adaptive capacity of these animals to tropical climates, together with the good productive and reproductive potential which make these animals an important animal protein source for poor and developing countries. The myostatin gene (GDF8) is important in the physiology of stock animals because its product produces a direct effect on muscle development and consequently also on meat production. The myostatin sequence is known in several mammalian species and shows a high degree of amino acid sequence conservation, although the presence of non-silent and silent changes in the coding sequences and several alterations in the introns and untranslated regions have been identified. The objective of our work was to characterize the myostatin coding regions of B. bubalis (Murrah breed) and to compare them with the Bos taurus regions looking for variations in nucleotide and protein sequences. In this way, we were able to identify 12 variations at DNA level and five alterations on the presumed myostatin protein sequence as compared to non double-muscled bovine sequences.