36 results for Protein Sequence Analysis
in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
A genomic clone (p268c) coding for the 28 kD storage protein Zc2 from maize endosperm has been isolated and sequenced.
Abstract:
The genetic diversity of three temperate fruit tree phytoplasmas, ‘Candidatus Phytoplasma prunorum’, ‘Ca. P. mali’ and ‘Ca. P. pyri’, has been established by multilocus sequence analysis. Among the four genetic loci used, the genes imp and aceF distinguished 30 and 24 genotypes, respectively, and showed the highest variability. Percentage of substitution for imp ranged from 50 to 68% according to species. Percentage of substitution varied between 9 and 12% for aceF, whereas it was between 5 and 6% for pnp and secY. In the case of ‘Ca. P. prunorum’, the three most prevalent aceF genotypes were detected in both plants and insect vectors, confirming that the prevalent isolates are propagated by insects. The four isolates known to be hypo-virulent had the same aceF sequence, indicating a possible monophyletic origin. The haplotype network reconstructed by eBURST revealed that, among the 34 haplotypes of ‘Ca. P. prunorum’, the four hypo-virulent isolates also grouped together in the same clade. Genotyping of some Spanish and Azerbaijani ‘Ca. P. pyri’ isolates showed that they shared some alleles with ‘Ca. P. prunorum’, supporting, for the first time to our knowledge, the existence of inter-species recombination between these two species.
Abstract:
Comparative analysis of gene fragments of six housekeeping loci, distributed around the two chromosomes of Vibrio cholerae, has been carried out for a collection of 29 V. cholerae O139 Bengal strains isolated from India during the first epidemic period (1992 to 1993). A toxigenic O1 ElTor strain from the seventh pandemic and an environmental non-O1/non-O139 strain were also included in this study. All loci studied were polymorphic, with a small number of polymorphic sites in the sequenced fragments. The genetic diversity determined for our O139 population is concordant with a previous multilocus enzyme electrophoresis study in which we analyzed the same V. cholerae O139 strains. In both studies we have found a higher genetic diversity than reported previously in other molecular studies. The results of the present work showed that O139 strains clustered in several lineages of the dendrogram generated from the matrix of allelic mismatches between the different genotypes, a finding which does not support the hypothesis previously reported that the O139 serogroup is a unique clone. The statistical analysis performed in the V. cholerae O139 isolates suggested a clonal population structure. Moreover, the application of the Sawyer's test and split decomposition to detect intragenic recombination in the sequenced gene fragments did not indicate the existence of recombination in our O139 population.
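The "matrix of allelic mismatches" mentioned in this abstract can be illustrated with a minimal Python sketch. The genotype names and allele numbers below are hypothetical and not taken from the study; the sketch only assumes that each genotype is an allelic profile over the same six loci and that the distance between two genotypes is the number of loci at which their alleles differ.

```python
from itertools import combinations


def allelic_mismatches(profile_a, profile_b):
    """Count the loci at which two allelic profiles carry different alleles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def mismatch_matrix(profiles):
    """Build a symmetric matrix (dict of dicts) of pairwise allelic mismatches."""
    names = sorted(profiles)
    matrix = {row: {col: 0 for col in names} for row in names}
    for a, b in combinations(names, 2):
        distance = allelic_mismatches(profiles[a], profiles[b])
        matrix[a][b] = distance
        matrix[b][a] = distance
    return matrix


# Hypothetical allelic profiles over six housekeeping loci (illustrative only).
example_profiles = {
    "G1": (1, 1, 1, 1, 1, 1),
    "G2": (1, 2, 1, 1, 3, 1),
    "G3": (2, 2, 1, 4, 3, 1),
}

if __name__ == "__main__":
    for name, row in mismatch_matrix(example_profiles).items():
        print(name, row)
```

A matrix of this kind is the usual input to a clustering step that produces a dendrogram; the clustering method used in the study is not stated in the abstract and is not reproduced here.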
Abstract:
Turbot (Scophthalmus maximus L.) is an important aquacultural resource both in Europe and Asia. However, there is little information on gene sequences available in public databases. Currently, one of the main problems affecting the culture of this flatfish is mortality due to several pathogens, especially viral diseases which are not treatable. In order to identify new genes involved in immune defense, we conducted 454-pyrosequencing of the turbot transcriptome after different immune stimulations.
Abstract:
Selenoproteins are a diverse group of proteins usually misidentified and misannotated in sequence databases. The presence of an in-frame UGA (stop) codon in the coding sequence of selenoprotein genes precludes their identification and correct annotation. The in-frame UGA codons are recoded to cotranslationally incorporate selenocysteine, a rare selenium-containing amino acid. The development of ad hoc experimental and, more recently, computational approaches has allowed the efficient identification and characterization of the selenoproteomes of a growing number of species. Today, dozens of selenoprotein families have been described and more are being discovered in recently sequenced species, but the correct genomic annotation is not available for the majority of these genes. SelenoDB is a long-term project that aims to provide, through the collaborative effort of experimental and computational researchers, automatic and manually curated annotations of selenoprotein genes, proteins and SECIS elements. Version 1.0 of the database includes an initial set of eukaryotic genomic annotations, with special emphasis on the human selenoproteome, for immediate inspection by selenium researchers or incorporation into more general databases. SelenoDB is freely available at http://www.selenodb.org.
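To make the annotation problem concrete, the following Python sketch lists in-frame UGA (TGA in DNA) codons that occur before the final codon of a coding sequence. It is an illustration only, not the SelenoDB pipeline: the function name and the example sequence are hypothetical, and a real selenoprotein prediction would also need evidence such as a SECIS element, which this sketch does not check.

```python
def in_frame_tga_positions(cds):
    """Return 0-based offsets of in-frame TGA codons located before the last codon.

    The input is read in frame 0 and may be given as DNA or RNA.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    last_codon_start = len(cds) - 3
    # Step through the sequence codon by codon, skipping the terminal codon.
    return [
        offset
        for offset in range(0, last_codon_start, 3)
        if cds[offset:offset + 3] == "TGA"
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: ATG GCC TGA GGT TAA
    print(in_frame_tga_positions("ATGGCCTGAGGTTAA"))
```

A standard gene finder would truncate such a sequence at the internal TGA, which is why these candidates are easy to misannotate.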
Abstract:
Background: Amino acid tandem repeats are found in nearly one-fifth of human proteins. Abnormal expansion of these regions is associated with several human disorders. To gain further insight into the mutational mechanisms that operate in this type of sequence, we have analyzed a large number of mutation variants derived from human expressed sequence tags (ESTs). Results: We identified 137 polymorphic variants in 115 different amino acid tandem repeats. Of these, 77 contained amino acid substitutions and 60 contained gaps (expansions or contractions of the repeat unit). The analysis showed that at least about 21% of the repeats might be polymorphic in humans. We compared the mutations found in different types of amino acid repeats and in adjacent regions. Overall, repeats showed a five-fold increase in the number of gap mutations compared to adjacent regions, reflecting the action of slippage within the repetitive structures. Gap and substitution mutations were very differently distributed between different amino acid repeat types. Among repeats containing gap variants we identified several disease and candidate disease genes. Conclusion: This is the first report at a genome-wide scale of the types of mutations occurring in the amino acid repeat component of the human proteome. We show that the mutational dynamics of different amino acid repeat types are very diverse. We provide a list of loci with highly variable repeat structures, some of which may be potentially involved in disease.
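As a rough illustration of what an amino acid tandem repeat looks like in sequence data, the Python sketch below finds runs of a single repeated residue. The abstract does not state how repeats were defined, so the minimum run length of 5 and the restriction to single-residue runs are assumptions made for this example, and the function name and test sequence are hypothetical.

```python
import re


def find_single_residue_repeats(protein, min_length=5):
    """Return (residue, start, length) for each run of one residue of at least min_length.

    Positions are 0-based. The threshold is an illustrative assumption.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # One captured residue followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    return [
        (match.group(1), match.start(), match.end() - match.start())
        for match in pattern.finditer(protein.upper())
    ]


if __name__ == "__main__":
    # Hypothetical sequence with a glutamine run and a proline run.
    toy_protein = "MK" + "Q" * 6 + "AGS" + "P" * 7 + "L"
    print(find_single_residue_repeats(toy_protein))
```

Comparing the reported run lengths between two variants of the same protein would expose a gap mutation (an expansion or contraction of the repeat), whereas a substitution would split or shorten a run.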
Abstract:
Background: Single nucleotide polymorphisms (SNPs) are the most frequent type of sequence variation between individuals, and represent a promising tool for finding genetic determinants of complex diseases and understanding the differences in drug response. In this regard, it is of particular interest to study the effect of non-synonymous SNPs in the context of biological networks such as cell signalling pathways. UniProt provides curated information about the functional and phenotypic effects of sequence variation, including SNPs, as well as on mutations of protein sequences. However, no strategy has been developed to integrate this information with biological networks, with the ultimate goal of studying the impact of the functional effect of SNPs in the structure and dynamics of biological networks. Results: First, we identified the different challenges posed by the integration of the phenotypic effect of sequence variants and mutations with biological networks. Second, we developed a strategy for the combination of data extracted from public resources, such as UniProt, NCBI dbSNP, Reactome and BioModels. We generated attribute files containing phenotypic and genotypic annotations to the nodes of biological networks, which can be imported into network visualization tools such as Cytoscape. These resources allow the mapping and visualization of mutations and natural variations of human proteins and their phenotypic effect on biological networks (e.g. signalling pathways, protein-protein interaction networks, dynamic models). Finally, an example on the use of the sequence variation data in the dynamics of a network model is presented. Conclusion: In this paper we present a general strategy for the integration of pathway and sequence variation data for visualization, analysis and modelling purposes, including the study of the functional impact of protein sequence variations on the dynamics of signalling pathways. 
This is of particular interest when the SNP or mutation is known to be associated with disease. We expect that this approach will help in the study of the functional impact of disease-associated SNPs on the behaviour of cell signalling pathways, which ultimately will lead to a better understanding of the mechanisms underlying complex diseases.
Abstract:
Report for the scientific sojourn at the University of Maryland Biotechnology Institute from February to August 2007. Myogenesis of skeletal muscles in vertebrates is controlled by extracellular signalling molecules together with intracellular transcription factors. Among the transcription factors, the members of the myogenic regulatory family play important roles in regulating skeletal muscle development and growth. To characterize the gene structure and expression of fish myogenin, we have isolated the myogenin genomic gene and cDNA from gilthead seabream (Sparus aurata) and analyzed the genomic structure, pattern of expression and the regulation of muscle-specific expression. Sequence analysis revealed that the seabream myogenin shares a similar gene structure with other fish myogenins, with three exons, two introns and the highly conserved bHLH domain. Expression studies demonstrated that myogenin is expressed in both slow and fast muscles as well as in muscle cells in primary culture. In situ hybridization showed that myogenin was specifically expressed in developing somites of seabream embryos. Promoter activity analysis demonstrated that the myogenin promoter could drive green fluorescent protein expression in muscle cells of zebrafish embryos, as well as in myofibers of adult zebrafish and juvenile seabream.
Abstract:
Background: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a ‘reference set’ of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusions: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
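The nucleotide-level sensitivity and specificity reported in this abstract can be sketched in Python as follows. The abstract does not give the formulas, so this example assumes the convention common in gene-prediction assessments: sensitivity is the fraction of annotated coding nucleotides that were predicted, and specificity is the fraction of predicted coding nucleotides that are annotated. The interval coordinates and function names are hypothetical.

```python
def coding_positions(intervals):
    """Expand inclusive (start, end) exon intervals into a set of nucleotide positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the coding nucleotide level.

    Assumed definitions: sensitivity = TP / (TP + FN),
    specificity = TP / (TP + FP), as often used for gene prediction.
    """
    reference = coding_positions(annotated)
    prediction = coding_positions(predicted)
    true_positives = len(reference & prediction)
    sensitivity = true_positives / len(reference) if reference else 0.0
    specificity = true_positives / len(prediction) if prediction else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical exon coordinates (1-based, inclusive), for illustration only.
    annotated_exons = [(1, 100), (201, 300)]
    predicted_exons = [(11, 100), (201, 300)]
    print(nucleotide_accuracy(annotated_exons, predicted_exons))
```

Expanding intervals into sets of positions is simple but memory-hungry; for regions of genome scale an interval-overlap computation would be a more practical design.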
Abstract:
Background: The RPS4 gene codes for ribosomal protein S4, a very well-conserved protein present in all kingdoms. In primates, RPS4 is encoded by two functional genes located on both sex chromosomes: the RPS4X and RPS4Y genes. In humans, RPS4Y is duplicated and the Y chromosome therefore carries a third functional paralog: RPS4Y2, which presents a testis-specific expression pattern. Results: DNA sequence analysis of the intronic and cDNA regions of RPS4Y genes from species covering the entire primate phylogeny showed that the duplication event leading to the second Y-linked copy occurred after the divergence of New World monkeys, about 35 million years ago. Maximum likelihood analyses of the synonymous and non-synonymous substitutions revealed that positive selection was acting on the RPS4Y2 gene in the human lineage, which represents the first evidence of positive selection on a ribosomal protein gene. Putative positive amino acid replacements affected the three domains of the protein: one of these changes is located in the KOW protein domain and affects the unique invariable position of this motif, and might thus have a dramatic effect on the protein function. Conclusion: Here, we shed new light on the evolutionary history of the RPS4Y gene family, especially on that of RPS4Y2. The results suggest that the RPS4Y1 gene might be maintained to compensate gene dosage between sexes, while RPS4Y2 might have acquired a new function, at least in the lineage leading to humans.
Abstract:
Background: Annotations of completely sequenced genomes reveal that nearly half of the genes identified are of unknown function, and that some belong to uncharacterized gene families. To help resolve such issues, information can be obtained from the comparative analysis of homologous genes in model organisms. Results: While characterizing genes from the retinitis pigmentosa locus RP26 at 2q31-q33, we have identified a new gene, ORMDL1, that belongs to a novel gene family comprising three genes in humans (ORMDL1, ORMDL2 and ORMDL3), and homologs in yeast, microsporidia, plants, Drosophila, urochordates and vertebrates. The human genes are expressed ubiquitously in adult and fetal tissues. The Drosophila ORMDL homolog is also expressed throughout embryonic and larval stages, particularly in ectodermally derived tissues. The ORMDL genes encode transmembrane proteins anchored in the endoplasmic reticulum (ER). Double knockout of the two Saccharomyces cerevisiae homologs leads to decreased growth rate and greater sensitivity to tunicamycin and dithiothreitol. Yeast mutants can be rescued by human ORMDL homologs. Conclusions: From protein sequence comparisons we have defined a novel gene family, not previously recognized because of the absence of a characterized functional signature. The sequence conservation of this family from yeast to vertebrates, the maintenance of duplicate copies in different lineages, the ubiquitous pattern of expression in human and Drosophila, the partial functional redundancy of the yeast homologs and phenotypic rescue by the human homologs, strongly support functional conservation. Subcellular localization and the response of yeast mutants to specific agents point to the involvement of ORMDL in protein folding in the ER.
Abstract:
Adenoviruses of primates include human (HAdV) and simian (SAdV) isolates classified into 8 species (Human Adenovirus A to G, and Simian Adenovirus A). In this study, a novel adenovirus was isolated from a colony of cynomolgus macaques (Macaca fascicularis) and subcultured in VERO cells. Its complete genome was purified and a region encompassing the hexon gene, the protease gene, the DNA binding protein (DBP) and the 100 kDa protein was amplified by PCR and sequenced by primer walking. Sequence analysis of these four genes showed that the new isolate had 80% identity to other primate adenoviruses and lacked recombination events. The study of the evolutionary relationships of this new monkey AdV based on the combined sequences of the four genes supported a close relationship to SAdV-3 and SAdV-6, lineages isolated from Rhesus monkeys. The clade formed by these three types is separated from the remaining clades and establishes a novel branch that is related to species HAdV-A, F and G. However, the genetic distance of the newly isolated monkey AdV from these species is large enough for it to belong to a new, not yet established species. The results presented here widen our knowledge of SAdV and represent an important contribution to the understanding of the evolutionary history of primate adenoviruses.
Abstract:
Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression profiles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular profiling experiments move beyond simple case-control studies, robust and flexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets. To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state of the art sample-wise enrichment methods. Further, we provide examples of its utility in differential pathway activity and survival analysis. Lastly, we show how GSVA works analogously with data from both microarray and RNA-seq experiments. GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA contributes to the current need of GSE methods for RNA-seq data. GSVA is an open source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org.