935 results for Complete Genome Sequence


Relevance: 30.00%

Abstract:

Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors.

Relevance: 30.00%

Abstract:

Functional RNA structures play an important role both in noncoding RNA transcripts and as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic–stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to ∼2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3′-UTRs. While we estimate a significant false discovery rate of ∼50%–70%, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that also show signs of selection pressure on the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).

Relevance: 30.00%

Abstract:

The goals of the human genome project did not include sequencing of the heterochromatic regions. We describe here an initial sequence of 1.1 Mb of the short arm of human chromosome 21 (HSA21p), estimated to be 10% of 21p. This region contains extensive euchromatic-like sequence and includes on average one transcript every 100 kb. These transcripts show multiple inter- and intrachromosomal copies, and extensive copy number and sequence variability. The sequencing of the "heterochromatic" regions of the human genome is likely to reveal many additional functional elements and provide important evolutionary information.

Relevance: 30.00%

Abstract:

The “one-gene, one-protein” rule, coined by Beadle and Tatum, has been fundamental to molecular biology. The rule implies that the genetic complexity of an organism depends essentially on its gene number. The discovery, however, that alternative gene splicing and transcription are widespread phenomena dramatically altered our understanding of the genetic complexity of higher eukaryotic organisms; in these, a limited number of genes may potentially encode a much larger number of proteins. Here we investigate yet another phenomenon that may contribute to generate additional protein diversity. Indeed, by relying on both computational and experimental analysis, we estimate that at least 4%–5% of the tandem gene pairs in the human genome can be eventually transcribed into a single RNA sequence encoding a putative chimeric protein. While the functional significance of most of these chimeric transcripts remains to be determined, we provide strong evidence that this phenomenon does not correspond to mere technical artifacts and that it is a common mechanism with the potential of generating hundreds of additional proteins in the human genome.

Relevance: 30.00%

Abstract:

Understanding the molecular mechanisms responsible for the regulation of the transcriptome present in eukaryotic cells is one of the most challenging tasks in the postgenomic era. In this regard, alternative splicing (AS) is a key phenomenon contributing to the production of different mature transcripts from the same primary RNA sequence. As a plethora of different transcript forms is available in databases, a first step to uncover the biology that drives AS is to identify the different types of reflected splicing variation. In this work, we present a general definition of the AS event along with a notation system that involves the relative positions of the splice sites. This nomenclature univocally and dynamically assigns a specific "AS code" to every possible pattern of splicing variation. On the basis of this definition and the corresponding codes, we have developed a computational tool (AStalavista) that automatically characterizes the complete landscape of AS events in a given transcript annotation of a genome, thus providing a platform to investigate the transcriptome diversity across genes, chromosomes, and species. Our analysis reveals that a substantial part of the observed splicing variations (in human, more than a quarter) is ignored in common classification pipelines. We have used AStalavista to investigate and to compare the AS landscape of different reference annotation sets in human and in other metazoan species and found that proportions of AS events change substantially depending on the annotation protocol, species-specific attributes, and coding constraints acting on the transcripts. The AStalavista system therefore provides a general framework to conduct specific studies investigating the occurrence, impact, and regulation of AS.

Relevance: 30.00%

Abstract:

A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a "coding statistic" is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data, and the method is applied to the yeast chromosome III sequence and to C. elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
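
The decomposition step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the mixture is fitted or which coding statistic is used, so the code below assumes expectation-maximisation (EM) on a generic per-window score and assumes that the component with the higher mean is the coding one. The function names `fit_two_normals` and `estimate_coding_fraction` are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_normals(values, iterations=200):
    """Fit a two-component 1-D normal mixture by EM (needs >= 2 values).

    Returns (weights, means, sds), each a two-item list ordered so that
    index 1 is the component with the higher mean.
    """
    n = len(values)
    ordered = sorted(values)
    half = n // 2
    # Crude initialisation: lower and upper halves of the sorted data.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    grand_mean = sum(values) / n
    spread = math.sqrt(sum((v - grand_mean) ** 2 for v in values) / n) or 1.0
    sds = [spread, spread]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: probability that each window belongs to component 1.
        resp = []
        for v in values:
            p0 = weights[0] * normal_pdf(v, means[0], sds[0])
            p1 = weights[1] * normal_pdf(v, means[1], sds[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0.0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations.
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component collapsed; keep the last estimate
        means = [
            sum((1.0 - r) * v for r, v in zip(resp, values)) / n0,
            sum(r * v for r, v in zip(resp, values)) / n1,
        ]
        var0 = sum((1.0 - r) * (v - means[0]) ** 2 for r, v in zip(resp, values)) / n0
        var1 = sum(r * (v - means[1]) ** 2 for r, v in zip(resp, values)) / n1
        sds = [max(math.sqrt(var0), 1e-6), max(math.sqrt(var1), 1e-6)]
        weights = [n0 / n, n1 / n]

    if means[0] > means[1]:
        weights.reverse()
        means.reverse()
        sds.reverse()
    return weights, means, sds


def estimate_coding_fraction(values):
    """Mixture weight of the higher-mean component (assumed to be coding)."""
    weights, _, _ = fit_two_normals(values)
    return weights[1]


if __name__ == "__main__":
    # Simulated scores only: 70% "noncoding" windows and 30% "coding" windows.
    rng = random.Random(0)
    scores = [rng.gauss(0.0, 1.0) for _ in range(700)]
    scores += [rng.gauss(4.0, 1.0) for _ in range(300)]
    print(estimate_coding_fraction(scores))
```

The returned weight is the estimated fraction of windows in the higher-scoring component. Relating that figure to the fraction of base pairs in codons depends on the window design and the statistic chosen, neither of which is specified here.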

Relevance: 30.00%

Abstract:

The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.

Relevance: 30.00%

Abstract:

Selenocysteine (Sec) is co-translationally inserted into selenoproteins in response to codon UGA with the help of the selenocysteine insertion sequence (SECIS) element. The number of selenoproteins in animals varies, with humans having 25 and mice having 24 selenoproteins. To date, however, only one selenoprotein, thioredoxin reductase, has been detected in Caenorhabditis elegans, and this enzyme contains only one Sec. Here, we characterize the selenoproteomes of C. elegans and Caenorhabditis briggsae with three independent algorithms, one searching for pairs of homologous nematode SECIS elements, another searching for Cys- or Sec-containing homologs of potential nematode selenoprotein genes and the third identifying Sec-containing homologs of annotated nematode proteins. These methods suggest that thioredoxin reductase is the only Sec-containing protein in the C. elegans and C. briggsae genomes. In contrast, we identified additional selenoproteins in other nematodes. Assuming that Sec insertion mechanisms are conserved between nematodes and other eukaryotes, the data suggest that nematode selenoproteomes were reduced during evolution, and that in an extreme reduction case Sec insertion systems probably decode only a single UGA codon in C. elegans and C. briggsae genomes. In addition, all detected genes had a rare form of SECIS element containing a guanosine in place of a conserved adenosine present in most other SECIS structures, suggesting that in organisms with small selenoproteomes SECIS elements may change rapidly.

Relevance: 30.00%

Abstract:

Background: A number of studies have used protein interaction data alone for protein function prediction. Here, we introduce a computational approach for annotation of enzymes, based on the observation that similar protein sequences are more likely to perform the same function if they share similar interacting partners. Results: The method has been tested against the PSI-BLAST program using a set of 3,890 protein sequences for which interaction data were available. For protein sequences that align with at least 40% sequence identity to a known enzyme, the specificity of our method in predicting the first three EC digits increased from 80% to 90% at 80% coverage when compared to PSI-BLAST. Conclusion: Our method can also be used for proteins for which homologous sequences with known interacting partners can be detected. Thus, our method could increase by 10% the specificity of genome-wide enzyme predictions based on sequence matching by PSI-BLAST alone.

Relevance: 30.00%

Abstract:

The distribution of transposable elements (TEs) in a genome reflects a balance between insertion rate and selection against new insertions. Understanding the distribution of TEs therefore provides insights into the forces shaping the organization of genomes. Past research has shown that TEs tend to accumulate in genomic regions with low gene density and low recombination rate. However, little is known about the factors modulating insertion rates across the genome and their evolutionary significance. One candidate factor is gene expression, which has been suggested to increase local insertion rate by rendering DNA more accessible. We test this hypothesis by comparing the TE density around germline- and soma-expressed genes in the euchromatin of Drosophila melanogaster. Because only insertions that occur in the germline are transmitted to the next generation, we predicted a higher density of TEs around germline-expressed genes than soma-expressed genes. We show that the rate of TE insertions is greater near germline- than soma-expressed genes. However, this effect is partly offset by stronger selection for genome compactness (against excess noncoding DNA) on germline-expressed genes. We also demonstrate that the local genome organization in clusters of coexpressed genes plays a fundamental role in the genomic distribution of TEs. Our analysis shows that, in addition to recombination rate, the distribution of TEs is shaped by the interaction of gene expression and genome organization. The important role of selection for compactness sheds new light on the role of TEs in genome evolution. Instead of making genomes grow passively, TEs are controlled by the forces shaping genome compactness, most likely linked to the efficiency of gene expression or its complexity and possibly their interaction with mechanisms of TE silencing.

Relevance: 30.00%

Abstract:

We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.

Relevance: 30.00%

Abstract:

Microtubule plus-end-tracking proteins (+TIPs) specifically localize to the growing plus-ends of microtubules to regulate microtubule dynamics and functions. A large group of +TIPs contain a short linear motif, SXIP, which is essential for them to bind to end-binding proteins (EBs) and target microtubule ends. The SXIP sequence site thus acts as a widespread microtubule tip localization signal (MtLS). Here we have analyzed the sequence-function relationship of a canonical MtLS. Using synthetic peptide arrays on membrane supports, we identified the residue preferences at each amino acid position of the SXIP motif and its surrounding sequence with respect to EB binding. We further developed an assay based on fluorescence polarization to assess the mechanism of the EB-SXIP interaction and to correlate EB binding and microtubule tip tracking of MtLS sequences from different +TIPs. Finally, we investigated the role of phosphorylation in regulating the EB-SXIP interaction. Together, our results define the sequence determinants of a canonical MtLS and provide the experimental data for bioinformatics approaches to carry out genome-wide predictions of novel +TIPs in multiple organisms.
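
As a rough illustration of the first step that a genome-wide prediction of this kind might take, the sketch below scans a protein sequence for the bare SxIP tetrapeptide (serine, any residue, isoleucine, proline). It is only an assumption about how such a scan could start: the abstract stresses that residue preferences within and around the motif affect EB binding, and none of that is modelled here. The function name `find_sxip` and the example sequence are made up for illustration.

```python
import re

# S-x-I-P: serine, any single residue, isoleucine, proline.
SXIP_PATTERN = re.compile(r"S[A-Z]IP")


def find_sxip(protein_sequence):
    """Return (1-based start position, matched tetrapeptide) for each SxIP hit."""
    sequence = protein_sequence.upper()
    return [(match.start() + 1, match.group(0))
            for match in SXIP_PATTERN.finditer(sequence)]


if __name__ == "__main__":
    # Invented sequence, not a real +TIP.
    for position, motif in find_sxip("MKSRIPAASKIPT"):
        print(position, motif)
```

A bare motif match of this kind would be expected to return many hits that do not bind EBs, so it serves as a coarse pre-filter at most. Scoring the flanking residues would require the binding data reported in the study.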

Relevance: 30.00%

Abstract:

Summary: From the beginning of the 20th century the world population has been confronted with the human immunodeficiency virus 1 (HIV-1). This virus mutates particularly fast and can thus evade and adapt to the human host. Our closest evolutionary relatives, the non-human primates, are less susceptible to HIV-1. In a broader sense, primates are differentially susceptible to various retroviruses. Species specificity may be due to genetic differences among primates. In the present study we applied evolutionary and comparative genetic techniques to characterize the evolutionary pattern of host cellular determinants of HIV-1 pathogenesis. The study of the evolution of genes coding for proteins participating in the restriction or pathogenesis of HIV-1 may help to understand the genetic basis of modern human susceptibility to infection. To perform comparative genetic analysis, we assembled a collection of primate DNA and RNA to allow generation of de novo sequences of gene orthologs. More recently, the release to the public domain of two new complete primate genomes (Bornean orangutan and common marmoset), in addition to the three previously available genomes (human, chimpanzee and rhesus monkey), helped scale up the evolutionary and comparative genome analysis. Sequence analysis used phylogenetic and statistical methods for detecting molecular adaptation. We identified different selective pressures acting on host proteins involved in HIV-1 pathogenesis. Proteins with HIV-1 restriction properties in non-human primates were under strong positive selection, in particular in regions of interaction with viral proteins. These regions carried key residues for the antiviral activity. Proteins of the innate immunity presented an evolutionary pattern of conservation (purifying selection) but with signals of relaxed constraint when compared with the average profile of purifying selection of the primate genomes.
Large-scale analysis resulted in patterns of evolutionary pressures according to molecular function, biological process and cellular distribution. The data generated by the various analyses served to guide the ancestral reconstruction of TRIM5α, a potent antiviral host factor. The resurrected TRIM5α from the common ancestor of Old World monkeys was effective against HIV-1, and the more recent resurrected hominoid variants were more effective against other retroviruses. Thus, as the result of trade-offs in the ability to restrict different retroviruses, humans might have been exposed to HIV-1 at a time when TRIM5α lacked the appropriate specific restriction activity. The application of evolutionary and comparative genetic tools should be considered for the systematic assessment of host proteins relevant in viral pathogenesis, and to guide biological and functional studies.

Relevance: 30.00%

Abstract:

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.