955 results for Espectral sequences
Abstract:
The goals of the human genome project did not include sequencing of the heterochromatic regions. We describe here an initial sequence of 1.1 Mb of the short arm of human chromosome 21 (HSA21p), estimated to be 10% of 21p. This region contains extensive euchromatic-like sequence and includes on average one transcript every 100 kb. These transcripts show multiple inter- and intrachromosomal copies, and extensive copy number and sequence variability. The sequencing of the "heterochromatic" regions of the human genome is likely to reveal many additional functional elements and provide important evolutionary information.
Abstract:
The construction of metagenomic libraries has permitted the study of microorganisms resistant to isolation, and the analysis of 16S rDNA sequences has been used for over two decades to examine bacterial biodiversity. Here, we show that the analysis of random sequence reads (RSRs) instead of 16S is a suitable shortcut to estimate the biodiversity of a bacterial community from metagenomic libraries. We generated 10,010 RSRs from a metagenomic library of microorganisms found in human faecal samples. We then searched them with the program BLASTN against a prokaryotic sequence database to assign a taxon to each RSR. The results were compared with those obtained by screening and analysing the clones containing 16S rDNA sequences in the whole library. We found that the biodiversity observed by RSR analysis is consistent with that obtained by 16S rDNA. We also show that RSRs are suitable for comparing the biodiversity of different metagenomic libraries. RSRs can thus provide a good estimate of the biodiversity of a metagenomic library and, as an alternative to 16S, this approach is both faster and cheaper.
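The taxon-assignment step described in this abstract can be illustrated with a minimal sketch. It assumes BLASTN was run with tabular output (`-outfmt 6`, where column 1 is the read, column 2 the subject, and column 12 the bit score), that each read takes the taxon of its highest-scoring hit, and that a subject-to-taxon lookup table is available. The best-hit rule, the function name `taxon_profile`, and the lookup table are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def taxon_profile(blast_lines, subject_to_taxon):
    """Return the fraction of reads assigned to each taxon.

    blast_lines: iterable of BLAST tabular (outfmt 6) lines.
    subject_to_taxon: hypothetical dict mapping subject IDs to taxon names.
    """
    best = {}  # read id -> (bit score, subject id) of the best hit seen so far
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        read_id, subject_id, bitscore = fields[0], fields[1], float(fields[11])
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, subject_id)

    # Reads whose best subject is missing from the table are kept as "unassigned".
    counts = Counter(
        subject_to_taxon.get(subject_id, "unassigned")
        for _, subject_id in best.values()
    )
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}
```

Two profiles computed this way, for example one from RSRs and one from 16S clones, could then be compared taxon by taxon.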
Abstract:
The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.
Abstract:
A pool of oligonucleotides encoding a start methionine and nine random amino acids was inserted at the 5'-end of the gene for the yeast cytochrome oxidase subunit IV lacking its own mitochondrial targeting sequence. Approximately one-quarter of the randomly generated sequences targeted subunit IV to its correct intramitochondrial location in vivo. Sequence analysis of 89 randomly generated sequences showed that their efficiencies as mitochondrial targeting signals correlated with the potential to fold into an amphiphilic alpha-helix. Functional targeting sequences were enriched in arginine and isoleucine residues but contained few aspartate, glutamate, and proline residues. Nonfunctional sequences predicted to have significant helical amphiphilicity often had at least one acidic residue or multiple helix-breaking residues that would be expected to interfere with targeting function. These results support the hypothesis that the signal for targeting a protein into the mitochondrial matrix is usually a positively charged amphiphilic helix.
Abstract:
We designed a trap system to isolate different amino acid sequences which could target proteins to the cell surface via GPI anchor transfer. This selection procedure is based on the insertion of various sequences which regenerate a functional GPI anchor signal sequence and therefore provoke re-expression at the surface of a reporter molecule. Using this trap for cell surface targeting sequences, we could show the importance of the defined elements essential for GPI anchor addition. Such a system could be used for an exhaustive analysis of the carboxyl terminus structural requirements for GPI membrane anchoring.
Abstract:
BACKGROUND: Conserved non-coding sequences in the human genome are approximately tenfold more abundant than known genes, and have been hypothesized to mark the locations of cis-regulatory elements. However, the global contribution of conserved non-coding sequences to the transcriptional regulation of human genes is currently unknown. Deeply conserved elements shared between humans and teleost fish predominantly flank genes active during morphogenesis and are enriched for positive transcriptional regulatory elements. However, such deeply conserved elements account for <1% of the conserved non-coding sequences in the human genome, which are predominantly mammalian. RESULTS: We explored the regulatory potential of a large sample of these 'common' conserved non-coding sequences using a variety of classic assays, including chromatin remodeling, and enhancer/repressor and promoter activity. When tested across diverse human model cell types, we find that the fraction of experimentally active conserved non-coding sequences within any given cell type is low (approximately 5%), and that this proportion increases only modestly when considered collectively across cell types. CONCLUSIONS: The results suggest that classic assays of cis-regulatory potential are unlikely to expose the functional potential of the substantial majority of mammalian conserved non-coding sequences in the human genome.
Abstract:
BACKGROUND: Analysis of the first reported complete genome sequence of Bifidobacterium longum NCC2705, an actinobacterium colonizing the gastrointestinal tract, uncovered its proteomic relatedness to Streptomyces coelicolor and Mycobacterium tuberculosis. However, a rapid scrutiny by genometric methods revealed a genome organization totally different from that of all high-GC Gram-positive chromosomes sequenced so far. RESULTS: Generally, the cumulative GC- and ORF orientation skew curves of prokaryotic genomes consist of two linear segments of opposite slope: the minimum and the maximum of the curves correspond to the origin and the terminus of chromosome replication, respectively. However, analyses of the B. longum NCC2705 chromosome yielded six, instead of two, linear segments, while its dnaA locus, usually associated with the origin of replication, was not located at the minimum of the curves. Furthermore, the coorientation of gene transcription with replication was very low. Comparison with closely related actinobacteria strongly suggested that the chromosome of B. longum was misassembled, and the identification of two pairs of relatively long homologous DNA sequences offers the possibility of an alternative genome assembly, proposed here. By genometric criteria, this configuration displays all of the characters common to bacteria, in particular to related high-GC Gram-positives. In addition, it is compatible with the partially sequenced genome of B. longum strain DJO10A. Recently, a corrected sequence of B. longum NCC2705, with a configuration similar to the one proposed here, has been deposited in GenBank, confirming our predictions. CONCLUSION: Genometric analyses, in conjunction with standard bioinformatic tools and knowledge of bacterial chromosome architecture, represent fast and straightforward methods for the evaluation of chromosome assembly.
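The cumulative GC skew curve mentioned in this abstract can be sketched in a few lines. The version below is a simplified illustration, not the authors' genometric pipeline: it adds +1 for each G and -1 for each C along the sequence and reports where the running sum reaches its minimum and maximum, which the abstract associates with the origin and terminus of replication. The function names are hypothetical.

```python
def cumulative_gc_skew(seq):
    """Running sum over the sequence: +1 for each G, -1 for each C."""
    total = 0
    curve = []
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        curve.append(total)
    return curve


def skew_extrema(seq):
    """Return (index of curve minimum, index of curve maximum), 0-based.

    Under the simple two-segment picture, the minimum marks a candidate
    origin and the maximum a candidate terminus. Ties resolve to the
    first occurrence.
    """
    curve = cumulative_gc_skew(seq)
    i_min = min(range(len(curve)), key=curve.__getitem__)
    i_max = max(range(len(curve)), key=curve.__getitem__)
    return i_min, i_max
```

On a correctly assembled circular chromosome the curve would be expected to show one minimum and one maximum; several extra linear segments, as reported for the original NCC2705 assembly, are the kind of signal that prompts a closer look at the assembly.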
Abstract:
The shrews of the Sorex araneus group, characterized by the sex chromosome complex XY1Y2, have been intensively studied by morphological, karyotypic, and biochemical analyses. Nevertheless, the phylogenetic relationships among the species belonging to the araneus complex are still under debate, as different approaches have often given contradictory results. In this paper, partial nucleotide sequences of the mitochondrial DNA cytochrome b gene (1011 bp) were determined for 6 species of the araneus group from Eurasia and North America. We also included in the data set the sequences of Sorex samniticus, whose relationships with the araneus group remain controversial. Three other species representing two major karyological groups were also examined. Both parsimony and distance trees strongly support the monophyly of the araneus group. Sorex samniticus is significantly more closely related to the araneus complex than to the other species included in the analysis. Based on the branching pattern within the araneus group, an attempt has been made to reconstruct the colonization history of the Holarctic region.
Abstract:
Sequential randomized prediction of an arbitrary binary sequence is investigated. No assumption is made on the mechanism generating the bit sequence. The goal of the predictor is to minimize its relative loss, i.e., to make (almost) as few mistakes as the best "expert" in a fixed, possibly infinite, set of experts. We point out a surprising connection between this prediction problem and empirical process theory. First, in the special case of static (memoryless) experts, we completely characterize the minimax relative loss in terms of the maximum of an associated Rademacher process. Then we show general upper and lower bounds on the minimax relative loss in terms of the geometry of the class of experts. As main examples, we determine the exact order of magnitude of the minimax relative loss for the class of autoregressive linear predictors and for the class of Markov experts.
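To make the notion of relative loss concrete, here is a minimal sketch of a randomized predictor competing with a finite set of experts. It uses the standard exponentially weighted forecaster, which is swapped in for illustration and is not necessarily the minimax-optimal strategy analyzed in the paper. The learning rate `eta`, the fixed random seed, and the function name are assumptions.

```python
import math
import random


def exp_weighted_prediction(bits, experts, eta=0.5, seed=0):
    """Predict each bit at random according to the weighted expert vote.

    bits: list of 0/1 outcomes, revealed one at a time.
    experts: list of functions mapping the past bits to a 0/1 prediction.
    Returns (mistakes of the predictor, mistakes of the best expert).
    """
    rng = random.Random(seed)
    weights = [1.0] * len(experts)
    expert_mistakes = [0] * len(experts)
    mistakes = 0
    for t, bit in enumerate(bits):
        advice = [expert(bits[:t]) for expert in experts]
        # Probability of guessing 1 is the weight share of experts saying 1.
        p_one = sum(w for w, a in zip(weights, advice) if a == 1) / sum(weights)
        guess = 1 if rng.random() < p_one else 0
        if guess != bit:
            mistakes += 1
        # Shrink the weight of every expert that erred on this bit.
        for i, a in enumerate(advice):
            if a != bit:
                expert_mistakes[i] += 1
                weights[i] *= math.exp(-eta)
    return mistakes, min(expert_mistakes)
```

The relative loss on a given sequence is the difference between the two returned numbers. With the two static experts "always 0" and "always 1", the weights concentrate on whichever constant prediction has made fewer mistakes so far.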
Abstract:
The MyHits web site (http://myhits.isb-sib.ch) is an integrated service dedicated to the analysis of protein sequences. Since its first description in 2004, both the user interface and the back end of the server have been improved. A number of tools (e.g. MAFFT, Jacop, Dotlet, Jalview, ESTScan) were added or updated to improve the usability of the service. The MySQL schema and its associated API were revamped and the database engine (HitKeeper) was separated from the web interface. This paper summarizes the current status of the server, with an emphasis on the new services.
Abstract:
The analysis of conservation between the human and mouse genomes resulted in the identification of a large number of conserved nongenic sequences (CNGs). The functional significance of this nongenic conservation remains unknown, however. The availability of the sequence of a third mammalian genome, the dog, allows for a large-scale analysis of evolutionary attributes of CNGs in mammals. We have aligned 1638 previously identified CNGs and 976 conserved exons (CODs) from human chromosome 21 (Hsa21) with their orthologous sequences in mouse and dog. Attributes of selective constraint, such as sequence conservation, clustering, and direction of substitutions were compared between CNGs and CODs, showing a clear distinction between the two classes. We subsequently performed a chromosome-wide analysis of CNGs by correlating selective constraint metrics with their position on the chromosome and relative to their distance from genes. We found that CNGs appear to be randomly arranged in intergenic regions, with no bias to be closer or farther from genes. Moreover, conservation and clustering of substitutions of CNGs appear to be completely independent of their distance from genes. These results suggest that the majority of CNGs are not typical of previously described regulatory elements in terms of their location. We propose models for a global role of CNGs in genome function and regulation, through long-distance cis or trans chromosomal interactions.
Abstract:
We consider adaptive sequential lossy coding of bounded individual sequences when the performance is measured by the sequentially accumulated mean squared distortion. The encoder and the decoder are connected via a noiseless channel of capacity $R$ and both are assumed to have zero delay. No probabilistic assumptions are made on how the sequence to be encoded is generated. For any bounded sequence of length $n$, the distortion redundancy is defined as the normalized cumulative distortion of the sequential scheme minus the normalized cumulative distortion of the best scalar quantizer of rate $R$ which is matched to this particular sequence. We demonstrate the existence of a zero-delay sequential scheme which uses common randomization in the encoder and the decoder such that the normalized maximum distortion redundancy converges to zero at a rate $n^{-1/5}\log n$ as the length of the encoded sequence $n$ increases without bound.
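The verbal definition of distortion redundancy in this abstract can be written out as follows. The notation is introduced here for illustration only and is not taken from the paper: $x_1, \dots, x_n$ is the sequence to be encoded, $\hat{x}_i$ is the reproduction of $x_i$ produced by the sequential scheme, and $\mathcal{Q}_R$ denotes the set of scalar quantizers of rate $R$.

$$
\Delta_n(x^n) \;=\; \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{x}_i)^2 \;-\; \min_{Q \in \mathcal{Q}_R} \frac{1}{n}\sum_{i=1}^{n} \bigl(x_i - Q(x_i)\bigr)^2
$$

Assuming the redundancy of the randomized scheme is measured in expectation over the common randomization, the stated result would then read $\max_{x^n} \mathbb{E}\,\Delta_n(x^n) = O\!\left(n^{-1/5}\log n\right)$, with the maximum taken over all bounded sequences of length $n$.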
Abstract:
Partial DNA sequences from two mitochondrial (mt) and one nuclear gene (cytochrome b, 12S rRNA, and C-mos) were used to estimate the phylogenetic relationships among the six extant species of skinks endemic to the Cape Verde Archipelago. The species form a monophyletic unit, indicating a single colonization of the islands, probably from West Africa. Mabuya vaillanti and M. delalandii are sister taxa, as indicated by morphological characters. Mabuya fogoensis and M. stangeri are closely related, but the former is probably paraphyletic. Mabuya spinalis and M. salensis are also probably paraphyletic. Within species, samples from separate islands always form monophyletic groups. Some colonization events can be hypothesized, which are in line with the age of the islands. C-mos variation is concordant with the topology derived from mtDNA.