987 results for Eukaryotic Genomes


Relevance:

100.00%

Abstract:

Eukaryotic genome similarity relationships are inferred using sequence information derived from large aggregates of genomic sequences. Comparisons within and between species sample sequences are based on the profile of dinucleotide relative abundance values (the profile is ρ*XY = f*XY/(f*X f*Y) for all XY, where f*X denotes the frequency of the nucleotide X and f*XY denotes the frequency of the dinucleotide XY, both computed from the sequence concatenated with its inverted complement). Previous studies with respect to prokaryotes and this study document that profiles of different DNA sequence samples (sample size ≥50 kb) from the same organism are generally much more similar to each other than they are to profiles from other organisms, and that closely related organisms generally have more similar profiles than do distantly related organisms. On this basis we refer to the collection {ρ*XY} as the genome signature. This paper identifies ρ*XY extremes and compares genome signature differences for a diverse range of eukaryotic species. Interpretations on the mechanisms maintaining these profile differences center on genome-wide replication, repair, DNA structures, and context-dependent mutational biases. It is also observed that mitochondrial genome signature differences between species parallel the corresponding nuclear genome signature differences despite large differences between corresponding mitochondrial and nuclear signatures. The genome signature differences also have implications for contrasts between rodents and other mammals, and between monocot and dicot plants, as well as providing evidence for similarities among fungi and the diversity of protists.
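The profile defined in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' software: the function names are hypothetical, the two strands are counted separately (which matches concatenation with the inverted complement except for the single dinucleotide at the junction), and the mean absolute difference used to compare two profiles is an assumption that the abstract does not state.

```python
from collections import Counter

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the inverted complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def genome_signature(seq):
    """Dinucleotide relative abundance profile rho*_XY = f*_XY / (f*_X f*_Y).

    Frequencies are pooled over the sequence and its inverted complement.
    Characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):
        mono.update(c for c in strand if c in BASES)
        for a, b in zip(strand, strand[1:]):
            if a in BASES and b in BASES:
                di[a + b] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    profile = {}
    for x in BASES:
        for y in BASES:
            fx, fy = mono[x] / n_mono, mono[y] / n_mono
            fxy = di[x + y] / n_di
            # Undefined when a nucleotide never occurs in the sample.
            profile[x + y] = fxy / (fx * fy) if fx and fy else float("nan")
    return profile


def signature_distance(p, q):
    """Mean absolute difference between two profiles (assumed measure)."""
    return sum(abs(p[k] - q[k]) for k in p) / len(p)
```

Because both strands are pooled, the profile is symmetric under reverse complementation, so for example the values for AC and GT coincide. Samples much shorter than the ≥50 kb mentioned in the abstract would give noisy estimates.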

Relevance:

100.00%

Abstract:

Eukaryotic genomes display segmental patterns of variation in various properties, including GC content and degree of evolutionary conservation. DNA segmentation algorithms are aimed at identifying statistically significant boundaries between such segments. Such algorithms may provide a means of discovering new classes of functional elements in eukaryotic genomes. This paper presents a model and an algorithm for Bayesian DNA segmentation and considers the feasibility of using it to segment whole eukaryotic genomes. The algorithm is tested on a range of simulated and real DNA sequences, and the following conclusions are drawn. Firstly, the algorithm correctly identifies non-segmented sequence, and can thus be used to reject the null hypothesis of uniformity in the property of interest. Secondly, estimates of the number and locations of change-points produced by the algorithm are robust to variations in algorithm parameters and initial starting conditions and correspond to real features in the data. Thirdly, the algorithm is successfully used to segment human chromosome 1 according to GC content, thus demonstrating the feasibility of Bayesian segmentation of eukaryotic genomes. The software described in this paper is available from the author's website (www.uq.edu.au/~uqjkeith/) or upon request to the author.
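To make the idea of a change-point in GC content concrete, the sketch below finds the single best boundary in a sequence by maximum likelihood under a two-segment Bernoulli model. It is deliberately much simpler than the Bayesian algorithm described in the abstract and is not a reimplementation of it: the function names and the `min_len` parameter are hypothetical, only one boundary is sought, and no significance test is applied.

```python
import math


def bernoulli_loglik(k, n):
    """Maximised log-likelihood of k successes in n Bernoulli trials."""
    if n == 0 or k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)


def best_change_point(seq, min_len=50):
    """Return (position, log-likelihood gain) of the best single GC boundary.

    The gain compares a two-segment model, split at `position`, with a
    single uniform segment. Position is None if no split improves the fit.
    """
    is_gc = [1 if c in "GCgc" else 0 for c in seq]
    n = len(is_gc)
    # Prefix sums give the GC count of any prefix in constant time.
    prefix = [0]
    for v in is_gc:
        prefix.append(prefix[-1] + v)
    total = prefix[-1]
    null = bernoulli_loglik(total, n)
    best_pos, best_gain = None, 0.0
    for pos in range(min_len, n - min_len + 1):
        left = prefix[pos]
        gain = (bernoulli_loglik(left, pos)
                + bernoulli_loglik(total - left, n - pos)
                - null)
        if gain > best_gain:
            best_pos, best_gain = pos, gain
    return best_pos, best_gain
```

A gain near zero suggests a sequence that is uniform in GC content; deciding how large a gain is meaningful requires a proper statistical treatment, which is what the Bayesian approach in the paper is meant to provide.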

Relevance:

70.00%

Abstract:

Searching for similarities in the genetic codes of two species yields a great deal of information about the evolution of their genomes. This information aids the discovery of genes that are conserved with the same function in different species. It also has important medical applications and helps us understand the evolutionary processes that have led to the present-day diversity of species. The aim of this work is to automate a series of processes on a web application server, http://platypus.uab.cat, that perform, optimally and efficiently, the all-against-all comparison of eukaryotic genomes as these genomes are sequenced. These comparisons between genomes of higher organisms can then be consulted via the web.

Relevance:

70.00%

Abstract:

DNA double-strand breaks (DSBs) represent a major threat to the genomic stability of eukaryotic cells. DNA repair mechanisms such as non-homologous end joining (NHEJ) are responsible for the maintenance of eukaryotic genomes. Dysfunction of one or more of the many protein complexes that function in NHEJ can lead to sensitivity to DNA damaging agents, apoptosis, genomic instability, and severe combined immunodeficiency. One protein, Pso2p, was shown to participate in the repair of DSBs induced by DNA inter-strand cross-linking (ICL) agents such as cisplatin, nitrogen mustard or photo-activated bi-functional psoralens. The molecular function of Pso2p in DNA repair is unknown, but yeast and mammalian cell line mutants for PSO2 show the same cellular responses as strains with defects in NHEJ, e.g., sensitivity to ICLs and apoptosis. The Pso2p human homologue Artemis participates in V(D)J recombination. Mutations in Artemis induce a variety of immunological deficiencies, a predisposition to lymphomas, and an increase in chromosomal aberrations. In order to better understand the role of Pso2p in the repair of DSBs generated as repair intermediates of ICLs, an in silico approach was used to characterize the catalytic domain of Pso2p, which led to identification of novel Pso2p homologues in other organisms. Moreover, we found the catalytic core of Pso2p fused to different domains. In plants, a specific ATP-dependent DNA ligase I contains the catalytic core of Pso2p, constituting a new DNA ligase family, which was named LIG6. The possible functions of Pso2p/Artemis/Lig6p in NHEJ and V(D)J recombination and in other cellular metabolic reactions are discussed.

Relevance:

70.00%

Abstract:

Background: We report an analysis of a protein network of functionally linked proteins, identified from a phylogenetic statistical analysis of complete eukaryotic genomes. Phylogenetic methods identify pairs of proteins that co-evolve on a phylogenetic tree, and have been shown to have a high probability of correctly identifying known functional links. Results: The eukaryotic correlated evolution network we derive displays the familiar power law scaling of connectivity. We introduce the use of explicit phylogenetic methods to reconstruct the ancestral presence or absence of proteins at the interior nodes of a phylogeny of eukaryote species. We find that the connectivity distribution of proteins at the point they arise on the tree and join the network follows a power law, as does the connectivity distribution of proteins at the time they are lost from the network. Proteins resident in the network acquire connections over time, but we find no evidence that 'preferential attachment' - the phenomenon of newly acquired connections in the network being more likely to be made to proteins with large numbers of connections - influences the network structure. We derive a 'variable rate of attachment' model in which proteins vary in their propensity to form network interactions independently of how many connections they have or of the total number of connections in the network, and show how this model can produce apparent power-law scaling without preferential attachment. Conclusion: A few simple rules can explain the topological structure and evolutionary changes to protein-interaction networks: most change is concentrated in satellite proteins of low connectivity and small phenotypic effect, and proteins differ in their propensity to form attachments. 
Given these rules of assembly, power law scaled networks naturally emerge from simple principles of selection, yielding protein interaction networks that retain a high degree of robustness on short time scales and evolvability on longer evolutionary time scales.

Relevance:

70.00%

Abstract:

Using computer programs developed for this purpose, we searched for various repeated sequences including inverted, direct tandem, and homopurine–homopyrimidine mirror repeats in various prokaryotes, eukaryotes, and an archaebacterium. Comparison of observed frequencies with expectations revealed that in bacterial genomes and organelles the frequency of different repeats is either random or enriched for inverted and/or direct tandem repeats. By contrast, in all eukaryotic genomes studied, we observed an overrepresentation of all repeats, especially homopurine–homopyrimidine mirror repeats. Analysis of the genomic distribution of all abundant repeats showed that they are virtually excluded from coding sequences. Unexpectedly, the frequencies of abundant repeats normalized for their expectations were almost perfect exponential functions of their size, and for a given repeat this function was indistinguishable between different genomes.

Relevance:

70.00%

Abstract:

The SWISS-PROT group at EBI has developed the Proteome Analysis Database utilising existing resources and providing comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes (http://www.ebi.ac.uk/proteome/). The two main projects used, InterPro and CluSTr, give a new perspective on families, domains and sites and cover 31–67% (InterPro statistics) of the proteins from each of the complete genomes. CluSTr covers the three complete eukaryotic genomes and the incomplete human genome data. The Proteome Analysis Database is accompanied by a program that has been designed to carry out InterPro proteome comparisons for any one proteome against any other one or more of the proteomes in the database.

Relevance:

70.00%

Abstract:

Simple sequence repeats (SSRs), consisting of tandemly repeated multiple copies of mono-, di-, tri-, or tetranucleotide motifs, are ubiquitous in eukaryotic genomes and are frequently used as genetic markers, taking advantage of their length polymorphism. We have examined the polymorphism of such sequences in the chloroplast genomes of plants, by using a PCR-based assay. GenBank searches identified the presence of several (dA)n.(dT)n mononucleotide stretches in chloroplast genomes. A chloroplast (cp) SSR was identified in three pine species (Pinus contorta, Pinus sylvestris, and Pinus thunbergii) 312 bp upstream of the psbA gene. DNA amplification of this repeated region from 11 pine species identified nine length variants. The polymorphic amplified fragments were isolated and the DNA sequence was determined, confirming that the length polymorphism was caused by variation in the length of the repeated region. In the pines, the chloroplast genome is transmitted through pollen and this PCR assay may be used to monitor gene flow in this genus. Analysis of 305 individuals from seven populations of Pinus leucodermis Ant. revealed the presence of four variants with intrapopulational diversities ranging from 0.000 to 0.629 and an average of 0.320. Restriction fragment length polymorphism analysis of cpDNA on the same populations previously failed to detect any variation. Population subdivision based on cpSSR was higher (Gst = 0.22, where Gst is coefficient of gene differentiation) than that revealed in a previous isozyme study (Gst = 0.05). We anticipate that SSR loci within the chloroplast genome should provide a highly informative assay for the analysis of the genetic structure of plant populations.
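The diversity and Gst figures quoted in this abstract can be illustrated with a short calculation. The sketch assumes Nei's gene diversity and the definition Gst = (Ht - Hs)/Ht with equally weighted populations; the abstract does not say which estimators the authors used, and the function names and any counts passed in are hypothetical.

```python
def gene_diversity(counts, unbiased=True):
    """Gene diversity h = 1 - sum(p_i^2) from variant counts in one population.

    With unbiased=True the sample-size correction n / (n - 1) is applied.
    """
    n = sum(counts.values())
    h = 1.0 - sum((c / n) ** 2 for c in counts.values())
    if unbiased and n > 1:
        h *= n / (n - 1)
    return h


def gst(populations):
    """Coefficient of gene differentiation, Gst = (Ht - Hs) / Ht.

    `populations` is a list of dicts mapping variant -> count.
    Hs is the mean within-population diversity and Ht the diversity of the
    averaged variant frequencies; uncorrected estimates, equal weights.
    """
    variants = set().union(*populations)
    k = len(populations)
    hs = sum(gene_diversity(p, unbiased=False) for p in populations) / k
    mean_freq = {
        v: sum(p.get(v, 0) / sum(p.values()) for p in populations) / k
        for v in variants
    }
    ht = 1.0 - sum(f ** 2 for f in mean_freq.values())
    return (ht - hs) / ht if ht > 0 else 0.0
```

Two populations fixed for different length variants give Gst = 1, while populations with identical variant frequencies give Gst = 0; the values of 0.22 and 0.05 reported in the abstract lie between these extremes.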

Relevance:

60.00%

Abstract:

The identification and annotation of protein-coding genes is one of the primary goals of whole-genome sequencing projects, and the accuracy of predicting the primary protein products of gene expression is vital to the interpretation of the available data and the design of downstream functional applications. Nevertheless, the comprehensive annotation of eukaryotic genomes remains a considerable challenge. Many genomes submitted to public databases, including those of major model organisms, contain significant numbers of wrong and incomplete gene predictions. We present a community-based reannotation of the Aspergillus nidulans genome with the primary goal of increasing the number and quality of protein functional assignments through the careful review of experts in the field of fungal biology. (C) 2009 Elsevier Inc. All rights reserved.

Relevance:

60.00%

Abstract:

Microsatellites or simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Single-locus SSR markers have been developed for a number of species, although there is a major bottleneck in developing SSR markers whereby flanking sequences must be known to design 5'-anchors for polymerase chain reaction (PCR) primers. Inter SSR (ISSR) fingerprinting was developed such that no sequence knowledge was required. Primers based on a repeat sequence, such as (CA)n, can be made with a degenerate 3'-anchor, such as (CA)8RG or (AGC)6TY. The resultant PCR reaction amplifies the sequence between two SSRs, yielding a multilocus marker system useful for fingerprinting, diversity analysis and genome mapping. PCR products are radiolabelled with 32P or 33P via end-labelling or PCR incorporation, and separated on a polyacrylamide sequencing gel prior to autoradiographic visualisation. A typical reaction yields 20-100 bands per lane depending on the species and primer. We have used ISSR fingerprinting in a number of plant species, and report here some results on two important tropical species, sorghum and banana. Previous investigators have demonstrated that ISSR analysis usually detects a higher level of polymorphism than that detected with restriction fragment length polymorphism (RFLP) or random amplified polymorphic DNA (RAPD) analyses. Our data indicate that this is not a result of greater polymorphism genetically, but rather of technical reasons related to the detection methodology used for ISSR analysis.
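The anchored primers named in this abstract use IUPAC degenerate bases (R = A or G, Y = C or T), so each primer stands for a small pool of concrete sequences. The sketch below builds such a primer and expands the pool; the helper names are hypothetical and the code is an illustration, not part of the published protocol.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def issr_primer(motif, repeats, anchor):
    """Build a 3'-anchored ISSR primer, e.g. (CA)8RG -> 'CA' * 8 + 'RG'."""
    return motif * repeats + anchor


def expand_degenerate(primer):
    """List every concrete sequence represented by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]
```

For example, `expand_degenerate(issr_primer("CA", 8, "RG"))` returns the two sequences `CACACACACACACACAAG` and `CACACACACACACACAGG`, and `(AGC)6TY` likewise expands to two sequences ending in `TC` and `TT`.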

Relevance:

60.00%

Abstract:

Eukaryotic phenotypic diversity arises from multitasking of a core proteome of limited size. Multitasking is routine in computers, as well as in other sophisticated information systems, and requires multiple inputs and outputs to control and integrate network activity. Higher eukaryotes have a mosaic gene structure with a dual output, mRNA (protein-coding) sequences and introns, which are released from the pre-mRNA by posttranscriptional processing. Introns have been enormously successful as a class of sequences and comprise up to 95% of the primary transcripts of protein-coding genes in mammals. In addition, many other transcripts (perhaps more than half) do not encode proteins at all, but appear both to be developmentally regulated and to have genetic function. We suggest that these RNAs (eRNAs) have evolved to function as endogenous network control molecules which enable direct gene-gene communication and multitasking of eukaryotic genomes. Analysis of a range of complex genetic phenomena in which RNA is involved or implicated, including co-suppression, transgene silencing, RNA interference, imprinting, methylation, and transvection, suggests that a higher-order regulatory system based on RNA signals operates in the higher eukaryotes and involves chromatin remodeling as well as other RNA-DNA, RNA-RNA, and RNA-protein interactions. The evolution of densely connected gene networks would be expected to result in a relatively stable core proteome due to the multiple reuse of components, implying that cellular differentiation and phenotypic variation in the higher eukaryotes result primarily from variation in the control architecture.
Thus, network integration and multitasking using trans-acting RNA molecules produced in parallel with protein-coding sequences may underpin both the evolution of developmentally sophisticated multicellular organisms and the rapid expansion of phenotypic complexity into uncontested environments such as those initiated in the Cambrian radiation and those seen after major extinction events.

Relevance:

60.00%

Abstract:

In recent years, analysis of the genomes of many organisms has received increasing international attention. The bulk of the effort to date has centred on the Human Genome Project and analysis of model organisms such as yeast, Drosophila and Caenorhabditis elegans. More recently, the revolution in genome sequencing and gene identification has begun to impact on infectious disease organisms. Initially, much of the effort was concentrated on prokaryotes, but small eukaryotic genomes, including the protozoan parasites Plasmodium, Toxoplasma and trypanosomatids (Leishmania, Trypanosoma brucei and T. cruzi), as well as some multicellular organisms, such as Brugia and Schistosoma, are benefiting from the technological advances of the genome era. These advances promise a radical new approach to the development of novel diagnostic tools, chemotherapeutic targets and vaccines for infectious disease organisms, as well as to the more detailed analysis of cell biology and function. Several networks or consortia linking laboratories around the world have been established to support these parasite genome projects[1] (for more information, see http://www.ebi.ac.uk/parasites/paratable.html). Five of these networks were supported by an initiative launched in 1994 by the Special Programme for Research and Training in Tropical Diseases (TDR) of the WHO[2, 3, 4, 5, 6]. The Leishmania Genome Network (LGN) is one of these[3]. Its activities are reported at http://www.ebi.ac.uk/parasites/leish.html, and its current aim is to map and sequence the genome of Leishmania by the year 2002. All the mapping, hybridization and sequence data are also publicly available from LeishDB, an AceDB-based genome database (http://www.ebi.ac.uk/parasites/LGN/leissssoft.html).

Relevance:

60.00%

Abstract:

The expression state of a eukaryotic gene depends in part on its location in the chromosome. This position effect results from the organization of eukaryotic genomes into discrete functional domains, defined by local differences in chromatin structure. The expression of genes within each domain appears to be defined and maintained by the concerted action of regulatory elements such as promoters, enhancers, silencers and locus control regions. Individual domains may be bordered by boundary elements that separate regions of permissive and silent chromatin. When located next to chromosomal elements such as telomeres, genes can be subjected to epigenetic silencing. In yeast, this is mediated by the propagation of the SIR proteins from telomeres towards more centromeric regions. Particular transcription factors can protect downstream genes from silencing when tethered between the gene and the telomere, and they may thus act as chromatin domain boundaries. Here we have studied one of these transcription factors, CTF-1, which binds histone H3 directly. Deletion mutagenesis localized the barrier activity to the CTF-1 histone-binding domain. Saturating point mutagenesis of this domain identified several amino-acid substitutions that similarly inhibited the boundary and histone-binding activities. Chromatin immunoprecipitation experiments indicated that the barrier protein efficiently prevents the spreading of SIR proteins, and that it separates domains of hypoacetylated and hyperacetylated histones. Together, these results suggest a mechanism by which proteins such as CTF-1 may interact directly with histone H3 to prevent the propagation of a silent chromatin structure, thereby defining boundaries of permissive and silent chromatin domains.

Relevance:

60.00%

Abstract:

Poor understanding of the spliceosomal mechanisms to select intronic 3' ends (3'ss) is a major obstacle to deciphering eukaryotic genomes. Here, we discern the rules for global 3'ss selection in yeast. We show that, in contrast to the uniformity of yeast splicing, the spliceosome uses all available 3'ss within a distance window from the intronic branch site (BS), and that in 70% of all possible 3'ss this is likely to be mediated by pre-mRNA structures. Our results reveal that one of these RNA folds acts as an RNA thermosensor, modulating alternative splicing in response to heat shock by controlling alternate 3'ss availability. Thus, our data point to a deeper role for the pre-mRNA in the control of its own fate, and to a simple mechanism for some alternative splicing.

Relevance:

60.00%

Abstract:

We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.