968 results for EUKARYOTIC GENOMES


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Eukaryotic genome similarity relationships are inferred using sequence information derived from large aggregates of genomic sequences. Comparisons within and between species sample sequences are based on the profile of dinucleotide relative abundance values (the profile is ρ*XY = f*XY/(f*X f*Y) for all XY, where f*X denotes the frequency of the nucleotide X and f*XY denotes the frequency of the dinucleotide XY, both computed from the sequence concatenated with its inverted complement). Previous studies with respect to prokaryotes and this study document that profiles of different DNA sequence samples (sample size ≥50 kb) from the same organism are generally much more similar to each other than they are to profiles from other organisms, and that closely related organisms generally have more similar profiles than do distantly related organisms. On this basis we refer to the collection {ρ*XY} as the genome signature. This paper identifies ρ*XY extremes and compares genome signature differences for a diverse range of eukaryotic species. Interpretations of the mechanisms maintaining these profile differences center on genome-wide replication, repair, DNA structures, and context-dependent mutational biases. It is also observed that mitochondrial genome signature differences between species parallel the corresponding nuclear genome signature differences despite large differences between corresponding mitochondrial and nuclear signatures. The genome signature differences also have implications for contrasts between rodents and other mammals, and between monocot and dicot plants, as well as providing evidence for similarities among fungi and the diversity of protists.
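
The profile defined in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names are hypothetical, both strands are counted separately rather than literally concatenated (which only avoids one artificial junction dinucleotide), and the `signature_distance` helper (mean absolute difference over the 16 dinucleotides) is an assumed comparison measure that the abstract itself does not specify.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the inverted complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def genome_signature(seq):
    """Dinucleotide relative abundances rho*_XY = f*_XY / (f*_X f*_Y).

    Frequencies are pooled over the sequence and its inverted complement.
    Symbols other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):
        mono.update(strand)
        di.update(strand[i:i + 2] for i in range(len(strand) - 1))

    pairs = ["".join(p) for p in product("ACGT", repeat=2)]
    n_mono = sum(mono[b] for b in "ACGT")
    n_di = sum(di[xy] for xy in pairs)

    profile = {}
    for xy in pairs:
        fx, fy = mono[xy[0]] / n_mono, mono[xy[1]] / n_mono
        fxy = di[xy] / n_di
        # Undefined when a base is absent from the sample.
        profile[xy] = fxy / (fx * fy) if fx and fy else float("nan")
    return profile


def signature_distance(p, q):
    """Assumed comparison: mean absolute difference over the 16 values."""
    return sum(abs(p[xy] - q[xy]) for xy in p) / len(p)
```

Because the counts are pooled over both strands, the value for a dinucleotide equals the value for its inverted complement (for example, AC and GT), so the profile has at most ten distinct entries.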

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Eukaryotic genomes display segmental patterns of variation in various properties, including GC content and degree of evolutionary conservation. DNA segmentation algorithms are aimed at identifying statistically significant boundaries between such segments. Such algorithms may provide a means of discovering new classes of functional elements in eukaryotic genomes. This paper presents a model and an algorithm for Bayesian DNA segmentation and considers the feasibility of using it to segment whole eukaryotic genomes. The algorithm is tested on a range of simulated and real DNA sequences, and the following conclusions are drawn. Firstly, the algorithm correctly identifies non-segmented sequence, and can thus be used to reject the null hypothesis of uniformity in the property of interest. Secondly, estimates of the number and locations of change-points produced by the algorithm are robust to variations in algorithm parameters and initial starting conditions and correspond to real features in the data. Thirdly, the algorithm is successfully used to segment human chromosome 1 according to GC content, thus demonstrating the feasibility of Bayesian segmentation of eukaryotic genomes. The software described in this paper is available from the author's website (www.uq.edu.au/~uqjkeith/) or upon request to the author.
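
To make the idea of Bayesian segmentation by GC content concrete, here is a deliberately simplified sketch. It is not the algorithm described in the abstract: it assumes exactly one change-point, a uniform prior over its position, independent Beta(1, 1) priors on the GC proportion of each segment, and exact enumeration of every position. The function names are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_evidence(gc, n):
    """Log marginal likelihood of a segment of n bases with gc G/C bases,
    integrating the GC proportion over a Beta(1, 1) prior."""
    return log_beta(gc + 1, n - gc + 1)


def change_point_posterior(seq):
    """Posterior over a single change-point position.

    Entry k - 1 of the returned list is the posterior probability that
    the boundary falls between base k and base k + 1 (1-based).
    """
    is_gc = [1 if base in "GC" else 0 for base in seq.upper()]
    n, total = len(is_gc), sum(is_gc)

    log_post, left_gc = [], 0
    for k in range(1, n):
        left_gc += is_gc[k - 1]
        log_post.append(log_evidence(left_gc, k)
                        + log_evidence(total - left_gc, n - k))

    # Normalise in a numerically stable way.
    peak = max(log_post)
    weights = [exp(v - peak) for v in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]
```

For a sequence made of an AT-rich half followed by a GC-rich half, the posterior concentrates at the junction. Whole-genome segmentation with an unknown number of change-points needs a more elaborate model and sampler than this sketch.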

Relevance:

70.00% 70.00%

Publisher:

Abstract:

The highly modular nature of protein kinases generates diverse functional roles mediated by evolutionary events such as domain recombination, insertion and deletion of domains. Usually the domain architecture of a kinase is related to the subfamily to which the kinase catalytic domain belongs. However, outlier kinases with unusual domain architectures expand the functional space of the protein kinase family. For example, Src kinases are made up of SH2 and SH3 domains in addition to the kinase catalytic domain. A kinase that lacks these two domains but retains the sequence characteristics of the kinase catalytic domain is an outlier that is likely to have modes of regulation different from those of classical Src kinases. This study defines two types of outlier kinases, hybrids and rogues, depending on the nature of domain recombination. Hybrid kinases are those where the catalytic kinase domain belongs to one kinase subfamily but the domain architecture is typical of another kinase subfamily. Rogue kinases are those with a kinase catalytic domain characteristic of a kinase subfamily but a domain architecture typical of neither that subfamily nor any other kinase subfamily. This report provides a consolidated set of such hybrid and rogue kinases gleaned from six eukaryotic genomes (S. cerevisiae, D. melanogaster, C. elegans, M. musculus, T. rubripes and H. sapiens) and discusses their functions. The presence of such kinases necessitates revisiting the classification scheme of the protein kinase family using full-length sequences, in addition to the classical classification based solely on the sequences of kinase catalytic domains. The study of these kinases provides insight into engineering signalling pathways for a desired output. Lastly, identification of hybrids and rogues in pathogenic protozoa such as P. falciparum sheds light on possible strategies in host-pathogen interactions.
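
The hybrid/rogue definitions above amount to a simple decision rule, sketched here in Python. Everything beyond the rule itself is an assumption for illustration: the `TYPICAL_ARCHITECTURE` table is a toy lookup, the "SubfamilyB" entry and its "DomainX" domain are placeholders rather than real annotations, and comparing unordered domain sets ignores domain order and copy number.

```python
# Toy lookup of typical domain content per kinase subfamily.
# Only the Src entry follows the abstract; "SubfamilyB" and "DomainX"
# are hypothetical placeholders.
TYPICAL_ARCHITECTURE = {
    "Src": frozenset({"SH3", "SH2", "kinase"}),
    "SubfamilyB": frozenset({"DomainX", "kinase"}),
}


def classify_kinase(catalytic_subfamily, domains, typical=TYPICAL_ARCHITECTURE):
    """Label a kinase as 'typical', 'hybrid' or 'rogue'.

    catalytic_subfamily: subfamily assigned from the catalytic domain.
    domains: iterable of domain names found in the full-length protein.
    """
    architecture = frozenset(domains)
    if architecture == typical[catalytic_subfamily]:
        return "typical"
    # Architecture matches a different subfamily: hybrid.
    for subfamily, expected in typical.items():
        if subfamily != catalytic_subfamily and architecture == expected:
            return "hybrid"
    # Architecture matches no known subfamily: rogue.
    return "rogue"
```

Under this rule, a kinase with a Src-like catalytic domain but no SH2 or SH3 domains comes out as a rogue, matching the example given in the abstract.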

Relevance:

70.00% 70.00%

Publisher:

Abstract:

5-Methylcytosine (m(5)C) is a rare base found in eukaryotic genomes; it is a normal constituent of many eukaryotic DNAs, and its presence is a characteristic feature of eukaryotic DNA. Under normal physiological conditions, cytosine in eukaryotic DNA is usually methylated. To date, m(5)C has widely been regarded as a potential mutation hotspot, because its deamination can lead to gene mutation. Our study indicates that spontaneous G·C → A·T transition mutations in eukaryotic DNA may result from tautomeric shifts in base pairs, and may also be caused by other factors, but could not be caused by the deamination of m(5)C.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Background: Short and long interspersed elements (SINEs and LINEs, respectively), two types of retroposons, actively shape the architecture of genomes and are powerful tools for studies of phylogeny and population biology. Here we developed a protocol that uses a biotin-streptavidin bead system to isolate interspersed repeated sequences rapidly and efficiently: SINEs and LINEs were captured directly from digested genomic DNA by hybridization to a bead-probe complex in solution, instead of by the traditional strategy of genomic library construction and screening. Results: A new SINE/LINE pair sharing an almost identical 3' tail was isolated and characterized in silver carp and bighead carp, two closely related species. These SINEs (34 members), designated the HAmo SINE family, showed little sequence divergence and were flanked by obvious target site duplications (TSDs), indicating that HAmo SINE is a very young family. The copy number of this family was estimated by real-time qPCR at 2 × 10^5 and 1.7 × 10^5 per haploid genome in the two species, respectively. The LINEs, identified as homologs of LINE2 in other fishes, had a conserved primary sequence and secondary structures in the 3' tail region that were almost identical to those of HAmo SINE. This evidence suggests that HAmo SINEs are active and were amplified recently using the enzymatic machinery for retroposition of HAmoL2 through the recognition of higher-order structures of the conserved 42-tail region. We analyzed the possible structures of HAmo SINE that lead to successful amplification in the genome and deduced that HAmo SINE, SmaI SINE and FokI SINE, which are similar to each other in sequence, were probably generated independently, each created by LINE families within the same lineage of a LINE phylogeny in the genomes of different hosts.
Conclusion: The results show the advantage of the novel method for retroposon isolation and describe a young SINE family and its partner LINE family in two carp species. They strengthen several hypotheses: the slippage model for initiation of reverse transcription, retropositional parasitism of SINEs on LINEs, the formation of a stem-loop structure in the 3' tail region of some SINEs and LINEs, and template switching as a mechanism for generating new SINE families.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Background: We report an analysis of a protein network of functionally linked proteins, identified from a phylogenetic statistical analysis of complete eukaryotic genomes. Phylogenetic methods identify pairs of proteins that co-evolve on a phylogenetic tree, and have been shown to have a high probability of correctly identifying known functional links. Results: The eukaryotic correlated evolution network we derive displays the familiar power law scaling of connectivity. We introduce the use of explicit phylogenetic methods to reconstruct the ancestral presence or absence of proteins at the interior nodes of a phylogeny of eukaryote species. We find that the connectivity distribution of proteins at the point they arise on the tree and join the network follows a power law, as does the connectivity distribution of proteins at the time they are lost from the network. Proteins resident in the network acquire connections over time, but we find no evidence that 'preferential attachment' - the phenomenon of newly acquired connections in the network being more likely to be made to proteins with large numbers of connections - influences the network structure. We derive a 'variable rate of attachment' model in which proteins vary in their propensity to form network interactions independently of how many connections they have or of the total number of connections in the network, and show how this model can produce apparent power-law scaling without preferential attachment. Conclusion: A few simple rules can explain the topological structure and evolutionary changes to protein-interaction networks: most change is concentrated in satellite proteins of low connectivity and small phenotypic effect, and proteins differ in their propensity to form attachments. 
Given these rules of assembly, power law scaled networks naturally emerge from simple principles of selection, yielding protein interaction networks that retain a high degree of robustness on short time scales and evolvability on longer evolutionary time scales.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Using computer programs developed for this purpose, we searched for various repeated sequences including inverted, direct tandem, and homopurine–homopyrimidine mirror repeats in various prokaryotes, eukaryotes, and an archaebacterium. Comparison of observed frequencies with expectations revealed that in bacterial genomes and organelles the frequency of different repeats is either random or enriched for inverted and/or direct tandem repeats. By contrast, in all eukaryotic genomes studied, we observed an overrepresentation of all repeats, especially homopurine–homopyrimidine mirror repeats. Analysis of the genomic distribution of all abundant repeats showed that they are virtually excluded from coding sequences. Unexpectedly, the frequencies of abundant repeats normalized for their expectations were almost perfect exponential functions of their size, and for a given repeat this function was indistinguishable between different genomes.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

The SWISS-PROT group at EBI has developed the Proteome Analysis Database utilising existing resources and providing comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes (http://www.ebi.ac.uk/proteome/). The two main projects used, InterPro and CluSTr, give a new perspective on families, domains and sites and cover 31–67% (InterPro statistics) of the proteins from each of the complete genomes. CluSTr covers the three complete eukaryotic genomes and the incomplete human genome data. The Proteome Analysis Database is accompanied by a program that has been designed to carry out InterPro proteome comparisons for any one proteome against any other one or more of the proteomes in the database.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Simple sequence repeats (SSRs), consisting of tandemly repeated multiple copies of mono-, di-, tri-, or tetranucleotide motifs, are ubiquitous in eukaryotic genomes and are frequently used as genetic markers, taking advantage of their length polymorphism. We have examined the polymorphism of such sequences in the chloroplast genomes of plants, by using a PCR-based assay. GenBank searches identified the presence of several (dA)n.(dT)n mononucleotide stretches in chloroplast genomes. A chloroplast (cp) SSR was identified in three pine species (Pinus contorta, Pinus sylvestris, and Pinus thunbergii) 312 bp upstream of the psbA gene. DNA amplification of this repeated region from 11 pine species identified nine length variants. The polymorphic amplified fragments were isolated and the DNA sequence was determined, confirming that the length polymorphism was caused by variation in the length of the repeated region. In the pines, the chloroplast genome is transmitted through pollen and this PCR assay may be used to monitor gene flow in this genus. Analysis of 305 individuals from seven populations of Pinus leucodermis Ant. revealed the presence of four variants with intrapopulational diversities ranging from 0.000 to 0.629 and an average of 0.320. Restriction fragment length polymorphism analysis of cpDNA on the same populations previously failed to detect any variation. Population subdivision based on cpSSR was higher (Gst = 0.22, where Gst is coefficient of gene differentiation) than that revealed in a previous isozyme study (Gst = 0.05). We anticipate that SSR loci within the chloroplast genome should provide a highly informative assay for the analysis of the genetic structure of plant populations.
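
The Gst values quoted above can be illustrated with Nei's standard definition, Gst = (Ht - Hs)/Ht, where Hs is the mean within-population gene diversity and Ht is the diversity of the pooled population. The sketch below is an assumption-laden illustration: it uses the simple estimator H = 1 - Σp² without sample-size correction, weights populations equally, and takes hypothetical variant frequencies as input; the abstract does not state which estimator the study used.

```python
def gene_diversity(freqs):
    """Gene diversity H = 1 - sum(p_i^2) for one set of variant frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def gst(populations):
    """Coefficient of gene differentiation, Gst = (Ht - Hs) / Ht.

    populations: list of frequency lists, one per population, all giving
    the variants in the same order. Populations are weighted equally.
    """
    n_pops = len(populations)
    n_variants = len(populations[0])

    # Hs: average diversity within populations.
    h_s = sum(gene_diversity(p) for p in populations) / n_pops

    # Ht: diversity computed from the mean variant frequencies.
    mean_freqs = [sum(p[i] for p in populations) / n_pops
                  for i in range(n_variants)]
    h_t = gene_diversity(mean_freqs)

    # No variation at all means no differentiation to measure.
    return (h_t - h_s) / h_t if h_t > 0 else 0.0
```

With two hypothetical populations whose variant frequencies are (0.8, 0.2) and (0.4, 0.6), Hs = 0.40 and Ht = 0.48, giving Gst ≈ 0.17.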

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The current explosion of DNA sequence information has generated increasing evidence for the claim that noncoding repetitive DNA sequences present within and around different genes could play an important role in genetic control processes, although the precise role and mechanism by which these sequences function are poorly understood. Several of the simple repetitive sequences which occur in a large number of loci throughout the human and other eukaryotic genomes satisfy the sequence criteria for forming non-B DNA structures in vitro. We have summarized some of the features of three different types of simple repeats that highlight the importance of repetitive DNA in the control of gene expression and chromatin organization. (i) (TG/CA)n repeats are widespread and conserved in many loci. These sequences are associated with nucleosomes of varying linker length and may play a role in chromatin organization. These Z-potential sequences can help absorb superhelical stress during transcription and aid in recombination. (ii) Human telomeric repeat (TTAGGG)n adopts a novel quadruplex structure and exhibits unusual chromatin organization. This unusual structural motif could explain chromosome pairing and stability. (iii) Intragenic amplification of (CTG)n/(CAG)n trinucleotide repeat, which is now known to be associated with several genetic disorders, could down-regulate gene expression in vivo. The overall implications of these findings vis-à-vis repetitive sequences in the genome are summarized.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Horizontal gene transfer (HGT) is known to be a major force in genome evolution. The acquisition of genes from viruses by eukaryotic genomes is a well-studied example of HGT, including rare cases of non-retroviral RNA virus integration. The present study describes the integration of cucumber mosaic virus RNA-1 into the soybean genome. After an initial metatranscriptomic analysis of small RNAs derived from soybean, de novo assembly yielded a 3029-nt contig homologous to RNA-1. The integration of this sequence in the soybean genome was confirmed by DNA deep sequencing. The locus where the integration occurred harbors the full RNA-1 sequence followed by the partial sequence of an endogenous mRNA and another RNA-1 sequence as an inverted repeat, allowing the formation of a hairpin structure. This region recombined into a retrotransposon located inside an exon of a soybean gene. The nucleotide similarity of the integrated sequence to other cucumber mosaic virus sequences indicates that the integration event occurred recently. We describe a rare event of non-retroviral RNA virus integration in soybean that leads to the production of a double-stranded RNA, in a fashion similar to that of virus-resistant RNAi plants.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

With the preponderance of multidomain proteins in eukaryotic genomes, it is essential to recognize the constituent domains and their functions. Often function involves communication across domain interfaces, and knowledge of the interacting sites is essential to our understanding of the structure-function relationship. Using evolutionary information extracted from homologous domains in at least two diverse domain architectures (single and multidomain), we predict the interface residues corresponding to domains from the two-domain proteins. We also use information from the three-dimensional structures of individual domains of two-domain proteins to train a naive Bayes classifier model to predict the interfacial residues. Our predictions are highly accurate (≈85%) and specific (≈95%) to the domain-domain interfaces. This method is specific to multidomain proteins whose domains occur in more than one protein architectural context. Using predicted residues to constrain domain-domain interaction, rigid-body docking was able to provide accurate full-length protein structures with correct orientation of domains. We believe that these results can be of considerable interest for rational protein and interaction design, apart from providing valuable information on the nature of interactions. Proteins 2014; 82:1219-1234. (c) 2013 Wiley Periodicals, Inc.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Cytosine DNA methylation protects eukaryotic genomes by silencing transposons and harmful DNAs, but also regulates gene expression during normal development. Loss of CG methylation in the Arabidopsis thaliana met1 and ddm1 mutants causes varied and stochastic developmental defects that are often inherited independently of the original met1 or ddm1 mutation. Loss of non-CG methylation in plants with combined mutations in the DRM and CMT3 genes also causes a suite of developmental defects. We show here that the pleiotropic developmental defects of drm1 drm2 cmt3 triple mutant plants are fully recessive, and unlike phenotypes caused by met1 and ddm1, are not inherited independently of the drm and cmt3 mutations. Developmental phenotypes are also reversed when drm1 drm2 cmt3 plants are transformed with DRM2 or CMT3, implying that non-CG DNA methylation is efficiently re-established by sequence-specific signals. We provide evidence that these signals include RNA silencing through the 24-nucleotide short interfering RNA (siRNA) pathway as well as histone H3K9 methylation, both of which converge on the putative chromatin-remodeling protein DRD1. These signals act in at least three partially intersecting pathways that control the locus-specific patterning of non-CG methylation by the DRM2 and CMT3 methyltransferases. Our results suggest that non-CG DNA methylation that is inherited via a network of persistent targeting signals has been co-opted to regulate developmentally important genes. © 2006 Chan et al.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Recent transcription profiling studies have revealed an unexpectedly large proportion of antisense transcripts in eukaryotic genomes. These antisense genes seem to regulate gene expression by interacting with sense genes. Previous studies have focused on non-coding antisense genes, but the possible regulatory role of antisense proteins is poorly understood. In this study, we found that a protein encoded by the antisense gene ADF1 acts as a transcription suppressor, regulating the expression of the sense gene MDF1 in Saccharomyces cerevisiae. Based on evolutionary, genetic, cytological and biochemical evidence, we show that the protein-coding sense gene MDF1 most likely originated de novo from a previously non-coding sequence and can significantly suppress the mating efficiency of baker's yeast in rich medium by binding MATα2, thus promoting vegetative growth. These results shed new light on several important issues, including a new sense-antisense interaction mechanism, the de novo origination of a functional gene, and the regulation of the yeast mating pathway.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human genomes showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than on its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility of using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.