84 results for Prokaryotic Genomes
in CentAUR: Central Archive University of Reading - UK
Abstract:
Diversity in the chloroplast genome of 171 accessions representing the Brassica 'C' (n = 9) genome, including domesticated and wild B. oleracea and nine inter-fertile related wild species, was investigated using six chloroplast SSR (microsatellite) markers. The lack of diversity detected among 105 cultivated and wild accessions of B. oleracea contrasted starkly with that found within its wild relatives. The vast majority of B. oleracea accessions shared a single haplotype, whereas as many as six haplotypes were detected in two wild species, B. villosa Biv. and B. cretica Lam. The SSRs proved to be highly polymorphic across haplotypes, with calculated genetic diversity values (H) of 0.23-0.87. In total, 23 different haplotypes were detected in C genome species, with an additional five haplotypes detected in B. rapa L. (A genome, n = 10) and another in B. nigra L. (B genome, n = 8). The low chloroplast diversity of B. oleracea is not suggestive of multiple domestication events. The predominant B. oleracea haplotype was also common in B. incana Ten. and present in low frequencies in B. villosa, B. macrocarpa Guss, B. rupestris Raf. and B. cretica. The chloroplast SSRs reveal a wealth of diversity within wild Brassica species that will facilitate further evolutionary and phylogeographic studies of this important crop genus.
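Illustrative note: the abstract reports genetic diversity values (H) but does not say which estimator was used. A common choice is Nei's gene diversity, H = 1 - sum(p_i^2), where p_i is the frequency of haplotype (or allele) i. The sketch below assumes that definition; the function name, the optional n/(n-1) small-sample correction, and the sample data are hypothetical and not taken from the study.

```python
from collections import Counter


def gene_diversity(haplotypes, unbiased=True):
    """Nei's gene diversity for a list of haplotype labels.

    H = 1 - sum(p_i^2); with unbiased=True the result is scaled by
    n / (n - 1), a common small-sample correction (an assumption here,
    the abstract does not state which form was used).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    h = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return h * n / (n - 1) if unbiased else h


# Hypothetical sample: one dominant haplotype and two rare ones.
sample = ["h1"] * 8 + ["h2", "h3"]
print(gene_diversity(sample, unbiased=False))
print(gene_diversity(sample))
```

A sample dominated by a single haplotype, as described for B. oleracea, gives a value near 0, while an even mix of many haplotypes pushes H towards 1.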
Abstract:
Background: We report an analysis of a protein network of functionally linked proteins, identified from a phylogenetic statistical analysis of complete eukaryotic genomes. Phylogenetic methods identify pairs of proteins that co-evolve on a phylogenetic tree, and have been shown to have a high probability of correctly identifying known functional links. Results: The eukaryotic correlated evolution network we derive displays the familiar power law scaling of connectivity. We introduce the use of explicit phylogenetic methods to reconstruct the ancestral presence or absence of proteins at the interior nodes of a phylogeny of eukaryote species. We find that the connectivity distribution of proteins at the point they arise on the tree and join the network follows a power law, as does the connectivity distribution of proteins at the time they are lost from the network. Proteins resident in the network acquire connections over time, but we find no evidence that 'preferential attachment' - the phenomenon of newly acquired connections in the network being more likely to be made to proteins with large numbers of connections - influences the network structure. We derive a 'variable rate of attachment' model in which proteins vary in their propensity to form network interactions independently of how many connections they have or of the total number of connections in the network, and show how this model can produce apparent power-law scaling without preferential attachment. Conclusion: A few simple rules can explain the topological structure and evolutionary changes to protein-interaction networks: most change is concentrated in satellite proteins of low connectivity and small phenotypic effect, and proteins differ in their propensity to form attachments. Given these rules of assembly, power law scaled networks naturally emerge from simple principles of selection, yielding protein interaction networks that retain a high degree of robustness on short time scales and evolvability on longer evolutionary time scales.
Abstract:
The eukaryotic genome is a mosaic of eubacterial and archaeal genes in addition to those unique to itself. The mosaic may have arisen as the result of two prokaryotes merging their genomes, or from genes acquired from an endosymbiont of eubacterial origin. A third possibility is that the eukaryotic genome arose from successive events of lateral gene transfer over long periods of time. This theory does not exclude the endosymbiont, but questions whether it is necessary to explain the peculiar set of eukaryotic genes. We use phylogenetic studies and reconstructions of ancestral first appearances of genes on the prokaryotic phylogeny to assess evidence for the lateral gene transfer scenario. We find that phylogenies advanced to support fusion can also arise from a succession of lateral gene transfer events. Our reconstructions of ancestral first appearances of genes reveal that the various genes that make up the eukaryotic mosaic arose at different times and in diverse lineages on the prokaryotic tree, and were not available in a single lineage. Successive events of lateral gene transfer can explain the unusual mosaic structure of the eukaryotic genome, with its content linked to the immediate adaptive value of the genes it acquired. Progress in understanding eukaryotes may come from identifying ancestral features such as the eukaryotic spliceosome that could explain why this lineage invaded, or created, the eukaryotic niche.
Abstract:
An important element of the developing field of proteomics is to understand protein-protein interactions and other functional links amongst genes. Across-species correlation methods for detecting functional links work on the premise that functionally linked proteins will tend to show a common pattern of presence and absence across a range of genomes. We describe a maximum likelihood statistical model for predicting functional gene linkages. The method detects independent instances of the correlated gain or loss of pairs of proteins on phylogenetic trees, reducing the high rates of false positives observed in conventional across-species methods that do not explicitly incorporate a phylogeny. We show, in a dataset of 10,551 protein pairs, that the phylogenetic method improves by up to 35% on across-species analyses at identifying known functionally linked proteins. The method shows that protein pairs with at least two to three correlated events of gain or loss are almost certainly functionally linked. Contingent evolution, in which one gene's presence or absence depends upon the presence of another, can also be detected phylogenetically, and may identify genes whose functional significance depends upon their interaction with other genes. Incorporating phylogenetic information improves the prediction of functional linkages. The improvement derives from having a lower rate of false positives and from detecting trends that across-species analyses miss. Phylogenetic methods can easily be incorporated into the screening of large-scale bioinformatics datasets to identify sets of protein links and to characterise gene networks.
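Illustrative note: the conventional across-species approach that the abstract contrasts with its phylogenetic model can be sketched as a plain correlation between presence/absence profiles. This is the baseline only, not the maximum likelihood method of the paper. The function name and the example profiles are hypothetical, and Pearson correlation is just one of several scores used for this purpose.

```python
from math import sqrt


def profile_correlation(a, b):
    """Pearson correlation between two binary presence/absence profiles.

    Each profile lists 1 (protein present) or 0 (absent) for the same
    ordered set of genomes. Every genome is treated as an independent
    observation, which is the simplification a phylogenetic method avoids.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a protein present (or absent) everywhere carries no signal
    return cov / sqrt(var_a * var_b)


# Hypothetical profiles over six genomes.
protein_x = [1, 1, 0, 0, 1, 0]
protein_y = [1, 1, 0, 0, 1, 0]
print(profile_correlation(protein_x, protein_y))
```

Because closely related genomes share presence/absence states by descent, a single ancestral gain can inflate this score across many species, which is consistent with the false-positive problem the abstract attributes to methods that ignore the phylogeny.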
Abstract:
Phylogenetic hypotheses for the largely South African genus Pelargonium L'Hér. (Geraniaceae) were derived based on DNA sequence data from nuclear, chloroplast and mitochondrial encoded regions. The datasets were unequally represented and comprised cpDNA trnL-F sequences for 152 taxa, nrDNA ITS sequences for 55 taxa, and mtDNA nad1 b/c exons for 51 taxa. Phylogenetic hypotheses derived from the separate three datasets were overall congruent. A single hypothesis synthesising the information in the three datasets was constructed following a total evidence approach and implementing dataset specific stepmatrices in order to correct for substitution biases. Pelargonium was found to consist of five main clades, some with contrasting evolutionary patterns with respect to biogeographic distributions, dispersal capacity, pollination biology and karyological diversification. The five main clades are structured in two (subgeneric) clades that correlate with chromosome size. One of these clades includes a "winter rainfall clade" containing more than 70% of all currently described Pelargonium species, and all restricted to the South African Cape winter rainfall region. Apart from (woody) shrubs and small herbaceous rosette subshrubs, this clade comprises a large "xerophytic" clade including geophytes, stem and leaf succulents, harbouring in total almost half of the genus. This clade is considered to be the result of in situ proliferation, possibly in response to late-Miocene and Pliocene aridification events. Nested within it is a radiation comprising c. 80 species from the geophytic Pelargonium section Hoarea, all characterised by the possession of (a series of) tunicate tubers.
Abstract:
Homeobox genes encode DNA-binding proteins, many of which are implicated in the control of embryonic development. Evolutionarily, most homeobox genes fall into two related clades: the ANTP and the PRD classes. Some genes in the ANTP class, notably Hox, ParaHox, and NK genes, have an intriguing arrangement into physical clusters. To investigate the evolutionary history of these gene clusters, we examined homeobox gene chromosomal locations in the cephalochordate amphioxus, Branchiostoma floridae. We deduce that 22 amphioxus ANTP class homeobox genes localize in just three chromosomes. One contains the Hox cluster plus AmphiEn, AmphiMnx, and AmphiDll. The ParaHox cluster resides in another chromosome, whereas a third chromosome contains the NK type homeobox genes, including AmphiMsx and AmphiTlx. By comparative analysis we infer that clustering of ANTP class homeobox genes evolved just once, during a series of extensive cis-duplication events of genes early in animal evolution. A trans-duplication event occurred later to yield the Hox and ParaHox gene clusters on different chromosomes. The results obtained have implications for understanding the origin of homeobox gene clustering, the diversification of the ANTP class of homeobox genes, and the evolution of animal genomes.
Abstract:
Nucleotides in the terminal loop of the poliovirus 2C cis-acting replication element (2C(CRE)), a 61 nt structured RNA, function as the template for the addition of two uridylate (U) residues to the viral protein VPg. This uridylylation reaction leads to the formation of VPgpUpU, which is used by the viral RNA polymerase as a nucleotide-peptide primer for genome replication. Although VPg primes both positive- and negative-strand replication, the specific requirement for 2C(CRE)-mediated uridylylation for one or both events has not been demonstrated. We have used a cell-free in vitro translation and replication reaction to demonstrate that 2C(CRE) is not required for the initiation of the negative-sense strand, which is synthesized in the absence of 2C(CRE)-mediated VPgpUpU formation. We propose that the 3' poly(A) tail could serve as the template for the formation of a VPg-poly(U) primer that functions in the initiation of negative-sense strands.
Abstract:
Biological crossover occurs during the early stages of meiosis. During this process the chromosomes undergoing crossover are synapsed together at a number of homogenous sequence sections, and it is within such synapsed sections that crossover occurs. The SVLC (Synapsing Variable Length Crossover) Algorithm recurrently synapses homogenous genetic sequences together in order of length. The genomes are considered to be flexible with crossover only being permitted within the synapsed sections. Consequently, common sequences are automatically preserved with only the genetic differences being exchanged, independent of the length of such differences. In addition to providing a rationale for variable length crossover it also provides a genotypic similarity metric for variable length genomes enabling standard niche formation techniques to be utilised. In a simple variable length test problem the SVLC algorithm outperforms current variable length crossover techniques.
Synapsing variable length crossover: An algorithm for crossing and comparing variable length genomes
Abstract:
The Synapsing Variable Length Crossover (SVLC) algorithm provides a biologically inspired method for performing meaningful crossover between variable length genomes. In addition to providing a rationale for variable length crossover it also provides a genotypic similarity metric for variable length genomes enabling standard niche formation techniques to be used with variable length genomes. Unlike other variable length crossover techniques which consider genomes to be rigid inflexible arrays and where some or all of the crossover points are randomly selected, the SVLC algorithm considers genomes to be flexible and chooses non-random crossover points based on the common parental sequence similarity. The SVLC Algorithm recurrently "glues" or synapses homogenous genetic sub-sequences together. This is done in such a way that common parental sequences are automatically preserved in the offspring with only the genetic differences being exchanged or removed, independent of the length of such differences. In a variable length test problem the SVLC algorithm is shown to outperform current variable length crossover techniques. The SVLC algorithm is also shown to work in a more realistic robot neural network controller evolution application.
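Illustrative note: the idea of synapsing shared sub-sequences and exchanging only the differences can be sketched as follows. This is a simplified reading of the abstract, not the published SVLC implementation. It assumes genomes are plain strings, synapses the longest common substring first and then recurses on the flanking regions, and picks each differing section from one parent at random. All function names, the min_len threshold and the similarity formula are hypothetical.

```python
import random


def longest_common_substring(a, b):
    """Return (start_a, start_b, length) of the longest shared contiguous run."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def synapse(a, b, min_len=2):
    """Split two genomes into ordered 'syn' (shared) and 'diff' segments."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    i, j, n = longest_common_substring(a, b)
    if n < min_len:
        return [("diff", a, b)] if (a or b) else []
    shared = a[i:i + n]
    return (synapse(a[:i], b[:j], min_len)
            + [("syn", shared, shared)]
            + synapse(a[i + n:], b[j + n:], min_len))


def crossover(a, b, rng=random, min_len=2):
    """Build a child: shared sections are kept, each difference comes from one parent."""
    parts = []
    for kind, seg_a, seg_b in synapse(a, b, min_len):
        parts.append(seg_a if kind == "syn" or rng.random() < 0.5 else seg_b)
    return "".join(parts)


def similarity(a, b, min_len=2):
    """Fraction of total genome length that lies in synapsed sections."""
    shared = sum(len(s) for kind, s, _ in synapse(a, b, min_len) if kind == "syn")
    total = len(a) + len(b)
    return 2 * shared / total if total else 1.0


parent_a, parent_b = "ABCDEFGH", "ABCXXFGH"
print(synapse(parent_a, parent_b))
print(crossover(parent_a, parent_b, random.Random(0)))
print(similarity(parent_a, parent_b))
```

Because the shared sections are copied unchanged, parents of different lengths can be recombined without disrupting the sequences they have in common, and the same segmentation yields a similarity score that could feed a niching scheme.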
Abstract:
The synapsing variable-length crossover (SVLC) algorithm provides a biologically inspired method for performing meaningful crossover between variable-length genomes. In addition to providing a rationale for variable-length crossover, it also provides a genotypic similarity metric for variable-length genomes, enabling standard niche formation techniques to be used with variable-length genomes. Unlike other variable-length crossover techniques which consider genomes to be rigid inflexible arrays and where some or all of the crossover points are randomly selected, the SVLC algorithm considers genomes to be flexible and chooses non-random crossover points based on the common parental sequence similarity. The SVLC algorithm recurrently "glues" or synapses homogenous genetic subsequences together. This is done in such a way that common parental sequences are automatically preserved in the offspring with only the genetic differences being exchanged or removed, independent of the length of such differences. In a variable-length test problem, the SVLC algorithm compares favorably with current variable-length crossover techniques. The variable-length approach is further advocated by demonstrating how a variable-length genetic algorithm (GA) can obtain a high fitness solution in fewer iterations than a traditional fixed-length GA in a two-dimensional vector approximation task.
Abstract:
Members of the genus Pseudomonas inhabit a wide variety of environments, which is reflected in their versatile metabolic capacity and broad potential for adaptation to fluctuating environmental conditions. Here, we examine and compare the genomes of a range of Pseudomonas spp. encompassing plant, insect and human pathogens, and environmental saprophytes. In addition to a large number of allelic differences of common genes that confer regulatory and metabolic flexibility, genome analysis suggests that many other factors contribute to the diversity and adaptability of Pseudomonas spp. Horizontal gene transfer has impacted the capability of pathogenic Pseudomonas spp. in terms of disease severity (Pseudomonas aeruginosa) and specificity (Pseudomonas syringae). Genome rearrangements likely contribute to adaptation, and a considerable complement of unique genes undoubtedly contributes to strain- and species-specific activities by as yet unknown mechanisms. Because of the lack of conserved phenotypic differences, the classification of the genus has long been contentious. DNA hybridization and genome-based analyses show close relationships among members of P. aeruginosa, but that isolates within the Pseudomonas fluorescens and P. syringae species are less closely related and may constitute different species. Collectively, genome sequences of Pseudomonas spp. have provided insights into pathogenesis and the genetic basis for diversity and adaptation.
Abstract:
Currently, the Genomic Threading Database (GTD) contains structural assignments for the proteins encoded within the genomes of nine eukaryotes and 101 prokaryotes. Structural annotations are carried out using a modified version of GenTHREADER, a reliable fold recognition method. The GenTHREADER annotation jobs are distributed across multiple clusters of processors using grid technology and the predictions are deposited in a relational database accessible via a web interface at http://bioinf.cs.ucl.ac.uk/GTD. Using this system, up to 84% of proteins encoded within a genome can be confidently assigned to known folds with 72% of the residues aligned. On average in the GTD, 64% of proteins encoded within a genome are confidently assigned to known folds and 58% of the residues are aligned to structures.
Abstract:
Background: Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently a bottleneck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described and we have compared a number of methods available for characterising bacterial genomic diversity, for calculating the cut-off between gene presence and absence or divergence, and shown that a simple dynamic approach using a kernel density estimator performed better than both established methods and a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.
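Illustrative note: the abstract does not describe how the kernel density estimator is turned into a presence/absence cut-off. One plausible reading, sketched below under that assumption, is to estimate the density of the normalised log-ratios and place the cut-off at the lowest point between the two largest modes. The function name, the Gaussian kernel, the rule-of-thumb bandwidth and the grid size are all hypothetical choices, not details from the paper.

```python
from math import exp, pi, sqrt
from statistics import stdev


def kde_cutoff(log_ratios, grid_size=256):
    """Return the valley between the two largest density modes, or None.

    Uses a Gaussian kernel with a rule-of-thumb bandwidth
    (1.06 * sd * n^(-1/5)); both are assumptions for this sketch.
    """
    x = list(log_ratios)
    n = len(x)
    if n < 2:
        return None
    bw = 1.06 * stdev(x) * n ** -0.2
    if bw == 0:
        return None
    lo, hi = min(x), max(x)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + k * step for k in range(grid_size)]
    norm = n * bw * sqrt(2 * pi)
    density = [sum(exp(-0.5 * ((g - v) / bw) ** 2) for v in x) / norm
               for g in grid]
    # Interior grid points that are strictly higher than both neighbours.
    peaks = [k for k in range(1, grid_size - 1)
             if density[k - 1] < density[k] > density[k + 1]]
    if len(peaks) < 2:
        return None  # unimodal data: no data-driven cut-off available
    first, second = sorted(sorted(peaks, key=density.__getitem__)[-2:])
    valley = min(range(first, second + 1), key=density.__getitem__)
    return grid[valley]


# Hypothetical log-ratios: a small "absent or divergent" cluster and a
# larger "present" cluster.
absent = [-2.5 + i / 49 for i in range(50)]
present = [-0.5 + i / 199 for i in range(200)]
print(kde_cutoff(absent + present))
```

Genes with a log-ratio below the returned value would be called absent or divergent and those above it present. The cut-off adapts to each array, which may be what the abstract means by a dynamic approach.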
Abstract:
Life-history traits vary substantially across species, and have been demonstrated to affect substitution rates. We compute genomewide, branch-specific estimates of male mutation bias (the ratio of male-to-female mutation rates) across 32 mammalian genomes and study how these vary with life-history traits (generation time, metabolic rate, and sperm competition). We also investigate the influence of life-history traits on substitution rates at unconstrained sites across a wide phylogenetic range. We observe that increased generation time is the strongest predictor of variation in both substitution rates (for which it is a negative predictor) and male mutation bias (for which it is a positive predictor). Although less significant, we also observe that estimates of metabolic rate, reflecting replication-independent DNA damage and repair mechanisms, correlate negatively with autosomal substitution rates, and positively with male mutation bias. Finally, in contrast to expectations, we find no significant correlation between sperm competition and either autosomal substitution rates or male mutation bias. Our results support the important but frequently opposite effects of some, but not all, life history traits on substitution rates. KEY WORDS: Generation time, genome evolution, metabolic rate, sperm competition.
Abstract:
Background: Targeted Induced Loci Lesions IN Genomes (TILLING) is increasingly being used to generate and identify mutations in target genes of crop genomes. TILLING populations of several thousand lines have been generated in a number of crop species including Brassica rapa. Genetic analysis of mutants identified by TILLING requires an efficient, high-throughput and cost effective genotyping method to track the mutations through numerous generations. High resolution melt (HRM) analysis has been used in a number of systems to identify single nucleotide polymorphisms (SNPs) and insertion/deletions (IN/DELs) enabling the genotyping of different types of samples. HRM is ideally suited to high-throughput genotyping of multiple TILLING mutants in complex crop genomes. To date it has been used to identify mutants and genotype single mutations. The aim of this study was to determine if HRM can facilitate downstream analysis of multiple mutant lines identified by TILLING in order to characterise allelic series of EMS induced mutations in target genes across a number of generations in complex crop genomes. Results: We demonstrate that HRM can be used to genotype allelic series of mutations in two genes, BraA.CAX1.a and BraA.MET1.a, in Brassica rapa. We analysed 12 mutations in BraA.CAX1.a and five in BraA.MET1.a over two generations including a back-cross to the wild-type. Using a commercially available HRM kit and the Lightscanner™ system we were able to detect mutations in heterozygous and homozygous states for both genes. Conclusions: Using HRM genotyping on TILLING derived mutants, it is possible to generate an allelic series of mutations within multiple target genes rapidly. Lines suitable for phenotypic analysis can be isolated approximately 8-9 months (3 generations) from receiving M3 seed of Brassica rapa from the RevGenUK TILLING service.