952 resultados para Prokaryotic Genomes
Resumo:
Termites have colonized many habitats and are among the most abundant animals in tropical ecosystems, which they modify considerably through their actions. The timing of their rise in abundance and of the dispersal events that gave rise to modern termite lineages is not well understood. To shed light on termite origins and diversification, we sequenced the mitochondrial genome of 48 termite species and combined them with 18 previously sequenced termite mitochondrial genomes for phylogenetic and molecular clock analyses using multiple fossil calibrations. The 66 genomes represent most major clades of termites. Unlike previous phylogenetic studies based on fewer molecular data, our phylogenetic tree is fully resolved for the lower termites. The phylogenetic positions of Macrotermitinae and Apicotermitinae are also resolved as the basal groups in the higher termites, but in the crown termitid groups, including Termitinae + Syntermitinae + Nasutitermitinae + Cubitermitinae, the position of some nodes remains uncertain. Our molecular clock tree indicates that the lineages leading to termites and Cryptocercus roaches diverged 170 Ma (153-196 Ma 95% confidence interval [CI]), that modern Termitidae arose 54 Ma (46-66 Ma 95% CI), and that the crown termitid group arose 40 Ma (35-49 Ma 95% CI). This indicates that the distribution of basal termite clades was influenced by the final stages of the breakup of Pangaea. Our inference of ancestral geographic ranges shows that the Termitidae, which includes more than 75% of extant termite species, most likely originated in Africa or Asia, and acquired their pantropical distribution after a series of dispersal and subsequent diversification events.
Resumo:
Whole genome sequences are generally accepted as excellent tools for studying evolutionary relationships. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignments could not be directly applied to the whole-genome comparison and phylogenomic studies. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. The “distances” used in these alignment-free methods are not proper distance metrics in the strict mathematical sense. In this study, we first review them in a more general frame — dissimilarity. Then we propose some new dissimilarities for phylogenetic analysis. Last three genome datasets are employed to evaluate these dissimilarities from a biological point of view.
Resumo:
The uses of genetic sequences to inform, enable or create products or services for human biomedicine are substantially different from their uses in crop-based agriculture. Here, we explore what similarities and differences may emerge in patent use and strategies, and map patent-disclosed sequences onto three important plant genomes: maize (corn), rice and soybean. We focus on those referenced in the granted patent claims to compare their uses to the approach used in human gene patenting.
Resumo:
Postnatal myofibre characteristics and muscle mass are largely determined during fetal development and may be significantly affected by epigenetic parent-of-origin effects. However, data on such effects in prenatal muscle development that could help understand unexplained variation in postnatal muscle traits are lacking. In a bovine model we studied effects of distinct maternal and paternal genomes, fetal sex, and non-genetic maternal effects on fetal myofibre characteristics and muscle mass. Data from 73 fetuses (Day153, 54% term) of four genetic groups with purebred and reciprocal cross Angus and Brahman genetics were analyzed using general linear models. Parental genomes explained the greatest proportion of variation in myofibre size of Musculus semitendinosus (80-96%) and in absolute and relative weights of M. supraspinatus, M. longissimus dorsi, M. quadriceps femoris and M. semimembranosus (82-89% and 56-93%, respectively). Paternal genome in interaction with maternal genome (P<0.05) explained most genetic variation in cross sectional area (CSA) of fast myotubes (68%), while maternal genome alone explained most genetic variation in CSA of fast myofibres (93%, P<0.01). Furthermore, maternal genome independently (M. semimembranosus, 88%, P<0.0001) or in combination (M. supraspinatus, 82%; M. longissimus dorsi, 93%; M. quadriceps femoris, 86%) with nested maternal weight effect (5-6%, P<0.05), was the predominant source of variation for absolute muscle weights. Effects of paternal genome on muscle mass decreased from thoracic to pelvic limb and accounted for all (M. supraspinatus, 97%, P<0.0001) or most (M. longissimus dorsi, 69%, P<0.0001; M. quadriceps femoris, 54%, P<0.001) genetic variation in relative weights. An interaction between maternal and paternal genomes (P<0.01) and effects of maternal weight (P<0.05) on expression of H19, a master regulator of an imprinted gene network, and negative correlations between H19 expression and fetal muscle mass (P<0.001), suggested imprinted genes and miRNA interference as mechanisms for differential effects of maternal and paternal genomes on fetal muscle.
Resumo:
The rapid increase in genome sequence information has necessitated the annotation of their functional elements, particularly those occurring in the non-coding regions, in the genomic context. Promoter region is the key regulatory region, which enables the gene to be transcribed or repressed, but it is difficult to determine experimentally. Hence an in silico identification of promoters is crucial in order to guide experimental work and to pin point the key region that controls the transcription initiation of a gene. In this analysis, we demonstrate that while the promoter regions are in general less stable than the flanking regions, their average free energy varies depending on the GC composition of the flanking genomic sequence. We have therefore obtained a set of free energy threshold values, for genomic DNA with varying GC content and used them as generic criteria for predicting promoter regions in several microbial genomes, using an in-house developed tool `PromPredict'. On applying it to predict promoter regions corresponding to the 1144 and 612 experimentally validated TSSs in E. coli (50.8% GC) and B. subtilis (43.5% GC) sensitivity of 99% and 95% and precision values of 58% and 60%, respectively, were achieved. For the limited data set of 81 TSSs available for M. tuberculosis (65.6% GC) a sensitivity of 100% and precision of 49% was obtained.
Resumo:
The csrA is a carbon storage regulator gene that encodes a protein with multiple RNA interaction sites. Bacterial non-coding small RNAs like csrB, csrC and their counterparts in diverse bacterial genus are identified to control the regulatory activities of CsrA and its orthologs. An attempt has been made in this study to identify 'novel' non-coding small RNAs that are involved in the regulatory activities of csrA gene. All CsrA-interacting small RNAs are computationally fingerprinted to have multiple occurrence of 7-nucleotide CsrA interacting repeats [CAGGA(U/A/C)G] along with a 18-nucleotide upstream binding site. However, in several of the genomes like Haemophilus spp, the upstream binding site is not identified. The current methodology overcomes this difficulty by identifying small RNA-specific orphan transcriptional units within the intergenic regions of the genome. The results could identify all known CsrA-interacting small RNAs in E. coli, Vibrio cholerae and Pseudomonas aeruginosa genomes and additionally has picked six new possible CsrA-interacting small RNA regions in E. coli. Our computational analysis indicates that known rygD and rprA sRNAs in E. coli could possibly interact with CsrA proteins. The study is extended to three of the Haemophilus genomes that could identify seven new possible CsrA interacting small RNAs.
Resumo:
Muscoidea is a significant dipteran clade that includes house flies (Family Muscidae), latrine flies (F. Fannidae), dung flies (F. Scathophagidae) and root maggot flies (F. Anthomyiidae). It is comprised of approximately 7000 described species. The monophyly of the Muscoidea and the precise relationships of muscoids to the closest superfamily the Oestroidea (blow flies, flesh flies etc) are both unresolved. Until now mitochondrial (mt) genomes were available for only two of the four muscoid families precluding a thorough test of phylogenetic relationships using this data source. Here we present the first two mt genomes for the families Fanniidae (Euryomma sp.) (family Fanniidae) and Anthomyiidae (Delia platura (Meigen, 1826)). We also conducted phylogenetic analyses containing of these newly sequenced mt genomes plus 15 other species representative of dipteran diversity to address the internal relationship of Muscoidea and its systematic position. Both maximum-likelihood and Bayesian analyses suggested that Muscoidea was not a monophyletic group with the relationship: (Fanniidae + Muscidae) + ((Anthomyiidae + Scathophagidae) + (Calliphoridae + Sarcophagidae)), supported by the majority of analysed datasets. This also infers that Oestroidea was paraphyletic in the majority of analyses. Divergence time estimation suggested that the earliest split within the Calyptratae, separating (Tachinidae + Oestridae) from the remaining families, occurred in the Early Eocene. The main divergence within the paraphyletic muscoidea grade was between Fanniidae + Muscidae and the lineage ((Anthomyiidae + Scathophagidae) + (Calliphoridae + Sarcophagidae)) which occurred in the Late Eocene
Resumo:
A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed by lists of basic edit operations. Flexible and efficient data analysis on a such typically huge collection is plausible using suffix trees. However, suffix tree occupies O(N log N) bits, which very soon inhibits in-memory analyses. Recent advances in full-text self-indexing reduce the space of suffix tree to O(N log σ) bits, where σ is the alphabet size. In practice, the space reduction is more than 10-fold, for example on suffix tree of Human Genome. However, this reduction factor remains constant when more sequences are added to the collection. We develop a new family of self-indexes suited for the repetitive sequence collection setting. Their expected space requirement depends only on the length n of the base sequence and the number s of variations in its repeated copies. That is, the space reduction factor is no longer constant, but depends on N / n. We believe the structures developed in this work will provide a fundamental basis for storage and retrieval of individual genomes as they become available due to rapid progress in the sequencing technologies.
Resumo:
Background:Bacterial non-coding small RNAs (sRNAs) have attracted considerable attention due to their ubiquitous nature and contribution to numerous cellular processes including survival, adaptation and pathogenesis. Existing computational approaches for identifying bacterial sRNAs demonstrate varying levels of success and there remains considerable room for improvement. Methodology/Principal Findings: Here we have proposed a transcriptional signal-based computational method to identify intergenic sRNA transcriptional units (TUs) in completely sequenced bacterial genomes. Our sRNAscanner tool uses position weight matrices derived from experimentally defined E. coli K-12 MG1655 sRNA promoter and rho-independent terminator signals to identify intergenic sRNA TUs through sliding window based genome scans. Analysis of genomes representative of twelve species suggested that sRNAscanner demonstrated equivalent sensitivity to sRNAPredict2, the best performing bioinformatics tool available presently. However, each algorithm yielded substantial numbers of known and uncharacterized hits that were unique to one or the other tool only. sRNAscanner identified 118 novel putative intergenic sRNA genes in Salmonella enterica Typhimurium LT2, none of which were flagged by sRNAPredict2. Candidate sRNA locations were compared with available deep sequencing libraries derived from Hfq-co-immunoprecipitated RNA purified from a second Typhimurium strain (Sittka et al. (2008) PLoS Genetics 4: e1000163). Sixteen potential novel sRNAs computationally predicted and detected in deep sequencing libraries were selected for experimental validation by Northern analysis using total RNA isolated from bacteria grown under eleven different growth conditions. RNA bands of expected sizes were detected in Northern blots for six of the examined candidates. Furthermore, the 5'-ends of these six Northern-supported sRNA candidates were successfully mapped using 5'-RACE analysis. Conclusions/Significance: We have developed, computationally examined and experimentally validated the sRNAscanner algorithm. Data derived from this study has successfully identified six novel S. Typhimurium sRNA genes. In addition, the computational specificity analysis we have undertaken suggests that similar to 40% of sRNAscanner hits with high cumulative sum of scores represent genuine, undiscovered sRNA genes. Collectively, these data strongly support the utility of sRNAscanner and offer a glimpse of its potential to reveal large numbers of sRNA genes that have to date defied identification. sRNAscanner is available from: http://bicmku.in:8081/sRNAscanner or http://cluster.physics.iisc.ernet.in/sRNAscanner/.
Resumo:
Sixteen million nucleotide sequence of genome of various organisms have been analysed to detect and study the extent of occurrence of simple repetitive sequences. Two sequence motifs (TG/CA)n and (CT/AG)n capable of adopting unusual DNA structures, left handed Z-conformation and triple-helical conformation respectively, are found to be abundant in rodent and human genomes, but almost completely absent in bacterial genome. (TG/CA)n and (CT/AG)n sequences are present mostly in the intron or 5'/3' flanking regions of the genes. The presence of such repeat motifs in genomic sequence of higher eukaryotes has been correlated with their possible functional significance in nucleosome organization, recombination and gene expression.
Resumo:
Motivation: The number of bacterial genomes being sequenced is increasing very rapidly and hence, it is crucial to have procedures for rapid and reliable annotation of their functional elements such as promoter regions, which control the expression of each gene or each transcription unit of the genome. The present work addresses this requirement and presents a generic method applicable across organisms. Results: Relative stability of the DNA double helical sequences has been used to discriminate promoter regions from non-promoter regions. Based on the difference in stability between neighboring regions, an algorithm has been implemented to predict promoter regions on a large scale over 913 microbial genome sequences. The average free energy values for the promoter regions as well as their downstream regions are found to differ, depending on their GC content. Threshold values to identify promoter regions have been derived using sequences flanking a subset of translation start sites from all microbial genomes and then used to predict promoters over the complete genome sequences. An average recall value of 72% (which indicates the percentage of protein and RNA coding genes with predicted promoter regions assigned to them) and precision of 56% is achieved over the 913 microbial genome dataset.
Resumo:
In this article we describe and demonstrate the versatility of a computer program, GENOME MAPPING, that uses interactive graphics and runs on an IRIS workstation. The program helps to visualize as well as analyse global and local patterns of genomic DNA sequences. It was developed keeping in mind the requirements of the human genome sequencing programme, which requires rapid analysis of the data. Using GENOME MAPPING one can discern signature patterns of different kinds of sequences and analyse such patterns for repetitive as well as rare sequence strings. Further, one can visualize the extent of global homology between different genomic sequences. An application of our method to the published yeast mitochondrial genome data shows similar sequence organizations in the entire sequence and in smaller subsequences.
Resumo:
In this paper, we report an analysis of the protein sequence length distribution for 13 bacteria, four archaea and one eukaryote whose genomes have been completely sequenced, The frequency distribution of protein sequence length for all the 18 organisms are remarkably similar, independent of genome size and can be described in terms of a lognormal probability distribution function. A simple stochastic model based on multiplicative processes has been proposed to explain the sequence length distribution. The stochastic model supports the random-origin hypothesis of protein sequences in genomes. Distributions of large proteins deviate from the overall lognormal behavior. Their cumulative distribution follows a power-law analogous to Pareto's law used to describe the income distribution of the wealthy. The protein sequence length distribution in genomes of organisms has important implications for microbial evolution and applications. (C) 1999 Elsevier Science B.V. All rights reserved.
Resumo:
The cis-regulatory regions on DNA serve as binding sites for proteins such as transcription factors and RNA polymerase. The combinatorial interaction of these proteins plays a crucial role in transcription initiation, which is an important point of control in the regulation of gene expression. We present here an analysis of the performance of an in silico method for predicting cis-regulatory regions in the plant genomes of Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) on the basis of free energy of DNA melting. For protein-coding genes, we achieve recall and precision of 96% and 42% for Arabidopsis and 97% and 31% for rice, respectively. For noncoding RNA genes, the program gives recall and precision of 94% and 75% for Arabidopsis and 95% and 90% for rice, respectively. Moreover, 96% of the false-positive predictions were located in noncoding regions of primary transcripts, out of which 20% were found in the first intron alone, indicating possible regulatory roles. The predictions for orthologous genes from the two genomes showed a good correlation with respect to prediction scores and promoter organization. Comparison of our results with an existing program for promoter prediction in plant genomes indicates that our method shows improved prediction capability.
Resumo:
Restriction-modification (R-M) systems are ubiquitous and are often considered primitive immune systems in bacteria. Their diversity and prevalence across the prokaryotic kingdom are an indication of their success as a defense mechanism against invading genomes. However, their cellular defense function does not adequately explain the basis for their immaculate specificity in sequence recognition and nonuniform distribution, ranging from none to too many, in diverse species. The present review deals with new developments which provide insights into the roles of these enzymes in other aspects of cellular function. In this review, emphasis is placed on novel hypotheses and various findings that have not yet been dealt with in a critical review. Emerging studies indicate their role in various cellular processes other than host defense, virulence, and even controlling the rate of evolution of the organism. We also discuss how R-M systems could have successfully evolved and be involved in additional cellular portfolios, thereby increasing the relative fitness of their hosts in the population.