65 resultados para Prokaryotic Genomes
Resumo:
Background:Overwhelming majority of the Serine/Threonine protein kinases identified by gleaning archaeal and eubacterial genomes could not be classified into any of the well known Hanks and Hunter subfamilies of protein kinases. This is owing to the development of Hanks and Hunter classification scheme based on eukaryotic protein kinases which are highly divergent from their prokaryotic homologues. A large dataset of prokaryotic Serine/Threonine protein kinases recognized from genomes of prokaryotes have been used to develop a classification framework for prokaryotic Ser/Thr protein kinases. Methodology/Principal Findings: We have used traditional sequence alignment and phylogenetic approaches and clustered the prokaryotic kinases which represent 72 subfamilies with at least 4 members in each. Such a clustering enables classification of prokaryotic Ser/Thr kinases and it can be used as a framework to classify newly identified prokaryotic Ser/Thr kinases. After series of searches in a comprehensive sequence database we recognized that 38 subfamilies of prokaryotic protein kinases are associated to a specific taxonomic level. For example 4, 6 and 3 subfamilies have been identified that are currently specific to phylum proteobacteria, cyanobacteria and actinobacteria respectively. Similarly subfamilies which are specific to an order, sub-order, class, family and genus have also been identified. In addition to these, we also identify organism-diverse subfamilies. Members of these clusters are from organisms of different taxonomic levels, such as archaea, bacteria, eukaryotes and viruses.Conclusion/Significance: Interestingly, occurrence of several taxonomic level specific subfamilies of prokaryotic kinases contrasts with classification of eukaryotic protein kinases in which most of the popular subfamilies of eukaryotic protein kinases occur diversely in several eukaryotes. Many prokaryotic Ser/Thr kinases exhibit a wide variety of modular organization which indicates a degree of complexity and protein-protein interactions in the signaling pathways in these microbes.
Resumo:
The rapid increase in genome sequence information has necessitated the annotation of their functional elements, particularly those occurring in the non-coding regions, in the genomic context. Promoter region is the key regulatory region, which enables the gene to be transcribed or repressed, but it is difficult to determine experimentally. Hence an in silico identification of promoters is crucial in order to guide experimental work and to pin point the key region that controls the transcription initiation of a gene. In this analysis, we demonstrate that while the promoter regions are in general less stable than the flanking regions, their average free energy varies depending on the GC composition of the flanking genomic sequence. We have therefore obtained a set of free energy threshold values, for genomic DNA with varying GC content and used them as generic criteria for predicting promoter regions in several microbial genomes, using an in-house developed tool `PromPredict'. On applying it to predict promoter regions corresponding to the 1144 and 612 experimentally validated TSSs in E. coli (50.8% GC) and B. subtilis (43.5% GC) sensitivity of 99% and 95% and precision values of 58% and 60%, respectively, were achieved. For the limited data set of 81 TSSs available for M. tuberculosis (65.6% GC) a sensitivity of 100% and precision of 49% was obtained.
Resumo:
The csrA is a carbon storage regulator gene that encodes a protein with multiple RNA interaction sites. Bacterial non-coding small RNAs like csrB, csrC and their counterparts in diverse bacterial genus are identified to control the regulatory activities of CsrA and its orthologs. An attempt has been made in this study to identify 'novel' non-coding small RNAs that are involved in the regulatory activities of csrA gene. All CsrA-interacting small RNAs are computationally fingerprinted to have multiple occurrence of 7-nucleotide CsrA interacting repeats [CAGGA(U/A/C)G] along with a 18-nucleotide upstream binding site. However, in several of the genomes like Haemophilus spp, the upstream binding site is not identified. The current methodology overcomes this difficulty by identifying small RNA-specific orphan transcriptional units within the intergenic regions of the genome. The results could identify all known CsrA-interacting small RNAs in E. coli, Vibrio cholerae and Pseudomonas aeruginosa genomes and additionally has picked six new possible CsrA-interacting small RNA regions in E. coli. Our computational analysis indicates that known rygD and rprA sRNAs in E. coli could possibly interact with CsrA proteins. The study is extended to three of the Haemophilus genomes that could identify seven new possible CsrA interacting small RNAs.
Resumo:
Background:Bacterial non-coding small RNAs (sRNAs) have attracted considerable attention due to their ubiquitous nature and contribution to numerous cellular processes including survival, adaptation and pathogenesis. Existing computational approaches for identifying bacterial sRNAs demonstrate varying levels of success and there remains considerable room for improvement. Methodology/Principal Findings: Here we have proposed a transcriptional signal-based computational method to identify intergenic sRNA transcriptional units (TUs) in completely sequenced bacterial genomes. Our sRNAscanner tool uses position weight matrices derived from experimentally defined E. coli K-12 MG1655 sRNA promoter and rho-independent terminator signals to identify intergenic sRNA TUs through sliding window based genome scans. Analysis of genomes representative of twelve species suggested that sRNAscanner demonstrated equivalent sensitivity to sRNAPredict2, the best performing bioinformatics tool available presently. However, each algorithm yielded substantial numbers of known and uncharacterized hits that were unique to one or the other tool only. sRNAscanner identified 118 novel putative intergenic sRNA genes in Salmonella enterica Typhimurium LT2, none of which were flagged by sRNAPredict2. Candidate sRNA locations were compared with available deep sequencing libraries derived from Hfq-co-immunoprecipitated RNA purified from a second Typhimurium strain (Sittka et al. (2008) PLoS Genetics 4: e1000163). Sixteen potential novel sRNAs computationally predicted and detected in deep sequencing libraries were selected for experimental validation by Northern analysis using total RNA isolated from bacteria grown under eleven different growth conditions. RNA bands of expected sizes were detected in Northern blots for six of the examined candidates. Furthermore, the 5'-ends of these six Northern-supported sRNA candidates were successfully mapped using 5'-RACE analysis. Conclusions/Significance: We have developed, computationally examined and experimentally validated the sRNAscanner algorithm. Data derived from this study has successfully identified six novel S. Typhimurium sRNA genes. In addition, the computational specificity analysis we have undertaken suggests that similar to 40% of sRNAscanner hits with high cumulative sum of scores represent genuine, undiscovered sRNA genes. Collectively, these data strongly support the utility of sRNAscanner and offer a glimpse of its potential to reveal large numbers of sRNA genes that have to date defied identification. sRNAscanner is available from: http://bicmku.in:8081/sRNAscanner or http://cluster.physics.iisc.ernet.in/sRNAscanner/.
Resumo:
Sixteen million nucleotide sequence of genome of various organisms have been analysed to detect and study the extent of occurrence of simple repetitive sequences. Two sequence motifs (TG/CA)n and (CT/AG)n capable of adopting unusual DNA structures, left handed Z-conformation and triple-helical conformation respectively, are found to be abundant in rodent and human genomes, but almost completely absent in bacterial genome. (TG/CA)n and (CT/AG)n sequences are present mostly in the intron or 5'/3' flanking regions of the genes. The presence of such repeat motifs in genomic sequence of higher eukaryotes has been correlated with their possible functional significance in nucleosome organization, recombination and gene expression.
Resumo:
Motivation: The number of bacterial genomes being sequenced is increasing very rapidly and hence, it is crucial to have procedures for rapid and reliable annotation of their functional elements such as promoter regions, which control the expression of each gene or each transcription unit of the genome. The present work addresses this requirement and presents a generic method applicable across organisms. Results: Relative stability of the DNA double helical sequences has been used to discriminate promoter regions from non-promoter regions. Based on the difference in stability between neighboring regions, an algorithm has been implemented to predict promoter regions on a large scale over 913 microbial genome sequences. The average free energy values for the promoter regions as well as their downstream regions are found to differ, depending on their GC content. Threshold values to identify promoter regions have been derived using sequences flanking a subset of translation start sites from all microbial genomes and then used to predict promoters over the complete genome sequences. An average recall value of 72% (which indicates the percentage of protein and RNA coding genes with predicted promoter regions assigned to them) and precision of 56% is achieved over the 913 microbial genome dataset.
Resumo:
In this article we describe and demonstrate the versatility of a computer program, GENOME MAPPING, that uses interactive graphics and runs on an IRIS workstation. The program helps to visualize as well as analyse global and local patterns of genomic DNA sequences. It was developed keeping in mind the requirements of the human genome sequencing programme, which requires rapid analysis of the data. Using GENOME MAPPING one can discern signature patterns of different kinds of sequences and analyse such patterns for repetitive as well as rare sequence strings. Further, one can visualize the extent of global homology between different genomic sequences. An application of our method to the published yeast mitochondrial genome data shows similar sequence organizations in the entire sequence and in smaller subsequences.
Resumo:
In this paper, we report an analysis of the protein sequence length distribution for 13 bacteria, four archaea and one eukaryote whose genomes have been completely sequenced, The frequency distribution of protein sequence length for all the 18 organisms are remarkably similar, independent of genome size and can be described in terms of a lognormal probability distribution function. A simple stochastic model based on multiplicative processes has been proposed to explain the sequence length distribution. The stochastic model supports the random-origin hypothesis of protein sequences in genomes. Distributions of large proteins deviate from the overall lognormal behavior. Their cumulative distribution follows a power-law analogous to Pareto's law used to describe the income distribution of the wealthy. The protein sequence length distribution in genomes of organisms has important implications for microbial evolution and applications. (C) 1999 Elsevier Science B.V. All rights reserved.
Resumo:
The cis-regulatory regions on DNA serve as binding sites for proteins such as transcription factors and RNA polymerase. The combinatorial interaction of these proteins plays a crucial role in transcription initiation, which is an important point of control in the regulation of gene expression. We present here an analysis of the performance of an in silico method for predicting cis-regulatory regions in the plant genomes of Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) on the basis of free energy of DNA melting. For protein-coding genes, we achieve recall and precision of 96% and 42% for Arabidopsis and 97% and 31% for rice, respectively. For noncoding RNA genes, the program gives recall and precision of 94% and 75% for Arabidopsis and 95% and 90% for rice, respectively. Moreover, 96% of the false-positive predictions were located in noncoding regions of primary transcripts, out of which 20% were found in the first intron alone, indicating possible regulatory roles. The predictions for orthologous genes from the two genomes showed a good correlation with respect to prediction scores and promoter organization. Comparison of our results with an existing program for promoter prediction in plant genomes indicates that our method shows improved prediction capability.
Resumo:
Restriction-modification (R-M) systems are ubiquitous and are often considered primitive immune systems in bacteria. Their diversity and prevalence across the prokaryotic kingdom are an indication of their success as a defense mechanism against invading genomes. However, their cellular defense function does not adequately explain the basis for their immaculate specificity in sequence recognition and nonuniform distribution, ranging from none to too many, in diverse species. The present review deals with new developments which provide insights into the roles of these enzymes in other aspects of cellular function. In this review, emphasis is placed on novel hypotheses and various findings that have not yet been dealt with in a critical review. Emerging studies indicate their role in various cellular processes other than host defense, virulence, and even controlling the rate of evolution of the organism. We also discuss how R-M systems could have successfully evolved and be involved in additional cellular portfolios, thereby increasing the relative fitness of their hosts in the population.
Resumo:
Candida albicans and Candida dubliniensis are diploid, predominantly asexual human-pathogenic yeasts. In this study, we constructed tetraploid (4n) strains of C. albicans of the same or different lineages by spheroplast fusion. Induction of chromosome loss in the tetraploid C. albicans generated diploid or near-diploid progeny strains but did not produce any haploid progeny. We also constructed stable heterotetraploid somatic hybrid strains (2n + 2n) of C. albicans and C. dubliniensis by spheroplast fusion. Heterodiploid (n + n) progeny hybrids were obtained after inducing chromosome loss in a stable heterotetraploid hybrid. To identify a subset of hybrid heterodiploid progeny strains carrying at least one copy of all chromosomes of both species, unique centromere sequences of various chromosomes of each species were used as markers in PCR analysis. The reduction of chromosome content was confirmed by a comparative genome hybridization (CGH) assay. The hybrid strains were found to be stably propagated. Chromatin immunoprecipitation (ChIP) assays with antibodies against centromere-specific histones (C. albicans Cse4/C. dubliniensis Cse4) revealed that the centromere identity of chromosomes of each species is maintained in the hybrid genomes of the heterotetraploid and heterodiploid strains. Thus, our results suggest that the diploid genome content is not obligatory for the survival of either C. albicans or C. dubliniensis. In keeping with the recent discovery of the existence of haploid C. albicans strains, the heterodiploid strains of our study can be excellent tools for further species-specific genome elimination, yielding true haploid progeny of C. albicans or C. dubliniensis in future.
Resumo:
We report the crystal structure of the first prokaryotic aspartic proteinase-like domain identified in the genome of Mycobacterium tuberculosis. A search in the genomes of Mycobacterium species showed that the C-terminal domains of some of the PE family proteins contain two classic DT/SG motifs of aspartic proteinases with a low overall sequence similarity to HIV proteinase. The three-dimensional structure of one of them, Rv0977 (PE_PGRS16) of M. tuberculosis revealed the characteristic pepsinf-old and catalytic site architecture. However, the active site was completely blocked by the N-terminal His-tag. Surprisingly, the enzyme was found to be inactive even after the removal of the N-terminal His-tag. A comparison of the structure with pepsins showed significant differences in the critical substrate binding residues and in the flap tyrosine conformation that could contribute to the lack of proteolytic activity of Rv0977. (C) 2013 The Authors. Published by Elsevier B.V. on behalf of Federation of European Biochemical Societies. All rights reserved.
Resumo:
The highly modular nature of protein kinases generates diverse functional roles mediated by evolutionary events such as domain recombination, insertion and deletion of domains. Usually domain architecture of a kinase is related to the subfamily to which the kinase catalytic domain belongs. However outlier kinases with unusual domain architectures serve in the expansion of the functional space of the protein kinase family. For example, Src kinases are made-up of SH2 and SH3 domains in addition to the kinase catalytic domain. A kinase which lacks these two domains but retains sequence characteristics within the kinase catalytic domain is an outlier that is likely to have modes of regulation different from classical src kinases. This study defines two types of outlier kinases: hybrids and rogues depending on the nature of domain recombination. Hybrid kinases are those where the catalytic kinase domain belongs to a kinase subfamily but the domain architecture is typical of another kinase subfamily. Rogue kinases are those with kinase catalytic domain characteristic of a kinase subfamily but the domain architecture is typical of neither that subfamily nor any other kinase subfamily. This report provides a consolidated set of such hybrid and rogue kinases gleaned from six eukaryotic genomes-S. cerevisiae, D. melanogaster, C. elegans, M. musculus, T. rubripes and H. sapiens-and discusses their functions. The presence of such kinases necessitates a revisiting of the classification scheme of the protein kinase family using full length sequences apart from classical classification using solely the sequences of kinase catalytic domains. The study of these kinases provides a good insight in engineering signalling pathways for a desired output. Lastly, identification of hybrids and rogues in pathogenic protozoa such as P. falciparum sheds light on possible strategies in host-pathogen interactions.
Resumo:
We have identified strong topoisomerase sites (STS) for Mycobacteruim smegmatis topoisomerase I in double-stranded DNA context using electrophoretic mobility shift assay of enzyme-DNA covalent complexes; Mg2+, an essential component for DNA relaxation activity of the enzyme, is not required for binding to DNA, The enzyme makes single-stranded nicks, with transient covalent interaction at the 5'-end of the broken DNA strand, a characteristic akin to prokaryotic topoisomerases. More importantly, the enzyme binds to duplex DNA having a preferred site with high affinity, a. property similar to the eukaryotic type I topoisomerases, The preferred cleavage site is mapped on a 65 bp duplex DNA and found to be CG/TCTT. Thus, the enzyme resembles other prokaryotic type I topoisomerases in mechanistics of the reaction, but is similar to eukaryotic enzymes in DNA recognition properties.
Resumo:
Background: Tuberculosis still remains one of the largest killer infectious diseases, warranting the identification of newer targets and drugs. Identification and validation of appropriate targets for designing drugs are critical steps in drug discovery, which are at present major bottle-necks. A majority of drugs in current clinical use for many diseases have been designed without the knowledge of the targets, perhaps because standard methodologies to identify such targets in a high-throughput fashion do not really exist. With different kinds of 'omics' data that are now available, computational approaches can be powerful means of obtaining short-lists of possible targets for further experimental validation. Results: We report a comprehensive in silico target identification pipeline, targetTB, for Mycobacterium tuberculosis. The pipeline incorporates a network analysis of the protein-protein interactome, a flux balance analysis of the reactome, experimentally derived phenotype essentiality data, sequence analyses and a structural assessment of targetability, using novel algorithms recently developed by us. Using flux balance analysis and network analysis, proteins critical for survival of M. tuberculosis are first identified, followed by comparative genomics with the host, finally incorporating a novel structural analysis of the binding sites to assess the feasibility of a protein as a target. Further analyses include correlation with expression data and non-similarity to gut flora proteins as well as 'anti-targets' in the host, leading to the identification of 451 high-confidence targets. Through phylogenetic profiling against 228 pathogen genomes, shortlisted targets have been further explored to identify broad-spectrum antibiotic targets, while also identifying those specific to tuberculosis. Targets that address mycobacterial persistence and drug resistance mechanisms are also analysed. Conclusion: The pipeline developed provides rational schema for drug target identification that are likely to have high rates of success, which is expected to save enormous amounts of money, resources and time in the drug discovery process. A thorough comparison with previously suggested targets in the literature demonstrates the usefulness of the integrated approach used in our study, highlighting the importance of systems-level analyses in particular. The method has the potential to be used as a general strategy for target identification and validation and hence significantly impact most drug discovery programmes.