208 results for Genomic sequence database

in Indian Institute of Science - Bangalore - India


Relevance:

100.00%

Publisher:

Abstract:

The genomic sequences of several RNA plant viruses, including cucumber mosaic virus, brome mosaic virus, alfalfa mosaic virus and tobacco mosaic virus, have become available recently. The former two viruses are icosahedral, while the latter two are bullet-shaped and rod-shaped, respectively, in particle morphology. The non-structural 3a proteins of cucumber mosaic virus and brome mosaic virus have an amino acid sequence homology of 35% and hence are evolutionarily related. In contrast, the coat proteins exhibit little homology, although the circular dichroism spectra of these viruses are similar. The non-coding regions of the genome also exhibit variable but extensive homology. Comparison of the brome mosaic virus and alfalfa mosaic virus sequences reveals that they are probably related, although with a much larger evolutionary distance. The polypeptide folds of the coat proteins of three biologically distinct isometric plant viruses, tomato bushy stunt virus, southern bean mosaic virus and satellite tobacco necrosis virus, have been shown to display a striking resemblance. All of them consist of a topologically similar 8-stranded β-barrel. The implications of these studies for the understanding of the evolution of plant viruses will be discussed.

Relevance:

90.00%

Publisher:

Abstract:

The 3' terminal 1255 nt sequence of Physalis mottle virus (PhMV) genomic RNA has been determined from a set of overlapping cDNA clones. The open reading frame (ORF) at the 3' terminus corresponds to the amino acid sequence of the coat protein (CP) determined earlier, except for the absence of the dipeptide Lys-Leu at position 110-111. In addition, the sequence upstream of the CP gene contains the message coding for 178 amino acid residues of the C-terminus of the putative replicase protein (RP). The sequence downstream of the CP gene contains an untranslated region whose terminal 80 nucleotides can be folded into a characteristic tRNA-like structure. A phylogenetic tree constructed after separately aligning the sequences of the CP, the RP and the tRNA-like structure determined in this study with the corresponding sequences of other tymoviruses shows that PhMV, wrongly named belladonna mottle virus [BDMV(I)], is a separate tymovirus and not another strain of BDMV(E) as originally envisaged. The phylogenetic tree is identical in all three cases, showing that any subset of genomic sequence of sufficient length can be used for establishing evolutionary relationships among tymoviruses.

Relevance:

90.00%

Publisher:

Abstract:

The 3' terminal 1255 nt sequence of Physalis mottle virus (PhMV) genomic RNA has been determined from a set of overlapping cDNA clones. The open reading frame (ORF) at the 3' terminus corresponds to the amino acid sequence of the coat protein (CP) determined earlier, except for the absence of the dipeptide Lys-Leu at position 110-111. In addition, the sequence upstream of the CP gene contains the message coding for 178 amino acid residues of the C-terminus of the putative replicase protein (RP). The sequence downstream of the CP gene contains an untranslated region whose terminal 80 nucleotides can be folded into a characteristic tRNA-like structure. A phylogenetic tree constructed after separately aligning the sequences of the CP, the RP and the tRNA-like structure determined in this study with the corresponding sequences of other tymoviruses shows that PhMV, wrongly named belladonna mottle virus [BDMV(I)], is a separate tymovirus and not another strain of BDMV(E) as originally envisaged. The phylogenetic tree is identical in all three cases, showing that any subset of genomic sequence of sufficient length can be used for establishing evolutionary relationships among tymoviruses.

Relevance:

90.00%

Publisher:

Abstract:

Background: Development of sensitive sequence search procedures for the detection of distant relationships between proteins at superfamily/fold level is still a big challenge. The intermediate sequence search approach is the most frequently employed manner of identifying remote homologues effectively. In this study, examination of serine proteases of the prolyl oligopeptidase, rhomboid and subtilisin protein families was carried out using plant serine proteases as queries from two genomes, A. thaliana and O. sativa, and 13 other families of unrelated folds, to identify the distant homologues which could not be obtained using PSI-BLAST. Methodology/Principal Findings: We have proposed to start with multiple queries of classical serine protease members to identify remote homologues in families, using a rigorous approach like Cascade PSI-BLAST. We found that classical sequence-based approaches, like PSI-BLAST, showed very low sequence coverage in identifying plant serine proteases. The algorithm was applied on an enriched sequence database of homologous domains, and we obtained an overall average coverage of 88% at family level and 77% at superfamily or fold level, along with a specificity of ~100% and a Matthews correlation coefficient of 0.91. A similar approach was also implemented on 13 other protein families representing every structural class in the SCOP database. Further investigation with statistical tests, like jackknifing, helped us to better understand the influence of neighbouring protein families. Conclusions/Significance: Our study suggests that employment of multiple queries of a family for the Cascade PSI-BLAST searches is useful for predicting distant relationships effectively even at superfamily level. We have proposed a generalized strategy to cover all the distant members of a particular family using multiple query sequences. Our findings reveal that prior selection of sequences as queries and the presence of neighbouring families can be important for covering the search space effectively in minimal computational time. This study also provides an understanding of the 'bridging' role of related families.
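
The coverage, specificity and Matthews correlation coefficient quoted in this abstract can all be derived from the four confusion-matrix counts of a homology search. The sketch below is illustrative only: it assumes that "coverage" means the fraction of true family members recovered (recall), which the abstract does not state explicitly, and the function name and the counts in the usage line are hypothetical.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return (coverage, specificity, mcc) from confusion-matrix counts.

    Assumption: coverage is treated as recall, TP / (TP + FN).
    """
    coverage = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return coverage, specificity, mcc


# Hypothetical counts, not taken from the study.
coverage, specificity, mcc = confusion_metrics(tp=8, fp=1, tn=9, fn=2)
```

Because the coefficient uses all four counts, it stays informative when true homologues are far rarer than unrelated sequences, a situation in which specificity alone can look close to 100%.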

Relevance:

90.00%

Publisher:

Abstract:

Protein functional annotation relies on the identification of accurate relationships, sequence divergence being a key factor. This is especially evident when distant protein relationships are demonstrated only with three-dimensional structures. To address this challenge, we describe a computational approach to purposefully bridge gaps between related protein families through directed design of protein-like "linker" sequences. For this, we represented SCOP domain families, integrated with sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were achieved, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To analyze their ability to link proteins in homology searches, we used 3024 queries to search two databases, one containing only natural sequences and another additionally containing designed sequences. Searches in the augmented database showed up to 30% improvement in fold coverage for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the availability of designed sequences between other families within the fold established the sequence continuum to demonstrate 373 difficult relationships. Ultimately, as a practical and realistic extension, we demonstrate that such protein-like sequences can be "plugged into" routine and generic sequence database searches to empower not only remote homology detection but also fold recognition. Our statistically well-supported findings show that complementary searches in both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold. (C) 2013 Elsevier Ltd. All rights reserved.
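
The roulette wheel-based design step can be pictured as sampling one residue per alignment column, with probability proportional to that residue's weight in the column. The following is a minimal sketch of that general idea, not the authors' implementation: the profile format (a list of residue-to-weight dictionaries) and the function names are assumptions made for illustration.

```python
import random


def roulette_pick(weights, rng):
    """Pick one key with probability proportional to its (positive) weight."""
    items = [(residue, w) for residue, w in weights.items() if w > 0]
    if not items:
        raise ValueError("column has no positive weights")
    total = sum(w for _, w in items)
    spin = rng.random() * total  # where the wheel stops, in [0, total)
    running = 0.0
    for residue, w in items:
        running += w
        if spin < running:
            return residue
    return items[-1][0]  # guard against floating-point round-off


def design_sequence(profile, rng=None):
    """Sample one sequence from a profile (hypothetical format: one dict per column)."""
    rng = rng or random.Random()
    return "".join(roulette_pick(column, rng) for column in profile)


# Toy two-family "bridge" profile with made-up weights.
toy_profile = [{"A": 0.7, "G": 0.3}, {"L": 0.5, "I": 0.5}, {"K": 1.0}]
designed = design_sequence(toy_profile, random.Random(0))
```

Passing a seeded `random.Random` keeps the sampling reproducible, which matters when millions of designed sequences need to be regenerated.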

Relevance:

80.00%

Publisher:

Abstract:

Human CGI-58 (for comparative gene identification-58) and YLR099c, encoding Ict1p in Saccharomyces cerevisiae, have recently been identified as acyl-CoA-dependent lysophosphatidic acid acyltransferases. Sequence database searches for CGI-58-like proteins in Arabidopsis (Arabidopsis thaliana) revealed 24 proteins, with At4g24160, a member of the alpha/beta-hydrolase family of proteins, being the closest homolog. At4g24160 contains three motifs that are conserved across plant species: a GXSXG lipase motif, an HX4D acyltransferase motif, and V(X)3HGF, a probable lipid binding motif. Dendrogram analysis of yeast ICT1, CGI-58, and At4g24160 placed these three polypeptides in the same group. Here, we describe and characterize At4g24160 as, to our knowledge, the first soluble lysophosphatidic acid acyltransferase in plants. A lipidomics approach revealed that At4g24160 has additional triacylglycerol lipase and phosphatidylcholine hydrolyzing enzymatic activities. These data establish At4g24160, a protein with a previously unknown function, as an enzyme that might play a pivotal role in maintaining lipid homeostasis in plants by regulating both phospholipid and neutral lipid levels.
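
The three motifs named in this abstract translate directly into regular expressions, with X read as "any residue". The sketch below scans a protein sequence for them; the helper name and the toy sequence are invented for illustration, and the toy sequence is not the At4g24160 protein.

```python
import re

# Motif patterns as described in the abstract; X is taken to mean any residue.
MOTIFS = {
    "GXSXG": r"G.S.G",      # lipase motif
    "HX4D": r"H.{4}D",      # acyltransferase motif
    "VX3HGF": r"V.{3}HGF",  # probable lipid binding motif
}


def find_motifs(sequence):
    """Return {motif name: [0-based start positions]}, including overlapping hits."""
    sequence = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead matches without consuming text, so overlaps are reported.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", sequence)]
    return hits


# Made-up sequence that contains each motif once.
toy = "MAGHSLGAAHKLMNDVAAAHGF"
positions = find_motifs(toy)
```

A regular-expression hit only shows that the pattern is present; whether the motif is functional still depends on structural context and experimental evidence.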

Relevance:

80.00%

Publisher:

Abstract:

The rapid increase in genome sequence information has necessitated the annotation of functional elements, particularly those occurring in the non-coding regions, in the genomic context. The promoter region is the key regulatory region, which enables the gene to be transcribed or repressed, but it is difficult to determine experimentally. Hence, in silico identification of promoters is crucial in order to guide experimental work and to pinpoint the key region that controls the transcription initiation of a gene. In this analysis, we demonstrate that while the promoter regions are in general less stable than the flanking regions, their average free energy varies depending on the GC composition of the flanking genomic sequence. We have therefore obtained a set of free energy threshold values for genomic DNA with varying GC content and used them as generic criteria for predicting promoter regions in several microbial genomes, using an in-house developed tool, 'PromPredict'. On applying it to predict promoter regions corresponding to the 1144 and 612 experimentally validated TSSs in E. coli (50.8% GC) and B. subtilis (43.5% GC), sensitivities of 99% and 95% and precision values of 58% and 60%, respectively, were achieved. For the limited data set of 81 TSSs available for M. tuberculosis (65.6% GC), a sensitivity of 100% and a precision of 49% were obtained.
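
The stability criterion described here can be illustrated by averaging nearest-neighbour (dinucleotide) free energies over a sliding window and flagging windows that are less stable (less negative) than a cutoff. This is a rough sketch and not PromPredict itself: the energy table holds approximate unified nearest-neighbour values in kcal/mol, included as an assumption, and the window size and threshold are hypothetical, since the abstract gives neither the parameters nor the GC-dependent thresholds.

```python
# Approximate nearest-neighbour free energies (kcal/mol) per 5'->3' dinucleotide step.
# Assumption: illustrative values only; the published tool may use a different table.
NN_DG = {
    "AA": -1.00, "TT": -1.00, "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45, "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28, "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24, "GG": -1.84, "CC": -1.84,
}


def mean_free_energy(seq):
    """Average dinucleotide free energy of a DNA string (A, C, G, T only)."""
    seq = seq.upper()
    steps = [NN_DG[seq[i:i + 2]] for i in range(len(seq) - 1)]
    return sum(steps) / len(steps)


def less_stable_windows(seq, window, threshold):
    """Return (start, mean_dG) for windows whose mean dG is above the threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        dg = mean_free_energy(seq[start:start + window])
        if dg > threshold:  # less negative means less stable
            hits.append((start, dg))
    return hits


# Toy sequence: an AT-rich stretch flanked by GC-rich DNA; threshold is hypothetical.
candidates = less_stable_windows("GCGCGCATATATGCGCGC", window=4, threshold=-0.9)
```

In line with the abstract, a realistic predictor would choose the threshold according to the GC content of the surrounding genome rather than use a single fixed value as this toy example does.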

Relevance:

80.00%

Publisher:

Abstract:

In this paper, we present numerical evidence that supports the notion of minimization in the sequence space of proteins for a target conformation. We use the conformations of real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with minimum energy. We use an edge-weighted connectivity graph for ranking the residue sites with a reduced amino acid alphabet and then use continuous optimization to obtain the energy-minimizing sequences. Our methods enable the computation of a lower bound as well as a tight upper bound for the energy of a given conformation. We validate our results by using three different inter-residue energy matrices for five proteins from the PDB, and by comparing our energy-minimizing sequences with 80 million diverse sequences that are generated based on different considerations in each case. When we submitted some of our chosen energy-minimizing sequences to the Basic Local Alignment Search Tool (BLAST), we obtained some sequences from the non-redundant protein sequence database that are similar to ours, with an E-value of the order of 10^-7. In summary, we conclude that proteins show a trend towards minimizing energy in the sequence space but do not seem to adopt the global energy-minimizing sequence. The reason for this could be either that the existing energy matrices are not able to accurately represent the inter-residue interactions in the context of the protein environment, or that Nature does not push the optimization in the sequence space once the protein is able to perform its function.

Relevance:

80.00%

Publisher:

Abstract:

Background: The overwhelming majority of the Serine/Threonine protein kinases identified by gleaning archaeal and eubacterial genomes could not be classified into any of the well-known Hanks and Hunter subfamilies of protein kinases. This is because the Hanks and Hunter classification scheme was developed from eukaryotic protein kinases, which are highly divergent from their prokaryotic homologues. A large dataset of prokaryotic Serine/Threonine protein kinases recognized from genomes of prokaryotes has been used to develop a classification framework for prokaryotic Ser/Thr protein kinases. Methodology/Principal Findings: We have used traditional sequence alignment and phylogenetic approaches and clustered the prokaryotic kinases, which represent 72 subfamilies with at least 4 members in each. Such a clustering enables classification of prokaryotic Ser/Thr kinases, and it can be used as a framework to classify newly identified prokaryotic Ser/Thr kinases. After a series of searches in a comprehensive sequence database, we recognized that 38 subfamilies of prokaryotic protein kinases are associated with a specific taxonomic level. For example, 4, 6 and 3 subfamilies have been identified that are currently specific to the phyla Proteobacteria, Cyanobacteria and Actinobacteria, respectively. Similarly, subfamilies which are specific to an order, sub-order, class, family and genus have also been identified. In addition to these, we also identify organism-diverse subfamilies. Members of these clusters are from organisms of different taxonomic levels, such as archaea, bacteria, eukaryotes and viruses. Conclusion/Significance: Interestingly, the occurrence of several taxonomic-level-specific subfamilies of prokaryotic kinases contrasts with the classification of eukaryotic protein kinases, in which most of the popular subfamilies occur diversely in several eukaryotes. Many prokaryotic Ser/Thr kinases exhibit a wide variety of modular organization, which indicates a degree of complexity and protein-protein interactions in the signaling pathways in these microbes.

Relevance:

80.00%

Publisher:

Abstract:

X-ray diffraction studies on single crystals of a few viruses have led to the elucidation of their three-dimensional structure at near-atomic resolution. Both the tertiary structure of the coat protein subunit and the quaternary organization of the icosahedral capsid in these viruses are remarkably similar. These studies have led to a critical re-examination of the structural principles in the architecture of isometric viruses and suggestions of alternative mechanisms of assembly. Apart from their role in the assembly of the virus particle, the coat proteins of certain viruses have been shown to inhibit the replication of the cognate RNA, leading to cross-protection. The coat protein amino acid sequence and the genomic sequence of several spherical plant RNA viruses have been determined in the last decade. Experimental data on the mechanisms of uncoating, gene expression and replication of several classes of viruses have also become available. The functions of the non-structural proteins of some viruses have been determined. This rapid progress has provided a wealth of information on several key steps in the life cycle of RNA viruses. The function of the viral coat protein, capsid architecture, assembly and disassembly, and replication of isometric RNA plant viruses are discussed in the light of this accumulated knowledge.

Relevance:

80.00%

Publisher:

Abstract:

Sixteen million nucleotides of genomic sequence from various organisms have been analysed to detect and study the extent of occurrence of simple repetitive sequences. Two sequence motifs, (TG/CA)n and (CT/AG)n, capable of adopting unusual DNA structures (left-handed Z-conformation and triple-helical conformation, respectively), are found to be abundant in rodent and human genomes but almost completely absent in bacterial genomes. (TG/CA)n and (CT/AG)n sequences are present mostly in the introns or 5'/3' flanking regions of the genes. The presence of such repeat motifs in the genomic sequence of higher eukaryotes has been correlated with their possible functional significance in nucleosome organization, recombination and gene expression.
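
Detecting repeats such as (TG/CA)n in a sequence file comes down to searching for runs of a dinucleotide or of its reverse complement, since a TG run on one strand appears as a CA run on the other. The sketch below uses a regular expression for this; the function names and the minimum repeat count of 6 are arbitrary assumptions, not values from the study.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_repeats(seq, unit, min_repeats=6):
    """Return (start, end, matched_text) for runs of `unit` or its reverse complement."""
    seq = seq.upper()
    units = sorted({unit.upper(), reverse_complement(unit.upper())})
    # For TG and min_repeats=6 this builds the pattern (?:CA){6,}|(?:TG){6,}
    pattern = "|".join(f"(?:{u}){{{min_repeats},}}" for u in units)
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, seq)]


# Toy sequence: seven TG units (reported) and three CA units (too short to report).
toy = "AAA" + "TG" * 7 + "CCC" + "CA" * 3 + "GGG"
tg_hits = find_repeats(toy, "TG")
```

Calling `find_repeats(sequence, "CT")` covers the (CT/AG)n family in the same way, because the reverse complement of CT is AG.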

Relevance:

80.00%

Publisher:

Abstract:

X-ray diffraction studies on single crystals of a few viruses have led to the elucidation of their three-dimensional structure at near-atomic resolution. Both the tertiary structure of the coat protein subunit and the quaternary organization of the icosahedral capsid in these viruses are remarkably similar. These studies have led to a critical re-examination of the structural principles in the architecture of isometric viruses and suggestions of alternative mechanisms of assembly. Apart from their role in the assembly of the virus particle, the coat proteins of certain viruses have been shown to inhibit the replication of the cognate RNA, leading to cross-protection. The coat protein amino acid sequence and the genomic sequence of several spherical plant RNA viruses have been determined in the last decade. Experimental data on the mechanisms of uncoating, gene expression and replication of several classes of viruses have also become available. The functions of the non-structural proteins of some viruses have been determined. This rapid progress has provided a wealth of information on several key steps in the life cycle of RNA viruses. The function of the viral coat protein, capsid architecture, assembly and disassembly, and replication of isometric RNA plant viruses are discussed in the light of this accumulated knowledge.

Relevance:

80.00%

Publisher:

Abstract:

Thiolases are important in fatty-acid degradation and biosynthetic pathways. Analysis of the genomic sequence of Mycobacterium smegmatis suggests the presence of several putative thiolase genes. One of these genes appears to code for an SCP-x protein. Human SCP-x consists of an N-terminal domain (referred to as SCP2 thiolase) and a C-terminal domain (referred to as sterol carrier protein 2). Here, the cloning, expression, purification and crystallization of this putative SCP-x protein from M. smegmatis are reported. The crystals diffracted X-rays to 2.5 Å resolution and belonged to the triclinic space group P1. Calculation of rotation functions using X-ray diffraction data suggests that the protein is likely to possess a hexameric oligomerization with 32 symmetry, which has not been observed in the other six known classes of this enzyme.

Relevance:

80.00%

Publisher:

Abstract:

The ability to metabolize aromatic beta-glucosides such as salicin and arbutin varies among members of the Enterobacteriaceae. The ability of Escherichia coli to degrade salicin and arbutin appears to be cryptic, subject to activation of the bgl genes, whereas many members of the Klebsiella genus can metabolize these sugars. We have examined the genetic basis for beta-glucoside utilization in Klebsiella aerogenes. The Klebsiella equivalents of bglG, bglB and bglR have been cloned using the genome sequence database of Klebsiella pneumoniae. Nucleotide sequencing shows that the K. aerogenes bgl genes show substantial similarities to their E. coli counterparts. The K. aerogenes bgl genes in multiple copies can also complement E. coli mutants deficient in bglG, encoding the antiterminator, and bglB, encoding the phospho-beta-glucosidase, suggesting that they are functional homologues. The regulatory region bglR of K. aerogenes shows a high degree of similarity in the sequences involved in BglG-mediated regulation. Interestingly, the regions corresponding to the negative elements present in the E. coli regulatory region show substantial divergence in K. aerogenes. The possible evolutionary implications of the results are discussed. (C) 2003 Federation of European Microbiological Societies. Published by Elsevier Science B.V. All rights reserved.

Relevance:

80.00%

Publisher:

Abstract:

Physical clustering of genes has been shown in plants; however, little is known about gene clusters that have different functions, particularly those expressed in the tomato fruit. A class I 17.6 small heat shock protein (Sl17.6 shsp) gene was cloned and used as a probe to screen a tomato (Solanum lycopersicum) genomic library. An 8.3-kb genomic fragment was isolated and its DNA sequence determined. Analysis of the genomic fragment identified intronless open reading frames of three class I shsp genes (Sl17.6, Sl20.0, and Sl20.1), with the Sl17.6 gene flanked by Sl20.1 and Sl20.0, and with complete 5' and 3' UTRs. Upstream of the Sl20.0 shsp, and within the shsp gene cluster, resides a box C/D snoRNA cluster made of SlsnoR12.1 and SlU24a. The characteristic C and D, and C' and D', boxes are conserved in SlsnoR12.1 and SlU24a, while the upstream flanking region of SlsnoR12.1 carries TATA box 1, homol-E and homol-D box-like cis sequences, the TM6 promoter, and an uncharacterized tomato EST. Molecular phylogenetic analysis revealed that this particular arrangement of shsps is conserved in the tomato genome but is distinct from other species. The intronless genomic sequence is decorated with cis elements previously shown to be responsive to cues from plant hormones, dehydration, cold, heat, and MYC/MYB and WRKY71 transcription factors. Chromosomal mapping localized the tomato genomic sequence on the short arm of chromosome 6 in the introgression line (IL) 6-3. Quantitative polymerase chain reaction analysis of gene cluster members revealed differential expression during ripening of tomato fruit, and relatively different abundances in other plant parts.