259 results for Alignments.


Relevance: 10.00%

Abstract:

The Identification and Classification of Bacteria (ICB) database (http://www.mbio.co.jp/icb) contains currently available information about the DNA gyrase subunit B (gyrB) gene in bacteria. The database is designed to provide the scientific community with a reference point for using gyrB as an evolutionary and taxonomic marker. Nucleic and amino acid sequence data are currently available for over 850 strains, along with alignments at several different taxonomic levels and an exhaustive review of primer selection and background information.

Relevance: 10.00%

Abstract:

The PlantsP database is a curated database that combines information derived from sequences with experimental functional genomics information. PlantsP focuses on plant protein kinases and protein phosphatases. The database will specifically provide a resource for information on a collection of T-DNA insertion mutants (knockouts) in each protein kinase and phosphatase in Arabidopsis thaliana. PlantsP also provides a curated view of each protein that includes a comprehensive annotation of functionally related sequence motifs, sequence family definitions, alignments and phylogenetic trees, and descriptive information drawn directly from the literature. PlantsP is available at http://PlantsP.sdsc.edu.

Relevance: 10.00%

Abstract:

When many protein sequences are available for estimating the time of divergence between two species, it is customary to estimate the time for each protein separately and then use the average for all proteins as the final estimate. However, it can be shown that this estimate generally has an upward bias, and that an unbiased estimate is obtained by using distances based on concatenated sequences. We have shown that two concatenation-based distances, i.e., average gamma distance weighted with sequence length (d2) and multiprotein gamma distance (d3), generally give more satisfactory results than other concatenation-based distances. Using these two distance measures for 104 protein sequences, we estimated the time of divergence between mice and rats to be approximately 33 million years ago. Similarly, the time of divergence between humans and rodents was estimated to be approximately 96 million years ago. We also investigated the dependency of time estimates on statistical methods and various assumptions made by using sequence data from eubacteria, protists, plants, fungi, and animals. Our best estimates of the times of divergence between eubacteria and eukaryotes, between protists and other eukaryotes, and between plants, fungi, and animals were 3, 1.7, and 1.3 billion years ago, respectively. However, estimates of ancient divergence times are subject to a substantial amount of error caused by uncertainty of the molecular clock, horizontal gene transfer, errors in sequence alignments, etc.
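
The length-weighted average described above can be sketched as follows. This is only an illustration under stated assumptions: the gamma-corrected distance uses the common form d = a[(1 - p)^(-1/a) - 1], the conversion to time assumes a strict molecular clock (T = d / 2r), and the function names, the shape parameter, and the rate are hypothetical choices, not values taken from the abstract.

```python
from typing import Sequence


def gamma_distance(p: float, shape: float) -> float:
    """Gamma-corrected distance from the proportion p of differing sites.

    Uses d = a * ((1 - p) ** (-1 / a) - 1), where a is the gamma shape
    parameter (an assumed input, not a value from the abstract).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if shape <= 0.0:
        raise ValueError("shape must be positive")
    return shape * ((1.0 - p) ** (-1.0 / shape) - 1.0)


def length_weighted_mean(distances: Sequence[float], lengths: Sequence[int]) -> float:
    """Average per-protein distances, weighting each by its sequence length."""
    if len(distances) != len(lengths) or not lengths:
        raise ValueError("need one length per distance")
    total = sum(lengths)
    return sum(d * n for d, n in zip(distances, lengths)) / total


def divergence_time(distance: float, rate: float) -> float:
    """Time since divergence under a strict clock: T = d / (2 * rate)."""
    return distance / (2.0 * rate)


if __name__ == "__main__":
    # Hypothetical per-protein data: proportion of differing sites and length.
    p_values = [0.10, 0.25, 0.05]
    lengths = [300, 120, 450]
    distances = [gamma_distance(p, shape=2.0) for p in p_values]
    d_weighted = length_weighted_mean(distances, lengths)
    print(d_weighted, divergence_time(d_weighted, rate=1e-9))
```

Weighting by length lets long proteins, whose distances have smaller sampling error, contribute more to the combined estimate than short ones.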

Relevance: 10.00%

Abstract:

As the number of protein folds is quite limited, a mode of analysis that will be increasingly common in the future, especially with the advent of structural genomics, is to survey and re-survey the finite parts list of folds from an expanding number of perspectives. We have developed a new resource, called PartsList, that lets one dynamically perform these comparative fold surveys. It is available on the web at http://bioinfo.mbb.yale.edu/partslist and http://www.partslist.org. The system is based on the existing fold classifications and functions as a form of companion annotation for them, providing ‘global views’ of many already completed fold surveys. The central idea in the system is that of comparison through ranking; PartsList will rank the approximately 420 folds based on more than 180 attributes. These include: (i) occurrence in a number of completely sequenced genomes (e.g. it will show the most common folds in the worm versus yeast); (ii) occurrence in the structure databank (e.g. most common folds in the PDB); (iii) both absolute and relative gene expression information (e.g. most changing folds in expression over the cell cycle); (iv) protein–protein interactions, based on experimental data in yeast and comprehensive PDB surveys (e.g. most interacting fold); (v) sensitivity to inserted transposons; (vi) the number of functions associated with the fold (e.g. most multi-functional folds); (vii) amino acid composition (e.g. most Cys-rich folds); (viii) protein motions (e.g. most mobile folds); and (ix) the level of similarity based on a comprehensive set of structural alignments (e.g. most structurally variable folds). The integration of whole-genome expression and protein–protein interaction data with structural information is a particularly novel feature of our system. 
We provide three ways of visualizing the rankings: a profiler emphasizing the progression of high and low ranks across many pre-selected attributes, a dynamic comparer for custom comparisons and a numerical rankings correlator. These allow one to directly compare very different attributes of a fold (e.g. expression level, genome occurrence and maximum motion) in the uniform numerical format of ranks. This uniform framework, in turn, highlights the way that the frequency of many of the attributes falls off with approximate power-law behavior (i.e. according to V^(-b), for attribute value V and constant exponent b), with a few folds having large values and most having small values.
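
The idea of comparing very different attributes through ranks can be illustrated with a small sketch. It is not PartsList code: the fold names and attribute values are made up, and the correlation shown is an ordinary Spearman rank correlation (Pearson correlation of ranks), which is assumed here as one plausible way a rankings correlator could work.

```python
from math import sqrt
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Rank values so that rank 1 is the largest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(a), ranks(b))


if __name__ == "__main__":
    # Hypothetical folds with two attributes on unrelated scales.
    folds = ["fold_A", "fold_B", "fold_C", "fold_D"]
    genome_occurrence = [120, 45, 300, 45]
    expression_level = [2.1, 0.4, 3.3, 0.9]
    for name, r in zip(folds, ranks(genome_occurrence)):
        print(name, r)
    print(rank_correlation(genome_occurrence, expression_level))
```

Because both attributes are reduced to ranks first, their original units and scales no longer matter for the comparison.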

Relevance: 10.00%

Abstract:

Accurate multiple alignments of 86 domains that occur in signaling proteins have been constructed and used to provide a Web-based tool (SMART: simple modular architecture research tool) that allows rapid identification and annotation of signaling domain sequences. The majority of signaling proteins are multidomain in character with a considerable variety of domain combinations known. Comparison with established databases showed that 25% of our domain set could not be deduced from SwissProt and 41% could not be annotated by Pfam. SMART is able to determine the modular architectures of single sequences or genomes; application to the entire yeast genome revealed that at least 6.7% of its genes contain one or more signaling domains, approximately 350 more than previously annotated. The process of constructing SMART predicted (i) novel domain homologues in unexpected locations such as band 4.1-homologous domains in focal adhesion kinases; (ii) previously unknown domain families, including a citron-homology domain; (iii) putative functions of domain families after identification of additional family members, for example, a ubiquitin-binding role for ubiquitin-associated domains (UBA); (iv) cellular roles for proteins, such as predicted DEATH domains in netrin receptors further implicating these molecules in axonal guidance; (v) signaling domains in known disease genes such as SPRY domains in both marenostrin/pyrin and Midline 1; (vi) domains in unexpected phylogenetic contexts such as diacylglycerol kinase homologues in yeast and bacteria; and (vii) likely protein misclassifications exemplified by a predicted pleckstrin homology domain in a Candida albicans protein, previously described as an integrin.

Relevance: 10.00%

Abstract:

We present a method for discovering conserved sequence motifs from families of aligned protein sequences. The method has been implemented as a computer program called emotif (http://motif.stanford.edu/emotif). Given an aligned set of protein sequences, emotif generates a set of motifs with a wide range of specificities and sensitivities. emotif also can generate motifs that describe possible subfamilies of a protein superfamily. A disjunction of such motifs often can represent the entire superfamily with high specificity and sensitivity. We have used emotif to generate sets of motifs from all 7,000 protein alignments in the blocks and prints databases. The resulting database, called identify (http://motif.stanford.edu/identify), contains more than 50,000 motifs. For each alignment, the database contains several motifs having a probability of matching a false positive that ranges from 10^-10 to 10^-5. Highly specific motifs are well suited for searching entire proteomes, while generating very few false predictions. identify assigns biological functions to 25–30% of all proteins encoded by the Saccharomyces cerevisiae genome and by several bacterial genomes. In particular, identify assigned functions to 172 proteins of unknown function in the yeast genome.
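
To make the notion of a motif derived from an alignment concrete, here is a deliberately naive sketch. It is not the emotif algorithm, whose internals the abstract does not describe: it simply turns each ungapped alignment column into a literal residue, a small character class, or a wildcard, and the function name and the `max_class_size` threshold are hypothetical.

```python
import re
from typing import Sequence


def column_motif(alignment: Sequence[str], max_class_size: int = 3) -> str:
    """Build a regular-expression motif from equal-length aligned sequences.

    A column with one residue becomes a literal, a column with a few
    residues becomes a character class, and any column that contains a
    gap or too many residues becomes the wildcard '.'.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    parts = []
    for column in zip(*alignment):
        residues = sorted(set(column))
        if "-" in residues or len(residues) > max_class_size:
            parts.append(".")
        elif len(residues) == 1:
            parts.append(residues[0])
        else:
            parts.append("[" + "".join(residues) + "]")
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical three-sequence alignment block.
    block = ["ACDKL", "ACEKL", "GCDRL"]
    motif = column_motif(block)
    print(motif)
    match = re.search(motif, "MMGCERLQ")
    print(match.start() if match else None)
```

Lowering `max_class_size` makes the motif more specific but less sensitive, which is a crude version of the specificity and sensitivity trade-off the abstract mentions.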

Relevance: 10.00%

Abstract:

Reports of nuclear tRNA aminoacylation and its role in tRNA nuclear export (Lund and Dahlberg, 1998; Sarkar et al., 1999; Grosshans et al., 2000a) have led to the prediction that there should be nuclear pools of aminoacyl-tRNA synthetases. We report that in budding yeast there are nuclear pools of tyrosyl-tRNA synthetase, Tys1p. By sequence alignments we predicted a Tys1p nuclear localization sequence and showed it to be sufficient for nuclear location of a passenger protein. Mutations of this nuclear localization sequence in endogenous Tys1p reduce nuclear Tys1p pools, indicating that the motif is also important for nucleus location. The mutations do not significantly affect catalytic activity, but they do cause defects in export of tRNAs to the cytosol. Despite export defects, the cells are viable, indicating that nuclear tRNA aminoacylation is not required for all tRNA nuclear export paths. Because the tRNA nuclear exportin, Los1p, is also unessential, we tested whether tRNA aminoacylation and Los1p operate in alternative tRNA nuclear export paths. No genetic interactions between aminoacyl-tRNA synthetases and Los1p were detected, indicating that tRNA nuclear aminoacylation and Los1p operate in the same export pathway or there are more than two pathways for tRNA nuclear export.

Relevance: 10.00%

Abstract:

Phospholipase A2 (PLA2) was purified about 180,000 times compared with the starting soluble-protein extract from developing elm (Ulmus glabra) seeds. On sodium dodecyl sulfate-polyacrylamide gel electrophoresis the purified fraction showed a single protein band with a mobility that corresponded to 15 kD, from which activity could be recovered. When analyzed by matrix-assisted laser-desorption ionization-time-of-flight mass spectrometry, the enzyme had a deduced mass of 13,900 D. A 53-amino acid-long N-terminal sequence was determined and aligned with other sequences, giving 62% identity to the deduced amino acid sequence of some rice (Oryza sativa) expressed sequence tag clones. The purified enzyme had an alkaline pH optimum and required Ca2+ for activity. It was unusually stable with regard to heat, acidity, and organic solvents but was sensitive to disulfide bond-reducing agents. The enzyme is a true PLA2, neither hydrolyzing the sn-1 position of phosphatidylcholine nor having any activity toward lysophosphatidylcholine or diacylglycerol. The biochemical data and amino acid sequence alignments indicate that the enzyme is related to the well-characterized family of animal secretory PLA2s and, to our knowledge, is the first plant enzyme of this type to be described.

Relevance: 10.00%

Abstract:

In this paper, a new way to think about, and to construct, pairwise as well as multiple alignments of DNA and protein sequences is proposed. Rather than forcing alignments to either align single residues or to introduce gaps by defining an alignment as a path running right from the source up to the sink in the associated dot-matrix diagram, we propose to consider alignments as consistent equivalence relations defined on the set of all positions occurring in all sequences under consideration. We also propose constructing alignments from whole segments exhibiting highly significant overall similarity rather than by aligning individual residues. Consequently, we present an alignment algorithm that (i) is based on segment-to-segment comparison instead of the commonly used residue-to-residue comparison and (ii) avoids the well-known difficulties concerning the choice of appropriate gap penalties: gaps are not treated explicitly, but remain as those parts of the sequences that do not belong to any of the aligned segments. Finally, we discuss the application of our algorithm to two test examples and compare it with commonly used alignment methods. As a first example, we aligned a set of 11 DNA sequences coding for functional helix-loop-helix proteins. Though the sequences show only low overall similarity, our program correctly aligned all of the 11 functional sites, which was a unique result among the methods tested. As a by-product, the reading frames of the sequences were identified. Next, we aligned a set of ribonuclease H proteins and compared our results with alignments produced by other programs as reported by McClure et al. [McClure, M. A., Vasi, T. K. & Fitch, W. M. (1994) Mol. Biol. Evol. 11, 571-592]. Our program was one of the best scoring programs. However, in contrast to other methods, our protein alignments are independent of user-defined parameters.
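
The segment-to-segment idea can be sketched for the pairwise case as follows. This is a simplified illustration, not the authors' program: segments are restricted to runs of identical residues, segment length stands in for the significance-based weight, and the `min_len` cutoff is a hypothetical parameter of the sketch. No gap penalty appears anywhere; residues outside the chosen segments simply stay unaligned.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    i: int       # start position in the first sequence
    j: int       # start position in the second sequence
    length: int  # number of aligned residues (gap-free)


def exact_segments(a: str, b: str, min_len: int = 3) -> List[Segment]:
    """Find maximal gap-free runs of identical residues of at least min_len."""
    found = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue  # inside a run that started earlier on this diagonal
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                found.append(Segment(i, j, k))
    return found


def best_chain(segments: List[Segment]) -> List[Segment]:
    """Pick the consistent (non-crossing, non-overlapping) chain of maximum total length."""
    if not segments:
        return []
    segs = sorted(segments, key=lambda s: (s.i + s.length, s.j + s.length))
    best = [s.length for s in segs]
    prev = [-1] * len(segs)
    for n, s in enumerate(segs):
        for m in range(n):
            t = segs[m]
            # t may precede s only if it ends before s starts in both sequences.
            if t.i + t.length <= s.i and t.j + t.length <= s.j:
                if best[m] + s.length > best[n]:
                    best[n] = best[m] + s.length
                    prev[n] = m
    n = max(range(len(segs)), key=best.__getitem__)
    chain = []
    while n != -1:
        chain.append(segs[n])
        n = prev[n]
    return chain[::-1]


if __name__ == "__main__":
    # Hypothetical protein fragments sharing two conserved segments.
    a, b = "MKVLAAGGTWPRHE", "KVLACCTWPRQQ"
    for seg in best_chain(exact_segments(a, b, min_len=4)):
        print(seg, a[seg.i:seg.i + seg.length])
```

The chaining step enforces consistency in the pairwise sense: two segments can both be kept only if one lies entirely before the other in both sequences.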

Relevance: 10.00%

Abstract:

Inherited defects in the gene for methylmalonyl-CoA mutase (EC 5.4.99.2) result in the mut forms of methylmalonic aciduria. mut0 mutations lead to the absence of detectable mutase activity and are not corrected by excess cobalamin, whereas mut- mutations exhibit residual activity when exposed to excess cobalamin. Many of the mutations that cause methylmalonic aciduria in humans affect residues in the C-terminal region of the methylmalonyl-CoA mutase. This portion of the methylmalonyl-CoA mutase sequence can be aligned with regions in other B12 (cobalamin)-dependent enzymes, including the C-terminal portion of the cobalamin-binding region of methionine synthase. The alignments allow the mutations of human methylmalonyl-CoA mutase to be mapped onto the structure of the cobalamin-binding fragment of methionine synthase from Escherichia coli (EC 2.1.1.13), which has recently been determined by x-ray crystallography. In this structure, the dimethylbenzimidazole ligand to the cobalt in free cobalamin has been displaced by a histidine ligand, and the dimethylbenzimidazole nucleotide "tail" is thrust into a deep hydrophobic pocket in the protein. Previously identified mut0 and mut- mutations (Gly-623 --> Arg, Gly-626 --> Cys, and Gly-648 --> Asp) of the mutase are predicted to interfere with the structure and/or stability of the loop that carries His-627, the presumed lower axial ligand to the cobalt of adenosylcobalamin. Two mutants that lead to severe impairment (mut0) are Gly-630 --> Glu and Gly-703 --> Arg, which map to the binding site for the dimethylbenzimidazole nucleotide substituent of adenosylcobalamin. The substitution of larger residues for glycine is predicted to block the binding of adenosylcobalamin.

Relevance: 10.00%

Abstract:

The correspondence between the transversion/transition ratio and the neighboring base composition in chloroplast DNA is examined. For 18 noncoding regions of the chloroplast genome, alignments between rice (Oryza sativa) and maize (Zea mays) were generated by two different methods. Difficulties of aligning noncoding DNA are discussed, and the alignments are analyzed in a manner that reduces alignment artifacts. Sequence divergence is < 10%, so multiple substitutions at a site are assumed to be rare. Observed substitutions were analyzed with respect to the A+T content of the two immediately flanking bases. It is shown that as this content increases, the proportion of transversions also increases. When both the 5'- and 3'-flanking nucleotides are G or C (A+T content of 0), only 25% of the observed substitutions are transversions. However, when both the 5'- and 3'-flanking nucleotides are A or T (A+T content of 2), 57% of the observed substitutions are transversions. Therefore, the influence of flanking base composition on substitutions, previously reported for a single noncoding region, is a general feature of the chloroplast genome.
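
The tabulation described above, substitutions classified by the A+T content of the two flanking bases, can be sketched as follows. The code is an illustration only: the function names are hypothetical, and the rule that a site is counted only when both flanking bases are ungapped and identical in the two sequences is an assumption of this sketch, not a detail given in the abstract.

```python
from typing import Dict, Tuple

PURINES = {"A", "G"}
BASES = set("ACGT")


def is_transversion(x: str, y: str) -> bool:
    """True when a substitution swaps a purine for a pyrimidine or vice versa."""
    return (x in PURINES) != (y in PURINES)


def transversions_by_flanking_at(seq1: str, seq2: str) -> Dict[int, Tuple[int, int]]:
    """Count substitutions in a pairwise alignment by flanking A+T content.

    Returns {at_content: (transversions, substitutions)} where at_content
    is the number of A or T bases (0, 1 or 2) immediately flanking the site.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    counts = {0: [0, 0], 1: [0, 0], 2: [0, 0]}
    for k in range(1, len(seq1) - 1):
        x, y = seq1[k], seq2[k]
        if x == y or x not in BASES or y not in BASES:
            continue  # not a substitution between two unambiguous bases
        left, right = seq1[k - 1], seq1[k + 1]
        # Assumed rule: flanking bases must be conserved and ungapped.
        if left != seq2[k - 1] or right != seq2[k + 1]:
            continue
        if left not in BASES or right not in BASES:
            continue
        at_content = (left in "AT") + (right in "AT")
        counts[at_content][1] += 1
        if is_transversion(x, y):
            counts[at_content][0] += 1
    return {k: (v[0], v[1]) for k, v in counts.items()}


if __name__ == "__main__":
    # Hypothetical aligned fragments, not real chloroplast data.
    table = transversions_by_flanking_at("GACTATAGCG", "GGCTTTAGTG")
    for at_content, (tv, total) in table.items():
        share = tv / total if total else float("nan")
        print(at_content, tv, total, share)
```

Dividing transversions by total substitutions within each A+T class gives the proportions that the abstract reports as 25% and 57% for the two extreme classes.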

Relevance: 10.00%

Abstract:

Genetic and physiological studies of the Drosophila Hyperkinetic (Hk) mutant revealed defects in the function or regulation of K+ channels encoded by the Shaker (Sh) locus. The Hk polypeptide, determined from analysis of cDNA clones, is a homologue of mammalian K+ channel beta subunits (Kv beta). Coexpression of Hk with Sh in Xenopus oocytes increases current amplitudes and changes the voltage dependence and kinetics of activation and inactivation, consistent with predicted functions of Hk in vivo. Sequence alignments show that Hk, together with mammalian Kv beta, represents an additional branch of the aldo-keto reductase superfamily. These results are relevant to understanding the function and evolutionary origin of Kv beta.

Relevance: 10.00%

Abstract:

We present homologies between archaeal and eucaryal DNA-dependent RNA polymerase (RNAP) subunits and transcription factors. The sequences of the Sulfolobus acidocaldarius subunits D, E, and N and alignments with eucaryal homologs are presented here. The similarities between archaeal transcription factors and their eucaryal homologs TFIIB and TBP have been established in other laboratories. The archaeal RNAP subunits H, K, and N, respectively, show high sequence similarity to ABC27, ABC23, and ABC10 beta (found in all three eucaryal RNAPs); subunit D, to AC40 (common to polymerase II and polymerase III) and B44 (polymerase II); and subunit L, to AC19 and B12.5. The similarity of subunit D and its eucaryal homologs to bacterial alpha is limited to the "alpha-motif," which is also present in subunit L and its eucaryal homologs. Genes encoding homologs of the related eucaryal RNAP subunits A12.2/B12.6 and also homologs of eucaryal transcription elongation factors of the TFIIS family have been detected in Sulfolobus acidocaldarius and Thermococcus celer. In archaea, the protein is not an RNAP subunit. Together with the sequence similarities between archaeal box A-containing and eucaryal TATA box-containing promoters, this shows that the archaeal and eucaryal transcription systems are truly homologous and that they differ structurally and functionally from the bacterial transcription machinery. In contrast, however, a number of genes for the archaeal transcription apparatus are organized in clusters resembling the clusters of transcription-associated genes in Bacteria.

Relevance: 10.00%

Abstract:

Brazil holds a privileged position in ethanol production. For historical and geographical reasons, the country accounts for more than 30% of world ethanol production, with a national output of more than 28 billion liters in 2014. To maximize the yield of this process, the technology associated with second-generation, or lignocellulosic, ethanol is under development. The main challenges of this technology are improving the efficiency of substrate-to-product conversion and achieving large-scale production with low-cost substrates. To improve the efficiency of the conversion process, we studied auxiliary proteins (expansins) that, together with cellulases, improve the depolymerization of lignocellulosic biomass into fermentable sugars. In addition, carbohydrate-active enzymes (CAZymes) of thermophilic origin from the organism Thermogemmatispora sp. T81 were characterized, because these proteins can maintain their activity and structural conformation at high temperatures for prolonged periods. Based on bioinformatic analyses, the genes encoding expansins from Xanthomonas campestris, Bacillus licheniformis and Trichoderma reesei were cloned and expressed in E. coli, and their gene products (the expansins) had their synergism indices (from joint action with commercial cocktails) and catalytic activity determined. Additionally, on the basis of structural alignments, a hydrolytic mechanism was proposed for them. For the bacterium Thermogemmatispora sp. T81, genomic and proteomic analyses were carried out to select enzymes overexpressed in cellulosic medium. Their genes were cloned heterologously in E. coli, and the expression products were characterized biochemically (chromatography, activity assays and hydrolysis profile) and structurally (SAXS and circular dichroism).
The synergism indices determined were 2.47, 1.96 and 2.44 for the expansins of Xanthomonas campestris, Bacillus licheniformis and Trichoderma reesei, respectively. From the structural alignments, the Asp/Glu dyad was proposed as the catalytic site in expansins. The proteomic analyses allowed the selection of four cloning targets, chosen for their high expression levels when the bacterium was grown in cellulosic medium. These proteins were characterized for activity and showed a common profile: an optimal temperature of 70 to 75 °C, an optimal pH of 5, and preferential hydrolysis of hemicellulosic substrates (xylan). The percentages of secondary structure of the proteins under study, measured by circular dichroism, agreed with theoretical predictions. Thus, the initial objectives of this project were achieved with the determination of the degree of synergism of the expansins under study and the proposal of a hydrolysis mechanism for them, considering that for more than 20 years the activity of these proteins had been defined exclusively as accessory. In addition, this study contributes to the identification and selection of genes for thermophilic CAZymes with biotechnological applications owing to their thermostable properties.