78 results for MULTIPLE SEQUENCE ALIGNMENT

in National Center for Biotechnology Information - NCBI


Relevance:

100.00%

Publisher:

Abstract:

In this paper, a new way to think about, and to construct, pairwise as well as multiple alignments of DNA and protein sequences is proposed. Rather than forcing alignments to either align single residues or to introduce gaps by defining an alignment as a path running right from the source up to the sink in the associated dot-matrix diagram, we propose to consider alignments as consistent equivalence relations defined on the set of all positions occurring in all sequences under consideration. We also propose constructing alignments from whole segments exhibiting highly significant overall similarity rather than by aligning individual residues. Consequently, we present an alignment algorithm that (i) is based on segment-to-segment comparison instead of the commonly used residue-to-residue comparison and which (ii) avoids the well-known difficulties concerning the choice of appropriate gap penalties: gaps are not treated explicitly, but remain as those parts of the sequences that do not belong to any of the aligned segments. Finally, we discuss the application of our algorithm to two test examples and compare it with commonly used alignment methods. As a first example, we aligned a set of 11 DNA sequences coding for functional helix-loop-helix proteins. Though the sequences show only low overall similarity, our program correctly aligned all of the 11 functional sites, which was a unique result among the methods tested. As a by-product, the reading frames of the sequences were identified. Next, we aligned a set of ribonuclease H proteins and compared our results with alignments produced by other programs as reported by McClure et al. [McClure, M. A., Vasi, T. K. & Fitch, W. M. (1994) Mol. Biol. Evol. 11, 571-592]. Our program was one of the best scoring programs. However, in contrast to other methods, our protein alignments are independent of user-defined parameters.
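The segment-to-segment idea in this abstract can be illustrated with a toy pairwise sketch. It is not the authors' algorithm: the function names `maximal_match_fragments` and `chain_fragments` are hypothetical, fragments are restricted to exactly identical gap-free runs, and a fragment's weight is simply its length, whereas the abstract speaks of segments with highly significant overall similarity. The sketch only shows how an alignment can be assembled from a consistent chain of segments, with gaps left as whatever lies between them.

```python
def maximal_match_fragments(a, b, min_len=3):
    """Return gap-free identical segment pairs as (start_a, start_b, length).

    Assumption: only exact matches count; a real method would score
    similarity statistically instead.
    """
    frags = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Start a fragment only where it cannot be extended to the left.
            if a[i] != b[j] or (i > 0 and j > 0 and a[i - 1] == b[j - 1]):
                continue
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                frags.append((i, j, k))
    return frags


def chain_fragments(frags):
    """Pick a maximum-weight consistent chain (toy weight = fragment length).

    Two fragments are consistent here if the second starts after the first
    ends in both sequences, so no residue is used twice and order is kept.
    """
    frags = sorted(frags)
    if not frags:
        return []
    best = [length for _, _, length in frags]
    prev = [None] * len(frags)
    for y, (i2, j2, l2) in enumerate(frags):
        for x in range(y):
            i1, j1, l1 = frags[x]
            if i1 + l1 <= i2 and j1 + l1 <= j2 and best[x] + l2 > best[y]:
                best[y] = best[x] + l2
                prev[y] = x
    y = max(range(len(frags)), key=best.__getitem__)
    chain = []
    while y is not None:
        chain.append(frags[y])
        y = prev[y]
    return chain[::-1]


a = "TTACGTGGGCAGCA"
b = "ACGTCCCAGCA"
chain = chain_fragments(maximal_match_fragments(a, b))
print(chain)
```

No gap penalty appears anywhere in the sketch: positions outside the chained fragments simply stay unaligned, which mirrors the point made in the abstract.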

Relevance:

100.00%

Publisher:

Abstract:

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as those of Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
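The first stage described in this abstract, the exact optimal ungapped score on every diagonal, can be sketched in plain Python. This is an illustration under stated assumptions, not the ParAlign implementation: the function name `ungapped_diagonal_scores` is hypothetical, a simple match/mismatch score stands in for a substitution matrix, and the loop is scalar rather than SIMD-parallel. Each diagonal is scanned with a running sum that resets at zero, which yields the best local ungapped score on that diagonal.

```python
def ungapped_diagonal_scores(query, subject, match=1, mismatch=-1):
    """Best local ungapped alignment score on each diagonal.

    Diagonals are keyed by d = j - i, where i indexes the query and j the
    subject. The match/mismatch values are illustrative assumptions.
    """
    scores = {}
    for d in range(-(len(query) - 1), len(subject)):
        best = running = 0
        i = max(0, -d)
        j = i + d
        while i < len(query) and j < len(subject):
            s = match if query[i] == subject[j] else mismatch
            # Reset to zero when the running score drops below it, so the
            # best-scoring stretch anywhere on the diagonal is found.
            running = max(0, running + s)
            best = max(best, running)
            i += 1
            j += 1
        scores[d] = best
    return scores


scores = ungapped_diagonal_scores("ACGT", "TTACGT")
print(scores[2])  # the query matches the subject exactly at offset 2
```

The later stages (combining several diagonals into an approximate gapped score, then Smith–Waterman on the selected sequences) are not covered by this sketch.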

Relevance:

100.00%

Publisher:

Abstract:

Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics, and applications of combinatorial methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way toward a new approach to gene recognition that uses previously sequenced genes as a clue for recognition of newly sequenced genes. This paper describes a spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual proteins was 99%. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused either by short (less than 5 amino acids) initial/terminal exons or by alternative splicing. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is nonvertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins at 160 accepted point mutations (PAM) (25% similarity), the correlation between the predicted and actual genes was still as high as 95%.

Relevance:

100.00%

Publisher:

Abstract:

BAliBASE is specifically designed to serve as an evaluation resource to address all the problems encountered when aligning complete sequences. The database contains high quality, manually constructed multiple sequence alignments together with detailed annotations. The alignments are all based on three-dimensional structural superpositions, with the exception of the transmembrane sequences. The first release provided sets of reference alignments dealing with the problems of high variability, unequal repartition and large N/C-terminal extensions and internal insertions. Here we describe version 2.0 of the database, which incorporates three new reference sets of alignments containing structural repeats, transmembrane sequences and circular permutations to evaluate the accuracy of detection/prediction and alignment of these complex sequences. BAliBASE can be viewed at the web site http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE2/index.html or can be downloaded from ftp://ftp-igbmc.u-strasbg.fr/pub/BAliBASE2/.
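One common way to use reference alignments of this kind is to measure how many aligned residue pairs of the reference a program reproduces. The abstract does not specify a scoring scheme, so the following is only a sketch of a sum-of-pairs style score; the function names `aligned_pairs` and `sum_of_pairs_score` are hypothetical, and alignments are assumed to be lists of equal-length gapped strings with `-` as the gap character.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Set of residue pairs placed in the same column.

    Each residue is identified as (sequence index, ungapped position).
    """
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != gap:
                present.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


reference = ["AC-GT", "ACAGT"]
test = ["ACG-T", "ACAGT"]
print(sum_of_pairs_score(test, reference))
```

In this example the test alignment misplaces one residue of the first sequence, so three of the four reference pairs are recovered.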

Relevance:

80.00%

Publisher:

Abstract:

The Mycetozoa include the cellular (dictyostelid), acellular (myxogastrid), and protostelid slime molds. However, available molecular data are in disagreement on both the monophyly and phylogenetic position of the group. Ribosomal RNA trees show the myxogastrid and dictyostelid slime molds as unrelated early branching lineages, but actin and β-tubulin trees place them together as a single coherent (monophyletic) group, closely related to the animal–fungal clade. We have sequenced the elongation factor-1α genes from one member of each division of the Mycetozoa, including Dictyostelium discoideum, for which cDNA sequences were previously available. Phylogenetic analyses of these sequences strongly support a monophyletic Mycetozoa, with the myxogastrid and dictyostelid slime molds most closely related to each other. All phylogenetic methods used also place this coherent Mycetozoan assemblage as emerging among the multicellular eukaryotes, tentatively supported as more closely related to animals + fungi than are green plants. With our data there are now three proteins that consistently support a monophyletic Mycetozoa and at least four that place these taxa within the “crown” of the eukaryote tree. We suggest that ribosomal RNA data should be more closely examined with regard to these questions, and we emphasize the importance of developing multiple sequence data sets.

Relevance:

80.00%

Publisher:

Abstract:

The parasitic bacterium Mycoplasma genitalium has a small, reduced genome with close to a basic set of genes. As a first step toward determining the families of protein domains that form the products of these genes, we have used the multiple sequence programs psi-blast and geanfammer to match the sequences of the 467 gene products of M. genitalium to the sequences of the domains that form proteins of known structure [Protein Data Bank (PDB) sequences]. PDB sequences (274) match all of 106 M. genitalium sequences and some parts of another 85; thus, 41% of its total sequences are matched in all or part. The evolutionary relationships of the PDB domains that match M. genitalium are described in the structural classification of proteins (SCOP) database. Using this information, we show that the domains in the matched M. genitalium sequences come from 114 superfamilies and that 58% of them have arisen by gene duplication. This level of duplication is more than twice that found by using pairwise sequence comparisons. The PDB domain matches also describe the domain structure of the matched sequences: just over a quarter contain one domain and the rest have combinations of two or more domains.
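The 41% figure in this abstract follows directly from the counts it gives. A short check, using only numbers stated in the abstract (the variable names are illustrative):

```python
total_products = 467   # gene products of M. genitalium
fully_matched = 106    # sequences matched over their whole length
partly_matched = 85    # sequences matched in part

matched = fully_matched + partly_matched
percent = 100 * matched / total_products
print(f"{matched} of {total_products} sequences matched ({percent:.1f}%)")
```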

Relevance:

80.00%

Publisher:

Abstract:

σ32, the product of the rpoH gene in Escherichia coli, provides promoter specificity by interacting with core RNAP. Amino acid sequence alignment of σ32 with other sigma factors in the σ70 family has revealed regions of sequence homology. We have investigated the function of the most highly conserved region, 2.2, using purified products of various rpoH alleles. Core RNAP binding analysis by glycerol gradient sedimentation has revealed reduced core RNAP affinity for one of the mutant σ32 proteins, Q80R. This reduced core interaction is exacerbated in the presence of σ70, which competes with σ32 for binding of core RNAP. When a different but more conserved amino acid was introduced at this position by site-directed mutagenesis (Q80N), this mutant sigma factor still displayed a significant reduction in its core RNAP affinity. Based on these results, we conclude that at least one specific amino acid in region 2.2 is involved in core RNAP interaction.

Relevance:

80.00%

Publisher:

Abstract:

Gp180, a duck protein that was proposed to be a cell surface receptor for duck hepatitis B virus, is the homolog of metallocarboxypeptidase D, a mammalian protein thought to function in the trans-Golgi network (TGN) in the processing of proteins that transit the secretory pathway. Both gp180 and mammalian metallocarboxypeptidase D are type I integral membrane proteins that contain a 58-residue cytosolic C-terminal tail that is highly conserved between duck and rat. To investigate the regions of the gp180 tail involved with TGN retention and intracellular trafficking, gp180 and various deletion and point mutations were expressed in the AtT-20 mouse pituitary corticotroph cell line. Full length gp180 is enriched in the TGN and also cycles to the cell surface. Truncation of the C-terminal 56 residues of the cytosolic tail eliminates the enrichment in the TGN and the retrieval from the cell surface. Truncation of 12–43 residues of the tail reduced retention in the TGN and greatly accelerated the turnover of the protein. In contrast, deletion of the C-terminal 45 residues, which truncates a potential YxxL-like sequence (FxxL), reduced the protein turnover and caused accumulation of the protein on the cell surface. A point mutation of the FxxL sequence to AxxL slowed internalization, showing that this element is important for retrieval from the cell surface. Mutation of a pair of casein kinase II sites within an acidic cluster showed that they are also important for trafficking. The present study demonstrates that multiple sequence elements within the cytoplasmic tail of gp180 participate in TGN localization.

Relevance:

80.00%

Publisher:

Abstract:

The pore-forming α subunit of large conductance voltage- and Ca²⁺-sensitive K (MaxiK) channels is regulated by a β subunit that has two membrane-spanning regions separated by an extracellular loop. To investigate the structural determinants in the pore-forming α subunit necessary for β-subunit modulation, we made chimeric constructs between a human MaxiK channel and the Drosophila homologue, which we show is insensitive to β-subunit modulation, and analyzed the topology of the α subunit. A comparison of multiple sequence alignments with hydrophobicity plots revealed that MaxiK channel α subunits have a unique hydrophobic segment (S0) at the N terminus. This segment is in addition to the six putative transmembrane segments (S1–S6) usually found in voltage-dependent ion channels. The transmembrane nature of this unique S0 region was demonstrated by in vitro translation experiments. Moreover, normal functional expression of signal sequence fusions and in vitro N-linked glycosylation experiments indicate that S0 leads to an exoplasmic N terminus. Therefore, we propose a new model where MaxiK channels have a seventh transmembrane segment at the N terminus (S0). Chimeric exchange of 41 N-terminal amino acids, including S0, from the human MaxiK channel to the Drosophila homologue transfers β-subunit regulation to the otherwise unresponsive Drosophila channel. Both the unique S0 region and the exoplasmic N terminus are necessary for this gain of function.

Relevance:

80.00%

Publisher:

Abstract:

An analysis of the x-ray structure of homodimeric avian farnesyl diphosphate synthase (geranyltransferase, EC 2.5.1.10) coupled with information about conserved amino acids obtained from a sequence alignment of 35 isoprenyl diphosphate synthases that synthesize farnesyl (C15), geranylgeranyl (C20), and higher chain length isoprenoid diphosphates suggested that the side chains of residues corresponding to F112 and F113 in the avian enzyme were important for determining the ultimate length of the hydrocarbon chains. This hypothesis was supported by site-directed mutagenesis to transform wild-type avian farnesyl diphosphate synthase (FPS) into synthases capable of producing geranylgeranyl diphosphate (F112A), geranylfarnesyl (C25) diphosphate (F113S), and longer chain prenyl diphosphates (F112A/F113S). An x-ray analysis of the structure of the F112A/F113S mutant in the apo state and with allylic substrates bound produced the strongest evidence that these mutations caused the observed change in product specificity by directly altering the size of the binding pocket for the growing isoprenoid chain in the active site of the enzyme. The proposed binding pocket in the apo mutant structure was increased in depth by 5.8 Å as compared with that for the wild-type enzyme. Allylic diphosphates were observed in the holo structures, bound through magnesium ions to the aspartates of the first of two conserved aspartate-rich sequences (D117–D121), with the hydrocarbon tails of all the ligands growing down the hydrophobic pocket toward the mutation site. A model was constructed to show how the growth of a long chain prenyl product may proceed by creation of a hydrophobic passageway from the FPS active site to the outside surface of the enzyme.

Relevance:

80.00%

Publisher:

Abstract:

Ligand-Gated Ion Channels (LGIC) are polymeric transmembrane proteins involved in the fast response to numerous neurotransmitters. All these receptors are formed by homologous subunits, and the last two decades have revealed an unexpected wealth of genes coding for these subunits. The Ligand-Gated Ion Channel database (LGICdb) has been developed to handle this increasing amount of data. The database aims to provide only one entry for each gene, containing annotated nucleic acid and protein sequences. The repository is carefully structured and the entries can be retrieved by various criteria. In addition to the sequences, the LGICdb provides multiple sequence alignments, phylogenetic analyses and atomic coordinates when available. The database is accessible via the World Wide Web (http://www.pasteur.fr/recherche/banques/LGIC/LGIC.html), where it is continuously updated. Version 16 (September 2000), available for download, contained 333 entries covering 34 species.

Relevance:

80.00%

Publisher:

Abstract:

TIGRFAMs is a collection of protein families featuring curated multiple sequence alignments, hidden Markov models and associated information designed to support the automated functional identification of proteins by sequence homology. We introduce the term ‘equivalog’ to describe members of a set of homologous proteins that are conserved with respect to function since their last common ancestor. Related proteins are grouped into equivalog families where possible, and otherwise into protein families with other hierarchically defined homology types. TIGRFAMs currently contains over 800 protein families, available for searching or downloading at www.tigr.org/TIGRFAMs. Classification by equivalog family, where achievable, complements classification by orthology, superfamily, domain or motif. It provides the information best suited for automatic assignment of specific functions to proteins from large-scale genome sequencing projects.

Relevance:

80.00%

Publisher:

Abstract:

Many persistent viruses have evolved the ability to subvert MHC class I antigen presentation. Indeed, human cytomegalovirus (HCMV) encodes at least four proteins that down-regulate cell-surface expression of class I. The HCMV unique short (US)2 glycoprotein binds newly synthesized class I molecules within the endoplasmic reticulum (ER) and subsequently targets them for proteasomal degradation. We report the crystal structure of US2 bound to the HLA-A2/Tax peptide complex. US2 associates with HLA-A2 at the junction of the peptide-binding region and the α3 domain, a novel binding surface on class I that allows US2 to bind independently of peptide sequence. Mutation of class I heavy chains confirms the importance of this binding site in vivo. Available data on class I-ER chaperone interactions indicate that chaperones would not impede US2 binding. Unexpectedly, the US2 ER-luminal domain forms an Ig-like fold. A US2 structure-based sequence alignment reveals that seven HCMV proteins, at least three of which function in immune evasion, share the same fold as US2. The structure allows design of further experiments to determine how US2 targets class I molecules for degradation.

Relevance:

80.00%

Publisher:

Abstract:

Apert syndrome (AS) is characterized by craniosynostosis (premature fusion of cranial sutures) and severe syndactyly of the hands and feet. Two activating mutations, Ser-252 → Trp and Pro-253 → Arg, in fibroblast growth factor receptor 2 (FGFR2) account for nearly all known cases of AS. To elucidate the mechanism by which these substitutions cause AS, we determined the crystal structures of these two FGFR2 mutants in complex with fibroblast growth factor 2 (FGF2). These structures demonstrate that both mutations introduce additional interactions between FGFR2 and FGF2, thereby augmenting FGFR2–FGF2 affinity. Moreover, based on these structures and sequence alignment of the FGF family, we propose that the Pro-253 → Arg mutation will indiscriminately increase the affinity of FGFR2 toward any FGF. In contrast, the Ser-252 → Trp mutation will selectively enhance the affinity of FGFR2 toward a limited subset of FGFs. These predictions are consistent with previous biochemical data describing the effects of AS mutations on FGF binding. Alterations in FGFR2 ligand affinity and specificity may allow inappropriate autocrine or paracrine activation of FGFR2. Furthermore, the distinct gain-of-function interactions observed in each crystal structure provide a model to explain the phenotypic variability among AS patients.

Relevance:

80.00%

Publisher:

Abstract:

The three-dimensional structure of Aspergillus niger pectin lyase B (PLB) has been determined by crystallographic techniques at a resolution of 1.7 Å. The model, with all 359 amino acids and 339 water molecules, refines to a final crystallographic R factor of 16.5%. The polypeptide backbone folds into a large right-handed cylinder, termed a parallel β helix. Loops of various sizes and conformations protrude from the central helix and probably confer function. The largest loop of 53 residues folds into a small domain consisting of three antiparallel β strands, one turn of an α helix, and one turn of a 3₁₀ helix. By comparison with the structure of Erwinia chrysanthemi pectate lyase C (PelC), the primary sequence alignment between the pectate and pectin lyase subfamilies has been corrected and the active site region for the pectin lyases deduced. The substrate-binding site in PLB is considerably less hydrophilic than the comparable PelC region and consists of an extensive network of highly conserved Trp and His residues. The PLB structure provides an atomic explanation for the lack of a catalytic requirement for Ca²⁺ in the pectin lyase family, in contrast to that found in the pectate lyase enzymes. Surprisingly, however, the PLB site analogous to the Ca²⁺ site in PelC is filled with a positive charge provided by a conserved Arg in the pectin lyases. The significance of the finding with regard to the enzymatic mechanism is discussed.