24 results for Sequence Alignment

in National Center for Biotechnology Information - NCBI


Relevance:

100.00%

Publisher:

Abstract:

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
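As a rough illustration of the first ParAlign stage described above, the sketch below computes the best ungapped local score on every diagonal of the alignment matrix. It is a plain scalar Python version, not the SIMD implementation from the paper; the function name and the simple match/mismatch scoring (in place of a substitution matrix) are assumptions made for illustration only.

```python
def best_ungapped_diagonal_scores(query, subject, match=1, mismatch=-1):
    """Return {diagonal offset d: best ungapped local score on that diagonal}.

    Diagonal d pairs query[i] with subject[i + d]. The scoring scheme
    (match/mismatch) is a hypothetical stand-in for a substitution matrix.
    """
    scores = {}
    for d in range(-(len(query) - 1), len(subject)):
        best = running = 0
        i_start = max(0, -d)                        # first query index on this diagonal
        i_end = min(len(query), len(subject) - d)   # one past the last query index
        for i in range(i_start, i_end):
            s = match if query[i] == subject[i + d] else mismatch
            running = max(0, running + s)           # restart when the running sum drops below zero
            best = max(best, running)
        scores[d] = best
    return scores


if __name__ == "__main__":
    result = best_ungapped_diagonal_scores("ACGT", "TTACGTTT")
    print(max(result, key=result.get), max(result.values()))
```

In a ParAlign-like pipeline, the per-diagonal scores would then be combined heuristically to decide which database sequences deserve a full Smith–Waterman alignment; that combination step is not sketched here.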

Relevance:

100.00%

Publisher:

Abstract:

In this paper, a new way to think about, and to construct, pairwise as well as multiple alignments of DNA and protein sequences is proposed. Rather than forcing alignments to either align single residues or to introduce gaps by defining an alignment as a path running right from the source up to the sink in the associated dot-matrix diagram, we propose to consider alignments as consistent equivalence relations defined on the set of all positions occurring in all sequences under consideration. We also propose constructing alignments from whole segments exhibiting highly significant overall similarity rather than by aligning individual residues. Consequently, we present an alignment algorithm that (i) is based on segment-to-segment comparison instead of the commonly used residue-to-residue comparison and which (ii) avoids the well-known difficulties concerning the choice of appropriate gap penalties: gaps are not treated explicitly, but remain as those parts of the sequences that do not belong to any of the aligned segments. Finally, we discuss the application of our algorithm to two test examples and compare it with commonly used alignment methods. As a first example, we aligned a set of 11 DNA sequences coding for functional helix-loop-helix proteins. Though the sequences show only low overall similarity, our program correctly aligned all of the 11 functional sites, which was a unique result among the methods tested. As a by-product, the reading frames of the sequences were identified. Next, we aligned a set of ribonuclease H proteins and compared our results with alignments produced by other programs as reported by McClure et al. [McClure, M. A., Vasi, T. K. & Fitch, W. M. (1994) Mol. Biol. Evol. 11, 571-592]. Our program was one of the best scoring programs. However, in contrast to other methods, our protein alignments are independent of user-defined parameters.

Relevance:

100.00%

Publisher:

Abstract:

Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics, and applications of combinatorial methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way toward a new approach to gene recognition that uses previously sequenced genes as a clue for recognition of newly sequenced genes. This paper describes a spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual proteins was 99%. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused either by short (less than 5 amino acids) initial/terminal exons or by alternative splicing. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is nonvertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins at 160 accepted point mutations (PAM) (25% similarity), the correlation between the predicted and actual genes was still as high as 95%.

Relevance:

60.00%

Publisher:

Abstract:

σ32, the product of the rpoH gene in Escherichia coli, provides promoter specificity by interacting with core RNAP. Amino acid sequence alignment of σ32 with other sigma factors in the σ70 family has revealed regions of sequence homology. We have investigated the function of the most highly conserved region, 2.2, using purified products of various rpoH alleles. Core RNAP binding analysis by glycerol gradient sedimentation has revealed reduced core RNAP affinity for one of the mutant σ32 proteins, Q80R. This reduced core interaction is exacerbated in the presence of σ70, which competes with σ32 for binding of core RNAP. When a different but more conserved amino acid was introduced at this position by site-directed mutagenesis (Q80N), this mutant sigma factor still displayed a significant reduction in its core RNAP affinity. Based on these results, we conclude that at least one specific amino acid in region 2.2 is involved in core RNAP interaction.

Relevance:

60.00%

Publisher:

Abstract:

An analysis of the x-ray structure of homodimeric avian farnesyl diphosphate synthase (geranyltransferase, EC 2.5.1.10) coupled with information about conserved amino acids obtained from a sequence alignment of 35 isoprenyl diphosphate synthases that synthesize farnesyl (C15), geranylgeranyl (C20), and higher chain length isoprenoid diphosphates suggested that the side chains of residues corresponding to F112 and F113 in the avian enzyme were important for determining the ultimate length of the hydrocarbon chains. This hypothesis was supported by site-directed mutagenesis to transform wild-type avian farnesyl diphosphate synthase (FPS) into synthases capable of producing geranylgeranyl diphosphate (F112A), geranylfarnesyl (C25) diphosphate (F113S), and longer chain prenyl diphosphates (F112A/F113S). An x-ray analysis of the structure of the F112A/F113S mutant in the apo state and with allylic substrates bound produced the strongest evidence that these mutations caused the observed change in product specificity by directly altering the size of the binding pocket for the growing isoprenoid chain in the active site of the enzyme. The proposed binding pocket in the apo mutant structure was increased in depth by 5.8 Å as compared with that for the wild-type enzyme. Allylic diphosphates were observed in the holo structures, bound through magnesium ions to the aspartates of the first of two conserved aspartate-rich sequences (D117–D121), with the hydrocarbon tails of all the ligands growing down the hydrophobic pocket toward the mutation site. A model was constructed to show how the growth of a long chain prenyl product may proceed by creation of a hydrophobic passageway from the FPS active site to the outside surface of the enzyme.

Relevance:

60.00%

Publisher:

Abstract:

Many persistent viruses have evolved the ability to subvert MHC class I antigen presentation. Indeed, human cytomegalovirus (HCMV) encodes at least four proteins that down-regulate cell-surface expression of class I. The HCMV unique short (US)2 glycoprotein binds newly synthesized class I molecules within the endoplasmic reticulum (ER) and subsequently targets them for proteasomal degradation. We report the crystal structure of US2 bound to the HLA-A2/Tax peptide complex. US2 associates with HLA-A2 at the junction of the peptide-binding region and the α3 domain, a novel binding surface on class I that allows US2 to bind independently of peptide sequence. Mutation of class I heavy chains confirms the importance of this binding site in vivo. Available data on class I-ER chaperone interactions indicate that chaperones would not impede US2 binding. Unexpectedly, the US2 ER-luminal domain forms an Ig-like fold. A US2 structure-based sequence alignment reveals that seven HCMV proteins, at least three of which function in immune evasion, share the same fold as US2. The structure allows design of further experiments to determine how US2 targets class I molecules for degradation.

Relevance:

60.00%

Publisher:

Abstract:

Apert syndrome (AS) is characterized by craniosynostosis (premature fusion of cranial sutures) and severe syndactyly of the hands and feet. Two activating mutations, Ser-252 → Trp and Pro-253 → Arg, in fibroblast growth factor receptor 2 (FGFR2) account for nearly all known cases of AS. To elucidate the mechanism by which these substitutions cause AS, we determined the crystal structures of these two FGFR2 mutants in complex with fibroblast growth factor 2 (FGF2). These structures demonstrate that both mutations introduce additional interactions between FGFR2 and FGF2, thereby augmenting FGFR2–FGF2 affinity. Moreover, based on these structures and sequence alignment of the FGF family, we propose that the Pro-253 → Arg mutation will indiscriminately increase the affinity of FGFR2 toward any FGF. In contrast, the Ser-252 → Trp mutation will selectively enhance the affinity of FGFR2 toward a limited subset of FGFs. These predictions are consistent with previous biochemical data describing the effects of AS mutations on FGF binding. Alterations in FGFR2 ligand affinity and specificity may allow inappropriate autocrine or paracrine activation of FGFR2. Furthermore, the distinct gain-of-function interactions observed in each crystal structure provide a model to explain the phenotypic variability among AS patients.

Relevance:

60.00%

Publisher:

Abstract:

The three-dimensional structure of Aspergillus niger pectin lyase B (PLB) has been determined by crystallographic techniques at a resolution of 1.7 Å. The model, with all 359 amino acids and 339 water molecules, refines to a final crystallographic R factor of 16.5%. The polypeptide backbone folds into a large right-handed cylinder, termed a parallel β helix. Loops of various sizes and conformations protrude from the central helix and probably confer function. The largest loop of 53 residues folds into a small domain consisting of three antiparallel β strands, one turn of an α helix, and one turn of a 3₁₀ helix. By comparison with the structure of Erwinia chrysanthemi pectate lyase C (PelC), the primary sequence alignment between the pectate and pectin lyase subfamilies has been corrected and the active site region for the pectin lyases deduced. The substrate-binding site in PLB is considerably less hydrophilic than the comparable PelC region and consists of an extensive network of highly conserved Trp and His residues. The PLB structure provides an atomic explanation for the lack of a catalytic requirement for Ca²⁺ in the pectin lyase family, in contrast to that found in the pectate lyase enzymes. Surprisingly, however, the PLB site analogous to the Ca²⁺ site in PelC is filled with a positive charge provided by a conserved Arg in the pectin lyases. The significance of the finding with regard to the enzymatic mechanism is discussed.

Relevance:

60.00%

Publisher:

Abstract:

The guinea pig estrogen sulfotransferase gene has been cloned and compared to three other cloned steroid and phenol sulfotransferase genes (human estrogen sulfotransferase, human phenol sulfotransferase, and guinea pig 3 alpha-hydroxysteroid sulfotransferase). The four sulfotransferase genes demonstrate a common outstanding feature: the splice sites for their 3'-terminal exons are identically located. That is, the 3'-terminal exon splice sites involve a glycine that constitutes the N-terminal glycine of an invariably conserved GXXGXXK motif present in all steroid and phenol sulfotransferases for which primary structures are known. This consistency strongly suggests that all steroid and phenol sulfotransferase genes will be similarly spliced. The GXXGXXK motif forms the active binding site for the universal sulfonate donor 3'-phosphoadenosine 5'-phosphosulfate. Amino acid sequence alignment of 19 cloned steroid and phenol sulfotransferases starting with the GXXGXXK motif indicates that the 3'-terminal exon for each steroid and phenol sulfotransferase gene encodes a similarly sized C-terminal fragment of the protein. Interestingly, on further analysis of the alignment, three distinct amino acid sequence patterns emerge. The presence of the conserved functional GXXGXXK motif suggests that the protein domains encoded by steroid and phenol sulfotransferase 3'-terminal exons have evolved from a common ancestor. Furthermore, it is hypothesized that during the course of evolution, the 3'-terminal exon further diverged into at least three sulfotransferase subdivisions: a phenol or aryl group, an estrogen or phenolic steroid group, and a neutral steroid group.

Relevance:

40.00%

Publisher:

Abstract:

STACK is a tool for detection and visualisation of expressed transcript variation in the context of developmental and pathological states. The datasystem organises and reconstructs human transcripts from available public data in the context of expression state. The expression state of a transcript can include developmental state, pathological association, site of expression and isoform of expressed transcript. STACK consensus transcripts are reconstructed from clusters that capture and reflect the growing evidence of transcript diversity. The comprehensive capture of transcript variants is achieved by the use of a novel clustering approach that is tolerant of sub-sequence diversity and does not rely on pairwise alignment. This is in contrast with other gene indexing projects. STACK is generated at least four times a year and represents the exhaustive processing of all publicly available human EST data extracted from GenBank. This processed information can be explored through 15 tissue-specific categories, a disease-related category and a whole-body index and is accessible via WWW at http://www.sanbi.ac.za/Dbases.html. STACK represents a broadly applicable resource, as it is the only reconstructed transcript database for which the tools for its generation are also broadly available (http://www.sanbi.ac.za/CODES).

Relevance:

30.00%

Publisher:

Abstract:

We present an approach to map large numbers of Tc1 transposon insertions in the genome of Caenorhabditis elegans. Strains have been described that contain up to 500 polymorphic Tc1 insertions. From these we have cloned and shotgun sequenced over 2000 Tc1 flanks, resulting in an estimated set of 400 or more distinct Tc1 insertion alleles. Alignment of these sequences revealed a weak Tc1 insertion site consensus sequence that was symmetric around the invariant TA target site and reads CAYATATRTG. The Tc1 flanking sequences were compared with 40 Mbp of a C. elegans genome sequence. We found 151 insertions within the sequenced area, a density of ≈1 Tc1 insertion in every 265 kb. As the rest of the C. elegans genome sequence is obtained, remaining Tc1 alleles will fall into place. These mapped Tc1 insertions can serve two functions: (i) insertions in or near genes can be used to isolate deletion derivatives that have that gene mutated; and (ii) they represent a dense collection of polymorphic sequence-tagged sites. We demonstrate a strategy to use these Tc1 sequence-tagged sites in fine-mapping mutations.
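The consensus CAYATATRTG quoted above uses IUPAC ambiguity codes (Y = C or T, R = A or G). The following minimal sketch, with hypothetical helper names, checks that the consensus is its own reverse complement (the symmetry around the central TA that the abstract mentions), scans a sequence for matching sites, and reproduces the quoted density arithmetic. It treats the consensus as a strict pattern, whereas the abstract describes it as weak, so real insertion sites would not all match.

```python
import re

# IUPAC codes needed for this consensus only (assumption: no other ambiguity codes occur).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R"}

CONSENSUS = "CAYATATRTG"


def reverse_complement(seq):
    """Reverse complement, keeping R/Y ambiguity codes consistent."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_sites(sequence, consensus=CONSENSUS):
    """Return 0-based start positions of all (possibly overlapping) consensus matches."""
    body = "".join(IUPAC[base] for base in consensus)
    pattern = re.compile("(?=(" + body + "))")  # lookahead allows overlapping hits
    return [m.start() for m in pattern.finditer(sequence.upper())]


def insertion_spacing_kb(region_bp, insertions):
    """Average spacing between insertions, in kb."""
    return region_bp / insertions / 1000


if __name__ == "__main__":
    print(reverse_complement(CONSENSUS) == CONSENSUS)
    print(find_sites("GGCACATATGTGAA"))
    print(round(insertion_spacing_kb(40_000_000, 151)))
```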

Relevance:

30.00%

Publisher:

Abstract:

The double helix is a ubiquitous feature of RNA molecules and provides a target for nucleases involved in RNA maturation and decay. Escherichia coli ribonuclease III participates in maturation and decay pathways by site-specifically cleaving double-helical structures in cellular and viral RNAs. The site of cleavage can determine RNA functional activity and half-life and is specified in part by local tertiary structure elements such as internal loops. The involvement of base pair sequence in determining cleavage sites is unclear, because RNase III can efficiently degrade polymeric double-stranded RNAs of low sequence complexity. An alignment of RNase III substrates revealed an exclusion of specific Watson–Crick bp sequences at defined positions relative to the cleavage site. Inclusion of these “disfavored” sequences in a model substrate strongly inhibited cleavage in vitro by interfering with RNase III binding. Substrate cleavage also was inhibited by a 3-bp sequence from the selenocysteine-accepting tRNA(Sec), which acts as an antideterminant of EF-Tu binding to tRNA(Sec). The inhibitory bp sequences, together with local tertiary structure, can confer site specificity to cleavage of cellular and viral substrates without constraining the degradative action of RNase III on polymeric double-stranded RNA. Base pair antideterminants also may protect double-helical elements in other RNA molecules with essential functions.

Relevance:

30.00%

Publisher:

Abstract:

The distribution of optimal local alignment scores of random sequences plays a vital role in evaluating the statistical significance of sequence alignments. These scores can be well described by an extreme-value distribution. The distribution’s parameters depend upon the scoring system employed and the random letter frequencies; in general they cannot be derived analytically, but must be estimated by curve fitting. For obtaining accurate parameter estimates, a form of the recently described ‘island’ method has several advantages. We describe this method in detail, and use it to investigate the functional dependence of these parameters on finite-length edge effects.
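To make the role of the two distribution parameters concrete, here is a small sketch that assumes the widely used Karlin-Altschul form of the extreme-value tail, E = K·m·n·e^(−λS) and P = 1 − e^(−E), where m and n are the sequence lengths. The abstract does not spell out this formula, and the λ and K values in the usage lines are hypothetical placeholders rather than estimates from the paper; edge-effect corrections for finite lengths are not modelled.

```python
import math


def expected_hits(score, m, n, lam, k):
    """Expected number of random local alignments scoring at least `score`.

    Assumes E = K * m * n * exp(-lambda * S); lam and k must be estimated
    for the scoring system in use (for example by curve fitting).
    """
    return k * m * n * math.exp(-lam * score)


def p_value(score, m, n, lam, k):
    """P(best random score >= score) = 1 - exp(-E), computed stably via expm1."""
    return -math.expm1(-expected_hits(score, m, n, lam, k))


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    lam, k = 0.3, 0.1
    for s in (60, 80):
        print(s, expected_hits(s, 300, 1_000_000, lam, k), p_value(s, 300, 1_000_000, lam, k))
```

For small E the p-value is close to E itself, which is why the two are often used interchangeably for highly significant matches.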

Relevance:

30.00%

Publisher:

Abstract:

The IMGT/HLA Database (www.ebi.ac.uk/imgt/hla/) specialises in sequences of polymorphic genes of the HLA system, the human major histocompatibility complex (MHC). The HLA complex is located within the 6p21.3 region on the short arm of human chromosome 6 and contains more than 220 genes of diverse function. Many of the genes encode proteins of the immune system and these include the 21 highly polymorphic HLA genes, which influence the outcome of clinical transplantation and confer susceptibility to a wide range of non-infectious diseases. The database contains sequences for all HLA alleles officially recognised by the WHO Nomenclature Committee for Factors of the HLA System and provides users with online tools and facilities for their retrieval and analysis. These include allele reports, alignment tools and detailed descriptions of the source cells. The online IMGT/HLA submission tool allows both new and confirmatory sequences to be submitted directly to the WHO Nomenclature Committee. The latest version (release 1.7.0 July 2000) contains 1220 HLA alleles derived from over 2700 component sequences from the EMBL/GenBank/DDBJ databases. The HLA database provides a model which will be extended to provide specialist databases for polymorphic MHC genes of other species.

Relevance:

30.00%

Publisher:

Abstract:

BAliBASE is specifically designed to serve as an evaluation resource to address all the problems encountered when aligning complete sequences. The database contains high quality, manually constructed multiple sequence alignments together with detailed annotations. The alignments are all based on three-dimensional structural superpositions, with the exception of the transmembrane sequences. The first release provided sets of reference alignments dealing with the problems of high variability, unequal repartition and large N/C-terminal extensions and internal insertions. Here we describe version 2.0 of the database, which incorporates three new reference sets of alignments containing structural repeats, transmembrane sequences and circular permutations to evaluate the accuracy of detection/prediction and alignment of these complex sequences. BAliBASE can be viewed at the web site http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE2/index.html or can be downloaded from ftp://ftp-igbmc.u-strasbg.fr/pub/BAliBASE2/.