41 results for local sequence alignment problem


Relevance: 100.00%

Abstract:

Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics, and applications of combinatorial methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way toward a new approach to gene recognition that uses previously sequenced genes as a clue for recognition of newly sequenced genes. This paper describes a spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual proteins was 99%. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused either by short (less than 5 amino acids) initial/terminal exons or by alternative splicing. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is nonvertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins at 160 accepted point mutations (PAM) (25% similarity), the correlation between the predicted and actual genes was still as high as 95%.

Relevance: 100.00%

Abstract:

We have searched for a minimal interaction motif in τ protein that supports the aggregation into Alzheimer-like paired helical filaments. Digestion of the repeat domain with different proteases yields a GluC-induced fragment comprising 43 residues (termed PHF43), which represents the third repeat of τ plus some flanking residues. This fragment self-assembles readily into thin filaments without a paired helical appearance, but these filaments are highly competent to nucleate bona fide PHFs from full-length τ. Probing the interactions of PHF43 with overlapping peptides derived from the full τ sequence yields a minimal hexapeptide interaction motif of ³⁰⁶VQIVYK³¹¹ at the beginning of the third internal repeat. This motif coincides with the highest predicted β-structure potential in τ. CD and Fourier transform infrared spectroscopy show that PHF43 acquires pronounced β structure in conditions of self-assembly. Point mutations in the hexapeptide region by proline-scanning mutagenesis prevent the aggregation. The data indicate that PHF assembly is initiated by a short fragment containing the minimal interaction motif forming a local β structure embedded in a largely random-coil protein.

Relevance: 100.00%

Abstract:

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
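The two scoring stages named in this abstract can be illustrated with a minimal, purely sequential sketch. It is not the ParAlign implementation: there is no SIMD parallelism, the heuristic that combines several diagonals is not reproduced, and the function names and the scoring values (`match=2`, `mismatch=-1`, linear `gap=2`) are assumptions chosen only for illustration. The first function computes the best ungapped local score on every diagonal; the second computes the standard Smith–Waterman local alignment score with a linear gap penalty.

```python
def best_ungapped_diagonal_scores(a, b, match=2, mismatch=-1):
    """Best ungapped local score on each diagonal, keyed by offset d = j - i.

    Scoring values are illustrative assumptions, not taken from the abstract.
    """
    scores = {}
    for d in range(-(len(a) - 1), len(b)):
        best = run = 0
        i = max(0, -d)
        j = i + d
        while i < len(a) and j < len(b):
            s = match if a[i] == b[j] else mismatch
            run = max(0, run + s)      # restart the run when it drops below zero
            best = max(best, run)
            i += 1
            j += 1
        scores[d] = best
    return scores


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            v = max(0,
                    prev[j - 1] + s,   # align a[i-1] with b[j-1]
                    prev[j] - gap,     # gap in b
                    cur[j - 1] - gap)  # gap in a
            cur.append(v)
            best = max(best, v)
        prev = cur
    return best
```

Because an ungapped alignment is a special case of a gapped one, the best diagonal score never exceeds the Smith–Waterman score under the same parameters, which is what makes diagonal scores usable as a cheap first filter.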

Relevance: 100.00%

Abstract:

In this paper, a new way to think about, and to construct, pairwise as well as multiple alignments of DNA and protein sequences is proposed. Rather than forcing alignments to either align single residues or to introduce gaps by defining an alignment as a path running right from the source up to the sink in the associated dot-matrix diagram, we propose to consider alignments as consistent equivalence relations defined on the set of all positions occurring in all sequences under consideration. We also propose constructing alignments from whole segments exhibiting highly significant overall similarity rather than by aligning individual residues. Consequently, we present an alignment algorithm that (i) is based on segment-to-segment comparison instead of the commonly used residue-to-residue comparison and which (ii) avoids the well-known difficulties concerning the choice of appropriate gap penalties: gaps are not treated explicitly, but remain as those parts of the sequences that do not belong to any of the aligned segments. Finally, we discuss the application of our algorithm to two test examples and compare it with commonly used alignment methods. As a first example, we aligned a set of 11 DNA sequences coding for functional helix-loop-helix proteins. Though the sequences show only low overall similarity, our program correctly aligned all of the 11 functional sites, which was a unique result among the methods tested. As a by-product, the reading frames of the sequences were identified. Next, we aligned a set of ribonuclease H proteins and compared our results with alignments produced by other programs as reported by McClure et al. [McClure, M. A., Vasi, T. K. & Fitch, W. M. (1994) Mol. Biol. Evol. 11, 571-592]. Our program was one of the best scoring programs. However, in contrast to other methods, our protein alignments are independent of user-defined parameters.
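For the pairwise case, the segment-based idea described here can be sketched as a chaining problem: given gap-free segment pairs with precomputed weights, pick the consistent (order-preserving, non-overlapping) subset with the largest total weight. The sketch below is an illustration under stated assumptions, not the authors' program: the `Segment` type, the function name, and the weights are hypothetical, weights are assumed positive, and how the paper derives segment weights from statistical significance is not reproduced. Note that no gap penalty appears anywhere; unaligned residues are simply those not covered by a chosen segment.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """A gap-free segment pair (hypothetical type): a[i:i+length] vs b[j:j+length]."""
    i: int
    j: int
    length: int
    weight: float


def best_consistent_chain(segments: List[Segment]) -> Tuple[float, List[Segment]]:
    """Return the maximum total weight and one chain of mutually consistent segments.

    Two segments are consistent when one ends before the other starts
    in both sequences. Quadratic-time dynamic programming over sorted segments.
    """
    if not segments:
        return 0.0, []
    # Sorting by end position guarantees every possible predecessor comes first.
    segs = sorted(segments, key=lambda s: (s.i + s.length, s.j + s.length))
    best = [s.weight for s in segs]
    back = [-1] * len(segs)
    for q, sq in enumerate(segs):
        for p in range(q):
            sp = segs[p]
            precedes = sp.i + sp.length <= sq.i and sp.j + sp.length <= sq.j
            if precedes and best[p] + sq.weight > best[q]:
                best[q] = best[p] + sq.weight
                back[q] = p
    # Trace back from the best-scoring chain end.
    end = max(range(len(segs)), key=lambda k: best[k])
    chain = []
    k = end
    while k != -1:
        chain.append(segs[k])
        k = back[k]
    chain.reverse()
    return best[end], chain
```

Extending this to multiple sequences requires checking consistency across all pairs simultaneously, which this pairwise sketch does not attempt.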

Relevance: 100.00%

Abstract:

Local protein structure prediction efforts have consistently failed to exceed approximately 70% accuracy. We characterize the degeneracy of the mapping from local sequence to local structure responsible for this failure by investigating the extent to which similar sequence segments found in different proteins adopt similar three-dimensional structures. Sequence segments 3-15 residues in length from 154 different protein families are partitioned into neighborhoods containing segments with similar sequences using cluster analysis. The consistency of the sequence-to-structure mapping is assessed by comparing the local structures adopted by sequence segments in the same neighborhood in proteins of known structure. In the 154 families, 45% and 28% of the positions occur in neighborhoods in which one and two local structures predominate, respectively. The sequence patterns that characterize the neighborhoods in the first class probably include virtually all of the short sequence motifs in proteins that consistently occur in a particular local structure. These patterns, many of which occur in transitions between secondary structural elements, are an interesting combination of previously studied and novel motifs. The identification of sequence patterns that consistently occur in one or a small number of local structures in proteins should contribute to the prediction of protein structure from sequence.

Relevance: 100.00%

Abstract:

One gene locus on chromosome I in Saccharomyces cerevisiae encodes a protein (YAB5_YEAST; accession no. P31378) with local sequence similarity to the DNA repair glycosylase endonuclease III from Escherichia coli. We have analyzed the function of this gene, now assigned NTG1 (endonuclease three-like glycosylase 1), by cloning, mutant analysis, and gene expression in E. coli. Targeted gene disruption of NTG1 produces a mutant that is sensitive to H2O2 and menadione, indicating that NTG1 is required for repair of oxidative DNA damage in vivo. Northern blot analysis and expression studies of an NTG1-lacZ gene fusion showed that NTG1 is induced by cell exposure to different DNA damaging agents, particularly menadione, and hence belongs to the DNA damage-inducible regulon in S. cerevisiae. When expressed in E. coli, the NTG1 gene product cleaves plasmid DNA damaged by osmium tetroxide, indicating specificity for thymine glycols in DNA, as is the case for EndoIII. However, NTG1 also releases formamidopyrimidines from DNA with high efficiency and, hence, represents a glycosylase with a novel range of substrate recognition. Sequences similar to NTG1 from other eukaryotes, including Caenorhabditis elegans, Schizosaccharomyces pombe, and mammals, have recently been entered in GenBank, suggesting the universal presence of NTG1-like genes in higher organisms. S. cerevisiae NTG1 does not have the [4Fe-4S] cluster DNA binding domain characteristic of the other members of this family.

Relevance: 100.00%

Abstract:

Telomerase, a ribonucleoprotein complex, adds hexameric repeats called “telomeres” to the growing ends of chromosomal DNA. Characterization of mammalian telomerase has been elusive because of its low level of expression. We describe a bioinformatics approach to enrich and characterize the human telomerase complex. Using local sequence homology search methods, we detected similarity of the Tetrahymena p80 subunit of telomerase with the autoantigen Ro60. Antibodies to Ro60 immunoprecipitated the telomerase activity. Ro60 and p80 proteins were cross-recognizable by antibodies to either protein. Telomerase activity and the RNA component of telomerase complex were localized to a doublet in a native gel from the Ro60 antibody-precipitated material. The enriched material showed specific binding to a TTA GGG probe in vitro in an RNA template-dependent manner. Polyclonal antibodies to the doublet also immunoprecipitated the telomerase activity. These results suggest an evolutionary conservation of the telomerase proteins.

Relevance: 100.00%

Abstract:

σ32, the product of the rpoH gene in Escherichia coli, provides promoter specificity by interacting with core RNAP. Amino acid sequence alignment of σ32 with other sigma factors in the σ70 family has revealed regions of sequence homology. We have investigated the function of the most highly conserved region, 2.2, using purified products of various rpoH alleles. Core RNAP binding analysis by glycerol gradient sedimentation has revealed reduced core RNAP affinity for one of the mutant σ32 proteins, Q80R. This reduced core interaction is exacerbated in the presence of σ70, which competes with σ32 for binding of core RNAP. When a different but more conserved amino acid was introduced at this position by site-directed mutagenesis (Q80N), this mutant sigma factor still displayed a significant reduction in its core RNAP affinity. Based on these results, we conclude that at least one specific amino acid in region 2.2 is involved in core RNAP interaction.

Relevance: 100.00%

Abstract:

An analysis of the x-ray structure of homodimeric avian farnesyl diphosphate synthase (geranyltransferase, EC 2.5.1.10) coupled with information about conserved amino acids obtained from a sequence alignment of 35 isoprenyl diphosphate synthases that synthesize farnesyl (C15), geranylgeranyl (C20), and higher chain length isoprenoid diphosphates suggested that the side chains of residues corresponding to F112 and F113 in the avian enzyme were important for determining the ultimate length of the hydrocarbon chains. This hypothesis was supported by site-directed mutagenesis to transform wild-type avian farnesyl diphosphate synthase (FPS) into synthases capable of producing geranylgeranyl diphosphate (F112A), geranylfarnesyl (C25) diphosphate (F113S), and longer chain prenyl diphosphates (F112A/F113S). An x-ray analysis of the structure of the F112A/F113S mutant in the apo state and with allylic substrates bound produced the strongest evidence that these mutations caused the observed change in product specificity by directly altering the size of the binding pocket for the growing isoprenoid chain in the active site of the enzyme. The proposed binding pocket in the apo mutant structure was increased in depth by 5.8 Å as compared with that for the wild-type enzyme. Allylic diphosphates were observed in the holo structures, bound through magnesium ions to the aspartates of the first of two conserved aspartate-rich sequences (D117–D121), with the hydrocarbon tails of all the ligands growing down the hydrophobic pocket toward the mutation site. A model was constructed to show how the growth of a long chain prenyl product may proceed by creation of a hydrophobic passageway from the FPS active site to the outside surface of the enzyme.

Relevance: 100.00%

Abstract:

To improve the accuracy of predicting membrane protein sorting signals, we developed a general methodology for defining trafficking signal consensus sequences in the environment of the living cell. Our approach uses retroviral gene transfer to create combinatorial expression libraries of trafficking signal variants in mammalian cells, flow cytometry to sort cells based on trafficking phenotype, and quantitative trafficking assays to measure the efficacy of individual signals. Using this strategy to analyze arginine- and lysine-based endoplasmic reticulum localization signals, we demonstrate that small changes in the local sequence context dramatically alter signal strength, generating a broad spectrum of trafficking phenotypes. Finally, using sequences from our screen, we found that the potency of di-lysine, but not di-arginine, mediated endoplasmic reticulum localization was correlated with the strength of interaction with α-COP.

Relevance: 100.00%

Abstract:

The energy of DNA deformation plays a crucial and active role in its packaging and its function in the cell. Considerable effort has gone into developing methodologies capable of evaluating the local sequence-directed curvature and flexibility of a DNA chain. These studies thus far have focused on DNA constructs expressly tailored either with anomalous flexibility or curvature tracts. Here we demonstrate that these two structural properties can be mapped also along the chain of a “natural” DNA with any sequence on the basis of its scanning force microscope (SFM) images. To know the orientation of the sequence of the investigated DNA molecules in their SFM images, we prepared a palindromic dimer of the long DNA molecule under study. The palindromic symmetry also acted as an internal gauge of the statistical significance of the analysis carried out on the SFM images of the dimer molecules. It was found that although the curvature modulus is not efficient in separating static and dynamic contributions to the curvature of the population of molecules, the curvature taken with its direction (its sign in two dimensions) permits the direct separation of the intrinsic curvature from the flexibility contributions. The sequence-dependent flexibility seems to vary monotonically with the chain's intrinsic curvature; the chain rigidity was found to modulate as its local thermodynamic stability and does not correlate with the dinucleotide chain rigidities evaluation made from x-ray data by other authors.

Relevance: 100.00%

Abstract:

Many persistent viruses have evolved the ability to subvert MHC class I antigen presentation. Indeed, human cytomegalovirus (HCMV) encodes at least four proteins that down-regulate cell-surface expression of class I. The HCMV unique short (US)2 glycoprotein binds newly synthesized class I molecules within the endoplasmic reticulum (ER) and subsequently targets them for proteasomal degradation. We report the crystal structure of US2 bound to the HLA-A2/Tax peptide complex. US2 associates with HLA-A2 at the junction of the peptide-binding region and the α3 domain, a novel binding surface on class I that allows US2 to bind independently of peptide sequence. Mutation of class I heavy chains confirms the importance of this binding site in vivo. Available data on class I-ER chaperone interactions indicate that chaperones would not impede US2 binding. Unexpectedly, the US2 ER-luminal domain forms an Ig-like fold. A US2 structure-based sequence alignment reveals that seven HCMV proteins, at least three of which function in immune evasion, share the same fold as US2. The structure allows design of further experiments to determine how US2 targets class I molecules for degradation.

Relevance: 100.00%

Abstract:

Apert syndrome (AS) is characterized by craniosynostosis (premature fusion of cranial sutures) and severe syndactyly of the hands and feet. Two activating mutations, Ser-252 → Trp and Pro-253 → Arg, in fibroblast growth factor receptor 2 (FGFR2) account for nearly all known cases of AS. To elucidate the mechanism by which these substitutions cause AS, we determined the crystal structures of these two FGFR2 mutants in complex with fibroblast growth factor 2 (FGF2). These structures demonstrate that both mutations introduce additional interactions between FGFR2 and FGF2, thereby augmenting FGFR2–FGF2 affinity. Moreover, based on these structures and sequence alignment of the FGF family, we propose that the Pro-253 → Arg mutation will indiscriminately increase the affinity of FGFR2 toward any FGF. In contrast, the Ser-252 → Trp mutation will selectively enhance the affinity of FGFR2 toward a limited subset of FGFs. These predictions are consistent with previous biochemical data describing the effects of AS mutations on FGF binding. Alterations in FGFR2 ligand affinity and specificity may allow inappropriate autocrine or paracrine activation of FGFR2. Furthermore, the distinct gain-of-function interactions observed in each crystal structure provide a model to explain the phenotypic variability among AS patients.

Relevance: 100.00%

Abstract:

The three-dimensional structure of Aspergillus niger pectin lyase B (PLB) has been determined by crystallographic techniques at a resolution of 1.7 Å. The model, with all 359 amino acids and 339 water molecules, refines to a final crystallographic R factor of 16.5%. The polypeptide backbone folds into a large right-handed cylinder, termed a parallel β helix. Loops of various sizes and conformations protrude from the central helix and probably confer function. The largest loop of 53 residues folds into a small domain consisting of three antiparallel β strands, one turn of an α helix, and one turn of a 3₁₀ helix. By comparison with the structure of Erwinia chrysanthemi pectate lyase C (PelC), the primary sequence alignment between the pectate and pectin lyase subfamilies has been corrected and the active site region for the pectin lyases deduced. The substrate-binding site in PLB is considerably less hydrophilic than the comparable PelC region and consists of an extensive network of highly conserved Trp and His residues. The PLB structure provides an atomic explanation for the lack of a catalytic requirement for Ca²⁺ in the pectin lyase family, in contrast to that found in the pectate lyase enzymes. Surprisingly, however, the PLB site analogous to the Ca²⁺ site in PelC is filled with a positive charge provided by a conserved Arg in the pectin lyases. The significance of the finding with regard to the enzymatic mechanism is discussed.

Relevance: 100.00%

Abstract:

Escherichia coli possesses three SOS-inducible DNA polymerases (Pol II, IV, and V) that were recently found to participate in translesion synthesis and mutagenesis. Involvement of these polymerases appears to depend on the nature of the lesion and its local sequence context, as illustrated by the bypass of a single N-2-acetylaminofluorene adduct within the NarI mutation hot spot. Indeed, error-free bypass requires Pol V (umuDC), whereas mutagenic (−2 frameshift) bypass depends on Pol II (polB). In this paper, we show that purified DNA Pol II is able in vitro to generate the −2 frameshift bypass product observed in vivo at the NarI sites. Although the ΔpolB strain is completely defective in this mutation pathway, introduction of the polB gene on a low copy number plasmid restores the −2 frameshift pathway. In fact, modification of the relative copy number of polB versus umuDC genes results in a corresponding modification in the use of the frameshift versus error-free translesion pathways, suggesting a direct competition between Pol II and V for the bypass of the same lesion. Whether such a polymerase competition model for translesion synthesis will prove to be generally applicable remains to be confirmed.