41 results for local sequence alignment problem
Abstract:
Integration of human immunodeficiency virus type 1 cDNA into a target DNA can be strongly influenced by the conformation of the target. For example, integration in vitro is sometimes favored in target DNAs containing sequence-directed bends or DNA distortions caused by bound proteins. We have analyzed the effect of DNA bending by studying integration into two well-characterized protein-DNA complexes: Escherichia coli integration host factor (IHF) protein bound to a phage IHF site, and the DNA binding domain of human lymphoid enhancer factor (LEF) bound to a LEF site. Both of these proteins have previously been reported to bend DNA by approximately 140 degrees. Binding of IHF greatly increases the efficiency of in vitro integration at hotspots within the IHF site. We analyzed a series of mutants in which the IHF site was modified at the most prominent hotspot. We found that each variant still displayed enhanced integration upon IHF binding. Evidently the local sequence is not critical for formation of an IHF hotspot. LEF binding did not create preferred sites for integration. The different effects of IHF and LEF binding can be rationalized in terms of the different proposed conformations of the two protein-DNA complexes.
Abstract:
The guinea pig estrogen sulfotransferase gene has been cloned and compared to three other cloned steroid and phenol sulfotransferase genes (human estrogen sulfotransferase, human phenol sulfotransferase, and guinea pig 3 alpha-hydroxysteroid sulfotransferase). The four sulfotransferase genes demonstrate a common outstanding feature: the splice sites for their 3'-terminal exons are identically located. That is, the 3'-terminal exon splice sites involve a glycine that constitutes the N-terminal glycine of an invariably conserved GXXGXXK motif present in all steroid and phenol sulfotransferases for which primary structures are known. This consistency strongly suggests that all steroid and phenol sulfotransferase genes will be similarly spliced. The GXXGXXK motif forms the active binding site for the universal sulfonate donor 3'-phosphoadenosine 5'-phosphosulfate. Amino acid sequence alignment of 19 cloned steroid and phenol sulfotransferases starting with the GXXGXXK motif indicates that the 3'-terminal exon for each steroid and phenol sulfotransferase gene encodes a similarly sized C-terminal fragment of the protein. Interestingly, on further analysis of the alignment, three distinct amino acid sequence patterns emerge. The presence of the conserved functional GXXGXXK motif suggests that the protein domains encoded by steroid and phenol sulfotransferase 3'-terminal exons have evolved from a common ancestor. Furthermore, it is hypothesized that during the course of evolution, the 3'-terminal exon further diverged into at least three sulfotransferase subdivisions: a phenol or aryl group, an estrogen or phenolic steroid group, and a neutral steroid group.
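A minimal sketch of how a GXXGXXK motif (X standing for any residue) can be located in a protein sequence, and how the C-terminal fragment beginning at the motif can be extracted for alignment. The sequence below is a made-up placeholder, not a real sulfotransferase, and the helper name `c_terminal_from_motif` is hypothetical.

```python
import re

# GXXGXXK with X = any single residue.
MOTIF = re.compile(r"G..G..K")

def c_terminal_from_motif(protein):
    """Return the fragment running from the first GXXGXXK motif to the
    C terminus, or None if the motif is absent."""
    match = MOTIF.search(protein)
    if match is None:
        return None
    return protein[match.start():]

# Hypothetical sequence, for illustration only.
example = "MDLVTKSGTTGVSKILEALC"
fragment = c_terminal_from_motif(example)
```

Aligning such fragments from a common starting point is one simple way to compare the regions downstream of the motif across genes.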
Abstract:
The distribution of optimal local alignment scores of random sequences plays a vital role in evaluating the statistical significance of sequence alignments. These scores can be well described by an extreme-value distribution. The distribution’s parameters depend upon the scoring system employed and the random letter frequencies; in general they cannot be derived analytically, but must be estimated by curve fitting. For obtaining accurate parameter estimates, a form of the recently described ‘island’ method has several advantages. We describe this method in detail, and use it to investigate the functional dependence of these parameters on finite-length edge effects.
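As a minimal sketch of how an extreme-value distribution is used once its parameters are known, the function below evaluates the standard Karlin-Altschul tail approximation P(S >= x) ≈ 1 − exp(−K·m·n·e^(−λx)) for two random sequences of lengths m and n. The function name, the sequence lengths, and the values of λ and K are illustrative assumptions rather than figures from the abstract. The island method for estimating the parameters is not shown, and finite-length edge effects are ignored.

```python
import math

def evd_pvalue(score, m, n, lam, K):
    """Approximate probability that the optimal local alignment score of
    two random sequences (lengths m and n) is at least `score`, under an
    extreme-value distribution with parameters lam and K."""
    expected = K * m * n * math.exp(-lam * score)  # expected number of such alignments
    return -math.expm1(-expected)                  # 1 - exp(-expected), computed stably

# Hypothetical parameters, for illustration only.
p = evd_pvalue(score=60, m=300, n=300, lam=0.267, K=0.041)
```

Because the tail decays exponentially in the score, small errors in the estimate of λ translate into large errors in the P-value, which is why accurate parameter estimation matters.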
Abstract:
The double helix is a ubiquitous feature of RNA molecules and provides a target for nucleases involved in RNA maturation and decay. Escherichia coli ribonuclease III participates in maturation and decay pathways by site-specifically cleaving double-helical structures in cellular and viral RNAs. The site of cleavage can determine RNA functional activity and half-life and is specified in part by local tertiary structure elements such as internal loops. The involvement of base pair sequence in determining cleavage sites is unclear, because RNase III can efficiently degrade polymeric double-stranded RNAs of low sequence complexity. An alignment of RNase III substrates revealed an exclusion of specific Watson–Crick bp sequences at defined positions relative to the cleavage site. Inclusion of these “disfavored” sequences in a model substrate strongly inhibited cleavage in vitro by interfering with RNase III binding. Substrate cleavage also was inhibited by a 3-bp sequence from the selenocysteine-accepting tRNASec, which acts as an antideterminant of EF-Tu binding to tRNASec. The inhibitory bp sequences, together with local tertiary structure, can confer site specificity to cleavage of cellular and viral substrates without constraining the degradative action of RNase III on polymeric double-stranded RNA. Base pair antideterminants also may protect double-helical elements in other RNA molecules with essential functions.
Abstract:
The database reported here is derived using the Combinatorial Extension (CE) algorithm which compares pairs of protein polypeptide chains and provides a list of structurally similar proteins along with their structure alignments. Using CE, structure–structure alignments can provide insights into biological function. When a protein of known function is shown to be structurally similar to a protein of unknown function, a relationship might be inferred; a relationship not necessarily detectable from sequence comparison alone. Establishing structure–structure relationships in this way is of great importance as we enter an era of structural genomics where there is a likelihood of an increasing number of structures with unknown functions being determined. Thus the CE database is an example of a useful tool in the annotation of protein structures of unknown function. Comparisons can be performed on the complete PDB or on a structurally representative subset of proteins. The source protein(s) can be from the PDB (updated monthly) or uploaded by the user. CE provides sequence alignments resulting from structural alignments and Cartesian coordinates for the aligned structures, which may be analyzed using the supplied Compare3D Java applet, or downloaded for further local analysis. Searches can be run from the CE web site, http://cl.sdsc.edu/ce.html, or the database and software downloaded from the site for local use.
Abstract:
STACK is a tool for detection and visualisation of expressed transcript variation in the context of developmental and pathological states. The data system organises and reconstructs human transcripts from available public data in the context of expression state. The expression state of a transcript can include developmental state, pathological association, site of expression and isoform of expressed transcript. STACK consensus transcripts are reconstructed from clusters that capture and reflect the growing evidence of transcript diversity. The comprehensive capture of transcript variants is achieved by the use of a novel clustering approach that is tolerant of sub-sequence diversity and does not rely on pairwise alignment. This is in contrast with other gene indexing projects. STACK is generated at least four times a year and represents the exhaustive processing of all publicly available human EST data extracted from GenBank. This processed information can be explored through 15 tissue-specific categories, a disease-related category and a whole-body index and is accessible via the WWW at http://www.sanbi.ac.za/Dbases.html. STACK represents a broadly applicable resource, as it is the only reconstructed transcript database for which the tools for its generation are also broadly available (http://www.sanbi.ac.za/CODES).
Abstract:
Competing hypotheses seek to explain the evolution of oxygenic and anoxygenic processes of photosynthesis. Since chlorophyll is less reduced and precedes bacteriochlorophyll on the modern biosynthetic pathway, it has been proposed that chlorophyll preceded bacteriochlorophyll in its evolution. However, recent analyses of nucleotide sequences that encode chlorophyll and bacteriochlorophyll biosynthetic enzymes appear to provide support for an alternative hypothesis. This is that the evolution of bacteriochlorophyll occurred earlier than the evolution of chlorophyll. Here we demonstrate that the presence of invariant sites in sequence datasets leads to inconsistency in tree building (including maximum-likelihood methods). Homologous sequences with different biological functions often share invariant sites at the same nucleotide positions. However, different constraints can also result in additional invariant sites unique to the genes, which have specific and different biological functions. Consequently, the distribution of these sites can be uneven between the different types of homologous genes. The presence of invariant sites, shared by related biosynthetic genes as well as those unique to only some of these genes, has misled the recent evolutionary analysis of oxygenic and anoxygenic photosynthetic pigments. We evaluate an alternative scheme for the evolution of chlorophyll and bacteriochlorophyll.
Abstract:
Praying mantids use binocular cues to judge whether their prey is in striking distance. When there are several moving targets within their binocular visual field, mantids need to solve the correspondence problem. They must select between the possible pairings of retinal images in the two eyes so that they can strike at a single real target. In this study, mantids were presented with two targets in various configurations, and the resulting fixating saccades that precede the strike were analyzed. The distributions of saccades show that mantids consistently prefer one out of several possible matches. Selection is in part guided by the position and the spatiotemporal features of the target image in each eye. Selection also depends upon the binocular disparity of the images, suggesting that insects can perform local binocular computations. The pairing rules ensure that mantids tend to aim at real targets and not at “ghost” targets arising from false matches.
Abstract:
We present an approach to map large numbers of Tc1 transposon insertions in the genome of Caenorhabditis elegans. Strains have been described that contain up to 500 polymorphic Tc1 insertions. From these we have cloned and shotgun sequenced over 2000 Tc1 flanks, resulting in an estimated set of 400 or more distinct Tc1 insertion alleles. Alignment of these sequences revealed a weak Tc1 insertion site consensus sequence that was symmetric around the invariant TA target site and reads CAYATATRTG. The Tc1 flanking sequences were compared with 40 Mbp of a C. elegans genome sequence. We found 151 insertions within the sequenced area, a density of ≈1 Tc1 insertion in every 265 kb. As the rest of the C. elegans genome sequence is obtained, remaining Tc1 alleles will fall into place. These mapped Tc1 insertions can serve two functions: (i) insertions in or near genes can be used to isolate deletion derivatives that have that gene mutated; and (ii) they represent a dense collection of polymorphic sequence-tagged sites. We demonstrate a strategy to use these Tc1 sequence-tagged sites in fine-mapping mutations.
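The consensus CAYATATRTG uses IUPAC ambiguity codes (R = A or G, Y = C or T). The sketch below checks the symmetry claim by confirming that the consensus equals its own reverse complement, turns the consensus into a regular expression, and reproduces the reported density from the figures in the abstract. The flanking sequence and all function names are hypothetical.

```python
import re

# Only the IUPAC codes needed for this consensus are listed.
IUPAC_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R"}

def reverse_complement(consensus):
    """Reverse complement of a sequence written with A, C, G, T, R, Y."""
    return "".join(COMPLEMENT[base] for base in reversed(consensus))

def consensus_regex(consensus):
    """Compile an IUPAC consensus into a regular expression."""
    return re.compile("".join(IUPAC_REGEX[base] for base in consensus))

CONSENSUS = "CAYATATRTG"  # from the abstract

# Symmetric around the central TA: identical to its own reverse complement.
is_symmetric = reverse_complement(CONSENSUS) == CONSENSUS

# Hypothetical flanking sequence, for illustration only.
flank = "GGTTCACATATGTGAACC"
hit = consensus_regex(CONSENSUS).search(flank)

# 151 insertions in 40 Mbp (40,000 kb) of sequence.
kb_per_insertion = 40_000 / 151
```

Since the abstract describes the consensus as weak, an exact regular-expression match like this would miss many real insertion sites; a position weight matrix would be the more usual tool for scoring candidate sites.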
Abstract:
For each pair (n, k) with 1 ≤ k < n, we construct a tight frame (ρ_λ : λ ∈ Λ) for L^2(R^n), which we call a frame of k-plane ridgelets. The intent is to efficiently represent functions that are smooth away from singularities along k-planes in R^n. We also develop tools to help decide whether k-plane ridgelets provide the desired efficient representation. We first construct a wavelet-like tight frame on the X-ray bundle χ_{n,k}, the fiber bundle having the Grassmann manifold G_{n,k} of k-planes in R^n for base space, and for fibers the orthocomplements of those planes. This wavelet-like tight frame is the pushout to χ_{n,k}, via the smooth local coordinates of G_{n,k}, of an orthonormal basis of tensor Meyer wavelets on Euclidean space R^{k(n−k)} × R^{n−k}. We then use the X-ray isometry [Solmon, D. C. (1976) J. Math. Anal. Appl. 56, 61–83] to map this tight frame isometrically to a tight frame for L^2(R^n), the k-plane ridgelets. This construction makes analysis of a function f ∈ L^2(R^n) by k-plane ridgelets identical to the analysis of the k-plane X-ray transform of f by an appropriate wavelet-like system for χ_{n,k}. As wavelets are typically effective at representing point singularities, it may be expected that these new systems will be effective at representing objects whose k-plane X-ray transform has a point singularity. Objects with discontinuities across hyperplanes are of this form, for k = n − 1.
Abstract:
The generalized master equations (GMEs) that contain multiple time scales have been derived quantum mechanically. The GME method has then been applied to a model of charge migration in proteins that invokes the hole hopping between local amino acid sites driven by the torsional motions of the floppy backbones. This model is then applied to analyze the experimental results for sequence-dependent long-range hole transport in DNA reported by Meggers et al. [Meggers, E., Michel-Beyerle, M. E., & Giese, B. (1998) J. Am. Chem. Soc. 120, 12950–12955]. The model has also been applied to analyze the experimental results of femtosecond dynamics of DNA-mediated electron transfer reported by Zewail and co-workers [Wan, C., Fiebig, T., Kelley, S. O., Treadway, C. R., Barton, J. K. & Zewail, A. H. (1999) Proc. Natl. Acad. Sci. USA 96, 6014–6019]. The initial events in the dynamics of protein folding have begun to attract attention. The GME obtained in this paper will be applicable to this problem.
Abstract:
In the last decade, two tools, one drawn from information theory and the other from artificial neural networks, have proven particularly useful in many different areas of sequence analysis. The work presented herein indicates that these two approaches can be joined in a general fashion to produce a very powerful search engine that is capable of locating members of a given nucleic acid sequence family in either local or global sequence searches. This program can, in turn, be queried for its definition of the motif under investigation, ranking each base in context for its contribution to membership in the motif family. In principle, the method used can be applied to any binding motif, including both DNA and RNA sequence families, given sufficient family size.
Abstract:
An additivity-based sequence-to-reactivity algorithm for the interaction of members of the Kazal family of protein inhibitors with six selected serine proteinases is described. Ten consensus variable contact positions in the inhibitor were identified, and the 19 possible variants at each of these positions were expressed. The free energies of interaction of these variants and the wild type were measured. For an additive system, this data set allows the free energy of interaction to be calculated for all possible sequences, subject to some restrictions. The algorithm was extensively tested. It is fast enough that the reactivity of every possible sequence can be predicted. The strongest, the most specific possible, and the least specific inhibitors were designed, and an evolutionary problem was solved.
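The additivity assumption can be sketched as follows: the predicted free energy of interaction for any sequence is the wild-type value plus the sum of the measured single-variant changes at each contact position. This is a toy illustration with three positions and invented numbers, not the authors' implementation; the function name, the residues, and all energy values are hypothetical.

```python
def predict_delta_g(sequence, wild_type, wt_delta_g, ddg):
    """Predict the free energy of interaction under strict additivity.

    ddg[i][residue] is the measured change relative to wild type when
    contact position i carries `residue` (hypothetical values here).
    """
    if len(sequence) != len(wild_type):
        raise ValueError("sequence and wild type must have the same length")
    total = wt_delta_g
    for i, (residue, wt_residue) in enumerate(zip(sequence, wild_type)):
        if residue != wt_residue:
            total += ddg[i][residue]  # each position contributes independently
    return total

# Toy data: three contact positions, energies in kcal/mol (all invented).
wild_type = "LTE"
wt_delta_g = -12.0
ddg = [{"A": 3.0, "K": 1.5}, {"S": 0.5}, {"D": 0.25}]

predicted = predict_delta_g("KSE", wild_type, wt_delta_g, ddg)

# With 20 residues allowed at each of 10 positions, the sequence space is:
n_sequences = 20 ** 10
```

Each prediction is a lookup and a sum over ten positions, which is why scanning the full sequence space is feasible from only the single-variant measurements for each enzyme.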
Abstract:
Chromosome 7q22 has been the focus of many cytogenetic and molecular studies aimed at delineating regions commonly deleted in myeloid leukemias and myelodysplastic syndromes. We have compared a gene-dense, GC-rich sub-region of 7q22 with the orthologous region on mouse chromosome 5. A physical map of 640 kb of genomic DNA from mouse chromosome 5 was derived from a series of overlapping bacterial artificial chromosomes. A 296 kb segment from the physical map, spanning Ache to Tfr2, was compared with 267 kb of human sequence. We identified a conserved linkage of 12 genes including an open reading frame flanked by Ache and Asr2, a novel cation-chloride cotransporter interacting protein Cip1, Ephb4, Zan and Perq1. While some of these genes have been previously described, in each case we present new data derived from our comparative sequence analysis. Adjacent unfinished sequence data from the mouse contain an orthologous block of 10 additional genes including three novel cDNA sequences that we subsequently mapped to human 7q22. Methods for displaying comparative genomic information, including unfinished sequence data, are becoming increasingly important. We supplement our printed comparative analysis with a new, Web-based program called Laj (local alignments with Java). Laj provides interactive access to archived pairwise sequence alignments via the WWW. It displays synchronized views of a dot-plot, a percent identity plot, a nucleotide-level local alignment and a variety of relevant annotations. Our mouse–human comparison can be viewed at http://web.uvic.ca/~bioweb/laj.html. Laj is available at http://bio.cse.psu.edu/, along with online documentation and additional examples of annotated genomic regions.
Abstract:
The IMGT/HLA Database (www.ebi.ac.uk/imgt/hla/) specialises in sequences of polymorphic genes of the HLA system, the human major histocompatibility complex (MHC). The HLA complex is located within the 6p21.3 region on the short arm of human chromosome 6 and contains more than 220 genes of diverse function. Many of the genes encode proteins of the immune system and these include the 21 highly polymorphic HLA genes, which influence the outcome of clinical transplantation and confer susceptibility to a wide range of non-infectious diseases. The database contains sequences for all HLA alleles officially recognised by the WHO Nomenclature Committee for Factors of the HLA System and provides users with online tools and facilities for their retrieval and analysis. These include allele reports, alignment tools and detailed descriptions of the source cells. The online IMGT/HLA submission tool allows both new and confirmatory sequences to be submitted directly to the WHO Nomenclature Committee. The latest version (release 1.7.0 July 2000) contains 1220 HLA alleles derived from over 2700 component sequences from the EMBL/GenBank/DDBJ databases. The HLA database provides a model which will be extended to provide specialist databases for polymorphic MHC genes of other species.