12 results for Attitude alignment
in National Center for Biotechnology Information - NCBI
Abstract:
The distribution of optimal local alignment scores of random sequences plays a vital role in evaluating the statistical significance of sequence alignments. These scores can be well described by an extreme-value distribution. The distribution’s parameters depend upon the scoring system employed and the random letter frequencies; in general they cannot be derived analytically, but must be estimated by curve fitting. For obtaining accurate parameter estimates, a form of the recently described ‘island’ method has several advantages. We describe this method in detail, and use it to investigate the functional dependence of these parameters on finite-length edge effects.
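As a rough illustration of how the extreme-value parameters are used once they have been estimated, the sketch below converts a local alignment score S into an expected count E = K·m·n·exp(−λS) and a P-value 1 − exp(−E), the standard Karlin-Altschul form. The function names and the values of λ and K are placeholders chosen for illustration, not estimates from the abstract. The island method and the finite-length edge-effect correction are not implemented here.

```python
import math


def expected_count(score, m, n, lam, K):
    """Expected number of random local alignments scoring >= score.

    m, n : lengths of the two sequences (assumed already edge-corrected)
    lam  : scale parameter (lambda) of the extreme-value distribution
    K    : search-space scaling parameter
    """
    return K * m * n * math.exp(-lam * score)


def p_value(score, m, n, lam, K):
    """Probability of at least one random alignment scoring >= score."""
    # 1 - exp(-E), computed with expm1 for accuracy when E is tiny.
    return -math.expm1(-expected_count(score, m, n, lam, K))


if __name__ == "__main__":
    # Placeholder parameters; real values must be fitted for the
    # scoring system and letter frequencies actually in use.
    lam, K = 0.267, 0.041
    for s in (30, 50, 70):
        print(s, expected_count(s, 300, 300, lam, K), p_value(s, 300, 300, lam, K))
```

Because λ appears in the exponent, small errors in its estimate change the reported significance by large factors, which is why accurate curve fitting matters.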
Abstract:
BAliBASE is specifically designed to serve as an evaluation resource to address all the problems encountered when aligning complete sequences. The database contains high quality, manually constructed multiple sequence alignments together with detailed annotations. The alignments are all based on three-dimensional structural superpositions, with the exception of the transmembrane sequences. The first release provided sets of reference alignments dealing with the problems of high variability, unequal repartition and large N/C-terminal extensions and internal insertions. Here we describe version 2.0 of the database, which incorporates three new reference sets of alignments containing structural repeats, transmembrane sequences and circular permutations to evaluate the accuracy of detection/prediction and alignment of these complex sequences. BAliBASE can be viewed at the web site http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE2/index.html or can be downloaded from ftp://ftp-igbmc.u-strasbg.fr/pub/BAliBASE2/.
Abstract:
The database reported here is derived using the Combinatorial Extension (CE) algorithm which compares pairs of protein polypeptide chains and provides a list of structurally similar proteins along with their structure alignments. Using CE, structure–structure alignments can provide insights into biological function. When a protein of known function is shown to be structurally similar to a protein of unknown function, a relationship might be inferred; a relationship not necessarily detectable from sequence comparison alone. Establishing structure–structure relationships in this way is of great importance as we enter an era of structural genomics where there is a likelihood of an increasing number of structures with unknown functions being determined. Thus the CE database is an example of a useful tool in the annotation of protein structures of unknown function. Comparisons can be performed on the complete PDB or on a structurally representative subset of proteins. The source protein(s) can be from the PDB (updated monthly) or uploaded by the user. CE provides sequence alignments resulting from structural alignments and Cartesian coordinates for the aligned structures, which may be analyzed using the supplied Compare3D Java applet, or downloaded for further local analysis. Searches can be run from the CE web site, http://cl.sdsc.edu/ce.html, or the database and software downloaded from the site for local use.
Abstract:
PALI (release 1.2) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of homologous protein domains in various families. The data set of homologous protein structures has been derived by consulting the SCOP database (release 1.50) and the data set comprises 604 families of homologous proteins involving 2739 protein domain structures with each family made up of at least two members. Each member in a family has been structurally aligned with every other member in the same family (pairwise alignment) and all the members in the family are also aligned using simultaneous superposition (multiple alignment). The structural alignments are performed largely automatically, with manual interventions especially in the cases of distantly related proteins, using the program STAMP (version 4.2). Every family is also associated with two dendrograms, calculated using PHYLIP (version 3.5), one based on a structural dissimilarity metric defined for every pairwise alignment and the other based on similarity of topologically equivalent residues. These dendrograms enable easy comparison of sequence and structure-based relationships among the members in a family. Structure-based alignments with the details of structural and sequence similarities, superposed coordinate sets and dendrograms can be accessed conveniently using a web interface. The database can be queried for protein pairs with sequence or structural similarities falling within a specified range. Thus PALI forms a useful resource to help in analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains over 653 ‘orphans’ (single member families). Using the web interface involving PSI_BLAST and PHYLIP it is possible to associate the sequence of a new protein with one of the families in PALI and generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. 
The database with the web interfaced search and dendrogram generation tools can be accessed at http://pauling.mbu.iisc.ernet.in/~pali.
Abstract:
STACK is a tool for detection and visualisation of expressed transcript variation in the context of developmental and pathological states. The datasystem organises and reconstructs human transcripts from available public data in the context of expression state. The expression state of a transcript can include developmental state, pathological association, site of expression and isoform of expressed transcript. STACK consensus transcripts are reconstructed from clusters that capture and reflect the growing evidence of transcript diversity. The comprehensive capture of transcript variants is achieved by the use of a novel clustering approach that is tolerant of sub-sequence diversity and does not rely on pairwise alignment. This is in contrast with other gene indexing projects. STACK is generated at least four times a year and represents the exhaustive processing of all publicly available human EST data extracted from GenBank. This processed information can be explored through 15 tissue-specific categories, a disease-related category and a whole-body index and is accessible via WWW at http://www.sanbi.ac.za/Dbases.html. STACK represents a broadly applicable resource, as it is the only reconstructed transcript database for which the tools for its generation are also broadly available (http://www.sanbi.ac.za/CODES).
Abstract:
There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
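The first stage described above, the optimal ungapped score on every diagonal, can be sketched in plain scalar Python as follows. This is only an illustration of what is being computed, not ParAlign's SIMD implementation. The function name and the simple match/mismatch scoring are assumptions, and a real search would use a substitution matrix.

```python
def best_ungapped_scores(query, target, match=1, mismatch=-1):
    """Return {diagonal: best ungapped local score} for two sequences.

    A diagonal is identified by d = j - i, where i indexes the query
    and j indexes the target. On each diagonal the best-scoring
    contiguous run is found with a running maximum that resets at 0.
    """
    best = {}
    for d in range(-(len(query) - 1), len(target)):
        i = max(0, -d)   # first query position on this diagonal
        j = i + d        # matching target position
        running = 0
        top = 0
        while i < len(query) and j < len(target):
            step = match if query[i] == target[j] else mismatch
            running = max(0, running + step)  # restart when the run goes negative
            top = max(top, running)
            i += 1
            j += 1
        best[d] = top
    return best


if __name__ == "__main__":
    scores = best_ungapped_scores("ACGT", "TTACGT")
    print(max(scores, key=scores.get), max(scores.values()))
```

The diagonals are independent of one another, which is what makes this stage suitable for data-parallel execution.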
Abstract:
The cortical microtubule array provides spatial information to the cellulose-synthesizing machinery within the plasma membrane of elongating cells. Until now, data indicated that information is transferred from organized cortical microtubules to the cellulose-synthesizing complex, which results in the deposition of ordered cellulosic walls. How cortical microtubules become aligned is unclear. The literature indicates that biophysical forces, transmitted by the organized cellulose component of the cell wall, provide a spatial cue to orient cortical microtubules. This hypothesis was tested on tobacco (Nicotiana tabacum L.) protoplasts and suspension-cultured cells treated with the cellulose synthesis inhibitor isoxaben. Isoxaben (0.25–2.5 μM) inhibited the synthesis of cellulose microfibrils (detected by staining with 1 μg mL−1 fluorescent dye and polarized birefringence), the cells failed to elongate, and the cortical microtubules failed to become organized. The effects of isoxaben were reversible, and after its removal microtubules reorganized and cells elongated. Isoxaben did not depolymerize microtubules in vivo or inhibit the polymerization of tubulin in vitro. These data are consistent with the hypothesis that cellulose microfibrils, and hence cell elongation, are involved in providing spatial cues for cortical microtubule organization. These results compel us to extend the microtubule/microfibril paradigm to include the bidirectional flow of information.
Abstract:
In this paper, a new way to think about, and to construct, pairwise as well as multiple alignments of DNA and protein sequences is proposed. Rather than forcing alignments to either align single residues or to introduce gaps by defining an alignment as a path running right from the source up to the sink in the associated dot-matrix diagram, we propose to consider alignments as consistent equivalence relations defined on the set of all positions occurring in all sequences under consideration. We also propose constructing alignments from whole segments exhibiting highly significant overall similarity rather than by aligning individual residues. Consequently, we present an alignment algorithm that (i) is based on segment-to-segment comparison instead of the commonly used residue-to-residue comparison and which (ii) avoids the well-known difficulties concerning the choice of appropriate gap penalties: gaps are not treated explicitly, but remain as those parts of the sequences that do not belong to any of the aligned segments. Finally, we discuss the application of our algorithm to two test examples and compare it with commonly used alignment methods. As a first example, we aligned a set of 11 DNA sequences coding for functional helix-loop-helix proteins. Though the sequences show only low overall similarity, our program correctly aligned all of the 11 functional sites, which was a unique result among the methods tested. As a by-product, the reading frames of the sequences were identified. Next, we aligned a set of ribonuclease H proteins and compared our results with alignments produced by other programs as reported by McClure et al. [McClure, M. A., Vasi, T. K. & Fitch, W. M. (1994) Mol. Biol. Evol. 11, 571-592]. Our program was one of the best scoring programs. However, in contrast to other methods, our protein alignments are independent of user-defined parameters.
Abstract:
Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics, and applications of combinatorial methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way toward a new approach to gene recognition that uses previously sequenced genes as a clue for recognition of newly sequenced genes. This paper describes a spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual proteins was 99%. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused either by short (less than 5 amino acids) initial/terminal exons or by alternative splicing. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is nonvertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins at 160 accepted point mutations (PAM) (25% similarity), the correlation between the predicted and actual genes was still as high as 95%.
Abstract:
The solution structures of calicheamicin gamma 1I, its cycloaromatized analog (calicheamicin epsilon), and its aryl tetrasaccharide complexed to a common DNA hairpin duplex have been determined by NMR and distance-refined molecular dynamics computations. Sequence specificity is associated with carbohydrate-DNA recognition that places the aryl tetrasaccharide component of all three ligands in similar orientations in the minor groove at the d(T-C-C-T).d(A-G-G-A) segment. The complementary fit of the ligands and the DNA minor groove binding site creates numerous van der Waals contacts as well as hydrogen bonding interactions. Notable are the iodine and sulfur atoms of calicheamicin that hydrogen bond with the exposed amino proton of the 5'- and 3'-guanines, respectively, of the d(A-G-G-A) segment. The sequence-specific carbohydrate binding orients the enediyne aglycone of calicheamicin gamma 1I such that its C3 and C6 proradical centers are adjacent to the cleavage sites. While the enediyne aglycone of calicheamicin gamma 1I is tilted relative to the helix axis and spans the minor groove, the cycloaromatized aglycone is aligned approximately parallel to the helix axis in the respective complexes. Specific localized conformational perturbations in the DNA have been identified from imino proton complexation shifts and changes in specific sugar pucker patterns on complex formation. The helical parameters for the carbohydrate binding site are comparable with corresponding values in B-DNA fibers while a widening of the groove is observed at the adjacent aglycone binding site.
Abstract:
High-resolution physical maps of the genomes of three Rhodobacter capsulatus strains, derived from ordered cosmid libraries, were aligned. The 1.2-Mb segment of the SB1003 genome studied here is adjacent to a 1-Mb region analyzed previously [Fonstein, M., Nikolskaya, T. & Haselkorn, H. (1995) J. Bacteriol. 177, 2368-2372]. Probes derived from the ordered cosmid set of R. capsulatus SB1003 were used to link cosmids from the St. Louis and 2.3.1 strain libraries. Cosmids selected this way did not merge into a single contig but formed several unlinked groups. EcoRV restriction maps of the ordered cosmids were then constructed using lambda terminase and fused to derive fragments of the chromosomal map. In order to link these fragments, their ends were transcribed to produce secondary probes for hybridization to gridded cosmid libraries of the same strains. This linking reduced the number of subcontigs to three for the St. Louis strain and one for the 2.3.1 strain. Hybridization of the same probes back to the ordered cosmid set of SB1003 positioned the subcontigs on the high-resolution physical map of SB1003. The final alignment of the restriction maps shows numerous large and small translocations in this 1.2-Mb chromosomal region of the three Rhodobacter strains. In addition, the chromosomes of the three strains, whose fine-structure maps can now be compared over 2.2 Mb, are seen to contain regions of 15-80 kb in which restriction sites are highly polymorphic, interspersed among regions in which the positions of restriction sites are highly conserved.
Abstract:
Average hepatic expression (mRNA per cell per gene) of a metallothionein-rat growth hormone (rGH) gene with its natural introns was about 15-fold higher than an intronless version when tested in transgenic mice. We examined the idea that intron removal leads to an alteration in chromatin structure that might be responsible for this effect. Using an in vitro chromatin assembly system, we observed that nucleosomes were aligned in a characteristic ordered array over the gene and promoter when all introns were present. Linker histones were necessary for this alignment to occur. In contrast, nucleosome alignment was perturbed in constructs lacking some or all of the introns. A similar disruption of nucleosome alignment was observed when comparing chromatin from livers of transgenic mice carrying rGH transgenes with or without introns. In vitro, sequences at the 3' end of the rGH gene position nucleosomes and facilitate nucleosome alignment upstream; however, nucleosome alignment does not occur on the approximately 3 kb of downstream flanking rat sequence. These observations suggest that signals present in genomic rGH DNA may serve to establish appropriate nucleosome alignment during development and, possibly, to restore nucleosome alignment to the transcribed region after disruption incurred by the passage of an RNA polymerase molecule, thereby facilitating subsequent rounds of transcription.