41 results for local sequence alignment problem
Abstract:
BAliBASE is specifically designed to serve as an evaluation resource to address all the problems encountered when aligning complete sequences. The database contains high quality, manually constructed multiple sequence alignments together with detailed annotations. The alignments are all based on three-dimensional structural superpositions, with the exception of the transmembrane sequences. The first release provided sets of reference alignments dealing with the problems of high variability, unequal repartition and large N/C-terminal extensions and internal insertions. Here we describe version 2.0 of the database, which incorporates three new reference sets of alignments containing structural repeats, transmembrane sequences and circular permutations to evaluate the accuracy of detection/prediction and alignment of these complex sequences. BAliBASE can be viewed at the web site http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE2/index.html or can be downloaded from ftp://ftp-igbmc.u-strasbg.fr/pub/BAliBASE2/.
Abstract:
PALI (release 1.2) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of homologous protein domains in various families. The data set of homologous protein structures has been derived by consulting the SCOP database (release 1.50) and the data set comprises 604 families of homologous proteins involving 2739 protein domain structures with each family made up of at least two members. Each member in a family has been structurally aligned with every other member in the same family (pairwise alignment) and all the members in the family are also aligned using simultaneous superposition (multiple alignment). The structural alignments are performed largely automatically, with manual interventions especially in the cases of distantly related proteins, using the program STAMP (version 4.2). Every family is also associated with two dendrograms, calculated using PHYLIP (version 3.5), one based on a structural dissimilarity metric defined for every pairwise alignment and the other based on similarity of topologically equivalent residues. These dendrograms enable easy comparison of sequence and structure-based relationships among the members in a family. Structure-based alignments with the details of structural and sequence similarities, superposed coordinate sets and dendrograms can be accessed conveniently using a web interface. The database can be queried for protein pairs with sequence or structural similarities falling within a specified range. Thus PALI forms a useful resource to help in analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains over 653 ‘orphans’ (single member families). Using the web interface involving PSI_BLAST and PHYLIP it is possible to associate the sequence of a new protein with one of the families in PALI and generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. 
The database with the web interfaced search and dendrogram generation tools can be accessed at http://pauling.mbu.iisc.ernet.in/~pali.
Abstract:
The human prion gene contains five copies of a 24 nt repeat that is highly conserved among species. An analysis of folding free energies of the human prion mRNA, in particular in the repeat region, suggested biased codon selection and the presence of RNA patterns. In particular, pseudoknots, similar to the one predicted by Wills in the human prion mRNA, were identified in the repeat region of all prion mRNAs available in GenBank, but not those of birds and the red slider turtle. An alignment of these mRNAs, which share low sequence homology, shows several co-variations that maintain the pseudoknot pattern. The presence of pseudoknots in yeast Sup35p and Rnq1 suggests acquisition in the prokaryotic era. Computer generated three-dimensional structures of the human prion pseudoknot highlight protein and RNA interaction domains, which suggest a possible effect in prion protein translation. The role of pseudoknots in prion diseases is discussed as individuals with extra copies of the 24 nt repeat develop the familial form of Creutzfeldt–Jakob disease.
Abstract:
Here we study the effect of point mutations in proteins on the redistributions of the conformational substates. We show that regardless of the location of a mutation in the protein structure and of its type, the observed movements of the backbone recur largely at the same positions in the structures. Despite the different interactions that are disrupted and formed by the residue substitution, not only are the conformations very similar, but the regions that move are also the same, regardless of their sequential or spatial distance from the mutation. This observation leads us to conclude that, apart from some extreme cases, the details of the interactions are not critically important in determining the protein conformation or in specifying which parts of the protein would be more prone to take on different local conformations in response to changes in the sequence. This finding further illustrates why proteins manifest a robustness toward many mutational events. This nonuniform distribution of the conformer population is consistently observed in a variety of protein structural types. Topology is critically important in determining folding pathways, kinetics, building block cutting, and anatomy trees. Here we show that topology is also very important in determining which regions of the protein structure will respond to sequence changes, regardless of the sequential or spatial location of the mutation.
Abstract:
It has been suggested that delayed DNA replication underlies fragility at common human fragile sites, but specific sequences responsible for expression of these inducible fragile sites have not been identified. One approach to identify such cis-acting sequences within the large nonexonic regions of fragile sites would be to identify conserved functional elements within orthologous fragile sites by interspecies sequence comparison. This study describes a comparison of orthologous fragile regions, the human FRA3B/FHIT and the murine Fra14A2/Fhit locus. We sequenced over 600 kbp of the mouse Fra14A2, covering the region orthologous to the fragile epicenter of FRA3B, and determined the Fhit deletion break points in a mouse kidney cancer cell line (RENCA). The murine Fra14A2 locus, like the human FRA3B, was characterized by a high AT content. Alignment of the two sequences showed that this fragile region was stable in evolution despite its susceptibility to mitotic recombination on inhibition of DNA replication. There were also several unusual highly conserved regions (HCRs). The positions of predicted matrix attachment regions (MARs), possibly related to replication origins, were not conserved. Of known fragile region landmarks, five cancer cell break points, one viral integration site, and one aphidicolin break cluster were located within or near HCRs. Thus, comparison of orthologous fragile regions has identified highly conserved sequences with possible functional roles in maintenance of fragility.
Abstract:
We present a method for discovering conserved sequence motifs from families of aligned protein sequences. The method has been implemented as a computer program called emotif (http://motif.stanford.edu/emotif). Given an aligned set of protein sequences, emotif generates a set of motifs with a wide range of specificities and sensitivities. emotif also can generate motifs that describe possible subfamilies of a protein superfamily. A disjunction of such motifs often can represent the entire superfamily with high specificity and sensitivity. We have used emotif to generate sets of motifs from all 7,000 protein alignments in the blocks and prints databases. The resulting database, called identify (http://motif.stanford.edu/identify), contains more than 50,000 motifs. For each alignment, the database contains several motifs having a probability of matching a false positive that ranges from 10^-10 to 10^-5. Highly specific motifs are well suited for searching entire proteomes, while generating very few false predictions. identify assigns biological functions to 25–30% of all proteins encoded by the Saccharomyces cerevisiae genome and by several bacterial genomes. In particular, identify assigned functions to 172 of the proteins of unknown function in the yeast genome.
Abstract:
We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an all-vs.-all comparison of protein domains [taken here from the Structural Classification of Proteins (scop) database] and then fitting a simple distribution function to the observed scores. By using this distribution, we can attach a statistical significance to each comparison score in the form of a P value, the probability that a better score would occur by chance. As expected, we find that the scores for sequence matching follow an extreme-value distribution. The agreement, moreover, between the P values that we derive from this distribution and those reported by standard programs (e.g., blast and fasta) validates our approach. Structure comparison scores also follow an extreme-value distribution when the statistics are expressed in terms of a structural alignment score (essentially the sum of reciprocated distances between aligned atoms minus gap penalties). We find that the traditional metric of structural similarity, the rms deviation in atom positions after fitting aligned atoms, follows a different distribution of scores and does not perform as well as the structural alignment score. Comparison of the sequence and structure statistics for pairs of proteins known to be related distantly shows that structural comparison is able to detect approximately twice as many distant relationships as sequence comparison at the same error rate. The comparison also indicates that there are very few pairs with significant similarity in terms of sequence but not structure whereas many pairs have significant similarity in terms of structure but not sequence.
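As a rough illustration of the idea in this abstract, the sketch below fits an extreme-value (Gumbel) distribution to a set of background comparison scores and converts a new score into a P value. It is not the authors' procedure: the method-of-moments fit, the function names `fit_gumbel_moments` and `gumbel_p_value`, and the lack of any length normalization are all assumptions made for illustration.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate Gumbel location (mu) and scale (beta) by the method of moments.

    Assumption: the abstract only says a "simple distribution function" is
    fitted; moment matching is used here because it needs no optimizer.
    """
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    beta = std * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def gumbel_p_value(score, mu, beta):
    """Probability that a chance score is at least `score` under the fit."""
    z = (score - mu) / beta
    if z < -700:  # exp(-z) would overflow; the P value is 1 to machine precision
        return 1.0
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for tiny P values
    return -math.expm1(-math.exp(-z))
```

With a fit in hand, `gumbel_p_value(observed_score, mu, beta)` gives the chance probability of a score at least as good as the observed one, so smaller values indicate more significant comparisons.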
Abstract:
Cardiac hypertrophy is associated with altered expression of the components of the cardiac renin-angiotensin system (RAS). While in vitro data suggest that local mechanical stimuli serve as important regulatory modulators of cardiac RAS activity, no in vivo studies have so far corroborated these observations. The aims of this study were to (i) examine the respective influence of local, mechanical versus systemic, soluble factors on the modulation of cardiac RAS gene expression in vivo; (ii) measure gene expression of all known components of the RAS simultaneously; and (iii) establish sequence information and an assay system for the RAS of the dog, one of the most important model organisms in cardiovascular research. We therefore examined a canine model of right ventricular hypertrophy and failure (RVHF) in which the right ventricle (RV) is hemodynamically loaded, the left ventricle (LV) is hemodynamically unloaded, while both are exposed to the same circulating milieu of soluble factors. Using specific competitive PCR assays, we found that RVHF was associated with significant increases in RV mRNA levels of angiotensin converting enzyme and angiotensin II type 2 receptor, and with significant decreases of RV expression of chymase and the angiotensin II type 1 receptor, while RV angiotensinogen and renin remained unchanged. All components remained unchanged in the LV. We conclude that (i) dissociated regional regulation of RAS components in RV and LV indicates modulation by local, mechanical, not soluble, systemic stimuli; (ii) components of the cardiac RAS are independently and differentially regulated; and (iii) opposite changes in the expression of angiotensin converting enzyme and chymase, and of angiotensin II type 1 and angiotensin II type 2 receptors, may indicate different physiological roles of these RAS components in RVHF.
Sequence similarity analysis of Escherichia coli proteins: functional and evolutionary implications.
Abstract:
A computer analysis of 2328 protein sequences comprising about 60% of the Escherichia coli gene products was performed using methods for database screening with individual sequences and alignment blocks. A high fraction of E. coli proteins--86%--shows significant sequence similarity to other proteins in current databases; about 70% show conservation at least at the level of distantly related bacteria, and about 40% contain ancient conserved regions (ACRs) shared with eukaryotic or Archaeal proteins. For > 90% of the E. coli proteins, either functional information or sequence similarity, or both, are available. Forty-six percent of the E. coli proteins belong to 299 clusters of paralogs (intraspecies homologs) defined on the basis of pairwise similarity. Another 10% could be included in 70 superclusters using motif detection methods. The majority of the clusters contain only two to four members. In contrast, nearly 25% of all E. coli proteins belong to the four largest superclusters--namely, permeases, ATPases and GTPases with the conserved "Walker-type" motif, helix-turn-helix regulatory proteins, and NAD(FAD)-binding proteins. We conclude that bacterial protein sequences generally are highly conserved in evolution, with about 50% of all ACR-containing protein families represented among the E. coli gene products. With the current sequence databases and methods of their screening, computer analysis yields useful information on the functions and evolutionary relationships of the vast majority of genes in a bacterial genome. Sequence similarity with E. coli proteins allows the prediction of functions for a number of important eukaryotic genes, including several whose products are implicated in human diseases.
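The abstract does not say how the clusters of paralogs were built from pairwise similarity. One simple possibility, shown here purely as an assumption, is single-linkage grouping: any two proteins connected by a chain of significant pairwise hits end up in the same cluster. The function name `cluster_by_pairwise_similarity` and its inputs are hypothetical.

```python
def cluster_by_pairwise_similarity(proteins, similar_pairs):
    """Group proteins into single-linkage clusters with a union-find structure.

    proteins:      iterable of protein identifiers
    similar_pairs: iterable of (a, b) pairs judged significantly similar
    Returns the clusters with at least two members, each as a sorted list.
    """
    proteins = list(proteins)
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    # Proteins without any similar partner are not reported as clusters.
    return [sorted(members) for members in groups.values() if len(members) > 1]
```

Single linkage tends to chain loosely related families together, so a real analysis would likely add stricter criteria; the sketch only shows how pairwise hits can induce a partition.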
Abstract:
The solution structures of calicheamicin gamma 1I, its cycloaromatized analog (calicheamicin epsilon), and its aryl tetrasaccharide complexed to a common DNA hairpin duplex have been determined by NMR and distance-refined molecular dynamics computations. Sequence specificity is associated with carbohydrate-DNA recognition that places the aryl tetrasaccharide component of all three ligands in similar orientations in the minor groove at the d(T-C-C-T).d(A-G-G-A) segment. The complementary fit of the ligands and the DNA minor groove binding site creates numerous van der Waals contacts as well as hydrogen bonding interactions. Notable are the iodine and sulfur atoms of calicheamicin that hydrogen bond with the exposed amino proton of the 5'- and 3'-guanines, respectively, of the d(A-G-G-A) segment. The sequence-specific carbohydrate binding orients the enediyne aglycone of calicheamicin gamma 1I such that its C3 and C6 proradical centers are adjacent to the cleavage sites. While the enediyne aglycone of calicheamicin gamma 1I is tilted relative to the helix axis and spans the minor groove, the cycloaromatized aglycone is aligned approximately parallel to the helix axis in the respective complexes. Specific localized conformational perturbations in the DNA have been identified from imino proton complexation shifts and changes in specific sugar pucker patterns on complex formation. The helical parameters for the carbohydrate binding site are comparable with corresponding values in B-DNA fibers while a widening of the groove is observed at the adjacent aglycone binding site.
Abstract:
Average hepatic expression (mRNA per cell per gene) of a metallothionein-rat growth hormone (rGH) gene with its natural introns was about 15-fold higher than an intronless version when tested in transgenic mice. We examined the idea that intron removal leads to an alteration in chromatin structure that might be responsible for this effect. Using an in vitro chromatin assembly system, we observed that nucleosomes were aligned in a characteristic ordered array over the gene and promoter when all introns were present. Linker histones were necessary for this alignment to occur. In contrast, nucleosome alignment was perturbed in constructs lacking some or all of the introns. A similar disruption of nucleosome alignment was observed when comparing chromatin from livers of transgenic mice carrying rGH transgenes with or without introns. In vitro, sequences at the 3' end of the rGH gene position nucleosomes and facilitate nucleosome alignment upstream; however, nucleosome alignment does not occur on the approximately 3 kb of downstream flanking rat sequence. These observations suggest that signals present in genomic rGH DNA may serve to establish appropriate nucleosome alignment during development and, possibly, to restore nucleosome alignment to the transcribed region after disruption incurred by the passage of an RNA polymerase molecule, thereby facilitating subsequent rounds of transcription.