24 resultados para SEQUENCE ALIGNMENT
Resumo:
The database reported here is derived using the Combinatorial Extension (CE) algorithm which compares pairs of protein polypeptide chains and provides a list of structurally similar proteins along with their structure alignments. Using CE, structure–structure alignments can provide insights into biological function. When a protein of known function is shown to be structurally similar to a protein of unknown function, a relationship might be inferred; a relationship not necessarily detectable from sequence comparison alone. Establishing structure–structure relationships in this way is of great importance as we enter an era of structural genomics where there is a likelihood of an increasing number of structures with unknown functions being determined. Thus the CE database is an example of a useful tool in the annotation of protein structures of unknown function. Comparisons can be performed on the complete PDB or on a structurally representative subset of proteins. The source protein(s) can be from the PDB (updated monthly) or uploaded by the user. CE provides sequence alignments resulting from structural alignments and Cartesian coordinates for the aligned structures, which may be analyzed using the supplied Compare3D Java applet, or downloaded for further local analysis. Searches can be run from the CE web site, http://cl.sdsc.edu/ce.html, or the database and software downloaded from the site for local use.
Resumo:
PALI (release 1.2) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of homologous protein domains in various families. The data set of homologous protein structures has been derived by consulting the SCOP database (release 1.50) and the data set comprises 604 families of homologous proteins involving 2739 protein domain structures with each family made up of at least two members. Each member in a family has been structurally aligned with every other member in the same family (pairwise alignment) and all the members in the family are also aligned using simultaneous superposition (multiple alignment). The structural alignments are performed largely automatically, with manual interventions especially in the cases of distantly related proteins, using the program STAMP (version 4.2). Every family is also associated with two dendrograms, calculated using PHYLIP (version 3.5), one based on a structural dissimilarity metric defined for every pairwise alignment and the other based on similarity of topologically equivalent residues. These dendrograms enable easy comparison of sequence and structure-based relationships among the members in a family. Structure-based alignments with the details of structural and sequence similarities, superposed coordinate sets and dendrograms can be accessed conveniently using a web interface. The database can be queried for protein pairs with sequence or structural similarities falling within a specified range. Thus PALI forms a useful resource to help in analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains over 653 ‘orphans’ (single member families). Using the web interface involving PSI_BLAST and PHYLIP it is possible to associate the sequence of a new protein with one of the families in PALI and generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. The database with the web interfaced search and dendrogram generation tools can be accessed at http://pa uling.mbu.iisc.ernet.in/~pali.
Resumo:
The human prion gene contains five copies of a 24 nt repeat that is highly conserved among species. An analysis of folding free energies of the human prion mRNA, in particular in the repeat region, suggested biased codon selection and the presence of RNA patterns. In particular, pseudoknots, similar to the one predicted by Wills in the human prion mRNA, were identified in the repeat region of all available prion mRNAs available in GenBank, but not those of birds and the red slider turtle. An alignment of these mRNAs, which share low sequence homology, shows several co-variations that maintain the pseudoknot pattern. The presence of pseudoknots in yeast Sup35p and Rnq1 suggests acquisition in the prokaryotic era. Computer generated three-dimensional structures of the human prion pseudoknot highlight protein and RNA interaction domains, which suggest a possible effect in prion protein translation. The role of pseudoknots in prion diseases is discussed as individuals with extra copies of the 24 nt repeat develop the familial form of Creutzfeldt–Jakob disease.
Resumo:
It has been suggested that delayed DNA replication underlies fragility at common human fragile sites, but specific sequences responsible for expression of these inducible fragile sites have not been identified. One approach to identify such cis-acting sequences within the large nonexonic regions of fragile sites would be to identify conserved functional elements within orthologous fragile sites by interspecies sequence comparison. This study describes a comparison of orthologous fragile regions, the human FRA3B/FHIT and the murine Fra14A2/Fhit locus. We sequenced over 600 kbp of the mouse Fra14A2, covering the region orthologous to the fragile epicenter of FRA3B, and determined the Fhit deletion break points in a mouse kidney cancer cell line (RENCA). The murine Fra14A2 locus, like the human FRA3B, was characterized by a high AT content. Alignment of the two sequences showed that this fragile region was stable in evolution despite its susceptibility to mitotic recombination on inhibition of DNA replication. There were also several unusual highly conserved regions (HCRs). The positions of predicted matrix attachment regions (MARs), possibly related to replication origins, were not conserved. Of known fragile region landmarks, five cancer cell break points, one viral integration site, and one aphidicolin break cluster were located within or near HCRs. Thus, comparison of orthologous fragile regions has identified highly conserved sequences with possible functional roles in maintenance of fragility.
Resumo:
We present a method for discovering conserved sequence motifs from families of aligned protein sequences. The method has been implemented as a computer program called emotif (http://motif.stanford.edu/emotif). Given an aligned set of protein sequences, emotif generates a set of motifs with a wide range of specificities and sensitivities. emotif also can generate motifs that describe possible subfamilies of a protein superfamily. A disjunction of such motifs often can represent the entire superfamily with high specificity and sensitivity. We have used emotif to generate sets of motifs from all 7,000 protein alignments in the blocks and prints databases. The resulting database, called identify (http://motif.stanford.edu/identify), contains more than 50,000 motifs. For each alignment, the database contains several motifs having a probability of matching a false positive that range from 10−10 to 10−5. Highly specific motifs are well suited for searching entire proteomes, while generating very few false predictions. identify assigns biological functions to 25–30% of all proteins encoded by the Saccharomyces cerevisiae genome and by several bacterial genomes. In particular, identify assigned functions to 172 of proteins of unknown function in the yeast genome.
Resumo:
We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an all-vs.-all comparison of protein domains [taken here from the Structural Classification of Proteins (scop) database] and then fitting a simple distribution function to the observed scores. By using this distribution, we can attach a statistical significance to each comparison score in the form of a P value, the probability that a better score would occur by chance. As expected, we find that the scores for sequence matching follow an extreme-value distribution. The agreement, moreover, between the P values that we derive from this distribution and those reported by standard programs (e.g., blast and fasta validates our approach. Structure comparison scores also follow an extreme-value distribution when the statistics are expressed in terms of a structural alignment score (essentially the sum of reciprocated distances between aligned atoms minus gap penalties). We find that the traditional metric of structural similarity, the rms deviation in atom positions after fitting aligned atoms, follows a different distribution of scores and does not perform as well as the structural alignment score. Comparison of the sequence and structure statistics for pairs of proteins known to be related distantly shows that structural comparison is able to detect approximately twice as many distant relationships as sequence comparison at the same error rate. The comparison also indicates that there are very few pairs with significant similarity in terms of sequence but not structure whereas many pairs have significant similarity in terms of structure but not sequence.
Sequence similarity analysis of Escherichia coli proteins: functional and evolutionary implications.
Resumo:
A computer analysis of 2328 protein sequences comprising about 60% of the Escherichia coli gene products was performed using methods for database screening with individual sequences and alignment blocks. A high fraction of E. coli proteins--86%--shows significant sequence similarity to other proteins in current databases; about 70% show conservation at least at the level of distantly related bacteria, and about 40% contain ancient conserved regions (ACRs) shared with eukaryotic or Archaeal proteins. For > 90% of the E. coli proteins, either functional information or sequence similarity, or both, are available. Forty-six percent of the E. coli proteins belong to 299 clusters of paralogs (intraspecies homologs) defined on the basis of pairwise similarity. Another 10% could be included in 70 superclusters using motif detection methods. The majority of the clusters contain only two to four members. In contrast, nearly 25% of all E. coli proteins belong to the four largest superclusters--namely, permeases, ATPases and GTPases with the conserved "Walker-type" motif, helix-turn-helix regulatory proteins, and NAD(FAD)-binding proteins. We conclude that bacterial protein sequences generally are highly conserved in evolution, with about 50% of all ACR-containing protein families represented among the E. coli gene products. With the current sequence databases and methods of their screening, computer analysis yields useful information on the functions and evolutionary relationships of the vast majority of genes in a bacterial genome. Sequence similarity with E. coli proteins allows the prediction of functions for a number of important eukaryotic genes, including several whose products are implicated in human diseases.
Resumo:
The solution structures of calicheamicin gamma 1I, its cycloaromatized analog (calicheamicin epsilon), and its aryl tetrasaccharide complexed to a common DNA hairpin duplex have been determined by NMR and distance-refined molecular dynamics computations. Sequence specificity is associated with carbohydrate-DNA recognition that places the aryl tetrasaccharide component of all three ligands in similar orientations in the minor groove at the d(T-C-C-T).d(A-G-G-A) segment. The complementary fit of the ligands and the DNA minor groove binding site creates numerous van der Waals contacts as well as hydrogen bonding interactions. Notable are the iodine and sulfur atoms of calicheamicin that hydrogen bond with the exposed amino proton of the 5'- and 3'-guanines, respectively, of the d(A-G-G-A) segment. The sequence-specific carbohydrate binding orients the enediyne aglycone of calicheamicin gamma 1I such that its C3 and C6 proradical centers are adjacent to the cleavage sites. While the enediyne aglycone of calicheamicin gamma 1I is tilted relative to the helix axis and spans the minor groove, the cycloaromatized aglycone is aligned approximately parallel to the helix axis in the respective complexes. Specific localized conformational perturbations in the DNA have been identified from imino proton complexation shifts and changes in specific sugar pucker patterns on complex formation. The helical parameters for the carbohydrate binding site are comparable with corresponding values in B-DNA fibers while a widening of the groove is observed at the adjacent aglycone binding site.
Resumo:
Average hepatic expression (mRNA per cell per gene) of a metallothionein-rat growth hormone (rGH) gene with its natural introns was about 15-fold higher than an intronless version when tested in transgenic mice. We examined the idea that intron removal leads to an alteration in chromatin structure that might be responsible for this effect. Using an in vitro chromatin assembly system, we observed that nucleosomes were aligned in a characteristic ordered array over the gene and promoter when all introns were present. Linker histones were necessary for this alignment to occur. In contrast, nucleosome alignment was perturbed in constructs lacking some or all of the introns. A similar disruption of nucleosome alignment was observed when comparing chromatin from livers of transgenic mice carrying rGH transgenes with or without introns. In vitro, sequences at the 3' end of the rGH gene position nucleosomes and facilitate nucleosome alignment upstream; however, nucleosome alignment does not occur on the approximately 3 kb of downstream flanking rat sequence. These observations suggest that signals present in genomic rGH DNA may serve to establish appropriate nucleosome alignment during development and, possibly, to restore nucleosome alignment to the transcribed region after disruption incurred by the passage of an RNA polymerase molecule, thereby facilitating subsequent rounds of transcription.