195 results for Multiple Sequence Alignment


Relevance:

80.00%

Publisher:

Abstract:

Structural similarity among proteins is reflected in the distribution of hydropathicity along the amino acids in the protein sequence. Similarities in the hydropathy distributions are obvious for homologous proteins within a protein family. They were also observed for proteins with related structures, even when sequence similarities were undetectable. Here we present a novel method that employs the hydropathy distribution in proteins for identification of (sub)families in a set of (homologous) proteins. We represent proteins as points in a generalized hydropathy space, represented by vectors of specifically defined features. The features are derived from the hydropathy of the individual amino acids. Projection of this space onto principal axes reveals groups of proteins with related hydropathy distributions. The groups identified correspond well to families of structurally and functionally related proteins. We found that this method accurately identifies protein families in a set of proteins, or subfamilies in a set of homologous proteins. Our results show that protein families can be identified by the analysis of hydropathy distribution, without the need for sequence alignment. (C) 2005 Wiley-Liss, Inc.
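
The abstract does not say which features are used, so the following is only a minimal sketch of how a hydropathy-based feature vector might be built. It assumes the Kyte-Doolittle hydropathy scale and a sliding-window average. The function names, the window length and the three summary features are hypothetical and are not the authors' definitions. The principal-axis projection step is not shown.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Sliding-window mean hydropathy along a protein sequence.

    Raises KeyError for non-standard residue letters (e.g. 'X').
    Returns an empty list when the sequence is shorter than the window.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if len(values) < window:
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def feature_vector(seq, window=9):
    """Hypothetical summary features: mean, maximum and minimum of the profile."""
    profile = hydropathy_profile(seq, window)
    if not profile:
        raise ValueError("sequence is shorter than the window")
    mean = sum(profile) / len(profile)
    return [mean, max(profile), min(profile)]
```

Each protein then becomes a point in a small feature space. A set of such points could be passed to any standard principal component analysis routine to look for groups.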

Relevance:

80.00%

Publisher:

Abstract:

Background: The residue-wise contact order (RWCO) describes the sequence separations between the residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. Results: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and a root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance, with a CC of 0.57 and an RMSE of 0.79. In addition, combining the secondary structure predicted by PSIPRED was found to significantly improve the prediction performance and yielded the best prediction accuracy, with a CC of 0.60 and an RMSE of 0.78, which is at least comparable to the performance of other existing methods. Conclusion: The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
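
The two evaluation measures quoted in this abstract, the Pearson correlation coefficient and the root mean square error, can be sketched as follows. This is a generic illustration of the standard formulas, not the authors' evaluation code. The function names are assumptions, and any normalization of RWCO values applied in the paper is not reproduced here.

```python
import math


def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between two equal-length value lists."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_o)


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
```

A higher CC and a lower RMSE both indicate better agreement, which is why the abstract reports the two together for each encoding scheme.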

Relevance:

80.00%

Publisher:

Abstract:

Scorpion toxins are important physiological probes for characterizing ion channels. Molecular databases have limited functional annotation of scorpion toxins. Their function can be inferred by searching for conserved motifs in sequence signature databases that are derived statistically but are not necessarily biologically relevant. Mutation studies provide biological information on residues and positions important for structure-function relationship but are not normally used for extraction of binding motifs. 3D structure analyses also aid in the extraction of peptide motifs in which non-contiguous residues are clustered spatially. Here we present new, functionally relevant peptide motifs for ion channels, derived from the analyses of scorpion toxin native and mutant peptides. Copyright (c) 2006 European Peptide Society and John Wiley & Sons, Ltd.

Relevance:

40.00%

Publisher:

Abstract:

Monocrotaline is a pyrrolizidine alkaloid known to cause toxicity in humans and animals. Its mechanism of biological action is still unclear, although DNA crosslinking has been suggested to play a role in its activity. In this study we found that an active metabolite of monocrotaline, dehydromonocrotaline (DHM), alkylates guanines at the N7 position of DNA with a preference for 5'-GG and 5'-GA sequences. In addition, it generates piperidine- and heat-resistant multiple DNA crosslinks, as confirmed by electrophoresis and electron microscopy. On the basis of these findings, we propose that DHM undergoes rapid polymerization to a structure which is able to crosslink several fragments of DNA.

Relevance:

40.00%

Publisher:

Abstract:

Conventionally, protein structure prediction via threading relies on some nonoptimal method to align a protein sequence to each member of a library of known structures. We show how a score function (force field) can be modified so as to allow the direct application of a dynamic programming algorithm to the problem. This involves an approximation whose damage can be minimized by an optimization process during score function parameter determination. The method is compared to sequence-to-structure alignments using a more conventional pair-wise score function and the frozen approximation. The new method produces results comparable to the frozen approximation, but is faster and has fewer adjustable parameters. It is also free of memory of the template's original amino acid sequence, and does not suffer from a problem of nonconvergence, which can be shown to occur with the frozen approximation. Alignments generated by the simplified score function can then be ranked using a second score function with the approximations removed. (C) 1999 John Wiley & Sons, Inc.
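
Dynamic programming applies directly once the score for placing a residue at a template site no longer depends on where the other residues are placed. The sketch below shows a plain global alignment over such a residue-by-site score table. It is a generic Needleman-Wunsch style recursion under assumed conventions (higher scores are better, a single linear gap penalty), not the score function or gap treatment of the paper. The name `align_score` and the input layout are hypothetical.

```python
def align_score(site_scores, gap=-1.0):
    """Best global alignment score of a sequence against a template.

    site_scores[i][j] is the (assumed, precomputed) score for placing
    sequence residue i at template site j. Because each entry depends only
    on the pair (i, j), the optimum can be found by dynamic programming.
    """
    n = len(site_scores)
    m = len(site_scores[0]) if n else 0
    # dp[i][j]: best score aligning the first i residues to the first j sites.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + site_scores[i - 1][j - 1],  # residue i at site j
                dp[i - 1][j] + gap,  # residue i left unaligned
                dp[i][j - 1] + gap,  # site j left empty
            )
    return dp[n][m]
```

With a pair-wise score function the entry for (i, j) would depend on the placement of other residues, which is what prevents this simple recursion from being used without an approximation.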

Relevance:

40.00%

Publisher:

Abstract:

In this study, we propose a novel method to predict the solvent accessible surface areas of transmembrane residues. For both transmembrane alpha-helix and beta-barrel residues, the correlation coefficients between the predicted and observed accessible surface areas are around 0.65. On the basis of predicted accessible surface areas, residues exposed to the lipid environment or buried inside a protein can be identified by using certain cutoff thresholds. We have extensively examined our approach based on different definitions of accessible surface areas and a variety of sets of control parameters. Given that experimentally determining the structures of membrane proteins is very difficult and membrane proteins are actually abundant in nature, our approach is useful for theoretically modeling membrane protein tertiary structures, particularly for modeling the assembly of transmembrane domains. This approach can be used to annotate the membrane proteins in proteomes to provide extra structural and functional information.

Relevance:

30.00%

Publisher:

Abstract:

Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs.
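
The idea that the original sequence information is distributed among the variants can be illustrated with a per-column majority vote over variants that have already been aligned. This is a deliberately simplified stand-in and not the reconstruction procedure used by the authors, which the abstract does not describe. The function name and the assumption of equal-length, pre-aligned reads are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_variants):
    """Recover a target sequence from aligned, randomly mutated variants.

    Each position is called as the most frequent base in that column.
    Random substitutions hit different positions in different variants,
    so the original base usually remains the most common one.
    """
    if not aligned_variants:
        raise ValueError("need at least one variant")
    length = len(aligned_variants[0])
    if any(len(v) != length for v in aligned_variants):
        raise ValueError("variants must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_variants):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)
```

The more variants are sequenced, the less likely it is that a mutated base outvotes the original one at any position.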

Relevance:

30.00%

Publisher:

Abstract:

Analysis of the structure of the urochordate Herdmania curvata ribosomal DNA intergenic spacer (IGS) and its role in transcription initiation and termination suggests that rRNA gene regulation in this chordate differs from that in vertebrates. A cloned H. curvata IGS is 1881 bp and composed predominantly of two classes of similar repeat sequences that largely alternate in a tandem array. Southern blot hybridization demonstrates that the IGS length variation within an individual and population is largely the result of changes in internal repeat number. Nuclease S1 mapping and primer extension analyses suggest that there are two transcription initiation sites at the 3' end of the most 3' repetitive element; these sites are 6 nucleotides apart. Unlike mouse, Xenopus, and Drosophila, there is no evidence of transcription starting elsewhere in the IGS. Most sequence differences between the promoter repeat and the other internal repeats are in the vicinity of the putative initiation sites. As in Drosophila, nuclease S1 mapping of transcription termination sites suggests that there is not a definitive stop site and a majority of the pre-rRNAs read through a substantial portion of the IGS. Some transcription appears to proceed completely through the promoter repeat into the adjacent rDNA unit. Analysis of oocyte RNA by reverse transcription-polymerase chain reaction (RT-PCR) confirms that readthrough transcription into the adjacent rDNA unit is occurring in some small IGS length variants; there is no evidence of complete readthrough of IGSs larger than 1.0 kb.

Relevance:

30.00%

Publisher:

Abstract:

Liver samples from rabbits killed by RHDV, collected from five States in Australia in 1996 and 1997, were analysed by RT-PCR. A 398 bp fragment of the capsid protein (VP60) gene was amplified by PCR and directly sequenced. The alignment of the nucleotide and amino acid sequences and their comparison with the original strain of the virus released in Australia indicated that genetic changes after two years have been small, with 98.2% to 100% identity. The constructed phylogenetic tree suggests slight differences in nucleotide substitutions in various States but there is no clear evidence of clustering of sequences according to their geographic origin. In practical terms, sequencing of viral RNA provides a means of testing the efficacy of further releases and subsequent spread of the virus if such a strategy is employed as a means of enhancing RHD as a biological control of the wild rabbit in Australia.
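
Identity figures such as the 98.2% to 100% range quoted here are computed from aligned sequences. The sketch below shows one common convention, in which columns containing a gap in either sequence are excluded from the comparison. The abstract does not state which convention was used, so this choice and the function name are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are skipped (an assumed
    convention; other definitions divide by the full alignment length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

On a 398 bp fragment with no gaps, for example, 7 substitutions would give 100 * 391 / 398, or roughly 98.2% identity.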

Relevance:

30.00%

Publisher:

Abstract:

Fluorescence in situ hybridization of a tile path of DNA subclones has previously enabled the cytogenetic definition of the minimal DNA sequence which spans the FRA16D common chromosomal fragile site, located at 16q23.2. Homozygous deletion of the FRA16D locus has been reported in adenocarcinomas of stomach, colon, lung and ovary. We have sequenced the 270 kb containing the FRA16D fragile site and the minimal homozygously deleted region in tumour cells. This sequence enabled localization of some of the tumour cell breakpoints to regions which contain AT-rich secondary structures similar to those associated with the FRA10B and FRA16B rare fragile sites. The FRA16D DNA sequence also led to the identification of an alternatively spliced gene, named FOR (fragile site FRA16D oxidoreductase), exons of which span both the fragile site and the minimal region of homozygous deletion. In addition, the complete DNA sequence of the FRA16D-containing FOR intron reveals no evidence of additional authentic transcripts. Alternatively spliced FOR transcripts (FOR I, FOR II and FOR III) encode proteins which share N-terminal WW domains and differ at their C-terminus, with FOR III having a truncated oxidoreductase domain. FRA16D-associated deletions selectively affect the FOR gene transcripts. Three out of five previously mapped translocation breakpoints in multiple myeloma are also located within the FOR gene. FOR is therefore the principal genetic target for DNA instability at 16q23.2 and perturbation of FOR function is likely to contribute to the biological consequences of DNA instability at FRA16D in cancer cells.

Relevance:

30.00%

Publisher:

Abstract:

Within steroid receptor heterocomplexes the large tetratricopeptide repeat-containing immunophilins, cyclophilin 40 (CyP40), FKBP51, and FKBP52, target a common interaction site in heat shock protein 90 (Hsp90) and act coordinately with Hsp90 to modulate receptor activity. The reversible nature of the interaction between the immunophilins and Hsp90 suggests that relative cellular abundance might be a key determinant of the immunophilin component within steroid receptor complexes. To investigate CyP40 gene regulation, we have isolated a fi-kilobase (kb) 5'-flanking region of the human gene and demonstrated that a ~50 base pair (bp) sequence adjacent to the transcription start site is essential for CyP40 basal expression. Three tandemly arranged Ets sites within this critical region were identified as binding elements for the multimeric Ets-related transcription factor, GA binding protein (GABP). Functional studies of this proximal promoter sequence, in combination with mutational analysis, confirmed these sites to be crucial for basal promoter function. Furthermore, overexpression of both GABP alpha and GABP beta subunits in Cos1 cells resulted in increased endogenous CyP40 mRNA levels. Significantly, a parallel increase in FKBP52 mRNA expression was not observed, highlighting an important difference in the mode of regulation of the CyP40 and FKBP52 genes. Our results identify GABP as a key regulator of CyP40 expression. GABP is a common target of mitogen and stress-activated pathways and may integrate these diverse extracellular signals to regulate CyP40 gene expression.

Relevance:

30.00%

Publisher:

Abstract:

The habit of inducing plant galls has evolved multiple times among insects but most species diversity occurs in only a few groups, such as gall midges and gall wasps. This phylogenetic clustering may reflect adaptive radiations in insect groups in which the trait has evolved. Alternatively, multiple independent origins of galling may suggest a selective advantage to the habit. We use DNA sequence data to examine the origins of galling among the most speciose group of gall-inducing scale insects, the eriococcids. We determine that the galling habit has evolved multiple times, including four times in Australian taxa, suggesting that there has been a selective advantage to galling in Australia. Additionally, although most gall-inducing eriococcid species occur on Myrtaceae, we found that lineages feeding on Myrtaceae are no more likely to have evolved the galling habit than those feeding on other plant groups. However, most gall-inducing species-richness is clustered in only two clades (Apiomorpha and Lachnodius + Opisthoscelis), all of which occur exclusively on Eucalyptus s.s. The Eriococcidae and the large genus Eriococcus were determined to be non-monophyletic and each will require revision. (C) 2004 The Linnean Society of London.

Relevance:

30.00%

Publisher:

Abstract:

We have investigated molecular mechanisms of the embryonic development of an ascidian, a primitive chordate which shares features of both invertebrates and vertebrates, with a view to identifying genes involved in development and metamorphosis. We isolated 12 partial cDNA sequences which were expressed in a stage-specific manner using differential display. We report here the isolation of a full-length cDNA sequence for one of these genes which was specifically expressed during the tailbud and larval stages of ascidian development. This cDNA, 1213 bp in length, is predicted to encode a protein of 337 amino acids containing four epidermal growth factor (EGF)-like repeats and three novel cysteine-rich repeats. Characterization of its spatial expression pattern by in situ hybridisation in late tailbud and larval embryos demonstrated strong expression localised throughout the papillae and anteriormost trunk and weaker expression in the epidermis of the remainder of the embryo. As recent evidence indicates that the signal for metamorphosis originates in the anterior trunk region, these results suggest that this gene may have a role in signalling the initiation of metamorphosis. (C) 1997 Wiley-Liss, Inc.