971 results for SEQUENCE ALIGNMENT
Abstract:
Background: Sensitive remote homology detection and accurate alignments, especially in the midnight zone of sequence similarity, are needed for better function annotation and structural modeling of proteins. An algorithm for HMM-HMM alignment, AlignHUSH, has been developed which is capable of recognizing distantly related domain families. The method uses structural information, in the form of predicted secondary structure probabilities, and hydrophobicity of amino acids to align HMMs of two sets of aligned sequences. The effect of using information from adjoining column(s) has also been investigated and is found to increase the sensitivity of HMM-HMM alignments and remote homology detection. Results: We have assessed the performance of AlignHUSH using known evolutionary relationships available in SCOP. AlignHUSH performs better than the best HMM-HMM alignment methods and is observed to be even more sensitive at higher error rates. Accuracy of the alignments obtained using AlignHUSH has been assessed using the structure-based alignments available in BaliBASE. The alignment length and the alignment quality are found to be appropriate for homology modeling and function annotation. The alignment accuracy is found to be comparable to existing methods for profile-profile alignments. Conclusions: A new method to align HMMs has been developed and is shown to have better sensitivity at error rates of 10% and above when compared to other available programs. The proposed method could effectively aid in obtaining clues to the functions of proteins of yet unknown function. A web-server incorporating the AlignHUSH method is available at http://crick.mbu.iisc.ernet.in/~alignhush/
Abstract:
Structural alignments are the most widely used tools for comparing proteins with low sequence similarity. The main contribution of this paper is to derive various kernels on proteins from structural alignments, which do not use sequence information. Central to the kernels is a novel alignment algorithm which matches substructures of fixed size using spectral graph matching techniques. We derive positive semi-definite kernels which capture the notion of similarity between substructures. Using these as a base, more sophisticated kernels on protein structures are proposed. To empirically evaluate the kernels we used a set of 40% sequence non-redundant structures from 15 different SCOP superfamilies. The kernels, when used with SVMs, show competitive performance with CE, a state-of-the-art structure comparison program.
Abstract:
Establishing functional relationships between multi-domain protein sequences is a non-trivial task. Traditionally, delineating functional assignment and relationships of proteins requires domain assignments as a prerequisite. This process is sensitive to alignment quality and domain definitions. In multi-domain proteins, the quality of alignments is poor for multiple reasons. We report the correspondence between the classification of proteins represented as full-length gene products and their functions. Our approach differs fundamentally from traditional methods in not performing the classification at the level of domains. Our method is based on an alignment-free local matching score (LMS) computation at the amino-acid sequence level followed by hierarchical clustering. As there are no gold standards for full-length protein sequence classification, we resorted to Gene Ontology and domain-architecture based similarity measures to assess our classification. The final clusters obtained using LMS show high functional and domain architectural similarities. Comparison of the current method with alignment-based approaches at both the domain and the full-length protein level showed superiority of the LMS scores. Using this method we have recreated objective relationships among different protein kinase sub-families and also classified immunoglobulin-containing proteins, for which sub-family definitions do not currently exist. This method can be applied to any set of protein sequences and hence will be instrumental in the analysis of large numbers of full-length protein sequences.
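The general shape of an alignment-free score followed by hierarchical clustering can be sketched in a few lines. This is only an illustration: the abstract does not define the LMS computation, so the sketch substitutes a Jaccard distance on shared 3-mers, and the function names, the `k` value, and the `threshold` cut-off are all assumptions.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def distance(a, b, k=3):
    """1 - Jaccard similarity of k-mer sets (a stand-in, NOT the LMS score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    similarity = len(ka & kb) / len(union) if union else 0.0
    return 1.0 - similarity


def cluster(seqs, threshold=0.7, k=3):
    """Average-linkage agglomerative clustering of a {name: sequence} dict.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (a hypothetical cut-off chosen for illustration).
    """
    clusters = [[name] for name in seqs]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(distance(seqs[x], seqs[y], k) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: j > i, so index i is unaffected
    return [sorted(c) for c in clusters]
```

Calling `cluster({"a": "MKVLAAGIVG", "b": "MKVLAAGIVA", "c": "QWERTYPSDF"})` groups the two near-identical toy sequences and leaves the unrelated one on its own. No alignment is computed at any point, which is the property the abstract emphasises for full-length multi-domain proteins.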
Abstract:
The complete genome of spring viraemia of carp virus (SVCV) strain A-1 isolated from cultured common carp (Cyprinus carpio) in China was sequenced and characterized. Reverse transcription-polymerase chain reaction (RT-PCR) derived clones were constructed and the DNA was sequenced. The results showed that the entire genome of SVCV A-1 consists of 11,100 nucleotide base pairs, the predicted size of the viral RNA of rhabdoviruses. However, the additional insertions in bp 4633-4676 and bp 4684-4724 of SVCV A-1 were different from the other two published SVCV complete genomes. Five open reading frames (ORFs) of SVCV A-1 were identified and further confirmed by RT-PCR and DNA sequencing of their respective RT-PCR products. The five structural proteins encoded by the viral RNA were ordered 3'-N-P-M-G-L-5'. This is the first report of a complete genome sequence of SVCV isolated from cultured carp in China. Phylogenetic analysis indicates that SVCV A-1 is closely related to the members of the genus Vesiculovirus, family Rhabdoviridae.
Abstract:
Cyprinidae is the largest fish family in the world and contains about 210 genera and 2010 species. Appropriate DNA markers must be selected for the phylogenetic analyses of Cyprinidae. In the present study, the 1st intron of the S7 ribosomal protein (r-protein) gene is used for the first time to examine the relationships among cyprinid fishes. The length of the 1st intron obtained by PCR amplification ranges from 655 to 859 bp in the 16 cyprinid species investigated, and is 602 bp in Myxocyprinus asiaticus. Of the 925 aligned nucleotide sites obtained, 499 are parsimony informative, accounting for 54% of the total sites. The results indicate that the 1st intron sequences of the S7 r-protein gene in cyprinids are rich in informative sites and vary remarkably in sequence divergence, from 2.3% between close species to 66.6% between distant species. The bootstrap values of the interior nodes in the NJ (neighbor-joining) and MP (most-parsimony) trees based on the present S7 r-protein gene data are higher than those based on cytochrome b and the d-loop region respectively. Therefore, the 1st intron sequences of the S7 r-protein gene in cyprinids are sensitive enough for phylogenetic analyses, and the 1st intron is an appropriate genetic marker for the phylogenetic reconstruction of the taxa in different cyprinid subfamilies. However, whether the present S7 r-protein gene data can be applied to the phylogeny of taxa at the level of the family or higher categories in Cypriniformes needs further study.
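The count of parsimony-informative sites can be reproduced from any alignment with the standard definition: a column is informative when at least two different character states each occur in at least two sequences. The sketch below applies that definition to a toy alignment; the function names and the decision to ignore gaps and ambiguous characters (`-`, `?`, `N`) are assumptions, not details taken from the study.

```python
from collections import Counter


def is_parsimony_informative(column, ignore="-?N"):
    """True if at least two states each appear in at least two sequences."""
    counts = Counter(c for c in column.upper() if c not in ignore)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative_sites(alignment):
    """Count informative columns in a list of equal-length aligned sequences."""
    return sum(is_parsimony_informative("".join(col)) for col in zip(*alignment))


# Toy alignment (hypothetical data): columns 2, 4 and 5 are informative.
toy = ["ACGTA", "ACGTC", "ATGAC", "ATGAA"]
informative = count_informative_sites(toy)

# The proportion quoted in the abstract: 499 of 925 sites.
reported_percent = round(100 * 499 / 925)
```

With the figures from the abstract, 499 / 925 rounds to the reported 54%.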
Abstract:
Growth hormone (GH), prolactin (PRL) and somatolactin (SL) were purified simultaneously under alkaline conditions (pH 9.0) from pituitary glands of sea perch (Lateolabrax japonicus) by a two-step procedure involving gel filtration on Sephadex G-100 and reverse-phase high-performance liquid chromatography (rpHPLC). At each step of purification, fractions were monitored by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and by immunoblotting with chum salmon GH, PRL and SL antisera. The yields of sea perch GH, PRL and SL were 4.2, 1.0 and 0.28 mg/g wet tissue, respectively. Molecular weights of 19,200 and 20,370 Da were estimated by SDS-PAGE for sea perch GH and PRL, respectively. Two forms of sea perch SL were found: one (28,400 Da) is probably glycosylated, while the other (23,200 Da) is believed to be deglycosylated. GH bioactivity was examined by an in vivo assay. Intraperitoneal injection of sea perch GH at doses of 0.01 and 0.1 μg/g body weight at 7-day intervals resulted in a significant increase in body weight and length of juvenile rainbow trout. The complete sea-perch GH amino acid sequence of 187 residues was determined by sequencing fragments cleaved by chemicals and enzymes. Alignment of sea-perch GH with other fish GHs revealed that sea-perch GH is most similar to those of advanced marine fish, such as tuna, gilthead sea bream, yellowfin porgy, red sea bream, bonito and yellow tail, with 98.4%, 96.2%, 95.7%, 95.2%, 94.1% and 91% sequence identity, respectively. Sea-perch GH has low identity to Atlantic cod (76.5%), hardtail (73.3%), flounder (68.4%), chum salmon (66.3%), carp (54%) and blue shark (38%). A partial amino-acid sequence of 127 residues of sea-perch PRL and the N-terminal 16-residue sequence of sea-perch SL have been determined. The data show that sea-perch PRL has a slightly higher sequence identity with tilapia PRL (73.2%) than with chum salmon PRL (70%) in this 127 amino-acid sequence. © 2001 Elsevier Science B.V. All rights reserved.
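Percent identity values like those quoted above are computed from a pairwise alignment by counting identical positions. The sketch below shows one common convention, assumed here because the abstract does not state its denominator: identical residues divided by the number of columns in which neither sequence has a gap. The function name and the toy sequences are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Assumption: columns with a gap in either sequence are excluded from
    both the numerator and the denominator. Other conventions (e.g.
    dividing by the shorter sequence length) give different values.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Toy aligned pair (hypothetical data): 5 matches over 6 gap-free columns.
identity = percent_identity("ACDE-FG", "ACDQWFG")
```

Because the choice of denominator changes the result, identity figures from different studies are only comparable when the same convention is used.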
Abstract:
The bay scallop, Argopecten irradians irradians, introduced from North America, has become one of the most important aquaculture species in China. In an effort to identify scallop genes involved in host defense, a high-quality cDNA library was constructed from whole body tissues of the bay scallop. A total of 5828 successful sequencing reactions yielded 4995 expressed sequence tags (ESTs) longer than 100 bp. Cluster and assembly analyses of the ESTs identified 637 contigs (consisting of 2853 sequences) and 2142 singletons, totaling 2779 unique sequences. Basic Local Alignment Search Tool (BLAST) analysis showed that the majority (73%) of the unique sequences had no significant homology (E-value >= 0.005) to sequences in GenBank. Among the 748 sequences with significant GenBank matches, 160 (21.4%) were for genes related to metabolism, 131 (17.5%) for cell/organism defense, 124 (16.6%) for gene/protein expression, 83 (11.1%) for cell structure/motility, 70 (9.4%) for cell signaling/communication, 17 (2.3%) for cell division, and 163 (21.8%) matched genes of unknown function. The list of host-defense genes included many genes with known and important roles in innate defense such as lectins, defensins, proteases, protease inhibitors, heat shock proteins, antioxidants, and Toll-like receptors. The study provides a significant number of ESTs for gene discovery and candidate genes for studying host defense in scallops and other molluscs.
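The counts in this abstract are internally consistent, which a short script can confirm. The numbers below are copied from the abstract; the variable names and shortened category labels are only illustrative.

```python
# Assembly figures quoted in the abstract.
contigs, contig_reads, singletons = 637, 2853, 2142
unique_sequences = contigs + singletons   # 2779 unique sequences
total_ests = contig_reads + singletons    # 4995 ESTs longer than 100 bp

# Functional categories of the sequences with significant GenBank matches.
matched = {
    "metabolism": 160,
    "cell/organism defense": 131,
    "gene/protein expression": 124,
    "cell structure/motility": 83,
    "cell signaling/communication": 70,
    "cell division": 17,
    "unknown function": 163,
}
matched_total = sum(matched.values())     # 748

# Share of unique sequences without a significant match (reported as 73%).
no_match_percent = round(100 * (unique_sequences - matched_total) / unique_sequences)

# Per-category shares of the matched sequences, to one decimal place.
shares = {name: round(100 * n / matched_total, 1) for name, n in matched.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

The category counts sum to 748, and each computed share agrees with the percentage given in the abstract.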
Abstract:
Mark Pagel, Andrew Meade (2004). A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Systematic Biology, 53(4), 571-581. RAE2008
Abstract:
It has been widely thought that measuring the misalignment angle between the orbital plane of a transiting exoplanet and the spin of its host star was a good discriminator between different migration processes for hot-Jupiters. Specifically, well-aligned hot-Jupiter systems (as measured by the Rossiter-McLaughlin effect) were thought to have formed via migration through interaction with a viscous disc, while misaligned systems were thought to have undergone a more violent dynamical history. These conclusions were based on the assumption that the planet-forming disc was well-aligned with the host star. Recent work by Lai et al. has challenged this assumption, proposing that the star-disc interaction in the pre-main-sequence phase can exert a torque on the star and change its rotation axis angle. We have estimated the stellar rotation axis of a sample of stars which host spatially resolved debris discs. Comparison of our derived stellar rotation axis inclination angles with the geometrically measured debris-disc inclinations shows no evidence for a misalignment between the two.
Abstract:
It has been widely thought that measuring the misalignment angle between the orbital plane of a transiting exoplanet and the spin of its host star was a good discriminator between different migration processes for hot-Jupiters. Specifically, well-aligned hot-Jupiter systems (as measured by the Rossiter-McLaughlin effect) were thought to have formed via migration through interaction with a viscous disc, while misaligned systems were thought to have undergone a more violent dynamical history. These conclusions were based on the assumption that the planet-forming disc was well-aligned with the host star. Recent work by a number of authors has challenged this assumption by proposing mechanisms that act to drive the star-disc interaction out of alignment during the pre-main-sequence phase. We have estimated the stellar rotation axis of a sample of stars which host spatially resolved debris discs. Comparison of our derived stellar rotation axis inclination angles with the geometrically measured debris-disc inclinations shows no evidence for a misalignment between the two.
Abstract:
A new information-theoretic approach is presented for finding the pose of an object in an image. The technique does not require information about the surface properties of the object, besides its shape, and is robust with respect to variations of illumination. In our derivation, few assumptions are made about the nature of the imaging process. As a result the algorithms are quite general and can foreseeably be used in a wide variety of imaging situations. Experiments are presented that demonstrate the approach registering magnetic resonance (MR) images with computed tomography (CT) images, aligning a complex 3D object model to real scenes including clutter and occlusion, tracking a human head in a video sequence and aligning a view-based 2D object model to real images. The method is based on a formulation of the mutual information between the model and the image called EMMA. As applied here the technique is intensity-based, rather than feature-based. It works well in domains where edge or gradient-magnitude based methods have difficulty, yet it is more robust than traditional correlation. Additionally, it has an efficient implementation that is based on stochastic approximation. Finally, we will describe a number of additional real-world applications that can be solved efficiently and reliably using EMMA. EMMA can be used in machine learning to find maximally informative projections of high-dimensional data. EMMA can also be used to detect and correct corruption in magnetic resonance images (MRI).
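To make the alignment criterion concrete, the sketch below estimates the mutual information between two sets of corresponding pixel intensities from a joint histogram. This is a simplified stand-in and not EMMA itself: the abstract describes an implementation based on stochastic approximation, whereas the histogram estimator, the function name, and the `bins`, `lo`, `hi` parameters here are assumptions chosen for brevity.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=8, lo=0, hi=256):
    """Mutual information (in bits) of two equal-length intensity lists.

    Uses a plain joint-histogram estimate, which is an assumption of this
    sketch and differs from the estimator described in the abstract.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    width = (hi - lo) / bins

    def bin_of(value):
        # Clamp so that value == hi falls into the last bin.
        return min(bins - 1, int((value - lo) / width))

    bx = [bin_of(v) for v in xs]
    by = [bin_of(v) for v in ys]
    n = len(xs)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # Sum of p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over occupied bins.
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


# An intensity-inverted copy is still fully predictable from the original,
# so it scores 1 bit here; an unrelated pattern scores 0 bits.
inverted = mutual_information([0, 0, 255, 255], [255, 255, 0, 0])
unrelated = mutual_information([0, 0, 255, 255], [0, 255, 0, 255])
```

The inverted-copy case illustrates why a mutual-information criterion can register images whose intensities differ systematically, such as MR against CT, where direct correlation of intensities would fail. A registration loop would evaluate this score over candidate poses and keep the pose that maximizes it.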
Abstract:
We describe a general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites "pattern-heterogeneity" to distinguish it from both a homogeneous process of evolution and from one characterized principally by differences in rates of evolution. We present studies to show that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning in both these data sets. We implement the mixture model such that it can simultaneously detect rate- and pattern-heterogeneity. The model simplifies to a homogeneous model or a rate-variability model as special cases, and therefore always performs at least as well as these two approaches, and often considerably improves upon them. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program.
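The way a mixture model combines several evolutionary patterns without partitioning the data can be illustrated with the site-wise likelihood: each site's likelihood is a weighted sum over the candidate patterns, and the alignment likelihood is the product over sites. The sketch below assumes the per-site, per-pattern likelihoods have already been computed on a fixed tree; the function name and the toy numbers are hypothetical.

```python
import math


def mixture_log_likelihood(site_likelihoods, weights):
    """Log-likelihood of an alignment under a mixture of patterns.

    site_likelihoods[i][j] is the likelihood of site i under pattern j
    (assumed precomputed on a fixed tree); weights[j] is the mixture
    weight of pattern j, and the weights must sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for per_pattern in site_likelihoods:
        # Every site is summed over all patterns; no site is assigned
        # to a pattern in advance.
        total += math.log(sum(w * p for w, p in zip(weights, per_pattern)))
    return total


# Toy values (hypothetical): two sites, two patterns.
toy = [[0.20, 0.10], [0.05, 0.40]]
mixed = mixture_log_likelihood(toy, [0.5, 0.5])
single = mixture_log_likelihood(toy, [1.0, 0.0])  # reduces to pattern 1 alone
```

Setting one weight to 1 recovers the single-pattern likelihood, which is the sense in which the homogeneous model is a special case of the mixture.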
Abstract:
The alignment of the model amyloid peptide YYKLVFFC is investigated in bulk and at a solid surface using a range of spectroscopic methods employing polarized radiation. The peptide is based on a core sequence of the amyloid beta (Aβ) peptide, KLVFF. The attached tyrosine and cysteine units are exploited to yield information on alignment and possible formation of disulfide or dityrosine links. Polarized Raman spectroscopy on aligned stalks provides information on tyrosine orientation, which complements data from linear dichroism (LD) on aqueous solutions subjected to shear in a Couette cell. LD provides a detailed picture of alignment of peptide strands and aromatic residues and was also used to probe the kinetics of self-assembly. This suggests initial association of phenylalanine residues, followed by subsequent registry of strands and orientation of tyrosine residues. X-ray diffraction (XRD) data from aligned stalks are used to extract orientational order parameters from the 0.48 nm reflection in the cross-beta pattern, from which an orientational distribution function is obtained. X-ray diffraction on solutions subject to capillary flow confirmed orientation in situ at the level of the cross-beta pattern. The information on fibril and tyrosine orientation from polarized Raman spectroscopy is compared with results from NEXAFS experiments on samples prepared as films on silicon. This indicates fibrils are aligned parallel to the surface, with phenyl ring normals perpendicular to the surface. Possible disulfide bridging leading to peptide dimer formation was excluded by Raman spectroscopy, whereas dityrosine formation was probed by fluorescence experiments and was found not to occur except under alkaline conditions. Congo red binding was found not to influence the cross-beta XRD pattern.
Abstract:
The self-assembly and hydrogelation properties of two Fmoc-tripeptides [Fmoc = N-(fluorenyl-9-methoxycarbonyl)] are investigated in borate buffer and other basic solutions. A remarkable difference in self-assembly properties is observed comparing Fmoc-VLK(Boc) with Fmoc-K(Boc)LV, both containing K protected by Nε-tert-butyloxycarbonate (Boc). In borate buffer, the former peptide forms highly anisotropic fibrils which show local alignment, and the hydrogels show flow-aligning properties. In contrast, Fmoc-K(Boc)LV forms highly branched fibrils that produce isotropic hydrogels with a much higher modulus (G' > 10^4 Pa) and a lower concentration for hydrogel formation. The distinct self-assembled structures are ascribed to conformational differences, as revealed by secondary structure probes (CD, FTIR, Raman spectroscopy) and X-ray diffraction. Fmoc-VLK(Boc) forms well-defined beta-sheets with a cross-beta X-ray diffraction pattern, whereas Fmoc-K(Boc)LV forms unoriented assemblies with multiple stacked sheets. Interchange of the K and V residues when inverting the tripeptide sequence thus leads to substantial differences in self-assembled structures, suggesting a promising approach to control hydrogel properties.
Abstract:
The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (±20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed.
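The evaluation criterion quoted above, correct domain number with boundaries within ±20 residues, can be sketched as follows. The abstract gives only the tolerance, so the pairing of predicted and observed boundaries in sorted order, and the function names, are assumptions of this sketch.

```python
def boundaries_match(predicted, observed, tolerance=20):
    """True if the domain count agrees and every boundary is within tolerance.

    Boundaries are residue positions. Predicted and observed boundaries are
    paired in sorted order, which is an assumption of this sketch.
    """
    if len(predicted) != len(observed):
        return False  # wrong number of domains
    return all(abs(p - o) <= tolerance
               for p, o in zip(sorted(predicted), sorted(observed)))


def success_rate(predictions, references, tolerance=20):
    """Percentage of chains whose top prediction matches the reference."""
    hits = sum(boundaries_match(p, r, tolerance)
               for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Toy data (hypothetical): three chains, two predicted correctly.
rate = success_rate(
    predictions=[[118], [100], [90, 260]],
    references=[[130], [130], [100, 250]],
)
```

A stricter or looser `tolerance` changes the reported rate directly, so figures from different assessments are comparable only when the same window is used.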