971 resultados para SEQUENCE ALIGNMENT
Resumo:
Motivation: Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function, and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-base similarity has performed poorly. Results: Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from “first passage probability distribution” to summarize statistics of ensemble averaged amino acid propensity values. In this paper, we introduce and elaborate this approach.
Resumo:
Background Flower development in kiwifruit (Actinidia spp.) is initiated in the first growing season, when undifferentiated primordia are established in latent shoot buds. These primordia can differentiate into flowers in the second growing season, after the winter dormancy period and upon accumulation of adequate winter chilling. Kiwifruit is an important horticultural crop, yet little is known about the molecular regulation of flower development. Results To study kiwifruit flower development, nine MADS-box genes were identified and functionally characterized. Protein sequence alignment, phenotypes obtained upon overexpression in Arabidopsis and expression patterns suggest that the identified genes are required for floral meristem and floral organ specification. Their role during budbreak and flower development was studied. A spontaneous kiwifruit mutant was utilized to correlate the extended expression domains of these flowering genes with abnormal floral development. Conclusions This study provides a description of flower development in kiwifruit at the molecular level. It has identified markers for flower development, and candidates for manipulation of kiwifruit growth, phase change and time of flowering. The expression in normal and aberrant flowers provided a model for kiwifruit flower development.
Resumo:
Biopanning of phage-displayed random peptide libraries is a powerful technique for identifying peptides that mimic epitopes (mimotopes) for monoclonal antibodies (mAbs). However, peptides derived using polyclonal antisera may represent epitopes for a diverse range of antibodies. Hence following screening of phage libraries with polyclonal antisera, including autoimmune disease sera, a procedure is required to distinguish relevant from irrelevant phagotopes. We therefore applied the multiple sequence alignment algorithm PILEUP together with a matrix for scoring amino acid substitutions based on physicochemical properties to generate guide trees depicting relatedness of selected peptides. A random heptapeptide library was biopanned nine times using no selecting antibodies, immunoglobulin G (IgG) from sera of subjects with autoimmune diseases (primary biliary cirrhosis (PBC) and type 1 diabetes) and three murine ascites fluids that contained mAbs to overlapping epitope(s) on the Ross River Virus envelope protein 2. Peptides randomly sampled from the library were distributed throughout the guide tree of the total set of peptides whilst many of the peptides derived in the absence of selecting antibody aligned to a single cluster. Moreover peptides selected by different sources of IgG aligned to separate clusters, each with a different amino acid motif. These alignments were validated by testing all of the 53 phagotopes derived using IgG from PBC sera for reactivity by capture ELISA with antibodies affinity purified on the E2 subunit of the pyruvate dehydrogenase complex (PDC-E2), the major autoantigen in PBC: only those phagotopes that aligned to PBC-associated clusters were reactive. Hence the multiple sequence alignment procedure discriminates relevant from irrelevant phagotopes and thus a major difficulty with biopanning phage-displayed random peptide libraries with polyclonal antibodies is surmounted.
Resumo:
The discovery of GH (Glycoside Hydrolase) 19 chitinases in Streptomyces sp. raises the possibility of the presence of these proteins in other bacterial species, since they were initially thought to be confined to higher plants. The present study mainly concentrates on the phylogenetic distribution and homology conservation in GH19 family chitinases. Extensive database searches are performed to identify the presence of GH19 family chitinases in the three major super kingdoms of life. Multiple sequence alignment of all the identified GH19 chitinase family members resulted in the identification of globally conserved residues. We further identified conserved sequence motifs across the major sub groups within the family. Estimation of evolutionary distance between the various bacterial and plant chitinases are carried out to better understand the pattern of evolution. Our study also supports the horizontal gene transfer theory, which states that GH19 chitinase genes are transferred from higher plants to bacteria. Further, the present study sheds light on the phylogenetic distribution and identifies unique sequence signatures that define GH19 chitinase family of proteins. The identified motifs could be used as markers to delineate uncharacterized GH19 family chitinases. The estimation of evolutionary distance between chitinase identified in plants and bacteria shows that the flowering plants are more related to chitinase in actinobacteria than that of identified in purple bacteria. We propose a model to elucidate the natural history of GH19 family chitinases.
Resumo:
The increasing number of available protein structures requires efficient tools for multiple structure comparison. Indeed, multiple structural alignments are essential for the analysis of function, evolution and architecture of protein structures. For this purpose, we proposed a new web server called multiple Protein Block Alignment (mulPBA). This server implements a method based on a structural alphabet to describe the backbone conformation of a protein chain in terms of dihedral angles. This sequence-like' representation enables the use of powerful sequence alignment methods for primary structure comparison, followed by an iterative refinement of the structural superposition. This approach yields alignments superior to most of the rigid-body alignment methods and highly comparable with the flexible structure comparison approaches. We implement this method in a web server designed to do multiple structure superimpositions from a set of structures given by the user. Outputs are given as both sequence alignment and superposed 3D structures visualized directly by static images generated by PyMol or through a Jmol applet allowing dynamic interaction. Multiple global quality measures are given. Relatedness between structures is indicated by a distance dendogram. Superimposed structures in PDB format can be also downloaded, and the results are quickly obtained. mulPBA server can be accessed at www.dsimb.inserm.fr/dsimb_tools/mulpba/.
Resumo:
Human La protein is known to be an essential host factor for translation and replication of hepatitis C virus (HCV) RNA. Previously, we have demonstrated that residues responsible for interaction of human La protein with the HCV internal ribosomal entry site (IRES) around the initiator AUG within stem-loop IV form a beta-turn in the RNA recognition motif (RRM) structure. In this study, sequence alignment and mutagenesis suggest that the HCV RNA-interacting beta-turn is conserved only in humans and chimpanzees, the species primarily known to be infected by HCV. A 7-mer peptide corresponding to the HCV RNA-interacting region of human La inhibits HCV translation, whereas another peptide corresponding to the mouse La sequence was unable to do so. Furthermore, IRES-mediated translation was found to be significantly high in the presence of recombinant human La protein in vitro in rabbit reticulocyte lysate. We observed enhanced replication with HCV subgenomic and full-length replicons upon overexpression of either human La protein or a chimeric mouse La protein harboring a human La beta-turn sequence in mouse cells. Taken together, our results raise the possibility of creating an immunocompetent HCV mouse model using human-specific cell entry factors and a humanized form of La protein.
Resumo:
As a basic tool of modern biology, sequence alignment can provide us useful information in fold, function, and active site of protein. For many cases, the increased quality of sequence alignment means a better performance. The motivation of present work is to increase ability of the existing scoring scheme/algorithm by considering residue–residue correlations better. Based on a coarse-grained approach, the hydrophobic force between each pair of residues is written out from protein sequence. It results in the construction of an intramolecular hydrophobic force network that describes the whole residue–residue interactions of each protein molecule, and characterizes protein's biological properties in the hydrophobic aspect. A former work has suggested that such network can characterize the top weighted feature regarding hydrophobicity. Moreover, for each homologous protein of a family, the corresponding network shares some common and representative family characters that eventually govern the conservation of biological properties during protein evolution. In present work, we score such family representative characters of a protein by the deviation of its intramolecular hydrophobic force network from that of background. Such score can assist the existing scoring schemes/algorithms, and boost up the ability of multiple sequences alignment, e.g. achieving a prominent increase (50%) in searching the structurally alike residue segments at a low identity level. As the theoretical basis is different, the present scheme can assist most existing algorithms, and improve their efficiency remarkably.
Resumo:
The structure-based sequence motif of the distant proteins in evolution, protein tyrosine phosphatases (PTP) I and II superfamilies, as an example, has been defined by the structural comparison, structure-based sequence alignment and analyses on substitut
Resumo:
BACKGROUND: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. FINDINGS: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. CONCLUSIONS: BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology.BarraCUDA is currently available from http://seqbarracuda.sf.net.
Resumo:
The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.
Resumo:
Cyathostomins comprise a group of 50 species of parasitic nematodes that infect equids. Ribosomal DNA sequences, in particular the intergenic spacer (IGS) region, have been utilized via several methodologies to identify pre-parasitic stages of the commonest species that affect horses. These methods rely on the availability of accurate sequence information for each species, as well as detailed knowledge of the levels of intra- and inter-specific variation. Here, the IGS DNA region was amplified and sequenced from 10 cyathostomin species for which sequence was not previously available. Also, additional IGS DNA sequences were generated from individual worms of 8 species already studied. Comparative analysis of these sequences revealed a greater range of intra-specific variation than previously reported (up to 23%); whilst the level of inter-specific variation (3-62%) was similar to that identified in earlier studies. The reverse line blot (RLB) method has been used to exploit the cyathostomin IGS DNA region for species identification. Here, we report validation of novel and existing DNA probes for identification of cyathostomins using this method and highlight their application in differentiating life-cycle stages such as third-stage larvae that cannot be identified to species by morphological means.
Resumo:
Motivation: DNA assembly programs classically perform an all-against-all comparison of reads to identify overlaps, followed by a multiple sequence alignment and generation of a consensus sequence. If the aim is to assemble a particular segment, instead of a whole genome or transcriptome, a target-specific assembly is a more sensible approach. GenSeed is a Perl program that implements a seed-driven recursive assembly consisting of cycles comprising a similarity search, read selection and assembly. The iterative process results in a progressive extension of the original seed sequence. GenSeed was tested and validated on many applications, including the reconstruction of nuclear genes or segments, full-length transcripts, and extrachromosomal genomes. The robustness of the method was confirmed through the use of a variety of DNA and protein seeds, including short sequences derived from SAGE and proteome projects.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Sao Paulo State Research Foundation-FAPESP