873 results for Motif Discovery
Abstract:
Understanding the machinery of gene regulation to control gene expression has been one of the main focuses of bioinformaticians for years. We use a multi-objective genetic algorithm to evolve a specialized version of side effect machines for degenerate motif discovery. We compare some suggested objectives for the motifs they find, test different multi-objective scoring schemes and probabilistic models for the background sequence models and report our results on a synthetic dataset and some biological benchmarking suites. We conclude with a comparison of our algorithm with some widely used motif discovery algorithms in the literature and suggest future directions for research in this area.
Abstract:
cERMIT is a computationally efficient motif discovery tool based on analyzing genome-wide quantitative regulatory evidence. Instead of pre-selecting promising candidate sequences, it utilizes information across all sequence regions to search for high-scoring motifs. We apply cERMIT to a range of direct binding and overexpression datasets; it substantially outperforms state-of-the-art approaches on curated ChIP-chip datasets, and easily scales to current mammalian ChIP-seq experiments with data on thousands of non-coding regions.
Abstract:
Background: Regulation of gene expression in Plasmodium falciparum (Pf) remains poorly understood. While over half the genes are estimated to be regulated at the transcriptional level, few regulatory motifs and transcription regulators have been found. Results: The study seeks to identify putative regulatory motifs in the upstream regions of 13 functional groups of genes expressed in the intraerythrocytic developmental cycle of Pf. Three motif-discovery programs were used for the purpose, and motifs were searched for only on the gene coding strand. Four motifs – the 'G-rich', the 'C-rich', the 'TGTG' and the 'CACA' motifs – were identified, and zero to all four of these occur in the 13 sets of upstream regions. The 'CACA motif' was absent in functional groups expressed during the ring to early trophozoite transition. For functional groups expressed in each transition, the motifs tended to be similar. Upstream motifs in some functional groups showed 'positional conservation' by occurring at similar positions relative to the translational start site (TLS); this increases their significance as regulatory motifs. In the ribonucleotide synthesis, mitochondrial, proteasome and organellar translation machinery genes, G-rich, C-rich, CACA and TGTG motifs, respectively, occur with striking positional conservation. In the organellar translation machinery group, G-rich motifs occur close to the TLS. The same motifs were sometimes identified for multiple functional groups; differences in location and abundance of the motifs appear to ensure different modes of action. Conclusion: The identification of positionally conserved over-represented upstream motifs throws light on putative regulatory elements for transcription in Pf.
Abstract:
Transcription factors are specialized proteins that play an important role in various biological processes such as differentiation, the cell cycle and tumorigenesis. They regulate gene transcription by binding to specific DNA sequences (cis-regulatory elements). Identifying these elements is a crucial step in understanding gene regulatory networks. With the advent of high-throughput sequencing technologies, the identification of all functional elements in genomes, including genes and cis-regulatory elements, has advanced considerably. While the number of genes in different species has been estimated, information on the elements that control and orchestrate the regulation of these genes remains poorly defined. ChIP-chip and ChIP-sequencing techniques make it possible to identify all regions of the genome that are bound by a transcription factor of interest. Several computational approaches have been developed to predict the sites bound by transcription factors. These approaches fall into two main categories: enumerative and probabilistic algorithms. However, several studies have shown that these approaches generate high rates of false negatives and false positives, which makes the results difficult to interpret and, consequently, to validate experimentally. In this thesis, we pursued two objectives. The first was to develop a new approach for discovering transcription factor DNA-binding sites (SAMD-ChIP) suited to ChIP-chip and ChIP-sequencing data. Our approach implements a hybrid algorithm that combines the enumerative and probabilistic strategies in order to exploit the strengths of each.
Our approach performed well compared with existing motif discovery tools on simulated datasets and on ChIP-chip and ChIP-sequencing datasets. SAMD-ChIP also has the advantage of exploiting the distribution of transcription factor binding sites around the center of bound regions, restricting prediction to motifs that are enriched within a fixed-length window around the center of these regions. Transcription factors rarely act alone. They often form complexes to interact with DNA and regulate their target genes. These interactions involve transcription factors whose DNA-binding sites are located close to one another or are mediated by chromatin loops. Our second objective was to exploit the spatial proximity of transcription factor binding sites in ChIP-chip and ChIP-sequencing regions to develop an approach for predicting composite motifs (motifs composed of two sites separated by a spacer of fixed size). We tested this module by predicting the co-localization of the two ERE half-sites that form the ERE site bound by the estrogen receptor ERα. This module has been incorporated into our motif discovery tool, SAMD-ChIP.
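The idea of a composite motif, two sites separated by a spacer of fixed size, can be illustrated with a minimal sketch. This is not the SAMD-ChIP algorithm: the function `spacing_counts`, the toy regions and the exact-match search are assumptions made for illustration, and the half-site strings are the commonly cited consensus ERE half-sites (AGGTCA and its reverse complement TGACCT), which the abstract itself does not spell out.

```python
from collections import Counter

def spacing_counts(sequences, left, right, max_gap=10):
    """Count how often `right` follows `left` at each gap size (0..max_gap)."""
    counts = Counter()
    for seq in sequences:
        start = seq.find(left)
        while start != -1:
            after_left = start + len(left)
            for gap in range(max_gap + 1):
                # startswith with an offset is False past the end of seq
                if seq.startswith(right, after_left + gap):
                    counts[gap] += 1
            start = seq.find(left, start + 1)
    return counts

# Hypothetical toy "bound regions" (not real ChIP data).
regions = [
    "TTAGGTCACAGTGACCTAA",  # half-sites separated by 3 bp
    "CCAGGTCATTTTGACCTGG",  # half-sites separated by 3 bp
    "AGGTCAGGGGGTGACCT",    # half-sites separated by 5 bp
]

counts = spacing_counts(regions, "AGGTCA", "TGACCT")
best_gap, best_count = counts.most_common(1)[0]
print(best_gap, best_count)
```

A real tool would score degenerate matches against a background model instead of counting exact string hits; the sketch only shows how a preferred spacing emerges from co-occurrence counts.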
Abstract:
Background: Current methods to find significantly under- and over-represented gene ontology (GO) terms in a set of genes consider the genes as equally probable balls in a bag, as may be appropriate for transcripts in microarray data. However, due to the varying length of genes and intergenic regions, that approach is inappropriate for deciding if any GO terms are correlated with a set of genomic positions. Results: We present an algorithm, GONOME, that can determine which GO terms are significantly associated with a set of genomic positions given a genome annotated with (at least) the starts and ends of genes. We show that certain GO terms may appear to be significantly associated with a set of randomly chosen positions in the human genome if gene lengths are not considered, and that these same terms have been reported as significantly over-represented in a number of recent papers. This apparent over-representation disappears when gene lengths are considered, as GONOME does. For example, we show that, when gene length is taken into account, the term 'development' is not significantly enriched in genes associated with human CpG islands, in contradiction to a previous report. We further demonstrate the efficacy of GONOME by showing that occurrences of the proteasome-associated control element (PACE) upstream activating sequence in the S. cerevisiae genome associate significantly with appropriate GO terms. An extension of this approach yields a whole-genome motif discovery algorithm that allows identification of many other promoter sequences linked to different types of genes, including a large group of previously unknown motifs significantly associated with the terms 'translation' and 'translational elongation'. Conclusion: GONOME is an algorithm that correctly extracts over-represented GO terms from a set of genomic positions. By explicitly considering gene size, GONOME avoids a systematic bias toward GO terms linked to large genes.
Inappropriate use of existing algorithms that do not take gene size into account has led to erroneous or suspect conclusions. Conversely, GONOME may be used to identify new features in genomes that are significantly associated with particular categories of genes.
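The contrast drawn in this abstract, between treating genes as equally probable and weighting them by length, can be made concrete with a small sketch. This is not the GONOME implementation: the toy gene table, the function names and the one-sided binomial test are assumptions made for illustration, and intergenic regions are ignored for simplicity.

```python
from math import comb

# Hypothetical toy annotation: gene name -> (length in bp, carries the GO term?)
GENES = {
    "geneA": (90_000, True),
    "geneB": (5_000, False),
    "geneC": (3_000, False),
    "geneD": (2_000, False),
}

def term_probability(genes, by_length):
    """Probability that one random draw lands on a gene carrying the term."""
    if by_length:
        total = sum(length for length, _ in genes.values())
        hit = sum(length for length, has_term in genes.values() if has_term)
    else:
        total = len(genes)
        hit = sum(1 for _, has_term in genes.values() if has_term)
    return hit / total

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), used as an over-representation p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Suppose 8 of 10 genomic positions fall in genes carrying the term.
p_equal = term_probability(GENES, by_length=False)   # every gene equally likely
p_length = term_probability(GENES, by_length=True)   # genes weighted by length
pval_equal = binomial_tail(8, 10, p_equal)
pval_length = binomial_tail(8, 10, p_length)
print(pval_equal, pval_length)
```

Under the equal-probability null the term looks strongly over-represented, while under the length-weighted null the same counts are unremarkable, because the single annotated gene covers most of the toy genome.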
Abstract:
Cyclotides are a fascinating family of plant-derived peptides characterized by their head-to-tail cyclized backbone and knotted arrangement of three disulfide bonds. This conserved structural architecture, termed the CCK (cyclic cystine knot), is responsible for their exceptional resistance to thermal, chemical and enzymatic degradation. Cyclotides have a variety of biological activities, but their insecticidal activities suggest that their primary function is in plant defence. In the present study, we determined the cyclotide content of the sweet violet Viola odorata, a member of the Violaceae family. We identified 30 cyclotides from the aerial parts and roots of this plant, 13 of which are novel sequences. The new sequences provide information about the natural diversity of cyclotides and the role of particular residues in defining structure and function. As many of the biological activities of cyclotides appear to be associated with membrane interactions, we used haemolytic activity as a marker of bioactivity for a selection of the new cyclotides. The new cyclotides were tested for their ability to resist proteolysis by a range of enzymes and, in common with other cyclotides, were completely resistant to trypsin, pepsin and thermolysin. The results show that while biological activity varies with the sequence, the proteolytic stability of the framework does not, and appears to be an inherent feature of the cyclotide framework. The structure of one of the new cyclotides, cycloviolacin O14, was determined and shown to contain the CCK motif. This study confirms that cyclotides may be regarded as a natural combinatorial template that displays a variety of peptide epitopes most likely targeted to a range of plant pests and pathogens.
Abstract:
Stimulated by the efficacy of copper(I)-catalysed Huisgen-type 1,3-dipolar cycloaddition of terminal alkynes and organic azides to generate 1,4-disubstituted 1,2,3-triazole derivatives, the importance of ‘click’ chemistry in the synthesis of organic and biological molecular systems is ever increasing.[1] The mild reaction conditions have also led to this reaction gaining favour in the construction of interlocked molecular architectures.[2-4] In the majority of cases, however, the triazole group simply serves as a covalent linkage with no function in the resulting organic molecular framework. More recently, a renewed interest has been shown in the transition metal coordination chemistry of triazole ligands.[3, 5, 6] In addition, novel aryl macrocyclic and acyclic triazole-based oligomers have been shown to recognise halide anions via cooperative triazole C5-H···anion hydrogen bonds.[7] In light of this, it is surprising that the potential anion binding affinity of the positively charged triazolium motif has not, with one notable exception,[8] been investigated. With the objective of manipulating the unique topological cavities of mechanically bonded molecules for anion recognition purposes, we have developed general methods of using anions to template the formation of interpenetrated and interlocked structures.[9-13] Herein we report the first examples of exploiting the 1,2,3-triazolium group in the anion-templated formation of pseudorotaxane and rotaxane assemblies. In an unprecedented discovery, the bromide anion is shown to be a superior templating reagent to chloride in the synthesis of a novel triazolium axle-containing [2]rotaxane. Furthermore, the resulting rotaxane interlocked host system exhibits the rare selectivity preference for bromide over chloride...
Abstract:
Dinuclear complexes containing a (μ-oxo)bis(μ-carboxylato)diruthenium(III) core have been prepared by a novel synthetic route using metal-metal bonded diruthenium(II,III) tetracarboxylates as precursors. The complexes have been structurally characterized and they are redox active. The terminal ligands play an important role in tuning the electronic structure of the core. The stability of the core is found to be dependent on the size and π-acidic nature of the terminal ligand cis to the μ-oxo ligand. The chemistry of such tribridged complexes is relatively new. The rapid growth of this chemistry is based on the discovery of similar core structures in several non-heme iron- and manganese-containing metalloproteins. The tribridged core presents a new structural motif in coordination chemistry. The chemistry of diruthenium complexes with a [Ru2(μ-O)(μ-O2CR)2]2+ core has been reviewed.
Abstract:
A combined computational and experimental polymorph search was undertaken to establish the crystal forms of 7-fluoroisatin, a simple molecule with no reported crystal structures, to evaluate the value of crystal structure prediction studies as an aid to solid form discovery. Three polymorphs were found in a manual crystallisation screen, as well as two solvates. Form I (P2₁/c, Z′ = 1), found from the majority of solvent evaporation experiments, corresponded to the most stable form in the computational search of Z′ = 1 structures. Form III (P2₁/a, Z′ = 2) is probably a metastable form, which was only found concomitantly with form I, and has the same dimeric R²₂(8) hydrogen bonding motif as form I and the majority of the computed low energy structures. However, the most thermodynamically stable polymorph, form II (P1, Z′ = 2), has an expanded four-molecule R⁴₄(18) hydrogen bonding motif, which could not have been found within the routine computational study. The computed relative energies of the three forms are not in accord with experimental results. Thus, the experimental finding of three crystalline polymorphs of 7-fluoroisatin illustrates the many challenges for computational screening to be a tool for the experimental crystal engineer, in contrast to the results for an analogous investigation of 5-fluoroisatin.
Abstract:
The cyclotides are a family of small disulfide-rich proteins that have a cyclic peptide backbone and a cystine knot formed by three conserved disulfide bonds. The combination of these two structural motifs contributes to the exceptional chemical, thermal and enzymatic stability of the cyclotides, which retain bioactivity after boiling. They were initially discovered based on native medicine or screening studies associated with some of their various activities, which include uterotonic action, anti-HIV activity, neurotensin antagonism, and cytotoxicity. They are present in plants from the Rubiaceae, Violaceae and Cucurbitaceae families and their natural function in plants appears to be in host defense: they have potent activity against certain insect pests and they also have antimicrobial activity. There are currently around 50 published sequences of cyclotides and their rate of discovery has been increasing over recent years. Ultimately the family may comprise thousands of members. This article describes the background to the discovery of the cyclotides, their structural characterization, chemical synthesis, genetic origin, biological activities and potential applications in the pharmaceutical and agricultural industries. Their unique topological features make them interesting from a protein folding perspective. Because of their highly stable peptide framework they might make useful templates in drug design programs, and their insecticidal activity opens the possibility of applications in crop protection.
Abstract:
This project identified a novel family of six 66-68 residue peptides from the venom of two Australian funnel-web spiders, Hadronyche sp. 20 and H. infensa: Orchid Beach (Hexathelidae: Atracinae), that appear to undergo N- and/or C-terminal post-translational modifications and conform to an ancestral protein fold. These peptides all show significant amino acid sequence homology to atracotoxin-Hvf17 (ACTX-Hvf17), a non-toxic peptide isolated from the venom of H. versuta, and a variety of AVIT family proteins including mamba intestinal toxin 1 (MIT1) and its mammalian and piscine orthologs prokineticin 1 (PK1) and prokineticin 2 (PK2). These AVIT family proteins target prokineticin receptors involved in the sensitization of nociceptors and gastrointestinal smooth muscle activation. Given their sequence homology to MIT1, we have named these spider venom peptides the MIT-like atracotoxin (ACTX) family. Using isolated rat stomach fundus or guinea-pig ileum organ bath preparations we have shown that the prototypical ACTX-Hvf17, at concentrations up to 1 μM, did not stimulate smooth muscle contractility, nor did it inhibit contractions induced by human PK1 (hPK1). The peptide also lacked activity on other isolated smooth muscle preparations including rat aorta. Furthermore, a FLIPR Ca2+ flux assay using HEK293 cells expressing prokineticin receptors showed that ACTX-Hvf17 fails to activate or block hPK1 or hPK2 receptors. Therefore, while the MIT-like ACTX family appears to adopt the ancestral disulfide-directed beta-hairpin protein fold of MIT1, a motif believed to be shared by other AVIT family peptides, variations in the amino acid sequence and surface charge result in a loss of activity on prokineticin receptors. (c) 2005 Elsevier Inc. All rights reserved.
Abstract:
Cyclotides are mini-proteins of 28-37 amino acid residues that have the unusual feature of a head-to-tail cyclic backbone surrounding a cystine knot. This molecular architecture gives the cyclotides heightened resistance to thermal, chemical and enzymatic degradation and has prompted investigations into their use as scaffolds in peptide therapeutics. There are now more than 80 reported cyclotide sequences from plants in the families Rubiaceae, Violaceae and Cucurbitaceae, with a wide variety of biological activities observed. However, potentially limiting the development of cyclotide-based therapeutics is a lack of understanding of the mechanism by which these peptides are cyclized in vivo. Until now, no linear versions of cyclotides have been reported, limiting our understanding of the cyclization mechanism. This study reports the discovery of a naturally occurring linear cyclotide, violacin A, from the plant Viola odorata and discusses the implications for in vivo cyclization of peptides. The elucidation of the cDNA clone of violacin A revealed a point mutation that introduces a stop codon, which inhibits the translation of a key Asn residue that is thought to be required for cyclization. The three-dimensional solution structure of violacin A was determined and found to adopt the cystine knot fold of native cyclotides. Enzymatic stability assays on violacin A indicate that despite an increase in the flexibility of the structure relative to cyclic counterparts, the cystine knot preserves the overall stability of the molecule. (c) 2006 Elsevier Ltd. All rights reserved.
Abstract:
Cyclotides are peptides from plants of the Rubiaceae and Violaceae families that have the unusual characteristic of a macrocyclic backbone. They are further characterized by their incorporation of a cystine knot in which two disulfides, along with the intervening backbone residues, form a ring through which a third disulfide is threaded. The cyclotides have been found in every Violaceae species screened to date but are apparently present in only a few Rubiaceae species. The selective distribution reported so far raises questions about the evolution of the cyclotides within the plant kingdom. In this study, we use a combined bioinformatics and expression analysis approach to elucidate the evolution and distribution of the cyclotides in the plant kingdom and report the discovery of related sequences widespread in the Poaceae family, including crop plants such as rice (Oryza sativa), maize (Zea mays), and wheat (Triticum aestivum), which carry considerable economic and social importance. The presence of cyclotide-like sequences within these plants suggests that the cyclotides may be derived from an ancestral gene of great antiquity. Quantitative RT-PCR was used to show that two of the discovered cyclotide-like genes from rice and barley (Hordeum vulgare) have tissue-specific expression patterns.
Abstract:
The cyclotides are a family of circular proteins with a range of biological activities and potential pharmaceutical and agricultural applications. The biosynthetic mechanism of cyclization is unknown, and the discovery of novel sequences may assist in elucidating it. In the present study, we have isolated a new cyclotide from Oldenlandia affinis, kalata B8, which appears to be a hybrid of the two major subfamilies (Möbius and bracelet) of currently known cyclotides. We have determined the three-dimensional structure of kalata B8 and observed broadening of resonances directly involved in the cystine knot motif, suggesting flexibility in this region despite it being the core structural element of the cyclotides. The cystine knot motif is widespread throughout Nature and inherently stable, making this apparent flexibility a surprising result. Furthermore, there appears to be isomerization of the peptide backbone at an Asp-Gly sequence in the region involved in the cyclization process. Interestingly, such isomerization has been previously characterized in related cyclic knottins from Momordica cochinchinensis that have no sequence similarity to kalata B8 apart from the six conserved cysteine residues and may result from a common mechanism of cyclization. Kalata B8 also provides insight into the structure-activity relationships of cyclotides as it displays anti-HIV activity but lacks haemolytic activity. The 'uncoupling' of these two activities has not previously been observed for the cyclotides and may be related to the unusual hydrophilic nature of the peptide.
Dinoflagellate Genomic Organization and Phylogenetic Marker Discovery Utilizing Deep Sequencing Data
Abstract:
Dinoflagellates possess large genomes in which most genes are present in many copies. This has made studies of their genomic organization and phylogenetics challenging. Recent advances in sequencing technology have made deep sequencing of dinoflagellate transcriptomes feasible. This dissertation investigates the genomic organization of dinoflagellates to better understand the challenges of assembling dinoflagellate transcriptomic and genomic data from short read sequencing methods, and develops new techniques that utilize deep sequencing data to identify orthologous genes across a diverse set of taxa. To better understand the genomic organization of dinoflagellates, a genomic cosmid clone of the tandemly repeated gene Alcohol Dehydrogenase (AHD) was sequenced and analyzed. The organization of this clone was found to be counter to prevailing hypotheses of genomic organization in dinoflagellates. Further, a new non-canonical splicing motif was described that could greatly improve the automated modeling and annotation of genomic data. A custom phylogenetic marker discovery pipeline, incorporating methods that leverage the statistical power of large data sets, was written. A case study on Stramenopiles was undertaken to test its utility in resolving relationships between known groups as well as the phylogenetic affinity of seven unknown taxa. The pipeline generated a set of 373 genes useful as phylogenetic markers that successfully resolved relationships among the major groups of Stramenopiles, and placed all unknown taxa on the tree with strong bootstrap support. This pipeline was then used to discover 668 genes useful as phylogenetic markers in dinoflagellates. Phylogenetic analysis of 58 dinoflagellates, using this set of markers, produced a phylogeny with good support of all branches. The Suessiales were found to be sister to the Peridiniales. The Prorocentrales formed a monophyletic group with the Dinophysiales that was sister to the Gonyaulacales.
The Gymnodiniales were found to be paraphyletic, forming three monophyletic groups. While this pipeline was used to find phylogenetic markers, it will likely also be useful for finding orthologs of interest for other purposes, for the discovery of horizontally transferred genes, and for the separation of sequences in metagenomic data sets.