980 results for regulatory networks
Abstract:
Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated and moved toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereby enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation concerned the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. 
Our study of E. coli σ70 promoters found support at the 0.1 significance level for our hypothesis that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to σ70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E. coli transcription, we discovered a number of potentially useful features, some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption that promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests for E. coli σ70 promoters returned a p-value of 0.072, which at the 0.1 significance level suggested support for our (alternative) hypothesis, although this trend may only be present for promoters where the corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel. 
Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter prediction [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in 'moderately' conserved transcription factor binding sites, as represented by our E. coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1%, but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific differences, especially between pathogenic and non-pathogenic strains. Such differences were made clear through interactive visualisations using the TRNDiff software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled 'regulatory trees', inspired by the phylogenetic tree concept. 
Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While the common phylogenetic trees convey information regarding changes in gene repertoire, which we might regard as analogous to 'hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to 'software'. In this context, we explored the 'pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks is a more comprehensive survey of the relationships and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes, the 'core-regulatory-set', and interactions found only in a subset of the genomes explored, the 'sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied; this core set potentially identifies basic regulatory processes essential for survival. Species-level differences are seen at the sub-regulatory-set level; for example, the known virulence factors YbtA and PchR were found in Y. pestis and P. aeruginosa respectively, but were not present in either E. coli or B. subtilis. Such factors, and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogen-specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. 
We identified a set of promising feature attributes, demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques, which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.
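The spectrum kernel named in the abstract above compares two sequences by counting the k-mers they share. The following is a minimal sketch of that idea, not the thesis implementation: the value k = 3, the function names, and the two 16-nt sequences are illustrative assumptions and are not taken from the thesis data.

```python
from collections import Counter


def spectrum_features(seq: str, k: int) -> Counter:
    """Count every contiguous k-mer in seq (the k-spectrum feature map)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """Inner product of the two k-mer count vectors."""
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    # Counter returns 0 for k-mers absent from b, so only shared k-mers contribute.
    return sum(count * fb[kmer] for kmer, count in fa.items())


# Hypothetical sequences, made up for illustration only.
site_a = "TGTGATCTAGATCACA"
site_b = "TGTGACGTAGATCACT"
print(spectrum_kernel(site_a, site_b, k=3))
```

A kernel value computed this way could be passed to any SVM implementation that accepts a precomputed Gram matrix; which SVM package the thesis used is not stated in the abstract.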
Abstract:
The transcriptional regulation of gene expression is orchestrated by complex networks of interacting genes. Increasing evidence indicates that these 'transcriptional regulatory networks' (TRNs) in bacteria have an inherently hierarchical architecture, although the design principles and the specific advantages offered by this type of organization have not yet been fully elucidated. In this study, we focussed on the hierarchical structure of the TRN of the gram-positive bacterium Bacillus subtilis and performed a comparative analysis with the TRN of the gram-negative bacterium Escherichia coli. Using a graph-theoretic approach, we organized the transcription factors (TFs) and sigma-factors in the TRNs of B. subtilis and E. coli into three hierarchical levels (Top, Middle and Bottom) and studied several structural and functional properties across them. In addition to many similarities, we also found specific differences, the majority of which can be explained by variations in the distribution of σ-factors across the hierarchical levels in the two organisms. We then investigated the control of target metabolic genes by transcriptional regulators to characterize the differential regulation of three distinct metabolic subsystems (catabolism, anabolism and central energy metabolism). These results suggest that the hierarchical architecture that we observed in B. subtilis represents an effective organization of its TRN to achieve flexibility in response to a wide range of diverse stimuli.
Abstract:
Cells exhibit a diverse repertoire of dynamic behaviors. These dynamic functions are implemented by circuits of interacting biomolecules. Although these regulatory networks function deterministically by executing specific programs in response to extracellular signals, molecular interactions are inherently governed by stochastic fluctuations. This molecular noise can manifest as cell-to-cell phenotypic heterogeneity in a well-mixed environment. Single-cell variability may seem like a design flaw, but the coexistence of diverse phenotypes in an isogenic population of cells can also serve a biological function by increasing the probability of survival of individual cells upon an abrupt change in environmental conditions. Decades of extensive molecular and biochemical characterization have revealed the connectivity and mechanisms that constitute regulatory networks. We are now confronted with the challenge of integrating this information to link the structure of these circuits to systems-level properties such as cellular decision-making. To investigate cellular decision-making, we used the well-studied galactose gene-regulatory network in Saccharomyces cerevisiae. We analyzed the mechanism and dynamics of the coexistence of two stable on and off states for pathway activity. We demonstrate that this bimodality in the pathway activity originates from two positive feedback loops that trigger bistability in the network. By measuring the dynamics of single cells in a mixed sugar environment, we observe that the bimodality in gene expression is a transient phenomenon. Our experiments indicate that early pathway activation in a cohort of cells prior to galactose metabolism can accelerate galactose consumption and provide a transient increase in growth rate. Together, these results provide important insights into strategies implemented by cells that may have been evolutionarily advantageous in competitive environments.
Abstract:
Reconstruction of biochemical reaction networks (BRN) and genetic regulatory networks (GRN) in particular is a central topic in systems biology which raises crucial theoretical challenges in system identification. Nonlinear Ordinary Differential Equations (ODEs) that involve polynomial and rational functions are typically used to model biochemical reaction networks. Such nonlinear models make the problem of determining the connectivity of biochemical networks from time-series experimental data quite difficult. In this paper, we present a network reconstruction algorithm that can deal with ODE model descriptions containing polynomial and rational functions. Rather than identifying the parameters of linear or nonlinear ODEs characterised by pre-defined equation structures, our methodology allows us to determine the nonlinear ODEs structure together with their associated parameters. To solve the network reconstruction problem, we cast it as a compressive sensing (CS) problem and use sparse Bayesian learning (SBL) algorithms as a computationally efficient and robust way to obtain its solution. © 2012 IEEE.
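The idea of selecting an ODE structure from a dictionary of polynomial and rational candidate terms can be sketched as below. This sketch swaps in sequentially thresholded least squares, a much simpler sparse-regression technique, in place of the sparse Bayesian learning the paper uses. The toy system dx/dt = 2/(1+x) - 0.5x, the candidate library, the threshold, and the assumption of noise-free derivative data are all illustrative choices, not details from the paper.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / M[i][i]
    return z


def least_squares(columns, y):
    """Least-squares coefficients via the normal equations (adequate for tiny dictionaries)."""
    G = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve(G, rhs)


def sparse_fit(library, xs, dxdt, threshold=0.1, rounds=5):
    """Fit, drop terms with small coefficients, and refit until the active set is stable."""
    names = list(library)
    active, coef = names[:], {}
    for _ in range(rounds):
        cols = [[library[n](x) for x in xs] for n in active]
        coef = dict(zip(active, least_squares(cols, dxdt)))
        kept = [n for n in active if abs(coef[n]) >= threshold]
        if kept == active:
            break
        active = kept
        if not active:
            coef = {}
            break
    return {n: (coef.get(n, 0.0) if n in active else 0.0) for n in names}


# Hypothetical candidate terms: constant, linear, quadratic, and one rational function.
library = {
    "1": lambda x: 1.0,
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "1/(1+x)": lambda x: 1.0 / (1.0 + x),
}
xs = [0.1 * i for i in range(1, 31)]
dxdt = [2.0 / (1.0 + x) - 0.5 * x for x in xs]  # noise-free derivatives of the toy system
model = sparse_fit(library, xs, dxdt)
print(model)
```

With real time-series data the derivatives would have to be estimated from noisy measurements, which is where a Bayesian treatment such as SBL becomes more attractive than plain thresholding.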
Abstract:
Determining how information flows along anatomical brain pathways is a fundamental requirement for understanding how animals perceive their environments, learn, and behave. Attempts to reveal such neural information flow have been made using linear computational methods, but neural interactions are known to be nonlinear. Here, we demonstrate that a dynamic Bayesian network (DBN) inference algorithm we originally developed to infer nonlinear transcriptional regulatory networks from gene expression data collected with microarrays is also successful at inferring nonlinear neural information flow networks from electrophysiology data collected with microelectrode arrays. The inferred networks we recover from the songbird auditory pathway are correctly restricted to a subset of known anatomical paths, are consistent with timing of the system, and reveal both the importance of reciprocal feedback in auditory processing and greater information flow to higher-order auditory areas when birds hear natural as opposed to synthetic sounds. A linear method applied to the same data incorrectly produces networks with information flow to non-neural tissue and over paths known not to exist. To our knowledge, this study represents the first biologically validated demonstration of an algorithm to successfully infer neural information flow networks.
Abstract:
We develop an approach utilizing randomized genotypes to rigorously infer causal regulatory relationships among genes at the transcriptional level, based on experiments in which genotyping and expression profiling are performed. This approach can be used to build transcriptional regulatory networks and to identify putative regulators of genes. We apply the method to an experiment in yeast, in which genes known to be in the same processes and functions are recovered in the resulting transcriptional regulatory network.
Abstract:
In the aftermath of the financial crash of 2008, policy makers operating in international financial regulatory networks discovered macroprudential regulation (MPR). Its development has nevertheless been stunted or arrested, which can be explained with reference to five factors recounted in this article.
Abstract:
Gene expression is a quantitative trait that can be mapped genetically in structured populations to identify expression quantitative trait loci (eQTL). Genes and regulatory networks underlying complex traits can subsequently be inferred. Using a recently released genome sequence, we have defined cis- and trans-eQTL and their environmental response to low phosphorus (P) availability within a complex plant genome and found hotspots of trans-eQTL within the genome. Interval mapping, using P supply as a covariate, revealed 18,876 eQTL. trans-eQTL hotspots occurred on chromosomes A06 and A01 within Brassica rapa; these were enriched with P metabolism-related Gene Ontology terms (A06) as well as chloroplast- and photosynthesis-related terms (A01). We have also attributed heritability components to measures of gene expression across environments, allowing the identification of novel gene expression markers and gene expression changes associated with low P availability. Informative gene expression markers were used to map eQTL and P use efficiency-related QTL. Genes responsive to P supply had large environmental and heritable variance components. Regulatory loci and genes associated with P use efficiency identified through eQTL analysis are potential targets for further characterization and may have potential for crop improvement.
Abstract:
Transcription factors (TFs) are major players in gene regulatory networks and interactions between TFs and their target genes furnish spatiotemporal patterns of gene expression. Establishing the architecture of regulatory networks requires gathering information on TFs, their targets in the genome, and the corresponding binding sites. We have developed GRASSIUS (Grass Regulatory Information Services) as a knowledge-based Web resource that integrates information on TFs and gene promoters across the grasses. In its initial implementation, GRASSIUS consists of two separate, yet linked, databases. GrassTFDB holds information on TFs from maize (Zea mays), sorghum (Sorghum bicolor), sugarcane (Saccharum spp.), and rice (Oryza sativa). TFs are classified into families and phylogenetic relationships begin to uncover orthologous relationships among the participating species. This database also provides a centralized clearinghouse for TF synonyms in the grasses. GrassTFDB is linked to the grass TFome collection, which provides clones in recombination-based vectors corresponding to full-length open reading frames for a growing number of grass TFs. GrassPROMDB contains promoter and cis-regulatory element information for those grass species and genes for which enough data are available. The integration of GrassTFDB and GrassPROMDB will be accomplished through GrassRegNet as a first step in representing the architecture of grass regulatory networks. GRASSIUS can be accessed from www.grassius.org.
Abstract:
Graduate Program in Biological Sciences (Genetics) - IBB
Abstract:
Modern sugarcane cultivars are complex hybrids resulting from crosses among several Saccharum species. Traditional breeding methods have been employed extensively in different countries over the past decades to develop varieties with increased sucrose yield and resistance to pests and diseases. Conventional variety improvement, however, may be limited by the narrow pool of suitable genes. Thus, molecular genetics is seen as a promising tool to assist in the process of developing improved varieties. The SUCEST-FUN Project (http://sucest-fun.org) aims to associate function with sugarcane genes using a variety of tools, in particular those that enable the study of the sugarcane transcriptome. An extensive analysis has been conducted to characterise, phenotypically, sugarcane genotypes with regard to their sucrose content, biomass and drought responses. Through the analysis of different cultivars, genes associated with sucrose content, yield, lignin and drought have been identified. Currently, tools are being developed to determine signalling and regulatory networks in grasses, and to sequence the sugarcane genome, as well as to identify sugarcane promoters. This is being implemented through the SUCEST-FUN (http://sucest-fun.org) and GRASSIUS databases (http://grassius.org), the cloning of sugarcane promoters, the identification of cis-regulatory elements (CRE) using Chromatin Immunoprecipitation-sequencing (ChIP-Seq) and the generation of a comprehensive Signal Transduction and Transcription gene catalogue (SUCAST Catalogue).
Abstract:
Background: A popular model for gene regulatory networks is the Boolean network model. In this paper, we propose an algorithm to perform an analysis of gene regulatory interactions using the Boolean network model and time-series data. The Boolean network is restricted in the sense that only a subset of all possible Boolean functions is considered. We explore some mathematical properties of the restricted Boolean networks in order to avoid the full search approach. The problem is modeled as a Constraint Satisfaction Problem (CSP) and CSP techniques are used to solve it. Results: We applied the proposed algorithm to two data sets. First, we used an artificial data set obtained from a model for the budding yeast cell cycle. The second data set is derived from experiments performed using HeLa cells. The results show that some interactions can be fully or, at least, partially determined under the Boolean model considered. Conclusions: The proposed algorithm can be used as a first step for detection of gene/protein interactions. It is able to infer gene relationships from time-series data of gene expression, and this inference process can be aided by available a priori knowledge.
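Inferring Boolean regulatory interactions from time-series data can be illustrated with a brute-force consistency check: a set of regulators is acceptable for a target gene if no two time points show the same regulator states followed by different target states. This is a simplified sketch, not the restricted-function CSP formulation of the paper; the three-gene series (generated from the made-up rules A' = NOT C, B' = A, C' = A AND B) and the function name are assumptions for illustration.

```python
from itertools import combinations


def consistent_regulators(series, target, max_inputs=2):
    """Return regulator sets for which some Boolean function reproduces
    every observed transition of `target`."""
    genes = sorted(series[0])
    found = []
    for size in range(1, max_inputs + 1):
        for regs in combinations(genes, size):
            table = {}  # observed truth table: regulator states -> next target state
            ok = True
            for now, nxt in zip(series, series[1:]):
                key = tuple(now[g] for g in regs)
                if table.setdefault(key, nxt[target]) != nxt[target]:
                    ok = False  # same inputs, different output: no function fits
                    break
            if ok:
                found.append(regs)
    return found


# Hypothetical time series of three genes, one dict per time point.
series = [
    {"A": 1, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 0},
    {"A": 1, "B": 0, "C": 0},
]
print(consistent_regulators(series, "C"))  # [('A', 'B')]
```

Several regulator sets can remain consistent with a short series (for gene B above, the set containing only A and every superset of it), which matches the abstract's remark that some interactions are only partially determined; prior knowledge can then be used to prune the candidates.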
Abstract:
Background: The structure of regulatory networks remains an open question in our understanding of complex biological systems. Interactions during complete viral life cycles present unique opportunities to understand how host-parasite networks take shape and behave. The Anticarsia gemmatalis multiple nucleopolyhedrovirus (AgMNPV) is a large double-stranded DNA virus, whose genome may encode 152 open reading frames (ORFs). Here we present the analysis of the ordered cascade of AgMNPV gene expression. Results: We observed an earlier onset of expression than previously reported for other baculoviruses, especially for genes involved in DNA replication. Most ORFs were expressed at higher levels in a more permissive host cell line. Genes with more than one copy in the genome had distinct expression profiles, which could indicate the acquisition of new functionalities. The transcription gene regulatory network (GRN) for 149 ORFs had a modular topology comprising five communities of highly interconnected nodes that separated functionally related key genes into different communities, possibly maximizing redundancy and GRN robustness by compartmentalization of important functions. Core conserved functions showed expression synchronicity, distinct GRN features and significantly less genetic diversity, consistent with evolutionary constraints imposed on key elements of biological systems. This reduced genetic diversity also had a positive correlation with the importance of the gene in our estimated GRN, supporting a relationship between phylogenetic data of baculovirus genes and network features inferred from expression data. We also observed that gene arrangement in overlapping transcripts was conserved among related baculoviruses, suggesting a principle of genome organization. 
Conclusions: Albeit with a reduced number of nodes (149), the AgMNPV GRN had a topology and key characteristics similar to those observed in complex cellular organisms, which indicates that modularity may be a general feature of biological gene regulatory networks.
Abstract:
A living cell is a complex system governed by many processes that are not yet fully understood; cell differentiation is one of them. In this thesis, we use a cell differentiation model to develop gene regulatory networks (Boolean networks) with desired differentiation dynamics. To accomplish this task, we introduced automatic design techniques and performed experiments using various differentiation trees. The results show that the developed algorithms, except the Random algorithm, are able to generate Boolean networks with interesting differentiation dynamics. Moreover, we present some possible future applications and developments of the cell differentiation model in robotics and medical research. Understanding the mechanisms at work in biological cells may make it possible to explain dangerous diseases that are not yet understood, such as cancer.
Abstract:
System dynamics determine the function of cells, tissues and organisms. Developing mathematical models and estimating their parameters is essential for studying the dynamic behavior of biological systems, including metabolic networks, genetic regulatory networks and signal transduction pathways, under perturbation by external stimuli. In general, biological dynamic systems are only partially observed. A natural way to model them is therefore to employ nonlinear state-space equations. Although statistical methods for parameter estimation of linear models in biological dynamic systems have been developed intensively in recent years, the estimation of both states and parameters of nonlinear dynamic systems remains a challenging task. In this report, we apply the extended Kalman filter (EKF) to the estimation of both states and parameters of nonlinear state-space models. To evaluate the performance of the EKF for parameter estimation, we apply it to a simulated dataset and two real datasets: the JAK-STAT signal transduction pathway and the Ras/Raf/MEK/ERK signal transduction pathway. The preliminary results show that the EKF can accurately estimate the parameters and predict the states in nonlinear state-space equations for modeling dynamic biochemical networks.
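Joint state and parameter estimation with an EKF is usually done by appending the unknown parameter to the state vector, which makes even a simple model nonlinear. The sketch below illustrates this on an assumed toy model, not on the report's pathway models: one species with known production rate s and unknown degradation rate k, Euler-discretised as x' = x + dt(s - kx), observed directly. The model, noise settings, initial guesses, and noise-free measurements are illustrative assumptions.

```python
def ekf_step(z, P, y, dt, s, Q, R):
    """One predict/update cycle for the augmented state z = [x, k]."""
    x, k = z
    # Predict: x follows the Euler-discretised model, k is modelled as constant.
    x_pred = x + dt * (s - k * x)
    a, b = 1.0 - k * dt, -x * dt  # first row of the Jacobian F = [[a, b], [0, 1]]
    p00 = a * a * P[0][0] + 2 * a * b * P[0][1] + b * b * P[1][1] + Q[0]
    p01 = a * P[0][1] + b * P[1][1]
    p11 = P[1][1] + Q[1]
    # Update with the scalar measurement y = x + noise, i.e. H = [1, 0].
    S = p00 + R
    g0, g1 = p00 / S, p01 / S  # Kalman gain
    innov = y - x_pred
    z_new = [x_pred + g0 * innov, k + g1 * innov]
    P_new = [[(1 - g0) * p00, (1 - g0) * p01],
             [(1 - g0) * p01, p11 - g1 * p01]]
    return z_new, P_new


def simulate(k_true, s, dt, steps):
    """Generate noise-free measurements from the same discretised model."""
    x, xs = 0.0, []
    for _ in range(steps):
        x = x + dt * (s - k_true * x)
        xs.append(x)
    return xs


dt, s = 0.05, 1.0
measurements = simulate(k_true=0.8, s=s, dt=dt, steps=400)
z, P = [0.0, 0.2], [[1.0, 0.0], [0.0, 1.0]]  # deliberately poor initial guess for k
for y in measurements:
    z, P = ekf_step(z, P, y, dt, s, Q=(1e-6, 1e-6), R=0.01)
x_est, k_est = z
print(f"estimated k = {k_est:.3f}")
```

The small process noise on k keeps the filter able to adjust the parameter late in the run; with real, noisy data the values of Q and R would need tuning.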