945 results for Bioinformatics
Abstract:
Background: Heterologous promoter analysis in Plasmodium has revealed the existence of conserved cis-regulatory elements, since promoters from different species can drive expression of reporter genes in heterologous transfection assays. Here, we present the functional characterization of different Plasmodium vivax promoters in Plasmodium falciparum, using luciferase as the reporter gene. Methods: Luciferase reporter plasmids harboring the upstream regions of the P. vivax msp1, dhfr, and vir3 genes, as well as the full-length intergenic regions of the P. vivax vir23/24 and ef-1α genes, were constructed and transiently transfected into P. falciparum. Results: Only the constructs carrying the full-length intergenic regions of the vir23/24 and ef-1α genes were recognized by the P. falciparum transcription machinery, albeit at levels approximately two orders of magnitude lower than those reported by luc plasmids harboring promoter regions from P. falciparum and Plasmodium berghei. A bioinformatics approach identified a motif (GCATAT) in the ef-1α intergenic region that is conserved in five Plasmodium species but is degenerate (GCANAN) in P. vivax. Mutations of this motif in the P. berghei ef-1α promoter region decreased reporter expression, indicating that it is active in gene expression in Plasmodium. Conclusion: Together, these data indicate that P. vivax promoter regions are poorly recognized, or not recognized at all, by the P. falciparum transcription machinery, suggesting the existence of P. vivax-specific transcription regulatory elements.
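The comparison between the conserved motif GCATAT and its degenerate P. vivax form GCANAN (N = any nucleotide in IUPAC notation) can be illustrated with a small regular-expression scan. This is a minimal sketch, not the authors' pipeline: it assumes IUPAC codes, scans only the strand it is given, and the example sequences are invented.

```python
import re
from typing import List

# IUPAC nucleotide codes mapped to regex fragments (only the codes needed here).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'GCANAN' into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_motif(sequence: str, motif: str) -> List[int]:
    """Return 0-based start positions of all (possibly overlapping) matches on this strand."""
    # A zero-width lookahead lets matches overlap, e.g. in repetitive sequence.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]

print(find_motif("TTGCATATGG", "GCATAT"))  # conserved form: prints [2]
print(find_motif("GCAGACTT", "GCANAN"))    # matches only the degenerate form: prints [0]
```

A strand-independent search would also scan the reverse complement of each intergenic region.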
A Robust Structural PGN Model for Control of Cell-Cycle Progression Stabilized by Negative Feedbacks
Abstract:
The cell division cycle comprises a sequence of phenomena controlled by a stable and robust genetic network. We applied a probabilistic genetic network (PGN) to construct a hypothetical model whose dynamical behavior displays the degree of robustness typical of the biological cell cycle. The structure of our PGN model was inspired by well-established biological facts, such as the existence of integrator subsystems, negative and positive feedback loops, and redundant signaling pathways. Our model represents gene interactions as stochastic processes and shows strong robustness in the presence of moderate noise and parameter fluctuations. A recently published deterministic yeast cell-cycle model does not perform as well as our PGN model, even under moderate noise conditions. In addition, self-stimulatory mechanisms can give our PGN model the possibility of pacemaker activity similar to that observed in the oscillatory embryonic cell cycle.
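How negative feedback and noise interact can be illustrated with a toy stochastic Boolean network. This is a simplified stand-in, not the authors' PGN model: here each gene is updated by a deterministic rule and then flipped with a small probability `noise`, and the two-gene loop (`A` repressed by `B`, `B` activated by `A`) is hypothetical.

```python
import random
from typing import Callable, Dict, List

State = Dict[str, bool]
Rule = Callable[[State], bool]

def step(state: State, rules: Dict[str, Rule], noise: float, rng: random.Random) -> State:
    """Update all genes synchronously, then flip each one with probability `noise`."""
    nxt = {gene: rule(state) for gene, rule in rules.items()}
    return {gene: (not value) if rng.random() < noise else value for gene, value in nxt.items()}

def simulate(initial: State, rules: Dict[str, Rule], steps: int,
             noise: float = 0.0, seed: int = 0) -> List[State]:
    """Return the trajectory of network states, starting with `initial`."""
    rng = random.Random(seed)
    state = dict(initial)
    trajectory = [state]
    for _ in range(steps):
        state = step(state, rules, noise, rng)
        trajectory.append(state)
    return trajectory

# Hypothetical negative-feedback loop: B represses A, A activates B.
RULES: Dict[str, Rule] = {
    "A": lambda s: not s["B"],
    "B": lambda s: s["A"],
}

for t, s in enumerate(simulate({"A": True, "B": False}, RULES, steps=8)):
    print(t, int(s["A"]), int(s["B"]))
```

With `noise=0.0` the loop cycles with period 4; raising `noise` lets one measure how often that cycle is disrupted, which is the kind of robustness question the abstract addresses.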
Abstract:
Background: Smallpox is a lethal disease that was endemic in many parts of the world until it was eradicated by massive immunization. Due to its lethality, there are serious concerns about its use as a bioweapon. Here we analyze publicly available microarray data to further understand the survival of smallpox-infected macaques, using systems biology approaches. Our goal is to improve knowledge about the progression of this disease. Results: We used KEGG pathway annotations to define groups of genes (or modules), and subsequently compared them to macaque survival times. This technique provided additional insights into the host response to this disease, such as increased expression of cytokines and ECM receptors in the individuals with longer survival times. These results suggest that these gene groups could influence an effective host response to smallpox. Conclusion: Macaques with longer survival times clearly express some specific pathways that had not been identified by conventional gene-by-gene approaches. Our work also shows how third-party analysis of public datasets can be important to support new hypotheses about relevant biological problems.
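The module-level comparison can be sketched as: summarize each KEGG-defined gene group by its mean expression per animal, then correlate that score with survival time. This is an illustrative sketch; the mean-expression score and the Spearman correlation are assumptions (the abstract does not state which statistics were used), and the gene names and values are invented.

```python
from typing import Dict, List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # undefined (ZeroDivisionError) for constant input

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def module_scores(expression: Dict[str, List[float]], module_genes: Sequence[str]) -> List[float]:
    """Mean expression of the module genes present in the table, one score per sample."""
    genes = [g for g in module_genes if g in expression]
    if not genes:
        raise ValueError("no module gene present in the expression table")
    n_samples = len(expression[genes[0]])
    return [sum(expression[g][j] for g in genes) / len(genes) for j in range(n_samples)]

# Invented data: two "cytokine" genes across four animals, plus survival times in days.
expr = {"IL6": [1.0, 2.0, 3.0, 4.0], "TNF": [0.5, 1.5, 2.5, 3.5], "ACTB": [5.0, 5.0, 5.0, 5.0]}
scores = module_scores(expr, ["IL6", "TNF"])
print(scores, spearman(scores, [3, 5, 8, 12]))
```

Rank-based correlation is a reasonable default with few animals, since it does not assume a linear relation between module score and survival.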
Abstract:
Background: To understand the molecular mechanisms underlying important biological processes, a detailed description of the networks of gene products involved is required. To define and understand such molecular networks, several statistical methods have been proposed in the literature to estimate gene regulatory networks from time-series microarray data. However, several problems still need to be overcome. First, the direction of information flow needs to be inferred, in addition to the correlation between genes. Second, we usually try to identify large networks from a large number of genes (parameters) using a smaller number of microarray experiments (samples). This situation, which is common in bioinformatics, makes it difficult to perform statistical tests with methods that model large gene-gene networks. In addition, most models rely on dimension reduction by clustering, so the resulting network is not a gene-gene network but a module-module network. Here, we present the Sparse Vector Autoregressive (SVAR) model as a solution to these problems. Results: We have applied the SVAR model to estimate gene regulatory networks from gene expression profiles obtained in time-series microarray experiments. Through extensive simulations applying the SVAR method to artificial regulatory networks, we show that SVAR can infer true positive edges even when the number of samples is smaller than the number of genes. Moreover, false positives can be controlled, a significant advantage over other methods described in the literature, which are based on ranks or score functions. By applying SVAR to actual HeLa cell-cycle gene expression data, we were able to identify well-known transcription factor targets.
Conclusion: The proposed SVAR method can model gene regulatory networks in the common situation in which the number of samples is smaller than the number of genes, making it possible to naturally infer partial Granger causalities without any a priori information. In addition, we present a statistical test to control the false discovery rate, which was not previously possible with other gene regulatory network models.
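The abstract does not spell out its false discovery rate test. As a generic illustration only, the widely used Benjamini-Hochberg step-up procedure, applied to one p-value per candidate edge (for example, one per lagged autoregressive coefficient), looks like this; whether SVAR uses this exact procedure is an assumption.

```python
from typing import List, Sequence

def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure: flag hypotheses rejected at FDR level `alpha`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    # Largest rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject all hypotheses ranked at or below k_max (step-up rule).
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values, one per candidate regulatory edge.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.2]))  # prints [True, False, False, False]
```

The step-up rule is what distinguishes this from a per-test threshold: a p-value that misses its own cut-off is still rejected if some larger-ranked p-value passes.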
Abstract:
Background: A popular model for gene regulatory networks is the Boolean network model. In this paper, we propose an algorithm to analyze gene regulatory interactions using the Boolean network model and time-series data. The Boolean network is restricted in the sense that only a subset of all possible Boolean functions is considered. We explore some mathematical properties of the restricted Boolean networks in order to avoid a full search. The problem is modeled as a Constraint Satisfaction Problem (CSP), and CSP techniques are used to solve it. Results: We applied the proposed algorithm to two data sets. First, we used an artificial dataset obtained from a model of the budding yeast cell cycle. The second data set is derived from experiments performed using HeLa cells. The results show that some interactions can be fully or, at least, partially determined under the Boolean model considered. Conclusions: The proposed algorithm can be used as a first step for detecting gene/protein interactions. It is able to infer gene relationships from time-series gene expression data, and this inference process can be aided by available a priori knowledge.
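The restricted Boolean rule and the constraints that a time series imposes on it can be sketched for a single target gene. This sketch assumes a threshold-type restriction (the gene turns on when the signed sum of its regulators is positive, off when it is negative, and keeps its value on a tie), which may differ from the paper's exact function class, and it uses brute-force enumeration rather than a CSP solver. Gene names and the time series are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence

State = Dict[str, int]

def next_value(signs: Dict[str, int], state: State, target: str) -> int:
    """Assumed restricted rule: signed sum > 0 -> 1, < 0 -> 0, tie -> keep current value."""
    total = sum(sign * state[gene] for gene, sign in signs.items())
    if total > 0:
        return 1
    if total < 0:
        return 0
    return state[target]

def consistent_regulations(series: Sequence[State], target: str,
                           candidates: Sequence[str]) -> List[Dict[str, int]]:
    """Enumerate sign assignments in {-1, 0, +1} (inhibitor, no effect, activator)
    and keep those that reproduce every observed transition of `target`."""
    solutions = []
    for combo in product((-1, 0, 1), repeat=len(candidates)):
        signs = dict(zip(candidates, combo))
        if all(next_value(signs, series[t], target) == series[t + 1][target]
               for t in range(len(series) - 1)):
            solutions.append(signs)
    return solutions

# Hypothetical binarized time series for regulators X, Y and target Z.
SERIES = [
    {"X": 1, "Y": 0, "Z": 0},
    {"X": 0, "Y": 1, "Z": 1},
    {"X": 1, "Y": 1, "Z": 0},
    {"X": 0, "Y": 0, "Z": 0},
]

print(consistent_regulations(SERIES, "Z", ["X", "Y"]))  # prints [{'X': 1, 'Y': -1}]
```

Each transition acts as one constraint on the unknown signs; with fewer observations several assignments survive (only partially determined interactions), and a CSP solver prunes the 3^k search space instead of enumerating it.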
Abstract:
Background: Using the knowledge produced by the sciences to promote human health is the main goal of translational medicine. To make this feasible, we need computational methods that can handle the large amount of information generated from bench to bedside and deal with its heterogeneity. One computational challenge is the integration of clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular, ontology-oriented database model that has gained popularity due to its robustness and flexibility as a generic platform for storing biological data; however, it lacks support for representing clinical and socio-demographic information. Results: We have implemented an extension of Chado, the Clinical Module, to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The framework is designed in four levels: a data level, to store the data; a semantic level, to integrate and standardize the data using ontologies; an application level, to manage clinical databases, ontologies and the data integration process; and a web interface level, to allow interaction between the user and the system. The Clinical Module was built on the Entity-Attribute-Value (EAV) model. We also propose a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented, and the framework was loaded with data from an actual clinical research database. Clinical and demographic data, as well as biomaterial data, were obtained from patients with head and neck tumors.
We implemented IPTrans, a complete environment for data migration, which comprises: the construction of an ontology-based model to describe the legacy clinical data; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it into the Clinical Module of Chado; and the development of a web tool and a Bridge Layer to adapt the web tool to Chado as well as to other applications. Conclusions: Open-source computational solutions currently available for translational science do not have a model to represent biomolecular information and are not integrated with existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different “omics” technologies with patients' clinical and socio-demographic data. This framework should have the following features: flexibility, compression and robustness. Experiments on a use case demonstrated that the proposed system meets the requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed at http://dcm.ffclrp.usp.br/caib/pg=iptrans.
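The Entity-Attribute-Value idea behind the Clinical Module can be illustrated with Python's built-in `sqlite3`: each fact is one (entity, attribute, value) row, so new clinical attributes need no schema change and missing attributes need no NULL columns. The table and attribute names below are hypothetical and do not reproduce Chado's actual schema; in an ontology-driven design the attribute column would reference an ontology term rather than free text.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

def build_store(rows: Iterable[Tuple[int, str, str]]) -> sqlite3.Connection:
    """Create an in-memory EAV table and load (patient, attribute, value) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE observation (
            patient_id INTEGER NOT NULL,  -- entity
            attribute  TEXT    NOT NULL,  -- attribute (ideally an ontology term ID)
            value      TEXT,              -- value, stored as text in this sketch
            PRIMARY KEY (patient_id, attribute)
        )
        """
    )
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def patient_record(conn: sqlite3.Connection, patient_id: int) -> Dict[str, str]:
    """Pivot one patient's EAV rows back into an attribute -> value dict."""
    cur = conn.execute(
        "SELECT attribute, value FROM observation WHERE patient_id = ? ORDER BY attribute",
        (patient_id,),
    )
    return dict(cur.fetchall())

def patients_with(conn: sqlite3.Connection, attribute: str, value: str) -> List[int]:
    """Find patients whose given attribute has the given value."""
    cur = conn.execute(
        "SELECT patient_id FROM observation WHERE attribute = ? AND value = ? ORDER BY patient_id",
        (attribute, value),
    )
    return [row[0] for row in cur.fetchall()]

# Invented rows; patient 2 simply has no smoking_status row.
ROWS = [
    (1, "age_at_diagnosis", "57"), (1, "tumor_site", "larynx"), (1, "smoking_status", "former"),
    (2, "age_at_diagnosis", "63"), (2, "tumor_site", "oral cavity"),
    (3, "tumor_site", "larynx"),
]
conn = build_store(ROWS)
print(patient_record(conn, 2))
```

The trade-off is that queries must pivot rows back into records, and typing of values moves from the schema into the application or ontology layer.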
Abstract:
Background: Trypanosomatids of the genera Angomonas and Strigomonas live in a mutualistic association with obligate endosymbiotic Betaproteobacteria, characterized by extensive metabolic cooperation. However, the role played by the symbiont has been inferred mostly by indirect means rather than demonstrated directly. Symbiont-harboring trypanosomatids, in contrast to their counterparts lacking symbionts, have lower nutritional requirements and are autotrophic for essential amino acids. To determine the symbiont's contributions to this autotrophy, we sequenced the complete genomes of the symbionts and of trypanosomatids with and without symbionts. Results: Analyses of the essential amino acid pathways revealed that most biosynthetic routes are encoded in the symbiont genome. By contrast, the host trypanosomatid genome contains fewer genes, about half of which originated from different bacterial groups; of these, perhaps only one (ornithine cyclodeaminase, EC:4.3.1.12) derived from the symbiont. Nutritional, enzymatic, and genomic data were analyzed jointly to construct an integrated view of essential amino acid metabolism in symbiont-harboring trypanosomatids. This comprehensive analysis showed perfect concordance among all these data and revealed that the symbiont contains genes for enzymes that complete essential biosynthetic routes for host amino acid production, thus explaining the low requirement for these amino acids in symbiont-harboring trypanosomatids. Phylogenetic analyses show that the cooperation between symbionts and their hosts is complemented by multiple horizontal gene transfers from bacterial lineages to trypanosomatids, which occurred several times in the course of their evolution. Transfers occur preferentially in the parts of the pathways that are missing from other eukaryotes.
Conclusion: We have uncovered the genetic and evolutionary bases of essential amino acid biosynthesis in several trypanosomatids with and without endosymbionts, explaining and complementing decades of experimental results. We also reveal remarkable plasticity in the evolution of essential amino acid biosynthesis pathways in these protozoans, demonstrating a strong influence of horizontal gene transfer events, from Bacteria to trypanosomatid nuclei, on the evolution of these pathways.
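Building the integrated view of which genome supplies each biosynthetic step amounts to a set comparison per pathway step. A minimal sketch, assuming each step is identified by a single enzyme (EC) identifier; the step names and identifiers below are placeholders, not the pathways analyzed in the study.

```python
from typing import Dict, List, Set, Tuple

def step_coverage(pathway: List[Tuple[str, str]], host: Set[str],
                  symbiont: Set[str]) -> Dict[str, str]:
    """Label each (step, enzyme_id) by the genome(s) that encode the enzyme."""
    coverage = {}
    for step, enzyme in pathway:
        in_host, in_symbiont = enzyme in host, enzyme in symbiont
        if in_host and in_symbiont:
            coverage[step] = "both"
        elif in_host:
            coverage[step] = "host"
        elif in_symbiont:
            coverage[step] = "symbiont"
        else:
            coverage[step] = "missing"
    return coverage

def is_complete(coverage: Dict[str, str]) -> bool:
    """A pathway is complete for the partnership when no step is missing."""
    return all(origin != "missing" for origin in coverage.values())

# Placeholder three-step pathway split between host and symbiont.
PATHWAY = [("step1", "EC-A"), ("step2", "EC-B"), ("step3", "EC-C")]
cov = step_coverage(PATHWAY, host={"EC-A"}, symbiont={"EC-B", "EC-C"})
print(cov, is_complete(cov))
```

Running the same comparison for the host alone shows which routes only close when the symbiont's genes are included.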
Abstract:
Background: Recent advances in medical and biological technology have stimulated the development of new testing systems that provide huge and varied amounts of molecular and clinical data. Growing data volumes pose significant challenges for information processing systems in research centers. Additionally, the routines of genomics laboratories are typically characterized by high parallelism in testing and constant procedure changes. Results: This paper describes a formal approach to address this challenge through the implementation of a genetic testing management system applied to a human genome laboratory. We introduce the Human Genome Research Center Information System (CEGH) in Brazil, a system that can support constant changes in human genome testing and can provide patients with updated results based on the most recent and validated genetic knowledge. Our approach uses a common repository for process planning to ensure reusability, specification, instantiation, monitoring, and execution of processes, which are defined using a relational database and rigorous control-flow specifications based on process algebra (ACP, the Algebra of Communicating Processes). The main difference between our approach and related work is that we were able to combine two important aspects: 1) process scalability, achieved through a relational database implementation, and 2) correctness of processes, using process algebra. Furthermore, the software allows end users to define genetic tests without requiring any knowledge of business process notation or process algebra. Conclusions: This paper presents the CEGH information system, a Laboratory Information Management System (LIMS) based on a formal framework to support genetic testing management for Mendelian disorder studies. We have demonstrated the feasibility and shown the usability benefits of a rigorous approach that is able to specify, validate, and perform genetic testing through easy-to-use end-user interfaces.
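The control-flow side of this approach can be illustrated with the sequential (·) and alternative (+) operators of process algebra. The sketch below computes the complete traces of a term in this basic fragment only (ACP also has parallel composition, communication and deadlock, which are omitted) and checks whether an executed sequence of laboratory steps is a valid run. The workflow step names are hypothetical; this is not the CEGH implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple, Union

@dataclass(frozen=True)
class Act:
    """Atomic action, e.g. one laboratory step."""
    name: str

@dataclass(frozen=True)
class Seq:
    """Sequential composition p · q: run p to completion, then q."""
    left: "Term"
    right: "Term"

@dataclass(frozen=True)
class Alt:
    """Alternative composition p + q: behave as either p or q."""
    left: "Term"
    right: "Term"

Term = Union[Act, Seq, Alt]
Trace = Tuple[str, ...]

def traces(term: Term) -> Set[Trace]:
    """Complete traces: the action sequences that lead to successful termination."""
    if isinstance(term, Act):
        return {(term.name,)}
    if isinstance(term, Seq):
        return {s + t for s in traces(term.left) for t in traces(term.right)}
    if isinstance(term, Alt):
        return traces(term.left) | traces(term.right)
    raise TypeError(f"unknown term: {term!r}")

def is_valid_run(term: Term, executed: Iterable[str]) -> bool:
    """True if the executed steps form one complete trace of the workflow."""
    return tuple(executed) in traces(term)

# Hypothetical genetic-test workflow: extract DNA, then Sanger OR MLPA, then report.
WORKFLOW = Seq(Act("extract_dna"), Seq(Alt(Act("sanger"), Act("mlpa")), Act("report")))
print(sorted(traces(WORKFLOW)))
```

Checking runs against a formally defined term is what gives process correctness: a run that skips or reorders steps is rejected regardless of how the workflow is stored.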
Abstract:
Background: The insect exoskeleton provides shape, waterproofing, and locomotion via attached somatic muscles. The exoskeleton is renewed during molting, a process regulated by ecdysteroid hormones. The holometabolous pupa transforms into an adult during the imaginal molt, when the epidermis synthesizes the definitive exoskeleton, which then differentiates progressively. An important issue in insect development concerns how the exoskeletal regions are constructed to provide their morphological, physiological and mechanical functions. We used whole-genome oligonucleotide microarrays to screen for genes involved in exoskeletal formation in the honeybee thoracic dorsum. Our analysis included three sampling times during the pupal-to-adult molt, i.e., before, during and after the ecdysteroid-induced apolysis that triggers synthesis of the adult exoskeleton. Results: Gene Ontology annotation based on orthologous relationships with Drosophila melanogaster genes placed the honeybee differentially expressed genes (DEGs) into distinct categories of Biological Process and Molecular Function, depending on developmental time, revealing the functional elements required for adult exoskeleton formation. Of the 1,253 unique DEGs, 547 were upregulated in the thoracic dorsum after apolysis, suggesting induction by the ecdysteroid pulse. The upregulated gene set included 20 of the 47 cuticular protein (CP) genes previously identified in the honeybee genome, and three novel putative CP genes that do not belong to a known CP family. In situ hybridization showed that two of the novel genes were abundantly expressed in the epidermis during adult exoskeleton formation, strongly implicating them as genuine CP genes. Conserved sequence motifs identified the CP genes as members of the CPR, Tweedle, Apidermin, CPF, CPLCP1 and Analogous-to-Peritrophins families.
Furthermore, 28 of the 36 muscle-related DEGs were upregulated during the de novo formation of striated fibers attached to the exoskeleton. A search for cis-regulatory motifs in the 5′-untranslated region of the DEGs revealed potential binding sites for known transcription factors. Construction of a regulatory network showed that various upregulated CP- and muscle-related genes (15 and 21 genes, respectively) share common elements, suggesting co-regulation during thoracic exoskeleton formation. Conclusions: These findings help reveal molecular aspects of rigid thoracic exoskeleton formation during the ecdysteroid-coordinated pupal-to-adult molt in the honeybee.
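The regulatory network of co-regulated CP- and muscle-related genes can be sketched as a graph in which two genes are linked when their upstream regions share predicted motifs. This is a simplified sketch: motif predictions are taken as given, the `min_shared` threshold is an arbitrary assumption, and gene and motif names are placeholders.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

def shared_motif_network(gene_motifs: Dict[str, Set[str]],
                         min_shared: int = 1) -> Dict[Tuple[str, str], Set[str]]:
    """Link two genes when their upstream regions share at least `min_shared` predicted motifs."""
    edges = {}
    for g1, g2 in combinations(sorted(gene_motifs), 2):
        shared = gene_motifs[g1] & gene_motifs[g2]
        if len(shared) >= min_shared:
            edges[(g1, g2)] = shared
    return edges

# Placeholder motif predictions for two CP genes and two muscle-related genes.
GENE_MOTIFS = {
    "cp1": {"m1", "m2"},
    "cp2": {"m1"},
    "mus1": {"m3"},
    "mus2": {"m2", "m3"},
}
for pair, motifs in shared_motif_network(GENE_MOTIFS).items():
    print(pair, sorted(motifs))
```

Shared elements are only a hint of co-regulation; raising `min_shared` trades sensitivity for fewer spurious links.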
Abstract:
Background: Pancreatic ductal adenocarcinoma (PDAC) is known for its aggressiveness and the lack of effective therapeutic options. Thus, improved knowledge of the molecular changes associated with pancreatic cancer is urgently needed to explore novel avenues for the diagnosis and treatment of this dismal disease. While there is mounting evidence that long noncoding RNAs (lncRNAs) transcribed from intronic and intergenic regions of the human genome may play different roles in the regulation of gene expression in normal and cancer cells, their expression pattern and biological relevance in pancreatic cancer are currently unknown. In the present work, we investigated the relative abundance of a collection of lncRNAs in patients' pancreatic tissue samples, aiming to identify gene expression profiles correlated with pancreatic cancer and metastasis. Methods: Custom 3,355-element spotted cDNA microarrays interrogating protein-coding genes and putative lncRNAs were used to obtain expression profiles from 38 clinical samples of tumor and non-tumor pancreatic tissues. Bioinformatics analyses were performed to characterize the structure and conservation of lncRNAs expressed in pancreatic tissues, as well as to identify expression signatures correlated with tissue histology. Strand-specific reverse transcription followed by PCR, and qRT-PCR, were employed to determine the strandedness of lncRNAs and to validate microarray results, respectively. Results: We show that subsets of intronic/intergenic lncRNAs are expressed across tumor and non-tumor pancreatic tissue samples. Enrichment of promoter-associated chromatin marks, over-representation of conserved DNA elements, and stable secondary structure predictions suggest that these transcripts are generated from independent transcriptional units and that at least a fraction of them is under evolutionary selection, and thus potentially functional.
Statistically significant expression signatures comprising protein-coding mRNAs and lncRNAs that correlate with PDAC or with pancreatic cancer metastasis were identified. Interestingly, loci harboring intronic lncRNAs differentially expressed in PDAC metastases were enriched in genes associated with the MAPK pathway. Orientation-specific RT-PCR documented that intronic transcripts are expressed in the sense orientation, the antisense orientation, or both, relative to protein-coding mRNAs. Differential expression of a subset of intronic lncRNAs (PPP3CB, MAP3K14 and DAPK1 loci) in metastatic samples was confirmed by real-time PCR. Conclusion: Our findings reveal sets of intronic lncRNAs expressed in pancreatic tissues whose abundance is correlated with PDAC or metastasis, pointing to the potential relevance of this class of transcripts in biological processes related to malignant transformation and metastasis in pancreatic cancer.
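Expression signatures of this kind start from a per-probe comparison between tissue classes. As an illustrative assumption (the abstract does not name the test used), the sketch ranks probes by Welch's t statistic for tumor versus non-tumor samples; probe names and values are invented, and a real analysis would also need p-values and multiple-testing correction.

```python
from statistics import mean, variance
from typing import Dict, List, Sequence

def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch's t statistic (difference in means, unequal variances allowed).
    Each group needs at least two values for the sample variance."""
    se = (variance(group_a) / len(group_a) + variance(group_b) / len(group_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / se

def rank_probes(expression: Dict[str, List[float]], tumor_cols: List[int],
                normal_cols: List[int]) -> List[str]:
    """Order probes by |t|, most strongly differing between tumor and non-tumor first."""
    scores = {
        probe: welch_t([values[i] for i in tumor_cols], [values[i] for i in normal_cols])
        for probe, values in expression.items()
    }
    return sorted(scores, key=lambda probe: abs(scores[probe]), reverse=True)

# Invented log-expression values: columns 0-2 tumor, 3-5 non-tumor.
EXPR = {
    "lnc1": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
    "mrna1": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
}
print(rank_probes(EXPR, [0, 1, 2], [3, 4, 5]))
```

Welch's form avoids assuming equal variances in tumor and non-tumor tissue, which is often unrealistic for heterogeneous clinical samples.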
Abstract:
Background: Hepatitis C virus (HCV) is prevalent throughout the world and is a major cause of chronic liver disease. There is no effective vaccine, and the most common therapy, based on peginterferon, has a success rate of ~50%. The mechanisms underlying viral resistance have not been elucidated, but it has been suggested that both host and virus contribute to therapy outcome. The non-structural 5A (NS5A) protein, a critical virus component, is involved in cellular and viral processes. Methods: The present study analyzed structural and functional features of 345 HCV NS5A sequences of genotype 1 or 3 using in silico tools. Results: There were differences in residue-type composition and secondary structure between the genotypes. In addition, secondary-structure variance differed significantly between response groups in genotype 3. A motif search indicated conserved glycosylation, phosphorylation and myristoylation sites that could be important for structural stabilization and function. Furthermore, a highly conserved integrin-binding site was identified, which could be linked to nuclear forms of NS5A. ProtFun predicted that NS5A has diverse enzymatic and non-enzymatic activities, participating in a wide range of cell functions, with statistically significant differences between genotypes. Conclusion: This study presents new insights into HCV NS5A. It is the first study to use bioinformatics tools to suggest differences between genotypes, and in response to therapy, that can be related to NS5A protein features. It therefore emphasizes the importance of bioinformatics tools in viral studies. The data acquired here will help clarify the structure and function of this protein and aid in the development of antiviral agents.
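Motif searches of this kind commonly rely on PROSITE patterns such as the N-glycosylation pattern N-{P}-[ST]-{P}. The sketch below converts a subset of PROSITE pattern syntax (`x`, `[..]`, `{..}`, repeat counts, and `<`/`>` termini) into a Python regular expression and reports overlapping matches. It is an illustration, not the tool used in the study, and the test sequence is invented.

```python
import re
from typing import List

# One PROSITE element: optional N-terminal anchor, residue/class, optional repeat, optional C anchor.
ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")

def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern (e.g. 'N-{P}-[ST]-{P}') into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_term, core, low, high, c_term = m.groups()
        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            regex = core                       # single residue or [ABC] class
        if low is not None:
            regex += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(("^" if n_term else "") + regex + ("$" if c_term else ""))
    return "".join(parts)

def find_sites(sequence: str, pattern: str) -> List[int]:
    """1-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(sequence)]

print(prosite_to_regex("N-{P}-[ST]-{P}"))            # prints N[^P][ST][^P]
print(find_sites("MANVSAKNPSG", "N-{P}-[ST]-{P}"))  # prints [3]
```

Pattern hits only indicate candidate sites; as in the study, conservation across many sequences is what makes a site worth considering.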
Abstract:
Background: The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data are available, biologists are faced with the task of extracting (new) knowledge associated with the underlying biological phenomenon. Most often, to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. Integrating heterogeneous tools and data sources into an integrated analysis environment is a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing data to be exchanged in a semantically consistent and meaningful way. This work aims at developing an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only access to heterogeneous data sources but also the definition of transformation rules on exchanged data. Results: We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we defined a number of activities and associated guidelines to prescribe how connectors should be developed. Finally, we applied the proposed methodology to the construction of three different integration scenarios involving different tools for the analysis of different types of gene expression data.
Conclusions: The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources. The methodology can be used to develop connectors supporting both simple and nontrivial processing requirements, thus ensuring accurate data exchange and correct interpretation of the exchanged data.
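A connector in this sense can be pictured as a component that renames a tool's output fields to terms of the shared reference ontology and applies transformation rules to their values. The class below is a hypothetical minimal sketch of that idea; the field names, ontology term labels and rules are invented for illustration and are not taken from the proposed methodology.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class Connector:
    """Hypothetical connector: maps a tool's field names to reference-ontology terms
    and applies per-term transformation rules to the values."""
    field_to_term: Dict[str, str]
    transforms: Dict[str, Callable[[str], Any]] = field(default_factory=dict)

    def translate(self, record: Dict[str, str]) -> Dict[str, Any]:
        out: Dict[str, Any] = {}
        for source_field, value in record.items():
            term = self.field_to_term.get(source_field)
            if term is None:
                continue  # no agreed meaning in the reference ontology: not exchanged
            rule = self.transforms.get(term)
            out[term] = rule(value) if rule else value
        return out

# Hypothetical connector for a tool that emits text columns "ProbeID" and "Log2Ratio".
connector = Connector(
    field_to_term={"ProbeID": "probe_identifier", "Log2Ratio": "log2_expression_ratio"},
    transforms={"log2_expression_ratio": float},
)
print(connector.translate({"ProbeID": "P001", "Log2Ratio": "1.5", "Flag": "ok"}))
```

Keeping the mapping and the rules as data, rather than code scattered across tools, is what lets the same reference ontology be reused across integration scenarios.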
Abstract:
The control of gene expression by miRNAs has been widely investigated in different species and cell types. Following a probabilistic rather than a deterministic regime, the action of these short nucleotide sequences on specific genes depends on their intracellular concentration, which in turn reflects the balance between biosynthesis and degradation. Recent studies have described the involvement of XRN2, an exoribonuclease, in miRNA degradation and of PAPD4, an atypical poly(A) polymerase, in miRNA stability. Herein, we examined the expression of XRN2 and PAPD4 in developing and adult rat hippocampi. Combining bioinformatics and real-time PCR, we demonstrated that XRN2 and PAPD4 expression is regulated by the uncorrelated action of transcription factors, resulting in distinct gene expression profiles during development. Analyses of nuclear position and nestin labeling revealed that both proteins progressively accumulate during neuronal differentiation, and that they are weakly expressed in immature neurons and absent from glial and endothelial cells. Despite differences in subcellular localization, both genes were concurrently identified within identical neuronal subpopulations, including specific inhibitory interneurons. We are thus faced with an unusual circumstance in biology: an almost completely overlapping expression of functionally opposed genes, reinforcing the idea that their antagonistic actions on miRNAs “make sense” if both are present simultaneously in the same cells. Considering that the transcriptome in the nervous system is finely tuned to physiological processes, it was remarkable that miRNA stability-related genes were concurrently identified in neurons that play essential roles in cognitive functions such as memory and learning. In summary, this study reveals a possible new mechanism for the control of miRNA expression.
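The statement that miRNA concentration reflects a balance between biosynthesis and degradation can be made explicit with a simplified first-order mass balance. This is an illustrative assumption, not a model from the study: with a synthesis rate $k_{\text{syn}}$ and a degradation rate constant $k_{\text{deg}}$ acting on the concentration $C$,

$$\frac{dC}{dt} = k_{\text{syn}} - k_{\text{deg}}\,C \quad\Longrightarrow\quad C^{*} = \frac{k_{\text{syn}}}{k_{\text{deg}}}$$

so, in this picture, a degradation factor such as XRN2 would act by raising $k_{\text{deg}}$ and a stabilizing factor such as PAPD4 by lowering it, each shifting the steady-state level $C^{*}$ in the opposite direction.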
Abstract:
Although it is well known that thyroid hormone (T3) is an important positive regulator of cardiac function in the short term and that it also promotes deleterious effects in the long term, the molecular mechanisms underlying these effects are not yet well understood. Because most alterations in cardiac function are associated with changes in the sarcomeric machinery, the present work was undertaken to find novel sarcomeric hot spots driven by T3 in the heart. A microarray analysis indicated that the M-band is a major hot spot and that the structural sarcomeric gene coding for the M-protein is severely down-regulated by T3. Real-time quantitative PCR-based measurements confirmed that T3 (1, 5, 50, and 100 times the physiological dose, for 2 days) sharply decreased M-protein gene and protein expression in vivo in a dose-dependent manner. Furthermore, M-protein gene expression was elevated 3.4-fold in hypothyroid rats. Accordingly, T3 was able to rapidly and strongly reduce M-protein gene expression in neonatal cardiomyocytes. Deletion analysis of the M-protein promoter and a bioinformatics approach suggested a T3-responsive region, which was confirmed by chromatin immunoprecipitation assay. Functional assays in cultured neonatal cardiomyocytes revealed that depletion of M-protein (by small interfering RNA) causes a severe decrease in contraction speed. Interestingly, mRNA and protein levels of other M-band components, myomesin and embryonic-heart myomesin, were not altered by T3. We conclude that M-protein expression is strongly and rapidly repressed by T3 in cardiomyocytes, which represents an important aspect of the basis of T3-dependent deleterious sarcomeric effects in the heart.
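Fold changes such as the 3.4-fold increase reported above are commonly derived from qPCR threshold cycles (Ct) with the 2^(-ΔΔCt) method. The abstract does not state which quantification method was used, so this is an assumed illustration; it also assumes near-100% amplification efficiency for both the target and the reference gene, and the Ct values in the example are invented.

```python
def fold_change_ddct(ct_target_test: float, ct_ref_test: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Relative expression by the 2^(-ddCt) method (assumes ~100% PCR efficiency)."""
    d_ct_test = ct_target_test - ct_ref_test  # normalize target to reference gene (test group)
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl  # same normalization in the control group
    dd_ct = d_ct_test - d_ct_ctrl
    return 2 ** (-dd_ct)

# Invented Ct values: target reaches threshold 2 cycles earlier (relative to the reference) in the test group.
print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))  # prints 4.0
```

Because each cycle roughly doubles the product, a ΔΔCt of -2 corresponds to about a 4-fold increase; with efficiencies well below 100%, an efficiency-corrected model would be more appropriate.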
Abstract:
BACKGROUND: In the alpha subclass of proteobacteria, iron homeostasis is controlled by diverse iron-responsive regulators. Caulobacter crescentus, an important freshwater α-proteobacterium, uses the ferric uptake repressor (Fur) for this purpose. However, the impact of iron availability on the C. crescentus transcriptome, and an overall perspective of the regulatory networks involved, remain unknown. RESULTS: In this work, we report the identification of iron-responsive and Fur-regulated genes in C. crescentus using microarray-based global transcriptional analyses. We identified 42 genes that were strongly upregulated both by mutation of fur and under iron limitation. Among them are genes involved in iron uptake (four TonB-dependent receptor gene clusters, and feoAB), genes involved in riboflavin biosynthesis, and genes encoding hypothetical proteins. Most of these genes are associated with predicted Fur binding sites, implicating them as direct targets of Fur-mediated repression. These data were validated by β-galactosidase and EMSA assays for two operons encoding putative transporters. The role of Fur as a positive regulator is also evident, given that 27 genes were downregulated both by mutation of fur and under low-iron conditions. As expected, this group includes many genes involved in energy metabolism, mostly encoding iron-using enzymes. Surprisingly, this group also includes TonB-dependent receptor genes and the genes fixK, fixT and ftrB, which encode an oxygen-signaling network required for growth during hypoxia. Bioinformatics analyses suggest that positive regulation by Fur is mainly indirect. In addition to the Fur modulon, iron limitation altered the expression of 113 more genes, including induction of genes involved in Fe-S cluster assembly, oxidative stress and the heat shock response, as well as repression of genes implicated in amino acid metabolism, chemotaxis and motility.
CONCLUSIONS: Using a global transcriptional approach, we determined the C. crescentus iron stimulon. Many, but not all, iron-responsive genes were directly or indirectly controlled by Fur. The iron limitation stimulon overlaps with other regulatory systems, such as the RpoH and FixK regulons. Altogether, our results show that adaptation of C. crescentus to iron limitation involves not only increasing the transcription of iron-acquisition systems and decreasing the production of iron-using proteins, but also novel genes and regulatory mechanisms.
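Predicted Fur binding sites of the kind mentioned above are typically found by scanning promoter regions with a position weight matrix (PWM). The sketch below builds a log2-odds matrix from aligned sites and scans the forward strand; the six-base training sites, pseudocount, uniform background, threshold and test sequence are all hypothetical and do not represent the actual C. crescentus Fur box.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"

def counts_from_sites(sites: Sequence[str]) -> List[Dict[str, int]]:
    """Per-position base counts from equal-length aligned binding sites."""
    return [{b: sum(site[i] == b for site in sites) for b in BASES} for i in range(len(sites[0]))]

def log_odds_matrix(counts: List[Dict[str, int]], pseudocount: float = 0.5,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Convert counts to log2(observed frequency / background frequency) scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + len(BASES) * pseudocount
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background) for b in BASES})
    return matrix

def scan(sequence: str, matrix: List[Dict[str, float]], threshold: float) -> List[Tuple[int, float]]:
    """(0-based start, score) for every forward-strand window scoring >= threshold.
    The sequence must contain only A, C, G and T."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical aligned sites and promoter fragment.
SITES = ["GATAAT", "GATAAT", "GATATT", "GAAAAT"]
MATRIX = log_odds_matrix(counts_from_sites(SITES))
print(scan("CCGATAATCC", MATRIX, threshold=6.0))
```

In practice both strands are scanned and the threshold is calibrated, for example against shuffled sequences, before calling a gene a putative direct Fur target.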