990 results for Genomics data


Relevance:

100.00%

Publisher:

Abstract:

The human body is composed of a huge number of cells acting together in a concerted manner. The current understanding is that proteins perform most of the activities necessary to keep a cell alive. The DNA, on the other hand, stores in the genome the information on how to produce the different proteins. Regulating gene transcription is thus the first important step that can affect the life of a cell, modifying its functions and its responses to the environment. Regulation is a complex operation that involves specialized proteins, the transcription factors. Transcription factors (TFs) can bind to DNA and activate the processes leading to the expression of genes into new proteins. Errors in this process may lead to diseases. In particular, some transcription factors have been associated with a lethal pathological state, commonly known as cancer, associated with uncontrolled cellular proliferation, invasiveness of healthy tissues and abnormal responses to stimuli. Understanding cancer-related regulatory programs is a difficult task, often involving several TFs interacting together and influencing each other's activity. This thesis presents new computational methodologies to study gene regulation. In addition, we present applications of our methods to the understanding of cancer-related regulatory programs. The understanding of transcriptional regulation is a major challenge. We address this difficult question by combining computational approaches with large collections of heterogeneous experimental data. In detail, we design signal processing tools to recover transcription factor binding sites on the DNA from genome-wide surveys such as chromatin immunoprecipitation assays on tiling arrays (ChIP-chip). We then use the location information about TF binding to explain expression levels of regulated genes. In this way we identify a regulatory synergy between two TFs, the oncogene C-MYC and SP1.
C-MYC and SP1 bind preferentially at promoters, and when SP1 binds next to C-MYC on the DNA, the nearby gene is strongly expressed. The association between the two TFs at promoters is reflected by the conservation of the binding sites across mammals and by the permissive underlying chromatin states; it represents an important control mechanism involved in cellular proliferation, and thereby in cancer. Secondly, we identify the characteristics of the target genes of the TF estrogen receptor alpha (hERa) and we study the influence of hERa in regulating transcription. hERa, upon hormone estrogen signaling, binds to DNA to regulate transcription of its targets in concert with its co-factors. To overcome the scarce experimental data about the binding sites of other TFs that may interact with hERa, we conduct an in silico analysis of the sequences underlying the ChIP sites using the collection of position weight matrices (PWMs) of hERa partners, the TFs FOXA1 and SP1. We combine ChIP-chip and ChIP-paired-end-diTags (ChIP-pet) data about hERa binding on DNA with the sequence information to explain gene expression levels in a large collection of cancer tissue samples and also in studies of the response of cells to estrogen. We confirm that hERa binding sites are distributed anywhere on the genome. However, we distinguish between binding sites near promoters and binding sites along the transcripts. The first group shows weak binding of hERa and high occurrence of SP1 motifs, in particular near estrogen responsive genes. The second group shows strong binding of hERa and significant correlation between the number of binding sites along a gene and the strength of gene induction in presence of estrogen. Some binding sites of the second group also show presence of FOXA1, but the role of this TF still needs to be investigated. Different mechanisms have been proposed to explain hERa-mediated induction of gene expression.
Our work supports the model of hERa activating gene expression from distal binding sites by interacting with promoter-bound TFs, like SP1. hERa has been associated with survival rates of breast cancer patients, though explanatory models are still incomplete: this result is important to better understand how hERa can control gene expression. Thirdly, we address the difficult question of regulatory network inference. We tackle this problem by analyzing time series of biological measurements such as quantification of mRNA levels or protein concentrations. Our approach uses the well-established penalized linear regression models, where we impose sparseness on the connectivity of the regulatory network. We extend this method by enforcing the coherence of the regulatory dependencies: a TF must coherently behave as an activator, or as a repressor, on all its targets. This requirement is implemented as constraints on the signs of the regressed coefficients in the penalized linear regression model. Our approach is better at reconstructing meaningful biological networks than previous methods based on penalized regression. The method is tested on the DREAM2 challenge of reconstructing a five-gene/TF regulatory network, obtaining the best performance in the "undirected signed excitatory" category. Thus, these bioinformatics methods, which are reliable, interpretable and fast enough to cover large biological datasets, have enabled us to better understand gene regulation in humans.
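
The sign-constrained penalized regression described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes an L1 (lasso) penalty solved by coordinate descent, assumes the sign of each regulator is already known, and uses hypothetical toy data and function names. The abstract only states that sparseness is imposed and that coefficient signs are constrained.

```python
def sign_constrained_lasso(X, y, signs, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1
    subject to signs[j] * w[j] >= 0 (assumed formulation, not from the source)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of regulator j with the residual that excludes its own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            if signs[j] > 0:   # activator: coefficient may only be >= 0
                new_w = max(rho - lam, 0.0) / col_sq[j]
            else:              # repressor: coefficient may only be <= 0
                new_w = min(rho + lam, 0.0) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Hypothetical toy data: three regulators (columns), four measurements (rows).
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2 * row[0] - row[1] for row in X]  # target activated by TF 0, repressed by TF 1

coherent = sign_constrained_lasso(X, y, signs=[+1, -1, +1], lam=0.1)
mislabeled = sign_constrained_lasso(X, y, signs=[+1, +1, +1], lam=0.1)
print(coherent)    # TF 0 positive, TF 1 negative, TF 2 zero
print(mislabeled)  # TF 1 is clamped to zero because a positive weight cannot fit
```

In this sketch a regulator whose data contradict its assumed sign is simply dropped from the model, which is one way the sign constraint can prune incoherent edges.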

Relevance:

70.00%

Publisher:

Abstract:

The Microbe browser is a web server providing comparative microbial genomics data. It offers comprehensive, integrated data from GenBank, RefSeq, UniProt, InterPro, Gene Ontology and the Orthologs Matrix Project (OMA) database, displayed along with gene predictions from five software packages. The Microbe browser is updated daily from the source databases and includes all completely sequenced bacterial and archaeal genomes. The data are displayed in an easy-to-use, interactive website based on Ensembl software. The Microbe browser is available at http://microbe.vital-it.ch/. Programmatic access is available through the OMA application programming interface (API) at http://microbe.vital-it.ch/api.

Relevance:

70.00%

Publisher:

Abstract:

The last few years have seen the advent of high-throughput technologies to analyze various properties of the transcriptome and proteome of several organisms. The congruency of these different data sources, or lack thereof, can shed light on the mechanisms that govern cellular function. A central challenge for bioinformatics research is to develop a unified framework for combining the multiple sources of functional genomics information and testing associations between them, thus obtaining a robust and integrated view of the underlying biology. We present a graph theoretic approach to test the significance of the association between multiple disparate sources of functional genomics data by proposing two statistical tests, namely edge permutation and node label permutation tests. We demonstrate the use of the proposed tests by finding significant association between a Gene Ontology-derived "predictome" and data obtained from mRNA expression and phenotypic experiments for Saccharomyces cerevisiae. Moreover, we employ the graph theoretic framework to recast a surprising discrepancy presented in Giaever et al. (2002) between gene expression and knockout phenotype, using expression data from a different set of experiments.
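
A node label permutation test of the kind named in this abstract can be sketched as follows. This is an illustrative reading only: the test statistic (number of edges joining nodes with the same label), the toy graph and all function names are assumptions, not details taken from the paper.

```python
import random


def count_concordant_edges(edges, labels):
    """Count edges whose two endpoints carry the same label."""
    return sum(1 for u, v in edges if labels[u] == labels[v])


def node_label_permutation_test(edges, labels, n_permutations=2000, seed=0):
    """Empirical p-value for label/graph association under label shuffling."""
    rng = random.Random(seed)
    nodes = sorted(labels)
    observed = count_concordant_edges(edges, labels)
    values = [labels[node] for node in nodes]
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(values)  # reassign labels to nodes, keeping the graph fixed
        permuted = dict(zip(nodes, values))
        if count_concordant_edges(edges, permuted) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labeling itself, so the p-value is never 0.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical graph: two 4-node cliques joined by one bridging edge.
group_a, group_b = ["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4"]
edges = [(u, v) for g in (group_a, group_b) for i, u in enumerate(g) for v in g[i + 1:]]
edges.append(("a1", "b1"))
labels = {n: "sensitive" for n in group_a}
labels.update({n: "resistant" for n in group_b})

observed, p_value = node_label_permutation_test(edges, labels)
print(observed, p_value)
```

An edge permutation test would follow the same pattern but would rewire the edges while keeping the labels fixed; which null model is appropriate depends on what the graph represents.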

Relevance:

70.00%

Publisher:

Abstract:

Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client’s site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
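
The idea of processing reads while they are still arriving can be sketched with a generator. This is not the elastream package or its API: the FASTQ parsing, the per-read GC computation and all names below are hypothetical, and an in-memory buffer stands in for a network stream.

```python
import io


def iter_fastq_records(stream):
    """Yield (read_id, sequence, quality) as soon as one 4-line record is read."""
    while True:
        header = stream.readline()
        if not header:
            return  # end of stream
        sequence = stream.readline().rstrip("\n")
        stream.readline()  # '+' separator line, ignored
        quality = stream.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], sequence, quality


def gc_fraction(sequence):
    """Fraction of G/C bases; stands in for any per-read analysis step."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence.upper()) / len(sequence)


def process_stream(stream):
    """Handle each read independently, without waiting for the full transfer."""
    for read_id, sequence, _quality in iter_fastq_records(stream):
        yield read_id, gc_fraction(sequence)


# Simulated incoming data; a real setup would read from a socket or a pipe.
incoming = io.StringIO(
    "@read1\nGGCC\n+\nIIII\n@read2\nATAT\n+\nIIII\n@read3\nACGT\n+\nIIII\n"
)
results = list(process_stream(incoming))
print(results)
```

Because every read is handled on its own, results for early reads are available before later reads have been received, which is the property the streaming scheme relies on.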

Relevance:

60.00%

Publisher:

Abstract:

BACKGROUND: Pseudogenes have long been considered as nonfunctional genomic sequences. However, recent evidence suggests that many of them might have some form of biological activity, and the possibility of functionality has increased interest in their accurate annotation and integration with functional genomics data. RESULTS: As part of the GENCODE annotation of the human genome, we present the first genome-wide pseudogene assignment for protein-coding genes, based on both large-scale manual annotation and in silico pipelines. A key aspect of this coupled approach is that it allows us to identify pseudogenes in an unbiased fashion as well as untangle complex events through manual evaluation. We integrate the pseudogene annotations with the extensive ENCODE functional genomics information. In particular, we determine the expression level, transcription-factor and RNA polymerase II binding, and chromatin marks associated with each pseudogene. Based on their distribution, we develop simple statistical models for each type of activity, which we validate with large-scale RT-PCR-Seq experiments. Finally, we compare our pseudogenes with conservation and variation data from primate alignments and the 1000 Genomes project, producing lists of pseudogenes potentially under selection. CONCLUSIONS: At one extreme, some pseudogenes possess conventional characteristics of functionality; these may represent genes that have recently died. On the other hand, we find interesting patterns of partial activity, which may suggest that dead genes are being resurrected as functioning non-coding RNAs. The activity data of each pseudogene are stored in an associated resource, psiDR, which will be useful for the initial identification of potentially functional pseudogenes.

Relevance:

60.00%

Publisher:

Abstract:

Mammalian physiology and behavior follow daily rhythms that are orchestrated by endogenous timekeepers known as circadian clocks. Rhythms in transcription are considered the main mechanism to engender rhythmic gene expression, but important roles for posttranscriptional mechanisms have recently emerged as well (reviewed in Lim and Allada (2013) [1]). We have recently reported on the use of ribosome profiling (RPF-seq), a method based on the high-throughput sequencing of ribosome protected mRNA fragments, to explore the temporal regulation of translation efficiency (Janich et al., 2015 [2]). Through the comparison of around-the-clock RPF-seq and matching RNA-seq data we were able to identify 150 genes, involved in ribosome biogenesis, iron metabolism and other pathways, whose rhythmicity is generated entirely at the level of protein synthesis. The temporal transcriptome and translatome data sets from this study have been deposited in NCBI's Gene Expression Omnibus under the accession number GSE67305. Here we provide additional information on the experimental setup and on important optimization steps pertaining to the ribosome profiling technique in mouse liver and to data analysis.

Relevance:

60.00%

Publisher:

Abstract:

Paracoccidioides lutzii, formerly known as 'Pb01-like' strains in the P. brasiliensis complex, is proposed as a new species based on phylogenetic and comparative genomics data, recombination analysis, and morphological characteristics. Conidia of P. lutzii are elongated, different from those of P. brasiliensis. P. lutzii occurs in the central and northern regions of Brazil. Studies comparing P. brasiliensis and P. lutzii may have significant clinical consequences for the diagnosis and treatment of paracoccidioidomycosis.

Relevance:

60.00%

Publisher:

Abstract:

Laryngeal squamous cell carcinoma (LSCC) is one of the most common malignancies among head and neck tumors (Zhang et al., 2013 [1]). Previous studies have associated its occurrence with social activities, such as tobacco and alcohol consumption (Hashibe et al., 2007a [2]; Hashibe et al., 2007b [3]; Shangina et al., 2006 [4]). Here, we performed genome-wide gene expression profiling in thirty-one patients positively diagnosed with LSCC, in order to investigate new targets involved in tumorigenesis.

Relevance:

60.00%

Publisher:

Abstract:

The apicomplexan parasite, Theileria annulata, is the causative agent of tropical theileriosis, a devastating lymphoproliferative disease of cattle. The schizont stage transforms bovine leukocytes and provides an intriguing model to study host/pathogen interactions. The genome of T. annulata has been sequenced and transcriptomic data are rapidly accumulating. In contrast, little is known about the proteome of the schizont, the pathogenic, transforming life cycle stage of the parasite. Using one-dimensional (1-D) gel LC-MS/MS, a proteomic analysis of purified T. annulata schizonts was carried out. In whole parasite lysates, 645 proteins were identified. Proteins with transmembrane domains (TMDs) were under-represented and no proteins with more than four TMDs could be detected. To tackle this problem, Triton X-114 treatment was applied, which facilitates the extraction of membrane proteins, followed by 1-D gel LC-MS/MS. This resulted in the identification of an additional 153 proteins. Half of those had one or more TMD and 30 proteins with more than four TMDs were identified. This demonstrates that Triton X-114 treatment can provide a valuable additional tool for the identification of new membrane proteins in proteomic studies. With two exceptions, all proteins involved in glycolysis and the citric acid cycle were identified. For at least 29% of identified proteins, the corresponding transcripts were not present in the existing expressed sequence tag databases. The proteomics data were integrated into the publicly accessible database resource at EuPathDB (www.eupathdb.org) so that mass spectrometry-based protein expression evidence for T. annulata can be queried alongside transcriptional and other genomics data available for these parasites.

Relevance:

60.00%

Publisher:

Abstract:

Diffuse gliomas are highly lethal central nervous system malignancies which, unfortunately, are the most common primary brain tumor and also the least responsive to the very few therapeutic modalities currently available to treat them. IGFBP2 is a newly recognized oncogene that is operative in multiple cancer types, including glioma, and shows promise for a targeted therapeutic approach. Elevated IGFBP2 expression is present in high-grade glioma and correlates with poor survival. We have previously demonstrated that IGFBP2 induces glioma development and progression in a spontaneous glioma mouse model, which highlighted its significance and potential for future therapy. However, we did not yet know the key physiological pathways associated with this newly characterized oncogene. We first evaluated human glioma genomics data harnessed from the publicly available Rembrandt source to identify major pathways associated with IGFBP2 expression. Integrin and ILK, among other cell migration and invasion-related pathways, were the most prominently associated. We confirmed that these pathways are regulated by IGFBP2 in glioma cell lines, and demonstrated that 1) IGFBP2 activates integrin α5β1, leading to the activation of key pathways important in glioma; 2) IGFBP2 mediates cell migration pathways through ILK; and 3) IGFBP2 activates NF-kB via an integrin α5 interaction. We then sought to determine whether this was a physiologically active signaling pathway in vivo by assessing its ability to induce glioma progression in the RCAS/tv-a spontaneous glioma mouse model. We found that ILK is a key downstream mediator of IGFBP2 that is required for the induction of glioma progression. Most significantly, a genetic therapeutic approach revealed that perturbation of any point in the pathway thwarted tumor progression, providing strong evidence that targeting the key players could potentially produce a significant benefit for human glioma patients.
The elucidation of this signaling pathway is a critical step, since efforts to create a small molecule drug targeting IGFBP2 have so far not been successful, but a number of inhibitors of the other pathway constituents, including ILK, integrin and NF-kB, have been developed.

Relevance:

60.00%

Publisher:

Abstract:

Amidases [EC 3.5.1.4] capable of converting indole-3-acetamide (IAM) into the major plant growth hormone indole-3-acetic acid (IAA) are assumed to be involved in auxin de novo biosynthesis. With the growing amount of genomics data, it was possible to identify over forty proteins with substantial homology to the already characterized amidases from Arabidopsis and tobacco. The observed high conservation of amidase-like proteins throughout the plant kingdom may suggest an important role of these enzymes in plant development. Here, we report the cloning and functional analysis of four thus far uncharacterized plant amidases from Oryza sativa, Sorghum bicolor, Medicago truncatula, and Populus trichocarpa. Intriguingly, we were able to demonstrate that the examined amidases are also capable of converting phenyl-2-acetamide (PAM) into phenyl-2-acetic acid (PAA), an auxin endogenous to several plant species including Arabidopsis. Furthermore, we compared the subcellular localization of the enzymes to that of Arabidopsis AMI1, providing further evidence for similar enzymatic functions. Our results point to the presence of a presumably conserved pathway of auxin biosynthesis via IAM, as amidases of both monocot and dicot origin were analyzed.

Relevance:

60.00%

Publisher:

Abstract:

To carry out their specific roles in the cell, genes and gene products often work together in groups, forming many relationships among themselves and with other molecules. Such relationships include physical protein-protein interactions, regulatory relationships, metabolic relationships, genetic relationships, and many more. With advances in science and technology, high-throughput technologies have been developed to simultaneously detect tens of thousands of pairwise protein-protein interactions and protein-DNA interactions. However, the data generated by high-throughput methods are prone to noise. Furthermore, the technology itself has its limitations and cannot detect all kinds of relationships between genes and their products. Thus there is a pressing need to investigate all kinds of relationships and their roles in a living system using bioinformatic approaches, which is a central challenge in Computational Biology and Systems Biology. This dissertation focuses on exploring relationships between genes and gene products using bioinformatic approaches. Specifically, we consider problems related to regulatory relationships, protein-protein interactions, and semantic relationships between genes. A regulatory element is an important pattern or "signal", often located in the promoter of a gene, which is used in the process of turning a gene "on" or "off". Predicting regulatory elements is a key step in exploring the regulatory relationships between genes and gene products. In this dissertation, we consider the problem of improving the prediction of regulatory elements by using comparative genomics data. With regard to protein-protein interactions, we have developed bioinformatics techniques to estimate support for the data on these interactions.
While protein-protein interactions and regulatory relationships can be detected by high-throughput biological techniques, there is another type of relationship, called a semantic relationship, that cannot be detected by a single technique but can be inferred using multiple sources of biological data. The contributions of this thesis involve the development and application of a set of bioinformatic approaches that address the challenges mentioned above. These include (i) an EM-based algorithm that improves the prediction of regulatory elements using comparative genomics data, (ii) an approach for estimating the support of protein-protein interaction data, with application to functional annotation of genes, (iii) a novel method for inferring a functional network of genes, and (iv) techniques for clustering genes using multi-source data.

Relevance:

60.00%

Publisher:

Abstract:

Even though a large amount of evidence suggests that the PP2A serine/threonine protein phosphatase acts as a tumour suppressor, the genomics data to support this claim are limited. We fit a sparse binary Markov random field, with each sample's total mutational frequency as an additional covariate, to model the dependencies between the mutations occurring in the PP2A-encoding genes. We utilize the data from recent large-scale cancer genomics studies, where the whole genome from a human tumour biopsy has been analysed. Our results show a complex network of interactions between the occurrence of mutations in our twenty examined genes. According to our analysis, the mutations occurring in the genes PPP2R1A, PPP2R3A, and PPP2R2B are identified as the key mutations. These genes form the core of the network of conditional dependency between the mutations in the investigated twenty genes. Additionally, we note that the mutations occurring in PPP2R4 seem to be more influential in samples with a higher number of total mutations. The mutations occurring in the set of genes suggested by our results have been shown to contribute to the transformation of human cells. We conclude that our evidence further supports the claim that PP2A acts as a tumour suppressor and that restoring PP2A activity is an appealing therapeutic strategy.