12 results for gene expression data
at Duke University
Abstract:
BACKGROUND: Biological processes occur on a vast range of time scales, and many of them occur concurrently. As a result, system-wide measurements of gene expression have the potential to capture many of these processes simultaneously. The challenge, however, is to separate these processes and time scales in the data. In many cases the number of processes and their time scales are unknown. This issue is particularly relevant to developmental biologists, who are interested in processes such as growth, segmentation and differentiation, which can all take place simultaneously but on different time scales. RESULTS: We introduce a flexible and statistically rigorous method for detecting different time scales in time-series gene expression data by identifying expression patterns that are temporally shifted between replicate datasets. We apply our approach to a Saccharomyces cerevisiae cell-cycle dataset and an Arabidopsis thaliana root developmental dataset. In both datasets our method successfully detects processes operating on several different time scales. Furthermore, we show that many of these time scales can be associated with particular biological functions. CONCLUSIONS: The spatiotemporal modules identified by our method suggest the presence of multiple biological processes acting at distinct time scales in both the Arabidopsis root and yeast. Using similar large-scale expression datasets, the identification of biological processes acting at multiple time scales in many organisms is now possible.
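The abstract does not spell out the statistical procedure, so the following is only a minimal sketch of what a temporal shift between replicates means. It swaps in a plain lagged Pearson correlation and is not the authors' method; `pearson`, `best_lag`, and the toy sine profiles are hypothetical names and data introduced for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_lag(rep_a, rep_b, max_lag):
    """Return (lag, r): the integer lag at which rep_b best matches rep_a.

    A positive lag means rep_b shows the same pattern `lag` time points later.
    """
    best = (0, -2.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = rep_a[:len(rep_a) - lag], rep_b[lag:]
        else:
            a, b = rep_a[-lag:], rep_b[:len(rep_b) + lag]
        if len(a) < 3:  # too few overlapping points to correlate
            continue
        r = pearson(a, b)
        if r > best[1]:
            best = (lag, r)
    return best


# Toy data (assumed): one gene's profile in two replicates,
# with the second replicate delayed by two time points.
rep_a = [math.sin(2 * math.pi * t / 8) for t in range(24)]
rep_b = [math.sin(2 * math.pi * (t - 2) / 8) for t in range(24)]
lag, r = best_lag(rep_a, rep_b, max_lag=3)
```

Because the toy profile is periodic, `max_lag` is kept below half the period so that the recovered shift is unambiguous.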
Abstract:
BACKGROUND: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. RESULTS: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), rhinovirus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. CONCLUSIONS: Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.
Abstract:
BACKGROUND: Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but has not been previously mined en masse for changes in mRNA processing. We explored the possibility of using HG-U133 microarray data to identify changes in alternative mRNA processing in several available archival datasets. RESULTS: Data from these and other gene expression microarrays can now be mined for changes in transcript isoform abundance using a program described here, SplicerAV. Using in vivo and in vitro breast cancer microarray datasets, SplicerAV was able to perform both gene- and isoform-specific expression profiling within the same microarray dataset. Our reanalysis of Affymetrix U133 Plus 2.0 data generated by in vitro over-expression of HRAS, E2F3, beta-catenin (CTNNB1), SRC, and MYC identified several hundred oncogene-induced mRNA isoform changes, one of which revealed a previously unknown mechanism of EGFR family activation. Using clinical data, SplicerAV predicted 241 isoform changes between low and high grade breast tumors, with changes enriched among genes coding for guanyl-nucleotide exchange factors, metalloprotease inhibitors, and mRNA processing factors. Isoform changes in 15 genes were associated with aggressive cancer across the three breast cancer datasets. CONCLUSIONS: Using SplicerAV, we identified several hundred previously uncharacterized isoform changes induced by in vitro oncogene over-expression and revealed a previously unknown mechanism of EGFR activation in human mammary epithelial cells. We analyzed Affymetrix GeneChip data from over 400 human breast tumors in three independent studies, making this the largest clinical dataset analyzed for en masse changes in alternative mRNA processing. The capacity to detect RNA isoform changes in archival microarray data using SplicerAV allowed us to carry out the first analysis of isoform-specific mRNA changes directly associated with cancer survival.
Abstract:
The neurodegenerative disease Friedreich's ataxia (FRDA) is the most common autosomal-recessively inherited ataxia and is caused by a GAA triplet repeat expansion in the first intron of the frataxin gene. In this disease, transcription of frataxin, a mitochondrial protein involved in iron homeostasis, is impaired, resulting in a significant reduction in mRNA and protein levels. Global gene expression analysis was performed in peripheral blood samples from FRDA patients as compared to controls, which suggested altered expression patterns pertaining to genotoxic stress. We then confirmed the presence of genotoxic DNA damage by using a gene-specific quantitative PCR assay and discovered an increase in both mitochondrial and nuclear DNA damage in the blood of these patients (p<0.0001, respectively). Additionally, frataxin mRNA levels correlated with age of onset of disease and displayed unique sets of gene alterations involved in immune response, oxidative phosphorylation, and protein synthesis. Many of the key pathways observed by transcription profiling were downregulated, and we believe these data suggest that patients with prolonged frataxin deficiency undergo a systemic survival response to chronic genotoxic stress and consequent DNA damage detectable in blood. In conclusion, our results yield insight into the nature and progression of FRDA, as well as possible therapeutic approaches. Furthermore, the identification of potential biomarkers, including the DNA damage found in peripheral blood, may have predictive value in future clinical trials.
Abstract:
Eukaryotic genomes are mostly composed of noncoding DNA whose role is still poorly understood. Studies in several organisms have shown correlations between the length of the intergenic and genic sequences of a gene and the expression of its corresponding mRNA transcript. Some studies have found a positive relationship between intergenic sequence length and expression diversity between tissues, and concluded that genes under greater regulatory control require more regulatory information in their intergenic sequences. Other reports found a negative relationship between expression level and gene length and the interpretation was that there is selection pressure for highly expressed genes to remain small. However, a correlation between gene sequence length and expression diversity, opposite to that observed for intergenic sequences, has also been reported, and to date there is no testable explanation for this observation. To shed light on these varied and sometimes conflicting results, we performed a thorough study of the relationships between sequence length and gene expression using cell-type (tissue) specific microarray data in Arabidopsis thaliana. We measured median gene expression across tissues (expression level), expression variability between tissues (expression pattern uniformity), and expression variability between replicates (expression noise). We found that intergenic (upstream and downstream) and genic (coding and noncoding) sequences have generally opposite relationships with respect to expression, whether it is tissue variability, median, or expression noise. To explain these results we propose a model, in which the lengths of the intergenic and genic sequences have opposite effects on the ability of the transcribed region of the gene to be epigenetically regulated for differential expression. These findings could shed light on the role and influence of noncoding sequences on gene expression.
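The three quantities measured above can be sketched for a single gene as follows. The abstract does not state which variability statistic was used, so the coefficient of variation here is an assumption, as are the function name `expression_summary`, the tissue labels, and the toy values.

```python
from statistics import mean, median, stdev


def expression_summary(samples):
    """Summarise one gene from {tissue: [replicate expression values]}.

    Returns (level, tissue_variability, noise), where
      level              = median of the per-tissue mean expression,
      tissue_variability = coefficient of variation of the per-tissue means,
      noise              = mean within-tissue coefficient of variation.
    The choice of coefficient of variation is an assumption of this sketch.
    """
    tissue_means = [mean(values) for values in samples.values()]
    level = median(tissue_means)
    tissue_variability = stdev(tissue_means) / mean(tissue_means)
    noise = mean([stdev(values) / mean(values) for values in samples.values()])
    return level, tissue_variability, noise


# Toy values (assumed): three tissues with two replicates each.
gene = {
    "tissue_1": [10.0, 12.0],
    "tissue_2": [20.0, 20.0],
    "tissue_3": [30.0, 34.0],
}
level, tissue_variability, noise = expression_summary(gene)
```

Each tissue needs at least two replicates, and the gene at least two tissues, for the sample standard deviation to be defined.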
Abstract:
In the event of a terrorist-mediated attack in the United States using radiological or improvised nuclear weapons, it is expected that hundreds of thousands of people could be exposed to life-threatening levels of ionizing radiation. We have recently shown that genome-wide expression analysis of the peripheral blood (PB) can generate gene expression profiles that can predict radiation exposure and distinguish the dose level of exposure following total body irradiation (TBI). However, in the event of a radiation mass-casualty scenario, many victims will have heterogeneous exposure due to partial shielding, and it is unknown whether PB gene expression profiles would be useful in predicting the status of partially irradiated individuals. Here, we identified gene expression profiles in the PB that were characteristic of anterior hemibody, posterior hemibody and single-limb irradiation at 0.5 Gy, 2 Gy and 10 Gy in C57BL/6 mice. These PB signatures predicted the radiation status of partially irradiated mice with a high level of accuracy (range 79-100%) compared to non-irradiated mice. Interestingly, PB signatures of partial body irradiation were poorly predictive of radiation status by site of injury (range 16-43%), suggesting that the PB molecular response to partial body irradiation was anatomic-site specific. Importantly, PB gene signatures generated from TBI-treated mice failed completely to predict the radiation status of partially irradiated animals or non-irradiated controls. These data demonstrate that partial body irradiation, even to a single limb, generates a characteristic PB signature of radiation injury and thus may necessitate the use of multiple signatures, both partial body and total body, to accurately assess the status of an individual exposed to radiation.
Abstract:
During bacterial growth, a cell approximately doubles in size before division, after which it splits into two daughter cells. This process is subjected to the inherent perturbations of cellular noise and thus requires regulation for cell-size homeostasis. The mechanisms underlying the control and dynamics of cell size remain poorly understood owing to the difficulty in sizing individual bacteria over long periods of time in a high-throughput manner. Here we measure and analyse long-term, single-cell growth and division across different Escherichia coli strains and growth conditions. We show that a subset of cells in a population exhibit transient oscillations in cell size with periods that stretch across several (more than ten) generations. Our analysis reveals that a simple law governing cell-size control, a noisy linear map, explains the origins of these cell-size oscillations across all strains. This noisy linear map implements a negative feedback on cell-size control: a cell with a larger initial size tends to divide earlier, whereas one with a smaller initial size tends to divide later. Combining simulations of cell growth and division with experimental data, we demonstrate that this noisy linear map generates transient oscillations, not just in cell size, but also in constitutive gene expression. Our work provides new insights into the dynamics of bacterial cell-size regulation with implications for the physiological processes involved.
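As a rough illustration of the negative feedback described above, the sketch below iterates a generic noisy linear map on birth size along one lineage. The abstract gives neither the exact form of the map nor its parameters, so the update rule `next = a * current + b + noise` and the values of `a`, `b`, `noise_sd`, and `s0` are assumptions chosen for illustration, and `simulate_lineage` is a hypothetical name.

```python
import random


def simulate_lineage(n_generations, a=0.5, b=1.0, noise_sd=0.1, s0=3.0, seed=0):
    """Follow birth size along one lineage under an assumed noisy linear map.

    With 0 < a < 1 the map is a negative feedback: a deviation from the
    fixed point b / (1 - a) shrinks by the factor a in each generation,
    while Gaussian noise keeps perturbing the size.
    """
    rng = random.Random(seed)
    sizes = [s0]
    for _ in range(n_generations):
        next_size = a * sizes[-1] + b + rng.gauss(0.0, noise_sd)
        sizes.append(max(next_size, 1e-6))  # guard: sizes stay positive
    return sizes


# Without noise, the size relaxes monotonically to the fixed point b / (1 - a).
deterministic = simulate_lineage(50, noise_sd=0.0)

# With noise, the size fluctuates around the same fixed point.
noisy = simulate_lineage(5000, noise_sd=0.1)
noisy_mean = sum(noisy) / len(noisy)
```

The sketch only shows the homeostatic pull toward a set point; it makes no attempt to reproduce the transient oscillations reported in the study.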
Abstract:
Acute respiratory infections caused by bacterial or viral pathogens are among the most common reasons for seeking medical care. Despite improvements in pathogen-based diagnostics, most patients receive inappropriate antibiotics. Host response biomarkers offer an alternative diagnostic approach to direct antimicrobial use. This observational cohort study determined whether host gene expression patterns discriminate noninfectious from infectious illness and bacterial from viral causes of acute respiratory infection in the acute care setting. Peripheral whole blood gene expression from 273 subjects with community-onset acute respiratory infection (ARI) or noninfectious illness, as well as 44 healthy controls, was measured using microarrays. Sparse logistic regression was used to develop classifiers for bacterial ARI (71 probes), viral ARI (33 probes), or a noninfectious cause of illness (26 probes). Overall accuracy was 87% (238 of 273 concordant with clinical adjudication), which was more accurate than procalcitonin (78%, P < 0.03) and three published classifiers of bacterial versus viral infection (78 to 83%). The classifiers developed here were externally validated in five publicly available data sets (AUC, 0.90 to 0.99). A sixth publicly available data set included 25 patients with co-identification of bacterial and viral pathogens. Applying the ARI classifiers defined four distinct groups: a host response to bacterial ARI, viral ARI, coinfection, and neither a bacterial nor a viral response. These findings create an opportunity to develop and use host gene expression classifiers as diagnostic platforms to combat inappropriate antibiotic use and emerging antibiotic resistance.
Abstract:
BACKGROUND: Previous work has demonstrated the potential for peripheral blood (PB) gene expression profiling for the detection of disease or environmental exposures. METHODS AND FINDINGS: We have sought to determine the impact of several variables on the PB gene expression profile of an environmental exposure, ionizing radiation, and to determine the specificity of the PB signature of radiation versus other genotoxic stresses. Neither genotype differences nor the time of PB sampling caused any lessening of the accuracy of PB signatures to predict radiation exposure, but sex difference did influence the accuracy of the prediction of radiation exposure at the lowest level (50 cGy). A PB signature of sepsis was also generated and both the PB signature of radiation and the PB signature of sepsis were found to be 100% specific at distinguishing irradiated from septic animals. We also identified human PB signatures of radiation exposure and chemotherapy treatment which distinguished irradiated patients and chemotherapy-treated individuals within a heterogeneous population with accuracies of 90% and 81%, respectively. CONCLUSIONS: We conclude that PB gene expression profiles can be identified in mice and humans that are accurate in predicting medical conditions, are specific to each condition and remain highly accurate over time.
Abstract:
Hybrid dysfunctions, such as sterility, may result in part from disruptions in the regulation of gene expression. Studies of hybrids within the Drosophila simulans clade have reported genes expressed above or below the expression observed in their parent species, and such misexpression is associated with male sterility in multigenerational backcross hybrids. However, these studies often examined whole bodies rather than testes or had limited replication using less-sensitive but global techniques. Here, we use a new RNA isolation technique to re-examine hybrid gene expression disruptions in both testes and whole bodies from single Drosophila males by real-time quantitative RT-PCR. We find two early-spermatogenesis transcripts are underexpressed in hybrid whole-bodies but not in assays of testes alone, while two late-spermatogenesis transcripts seem to be underexpressed in both whole-bodies and testes alone. Although the number of transcripts surveyed is limited, these results provide some support for a previous hypothesis that the spermatogenesis pathway in these sterile hybrids may be disrupted sometime after the expression of the early meiotic arrest genes.
Abstract:
Knowing when, at what level, in which cell type, and where within the cell a gene is expressed contributes to our understanding of the function of the gene. Each of these features can be determined with in situ hybridization to mRNAs within cells. Here we present a radioactive in situ hybridization method modified from Clayton et al. (1988)(1) that has been working successfully in our lab for many years, especially for adult vertebrate brains(2-5). The long complementary RNA (cRNA) probes to the target sequence allow for detection of low abundance transcripts(6,7). Incorporation of radioactive nucleotides into the cRNA probes allows for further detection sensitivity of low abundance transcripts and quantitative analyses, either by light-sensitive x-ray film or emulsion coated over the tissue. These detection methods provide a long-term record of target gene expression. Compared with non-radioactive probe methods, such as DIG-labeling, the radioactive probe hybridization method does not require multiple amplification steps using HRP-antibodies and/or a TSA kit to detect low abundance transcripts. Therefore, this method provides a linear relation between signal intensity and targeted mRNA amounts for quantitative analysis. It allows processing 100-200 slides simultaneously. It works well for different developmental stages of embryos. Most developmental studies of gene expression use whole embryos and non-radioactive approaches(8,9), in part because embryonic tissue is more fragile than adult tissue, with less cohesion between cells, making it difficult to see boundaries between cell populations with tissue sections. In contrast, our radioactive approach, due to the larger range of sensitivity, is able to obtain higher contrast in resolution of gene expression between tissue regions, making it easier to see boundaries between populations. Using this method, researchers could reveal the possible significance of a newly identified gene, and further predict the function of the gene of interest.
Abstract:
Developmental signals in metazoans play critical roles in inducing cell differentiation from multipotent progenitors. The existing paradigm posits that the signals operate directly through their downstream transcription factors to activate expression of cell type-specific genes, which are the hallmark of cell identity. We have investigated the mechanism through which Wnt signaling induces osteoblast differentiation in an osteoblast-adipocyte bipotent progenitor cell line. Unexpectedly, Wnt3a acutely suppresses the expression of a large number of genes while inducing osteoblast differentiation. The suppressed genes include Pparg and Cebpa, which encode adipocyte-specifying transcription factors and suppression of which is sufficient to induce osteoblast differentiation. The large-scale gene suppression induced by Wnt3a corresponds to a global decrease in histone acetylation, an epigenetic modification that is associated with gene activation. Mechanistically, Wnt3a does not alter histone acetyltransferase or deacetylase activities but, rather, decreases the level of acetyl-CoA in the nucleus. The Wnt-induced decrease in histone acetylation is independent of β-catenin signaling but, rather, correlates with suppression of glucose metabolism in the tricarboxylic acid cycle. Functionally, preventing histone deacetylation by increasing nucleocytoplasmic acetyl-CoA levels impairs Wnt3a-induced osteoblast differentiation. Thus, Wnt signaling induces osteoblast differentiation in part through histone deacetylation and epigenetic suppression of an alternative cell fate.