916 results for EXPRESSION DATA
Abstract:
Background: Using array comparative genomic hybridization (aCGH), a large number of deleted genomic regions have been identified in human cancers. However, subsequent efforts to identify the target genes selected for inactivation in these regions have often been challenging. Methods: Here we integrated genome-wide copy number data with gene expression data and nonsense-mediated mRNA decay rates in breast cancer cell lines to prioritize candidate genes that are likely to be tumour suppressor genes inactivated by bi-allelic genetic events. The candidates were sequenced to identify potential mutations. Results: This integrated genomic approach led to the identification of RIC8A at 11p15 as a putative target gene of the genomic deletion in the ZR-75-1 breast cancer cell line. We identified a truncating mutation in this cell line, leading to loss of expression and rapid decay of the transcript. We screened 127 breast cancers for RIC8A mutations but did not find any pathogenic mutations. No promoter hypermethylation was detected in these tumours either. However, analysis of gene expression data from breast tumours identified a small group of aggressive tumours that displayed low levels of RIC8A transcripts. qRT-PCR analysis of 38 breast tumours showed a strong association between low RIC8A expression and the presence of TP53 mutations (P = 0.006). Conclusion: We demonstrate a data integration strategy leading to the identification of RIC8A as a gene undergoing a classical double-hit genetic inactivation in a breast cancer cell line, as well as in vivo evidence of loss of RIC8A expression in a subgroup of aggressive TP53 mutant breast cancers.
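The integration step can be pictured as a filter over per-gene evidence. The Python sketch below is a hypothetical illustration, not the authors' pipeline: the field names, the use of a transcript fold increase after NMD inhibition as the decay readout, and all cutoffs are assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass
class GeneProfile:
    # All fields are hypothetical per-gene summaries for one cell line.
    name: str
    copy_number_log2: float  # aCGH log2 ratio; negative values suggest a deletion
    expression_z: float      # expression z-score relative to other cell lines
    nmd_fold_change: float   # transcript fold increase after NMD inhibition (assumed readout)


def prioritize(genes, cn_cutoff=-0.3, expr_cutoff=-1.0, nmd_cutoff=1.5):
    """Keep genes that combine a deletion (first hit), low expression, and a transcript
    that is stabilized when NMD is blocked (hinting at a truncating second hit),
    ranked by the NMD signal. All cutoffs are illustrative assumptions."""
    hits = [
        g for g in genes
        if g.copy_number_log2 <= cn_cutoff
        and g.expression_z <= expr_cutoff
        and g.nmd_fold_change >= nmd_cutoff
    ]
    return sorted(hits, key=lambda g: g.nmd_fold_change, reverse=True)
```

Ranked candidates would then go on to sequencing, as described above; the gene names used with such a filter are placeholders here, not results.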
Abstract:
Glioblastoma (GBM) is the most common and aggressive primary brain tumor, with very poor median patient survival. To identify a microRNA (miRNA) expression signature that can predict GBM patient survival, we analyzed the miRNA expression data of GBM patients (n = 222) derived from The Cancer Genome Atlas (TCGA) dataset. We divided the patients randomly into training and testing sets of equal size. We identified 10 significant miRNAs using Cox regression analysis on the training set and formulated a risk score based on the expression signature of these miRNAs that segregated the patients into high and low risk groups with significantly different survival times (hazard ratio [HR] = 2.4; 95% CI = 1.4-3.8; p < 0.0001). Of these 10 miRNAs, 7 were found to be risky miRNAs and 3 were found to be protective. This signature was independently validated in the testing set (HR = 1.7; 95% CI = 1.1-2.8; p = 0.002). GBM patients with high risk scores had poorer overall survival than patients with low risk scores. Overall survival among the entire patient set was 35.0% at 2 years, 21.5% at 3 years, 18.5% at 4 years and 11.8% at 5 years in the low risk group, versus 11.0%, 5.5%, 0.0% and 0.0%, respectively, in the high risk group (HR = 2.0; 95% CI = 1.4-2.8; p < 0.0001). Cox multivariate analysis with patient age as a covariate on the entire patient set identified the risk score based on the 10-miRNA expression signature as an independent predictor of patient survival (HR = 1.120; 95% CI = 1.04-1.20; p = 0.003). Thus we have identified a miRNA expression signature that can predict GBM patient survival. These findings may have implications for the understanding of gliomagenesis, the development of targeted therapy and the selection of high risk cancer patients for adjuvant therapy.
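A signature-based risk score of this kind is commonly a linear combination of expression values weighted by Cox regression coefficients, with patients split at a cutoff such as the training-set median. The abstract does not give the formula, so the Python sketch below assumes that form; miRNA names and coefficients are placeholders, and fitting the Cox model itself is not shown.

```python
import random
from statistics import median


def split_train_test(patient_ids, seed=0):
    """Randomly split patients into two halves of equal size."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def risk_score(expression, coefficients):
    """Assumed linear risk score: sum of Cox coefficient x expression over the
    signature miRNAs. Positive coefficients mark 'risky' miRNAs, negative ones
    'protective' miRNAs."""
    return sum(coefficients[m] * expression[m] for m in coefficients)


def median_cutoff(scores):
    """Cutoff taken as the median score (assumed to be computed on the training set)."""
    return median(scores.values())


def assign_risk_groups(scores, cutoff):
    """Label each patient 'high' if the score exceeds the cutoff, else 'low'."""
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
```

Under this assumption, the cutoff from the training half would be reused unchanged on the testing half, which is what makes the testing-set hazard ratio an independent check.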
Abstract:
Background: Gene expression technologies have opened up new ways to diagnose and treat cancer and other diseases. Clustering algorithms are a useful approach for analyzing genome expression data. They attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. An important problem in gene classification is to determine whether the clustering process finds a relevant partition and whether new gene classes can be identified. There are two key aspects to classification: estimating the number of clusters, and deciding whether a new unit (e.g., a gene or tumor sample) belongs to one of the previously identified clusters or to a new group. Results: ICGE is a user-friendly R package that provides many functions related to this problem: estimating the number of clusters for data with mixed variable types, as commonly encountered by applied biomedical researchers; detecting whether the data have a cluster structure; identifying whether a new unit belongs to one of the pre-identified clusters or to a novel group; and classifying new units into the corresponding cluster. The functions in the ICGE package are accompanied by help files and simple examples to facilitate their use. Conclusions: We demonstrate the utility of ICGE by analyzing simulated and real data sets. The results show that ICGE could be very useful to a broad research community.
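ICGE itself is an R package, and its own statistic for this decision is not described in the abstract. The Python sketch below only illustrates the general "existing cluster or new group" decision under assumed choices: Euclidean distance and a simple radius rule (the largest member-to-centroid distance). Data with mixed variable types would need a different distance, such as Gower's.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def classify_new_unit(x, clusters):
    """Assign x to the nearest cluster (by centroid distance) if it lies within that
    cluster's radius, i.e. the largest member-to-centroid distance; otherwise report
    it as belonging to a new group. The radius rule is an illustrative assumption."""
    best_label, best_d, best_radius = None, float("inf"), 0.0
    for label, members in clusters.items():
        c = centroid(members)
        d = dist(x, c)
        if d < best_d:
            radius = max(dist(p, c) for p in members)
            best_label, best_d, best_radius = label, d, radius
    return best_label if best_d <= best_radius else "new group"
```

A radius rule is the simplest possible novelty criterion; a statistical test on distances would be less sensitive to single outlying members.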
Abstract:
Numerous transcription factors self-assemble into oligomeric species of different orders in a way that is actively regulated by the cell. Until now, no general functional role has been identified for this widespread process. Here, we capture the effects of modulated self-assembly on gene expression with a novel quantitative framework. We show that this mechanism provides precision and flexibility, two seemingly antagonistic properties, to the sensing of diverse cellular signals by systems that share common elements present in transcription factors such as p53, NF-kappa B, STATs, Oct and RXR. Applied to the nuclear hormone receptor RXR, this framework accurately reproduces a broad range of classical, previously unexplained sets of gene expression data and corroborates the existence of a precise functional regime with flexible properties that can be controlled both at a genome-wide scale and at the individual promoter level.
Abstract:
A nonparametric Bayesian extension of Factor Analysis (FA) is proposed in which the observed data $\mathbf{Y}$ are modeled as a linear superposition of a potentially infinite number of hidden factors $\mathbf{X}$, combined through the mixing matrix $\mathbf{G}$. The Indian Buffet Process (IBP) is used as a prior on $\mathbf{G}$ to incorporate sparsity and to allow the number of latent features to be inferred. The model's utility for modeling gene expression data is investigated using randomly generated data sets based on a known sparse connectivity matrix for E. coli, and on three biological data sets of increasing complexity.
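To make the prior concrete, the Python sketch below draws a binary sparsity pattern from the Indian Buffet Process and uses it to mask Gaussian weights in $\mathbf{G}$, then generates $\mathbf{Y} = \mathbf{G}\mathbf{X} + \mathbf{E}$. Only the generative side is shown, under assumed choices (rows of $\mathbf{G}$ as genes, standard normal weights and factors, Gaussian noise $\mathbf{E}$); the posterior inference used to fit the model is not shown.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(num_rows, alpha, rng):
    """Draw a binary matrix Z from the Indian Buffet Process. Row i reuses an existing
    column k with probability m_k / i (m_k = earlier rows using k), then opens
    Poisson(alpha / i) new columns."""
    counts = []  # m_k for every column opened so far
    rows = []    # set of active column indices for each row
    for i in range(1, num_rows + 1):
        active = {k for k, m in enumerate(counts) if rng.random() < m / i}
        for k in active:
            counts[k] += 1
        for _ in range(poisson(alpha / i, rng)):
            active.add(len(counts))
            counts.append(1)
        rows.append(active)
    return [[1 if k in active else 0 for k in range(len(counts))] for active in rows]


def simulate_factor_data(num_genes, num_samples, alpha=2.0, noise_sd=0.1, seed=0):
    """Generate Y = G X + E with an IBP-masked mixing matrix G (assumed Gaussian parts)."""
    rng = random.Random(seed)
    Z = sample_ibp(num_genes, alpha, rng)
    K = len(Z[0]) if Z else 0
    G = [[z * rng.gauss(0, 1) for z in row] for row in Z]  # zero wherever Z is zero
    X = [[rng.gauss(0, 1) for _ in range(num_samples)] for _ in range(K)]
    Y = [[sum(G[g][k] * X[k][s] for k in range(K)) + rng.gauss(0, noise_sd)
          for s in range(num_samples)] for g in range(num_genes)]
    return Z, G, X, Y
```

Because the number of columns is not fixed in advance, the number of factors K emerges from the draw, which is the property that lets the model infer it from data.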
Abstract:
We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make the analysis of larger genomic data sets practical, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which can be downloaded from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts that can be used to reproduce the analyses carried out in this paper. These are available at https://sites.google.com/site/randomisedbhc/.
Abstract:
Anterior gradient 2 (Agr2) genes encode secretory proteins and play significant roles in anterior-posterior patterning and tumor metastasis. Agr2 transcripts have been shown to display quite diverse tissue distributions in different species, and little is known about the cellular localization of Agr2 proteins. In this study, we identified an Agr2 homologue from gibel carp (Carassius auratus gibelio) and characterized its expression patterns and cellular localization during embryogenesis and in adult tissues. The full-length cDNA of CagAgr2 is 803 nucleotides (nt), with an open reading frame of 510 nt encoding 169 amino acids. The Agr2 C-terminus matches the class I PDZ-interacting motif, suggesting that it might be a PDZ-binding protein. During embryogenesis, CagAgr2 was found to be transcribed in the mucus-secreting hatching gland from the tailbud stage, and later in the pharynx region, swim bladder and pronephric duct, as revealed by RT-PCR and whole mount in situ hybridization. In the adult fish, its transcription was predominantly confined to the kidney, and lower transcription levels were also found in the intestine, ovary and gills. To further localize the Agr2 protein, an anti-CagAgr2 polyclonal antibody was produced and used for immunofluorescence observation. In agreement with the mRNA expression data, the Agr2 protein was localized in the pronephric duct of 3 dph (days post-hatching) larvae. In adult fish, Agr2 protein expression is confined to the renal collecting system, with an asymmetric distribution along the apical-basolateral axis. The data provide suggestive evidence that fish Agr2 might be involved in the differentiation and secretory functions of kidney epithelium. (C) 2009 Elsevier Inc. All rights reserved.
Abstract:
Accurate recognition of cancer subtypes is clinically important, and DNA microarray gene expression technology has been applied to diagnosing and recognizing cancer types. This paper proposes a method for recognizing cancer subtypes based on geometrical learning. First, the cancer gene expression profile data were preprocessed and feature genes were selected using a conventional method. Then, the training algorithm of geometrical learning was used to construct a convex hull in the high-dimensional space from the feature-gene expression data of the training samples of each class, and the independent test set was classified with the recognition algorithm of geometrical learning. The method was applied to human acute leukemia gene expression data and reached an accuracy of 100%. The experiments demonstrate its efficiency and feasibility.
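One way to picture classification by convex hulls is a nearest-hull rule: a test sample is assigned to the class whose training-sample hull is closest, with distance zero meaning the sample lies inside that hull. The abstract does not detail its recognition algorithm, so the Python sketch below is an assumed stand-in that approximates point-to-hull distances with the Frank-Wolfe algorithm; class labels and points are toy values.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _sub(u, v):
    return [a - b for a, b in zip(u, v)]


def distance_to_hull(x, points, iterations=200):
    """Approximate Euclidean distance from x to the convex hull of `points` using
    Frank-Wolfe with exact line search over convex combinations of the points."""
    q = list(min(points, key=lambda p: _dot(_sub(p, x), _sub(p, x))))  # nearest vertex
    for _ in range(iterations):
        residual = _sub(q, x)                             # half the gradient of ||q - x||^2
        s = min(points, key=lambda p: _dot(residual, p))  # vertex minimizing the linearization
        d = _sub(s, q)
        dd = _dot(d, d)
        if dd == 0:
            break
        gamma = max(0.0, min(1.0, -_dot(residual, d) / dd))  # exact step, clipped to [0, 1]
        if gamma == 0:
            break                                          # no descent direction left: optimal
        q = [qi + gamma * di for qi, di in zip(q, d)]
    r = _sub(q, x)
    return _dot(r, r) ** 0.5


def classify(x, training):
    """Assign x to the class whose training hull is nearest (assumed decision rule)."""
    return min(training, key=lambda label: distance_to_hull(x, training[label]))
```

In a real expression data set each point would be a vector of feature-gene values rather than a 2-D toy coordinate; the same code handles any dimension.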
Abstract:
Identifying protein-protein interactions is crucial for understanding cellular functions. Genomic data provide both opportunities and challenges for identifying these interactions. We uncover the rules for predicting protein-protein interactions using a frequent pattern tree (FPT) approach modified to generate a minimum set of rules (mFPT), with rule attributes constructed from the interaction features of the yeast genomic data. The mFPT prediction accuracy is benchmarked against other commonly used methods, such as Bayesian networks and logistic regression, under various statistical measures. Our study indicates that mFPT outperforms the other methods in predicting protein-protein interactions for the database used. Based on the rules generated, we predict a new protein-protein interaction complex whose biological function is related to pre-mRNA splicing, as well as new protein-protein interactions within existing complexes.
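The mFPT method builds a frequent pattern tree; the Python sketch below does not implement the FP-tree. It is a brute-force stand-in, assumed for illustration only, that shows what a rule of the form "feature set implies interaction" is scored on (support and confidence) and keeps only rules whose antecedent contains no smaller accepted rule, in the spirit of a minimum rule set. Feature names and thresholds are hypothetical.

```python
from itertools import combinations


def mine_rules(records, min_support=0.2, min_confidence=0.8, max_len=2):
    """records: list of (set_of_features, interacts) pairs, one per protein pair.
    Returns (antecedent, support, confidence) for rules 'antecedent -> interacts'."""
    n = len(records)
    features = sorted(set().union(*(f for f, _ in records)))
    rules = []
    for size in range(1, max_len + 1):
        for itemset in combinations(features, size):
            s = set(itemset)
            if any(set(r[0]) <= s for r in rules):
                continue  # a smaller accepted rule already covers this antecedent
            covered = [y for f, y in records if s <= f]
            if not covered:
                continue
            support = sum(covered) / n                # positive pairs matching the rule
            confidence = sum(covered) / len(covered)  # fraction of matches that interact
            if support >= min_support and confidence >= min_confidence:
                rules.append((itemset, support, confidence))
    return rules
```

An FP-tree avoids this exhaustive enumeration by compressing shared feature prefixes, which matters once the number of features grows.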
Abstract:
The SIEGE (Smoking Induced Epithelial Gene Expression) database is a clinical resource for compiling and analyzing gene expression data from epithelial cells of the human intra-thoracic airway. This database supports a translational research study whose goal is to profile the changes in airway gene expression that are induced by cigarette smoke. RNA is isolated from airway epithelium obtained at bronchoscopy from current-, former- and never-smoker subjects, and hybridized to Affymetrix HG-U133A GeneChips, which measure the expression level of ~22 500 human transcripts. The microarray data generated, along with relevant patient information, are uploaded to SIEGE by study administrators using the database's web interface, found at http://pulm.bumc.bu.edu/siegeDB. PERL-coded scripts integrated with SIEGE perform various quality control functions, including the processing, filtering and formatting of stored data. The R statistical package is used to import database expression values and execute a number of statistical analyses, including t-tests, correlation coefficients and hierarchical clustering. Values from all statistical analyses can be queried through CGI-based tools and web forms found in the 'Search' section of the database website. Query results are embedded with graphical capabilities as well as with links to other databases containing valuable gene resources, including Entrez Gene, GO, Biocarta, GeneCards, dbSNP and the NCBI Map Viewer.
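SIEGE runs its t-tests in R, and the abstract does not say which variant is used. As an illustration of what a per-probe comparison computes, the Python sketch below assumes Welch's unequal-variance t-test (the default of R's `t.test`) between smokers and never-smokers; probe IDs and values are placeholders, and p-values are omitted because the standard library has no t distribution.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def rank_probes(expr_smokers, expr_never):
    """expr_*: dict mapping probe ID -> list of log2 expression values, one per subject.
    Returns probe IDs sorted by |t|, largest first."""
    stats = {p: welch_t(expr_smokers[p], expr_never[p])[0] for p in expr_smokers}
    return sorted(stats, key=lambda p: abs(stats[p]), reverse=True)
```

With ~22 500 probe sets tested at once, a multiple-testing correction would normally be applied before calling any probe significant.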
Abstract:
BACKGROUND: Mutations in the TP53 gene are extremely common and occur very early in the progression of serous ovarian cancers. Gene expression patterns that relate to mutational status may provide insight into the etiology and biology of the disease. METHODS: The TP53 coding region was sequenced in 89 frozen serous ovarian cancers, 40 early stage (I/II) and 49 advanced stage (III/IV). Affymetrix U133A expression data were used to define gene expression patterns by mutation status, type of mutation, and cancer stage. RESULTS: Missense or chain terminating (null) mutations in TP53 were found in 59/89 (66%) ovarian cancers. Early stage cancers had a significantly higher rate of null mutations than late stage disease (38% vs. 8%, p < 0.03). In advanced stage cases, mutations were more prevalent in short term survivors than in long term survivors (81% vs. 30%, p = 0.0004). Gene expression patterns had a robust ability to predict TP53 status within the training data. Using early versus late stage disease for out-of-sample predictions, the signature derived from early stage cancers predicted the mutation status of late stage cancers with 86% accuracy. CONCLUSIONS: This represents the first attempt to define a genomic signature of TP53 mutation in ovarian cancer. Patterns of gene expression characteristic of TP53 mutation could be discerned and included several genes that are known p53 targets or have been described in the context of expression signatures of TP53 mutation in breast cancer.
Abstract:
BACKGROUND: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. RESULTS: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), rhinovirus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, as well as with sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. CONCLUSIONS: Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.
Abstract:
BACKGROUND: MicroRNAs (miRNAs) are oligoribonucleotides with an important role in regulation of gene expression at the level of translation. Despite imperfect target complementarity, they can also significantly reduce mRNA levels. The validity of miRNA target gene predictions is difficult to assess at the protein level. We sought, therefore, to determine whether a general lowering of predicted target gene mRNA expression by endogenous miRNAs was detectable within microarray gene expression profiles. RESULTS: The target gene sets predicted for each miRNA were mapped onto known gene expression data from a range of tissues. Whether considering mean absolute target gene expression, rank sum tests or 'ranked ratios', many miRNAs with significantly reduced target gene expression corresponded to those known to be expressed in the cognate tissue. Expression levels of miRNAs with reduced target mRNA levels were higher than those of miRNAs with no detectable effect on mRNA expression. Analysis of microarray data gathered after artificial perturbation of expression of a specific miRNA confirmed the predicted increase or decrease in influence of the altered miRNA upon mRNA levels. Strongest associations were observed with targets predicted by TargetScan. CONCLUSION: We have demonstrated that the effect of a miRNA on its target mRNAs' levels can be measured within a single gene expression profile. This emphasizes the extent of this mode of regulation in vivo and confirms that many of the predicted miRNA-mRNA interactions are correct. The success of this approach has revealed the vast potential for extracting information about miRNA function from gene expression profiles.
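One of the tests named above, the rank sum test, asks whether a miRNA's predicted targets sit lower in a single expression profile than the remaining genes. The Python sketch below assumes a one-sided Wilcoxon rank-sum (Mann-Whitney) test with a normal approximation and no tie correction; gene names and values are placeholders, and target sets would come from a predictor such as TargetScan.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def target_rank_sum_test(expression, targets):
    """expression: dict gene -> expression in one profile; targets: predicted target genes.
    Returns (z, one-sided p); a small p means targets are shifted toward low expression."""
    genes = list(expression)
    ranks = average_ranks([expression[g] for g in genes])
    in_set = [g in targets for g in genes]
    n1 = sum(in_set)
    n2 = len(genes) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need both target and non-target genes")
    r1 = sum(r for r, t in zip(ranks, in_set) if t)
    u1 = r1 - n1 * (n1 + 1) / 2                       # Mann-Whitney U for the target set
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mu) / sigma
    return z, NormalDist().cdf(z)
```

Repeating this per miRNA and comparing the resulting p-values with known tissue miRNA expression mirrors, in simplified form, the comparison described in the results.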
Abstract:
Background: MicroRNAs (miRNAs) are small RNA molecules (~22 nucleotides) that have been shown to play an important role both in development and in the maintenance of adult tissue. Conditional inactivation of miRNAs in the eye causes loss of visual function and progressive retinal degeneration. In addition to inhibiting translation, miRNAs can mediate the degradation of targeted mRNAs. We have previously shown that candidate miRNAs affecting transcript levels in a tissue can be deduced from mRNA microarray expression profiles. The purpose of this study was to predict miRNAs that affect mRNA levels in developing and adult retinal tissue and to confirm their expression.
Results: Microarray expression data from ciliary epithelial retinal stem cells (CE-RSCs), developing and adult mouse retina were generated or downloaded from public repositories. Analysis of gene expression profiles detected the effects of multiple miRNAs in CE-RSCs and retina. The expression of 20 selected miRNAs was confirmed by RT-PCR and the cellular distribution of representative candidates analyzed by in situ hybridization. The expression levels of miRNAs correlated with the significance of their predicted effects upon mRNA expression. Highly expressed miRNAs included miR-124, miR-125a, miR-125b, miR-204 and miR-9. Over-expression of three miRNAs with significant predicted effects upon global mRNA levels resulted in a decrease in mRNA expression of five out of six individual predicted target genes assayed.
Conclusions: This study has detected the effect of miRNAs upon mRNA expression in immature and adult retinal tissue and cells. The validity of these observations is supported by the experimental confirmation of candidate miRNA expression and the regulation of predicted target genes following miRNA over-expression. Identified miRNAs are likely to be important in retinal development and function. Misregulation of these miRNAs might contribute to retinal degeneration and disease. Conversely, manipulation of their expression could potentially be used as a therapeutic tool in the future.
Abstract:
The Microarray Innovations in Leukemia study assessed the clinical utility of gene expression profiling as a single test to subtype leukemias into conventional categories of myeloid and lymphoid malignancies. METHODS: The investigation was performed in 11 laboratories across three continents and included 3,334 patients. An exploratory retrospective stage I study was designed for biomarker discovery and generated whole-genome expression profiles from 2,143 patients with leukemias and myelodysplastic syndromes. The gene expression profiling-based diagnostic accuracy was further validated in a prospective second study stage of an independent cohort of 1,191 patients. RESULTS: On the basis of 2,096 samples, the stage I study achieved 92.2% classification accuracy for all 18 distinct classes investigated (median specificity of 99.7%). In a second cohort of 1,152 prospectively collected patients, a classification scheme reached 95.6% median sensitivity and 99.8% median specificity for 14 standard subtypes of acute leukemia (eight acute lymphoblastic leukemia and six acute myeloid leukemia classes, n = 693). In 29 (57%) of 51 discrepant cases, the microarray results outperformed routine diagnostic methods. CONCLUSION: Gene expression profiling is a robust technology for the diagnosis of hematologic malignancies with high accuracy. It may complement current diagnostic algorithms and could offer a reliable platform for patients who lack access to today's state-of-the-art diagnostic work-up. Our comprehensive gene expression data set will be submitted to the public domain to foster research focusing on the molecular understanding of leukemias.
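The reported figures combine overall accuracy with one-vs-rest sensitivity and specificity per class, summarized by a median across classes. The abstract does not spell out the computation, so the Python sketch below assumes the usual one-vs-rest definitions; class labels are placeholders.

```python
from statistics import median


def per_class_metrics(pairs):
    """pairs: list of (true_label, predicted_label). Returns
    {label: (sensitivity, specificity)} computed one-vs-rest for each true label."""
    labels = sorted({t for t, _ in pairs})
    metrics = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = len(pairs) - tp - fn - fp
        sensitivity = tp / (tp + fn)  # tp + fn > 0 because c occurs as a true label
        specificity = tn / (tn + fp) if tn + fp else float("nan")  # nan if only one class
        metrics[c] = (sensitivity, specificity)
    return metrics


def median_metrics(metrics):
    """Median sensitivity and median specificity across classes."""
    return (median(s for s, _ in metrics.values()),
            median(sp for _, sp in metrics.values()))


def accuracy(pairs):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in pairs) / len(pairs)
```

Medians keep a few rare, hard-to-classify subtypes from dominating the summary, whereas overall accuracy is weighted toward the most common classes.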