919 results for Microarray Experiments
Abstract:
An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local FDR (false discovery rate) is provided for each gene. An attractive feature of the mixture model approach is that it provides a framework for the estimation of the prior probability that a gene is not differentially expressed, and this probability can subsequently be used in forming a decision rule. The rule can also be formed to take the false negative rate into account. We apply this approach to a well-known publicly available data set on breast cancer, and discuss our findings with reference to other approaches.
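The mixture-model quantities this abstract refers to can be written compactly. The notation below is an assumption introduced for illustration (it does not appear in the abstract): $z_j$ is the test statistic for gene $j$, $\pi_0$ the prior probability that a gene is not differentially expressed, $f_0$ and $f_1$ the null and non-null densities, and $c_0$ a chosen threshold.

```latex
f(z_j) = \pi_0 f_0(z_j) + (1 - \pi_0) f_1(z_j), \qquad
\tau_0(z_j) = \frac{\pi_0 f_0(z_j)}{f(z_j)}, \qquad
\text{declare gene } j \text{ differentially expressed if } \tau_0(z_j) \le c_0 .
```

Here $\tau_0(z_j)$ is the posterior probability that gene $j$ is null, which plays the role of the local FDR; averaging $\tau_0$ over the selected genes gives an estimate of the implied global FDR.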
Abstract:
An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local false discovery rate is provided for each gene, and it can be implemented so that the implied global false discovery rate is bounded as with the Benjamini-Hochberg methodology based on tail areas. The latter procedure is too conservative, unless it is modified according to the prior probability that a gene is not differentially expressed. An attractive feature of the mixture model approach is that it provides a framework for the estimation of this probability and its subsequent use in forming a decision rule. The rule can also be formed to take the false negative rate into account.
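As a rough illustration of the tail-area procedure mentioned above, here is a minimal Python sketch of the Benjamini-Hochberg step-up rule. The function name, the `pi0` argument, and the way `pi0` rescales the threshold are assumptions made for this sketch (a commonly used adaptive variant), not necessarily the modification used in the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05, pi0=1.0):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    pi0 is an assumed prior probability that a gene is null. pi0=1.0 gives the
    standard procedure; pi0<1 relaxes the threshold (adaptive variant).
    """
    m = len(p_values)
    # Indices of the P-values in increasing order of P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose P-value falls under its threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / (pi0 * m):
            k = rank
    # Reject every hypothesis up to and including rank k.
    return sorted(order[:k])


# Hypothetical P-values for ten genes, for illustration only.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]
rejected_standard = benjamini_hochberg(p)
rejected_adaptive = benjamini_hochberg(p, pi0=0.5)
```

With `pi0=1.0` the rule is at its most conservative; supplying a smaller estimate of `pi0` admits more genes at the same nominal level, which is the effect the abstract describes.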
Abstract:
Motivation: An important problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. We provide a straightforward and easily implemented method for estimating the posterior probability that an individual gene is null. The problem can be expressed in a two-component mixture framework, using an empirical Bayes approach. Current methods of implementing this approach either have some limitations due to the minimal assumptions made or with more specific assumptions are computationally intensive. Results: By converting to a z-score the value of the test statistic used to test the significance of each gene, we propose a simple two-component normal mixture that models adequately the distribution of this score. The usefulness of our approach is demonstrated on three real datasets.
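The two-component normal mixture described above can be sketched with a short EM loop. This is a sketch under stated assumptions rather than the authors' implementation: the z-score is taken as $z = \Phi^{-1}(1 - p)$ from each gene's P-value, the null component is fixed at N(0, 1), and all function names and starting values are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def z_from_p(p):
    """Map a P-value in (0, 1) to a z-score; small P-values give large z."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def fit_null_mixture(z, n_iter=100):
    """EM fit of f(z) = pi0*N(0,1) + (1-pi0)*N(mu1, sigma1^2), null fixed."""
    n = len(z)
    # Crude start (assumption): most genes null, non-null component
    # centred on the largest 10% of the scores.
    top = sorted(z)[int(0.9 * n):]
    pi0, mu1, sigma1 = 0.9, sum(top) / len(top), 1.0
    for _ in range(n_iter):
        alt = NormalDist(mu1, sigma1)
        # E-step: posterior probability that each gene is null.
        tau0 = []
        for zj in z:
            a = pi0 * STD_NORMAL.pdf(zj)
            b = (1.0 - pi0) * alt.pdf(zj)
            tau0.append(a / (a + b))
        # M-step: update the mixing proportion and the non-null component.
        w = [1.0 - t for t in tau0]
        w_sum = max(sum(w), 1e-12)  # guard against an empty non-null component
        pi0 = sum(tau0) / n
        mu1 = sum(wj * zj for wj, zj in zip(w, z)) / w_sum
        var1 = sum(wj * (zj - mu1) ** 2 for wj, zj in zip(w, z)) / w_sum
        sigma1 = max(math.sqrt(var1), 1e-3)  # keep the variance away from zero
    return pi0, mu1, sigma1


def posterior_null(zj, pi0, mu1, sigma1):
    """Posterior probability that a gene with score zj is null."""
    a = pi0 * STD_NORMAL.pdf(zj)
    b = (1.0 - pi0) * NormalDist(mu1, sigma1).pdf(zj)
    return a / (a + b)


# Simulated scores (not real data): 900 null genes and 100 non-null genes.
random.seed(0)
z_sim = [random.gauss(0, 1) for _ in range(900)]
z_sim += [random.gauss(3, 1) for _ in range(100)]
pi0_hat, mu1_hat, sigma1_hat = fit_null_mixture(z_sim)
```

Because both E- and M-steps are available in closed form here, the fit needs no numerical optimiser; `posterior_null` can then be thresholded to select genes.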
Abstract:
Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm, for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data.
In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is considered too.
Abstract:
This thesis is a study of low-dimensional methods for data visualisation under uncertainty in the input data. It focuses on two main feed-forward neural network algorithms, NeuroScale and Generative Topographic Mapping (GTM), and tries to make both algorithms able to accommodate that uncertainty. The two models are shown not to work well under high levels of noise within the data and need to be modified. The modifications of both models are verified using synthetic data to show their ability to accommodate the noise. The thesis also addresses the controversy surrounding the non-uniqueness of predictive gene lists (PGL) for predicting the prognosis of breast cancer patients, as available in DNA microarray experiments. Many of these studies have ignored the uncertainty issue, resulting in random correlations of sparse model selection in high dimensional spaces. The visualisation techniques are used to confirm that the patients involved in such medical studies are intrinsically unclassifiable on the basis of provided PGL evidence. This additional category of 'unclassifiable' should be accommodated within medical decision support systems if serious errors and unnecessary adjuvant therapy are to be avoided.
Abstract:
Background: The controversy surrounding the non-uniqueness of predictive gene lists (PGL) of small selected subsets of genes from very large potential candidates as available in DNA microarray experiments is now widely acknowledged [1]. Many of these studies have focused on constructing discriminative semi-parametric models and as such are also subject to the issue of random correlations of sparse model selection in high dimensional spaces. In this work we outline a different approach based around an unsupervised patient-specific nonlinear topographic projection in predictive gene lists. Methods: We construct nonlinear topographic projection maps based on inter-patient gene-list relative dissimilarities. The Neuroscale, Stochastic Neighbor Embedding (SNE) and Locally Linear Embedding (LLE) techniques have been used to construct two-dimensional projective visualisation plots of 70-dimensional PGLs per patient. Classifiers are also constructed to identify the prognosis indicator of each patient using the resulting projections from those visualisation techniques, and to investigate whether a posteriori the two prognosis groups are separable on the evidence of the gene lists. A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list using the above methods. Generalisation ability is investigated by using the mapping capability of Neuroscale to visualise the follow-up study, but based on the projections derived from the original dataset. Results: The results indicate that small subsets of patient-specific PGLs have insufficient prognostic dissimilarity to permit a distinction between the two prognosis groups of patients. Uncertainty and diversity across multiple gene expressions prevent unambiguous or even confident patient grouping. Comparative projections across different PGLs provide similar results.
Conclusion: The random correlation effect to an arbitrary outcome induced by small subset selection from very high dimensional interrelated gene expression profiles leads to an outcome with associated uncertainty. This continuum and uncertainty preclude any attempts at constructing discriminative classifiers. However, a patient's gene expression profile could possibly be used in treatment planning, based on knowledge of other patients' responses. We conclude that many of the patients involved in such medical studies are intrinsically unclassifiable on the basis of provided PGL evidence. This additional category of 'unclassifiable' should be accommodated within medical decision support systems if serious errors and unnecessary adjuvant therapy are to be avoided.
Abstract:
Background: Expressed Sequence Tags (ESTs) are in general used to gain a first insight into gene activities from a species of interest. Subsequently, and typically based on a combination of EST and genome sequences, microarray-based expression analyses are performed for a variety of conditions. In some cases, a multitude of EST and microarray experiments are conducted for one species, covering different tissues, cell states, and cell types. Under these circumstances, the challenge arises to combine results derived from the different expression profiling strategies, with the goal to uncover novel information on the basis of the integrated datasets. Findings: Using our new analysis tool, MediPlEx (MEDIcago truncatula multiPLe EXpression analysis), expression data from EST experiments, oligonucleotide microarrays and Affymetrix GeneChips® can be combined and analyzed, leading to a novel approach to integrated transcriptome analysis. We have validated our tool via the identification of a set of well-characterized AM-specific and AM-induced marker genes, identified by MediPlEx on the basis of in silico and experimental gene expression profiles from roots colonized with AM fungi. Conclusions: MediPlEx offers an integrated analysis pipeline for different sets of expression data generated for the model legume Medicago truncatula. As expected, in silico and experimental gene expression data that cover the same biological condition correlate well. The collection of differentially expressed genes identified via MediPlEx provides a starting point for functional studies in plant mutants.
Abstract:
A DNA microarray was developed to monitor the expression of many genes from Xylella fastidiosa, allowing the side-by-side comparison of two situations in a single experiment. The experiments were performed using X. fastidiosa cells grown in two culture media: BCYE and XDM2. The primers were synthesized and spotted onto glass slides, and the array was hybridized against fluorescently labeled cDNAs. The emitted signals were quantified and normalized, and the data were statistically analyzed to identify the differentially expressed genes. According to the data, 104 genes were differentially expressed in XDM2 and 30 genes in BCYE media. The present study showed that the DNA microarray technique efficiently differentiates the expressed genes under different conditions.
Abstract:
MOTIVATION: Microarray results accumulated in public repositories are widely reused in meta-analytical studies and secondary databases. The quality of the data obtained with this technology varies from experiment to experiment, and an efficient method for quality assessment is necessary to ensure their reliability. RESULTS: The lack of a good benchmark has hampered evaluation of existing methods for quality control. In this study, we propose a new independent quality metric that is based on evolutionary conservation of expression profiles. We show, using 11 large organ-specific datasets, that IQRray, a new quality metric developed by us, exhibits the highest correlation with this reference metric among 14 metrics tested. IQRray outperforms other methods in identification of poor quality arrays in datasets composed of arrays from many independent experiments. In contrast, the performance of methods designed for detecting outliers in a single experiment, like Normalized Unscaled Standard Error and Relative Log Expression, was low because of the inability of these methods to detect datasets containing only low-quality arrays and because the scores cannot be directly compared between experiments. AVAILABILITY AND IMPLEMENTATION: The R implementation of IQRray is available at: ftp://lausanne.isb-sib.ch/pub/databases/Bgee/general/IQRray.R. CONTACT: Marta.Rosikiewicz@unil.ch SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Abstract:
Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression profiles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular profiling experiments move beyond simple case-control studies, robust and flexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets. To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state of the art sample-wise enrichment methods. Further, we provide examples of its utility in differential pathway activity and survival analysis. Lastly, we show how GSVA works analogously with data from both microarray and RNA-seq experiments. GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA contributes to the current need of GSE methods for RNA-seq data. GSVA is an open source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org.
Abstract:
Currently, numerous high-throughput technologies are available for the study of human carcinomas. In literature, many variations of these techniques have been described. The common denominator for these methodologies is the high amount of data obtained in a single experiment, in a short time period, and at a fairly low cost. However, these methods have also been described with several problems and limitations. The purpose of this study was to test the applicability of two selected high-throughput methods, cDNA and tissue microarrays (TMA), in cancer research. Two common human malignancies, breast and colorectal cancer, were used as examples. This thesis aims to present some practical considerations that need to be addressed when applying these techniques. cDNA microarrays were applied to screen aberrant gene expression in breast and colon cancers. Immunohistochemistry was used to validate the results and to evaluate the association of selected novel tumour markers with the outcome of the patients. The type of histological material used in immunohistochemistry was evaluated especially considering the applicability of whole tissue sections and different types of TMAs. Special attention was put on the methodological details in the cDNA microarray and TMA experiments. In conclusion, many potential tumour markers were identified in the cDNA microarray analyses. Immunohistochemistry could be applied to validate the observed gene expression changes of selected markers and to associate their expression change with patient outcome. In the current experiments, both TMAs and whole tissue sections could be used for this purpose. This study showed for the first time that securin and p120 catenin protein expression predict breast cancer outcome and the immunopositivity of carbonic anhydrase IX associates with the outcome of rectal cancer. 
The predictive value of these proteins was statistically evident also in multivariate analyses with up to a 13.1-fold risk for cancer specific death in a specific subgroup of patients.
Abstract:
High-throughput screening of cellular effects of RNA interference (RNAi) libraries is now being increasingly applied to explore the role of genes in specific cell biological processes and disease states. However, the technology is still limited to specialty laboratories, due to the requirements for robotic infrastructure, access to expensive reagent libraries, expertise in high-throughput screening assay development, standardization, data analysis and applications. In the future, alternative screening platforms will be required to expand functional large-scale experiments to include more RNAi constructs, allow combinatorial loss-of-function analyses (e.g. gene-gene or gene-drug interaction), gain-of-function screens, multi-parametric phenotypic readouts or comparative analysis of many different cell types. Such comprehensive perturbation of gene networks in cells will require a major increase in the flexibility of the screening platforms, throughput and reduction of costs. As an alternative to the conventional multi-well based high-throughput screening platforms, here the development of a novel cell spot microarray method for production of high density siRNA reverse transfection arrays is described. The cell spot microarray platform is distinguished from the majority of other transfection cell microarray techniques by the spatially confined array layout that allows highly parallel screening of large-scale RNAi reagent libraries with assays otherwise difficult or not applicable to high-throughput screening. This study depicts the development of the cell spot microarray method along with biological application examples of high-content immunofluorescence and phenotype based cancer cell biological analyses focusing on the regulation of prostate cancer cell growth, maintenance of genomic integrity in breast cancer cells, and functional analysis of integrin protein-protein interactions in situ.
Abstract:
Background: Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently a bottleneck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described. We have compared a number of methods available for characterising bacterial genomic diversity and for calculating the cut-off between gene presence and absence or divergence, and have shown that a simple dynamic approach using a kernel density estimator performed better than both established methods and a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.
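One way a kernel-density cut-off of this kind might work is sketched below in Python. The abstract does not give the details of the authors' method, so everything here is an assumption made for illustration: the input is taken to be per-gene log-ratios of test to reference signal, the bandwidth uses a normal-reference rule of thumb, and the cut-off is placed at the density minimum between the two highest modes. All names are hypothetical.

```python
import math
import random
from statistics import stdev


def gaussian_kde(data, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

    return density


def kde_cutoff(log_ratios, grid_size=200):
    """Cut-off at the density valley between the two highest modes, or None."""
    n = len(log_ratios)
    # Normal-reference rule of thumb for the bandwidth (an assumption).
    bandwidth = 1.06 * stdev(log_ratios) * n ** (-0.2)
    density = gaussian_kde(log_ratios, bandwidth)
    lo, hi = min(log_ratios), max(log_ratios)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    dens = [density(x) for x in grid]
    # Interior grid points that are local maxima of the estimated density.
    peaks = [i for i in range(1, grid_size - 1)
             if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]]
    if len(peaks) < 2:
        return None  # no clear bimodality, so no data-driven cut-off
    # Keep the two highest peaks, ordered by position along the grid.
    left, right = sorted(sorted(peaks, key=lambda i: dens[i], reverse=True)[:2])
    valley = min(range(left, right + 1), key=lambda i: dens[i])
    return grid[valley]


# Simulated log-ratios (not real data): 400 "present" genes near 0 and
# 100 "absent or divergent" genes near -3.
random.seed(1)
ratios = [random.gauss(0.0, 0.3) for _ in range(400)]
ratios += [random.gauss(-3.0, 0.3) for _ in range(100)]
cutoff = kde_cutoff(ratios)
```

The cut-off is "dynamic" in the sense that it is recomputed from each array's own distribution rather than fixed in advance; genes with log-ratios below `cutoff` would be called absent or divergent under these assumptions.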
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)