749 results for Oligo-microarrays
Abstract:
An optimal multiple testing procedure is identified for linear hypotheses under the general linear model, maximizing the expected number of false null hypotheses rejected at any significance level. The optimal procedure depends on the unknown data-generating distribution, but can be consistently estimated. Drawing information together across many hypotheses, the estimated optimal procedure provides an empirical alternative hypothesis by adapting to underlying patterns of departure from the null. Proposed multiple testing procedures based on the empirical alternative are evaluated through simulations and an application to gene expression microarray data. Compared with a standard multiple testing procedure, it is not unusual for an empirical alternative hypothesis to increase the number of true positives identified at a given significance level by 50% or more.
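As a rough sketch of the baseline setting this abstract compares against (simulated data, per-gene F-tests, and a Benjamini-Hochberg correction are assumptions here, not the paper's optimal procedure), one might test a linear hypothesis for each gene under the general linear model and then apply a conventional multiple testing procedure:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated setting: m hypotheses (e.g. genes), n samples, two-group design.
m, n = 1000, 20
X = np.column_stack([np.ones(n), np.repeat([0, 1], n // 2)])  # intercept + group
beta = np.zeros((m, 2))
beta[:100, 1] = 1.0                        # 100 false nulls with a real group effect
Y = beta @ X.T + rng.normal(size=(m, n))   # m x n expression matrix

def f_test(y, X):
    """F-test of the group effect under the general linear model for one gene."""
    p = X.shape[1]
    _, rss_full = np.linalg.lstsq(X, y, rcond=None)[:2]
    rss_null = np.sum((y - y.mean()) ** 2)      # reduced model: intercept only
    df1, df2 = p - 1, len(y) - p
    f = ((rss_null - rss_full[0]) / df1) / (rss_full[0] / df2)
    return stats.f.sf(f, df1, df2)

pvals = np.array([f_test(Y[i], X) for i in range(m)])

# Standard comparison procedure: Benjamini-Hochberg at FDR 0.05.
order = np.argsort(pvals)
bh_thresh = 0.05 * np.arange(1, m + 1) / m
passed = pvals[order] <= bh_thresh
k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
print(f"{k} hypotheses rejected at FDR 0.05")
```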
Abstract:
Under a two-level hierarchical model, suppose that the distribution of the random parameter is known or can be estimated well. Data are generated via a fixed but unobservable realization of this parameter. In this paper, we derive the smallest confidence region of the random parameter under a joint Bayesian/frequentist paradigm. On average this optimal region can be much smaller than the corresponding Bayesian highest posterior density region. The new estimation procedure is appealing when one deals with data generated under a highly parallel structure, for example, data from a trial involving a large number of clinical centers or genome-wide gene-expression data for estimating individual gene- or center-specific parameters simultaneously. The new proposal is illustrated with a typical microarray data set and its performance is examined via a small simulation study.
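Under an assumed two-level normal model (a toy special case, not the paper's general formulation), the setting can be simulated and the coverage of a standard posterior-based interval for each realized random parameter checked empirically; the paper's optimal region would replace the simple central interval used below.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed model: theta_i ~ N(mu, tau^2), X_i | theta_i ~ N(theta_i, sigma^2).
mu, tau, sigma = 0.0, 1.0, 2.0
n_units = 10_000                         # e.g. genes or clinical centers
theta = rng.normal(mu, tau, n_units)     # fixed but unobservable realizations
x = rng.normal(theta, sigma)             # one observation per unit

# Posterior for each realized theta_i, treating the prior as known.
shrink = tau**2 / (tau**2 + sigma**2)
post_mean = mu + shrink * (x - mu)
post_sd = np.sqrt(shrink) * sigma        # sqrt(tau^2 sigma^2 / (tau^2 + sigma^2))

# Central 95% posterior interval; average length and empirical coverage over units.
z = 1.96
lower, upper = post_mean - z * post_sd, post_mean + z * post_sd
coverage = np.mean((theta >= lower) & (theta <= upper))
print(f"average length {np.mean(upper - lower):.2f}, coverage {coverage:.3f}")
```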
Abstract:
Use of microarray technology often leads to high-dimensional and low-sample-size data settings. Over the past several years, a variety of novel approaches have been proposed for variable selection in this context. However, only a small number of these have been adapted for time-to-event data where censoring is present. Among standard variable selection methods shown both to have good predictive accuracy and to be computationally efficient is the elastic net penalization approach. In this paper, adaptation of the elastic net approach is presented for variable selection both under the Cox proportional hazards model and under an accelerated failure time (AFT) model. Assessment of the two methods is conducted through simulation studies and through analysis of microarray data obtained from a set of patients with diffuse large B-cell lymphoma where time to survival is of interest. The approaches are shown to match or exceed the predictive performance of a Cox-based and an AFT-based variable selection method. The methods are moreover shown to be much more computationally efficient than their respective Cox- and AFT-based counterparts.
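A minimal sketch of an elastic-net-penalized Cox fit on simulated high-dimensional survival data; it assumes the scikit-survival package is available and does not reproduce the paper's own implementation or its AFT counterpart.

```python
import numpy as np
from sksurv.linear_model import CoxnetSurvivalAnalysis
from sksurv.util import Surv

rng = np.random.default_rng(2)

# Simulated high-dimensional, low-sample-size data: n patients, p >> n genes.
n, p = 80, 1000
X = rng.normal(size=(n, p))
risk = X[:, :5] @ np.array([1.0, -1.0, 0.5, -0.5, 0.8])   # only 5 informative genes
time = rng.exponential(np.exp(-risk))
censor_time = rng.exponential(2.0, n)
event = time <= censor_time                                # right censoring
obs_time = np.minimum(time, censor_time)
y = Surv.from_arrays(event=event, time=obs_time)

# Elastic net penalized Cox model: l1_ratio mixes lasso (1.0) and ridge (0.0) penalties.
model = CoxnetSurvivalAnalysis(l1_ratio=0.5, alpha_min_ratio=0.01)
model.fit(X, y)

# Count selected genes at the smallest penalty on the fitted regularization path.
idx = np.argmin(model.alphas_)
print("nonzero coefficients at smallest penalty:", np.sum(model.coef_[:, idx] != 0))
```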
Abstract:
Genome-wide association studies (GWAS) are used to discover genes underlying complex, heritable disorders for which less powerful study designs have failed in the past. The number of GWAS has skyrocketed recently with findings reported in top journals and the mainstream media. Microarrays are the genotype calling technology of choice in GWAS as they permit exploration of more than a million single nucleotide polymorphisms (SNPs) simultaneously. The starting point for the statistical analyses used in GWAS to determine association between loci and disease is the set of genotype calls (AA, AB, or BB). However, the raw data, microarray probe intensities, are heavily processed before arriving at these calls. Various sophisticated statistical procedures have been proposed for transforming raw data into genotype calls. We find that variability in microarray output quality across different SNPs, different arrays, and different sample batches has substantial influence on the accuracy of genotype calls made by existing algorithms. By failing to account for these sources of variability, GWAS run the risk of adversely affecting the quality of reported findings. In this paper we present solutions based on a multi-level mixed model. Software implementation of the method described in this paper is available as free and open source code in the crlmm R/Bioconductor package.
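The multi-level mixed model in crlmm is not reproduced here; as a hedged illustration of the basic genotype-calling step it builds on, the sketch below clusters a per-SNP contrast of the two allele intensities into three genotype groups with a Gaussian mixture (simulated data, scikit-learn assumed).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Simulated probe intensities for one SNP across samples: A and B allele channels.
n = 300
true_genotype = rng.choice([0, 1, 2], size=n, p=[0.25, 0.5, 0.25])  # copies of B allele
log_a = 10 - true_genotype + rng.normal(0, 0.3, n)
log_b = 8 + true_genotype + rng.normal(0, 0.3, n)

# The contrast between channels separates the AA, AB, and BB clusters.
contrast = (log_a - log_b).reshape(-1, 1)

# Three-component Gaussian mixture on the contrast as a simple genotype caller.
gmm = GaussianMixture(n_components=3, n_init=5, random_state=0).fit(contrast)
labels = gmm.predict(contrast)

# Relabel components so 0/1/2 corresponds to increasing B-allele dosage.
order = np.argsort(gmm.means_.ravel())[::-1]        # largest contrast (A >> B) first
calls = np.array([np.where(order == lab)[0][0] for lab in labels])
print("agreement with simulated genotypes:", np.mean(calls == true_genotype))
```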
Abstract:
Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of basepairs across the genome. Genome-wide association studies (GWAS) may simultaneously screen for copy number-phenotype and SNP-phenotype associations as part of the analytic strategy. However, genome-wide array analyses are particularly susceptible to batch effects as the logistics of preparing DNA and processing thousands of arrays often involves multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post-hoc quality control procedures that exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch effects and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of diallelic genotype calls from experimental data to estimate batch- and locus-specific parameters of background and signal without the requirement of training data. We illustrate these ideas using a study of bipolar disease and a study of chromosome 21 trisomy. The former has batch effects that dominate much of the observed variation in quantile-normalized intensities, while the latter illustrates the robustness of our approach to datasets where as many as 25% of the samples have altered copy number. Locus-specific estimates of copy number can be plotted on the copy-number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position. The software is open source and implemented in the R package CRLMM available at Bioconductor (http://www.bioconductor.org).
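The full CRLMM copy-number model is not reproduced here; the sketch below, on simulated data, illustrates the kind of shrinkage described above: noisy batch-specific location estimates at a locus are pulled toward the across-batch mean in proportion to their precision.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated normalized intensities at one locus for samples processed in several batches.
batch_means = np.array([2.0, 2.3, 1.8, 2.1])       # true batch-specific signal levels
batch_sizes = np.array([90, 15, 60, 8])            # small batches give noisy estimates
y = [rng.normal(mu, 0.4, n) for mu, n in zip(batch_means, batch_sizes)]

# Raw batch-specific estimates and their sampling variances.
raw = np.array([b.mean() for b in y])
var_within = np.array([b.var(ddof=1) / len(b) for b in y])

# Empirical-Bayes-style shrinkage toward the across-batch mean:
# weight each batch by how precise its own estimate is relative to between-batch spread.
grand = raw.mean()
var_between = max(raw.var(ddof=1) - var_within.mean(), 1e-6)
weight = var_between / (var_between + var_within)
shrunk = grand + weight * (raw - grand)

for b, (r, s) in enumerate(zip(raw, shrunk)):
    print(f"batch {b}: raw {r:.3f} -> shrunk {s:.3f} (n={batch_sizes[b]})")
```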
Abstract:
The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade's worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data demonstrate unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find that GC-content has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here we describe statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization (CQN) algorithm combines robust generalized regression to remove systematic bias introduced by deterministic features such as GC-content, and quantile normalization to correct for global distortions.
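A toy sketch of the two ingredients named above, a per-sample smooth correction for GC-content followed by quantile normalization, assuming numpy and statsmodels; the published CQN algorithm uses robust generalized regression and count offsets, which this simplified version omits.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(5)

# Simulated log expression for genes x samples, with a sample-specific GC effect.
n_genes, n_samples = 2000, 6
gc = rng.uniform(0.3, 0.7, n_genes)
base = rng.normal(5, 1, n_genes)
gc_slope = rng.normal(0, 3, n_samples)                 # GC effect differs by sample
Y = (base[:, None]
     + gc_slope[None, :] * (gc[:, None] - 0.5)
     + rng.normal(0, 0.3, (n_genes, n_samples)))

# Step 1: per-sample LOWESS fit of expression on GC-content; subtract the fitted trend.
corrected = np.empty_like(Y)
for j in range(n_samples):
    fit = lowess(Y[:, j], gc, frac=0.3, return_sorted=False)  # fitted value per gene
    corrected[:, j] = Y[:, j] - fit + fit.mean()               # keep the overall level

# Step 2: quantile normalization; map every sample onto the same reference distribution.
ranks = corrected.argsort(axis=0).argsort(axis=0)
reference = np.sort(corrected, axis=0).mean(axis=1)
normalized = reference[ranks]

print("mean between-sample sd before:", Y.std(axis=1).mean().round(3),
      "after:", normalized.std(axis=1).mean().round(3))
```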
Abstract:
The association of simian virus 40 (SV40) with malignant pleural mesothelioma is currently under debate. In some malignancies of viral aetiology, viral DNA can be detected in the patients' serum or plasma. To characterize the prevalence of SV40 in Swiss mesothelioma patients, we optimized a real-time PCR for quantitative detection of SV40 DNA in plasma, and used a monoclonal antibody for immunohistochemical detection of SV40 in mesothelioma tissue microarrays. Real-time PCR was linear over five orders of magnitude, and sensitive to a single gene copy. Repeat PCR determinations showed excellent reproducibility. However, SV40 status varied for independent DNA isolates of single samples. We noted that SV40 detection rates by PCR were drastically reduced by the implementation of strict room compartmentalization and decontamination procedures. Therefore, we systematically addressed common sources of contamination and found no cross-reactivity with DNA of other polyomaviruses. Contamination during PCR was rare and plasmid contamination was infrequent. SV40 DNA was reproducibly detected in only 4 of 78 (5.1%) plasma samples. SV40 DNA levels were low and not consistently observed in paired plasma and tumour samples from the same patient. Immunohistochemical analysis revealed a weak but reproducible SV40 staining in 16 of 341 (4.7%) mesotheliomas. Our data support the occurrence of non-reproducible SV40 PCR amplifications and underscore the importance of proper sample handling and analysis. SV40 DNA and protein were found at low prevalence (5%) in plasma and tumour tissue, respectively, suggesting that SV40 does not play a major role in the development of mesothelioma.
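As a hedged illustration of how quantitative real-time PCR results such as these are typically derived (the standard-curve numbers below are invented, not data from the study), copy numbers are read off a regression of threshold cycle (Ct) on log10 copy number for a dilution series:

```python
import numpy as np

# Assumed standard curve: serial dilutions of a plasmid with known SV40 copy numbers.
copies_std = np.array([1e1, 1e2, 1e3, 1e4, 1e5])     # linear over five orders of magnitude
ct_std = np.array([33.1, 29.8, 26.4, 23.1, 19.7])     # hypothetical measured Ct values

# Fit Ct = slope * log10(copies) + intercept; slope near -3.32 means ~100% efficiency.
slope, intercept = np.polyfit(np.log10(copies_std), ct_std, 1)
efficiency = 10 ** (-1 / slope) - 1
print(f"slope {slope:.2f}, amplification efficiency {efficiency:.1%}")

# Invert the curve to estimate copy number for an unknown plasma sample.
ct_sample = 31.5
copies_sample = 10 ** ((ct_sample - intercept) / slope)
print(f"estimated copies in reaction: {copies_sample:.1f}")
```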
Abstract:
OBJECTIVE: To identify markers associated with the chondrogenic capacity of expanded human articular chondrocytes and to use these markers for sorting of more highly chondrogenic subpopulations. METHODS: The chondrogenic capacity of chondrocyte populations derived from different donors (n = 21) or different clonal strains from the same cartilage biopsy specimen (n = 21) was defined based on the glycosaminoglycan (GAG) content of tissues generated using a pellet culture model. Selected cell populations were analyzed by microarray and flow cytometry. In some experiments, cells were sorted using antibodies against molecules found to be associated with differential chondrogenic capacity and again assessed in pellet cultures. RESULTS: Significance Analysis of Microarrays indicated that chondrocytes with low chondrogenic capacity expressed higher levels of insulin-like growth factor 1 and of catabolic genes (e.g., matrix metalloproteinase 2, aggrecanase 2), while chondrocytes with high chondrogenic capacity expressed higher levels of genes involved in cell-cell or cell-matrix interactions (e.g., CD49c, CD49f). Flow cytometry analysis showed that CD44, CD151, and CD49c were expressed at significantly higher levels in chondrocytes with higher chondrogenic capacity. Flow cytometry analysis of clonal chondrocyte strains indicated that CD44 and CD151 could also identify more chondrogenic clones. Chondrocytes sorted for brighter CD49c or CD44 signal expression produced tissues with higher levels of GAG per DNA (up to 1.4-fold) and type II collagen messenger RNA (up to 3.4-fold) than did unsorted cells. CONCLUSION: We identified markers that allow characterization of the capacity of monolayer-expanded chondrocytes to form in vitro cartilaginous tissue and enable enrichment for subpopulations with higher chondrogenic capacity. These markers might be used as a means to predict and possibly improve the outcome of cell-based cartilage repair techniques.
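Significance Analysis of Microarrays, used above, is not re-implemented here; as a rough sketch of its central idea on simulated data, the snippet below computes a moderated two-group statistic in which a small constant added to each gene's standard error keeps low-variance genes from dominating the ranking.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated expression for genes x samples in two groups
# (e.g. low vs high chondrogenic capacity populations).
n_genes, n_per_group = 5000, 10
low = rng.normal(0, 1, (n_genes, n_per_group))
high = rng.normal(0, 1, (n_genes, n_per_group))
high[:200] += 1.0                                    # 200 genes truly differ

diff = high.mean(axis=1) - low.mean(axis=1)
pooled_var = (low.var(axis=1, ddof=1) + high.var(axis=1, ddof=1)) / 2
se = np.sqrt(pooled_var * 2 / n_per_group)

# SAM-style moderated statistic: the fudge factor s0 stabilizes genes with tiny variance.
s0 = np.percentile(se, 5)
d = diff / (se + s0)

top = np.argsort(-np.abs(d))[:20]
print("top-ranked genes:", top)
```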