5 results for Quantile regressions

in Collection Of Biostatistics Research Archive


Relevance:

20.00%

Publisher:

Abstract:

The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade’s worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement, in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data exhibit unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find that GC-content has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here we describe statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization (CQN) algorithm combines robust generalized regression, to remove systematic bias introduced by deterministic features such as GC-content, with quantile normalization, to correct for global distortions.
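
The quantile normalization step mentioned in this abstract can be illustrated with a minimal sketch. This is plain quantile normalization only, not the authors' CQN algorithm: it omits the robust regression on GC-content, and the function name `quantile_normalize` and the genes-by-samples matrix layout are assumptions made for illustration. Tied values are not averaged here, which some implementations handle differently.

```python
import numpy as np


def quantile_normalize(x):
    """Give every column (sample) of x the same empirical distribution.

    x is assumed to be a 2-D array with rows = features (e.g. genes)
    and columns = samples. This is an illustrative sketch, not CQN.
    """
    x = np.asarray(x, dtype=float)
    # Row indices that would sort each column independently.
    order = np.argsort(x, axis=0, kind="stable")
    sorted_x = np.take_along_axis(x, order, axis=0)
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = sorted_x.mean(axis=1)
    # Write the k-th reference value back to wherever the k-th smallest
    # value of each column originally sat.
    out = np.empty_like(x)
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out


if __name__ == "__main__":
    # Hypothetical toy matrix: 4 genes measured in 3 samples.
    counts = np.array([[5.0, 4.0, 3.0],
                       [2.0, 1.0, 4.0],
                       [3.0, 4.0, 6.0],
                       [4.0, 2.0, 8.0]])
    print(quantile_normalize(counts))
```

After normalization, sorting any column yields the same vector of values, which is what removes global distributional differences between samples.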

Relevance:

10.00%

Publisher:

Abstract:

Recent interest in spatial pattern in terrestrial ecosystems has come from an awareness of the intimate relationship between spatial heterogeneity of soil resources and maintenance of plant species diversity. Soil and vegetation can vary spatially in response to several state factors of the system. In this study, we examined fine-scale spatial variability of soil nutrients and vascular plant species in contrasting herb-dominated communities (a pasture and an old field) to determine degree of spatial dependence among soil variables and plant community characteristics within these communities by sampling at 1-m intervals. Each site was divided into 25 1-m² plots. Mineral soil was sampled (2-cm diameter, 5-cm depth) from each of four 0.25-m² quarters and combined into a single composite sample per plot. Soil organic matter was measured as loss-on-ignition. Extractable NH4 and NO3 were determined before and after laboratory incubation to determine potential net N mineralization and nitrification. Cations were analyzed using inductively coupled plasma emission spectrometry. Vegetation was assessed using estimated percent cover. Most soil and plant variables exhibited sharp contrasts between pasture and old-field sites, with the old field having significantly higher net N mineralization/nitrification, pH, Ca, Mg, Al, plant cover, and species diversity, richness, and evenness. Multiple regressions revealed that all plant variables (species diversity, richness, evenness, and cover) were significantly related to soil characteristics (available nitrogen, organic matter, moisture, pH, Ca, and Mg) in the pasture; in the old field only cover was significantly related to soil characteristics (organic matter and moisture). Both sites contrasted sharply with respect to spatial pattern of soil variables, with the old field exhibiting a higher degree of spatial dependence. These results demonstrate that land-use practices can exert profound influence on spatial heterogeneity of both soil properties and vegetation in herb-dominated communities.

Relevance:

10.00%

Publisher:

Abstract:

Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of basepairs across the genome. Genome-wide association studies (GWAS) may simultaneously screen for copy number-phenotype and SNP-phenotype associations as part of the analytic strategy. However, genome-wide array analyses are particularly susceptible to batch effects as the logistics of preparing DNA and processing thousands of arrays often involves multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post-hoc quality control procedures that exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch effects and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of diallelic genotype calls from experimental data to estimate batch- and locus-specific parameters of background and signal without the requirement of training data. We illustrate these ideas using a study of bipolar disease and a study of chromosome 21 trisomy. The former has batch effects that dominate much of the observed variation in quantile-normalized intensities, while the latter illustrates the robustness of our approach to datasets where as many as 25% of the samples have altered copy number. Locus-specific estimates of copy number can be plotted on the copy-number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position. 
The software is open source and implemented in the R package CRLMM available at Bioconductor (http://www.bioconductor.org).