39 resultados para Dept. of Zoology
Resumo:
In the simultaneous estimation of a large number of related quantities, multilevel models provide a formal mechanism for efficiently making use of the ensemble of information for deriving individual estimates. In this article we investigate the ability of the likelihood to identify the relationship between signal and noise in multilevel linear mixed models. Specifically, we consider the ability of the likelihood to diagnose conjugacy or independence between the signals and noises. Our work was motivated by the analysis of data from high-throughput experiments in genomics. The proposed model leads to a more flexible family. However, we further demonstrate that adequately capitalizing on the benefits of a well fitting fully-specified likelihood in the terms of gene ranking is difficult.
Resumo:
A recent article in this journal (Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2: e124) argued that more than half of published research findings in the medical literature are false. In this commentary, we examine the structure of that argument, and show that it has three basic components: 1)An assumption that the prior probability of most hypotheses explored in medical research is below 50%. 2)Dichotomization of P-values at the 0.05 level and introduction of a “bias” factor (produced by significance-seeking), the combination of which severely weakens the evidence provided by every design. 3)Use of Bayes theorem to show that, in the face of weak evidence, hypotheses with low prior probabilities cannot have posterior probabilities over 50%. Thus, the claim is based on a priori assumptions that most tested hypotheses are likely to be false, and then the inferential model used makes it impossible for evidence from any study to overcome this handicap. We focus largely on step (2), explaining how the combination of dichotomization and “bias” dilutes experimental evidence, and showing how this dilution leads inevitably to the stated conclusion. We also demonstrate a fallacy in another important component of the argument –that papers in “hot” fields are more likely to produce false findings. We agree with the paper’s conclusions and recommendations that many medical research findings are less definitive than readers suspect, that P-values are widely misinterpreted, that bias of various forms is widespread, that multiple approaches are needed to prevent the literature from being systematically biased and the need for more data on the prevalence of false claims. But calculating the unreliability of the medical research literature, in whole or in part, requires more empirical evidence and different inferential models than were used. The claim that “most research findings are false for most research designs and for most fields” must be considered as yet unproven.
Resumo:
In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion (AIC) have been used, based either on the marginal or on the conditional distribution. We show that the marginal AIC is no longer an asymptotically unbiased estimator of the Akaike information, and in fact favours smaller models without random effects. For the conditional AIC, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that leads to the selection of any random effect not predicted to be exactly zero. We derive an analytic representation of a corrected version of the conditional AIC, which avoids the high computational cost and imprecision of available numerical approximations. An implementation in an R package is provided. All theoretical results are illustrated in simulation studies, and their impact in practice is investigated in an analysis of childhood malnutrition in Zambia.
Resumo:
Objective. To examine effects of primary care physicians (PCPs) and patients on the association between charges for primary care and specialty care in a point-of-service (POS) health plan. Data Source. Claims from 1996 for 3,308 adult male POS plan members, each of whom was assigned to one of the 50 family practitioner-PCPs with the largest POS plan member-loads. Study Design. A hierarchical multivariate two-part model was fitted using a Gibbs sampler to estimate PCPs' effects on patients' annual charges for two types of services, primary care and specialty care, the associations among PCPs' effects, and within-patient associations between charges for the two services. Adjusted Clinical Groups (ACGs) were used to adjust for case-mix. Principal Findings. PCPs with higher case-mix adjusted rates of specialist use were less likely to see their patients at least once during the year (estimated correlation: –.40; 95% CI: –.71, –.008) and provided fewer services to patients that they saw (estimated correlation: –.53; 95% CI: –.77, –.21). Ten of 11 PCPs whose case-mix adjusted effects on primary care charges were significantly less than or greater than zero (p < .05) had estimated, case-mix adjusted effects on specialty care charges that were of opposite sign (but not significantly different than zero). After adjustment for ACG and PCP effects, the within-patient, estimated odds ratio for any use of primary care given any use of specialty care was .57 (95% CI: .45, .73). Conclusions. PCPs and patients contributed independently to a trade-off between utilization of primary care and specialty care. The trade-off appeared to partially offset significant differences in the amount of care provided by PCPs. These findings were possible because we employed a hierarchical multivariate model rather than separate univariate models.
Resumo:
While many time-series studies of ozone and daily mortality identified positive associations,others yielded null or inconclusive results. We performed a meta-analysis of 144 effect estimates from 39 time-series studies, and estimated pooled effects by lags, age groups,cause-specific mortality, and concentration metrics. We compared results to estimates from the National Morbidity, Mortality, and Air Pollution Study (NMMAPS), a time-series study of 95 large U.S. cities from 1987 to 2000. Both meta-analysis and NMMAPS results provided strong evidence of a short-term association between ozone and mortality, with larger effects for cardiovascular and respiratory mortality, the elderly, and current day ozone exposure as compared to other single day lags. In both analyses, results were not sensitive to adjustment for particulate matter and model specifications. In the meta-analysis we found that a 10 ppb increase in daily ozone is associated with a 0.83 (95% confidence interval: 0.53, 1.12%) increase in total mortality, whereas the corresponding NMMAPS estimate is 0.25%(0.12, 0.39%). Meta-analysis results were consistently larger than those from NMMAPS,indicating publication bias. Additional publication bias is evident regarding the choice of lags in time-series studies, and the larger heterogeneity in posterior city-specific estimates in the meta-analysis, as compared with NMAMPS.
Resumo:
In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression is the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve accuracy and precision of gene expression measurements, relative to ad-hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications of microarrays are becoming more and more popular. In this paper we describe a preprocessing methodology for a technology designed for the identification of DNA sequence variants in specific genes or regions of the human genome that are associated with phenotypes of interest such as disease. In particular we describe methodology useful for preprocessing Affymetrix SNP chips and obtaining genotype calls with the preprocessed data. We demonstrate how our procedure improves existing approaches using data from three relatively large studies including one in which large number independent calls are available. Software implementing these ideas are avialble from the Bioconductor oligo package.
Resumo:
Numerous time series studies have provided strong evidence of an association between increased levels of ambient air pollution and increased levels of hospital admissions, typically at 0, 1, or 2 days after an air pollution episode. An important research aim is to extend existing statistical models so that a more detailed understanding of the time course of hospitalization after exposure to air pollution can be obtained. Information about this time course, combined with prior knowledge about biological mechanisms, could provide the basis for hypotheses concerning the mechanism by which air pollution causes disease. Previous studies have identified two important methodological questions: (1) How can we estimate the shape of the distributed lag between increased air pollution exposure and increased mortality or morbidity? and (2) How should we estimate the cumulative population health risk from short-term exposure to air pollution? Distributed lag models are appropriate tools for estimating air pollution health effects that may be spread over several days. However, estimation for distributed lag models in air pollution and health applications is hampered by the substantial noise in the data and the inherently weak signal that is the target of investigation. We introduce an hierarchical Bayesian distributed lag model that incorporates prior information about the time course of pollution effects and combines information across multiple locations. The model has a connection to penalized spline smoothing using a special type of penalty matrix. We apply the model to estimating the distributed lag between exposure to particulate matter air pollution and hospitalization for cardiovascular and respiratory disease using data from a large United States air pollution and hospitalization database of Medicare enrollees in 94 counties covering the years 1999-2002.
Resumo:
Functional neuroimaging techniques enable investigations into the neural basis of human cognition, emotions, and behaviors. In practice, applications of functional magnetic resonance imaging (fMRI) have provided novel insights into the neuropathophysiology of major psychiatric,neurological, and substance abuse disorders, as well as into the neural responses to their treatments. Modern activation studies often compare localized task-induced changes in brain activity between experimental groups. One may also extend voxel-level analyses by simultaneously considering the ensemble of voxels constituting an anatomically defined region of interest (ROI) or by considering means or quantiles of the ROI. In this work we present a Bayesian extension of voxel-level analyses that offers several notable benefits. First, it combines whole-brain voxel-by-voxel modeling and ROI analyses within a unified framework. Secondly, an unstructured variance/covariance for regional mean parameters allows for the study of inter-regional functional connectivity, provided enough subjects are available to allow for accurate estimation. Finally, an exchangeable correlation structure within regions allows for the consideration of intra-regional functional connectivity. We perform estimation for our model using Markov Chain Monte Carlo (MCMC) techniques implemented via Gibbs sampling which, despite the high throughput nature of the data, can be executed quickly (less than 30 minutes). We apply our Bayesian hierarchical model to two novel fMRI data sets: one considering inhibitory control in cocaine-dependent men and the second considering verbal memory in subjects at high risk for Alzheimer’s disease. The unifying hierarchical model presented in this manuscript is shown to enhance the interpretation content of these data sets.
Resumo:
Equivalence testing is growing in use in scientific research outside of its traditional role in the drug approval process. Largely due to its ease of use and recommendation from the United States Food and Drug Administration guidance, the most common statistical method for testing (bio)equivalence is the two one-sided tests procedure (TOST). Like classical point-null hypothesis testing, TOST is subject to multiplicity concerns as more comparisons are made. In this manuscript, a condition that bounds the family-wise error rate (FWER) using TOST is given. This condition then leads to a simple solution for controlling the FWER. Specifically, we demonstrate that if all pairwise comparisons of k independent groups are being evaluated for equivalence, then simply scaling the nominal Type I error rate down by (k - 1) is sufficient to maintain the family-wise error rate at the desired value or less. The resulting rule is much less conservative than the equally simple Bonferroni correction. An example of equivalence testing in a non drug-development setting is given.
Resumo:
Simulation-based assessment is a popular and frequently necessary approach to evaluation of statistical procedures. Sometimes overlooked is the ability to take advantage of underlying mathematical relations and we focus on this aspect. We show how to take advantage of large-sample theory when conducting a simulation using the analysis of genomic data as a motivating example. The approach uses convergence results to provide an approximation to smaller-sample results, results that are available only by simulation. We consider evaluating and comparing a variety of ranking-based methods for identifying the most highly associated SNPs in a genome-wide association study, derive integral equation representations of the pre-posterior distribution of percentiles produced by three ranking methods, and provide examples comparing performance. These results are of interest in their own right and set the framework for a more extensive set of comparisons.
Resumo:
Quantifying the health effects associated with simultaneous exposure to many air pollutants is now a research priority of the US EPA. Bayesian hierarchical models (BHM) have been extensively used in multisite time series studies of air pollution and health to estimate health effects of a single pollutant adjusted for potential confounding of other pollutants and other time-varying factors. However, when the scientific goal is to estimate the impacts of many pollutants jointly, a straightforward application of BHM is challenged by the need to specify a random-effect distribution on a high-dimensional vector of nuisance parameters, which often do not have an easy interpretation. In this paper we introduce a new BHM formulation, which we call "reduced BHM", aimed at analyzing clustered data sets in the presence of a large number of random effects that are not of primary scientific interest. At the first stage of the reduced BHM, we calculate the integrated likelihood of the parameter of interest (e.g. excess number of deaths attributed to simultaneous exposure to high levels of many pollutants). At the second stage, we specify a flexible random-effect distribution directly on the parameter of interest. The reduced BHM overcomes many of the challenges in the specification and implementation of full BHM in the context of a large number of nuisance parameters. In simulation studies we show that the reduced BHM performs comparably to the full BHM in many scenarios, and even performs better in some cases. Methods are applied to estimate location-specific and overall relative risks of cardiovascular hospital admissions associated with simultaneous exposure to elevated levels of particulate matter and ozone in 51 US counties during the period 1999-2005.
Resumo:
We propose a method for diagnosing confounding bias under a model which links a spatially and temporally varying exposure and health outcome. We decompose the association into orthogonal components, corresponding to distinct spatial and temporal scales of variation. If the model fully controls for confounding, the exposure effect estimates should be equal at the different temporal and spatial scales. We show that the overall exposure effect estimate is a weighted average of the scale-specific exposure effect estimates. We use this approach to estimate the association between monthly averages of fine particles (PM2.5) over the preceding 12 months and monthly mortality rates in 113 U.S. counties from 2000-2002. We decompose the association between PM2.5 and mortality into two components: 1) the association between “national trends” in PM2.5 and mortality; and 2) the association between “local trends,” defined as county-specificdeviations from national trends. This second component provides evidence as to whether counties having steeper declines in PM2.5 also have steeper declines in mortality relative to their national trends. We find that the exposure effect estimates are different at these two spatio-temporalscales, which raises concerns about confounding bias. We believe that the association between trends in PM2.5 and mortality at the national scale is more likely to be confounded than is the association between trends in PM2.5 and mortality at the local scale. If the association at the national scale is set aside, there is little evidence of an association between 12-month exposure to PM2.5 and mortality.
Resumo:
Bioequivalence trials are abbreviated clinical trials whereby a generic drug or new formulation is evaluated to determine if it is "equivalent" to a corresponding previously approved brand-name drug or formulation. In this manuscript, we survey the process of testing bioequivalence and advocate the likelihood paradigm for representing the resulting data as evidence. We emphasize the unique conflicts between hypothesis testing and confidence intervals in this area - which we believe are indicative of the existence of the systemic defects in the frequentist approach - that the likelihood paradigm avoids. We suggest the direct use of profile likelihoods for evaluating bioequivalence and examine the main properties of profile likelihoods and estimated likelihoods under simulation. This simulation study shows that profile likelihoods are a reasonable alternative to the (unknown) true likelihood for a range of parameters commensurate with bioequivalence research. Our study also shows that the standard methods in the current practice of bioequivalence trials offers only weak evidence from the evidential point of view.
Resumo:
Amplifications and deletions of chromosomal DNA, as well as copy-neutral loss of heterozygosity have been associated with diseases processes. High-throughput single nucleotide polymorphism (SNP) arrays are useful for making genome-wide estimates of copy number and genotype calls. Because neighboring SNPs in high throughput SNP arrays are likely to have dependent copy number and genotype due to the underlying haplotype structure and linkage disequilibrium, hidden Markov models (HMM) may be useful for improving genotype calls and copy number estimates that do not incorporate information from nearby SNPs. We improve previous approaches that utilize a HMM framework for inference in high throughput SNP arrays by integrating copy number, genotype calls, and the corresponding confidence scores when available. Using simulated data, we demonstrate how confidence scores control smoothing in a probabilistic framework. Software for fitting HMMs to SNP array data is available in the R package ICE.
Resumo:
In evaluating the accuracy of diagnosis tests, it is common to apply two imperfect tests jointly or sequentially to a study population. In a recent meta-analysis of the accuracy of microsatellite instability testing (MSI) and traditional mutation analysis (MUT) in predicting germline mutations of the mismatch repair (MMR) genes, a Bayesian approach (Chen, Watson, and Parmigiani 2005) was proposed to handle missing data resulting from partial testing and the lack of a gold standard. In this paper, we demonstrate an improved estimation of the sensitivities and specificities of MSI and MUT by using a nonlinear mixed model and a Bayesian hierarchical model, both of which account for the heterogeneity across studies through study-specific random effects. The methods can be used to estimate the accuracy of two imperfect diagnostic tests in other meta-analyses when the prevalence of disease, the sensitivities and/or the specificities of diagnostic tests are heterogeneous among studies. Furthermore, simulation studies have demonstrated the importance of carefully selecting appropriate random effects on the estimation of diagnostic accuracy measurements in this scenario.