83 results for Johns Hopkins Hospital

in Collection Of Biostatistics Research Archive


Relevance:

90.00%

Publisher:

Abstract:

Numerous time series studies have provided strong evidence of an association between increased levels of ambient air pollution and increased levels of hospital admissions, typically at 0, 1, or 2 days after an air pollution episode. An important research aim is to extend existing statistical models so that a more detailed understanding of the time course of hospitalization after exposure to air pollution can be obtained. Information about this time course, combined with prior knowledge about biological mechanisms, could provide the basis for hypotheses concerning the mechanism by which air pollution causes disease. Previous studies have identified two important methodological questions: (1) How can we estimate the shape of the distributed lag between increased air pollution exposure and increased mortality or morbidity? and (2) How should we estimate the cumulative population health risk from short-term exposure to air pollution? Distributed lag models are appropriate tools for estimating air pollution health effects that may be spread over several days. However, estimation for distributed lag models in air pollution and health applications is hampered by the substantial noise in the data and the inherently weak signal that is the target of investigation. We introduce a hierarchical Bayesian distributed lag model that incorporates prior information about the time course of pollution effects and combines information across multiple locations. The model has a connection to penalized spline smoothing using a special type of penalty matrix. We apply the model to estimating the distributed lag between exposure to particulate matter air pollution and hospitalization for cardiovascular and respiratory disease using data from a large United States air pollution and hospitalization database of Medicare enrollees in 94 counties covering the years 1999-2002.
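
A minimal sketch, in LaTeX notation, of the kind of Bayesian distributed lag specification described above; the symbols and county index are illustrative assumptions, not the paper's exact formulation:

    \[
      Y_t^{c} \sim \mathrm{Poisson}(\mu_t^{c}), \qquad
      \log \mu_t^{c} = \mathrm{confounders}_t^{c} + \sum_{\ell=0}^{L} \theta_\ell^{c}\, x_{t-\ell}^{c}, \qquad
      \theta^{c} \sim \mathrm{N}(\theta, \Omega), \quad \theta \sim \mathrm{N}(0, \Sigma),
    \]

where Y_t^c is the hospitalization count in county c on day t, x_{t-l}^c is the pollution level l days earlier, and the prior covariance Sigma plays the role of the penalty matrix in the penalized-spline connection mentioned above.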

Relevance:

90.00%

Publisher:

Abstract:

Quantifying the health effects associated with simultaneous exposure to many air pollutants is now a research priority of the US EPA. Bayesian hierarchical models (BHM) have been extensively used in multisite time series studies of air pollution and health to estimate health effects of a single pollutant adjusted for potential confounding of other pollutants and other time-varying factors. However, when the scientific goal is to estimate the impacts of many pollutants jointly, a straightforward application of BHM is challenged by the need to specify a random-effect distribution on a high-dimensional vector of nuisance parameters, which often do not have an easy interpretation. In this paper we introduce a new BHM formulation, which we call "reduced BHM", aimed at analyzing clustered data sets in the presence of a large number of random effects that are not of primary scientific interest. At the first stage of the reduced BHM, we calculate the integrated likelihood of the parameter of interest (e.g. excess number of deaths attributed to simultaneous exposure to high levels of many pollutants). At the second stage, we specify a flexible random-effect distribution directly on the parameter of interest. The reduced BHM overcomes many of the challenges in the specification and implementation of full BHM in the context of a large number of nuisance parameters. In simulation studies we show that the reduced BHM performs comparably to the full BHM in many scenarios, and even performs better in some cases. Methods are applied to estimate location-specific and overall relative risks of cardiovascular hospital admissions associated with simultaneous exposure to elevated levels of particulate matter and ozone in 51 US counties during the period 1999-2005.
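
A schematic of the two-stage structure described above, with illustrative notation (beta_c for the county-specific parameter of interest and eta_c for the nuisance parameters; the exact construction is in the paper):

    \[
      \text{Stage 1: } \tilde{L}_c(\beta_c) = \int f(y_c \mid \beta_c, \eta_c)\, \pi(\eta_c)\, d\eta_c,
      \qquad
      \text{Stage 2: } \beta_c \mid \mu, \tau^2 \sim \mathrm{N}(\mu, \tau^2),
    \]

so that the random-effect distribution is placed directly on the low-dimensional beta_c rather than on the full high-dimensional parameter vector.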

Relevance:

90.00%

Publisher:

Abstract:

In this paper, we consider estimation of the causal effect of a treatment on an outcome from observational data collected in two phases. In the first phase, a simple random sample of individuals is drawn from a population. On these individuals, information is obtained on treatment, outcome, and a few low-dimensional confounders. These individuals are then stratified according to these factors. In the second phase, a random sub-sample of individuals is drawn from each stratum, with known, stratum-specific selection probabilities. On these individuals, a rich set of confounding factors is collected. In this setting, we introduce four estimators: (1) simple inverse weighted, (2) locally efficient, (3) doubly robust, and (4) enriched inverse weighted. We evaluate the finite-sample performance of these estimators in a simulation study. We also use our methodology to estimate the causal effect of trauma care on in-hospital mortality using data from the National Study of Cost and Outcomes of Trauma.
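
As an illustration of estimator (1), here is a minimal sketch of a simple inverse-weighted estimate of an average treatment effect under two-phase sampling. The function and variable names, the logistic propensity-score model, and the use of scikit-learn are assumptions for illustration, not the paper's implementation:

    # Hedged sketch: simple inverse-weighted ATE under two-phase sampling with
    # known, stratum-specific selection probabilities (illustrative only).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def simple_inverse_weighted_ate(a, y, x_rich, stratum, pi_stratum):
        """a: treatment (0/1); y: outcome; x_rich: phase-two confounders;
        stratum: phase-one stratum labels; pi_stratum: dict of known selection probabilities."""
        a, y = np.asarray(a), np.asarray(y)
        w_phase2 = np.array([1.0 / pi_stratum[s] for s in stratum])    # design weights
        ps_model = LogisticRegression().fit(x_rich, a, sample_weight=w_phase2)
        e = ps_model.predict_proba(x_rich)[:, 1]                       # estimated propensity score
        w1 = w_phase2 * a / e                                          # treated weights
        w0 = w_phase2 * (1 - a) / (1 - e)                              # control weights
        return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)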

Relevance:

80.00%

Publisher:

Abstract:

The AEGISS (Ascertainment and Enhancement of Gastrointestinal Infection Surveillance and Statistics) project aims to use spatio-temporal statistical methods to identify anomalies in the space-time distribution of non-specific gastrointestinal infections in the UK, using the Southampton area in southern England as a test-case. In this paper, we use the AEGISS project to illustrate how spatio-temporal point process methodology can be used in the development of a rapid-response, spatial surveillance system. Current surveillance of gastroenteric disease in the UK relies on general practitioners reporting cases of suspected food-poisoning through a statutory notification scheme, voluntary laboratory reports of the isolation of gastrointestinal pathogens and standard reports of general outbreaks of infectious intestinal disease by public health and environmental health authorities. However, most statutory notifications are made only after a laboratory reports the isolation of a gastrointestinal pathogen. As a result, detection is delayed and the ability to react to an emerging outbreak is reduced. For more detailed discussion, see Diggle et al. (2003). A new and potentially valuable source of data on the incidence of non-specific gastroenteric infections in the UK is NHS Direct, a 24-hour phone-in clinical advice service. NHS Direct data are less likely than reports by general practitioners to suffer from spatially and temporally localized inconsistencies in reporting rates. Also, reporting delays by patients are likely to be reduced, as no appointments are needed. Against this, NHS Direct data sacrifice specificity. Each call to NHS Direct is classified only according to the general pattern of reported symptoms (Cooper et al., 2003). The current paper focuses on the use of spatio-temporal statistical analysis for early detection of unexplained variation in the spatio-temporal incidence of non-specific gastroenteric symptoms, as reported to NHS Direct. Section 2 describes our statistical formulation of this problem, the nature of the available data and our approach to predictive inference. Section 3 describes the stochastic model. Section 4 gives the results of fitting the model to NHS Direct data. Section 5 shows how the model is used for spatio-temporal prediction. The paper concludes with a short discussion.

Relevance:

80.00%

Publisher:

Abstract:

Mendelian models can predict who carries an inherited deleterious mutation of known disease genes based on family history. For example, the BRCAPRO model is commonly used to identify families who carry mutations of BRCA1 and BRCA2, based on familial breast and ovarian cancers. These models incorporate the age of diagnosis of diseases in relatives and current age or age of death. We develop a rigorous foundation for handling multiple diseases with censoring. We prove that any disease unrelated to mutations can be excluded from the model, unless it is sufficiently common and dependent on a mutation-related disease time. Furthermore, if a family member has a disease with higher probability density among mutation carriers, but the model does not account for it, then the carrier probability is deflated. However, even if a family only has diseases the model accounts for, if the model excludes a mutation-related disease, then the carrier probability will be inflated. In light of these results, we extend BRCAPRO to account for surviving all non-breast/ovary cancers as a single outcome. The extension also enables BRCAPRO to extract more useful information from male relatives. Using 1500 families from the Cancer Genetics Network, accounting for surviving other cancers improves BRCAPRO's concordance index from 0.758 to 0.762 (p = 0.046), improves its positive predictive value from 35% to 39% (p < 10^-6) without impacting its negative predictive value, and improves its overall calibration, although calibration slightly worsens for those with carrier probability < 10%.
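
For context, the generic Mendelian-model carrier-probability calculation that these results concern is Bayes' rule applied to the family phenotype history H (each relative's diseases and ages at diagnosis or censoring); this is the standard formulation, not the extension introduced here:

    \[
      \Pr(\text{carrier} \mid H) =
      \frac{\Pr(H \mid \text{carrier})\,\Pr(\text{carrier})}
           {\Pr(H \mid \text{carrier})\,\Pr(\text{carrier}) + \Pr(H \mid \text{non-carrier})\,\Pr(\text{non-carrier})},
    \]

and the results above describe how omitting or including particular disease outcomes in Pr(H | carrier) deflates or inflates this probability.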

Relevance:

80.00%

Publisher:

Abstract:

The aim of many genetic studies is to locate the genomic regions (called quantitative trait loci, QTLs) that contribute to variation in a quantitative trait (such as body weight). Confidence intervals for the locations of QTLs are particularly important for the design of further experiments to identify the gene or genes responsible for the effect. Likelihood support intervals are the most widely used method to obtain confidence intervals for QTL location, but the non-parametric bootstrap has also been recommended. Through extensive computer simulation, we show that bootstrap confidence intervals are poorly behaved and so should not be used in this context. The profile likelihood (or LOD curve) for QTL location has a tendency to peak at genetic markers, and so the distribution of the maximum likelihood estimate (MLE) of QTL location has the unusual feature of point masses at genetic markers; this contributes to the poor behavior of the bootstrap. Likelihood support intervals and approximate Bayes credible intervals, on the other hand, are shown to behave appropriately.
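
A minimal sketch of how a likelihood (LOD) support interval of the kind compared above can be read off a profile LOD curve evaluated on a grid of putative QTL positions; the 1.5-LOD drop and the function name are assumptions for illustration:

    # Hedged sketch: LOD support interval from a profile LOD curve (illustrative only).
    import numpy as np

    def lod_support_interval(positions, lod, drop=1.5):
        """Return the range of positions whose LOD is within `drop` of the maximum."""
        positions, lod = np.asarray(positions), np.asarray(lod)
        keep = lod >= lod.max() - drop
        return positions[keep].min(), positions[keep].max()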

Relevance:

80.00%

Publisher:

Abstract:

Microarray technology is a powerful tool able to measure RNA expression for thousands of genes at once. Various studies have been published comparing competing platforms with mixed results: some find agreement, others do not. As the number of researchers starting to use microarrays and the number of cross-platform meta-analysis studies rapidly increase, appropriate platform assessments become more important. Here we present results from a comparison study that offers important improvements over those previously described in the literature. In particular, we notice that none of the previously published papers consider differences between labs. For this paper, a consortium of ten labs from the Washington DC/Baltimore (USA) area was formed to compare three heavily used platforms using identical RNA samples. Appropriate statistical analysis demonstrates that relatively large differences exist between labs using the same platform, but that the results from the best performing labs agree rather well. Supplemental material is available from http://www.biostat.jhsph.edu/~ririzarr/techcomp/

Relevance:

80.00%

Publisher:

Abstract:

Professor Sir David R. Cox (DRC) is widely acknowledged as among the most important scientists of the second half of the twentieth century. He inherited the mantle of statistical science from Pearson and Fisher, advanced their ideas, and translated statistical theory into practice so as to forever change the application of statistics in many fields, but especially biology and medicine. The logistic and proportional hazards models he substantially developed are arguably among the most influential biostatistical methods in current practice. This paper looks forward over the period from DRC's 80th to 90th birthdays to speculate about the future of biostatistics, drawing lessons from DRC's contributions along the way. We consider "Cox's model" (CM) of biostatistics, an approach to statistical science that: formulates scientific questions or quantities in terms of parameters gamma in probability models f(y; gamma) that represent, in a parsimonious fashion, the underlying scientific mechanisms (Cox, 1997); partitions the parameters gamma = (theta, eta) into a subset of interest theta and other "nuisance parameters" eta necessary to complete the probability distribution (Cox and Hinkley, 1974); develops methods of inference about the scientific quantities that depend as little as possible upon the nuisance parameters (Barndorff-Nielsen and Cox, 1989); and thinks critically about the appropriate conditional distribution on which to base inferences. We briefly review exciting biomedical and public health challenges that are capable of driving statistical developments in the next decade. We discuss the statistical models and model-based inferences central to the CM approach, contrasting them with computationally intensive strategies for prediction and inference advocated by Breiman and others (e.g. Breiman, 2001) and with more traditional design-based methods of inference (Fisher, 1935). We discuss the hierarchical (multi-level) model as an example of the future challenges and opportunities for model-based inference. We then consider the role of conditional inference, a second key element of the CM. Recent examples from genetics are used to illustrate these ideas. Finally, the paper examines causal inference and statistical computing, two other topics we believe will be central to biostatistics research and practice in the coming decade. Throughout the paper, we attempt to indicate how DRC's work and the "Cox Model" have set a standard of excellence to which all can aspire in the future.

Relevance:

80.00%

Publisher:

Abstract:

The affected sib/relative pair (ASP/ARP) design is often used with covariates to find genes that can cause a disease in pathways other than through those covariates. However, such "covariates" can themselves have genetic determinants, and the validity of existing methods has so far only been argued under implicit assumptions. We propose an explicit causal formulation of the problem using potential outcomes and principal stratification. The general role of this formulation is to identify and separate the meaning of the different assumptions that can provide valid causal inference in linkage analysis. This separation helps to (a) develop better methods under explicit assumptions, and (b) show the different ways in which these assumptions can fail, which is necessary for developing further specific designs to test these assumptions and confirm or improve the inference. Using this formulation in the specific problem above, we show that, when the "covariate" (e.g., addiction to smoking) also has genetic determinants, then existing methods, including those previously thought as valid, can declare linkage between the disease and marker loci even when no such linkage exists. We also introduce design strategies to address the problem.

Relevance:

80.00%

Publisher:

Abstract:

An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of algorithms developed for hidden Markov models (HMMs) to deal with the missing genotype data problem.
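
A minimal sketch of the HMM calculation described above, for the simplest case of a backcross with two genotype states; the variable names, the genotyping-error model, and the omission of numerical scaling are simplifying assumptions for illustration, not the exact algorithms discussed:

    # Hedged sketch: forward-backward computation of P(genotype | marker data)
    # along one chromosome for a backcross (two genotype states).
    import numpy as np

    def genotype_probs(obs, rec_frac, error_prob=0.01):
        """obs: observed genotype at each position (0, 1, or -1 if missing);
        rec_frac: recombination fractions between adjacent positions (length len(obs) - 1)."""
        obs = np.asarray(obs)
        n, k = len(obs), 2
        emit = np.where(obs[:, None] == np.arange(k), 1.0 - error_prob, error_prob)
        emit[obs == -1] = 1.0                              # missing genotype: uninformative
        fwd, bwd = np.zeros((n, k)), np.zeros((n, k))
        fwd[0] = 0.5 * emit[0]                             # equal prior on the two genotypes
        bwd[-1] = 1.0
        for i in range(1, n):
            r = rec_frac[i - 1]
            T = np.array([[1 - r, r], [r, 1 - r]])         # transition: recombination or not
            fwd[i] = emit[i] * (fwd[i - 1] @ T)
            j = n - 1 - i
            rj = rec_frac[j]
            Tj = np.array([[1 - rj, rj], [rj, 1 - rj]])
            bwd[j] = Tj @ (emit[j + 1] * bwd[j + 1])
        post = fwd * bwd
        return post / post.sum(axis=1, keepdims=True)      # P(genotype | all marker data)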

Relevance:

80.00%

Publisher:

Abstract:

We derive a new class of iterative schemes for accelerating the convergence of the EM algorithm, by exploiting the connection between fixed point iterations and extrapolation methods. First, we present a general formulation of one-step iterative schemes, which are obtained by cycling with the extrapolation methods. We then square the one-step schemes to obtain the new class of methods, which we call SQUAREM. Squaring a one-step iterative scheme is simply applying it twice within each cycle of the extrapolation method. Here we focus on the first order or rank-one extrapolation methods for two reasons: (1) simplicity and (2) computational efficiency. In particular, we study two first order extrapolation methods, the reduced rank extrapolation (RRE1) and minimal polynomial extrapolation (MPE1). The convergence of the new schemes, both one-step and squared, is non-monotonic with respect to the residual norm. The first order one-step and SQUAREM schemes are linearly convergent, like the EM algorithm, but they have a faster rate of convergence. We demonstrate, through five different examples, the effectiveness of the first order SQUAREM schemes, SqRRE1 and SqMPE1, in accelerating the EM algorithm. The SQUAREM schemes are also shown to be vastly superior to their one-step counterparts, RRE1 and MPE1, in terms of computational efficiency. The proposed extrapolation schemes can fail due to the numerical problems of stagnation and near breakdown. We have developed a new hybrid iterative scheme that combines the RRE1 and MPE1 schemes in such a manner that it overcomes both stagnation and near breakdown. The squared first order hybrid scheme, SqHyb1, emerges as the iterative scheme of choice based on our numerical experiments. It combines the fast convergence of SqMPE1 with the stability of SqRRE1, while avoiding both near breakdown and stagnation. The SQUAREM methods can be incorporated very easily into an existing EM algorithm. They only require the basic EM step for their implementation and do not require any other auxiliary quantities such as the complete-data log likelihood and its gradient or Hessian. They are an attractive option in problems with a very large number of parameters, and in problems where the statistical model is complex, the EM algorithm is slow and each EM step is computationally demanding.
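
A minimal sketch of a squared extrapolation step applied to a generic EM mapping; the step-length formula shown is one common first-order choice, and the safeguards discussed above (hybrid switching, handling of stagnation and near breakdown) are omitted, so this is an illustration rather than the paper's exact SQUAREM schemes:

    # Hedged sketch: squared extrapolation acceleration of a generic EM map (illustrative only).
    import numpy as np

    def squarem_step(theta, em_update):
        theta1 = em_update(theta)
        theta2 = em_update(theta1)
        r = theta1 - theta                        # first difference
        v = (theta2 - theta1) - r                 # second difference
        alpha = -np.sqrt(r @ r) / np.sqrt(v @ v)  # one common first-order step length
        theta_new = theta - 2 * alpha * r + alpha**2 * v   # squared extrapolation step
        return em_update(theta_new)               # stabilize with one further EM step

    def accelerate_em(theta, em_update, tol=1e-8, max_iter=1000):
        theta = np.asarray(theta, float)
        for _ in range(max_iter):
            theta_new = squarem_step(theta, em_update)
            if np.linalg.norm(theta_new - theta) < tol:
                return theta_new
            theta = theta_new
        return theta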

Relevance:

80.00%

Publisher:

Abstract:

There are numerous statistical methods for quantitative trait linkage analysis in human studies. An ideal such method would have high power to detect genetic loci contributing to the trait, would be robust to non-normality in the phenotype distribution, would be appropriate for general pedigrees, would allow the incorporation of environmental covariates, and would be appropriate in the presence of selective sampling. We recently described a general framework for quantitative trait linkage analysis, based on generalized estimating equations, for which many current methods are special cases. This procedure is appropriate for general pedigrees and easily accommodates environmental covariates. In this paper, we use computer simulations to investigate the power and robustness of a variety of linkage test statistics built upon our general framework. We also propose two novel test statistics that take account of higher moments of the phenotype distribution, in order to accommodate non-normality. These new linkage tests are shown to have high power and to be robust to non-normality. While we have not yet examined the performance of our procedures in the context of selective sampling via computer simulations, the proposed tests satisfy all of the other qualities of an ideal quantitative trait linkage analysis method.

Relevance:

80.00%

Publisher:

Abstract:

Assessments of environmental and territorial justice are similar in that both assess whether empirical relations between the spatial arrangement of undesirable hazards (or desirable public goods and services) and socio-demographic groups are consistent with notions of social justice, evaluating the spatial distribution of benefits and burdens (outcome equity) and the process that produces observed differences (process equity). Using proximity to major highways in NYC as a case study, we review methodological issues pertinent to both fields and discuss choice and computation of exposure measures, but focus primarily on measures of inequity. We present inequity measures computed from the empirically estimated joint distribution of exposure and demographics and compare them to traditional measures such as linear regression, logistic regression and Theil's entropy index. We find that measures computed from the full joint distribution provide more unified, transparent and intuitive operational definitions of inequity and show how the approach can be used to structure siting and decommissioning decisions.
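
As one example of the traditional measures mentioned, a minimal sketch of Theil's entropy index computed from group-level totals; the function and variable names are illustrative assumptions:

    # Hedged sketch: Theil's entropy index comparing each group's share of an exposure
    # burden with its population share (illustrative only).
    import numpy as np

    def theil_index(exposure_burden, population):
        """exposure_burden, population: arrays of group totals."""
        s = np.asarray(exposure_burden, float) / np.sum(exposure_burden)  # share of burden
        p = np.asarray(population, float) / np.sum(population)            # share of population
        mask = s > 0                                                       # treat 0 * log(0) as 0
        return float(np.sum(s[mask] * np.log(s[mask] / p[mask])))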

Relevance:

80.00%

Publisher:

Abstract:

Smoothing splines are a popular approach for non-parametric regression problems. We use periodic smoothing splines to fit a periodic signal plus noise model to data for which we assume there are underlying circadian patterns. In the smoothing spline methodology, choosing an appropriate smoothness parameter is an important step in practice. In this paper, we draw a connection between smoothing splines and REACT estimators that provides motivation for the creation of criteria for choosing the smoothness parameter. The new criteria are compared to three existing methods, namely cross-validation, generalized cross-validation, and generalized maximum likelihood criteria, by a Monte Carlo simulation and by an application to the study of circadian patterns. For most of the situations presented in the simulations, including the practical example, the new criteria outperform the three existing criteria.
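
For reference, a minimal sketch of the generalized cross-validation criterion that the new criteria are compared against, written for a generic linear smoother; constructing the periodic-smoothing-spline hat matrix S(lambda) is omitted, and the function name is an assumption:

    # Hedged sketch: GCV criterion for a linear smoother y_hat = S(lambda) @ y (illustrative only).
    import numpy as np

    def gcv(y, smoother_matrix):
        """smoother_matrix: the n x n hat matrix S(lambda) of the spline fit."""
        y = np.asarray(y, float)
        n = len(y)
        resid = y - smoother_matrix @ y
        return n * np.sum(resid**2) / (n - np.trace(smoother_matrix))**2

The smoothness parameter is then chosen by evaluating this criterion over a grid of lambda values and taking the minimizer.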

Relevance:

80.00%

Publisher:

Abstract:

The NMMAPS data package contains daily mortality, air pollution, and weather data originally assembled as part of the National Morbidity, Mortality, and Air Pollution Study (NMMAPS). The data have recently been updated and are available for 108 United States cities for the years 1987-2000. The package provides tools for building versions of the full database in a structured and reproducible manner. These database derivatives may be more suitable for particular analyses. We describe how to use the package to implement a multi-city time series analysis of mortality and PM10. In addition, we demonstrate how to reproduce recent findings based on the NMMAPS data.
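
The NMMAPS tools themselves are distributed as an R package; purely as an illustration of the kind of single-city time-series regression described above, here is a sketch in Python using statsmodels, with assumed column names and a deliberately crude confounder adjustment (the published analyses use smooth functions of time and weather):

    # Hedged sketch (not the NMMAPS package itself): single-city Poisson regression of
    # daily mortality on PM10; city-specific estimates could later be pooled across cities.
    import pandas as pd
    import statsmodels.api as sm

    def city_pm10_effect(df):
        """df: one city's daily data with assumed columns 'deaths', 'pm10', 'temp', 'date'."""
        df = df.dropna(subset=["deaths", "pm10", "temp"])
        X = pd.DataFrame({
            "pm10": df["pm10"],
            "temp": df["temp"],
            "trend": (df["date"] - df["date"].min()).dt.days,  # crude long-term trend control
        })
        X = sm.add_constant(X)
        fit = sm.GLM(df["deaths"], X, family=sm.families.Poisson()).fit()
        return fit.params["pm10"], fit.bse["pm10"]              # log relative rate per unit PM10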