7 results for regression discrete models
at Duke University
Abstract:
PURPOSE: The role of PM10 in the development of allergic diseases remains controversial among epidemiological studies, partly due to the inability to control for spatial variations in large-scale risk factors. This study investigates the spatial correspondence between PM10 levels and allergic diseases at the sub-district level in Seoul, Korea, in order to evaluate whether the impact of PM10 is observable and varies spatially across sub-districts. METHODS: PM10 measurements at 25 monitoring stations in the city were interpolated to 424 sub-districts, where annual inpatient and outpatient count data for 3 types of allergic diseases (atopic dermatitis, asthma, and allergic rhinitis) were collected. We estimated multiple ordinary least squares regression models to examine the association of the PM10 level with each of the allergic diseases, controlling for various sub-district level covariates. Geographically weighted regression (GWR) models were fitted to evaluate how the impact of PM10 varies across the sub-districts. RESULTS: PM10 was found to be a significant predictor of atopic dermatitis patient count (P<0.01), with a greater association when spatially interpolated at the sub-district level. No significant effect of PM10 was observed on allergic rhinitis or asthma when socioeconomic factors were controlled for. GWR models revealed spatial variation of PM10 effects on atopic dermatitis across the sub-districts of Seoul. The relationship of PM10 levels to atopic dermatitis patient counts was significant only in the Gangbuk region (P<0.01), along with other covariates including average land value, poverty rate, level of education, and apartment rate (P<0.01). CONCLUSIONS: Our findings imply that PM10 effects on allergic diseases might not be consistent throughout Seoul. GIS-based spatial modeling techniques could play a role in evaluating spatial variation of air pollution impacts on allergic diseases at the sub-district level, which could provide valuable guidelines for environmental and public health policymakers.
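For readers unfamiliar with geographically weighted regression, the sketch below shows the basic mechanic the abstract refers to: a separate weighted least squares fit at each location, with Gaussian kernel weights that decay with distance. The data, bandwidth, and variable names are illustrative placeholders, not the study's data or its exact GWR specification.

```python
import numpy as np

def gwr_coefficients(X, y, coords, bandwidth):
    """Geographically weighted regression: fit a weighted least squares model
    at each location, with Gaussian kernel weights that decay with distance.
    Returns an (n, p) array of local coefficient estimates."""
    n, p = X.shape
    betas = np.empty((n, p))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)    # distances to location i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)           # Gaussian kernel weights
        XtW = X.T * w
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)      # local weighted LS solution
    return betas

# Illustrative synthetic example: 424 "sub-districts" with an intercept and a
# PM10 covariate; the spatially varying effect is simulated, not observed.
rng = np.random.default_rng(0)
n = 424
coords = rng.uniform(0, 10, size=(n, 2))
pm10 = rng.normal(50, 10, size=n)
X = np.column_stack([np.ones(n), pm10])
y = 5 + 0.1 * pm10 * (coords[:, 0] / 10) + rng.normal(0, 1, size=n)
local_betas = gwr_coefficients(X, y, coords, bandwidth=2.0)
print(local_betas[:3])  # local intercept and PM10 slope for the first three districts
```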
Abstract:
This paper provides a root-n consistent, asymptotically normal weighted least squares estimator of the coefficients in a truncated regression model. The distribution of the errors is unknown and permits general forms of unknown heteroskedasticity. Also provided is an instrumental variables based two-stage least squares estimator for this model, which can be used when some regressors are endogenous, mismeasured, or otherwise correlated with the errors. A simulation study indicates that the new estimators perform well in finite samples. Our limiting distribution theory includes a new asymptotic trimming result addressing the boundary bias in first-stage density estimation without knowledge of the support boundary. © 2007 Cambridge University Press.
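The estimator itself cannot be reproduced from the abstract alone, but the setting it addresses can be illustrated: when observations whose latent outcome falls below the truncation point are dropped entirely and the errors are heteroskedastic, naive OLS on the observed sample is inconsistent. A minimal simulation sketch with synthetic data and illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta0, beta1 = 100_000, 1.0, 2.0

# Latent model with heteroskedastic errors; the sample is truncated at zero,
# i.e. observations with y* <= 0 are never recorded (not merely censored).
x = rng.normal(size=n)
e = rng.normal(size=n) * (0.5 + 0.5 * np.abs(x))   # error variance depends on x
y_star = beta0 + beta1 * x + e
keep = y_star > 0
x_t, y_t = x[keep], y_star[keep]

# Naive OLS on the truncated sample is inconsistent for (beta0, beta1);
# consistent estimation requires a correction such as the weighted least
# squares estimator described in the abstract.
X = np.column_stack([np.ones(x_t.size), x_t])
beta_ols = np.linalg.lstsq(X, y_t, rcond=None)[0]
print("true:", (beta0, beta1), "naive OLS on truncated data:", beta_ols)
```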
Abstract:
In regression analysis of counts, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive and thus underdeveloped. We propose a lognormal and gamma mixed negative binomial (NB) regression model for counts, and present efficient closed-form Bayesian inference; unlike conventional Poisson models, the proposed approach has two free parameters to include two different kinds of random effects, and allows the incorporation of prior information, such as sparsity in the regression coefficients. By placing a gamma distribution prior on the NB dispersion parameter r, and connecting a log-normal distribution prior with the logit of the NB probability parameter p, efficient Gibbs sampling and variational Bayes inference are both developed. The closed-form updates are obtained by exploiting conditional conjugacy via both a compound Poisson representation and a Polya-Gamma distribution based data augmentation approach. The proposed Bayesian inference can be implemented routinely, while being easily generalizable to more complex settings involving multivariate dependence structures. The algorithms are illustrated using real examples. Copyright 2012 by the author(s)/owner(s).
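The closed-form Gibbs and variational updates described above are specific to the paper's lognormal and gamma mixed NB construction and are not reproduced here. As a simple point of reference, the sketch below fits a plain maximum-likelihood negative binomial regression to synthetic overdispersed counts with statsmodels; the Bayesian machinery in the abstract replaces this kind of fit with full posterior inference over r and logit(p).

```python
import numpy as np
import statsmodels.api as sm

# Synthetic overdispersed counts: a log-linear mean with gamma-distributed
# multiplicative heterogeneity yields negative binomial observations.
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
X = sm.add_constant(x)
mu = np.exp(0.5 + 0.8 * x)
r = 2.0                              # NB dispersion ("size") parameter
p = r / (r + mu)
y = rng.negative_binomial(r, p)

# Maximum-likelihood NB regression as a non-Bayesian baseline.
res = sm.NegativeBinomial(y, X).fit(disp=0)
print(res.params)                    # intercept, slope, and estimated dispersion
```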
Abstract:
BACKGROUND: Both compulsory detoxification treatment and community-based methadone maintenance treatment (MMT) exist for heroin addicts in China. We aim to examine the effectiveness of three intervention models for referring heroin addicts released from compulsory detoxification centers to community methadone maintenance treatment (MMT) clinics in Dehong prefecture, Yunnan province, China. METHODS: Using a quasi-experimental study design, three different referral models were assigned to four detoxification centers. Heroin addicts were enrolled based on their fulfillment of eligibility criteria and provision of informed consent. Two months prior to their release, information on demographic characteristics, history of heroin use, and prior participation in intervention programs was collected via a survey, and blood samples were obtained for HIV testing. All subjects were followed for six months after release from detoxification centers. Multi-level logistic regression analysis was used to examine factors predicting successful referrals to MMT clinics. RESULTS: Of the 226 participants who were released and followed, 9.7% were successfully referred to MMT (16.2% of HIV-positive participants and 7.0% of HIV-negative participants). A higher proportion of successful referrals was observed among participants who received both referral cards and MMT treatment while still in detoxification centers (25.8%) as compared to those who received both referral cards and police-assisted MMT enrollment (5.4%) and those who received referral cards only (0%). Furthermore, those who received referral cards and MMT treatment while still in detoxification had increased odds of successful referral to an MMT clinic (adjusted OR = 1.2, CI = 1.1-1.3). Having participated in an MMT program prior to detention (OR = 1.5, CI = 1.3-1.6) was the only baseline covariate associated with increased odds of successful referral. CONCLUSION: Findings suggest that providing MMT within detoxification centers promotes successful referral of heroin addicts to community-based MMT upon their release.
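The study's analysis was a multi-level logistic regression on the actual cohort; the sketch below only illustrates the general setup with a simplified single-level logistic model, synthetic data, and hypothetical predictor names echoing the abstract.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical layout: one row per participant, binary outcome "referred",
# and two illustrative predictors (in-center MMT provision, prior MMT use).
rng = np.random.default_rng(3)
n = 226
in_center_mmt = rng.integers(0, 2, size=n)
prior_mmt = rng.integers(0, 2, size=n)
logit = -2.5 + 1.2 * in_center_mmt + 0.4 * prior_mmt
referred = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([in_center_mmt, prior_mmt]))
res = sm.Logit(referred, X).fit(disp=0)
print(np.exp(res.params))   # odds ratios for the illustrative predictors
```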
Abstract:
The work presented in this dissertation is focused on applying engineering methods to develop and explore probabilistic survival models for the prediction of decompression sickness in US Navy divers. Mathematical modeling, computational model development, and numerical optimization techniques were employed to formulate and evaluate the predictive quality of models fitted to empirical data. In Chapters 1 and 2 we present general background information relevant to the development of probabilistic models applied to predicting the incidence of decompression sickness. The remainder of the dissertation introduces techniques developed in an effort to improve the predictive quality of probabilistic decompression models and to reduce the difficulty of model parameter optimization.
The first project explored seventeen variations of the hazard function using a well-perfused parallel compartment model. Models were parametrically optimized using the maximum likelihood technique. Model performance was evaluated using both classical statistical methods and model selection techniques based on information theory. Optimized model parameters were overall similar to those of previously published models. Results favored a novel hazard function definition that included both ambient pressure scaling and individually fitted compartment exponent scaling terms.
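As an illustration of the general approach (not the dissertation's seventeen hazard variants), the sketch below fits a single well-perfused compartment model by maximum likelihood: tissue tension relaxes exponentially toward ambient pressure, instantaneous risk is proportional to the positive supersaturation, and P(DCS) = 1 - exp(-integrated risk) enters a Bernoulli likelihood over observed dive outcomes. All profiles, outcomes, and parameter values are synthetic placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def dcs_probability(params, profiles, dt=1.0):
    """Single well-perfused compartment: tissue tension relaxes toward ambient
    pressure with time constant tau; instantaneous risk is gain times the
    positive supersaturation; P(DCS) = 1 - exp(-integrated risk)."""
    gain, tau = np.exp(params)              # optimize in log space to keep both positive
    probs = []
    for p_amb in profiles:                  # p_amb: ambient pressure series for one dive
        tissue = p_amb[0]
        risk = 0.0
        for p in p_amb:
            tissue += (p - tissue) * (1 - np.exp(-dt / tau))   # exponential uptake/washout
            risk += gain * max(tissue - p, 0.0) * dt            # accumulate supersaturation risk
        probs.append(1.0 - np.exp(-risk))
    return np.array(probs)

def neg_log_likelihood(params, profiles, outcomes):
    p = np.clip(dcs_probability(params, profiles), 1e-12, 1 - 1e-12)
    return -np.sum(outcomes * np.log(p) + (1 - outcomes) * np.log(1 - p))

# Synthetic "dive profiles": 60 min at depth, then 240 min at the surface.
profiles = [np.concatenate([np.full(60, 1.0 + d), np.full(240, 1.0)])
            for d in (2.0, 3.0, 4.0, 5.0, 6.0)]
outcomes = np.array([0, 0, 1, 0, 1])        # placeholder DCS outcomes
fit = minimize(neg_log_likelihood, x0=np.log([1e-3, 30.0]),
               args=(profiles, outcomes), method="Nelder-Mead")
print("fitted gain, tau:", np.exp(fit.x))
```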
We developed ten pharmacokinetic compartmental models that included explicit delay mechanics to determine whether predictive quality could be improved through the inclusion of material transfer lags. A fitted discrete delay parameter augmented the inflow to the compartment systems from the environment. Based on the observation that, for many of our models, symptoms are often reported after risk accumulation begins, we hypothesized that the inclusion of delays might improve correlation between the model predictions and observed data. Model selection techniques identified two models as having the best overall performance, but comparison to the best-performing model without delay, and model selection using our best-identified no-delay pharmacokinetic model, both indicated that the delay mechanism was not statistically justified and did not substantially improve model predictions.
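A minimal sketch of the kind of transfer lag described, under the assumption that the delay acts as a discrete shift on the ambient pressure series driving the compartment; the dissertation's exact delay formulation may differ.

```python
import numpy as np

def delayed_input(p_amb, lag_steps):
    """Discrete delay on the inflow: the compartment is driven by the ambient
    pressure lag_steps samples in the past; the initial segment is held at
    the starting pressure."""
    lag_steps = int(round(lag_steps))
    if lag_steps <= 0:
        return p_amb
    return np.concatenate([np.full(lag_steps, p_amb[0]), p_amb[:-lag_steps]])

# Example: a bounce-dive pressure series seen by the compartment 3 samples late.
p_amb = np.concatenate([np.full(60, 3.0), np.full(240, 1.0)])
print(delayed_input(p_amb, 3)[:5], delayed_input(p_amb, 3)[58:64])
```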
Our final investigation explored parameter bounding techniques to identify parameter regions in which statistical model failure will not occur. Statistical model failure occurs when a model predicts zero probability of a diver experiencing decompression sickness for an exposure that is known to produce symptoms. Using a metric related to the instantaneous risk, we identify regions where model failure will not occur and locate the boundaries of those regions using a root bounding technique. Several models are used to demonstrate the techniques, which may be employed to reduce the difficulty of model optimization in future investigations.
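A minimal sketch of the boundary-finding idea, assuming a toy one-parameter model in which predicted risk is zero exactly when the peak supersaturation never exceeds a threshold parameter; a bracketing root finder then locates the boundary of the zero-risk region. The model and metric here are simplified stand-ins for the dissertation's.

```python
import numpy as np
from scipy.optimize import brentq

def max_instantaneous_risk(threshold, profile, tau=30.0, dt=1.0):
    """For a given risk-threshold parameter, return the largest excess of
    supersaturation over the threshold along the exposure; the toy model
    assigns zero DCS probability exactly when this is <= 0."""
    tissue = profile[0]
    peak = -np.inf
    for p in profile:
        tissue += (p - tissue) * (1 - np.exp(-dt / tau))
        peak = max(peak, (tissue - p) - threshold)
    return peak

# Exposure known to produce symptoms (placeholder profile): the boundary of
# the model-failure region is the threshold at which predicted risk first
# drops to zero; brentq brackets and finds that root.
profile = np.concatenate([np.full(60, 3.0), np.full(240, 1.0)])
boundary = brentq(max_instantaneous_risk, 0.0, 5.0, args=(profile,))
print("threshold at which the model first predicts zero risk:", boundary)
```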
Abstract:
Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge arises in defining an algorithm with low communication, theoretical guarantees, and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator (message) algorithm for solving these issues. The algorithm applies feature selection in parallel for each subset using a regularized regression or Bayesian variable selection method, calculates the 'median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments to show excellent performance in feature selection, estimation, prediction, and computation time relative to usual competitors.
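A minimal sketch of the message idea, assuming a lasso for the per-subset feature selection and ordinary least squares for the per-subset refit; variable names, tuning values, and the selection rule details are illustrative, not the thesis's exact algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def message_estimate(X, y, n_subsets=5, alpha=0.1):
    """Partition rows into subsets, select features with a lasso on each
    subset, keep features whose median inclusion indicator across subsets
    is 1, refit OLS on the selected features within each subset, and
    average the subset estimates."""
    n, p = X.shape
    parts = np.array_split(np.random.default_rng(0).permutation(n), n_subsets)
    inclusion = np.zeros((n_subsets, p))
    for k, idx in enumerate(parts):
        inclusion[k] = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_ != 0
    selected = np.where(np.median(inclusion, axis=0) >= 0.5)[0]   # 'median' inclusion index
    coefs = np.zeros((n_subsets, selected.size))
    for k, idx in enumerate(parts):
        coefs[k] = LinearRegression().fit(X[idx][:, selected], y[idx]).coef_
    return selected, coefs.mean(axis=0)                            # averaged estimates

# Illustrative data: 10,000 samples, 50 features, 3 truly active.
rng = np.random.default_rng(4)
X = rng.normal(size=(10_000, 50))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(size=10_000)
print(message_estimate(X, y))
```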
While sample space partitioning is useful in handling datasets with a large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does not depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.
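A minimal sketch of the decorrelation-then-partition idea as it can be read from the abstract: transform the design by the inverse square root of XX^T, split the columns across workers, and run a sparse regression independently on each block. The scaling constants, the refinement step, and other details of the actual DECO procedure are omitted here; treat the specifics below as assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def deco_sketch(X, y, n_workers=4, alpha=0.05):
    """Decorrelate the full design by the inverse square root of X X^T (with a
    small ridge for numerical stability), partition the columns across
    workers, and fit a lasso independently on each decorrelated block."""
    n, p = X.shape
    w, V = np.linalg.eigh(X @ X.T / p + 1e-8 * np.eye(n))
    F = V @ np.diag(w ** -0.5) @ V.T                      # (X X^T / p)^{-1/2}
    X_dec, y_dec = F @ X, F @ y
    blocks = np.array_split(np.arange(p), n_workers)      # feature-space partition
    coef = np.zeros(p)
    for cols in blocks:                                    # embarrassingly parallel in practice
        coef[cols] = Lasso(alpha=alpha).fit(X_dec[:, cols], y_dec).coef_
    return coef

# Illustrative high-dimensional, correlated design with 3 active features.
rng = np.random.default_rng(5)
n, p = 200, 500
Z = rng.normal(size=(n, 1))
X = 0.7 * Z + rng.normal(size=(n, p))                      # shared factor induces correlation
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + rng.normal(size=n)
coef = deco_sketch(X, y)
print(np.nonzero(coef)[0][:10])                            # indices of selected features
```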
For datasets with both large sample sizes and high dimensionality, I propose a new "divide-and-conquer" framework, DEME (DECO-message), by leveraging both the DECO and the message algorithms. The new framework first partitions the dataset in the sample space into row cubes using message and then partitions the feature space of the cubes using DECO. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted in a computer in parallel. The results are then synthesized via the DECO and message algorithms in reverse order to produce the final output. The whole framework is extremely scalable.