7 resultados para Indicators. Conversions. Quantitative Research. Logistic Regression
em Collection Of Biostatistics Research Archive
Resumo:
In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation. A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being more efficient computationally than other Bayesian approaches. One of the contributions of this work is further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. Conclusions based on a real dataset of cancer cases in Taiwan are similar albeit less conclusive with respect to comparing the approaches. The success of the spectral basis with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models.
Resumo:
Assessments of environmental and territorial justice are similar in that both assess whether empirical relations between the spatial arrangement of undesirable hazards (or desirable public goods and services) and socio-demographic groups are consistent with notions of social justice, evaluating the spatial distribution of benefits and burdens (outcome equity) and the process that produces observed differences (process equity. Using proximity to major highways in NYC as a case study, we review methodological issues pertinent to both fields and discuss choice and computation of exposure measures, but focus primarily on measures of inequity. We present inequity measures computed from the empirically estimated joint distribution of exposure and demographics and compare them to traditional measures such as linear regression, logistic regression and Theil’s entropy index. We find that measures computed from the full joint distribution provide more unified, transparent and intuitive operational definitions of inequity and show how the approach can be used to structure siting and decommissioning decisions.
Resumo:
A marker that is strongly associated with outcome (or disease) is often assumed to be effective for classifying individuals according to their current or future outcome. However, for this to be true, the associated odds ratio must be of a magnitude rarely seen in epidemiological studies. An illustration of the relationship between odds ratios and receiver operating characteristic (ROC) curves shows, for example, that a marker with an odds ratio as high as 3 is in fact a very poor classification tool. If a marker identifies 10 percent of controls as positive (false positives) and has an odds ratio of 3, then it will only correctly identify 25 percent of cases as positive (true positives). Moreover, the authors illustrate that a single measure of association such as an odds ratio does not meaningfully describe a marker’s ability to classify subjects. Appropriate statistical methods for assessing and reporting the classification power of a marker are described. The serious pitfalls of using more traditional methods based on parameters in logistic regression models are illustrated.
Resumo:
This paper considers a wide class of semiparametric problems with a parametric part for some covariate effects and repeated evaluations of a nonparametric function. Special cases in our approach include marginal models for longitudinal/clustered data, conditional logistic regression for matched case-control studies, multivariate measurement error models, generalized linear mixed models with a semiparametric component, and many others. We propose profile-kernel and backfitting estimation methods for these problems, derive their asymptotic distributions, and show that in likelihood problems the methods are semiparametric efficient. While generally not true, with our methods profiling and backfitting are asymptotically equivalent. We also consider pseudolikelihood methods where some nuisance parameters are estimated from a different algorithm. The proposed methods are evaluated using simulation studies and applied to the Kenya hemoglobin data.
Resumo:
Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Due to the non-closed form of the likelihood, GLMMs are often fit by computational procedures like penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms like iterative weighted least squares (IWLS). High computational costs and memory space constraints often make it difficult to apply these iterative procedures to data sets with very large number of cases. This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM to subsetted versions of the data. Additional gains in efficiency are achieved for Poisson models, commonly used in disease mapping problems, because of their special collapsibility property which allows data reduction through summaries. Convergence of the proposed iterative procedure is guaranteed for canonical link functions. The strategy is applied to investigate the relationship between ischemic heart disease, socioeconomic status and age/gender category in New South Wales, Australia, based on outcome data consisting of approximately 33 million records. A simulation study demonstrates the algorithm's reliability in analyzing a data set with 12 million records for a (non-collapsible) logistic regression model.
Resumo:
We introduce a diagnostic test for the mixing distribution in a generalised linear mixed model. The test is based on the difference between the marginal maximum likelihood and conditional maximum likelihood estimates of a subset of the fixed effects in the model. We derive the asymptotic variance of this difference, and propose a test statistic that has a limiting chi-square distribution under the null hypothesis that the mixing distribution is correctly specified. For the important special case of the logistic regression model with random intercepts, we evaluate via simulation the power of the test in finite samples under several alternative distributional forms for the mixing distribution. We illustrate the method by applying it to data from a clinical trial investigating the effects of hormonal contraceptives in women.