915 resultados para Logistic regression model
Resumo:
We introduce a diagnostic test for the mixing distribution in a generalised linear mixed model. The test is based on the difference between the marginal maximum likelihood and conditional maximum likelihood estimates of a subset of the fixed effects in the model. We derive the asymptotic variance of this difference, and propose a test statistic that has a limiting chi-square distribution under the null hypothesis that the mixing distribution is correctly specified. For the important special case of the logistic regression model with random intercepts, we evaluate via simulation the power of the test in finite samples under several alternative distributional forms for the mixing distribution. We illustrate the method by applying it to data from a clinical trial investigating the effects of hormonal contraceptives in women.
Resumo:
Background mortality is an essential component of any forest growth and yield model. Forecasts of mortality contribute largely to the variability and accuracy of model predictions at the tree, stand and forest level. In the present study, I implement and evaluate state-of-the-art techniques to increase the accuracy of individual tree mortality models, similar to those used in many of the current variants of the Forest Vegetation Simulator, using data from North Idaho and Montana. The first technique addresses methods to correct for bias induced by measurement error typically present in competition variables. The second implements survival regression and evaluates its performance against the traditional logistic regression approach. I selected the regression calibration (RC) algorithm as a good candidate for addressing the measurement error problem. Two logistic regression models for each species were fitted, one ignoring the measurement error, which is the “naïve” approach, and the other applying RC. The models fitted with RC outperformed the naïve models in terms of discrimination when the competition variable was found to be statistically significant. The effect of RC was more obvious where measurement error variance was large and for more shade-intolerant species. The process of model fitting and variable selection revealed that past emphasis on DBH as a predictor variable for mortality, while producing models with strong metrics of fit, may make models less generalizable. The evaluation of the error variance estimator developed by Stage and Wykoff (1998), and core to the implementation of RC, in different spatial patterns and diameter distributions, revealed that the Stage and Wykoff estimate notably overestimated the true variance in all simulated stands, but those that are clustered. Results show a systematic bias even when all the assumptions made by the authors are guaranteed. I argue that this is the result of the Poisson-based estimate ignoring the overlapping area of potential plots around a tree. Effects, especially in the application phase, of the variance estimate justify suggested future efforts of improving the accuracy of the variance estimate. The second technique implemented and evaluated is a survival regression model that accounts for the time dependent nature of variables, such as diameter and competition variables, and the interval-censored nature of data collected from remeasured plots. The performance of the model is compared with the traditional logistic regression model as a tool to predict individual tree mortality. Validation of both approaches shows that the survival regression approach discriminates better between dead and alive trees for all species. In conclusion, I showed that the proposed techniques do increase the accuracy of individual tree mortality models, and are a promising first step towards the next generation of background mortality models. I have also identified the next steps to undertake in order to advance mortality models further.
Resumo:
In this thesis, we consider Bayesian inference on the detection of variance change-point models with scale mixtures of normal (for short SMN) distributions. This class of distributions is symmetric and thick-tailed and includes as special cases: Gaussian, Student-t, contaminated normal, and slash distributions. The proposed models provide greater flexibility to analyze a lot of practical data, which often show heavy-tail and may not satisfy the normal assumption. As to the Bayesian analysis, we specify some prior distributions for the unknown parameters in the variance change-point models with the SMN distributions. Due to the complexity of the joint posterior distribution, we propose an efficient Gibbs-type with Metropolis- Hastings sampling algorithm for posterior Bayesian inference. Thereafter, following the idea of [1], we consider the problems of the single and multiple change-point detections. The performance of the proposed procedures is illustrated and analyzed by simulation studies. A real application to the closing price data of U.S. stock market has been analyzed for illustrative purposes.
Resumo:
This study investigates the degree to which gender, ethnicity, relationship to perpetrator, and geomapped socio-economic factors significantly predict the incidence of childhood sexual abuse, physical abuse and non- abuse. These variables are then linked to geographic identifiers using geographic information system (GIS) technology to develop a geo-mapping framework for child sexual and physical abuse prevention.
Resumo:
Objectives. This paper seeks to assess the effect on statistical power of regression model misspecification in a variety of situations. ^ Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010).In this paper, three misspecified models (linear, categorical and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that comparing to linear or categorical models, the fractional polynomial models, with the higher correlations, provided a better approximation of the true relationship, which was illustrated by LOESS regression. In the third section, we present the results of simulation studies that demonstrate overall misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable used. The power of fractional polynomial model was close to that of linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model.^ Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect on statistical power of misspecification when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate the situations with unknown or complex correct model specification. Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.^
Resumo:
The standard analyses of survival data involve the assumption that survival and censoring are independent. When censoring and survival are related, the phenomenon is known as informative censoring. This paper examines the effects of an informative censoring assumption on the hazard function and the estimated hazard ratio provided by the Cox model.^ The limiting factor in all analyses of informative censoring is the problem of non-identifiability. Non-identifiability implies that it is impossible to distinguish a situation in which censoring and death are independent from one in which there is dependence. However, it is possible that informative censoring occurs. Examination of the literature indicates how others have approached the problem and covers the relevant theoretical background.^ Three models are examined in detail. The first model uses conditionally independent marginal hazards to obtain the unconditional survival function and hazards. The second model is based on the Gumbel Type A method for combining independent marginal distributions into bivariate distributions using a dependency parameter. Finally, a formulation based on a compartmental model is presented and its results described. For the latter two approaches, the resulting hazard is used in the Cox model in a simulation study.^ The unconditional survival distribution formed from the first model involves dependency, but the crude hazard resulting from this unconditional distribution is identical to the marginal hazard, and inferences based on the hazard are valid. The hazard ratios formed from two distributions following the Gumbel Type A model are biased by a factor dependent on the amount of censoring in the two populations and the strength of the dependency of death and censoring in the two populations. The Cox model estimates this biased hazard ratio. In general, the hazard resulting from the compartmental model is not constant, even if the individual marginal hazards are constant, unless censoring is non-informative. The hazard ratio tends to a specific limit.^ Methods of evaluating situations in which informative censoring is present are described, and the relative utility of the three models examined is discussed. ^
Resumo:
Logistic regression is one of the most important tools in the analysis of epidemiological and clinical data. Such data often contain missing values for one or more variables. Common practice is to eliminate all individuals for whom any information is missing. This deletion approach does not make efficient use of available information and often introduces bias.^ Two methods were developed to estimate logistic regression coefficients for mixed dichotomous and continuous covariates including partially observed binary covariates. The data were assumed missing at random (MAR). One method (PD) used predictive distribution as weight to calculate the average of the logistic regressions performing on all possible values of missing observations, and the second method (RS) used a variant of resampling technique. Additional seven methods were compared with these two approaches in a simulation study. They are: (1) Analysis based on only the complete cases, (2) Substituting the mean of the observed values for the missing value, (3) An imputation technique based on the proportions of observed data, (4) Regressing the partially observed covariates on the remaining continuous covariates, (5) Regressing the partially observed covariates on the remaining continuous covariates conditional on response variable, (6) Regressing the partially observed covariates on the remaining continuous covariates and response variable, and (7) EM algorithm. Both proposed methods showed smaller standard errors (s.e.) for the coefficient involving the partially observed covariate and for the other coefficients as well. However, both methods, especially PD, are computationally demanding; thus for analysis of large data sets with partially observed covariates, further refinement of these approaches is needed. ^
Resumo:
The problem of analyzing data with updated measurements in the time-dependent proportional hazards model arises frequently in practice. One available option is to reduce the number of intervals (or updated measurements) to be included in the Cox regression model. We empirically investigated the bias of the estimator of the time-dependent covariate while varying the effect of failure rate, sample size, true values of the parameters and the number of intervals. We also evaluated how often a time-dependent covariate needs to be collected and assessed the effect of sample size and failure rate on the power of testing a time-dependent effect.^ A time-dependent proportional hazards model with two binary covariates was considered. The time axis was partitioned into k intervals. The baseline hazard was assumed to be 1 so that the failure times were exponentially distributed in the ith interval. A type II censoring model was adopted to characterize the failure rate. The factors of interest were sample size (500, 1000), type II censoring with failure rates of 0.05, 0.10, and 0.20, and three values for each of the non-time-dependent and time-dependent covariates (1/4,1/2,3/4).^ The mean of the bias of the estimator of the coefficient of the time-dependent covariate decreased as sample size and number of intervals increased whereas the mean of the bias increased as failure rate and true values of the covariates increased. The mean of the bias of the estimator of the coefficient was smallest when all of the updated measurements were used in the model compared with two models that used selected measurements of the time-dependent covariate. For the model that included all the measurements, the coverage rates of the estimator of the coefficient of the time-dependent covariate was in most cases 90% or more except when the failure rate was high (0.20). The power associated with testing a time-dependent effect was highest when all of the measurements of the time-dependent covariate were used. An example from the Systolic Hypertension in the Elderly Program Cooperative Research Group is presented. ^
Resumo:
This paper addresses the question of maximizing classifier accuracy for classifying task-related mental activity from Magnetoencelophalography (MEG) data. We propose the use of different sources of information and introduce an automatic channel selection procedure. To determine an informative set of channels, our approach combines a variety of machine learning algorithms: feature subset selection methods, classifiers based on regularized logistic regression, information fusion, and multiobjective optimization based on probabilistic modeling of the search space. The experimental results show that our proposal is able to improve classification accuracy compared to approaches whose classifiers use only one type of MEG information or for which the set of channels is fixed a priori.
Resumo:
This paper analyses the relationship between productive efficiency and online-social-networks (OSN) in Spanish telecommunications firms. A data-envelopment-analysis (DEA) is used and several indicators of business ?social Media? activities are incorporated. A super-efficiency analysis and bootstrapping techniques are performed to increase the model?s robustness and accuracy. Then, a logistic regression model is applied to characterise factors and drivers of good performance in OSN. Results reveal the company?s ability to absorb and utilise OSNs as a key factor in improving the productive efficiency. This paper presents a model for assessing the strategic performance of the presence and activity in OSN.
Resumo:
logitcprplot can be used after logistic regression for graphing a component-plus-residual plot (a.k.a. partial residual plot) for a given predictor, including a lowess, local polynomial, restricted cubic spline, fractional polynomial, penalized spline, regression spline, running line, or adaptive variable span running line smooth
Resumo:
rrlogit fits a maximum-likelihood logistic regression for randomized response data.
Resumo:
Also issued as thesis (M.S.) University of Illinois.
Finite mixture regression model with random effects: application to neonatal hospital length of stay
Resumo:
A two-component mixture regression model that allows simultaneously for heterogeneity and dependency among observations is proposed. By specifying random effects explicitly in the linear predictor of the mixture probability and the mixture components, parameter estimation is achieved by maximising the corresponding best linear unbiased prediction type log-likelihood. Approximate residual maximum likelihood estimates are obtained via an EM algorithm in the manner of generalised linear mixed model (GLMM). The method can be extended to a g-component mixture regression model with the component density from the exponential family, leading to the development of the class of finite mixture GLMM. For illustration, the method is applied to analyse neonatal length of stay (LOS). It is shown that identification of pertinent factors that influence hospital LOS can provide important information for health care planning and resource allocation. (C) 2002 Elsevier Science B.V. All rights reserved.