2 resultados para LOGISTIC REGRESSION WITH STATE-DEPENDENT SAMPLE SELECTION
em Digital Commons - Michigan Tech
Resumo:
Background mortality is an essential component of any forest growth and yield model. Forecasts of mortality contribute largely to the variability and accuracy of model predictions at the tree, stand and forest level. In the present study, I implement and evaluate state-of-the-art techniques to increase the accuracy of individual tree mortality models, similar to those used in many of the current variants of the Forest Vegetation Simulator, using data from North Idaho and Montana. The first technique addresses methods to correct for bias induced by measurement error typically present in competition variables. The second implements survival regression and evaluates its performance against the traditional logistic regression approach. I selected the regression calibration (RC) algorithm as a good candidate for addressing the measurement error problem. Two logistic regression models for each species were fitted, one ignoring the measurement error, which is the “naïve” approach, and the other applying RC. The models fitted with RC outperformed the naïve models in terms of discrimination when the competition variable was found to be statistically significant. The effect of RC was more obvious where measurement error variance was large and for more shade-intolerant species. The process of model fitting and variable selection revealed that past emphasis on DBH as a predictor variable for mortality, while producing models with strong metrics of fit, may make models less generalizable. The evaluation of the error variance estimator developed by Stage and Wykoff (1998), and core to the implementation of RC, in different spatial patterns and diameter distributions, revealed that the Stage and Wykoff estimate notably overestimated the true variance in all simulated stands, but those that are clustered. Results show a systematic bias even when all the assumptions made by the authors are guaranteed. I argue that this is the result of the Poisson-based estimate ignoring the overlapping area of potential plots around a tree. Effects, especially in the application phase, of the variance estimate justify suggested future efforts of improving the accuracy of the variance estimate. The second technique implemented and evaluated is a survival regression model that accounts for the time dependent nature of variables, such as diameter and competition variables, and the interval-censored nature of data collected from remeasured plots. The performance of the model is compared with the traditional logistic regression model as a tool to predict individual tree mortality. Validation of both approaches shows that the survival regression approach discriminates better between dead and alive trees for all species. In conclusion, I showed that the proposed techniques do increase the accuracy of individual tree mortality models, and are a promising first step towards the next generation of background mortality models. I have also identified the next steps to undertake in order to advance mortality models further.
Resumo:
Several deterministic and probabilistic methods are used to evaluate the probability of seismically induced liquefaction of a soil. The probabilistic models usually possess some uncertainty in that model and uncertainties in the parameters used to develop that model. These model uncertainties vary from one statistical model to another. Most of the model uncertainties are epistemic, and can be addressed through appropriate knowledge of the statistical model. One such epistemic model uncertainty in evaluating liquefaction potential using a probabilistic model such as logistic regression is sampling bias. Sampling bias is the difference between the class distribution in the sample used for developing the statistical model and the true population distribution of liquefaction and non-liquefaction instances. Recent studies have shown that sampling bias can significantly affect the predicted probability using a statistical model. To address this epistemic uncertainty, a new approach was developed for evaluating the probability of seismically-induced soil liquefaction, in which a logistic regression model in combination with Hosmer-Lemeshow statistic was used. This approach was used to estimate the population (true) distribution of liquefaction to non-liquefaction instances of standard penetration test (SPT) and cone penetration test (CPT) based most updated case histories. Apart from this, other model uncertainties such as distribution of explanatory variables and significance of explanatory variables were also addressed using KS test and Wald statistic respectively. Moreover, based on estimated population distribution, logistic regression equations were proposed to calculate the probability of liquefaction for both SPT and CPT based case history. Additionally, the proposed probability curves were compared with existing probability curves based on SPT and CPT case histories.