896 resultados para Nonparametric regression
Resumo:
In this paper, we study panel count data with informative observation times. We assume nonparametric and semiparametric proportional rate models for the underlying recurrent event process, where the form of the baseline rate function is left unspecified and a subject-specific frailty variable inflates or deflates the rate function multiplicatively. The proposed models allow the recurrent event processes and observation times to be correlated through their connections with the unobserved frailty; moreover, the distributions of both the frailty variable and observation times are considered as nuisance parameters. The baseline rate function and the regression parameters are estimated by maximizing a conditional likelihood function of observed event counts and solving estimation equations. Large sample properties of the proposed estimators are studied. Numerical studies demonstrate that the proposed estimation procedures perform well for moderate sample sizes. An application to a bladder tumor study is presented to illustrate the use of the proposed methods.
Resumo:
Latent class regression models are useful tools for assessing associations between covariates and latent variables. However, evaluation of key model assumptions cannot be performed using methods from standard regression models due to the unobserved nature of latent outcome variables. This paper presents graphical diagnostic tools to evaluate whether or not latent class regression models adhere to standard assumptions of the model: conditional independence and non-differential measurement. An integral part of these methods is the use of a Markov Chain Monte Carlo estimation procedure. Unlike standard maximum likelihood implementations for latent class regression model estimation, the MCMC approach allows us to calculate posterior distributions and point estimates of any functions of parameters. It is this convenience that allows us to provide the diagnostic methods that we introduce. As a motivating example we present an analysis focusing on the association between depression and socioeconomic status, using data from the Epidemiologic Catchment Area study. We consider a latent class regression analysis investigating the association between depression and socioeconomic status measures, where the latent variable depression is regressed on education and income indicators, in addition to age, gender, and marital status variables. While the fitted latent class regression model yields interesting results, the model parameters are found to be invalid due to the violation of model assumptions. The violation of these assumptions is clearly identified by the presented diagnostic plots. These methods can be applied to standard latent class and latent class regression models, and the general principle can be extended to evaluate model assumptions in other types of models.
Resumo:
We develop fast fitting methods for generalized functional linear models. An undersmooth of the functional predictor is obtained by projecting on a large number of smooth eigenvectors and the coefficient function is estimated using penalized spline regression. Our method can be applied to many functional data designs including functions measured with and without error, sparsely or densely sampled. The methods also extend to the case of multiple functional predictors or functional predictors with a natural multilevel structure. Our approach can be implemented using standard mixed effects software and is computationally fast. Our methodology is motivated by a diffusion tensor imaging (DTI) study. The aim of this study is to analyze differences between various cerebral white matter tract property measurements of multiple sclerosis (MS) patients and controls. While the statistical developments proposed here were motivated by the DTI study, the methodology is designed and presented in generality and is applicable to many other areas of scientific research. An online appendix provides R implementations of all simulations.
Resumo:
Permutation tests are useful for drawing inferences from imaging data because of their flexibility and ability to capture features of the brain that are difficult to capture parametrically. However, most implementations of permutation tests ignore important confounding covariates. To employ covariate control in a nonparametric setting we have developed a Markov chain Monte Carlo (MCMC) algorithm for conditional permutation testing using propensity scores. We present the first use of this methodology for imaging data. Our MCMC algorithm is an extension of algorithms developed to approximate exact conditional probabilities in contingency tables, logit, and log-linear models. An application of our non-parametric method to remove potential bias due to the observed covariates is presented.
Resumo:
This paper considers statistical models in which two different types of events, such as the diagnosis of a disease and the remission of the disease, occur alternately over time and are observed subject to right censoring. We propose nonparametric estimators for the joint distribution of bivariate recurrence times and the marginal distribution of the first recurrence time. In general, the marginal distribution of the second recurrence time cannot be estimated due to an identifiability problem, but a conditional distribution of the second recurrence time can be estimated non-parametrically. In literature, statistical methods have been developed to estimate the joint distribution of bivariate recurrence times based on data of the first pair of censored bivariate recurrence times. These methods are efficient in the current model because recurrence times of higher orders are not used. Asymptotic properties of the estimators are established. Numerical studies demonstrate the estimator performs well with practical sample sizes. We apply the proposed method to a Denmark psychiatric case register data set for illustration of the methods and theory.
Resumo:
Northern hardwood management was assessed throughout the state of Michigan using data collected on recently harvested stands in 2010 and 2011. Methods of forensic estimation of diameter at breast height were compared and an ideal, localized equation form was selected for use in reconstructing pre-harvest stand structures. Comparisons showed differences in predictive ability among available equation forms which led to substantial financial differences when used to estimate the value of removed timber. Management on all stands was then compared among state, private, and corporate landowners. Comparisons of harvest intensities against a liberal interpretation of a well-established management guideline showed that approximately one third of harvests were conducted in a manner which may imply that the guideline was followed. One third showed higher levels of removals than recommended, and one third of harvests were less intensive than recommended. Multiple management guidelines and postulated objectives were then synthesized into a novel system of harvest taxonomy, against which all harvests were compared. This further comparison showed approximately the same proportions of harvests, while distinguishing sanitation cuts and the future productive potential of harvests cut more intensely than suggested by guidelines. Stand structures are commonly represented using diameter distributions. Parametric and nonparametric techniques for describing diameter distributions were employed on pre-harvest and post-harvest data. A common polynomial regression procedure was found to be highly sensitive to the method of histogram construction which provides the data points for the regression. The discriminative ability of kernel density estimation was substantially different from that of the polynomial regression technique.
Resumo:
Background mortality is an essential component of any forest growth and yield model. Forecasts of mortality contribute largely to the variability and accuracy of model predictions at the tree, stand and forest level. In the present study, I implement and evaluate state-of-the-art techniques to increase the accuracy of individual tree mortality models, similar to those used in many of the current variants of the Forest Vegetation Simulator, using data from North Idaho and Montana. The first technique addresses methods to correct for bias induced by measurement error typically present in competition variables. The second implements survival regression and evaluates its performance against the traditional logistic regression approach. I selected the regression calibration (RC) algorithm as a good candidate for addressing the measurement error problem. Two logistic regression models for each species were fitted, one ignoring the measurement error, which is the “naïve” approach, and the other applying RC. The models fitted with RC outperformed the naïve models in terms of discrimination when the competition variable was found to be statistically significant. The effect of RC was more obvious where measurement error variance was large and for more shade-intolerant species. The process of model fitting and variable selection revealed that past emphasis on DBH as a predictor variable for mortality, while producing models with strong metrics of fit, may make models less generalizable. The evaluation of the error variance estimator developed by Stage and Wykoff (1998), and core to the implementation of RC, in different spatial patterns and diameter distributions, revealed that the Stage and Wykoff estimate notably overestimated the true variance in all simulated stands, but those that are clustered. Results show a systematic bias even when all the assumptions made by the authors are guaranteed. I argue that this is the result of the Poisson-based estimate ignoring the overlapping area of potential plots around a tree. Effects, especially in the application phase, of the variance estimate justify suggested future efforts of improving the accuracy of the variance estimate. The second technique implemented and evaluated is a survival regression model that accounts for the time dependent nature of variables, such as diameter and competition variables, and the interval-censored nature of data collected from remeasured plots. The performance of the model is compared with the traditional logistic regression model as a tool to predict individual tree mortality. Validation of both approaches shows that the survival regression approach discriminates better between dead and alive trees for all species. In conclusion, I showed that the proposed techniques do increase the accuracy of individual tree mortality models, and are a promising first step towards the next generation of background mortality models. I have also identified the next steps to undertake in order to advance mortality models further.
Resumo:
In this thesis, we consider Bayesian inference on the detection of variance change-point models with scale mixtures of normal (for short SMN) distributions. This class of distributions is symmetric and thick-tailed and includes as special cases: Gaussian, Student-t, contaminated normal, and slash distributions. The proposed models provide greater flexibility to analyze a lot of practical data, which often show heavy-tail and may not satisfy the normal assumption. As to the Bayesian analysis, we specify some prior distributions for the unknown parameters in the variance change-point models with the SMN distributions. Due to the complexity of the joint posterior distribution, we propose an efficient Gibbs-type with Metropolis- Hastings sampling algorithm for posterior Bayesian inference. Thereafter, following the idea of [1], we consider the problems of the single and multiple change-point detections. The performance of the proposed procedures is illustrated and analyzed by simulation studies. A real application to the closing price data of U.S. stock market has been analyzed for illustrative purposes.
Resumo:
This morning Dr. Battle will introduce descriptive statistics and linear regression and how to apply these concepts in mathematical modeling. You will also learn how to use a spreadsheet to help with statistical analysis and to create graphs.
Resumo:
OBJECTIVES: This paper is concerned with checking goodness-of-fit of binary logistic regression models. For the practitioners of data analysis, the broad classes of procedures for checking goodness-of-fit available in the literature are described. The challenges of model checking in the context of binary logistic regression are reviewed. As a viable solution, a simple graphical procedure for checking goodness-of-fit is proposed. METHODS: The graphical procedure proposed relies on pieces of information available from any logistic analysis; the focus is on combining and presenting these in an informative way. RESULTS: The information gained using this approach is presented with three examples. In the discussion, the proposed method is put into context and compared with other graphical procedures for checking goodness-of-fit of binary logistic models available in the literature. CONCLUSION: A simple graphical method can significantly improve the understanding of any logistic regression analysis and help to prevent faulty conclusions.
Resumo:
A combinatorial protocol (CP) is introduced here to interface it with the multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR is primarily based on the restriction of entry of correlated variables to the model development stage. It has been used for the analysis of Selwood et al data set [16], and the obtained models are compared with those reported from GFA [8] and MUSEUM [9] approaches. For this data set CP-MLR could identify three highly independent models (27, 28 and 31) with Q2 value in the range of 0.632-0.518. Also, these models are divergent and unique. Even though, the present study does not share any models with GFA [8], and MUSEUM [9] results, there are several descriptors common to all these studies, including the present one. Also a simulation is carried out on the same data set to explain the model formation in CP-MLR. The results demonstrate that the proposed method should be able to offer solutions to data sets with 50 to 60 descriptors in reasonable time frame. By carefully selecting the inter-parameter correlation cutoff values in CP-MLR one can identify divergent models and handle data sets larger than the present one without involving excessive computer time.