986 resultados para switching regression model


Relevância:

90.00% 90.00%

Publicador:

Resumo:

We introduce a diagnostic test for the mixing distribution in a generalised linear mixed model. The test is based on the difference between the marginal maximum likelihood and conditional maximum likelihood estimates of a subset of the fixed effects in the model. We derive the asymptotic variance of this difference, and propose a test statistic that has a limiting chi-square distribution under the null hypothesis that the mixing distribution is correctly specified. For the important special case of the logistic regression model with random intercepts, we evaluate via simulation the power of the test in finite samples under several alternative distributional forms for the mixing distribution. We illustrate the method by applying it to data from a clinical trial investigating the effects of hormonal contraceptives in women.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Latent class regression models are useful tools for assessing associations between covariates and latent variables. However, evaluation of key model assumptions cannot be performed using methods from standard regression models due to the unobserved nature of latent outcome variables. This paper presents graphical diagnostic tools to evaluate whether or not latent class regression models adhere to standard assumptions of the model: conditional independence and non-differential measurement. An integral part of these methods is the use of a Markov Chain Monte Carlo estimation procedure. Unlike standard maximum likelihood implementations for latent class regression model estimation, the MCMC approach allows us to calculate posterior distributions and point estimates of any functions of parameters. It is this convenience that allows us to provide the diagnostic methods that we introduce. As a motivating example we present an analysis focusing on the association between depression and socioeconomic status, using data from the Epidemiologic Catchment Area study. We consider a latent class regression analysis investigating the association between depression and socioeconomic status measures, where the latent variable depression is regressed on education and income indicators, in addition to age, gender, and marital status variables. While the fitted latent class regression model yields interesting results, the model parameters are found to be invalid due to the violation of model assumptions. The violation of these assumptions is clearly identified by the presented diagnostic plots. These methods can be applied to standard latent class and latent class regression models, and the general principle can be extended to evaluate model assumptions in other types of models.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Background mortality is an essential component of any forest growth and yield model. Forecasts of mortality contribute largely to the variability and accuracy of model predictions at the tree, stand and forest level. In the present study, I implement and evaluate state-of-the-art techniques to increase the accuracy of individual tree mortality models, similar to those used in many of the current variants of the Forest Vegetation Simulator, using data from North Idaho and Montana. The first technique addresses methods to correct for bias induced by measurement error typically present in competition variables. The second implements survival regression and evaluates its performance against the traditional logistic regression approach. I selected the regression calibration (RC) algorithm as a good candidate for addressing the measurement error problem. Two logistic regression models for each species were fitted, one ignoring the measurement error, which is the “naïve” approach, and the other applying RC. The models fitted with RC outperformed the naïve models in terms of discrimination when the competition variable was found to be statistically significant. The effect of RC was more obvious where measurement error variance was large and for more shade-intolerant species. The process of model fitting and variable selection revealed that past emphasis on DBH as a predictor variable for mortality, while producing models with strong metrics of fit, may make models less generalizable. The evaluation of the error variance estimator developed by Stage and Wykoff (1998), and core to the implementation of RC, in different spatial patterns and diameter distributions, revealed that the Stage and Wykoff estimate notably overestimated the true variance in all simulated stands, but those that are clustered. Results show a systematic bias even when all the assumptions made by the authors are guaranteed. I argue that this is the result of the Poisson-based estimate ignoring the overlapping area of potential plots around a tree. Effects, especially in the application phase, of the variance estimate justify suggested future efforts of improving the accuracy of the variance estimate. The second technique implemented and evaluated is a survival regression model that accounts for the time dependent nature of variables, such as diameter and competition variables, and the interval-censored nature of data collected from remeasured plots. The performance of the model is compared with the traditional logistic regression model as a tool to predict individual tree mortality. Validation of both approaches shows that the survival regression approach discriminates better between dead and alive trees for all species. In conclusion, I showed that the proposed techniques do increase the accuracy of individual tree mortality models, and are a promising first step towards the next generation of background mortality models. I have also identified the next steps to undertake in order to advance mortality models further.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Ecosystems are faced with high rates of species loss which has consequences for their functions and services. To assess the effects of plant species diversity on the nitrogen (N) cycle, we developed a model for monthly mean nitrate (NO3-N) concentrations in soil solution in 0-30 cm mineral soil depth using plant species and functional group richness and functional composition as drivers and assessing the effects of conversion of arable land to grassland, spatially heterogeneous soil properties, and climate. We used monthly mean NO3-N concentrations from 62 plots of a grassland plant diversity experiment from 2003 to 2006. Plant species richness (1-60) and functional group composition (1-4 functional groups: legumes, grasses, non-leguminous tall herbs, non-leguminous small herbs) were manipulated in a factorial design. Plant community composition, time since conversion from arable land to grassland, soil texture, and climate data (precipitation, soil moisture, air and soil temperature) were used to develop one general Bayesian multiple regression model for the 62 plots to allow an in-depth evaluation using the experimental design. The model simulated NO3-N concentrations with an overall Bayesian coefficient of determination of 0.48. The temporal course of NO3-N concentrations was simulated differently well for the individual plots with a maximum plot-specific Nash-Sutcliffe Efficiency of 0.57. The model shows that NO3-N concentrations decrease with species richness, but this relation reverses if more than approx. 25 % of legume species are included in the mixture. Presence of legumes increases and presence of grasses decreases NO3-N concentrations compared to mixtures containing only small and tall herbs. Altogether, our model shows that there is a strong influence of plant community composition on NO3-N concentrations.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Consider a nonparametric regression model Y=mu*(X) + e, where the explanatory variables X are endogenous and e satisfies the conditional moment restriction E[e|W]=0 w.p.1 for instrumental variables W. It is well known that in these models the structural parameter mu* is 'ill-posed' in the sense that the function mapping the data to mu* is not continuous. In this paper, we derive the efficiency bounds for estimating linear functionals E[p(X)mu*(X)] and int_{supp(X)}p(x)mu*(x)dx, where p is a known weight function and supp(X) the support of X, without assuming mu* to be well-posed or even identified.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, C$\sb{\rm p}$ and S$\sb{\rm p}$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP$\sb{\rm m}$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.^ The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches but no differences are detected between the performances of C$\sb{\rm p}$ and S$\sb{\rm p}$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.^ Only the random split estimator is conditionally (on $\\beta$) unbiased, however MSEP$\sb{\rm m}$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP$\sb{\rm m}$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.^ To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment. ^

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Traditional comparison of standardized mortality ratios (SMRs) can be misleading if the age-specific mortality ratios are not homogeneous. For this reason, a regression model has been developed which incorporates the mortality ratio as a function of age. This model is then applied to mortality data from an occupational cohort study. The nature of the occupational data necessitates the investigation of mortality ratios which increase with age. These occupational data are used primarily to illustrate and develop the statistical methodology.^ The age-specific mortality ratio (MR) for the covariates of interest can be written as MR(,ij...m) = ((mu)(,ij...m)/(theta)(,ij...m)) = r(.)exp (Z('')(,ij...m)(beta)) where (mu)(,ij...m) and (theta)(,ij...m) denote the force of mortality in the study and chosen standard populations in the ij...m('th) stratum, respectively, r is the intercept, Z(,ij...m) is the vector of covariables associated with the i('th) age interval, and (beta) is a vector of regression coefficients associated with these covariables. A Newton-Raphson iterative procedure has been used for determining the maximum likelihood estimates of the regression coefficients.^ This model provides a statistical method for a logical and easily interpretable explanation of an occupational cohort mortality experience. Since it gives a reasonable fit to the mortality data, it can also be concluded that the model is fairly realistic. The traditional statistical method for the analysis of occupational cohort mortality data is to present a summary index such as the SMR under the assumption of constant (homogeneous) age-specific mortality ratios. Since the mortality ratios for occupational groups usually increase with age, the homogeneity assumption of the age-specific mortality ratios is often untenable. The traditional method of comparing SMRs under the homogeneity assumption is a special case of this model, without age as a covariate.^ This model also provides a statistical technique to evaluate the relative risk between two SMRs or a dose-response relationship among several SMRs. The model presented has application in the medical, demographic and epidemiologic areas. The methods developed in this thesis are suitable for future analyses of mortality or morbidity data when the age-specific mortality/morbidity experience is a function of age or when there is an interaction effect between confounding variables needs to be evaluated. ^

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In regression analysis, covariate measurement error occurs in many applications. The error-prone covariates are often referred to as latent variables. In this proposed study, we extended the study of Chan et al. (2008) on recovering latent slope in a simple regression model to that in a multiple regression model. We presented an approach that applied the Monte Carlo method in the Bayesian framework to the parametric regression model with the measurement error in an explanatory variable. The proposed estimator applied the conditional expectation of latent slope given the observed outcome and surrogate variables in the multiple regression models. A simulation study was presented showing that the method produces estimator that is efficient in the multiple regression model, especially when the measurement error variance of surrogate variable is large.^

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The tobacco-specific nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) is an obvious carcinogen for lung cancer. Since CBMN (Cytokinesis-blocked micronucleus) has been found to be extremely sensitive to NNK-induced genetic damage, it is a potential important factor to predict the lung cancer risk. However, the association between lung cancer and NNK-induced genetic damage measured by CBMN assay has not been rigorously examined. ^ This research develops a methodology to model the chromosomal changes under NNK-induced genetic damage in a logistic regression framework in order to predict the occurrence of lung cancer. Since these chromosomal changes were usually not observed very long due to laboratory cost and time, a resampling technique was applied to generate the Markov chain of the normal and the damaged cell for each individual. A joint likelihood between the resampled Markov chains and the logistic regression model including transition probabilities of this chain as covariates was established. The Maximum likelihood estimation was applied to carry on the statistical test for comparison. The ability of this approach to increase discriminating power to predict lung cancer was compared to a baseline "non-genetic" model. ^ Our method offered an option to understand the association between the dynamic cell information and lung cancer. Our study indicated the extent of DNA damage/non-damage using the CBMN assay provides critical information that impacts public health studies of lung cancer risk. This novel statistical method could simultaneously estimate the process of DNA damage/non-damage and its relationship with lung cancer for each individual.^

Relevância:

90.00% 90.00%

Publicador:

Resumo:

It is well known that an identification problem exists in the analysis of age-period-cohort data because of the relationship among the three factors (date of birth + age at death = date of death). There are numerous suggestions about how to analyze the data. No one solution has been satisfactory. The purpose of this study is to provide another analytic method by extending the Cox's lifetable regression model with time-dependent covariates. The new approach contains the following features: (1) It is based on the conditional maximum likelihood procedure using a proportional hazard function described by Cox (1972), treating the age factor as the underlying hazard to estimate the parameters for the cohort and period factors. (2) The model is flexible so that both the cohort and period factors can be treated as dummy or continuous variables, and the parameter estimations can be obtained for numerous combinations of variables as in a regression analysis. (3) The model is applicable even when the time period is unequally spaced.^ Two specific models are considered to illustrate the new approach and applied to the U.S. prostate cancer data. We find that there are significant differences between all cohorts and there is a significant period effect for both whites and nonwhites. The underlying hazard increases exponentially with age indicating that old people have much higher risk than young people. A log transformation of relative risk shows that the prostate cancer risk declined in recent cohorts for both models. However, prostate cancer risk declined 5 cohorts (25 years) earlier for whites than for nonwhites under the period factor model (0 0 0 1 1 1 1). These latter results are similar to the previous study by Holford (1983).^ The new approach offers a general method to analyze the age-period-cohort data without using any arbitrary constraint in the model. ^

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Interannual environmental variability in Peru is dominated by the El Niño Southern Oscillation (ENSO). The most dramatic changes are associated with the warm El Niño (EN) phase (opposite the cold La Niña phase), which disrupts the normal coastal upwelling and affects the dynamics of many coastal marine and terrestrial resources. This study presents a trophic model for Sechura Bay, located at the northern extension of the Peruvian upwelling system, where ENSO-induced environmental variability is most extreme. Using an initial steady-state model for the year 1996, we explore the dynamics of the ecosystem through the year 2003 (including the strong EN of 1997/98 and the weaker EN of 2002/03). Based on support from literature, we force biomass of several non-trophically-mediated 'drivers' (e.g. Scallops, Benthic detritivores, Octopus, and Littoral fish) to observe whether the fit between historical and simulated changes (by the trophic model) is improved. The results indicate that the Sechura Bay Ecosystem is a relatively inefficient system from a community energetics point of view, likely due to the periodic perturbations of ENSO. A combination of high system productivity and low trophic level target species of invertebrates (i.e. scallops) and fish (i.e. anchoveta) results in high catches and an efficient fishery. The importance of environmental drivers is suggested, given the relatively small improvements in the fit of the simulation with the addition of trophic drivers on remaining functional groups' dynamics. An additional multivariate regression model is presented for the scallop Argopecten purpuratus, which demonstrates a significant correlation between both spawning stock size and riverine discharge-mediated mortality on catch levels. These results are discussed in the context of the appropriateness of trophodynamic modeling in relatively open systems, and how management strategies may be focused given the highly environmentally influenced marine resources of the region.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A research has been carried out in two-lanehighways in the Madrid Region to propose an alternativemodel for the speed-flowrelationship using regular loop data. The model is different in shape and, in some cases, slopes with respect to the contents of Highway Capacity Manual (HCM). A model is proposed for a mountainous area road, something for which the HCM does not provide explicitly a solution. The problem of a mountain road with high flows to access a popular recreational area is discussed, and some solutions are proposed. Up to 7 one-way sections of two-lanehighways have been selected, aiming at covering a significant number of different characteristics, to verify the proposed method the different classes of highways on which the Manual classifies them. In order to enunciate the model and to verify the basic variables of these types of roads a high number of data have been used. The counts were collected in the same way that the Madrid Region Highway Agency performs their counts. A total of 1.471 hours have been collected, in periods of 5 minutes. The models have been verified by means of specific statistical test (R2, T-Student, Durbin-Watson, ANOVA, etc.) and with the diagnostics of the contrast of assumptions (normality, linearity, homoscedasticity and independence). The model proposed for this type of highways with base conditions, can explain the different behaviors as traffic volumes increase, and follows a polynomial multiple regression model of order 3, S shaped. As secondary results of this research, the levels of service and the capacities of this road have been measured with the 2000 HCM methodology, and the results discussed. © 2011 Published by Elsevier Ltd.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper analyses the relationship between productive efficiency and online-social-networks (OSN) in Spanish telecommunications firms. A data-envelopment-analysis (DEA) is used and several indicators of business ?social Media? activities are incorporated. A super-efficiency analysis and bootstrapping techniques are performed to increase the model?s robustness and accuracy. Then, a logistic regression model is applied to characterise factors and drivers of good performance in OSN. Results reveal the company?s ability to absorb and utilise OSNs as a key factor in improving the productive efficiency. This paper presents a model for assessing the strategic performance of the presence and activity in OSN.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Road accidents are a very relevant issue in many countries and macroeconomic models are very frequently applied by academia and administrations to reduce their frequency and consequences. The selection of explanatory variables and response transformation parameter within the Bayesian framework for the selection of the set of explanatory variables a TIM and 3IM (two input and three input models) procedures are proposed. The procedure also uses the DIC and pseudo -R2 goodness of fit criteria. The model to which the methodology is applied is a dynamic regression model with Box-Cox transformation (BCT) for the explanatory variables and autorgressive (AR) structure for the response. The initial set of 22 explanatory variables are identified. The effects of these factors on the fatal accident frequency in Spain, during 2000-2012, are estimated. The dependent variable is constructed considering the stochastic trend component.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Authors discuss the effects that economic crises generate on the global market shares of tourism destinations, through a series of potential transmission mechanisms based on the main economic competitiveness determinants identified in the previous literature using a non-linear approach. Specifically a Markov Switching Regression approach is used to estimate the effect of two basic transmission mechanisms: reductions of internal and external tourism demands and falling investment.