94 resultados para predictive regression model
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Resumo:
We have considered a Bayesian approach for the nonlinear regression model by replacing the normal distribution on the error term by some skewed distributions, which account for both skewness and heavy tails or skewness alone. The type of data considered in this paper concerns repeated measurements taken in time on a set of individuals. Such multiple observations on the same individual generally produce serially correlated outcomes. Thus, additionally, our model does allow for a correlation between observations made from the same individual. We have illustrated the procedure using a data set to study the growth curves of a clinic measurement of a group of pregnant women from an obstetrics clinic in Santiago, Chile. Parameter estimation and prediction were carried out using appropriate posterior simulation schemes based in Markov Chain Monte Carlo methods. Besides the deviance information criterion (DIC) and the conditional predictive ordinate (CPO), we suggest the use of proper scoring rules based on the posterior predictive distribution for comparing models. For our data set, all these criteria chose the skew-t model as the best model for the errors. These DIC and CPO criteria are also validated, for the model proposed here, through a simulation study. As a conclusion of this study, the DIC criterion is not trustful for this kind of complex model.
Resumo:
A bathtub-shaped failure rate function is very useful in survival analysis and reliability studies. The well-known lifetime distributions do not have this property. For the first time, we propose a location-scale regression model based on the logarithm of an extended Weibull distribution which has the ability to deal with bathtub-shaped failure rate functions. We use the method of maximum likelihood to estimate the model parameters and some inferential procedures are presented. We reanalyze a real data set under the new model and the log-modified Weibull regression model. We perform a model check based on martingale-type residuals and generated envelopes and the statistics AIC and BIC to select appropriate models. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
We introduce the log-beta Weibull regression model based on the beta Weibull distribution (Famoye et al., 2005; Lee et al., 2007). We derive expansions for the moment generating function which do not depend on complicated functions. The new regression model represents a parametric family of models that includes as sub-models several widely known regression models that can be applied to censored survival data. We employ a frequentist analysis, a jackknife estimator, and a parametric bootstrap for the parameters of the proposed model. We derive the appropriate matrices for assessing local influences on the parameter estimates under different perturbation schemes and present some ways to assess global influences. Further, for different parameter settings, sample sizes, and censoring percentages, several simulations are performed. In addition, the empirical distribution of some modified residuals are displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the proposed regression model applied to censored data. We define martingale and deviance residuals to evaluate the model assumptions. The extended regression model is very useful for the analysis of real data and could give more realistic fits than other special regression models.
A bivariate regression model for matched paired survival data: local influence and residual analysis
Resumo:
The use of bivariate distributions plays a fundamental role in survival and reliability studies. In this paper, we consider a location scale model for bivariate survival times based on the proposal of a copula to model the dependence of bivariate survival data. For the proposed model, we consider inferential procedures based on maximum likelihood. Gains in efficiency from bivariate models are also examined in the censored data setting. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and compared to the performance of the bivariate regression model for matched paired survival data. Sensitivity analysis methods such as local and total influence are presented and derived under three perturbation schemes. The martingale marginal and the deviance marginal residual measures are used to check the adequacy of the model. Furthermore, we propose a new measure which we call modified deviance component residual. The methodology in the paper is illustrated on a lifetime data set for kidney patients.
Resumo:
In this paper we have discussed inference aspects of the skew-normal nonlinear regression models following both, a classical and Bayesian approach, extending the usual normal nonlinear regression models. The univariate skew-normal distribution that will be used in this work was introduced by Sahu et al. (Can J Stat 29:129-150, 2003), which is attractive because estimation of the skewness parameter does not present the same degree of difficulty as in the case with Azzalini (Scand J Stat 12:171-178, 1985) one and, moreover, it allows easy implementation of the EM-algorithm. As illustration of the proposed methodology, we consider a data set previously analyzed in the literature under normality.
Resumo:
In interval-censored survival data, the event of interest is not observed exactly but is only known to occur within some time interval. Such data appear very frequently. In this paper, we are concerned only with parametric forms, and so a location-scale regression model based on the exponentiated Weibull distribution is proposed for modeling interval-censored data. We show that the proposed log-exponentiated Weibull regression model for interval-censored data represents a parametric family of models that include other regression models that are broadly used in lifetime data analysis. Assuming the use of interval-censored data, we employ a frequentist analysis, a jackknife estimator, a parametric bootstrap and a Bayesian analysis for the parameters of the proposed model. We derive the appropriate matrices for assessing local influences on the parameter estimates under different perturbation schemes and present some ways to assess global influences. Furthermore, for different parameter settings, sample sizes and censoring percentages, various simulations are performed; in addition, the empirical distribution of some modified residuals are displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to a modified deviance residual in log-exponentiated Weibull regression models for interval-censored data. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
The Birnbaum-Saunders distribution has been used quite effectively to model times to failure for materials subject to fatigue and for modeling lifetime data. In this paper we obtain asymptotic expansions, up to order n(-1/2) and under a sequence of Pitman alternatives, for the non-null distribution functions of the likelihood ratio, Wald, score and gradient test statistics in the Birnbaum-Saunders regression model. The asymptotic distributions of all four statistics are obtained for testing a subset of regression parameters and for testing the shape parameter. Monte Carlo simulation is presented in order to compare the finite-sample performance of these tests. We also present two empirical applications. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
The Birnbaum-Saunders regression model is becoming increasingly popular in lifetime analyses and reliability studies. In this model, the signed likelihood ratio statistic provides the basis for testing inference and construction of confidence limits for a single parameter of interest. We focus on the small sample case, where the standard normal distribution gives a poor approximation to the true distribution of the statistic. We derive three adjusted signed likelihood ratio statistics that lead to very accurate inference even for very small samples. Two empirical applications are presented. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
We analyse the finite-sample behaviour of two second-order bias-corrected alternatives to the maximum-likelihood estimator of the parameters in a multivariate normal regression model with general parametrization proposed by Patriota and Lemonte [A. G. Patriota and A. J. Lemonte, Bias correction in a multivariate regression model with genereal parameterization, Stat. Prob. Lett. 79 (2009), pp. 1655-1662]. The two finite-sample corrections we consider are the conventional second-order bias-corrected estimator and the bootstrap bias correction. We present the numerical results comparing the performance of these estimators. Our results reveal that analytical bias correction outperforms numerical bias corrections obtained from bootstrapping schemes.
Resumo:
This paper derives the second-order biases Of maximum likelihood estimates from a multivariate normal model where the mean vector and the covariance matrix have parameters in common. We show that the second order bias can always be obtained by means of ordinary weighted least-squares regressions. We conduct simulation studies which indicate that the bias correction scheme yields nearly unbiased estimators. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
In 2004 the National Household Survey (Pesquisa Nacional par Amostras de Domicilios - PNAD) estimated the prevalence of food and nutrition insecurity in Brazil. However, PNAD data cannot be disaggregated at the municipal level. The objective of this study was to build a statistical model to predict severe food insecurity for Brazilian municipalities based on the PNAD dataset. Exclusion criteria were: incomplete food security data (19.30%); informants younger than 18 years old (0.07%); collective households (0.05%); households headed by indigenous persons (0.19%). The modeling was carried out in three stages, beginning with the selection of variables related to food insecurity using univariate logistic regression. The variables chosen to construct the municipal estimates were selected from those included in PNAD as well as the 2000 Census. Multivariate logistic regression was then initiated, removing the non-significant variables with odds ratios adjusted by multiple logistic regression. The Wald Test was applied to check the significance of the coefficients in the logistic equation. The final model included the variables: per capita income; years of schooling; race and gender of the household head; urban or rural residence; access to public water supply; presence of children; total number of household inhabitants and state of residence. The adequacy of the model was tested using the Hosmer-Lemeshow test (p=0.561) and ROC curve (area=0.823). Tests indicated that the model has strong predictive power and can be used to determine household food insecurity in Brazilian municipalities, suggesting that similar predictive models may be useful tools in other Latin American countries.
Resumo:
Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, helping the end-user to get more confidence in the prediction and providing the basis for the end-user to have new insight about the data, confirming or rejecting hypotheses previously formed. Moreover, model trees present an acceptable level of predictive performance in comparison to most techniques used for solving regression problems. Since generating the optimal model tree is an NP-Complete problem, traditional model tree induction algorithms make use of a greedy top-down divide-and-conquer strategy, which may not converge to the global optimal solution. In this paper, we propose a novel algorithm based on the use of the evolutionary algorithms paradigm as an alternate heuristic to generate model trees in order to improve the convergence to globally near-optimal solutions. We call our new approach evolutionary model tree induction (E-Motion). We test its predictive performance using public UCI data sets, and we compare the results to traditional greedy regression/model trees induction algorithms, as well as to other evolutionary approaches. Results show that our method presents a good trade-off between predictive performance and model comprehensibility, which may be crucial in many machine learning applications. (C) 2010 Elsevier Inc. All rights reserved.
Resumo:
This work deals with a procedure for model re-identification of a process in closed loop with ail already existing commercial MPC. The controller considered here has a two-layer structure where the upper layer performs a target calculation based on a simplified steady-state optimization of the process. Here, it is proposed a methodology where a test signal is introduced in a tuning parameter of the target calculation layer. When the outputs are controlled by zones instead of at fixed set points, the approach allows the continuous operation of the process without an excessive disruption of the operating objectives as process constraints and product specifications remain satisfied during the identification test. The application of the method is illustrated through the simulation of two processes of the oil refining industry. (c) 2008 Elsevier Ltd. All rights reserved.
Resumo:
In this paper, we compare three residuals to assess departures from the error assumptions as well as to detect outlying observations in log-Burr XII regression models with censored observations. These residuals can also be used for the log-logistic regression model, which is a special case of the log-Burr XII regression model. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and the empirical distribution of each residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to the modified martingale-type residual in log-Burr XII regression models with censored data.
Resumo:
In a sample of censored survival times, the presence of an immune proportion of individuals who are not subject to death, failure or relapse, may be indicated by a relatively high number of individuals with large censored survival times. In this paper the generalized log-gamma model is modified for the possibility that long-term survivors may be present in the data. The model attempts to separately estimate the effects of covariates on the surviving fraction, that is, the proportion of the population for which the event never occurs. The logistic function is used for the regression model of the surviving fraction. Inference for the model parameters is considered via maximum likelihood. Some influence methods, such as the local influence and total local influence of an individual are derived, analyzed and discussed. Finally, a data set from the medical area is analyzed under the log-gamma generalized mixture model. A residual analysis is performed in order to select an appropriate model.