818 resultados para Robust Regression
Resumo:
Support Vector Machines Regression (SVMR) is a regression technique which has been recently introduced by V. Vapnik and his collaborators (Vapnik, 1995; Vapnik, Golowich and Smola, 1996). In SVMR the goodness of fit is measured not by the usual quadratic loss function (the mean square error), but by a different loss function called Vapnik"s $epsilon$- insensitive loss function, which is similar to the "robust" loss functions introduced by Huber (Huber, 1981). The quadratic loss function is well justified under the assumption of Gaussian additive noise. However, the noise model underlying the choice of Vapnik's loss function is less clear. In this paper the use of Vapnik's loss function is shown to be equivalent to a model of additive and Gaussian noise, where the variance and mean of the Gaussian are random variables. The probability distributions for the variance and mean will be stated explicitly. While this work is presented in the framework of SVMR, it can be extended to justify non-quadratic loss functions in any Maximum Likelihood or Maximum A Posteriori approach. It applies not only to Vapnik's loss function, but to a much broader class of loss functions.
Resumo:
This correspondence introduces a new orthogonal forward regression (OFR) model identification algorithm using D-optimality for model structure selection and is based on an M-estimators of parameter estimates. M-estimator is a classical robust parameter estimation technique to tackle bad data conditions such as outliers. Computationally, The M-estimator can be derived using an iterative reweighted least squares (IRLS) algorithm. D-optimality is a model structure robustness criterion in experimental design to tackle ill-conditioning in model Structure. The orthogonal forward regression (OFR), often based on the modified Gram-Schmidt procedure, is an efficient method incorporating structure selection and parameter estimation simultaneously. The basic idea of the proposed approach is to incorporate an IRLS inner loop into the modified Gram-Schmidt procedure. In this manner, the OFR algorithm for parsimonious model structure determination is extended to bad data conditions with improved performance via the derivation of parameter M-estimators with inherent robustness to outliers. Numerical examples are included to demonstrate the effectiveness of the proposed algorithm.
Resumo:
Recent temperature extremes have highlighted the importance of assessing projected changes in the variability of temperature as well as the mean. A large fraction of present day temperature variance is associated with thermal advection, as anomalous winds blow across the land-sea temperature contrast for instance. Models project robust heterogeneity in the 21st century warming pattern under greenhouse gas forcing, resulting in land-sea temperature contrasts increasing in summer and decreasing in winter, and the pole-to-equator temperature gradient weakening in winter. In this study, future monthly variability changes in the 17 member ensemble ESSENCE are assessed. In winter, variability in midlatitudes decreases while in very high latitudes and the tropics it increases. In summer, variability increases over most land areas and in the tropics, with decreasing variability in high latitude oceans. Multiple regression analysis is used to determine the contributions to variability changes from changing temperature gradients and circulation patterns. Thermal advection is found to be of particular importance in the northern hemisphere winter midlatitudes, where the change in mean state temperature gradients alone could account for over half the projected changes. Changes in thermal advection are also found to be important in summer in Europe and coastal areas, although less so than in winter. Comparison with CMIP5 data shows that the midlatitude changes in variability are robust across large regions, particularly high northern latitudes in winter and mid northern latitudes in summer.
Resumo:
The purpose of this paper is to develop a Bayesian analysis for nonlinear regression models under scale mixtures of skew-normal distributions. This novel class of models provides a useful generalization of the symmetrical nonlinear regression models since the error distributions cover both skewness and heavy-tailed distributions such as the skew-t, skew-slash and the skew-contaminated normal distributions. The main advantage of these class of distributions is that they have a nice hierarchical representation that allows the implementation of Markov chain Monte Carlo (MCMC) methods to simulate samples from the joint posterior distribution. In order to examine the robust aspects of this flexible class, against outlying and influential observations, we present a Bayesian case deletion influence diagnostics based on the Kullback-Leibler divergence. Further, some discussions on the model selection criteria are given. The newly developed procedures are illustrated considering two simulations study, and a real data previously analyzed under normal and skew-normal nonlinear regression models. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
In this paper we deal with robust inference in heteroscedastic measurement error models Rather than the normal distribution we postulate a Student t distribution for the observed variables Maximum likelihood estimates are computed numerically Consistent estimation of the asymptotic covariance matrices of the maximum likelihood and generalized least squares estimators is also discussed Three test statistics are proposed for testing hypotheses of interest with the asymptotic chi-square distribution which guarantees correct asymptotic significance levels Results of simulations and an application to a real data set are also reported (C) 2009 The Korean Statistical Society Published by Elsevier B V All rights reserved
Resumo:
Birnbaum-Saunders models have largely been applied in material fatigue studies and reliability analyses to relate the total time until failure with some type of cumulative damage. In many problems related to the medical field, such as chronic cardiac diseases and different types of cancer, a cumulative damage caused by several risk factors might cause some degradation that leads to a fatigue process. In these cases, BS models can be suitable for describing the propagation lifetime. However, since the cumulative damage is assumed to be normally distributed in the BS distribution, the parameter estimates from this model can be sensitive to outlying observations. In order to attenuate this influence, we present in this paper BS models, in which a Student-t distribution is assumed to explain the cumulative damage. In particular, we show that the maximum likelihood estimates of the Student-t log-BS models attribute smaller weights to outlying observations, which produce robust parameter estimates. Also, some inferential results are presented. In addition, based on local influence and deviance component and martingale-type residuals, a diagnostics analysis is derived. Finally, a motivating example from the medical field is analyzed using log-BS regression models. Since the parameter estimates appear to be very sensitive to outlying and influential observations, the Student-t log-BS regression model should attenuate such influences. The model checking methodologies developed in this paper are used to compare the fitted models.
A robust Bayesian approach to null intercept measurement error model with application to dental data
Resumo:
Measurement error models often arise in epidemiological and clinical research. Usually, in this set up it is assumed that the latent variable has a normal distribution. However, the normality assumption may not be always correct. Skew-normal/independent distribution is a class of asymmetric thick-tailed distributions which includes the Skew-normal distribution as a special case. In this paper, we explore the use of skew-normal/independent distribution as a robust alternative to null intercept measurement error model under a Bayesian paradigm. We assume that the random errors and the unobserved value of the covariate (latent variable) follows jointly a skew-normal/independent distribution, providing an appealing robust alternative to the routine use of symmetric normal distribution in this type of model. Specific distributions examined include univariate and multivariate versions of the skew-normal distribution, the skew-t distributions, the skew-slash distributions and the skew contaminated normal distributions. The methods developed is illustrated using a real data set from a dental clinical trial. (C) 2008 Elsevier B.V. All rights reserved.
Resumo:
This paper considers two-sided tests for the parameter of an endogenous variable in an instrumental variable (IV) model with heteroskedastic and autocorrelated errors. We develop the nite-sample theory of weighted-average power (WAP) tests with normal errors and a known long-run variance. We introduce two weights which are invariant to orthogonal transformations of the instruments; e.g., changing the order in which the instruments appear. While tests using the MM1 weight can be severely biased, optimal tests based on the MM2 weight are naturally two-sided when errors are homoskedastic. We propose two boundary conditions that yield two-sided tests whether errors are homoskedastic or not. The locally unbiased (LU) condition is related to the power around the null hypothesis and is a weaker requirement than unbiasedness. The strongly unbiased (SU) condition is more restrictive than LU, but the associated WAP tests are easier to implement. Several tests are SU in nite samples or asymptotically, including tests robust to weak IV (such as the Anderson-Rubin, score, conditional quasi-likelihood ratio, and I. Andrews' (2015) PI-CLC tests) and two-sided tests which are optimal when the sample size is large and instruments are strong. We refer to the WAP-SU tests based on our weights as MM1-SU and MM2-SU tests. Dropping the restrictive assumptions of normality and known variance, the theory is shown to remain valid at the cost of asymptotic approximations. The MM2-SU test is optimal under the strong IV asymptotics, and outperforms other existing tests under the weak IV asymptotics.
Resumo:
An extension of some standard likelihood based procedures to heteroscedastic nonlinear regression models under scale mixtures of skew-normal (SMSN) distributions is developed. This novel class of models provides a useful generalization of the heteroscedastic symmetrical nonlinear regression models (Cysneiros et al., 2010), since the random term distributions cover both symmetric as well as asymmetric and heavy-tailed distributions such as skew-t, skew-slash, skew-contaminated normal, among others. A simple EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters is presented and the observed information matrix is derived analytically. In order to examine the performance of the proposed methods, some simulation studies are presented to show the robust aspect of this flexible class against outlying and influential observations and that the maximum likelihood estimates based on the EM-type algorithm do provide good asymptotic properties. Furthermore, local influence measures and the one-step approximations of the estimates in the case-deletion model are obtained. Finally, an illustration of the methodology is given considering a data set previously analyzed under the homoscedastic skew-t nonlinear regression model. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
In this paper, an alternative skew Student-t family of distributions is studied. It is obtained as an extension of the generalized Student-t (GS-t) family introduced by McDonald and Newey [10]. The extension that is obtained can be seen as a reparametrization of the skewed GS-t distribution considered by Theodossiou [14]. A key element in the construction of such an extension is that it can be stochastically represented as a mixture of an epsilon-skew-power-exponential distribution [1] and a generalized-gamma distribution. From this representation, we can readily derive theoretical properties and easy-to-implement simulation schemes. Furthermore, we study some of its main properties including stochastic representation, moments and asymmetry and kurtosis coefficients. We also derive the Fisher information matrix, which is shown to be nonsingular for some special cases such as when the asymmetry parameter is null, that is, at the vicinity of symmetry, and discuss maximum-likelihood estimation. Simulation studies for some particular cases and real data analysis are also reported, illustrating the usefulness of the extension considered.
Resumo:
In this paper, we carry out robust modeling and influence diagnostics in Birnbaum-Saunders (BS) regression models. Specifically, we present some aspects related to BS and log-BS distributions and their generalizations from the Student-t distribution, and develop BS-t regression models, including maximum likelihood estimation based on the EM algorithm and diagnostic tools. In addition, we apply the obtained results to real data from insurance, which shows the uses of the proposed model. Copyright (c) 2011 John Wiley & Sons, Ltd.
Resumo:
In environmental epidemiology, exposure X and health outcome Y vary in space and time. We present a method to diagnose the possible influence of unmeasured confounders U on the estimated effect of X on Y and to propose several approaches to robust estimation. The idea is to use space and time as proxy measures for the unmeasured factors U. We start with the time series case where X and Y are continuous variables at equally-spaced times and assume a linear model. We define matching estimator b(u)s that correspond to pairs of observations with specific lag u. Controlling for a smooth function of time, St, using a kernel estimator is roughly equivalent to estimating the association with a linear combination of the b(u)s with weights that involve two components: the assumptions about the smoothness of St and the normalized variogram of the X process. When an unmeasured confounder U exists, but the model otherwise correctly controls for measured confounders, the excess variation in b(u)s is evidence of confounding by U. We use the plot of b(u)s versus lag u, lagged-estimator-plot (LEP), to diagnose the influence of U on the effect of X on Y. We use appropriate linear combination of b(u)s or extrapolate to b(0) to obtain novel estimators that are more robust to the influence of smooth U. The methods are extended to time series log-linear models and to spatial analyses. The LEP plot gives us a direct view of the magnitude of the estimators for each lag u and provides evidence when models did not adequately describe the data.
Resumo:
Knowledge of landmarks and contours in anteroposterior (AP) pelvis X-rays is invaluable for computer aided diagnosis, hip surgery planning and image-guided interventions. This paper presents a fully automatic and robust approach for landmarking and segmentation of both pelvis and femur in a conventional AP X-ray. Our approach is based on random forest regression and hierarchical sparse shape composition. Experiments conducted on 436 clinical AP pelvis x-rays show that our approach achieves an average point-to-curve error around 1.3 mm for femur and 2.2 mm for pelvis, both with success rates around 98%. Compared to existing methods, our approach exhibits better performance in both the robustness and the accuracy.
Resumo:
We present an independent calibration model for the determination of biogenic silica (BSi) in sediments, developed from analysis of synthetic sediment mixtures and application of Fourier transform infrared spectroscopy (FTIRS) and partial least squares regression (PLSR) modeling. In contrast to current FTIRS applications for quantifying BSi, this new calibration is independent from conventional wet-chemical techniques and their associated measurement uncertainties. This approach also removes the need for developing internal calibrations between the two methods for individual sediments records. For the independent calibration, we produced six series of different synthetic sediment mixtures using two purified diatom extracts, with one extract mixed with quartz sand, calcite, 60/40 quartz/calcite and two different natural sediments, and a second extract mixed with one of the natural sediments. A total of 306 samples—51 samples per series—yielded BSi contents ranging from 0 to 100 %. The resulting PLSR calibration model between the FTIR spectral information and the defined BSi concentration of the synthetic sediment mixtures exhibits a strong cross-validated correlation ( R2cv = 0.97) and a low root-mean square error of cross-validation (RMSECV = 4.7 %). Application of the independent calibration to natural lacustrine and marine sediments yields robust BSi reconstructions. At present, the synthetic mixtures do not include the variation in organic matter that occurs in natural samples, which may explain the somewhat lower prediction accuracy of the calibration model for organic-rich samples.