9 results for Robust regression
in CentAUR: Central Archive University of Reading - UK
Abstract:
We use sunspot group observations from the Royal Greenwich Observatory (RGO) to investigate the effects of intercalibrating data from observers with different visual acuities. The tests are made by counting the number of groups RB above a variable cut-off threshold of observed total whole-spot area (uncorrected for foreshortening) to simulate what a lower acuity observer would have seen. The synthesised annual means of RB are then re-scaled to the full observed RGO group number RA using a variety of regression techniques. It is found that a very high correlation between RA and RB (rAB > 0.98) does not prevent large errors in the intercalibration (for example, sunspot maximum values can be over 30% too large even for such levels of rAB). In generating the backbone sunspot number (RBB), Svalgaard and Schatten (2015, this issue) force regression fits to pass through the scatter plot origin, which generates unreliable fits (the residuals do not form a normal distribution) and causes sunspot cycle amplitudes to be exaggerated in the intercalibrated data. It is demonstrated that the use of Quantile-Quantile (“Q-Q”) plots to test for a normal distribution is a useful indicator of erroneous and misleading regression fits. Ordinary least squares linear fits, not forced to pass through the origin, are sometimes reliable (although the optimum method used is shown to be different when matching peak and average sunspot group numbers). However, other fits are only reliable if non-linear regression is used. From these results it is entirely possible that the inflation of solar cycle amplitudes in the backbone group sunspot number as one goes back in time, relative to related solar-terrestrial parameters, is entirely caused by the use of inappropriate and non-robust regression techniques to calibrate the sunspot data.
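The effect of forcing a fit through the origin can be illustrated with a minimal sketch. The numbers below are synthetic, not RGO data: the hypothetical lower-acuity counts `rb` relate to the full counts `ra` by an exact straight line with a non-zero offset, and all function and variable names are assumptions made for illustration only.

```python
# Synthetic illustration only: ra = 2 + rb exactly (an assumed relationship).
rb = [float(i) for i in range(9)]   # hypothetical lower-acuity group counts
ra = [2.0 + v for v in rb]          # hypothetical full group counts


def fit_through_origin(x, y):
    """Least-squares slope for the model y = b * x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


slope_origin = fit_through_origin(rb, ra)
intercept, slope = fit_with_intercept(rb, ra)

# Residuals (observed minus fitted) for each model.
res_origin = [y - slope_origin * x for x, y in zip(rb, ra)]
res_ols = [y - (intercept + slope * x) for x, y in zip(rb, ra)]

# Fitted value at the largest rb, i.e. the "cycle maximum" of this toy series.
peak_origin = slope_origin * max(rb)
peak_ols = intercept + slope * max(rb)

print("forced-origin slope:", slope_origin, "peak:", peak_origin)
print("OLS intercept, slope:", intercept, slope, "peak:", peak_ols)
print("mean residual (origin):", sum(res_origin) / len(res_origin))
print("mean residual (OLS):", sum(res_ols) / len(res_ols))
```

In this toy case the forced-origin fit has a steeper slope than the true relationship and overshoots the largest value, and its residuals are systematically offset from zero rather than scattered about it. This is the kind of pattern a Q-Q plot of the residuals is meant to expose.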
Obesity and diabetes, the built environment, and the ‘local’ food economy in the United States, 2007
Abstract:
Obesity and diabetes are increasingly attributed to environmental factors; however, little attention has been paid to the influence of the ‘local’ food economy. This paper examines the association of measures relating to the built environment and ‘local’ agriculture with U.S. county-level prevalence of obesity and diabetes. Key indicators of the ‘local’ food economy include the density of farmers’ markets and the presence of farms with direct sales. This paper employs a robust regression estimator to account for non-normality of the data and to accommodate outliers. Overall, the built environment is associated with the prevalence of obesity and diabetes, and a strong ‘local’ food economy may play an important role in prevention. Results imply considerable scope for community-level interventions.
Abstract:
This study investigates the determinants of commercial and retail airport revenues as well as revenues from real estate operations. Cross-sectional OLS, 2SLS and robust regression models of European airports identify a number of significant drivers of airport revenues. Aviation revenues per passenger are mainly determined by the national income per capita of the country in which the airport is located, the percentage of leisure travelers and the size of the airport proxied by total aviation revenues. Main drivers of commercial revenues per passenger include the total number of passengers passing through the airport, the ratio of commercial to total revenues, the national income, the share of domestic and leisure travelers and the total number of flights. These results are in line with previous findings of a negative influence of business travelers on commercial revenues per passenger. We also find that a high amount of retail space per passenger is generally associated with lower commercial revenues per square meter, confirming decreasing marginal revenue effects. Real estate revenues per passenger are positively associated with national income per capita at the airport location, the share of intra-EU passengers and the percentage of delayed flights. Overall, aviation and non-aviation revenues appear to be strongly interlinked, underlining the potential for a comprehensive airport management strategy above and beyond mere cost minimization of the aviation sector.
Abstract:
Drawing upon an updated and expanded dataset of Energy Star and LEED labeled commercial offices, this paper investigates the effect of eco-labeling on rental rates, sale prices and occupancy rates. Using OLS and robust regression procedures, hedonic modeling is used to test whether the presence of an eco-label has a significant positive effect on rental rates, sale prices and occupancy rates. The study suggests that estimated coefficients can be sensitive to outlier treatment. For sale prices and occupancy rates, there are notable differences between estimated coefficients for OLS and robust regressions. The results suggest that both Energy Star and LEED offices obtain rental premiums of approximately 3%. A 17% sale price premium is estimated for Energy Star labeled offices but no significant sale price premium is estimated for LEED labeled offices. Surprisingly, no significant occupancy premium is estimated for Energy Star labeled offices and a negative occupancy premium is estimated for LEED labeled offices.
Abstract:
In this correspondence, new robust nonlinear model construction algorithms for a large class of linear-in-the-parameters models are introduced to enhance model robustness via combined parameter regularization and new robust structural selective criteria. In parallel to parameter regularization, we use two classes of robust model selection criteria based on either experimental design criteria that optimize model adequacy, or the predicted residual sums of squares (PRESS) statistic that optimizes model generalization capability. Three robust identification algorithms are introduced: A-optimality and D-optimality, each combined with the regularized orthogonal least squares algorithm, and the PRESS statistic combined with the regularized orthogonal least squares algorithm. A common characteristic of these algorithms is that the inherent computational efficiency associated with the orthogonalization scheme in orthogonal least squares or regularized orthogonal least squares has been extended such that the new algorithms are computationally efficient. Numerical examples are included to demonstrate the effectiveness of the algorithms.
Abstract:
This letter introduces a new robust nonlinear identification algorithm using the Predicted REsidual Sums of Squares (PRESS) statistic and forward regression. The major contribution is to compute the PRESS statistic within a framework of a forward orthogonalization process and hence construct a model with a good generalization property. Based on the properties of the PRESS statistic, the proposed algorithm can achieve a fully automated procedure without resort to any other validation data set for iterative model evaluation.
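As background on why PRESS needs no separate validation set, the sketch below computes it for a plain straight-line fit in two ways: by refitting with each point left out, and by the standard leave-one-out shortcut that divides each ordinary residual by one minus its leverage. This is not the forward orthogonalization scheme of the letter; the data and function names are assumptions chosen for illustration.

```python
def ols(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def press_brute_force(x, y):
    """Refit with each point removed and sum the squared prediction errors."""
    total = 0.0
    for i in range(len(x)):
        a, b = ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (a + b * x[i])) ** 2
    return total


def press_shortcut(x, y):
    """Single fit: PRESS = sum over i of (e_i / (1 - h_ii))^2."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    a, b = ols(x, y)
    total = 0.0
    for xi, yi in zip(x, y):
        leverage = 1.0 / n + (xi - mx) ** 2 / sxx  # h_ii for a straight line
        residual = yi - (a + b * xi)
        total += (residual / (1.0 - leverage)) ** 2
    return total


# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]

press_a = press_brute_force(x, y)
press_b = press_shortcut(x, y)
print(press_a, press_b)
```

The two values agree to rounding error, so the leave-one-out prediction error is available from a single fit. This is what makes PRESS practical as a model selection criterion inside an iterative procedure.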
Abstract:
This correspondence introduces a new orthogonal forward regression (OFR) model identification algorithm that uses D-optimality for model structure selection and is based on M-estimators of the parameters. The M-estimator is a classical robust parameter estimation technique to tackle bad data conditions such as outliers. Computationally, the M-estimator can be derived using an iterative reweighted least squares (IRLS) algorithm. D-optimality is a model structure robustness criterion in experimental design to tackle ill-conditioning in model structure. The orthogonal forward regression (OFR), often based on the modified Gram-Schmidt procedure, is an efficient method incorporating structure selection and parameter estimation simultaneously. The basic idea of the proposed approach is to incorporate an IRLS inner loop into the modified Gram-Schmidt procedure. In this manner, the OFR algorithm for parsimonious model structure determination is extended to bad data conditions with improved performance via the derivation of parameter M-estimators with inherent robustness to outliers. Numerical examples are included to demonstrate the effectiveness of the proposed algorithm.
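The IRLS idea mentioned above can be sketched for the simplest case, a straight-line fit. The abstract does not say which M-estimator is used, so the Huber weight function, the tuning constant 1.345, the MAD-based scale estimate and all names below are assumptions for illustration. The sketch also omits the Gram-Schmidt and D-optimality parts of the algorithm.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def median(values):
    s = sorted(values)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])


def huber_irls(x, y, c=1.345, iterations=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from plain OLS
    for _ in range(iterations):
        residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal errors.
        scale = median([abs(r) for r in residuals]) / 0.6745
        if scale == 0.0:
            break
        # Huber weights: 1 inside the threshold, shrinking with |r| outside.
        weights = [1.0 if abs(r) <= c * scale else c * scale / abs(r)
                   for r in residuals]
        a, b = weighted_line_fit(x, y, weights)
    return a, b


# Made-up data: y = 1 + 2x with small perturbations and one gross outlier.
x = [float(i) for i in range(10)]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, 0.0]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
y[9] = 60.0  # outlier replacing the value near 19

a_ols, b_ols = weighted_line_fit(x, y, [1.0] * len(x))
a_rob, b_rob = huber_irls(x, y)
print("OLS slope:", b_ols)
print("Huber IRLS slope:", b_rob)
```

Each pass down-weights points with large residuals and refits, so the single outlier has far less pull on the robust slope than on the ordinary least squares slope.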
Abstract:
Recent temperature extremes have highlighted the importance of assessing projected changes in the variability of temperature as well as the mean. A large fraction of present-day temperature variance is associated with thermal advection, for instance as anomalous winds blow across the land-sea temperature contrast. Models project robust heterogeneity in the 21st century warming pattern under greenhouse gas forcing, resulting in land-sea temperature contrasts increasing in summer and decreasing in winter, and the pole-to-equator temperature gradient weakening in winter. In this study, future monthly variability changes in the 17-member ensemble ESSENCE are assessed. In winter, variability in midlatitudes decreases while in very high latitudes and the tropics it increases. In summer, variability increases over most land areas and in the tropics, with decreasing variability in high latitude oceans. Multiple regression analysis is used to determine the contributions to variability changes from changing temperature gradients and circulation patterns. Thermal advection is found to be of particular importance in the northern hemisphere winter midlatitudes, where the change in mean state temperature gradients alone could account for over half the projected changes. Changes in thermal advection are also found to be important in summer in Europe and coastal areas, although less so than in winter. Comparison with CMIP5 data shows that the midlatitude changes in variability are robust across large regions, particularly high northern latitudes in winter and mid northern latitudes in summer.