71 resultados para Weighted regression
Resumo:
An efficient two-level model identification method aiming at maximising a model׳s generalisation capability is proposed for a large class of linear-in-the-parameters models from the observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularisation parameters in the elastic net are optimised using a particle swarm optimisation (PSO) algorithm at the upper level by minimising the leave one out (LOO) mean square error (LOOMSE). There are two elements of original contributions. Firstly an elastic net cost function is defined and applied based on orthogonal decomposition, which facilitates the automatic model structure selection process with no need of using a predetermined error tolerance to terminate the forward selection process. Secondly it is shown that the LOOMSE based on the resultant ENOFR models can be analytically computed without actually splitting the data set, and the associate computation cost is small due to the ENOFR procedure. Consequently a fully automated procedure is achieved without resort to any other validation data set for iterative model evaluation. Illustrative examples are included to demonstrate the effectiveness of the new approaches.
Resumo:
We are looking into variants of a domination set problem in social networks. While randomised algorithms for solving the minimum weighted domination set problem and the minimum alpha and alpha-rate domination problem on simple graphs are already present in the literature, we propose here a randomised algorithm for the minimum weighted alpha-rate domination set problem which is, to the best of our knowledge, the first such algorithm. A theoretical approximation bound based on a simple randomised rounding technique is given. The algorithm is implemented in Python and applied to a UK Twitter mentions networks using a measure of individuals’ influence (klout) as weights. We argue that the weights of vertices could be interpreted as the costs of getting those individuals on board for a campaign or a behaviour change intervention. The minimum weighted alpha-rate dominating set problem can therefore be seen as finding a set that minimises the total cost and each individual in a network has at least alpha percentage of its neighbours in the chosen set. We also test our algorithm on generated graphs with several thousand vertices and edges. Our results on this real-life Twitter networks and generated graphs show that the implementation is reasonably efficient and thus can be used for real-life applications when creating social network based interventions, designing social media campaigns and potentially improving users’ social media experience.
Resumo:
We utilize energy budget diagnostics from the Coupled Model Intercomparison Project phase 5 (CMIP5) to evaluate the models' climate forcing since preindustrial times employing an established regression technique. The climate forcing evaluated this way, termed the adjusted forcing (AF), includes a rapid adjustment term associated with cloud changes and other tropospheric and land-surface changes. We estimate a 2010 total anthropogenic and natural AF from CMIP5 models of 1.9 ± 0.9 W m−2 (5–95% range). The projected AF of the Representative Concentration Pathway simulations are lower than their expected radiative forcing (RF) in 2095 but agree well with efficacy weighted forcings from integrated assessment models. The smaller AF, compared to RF, is likely due to cloud adjustment. Multimodel time series of temperature change and AF from 1850 to 2100 have large intermodel spreads throughout the period. The intermodel spread of temperature change is principally driven by forcing differences in the present day and climate feedback differences in 2095, although forcing differences are still important for model spread at 2095. We find no significant relationship between the equilibrium climate sensitivity (ECS) of a model and its 2003 AF, in contrast to that found in older models where higher ECS models generally had less forcing. Given the large present-day model spread, there is no indication of any tendency by modelling groups to adjust their aerosol forcing in order to produce observed trends. Instead, some CMIP5 models have a relatively large positive forcing and overestimate the observed temperature change.
Resumo:
This study examines the long-run performance of initial public offerings on the Stock Exchange of Mauritius (SEM). The results show that the 3-year equally weighted cumulative adjusted returns average −16.5%. The magnitude of this underperformance is consistent with most reported studies in different developed and emerging markets. Based on multivariate regression models, firms with small issues and higher ex ante financial strength seem on average to experience greater long-run underperformance, supporting the divergence of opinion and overreaction hypotheses. On the other hand, Mauritian firms do not on average time their offerings to lower cost of capital and as such, there seems to be limited support for the windows of opportunity hypothesis.
Resumo:
The objective of this study is to investigate whether parentally-reported gastro-intestinal (GI) symptoms are increased in a population-derived sample of children with autism spectrum disorders (ASD) compared to controls. Participants included 132 children with ASD and 81 with special educational needs (SEN) but no ASD, aged 10-14 years plus 82 typically developing (TD) children. Data were collected on GI symptoms, diet, cognitive abilities, and developmental histories. Nearly half (weighted rate 46.5 %) of children with ASD had at least one individual lifetime GI symptom compared with 21.8 % of TD children and 29.2 % of those with SEN. Children with ASD had more past and current GI symptoms than TD or SEN groups although fewer current symptoms were reported in all groups compared with the past. The ASD group had significantly increased past vomiting and diarrhoea compared with the TD group and more abdominal pain than the SEN group. The ASD group had more current constipation (when defined as bowel movement less than three times per week) and soiling than either the TD or SEN groups. No association was found between GI symptoms and intellectual ability, ASD severity, ASD regression or limited or faddy diet. Parents report more GI symptoms in children with ASD than children with either SEN or TD children but the frequency of reported symptoms is greater in the past than currently in all groups.
Resumo:
An efficient data based-modeling algorithm for nonlinear system identification is introduced for radial basis function (RBF) neural networks with the aim of maximizing generalization capability based on the concept of leave-one-out (LOO) cross validation. Each of the RBF kernels has its own kernel width parameter and the basic idea is to optimize the multiple pairs of regularization parameters and kernel widths, each of which is associated with a kernel, one at a time within the orthogonal forward regression (OFR) procedure. Thus, each OFR step consists of one model term selection based on the LOO mean square error (LOOMSE), followed by the optimization of the associated kernel width and regularization parameter, also based on the LOOMSE. Since like our previous state-of-the-art local regularization assisted orthogonal least squares (LROLS) algorithm, the same LOOMSE is adopted for model selection, our proposed new OFR algorithm is also capable of producing a very sparse RBF model with excellent generalization performance. Unlike our previous LROLS algorithm which requires an additional iterative loop to optimize the regularization parameters as well as an additional procedure to optimize the kernel width, the proposed new OFR algorithm optimizes both the kernel widths and regularization parameters within the single OFR procedure, and consequently the required computational complexity is dramatically reduced. Nonlinear system identification examples are included to demonstrate the effectiveness of this new approach in comparison to the well-known approaches of support vector machine and least absolute shrinkage and selection operator as well as the LROLS algorithm.
Resumo:
A new class of parameter estimation algorithms is introduced for Gaussian process regression (GPR) models. It is shown that the integration of the GPR model with probability distance measures of (i) the integrated square error and (ii) Kullback–Leibler (K–L) divergence are analytically tractable. An efficient coordinate descent algorithm is proposed to iteratively estimate the kernel width using golden section search which includes a fast gradient descent algorithm as an inner loop to estimate the noise variance. Numerical examples are included to demonstrate the effectiveness of the new identification approaches.
Resumo:
Classical regression methods take vectors as covariates and estimate the corresponding vectors of regression parameters. When addressing regression problems on covariates of more complex form such as multi-dimensional arrays (i.e. tensors), traditional computational models can be severely compromised by ultrahigh dimensionality as well as complex structure. By exploiting the special structure of tensor covariates, the tensor regression model provides a promising solution to reduce the model’s dimensionality to a manageable level, thus leading to efficient estimation. Most of the existing tensor-based methods independently estimate each individual regression problem based on tensor decomposition which allows the simultaneous projections of an input tensor to more than one direction along each mode. As a matter of fact, multi-dimensional data are collected under the same or very similar conditions, so that data share some common latent components but can also have their own independent parameters for each regression task. Therefore, it is beneficial to analyse regression parameters among all the regressions in a linked way. In this paper, we propose a tensor regression model based on Tucker Decomposition, which identifies not only the common components of parameters across all the regression tasks, but also independent factors contributing to each particular regression task simultaneously. Under this paradigm, the number of independent parameters along each mode is constrained by a sparsity-preserving regulariser. Linked multiway parameter analysis and sparsity modeling further reduce the total number of parameters, with lower memory cost than their tensor-based counterparts. The effectiveness of the new method is demonstrated on real data sets.
Resumo:
Background Children with callous-unemotional (CU) traits, a proposed precursor to adult psychopathy, are characterized by impaired emotion recognition, reduced responsiveness to others’ distress, and a lack of guilt or empathy. Reduced attention to faces, and more specifically to the eye region, has been proposed to underlie these difficulties, although this has never been tested longitudinally from infancy. Attention to faces occurs within the context of dyadic caregiver interactions, and early environment including parenting characteristics has been associated with CU traits. The present study tested whether infants’ preferential tracking of a face with direct gaze and levels of maternal sensitivity predict later CU traits. Methods Data were analyzed from a stratified random sample of 213 participants drawn from a population-based sample of 1233 first-time mothers. Infants’ preferential face tracking at 5 weeks and maternal sensitivity at 29 weeks were entered into a weighted linear regression as predictors of CU traits at 2.5 years. Results Controlling for a range of confounders (e.g., deprivation), lower preferential face tracking predicted higher CU traits (p = .001). Higher maternal sensitivity predicted lower CU traits in girls (p = .009), but not boys. No significant interaction between face tracking and maternal sensitivity was found. Conclusions This is the first study to show that attention to social features during infancy as well as early sensitive parenting predict the subsequent development of CU traits. Identifying such early atypicalities offers the potential for developing parent-mediated interventions in children at risk for developing CU traits.
Resumo:
Forecasting wind power is an important part of a successful integration of wind power into the power grid. Forecasts with lead times longer than 6 h are generally made by using statistical methods to post-process forecasts from numerical weather prediction systems. Two major problems that complicate this approach are the non-linear relationship between wind speed and power production and the limited range of power production between zero and nominal power of the turbine. In practice, these problems are often tackled by using non-linear non-parametric regression models. However, such an approach ignores valuable and readily available information: the power curve of the turbine's manufacturer. Much of the non-linearity can be directly accounted for by transforming the observed power production into wind speed via the inverse power curve so that simpler linear regression models can be used. Furthermore, the fact that the transformed power production has a limited range can be taken care of by employing censored regression models. In this study, we evaluate quantile forecasts from a range of methods: (i) using parametric and non-parametric models, (ii) with and without the proposed inverse power curve transformation and (iii) with and without censoring. The results show that with our inverse (power-to-wind) transformation, simpler linear regression models with censoring perform equally or better than non-linear models with or without the frequently used wind-to-power transformation.
Resumo:
We use sunspot group observations from the Royal Greenwich Observatory (RGO) to investigate the effects of intercalibrating data from observers with different visual acuities. The tests are made by counting the number of groups RB above a variable cut-off threshold of observed total whole-spot area (uncorrected for foreshortening) to simulate what a lower acuity observer would have seen. The synthesised annual means of RB are then re-scaled to the full observed RGO group number RA using a variety of regression techniques. It is found that a very high correlation between RA and RB (rAB > 0.98) does not prevent large errors in the intercalibration (for example sunspot maximum values can be over 30 % too large even for such levels of rAB). In generating the backbone sunspot number (RBB), Svalgaard and Schatten (2015, this issue) force regression fits to pass through the scatter plot origin which generates unreliable fits (the residuals do not form a normal distribution) and causes sunspot cycle amplitudes to be exaggerated in the intercalibrated data. It is demonstrated that the use of Quantile-Quantile (“Q Q”) plots to test for a normal distribution is a useful indicator of erroneous and misleading regression fits. Ordinary least squares linear fits, not forced to pass through the origin, are sometimes reliable (although the optimum method used is shown to be different when matching peak and average sunspot group numbers). However, other fits are only reliable if non-linear regression is used. From these results it is entirely possible that the inflation of solar cycle amplitudes in the backbone group sunspot number as one goes back in time, relative to related solar-terrestrial parameters, is entirely caused by the use of inappropriate and non-robust regression techniques to calibrate the sunspot data.