60 results for Partial least square regression
Abstract:
This correspondence introduces a new orthogonal forward regression (OFR) model identification algorithm that uses D-optimality for model structure selection and is based on M-estimators of the parameters. The M-estimator is a classical robust parameter estimation technique for tackling bad data conditions such as outliers. Computationally, the M-estimator can be derived using an iteratively reweighted least squares (IRLS) algorithm. D-optimality is a model structure robustness criterion in experimental design for tackling ill-conditioning in the model structure. Orthogonal forward regression, often based on the modified Gram-Schmidt procedure, is an efficient method that incorporates structure selection and parameter estimation simultaneously. The basic idea of the proposed approach is to incorporate an IRLS inner loop into the modified Gram-Schmidt procedure. In this manner, the OFR algorithm for parsimonious model structure determination is extended to bad data conditions, with improved performance via the derivation of parameter M-estimators with inherent robustness to outliers. Numerical examples are included to demonstrate the effectiveness of the proposed algorithm.
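To make the IRLS idea concrete, the following is a minimal sketch of a Huber M-estimate for a straight-line model, computed by iteratively reweighted least squares. It is an illustration only, not the OFR algorithm of the abstract: the function names, the Huber threshold `delta`, and the synthetic data are all assumptions chosen for the example.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b


def huber_irls(x, y, delta=1.0, iterations=100):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    w = [1.0] * len(x)  # the first pass is ordinary least squares
    for _ in range(iterations):
        a, b = weighted_line_fit(x, y, w)
        residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Huber weights: 1 inside the threshold, delta/|r| outside it.
        w = [1.0 if abs(r) <= delta else delta / abs(r) for r in residuals]
    return weighted_line_fit(x, y, w)


# Synthetic data (assumed): the line y = 1 + 2x with one gross outlier.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.0 + 2.0 * xi for xi in x]
y[7] = 100.0

ols = weighted_line_fit(x, y, [1.0] * len(x))
robust = huber_irls(x, y)
print("OLS (a, b):   ", ols)
print("Huber (a, b): ", robust)
```

The outlier drags the ordinary least squares slope far from 2, while the reweighted fit stays close to it because the outlier's weight shrinks as its residual grows.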
Abstract:
In this correspondence, new robust nonlinear model construction algorithms for a large class of linear-in-the-parameters models are introduced to enhance model robustness via combined parameter regularization and new robust structural selective criteria. In parallel to parameter regularization, we use two classes of robust model selection criteria, based either on experimental design criteria that optimize model adequacy or on the predicted residual sums of squares (PRESS) statistic that optimizes model generalization capability. Three robust identification algorithms are introduced: A-optimality and D-optimality, each combined with the regularized orthogonal least squares algorithm, and the PRESS statistic combined with the regularized orthogonal least squares algorithm. A common characteristic of these algorithms is that the inherent computational efficiency associated with the orthogonalization scheme in orthogonal least squares or regularized orthogonal least squares has been extended such that the new algorithms are computationally efficient. Numerical examples are included to demonstrate the effectiveness of the algorithms.
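As a small illustration of the PRESS statistic, the sketch below computes it for an ordinary (unregularized) straight-line least squares fit, using the standard identity that the leave-one-out residual equals e_i / (1 - h_ii), where h_ii is the leverage. It then checks the result against explicit refitting. This is an assumption-laden toy case, not the regularized orthogonal least squares algorithms of the abstract; the function names and data are invented for the example.

```python
def line_fit(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    return ym - b * xm, b


def press_closed_form(x, y):
    """PRESS from a single fit, using leave-one-out residuals e_i/(1 - h_ii)."""
    n = len(x)
    xm = sum(x) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    a, b = line_fit(x, y)
    total = 0.0
    for xi, yi in zip(x, y):
        leverage = 1.0 / n + (xi - xm) ** 2 / sxx
        residual = yi - (a + b * xi)
        total += (residual / (1.0 - leverage)) ** 2
    return total


def press_brute_force(x, y):
    """PRESS by refitting n times, each time leaving one point out."""
    total = 0.0
    for i in range(len(x)):
        a, b = line_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (a + b * x[i])) ** 2
    return total


# Arbitrary illustrative data (assumed).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 1.9, 4.2, 5.8, 8.3, 9.7]
fast = press_closed_form(x, y)
slow = press_brute_force(x, y)
print(fast, slow)
```

Both routes give the same value, which is why PRESS can serve as a selection criterion without repeated refitting.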
Abstract:
Artificial diet studies were used to differentiate among physical and chemical mechanisms affecting the suitability to diamondback moth (Plutella xylostella L.) of 16 food substrates, obtained by growing four different brassicas in the glasshouse or field and measuring the pest's performance on either leaf discs or a diet incorporating leaf powders. Leaves of Chinese cabbage and the cabbage cultivar 'Minicole' were, respectively, the most and least suitable leaves for the insect, but this ranking was reversed on artificial diet. Leaves of glasshouse-grown plants were more suitable than those of plants grown in the field. Differences in the suitability of leaves to diamondback moth appeared to be largely determined by leaf toughness and surface wax load. Concentrations of individual glucosinolates in the brassicas probably acted as phagostimulants, so increasing their intrinsic susceptibility to diamondback moth, but the effect of the physical factors appeared more important.
Abstract:
A novel partitioned least squares (PLS) algorithm is presented, in which estimates from several simple system models are combined by means of a Bayesian methodology of pooling partial knowledge. The method has the added advantage that, when the simple models are of a similar structure, it lends itself directly to parallel processing procedures, thereby speeding up the entire parameter estimation process by several factors.
Abstract:
A new parameter-estimation algorithm, which minimises the cross-validated prediction error for linear-in-the-parameter models, is proposed, based on stacked regression and an evolutionary algorithm. It is initially shown that cross-validation is very important for prediction in linear-in-the-parameter models using a criterion called the mean dispersion error (MDE). Stacked regression, which can be regarded as a sophisticated type of cross-validation, is then introduced based on an evolutionary algorithm, to produce a new parameter-estimation algorithm, which preserves the parsimony of a concise model structure that is determined using the forward orthogonal least-squares (OLS) algorithm. The PRESS prediction errors are used for cross-validation, and the sunspot and Canadian lynx time series are used to demonstrate the new algorithms.
Abstract:
Three new trinuclear copper(II) complexes, [(CuL1)₃(μ₃-OH)](ClO₄)₂·3.75H₂O (1), [(CuL2)₃(μ₃-OH)](ClO₄)₂ (2) and [(CuL3)₃(μ₃-OH)](BF₄)₂·0.5CH₃CN (3), have been synthesized from three tridentate Schiff bases HL1, HL2, and HL3 (HL1 = 2-[(2-amino-ethylimino)-methyl]-phenol, HL2 = 2-[(2-methylamino-ethylimino)-methyl]-phenol and HL3 = 2-[1-(2-dimethylamino-ethylimino)-ethyl]-phenol). The complexes are characterized by single-crystal X-ray diffraction analyses, IR, UV-vis and EPR spectroscopy, and variable-temperature magnetic measurements. All the compounds contain a partial cubane [Cu₃O₄] core consisting of the trinuclear unit [(CuL)₃(μ₃-OH)]²⁺ together with perchlorate or fluoroborate anions. In each of the complexes, the three copper atoms are five-coordinate with a distorted square-pyramidal geometry, except in complex 1, in which one of the Cu(II) ions of the trinuclear unit is six-coordinate, being additionally weakly coordinated to one of the perchlorate anions. Variable-temperature magnetic measurements and EPR spectra indicate an antiferromagnetic exchange coupling between the Cu(II) ions of complexes 1 and 2, while the coupling is ferromagnetic for complex 3. Experimental values have been fitted according to an isotropic exchange Hamiltonian. Calculations based on Density Functional Theory have also been performed in order to estimate the exchange coupling constants in these three complexes. Both sets of values indicate similar trends; in particular, the calculated J values establish a magneto-structural correlation with the Cu-O-Cu bond angle, in that the coupling is more ferromagnetic for smaller bond angle values.
Abstract:
In this paper we propose an efficient two-level model identification method for a large class of linear-in-the-parameters models from the observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularization parameters in the elastic net are optimized using a particle swarm optimization (PSO) algorithm at the upper level by minimizing the leave one out (LOO) mean square error (LOOMSE). Illustrative examples are included to demonstrate the effectiveness of the new approaches.
Abstract:
This paper investigates whether energy performance ratings, as measured by mandatory Energy Performance Certificates (EPCs), are reflected in the sale prices of residential properties. This is the first large-scale empirical study of this topic in the UK involving approximately 400,000 dwellings in the period from 1995 to 2011. Applying hedonic regression and an augmented repeat sales regression, we find a positive relationship between the energy efficiency rating of a dwelling and the transaction price per square metre. The price effects of superior energy performance tend to be higher for terraced dwellings and flats compared to detached and semi-detached dwellings. The evidence is less clear-cut for house price growth rates but remains supportive of an overall positive association. Overall, the results of this study appear to support the hypothesis that energy efficiency levels are reflected in UK house prices, at least in recent years.
Abstract:
We develop a new sparse kernel density estimator using a forward constrained regression framework, within which the nonnegative and summing-to-unity constraints of the mixing weights can easily be satisfied. Our main contribution is to derive a recursive algorithm to select significant kernels one at a time based on the minimum integrated square error (MISE) criterion for both the selection of kernels and the estimation of mixing weights. The proposed approach is simple to implement and the associated computational cost is very low. Specifically, the complexity of our algorithm is of the order of the number of training data N, which is much lower than the order of N² offered by the best existing sparse kernel density estimators. Numerical examples are employed to demonstrate that the proposed approach is effective in constructing sparse kernel density estimators with comparable accuracy to those of the classical Parzen window estimate and other existing sparse kernel density estimators.
Abstract:
In this paper, a modified algorithm is suggested for developing polynomial neural network (PNN) models. Optimal partial description (PD) modeling is introduced at each layer of the PNN expansion, a task accomplished using the orthogonal least squares (OLS) method. Based on the initial PD models determined by the polynomial order and the number of PD inputs, OLS selects the most significant regressor terms, reducing the output error variance. The method produces PNN models exhibiting a high level of accuracy and superior generalization capabilities. Additionally, parsimonious models are obtained, comprising a considerably smaller number of parameters compared to the ones generated by means of the conventional PNN algorithm. Three benchmark examples are elaborated, including modeling of the gas furnace process as well as the iris and wine classification problems. Extensive simulation results and comparison with other methods in the literature demonstrate the effectiveness of the suggested modeling approach.
Abstract:
An efficient two-level model identification method aiming at maximising a model's generalisation capability is proposed for a large class of linear-in-the-parameters models from the observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularisation parameters in the elastic net are optimised using a particle swarm optimisation (PSO) algorithm at the upper level by minimising the leave one out (LOO) mean square error (LOOMSE). There are two elements of original contributions. Firstly, an elastic net cost function is defined and applied based on orthogonal decomposition, which facilitates the automatic model structure selection process with no need of using a predetermined error tolerance to terminate the forward selection process. Secondly, it is shown that the LOOMSE based on the resultant ENOFR models can be analytically computed without actually splitting the data set, and the associated computational cost is small due to the ENOFR procedure. Consequently, a fully automated procedure is achieved without resort to any other validation data set for iterative model evaluation. Illustrative examples are included to demonstrate the effectiveness of the new approaches.
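One reason an elastic net penalty pairs naturally with an orthogonal decomposition is that the cost separates across orthogonal regressors, so each weight has a closed-form soft-thresholded solution. The sketch below shows this for a single regressor. The cost function used here, sum((y - g*w)^2) + lam1*|g| + lam2*g^2, is an assumed textbook form, since the abstract does not state the paper's exact definition; all names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def elastic_net_cost(g, w, y, lam1, lam2):
    """Assumed cost: squared error plus l1 and l2 penalties on the weight g."""
    sse = sum((yi - g * wi) ** 2 for wi, yi in zip(w, y))
    return sse + lam1 * abs(g) + lam2 * g * g


def elastic_net_weight(w, y, lam1, lam2):
    """Closed-form minimiser of elastic_net_cost over g (soft thresholding)."""
    c = dot(w, y)  # correlation of the regressor with the target
    k = dot(w, w)  # squared norm of the regressor
    shrunk = max(abs(c) - lam1 / 2.0, 0.0)
    if shrunk == 0.0:
        return 0.0  # the term is switched off entirely
    return math.copysign(shrunk, c) / (k + lam2)


# Illustrative values (assumed).
w = [1.0, 2.0, -1.0]
y = [2.0, 3.0, 0.5]
g_small_penalty = elastic_net_weight(w, y, lam1=1.0, lam2=0.5)
g_large_penalty = elastic_net_weight(w, y, lam1=100.0, lam2=0.5)
print(g_small_penalty, g_large_penalty)
```

A weight that is thresholded exactly to zero drops its term from the model, which is consistent with the abstract's remark that no predetermined error tolerance is needed to stop forward selection.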
Abstract:
This paper investigates whether energy performance ratings, as measured by mandatory Energy Performance Certificates (EPCs), are reflected in the sale prices of residential properties. This is the first large-scale empirical study of this topic in England involving 333,095 dwellings sold at least twice in the period from 1995 to 2012. Applying hedonic regression and an augmented repeat sales regression, we find a positive relationship between the energy efficiency rating of a dwelling and the transaction price per square metre. The price effects of superior energy performance tend to be higher for terraced dwellings and flats compared to detached and semi-detached dwellings. The evidence is less clear-cut for rates of house price growth but remains supportive of a positive association. Overall, the results of this study suggest that energy efficiency labels have a measurable and significant impact on house prices in England.
Abstract:
A new class of parameter estimation algorithms is introduced for Gaussian process regression (GPR) models. It is shown that the integration of the GPR model with the probability distance measures of (i) the integrated square error and (ii) the Kullback–Leibler (K–L) divergence is analytically tractable. An efficient coordinate descent algorithm is proposed to iteratively estimate the kernel width using golden section search, which includes a fast gradient descent algorithm as an inner loop to estimate the noise variance. Numerical examples are included to demonstrate the effectiveness of the new identification approaches.
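For readers unfamiliar with golden section search, here is a generic one-dimensional sketch of the technique. The objective below is a stand-in quadratic, not the GPR criterion from the abstract, and the function name, bracket, and tolerance are assumptions made for illustration.

```python
import math


def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimise a unimodal function f on [lo, hi] by golden section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # The minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # The minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


# Stand-in objective (assumed): a quadratic with its minimum at 2.
best_width = golden_section_minimize(lambda s: (s - 2.0) ** 2, 0.0, 5.0)
print(best_width)
```

Each iteration needs only one new function evaluation, which matters when every evaluation involves an inner optimisation loop.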
Abstract:
We use sunspot group observations from the Royal Greenwich Observatory (RGO) to investigate the effects of intercalibrating data from observers with different visual acuities. The tests are made by counting the number of groups R_B above a variable cut-off threshold of observed total whole-spot area (uncorrected for foreshortening) to simulate what a lower acuity observer would have seen. The synthesised annual means of R_B are then re-scaled to the full observed RGO group number R_A using a variety of regression techniques. It is found that a very high correlation between R_A and R_B (r_AB > 0.98) does not prevent large errors in the intercalibration (for example, sunspot maximum values can be over 30% too large even for such levels of r_AB). In generating the backbone sunspot number (R_BB), Svalgaard and Schatten (2015, this issue) force regression fits to pass through the scatter plot origin, which generates unreliable fits (the residuals do not form a normal distribution) and causes sunspot cycle amplitudes to be exaggerated in the intercalibrated data. It is demonstrated that the use of Quantile-Quantile (“Q-Q”) plots to test for a normal distribution is a useful indicator of erroneous and misleading regression fits. Ordinary least squares linear fits, not forced to pass through the origin, are sometimes reliable (although the optimum method used is shown to be different when matching peak and average sunspot group numbers). However, other fits are only reliable if non-linear regression is used. From these results it is entirely possible that the inflation of solar cycle amplitudes in the backbone group sunspot number as one goes back in time, relative to related solar-terrestrial parameters, is entirely caused by the use of inappropriate and non-robust regression techniques to calibrate the sunspot data.
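The effect of forcing a regression through the origin can be seen on a toy data set. The sketch below compares a through-origin fit with an ordinary least squares fit that includes an intercept. The numbers are synthetic and assumed for illustration only; they are not RGO sunspot data and do not reproduce the paper's analysis.

```python
def fit_through_origin(x, y):
    """Least squares slope for the model y = b*x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def fit_with_intercept(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    return ym - b * xm, b


# Synthetic data (assumed): the true relation has a non-zero intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 0.5 * xi for xi in x]

slope_origin = fit_through_origin(x, y)
intercept, slope = fit_with_intercept(x, y)

# Residuals of the through-origin fit do not average to zero.
resid_origin = [yi - slope_origin * xi for xi, yi in zip(x, y)]
mean_resid_origin = sum(resid_origin) / len(resid_origin)

print("through origin: slope =", slope_origin, "mean residual =", mean_resid_origin)
print("with intercept: a =", intercept, "b =", slope)
```

On this data the through-origin slope is roughly double the true slope, so values extrapolated to large x are inflated, and the biased residuals are the kind of symptom a Q-Q plot would expose.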
Abstract:
In this paper, we develop a novel constrained recursive least squares algorithm for adaptively combining a set of given multiple models. With data available in an online fashion, the linear combination coefficients of submodels are adapted via the proposed algorithm. We propose to minimize the mean square error with a forgetting factor, and apply the sum-to-one constraint to the combination parameters. Moreover, an l1-norm constraint on the combination parameters is also applied with the aim of achieving sparsity of multiple models, so that only a subset of models may be selected into the final model. Then a weighted l2-norm is applied as an approximation to the l1-norm term. As such, at each time step, a closed-form solution of the model combination parameters is available. The contribution of this paper is to derive the proposed constrained recursive least squares algorithm, which is computationally efficient by exploiting matrix theory. The effectiveness of the approach has been demonstrated using both simulated and real time series examples.
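A much-reduced sketch of the idea: with only two submodels, the sum-to-one constraint can be eliminated by writing the combination as theta*f1 + (1 - theta)*f2, which leaves a scalar recursive least squares problem with a forgetting factor. This is a simplification made for illustration and omits the l1-norm sparsity term and the general multi-model case described in the abstract; the function name, initial values, and signals are assumptions.

```python
import math


def rls_two_model_combination(y, f1, f2, forgetting=0.98, p0=1e6, theta0=0.5):
    """Track theta in y ~ theta*f1 + (1 - theta)*f2 by scalar RLS.

    Returns the weight estimate after each time step.
    """
    theta, p = theta0, p0
    history = []
    for yt, a, b in zip(y, f1, f2):
        u = a - b   # regressor after eliminating the sum-to-one constraint
        d = yt - b  # target after eliminating the constraint
        gain = p * u / (forgetting + u * p * u)
        theta += gain * (d - u * theta)
        p = (p - gain * u * p) / forgetting
        history.append(theta)
    return history


# Synthetic submodel outputs (assumed) and a target mixing them 0.7 / 0.3.
f1 = [math.sin(0.3 * t) for t in range(50)]
f2 = [math.cos(0.2 * t) for t in range(50)]
y = [0.7 * a + 0.3 * b for a, b in zip(f1, f2)]

weights = rls_two_model_combination(y, f1, f2)
print("final weight on model 1:", weights[-1])
```

The second weight is recovered as 1 - theta, so the two weights sum to one by construction at every time step.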