930 results for kernel estimates
Abstract:
A unified approach is proposed for sparse kernel data modelling that includes regression and classification as well as probability density function estimation. The orthogonal-least-squares forward selection method based on the leave-one-out test criterion is presented within this unified data-modelling framework to construct sparse kernel models that generalise well. Examples from regression, classification and density estimation applications are used to illustrate the effectiveness of this generic sparse kernel data modelling approach.
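The forward-selection-by-LOO idea this abstract describes can be condensed into a few lines. Below is a minimal sketch, assuming Gaussian kernels centred on the training inputs, a single user-set width, and a direct hat-matrix computation of the LOO (PRESS) error in place of the efficient orthogonal recursions used in the paper; the stopping rule (stop when the LOO score no longer improves) follows the abstract.

```python
import numpy as np

def gaussian_design(X, centres, width):
    """Candidate regressor matrix: one Gaussian kernel per centre; X is (n, d)."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def loo_press(P, y):
    """LOO mean-square error (PRESS) of least squares on the design P."""
    H = P @ np.linalg.pinv(P)                  # hat matrix of the LS fit
    h = np.clip(np.diag(H), 0.0, 0.999)
    e = y - H @ y                              # ordinary residuals
    return np.mean((e / (1.0 - h)) ** 2)       # LOO residuals in closed form

def forward_select(X, y, width=1.0, max_terms=10):
    Phi = gaussian_design(X, X, width)         # candidates centred on the data
    chosen, best = [], np.inf
    for _ in range(max_terms):
        remaining = [j for j in range(Phi.shape[1]) if j not in chosen]
        scores = [loo_press(Phi[:, chosen + [j]], y) for j in remaining]
        if min(scores) >= best:
            break                              # LOO score stopped improving
        best = min(scores)
        chosen.append(remaining[int(np.argmin(scores))])
    return chosen, best
```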
Abstract:
Using the classical Parzen window estimate as the target function, kernel density estimation is formulated as a regression problem and the orthogonal forward regression technique is adopted to construct sparse kernel density estimates. The proposed algorithm incrementally minimises a leave-one-out test error score to select a sparse kernel model, and a local regularisation method is incorporated into the density construction process to further enforce sparsity. The kernel weights are finally updated using the multiplicative nonnegative quadratic programming algorithm, which has the ability to reduce the model size further. Except for the kernel width, the proposed algorithm has no other parameters that need tuning, and the user is not required to specify any additional criterion to terminate the density construction procedure. Two examples are used to demonstrate the ability of this regression-based approach to effectively construct a sparse kernel density estimate with comparable accuracy to that of the full-sample optimised Parzen window density estimate.
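The final weight-update step lends itself to a compact illustration. This is a minimal sketch of a multiplicative nonnegative quadratic programming (MNQP) update of the kernel weights, assuming a strictly positive matrix B of kernel inner products and positive targets v; the plain renormalisation used to keep the weights summing to one and the pruning threshold are simplifying assumptions, not the paper's exact scheme.

```python
import numpy as np

def mnqp_weights(B, v, iters=200, prune_tol=1e-6):
    """Approximately minimise 0.5*w@B@w - v@w subject to w >= 0, sum(w) = 1.

    B : (m, m) positive matrix of kernel inner products (all entries > 0)
    v : (m,)   positive targets (e.g. correlations with the Parzen estimate)
    """
    w = np.full(len(v), 1.0 / len(v))
    for _ in range(iters):
        w *= v / (B @ w)        # multiplicative update keeps w nonnegative
        w /= w.sum()            # re-impose the unit-sum (density) constraint
    w[w < prune_tol] = 0.0      # tiny weights drop out: the model shrinks
    return w / w.sum()
```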
Abstract:
Exact error estimates for evaluating multi-dimensional integrals are considered. An estimate is called exact if the rates of convergence for the lower- and upper-bound estimates coincide. An algorithm with such an exact rate is called optimal; its rate of convergence cannot be improved. The existence of exact estimates and optimal algorithms is discussed for some functional spaces that define the regularity of the integrand. Data classes important for practical computations are considered: classes of functions with bounded derivatives and classes satisfying Hölder-type conditions. The aim of the paper is to analyze the performance of two optimal classes of algorithms, deterministic and randomized, for computing multi-dimensional integrals. It is also shown how the smoothness of the integrand can be exploited to construct better randomized algorithms.
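To make the smoothness point concrete, here is a minimal sketch contrasting crude Monte Carlo with a stratified (randomised midpoint) rule on the unit cube; for integrands with bounded first derivatives, stratification improves the convergence rate from roughly n^(-1/2) to n^(-1/2 - 1/d). The test integrand and sample sizes are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.cos(x.sum(axis=-1))          # smooth test integrand on [0,1]^d

def crude_mc(f, d, n):
    """Plain Monte Carlo: n uniform samples, error ~ n**-0.5."""
    return f(rng.random((n, d))).mean()

def stratified_mc(f, d, k):
    """One random point per cell of a k^d grid (n = k**d samples in total)."""
    grids = np.stack(np.meshgrid(*[np.arange(k)] * d, indexing="ij"), -1)
    cells = grids.reshape(-1, d)
    pts = (cells + rng.random(cells.shape)) / k
    return f(pts).mean()

d, k = 3, 10                                   # n = 1000 samples in both cases
print(crude_mc(f, d, k ** d), stratified_mc(f, d, k))
```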
Abstract:
The note proposes an efficient nonlinear identification algorithm that combines locally regularized orthogonal least squares (LROLS) model selection with a D-optimality experimental design. The proposed algorithm aims to achieve maximized model robustness and sparsity via two effective and complementary approaches. The LROLS method alone is capable of producing a very parsimonious model with excellent generalization performance. The D-optimality design criterion further enhances the model efficiency and robustness. An added advantage is that the user only needs to specify a weighting for the D-optimality cost in the combined model selection criterion, and the entire model construction procedure then becomes automatic. The value of this weighting does not critically influence the model selection procedure, and it can be chosen with ease from a wide range of values.
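A minimal sketch of one way such a combined criterion can work follows: each candidate's least-squares error-reduction ratio is augmented with a D-optimality term beta*log(kappa), where kappa is the energy of the orthogonalised regressor, so a single weighting beta governs the trade-off and selection stops automatically when the combined score turns non-positive. The regularisation details here are simplified assumptions, not the note's exact LROLS formulation.

```python
import numpy as np

def select_dopt(Phi, y, beta=1e-2, lam=1e-6, max_terms=10):
    W = Phi.astype(float).copy()               # orthogonalised candidate pool
    chosen = []
    for _ in range(max_terms):
        kappa = (W ** 2).sum(axis=0) + 1e-12            # regressor energies
        g = (W.T @ y) / (kappa + lam)                   # regularised weights
        err_red = g ** 2 * kappa / (y @ y)              # error-reduction ratio
        score = err_red + beta * np.log(kappa)          # combined criterion
        score[chosen] = -np.inf
        k = int(np.argmax(score))
        if score[k] <= 0.0:
            break                                       # automatic termination
        chosen.append(k)
        wk = W[:, k] / np.sqrt(kappa[k])                # Gram-Schmidt step:
        W -= np.outer(wk, wk @ W)                       # deflate the pool
        W[:, k] = 0.0
    return chosen
```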
Abstract:
Nonlinear system identification is considered using a generalized kernel regression model. Unlike the standard kernel model, which employs a fixed common variance for all the kernel regressors, each kernel regressor in the generalized kernel model has an individually tuned diagonal covariance matrix that is determined by maximizing the correlation between the training data and the regressor using a repeated guided random search based on boosting optimization. An efficient construction algorithm based on orthogonal forward regression with a leave-one-out (LOO) test statistic and local regularization (LR) is then used to select a parsimonious generalized kernel regression model from the resulting full regression matrix. The proposed modeling algorithm is fully automatic and the user is not required to specify any criterion to terminate the construction procedure. Experimental results involving two real data sets demonstrate the effectiveness of the proposed nonlinear system identification approach.
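The individually tuned covariance step can be illustrated directly. This is a minimal sketch of fitting one generalized kernel regressor: the centre and diagonal covariance are tuned by repeated random search to maximise the absolute correlation between the regressor's response and the training target; the search distribution and trial budget are illustrative stand-ins for the boosting-based guided search described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)

def regressor(X, centre, diag_var):
    """Gaussian regressor with a per-axis (diagonal-covariance) variance."""
    return np.exp(-0.5 * (((X - centre) ** 2) / diag_var).sum(axis=1))

def tune_kernel(X, y, trials=500):
    best, best_corr = None, -np.inf
    for _ in range(trials):
        centre = X[rng.integers(len(X))]                   # centre on a sample
        diag_var = 10.0 ** rng.uniform(-1, 1, X.shape[1])  # random per-axis scale
        phi = regressor(X, centre, diag_var)
        c = abs(np.corrcoef(phi, y)[0, 1])
        if c > best_corr:
            best, best_corr = (centre, diag_var), c
    return best, best_corr
```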
Abstract:
A greedy technique is proposed to construct parsimonious kernel classifiers using the orthogonal forward selection method and boosting based on the Fisher ratio as the class separability measure. Unlike most kernel classification methods, which restrict kernel means to the training input data and use a fixed common variance for all the kernel terms, the proposed technique can tune both the mean vector and diagonal covariance matrix of each individual kernel by incrementally maximizing the Fisher ratio for class separability. An efficient weighted optimization method is developed based on boosting to append kernels one by one in an orthogonal forward selection procedure. Experimental results obtained using this construction technique demonstrate that it offers a viable alternative to existing state-of-the-art kernel modeling methods for constructing sparse Gaussian radial basis function network classifiers that generalize well.
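A minimal sketch of the Fisher ratio used as the class-separability score is given below; the candidate kernel's response is treated as a one-dimensional feature, and in the full procedure each boosting round would append the candidate whose (orthogonalised) response maximises this ratio. The candidate parameterisation is an illustrative assumption.

```python
import numpy as np

def fisher_ratio(phi, labels):
    """(between-class scatter) / (within-class scatter) of the 1-D feature phi."""
    a, b = phi[labels == 1], phi[labels == 0]
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var() + 1e-12)

def kernel_response(X, mu, var):
    """Candidate Gaussian kernel with tunable mean mu and per-axis variances var."""
    return np.exp(-0.5 * (((X - mu) ** 2) / var).sum(axis=1))
```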
Abstract:
We propose a simple yet computationally efficient construction algorithm for two-class kernel classifiers. In order to optimise the classifier's generalisation capability, an orthogonal forward selection procedure is used to select kernels one by one by directly minimising the leave-one-out (LOO) misclassification rate. It is shown that the computation of the LOO misclassification rate is very efficient owing to orthogonalisation. Examples are used to demonstrate that the proposed algorithm is a viable alternative for constructing sparse two-class kernel classifiers in terms of performance and computational efficiency.
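The efficient LOO computation admits a compact closed form. Below is a minimal sketch assuming a least-squares kernel classifier with labels in {-1, +1}: the leave-one-out decision value satisfies f_loo_i = (f_i - h_ii*y_i)/(1 - h_ii), so the LOO misclassification rate requires no data splitting. A direct hat-matrix calculation replaces the paper's orthogonal recursions for clarity.

```python
import numpy as np

def loo_error_rate(P, y):
    """y in {-1, +1}; columns of P are the selected kernel responses."""
    H = P @ np.linalg.pinv(P)                     # hat matrix of the LS fit
    f = H @ y
    h = np.clip(np.diag(H), 0.0, 0.999)
    f_loo = (f - h * y) / (1.0 - h)               # LOO decision value per sample
    return np.mean(y * f_loo < 0)                 # LOO misclassification rate
```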
Abstract:
Many kernel classifier construction algorithms adopt classification accuracy as the performance metric in model evaluation. Moreover, equal weighting is often applied to each data sample in parameter estimation. These modeling practices often become problematic if the data sets are imbalanced. We present a kernel classifier construction algorithm using orthogonal forward selection (OFS) in order to optimize the model generalization for imbalanced two-class data sets. This kernel classifier identification algorithm is based on a new regularized orthogonal weighted least squares (ROWLS) estimator and a model selection criterion of maximal leave-one-out area under the curve (LOO-AUC) of the receiver operating characteristic (ROC). It is shown that, owing to the orthogonalization procedure, the LOO-AUC can be calculated via an analytic formula based on the new ROWLS parameter estimator, without actually splitting the estimation data set. The proposed algorithm achieves minimal computational expense via a set of forward recursive updating formulae when searching for model terms with maximal incremental LOO-AUC value. Numerical examples are used to demonstrate the efficacy of the algorithm.
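A minimal sketch of the LOO-AUC score follows: leave-one-out decision values come from a class-weighted least-squares fit via the hat-matrix identity (the weighting is what re-balances the two classes), and the AUC is then the rank statistic of positive versus negative scores. The specific weighting scheme and the direct pairwise AUC computation are illustrative simplifications of the ROWLS estimator and its recursive formulae.

```python
import numpy as np

def loo_auc(P, y):
    """y in {0, 1}; columns of P are the responses of the selected kernels."""
    w = np.where(y == 1, 0.5 / max(y.sum(), 1), 0.5 / max((1 - y).sum(), 1))
    sw = np.sqrt(w)
    Pw, t = P * sw[:, None], np.where(y == 1, 1.0, -1.0) * sw
    H = Pw @ np.linalg.pinv(Pw)                 # hat matrix of the weighted fit
    h = np.clip(np.diag(H), 0.0, 0.999)
    f = (H @ t - h * t) / (1.0 - h) / sw        # LOO prediction of the +/-1 label
    pos, neg = f[y == 1], f[y == 0]
    diff = pos[:, None] - neg[None, :]          # all positive-negative pairs
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()   # rank-based AUC
```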
Abstract:
Using the classical Parzen window (PW) estimate as the target function, the sparse kernel density estimator is constructed in a forward-constrained regression (FCR) manner. The proposed algorithm selects significant kernels one at a time, while the leave-one-out (LOO) test score is minimized subject to a simple positivity constraint at each forward stage. The model parameter estimation at each forward stage is simply the solution of a jackknife parameter estimator for a single parameter, subject to the same positivity constraint check. For each selected kernel, the associated kernel width is updated via the Gauss-Newton method with the model parameter estimate fixed. The proposed approach is simple to implement and the associated computational cost is very low. Numerical examples are employed to demonstrate the efficacy of the proposed approach.
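One FCR stage can be written down directly, as the sketch below shows: the updated model is the convex combination (1 - lam)*f_old + lam*phi_new, so non-negativity and unit mass are preserved by construction, and the single mixing parameter lam is fitted to the Parzen window target by a one-dimensional constrained least-squares step. The LOO/jackknife scoring of each stage is condensed here to a plain residual fit for brevity.

```python
import numpy as np

def fcr_stage(f_old, phi_new, parzen_target):
    """All arguments are densities evaluated at the training points."""
    r = parzen_target - f_old            # current residual against the target
    d = phi_new - f_old                  # direction of the proposed update
    lam = float(r @ d) / float(d @ d)    # 1-D least-squares solution
    lam = min(max(lam, 0.0), 1.0)        # positivity: clip to [0, 1]
    return (1.0 - lam) * f_old + lam * phi_new, lam
```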
Abstract:
Most of the dissolved organic carbon (DOC) exported from catchments is transported during storm events. Accurate assessments of DOC fluxes are essential to understand long-term trends in the transport of DOC from terrestrial to aquatic systems, and also the loss of carbon from peatlands, in order to determine changes in the source/sink status of peatland carbon stores. However, many long-term monitoring programmes collect water samples at intervals (e.g. weekly/monthly) longer than the duration of a typical storm event (typically <1–2 days). As widespread observations in catchments dominated by organo-mineral soils have shown that both the concentration and flux of DOC increase during storm events, lower-frequency monitoring could result in substantial underestimation of DOC flux because the most dynamic periods of transport are missed. However, our intensive monitoring study in a UK upland peatland catchment showed a contrasting response to these previous studies. Our results showed that (i) DOC concentrations decreased during autumn storm events and showed a poor relationship with flow during other seasons; and that (ii) this decrease in concentrations during autumn storms caused DOC flux estimates based on weekly monitoring data to be over-estimated, rather than under-estimated, because of over-estimation rather than under-estimation of the flow-weighted mean concentration used in flux calculations. However, as DOC flux is ultimately controlled by discharge volume, and therefore rainfall, and the magnitude of change in discharge was greater than the magnitude of decline in concentrations, DOC flux increased during individual storm events. The implications for long-term DOC trends are therefore contradictory, as increased rainfall could increase flux but cause an overall decrease in DOC concentrations from peatland streams. Care needs to be taken when interpreting long-term trends in DOC flux rather than concentration; as flux is calculated from discharge estimates, and discharge is controlled by rainfall, DOC flux and rainfall/discharge will always be well correlated.
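The over-estimation mechanism is easy to see numerically. The sketch below uses hypothetical numbers: if infrequent sampling misses the concentration drop on high-flow days, the flow-weighted mean concentration, and hence the estimated DOC flux, is biased high even though the true flux still rises with discharge.

```python
import numpy as np

q = np.array([1.0, 1.2, 9.0, 7.0, 1.5])     # daily discharge (arbitrary units)
c = np.array([12.0, 12.0, 7.0, 8.0, 11.0])  # DOC concentration falls on high-flow days

true_flux = float((q * c).sum())             # integrate concentration * discharge
sampled_c = c[0]                             # a single pre-storm sample misses the drop
est_flux = sampled_c * float(q.sum())        # flux from a biased flow-weighted mean
print(true_flux, est_flux)                   # est_flux over-states the DOC export
```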
Abstract:
We analyze a fully discrete spectral method for the numerical solution of the initial- and periodic boundary-value problem for two nonlinear, nonlocal, dispersive wave equations, the Benjamin–Ono and the Intermediate Long Wave equations. The equations are discretized in space by the standard Fourier–Galerkin spectral method and in time by the explicit leap-frog scheme. For the resulting fully discrete, conditionally stable scheme we prove an L2-error bound of spectral accuracy in space and of second-order accuracy in time.
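The scheme being analysed is straightforward to state in code. Below is a minimal sketch of the Fourier–Galerkin/leap-frog discretisation for the Benjamin–Ono equation u_t + u*u_x + H u_xx = 0 on a 2*pi-periodic domain, taking the Hilbert transform H to have Fourier symbol -i*sgn(k); the grid size, time step (chosen to respect the conditional-stability restriction), initial data, and the omission of dealiasing are all illustrative simplifications.

```python
import numpy as np

N = 128                                        # Fourier modes (illustrative)
dt = 1e-4                                      # small enough for leap-frog stability
x = 2 * np.pi * np.arange(N) / N
k = np.fft.fftfreq(N, d=1.0 / N)               # integer wavenumbers

def rhs(u_hat):
    u = np.real(np.fft.ifft(u_hat))
    ux = np.real(np.fft.ifft(1j * k * u_hat))
    nonlinear = np.fft.fft(u * ux)             # Galerkin treatment of u*u_x
    dispersive = 1j * np.abs(k) * k * u_hat    # Fourier symbol of H u_xx
    return -(nonlinear + dispersive)

u_hat_prev = np.fft.fft(np.cos(x))             # smooth periodic initial data
u_hat = u_hat_prev + dt * rhs(u_hat_prev)      # one Euler step to bootstrap
for _ in range(1000):                          # explicit leap-frog time stepping
    u_hat_prev, u_hat = u_hat, u_hat_prev + 2 * dt * rhs(u_hat)
```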
Abstract:
Ten projects constructed in Ghana between 2003 and 2010 are examined and analysed to ascertain the reliability of the cost estimates provided for the projects. Cost estimates for five of the projects were calculated by consultants and cost estimates for the five remaining projects were calculated by contractors. Cost estimates prepared by contractors seemed to be closer to actual costs than estimates calculated by consultants. Projects estimated by consultants experienced an average cost overrun of 40% and time overrun of 62%, whereas projects priced by contractors experienced an average cost overrun of 6% and time overrun of 41%. Contractors appeared to have a better understanding of the actual construction processes and a clearer expectation of the needs of the client, and hence an ability to produce estimates that were closer to reality. Construction clients in Ghana should rely on contractors for more realistic cost estimates, as estimates by consultants may be inaccurate. Where consultants are employed, an allowance of up to 40% should be added to the estimated costs as a margin for inaccuracy.
Abstract:
This study presents a systematic and quantitative analysis of the effect of inhomogeneous surface albedo on shortwave cloud absorption estimates. We used 3D radiative transfer modeling over a checkerboard surface albedo to calculate cloud absorption. We have found that accounting for surface heterogeneity enhances cloud absorption. However, the enhancement is not sufficient to explain the reported difference between measured and modeled cloud absorption.