7 resultados para imbalanced data

em CentAUR: Central Archive University of Reading - UK


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many kernel classifier construction algorithms adopt classification accuracy as performance metrics in model evaluation. Moreover, equal weighting is often applied to each data sample in parameter estimation. These modeling practices often become problematic if the data sets are imbalanced. We present a kernel classifier construction algorithm using orthogonal forward selection (OFS) in order to optimize the model generalization for imbalanced two-class data sets. This kernel classifier identification algorithm is based on a new regularized orthogonal weighted least squares (ROWLS) estimator and the model selection criterion of maximal leave-one-out area under curve (LOO-AUC) of the receiver operating characteristics (ROCs). It is shown that, owing to the orthogonalization procedure, the LOO-AUC can be calculated via an analytic formula based on the new regularized orthogonal weighted least squares parameter estimator, without actually splitting the estimation data set. The proposed algorithm can achieve minimal computational expense via a set of forward recursive updating formula in searching model terms with maximal incremental LOO-AUC value. Numerical examples are used to demonstrate the efficacy of the algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a new class of neurofuzzy construction algorithms with the aim of maximizing generalization capability specifically for imbalanced data classification problems based on leave-one-out (LOO) cross validation. The algorithms are in two stages, first an initial rule base is constructed based on estimating the Gaussian mixture model with analysis of variance decomposition from input data; the second stage carries out the joint weighted least squares parameter estimation and rule selection using orthogonal forward subspace selection (OFSS)procedure. We show how different LOO based rule selection criteria can be incorporated with OFSS, and advocate either maximizing the leave-one-out area under curve of the receiver operating characteristics, or maximizing the leave-one-out Fmeasure if the data sets exhibit imbalanced class distribution. Extensive comparative simulations illustrate the effectiveness of the proposed algorithms.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The combination of the synthetic minority oversampling technique (SMOTE) and the radial basis function (RBF) classifier is proposed to deal with classification for imbalanced two-class data. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, the SMOTE is applied to generate synthetic instances for the positive class to balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier structure and the parameters of RBF kernels are determined using a particle swarm optimization algorithm based on the criterion of minimizing the leave-one-out misclassification rate. The experimental results on both simulated and real imbalanced data sets are presented to demonstrate the effectiveness of our proposed algorithm.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This contribution proposes a powerful technique for two-class imbalanced classification problems by combining the synthetic minority over-sampling technique (SMOTE) and the particle swarm optimisation (PSO) aided radial basis function (RBF) classifier. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, the SMOTE is applied to generate synthetic instances for the positive class to balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier's structure and the parameters of RBF kernels are determined using a PSO algorithm based on the criterion of minimising the leave-one-out misclassification rate. The experimental results obtained on a simulated imbalanced data set and three real imbalanced data sets are presented to demonstrate the effectiveness of our proposed algorithm.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This contribution proposes a novel probability density function (PDF) estimation based over-sampling (PDFOS) approach for two-class imbalanced classification problems. The classical Parzen-window kernel function is adopted to estimate the PDF of the positive class. Then according to the estimated PDF, synthetic instances are generated as the additional training data. The essential concept is to re-balance the class distribution of the original imbalanced data set under the principle that synthetic data sample follows the same statistical properties. Based on the over-sampled training data, the radial basis function (RBF) classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier’s structure and the parameters of RBF kernels are determined using a particle swarm optimisation algorithm based on the criterion of minimising the leave-one-out misclassification rate. The effectiveness of the proposed PDFOS approach is demonstrated by the empirical study on several imbalanced data sets.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The assimilation of measurements from the stratosphere and mesosphere is becoming increasingly common as the lids of weather prediction and climate models rise into the mesosphere and thermosphere. However, the dynamics of the middle atmosphere pose specific challenges to the assimilation of measurements from this region. Forecast-error variances can be very large in the mesosphere and this can render assimilation schemes very sensitive to the details of the specification of forecast error correlations. An example is shown where observations in the stratosphere are able to produce increments in the mesosphere. Such sensitivity of the assimilation scheme to misspecification of covariances can also amplify any existing biases in measurements or forecasts. Since both models and measurements of the middle atmosphere are known to have biases, the separation of these sources of bias remains a issue. Finally, well-known deficiencies of assimilation schemes, such as the production of imbalanced states or the assumption of zero bias, are proposed explanations for the inaccurate transport resulting from assimilated winds. The inability of assimilated winds to accurately transport constituents in the middle atmosphere remains a fundamental issue limiting the use of assimilated products for applications involving longer time-scales.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

There has been a significant increase in the skill and resolution of numerical weather prediction models (NWPs) in recent decades, extending the time scales of useful weather predictions. The land-surface models (LSMs) of NWPs are often employed in hydrological applications, which raises the question of how hydrologically representative LSMs really are. In this paper, precipitation (P), evaporation (E) and runoff (R) from the European Centre for Medium-Range Weather Forecasts (ECMWF) global models were evaluated against observational products. The forecasts differ substantially from observed data for key hydrological variables. In addition, imbalanced surface water budgets, mostly caused by data assimilation, were found on both global (P-E) and basin scales (P-E-R), with the latter being more important. Modeled surface fluxes should be used with care in hydrological applications and further improvement in LSMs in terms of process descriptions, resolution and estimation of uncertainties is needed to accurately describe the land-surface water budgets.