18 resultados para imbalanced class problem

em CentAUR: Central Archive University of Reading - UK


Relevância:

90.00% 90.00%

Publicador:

Resumo:

We propose a new class of neurofuzzy construction algorithms with the aim of maximizing generalization capability specifically for imbalanced data classification problems based on leave-one-out (LOO) cross validation. The algorithms are in two stages, first an initial rule base is constructed based on estimating the Gaussian mixture model with analysis of variance decomposition from input data; the second stage carries out the joint weighted least squares parameter estimation and rule selection using orthogonal forward subspace selection (OFSS)procedure. We show how different LOO based rule selection criteria can be incorporated with OFSS, and advocate either maximizing the leave-one-out area under curve of the receiver operating characteristics, or maximizing the leave-one-out Fmeasure if the data sets exhibit imbalanced class distribution. Extensive comparative simulations illustrate the effectiveness of the proposed algorithms.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Recent studies showed that features extracted from brain MRIs can well discriminate Alzheimer’s disease from Mild Cognitive Impairment. This study provides an algorithm that sequentially applies advanced feature selection methods for findings the best subset of features in terms of binary classification accuracy. The classifiers that provided the highest accuracies, have been then used for solving a multi-class problem by the one-versus-one strategy. Although several approaches based on Regions of Interest (ROIs) extraction exist, the prediction power of features has not yet investigated by comparing filter and wrapper techniques. The findings of this work suggest that (i) the IntraCranial Volume (ICV) normalization can lead to overfitting and worst the accuracy prediction of test set and (ii) the combined use of a Random Forest-based filter with a Support Vector Machines-based wrapper, improves accuracy of binary classification.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Many kernel classifier construction algorithms adopt classification accuracy as performance metrics in model evaluation. Moreover, equal weighting is often applied to each data sample in parameter estimation. These modeling practices often become problematic if the data sets are imbalanced. We present a kernel classifier construction algorithm using orthogonal forward selection (OFS) in order to optimize the model generalization for imbalanced two-class data sets. This kernel classifier identification algorithm is based on a new regularized orthogonal weighted least squares (ROWLS) estimator and the model selection criterion of maximal leave-one-out area under curve (LOO-AUC) of the receiver operating characteristics (ROCs). It is shown that, owing to the orthogonalization procedure, the LOO-AUC can be calculated via an analytic formula based on the new regularized orthogonal weighted least squares parameter estimator, without actually splitting the estimation data set. The proposed algorithm can achieve minimal computational expense via a set of forward recursive updating formula in searching model terms with maximal incremental LOO-AUC value. Numerical examples are used to demonstrate the efficacy of the algorithm.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This contribution proposes a powerful technique for two-class imbalanced classification problems by combining the synthetic minority over-sampling technique (SMOTE) and the particle swarm optimisation (PSO) aided radial basis function (RBF) classifier. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, the SMOTE is applied to generate synthetic instances for the positive class to balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier's structure and the parameters of RBF kernels are determined using a PSO algorithm based on the criterion of minimising the leave-one-out misclassification rate. The experimental results obtained on a simulated imbalanced data set and three real imbalanced data sets are presented to demonstrate the effectiveness of our proposed algorithm.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We consider the numerical treatment of second kind integral equations on the real line of the form ∅(s) = ∫_(-∞)^(+∞)▒〖κ(s-t)z(t)ϕ(t)dt,s=R〗 (abbreviated ϕ= ψ+K_z ϕ) in which K ϵ L_1 (R), z ϵ L_∞ (R) and ψ ϵ BC(R), the space of bounded continuous functions on R, are assumed known and ϕ ϵ BC(R) is to be determined. We first derive sharp error estimates for the finite section approximation (reducing the range of integration to [-A, A]) via bounds on (1-K_z )^(-1)as an operator on spaces of weighted continuous functions. Numerical solution by a simple discrete collocation method on a uniform grid on R is then analysed: in the case when z is compactly supported this leads to a coefficient matrix which allows a rapid matrix-vector multiply via the FFT. To utilise this possibility we propose a modified two-grid iteration, a feature of which is that the coarse grid matrix is approximated by a banded matrix, and analyse convergence and computational cost. In cases where z is not compactly supported a combined finite section and two-grid algorithm can be applied and we extend the analysis to this case. As an application we consider acoustic scattering in the half-plane with a Robin or impedance boundary condition which we formulate as a boundary integral equation of the class studied. Our final result is that if z (related to the boundary impedance in the application) takes values in an appropriate compact subset Q of the complex plane, then the difference between ϕ(s)and its finite section approximation computed numerically using the iterative scheme proposed is ≤C_1 [kh log⁡〖(1⁄kh)+(1-Θ)^((-1)⁄2) (kA)^((-1)⁄2) 〗 ] in the interval [-ΘA,ΘA](Θ<1) for kh sufficiently small, where k is the wavenumber and h the grid spacing. Moreover this numerical approximation can be computed in ≤C_2 N log⁡N operations, where N = 2A/h is the number of degrees of freedom. The values of the constants C1 and C2 depend only on the set Q and not on the wavenumber k or the support of z.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This contribution proposes a novel probability density function (PDF) estimation based over-sampling (PDFOS) approach for two-class imbalanced classification problems. The classical Parzen-window kernel function is adopted to estimate the PDF of the positive class. Then according to the estimated PDF, synthetic instances are generated as the additional training data. The essential concept is to re-balance the class distribution of the original imbalanced data set under the principle that synthetic data sample follows the same statistical properties. Based on the over-sampled training data, the radial basis function (RBF) classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier’s structure and the parameters of RBF kernels are determined using a particle swarm optimisation algorithm based on the criterion of minimising the leave-one-out misclassification rate. The effectiveness of the proposed PDFOS approach is demonstrated by the empirical study on several imbalanced data sets.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Objective: To introduce a new approach to problem based learning (PBL) used in the context of medicinal chemistry practical class teaching pharmacy students. Design: The described chemistry practical is based on independent studies by small groups of undergraduate students (4-5), who design their own practical work taking relevant professional standards into account. Students are carefully guided by feedback and acquire a set of skills important to their future profession as healthcare professionals. This model has been tailored to the application of PBL in a chemistry practical class setting for a large student cohort (150 students). Assessment: The achievement of learning outcomes is based on the submission of relevant documentation including a certificate of analysis, in addition to peer assessment. Some of the learning outcomes are also assessed in the final written examination at the end of the academic year. Conclusion: The described design of a novel PBL chemistry laboratory course for pharmacy students has been found to be successful. Self-reflective learning and engagement with feedback were encouraged, and students enjoyed the challenging learning experience. Skills that are highly essential for the students’ future careers as healthcare professionals are promoted.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, an improved stochastic discrimination (SD) is introduced to reduce the error rate of the standard SD in the context of multi-class classification problem. The learning procedure of the improved SD consists of two stages. In the first stage, a standard SD, but with shorter learning period is carried out to identify an important space where all the misclassified samples are located. In the second stage, the standard SD is modified by (i) restricting sampling in the important space; and (ii) introducing a new discriminant function for samples in the important space. It is shown by mathematical derivation that the new discriminant function has the same mean, but smaller variance than that of standard SD for samples in the important space. It is also analyzed that the smaller the variance of the discriminant function, the lower the error rate of the classifier. Consequently, the proposed improved SD improves standard SD by its capability of achieving higher classification accuracy. Illustrative examples axe provided to demonstrate the effectiveness of the proposed improved SD.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We analyze a fully discrete spectral method for the numerical solution of the initial- and periodic boundary-value problem for two nonlinear, nonlocal, dispersive wave equations, the Benjamin–Ono and the Intermediate Long Wave equations. The equations are discretized in space by the standard Fourier–Galerkin spectral method and in time by the explicit leap-frog scheme. For the resulting fully discrete, conditionally stable scheme we prove an L2-error bound of spectral accuracy in space and of second-order accuracy in time.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A technique is derived for solving a non-linear optimal control problem by iterating on a sequence of simplified problems in linear quadratic form. The technique is designed to achieve the correct solution of the original non-linear optimal control problem in spite of these simplifications. A mixed approach with a discrete performance index and continuous state variable system description is used as the basis of the design, and it is shown how the global problem can be decomposed into local sub-system problems and a co-ordinator within a hierarchical framework. An analysis of the optimality and convergence properties of the algorithm is presented and the effectiveness of the technique is demonstrated using a simulation example with a non-separable performance index.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The combination of the synthetic minority oversampling technique (SMOTE) and the radial basis function (RBF) classifier is proposed to deal with classification for imbalanced two-class data. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, the SMOTE is applied to generate synthetic instances for the positive class to balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier structure and the parameters of RBF kernels are determined using a particle swarm optimization algorithm based on the criterion of minimizing the leave-one-out misclassification rate. The experimental results on both simulated and real imbalanced data sets are presented to demonstrate the effectiveness of our proposed algorithm.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A three-point difference scheme recently proposed in Ref. 1 for the numerical solution of a class of linear, singularly perturbed, two-point boundary-value problems is investigated. The scheme is derived from a first-order approximation to the original problem with a small deviating argument. It is shown here that, in the limit, as the deviating argument tends to zero, the difference scheme converges to a one-sided approximation to the original singularly perturbed equation in conservation form. The limiting scheme is shown to be stable on any uniform grid. Therefore, no advantage arises from using the deviating argument, and the most accurate and efficient results are obtained with the deviation at its zero limit.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose a Nystr¨om/product integration method for a class of second kind integral equations on the real line which arise in problems of two-dimensional scalar and elastic wave scattering by unbounded surfaces. Stability and convergence of the method is established with convergence rates dependent on the smoothness of components of the kernel. The method is applied to the problem of acoustic scattering by a sound soft one-dimensional surface which is the graph of a function f, and superalgebraic convergence is established in the case when f is infinitely smooth. Numerical results are presented illustrating this behavior for the case when f is periodic (the diffraction grating case). The Nystr¨om method for this problem is stable and convergent uniformly with respect to the period of the grating, in contrast to standard integral equation methods for diffraction gratings which fail at a countable set of grating periods.