82 results for Classifier Generalization Ability
Abstract:
This paper presents an efficient construction algorithm for obtaining sparse kernel density estimates based on a regression approach that directly optimizes model generalization capability. Computational efficiency of the density construction is ensured by using orthogonal forward regression, and the algorithm incrementally minimizes the leave-one-out test score. A local regularization method is incorporated naturally into the density construction process to further enforce sparsity. An additional advantage of the proposed algorithm is that it is fully automatic: the user is not required to specify any criterion to terminate the density construction procedure. This is in contrast to an existing state-of-the-art kernel density estimation method using the support vector machine (SVM), where the user is required to specify a critical algorithm parameter. Several examples are included to demonstrate the ability of the proposed algorithm to effectively construct a very sparse kernel density estimate with accuracy comparable to that of the full-sample optimized Parzen window density estimate. Our experimental results also demonstrate that the proposed algorithm compares favorably with the SVM method, in terms of both test accuracy and sparsity, for constructing kernel density estimates.
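For reference, the full-sample Parzen window estimate used as the accuracy benchmark places one kernel on every training point with equal weight 1/N. The minimal 1-D sketch below assumes a Gaussian kernel and a hand-picked width `sigma`; the function names are illustrative. It is the dense baseline, not the sparse regression-based construction, and the leave-one-out log-likelihood it computes is one common way to score a kernel width, not necessarily the leave-one-out test score used in the paper.

```python
import math


def gaussian(x, centre, sigma):
    """1-D Gaussian kernel of width sigma, normalised to integrate to one."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def parzen_density(x, samples, sigma):
    """Full-sample Parzen window estimate: one kernel per sample, weight 1/N each."""
    return sum(gaussian(x, s, sigma) for s in samples) / len(samples)


def loo_log_likelihood(samples, sigma):
    """Sum over i of log p_{-i}(x_i), where p_{-i} is built without sample i."""
    n = len(samples)
    total = 0.0
    for i, xi in enumerate(samples):
        p = sum(gaussian(xi, s, sigma) for j, s in enumerate(samples) if j != i) / (n - 1)
        total += math.log(p)
    return total


if __name__ == "__main__":
    data = [-1.2, -0.8, -0.1, 0.0, 0.3, 0.9, 1.4]
    for width in (0.05, 0.4, 1.0):
        print(width, round(loo_log_likelihood(data, width), 3))
```

A width that is too small makes each held-out point nearly invisible to the remaining kernels, which the leave-one-out score penalises heavily.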
Abstract:
The paper introduces an efficient construction algorithm for obtaining sparse linear-in-the-weights regression models based on an approach of directly optimizing model generalization capability. This is achieved by utilizing the delete-1 cross-validation concept and the associated leave-one-out test error, also known as the predicted residual sum of squares (PRESS) statistic, without resorting to any other validation data set for model evaluation in the model construction process. Computational efficiency is ensured by using orthogonal forward regression, but the algorithm incrementally minimizes the PRESS statistic instead of the usual sum of squared training errors. A local regularization method can naturally be incorporated into the model selection procedure to further enforce model sparsity. The proposed algorithm is fully automatic, and the user is not required to specify any criterion to terminate the model construction procedure. Comparisons with some existing state-of-the-art modeling methods are given, and several examples are included to demonstrate the ability of the proposed algorithm to effectively construct sparse models that generalize well.
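The PRESS statistic can be tracked cheaply during orthogonal forward regression because, with mutually orthogonal regressors w_1, ..., w_k, the hat-matrix diagonal is h_ii = sum_j w_j(i)^2 / (w_j' w_j), so each leave-one-out residual is e_i / eta_i with eta_i = 1 - h_ii updated one term at a time. The sketch below assumes NumPy, plain least squares (the local regularization step is omitted) and Gram-Schmidt orthogonalisation; `ofr_press` is a hypothetical name. It stops as soon as adding a term no longer lowers PRESS, which mirrors the automatic termination described in the abstract.

```python
import numpy as np


def ofr_press(P, y, max_terms=None, tol=1e-12):
    """Greedy orthogonal forward regression that minimises the PRESS statistic.

    P is an (N, M) matrix of candidate regressors and y an (N,) target vector.
    Returns the selected column indices and the PRESS value after each step.
    """
    n, m = P.shape
    max_terms = m if max_terms is None else max_terms
    residual = np.asarray(y, dtype=float).copy()  # training residual e^(k)
    eta = np.ones(n)                              # eta_i^(k) = 1 - h_ii^(k)
    basis, selected, history = [], [], []
    best_press = float(residual @ residual)       # PRESS of the empty model

    for _ in range(max_terms):
        best = None
        for j in range(m):
            if j in selected:
                continue
            w = P[:, j].astype(float)             # astype returns a copy
            for b in basis:                       # Gram-Schmidt against chosen terms
                w = w - (b @ w) / (b @ b) * b
            ww = w @ w
            if ww < tol:                          # candidate is (numerically) dependent
                continue
            e = residual - (w @ residual) / ww * w
            et = eta - w ** 2 / ww
            if np.any(et < tol):                  # a point would be fitted exactly
                continue
            press = float(np.sum((e / et) ** 2))  # sum of squared LOO residuals
            if best is None or press < best[0]:
                best = (press, j, w, e, et)
        if best is None or best[0] >= best_press:
            break                                 # PRESS stopped decreasing: terminate
        best_press, j, w, residual, eta = best
        basis.append(w)
        selected.append(j)
        history.append(best_press)
    return selected, history
```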
Abstract:
A greedy technique is proposed to construct parsimonious kernel classifiers using the orthogonal forward selection method and boosting, based on the Fisher ratio as the class separability measure. Unlike most kernel classification methods, which restrict kernel means to the training input data and use a fixed common variance for all the kernel terms, the proposed technique can tune both the mean vector and the diagonal covariance matrix of each individual kernel by incrementally maximizing the Fisher ratio. An efficient weighted optimization method is developed based on boosting to append kernels one by one in an orthogonal forward selection procedure. Experimental results obtained using this construction technique demonstrate that it offers a viable alternative to the existing state-of-the-art kernel modeling methods for constructing sparse Gaussian radial basis function network classifiers that generalize well.
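The Fisher ratio scores how well a single scalar feature, such as the output of one candidate kernel, separates two classes: (m_pos - m_neg)^2 / (var_pos + var_neg). A plain-Python sketch follows; the optional sample weights stand in for boosting-style weighting and are an assumption, as is the Gaussian kernel used in the usage example.

```python
import math


def fisher_ratio(values, labels, weights=None):
    """Fisher ratio (m_pos - m_neg)^2 / (var_pos + var_neg) of a scalar feature.

    labels are +1 / -1. Optional non-negative sample weights give weighted
    class means and variances (this weighting is an illustrative assumption).
    """
    if weights is None:
        weights = [1.0] * len(values)
    stats = {}
    for cls in (+1, -1):
        pairs = [(v, w) for v, w, y in zip(values, weights, labels) if y == cls]
        wsum = sum(w for _, w in pairs)
        mean = sum(v * w for v, w in pairs) / wsum
        var = sum(w * (v - mean) ** 2 for v, w in pairs) / wsum
        stats[cls] = (mean, var)
    (m_pos, v_pos), (m_neg, v_neg) = stats[+1], stats[-1]
    return (m_pos - m_neg) ** 2 / (v_pos + v_neg)


if __name__ == "__main__":
    xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
    ys = [-1, -1, -1, +1, +1, +1]
    for centre in (0.0, 1.5):               # two candidate kernel centres
        out = [math.exp(-(x - centre) ** 2) for x in xs]
        print(centre, fisher_ratio(out, ys))
```

A kernel centred midway between two symmetric classes gives equal class means and hence a ratio of zero, so a greedy selector would prefer the candidate centred on one class.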
Abstract:
This work compares and contrasts results of classifying time-domain ECG signals with pathological conditions taken from the MIT-BIH arrhythmia database. Linear discriminant analysis and a multi-layer perceptron were used as classifiers. The neural network was trained by two different methods, namely back-propagation and a genetic algorithm. Converting the time-domain signal into the wavelet domain reduced the dimensionality of the problem at least 10-fold. This was achieved using wavelets from the db6 family as well as adaptive wavelets generated using two different strategies. The wavelet transforms used in this study were limited to two decomposition levels. A neural network with evolved weights proved to be the best classifier, with a maximum of 99.6% accuracy when optimised wavelet-transform ECG data were presented to its input and 95.9% accuracy when the signals presented to its input were decomposed using db6 wavelets. The linear discriminant analysis achieved a maximum classification accuracy of 95.7% when presented with optimised wavelet coefficients and 95.5% with db6 wavelet coefficients. It is shown that the much simpler signal representation of a few wavelet coefficients obtained through an optimised discrete wavelet transform considerably facilitates the task of classifying non-stationary, time-variant signals. In addition, the results indicate that wavelet optimisation may improve the classification ability of a neural network. (c) 2005 Elsevier B.V. All rights reserved.
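Each level of a discrete wavelet transform splits the current approximation into a half-length approximation band and a half-length detail band, so two levels leave an approximation one quarter of the original length. The sketch swaps in the orthonormal Haar wavelet for db6 purely to stay dependency-free (with PyWavelets, `pywt.wavedec(signal, 'db6', level=2)` is the usual route); the helper names are hypothetical, and the coefficient order follows the PyWavelets convention [cA2, cD2, cD1].

```python
import math


def haar_step(signal):
    """One orthonormal Haar analysis step: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, levels=2):
    """Multi-level Haar decomposition returning [cA_levels, cD_levels, ..., cD_1]."""
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        coeffs.insert(0, detail)            # coarser detail bands go to the front
    coeffs.insert(0, approx)
    return coeffs


if __name__ == "__main__":
    print(haar_wavedec([4, 4, 2, 2, 0, 0, 2, 2], levels=2))
```

Because the Haar filters are orthonormal, the total energy of the coefficients equals that of the input signal.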
Abstract:
We propose a simple yet computationally efficient construction algorithm for two-class kernel classifiers. In order to optimise the classifier's generalisation capability, an orthogonal forward selection procedure is used to select kernels one by one by directly minimising the leave-one-out (LOO) misclassification rate. It is shown that the computation of the LOO misclassification rate is very efficient owing to orthogonalisation. Examples are used to demonstrate that the proposed algorithm is a viable alternative for constructing sparse two-class kernel classifiers in terms of performance and computational efficiency.
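For a classifier that is linear in its weights and fitted by least squares to +/-1 labels, the leave-one-out output at x_i needs no refitting: it equals y_i - e_i / (1 - h_ii), where e_i is the training residual and h_ii the hat-matrix diagonal, which an orthonormal basis of the regressors yields directly. The sketch below assumes NumPy, a full-column-rank regressor matrix and a decision threshold at zero; `loo_misclassification_rate` is a hypothetical name, and it scores a fixed set of kernels rather than running the forward selection itself.

```python
import numpy as np


def loo_misclassification_rate(P, y):
    """LOO misclassification rate of a least-squares classifier y ~ P @ theta.

    y holds labels +1 / -1. Uses e_i^(-i) = e_i / (1 - h_ii), with h_ii read
    off an orthonormal basis Q of P's column space (h_ii = sum_j Q_ij^2).
    """
    q, _ = np.linalg.qr(P)                  # reduced QR: columns of q are orthonormal
    fitted = q @ (q.T @ y)                  # least-squares fitted outputs
    h = np.sum(q ** 2, axis=1)              # hat-matrix diagonal
    loo_out = y - (y - fitted) / (1.0 - h)  # output at x_i with sample i left out
    return float(np.mean(y * loo_out <= 0))
```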
Abstract:
Costs of resistance are widely assumed to be important in the evolution of parasite and pathogen defence in animals, but they have been demonstrated experimentally on very few occasions. Endoparasitoids are insects whose larvae develop inside the bodies of other insects where they defend themselves from attack by their hosts' immune systems (especially cellular encapsulation). Working with Drosophila melanogaster and its endoparasitoid Leptopilina boulardi, we selected for increased resistance in four replicate populations of flies. The percentage of flies surviving attack increased from about 0.5% to between 40% and 50% in five generations, revealing substantial additive genetic variation in resistance in the field population from which our culture was established. In comparison with four control lines, flies from selected lines suffered from lower larval survival under conditions of moderate to severe intraspecific competition.
Abstract:
In this paper the meteorological processes responsible for transporting tracer during the second ETEX (European Tracer EXperiment) release are determined using the UK Met Office Unified Model (UM). The UM-predicted distribution of tracer is also compared with observations from the ETEX campaign. The dominant meteorological process is a warm conveyor belt, which transports large amounts of tracer away from the surface up to a height of 4 km over a 36 h period. Convection is also an important process, transporting tracer to heights of up to 8 km. Potential sources of error when using an operational numerical weather prediction model to forecast air quality are also investigated. These potential sources of error include model dynamics, model resolution and model physics. In the UM, a semi-Lagrangian monotonic advection scheme is used with cubic polynomial interpolation. This can predict unrealistic negative tracer values, which are subsequently set to zero, resulting in an overprediction of tracer concentrations. In order to conserve mass in the UM tracer simulations, it was necessary to include a flux-corrected transport method. Model resolution can also affect the accuracy of predicted tracer distributions. Low-resolution simulations (50 km grid length) were unable to resolve a change in wind direction observed during ETEX 2; this led to an error in the transport direction and hence an error in the tracer distribution. High-resolution simulations (12 km grid length) captured the change in wind direction and hence produced a tracer distribution that compared better with the observations. The representation of convective mixing was found to have a large effect on the vertical transport of tracer: turning off the convective mixing parameterisation in the UM significantly reduced the vertical transport of tracer. Finally, air quality forecasts were found to be sensitive to the timing of synoptic-scale features. Errors of only 1 h in the position of the cold front relative to the tracer release location resulted in changes in the predicted tracer concentrations that were of the same order of magnitude as the absolute tracer concentrations.
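Why setting negative values to zero inflates tracer mass can be seen in one dimension. The sketch below performs a single semi-Lagrangian step on a periodic, unit-spaced grid with a constant wind, interpolating at each departure point with a cubic Lagrange polynomial. This plain, non-monotone interpolation is chosen for illustration and is not the Unified Model's scheme; the function names are hypothetical.

```python
import math


def cubic_weights(alpha):
    """Cubic Lagrange weights for stencil offsets -1, 0, 1, 2 at fractional position alpha."""
    a = alpha
    return (-a * (a - 1) * (a - 2) / 6.0,
            (a + 1) * (a - 1) * (a - 2) / 2.0,
            -(a + 1) * a * (a - 2) / 2.0,
            (a + 1) * a * (a - 1) / 6.0)


def semi_lagrangian_step(q, courant):
    """One semi-Lagrangian step on a periodic unit-spaced grid with constant wind."""
    n = len(q)
    out = []
    for i in range(n):
        x_dep = i - courant                 # departure point of arrival grid point i
        j = math.floor(x_dep)
        w = cubic_weights(x_dep - j)
        out.append(sum(wk * q[(j + off) % n] for wk, off in zip(w, (-1, 0, 1, 2))))
    return out


if __name__ == "__main__":
    q = [0.0] * 20
    q[10] = 1.0                             # a single-cell tracer spike
    new = semi_lagrangian_step(q, 0.5)
    clipped = [max(0.0, v) for v in new]    # "set negatives to zero"
    print(min(new), sum(new), sum(clipped))
```

Advecting a single-cell spike by half a grid length yields values of -1/16 on either side while the total stays at 1; setting the negatives to zero raises the total to 1.125, the kind of spurious mass gain described above.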
Abstract:
A generalized or tunable-kernel model is proposed for probability density function estimation based on an orthogonal forward regression procedure. Each stage of the density estimation process determines a tunable kernel, namely its center vector and diagonal covariance matrix, by minimizing a leave-one-out test criterion. The kernel mixing weights of the constructed sparse density estimate are finally updated using the multiplicative nonnegative quadratic programming algorithm to enforce the nonnegativity and unity (sum-to-one) constraints, and this weight-updating process additionally has the desirable ability to further reduce the model size. The proposed tunable-kernel model has advantages, in terms of model generalization capability and model sparsity, over the standard fixed-kernel model that restricts kernel centers to the training data points and employs a single common kernel variance for every kernel. On the other hand, it does not optimize all the model parameters together and thus avoids the problems of high-dimensional ill-conditioned nonlinear optimization associated with the conventional finite mixture model. Several examples are included to demonstrate the ability of the proposed tunable-kernel model to construct very compact and accurate density estimates.
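Mixing weights of this kind solve a small quadratic programme: minimise 0.5 * b'Bb - v'b subject to b >= 0 and sum(b) = 1. One multiplicative fixed-point scheme follows from the optimality condition (Bb)_i = v_i + h on the nonzero weights: multiply each b_i by (v_i + h) / (Bb)_i, with the scalar h chosen so the weights keep summing to one. This is a sketch of an update of that kind, assuming an elementwise-positive B (as Gaussian kernel cross-products are); it is not necessarily the exact algorithm used in the paper, and the clip-and-renormalise line is an added safeguard.

```python
def mnqp(B, v, beta0=None, iters=200):
    """Multiplicative updates for min 0.5*b'Bb - v'b  s.t.  b >= 0, sum(b) = 1.

    B must be elementwise positive. The update b_i <- b_i * (v_i + h) / (B b)_i
    has as fixed point the KKT condition (B b)_i = v_i + h for every b_i > 0.
    """
    n = len(v)
    beta = [1.0 / n] * n if beta0 is None else list(beta0)
    for _ in range(iters):
        bb = [sum(B[i][j] * beta[j] for j in range(n)) for i in range(n)]
        num = 1.0 - sum(beta[i] * v[i] / bb[i] for i in range(n))
        den = sum(beta[i] / bb[i] for i in range(n))
        h = num / den                       # keeps sum(beta) = 1 before clipping
        beta = [max(0.0, beta[i] * (v[i] + h) / bb[i]) for i in range(n)]
        total = sum(beta)                   # safeguard: renormalise after clipping
        beta = [b / total for b in beta]
    return beta


if __name__ == "__main__":
    print(mnqp([[2.0, 0.5], [0.5, 1.0]], [1.0, 1.0]))  # interior optimum
    print(mnqp([[1.0, 0.1], [0.1, 1.0]], [2.0, 0.0]))  # optimum on the boundary
```

In the second example the optimum lies on the boundary of the feasible set, and the update drives one weight exactly to zero, which is how this style of weight update can prune kernels.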
Abstract:
We develop a particle swarm optimisation (PSO) aided orthogonal forward regression (OFR) approach for constructing radial basis function (RBF) classifiers with tunable nodes. At each stage of the OFR construction process, the centre vector and diagonal covariance matrix of one RBF node are determined efficiently by minimising the leave-one-out (LOO) misclassification rate (MR) using a PSO algorithm. Compared with the state-of-the-art regularisation-assisted orthogonal least squares algorithm based on the LOO MR for selecting fixed-node RBF classifiers, the proposed PSO aided OFR algorithm for constructing tunable-node RBF classifiers offers significant advantages in terms of better generalisation performance and smaller model size, and it also imposes lower computational complexity in the classifier construction process. Moreover, the proposed algorithm does not have any hyperparameter that requires costly tuning based on cross-validation.
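In the approach described, each particle would encode one candidate node (centre vector and diagonal covariance) and the cost would be the LOO misclassification rate. The sketch below is a generic global-best particle swarm optimiser; the inertia and acceleration constants (0.72 and 1.49) are commonly used values assumed here rather than taken from the paper, `pso_minimise` is a hypothetical name, and the usage example minimises a simple quadratic instead of a classifier cost.

```python
import random


def pso_minimise(f, dim, bounds, n_particles=20, iters=200,
                 w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best particle swarm optimisation of f over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]             # each particle's best position so far
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])   # cognitive pull
                             + c2 * r2 * (gbest[d] - pos[k][d]))     # social pull
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))  # stay in the box
            val = f(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    best, value = pso_minimise(lambda p: sum(c * c for c in p), dim=3, bounds=(-5.0, 5.0))
    print(best, value)
```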
Abstract:
Orlistat is an anti-obesity treatment with which several gastrointestinal (GI) side-effects are commonly associated in the initial stages of therapy. There is no physiological explanation as to why two-thirds of those who take the drug experience one or more side-effects. It has been hypothesized that the GI microbiota may protect from or contribute to these GI disturbances. Using in vitro batch culture and human gut model systems, studies were conducted to determine whether increased availability of dietary lipids and/or orlistat affect the composition and/or activity of the faecal microbiota. Results from 24-h batch culture fermentation experiments demonstrated no effect of orlistat in the presence or absence of a dietary lipid (olive oil) on the composition of bacterial communities [as determined by fluorescence in situ hybridization (FISH) and denaturing gradient gel electrophoresis (DGGE) analyses], but did show there was great variability in the lipolytic activities of the microbiotas of individuals, as determined by gas chromatography analysis of long-chain fatty acids in samples. Subsequent studies focused on the effect of orlistat in the presence and absence of lipid in in vitro human gut model systems. Systems were run for 14 days with gut model medium (GMM) only (to steady state, SS), then fed at 12-h intervals with 50 mg orlistat, 2 g olive oil or a mixture of both for 14 days. FISH and DGGE were used to monitor changes in bacterial populations. Bacteria were cultivated from the GMM only (control) systems at SS. All strains isolated were screened for lipolytic activity using tributyrin agar. FISH and DGGE demonstrated that none of the compounds (singly or in combination) added to the systems had any notable effect on microbial population dynamics for any of the donors, although Subdoligranulum populations appeared to be inhibited by orlistat in the presence or absence of lipid. 
Orlistat had little or no effect on the metabolism of indigenous and added lipids in the fermentation systems, but there was great variability in the way the faecal microbiotas of the donors were able to degrade added lipids. Variability in lipid degradation could be correlated with the number and activity of isolated lipolytic bacteria. The mechanism by which orlistat and the GI microbiota cause side-effects in individuals is unknown, but several hypotheses have been proposed to account for their manifestation. The demonstration of great variability in the lipolytic activity of microbiotas to degrade lipids led to a large-scale cultivation-based study of lipolytic/lipase-positive bacteria present in the human faecal microbiota. Of 4,000 colonies isolated from 15 donors using five different agars, 378 strains were identified that had lipase activity. Molecular identification of strains isolated from five donors demonstrated that lipase activity is more prevalent in the human GI microbiota than previously thought, with members of the phyla Firmicutes, Bacteroidetes and Actinobacteria identified. Molecular identification and characterization of the substrate specificities of the strains will be carried out as part of ongoing work.
Abstract:
The combination of the synthetic minority over-sampling technique (SMOTE) and the radial basis function (RBF) classifier is proposed for the classification of imbalanced two-class data. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, SMOTE is applied to generate synthetic instances for the positive class to balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier structure and the parameters of the RBF kernels are determined using a particle swarm optimization algorithm based on the criterion of minimizing the leave-one-out misclassification rate. Experimental results on both simulated and real imbalanced data sets are presented to demonstrate the effectiveness of the proposed algorithm.
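SMOTE creates each synthetic positive-class sample on the line segment between a minority sample and one of its k nearest minority-class neighbours. The sketch below draws the base sample at random, a simplification of the original per-sample scheme; `smote`, `k` and the seed are illustrative choices rather than a library API.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point is x + u * (x_nn - x) with u ~ U(0, 1), where x is a
    randomly chosen minority sample and x_nn one of its k nearest minority
    neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbours = sorted((p for j, p in enumerate(minority) if j != i),
                            key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nn)))
    return synthetic


if __name__ == "__main__":
    positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(smote(positives, 5, k=2))
```

Because every synthetic point is an interpolation between two existing minority samples, the over-sampled positive class stays inside the convex hull of the original positives.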
Abstract:
Grain legumes, such as peas (Pisum sativum L.), are known to be weak competitors against weeds when grown as the sole crop. In this study, the weed-suppression effect of pea–barley (Hordeum vulgare L.) intercropping compared with the respective sole crops was examined in organic field experiments across Western Europe (i.e., Denmark, the United Kingdom, France, Germany and Italy). Spring pea (P) and barley (B) were sown either as the sole crop, at the recommended plant density (P100 and B100, respectively), or in replacement (P50B50) or additive (P100B50) intercropping designs for three seasons (2003–2005). At maturity, the weed biomass was three times higher under the pea sole crops than under both the intercrops and the barley sole crops. The inclusion of joint experiments in several countries and under various growing conditions showed that intercrops maintain a highly asymmetric competition over weeds, regardless of the particular weed infestation (species and productivity), the crop biomass or the soil nitrogen availability. The weed suppression of the intercrops was highly resilient, whereas the weed suppression in pea sole crops was lower and more variable. The pea–barley intercrops exhibited high levels of weed suppression, even with a low percentage of barley in the total biomass. Despite reduced leaf area under low soil N availability, the barley sole crops and intercrops displayed high weed suppression, probably because of their strong competitive capability to absorb soil N. Higher soil N availability entailed increased leaf area and competitive ability for light, which contributed to the overall competitive ability against weeds for all of the treatments. The contribution of the weeds to the total dry matter and soil N acquisition was higher in the pea sole crop than in the other treatments, in spite of the higher leaf areas in the pea crops.
Abstract:
This contribution proposes a powerful technique for two-class imbalanced classification problems by combining the synthetic minority over-sampling technique (SMOTE) and the particle swarm optimisation (PSO) aided radial basis function (RBF) classifier. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, SMOTE is applied to generate synthetic instances for the positive class to balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier's structure and the parameters of the RBF kernels are determined using a PSO algorithm based on the criterion of minimising the leave-one-out misclassification rate. Experimental results obtained on one simulated imbalanced data set and three real imbalanced data sets are presented to demonstrate the effectiveness of the proposed algorithm.
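The classifier tuned here is a weighted sum of Gaussian nodes, each with its own centre and per-dimension widths. The sketch below shows how such a tunable-node RBF classifier is evaluated; the node form exp(-0.5 * sum_j ((x_j - c_j) / s_j)^2), the sign decision at zero and the example parameters are assumptions rather than values from the paper, and the function names are hypothetical.

```python
import math


def rbf_node(x, centre, sigmas):
    """Gaussian RBF node with its own centre and diagonal covariance diag(sigmas**2)."""
    return math.exp(-0.5 * sum(((xi - ci) / si) ** 2
                               for xi, ci, si in zip(x, centre, sigmas)))


def rbf_classify(x, nodes, weights):
    """Sign of the weighted sum of node outputs: +1 (positive class) or -1."""
    score = sum(w * rbf_node(x, c, s) for w, (c, s) in zip(weights, nodes))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Node 2 is stretched along the second axis by its larger width there.
    nodes = [((1.0, 1.0), (0.5, 0.5)), ((-1.0, -1.0), (0.5, 2.0))]
    weights = [1.0, -1.0]
    for point in [(1.0, 1.0), (-1.0, -1.0), (-1.0, 1.5)]:
        print(point, rbf_classify(point, nodes, weights))
```

Giving every node its own widths lets a single node cover an elongated region, which is one reason tunable nodes can yield smaller models than a common fixed variance.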