Digital Library

38 results for kernel density estimation

A fast algorithm for sparse probability density function construction

Relevance: 90.00%

Abstract:

A new sparse kernel density estimator is introduced. Our main contribution is a recursive algorithm that selects significant kernels one at a time using the minimum integrated square error (MISE) criterion. The proposed approach is simple to implement and the associated computational cost is very low. Numerical examples demonstrate that the proposed approach is effective in constructing sparse kernel density estimators whose accuracy is competitive with that of existing kernel density estimators.
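For orientation, the sketch below contrasts the classical Parzen window estimate, which places a kernel on every sample, with a sparse estimate that keeps only a few weighted kernels. It is a minimal illustration and not the paper's algorithm: the function names, the one-dimensional Gaussian kernel and the fixed bandwidth `h` are assumptions introduced here, and the recursive MISE-based kernel selection is not shown.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_density(x, samples, h):
    """Parzen window estimate: one kernel per sample, each with weight 1/N."""
    n = len(samples)
    return sum(gaussian_kernel((x - s) / h) for s in samples) / (n * h)


def sparse_density(x, centres, weights, h):
    """Sparse estimate: a few kernels with nonnegative weights summing to one."""
    return sum(
        w * gaussian_kernel((x - c) / h) for c, w in zip(centres, weights)
    ) / h


# Hypothetical data: three nearby samples and one outlier.
samples = [0.0, 0.5, 1.0, 4.0]
full = parzen_density(0.3, samples, h=0.5)
# Two kernels standing in for four; the weights here are hand-picked, not selected.
sparse = sparse_density(0.3, centres=[0.5, 4.0], weights=[0.75, 0.25], h=0.5)
```

With all samples kept as centres and equal weights, `sparse_density` reduces to `parzen_density`; a sparse estimator aims to approximate the same density with far fewer nonzero weights.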

Sparse density estimator with tunable kernels

Relevance: 90.00%

Abstract:

A new sparse kernel density estimator with tunable kernels is introduced within a forward constrained regression framework whereby the nonnegative and summing-to-unity constraints of the mixing weights can easily be satisfied. Based on the minimum integrated square error criterion, a recursive algorithm is developed to select significant kernels one at a time, and the kernel width of the selected kernel is then tuned using the gradient descent algorithm. Numerical examples demonstrate that the proposed approach is effective in constructing very sparse kernel density estimators whose accuracy is competitive with that of existing kernel density estimators.

PETS2009: dataset and challenge

Relevance: 80.00%

Abstract:

This paper describes the crowd image analysis challenge that forms part of the PETS 2009 workshop. The aim of this challenge is to use new or existing systems for i) crowd count and density estimation, ii) tracking of individual(s) within a crowd, and iii) detection of separate flows and specific crowd events, in a real-world environment. The dataset scenarios were filmed from multiple cameras and involve multiple actors.

An overview of the PETS 2009 challenge

Relevance: 80.00%

Abstract:

This paper describes the crowd image analysis challenge that forms part of the PETS 2009 workshop. The aim of this challenge is to use new or existing systems for i) crowd count and density estimation, ii) tracking of individual(s) within a crowd, and iii) detection of separate flows and specific crowd events, in a real-world environment. The dataset scenarios were filmed from multiple cameras and involve multiple actors.

Particle swarm optimization aided orthogonal forward regression for unified data modeling

Relevance: 80.00%

Abstract:

We propose a unified data modeling approach that is equally applicable to supervised regression and classification applications, as well as to unsupervised probability density function estimation. A particle swarm optimization (PSO) aided orthogonal forward regression (OFR) algorithm based on leave-one-out (LOO) criteria is developed to construct parsimonious radial basis function (RBF) networks with tunable nodes. Each stage of the construction process determines the center vector and diagonal covariance matrix of one RBF node by minimizing the LOO statistics. For regression applications, the LOO criterion is chosen to be the LOO mean square error, while the LOO misclassification rate is adopted in two-class classification applications. By adopting the Parzen window estimate as the desired response, the unsupervised density estimation problem is transformed into a constrained regression problem. This PSO aided OFR algorithm for tunable-node RBF networks is capable of constructing very parsimonious RBF models that generalize well, and our analysis and experimental results demonstrate that the algorithm is computationally even simpler than the efficient regularization assisted orthogonal least square algorithm based on LOO criteria for selecting fixed-node RBF models. Another significant advantage of the proposed learning procedure is that it does not have learning hyperparameters that have to be tuned using costly cross validation. The effectiveness of the proposed PSO aided OFR construction procedure is illustrated using several examples taken from regression and classification, as well as density estimation applications.

A process for analysis of microarray comparative genomics hybridisation studies for bacterial genomes

Relevance: 80.00%

Abstract:

Background: Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently, a bottleneck still persists in accurate processing and mathematical analysis of these data. To address this shortfall, we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described. We have compared a number of methods available for characterising bacterial genomic diversity and for calculating the cut-off between gene presence and absence or divergence, and have shown that a simple dynamic approach using a kernel density estimator performed better than both established methods and a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.

A Gaussian-mixture ensemble transform filter

Relevance: 80.00%

Abstract:

We generalize the popular ensemble Kalman filter to an ensemble transform filter, in which the prior distribution can take the form of a Gaussian mixture or a Gaussian kernel density estimator. The design of the filter is based on a continuous formulation of the Bayesian filter analysis step. We call the new filter algorithm the ensemble Gaussian-mixture filter (EGMF). The EGMF is implemented for three simple test problems (Brownian dynamics in one dimension, Langevin dynamics in two dimensions and the three-dimensional Lorenz-63 model). It is demonstrated that the EGMF is capable of tracking systems with non-Gaussian uni- and multimodal ensemble distributions. Copyright © 2011 Royal Meteorological Society

Using zero-norm constraint for sparse probability density function estimation

Relevance: 50.00%

Abstract:

A new sparse kernel probability density function (pdf) estimator based on a zero-norm constraint is constructed using the classical Parzen window (PW) estimate as the target function. The so-called zero-norm of the parameters is used in order to achieve enhanced model sparsity, and it is suggested to minimize an approximate function of the zero-norm. It is shown that, under certain conditions, the kernel weights of the proposed pdf estimator based on the zero-norm approximation can be updated using the multiplicative nonnegative quadratic programming algorithm. Numerical examples are employed to demonstrate the efficacy of the proposed approach.
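As a rough illustration of a multiplicative update for nonnegative, sum-to-one kernel weights, the sketch below minimizes a quadratic cost 0.5 * beta' B beta - v' beta on the simplex. It is a sketch under stated assumptions and not the paper's method: the zero-norm approximation term is omitted, `B` and `v` are hypothetical inputs assumed to have strictly positive entries (as Gaussian kernel quantities would), and the update is written in one form found in the sparse density estimation literature, which may differ from the exact update used in the paper.

```python
def mnqp_weights(B, v, iterations=200):
    """Multiplicative update for min 0.5*b'Bb - v'b with b >= 0 and sum(b) = 1.

    B: symmetric matrix (list of rows) with positive entries (assumption).
    v: vector with positive entries (assumption).
    """
    n = len(v)
    beta = [1.0 / n] * n  # start from uniform weights
    for _ in range(iterations):
        # (B beta)_i stays positive while B has positive entries.
        b_beta = [sum(B[i][j] * beta[j] for j in range(n)) for i in range(n)]
        c = [beta[i] / b_beta[i] for i in range(n)]
        # h acts as the multiplier that keeps the weights summing to one.
        h = (1.0 - sum(c[i] * v[i] for i in range(n))) / sum(c)
        beta = [c[i] * (v[i] + h) for i in range(n)]
    return beta


# Hypothetical two-kernel problem; the constrained minimizer is (0.8, 0.2).
weights = mnqp_weights([[1.0, 0.5], [0.5, 1.0]], [0.9, 0.6])
```

Because each step rescales the current weights rather than adding to them, a weight that reaches zero stays at zero, which is how updates of this kind can produce sparse solutions.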

Uncertainties in annual riverine phosphorus load estimation: Impact of load estimation methodology, sampling frequency, baseflow index and catchment population density

Relevance: 40.00%

Abstract:

Models developed to identify the rates and origins of nutrient export from land to stream require an accurate assessment of the nutrient load present in the water body in order to calibrate model parameters and structure. These data are rarely available at a representative scale and in an appropriate chemical form except in research catchments. Observational errors associated with nutrient load estimates based on these data lead to a high degree of uncertainty in modelling and nutrient budgeting studies. Here, daily paired instantaneous P and flow data for 17 UK research catchments covering a total of 39 water years (WY) have been used to explore the nature and extent of the observational error associated with nutrient flux estimates based on partial fractions and infrequent sampling. The daily records were artificially decimated to create 7 stratified sampling records, 7 weekly records, and 30 monthly records from each WY and catchment. These were used to evaluate the impact of sampling frequency on load estimate uncertainty. The analysis underlines the high uncertainty of load estimates based on monthly data and individual P fractions rather than total P. Catchments with a high baseflow index and/or low population density were found to return a lower RMSE on load estimates when sampled infrequently than those with a low baseflow index and high population density. Catchment size was not shown to be important, though a limitation of this study is that daily records may fail to capture the full range of P export behaviour in smaller catchments with flashy hydrographs, leading to an underestimate of uncertainty in load estimates for such catchments. Further analysis of sub-daily records is needed to investigate this fully.
Here, recommendations are given on load estimation methodologies for different catchment types sampled at different frequencies, and the ways in which this analysis can be used to identify observational error and uncertainty for model calibration and nutrient budgeting studies. (c) 2006 Elsevier B.V. All rights reserved.

Probability density function estimation based over-sampling for imbalanced two-class problems

Relevance: 40.00%

PDFOS: PDF estimation based over-sampling for imbalanced two-class problems

Relevance: 40.00%

Abstract:

This contribution proposes a novel probability density function (PDF) estimation based over-sampling (PDFOS) approach for two-class imbalanced classification problems. The classical Parzen-window kernel function is adopted to estimate the PDF of the positive class. Then, according to the estimated PDF, synthetic instances are generated as additional training data. The essential concept is to re-balance the class distribution of the original imbalanced data set under the principle that the synthetic data samples follow the same statistical properties. Based on the over-sampled training data, the radial basis function (RBF) classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier’s structure and the parameters of RBF kernels are determined using a particle swarm optimisation algorithm based on the criterion of minimising the leave-one-out misclassification rate. The effectiveness of the proposed PDFOS approach is demonstrated by an empirical study on several imbalanced data sets.
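The over-sampling idea in this abstract can be sketched as drawing new points from a Gaussian Parzen window estimate of the positive class: pick a positive sample at random, then perturb it with kernel-shaped noise. This is a minimal sketch, not the PDFOS implementation; the function name, the isotropic Gaussian kernel and the hand-picked bandwidth `h` are assumptions, since the abstract does not specify the kernel shape or how the bandwidth is chosen.

```python
import random


def oversample_from_parzen(minority, n_new, h, seed=0):
    """Draw n_new synthetic points from a Gaussian Parzen estimate.

    minority: list of positive-class feature vectors (lists of floats).
    h: kernel bandwidth, used as the noise standard deviation (assumption).
    """
    rng = random.Random(seed)  # local generator keeps results reproducible
    synthetic = []
    for _ in range(n_new):
        centre = rng.choice(minority)  # each sample's kernel is equally likely
        synthetic.append([x + rng.gauss(0.0, h) for x in centre])
    return synthetic


# Hypothetical positive class with three two-dimensional samples.
positives = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.3]]
extra = oversample_from_parzen(positives, n_new=5, h=0.1)
```

Sampling this way is equivalent to sampling from the equal-weight kernel mixture itself, so the synthetic points follow the estimated density rather than simply duplicating existing samples.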

A kernel-based two-class classifier for imbalanced data sets

Relevance: 30.00%

Abstract:

Many kernel classifier construction algorithms adopt classification accuracy as the performance metric in model evaluation. Moreover, equal weighting is often applied to each data sample in parameter estimation. These modeling practices often become problematic if the data sets are imbalanced. We present a kernel classifier construction algorithm using orthogonal forward selection (OFS) in order to optimize the model generalization for imbalanced two-class data sets. This kernel classifier identification algorithm is based on a new regularized orthogonal weighted least squares (ROWLS) estimator and the model selection criterion of maximal leave-one-out area under curve (LOO-AUC) of the receiver operating characteristics (ROCs). It is shown that, owing to the orthogonalization procedure, the LOO-AUC can be calculated via an analytic formula based on the new regularized orthogonal weighted least squares parameter estimator, without actually splitting the estimation data set. The proposed algorithm can achieve minimal computational expense via a set of forward recursive updating formulae in searching model terms with maximal incremental LOO-AUC value. Numerical examples are used to demonstrate the efficacy of the algorithm.
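A small example may help show why accuracy can mislead on imbalanced data while the area under the ROC curve does not. The sketch below computes a plain AUC as a pairwise ranking statistic; it is not the analytic LOO-AUC formula of the paper, and the function names and the toy scores are assumptions made for illustration.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def accuracy(scores_pos, scores_neg, threshold):
    """Fraction of samples classified correctly at a given score threshold."""
    correct = sum(s > threshold for s in scores_pos)
    correct += sum(s <= threshold for s in scores_neg)
    return correct / (len(scores_pos) + len(scores_neg))


# Hypothetical classifier that gives every sample the same low score:
# 2 positives, 18 negatives.
pos_scores = [0.1, 0.1]
neg_scores = [0.1] * 18
acc = accuracy(pos_scores, neg_scores, threshold=0.5)
area = auc(pos_scores, neg_scores)
```

Here the uninformative classifier labels everything negative and is still right on 18 of 20 samples, whereas its AUC sits at the chance level because it cannot rank any positive above any negative.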

The effects of variation in snow properties on snow mass estimation using the Chang algorithm

Relevance: 30.00%

Abstract:

Estimating snow mass at continental scales is difficult but important for understanding land-atmosphere interactions, biogeochemical cycles and Northern latitudes’ hydrology. Remote sensing provides the only consistent global observations, but the uncertainty in measurements is poorly understood. Existing techniques for the remote sensing of snow mass are based on the Chang algorithm, which relates the absorption of Earth-emitted microwave radiation by a snow layer to the snow mass within the layer. The absorption also depends on other factors such as the snow grain size and density, which are assumed and fixed within the algorithm. We examine the assumptions, compare them to field measurements made at the NASA Cold Land Processes Experiment (CLPX) Colorado field site in 2002–3, and evaluate the consequences of deviation and variability for snow mass retrieval. The accuracy of the emission model used to devise the algorithm also has an impact on its accuracy, so we test this with the CLPX measurements of snow properties against SSM/I and AMSR-E satellite measurements.

Advancements in the estimation of ice particle fall speeds using laboratory and field measurements

Relevance: 30.00%

Abstract:

Accurate estimates for the fall speed of natural hydrometeors are vital if their evolution in clouds is to be understood quantitatively. In this study, laboratory measurements of the terminal velocity v_t for a variety of ice particle models settling in viscous fluids, along with wind-tunnel and field measurements of ice particles settling in air, have been analyzed and compared to common methods of computing v_t from the literature. It is observed that while these methods work well for a number of particle types, they fail for particles with open geometries, specifically those particles for which the area ratio A_r is small (A_r is defined as the area of the particle projected normal to the flow divided by the area of a circumscribing disc). In particular, the fall speeds of stellar and dendritic crystals, needles, open bullet rosettes, and low-density aggregates are all overestimated. These particle types are important in many cloud types: aggregates in particular often dominate snow precipitation at the ground and vertically pointing Doppler radar measurements. Based on the laboratory data, a simple modification to previous computational methods is proposed, based on the area ratio. This new method collapses the available drag data onto an approximately universal curve, and the resulting errors in the computed fall speeds relative to the tank data are less than 25% in all cases. Comparison with the (much more scattered) measurements of ice particles falling in air shows strong support for this new method, with the area ratio bias apparently eliminated.

On merging gradient estimation with mean-tracking techniques for cluster identification

Relevance: 30.00%

Abstract:

This paper discusses how numerical gradient estimation methods may be used in order to reduce the computational demands on a class of multidimensional clustering algorithms. The study is motivated by the recognition that several current point-density based cluster identification algorithms could benefit from a reduction of computational demand if approximate a-priori estimates of the cluster centres present in a given data set could be supplied as starting conditions for these algorithms. In this particular presentation, the algorithm shown to benefit from the technique is the Mean-Tracking (M-T) cluster algorithm, but the results obtained from the gradient estimation approach may also be applied to other clustering algorithms and their related disciplines.
