154 results for Kernel density estimator


Relevance:

100.00%

Publisher:

Abstract:

Background: Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically designed to understand the mosaic nature of bacterial genomes. Consequently, a bottleneck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process, which may be automated in the future, to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test strains against three reference strains simultaneously. Each stage of the process is described, and we have compared a number of methods available for characterising bacterial genomic diversity and for calculating the cut-off between gene presence and absence or divergence, showing that a simple dynamic approach using a kernel density estimator performed better than both established methods and a more sophisticated mixture-modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.
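The cut-off idea lends itself to a short illustration. Below is a minimal sketch (not the authors' pipeline; the simulated log-ratios, function names and the two-mode assumption are ours) that estimates the density of log2 test/reference signal ratios with a Gaussian kernel density estimator and places the presence/absence cut-off at the deepest density minimum between the modes:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_cutoff(log_ratios):
    """Place the cut-off at the deepest density minimum between modes."""
    kde = gaussian_kde(log_ratios)               # Scott's rule bandwidth
    grid = np.linspace(log_ratios.min(), log_ratios.max(), 512)
    dens = kde(grid)
    # Interior local minima: lower than both neighbours.
    is_min = (dens[1:-1] < dens[:-2]) & (dens[1:-1] < dens[2:])
    minima = grid[1:-1][is_min]
    if minima.size == 0:
        raise ValueError("density is unimodal; no cut-off found")
    return float(minima[np.argmin(kde(minima))])

# Genes whose log-ratio falls below the cut-off are scored absent/divergent.
rng = np.random.default_rng(0)
ratios = np.concatenate([rng.normal(0.0, 0.3, 900),    # present genes
                         rng.normal(-2.0, 0.5, 100)])  # absent or divergent
absent = ratios < kde_cutoff(ratios)
```

Because the cut-off is recomputed from each array's own density, the approach adapts dynamically to shifts in signal distribution between experiments.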

Relevance:

100.00%

Publisher:

Abstract:

We generalize the popular ensemble Kalman filter to an ensemble transform filter, in which the prior distribution can take the form of a Gaussian mixture or a Gaussian kernel density estimator. The design of the filter is based on a continuous formulation of the Bayesian filter analysis step. We call the new filter algorithm the ensemble Gaussian-mixture filter (EGMF). The EGMF is implemented for three simple test problems (Brownian dynamics in one dimension, Langevin dynamics in two dimensions and the three-dimensional Lorenz-63 model). It is demonstrated that the EGMF is capable of tracking systems with non-Gaussian uni- and multimodal ensemble distributions. Copyright © 2011 Royal Meteorological Society
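The analysis step behind such a filter can be illustrated in one dimension. The sketch below is a simplified stand-in, not the paper's continuous-formulation EGMF: each Gaussian component of the prior mixture receives a standard Kalman update, and the mixture weights are reweighted by the marginal likelihood of the observation. All variable names are illustrative.

```python
import numpy as np

def gm_analysis(means, variances, weights, y, obs_var):
    """Update a 1-D Gaussian-mixture prior with a Gaussian observation y."""
    means, variances, weights = map(np.asarray, (means, variances, weights))
    innov_var = variances + obs_var            # innovation variance per component
    gain = variances / innov_var               # Kalman gain per component
    post_means = means + gain * (y - means)
    post_vars = (1.0 - gain) * variances
    # Reweight by the marginal likelihood N(y; mu_i, P_i + R) of each component.
    lik = np.exp(-0.5 * (y - means) ** 2 / innov_var) / np.sqrt(2.0 * np.pi * innov_var)
    post_weights = weights * lik / np.sum(weights * lik)
    return post_means, post_vars, post_weights

# A Gaussian kernel density estimator over an ensemble is the special case of
# equal weights and a common kernel variance:
ensemble = np.array([-1.2, -0.8, 0.9, 1.1, 1.4])
m, v, w = gm_analysis(ensemble, np.full(5, 0.25), np.full(5, 0.2), y=1.0, obs_var=0.5)
```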

Relevance:

90.00%

Publisher:

Abstract:

Using the classical Parzen window estimate as the target function, the kernel density estimation is formulated as a regression problem and the orthogonal forward regression technique is adopted to construct sparse kernel density estimates. The proposed algorithm incrementally minimises a leave-one-out test error score to select a sparse kernel model, and a local regularisation method is incorporated into the density construction process to further enforce sparsity. The kernel weights are finally updated using the multiplicative nonnegative quadratic programming algorithm, which has the ability to reduce the model size further. Except for the kernel width, the proposed algorithm has no other parameters that need tuning, and the user is not required to specify any additional criterion to terminate the density construction procedure. Two examples are used to demonstrate the ability of this regression-based approach to effectively construct a sparse kernel density estimate with comparable accuracy to that of the full-sample optimised Parzen window density estimate.
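The regression formulation can be sketched compactly. The stand-in below replaces the paper's orthogonal forward regression and multiplicative nonnegative quadratic programming steps with plain nonnegative least squares, which also tends to drive most kernel weights to zero; the target is the full Parzen window estimate evaluated at the samples, and the surviving weights are normalised to satisfy the unity constraint:

```python
import numpy as np
from scipy.optimize import nnls

def sparse_kde_weights(x, width):
    """Fit sparse nonnegative kernel weights against the Parzen target."""
    # Gram matrix of Gaussian kernels centred on the samples.
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / width) ** 2)
    K /= width * np.sqrt(2.0 * np.pi)
    target = K.mean(axis=1)          # Parzen window estimate at each sample
    w, _ = nnls(K, target)           # nonnegativity drives most weights to zero
    return w / w.sum()               # unity constraint: density integrates to 1

rng = np.random.default_rng(1)
x = np.sort(rng.normal(size=200))
w = sparse_kde_weights(x, width=0.4)
print("kernels surviving:", np.count_nonzero(w > 1e-8), "of", x.size)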

Relevance:

40.00%

Publisher:

Abstract:

A new sparse kernel probability density function (pdf) estimator based on a zero-norm constraint is constructed using the classical Parzen window (PW) estimate as the target function. The so-called zero-norm of the parameters is used in order to achieve enhanced model sparsity, and it is suggested to minimise an approximate function of the zero-norm. It is shown that under certain conditions the kernel weights of the proposed pdf estimator based on the zero-norm approximation can be updated using the multiplicative nonnegative quadratic programming algorithm. Numerical examples are employed to demonstrate the efficacy of the proposed approach.
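One commonly used smooth surrogate for the zero-norm (an assumption on our part; the paper's exact approximation may differ) replaces the count of nonzero weights with a sum of saturating exponentials,

\[ \|w\|_0 = \sum_i \mathbb{1}(w_i \neq 0) \;\approx\; \sum_i \left(1 - e^{-\alpha w_i}\right), \qquad w_i \ge 0, \]

where α > 0 controls the tightness of the approximation. Because the surrogate is concave in the nonnegative weights, minimising it alongside the squared regression error pushes small weights exactly to zero rather than merely shrinking them.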

Relevance:

30.00%

Publisher:

Abstract:

A unified approach is proposed for sparse kernel data modelling that includes regression and classification as well as probability density function estimation. The orthogonal-least-squares forward selection method based on the leave-one-out test criteria is presented within this unified data-modelling framework to construct sparse kernel models that generalise well. Examples from regression, classification and density estimation applications are used to illustrate the effectiveness of this generic sparse kernel data modelling approach.
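Forward selection driven by a leave-one-out criterion can be sketched without the full orthogonalisation machinery. The toy version below (our simplification, not the paper's orthogonal-least-squares algorithm) greedily adds the kernel column that most reduces the PRESS statistic, computed from the hat matrix as e_i / (1 - h_ii):

```python
import numpy as np

def forward_select(K, y, max_terms):
    """Greedy column selection from kernel matrix K by leave-one-out error."""
    n, m = K.shape
    chosen: list[int] = []
    for _ in range(min(max_terms, m)):
        best, best_press = None, np.inf
        for j in range(m):
            if j in chosen:
                continue
            X = K[:, chosen + [j]]
            # Least-squares fit and hat-matrix diagonal for this candidate set.
            G = np.linalg.pinv(X.T @ X)
            h_diag = np.einsum("ij,jk,ik->i", X, G, X)
            resid = y - X @ (G @ (X.T @ y))
            press = np.mean((resid / (1.0 - h_diag)) ** 2)
            if press < best_press:
                best, best_press = j, press
        chosen.append(best)
    return chosen
```

The paper's orthogonal decomposition makes the same leave-one-out score incrementally updatable, avoiding this sketch's refit of every candidate set.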

Relevance:

30.00%

Publisher:

Abstract:

Many kernel classifier construction algorithms adopt classification accuracy as the performance metric in model evaluation. Moreover, equal weighting is often applied to each data sample in parameter estimation. These modeling practices often become problematic if the data sets are imbalanced. We present a kernel classifier construction algorithm using orthogonal forward selection (OFS) in order to optimize model generalization for imbalanced two-class data sets. This kernel classifier identification algorithm is based on a new regularized orthogonal weighted least squares (ROWLS) estimator and the model selection criterion of maximal leave-one-out area under the curve (LOO-AUC) of the receiver operating characteristic (ROC). It is shown that, owing to the orthogonalization procedure, the LOO-AUC can be calculated via an analytic formula based on the new ROWLS parameter estimator, without actually splitting the estimation data set. The proposed algorithm achieves minimal computational expense via a set of forward recursive updating formulas when searching for model terms with maximal incremental LOO-AUC value. Numerical examples are used to demonstrate the efficacy of the algorithm.
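Two of the ingredients are easy to sketch in isolation (this is not the ROWLS/OFS algorithm itself; names and data are illustrative): a class-weighted least-squares fit in which each class receives equal total weight, and AUC computed directly as the Mann-Whitney rank statistic on the resulting scores.

```python
import numpy as np

def weighted_ls(X, y):
    """y in {-1, +1}; each class receives equal total weight."""
    w = np.where(y > 0, 1.0 / np.sum(y > 0), 1.0 / np.sum(y < 0))
    WX = X * w[:, None]                      # row-weighted design matrix
    return np.linalg.solve(X.T @ WX, WX.T @ y)

def auc(scores, y):
    """P(score of a random positive > score of a random negative)."""
    pos, neg = scores[y > 0], scores[y < 0]
    return float(np.mean(pos[:, None] > neg[None, :]))

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (95, 2)),    # majority class
               rng.normal(1.5, 1.0, (5, 2))])    # minority class
X = np.c_[X, np.ones(len(X))]                    # intercept column
y = np.r_[-np.ones(95), np.ones(5)]
print(auc(X @ weighted_ls(X, y), y))
```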

Relevance:

30.00%

Publisher:

Abstract:

A generalized or tunable-kernel model is proposed for probability density function estimation based on an orthogonal forward regression procedure. Each stage of the density estimation process determines a tunable kernel, namely its center vector and diagonal covariance matrix, by minimizing a leave-one-out test criterion. The kernel mixing weights of the constructed sparse density estimate are finally updated using the multiplicative nonnegative quadratic programming algorithm to ensure the nonnegativity and unity constraints, and this weight-updating process additionally has the desired ability to further reduce the model size. The proposed tunable-kernel model has advantages, in terms of model generalization capability and model sparsity, over the standard fixed-kernel model that restricts kernel centers to the training data points and employs a single common kernel variance for every kernel. On the other hand, it does not optimize all the model parameters together and thus avoids the problems of high-dimensional ill-conditioned nonlinear optimization associated with the conventional finite mixture model. Several examples are included to demonstrate the ability of the proposed tunable-kernel model to construct very compact density estimates with high accuracy.
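Evaluating such a tunable-kernel density estimate is straightforward; the sketch below (names and shapes are ours) shows a mixture in which each component carries its own center and diagonal covariance, in contrast to the fixed-kernel Parzen model where the centers are the data points and a single variance is shared:

```python
import numpy as np

def mixture_pdf(x, centers, diag_covs, weights):
    """x: (n, d); centers, diag_covs: (m, d); weights: (m,) nonneg, sum to 1."""
    d = x.shape[1]
    diff = x[:, None, :] - centers[None, :, :]               # (n, m, d)
    quad = np.sum(diff ** 2 / diag_covs[None, :, :], axis=2)
    norm = np.sqrt((2.0 * np.pi) ** d * np.prod(diag_covs, axis=1))
    return np.sum(weights * np.exp(-0.5 * quad) / norm, axis=1)
```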

Relevance:

20.00%

Publisher:

Abstract:

Atmospheric electricity measurements were made at Lerwick Observatory in the Shetland Isles (60°09′N, 1°08′W) during most of the 20th century. The Potential Gradient (PG) was measured from 1926 to 1984, and the air-earth conduction current (Jc) was measured during the final decade of the PG measurements. Daily Jc values (1978–1984) observed at 15 UT are presented here for the first time, with independently obtained PG measurements used to select valid data. The 15 UT Jc (1978–1984) spans 0.5–9.5 pA/m2, with median 2.5 pA/m2; the columnar resistance at Lerwick is estimated as 70 PΩm2. Smoke measurements confirm the low-pollution properties of the site. Analysis of the monthly variation of the Lerwick Jc data shows that winter (DJF) Jc is significantly greater, by 20%, than summer (JJA) Jc. The Lerwick atmospheric electricity seasonality differs from the global lightning seasonality, but Jc has a similar seasonal phasing to that observed in Nimbostratus clouds globally, suggesting a role for non-thunderstorm rain clouds in the seasonality of the global circuit.

Relevance:

20.00%

Publisher:

Abstract:

Point defects in metal oxides such as TiO2 are key to their applications in numerous technologies. The investigation of thermally induced nonstoichiometry in TiO2 is complicated by the difficulties in preparing and determining a desired degree of nonstoichiometry. We study controlled self-doping of TiO2 by adsorption of 1/8 and 1/16 monolayer Ti at the (110) surface using a combination of experimental and computational approaches to unravel the details of the adsorption process and the oxidation state of Ti. Upon adsorption of Ti, x-ray and ultraviolet photoemission spectroscopy (XPS and UPS) show formation of reduced Ti. Comparison of pure density functional theory (DFT) with experiment shows that pure DFT provides an inconsistent description of the electronic structure. To surmount this difficulty, we apply DFT corrected for on-site Coulomb interaction (DFT+U) to describe reduced Ti ions. The optimal value of U is 3 eV, determined from comparison of the computed Ti 3d electronic density of states with the UPS data. DFT+U and UPS show the appearance of a Ti 3d adsorbate-induced state at 1.3 eV above the valence band and 1.0 eV below the conduction band. The computations show that the adsorbed Ti atom is oxidized to Ti2+ and a fivefold coordinated surface Ti atom is reduced to Ti3+, while the remaining electron is distributed among other surface Ti atoms. The UPS data are best fitted with reduced Ti2+ and Ti3+ ions. These results demonstrate that the complexity of doped metal oxides is best understood with a combination of experiment and appropriate computations.

Relevance:

20.00%

Publisher:

Abstract:

The shallow water equations are solved using a mesh of polygons on the sphere, which adapts infrequently to the predicted future solution. Infrequent mesh adaptation reduces the cost of adaptation and load-balancing and will thus allow for more accurate mapping on adaptation. We simulate the growth of a barotropically unstable jet adapting the mesh every 12 h. Using an adaptation criterion based largely on the gradient of the vorticity leads to a mesh with around 20 per cent of the cells of a uniform mesh that gives equivalent results. This is a similar proportion to previous studies of the same test case with mesh adaptation every 1–20 min. The prediction of the mesh density involves solving the shallow water equations on a coarse mesh in advance of the locally refined mesh in order to estimate where features requiring higher resolution will grow, decay or move to. The adaptation criterion consists of two parts: that resolved on the coarse mesh, and that which is not resolved and so is passively advected on the coarse mesh. This combination leads to a balance between resolving features controlled by the large-scale dynamics and maintaining fine-scale features.
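A refinement criterion based largely on the gradient of the vorticity can be sketched on a plain two-dimensional grid (the paper uses polygonal meshes on the sphere; the grid, spacings and threshold here are illustrative): cells where |∇ζ| exceeds a threshold are flagged for refinement.

```python
import numpy as np

def refine_flags(vorticity, dx, dy, threshold):
    """Flag cells whose vorticity-gradient magnitude exceeds the threshold."""
    dzeta_dy, dzeta_dx = np.gradient(vorticity, dy, dx)   # axis 0, then axis 1
    return np.hypot(dzeta_dx, dzeta_dy) > threshold
```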

Relevance:

20.00%

Publisher:

Abstract:

Models developed to identify the rates and origins of nutrient export from land to stream require an accurate assessment of the nutrient load present in the water body in order to calibrate model parameters and structure. These data are rarely available at a representative scale and in an appropriate chemical form except in research catchments. Observational errors associated with nutrient load estimates based on these data lead to a high degree of uncertainty in modelling and nutrient budgeting studies. Here, daily paired instantaneous P and flow data for 17 UK research catchments covering a total of 39 water years (WY) have been used to explore the nature and extent of the observational error associated with nutrient flux estimates based on partial fractions and infrequent sampling. The daily records were artificially decimated to create 7 stratified sampling records, 7 weekly records, and 30 monthly records from each WY and catchment. These were used to evaluate the impact of sampling frequency on load estimate uncertainty. The analysis underlines the high uncertainty of load estimates based on monthly data and individual P fractions rather than total P. Catchments with a high baseflow index and/or low population density were found to return a lower RMSE on load estimates when sampled infrequently than those with a low baseflow index and high population density. Catchment size was not shown to be important, though a limitation of this study is that daily records may fail to capture the full range of P export behaviour in smaller catchments with flashy hydrographs, leading to an underestimate of uncertainty in load estimates for such catchments. Further analysis of sub-daily records is needed to investigate this fully. Here, recommendations are given on load estimation methodologies for different catchment types sampled at different frequencies, and on the ways in which this analysis can be used to identify observational error and uncertainty for model calibration and nutrient budgeting studies. (c) 2006 Elsevier B.V. All rights reserved.
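The decimation experiment can be sketched directly. The toy below (our construction; the synthetic record, the ~30-day step and the flow-weighted estimator are assumptions, not the paper's exact methods) treats a daily paired concentration-flow record as truth, re-estimates the annual load from monthly subsamples, and reports the RMSE over the 30 possible monthly start days:

```python
import numpy as np

def annual_load(conc, flow):
    """'True' load from the daily record (units arbitrary for the sketch)."""
    return np.sum(conc * flow)

def monthly_estimate(conc, flow, start_day):
    idx = np.arange(start_day, conc.size, 30)     # ~monthly sampling
    # Flow-weighted estimator: mean sampled flux / mean sampled flow * total flow.
    return np.mean(conc[idx] * flow[idx]) / np.mean(flow[idx]) * flow.sum()

rng = np.random.default_rng(3)
flow = np.exp(rng.normal(0.0, 1.0, 365))                  # flashy synthetic flow
conc = 0.1 + 0.05 * flow + rng.normal(0.0, 0.02, 365)     # P rising with flow
truth = annual_load(conc, flow)
estimates = np.array([monthly_estimate(conc, flow, s) for s in range(30)])
rmse = np.sqrt(np.mean((estimates - truth) ** 2))
print(f"relative RMSE of monthly estimates: {rmse / truth:.1%}")
```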

Relevance:

20.00%

Publisher:

Abstract:

Bone density (BSG) increases at the osteon scale with ageing during life. In addition, post-mortem diagenetic change due to microbial attack produces denser bioapatite. Thus, fractionation of finely powdered bone on the basis of density should not only enable younger and older populations of osteons to be separated but also make it possible to separate out a less diagenetically altered component. We show that the density fractionation method can be used as a tool to investigate the isotopic history within an individual's lifetime, both in recent and archaeological contexts, and we use the atmospheric bomb C-14 pulse to validate the method.