920 resultados para kernel estimators
Resumo:
A new sparse kernel probability density function (pdf) estimator based on zero-norm constraint is constructed using the classical Parzen window (PW) estimate as the target function. The so-called zero-norm of the parameters is used in order to achieve enhanced model sparsity, and it is suggested to minimize an approximate function of the zero-norm. It is shown that under certain condition, the kernel weights of the proposed pdf estimator based on the zero-norm approximation can be updated using the multiplicative nonnegative quadratic programming algorithm. Numerical examples are employed to demonstrate the efficacy of the proposed approach.
Resumo:
Two so-called “integrated” polarimetric rate estimation techniques, ZPHI (Testud et al., 2000) and ZZDR (Illingworth and Thompson, 2005), are evaluated using 12 episodes of the year 2005 observed by the French C-band operational Trappes radar, located near Paris. The term “integrated” means that the concentration parameter of the drop size distribution is assumed to be constant over some area and the algorithms retrieve it using the polarimetric variables in that area. The evaluation is carried out in ideal conditions (no partial beam blocking, no ground-clutter contamination, no bright band contamination, a posteriori calibration of the radar variables ZH and ZDR) using hourly rain gauges located at distances less than 60 km from the radar. Also included in the comparison, for the sake of benchmarking, is a conventional Z = 282R1.66 estimator, with and without attenuation correction and with and without adjustment by rain gauges as currently done operationally at Météo France. Under those ideal conditions, the two polarimetric algorithms, which rely solely on radar data, appear to perform as well if not better, pending on the measurements conditions (attenuation, rain rates, …), than the conventional algorithms, even when the latter take into account rain gauges through the adjustment scheme. ZZDR with attenuation correction is the best estimator for hourly rain gauge accumulations lower than 5 mm h−1 and ZPHI is the best one above that threshold. A perturbation analysis has been conducted to assess the sensitivity of the various estimators with respect to biases on ZH and ZDR, taking into account the typical accuracy and stability that can be reasonably achieved with modern operational radars these days (1 dB on ZH and 0.2 dB on ZDR). A +1 dB positive bias on ZH (radar too hot) results in a +14% overestimation of the rain rate with the conventional estimator used in this study (Z = 282R^1.66), a -19% underestimation with ZPHI and a +23% overestimation with ZZDR. Additionally, a +0.2 dB positive bias on ZDR results in a typical rain rate under- estimation of 15% by ZZDR.
Resumo:
Statistical graphics are a fundamental, yet often overlooked, set of components in the repertoire of data analytic tools. Graphs are quick and efficient, yet simple instruments of preliminary exploration of a dataset to understand its structure and to provide insight into influential aspects of inference such as departures from assumptions and latent patterns. In this paper, we present and assess a graphical device for choosing a method for estimating population size in capture-recapture studies of closed populations. The basic concept is derived from a homogeneous Poisson distribution where the ratios of neighboring Poisson probabilities multiplied by the value of the larger neighbor count are constant. This property extends to the zero-truncated Poisson distribution which is of fundamental importance in capture–recapture studies. In practice however, this distributional property is often violated. The graphical device developed here, the ratio plot, can be used for assessing specific departures from a Poisson distribution. For example, simple contaminations of an otherwise homogeneous Poisson model can be easily detected and a robust estimator for the population size can be suggested. Several robust estimators are developed and a simulation study is provided to give some guidance on which should be used in practice. More systematic departures can also easily be detected using the ratio plot. In this paper, the focus is on Gamma mixtures of the Poisson distribution which leads to a linear pattern (called structured heterogeneity) in the ratio plot. More generally, the paper shows that the ratio plot is monotone for arbitrary mixtures of power series densities.
Resumo:
Background: Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently a bottle neck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described and we have compared a number of methods available for characterising bacterial genomic diversity, for calculating the cut-off between gene presence and absence or divergence, and shown that a simple dynamic approach using a kernel density estimator performed better than both established, as well as a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.
Resumo:
Liquid clouds play a profound role in the global radiation budget but it is difficult to remotely retrieve their vertical profile. Ordinary narrow field-of-view (FOV) lidars receive a strong return from such clouds but the information is limited to the first few optical depths. Wideangle multiple-FOV lidars can isolate radiation scattered multiple times before returning to the instrument, often penetrating much deeper into the cloud than the singly-scattered signal. These returns potentially contain information on the vertical profile of extinction coefficient, but are challenging to interpret due to the lack of a fast radiative transfer model for simulating them. This paper describes a variational algorithm that incorporates a fast forward model based on the time-dependent two-stream approximation, and its adjoint. Application of the algorithm to simulated data from a hypothetical airborne three-FOV lidar with a maximum footprint width of 600m suggests that this approach should be able to retrieve the extinction structure down to an optical depth of around 6, and total opticaldepth up to at least 35, depending on the maximum lidar FOV. The convergence behavior of Gauss-Newton and quasi-Newton optimization schemes are compared. We then present results from an application of the algorithm to observations of stratocumulus by the 8-FOV airborne “THOR” lidar. It is demonstrated how the averaging kernel can be used to diagnose the effective vertical resolution of the retrieved profile, and therefore the depth to which information on the vertical structure can be recovered. This work enables exploitation of returns from spaceborne lidar and radar subject to multiple scattering more rigorously than previously possible.
Resumo:
The estimation of the long-term wind resource at a prospective site based on a relatively short on-site measurement campaign is an indispensable task in the development of a commercial wind farm. The typical industry approach is based on the measure-correlate-predict �MCP� method where a relational model between the site wind velocity data and the data obtained from a suitable reference site is built from concurrent records. In a subsequent step, a long-term prediction for the prospective site is obtained from a combination of the relational model and the historic reference data. In the present paper, a systematic study is presented where three new MCP models, together with two published reference models �a simple linear regression and the variance ratio method�, have been evaluated based on concurrent synthetic wind speed time series for two sites, simulating the prospective and the reference site. The synthetic method has the advantage of generating time series with the desired statistical properties, including Weibull scale and shape factors, required to evaluate the five methods under all plausible conditions. In this work, first a systematic discussion of the statistical fundamentals behind MCP methods is provided and three new models, one based on a nonlinear regression and two �termed kernel methods� derived from the use of conditional probability density functions, are proposed. All models are evaluated by using five metrics under a wide range of values of the correlation coefficient, the Weibull scale, and the Weibull shape factor. Only one of all models, a kernel method based on bivariate Weibull probability functions, is capable of accurately predicting all performance metrics studied.
Resumo:
References (20)Cited By (1)Export CitationAboutAbstract Proper scoring rules provide a useful means to evaluate probabilistic forecasts. Independent from scoring rules, it has been argued that reliability and resolution are desirable forecast attributes. The mathematical expectation value of the score allows for a decomposition into reliability and resolution related terms, demonstrating a relationship between scoring rules and reliability/resolution. A similar decomposition holds for the empirical (i.e. sample average) score over an archive of forecast–observation pairs. This empirical decomposition though provides a too optimistic estimate of the potential score (i.e. the optimum score which could be obtained through recalibration), showing that a forecast assessment based solely on the empirical resolution and reliability terms will be misleading. The differences between the theoretical and empirical decomposition are investigated, and specific recommendations are given how to obtain better estimators of reliability and resolution in the case of the Brier and Ignorance scoring rule.
Resumo:
Sampling strategies for monitoring the status and trends in wildlife populations are often determined before the first survey is undertaken. However, there may be little information about the distribution of the population and so the sample design may be inefficient. Through time, as data are collected, more information about the distribution of animals in the survey region is obtained but it can be difficult to incorporate this information in the survey design. This paper introduces a framework for monitoring motile wildlife populations within which the design of future surveys can be adapted using data from past surveys whilst ensuring consistency in design-based estimates of status and trends through time. In each survey, part of the sample is selected from the previous survey sample using simple random sampling. The rest is selected with inclusion probability proportional to predicted abundance. Abundance is predicted using a model constructed from previous survey data and covariates for the whole survey region. Unbiased design-based estimators of status and trends and their variances are derived from two-phase sampling theory. Simulations over the short and long-term indicate that in general more precise estimates of status and trends are obtained using this mixed strategy than a strategy in which all of the sample is retained or all selected with probability proportional to predicted abundance. Furthermore the mixed strategy is robust to poor predictions of abundance. Estimates of status are more precise than those obtained from a rotating panel design.
Resumo:
Simulating spiking neural networks is of great interest to scientists wanting to model the functioning of the brain. However, large-scale models are expensive to simulate due to the number and interconnectedness of neurons in the brain. Furthermore, where such simulations are used in an embodied setting, the simulation must be real-time in order to be useful. In this paper we present NeMo, a platform for such simulations which achieves high performance through the use of highly parallel commodity hardware in the form of graphics processing units (GPUs). NeMo makes use of the Izhikevich neuron model which provides a range of realistic spiking dynamics while being computationally efficient. Our GPU kernel can deliver up to 400 million spikes per second. This corresponds to a real-time simulation of around 40 000 neurons under biologically plausible conditions with 1000 synapses per neuron and a mean firing rate of 10 Hz.
Resumo:
The translation of an ensemble of model runs into a probability distribution is a common task in model-based prediction. Common methods for such ensemble interpretations proceed as if verification and ensemble were draws from the same underlying distribution, an assumption not viable for most, if any, real world ensembles. An alternative is to consider an ensemble as merely a source of information rather than the possible scenarios of reality. This approach, which looks for maps between ensembles and probabilistic distributions, is investigated and extended. Common methods are revisited, and an improvement to standard kernel dressing, called ‘affine kernel dressing’ (AKD), is introduced. AKD assumes an affine mapping between ensemble and verification, typically not acting on individual ensemble members but on the entire ensemble as a whole, the parameters of this mapping are determined in parallel with the other dressing parameters, including a weight assigned to the unconditioned (climatological) distribution. These amendments to standard kernel dressing, albeit simple, can improve performance significantly and are shown to be appropriate for both overdispersive and underdispersive ensembles, unlike standard kernel dressing which exacerbates over dispersion. Studies are presented using operational numerical weather predictions for two locations and data from the Lorenz63 system, demonstrating both effectiveness given operational constraints and statistical significance given a large sample.
Resumo:
Seamless phase II/III clinical trials combine traditional phases II and III into a single trial that is conducted in two stages, with stage 1 used to answer phase II objectives such as treatment selection and stage 2 used for the confirmatory analysis, which is a phase III objective. Although seamless phase II/III clinical trials are efficient because the confirmatory analysis includes phase II data from stage 1, inference can pose statistical challenges. In this paper, we consider point estimation following seamless phase II/III clinical trials in which stage 1 is used to select the most effective experimental treatment and to decide if, compared with a control, the trial should stop at stage 1 for futility. If the trial is not stopped, then the phase III confirmatory part of the trial involves evaluation of the selected most effective experimental treatment and the control. We have developed two new estimators for the treatment difference between these two treatments with the aim of reducing bias conditional on the treatment selection made and on the fact that the trial continues to stage 2. We have demonstrated the properties of these estimators using simulations
Resumo:
We consider in this paper the solvability of linear integral equations on the real line, in operator form (λ−K)φ=ψ, where and K is an integral operator. We impose conditions on the kernel, k, of K which ensure that K is bounded as an operator on . Let Xa denote the weighted space as |s|→∞}. Our first result is that if, additionally, |k(s,t)|⩽κ(s−t), with and κ(s)=O(|s|−b) as |s|→∞, for some b>1, then the spectrum of K is the same on Xa as on X, for 0kernel takes the form k(s,t)=κ(s−t)z(t), with , , and κ(s)=O(|s|−b) as |s|→∞, for some b>1. As an example where kernels of this latter form occur we discuss a boundary integral equation formulation of an impedance boundary value problem for the Helmholtz equation in a half-plane.
Resumo:
We propose a Nystr¨om/product integration method for a class of second kind integral equations on the real line which arise in problems of two-dimensional scalar and elastic wave scattering by unbounded surfaces. Stability and convergence of the method is established with convergence rates dependent on the smoothness of components of the kernel. The method is applied to the problem of acoustic scattering by a sound soft one-dimensional surface which is the graph of a function f, and superalgebraic convergence is established in the case when f is infinitely smooth. Numerical results are presented illustrating this behavior for the case when f is periodic (the diffraction grating case). The Nystr¨om method for this problem is stable and convergent uniformly with respect to the period of the grating, in contrast to standard integral equation methods for diffraction gratings which fail at a countable set of grating periods.
Resumo:
This paper considers general second kind integral equations of the form(in operator form φ − kφ = ψ), where the functions k and ψ are assumed known, with ψ ∈ Y, the space of bounded continuous functions on R, and k such that the mapping s → k(s, · ), from R to L1(R), is bounded and continuous. The function φ ∈ Y is the solution to be determined. Conditions on a set W ⊂ BC(R, L1(R)) are obtained such that a generalised Fredholm alternative holds: If W satisfies these conditions and I − k is injective for all k ∈ W then I − k is also surjective for all k ∈ W and, moreover, the inverse operators (I − k) − 1 on Y are uniformly bounded for k ∈ W. The approximation of the kernel in the integral equation by a sequence (kn) converging in a weak sense to k is also considered and results on stability and convergence are obtained. These general theorems are used to establish results for two special classes of kernels: k(s, t) = κ(s − t)z(t) and k(s, t) = κ(s − t)λ(s − t, t), where κ ∈ L1(R), z ∈ L∞(R), and λ ∈ BC((R\{0}) × R). Kernels of both classes arise in problems of time harmonic wave scattering by unbounded surfaces. The general integral equation results are here applied to prove the existence of a solution for a boundary integral equation formulation of scattering by an infinite rough surface and to consider the stability and convergence of approximation of the rough surface problem by a sequence of diffraction grating problems of increasingly large period.
Resumo:
We consider second kind integral equations of the form x(s) - (abbreviated x - K x = y ), in which Ω is some unbounded subset of Rn. Let Xp denote the weighted space of functions x continuous on Ω and satisfying x (s) = O(|s|-p ),s → ∞We show that if the kernel k(s,t) decays like |s — t|-q as |s — t| → ∞ for some sufficiently large q (and some other mild conditions on k are satisfied), then K ∈ B(XP) (the set of bounded linear operators on Xp), for 0 ≤ p ≤ q. If also (I - K)-1 ∈ B(X0) then (I - K)-1 ∈ B(XP) for 0 < p < q, and (I- K)-1∈ B(Xq) if further conditions on k hold. Thus, if k(s, t) = O(|s — t|-q). |s — t| → ∞, and y(s)=O(|s|-p), s → ∞, the asymptotic behaviour of the solution x may be estimated as x (s) = O(|s|-r), |s| → ∞, r := min(p, q). The case when k(s,t) = к(s — t), so that the equation is of Wiener-Hopf type, receives especial attention. Conditions, in terms of the symbol of I — K, for I — K to be invertible or Fredholm on Xp are established for certain cases (Ω a half-space or cone). A boundary integral equation, which models three-dimensional acoustic propaga-tion above flat ground, absorbing apart from an infinite rigid strip, illustrates the practical application and sharpness of the above results. This integral equation mod-els, in particular, road traffic noise propagation along an infinite road surface sur-rounded by absorbing ground. We prove that the sound propagating along the rigid road surface eventually decays with distance at the same rate as sound propagating above the absorbing ground.