601 resultados para Estimators
Resumo:
None of the current surveillance streams monitoring the presence of scrapie in Great Britain provide a comprehensive and unbiased estimate of the prevalence of the disease at the holding level. Previous work to estimate the under-ascertainment adjusted prevalence of scrapie in Great Britain applied multiple-list capture-recapture methods. The enforcement of new control measures on scrapie-affected holdings in 2004 has stopped the overlapping between surveillance sources and, hence, the application of multiple-list capture-recapture models. Alternative methods, still under the capture-recapture methodology, relying on repeated entries in one single list have been suggested in these situations. In this article, we apply one-list capture-recapture approaches to data held on the Scrapie Notifications Database to estimate the undetected population of scrapie-affected holdings with clinical disease in Great Britain for the years 2002, 2003, and 2004. For doing so, we develop a new diagnostic tool for indication of heterogeneity as well as a new understanding of the Zelterman and Chao's lower bound estimators to account for potential unobserved heterogeneity. We demonstrate that the Zelterman estimator can be viewed as a maximum likelihood estimator for a special, locally truncated Poisson likelihood equivalent to a binomial likelihood. This understanding allows the extension of the Zelterman approach by means of logistic regression to include observed heterogeneity in the form of covariates-in case studied here, the holding size and country of origin. Our results confirm the presence of substantial unobserved heterogeneity supporting the application of our two estimators. The total scrapie-affected holding population in Great Britain is around 300 holdings per year. None of the covariates appear to inform the model significantly.
Resumo:
The article considers screening human populations with two screening tests. If any of the two tests is positive, then full evaluation of the disease status is undertaken; however, if both diagnostic tests are negative, then disease status remains unknown. This procedure leads to a data constellation in which, for each disease status, the 2 x 2 table associated with the two diagnostic tests used in screening has exactly one empty, unknown cell. To estimate the unobserved cell counts, previous approaches assume independence of the two diagnostic tests and use specific models, including the special mixture model of Walter or unconstrained capture-recapture estimates. Often, as is also demonstrated in this article by means of a simple test, the independence of the two screening tests is not supported by the data. Two new estimators are suggested that allow associations of the screening test, although the form of association must be assumed to be homogeneous over disease status. These estimators are modifications of the simple capture-recapture estimator and easy to construct. The estimators are investigated for several screening studies with fully evaluated disease status in which the superior behavior of the new estimators compared to the previous conventional ones can be shown. Finally, the performance of the new estimators is compared with maximum likelihood estimators, which are more difficult to obtain in these models. The results indicate the loss of efficiency as minor.
Resumo:
This paper considers the problem of estimation when one of a number of populations, assumed normal with known common variance, is selected on the basis of it having the largest observed mean. Conditional on selection of the population, the observed mean is a biased estimate of the true mean. This problem arises in the analysis of clinical trials in which selection is made between a number of experimental treatments that are compared with each other either with or without an additional control treatment. Attempts to obtain approximately unbiased estimates in this setting have been proposed by Shen [2001. An improved method of evaluating drug effect in a multiple dose clinical trial. Statist. Medicine 20, 1913–1929] and Stallard and Todd [2005. Point estimates and confidence regions for sequential trials involving selection. J. Statist. Plann. Inference 135, 402–419]. This paper explores the problem in the simple setting in which two experimental treatments are compared in a single analysis. It is shown that in this case the estimate of Stallard and Todd is the maximum-likelihood estimate (m.l.e.), and this is compared with the estimate proposed by Shen. In particular, it is shown that the m.l.e. has infinite expectation whatever the true value of the mean being estimated. We show that there is no conditionally unbiased estimator, and propose a new family of approximately conditionally unbiased estimators, comparing these with the estimators suggested by Shen.
Resumo:
The estimation of effective population size from one sample of genotypes has been problematic because most estimators have been proven imprecise or biased. We developed a web-based program, ONeSAMP that uses approximate Bayesian computation to estimate effective population size from a sample of microsatellite genotypes. ONeSAMP requires an input file of sampled individuals' microsatellite genotypes along with information about several sampling and biological parameters. ONeSAMP provides an estimate of effective population size, along with 95% credible limits. We illustrate the use of ONeSAMP with an example data set from a re-introduced population of ibex Capra ibex.
Resumo:
In this paper, we apply one-list capture-recapture models to estimate the number of scrapie-affected holdings in Great Britain. We applied this technique to the Compulsory Scrapie Flocks Scheme dataset where cases from all the surveillance sources monitoring the presence of scrapie in Great Britain, the abattoir survey, the fallen stock survey and the statutory reporting of clinical cases, are gathered. Consequently, the estimates of prevalence obtained from this scheme should be comprehensive and cover all the different presentations of the disease captured individually by the surveillance sources. Two estimators were applied under the one-list approach: the Zelterman estimator and Chao's lower bound estimator. Our results could only inform with confidence the scrapie-affected holding population with clinical disease; this moved around the figure of 350 holdings in Great Britain for the period under study, April 2005-April 2006. Our models allowed the stratification by surveillance source and the input of covariate information, holding size and country of origin. None of the covariates appear to inform the model significantly. Crown Copyright (C) 2008 Published by Elsevier B.V. All rights reserved.
Resumo:
This article is about modeling count data with zero truncation. A parametric count density family is considered. The truncated mixture of densities from this family is different from the mixture of truncated densities from the same family. Whereas the former model is more natural to formulate and to interpret, the latter model is theoretically easier to treat. It is shown that for any mixing distribution leading to a truncated mixture, a (usually different) mixing distribution can be found so. that the associated mixture of truncated densities equals the truncated mixture, and vice versa. This implies that the likelihood surfaces for both situations agree, and in this sense both models are equivalent. Zero-truncated count data models are used frequently in the capture-recapture setting to estimate population size, and it can be shown that the two Horvitz-Thompson estimators, associated with the two models, agree. In particular, it is possible to achieve strong results for mixtures of truncated Poisson densities, including reliable, global construction of the unique NPMLE (nonparametric maximum likelihood estimator) of the mixing distribution, implying a unique estimator for the population size. The benefit of these results lies in the fact that it is valid to work with the mixture of truncated count densities, which is less appealing for the practitioner but theoretically easier. Mixtures of truncated count densities form a convex linear model, for which a developed theory exists, including global maximum likelihood theory as well as algorithmic approaches. Once the problem has been solved in this class, it might readily be transformed back to the original problem by means of an explicitly given mapping. Applications of these ideas are given, particularly in the case of the truncated Poisson family.
Resumo:
We describe and evaluate a new estimator of the effective population size (N-e), a critical parameter in evolutionary and conservation biology. This new "SummStat" N-e. estimator is based upon the use of summary statistics in an approximate Bayesian computation framework to infer N-e. Simulations of a Wright-Fisher population with known N-e show that the SummStat estimator is useful across a realistic range of individuals and loci sampled, generations between samples, and N-e values. We also address the paucity of information about the relative performance of N-e estimators by comparing the SUMMStat estimator to two recently developed likelihood-based estimators and a traditional moment-based estimator. The SummStat estimator is the least biased of the four estimators compared. In 32 of 36 parameter combinations investigated rising initial allele frequencies drawn from a Dirichlet distribution, it has the lowest bias. The relative mean square error (RMSE) of the SummStat estimator was generally intermediate to the others. All of the estimators had RMSE > 1 when small samples (n = 20, five loci) were collected a generation apart. In contrast, when samples were separated by three or more generations and Ne less than or equal to 50, the SummStat and likelihood-based estimators all had greatly reduced RMSE. Under the conditions simulated, SummStat confidence intervals were more conservative than the likelihood-based estimators and more likely to include true N-e. The greatest strength of the SummStat estimator is its flexible structure. This flexibility allows it to incorporate any, potentially informative summary statistic from Population genetic data.
Resumo:
Using the classical Parzen window (PW) estimate as the target function, the sparse kernel density estimator is constructed in a forward constrained regression manner. The leave-one-out (LOO) test score is used for kernel selection. The jackknife parameter estimator subject to positivity constraint check is used for the parameter estimation of a single parameter at each forward step. As such the proposed approach is simple to implement and the associated computational cost is very low. An illustrative example is employed to demonstrate that the proposed approach is effective in constructing sparse kernel density estimators with comparable accuracy to that of the classical Parzen window estimate.
Resumo:
In this work the G(A)(0) distribution is assumed as the universal model for amplitude Synthetic Aperture (SAR) imagery data under the Multiplicative Model. The observed data, therefore, is assumed to obey a G(A)(0) (alpha; gamma, n) law, where the parameter n is related to the speckle noise, and (alpha, gamma) are related to the ground truth, giving information about the background. Therefore, maps generated by the estimation of (alpha, gamma) in each coordinate can be used as the input for classification methods. Maximum likelihood estimators are derived and used to form estimated parameter maps. This estimation can be hampered by the presence of corner reflectors, man-made objects used to calibrate SAR images that produce large return values. In order to alleviate this contamination, robust (M) estimators are also derived for the universal model. Gaussian Maximum Likelihood classification is used to obtain maps using hard-to-deal-with simulated data, and the superiority of robust estimation is quantitatively assessed.
Resumo:
This correspondence introduces a new orthogonal forward regression (OFR) model identification algorithm using D-optimality for model structure selection and is based on an M-estimators of parameter estimates. M-estimator is a classical robust parameter estimation technique to tackle bad data conditions such as outliers. Computationally, The M-estimator can be derived using an iterative reweighted least squares (IRLS) algorithm. D-optimality is a model structure robustness criterion in experimental design to tackle ill-conditioning in model Structure. The orthogonal forward regression (OFR), often based on the modified Gram-Schmidt procedure, is an efficient method incorporating structure selection and parameter estimation simultaneously. The basic idea of the proposed approach is to incorporate an IRLS inner loop into the modified Gram-Schmidt procedure. In this manner, the OFR algorithm for parsimonious model structure determination is extended to bad data conditions with improved performance via the derivation of parameter M-estimators with inherent robustness to outliers. Numerical examples are included to demonstrate the effectiveness of the proposed algorithm.
Resumo:
We agree with Duckrow and Albano [Phys. Rev. E 67, 063901 (2003)] and Quian Quiroga et al. [Phys. Rev. E 67, 063902 (2003)] that mutual information (MI) is a useful measure of dependence for electroencephalogram (EEG) data, but we show that the improvement seen in the performance of MI on extracting dependence trends from EEG is more dependent on the type of MI estimator rather than any embedding technique used. In an independent study we conducted in search for an optimal MI estimator, and in particular for EEG applications, we examined the performance of a number of MI estimators on the data set used by Quian Quiroga et al. in their original study, where the performance of different dependence measures on real data was investigated [Phys. Rev. E 65, 041903 (2002)]. We show that for EEG applications the best performance among the investigated estimators is achieved by k-nearest neighbors, which supports the conjecture by Quian Quiroga et al. in Phys. Rev. E 67, 063902 (2003) that the nearest neighbor estimator is the most precise method for estimating MI.
Resumo:
We agree with Duckrow and Albano [Phys. Rev. E 67, 063901 (2003)] and Quian Quiroga [Phys. Rev. E 67, 063902 (2003)] that mutual information (MI) is a useful measure of dependence for electroencephalogram (EEG) data, but we show that the improvement seen in the performance of MI on extracting dependence trends from EEG is more dependent on the type of MI estimator rather than any embedding technique used. In an independent study we conducted in search for an optimal MI estimator, and in particular for EEG applications, we examined the performance of a number of MI estimators on the data set used by Quian Quiroga in their original study, where the performance of different dependence measures on real data was investigated [Phys. Rev. E 65, 041903 (2002)]. We show that for EEG applications the best performance among the investigated estimators is achieved by k-nearest neighbors, which supports the conjecture by Quian Quiroga in Phys. Rev. E 67, 063902 (2003) that the nearest neighbor estimator is the most precise method for estimating MI.
Resumo:
Population size estimation with discrete or nonparametric mixture models is considered, and reliable ways of construction of the nonparametric mixture model estimator are reviewed and set into perspective. Construction of the maximum likelihood estimator of the mixing distribution is done for any number of components up to the global nonparametric maximum likelihood bound using the EM algorithm. In addition, the estimators of Chao and Zelterman are considered with some generalisations of Zelterman’s estimator. All computations are done with CAMCR, a special software developed for population size estimation with mixture models. Several examples and data sets are discussed and the estimators illustrated. Problems using the mixture model-based estimators are highlighted.
Resumo:
This paper derives an efficient algorithm for constructing sparse kernel density (SKD) estimates. The algorithm first selects a very small subset of significant kernels using an orthogonal forward regression (OFR) procedure based on the D-optimality experimental design criterion. The weights of the resulting sparse kernel model are then calculated using a modified multiplicative nonnegative quadratic programming algorithm. Unlike most of the SKD estimators, the proposed D-optimality regression approach is an unsupervised construction algorithm and it does not require an empirical desired response for the kernel selection task. The strength of the D-optimality OFR is owing to the fact that the algorithm automatically selects a small subset of the most significant kernels related to the largest eigenvalues of the kernel design matrix, which counts for the most energy of the kernel training data, and this also guarantees the most accurate kernel weight estimate. The proposed method is also computationally attractive, in comparison with many existing SKD construction algorithms. Extensive numerical investigation demonstrates the ability of this regression-based approach to efficiently construct a very sparse kernel density estimate with excellent test accuracy, and our results show that the proposed method compares favourably with other existing sparse methods, in terms of test accuracy, model sparsity and complexity, for constructing kernel density estimates.
Resumo:
A new Bayesian algorithm for retrieving surface rain rate from Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) over the ocean is presented, along with validations against estimates from the TRMM Precipitation Radar (PR). The Bayesian approach offers a rigorous basis for optimally combining multichannel observations with prior knowledge. While other rain-rate algorithms have been published that are based at least partly on Bayesian reasoning, this is believed to be the first self-contained algorithm that fully exploits Bayes’s theorem to yield not just a single rain rate, but rather a continuous posterior probability distribution of rain rate. To advance the understanding of theoretical benefits of the Bayesian approach, sensitivity analyses have been conducted based on two synthetic datasets for which the “true” conditional and prior distribution are known. Results demonstrate that even when the prior and conditional likelihoods are specified perfectly, biased retrievals may occur at high rain rates. This bias is not the result of a defect of the Bayesian formalism, but rather represents the expected outcome when the physical constraint imposed by the radiometric observations is weak owing to saturation effects. It is also suggested that both the choice of the estimators and the prior information are crucial to the retrieval. In addition, the performance of the Bayesian algorithm herein is found to be comparable to that of other benchmark algorithms in real-world applications, while having the additional advantage of providing a complete continuous posterior probability distribution of surface rain rate.