981 results for adaptive estimation
Abstract:
The systematic sampling (SYS) design (Madow and Madow, 1944) is widely used by statistical offices due to its simplicity and efficiency (e.g., Iachan, 1982). However, it suffers from a serious defect: the sampling variance cannot be estimated without bias (Iachan, 1982), and the usual variance estimators (Yates and Grundy, 1953) are inadequate and can overestimate the variance significantly (Särndal et al., 1992). We propose a novel variance estimator that is less biased and can be implemented with any given population order. We justify this estimator theoretically and with a Monte Carlo simulation study.
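As an illustration of the design discussed above, the following is a minimal sketch of 1-in-k systematic sampling with a random start, assuming the population size is an exact multiple of the sample size. The function name `systematic_sample` is hypothetical, and the sketch shows only the sampling design, not the variance estimator proposed in the abstract.

```python
import random


def systematic_sample(population, n, rng=None):
    """Draw a 1-in-k systematic sample of size n.

    Assumption for this sketch: len(population) is an exact multiple of n,
    so the sampling interval k = N / n is an integer.
    """
    rng = rng or random.Random()
    N = len(population)
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch requires N to be a positive multiple of n")
    k = N // n                  # sampling interval
    start = rng.randrange(k)    # random start in 0..k-1
    # Take every k-th unit from the random start, in the given population order.
    return population[start::k]


if __name__ == "__main__":
    units = list(range(20))
    print(systematic_sample(units, 5, random.Random(1)))
```

Only k distinct samples are possible under this design, and two units that fall in the same interval never appear together, so some joint inclusion probabilities are zero. This is why the sampling variance cannot be estimated without bias from a single systematic sample.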
Abstract:
We show that the Hájek (Ann. Math Statist. (1964) 1491) variance estimator can be used to estimate the variance of the Horvitz–Thompson estimator when the Chao sampling scheme (Chao, Biometrika 69 (1982) 653) is implemented. This estimator is simple and can be implemented with any statistical package. We consider a numerical and an analytic method to show that this estimator can be used. A series of simulations supports our findings.
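The estimator described above can be sketched as follows. This is a hedged illustration, assuming the commonly quoted form of the Hájek approximation, v = n/(n-1) Σ (1-π_i)(y_i/π_i - B)^2 with B = Σ (1-π_i) y_i/π_i / Σ (1-π_i), where the sums run over the sampled units. The function names are hypothetical, and the exact form used in the paper may differ.

```python
def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of the population total from sampled values y
    and their first-order inclusion probabilities pi."""
    return sum(yi / p for yi, p in zip(y, pi))


def hajek_variance(y, pi):
    """Hajek-type variance estimate for the Horvitz-Thompson total.

    Assumed form (see lead-in): it needs only first-order inclusion
    probabilities, which is what makes it simple to implement.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two sampled units")
    weights = [1.0 - p for p in pi]
    denom = sum(weights)
    if denom == 0.0:
        return 0.0  # every unit sampled with certainty: no sampling variance
    expanded = [yi / p for yi, p in zip(y, pi)]
    b = sum(w * e for w, e in zip(weights, expanded)) / denom
    return n / (n - 1) * sum(w * (e - b) ** 2 for w, e in zip(weights, expanded))


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    pi = [0.5] * 4  # equal probabilities: n = 4 units from N = 8
    print(horvitz_thompson_total(y, pi), hajek_variance(y, pi))
```

With equal inclusion probabilities n/N, this form reduces to the familiar simple random sampling estimator N²(1 - n/N)s²/n, which gives a quick sanity check.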
Abstract:
Sequential techniques can enhance the efficiency of the approximate Bayesian computation algorithm, as in Sisson et al.'s (2007) partial rejection control version. While this method is based upon the theoretical works of Del Moral et al. (2006), the application to approximate Bayesian computation results in a bias in the approximation to the posterior. An alternative version based on genuine importance sampling arguments bypasses this difficulty, in connection with the population Monte Carlo method of Cappe et al. (2004), and it includes an automatic scaling of the forward kernel. When applied to a population genetics example, it compares favourably with two other versions of the approximate algorithm.
Abstract:
Inferring population admixture from genetic data and quantifying it is a difficult but crucial task in evolutionary and conservation biology. Unfortunately, state-of-the-art probabilistic approaches are computationally demanding. Effectively exploiting the computational power of modern multiprocessor systems can thus have a positive impact on Monte Carlo-based simulation of admixture modeling. A novel parallel approach is briefly described and promising results on its message passing interface (MPI)-based C implementation are reported.
Abstract:
Adaptive radiations often follow the evolution of key traits, such as the origin of the amniotic egg and the subsequent radiation of terrestrial vertebrates. The mechanism by which a species determines the sex of its offspring has been linked to critical ecological and life-history traits(1-3) but not to major adaptive radiations, in part because sex-determining mechanisms do not fossilize. Here we establish a previously unknown coevolutionary relationship in 94 amniote species between sex-determining mechanism and whether a species bears live young or lays eggs. We use that relationship to predict the sex-determining mechanism in three independent lineages of extinct Mesozoic marine reptiles (mosasaurs, sauropterygians and ichthyosaurs), each of which is known from fossils to have evolved live birth(4-7). Our results indicate that each lineage evolved genotypic sex determination before acquiring live birth. This enabled their pelagic radiations, where the relatively stable temperatures of the open ocean constrain temperature-dependent sex determination in amniote species. Freed from the need to move and nest on land(4,5,8), extreme physical adaptations to a pelagic lifestyle evolved in each group, such as the fluked tails, dorsal fins and wing-shaped limbs of ichthyosaurs. With the inclusion of ichthyosaurs, mosasaurs and sauropterygians, genotypic sex determination is present in all known fully pelagic amniote groups (sea snakes, sirenians and cetaceans), suggesting that this mode of sex determination and the subsequent evolution of live birth are key traits required for marine adaptive radiations in amniote lineages.
Abstract:
Estimation of population size with missing zero-class is an important problem that is encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by the method of maximum likelihood and estimation of the population size based on this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) has proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable for count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and thereby using a Horvitz-Thompson estimator of population size. This was found to work well when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size was found to be underestimated. In search of a more robust estimator, we focused on three models that use all clusters with exactly one case, those clusters with exactly two cases and those with exactly three cases to estimate the probability of the zero-class and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. Loss in efficiency associated with gain in robustness was examined based on a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferred in general. In applications, we recommend obtaining estimates from all three models and making a choice considering the estimates from the three models, robustness and the loss in efficiency. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
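For orientation, here is a minimal sketch of the original unclustered Zelterman (1988) estimator mentioned above, assuming its standard form: the Poisson rate is estimated from the singleton and doubleton frequencies as 2·f2/f1, and the population size as n / (1 - exp(-rate)) in Horvitz-Thompson fashion. The function name is hypothetical, and the clustered extension developed in the paper is not reproduced here.

```python
import math
from collections import Counter


def zelterman_population_size(counts):
    """Zelterman-type estimate of population size from zero-truncated counts.

    `counts` holds the observed count (>= 1) for each observed unit; units
    with count 0 are unobserved, which is the missing zero-class.
    """
    if any(c < 1 for c in counts):
        raise ValueError("counts must be zero-truncated (all >= 1)")
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]   # units seen exactly once / exactly twice
    if f1 == 0 or f2 == 0:
        raise ValueError("estimator needs at least one singleton and one doubleton")
    rate = 2.0 * f2 / f1        # robust rate estimate using only f1 and f2
    p_zero = math.exp(-rate)    # estimated probability of the zero-class
    return len(counts) / (1.0 - p_zero)


if __name__ == "__main__":
    print(zelterman_population_size([1, 1, 1, 1, 2, 2, 3]))
```

Because the rate uses only the two lowest frequencies, the estimate is less sensitive to departures from the Poisson model in the upper tail, at some cost in efficiency when the Poisson model actually holds.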
Abstract:
This paper considers the problem of estimation when one of a number of populations, assumed normal with known common variance, is selected on the basis of it having the largest observed mean. Conditional on selection of the population, the observed mean is a biased estimate of the true mean. This problem arises in the analysis of clinical trials in which selection is made between a number of experimental treatments that are compared with each other either with or without an additional control treatment. Attempts to obtain approximately unbiased estimates in this setting have been proposed by Shen [2001. An improved method of evaluating drug effect in a multiple dose clinical trial. Statist. Medicine 20, 1913–1929] and Stallard and Todd [2005. Point estimates and confidence regions for sequential trials involving selection. J. Statist. Plann. Inference 135, 402–419]. This paper explores the problem in the simple setting in which two experimental treatments are compared in a single analysis. It is shown that in this case the estimate of Stallard and Todd is the maximum-likelihood estimate (m.l.e.), and this is compared with the estimate proposed by Shen. In particular, it is shown that the m.l.e. has infinite expectation whatever the true value of the mean being estimated. We show that there is no conditionally unbiased estimator, and propose a new family of approximately conditionally unbiased estimators, comparing these with the estimators suggested by Shen.
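The selection bias described above can be seen in a small Monte Carlo sketch. It assumes normal observed means with a known common standard deviation and reports the average of (selected observed mean minus its true mean) over many repetitions. The function name and default settings are illustrative assumptions, and none of the estimators from the paper are implemented.

```python
import random


def selection_bias(true_means, sigma=1.0, n_sim=100_000, seed=0):
    """Monte Carlo estimate of the bias of the observed mean of the
    population selected for having the largest observed mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        # One observed mean per population.
        observed = [rng.gauss(mu, sigma) for mu in true_means]
        # Select the population with the largest observed mean.
        best = max(range(len(observed)), key=observed.__getitem__)
        total += observed[best] - true_means[best]
    return total / n_sim


if __name__ == "__main__":
    # Two treatments with identical true effects.
    print(selection_bias([0.0, 0.0]))
```

For two populations with equal true means and unit standard deviation, the expected maximum of the two observed means is 1/√π, about 0.56, so the simulated bias should be close to that value.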
Abstract:
Approximate Bayesian computation (ABC) is a highly flexible technique that allows the estimation of parameters under demographic models that are too complex to be handled by full-likelihood methods. We assess the utility of this method to estimate the parameters of range expansion in a two-dimensional stepping-stone model, using samples from either a single deme or multiple demes. A minor modification to the ABC procedure is introduced, which leads to an improvement in the accuracy of estimation. The method is then used to estimate the expansion time and migration rates for five natural common vole populations in Switzerland typed for a sex-linked marker and a nuclear marker. Estimates based on both markers suggest that expansion occurred < 10,000 years ago, after the most recent glaciation, and that migration rates are strongly male biased.
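To make the ABC idea concrete, the following is a minimal sketch of plain rejection ABC on a toy problem (estimating a normal mean from its sample mean). It does not implement the stepping-stone model or the modification introduced in the paper. All function names, the prior range, and the tolerance are assumptions made for illustration.

```python
import random
import statistics


def abc_rejection(observed, simulate, prior_sample, distance, tolerance, n_draws, rng):
    """Plain rejection ABC: keep the parameter draws whose simulated summary
    statistic lies within `tolerance` of the observed summary."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) <= tolerance:
            accepted.append(theta)
    return accepted


def toy_prior(rng):
    """Uniform prior on the unknown mean (assumed range)."""
    return rng.uniform(-5.0, 5.0)


def toy_simulate(theta, rng, n_obs=20):
    """Summary statistic: the mean of n_obs draws from Normal(theta, 1)."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n_obs))


if __name__ == "__main__":
    rng = random.Random(42)
    posterior = abc_rejection(
        observed=1.0,
        simulate=toy_simulate,
        prior_sample=toy_prior,
        distance=lambda a, b: abs(a - b),
        tolerance=0.1,
        n_draws=20_000,
        rng=rng,
    )
    print(len(posterior), statistics.fmean(posterior))
```

The accepted draws approximate the posterior without ever evaluating a likelihood, which is what makes the approach usable for demographic models too complex for full-likelihood methods. A smaller tolerance improves the approximation but lowers the acceptance rate.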
Abstract:
The identification of signatures of natural selection in genomic surveys has become an area of intense research, stimulated by the increasing ease with which genetic markers can be typed. Loci identified as subject to selection may be functionally important, and hence (weak) candidates for involvement in disease causation. They can also be useful in determining the adaptive differentiation of populations, and exploring hypotheses about speciation. Adaptive differentiation has traditionally been identified from differences in allele frequencies among different populations, summarised by an estimate of F_ST. Low outliers relative to an appropriate neutral population-genetics model indicate loci subject to balancing selection, whereas high outliers suggest adaptive (directional) selection. However, the problem of identifying statistically significant departures from neutrality is complicated by confounding effects on the distribution of F_ST estimates, and current methods have not yet been tested in large-scale simulation experiments. Here, we simulate data from a structured population at many unlinked, diallelic loci that are predominantly neutral but with some loci subject to adaptive or balancing selection. We develop a hierarchical-Bayesian method, implemented via Markov chain Monte Carlo (MCMC), and assess its performance in distinguishing the loci simulated under selection from the neutral loci. We also compare this performance with that of a frequentist method, based on moment-based estimates of F_ST. We find that both methods can identify loci subject to adaptive selection when the selection coefficient is at least five times the migration rate. Neither method could reliably distinguish loci under balancing selection in our simulations, even when the selection coefficient is twenty times the migration rate.
Abstract:
Biologists frequently attempt to infer the character states at ancestral nodes of a phylogeny from the distribution of traits observed in contemporary organisms. Because phylogenies are normally inferences from data, it is desirable to account for the uncertainty in estimates of the tree and its branch lengths when making inferences about ancestral states or other comparative parameters. Here we present a general Bayesian approach for testing comparative hypotheses across statistically justified samples of phylogenies, focusing on the specific issue of reconstructing ancestral states. The method uses Markov chain Monte Carlo techniques for sampling phylogenetic trees and for investigating the parameters of a statistical model of trait evolution. We describe how to combine information about the uncertainty of the phylogeny with uncertainty in the estimate of the ancestral state. Our approach does not constrain the sample of trees only to those that contain the ancestral node or nodes of interest, and we show how to reconstruct ancestral states of uncertain nodes using a most-recent-common-ancestor approach. We illustrate the methods with data on ribonuclease evolution in the Artiodactyla. Software implementing the methods (BayesMultiState) is available from the authors.
Abstract:
This article introduces a new general method for genealogical inference that samples independent genealogical histories using importance sampling (IS) and then samples other parameters with Markov chain Monte Carlo (MCMC). It is then possible to more easily utilize the advantages of importance sampling in a fully Bayesian framework. The method is applied to the problem of estimating recent changes in effective population size from temporally spaced gene frequency data. The method gives the posterior distribution of effective population size at the time of the oldest sample and at the time of the most recent sample, assuming a model of exponential growth or decline during the interval. The effect of changes in number of alleles, number of loci, and sample size on the accuracy of the method is described using test simulations, and it is concluded that these have an approximately equivalent effect. The method is used on three example data sets and problems in interpreting the posterior densities are highlighted and discussed.
Abstract:
We have studied growth and estimated recruitment of massive coral colonies at three sites, Kaledupa, Hoga and Sampela, separated by about 1.5 km in the Wakatobi Marine National Park, S.E. Sulawesi, Indonesia. There was significantly higher species richness (P<0.05), coral cover (P<0.05) and rugosity (P<0.01) at Kaledupa than at Sampela. A model for coral reef growth has been developed based on a rational polynomial function, where dx/dt is an index of coral growth with time; W is the variable (for example, coral weight, coral length or coral area), up to the power of n in the numerator and m in the denominator; a1…an and b1…bm are constants. The values for n and m represent the degree of the polynomial, and can relate to the morphology of the coral. The model was used to simulate typical coral growth curves, and tested using published data obtained by weighing coral colonies underwater in reefs on the south-west coast of Curaçao [‘Neth. J. Sea Res. 10 (1976) 285’]. The model proved an accurate fit to the data, and parameters were obtained for a number of coral species. Surface area data were obtained on over 1200 massive corals at three different sites in the Wakatobi Marine National Park, S.E. Sulawesi, Indonesia. The year of an individual's recruitment was calculated from knowledge of the growth rate modified by application of the rational polynomial model. The estimated pattern of recruitment was variable, with small numbers of massive corals settling and growing before 1950 at the heavily used site, Sampela, relative to the reef site with little or no human use, Kaledupa, and the intermediate site, Hoga. There was a significantly greater sedimentation rate at Sampela than at either Kaledupa (P<0.0001) or Hoga (P<0.0005).
The relative mean abundance of fish families present at the reef crests at the three sites, determined using digital video photography, did not correlate with sedimentation rates, underwater visibility or lack of large non-branching coral colonies. Radial growth rates of three genera of non-branching corals were significantly lower at Sampela than at Kaledupa or at Hoga, and there was a high correlation (r=0.89) between radial growth rates and underwater visibility. Porites spp. was the most abundant coral over all the sites and at all depths, followed by Favites (P<0.04) and Favia spp. (P<0.03). Colony ages of Porites corals were significantly lower at the 5 m reef flat on the Sampela reef than at the same depth on both other reefs (P<0.005). At Sampela, only 2.8% of corals on the 5 m reef crest are of a size to have survived from before 1950. The scleractinian coral community of Sampela is severely impacted by depositing sediments, which can lead to the suffocation of corals whilst also decreasing light penetration, resulting in decreased growth and calcification rates. The net loss of material from Sampela, if not checked, could result in the loss of this protective barrier, which would be to the detriment of the sublittoral sand flats and hence the Sampela village.