139 resultados para Maximum pseudo-likelihood
Resumo:
[1] In many practical situations where spatial rainfall estimates are needed, rainfall occurs as a spatially intermittent phenomenon. An efficient geostatistical method for rainfall estimation in the case of intermittency has previously been published and comprises the estimation of two independent components: a binary random function for modeling the intermittency and a continuous random function that models the rainfall inside the rainy areas. The final rainfall estimates are obtained as the product of the estimates of these two random functions. However the published approach does not contain a method for estimation of uncertainties. The contribution of this paper is the presentation of the indicator maximum likelihood estimator from which the local conditional distribution of the rainfall value at any location may be derived using an ensemble approach. From the conditional distribution, representations of uncertainty such as the estimation variance and confidence intervals can be obtained. An approximation to the variance can be calculated more simply by assuming rainfall intensity is independent of location within the rainy area. The methodology has been validated using simulated and real rainfall data sets. The results of these case studies show good agreement between predicted uncertainties and measured errors obtained from the validation data.
Resumo:
We use geomagnetic activity data to study the rise and fall over the past century of the solar wind flow speed VSW, the interplanetary magnetic field strength B, and the open solar flux FS. Our estimates include allowance for the kinematic effect of longitudinal structure in the solar wind flow speed. As well as solar cycle variations, all three parameters show a long-term rise during the first half of the 20th century followed by peaks around 1955 and 1986 and then a recent decline. Cosmogenic isotope data reveal that this constitutes a grand maximum of solar activity which began in 1920, using the definition that such grand maxima are when 25-year averages of the heliospheric modulation potential exceeds 600 MV. Extrapolating the linear declines seen in all three parameters since 1985, yields predictions that the grand maximum will end in the years 2013, 2014, or 2027 using VSW, FS, or B, respectively. These estimates are consistent with predictions based on the probability distribution of the durations of past grand solar maxima seen in cosmogenic isotope data. The data contradict any suggestions of a floor to the open solar flux: we show that the solar minimum open solar flux, kinematically corrected to allow for the excess flux effect, has halved over the past two solar cycles.
Resumo:
In this paper we consider the estimation of population size from onesource capture–recapture data, that is, a list in which individuals can potentially be found repeatedly and where the question is how many individuals are missed by the list. As a typical example, we provide data from a drug user study in Bangkok from 2001 where the list consists of drug users who repeatedly contact treatment institutions. Drug users with 1, 2, 3, . . . contacts occur, but drug users with zero contacts are not present, requiring the size of this group to be estimated. Statistically, these data can be considered as stemming from a zero-truncated count distribution.We revisit an estimator for the population size suggested by Zelterman that is known to be robust under potential unobserved heterogeneity. We demonstrate that the Zelterman estimator can be viewed as a maximum likelihood estimator for a locally truncated Poisson likelihood which is equivalent to a binomial likelihood. This result allows the extension of the Zelterman estimator by means of logistic regression to include observed heterogeneity in the form of covariates. We also review an estimator proposed by Chao and explain why we are not able to obtain similar results for this estimator. The Zelterman estimator is applied in two case studies, the first a drug user study from Bangkok, the second an illegal immigrant study in the Netherlands. Our results suggest the new estimator should be used, in particular, if substantial unobserved heterogeneity is present.
Resumo:
This paper investigates the applications of capture–recapture methods to human populations. Capture–recapture methods are commonly used in estimating the size of wildlife populations but can also be used in epidemiology and social sciences, for estimating prevalence of a particular disease or the size of the homeless population in a certain area. Here we focus on estimating the prevalence of infectious diseases. Several estimators of population size are considered: the Lincoln–Petersen estimator and its modified version, the Chapman estimator, Chao’s lower bound estimator, the Zelterman’s estimator, McKendrick’s moment estimator and the maximum likelihood estimator. In order to evaluate these estimators, they are applied to real, three-source, capture-recapture data. By conditioning on each of the sources of three source data, we have been able to compare the estimators with the true value that they are estimating. The Chapman and Chao estimators were compared in terms of their relative bias. A variance formula derived through conditioning is suggested for Chao’s estimator, and normal 95% confidence intervals are calculated for this and the Chapman estimator. We then compare the coverage of the respective confidence intervals. Furthermore, a simulation study is included to compare Chao’s and Chapman’s estimator. Results indicate that Chao’s estimator is less biased than Chapman’s estimator unless both sources are independent. Chao’s estimator has also the smaller mean squared error. Finally, the implications and limitations of the above methods are discussed, with suggestions for further development.
Resumo:
None of the current surveillance streams monitoring the presence of scrapie in Great Britain provide a comprehensive and unbiased estimate of the prevalence of the disease at the holding level. Previous work to estimate the under-ascertainment adjusted prevalence of scrapie in Great Britain applied multiple-list capture–recapture methods. The enforcement of new control measures on scrapie-affected holdings in 2004 has stopped the overlapping between surveillance sources and, hence, the application of multiple-list capture–recapture models. Alternative methods, still under the capture–recapture methodology, relying on repeated entries in one single list have been suggested in these situations. In this article, we apply one-list capture–recapture approaches to data held on the Scrapie Notifications Database to estimate the undetected population of scrapie-affected holdings with clinical disease in Great Britain for the years 2002, 2003, and 2004. For doing so, we develop a new diagnostic tool for indication of heterogeneity as well as a new understanding of the Zelterman and Chao’s lower bound estimators to account for potential unobserved heterogeneity. We demonstrate that the Zelterman estimator can be viewed as a maximum likelihood estimator for a special, locally truncated Poisson likelihood equivalent to a binomial likelihood. This understanding allows the extension of the Zelterman approach by means of logistic regression to include observed heterogeneity in the form of covariates—in case studied here, the holding size and country of origin. Our results confirm the presence of substantial unobserved heterogeneity supporting the application of our two estimators. The total scrapie-affected holding population in Great Britain is around 300 holdings per year. None of the covariates appear to inform the model significantly.
Resumo:
The article considers screening human populations with two screening tests. If any of the two tests is positive, then full evaluation of the disease status is undertaken; however, if both diagnostic tests are negative, then disease status remains unknown. This procedure leads to a data constellation in which, for each disease status, the 2 × 2 table associated with the two diagnostic tests used in screening has exactly one empty, unknown cell. To estimate the unobserved cell counts, previous approaches assume independence of the two diagnostic tests and use specific models, including the special mixture model of Walter or unconstrained capture–recapture estimates. Often, as is also demonstrated in this article by means of a simple test, the independence of the two screening tests is not supported by the data. Two new estimators are suggested that allow associations of the screening test, although the form of association must be assumed to be homogeneous over disease status. These estimators are modifications of the simple capture–recapture estimator and easy to construct. The estimators are investigated for several screening studies with fully evaluated disease status in which the superior behavior of the new estimators compared to the previous conventional ones can be shown. Finally, the performance of the new estimators is compared with maximum likelihood estimators, which are more difficult to obtain in these models. The results indicate the loss of efficiency as minor.
Resumo:
This article assesses the extent to which sampling variation affects findings about Malmquist productivity change derived using data envelopment analysis (DEA), in the first stage by calculating productivity indices and in the second stage by investigating the farm-specific change in productivity. Confidence intervals for Malmquist indices are constructed using Simar and Wilson's (1999) bootstrapping procedure. The main contribution of this article is to account in the second stage for the information in the second stage provided by the first-stage bootstrap. The DEA SEs of the Malmquist indices given by bootstrapping are employed in an innovative heteroscedastic panel regression, using a maximum likelihood procedure. The application is to a sample of 250 Polish farms over the period 1996 to 2000. The confidence intervals' results suggest that the second half of 1990s for Polish farms was characterized not so much by productivity regress but rather by stagnation. As for the determinants of farm productivity change, we find that the integration of the DEA SEs in the second-stage regression is significant in explaining a proportion of the variance in the error term. Although our heteroscedastic regression results differ with those from the standard OLS, in terms of significance and sign, they are consistent with theory and previous research.
Resumo:
A dataset of 1,846,990 completed lactation record,; was created Using milk recording data from 8,967 commercial dairy farms in the United Kingdom over a five year period. Herd-specific lactation curves describing levels of milk, Cat and protein by lactation number and month of calving were generated for each farm. The actual yield of milk and protein proportion at the first milk recording of individual cow lactations were compared with the levels taken from the lactation curves. Logistic regression analysis showed that cows production milk with a lower percentage of protein than average had a significantly lower probability of being in-calf at 100 days post calving and it significantly higher probability of being culled at the end of lactation. The culling rates derived from the studied database demonstrate the current high wastage rate of commercial dairy cows. Well of this wastage is due to involuntary culling as a result of reproductive failure.
Resumo:
A number of authors have proposed clinical trial designs involving the comparison of several experimental treatments with a control treatment in two or more stages. At the end of the first stage, the most promising experimental treatment is selected, and all other experimental treatments are dropped from the trial. Provided it is good enough, the selected experimental treatment is then compared with the control treatment in one or more subsequent stages. The analysis of data from such a trial is problematic because of the treatment selection and the possibility of stopping at interim analyses. These aspects lead to bias in the maximum-likelihood estimate of the advantage of the selected experimental treatment over the control and to inaccurate coverage for the associated confidence interval. In this paper, we evaluate the bias of the maximum-likelihood estimate and propose a bias-adjusted estimate. We also propose an approach to the construction of a confidence region for the vector of advantages of the experimental treatments over the control based on an ordering of the sample space. These regions are shown to have accurate coverage, although they are also shown to be necessarily unbounded. Confidence intervals for the advantage of the selected treatment are obtained from the confidence regions and are shown to have more accurate coverage than the standard confidence interval based upon the maximum-likelihood estimate and its asymptotic standard error.
Resumo:
The paper concerns the design and analysis of serial dilution assays to estimate the infectivity of a sample of tissue when it is assumed that the sample contains a finite number of indivisible infectious units such that a subsample will be infectious if it contains one or more of these units. The aim of the study is to estimate the number of infectious units in the original sample. The standard approach to the analysis of data from such a study is based on the assumption of independence of aliquots both at the same dilution level and at different dilution levels, so that the numbers of infectious units in the aliquots follow independent Poisson distributions. An alternative approach is based on calculation of the expected value of the total number of samples tested that are not infectious. We derive the likelihood for the data on the basis of the discrete number of infectious units, enabling calculation of the maximum likelihood estimate and likelihood-based confidence intervals. We use the exact probabilities that are obtained to compare the maximum likelihood estimate with those given by the other methods in terms of bias and standard error and to compare the coverage of the confidence intervals. We show that the methods have very similar properties and conclude that for practical use the method that is based on the Poisson assumption is to be recommended, since it can be implemented by using standard statistical software. Finally we consider the design of serial dilution assays, concluding that it is important that neither the dilution factor nor the number of samples that remain untested should be too large.
Resumo:
The problem of estimating the individual probabilities of a discrete distribution is considered. The true distribution of the independent observations is a mixture of a family of power series distributions. First, we ensure identifiability of the mixing distribution assuming mild conditions. Next, the mixing distribution is estimated by non-parametric maximum likelihood and an estimator for individual probabilities is obtained from the corresponding marginal mixture density. We establish asymptotic normality for the estimator of individual probabilities by showing that, under certain conditions, the difference between this estimator and the empirical proportions is asymptotically negligible. Our framework includes Poisson, negative binomial and logarithmic series as well as binomial mixture models. Simulations highlight the benefit in achieving normality when using the proposed marginal mixture density approach instead of the empirical one, especially for small sample sizes and/or when interest is in the tail areas. A real data example is given to illustrate the use of the methodology.
Resumo:
The contribution investigates the problem of estimating the size of a population, also known as the missing cases problem. Suppose a registration system is targeting to identify all cases having a certain characteristic such as a specific disease (cancer, heart disease, ...), disease related condition (HIV, heroin use, ...) or a specific behavior (driving a car without license). Every case in such a registration system has a certain notification history in that it might have been identified several times (at least once) which can be understood as a particular capture-recapture situation. Typically, cases are left out which have never been listed at any occasion, and it is this frequency one wants to estimate. In this paper modelling is concentrating on the counting distribution, e.g. the distribution of the variable that counts how often a given case has been identified by the registration system. Besides very simple models like the binomial or Poisson distribution, finite (nonparametric) mixtures of these are considered providing rather flexible modelling tools. Estimation is done using maximum likelihood by means of the EM algorithm. A case study on heroin users in Bangkok in the year 2001 is completing the contribution.
Resumo:
OBJECTIVES: This contribution provides a unifying concept for meta-analysis integrating the handling of unobserved heterogeneity, study covariates, publication bias and study quality. It is important to consider these issues simultaneously to avoid the occurrence of artifacts, and a method for doing so is suggested here. METHODS: The approach is based upon the meta-likelihood in combination with a general linear nonparametric mixed model, which lays the ground for all inferential conclusions suggested here. RESULTS: The concept is illustrated at hand of a meta-analysis investigating the relationship of hormone replacement therapy and breast cancer. The phenomenon of interest has been investigated in many studies for a considerable time and different results were reported. In 1992 a meta-analysis by Sillero-Arenas et al. concluded a small, but significant overall effect of 1.06 on the relative risk scale. Using the meta-likelihood approach it is demonstrated here that this meta-analysis is due to considerable unobserved heterogeneity. Furthermore, it is shown that new methods are available to model this heterogeneity successfully. It is argued further to include available study covariates to explain this heterogeneity in the meta-analysis at hand. CONCLUSIONS: The topic of HRT and breast cancer has again very recently become an issue of public debate, when results of a large trial investigating the health effects of hormone replacement therapy were published indicating an increased risk for breast cancer (risk ratio of 1.26). Using an adequate regression model in the previously published meta-analysis an adjusted estimate of effect of 1.14 can be given which is considerably higher than the one published in the meta-analysis of Sillero-Arenas et al. In summary, it is hoped that the method suggested here contributes further to a good meta-analytic practice in public health and clinical disciplines.
Resumo:
Background: The present paper investigates the question of a suitable basic model for the number of scrapie cases in a holding and applications of this knowledge to the estimation of scrapie-ffected holding population sizes and adequacy of control measures within holding. Is the number of scrapie cases proportional to the size of the holding in which case it should be incorporated into the parameter of the error distribution for the scrapie counts? Or, is there a different - potentially more complex - relationship between case count and holding size in which case the information about the size of the holding should be better incorporated as a covariate in the modeling? Methods: We show that this question can be appropriately addressed via a simple zero-truncated Poisson model in which the hypothesis of proportionality enters as a special offset-model. Model comparisons can be achieved by means of likelihood ratio testing. The procedure is illustrated by means of surveillance data on classical scrapie in Great Britain. Furthermore, the model with the best fit is used to estimate the size of the scrapie-affected holding population in Great Britain by means of two capture-recapture estimators: the Poisson estimator and the generalized Zelterman estimator. Results: No evidence could be found for the hypothesis of proportionality. In fact, there is some evidence that this relationship follows a curved line which increases for small holdings up to a maximum after which it declines again. Furthermore, it is pointed out how crucial the correct model choice is when applied to capture-recapture estimation on the basis of zero-truncated Poisson models as well as on the basis of the generalized Zelterman estimator. Estimators based on the proportionality model return very different and unreasonable estimates for the population sizes. Conclusion: Our results stress the importance of an adequate modelling approach to the association between holding size and the number of cases of classical scrapie within holding. Reporting artefacts and speculative biological effects are hypothesized as the underlying causes of the observed curved relationship. The lack of adjustment for these artefacts might well render ineffective the current strategies for the control of the disease.