945 results for Maximum likelihood estimate
Abstract:
Estimation of population size with missing zero-class is an important problem that is encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by the method of maximum likelihood and estimation of the population size based on this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) has proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable to count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and thereby using a Horvitz-Thompson estimator of population size. This was found to work well when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size was found to be underestimated. In search of a more robust estimator, we focused on three models that use all clusters with exactly one case, those clusters with exactly two cases and those with exactly three cases to estimate the probability of the zero-class and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. Loss in efficiency associated with gain in robustness was examined based on a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferred in general. In applications, we recommend obtaining estimates from all three models and making a choice considering the estimates from the three models, robustness and the loss in efficiency. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
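As a rough illustration of the building blocks named in this abstract, the sketch below covers only the simple unclustered case, not the clustered extension the paper develops. It fits a zero-truncated Poisson by maximum likelihood, plugs the fitted rate into the Horvitz-Thompson estimator n / (1 - exp(-lambda)), and also computes Zelterman's rate estimate 2*f2/f1. The function names and the frequency-table input format are assumptions made for this example.

```python
import math

def ztp_mle(freqs):
    """MLE of the Poisson rate from zero-truncated counts.

    freqs maps a count k >= 1 to the number of units observed k times
    (hypothetical input format chosen for this sketch).
    """
    n = sum(freqs.values())
    mean = sum(k * f for k, f in freqs.items()) / n
    if mean <= 1.0:
        raise ValueError("observed mean must exceed 1 for a positive rate")
    # The likelihood equation is lam / (1 - exp(-lam)) = mean.
    # The left side increases in lam and exceeds lam, so the root is in (0, mean).
    lo, hi = 1e-12, mean
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def zelterman_rate(freqs):
    """Zelterman's rate estimate, using only the counts of ones and twos."""
    return 2.0 * freqs[2] / freqs[1]

def horvitz_thompson(n, lam):
    """Population size estimate: n observed units over P(count > 0)."""
    return n / -math.expm1(-lam)

freqs = {1: 50, 2: 25}          # made-up frequencies, for illustration only
n = sum(freqs.values())
print(horvitz_thompson(n, ztp_mle(freqs)))
print(horvitz_thompson(n, zelterman_rate(freqs)))
```

Comparing the two printed estimates on real data gives a quick, informal check of how sensitive the population size is to the homogeneous Poisson assumption.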
Abstract:
This article is about modeling count data with zero truncation. A parametric count density family is considered. The truncated mixture of densities from this family is different from the mixture of truncated densities from the same family. Whereas the former model is more natural to formulate and to interpret, the latter model is theoretically easier to treat. It is shown that for any mixing distribution leading to a truncated mixture, a (usually different) mixing distribution can be found so that the associated mixture of truncated densities equals the truncated mixture, and vice versa. This implies that the likelihood surfaces for both situations agree, and in this sense both models are equivalent. Zero-truncated count data models are used frequently in the capture-recapture setting to estimate population size, and it can be shown that the two Horvitz-Thompson estimators, associated with the two models, agree. In particular, it is possible to achieve strong results for mixtures of truncated Poisson densities, including reliable, global construction of the unique NPMLE (nonparametric maximum likelihood estimator) of the mixing distribution, implying a unique estimator for the population size. The benefit of these results lies in the fact that it is valid to work with the mixture of truncated count densities, which is less appealing for the practitioner but theoretically easier. Mixtures of truncated count densities form a convex linear model, for which a developed theory exists, including global maximum likelihood theory as well as algorithmic approaches. Once the problem has been solved in this class, it might readily be transformed back to the original problem by means of an explicitly given mapping. Applications of these ideas are given, particularly in the case of the truncated Poisson family.
Abstract:
The Lincoln–Petersen estimator is one of the most popular estimators used in capture–recapture studies. It was developed for a sampling situation in which two sources independently identify members of a target population. For each of the two sources, it is determined if a unit of the target population is identified or not. This leads to a 2 × 2 table with frequencies f11, f10, f01, f00 indicating the number of units identified by both sources, by the first but not the second source, by the second but not the first source and not identified by any of the two sources, respectively. However, f00 is unobserved so that the 2 × 2 table is incomplete and the Lincoln–Petersen estimator provides an estimate for f00. In this paper, we consider a generalization of this situation for which one source provides not only a binary identification outcome but also a count outcome of how many times a unit has been identified. Using a truncated Poisson count model, truncating multiple identifications larger than two, we propose a maximum likelihood estimator of the Poisson parameter and, ultimately, of the population size. This estimator shows benefits, in comparison with Lincoln–Petersen’s, in terms of bias and efficiency. It is possible to test the homogeneity assumption that is not testable in the Lincoln–Petersen framework. The approach is applied to surveillance data on syphilis from Izmir, Turkey.
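For reference, the classical two-source Lincoln-Petersen calculation described at the start of this abstract can be sketched as follows. This is only the baseline estimator, not the truncated Poisson extension the paper proposes, and the function name and example frequencies are made up for illustration.

```python
def lincoln_petersen(f11, f10, f01):
    """Classical Lincoln-Petersen estimates from an incomplete 2 x 2 table.

    f11: units identified by both sources
    f10: units identified by the first source only
    f01: units identified by the second source only
    Returns (estimated f00, estimated population size).
    """
    if f11 <= 0:
        raise ValueError("the estimator needs at least one unit seen by both sources")
    n1 = f11 + f10            # total identified by source 1
    n2 = f11 + f01            # total identified by source 2
    n_hat = n1 * n2 / f11     # estimated population size
    f00_hat = f10 * f01 / f11 # estimated number missed by both sources
    return f00_hat, n_hat

f00_hat, n_hat = lincoln_petersen(f11=20, f10=30, f01=10)
print(f00_hat, n_hat)
```

The two returned quantities are consistent by construction: the estimated population size equals the three observed cells plus the estimated f00.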
The genus Coleodactylus (Sphaerodactylinae, Gekkota) revisited: A molecular phylogenetic perspective
Abstract:
Nucleotide sequence data from a mitochondrial gene (16S) and two nuclear genes (c-mos, RAG-1) were used to evaluate the monophyly of the genus Coleodactylus, to provide the first phylogenetic hypothesis of relationships among its species in a cladistic framework, and to estimate the relative timing of species divergences. Maximum Parsimony, Maximum Likelihood and Bayesian analyses of the combined data sets retrieved Coleodactylus as a monophyletic genus, although weakly supported. Species were recovered as two genetically and morphologically distinct clades, with C. amazonicus populations forming the sister taxon to the meridionalis group (C. brachystoma, C. meridionalis, C. natalensis, and C. septentrionalis). Within this group, C. septentrionalis was placed as the sister taxon to a clade comprising the rest of the species, C. meridionalis was recovered as the sister species to C. brachystoma, and C. natalensis was found nested within C. meridionalis. Divergence time estimates based on penalized likelihood and Bayesian dating methods do not support the previous hypothesis based on the Quaternary rain forest fragmentation model proposed to explain the diversification of the genus. The basal cladogenic event between major lineages of Coleodactylus was estimated to have occurred in the late Cretaceous (72.6 +/- 1.77 Mya), at approximately the same time that the other genera of Sphaerodactylinae diverged from each other. Within the meridionalis group, the split between C. septentrionalis and C. brachystoma + C. meridionalis was placed in the Eocene (46.4 +/- 4.22 Mya), and the divergence between C. brachystoma and C. meridionalis was estimated to have occurred in the Oligocene (29.3 +/- 4.33 Mya). Most intraspecific cladogenesis occurred from the Miocene to the Pliocene, and only for two conspecific samples and for C. natalensis could a Quaternary differentiation be assumed (1.9 +/- 1.3 Mya). (C) 2008 Elsevier Inc. All rights reserved.
Abstract:
The increase in biodiversity from high to low latitudes is a widely recognized biogeographical pattern. According to the latitudinal gradient hypothesis (LGH), this pattern was shaped by differential effects of Late Quaternary climatic changes across a latitudinal gradient. Here, we evaluate the effects of climatic changes across a tropical latitudinal gradient and their implications for the diversification of an Atlantic Forest (AF) endemic passerine. We studied the intraspecific diversification and historical demography of Sclerurus scansor, based on mitochondrial (ND2, ND3 and cytb) and nuclear (FIB7) gene sequences. Phylogenetic analyses recovered three well-supported clades associated with distinct latitudinal zones. Coalescent-based methods were applied to estimate divergence times and changes in effective population sizes. Estimates of divergence times indicate that intraspecific diversification took place during the Middle-Late Pleistocene. Distinct demographic scenarios were identified, with the southern lineage exhibiting a clear signature of demographic expansion, while the central one remained more stable. The northern lineage, contrasting with LGH predictions, exhibited a clear sign of a recent bottleneck. Our results suggest that different AF regions reacted distinctly, even in opposite ways, under the same climatic period, producing simultaneously favourable scenarios for isolation and contact among populations.
Abstract:
Estimating a data transformation is very useful for obtaining response variables that closely satisfy a normal linear model. Generalized linear models enable the fitting of models to a wide range of data types. These models are based on exponential dispersion models. We propose a new class of transformed generalized linear models to extend the Box and Cox models and the generalized linear models. We use the generalized linear model framework to fit these models and discuss maximum likelihood estimation and inference. We give a simple formula to estimate the parameter that indexes the transformation of the response variable for a subclass of models. We also give a simple formula to estimate the rth moment of the original dependent variable. We explore the possibility of applying these models to time series data to extend the generalized autoregressive moving average models discussed by Benjamin et al. [Generalized autoregressive moving average models. J. Amer. Statist. Assoc. 98, 214-223]. The usefulness of these models is illustrated in a simulation study and in applications to three real data sets. (C) 2009 Elsevier B.V. All rights reserved.
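To make the starting point of this abstract concrete, here is a minimal sketch of the classical Box-Cox transformation and a grid search over its profile log-likelihood. It assumes the simplest normal model with a constant mean only. It is not the transformed generalized linear model class or the closed-form estimator proposed in the paper, and the function names and the grid are choices made for this example.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of positive values; the log transform at lam = 0."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def profile_loglik(y, lam):
    """Profile log-likelihood of lam under a normal model with constant mean.

    Additive constants are dropped. The last term is the log-Jacobian
    of the transformation.
    """
    n = len(y)
    z = box_cox(y, lam)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n   # ML variance estimate
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in y)

def estimate_lambda(y, grid):
    """Grid value that maximizes the profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(y, lam))

y = [math.exp(z) for z in (-2.0, -1.0, 0.0, 1.0, 2.0)]  # toy data, symmetric on the log scale
grid = [i / 10 for i in range(-20, 21)]
print(estimate_lambda(y, grid))
```

A grid search is used here only because it is easy to verify by eye. A numerical optimizer would be the usual choice in practice.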
Abstract:
For the first time, we introduce a class of transformed symmetric models to extend the Box and Cox models to more general symmetric models. The new class of models includes all symmetric continuous distributions with a possible non-linear structure for the mean and enables the fitting of a wide range of models to several data types. The proposed methods offer more flexible alternatives to Box-Cox or other existing procedures. We derive a very simple iterative process for fitting these models by maximum likelihood, whereas a direct unconditional maximization would be more difficult. We give simple formulae to estimate the parameter that indexes the transformation of the response variable and the moments of the original dependent variable, which generalize previously published results. We discuss inference on the model parameters. The usefulness of the new class of models is illustrated in one application to a real dataset.
Abstract:
We obtain adjustments to the profile likelihood function in Weibull regression models with and without censoring. Specifically, we consider two different modified profile likelihoods: (i) the one proposed by Cox and Reid [Cox, D.R. and Reid, N., 1987, Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society B, 49, 1-39.], and (ii) an approximation to the one proposed by Barndorff-Nielsen [Barndorff-Nielsen, O.E., 1983, On a formula for the distribution of the maximum likelihood estimator. Biometrika, 70, 343-365.], the approximation having been obtained using the results by Fraser and Reid [Fraser, D.A.S. and Reid, N., 1995, Ancillaries and third-order significance. Utilitas Mathematica, 47, 33-53.] and by Fraser et al. [Fraser, D.A.S., Reid, N. and Wu, J., 1999, A simple formula for tail probabilities for frequentist and Bayesian inference. Biometrika, 86, 655-661.]. We focus on point estimation and likelihood ratio tests on the shape parameter in the class of Weibull regression models. We derive some distributional properties of the different maximum likelihood estimators and likelihood ratio tests. The numerical evidence presented in the paper favors the approximation to Barndorff-Nielsen's adjustment.
Abstract:
The Birnbaum-Saunders regression model is becoming increasingly popular in lifetime analyses and reliability studies. In this model, the signed likelihood ratio statistic provides the basis for testing inference and construction of confidence limits for a single parameter of interest. We focus on the small sample case, where the standard normal distribution gives a poor approximation to the true distribution of the statistic. We derive three adjusted signed likelihood ratio statistics that lead to very accurate inference even for very small samples. Two empirical applications are presented. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
We review some issues related to the implications of different missing data mechanisms on statistical inference for contingency tables and consider simulation studies to compare the results obtained under such models to those where the units with missing data are disregarded. We confirm that although, in general, analyses under the correct missing at random and missing completely at random models are more efficient even for small sample sizes, there are exceptions where they may not improve the results obtained by ignoring the partially classified data. We show that under the missing not at random (MNAR) model, estimates on the boundary of the parameter space as well as lack of identifiability of the parameters of saturated models may be associated with undesirable asymptotic properties of maximum likelihood estimators and likelihood ratio tests; even in standard cases the bias of the estimators may be low only for very large samples. We also show that the probability of a boundary solution obtained under the correct MNAR model may be large even for large samples and that, consequently, we may not always conclude that a MNAR model is misspecified because the estimate is on the boundary of the parameter space.
Abstract:
Although the asymptotic distributions of the likelihood ratio for testing hypotheses of null variance components in linear mixed models derived by Stram and Lee [1994. Variance components testing in longitudinal mixed effects model. Biometrics 50, 1171-1177] are valid, their proof is based on the work of Self and Liang [1987. Asymptotic properties of maximum likelihood estimators and likelihood tests under nonstandard conditions. J. Amer. Statist. Assoc. 82, 605-610] which requires identically distributed random variables, an assumption not always valid in longitudinal data problems. We use the less restrictive results of Vu and Zhou [1997. Generalization of likelihood ratio tests under nonstandard conditions. Ann. Statist. 25, 897-916] to prove that the proposed mixture of chi-squared distributions is the actual asymptotic distribution of such likelihood ratios used as test statistics for null variance components in models with one or two random effects. We also consider a limited simulation study to evaluate the appropriateness of the asymptotic distribution of such likelihood ratios in moderately sized samples. (C) 2008 Elsevier B.V. All rights reserved.
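As a small worked example of the kind of reference distribution discussed here, the sketch below computes a p-value under an equal-weight mixture of chi-squared distributions with 0 and 1 degrees of freedom. It assumes the simplest case, a test of a single variance component against zero, for which this 50:50 mixture is the commonly cited result. The mixtures for other settings covered by the paper differ, and the function names are chosen for this example.

```python
import math

def chi2_1_sf(x):
    """Upper tail probability of a chi-squared variable with 1 degree of freedom."""
    # P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for a standard normal Z.
    return math.erfc(math.sqrt(x / 2.0))

def mixture_pvalue(lrt):
    """P-value under 0.5 * chi2(0) + 0.5 * chi2(1) for an observed LRT statistic.

    chi2(0) is a point mass at zero, so it contributes nothing to the
    tail probability when the statistic is positive.
    """
    if lrt <= 0.0:
        return 1.0
    return 0.5 * chi2_1_sf(lrt)

lrt = 2.705543                 # made-up observed statistic
print(mixture_pvalue(lrt))     # mixture reference distribution
print(chi2_1_sf(lrt))          # naive chi2(1) reference, twice as large
```

For a positive statistic the mixture p-value is exactly half the naive one, which is why ignoring the boundary makes the test conservative in this case.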
Abstract:
The aim of this article is to discuss the estimation of the systematic risk in capital asset pricing models with heavy-tailed error distributions to explain the asset returns. Diagnostic methods for assessing departures from the model assumptions as well as the influence of observations on the parameter estimates are also presented. It may be shown that outlying observations are downweighted in the maximum likelihood equations of linear models with heavy-tailed error distributions, such as Student-t, power exponential, logistic II, and so on. This robustness aspect may also be extended to influential observations. An application in which the systematic risk estimate of Microsoft is compared under normal and heavy-tailed errors is presented for illustration.
Abstract:
Texture is an attribute that is still little used in the automatic recognition of natural scenes in remote sensing, since it arises from the visual sensation caused by the tonal variations within a given region of the image, which makes it difficult to quantify. Mathematical morphology, through operations such as erosion, dilation and opening, makes it possible to decompose an image into fundamental elements, the textural primitives. Textural primitives come in various sizes, and a set of primitives of similar size can be associated with a given textural class. The textural classification process quantifies the textural primitives, extracts the distributions of their sizes, and separates the different distributions by means of a Gaussian maximum likelihood classifier. The final result is a thematic image in which each theme represents one of the textures present in the original image.
Abstract:
In this paper, we propose a class of ACD-type models that accommodates overdispersion, intermittent dynamics, multiple regimes, and sign and size asymmetries in financial durations. In particular, our functional coefficient autoregressive conditional duration (FC-ACD) model relies on a smooth-transition autoregressive specification. The motivation lies in the fact that the latter yields a universal approximation if one lets the number of regimes grow without bound. After establishing that the sufficient conditions for strict stationarity do not exclude explosive regimes, we address model identifiability as well as the existence, consistency, and asymptotic normality of the quasi-maximum likelihood (QML) estimator for the FC-ACD model with a fixed number of regimes. In addition, we discuss how to consistently estimate, using a sieve approach, a semiparametric variant of the FC-ACD model that takes the number of regimes to infinity. An empirical illustration indicates that our functional coefficient model is flexible enough to model IBM price durations.
Abstract:
The objective of this study was to estimate the genetic parameters affecting milk production (MP), fat (%F) and protein (%P) contents of buffalo milk. A total of 1744 lactation records from 1268 cows were analyzed by Restricted Maximum Likelihood (REML) under an animal model, using the MTDFREML program. The means were: MP = 1259.47 +/- 523.09 kg, %F = 6.87 +/- 0.88% and %P = 3.91 +/- 0.61%. The estimates of repeatability and heritability coefficients were: MP = 0.38 and 0.24, %F = 0.28 and 0.21 and %P = 0.30 and 0.26, respectively. The estimated genetic and phenotypic correlations were MP x %F = -0.18 and -0.62, MP x %P = -0.23 and -0.59 and %F x %P = 0.50 and 0.77, respectively. According to these results, it is possible to conclude that selection is a proper way to increase milk yield and fat and protein percentages. However, given the low negative genetic correlations between milk production and the content traits, it should be taken into account that simultaneous selection based on these traits may not be as efficient.