970 resultados para Zero-inflated Count Data


Relevância:

100.00% 100.00%

Publicador:

Resumo:

El artículo analiza los determinantes de la presencia de hijos no deseados en Colombia. Se utiliza la información de la Encuesta Nacional de Demografía y Salud (ENDS, 2005), específicamente para las mujeres de 40 años o más. Dadas las características especiales de la variable que se analiza, se utilizan modelos de conteo para verificar si determinadas características socioeconómicas como la educación o el estrato económico explican la presencia de hijos no deseados. Se encuentra que la educación de la mujer y el área de residencia son determinantes significativos de los nacimientos no planeados. Además, la relación negativa entre el número de hijos no deseados y la educación de la mujer arroja implicaciones clave en materia de política social.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The contribution investigates the problem of estimating the size of a population, also known as the missing cases problem. Suppose a registration system is targeting to identify all cases having a certain characteristic such as a specific disease (cancer, heart disease, ...), disease related condition (HIV, heroin use, ...) or a specific behavior (driving a car without license). Every case in such a registration system has a certain notification history in that it might have been identified several times (at least once) which can be understood as a particular capture-recapture situation. Typically, cases are left out which have never been listed at any occasion, and it is this frequency one wants to estimate. In this paper modelling is concentrating on the counting distribution, e.g. the distribution of the variable that counts how often a given case has been identified by the registration system. Besides very simple models like the binomial or Poisson distribution, finite (nonparametric) mixtures of these are considered providing rather flexible modelling tools. Estimation is done using maximum likelihood by means of the EM algorithm. A case study on heroin users in Bangkok in the year 2001 is completing the contribution.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Population size estimation with discrete or nonparametric mixture models is considered, and reliable ways of construction of the nonparametric mixture model estimator are reviewed and set into perspective. Construction of the maximum likelihood estimator of the mixing distribution is done for any number of components up to the global nonparametric maximum likelihood bound using the EM algorithm. In addition, the estimators of Chao and Zelterman are considered with some generalisations of Zelterman’s estimator. All computations are done with CAMCR, a special software developed for population size estimation with mixture models. Several examples and data sets are discussed and the estimators illustrated. Problems using the mixture model-based estimators are highlighted.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article is about modeling count data with zero truncation. A parametric count density family is considered. The truncated mixture of densities from this family is different from the mixture of truncated densities from the same family. Whereas the former model is more natural to formulate and to interpret, the latter model is theoretically easier to treat. It is shown that for any mixing distribution leading to a truncated mixture, a (usually different) mixing distribution can be found so. that the associated mixture of truncated densities equals the truncated mixture, and vice versa. This implies that the likelihood surfaces for both situations agree, and in this sense both models are equivalent. Zero-truncated count data models are used frequently in the capture-recapture setting to estimate population size, and it can be shown that the two Horvitz-Thompson estimators, associated with the two models, agree. In particular, it is possible to achieve strong results for mixtures of truncated Poisson densities, including reliable, global construction of the unique NPMLE (nonparametric maximum likelihood estimator) of the mixing distribution, implying a unique estimator for the population size. The benefit of these results lies in the fact that it is valid to work with the mixture of truncated count densities, which is less appealing for the practitioner but theoretically easier. Mixtures of truncated count densities form a convex linear model, for which a developed theory exists, including global maximum likelihood theory as well as algorithmic approaches. Once the problem has been solved in this class, it might readily be transformed back to the original problem by means of an explicitly given mapping. Applications of these ideas are given, particularly in the case of the truncated Poisson family.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Population size estimation with discrete or nonparametric mixture models is considered, and reliable ways of construction of the nonparametric mixture model estimator are reviewed and set into perspective. Construction of the maximum likelihood estimator of the mixing distribution is done for any number of components up to the global nonparametric maximum likelihood bound using the EM algorithm. In addition, the estimators of Chao and Zelterman are considered with some generalisations of Zelterman’s estimator. All computations are done with CAMCR, a special software developed for population size estimation with mixture models. Several examples and data sets are discussed and the estimators illustrated. Problems using the mixture model-based estimators are highlighted.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a geoadditive negative binomial model (Geo-NB-GAM) for regional count data that allows us to address simultaneously some important methodological issues, such as spatial clustering, nonlinearities, and overdispersion. This model is applied to the study of location determinants of inward greenfield investments that occurred during 2003–2007 in 249 European regions. After presenting the data set and showing the presence of overdispersion and spatial clustering, we review the theoretical framework that motivates the choice of the location determinants included in the empirical model, and we highlight some reasons why the relationship between some of the covariates and the dependent variable might be nonlinear. The subsequent section first describes the solutions proposed by previous literature to tackle spatial clustering, nonlinearities, and overdispersion, and then presents the Geo-NB-GAM. The empirical analysis shows the good performance of Geo-NB-GAM. Notably, the inclusion of a geoadditive component (a smooth spatial trend surface) permits us to control for spatial unobserved heterogeneity that induces spatial clustering. Allowing for nonlinearities reveals, in keeping with theoretical predictions, that the positive effect of agglomeration economies fades as the density of economic activities reaches some threshold value. However, no matter how dense the economic activity becomes, our results suggest that congestion costs never overcome positive agglomeration externalities.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

INTRODUÇÃO: A malaria é uma doença endêmica na região da Amazônia Brasileira, e a detecção de possíveis fatores de risco pode ser de grande interesse às autoridades em saúde pública. O objetivo deste artigo é investigar a associação entre variáveis ambientais e os registros anuais de malária na região amazônica usando métodos bayesianos espaço-temporais. MÉTODOS: Utilizaram-se modelos de regressão espaço-temporais de Poisson para analisar os dados anuais de contagem de casos de malária entre os anos de 1999 a 2008, considerando a presença de alguns fatores como a taxa de desflorestamento. em uma abordagem bayesiana, as inferências foram obtidas por métodos Monte Carlo em cadeias de Markov (MCMC) que simularam amostras para a distribuição conjunta a posteriori de interesse. A discriminação de diferentes modelos também foi discutida. RESULTADOS: O modelo aqui proposto sugeriu que a taxa de desflorestamento, o número de habitants por km² e o índice de desenvolvimento humano (IDH) são importantes para a predição de casos de malária. CONCLUSÕES: É possível concluir que o desenvolvimento humano, o crescimento populacional, o desflorestamento e as alterações ecológicas associadas a estes fatores estão associados ao aumento do risco de malária. Pode-se ainda concluir que o uso de modelos de regressão de Poisson que capturam o efeito temporal e espacial em um enfoque bayesiano é uma boa estratégia para modelar dados de contagem de malária.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

On-farm records are essential for managing mastitis in dairy herds. Mastitis records are a useful tool for caring for an individual cow, to monitor compliance of farm personnel working with groups of animals, to understand the epidemiology of mastitis in the herd, to ensure responsible drug utilization, and to document accountability in care of the cow. Herds have become larger and more people are involved with individual animal care. This article describes a records plan that can be used to monitor mastitis at the herd level, aid in decision-making processes for individual cows, and improve drug use on dairy herds.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The recent advent of Next-generation sequencing technologies has revolutionized the way of analyzing the genome. This innovation allows to get deeper information at a lower cost and in less time, and provides data that are discrete measurements. One of the most important applications with these data is the differential analysis, that is investigating if one gene exhibit a different expression level in correspondence of two (or more) biological conditions (such as disease states, treatments received and so on). As for the statistical analysis, the final aim will be statistical testing and for modeling these data the Negative Binomial distribution is considered the most adequate one especially because it allows for "over dispersion". However, the estimation of the dispersion parameter is a very delicate issue because few information are usually available for estimating it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled first-type errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Afterwards, three consistent statistical tests are developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to the common procedures, since it is the best one in reaching the nominal value for the first-type error, while keeping elevate power. The method is finally illustrated on prostate cancer RNA-seq data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we study panel count data with informative observation times. We assume nonparametric and semiparametric proportional rate models for the underlying recurrent event process, where the form of the baseline rate function is left unspecified and a subject-specific frailty variable inflates or deflates the rate function multiplicatively. The proposed models allow the recurrent event processes and observation times to be correlated through their connections with the unobserved frailty; moreover, the distributions of both the frailty variable and observation times are considered as nuisance parameters. The baseline rate function and the regression parameters are estimated by maximizing a conditional likelihood function of observed event counts and solving estimation equations. Large sample properties of the proposed estimators are studied. Numerical studies demonstrate that the proposed estimation procedures perform well for moderate sample sizes. An application to a bladder tumor study is presented to illustrate the use of the proposed methods.