994 results for Zero inflated models


Relevance:

100.00%

Publisher:

Abstract:

The zero-inflated negative binomial model is used to account for overdispersion detected in data that are initially analyzed under the zero-inflated Poisson model. A frequentist analysis, a jackknife estimator and a non-parametric bootstrap for parameter estimation of zero-inflated negative binomial regression models are considered. In addition, an EM-type algorithm is developed for performing maximum likelihood estimation. Then the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes, and some ways to perform global influence analysis, are derived. In order to study departures from the error assumption as well as the presence of outliers, residual analysis based on the standardized Pearson residuals is discussed. The relevance of the approach is illustrated with a real data set, where it is shown that zero-inflated negative binomial regression models seem to fit the data better than the Poisson counterpart. (C) 2010 Elsevier B.V. All rights reserved.
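The Pearson residuals used in this kind of residual analysis divide each observed count's deviation from its model mean by the model standard deviation. Below is a minimal sketch for the ZINB case. It assumes the common NB2 parameterisation (Var_NB = mu + alpha*mu^2) and an extra-zero probability pi. The function names are hypothetical and do not come from the paper. It computes ordinary Pearson residuals only. The standardized version additionally divides by sqrt(1 - h_ii), where h_ii is a leverage value, and that step is omitted here.

```python
import math


def zinb_mean_var(mu, pi, alpha):
    """Mean and variance of a zero-inflated negative binomial (ZINB) count.

    Assumes the NB2 form Var_NB = mu + alpha * mu**2 for the count component
    and probability pi of an extra (structural) zero.
    """
    mean = (1.0 - pi) * mu
    var = (1.0 - pi) * mu * (1.0 + alpha * mu + pi * mu)
    return mean, var


def pearson_residuals(y, mu, pi, alpha):
    """Ordinary Pearson residuals (y - E[Y]) / sqrt(Var[Y]) for each observation.

    mu and pi are per-observation fitted values; alpha is a shared dispersion.
    """
    residuals = []
    for yi, mi, pii in zip(y, mu, pi):
        mean, var = zinb_mean_var(mi, pii, alpha)
        residuals.append((yi - mean) / math.sqrt(var))
    return residuals
```

Setting pi = 0 and alpha = 0 recovers the Poisson case, where the mean and variance coincide. For example, zinb_mean_var(2.0, 0.0, 0.0) returns (2.0, 2.0).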

Relevance:

100.00%

Publisher:

Relevance:

100.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

100.00%

Publisher:

Abstract:

This paper develops stochastic search variable selection (SSVS) for zero-inflated count models, which are commonly used in health economics. This allows for either model averaging or model selection in situations with many potential regressors. The proposed techniques are applied to a data set from Germany considering the demand for health care. A package for the free statistical software environment R is provided.

Relevance:

100.00%

Publisher:

Abstract:

Environmental data are spatial, temporal, and often come with many zeros. In this paper, we included space–time random effects in zero-inflated Poisson (ZIP) and ‘hurdle’ models to investigate haulout patterns of harbor seals on glacial ice. The data consisted of counts, for 18 dates on a lattice grid of samples, of harbor seals hauled out on glacial ice in Disenchantment Bay, near Yakutat, Alaska. A hurdle model is similar to a ZIP model except that it does not mix zeros from the binary and count processes. Both models can be used for zero-inflated data, and we compared space–time ZIP and hurdle models in a Bayesian hierarchical model. Space–time ZIP and hurdle models were constructed by using spatial conditional autoregressive (CAR) models and temporal first-order autoregressive (AR(1)) models as random effects in ZIP and hurdle regression models. We created maps of smoothed predictions for harbor seal counts based on ice density, other covariates, and spatio-temporal random effects. For both models, predictions around the edges appeared to be positively biased. The linex loss function is an asymmetric loss function that penalizes overprediction more than underprediction, and we used it to correct for prediction bias to get the best map for space–time ZIP and hurdle models.
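Two ideas in this abstract can be sketched for a single Poisson count with no covariates or random effects. This is a deliberate simplification of the space–time models described. The first is how ZIP and hurdle models differ in where their zeros come from. The second is the linex-optimal point prediction. The linex part relies on a standard result. For loss exp(a*d) - a*d - 1 with d = prediction - truth and a > 0, the expected loss is minimized at -(1/a) log E[exp(-aY)], and this value lies below the predictive mean. All function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k, pi, lam):
    # ZIP: zeros come from the binary process (prob. pi) AND from the
    # Poisson count process, so the two sources are mixed.
    if k == 0:
        return pi + (1.0 - pi) * poisson_pmf(0, lam)
    return (1.0 - pi) * poisson_pmf(k, lam)


def hurdle_pmf(k, p0, lam):
    # Hurdle: every zero comes from the binary process (prob. p0); positive
    # counts follow a zero-truncated Poisson, so zeros are not mixed.
    if k == 0:
        return p0
    return (1.0 - p0) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


def linex_point_prediction(samples, a):
    # Minimizer of expected linex loss exp(a*d) - a*d - 1, d = prediction - truth.
    # For a > 0 overprediction costs more, and the optimum
    # -(1/a) * log E[exp(-a*Y)] lies below the predictive mean (Jensen).
    # E[.] is estimated by averaging over predictive samples of Y.
    avg = sum(math.exp(-a * s) for s in samples) / len(samples)
    return -math.log(avg) / a
```

Consider pi = 0.4 and lam = 2.0. The ZIP zero probability exceeds 0.4 because Poisson zeros are added to the structural ones, whereas the hurdle zero probability is exactly p0 = 0.4.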

Relevance:

100.00%

Publisher:

Abstract:

When actuaries are faced with the problem of pricing an insurance contract that contains different types of coverage, such as a motor insurance or homeowner's insurance policy, they usually assume that types of claim are independent. However, this assumption may not be realistic: several studies have shown that there is a positive correlation between types of claim. Here we introduce different regression models in order to relax the independence assumption, including zero-inflated models to account for excess of zeros and overdispersion. These models have largely been ignored for multivariate Poisson data, mainly because of their computational difficulties. Bayesian inference based on MCMC helps to solve this problem (and also lets us derive, for several quantities of interest, posterior summaries to account for uncertainty). Finally, these models are applied to an automobile insurance claims database with three different types of claims. We analyse the consequences for pure and loaded premiums when the independence assumption is relaxed by using different multivariate Poisson regression models and their zero-inflated versions.

Relevance:

100.00%

Publisher:

Abstract:

In a recent paper, Bermúdez [2009] used bivariate Poisson regression models for ratemaking in car insurance, and included zero-inflated models to account for the excess of zeros and the overdispersion in the data set. In the present paper, we revisit this model in order to consider alternatives. We propose a two-component finite mixture of bivariate Poisson regression models to demonstrate that the overdispersion in the data requires more structure if it is to be taken into account, and that a simple zero-inflated bivariate Poisson model does not suffice. At the same time, we show that a finite mixture of bivariate Poisson regression models embraces zero-inflated bivariate Poisson regression models as a special case. Additionally, we describe a model in which the mixing proportions depend on covariates, to model the way in which each individual belongs to a separate cluster. Finally, an EM algorithm is provided so that the models can be fitted easily. These models are applied to the same automobile insurance claims data set as used in Bermúdez [2009], and it is shown that the modelling of the data set can be improved considerably.
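The special-case relationship described here can be checked numerically. The sketch below assumes the common-shock (trivariate reduction) construction of the bivariate Poisson, which the abstract does not itself specify. It has no covariates, and all function names are hypothetical. A mixture component whose three rates are all zero is a point mass at (0, 0). Mixing it with a bivariate Poisson therefore gives a zero-inflated bivariate Poisson.

```python
import math


def _poisson(k, lam):
    # Poisson pmf; lam = 0 gives a point mass at 0 (0**0 == 1 in Python).
    return math.exp(-lam) * lam**k / math.factorial(k)


def bivariate_poisson_pmf(y1, y2, l1, l2, l0):
    """Common-shock bivariate Poisson: Y1 = X1 + X0, Y2 = X2 + X0 with
    independent X_j ~ Poisson(l_j), so Cov(Y1, Y2) = l0."""
    return sum(
        _poisson(y1 - k, l1) * _poisson(y2 - k, l2) * _poisson(k, l0)
        for k in range(min(y1, y2) + 1)
    )


def mixture_pmf(y1, y2, components):
    """Finite mixture; components is a list of (weight, l1, l2, l0) tuples."""
    return sum(w * bivariate_poisson_pmf(y1, y2, a, b, c) for w, a, b, c in components)


def zibp_pmf(y1, y2, p, l1, l2, l0):
    """Zero-inflated bivariate Poisson: extra probability mass p at (0, 0)."""
    extra = p if (y1, y2) == (0, 0) else 0.0
    return extra + (1.0 - p) * bivariate_poisson_pmf(y1, y2, l1, l2, l0)
```

Calling mixture_pmf with components [(p, 0, 0, 0), (1 - p, l1, l2, l0)] reproduces zibp_pmf. Replacing the degenerate component with a second non-trivial one gives the richer two-component mixture.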

Relevance:

100.00%

Publisher:

Abstract:

This paper is concerned with the analysis of zero-inflated count data when time of exposure varies. It proposes a modified zero-inflated count data model where the probability of an extra zero is derived from an underlying duration model with Weibull hazard rate. The new model is compared to the standard Poisson model with logit zero inflation in an application to the effect of treatment with thiotepa on the number of new bladder tumors.
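The comparison model named here, a Poisson model with logit zero inflation, can be written for varying exposure as follows. This is a minimal sketch under two assumptions. The exposure t enters the count mean as a log offset, and x, beta and gamma are hypothetical covariate and coefficient vectors. The paper's modification, in which the extra-zero probability is derived from a Weibull duration model, is not reproduced.

```python
import math


def zip_exposure_pmf(y, x, t, beta, gamma):
    """P(Y = y) under a ZIP model with exposure time t (hypothetical names).

    Count part: mu = t * exp(x . beta), i.e. a log link with offset log(t).
    Zero part:  pi = logistic(x . gamma), the probability of an extra zero.
    """
    eta_count = sum(xi * b for xi, b in zip(x, beta))
    eta_zero = sum(xi * g for xi, g in zip(x, gamma))
    mu = t * math.exp(eta_count)
    pi = 1.0 / (1.0 + math.exp(-eta_zero))
    poisson = math.exp(-mu) * mu**y / math.factorial(y)
    return (pi if y == 0 else 0.0) + (1.0 - pi) * poisson
```

In this baseline the exposure t affects only the count mean, and pi does not depend on t.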

Relevance:

100.00%

Publisher:

Abstract:

Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which renders the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression models with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
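The expectation-maximization idea behind ZIP fitting can be sketched in its simplest form: a single-level, intercept-only ZIP with parameters pi and lam, and no random effects or REML step. The E-step computes the probability that each observed zero is structural. The M-step then has closed-form updates. Function names are hypothetical, and the Poisson sampler is included only to generate test data.

```python
import math
import random


def sample_poisson(lam, rng=random):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def zip_em(y, tol=1e-10, max_iter=10_000):
    """EM estimates (pi, lam) for an intercept-only zero-inflated Poisson sample."""
    n = len(y)
    n0 = sum(1 for yi in y if yi == 0)
    total = sum(y)
    pi, lam = 0.5, max(total / n, 1e-6)  # crude starting values
    for _ in range(max_iter):
        # E-step: posterior probability that an observed zero is structural.
        tau = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: closed-form maximizers of the expected complete-data log-likelihood.
        new_pi = n0 * tau / n
        new_lam = total / (n - n0 * tau)
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam
```

To try it, simulate about 20,000 draws of `0 if rng.random() < 0.3 else sample_poisson(2.5, rng)` and pass them to zip_em. The estimates should land close to pi = 0.3 and lam = 2.5. At convergence the fitted zero probability pi + (1 - pi)exp(-lam) matches the observed share of zeros.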

Relevance:

100.00%

Publisher:

Abstract:

Certain characteristics of some vegetable crops allow multiple harvests during the production cycle; however, to our knowledge, no study has described the behavior of fruit production with progression of the production cycle in vegetable crops with multiple harvests that present data overdispersion. We aimed to characterize the data overdispersion of zero-inflated variables and identify the behavior of these variables during the production cycle of several vegetable crops with multiple harvests. Data from 11 uniformity trials were used without applying treatments; these comprise the database from the Experimental Plants Group at the Federal University of Santa Maria, Brazil. The trials were conducted using four horticultural species grown during different cultivation seasons, cultivation environments, and experimental structures. Although at each harvest more basic units with harvested fruit were observed than units without harvested fruit, the percentage of basic units without fruit was high, generating overdispersion within each individual harvest. The variability within each harvest was high and increased with the evolution of the production cycle of Capsicum annuum, Solanum lycopersicum var. cerasiforme, Phaseolus vulgaris, and Cucurbita pepo species. However, the correlation coefficient between the mean weight and number of harvested fruits tended to remain constant during the crop production cycle. These behaviors show that harvest management should be done individually, at each harvest, such that data overdispersion is reduced.
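Two generic checks can characterize this kind of overdispersion and zero excess in per-unit fruit counts. They are offered as an illustration, not as the authors' procedure. The first is the variance-to-mean ratio, which is about 1 for Poisson data. The second compares the observed share of zeros with the share a Poisson distribution of the same mean would produce. The function name is hypothetical, and it assumes a positive mean and at least two observations.

```python
import math
from statistics import mean, variance


def dispersion_summary(counts):
    """Overdispersion and zero-excess diagnostics for a list of counts."""
    m = mean(counts)
    v = variance(counts)  # sample variance, n - 1 denominator
    return {
        "mean": m,
        "variance": v,
        "dispersion_index": v / m,  # values well above 1 suggest overdispersion
        "observed_zero_fraction": sum(1 for c in counts if c == 0) / len(counts),
        "poisson_zero_fraction": math.exp(-m),  # zeros expected under Poisson(m)
    }
```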

Relevance:

100.00%

Publisher:

Abstract:

Species occurrence and abundance models are important tools that can be used in biodiversity conservation, and can be applied to predict or plan actions needed to mitigate the environmental impacts of hydropower dams. In this study our objectives were: (i) to model the occurrence and abundance of threatened plant species, (ii) to verify the relationship between predicted occurrence and true abundance, and (iii) to assess whether models based on abundance are more effective in predicting species occurrence than those based on presence–absence data. Individual representatives of nine species were counted within 388 randomly georeferenced plots (10 m × 50 m) around the Barra Grande hydropower dam reservoir in southern Brazil. We modelled their relationship with 15 environmental variables using both occurrence (Generalised Linear Models) and abundance data (Hurdle and Zero-Inflated models). Overall, occurrence models were more accurate than abundance models. For all species, observed abundance was significantly, although not strongly, correlated with the probability of occurrence. This correlation lost significance when zero-abundance (absence) sites were excluded from analysis, but only when this entailed a substantial drop in sample size. The same occurred when analysing relationships between abundance and probability of occurrence from previously published studies on a range of different species, suggesting that future studies could potentially use probability of occurrence as an approximate indicator of abundance when the latter is not possible to obtain. This possibility might, however, depend on life history traits of the species in question, with some traits favouring a relationship between occurrence and abundance. 
Reconstructing species abundance patterns from occurrence could be an important tool for conservation planning and the management of threatened species, allowing scientists to indicate the best areas for collection and reintroduction of plant germplasm or choose conservation areas most likely to maintain viable populations.

Relevance:

100.00%

Publisher:

Abstract:

Zero-inflated discrete and continuous models have a wide range of applications, and their properties are well known. Although there is work on zero-deflated and zero-modified discrete models, the usual formulation of zero-inflated continuous models (a mixture of a continuous density and a Dirac mass) prevents them from being generalized to cover the case of zero deflation. An alternative formulation of zero-inflated continuous models, which can easily be generalized to the zero-deflated case, is presented here. Estimation is first addressed under the classical paradigm, and several methods for obtaining the maximum likelihood estimators are proposed. The problem of point estimation is also considered from the Bayesian point of view. Classical and Bayesian hypothesis tests for determining whether data are zero-inflated or zero-deflated are presented. The estimation and testing methods are also evaluated through simulation studies and applied to aggregated precipitation data. The various methods agree that the data are zero-deflated, demonstrating the relevance of the proposed model. We then consider the clustering of samples of zero-deflated data. Because such data are strongly non-normal, it is reasonable to expect the usual methods for determining the number of clusters to perform poorly. We argue that Bayesian clustering, based on the marginal distribution of the observations, would take the particularities of the model into account, which would translate into better performance. Several clustering methods are compared through a simulation study, and the proposed method is applied to aggregated precipitation data from 28 measuring stations in British Columbia.
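The discrete zero-modified case that this abstract cites as existing work shows what zero deflation means. The sketch below is not the thesis's proposed continuous formulation. It uses the zero-modified Poisson P(0) = pi + (1 - pi)e^{-lam} and P(k) = (1 - pi)Poisson(k). Positive pi inflates zeros. Negative pi deflates them, down to the bound -e^{-lam}/(1 - e^{-lam}), where P(0) = 0. The function name is hypothetical.

```python
import math


def zero_modified_poisson_pmf(k, pi, lam):
    """Zero-modified Poisson pmf: pi > 0 inflates zeros, pi < 0 deflates them.

    Valid only for -e^{-lam} / (1 - e^{-lam}) <= pi <= 1; at the lower bound
    the probability of a zero is exactly 0.
    """
    p0 = math.exp(-lam)
    lower = -p0 / (1.0 - p0)
    if not lower <= pi <= 1.0:
        raise ValueError(f"pi must lie in [{lower:.4f}, 1] for lam={lam}")
    if k == 0:
        return pi + (1.0 - pi) * p0
    return (1.0 - pi) * p0 * lam**k / math.factorial(k)
```

The mixture-with-a-Dirac-mass reading breaks down for pi < 0, because a negative "mixing weight" is not a probability. The pmf above remains valid in that range.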

Relevance:

100.00%

Publisher:

Abstract:

Count data have distributions with particular characteristics, such as non-normality, heterogeneity of variances and a large number of zeros. It is therefore necessary to use appropriate models in order to obtain unbiased results. This master's thesis compares four models that can be used for count data: the Poisson model, the negative binomial model, the zero-inflated Poisson model and the zero-inflated negative binomial model. For comparison purposes, the predicted proportion of zeros, the confirmation or rejection of the various hypotheses and the predicted means were used to determine the adequacy of the different models. To this end, the number of arrests of street-gang members in Montréal over the period 2005 to 2007 was used. The sample consists of 470 men aged 18 to 59. The analyses show that the most adequate model is the negative binomial model, since it produces significant results, fits the observed data well and produces a proportion of zeros very similar to the observed one.
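The predicted proportion of zeros used to compare the four models can be computed from each model's fitted parameters with the closed forms below. This is a minimal sketch. It assumes the NB2 parameterisation (Var = mu + alpha*mu^2), under which P(0) = (1 + alpha*mu)^(-1/alpha), and the function names are hypothetical. A model's predicted zero share is then the average of P(Y = 0) over each individual's fitted mean, which can be compared with the observed share.

```python
import math


def p0_poisson(mu):
    return math.exp(-mu)


def p0_negbin(mu, alpha):
    # NB2 zero probability; tends to exp(-mu) as alpha -> 0.
    return (1.0 + alpha * mu) ** (-1.0 / alpha)


def p0_zip(mu, pi):
    # Extra zeros with probability pi, plus Poisson zeros otherwise.
    return pi + (1.0 - pi) * p0_poisson(mu)


def p0_zinb(mu, alpha, pi):
    return pi + (1.0 - pi) * p0_negbin(mu, alpha)


def mean_predicted_zero(p0_values):
    """Average predicted P(Y = 0) over individuals, to set against the observed share."""
    return sum(p0_values) / len(p0_values)
```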

Relevance:

100.00%

Publisher:

Abstract:

Demonstrating the existence of trends in monitoring data is of increasing practical importance to conservation managers wishing to preserve threatened species or reduce the impact of pest species. However, the ability to do so can be compromised if the species in question has low detectability and the true occupancy level or abundance of the species is thus obscured. Zero-inflated models that explicitly model detectability improve the ability to make sound ecological inference in such situations. In this paper we apply an occupancy model including detectability to data from the initial stages of a fox-monitoring program on the Eyre Peninsula, South Australia. We find that detectability is extremely low (< 18%) and varies according to season and the presence or absence of roadside vegetation. We show that simple methods of using monitoring data to inform management, such as plotting the raw data or performing logistic regression, fail to accurately diagnose either the status of the fox population or its trajectory over time. We use the results of the detectability model to consider how future monitoring could be redesigned to achieve efficiency gains. A wide range of monitoring programs could benefit from similar analyses, as part of an active adaptive approach to improving monitoring and management.
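An occupancy model with detectability behaves as a zero-inflated model. A site is occupied with probability psi, and an occupied site is detected on each visit with probability p. An all-zero detection history can therefore come from either an unoccupied site or an occupied site that was missed. The sketch below is the basic single-season form with constant psi and p, so it omits the season and roadside-vegetation effects reported in the study. The function names and example values are hypothetical.

```python
def detection_history_prob(history, psi, p):
    """Probability of one site's detection history (1 = detected, 0 = not)."""
    prob_if_occupied = 1.0
    for detected in history:
        prob_if_occupied *= p if detected else (1.0 - p)
    if any(history):
        # Any detection implies the site is occupied.
        return psi * prob_if_occupied
    # All zeros: occupied but never detected, or genuinely unoccupied.
    return psi * prob_if_occupied + (1.0 - psi)


def naive_occupancy(psi, p, visits):
    """Expected share of sites with at least one detection (ignores detectability)."""
    return psi * (1.0 - (1.0 - p) ** visits)
```

Take psi = 0.5, p = 0.2 and three visits. Only 0.244 of sites are expected to show any detection, so a raw tally roughly halves the true occupancy.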

Relevance:

100.00%

Publisher:

Abstract:

In this study, we deal with the problem of overdispersion beyond extra zeros for a collection of counts that can be correlated. Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial distributions have been considered. First, we propose a multivariate count model in which all counts follow the same distribution and are correlated. Then we extend this model in the sense that correlated counts may follow different distributions. To accommodate correlation among counts, we have considered correlated random effects for each individual in the mean structure, thus inducing dependence among observations from the same individual. The method is applied to real data to investigate variation in food resources use in a species of marsupial in a locality of the Brazilian Cerrado biome. © 2013 Copyright Taylor and Francis Group, LLC.
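Correlated random effects in the mean structure induce both overdispersion and correlation between counts. The Poisson case has closed-form moments that make this concrete. The sketch assumes Yj | bj ~ Poisson(exp(eta_j + b_j)) with (b1, b2) bivariate normal, mean zero, variances s11 and s22, and covariance s12. The abstract does not state the random-effect distribution, so normality on the log scale is an assumption here, and the function name is hypothetical.

```python
import math


def poisson_lognormal_moments(eta1, eta2, s11, s22, s12):
    """Marginal means, variances and correlation of two Poisson counts sharing
    correlated normal random effects on the log scale."""
    mu1 = math.exp(eta1 + s11 / 2)
    mu2 = math.exp(eta2 + s22 / 2)
    # Var = E[Var(Y|b)] + Var(E[Y|b]) = mu + mu^2 (e^{s} - 1) > mu: overdispersion.
    var1 = mu1 + mu1**2 * (math.exp(s11) - 1)
    var2 = mu2 + mu2**2 * (math.exp(s22) - 1)
    # Counts are independent given b, so covariance comes only from the random effects.
    cov = mu1 * mu2 * (math.exp(s12) - 1)
    corr = cov / math.sqrt(var1 * var2)
    return mu1, mu2, var1, var2, corr
```

With s12 = 0 the counts are uncorrelated, although each is still overdispersed. The sign of the induced correlation follows the sign of s12.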