928 resultados para Zero-inflated models, Poisson distribution, Negative binomial distribution, Bernoulli trials, Safety performance functions, Small area analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each modeling approach, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states—perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of “excess” zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriate model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows a Bernoulli trial with unequal probability of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models, and note why they have become popular for modeling crash data. A simulation experiment is then conducted to demonstrate how crash data give rise to “excess” zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed—and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales and not an underlying dual state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The intent of this note is to succinctly articulate additional points that were not provided in the original paper (Lord et al., 2005) and to help clarify a collective reluctance to adopt zero-inflated (ZI) models for modeling highway safety data. A dialogue on this important issue, just one of many important safety modeling issues, is healthy discourse on the path towards improved safety modeling. This note first provides a summary of prior findings and conclusions of the original paper. It then presents two critical and relevant issues: the maximizing statistical fit fallacy and logic problems with the ZI model in highway safety modeling. Finally, we provide brief conclusions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background The aim of this study was to examine the distribution of physical activity facilities by area-level deprivation in Scotland, adjusting for differences in urbanicity, and exploring differences between and within the four largest Scottish cities. Methods We obtained a list of all recreational physical activity facilities in Scotland. These were mapped and assigned to datazones. Poisson and negative binomial regression models were used to investigate associations between the number of physical activity facilities relative to population size and quintile of area-level deprivation. Results The results showed that prior to adjustment for urbanicity, the density of all facilities lessened with increasing deprivation from quintiles 2 to 5. After adjustment for urbanicity and local authority, the effect of deprivation remained significant but the pattern altered, with datazones in quintile 3 having the highest estimated mean density of facilities. Within-city associations were identified between the number of physical activity facilities and area-level deprivation in Aberdeen and Dundee, but not in Edinburgh or Glasgow. Conclusions In conclusion, area-level deprivation appears to have a significant association with the density of physical activity facilities and although overall no clear pattern was observed, affluent areas had fewer publicly owned facilities than more deprived areas but a greater number of privately owned facilities.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Boston Harbor has had a history of poor water quality, including contamination by enteric pathogens. We conduct a statistical analysis of data collected by the Massachusetts Water Resources Authority (MWRA) between 1996 and 2002 to evaluate the effects of court-mandated improvements in sewage treatment. Motivated by the ineffectiveness of standard Poisson mixture models and their zero-inflated counterparts, we propose a new negative binomial model for time series of Enterococcus counts in Boston Harbor, where nonstationarity and autocorrelation are modeled using a nonparametric smooth function of time in the predictor. Without further restrictions, this function is not identifiable in the presence of time-dependent covariates; consequently we use a basis orthogonal to the space spanned by the covariates and use penalized quasi-likelihood (PQL) for estimation. We conclude that Enterococcus counts were greatly reduced near the Nut Island Treatment Plant (NITP) outfalls following the transfer of wastewaters from NITP to the Deer Island Treatment Plant (DITP) and that the transfer of wastewaters from Boston Harbor to the offshore diffusers in Massachusetts Bay reduced the Enterococcus counts near the DITP outfalls.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A new model selection criterion, termed as the “quasi-likelihood under the independence model criterion” (QIC), was proposed by Pan (2001) for GEE models. Cui (2007) developed a general computing program to implement the QIC method for a range of statistical distributions. However, only a special case of the negative binomial distribution was considered in Cui (2007), where the dispersion parameter equals to unity. This article introduces a new computing program that can be applied for the general negative binomial model, where the dispersion parameter can be any fixed value. An example is also given in this article.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An organism living in water, and present at low density, may be distributed at random and therefore, samples taken from the water are likely to be distributed according to the Poisson distribution. The distribution of many organisms, however, is not random, individuals being either aggregated into clusters or more uniformly distributed. By fitting a Poisson distribution to data, it is only possible to test the hypothesis that an observed set of frequencies does not deviate significantly from an expected random pattern. Significant deviations from random, either as a result of increasing uniformity or aggregation, may be recognized by either rejection of the random hypothesis or by examining the variance/mean (V/M) ratio of the data. Hence, a V/M ratio not significantly different from unity indicates a random distribution, greater than unity a clustered distribution, and less then unity a regular or uniform distribution . If individual cells are clustered, however, the negative binomial distribution should provide a better description of the data. In addition, a parameter of this distribution, viz., the binomial exponent (k), may be used as a measure of the ‘intensity’ of aggregation present. Hence, this Statnote describes how to fit the negative binomial distribution to counts of a microorganism in samples taken from a freshwater environment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In regression analysis of counts, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive and thus underdeveloped. We propose a lognormal and gamma mixed negative binomial (NB) regression model for counts, and present efficient closed-form Bayesian inference; unlike conventional Poisson models, the proposed approach has two free parameters to include two different kinds of random effects, and allows the incorporation of prior information, such as sparsity in the regression coefficients. By placing a gamma distribution prior on the NB dispersion parameter r, and connecting a log-normal distribution prior with the logit of the NB probability parameter p, efficient Gibbs sampling and variational Bayes inference are both developed. The closed-form updates are obtained by exploiting conditional conjugacy via both a compound Poisson representation and a Polya-Gamma distribution based data augmentation approach. The proposed Bayesian inference can be implemented routinely, while being easily generalizable to more complex settings involving multivariate dependence structures. The algorithms are illustrated using real examples. Copyright 2012 by the author(s)/owner(s).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Soybean bugs are major crop pests that cause significant reduction in harvest yield and influence grain quality. The aim of this study was to verify the spatial distribution of Euschistus heros (F.) (Hemiptera: Pentatomidae) in conventional and transgenic soybean cultivars. The experiment was conducted during the 2010-2011 crop season in UNESP/FCAV, Jaboticabal, SP, Brazil, in two fields of 10,000-m2 area that were subdivided into 100 plots (10 m × 10 m). The cultivars sown were M 7908 RR and its isoline M-SOY 8001. The number of the first to fifth instars and the number of adults were determined. To evaluate insect dispersion in the area, the following indices were used: variance/mean ratio, Morisita index, Green coefficient, and the k exponent of the negative binomial distribution. To study probabilistic models to describe the spatial distribution of the insects, the adjustments of the Poisson and negative binomial distributions were tested. The first to third instars showed aggregated spatial distribution, whereas the fourth and fifth instars, and adults, isolated or grouped, showed variation in the arrangement, ranging from moderately aggregated to randomly dispersed. During the adjustment of probability distributions, the negative binomial distribution model showed adjustment for the first to third instars, fourth and fifth instars, adults, and fourth and fifth instars plus adults. © 2013 Sociedade Entomológica do Brasil.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical Study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which tender the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression model with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Павел Т. Стойнов - В тази работа се разглежда отрицателно биномното разпределение, известно още като разпределение на Пойа. Предполагаме, че смесващото разпределение е претеглено гама разпределение. Изведени са вероятностите в някои частни случаи. Дадени са рекурентните формули на Панжер.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Only a few characterizations have been obtained in literatute for the negative binomial distribution (see Johnson et al., Chap. 5, 1992). In this article a characterization of the negative binomial distribution related to random sums is obtained which is motivated by the geometric distribution characterization given by Khalil et al. (1991). An interpretation in terms of an unreliable system is given.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Crash reduction factors (CRFs) are used to estimate the potential number of traffic crashes expected to be prevented from investment in safety improvement projects. The method used to develop CRFs in Florida has been based on the commonly used before-and-after approach. This approach suffers from a widely recognized problem known as regression-to-the-mean (RTM). The Empirical Bayes (EB) method has been introduced as a means to addressing the RTM problem. This method requires the information from both the treatment and reference sites in order to predict the expected number of crashes had the safety improvement projects at the treatment sites not been implemented. The information from the reference sites is estimated from a safety performance function (SPF), which is a mathematical relationship that links crashes to traffic exposure. The objective of this dissertation was to develop the SPFs for different functional classes of the Florida State Highway System. Crash data from years 2001 through 2003 along with traffic and geometric data were used in the SPF model development. SPFs for both rural and urban roadway categories were developed. The modeling data used were based on one-mile segments that contain homogeneous traffic and geometric conditions within each segment. Segments involving intersections were excluded. The scatter plots of data show that the relationships between crashes and traffic exposure are nonlinear, that crashes increase with traffic exposure in an increasing rate. Four regression models, namely, Poisson (PRM), Negative Binomial (NBRM), zero-inflated Poisson (ZIP), and zero-inflated Negative Binomial (ZINB), were fitted to the one-mile segment records for individual roadway categories. The best model was selected for each category based on a combination of the Likelihood Ratio test, the Vuong statistical test, and the Akaike's Information Criterion (AIC). The NBRM model was found to be appropriate for only one category and the ZINB model was found to be more appropriate for six other categories. The overall results show that the Negative Binomial distribution model generally provides a better fit for the data than the Poisson distribution model. In addition, the ZINB model was found to give the best fit when the count data exhibit excess zeros and over-dispersion for most of the roadway categories. While model validation shows that most data points fall within the 95% prediction intervals of the models developed, the Pearson goodness-of-fit measure does not show statistical significance. This is expected as traffic volume is only one of the many factors contributing to the overall crash experience, and that the SPFs are to be applied in conjunction with Accident Modification Factors (AMFs) to further account for the safety impacts of major geometric features before arriving at the final crash prediction. However, with improved traffic and crash data quality, the crash prediction power of SPF models may be further improved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Les données comptées (count data) possèdent des distributions ayant des caractéristiques particulières comme la non-normalité, l’hétérogénéité des variances ainsi qu’un nombre important de zéros. Il est donc nécessaire d’utiliser les modèles appropriés afin d’obtenir des résultats non biaisés. Ce mémoire compare quatre modèles d’analyse pouvant être utilisés pour les données comptées : le modèle de Poisson, le modèle binomial négatif, le modèle de Poisson avec inflation du zéro et le modèle binomial négatif avec inflation du zéro. À des fins de comparaisons, la prédiction de la proportion du zéro, la confirmation ou l’infirmation des différentes hypothèses ainsi que la prédiction des moyennes furent utilisées afin de déterminer l’adéquation des différents modèles. Pour ce faire, le nombre d’arrestations des membres de gangs de rue sur le territoire de Montréal fut utilisé pour la période de 2005 à 2007. L’échantillon est composé de 470 hommes, âgés de 18 à 59 ans. Au terme des analyses, le modèle le plus adéquat est le modèle binomial négatif puisque celui-ci produit des résultats significatifs, s’adapte bien aux données observées et produit une proportion de zéro très similaire à celle observée.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We introduce in this paper a new class of discrete generalized nonlinear models to extend the binomial, Poisson and negative binomial models to cope with count data. This class of models includes some important models such as log-nonlinear models, logit, probit and negative binomial nonlinear models, generalized Poisson and generalized negative binomial regression models, among other models, which enables the fitting of a wide range of models to count data. We derive an iterative process for fitting these models by maximum likelihood and discuss inference on the parameters. The usefulness of the new class of models is illustrated with an application to a real data set. (C) 2008 Elsevier B.V. All rights reserved.