984 resultados para Count Data


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mode of access: Internet.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

To account for the preponderance of zero counts and simultaneous correlation of observations, a class of zero-inflated Poisson mixed regression models is applicable for accommodating the within-cluster dependence. In this paper, a score test for zero-inflation is developed for assessing correlated count data with excess zeros. The sampling distribution and the power of the test statistic are evaluated by simulation studies. The results show that the test statistic performs satisfactorily under a wide range of conditions. The test procedure is further illustrated using a data set on recurrent urinary tract infections. Copyright (c) 2005 John Wiley & Sons, Ltd.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical Study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which tender the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression model with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Po ́lya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the P ́olya- Gamma random variable is developed.

Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.

The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.

The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age specific latent natural ability class and a performance enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.

All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the P ́olya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The study of forest re activity, in its several aspects, is essencial to understand the phenomenon and to prevent environmental public catastrophes. In this context the analysis of monthly number of res along several years is one aspect to have into account in order to better comprehend this tematic. The goal of this work is to analyze the monthly number of forest res in the neighboring districts of Aveiro and Coimbra, Portugal, through dynamic factor models for bivariate count series. We use a bayesian approach, through MCMC methods, to estimate the model parameters as well as to estimate the common latent factor to both series.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We explore the determinants of usage of six different types of health care services, using the Medical Expenditure Panel Survey data, years 1996-2000. We apply a number of models for univariate count data, including semiparametric, semi-nonparametric and finite mixture models. We find that the complexity of the model that is required to fit the data well depends upon the way in which the data is pooled across sexes and over time, and upon the characteristics of the usage measure. Pooling across time and sexes is almost always favored, but when more heterogeneous data is pooled it is often the case that a more complex statistical model is required.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper develops stochastic search variable selection (SSVS) for zero-inflated count models which are commonly used in health economics. This allows for either model averaging or model selection in situations with many potential regressors. The proposed techniques are applied to a data set from Germany considering the demand for health care. A package for the free statistical software environment R is provided.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The log-ratio methodology makes available powerful tools for analyzing compositionaldata. Nevertheless, the use of this methodology is only possible for those data setswithout null values. Consequently, in those data sets where the zeros are present, aprevious treatment becomes necessary. Last advances in the treatment of compositionalzeros have been centered especially in the zeros of structural nature and in the roundedzeros. These tools do not contemplate the particular case of count compositional datasets with null values. In this work we deal with \count zeros" and we introduce atreatment based on a mixed Bayesian-multiplicative estimation. We use the Dirichletprobability distribution as a prior and we estimate the posterior probabilities. Then weapply a multiplicative modi¯cation for the non-zero values. We present a case studywhere this new methodology is applied.Key words: count data, multiplicative replacement, composition, log-ratio analysis

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The log-ratio methodology makes available powerful tools for analyzing compositional data. Nevertheless, the use of this methodology is only possible for those data sets without null values. Consequently, in those data sets where the zeros are present, a previous treatment becomes necessary. Last advances in the treatment of compositional zeros have been centered especially in the zeros of structural nature and in the rounded zeros. These tools do not contemplate the particular case of count compositional data sets with null values. In this work we deal with \count zeros" and we introduce a treatment based on a mixed Bayesian-multiplicative estimation. We use the Dirichlet probability distribution as a prior and we estimate the posterior probabilities. Then we apply a multiplicative modi¯cation for the non-zero values. We present a case study where this new methodology is applied. Key words: count data, multiplicative replacement, composition, log-ratio analysis

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A study to monitor boreal songbird trends was initiated in 1998 in a relatively undisturbed and remote part of the boreal forest in the Northwest Territories, Canada. Eight years of point count data were collected over the 14 years of the study, 1998-2011. Trends were estimated for 50 bird species using generalized linear mixed-effects models, with random effects to account for temporal (repeat sampling within years) and spatial (stations within stands) autocorrelation and variability associated with multiple observers. We tested whether regional and national Breeding Bird Survey (BBS) trends could, on average, predict trends in our study area. Significant increases in our study area outnumbered decreases by 12 species to 6, an opposite pattern compared to Alberta (6 versus 15, respectively) and Canada (9 versus 20). Twenty-two species with relatively precise trend estimates (precision to detect > 30% decline in 10 years; observed SE ≤ 3.7%/year) showed nonsignificant trends, similar to Alberta (24) and Canada (20). Precision-weighted trends for a sample of 19 species with both reliable trends at our site and small portions of their range covered by BBS in Canada were, on average, more negative for Alberta (1.34% per year lower) and for Canada (1.15% per year lower) relative to Fort Liard, though 95% credible intervals still contained zero. We suggest that part of the differences could be attributable to local resource pulses (insect outbreak). However, we also suggest that the tendency for BBS route coverage to disproportionately sample more southerly, developed areas in the boreal forest could result in BBS trends that are not representative of range-wide trends for species whose range is centred farther north.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Estimation of population size with missing zero-class is an important problem that is encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by the method of maximum likelihood and estimation of the population size based on this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) has proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable for count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and thereby using a Horvitz-Thompson estimator of population size. This was found to work well, when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size was found to be underestimated. In the search of a more robust estimator, we focused on three models that use all clusters with exactly one case, those clusters with exactly two cases and those with exactly three cases to estimate the probability of the zero-class and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. Loss in efficiency associated with gain in robustness was examined based on a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferred in general. In applications, we recommend obtaining estimates from all three models and making a choice considering the estimates from the three models, robustness and the loss in efficiency. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This article is about modeling count data with zero truncation. A parametric count density family is considered. The truncated mixture of densities from this family is different from the mixture of truncated densities from the same family. Whereas the former model is more natural to formulate and to interpret, the latter model is theoretically easier to treat. It is shown that for any mixing distribution leading to a truncated mixture, a (usually different) mixing distribution can be found so. that the associated mixture of truncated densities equals the truncated mixture, and vice versa. This implies that the likelihood surfaces for both situations agree, and in this sense both models are equivalent. Zero-truncated count data models are used frequently in the capture-recapture setting to estimate population size, and it can be shown that the two Horvitz-Thompson estimators, associated with the two models, agree. In particular, it is possible to achieve strong results for mixtures of truncated Poisson densities, including reliable, global construction of the unique NPMLE (nonparametric maximum likelihood estimator) of the mixing distribution, implying a unique estimator for the population size. The benefit of these results lies in the fact that it is valid to work with the mixture of truncated count densities, which is less appealing for the practitioner but theoretically easier. Mixtures of truncated count densities form a convex linear model, for which a developed theory exists, including global maximum likelihood theory as well as algorithmic approaches. Once the problem has been solved in this class, it might readily be transformed back to the original problem by means of an explicitly given mapping. Applications of these ideas are given, particularly in the case of the truncated Poisson family.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Thesis (Master's)--University of Washington, 2016-06

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Size distributions in woody plant populations have been used to assess their regeneration status, assuming that size structures with reverse-J shapes represent stable populations. We present an empirical approach of this issue using five woody species from the Cerrado. Considering count data for all plants of these five species over a 12-year period, we analyzed size distribution by: a) plotting frequency distributions and their adjustment to the negative exponential curve and b) calculating the Gini coefficient. To look for a relationship between size structure and future trends, we considered the size structures from the first census year. We analyzed changes in number over time and performed a simple population viability analysis, which gives the mean population growth rate, its variance and the probability of extinction in a given time period. Frequency distributions and the Gini coefficient were not able to predict future trends in population numbers. We recommend that managers should not use measures of size structure as a basis for management decisions without applying more appropriate demographic studies.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The zero-inflated negative binomial model is used to account for overdispersion detected in data that are initially analyzed under the zero-Inflated Poisson model A frequentist analysis a jackknife estimator and a non-parametric bootstrap for parameter estimation of zero-inflated negative binomial regression models are considered In addition an EM-type algorithm is developed for performing maximum likelihood estimation Then the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and some ways to perform global influence analysis are derived In order to study departures from the error assumption as well as the presence of outliers residual analysis based on the standardized Pearson residuals is discussed The relevance of the approach is illustrated with a real data set where It is shown that zero-inflated negative binomial regression models seems to fit the data better than the Poisson counterpart (C) 2010 Elsevier B V All rights reserved