981 results for Star-count Data
Abstract:
To account for the preponderance of zero counts and simultaneous correlation of observations, a class of zero-inflated Poisson mixed regression models is applicable for accommodating the within-cluster dependence. In this paper, a score test for zero-inflation is developed for assessing correlated count data with excess zeros. The sampling distribution and the power of the test statistic are evaluated by simulation studies. The results show that the test statistic performs satisfactorily under a wide range of conditions. The test procedure is further illustrated using a data set on recurrent urinary tract infections.
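As a concrete illustration, here is a minimal sketch of a score test for zero-inflation in the simple iid Poisson case (the form given by van den Broek, 1995); the correlated-data version developed in the paper adjusts this statistic for within-cluster dependence and is not reproduced here.

```python
import numpy as np
from scipy import stats

def zip_score_test(y):
    """Score test for zero-inflation against an iid Poisson null
    (van den Broek, 1995). Asymptotically chi-square(1) under the null."""
    y = np.asarray(y)
    n = len(y)
    lam_hat = y.mean()              # Poisson MLE under the null
    p0_hat = np.exp(-lam_hat)       # implied probability of a zero
    n0 = np.sum(y == 0)             # observed number of zeros
    s = (n0 / p0_hat - n) ** 2 / (n * (np.exp(lam_hat) - 1.0 - lam_hat))
    return s, stats.chi2.sf(s, df=1)

# Example: Poisson(2) data with 30% structural zeros mixed in
rng = np.random.default_rng(0)
y = np.where(rng.random(500) < 0.3, 0, rng.poisson(2.0, size=500))
print(zip_score_test(y))
```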
Abstract:
Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which renders the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression models with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
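For readers who want to experiment, a single-level ZIP regression (without the paper's random effects, which standard libraries do not provide) can be fit with statsmodels; the simulated covariate and coefficients below are illustrative only.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

# Simulate ZIP data: a log-linear Poisson mean plus structural zeros
rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
X = sm.add_constant(x)
mu = np.exp(0.5 + 0.8 * x)              # Poisson mean
pi = 0.25                               # structural-zero probability
y = np.where(rng.random(n) < pi, 0, rng.poisson(mu))

# Standard (single-level) ZIP fit; exog_infl models the inflation part
model = ZeroInflatedPoisson(y, X, exog_infl=np.ones((n, 1)),
                            inflation='logit')
res = model.fit(maxiter=200, disp=False)
print(res.summary())
```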
Abstract:
A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
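The moment-matched Gaussian approximation below sketches the idea for high counts: since PG(b, c) with integer b is a sum of b iid PG(1, c) variables, a central limit argument applies. The closed-form mean and variance used here are standard, though the dissertation's exact construction may differ.

```python
import numpy as np

def pg_moments(b, c):
    """Mean and variance of a Polya-Gamma PG(b, c) random variable.
    The c -> 0 limits are b/4 and b/24."""
    c = np.asarray(c, dtype=float)
    small = np.abs(c) < 1e-8
    cs = np.where(small, 1.0, c)        # placeholder to avoid 0/0
    mean = np.where(small, b / 4.0, b / (2.0 * cs) * np.tanh(cs / 2.0))
    var = np.where(
        small,
        b / 24.0,
        b / (4.0 * cs**3) * (np.sinh(cs) - cs) / np.cosh(cs / 2.0) ** 2,
    )
    return mean, var

def pg_gaussian_draw(b, c, rng):
    """Approximate PG(b, c) draw via a moment-matched Gaussian;
    reasonable when the count total b is large."""
    m, v = pg_moments(b, c)
    return rng.normal(m, np.sqrt(v))

rng = np.random.default_rng(2)
print(pg_gaussian_draw(200, 1.3, rng))  # b = 200 trials, tilt c = 1.3
```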
Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.
The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.
The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age specific latent natural ability class and a performance enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.
All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.
Abstract:
The study of forest fire activity, in its several aspects, is essential to understand the phenomenon and to prevent environmental public catastrophes. In this context, the analysis of the monthly number of fires over several years is one aspect to take into account in order to better understand this topic. The goal of this work is to analyze the monthly number of forest fires in the neighboring districts of Aveiro and Coimbra, Portugal, through dynamic factor models for bivariate count series. We use a Bayesian approach, through MCMC methods, to estimate the model parameters as well as the latent factor common to both series.
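A minimal simulation of the kind of model described, assuming a single latent AR(1) factor driving the log-intensities of two Poisson count series; the loadings, intercepts and dynamics parameters are illustrative only.

```python
import numpy as np

# Bivariate Poisson count series sharing one latent AR(1) factor,
# the structure a dynamic factor model for counts posits.
rng = np.random.default_rng(3)
T = 240                                  # e.g. 20 years of monthly counts
phi, sigma = 0.8, 0.3                    # latent factor dynamics
lam = np.array([1.0, 0.7])               # factor loadings per district
alpha = np.array([2.0, 1.6])             # baseline log-intensities

f = np.zeros(T)
for t in range(1, T):
    f[t] = phi * f[t - 1] + rng.normal(0.0, sigma)

log_mu = alpha[None, :] + f[:, None] * lam[None, :]
counts = rng.poisson(np.exp(log_mu))     # (T, 2) monthly counts
print(counts[:5])
```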
Abstract:
The Edinburgh-Cape Blue Object Survey is a major survey to discover blue stellar objects brighter than B ≈ 18 in the southern sky. It is planned to cover an area of sky of 10 000 deg² with |b| > 30° and δ < 0°. The blue stellar objects are selected by automatic techniques from U and B pairs of UK Schmidt Telescope plates scanned with the COSMOS measuring machine. Follow-up photometry and spectroscopy are being obtained with the SAAO telescopes to classify objects brighter than B = 16.5. This paper describes the survey, the techniques used to extract the blue stellar objects, the photometric methods and accuracy, the spectroscopic classification, and the limits and completeness of the survey.
Abstract:
Context. The understanding of Galaxy evolution can be facilitated by the use of population synthesis models, which allow one to test hypotheses on the star formation history, stellar evolution, and the chemical and dynamical evolution of the Galaxy. Aims. The new version of the Besançon Galaxy Model (hereafter BGM) aims to provide a more flexible and powerful tool to investigate the Initial Mass Function (IMF) and Star Formation Rate (SFR) of the Galactic disc. Methods. We present a new strategy for the generation of thin-disc stars which treats the IMF, SFR and evolutionary tracks as free parameters. We have updated most of the ingredients for the star count production and, for the first time, binary stars are generated in a consistent way. We keep in this new scheme the local dynamical self-consistency, as in Bienaymé et al. (1987). We then compare simulations from the new model with Tycho-2 data and the local luminosity function, as a first test to verify and constrain the new ingredients. The effects of changing thirteen different ingredients of the model are systematically studied. Results. For the first time, a full-sky comparison is performed between the BGM and data. This strategy allows us to constrain the IMF slope at high masses, which is found to be close to 3.0, excluding a shallower slope such as Salpeter's. The SFR is found to be decreasing whatever IMF is assumed. The model is compatible with a local dark matter density of 0.011 M☉ pc⁻³, implying that there is no compelling evidence for a significant amount of dark matter in the disc. While the model is fitted to Tycho-2 data, a magnitude-limited sample with V < 11, we check that it is still consistent with fainter stars. Conclusions. The new model constitutes a new basis for further comparisons with large-scale surveys and is being prepared to become a powerful tool for the analysis of the Gaia mission data.
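As a small worked example of one ingredient, stellar masses can be drawn from a power-law IMF by inverting the CDF; the slope 3.0 echoes the high-mass value reported above, while the mass limits are illustrative.

```python
import numpy as np

def sample_imf(n, alpha=3.0, m_min=1.0, m_max=120.0, rng=None):
    """Draw stellar masses from a power-law IMF dN/dm ~ m**-alpha
    by inverse-CDF sampling between m_min and m_max (valid for alpha != 1)."""
    rng = rng or np.random.default_rng()
    u = rng.random(n)
    a = 1.0 - alpha
    return (m_min**a + u * (m_max**a - m_min**a)) ** (1.0 / a)

masses = sample_imf(10000)
print(masses.mean(), masses.max())
```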
Abstract:
We explore the determinants of usage of six different types of health care services, using the Medical Expenditure Panel Survey data for the years 1996-2000. We apply a number of models for univariate count data, including semiparametric, semi-nonparametric and finite mixture models. We find that the complexity of the model required to fit the data well depends upon the way in which the data are pooled across sexes and over time, and upon the characteristics of the usage measure. Pooling across time and sexes is almost always favored, but when more heterogeneous data are pooled it is often the case that a more complex statistical model is required.
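A minimal instance of the finite mixture models mentioned above is a two-component Poisson mixture fit by EM; the sketch below assumes a simple simulated utilization measure.

```python
import numpy as np
from scipy.stats import poisson

def poisson_mixture_em(y, n_iter=200):
    """EM for a two-component Poisson mixture.
    Returns the mixing weight and the two rates."""
    y = np.asarray(y)
    w, lam = 0.5, np.array([y.mean() * 0.5, y.mean() * 1.5])  # crude start
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each observation
        p0 = w * poisson.pmf(y, lam[0])
        p1 = (1 - w) * poisson.pmf(y, lam[1])
        r = p0 / (p0 + p1)
        # M-step: update weight and component rates
        w = r.mean()
        lam = np.array([np.average(y, weights=r),
                        np.average(y, weights=1 - r)])
    return w, lam

rng = np.random.default_rng(4)
y = np.concatenate([rng.poisson(1.0, 600), rng.poisson(6.0, 400)])
print(poisson_mixture_em(y))
```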
Abstract:
This paper develops stochastic search variable selection (SSVS) for zero-inflated count models, which are commonly used in health economics. This allows for either model averaging or model selection in situations with many potential regressors. The proposed techniques are applied to a data set on the demand for health care in Germany. A package for the free statistical software environment R is provided.
Abstract:
The log-ratio methodology makes available powerful tools for analyzing compositional data. Nevertheless, the use of this methodology is only possible for those data sets without null values. Consequently, in those data sets where zeros are present, a preliminary treatment becomes necessary. Recent advances in the treatment of compositional zeros have focused mainly on structural zeros and rounded zeros. These tools do not cover the particular case of compositional count data sets with null values. In this work we deal with "count zeros" and we introduce a treatment based on a mixed Bayesian-multiplicative estimation. We use the Dirichlet probability distribution as a prior and we estimate the posterior probabilities. Then we apply a multiplicative modification to the non-zero values. We present a case study where this new methodology is applied.
Key words: count data, multiplicative replacement, composition, log-ratio analysis
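A minimal sketch of the Bayesian-multiplicative replacement described above, assuming a symmetric Dirichlet base measure t and an illustrative prior strength s: zero parts receive their posterior estimates and non-zero parts are rescaled multiplicatively so the composition still sums to one.

```python
import numpy as np

def bayes_mult_replace(x, s=0.5, t=None):
    """Bayesian-multiplicative treatment of count zeros: a Dirichlet(s*t)
    prior yields posterior estimates for the zero parts; non-zero parts
    keep their mutual ratios and are rescaled to preserve closure.
    The choices of s and t here are illustrative."""
    x = np.asarray(x, dtype=float)
    n, D = x.sum(), len(x)
    t = np.full(D, 1.0 / D) if t is None else np.asarray(t)
    zero = x == 0
    r = np.empty(D)
    r[zero] = s * t[zero] / (n + s)                    # posterior estimate
    r[~zero] = (x[~zero] / n) * (1.0 - r[zero].sum())  # multiplicative part
    return r

print(bayes_mult_replace(np.array([12, 0, 5, 0, 83])))  # sums to 1
```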
Abstract:
A study to monitor boreal songbird trends was initiated in 1998 in a relatively undisturbed and remote part of the boreal forest in the Northwest Territories, Canada. Eight years of point count data were collected over the 14 years of the study, 1998-2011. Trends were estimated for 50 bird species using generalized linear mixed-effects models, with random effects to account for temporal (repeat sampling within years) and spatial (stations within stands) autocorrelation and variability associated with multiple observers. We tested whether regional and national Breeding Bird Survey (BBS) trends could, on average, predict trends in our study area. Significant increases in our study area outnumbered decreases by 12 species to 6, an opposite pattern compared to Alberta (6 versus 15, respectively) and Canada (9 versus 20). Twenty-two species with relatively precise trend estimates (precision to detect > 30% decline in 10 years; observed SE ≤ 3.7%/year) showed nonsignificant trends, similar to Alberta (24) and Canada (20). Precision-weighted trends for a sample of 19 species with both reliable trends at our site and small portions of their range covered by BBS in Canada were, on average, more negative for Alberta (1.34% per year lower) and for Canada (1.15% per year lower) relative to Fort Liard, though 95% credible intervals still contained zero. We suggest that part of the differences could be attributable to local resource pulses (insect outbreak). However, we also suggest that the tendency for BBS route coverage to disproportionately sample more southerly, developed areas in the boreal forest could result in BBS trends that are not representative of range-wide trends for species whose range is centred farther north.
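As a stripped-down illustration of how such trends are read off a log-linear count model, the sketch below fits a plain Poisson GLM of counts on year; the study's actual models add random effects for observers, stations and repeat visits, which are omitted here.

```python
import numpy as np
import statsmodels.api as sm

# Simulated point counts with a -2%/year trend (values illustrative)
rng = np.random.default_rng(5)
years = np.repeat(np.arange(1998, 2012), 30)        # 30 stations per year
counts = rng.poisson(np.exp(1.2 - 0.02 * (years - 1998)))

# Poisson GLM: the year coefficient gives the log-linear trend
X = sm.add_constant(years - 1998)
fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
trend_pct = 100 * (np.exp(fit.params[1]) - 1.0)
print(f"trend: {trend_pct:.2f} %/year")
```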
Abstract:
Estimation of population size with a missing zero-class is an important problem that is encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by the method of maximum likelihood and estimating the population size based on this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) has proposed a robust estimator for unclustered data that works well for a wide class of distributions applicable to count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and then using a Horvitz-Thompson estimator of population size. This was found to work well when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size was found to be underestimated. In the search for a more robust estimator, we focused on three models that use all clusters with exactly one case, those clusters with exactly two cases, and those with exactly three cases to estimate the probability of the zero-class, and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. The loss in efficiency associated with the gain in robustness was examined in a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferred in general. In applications, we recommend obtaining estimates from all three models and choosing among them in light of robustness and the loss in efficiency.
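Zelterman's estimator for the unclustered case is easy to state in code: the Poisson rate is estimated from the ones and twos only, then plugged into a Horvitz-Thompson estimator. The clustered extensions developed in the paper are not reproduced here.

```python
import numpy as np

def zelterman_N(y):
    """Zelterman's (1988) robust population-size estimator for
    unclustered count data. y holds the observed (positive) counts."""
    y = np.asarray(y)
    n = len(y)                        # units actually observed
    n1 = np.sum(y == 1)
    n2 = np.sum(y == 2)
    lam = 2.0 * n2 / n1               # robust rate estimate from ones and twos
    p0 = np.exp(-lam)                 # implied zero-class probability
    return n / (1.0 - p0)             # Horvitz-Thompson estimate of N

rng = np.random.default_rng(6)
full = rng.poisson(0.8, size=2000)    # true population of 2000 units
observed = full[full > 0]             # zero class is missing
print(round(zelterman_N(observed)))
```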
Abstract:
This article is about modeling count data with zero truncation. A parametric count density family is considered. The truncated mixture of densities from this family is different from the mixture of truncated densities from the same family. Whereas the former model is more natural to formulate and to interpret, the latter model is theoretically easier to treat. It is shown that for any mixing distribution leading to a truncated mixture, a (usually different) mixing distribution can be found so that the associated mixture of truncated densities equals the truncated mixture, and vice versa. This implies that the likelihood surfaces for both situations agree, and in this sense both models are equivalent. Zero-truncated count data models are used frequently in the capture-recapture setting to estimate population size, and it can be shown that the two Horvitz-Thompson estimators, associated with the two models, agree. In particular, it is possible to achieve strong results for mixtures of truncated Poisson densities, including reliable, global construction of the unique NPMLE (nonparametric maximum likelihood estimator) of the mixing distribution, implying a unique estimator for the population size. The benefit of these results lies in the fact that it is valid to work with the mixture of truncated count densities, which is less appealing for the practitioner but theoretically easier. Mixtures of truncated count densities form a convex linear model, for which a developed theory exists, including global maximum likelihood theory as well as algorithmic approaches. Once the problem has been solved in this class, it can readily be transformed back to the original problem by means of an explicitly given mapping. Applications of these ideas are given, particularly in the case of the truncated Poisson family.
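For the simplest member of the family, a zero-truncated Poisson, the MLE and the associated Horvitz-Thompson population-size estimate can be computed in a few lines; this sketches the homogeneous (unmixed) case only.

```python
import numpy as np

def ztp_mle(y, n_iter=100):
    """MLE of lambda for a zero-truncated Poisson via the fixed point
    lambda = ybar * (1 - exp(-lambda)), plus the Horvitz-Thompson
    population-size estimate n / (1 - exp(-lambda))."""
    y = np.asarray(y)
    ybar, n = y.mean(), len(y)
    lam = ybar                          # starting value
    for _ in range(n_iter):
        lam = ybar * (1.0 - np.exp(-lam))
    return lam, n / (1.0 - np.exp(-lam))

rng = np.random.default_rng(7)
full = rng.poisson(1.5, size=1000)
print(ztp_mle(full[full > 0]))          # recovers lambda and N ~ 1000
```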