951 results for Zero-inflated Binomial (zib) Model
Abstract:
The use of presence/absence data in wildlife management and biological surveys is widespread, and there is growing interest in quantifying the sources of error associated with these data. Using simulated data, we show that false-negative errors (failure to record a species when it is in fact present) can have a significant impact on the statistical estimation of habitat models. We then introduce an extension of logistic modeling, the zero-inflated binomial (ZIB) model, which uses repeated visits to the same site to estimate the rate of false-negative errors and to correct estimates of the probability of occurrence for those errors. Our simulations show that even relatively low rates of false negatives bias statistical estimates of habitat effects. The method with three repeated visits eliminates the bias, but estimates are relatively imprecise. Six repeated visits improve the precision of estimates to levels comparable to those achieved with conventional statistics in the absence of false-negative errors. In general, when error rates are ≤50%, greater efficiency is gained by adding more sites, whereas when error rates are >50%, it is better to increase the number of repeated visits. We highlight the flexibility of the method with three case studies, clearly demonstrating the effect of false-negative errors for a range of commonly used survey methods.
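A minimal sketch of the repeated-visit idea, assuming a constant occupancy probability `psi` and a constant per-visit detection probability `p` (no habitat covariates, which the paper's logistic extension would add). The function names are hypothetical. An all-zero detection history mixes true absence with repeated false negatives, which is why the naive share of sites with a detection understates `psi`.

```python
import math

def zib_site_prob(y: int, k: int, psi: float, p: float) -> float:
    """P(species detected on exactly y of k visits to one site) under a ZIB model.

    psi: probability that the site is occupied (species truly present).
    p:   per-visit detection probability when the species is present.
    """
    # Binomial detection process, reachable only if the site is occupied.
    detected = psi * math.comb(k, y) * p**y * (1 - p) ** (k - y)
    if y == 0:
        # An all-zero history mixes true absence with k consecutive false negatives.
        return (1 - psi) + detected
    return detected

def zib_log_likelihood(histories, k, psi, p):
    """Sum of log-probabilities over sites; histories holds detection counts per site."""
    return sum(math.log(zib_site_prob(y, k, psi, p)) for y in histories)

def expected_naive_occupancy(psi, p, k):
    """Expected share of sites with at least one detection (the naive estimate of psi)."""
    return psi * (1 - (1 - p) ** k)

# Naive estimates understate psi = 0.6 when p = 0.4; more visits shrink the gap.
print(expected_naive_occupancy(0.6, 0.4, 3))  # about 0.47
print(expected_naive_occupancy(0.6, 0.4, 6))  # about 0.57
```

Maximizing `zib_log_likelihood` over `psi` and `p` (for example with a numerical optimizer) is what separates occupancy from detection; with a single visit the two are not identifiable.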
Abstract:
This paper is concerned with the analysis of zero-inflated count data when time of exposure varies. It proposes a modified zero-inflated count data model where the probability of an extra zero is derived from an underlying duration model with Weibull hazard rate. The new model is compared to the standard Poisson model with logit zero inflation in an application to the effect of treatment with thiotepa on the number of new bladder tumors.
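The abstract does not give the exact parameterization, so the sketch below is only one plausible reading, stated as an assumption: the extra-zero probability is the Weibull survivor function `S(t) = exp(-(lam * t) ** alpha)` evaluated at exposure time `t`, and the count part is Poisson with mean `mu * t`. The names `lam`, `alpha` and `mu` are hypothetical.

```python
import math

def extra_zero_prob(t: float, lam: float, alpha: float) -> float:
    """Assumed Weibull survivor function S(t) = exp(-(lam * t) ** alpha).

    Read here as the probability that a latent duration exceeds the exposure
    time t, i.e. the unit is still in the "zero" regime (an assumption).
    """
    return math.exp(-((lam * t) ** alpha))

def zip_weibull_pmf(y: int, t: float, mu: float, lam: float, alpha: float) -> float:
    """P(Y = y) for exposure t: Weibull-driven zero inflation plus Poisson(mu * t) counts."""
    pi_t = extra_zero_prob(t, lam, alpha)
    poisson = math.exp(-mu * t) * (mu * t) ** y / math.factorial(y)
    if y == 0:
        return pi_t + (1 - pi_t) * poisson
    return (1 - pi_t) * poisson

# Under this assumption, longer exposure lowers the extra-zero probability.
for t in (0.5, 1.0, 2.0):
    print(t, extra_zero_prob(t, lam=1.0, alpha=1.5), zip_weibull_pmf(0, t, 0.8, 1.0, 1.5))
```

The contrast with a standard logit zero inflation is that here exposure time enters the zero-inflation probability through a duration model rather than as an ordinary covariate.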
Abstract:
- Context: Pinus pinea L. presents serious natural regeneration problems in managed forests of Central Spain. The species exhibits specific traits linked to frugivore activity, so information on plant–animal interactions may be crucial to understanding regeneration failure.
- Aims: To determine the spatio-temporal pattern of P. pinea seed predation by Apodemus sylvaticus L. and the factors involved; to explore the importance of A. sylvaticus as a disperser of P. pinea; and to identify other frugivores and their seasonal patterns.
- Methods: An intensive 24-month seed predation trial was carried out. The probability of seeds escaping predation was modelled with a zero-inflated binomial mixed model. Experiments on seed dispersal by A. sylvaticus were conducted, and cameras were set up to identify other potential frugivores.
- Results: A decreasing rodent population in summer and masting enhance seed survival. Seeds were exploited more rapidly near parent trees and shelters. Dispersal activity by A. sylvaticus was scarce. Corvids preyed on P. pinea seeds only marginally.
- Conclusions: Survival of P. pinea seeds is climate-controlled through the timing of the dry period together with masting occurrence. Should germination not take place during the survival period, establishment may be limited. A. sylvaticus-mediated dispersal does not modify the seed shadow. The seasonality of corvid activity points to a role of corvids in dispersal.
Abstract:
There has been considerable research over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states: perfectly safe and unsafe. As a result, the ZIP and ZINB models have been applied to account for the preponderance of “excess” zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriately model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows Bernoulli trials with unequal probabilities of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models and note why they have become popular for modeling crash data.
A simulation experiment is then conducted to demonstrate how crash data give rise to the “excess” zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed, and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales and not from an underlying dual-state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (for observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros.
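The Poisson-trials argument can be illustrated with a small simulation. This is not the authors' experiment: the number of sites, the gamma-distributed site risk, the per-vehicle jitter and the seed are all invented assumptions. Each site accumulates independent Bernoulli trials with unequal probabilities and there is no "perfectly safe" state, yet the share of zero-count sites exceeds what a Poisson distribution with the same mean predicts, because of unobserved heterogeneity combined with low exposure.

```python
import math
import random

def simulate_site_counts(n_sites=300, n_vehicles=1000, mean_p=1e-3, shape=0.3, seed=7):
    """Crash counts per site generated by Poisson trials (no dual-state process).

    Each site draws its own per-vehicle crash probability from a gamma
    distribution (unobserved heterogeneity); each vehicle passage is then an
    independent Bernoulli trial with its own, slightly different probability.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sites):
        site_p = rng.gammavariate(shape, mean_p / shape)  # mean equals mean_p
        crashes = 0
        for _ in range(n_vehicles):
            p = min(1.0, site_p * rng.uniform(0.5, 1.5))  # unequal trial probabilities
            if rng.random() < p:
                crashes += 1
        counts.append(crashes)
    return counts

def zeros_observed_vs_poisson(counts):
    """Observed share of zeros versus P(0) of a Poisson with the same sample mean."""
    n = len(counts)
    observed = sum(1 for c in counts if c == 0) / n
    poisson_expected = math.exp(-sum(counts) / n)
    return observed, poisson_expected

obs, pois = zeros_observed_vs_poisson(simulate_site_counts())
print(f"observed zeros {obs:.2f} vs Poisson-expected {pois:.2f}")
```

Raising `shape` (less heterogeneity) pulls the two numbers together, which is consistent with the abstract's point that "excess" zeros can come from heterogeneity and exposure rather than from a dual-state process.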
Abstract:
The intent of this note is to succinctly articulate additional points that were not provided in the original paper (Lord et al., 2005) and to help clarify a collective reluctance to adopt zero-inflated (ZI) models for modeling highway safety data. A dialogue on this important issue, just one of many important safety modeling issues, is healthy discourse on the path towards improved safety modeling. This note first provides a summary of prior findings and conclusions of the original paper. It then presents two critical and relevant issues: the maximizing statistical fit fallacy and logic problems with the ZI model in highway safety modeling. Finally, we provide brief conclusions.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Environmental data are spatial, temporal, and often come with many zeros. In this paper, we included space–time random effects in zero-inflated Poisson (ZIP) and ‘hurdle’ models to investigate haulout patterns of harbor seals on glacial ice. The data consisted of counts of harbor seals hauled out on glacial ice in Disenchantment Bay, near Yakutat, Alaska, recorded on 18 dates over a lattice grid of samples. A hurdle model is similar to a ZIP model except that it does not mix zeros from the binary and count processes. Both models can be used for zero-inflated data, and we compared space–time ZIP and hurdle models in a Bayesian hierarchical model. Space–time ZIP and hurdle models were constructed by using spatial conditional autoregressive (CAR) models and temporal first-order autoregressive (AR(1)) models as random effects in ZIP and hurdle regression models. We created maps of smoothed predictions for harbor seal counts based on ice density, other covariates, and spatio-temporal random effects. For both models, predictions around the edges appeared to be positively biased. The linex loss function is an asymmetric loss function that penalizes overprediction more than underprediction, and we used it to correct for prediction bias to obtain the best map for space–time ZIP and hurdle models.
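A sketch of how a linex correction can be applied to posterior draws, assuming the common form `L(d) = b * (exp(a*d) - a*d - 1)` with `d = prediction - truth` and `a > 0`. Under that form, the prediction that minimizes posterior expected loss is `-(1/a) * log E[exp(-a*Y)]`, which by Jensen's inequality sits below the posterior mean. The value of `a`, the Poisson stand-in for MCMC draws and the function names are hypothetical.

```python
import numpy as np

def linex_loss(pred, truth, a=1.0, b=1.0):
    """Linex loss b * (exp(a*d) - a*d - 1) with d = pred - truth; a > 0 penalizes overprediction more."""
    d = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    return b * (np.exp(a * d) - a * d - 1.0)

def linex_point_prediction(samples, a=1.0):
    """Minimizer of posterior expected linex loss: -(1/a) * log E[exp(-a * Y)]."""
    z = -a * np.asarray(samples, dtype=float)
    zmax = z.max()
    log_mean_exp = zmax + np.log(np.mean(np.exp(z - zmax)))  # numerically stable log E[exp(z)]
    return -log_mean_exp / a

rng = np.random.default_rng(0)
posterior_draws = rng.poisson(lam=4.0, size=5000)  # stand-in for MCMC draws of one grid cell's count
print(posterior_draws.mean(), linex_point_prediction(posterior_draws, a=0.5))
```

Applying `linex_point_prediction` cell by cell shrinks every mapped value downward, more strongly where the posterior is wide, which is the direction needed to counter positively biased edge predictions. The scale `b` does not change the minimizer.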
Abstract:
Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical study design or the data collection procedure, zero inflation and lack of independence may occur simultaneously, which renders the standard ZIP model inadequate. To account for both the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression models with random effects is presented. Model fitting is facilitated by an expectation-maximization (EM) algorithm, whereas variance components are estimated via residual maximum likelihood (REML) estimating equations. A score test for zero inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. An application to correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
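The paper's model adds random effects and REML variance components; the sketch below drops all of that and shows only the EM idea for a single-level, intercept-only ZIP model with `P(0) = pi + (1 - pi) * exp(-lam)`. The E-step computes the posterior probability that an observed zero is a structural zero; the M-step then has closed-form updates. Names, starting values and the example counts are hypothetical.

```python
import math

def fit_zip_em(y, pi=0.5, lam=None, tol=1e-12, max_iter=10_000):
    """EM for an intercept-only ZIP model: P(0) = pi + (1 - pi) * exp(-lam)."""
    n = len(y)
    n_zero = sum(1 for v in y if v == 0)
    total = sum(y)
    lam = total / n if lam is None else lam
    for _ in range(max_iter):
        # E-step: posterior probability that an observed zero is a structural zero.
        # Positive counts cannot be structural zeros, so their weight is 0.
        w = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: pi is the mean weight; lam is the weighted mean of the Poisson part.
        new_pi = n_zero * w / n
        new_lam = total / (n - n_zero * w)
        done = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if done:
            break
    return pi, lam

counts = [0] * 40 + [1] * 10 + [2] * 20 + [3] * 20 + [4] * 10
pi_hat, lam_hat = fit_zip_em(counts)
print(round(pi_hat, 3), round(lam_hat, 3))
```

At convergence the fit reproduces the sample mean, `(1 - pi) * lam`, and the observed share of zeros, which are the likelihood equations for this simple model. In the multi-level version, the M-step would become weighted regressions that also involve the random effects.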
Abstract:
At least two important transportation planning activities rely on planning-level crash prediction models. One is motivated by the Transportation Equity Act for the 21st Century, which requires departments of transportation and metropolitan planning organizations to consider safety explicitly in the transportation planning process. The second could arise from a need for state agencies to establish incentive programs to reduce injuries and save lives. Both applications require a forecast of safety for a future period. Planning-level crash prediction models for the Tucson, Arizona, metropolitan region are presented to demonstrate the feasibility of such models. Data were separated into fatal, injury, and property-damage crashes. Negative binomial regression models were applied to accommodate overdispersion in the data, and the models were estimated simultaneously to accommodate the simultaneity of fatal and injury crash outcomes. All models produce crash forecasts at the traffic analysis zone level. Statistically significant (p < 0.05) and theoretically meaningful variables for the fatal crash model included population density, persons 17 years old or younger as a percentage of the total population, and intersection density. Significant variables for the injury and property-damage crash models were population density, number of employees, intersection density, percentage of miles of principal arterials, percentage of miles of minor arterials, and percentage of miles of urban collectors. Among several conclusions, it is suggested that planning-level safety models are feasible and may play a role in future planning activities. However, caution must be exercised with such models.
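For readers unfamiliar with how negative binomial regression handles overdispersion, here is a sketch of the common NB2 form, assuming a log link and variance `mu + alpha * mu**2`. The covariate names and coefficient values are hypothetical placeholders loosely inspired by the variables listed above, not estimates from the paper, and the simultaneous estimation across crash types is not shown.

```python
import math

def nb2_logpmf(y: int, mu: float, alpha: float) -> float:
    """Log P(Y = y) for a negative binomial with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha  # size (dispersion) parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def nb2_mean(coefs, covariates, intercept):
    """Log link: mu = exp(intercept + sum of coef * covariate); names are hypothetical."""
    return math.exp(intercept + sum(coefs[k] * covariates[k] for k in coefs))

mu = nb2_mean({"pop_density": 0.4, "intersection_density": 0.2},
              {"pop_density": 1.5, "intersection_density": 2.0}, intercept=-0.5)
print(mu, math.exp(nb2_logpmf(0, mu, alpha=0.5)))
```

As `alpha` approaches 0 the distribution approaches the Poisson; a larger `alpha` inflates the variance relative to the mean, which is the overdispersion the abstract refers to.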
Abstract:
The Poisson distribution has often been used for count data such as accident data. The negative binomial (NB) distribution has been adopted for count data to address overdispersion. However, Poisson and NB distributions cannot account for some unobserved heterogeneity arising from spatial and temporal effects in accident data. To overcome this problem, random-effects models have been developed. Another challenge for existing traffic accident prediction models is the excess of zero-accident observations in some accident data. Although the zero-inflated Poisson (ZIP) model can handle the dual-state system in accident data with excess zeros, it does not accommodate the within-location and between-location correlation heterogeneities that motivate random-effects models. This paper proposes an effective way of fitting the ZIP model with location-specific random effects and recommends Bayesian analysis for model calibration and assessment.
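A sketch of the likelihood such a model works with, assuming a location-specific random intercept `u[site]` on the log mean and a common zero-inflation probability `pi`. In a Bayesian fit, `u` would receive a prior (for example normal with unknown variance) and be sampled by MCMC; that machinery is not shown. Site names, counts and parameter values are hypothetical.

```python
import math

def zip_logpmf(y: int, mu: float, pi: float) -> float:
    """Log P(Y = y) under a zero-inflated Poisson with Poisson mean mu and zero-inflation pi."""
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(-mu))
    return math.log(1 - pi) + y * math.log(mu) - mu - math.lgamma(y + 1)

def conditional_loglik(counts_by_site, beta0, u, pi):
    """ZIP log-likelihood given location random intercepts u[site].

    Every count at a site shares the mean exp(beta0 + u[site]), which is what
    induces within-location correlation once u is integrated out (e.g. by MCMC).
    """
    total = 0.0
    for site, counts in counts_by_site.items():
        mu = math.exp(beta0 + u[site])
        total += sum(zip_logpmf(y, mu, pi) for y in counts)
    return total

data = {"site_a": [0, 0, 1, 0], "site_b": [2, 3, 0, 4]}  # hypothetical yearly counts
print(conditional_loglik(data, beta0=0.0, u={"site_a": -1.0, "site_b": 0.9}, pi=0.2))
```

Setting `pi = 0` recovers a Poisson random-intercept model, and setting all `u[site] = 0` recovers the plain ZIP model, so the sketch sits between the two models the abstract contrasts.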
Abstract:
Given a reproducing kernel Hilbert space $(H, \langle \cdot, \cdot \rangle)$ of real-valued functions and a suitable measure $\mu$ over the source space $D \subset \mathbb{R}$, we decompose $H$ as the sum of a subspace of centered functions for $\mu$ and its orthogonal complement in $H$. This decomposition leads to a special case of ANOVA kernels, for which the functional ANOVA representation of the best predictor can be elegantly derived, in either an interpolation or a regularization framework. The proposed kernels appear to be particularly convenient for analyzing the effect of each variable (or group of variables) and for computing sensitivity indices without recursion.
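A numerical sketch of the decomposition, assuming a Gaussian (RBF) base kernel, a uniform measure on $[0, 1]$ discretized with midpoint nodes, and the product form $K(x, y) = \prod_i (1 + k_0(x_i, y_i))$ for the ANOVA kernel. Removing the component of $k(x, \cdot)$ along the representer $m(x) = \int k(x, s)\,d\mu(s)$ of the integral functional gives $k_0(x, y) = k(x, y) - m(x)m(y) / \iint k\,d\mu\,d\mu$, whose functions integrate to zero under $\mu$. Kernel choice, length scale and grid size are hypothetical.

```python
import numpy as np

def rbf(x, y, ell=0.2):
    """Gaussian kernel matrix between 1-D point sets x and y (hypothetical base kernel)."""
    return np.exp(-0.5 * (np.subtract.outer(x, y) / ell) ** 2)

# Uniform measure mu on D = [0, 1], discretized with midpoint nodes and equal weights.
nodes = (np.arange(200) + 0.5) / 200
weights = np.full(200, 1 / 200)

def centered_kernel(x, y, k=rbf, nodes=nodes, weights=weights):
    """k0(x, y) = k(x, y) - m(x) m(y) / <m, m>, with m(x) = integral of k(x, s) dmu(s)."""
    mx = k(x, nodes) @ weights
    my = k(y, nodes) @ weights
    mm = weights @ k(nodes, nodes) @ weights  # double integral of k under mu x mu
    return k(x, y) - np.outer(mx, my) / mm

def anova_kernel(X, Y):
    """Product of (1 + k0) over input dimensions; X is (n, d), Y is (m, d)."""
    K = np.ones((X.shape[0], Y.shape[0]))
    for j in range(X.shape[1]):
        K *= 1.0 + centered_kernel(X[:, j], Y[:, j])
    return K

x = np.linspace(0, 1, 7)
print(centered_kernel(x, nodes) @ weights)  # numerically zero: rows of k0 are centered under mu
```

Expanding the product yields one term per subset of variables, with each term built only from centered kernels. This is what lets the functional ANOVA terms of a kriging predictor be read off directly.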
Abstract:
The relationship between workplace absenteeism and adverse lifestyle factors (smoking, physical inactivity and poor dietary patterns) remains ambiguous. Reliance on self-reported absenteeism and obesity measures may contribute to this uncertainty. Using objective absenteeism and health status measures, the present study aimed to investigate which health status outcomes and lifestyle factors influence workplace absenteeism. Cross-sectional data were obtained from a complex workplace dietary intervention trial, the Food Choice at Work Study, conducted in four multinational manufacturing workplaces in Cork, Republic of Ireland. Participants included 540 randomly selected employees from the four workplaces. Annual count absenteeism data were collected. Physical assessments included objective health status measures (BMI, midway waist circumference and blood pressure). A food frequency questionnaire (FFQ) measured diet quality, from which DASH (Dietary Approaches to Stop Hypertension) scores were constructed. A zero-inflated negative binomial (ZINB) regression model examined associations between health status outcomes, lifestyle characteristics and absenteeism. The mean number of absences was 2.5 (SD 4.5) days. After controlling for sociodemographic and lifestyle characteristics, the ZINB model indicated that absenteeism was positively associated with central obesity, which increased the expected absence rate by 72%. Consuming a high-quality diet and engaging in moderate levels of physical activity were negatively associated with absenteeism and reduced expected frequency by 50% and 36%, respectively. Being in a managerial/supervisory position also reduced expected frequency by 50%. To reduce absenteeism, workplace health promotion policies should incorporate recommendations designed to prevent and manage excess weight, improve diet quality and increase physical activity levels of employees.
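The arithmetic linking log-scale coefficients to the percentages above can be sketched as follows. This assumes the reported effects are exponentiated coefficients from the count (negative binomial) part of the ZINB, i.e. rate ratios minus one, each holding other covariates fixed. The overall expected count in a ZINB also depends on the zero-inflation part, so this is a simplification.

```python
import math

def pct_change_from_coef(beta: float) -> float:
    """Percent change in the expected count for a one-unit change: 100 * (e^beta - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)

def coef_from_pct_change(pct: float) -> float:
    """Inverse: the log-scale coefficient implied by a reported percent change."""
    return math.log(1.0 + pct / 100.0)

for label, pct in [("central obesity", 72), ("high-quality diet", -50), ("moderate activity", -36)]:
    beta = coef_from_pct_change(pct)
    print(f"{label}: beta = {beta:.3f}, rate ratio = {math.exp(beta):.2f}")
```

For example, a 72% increase corresponds to a rate ratio of 1.72 (a coefficient of about 0.542), and a 50% reduction corresponds to a rate ratio of 0.5 (a coefficient of ln 0.5, about -0.693).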
Abstract:
Certain characteristics of some vegetable crops allow multiple harvests during the production cycle; however, to our knowledge, no study has described how fruit production behaves as the production cycle progresses in multiple-harvest vegetable crops whose data are overdispersed. We aimed to characterize the overdispersion of zero-inflated variables and to identify the behavior of these variables during the production cycle of several vegetable crops with multiple harvests. Data from 11 uniformity trials, conducted without applying treatments, were used; these comprise the database of the Experimental Plants Group at the Federal University of Santa Maria, Brazil. The trials were conducted with four horticultural species grown in different cultivation seasons, cultivation environments, and experimental structures. Although at each harvest more basic units had harvested fruit than had none, the percentage of basic units without fruit was high, generating overdispersion within each individual harvest. The variability within each harvest was high and increased as the production cycle of Capsicum annuum, Solanum lycopersicum var. cerasiforme, Phaseolus vulgaris, and Cucurbita pepo progressed. However, the correlation coefficient between the mean weight and the number of harvested fruits tended to remain constant during the crop production cycle. These behaviors show that harvest management should be done individually, at each harvest, so that data overdispersion is reduced.
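A minimal way to quantify per-harvest overdispersion is to compute the variance-to-mean ratio (index of dispersion) and the share of basic units with no fruit for each harvest. This is a generic sketch and not necessarily the statistics the authors used. The fruit counts per basic unit below are invented, and the ratio is undefined for a harvest with zero mean.

```python
from statistics import mean, variance

def harvest_dispersion(fruits_per_unit):
    """Summaries for one harvest; input is fruit counts per basic unit (hypothetical data)."""
    m = mean(fruits_per_unit)
    v = variance(fruits_per_unit)  # sample variance, n - 1 denominator
    return {
        "mean": m,
        "variance": v,
        "dispersion_index": v / m,  # > 1 suggests overdispersion relative to Poisson
        "zero_fraction": sum(1 for f in fruits_per_unit if f == 0) / len(fruits_per_unit),
    }

harvests = {
    "harvest_1": [0, 0, 0, 0, 2, 3, 5, 6],
    "harvest_2": [0, 1, 0, 4, 7, 0, 9, 3],
}
for name, units in harvests.items():
    print(name, harvest_dispersion(units))
```

Computing these summaries harvest by harvest, rather than pooling all harvests, matches the abstract's recommendation to treat each harvest individually.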
Abstract:
S. japonicum infection is believed to be endemic in 28 of the 80 provinces of the Philippines, and the most recent data on schistosomiasis prevalence have shown considerable variability between provinces. To improve the efficiency with which parasitic disease control resources are allocated in the country, we aimed to describe the small-scale spatial variation in S. japonicum prevalence across the Philippines, quantify the role of the physical environment in driving that variation, and develop a predictive risk map of S. japonicum infection. Data on S. japonicum infection from 35,754 individuals across the country were geo-located at the barangay level and included in the analysis, which was stratified geographically into Luzon, the Visayas and Mindanao. Zero-inflated binomial Bayesian geostatistical models of S. japonicum prevalence were developed, and diagnostic uncertainty was incorporated. In all three regions, males and individuals aged ≥20 years had significantly higher prevalence of S. japonicum than females and children <5 years. The role of the environmental variables differed between regions of the Philippines. S. japonicum infection was widespread in the Visayas, whereas it was much more focal in Luzon and Mindanao. This analysis revealed significant spatial variation in the prevalence of S. japonicum infection in the Philippines, which suggests that a spatially targeted approach to schistosomiasis interventions, including mass drug administration, is warranted. When financially possible, additional schistosomiasis surveys should prioritize areas identified as high risk that were underrepresented in our dataset.
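One way diagnostic uncertainty could enter a zero-inflated binomial likelihood is sketched below. It is a hypothetical parameterization, not necessarily the paper's. A barangay is structurally infection-free with probability `omega`. Otherwise, true prevalence is `p`, and positives arise with the apparent prevalence `se * p + (1 - sp) * (1 - p)`. With imperfect specificity, even infection-free barangays can yield false positives. The spatial random effects of the geostatistical model are omitted.

```python
import math

def binom_pmf(k: int, n: int, q: float) -> float:
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)

def apparent_prevalence(p: float, se: float, sp: float) -> float:
    """Probability of a positive test: true positives plus false positives."""
    return se * p + (1 - sp) * (1 - p)

def zib_diagnostic_pmf(y, n, omega, p, se, sp):
    """P(y positives among n tested in one barangay) under a hypothetical ZIB with test error.

    omega: probability the barangay is structurally infection-free.
    p:     true prevalence where transmission occurs.
    se/sp: diagnostic sensitivity and specificity.
    """
    infection_free = binom_pmf(y, n, 1 - sp)  # only false positives are possible
    endemic = binom_pmf(y, n, apparent_prevalence(p, se, sp))
    return omega * infection_free + (1 - omega) * endemic

# With a perfectly specific test, the structural component contributes zeros only,
# and the model reduces to a standard zero-inflated binomial with prevalence se * p.
print(zib_diagnostic_pmf(0, 20, omega=0.3, p=0.1, se=0.7, sp=1.0))
```

Ignoring imperfect sensitivity would make observed zeros look like structural absence and push prevalence estimates down, which is one reason to model test error jointly with zero inflation.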