997 results for hurdle model
Abstract:
Motivated by the analysis of the Australian Grain Insect Resistance Database (AGIRD), we develop a Bayesian hurdle modelling approach to assess trends in strong resistance of stored grain insects to phosphine over time. The binary response variable from AGIRD, indicating presence or absence of strong resistance, is dominated by absence observations, and the hurdle model is a two-step approach that is useful for analyzing such a binary response dataset. In the first step, the proposed hurdle model uses Bayesian classification trees to identify the covariates and covariate levels associated with possible presence or absence of strong resistance. In the second step, generalized additive models (GAMs) with spike and slab priors for variable selection are fitted to the subset of the dataset that the Bayesian classification tree identifies as possibly having strong resistance. From the GAM, we use this variable selection approach to assess trends, biosecurity issues and site-specific variables influencing the presence of strong resistance. The proposed Bayesian hurdle model is compared with its frequentist counterpart and with a naive Bayesian approach that fits a GAM to the entire dataset. The Bayesian hurdle model has the benefit of providing a set of good trees for use in the first step, appears to represent the influence of variables on strong resistance more flexibly than the frequentist model, and captures subtle changes in the trend that are missed by the frequentist and naive Bayesian models.
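The two-step structure described above can be sketched as follows. This is a minimal, non-Bayesian illustration of how the two steps combine at prediction time, not the authors' implementation: the hypothetical functions `tree_rule` and `logistic_trend` stand in for a fitted classification tree and a fitted GAM, and the year cut-off and coefficients are invented for illustration.

```python
import math
from typing import Callable, List, Mapping, Sequence

Record = Mapping[str, float]


def hurdle_predict(
    records: Sequence[Record],
    passes_hurdle: Callable[[Record], bool],
    stage2_prob: Callable[[Record], float],
) -> List[float]:
    """Combine the two steps of a hurdle model for a binary response.

    Step 1: a classifier (here any boolean rule) flags records where strong
    resistance is possible; all other records are predicted absent (0.0).
    Step 2: a model fitted only to the flagged subset supplies the
    probability of presence for those records.
    """
    return [stage2_prob(r) if passes_hurdle(r) else 0.0 for r in records]


# Hypothetical stand-in for a fitted tree: a single split on year.
def tree_rule(r: Record) -> bool:
    return r["year"] >= 2000


# Hypothetical stand-in for a fitted GAM: a logistic trend in year.
def logistic_trend(r: Record) -> float:
    eta = -3.0 + 0.2 * (r["year"] - 2000)
    return 1.0 / (1.0 + math.exp(-eta))


preds = hurdle_predict([{"year": 1995}, {"year": 2000}, {"year": 2010}],
                       tree_rule, logistic_trend)
```

Records failing the hurdle are forced to zero regardless of the second-stage model, which is what lets the second stage concentrate on the part of the data where presence is plausible.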
Abstract:
Species distribution modelling (SDM) typically analyses species’ presence together with some form of absence information. Ideally, absences are observed or inferred from comprehensive sampling. When such information is not available, pseudo-absences are often generated from background locations within the study region containing the presences, or absence is implied by comparing presences with the whole study region, as in Maximum Entropy (MaxEnt) or Poisson point process modelling. However, the choice of which absence information to include can be both challenging and highly influential on SDM predictions (e.g. Oksanen and Minchin, 2002). In practice, the use of pseudo- or implied absences often leads to an imbalance in which absences far outnumber presences. This leaves the analysis highly susceptible to ‘naughty noughts’: absences that occur beyond the envelope of the species and can exert strong influence on the model and its predictions (Austin and Meyers, 1996). Also known as ‘excess zeros’, naughty noughts can be estimated via an overall proportion in simple hurdle or mixture models (Martin et al., 2005). However, absences, especially those that occur beyond the species envelope, can often be more diverse than presences. Here we consider an extension to excess-zero models. The two-stage approach first exploits the compartmentalisation provided by classification trees (CTs) (as in O’Leary, 2008) to identify multiple sources of naughty noughts and simultaneously delineate several species envelopes. SDMs can then be fitted separately within each envelope; for this stage, we examine both CTs (as in Falk et al., 2014) and the popular MaxEnt (Elith et al., 2006). We introduce a wider range of model performance measures to improve the treatment of naughty noughts in SDM.
We retain an overall measure of model performance, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, but focus on its constituent measures, the false negative rate (FNR) and the false positive rate (FPR), and on how these relate to the threshold on the predicted probability of presence that separates predicted presence from absence. We also propose error rates more relevant to users of predictions: the false omission rate (FOR), the chance that a predicted absence corresponds to (and hence misses) an observed presence, and the false discovery rate (FDR), reflecting those predicted (or potential) presences that correspond to observed absences. A high FDR may be desirable since it could help target future search efforts, whereas a zero or low FOR is desirable since it indicates that none of the (often valuable) presences have been ignored in the SDM. For illustration, we chose Bradypus variegatus, a species previously published as an exemplar for MaxEnt by Phillips et al. (2006). We used CTs to progressively refine the species envelope, starting with the whole study region (E0) and eliminating more and more potential naughty noughts (E1–E3). When combined with an SDM fitted within the species envelope, the best CT SDM had AUC and FPR similar to those of the best MaxEnt SDM, but otherwise performed better: the FNR and FOR were greatly reduced, suggesting that CTs handle absences better. Interestingly, MaxEnt predictions showed low discriminatory performance, with the most common predicted probability of presence falling in the same range (0.00–0.20) for both true absences and presences. In summary, this example shows that SDMs can be improved by introducing an initial hurdle that identifies naughty noughts and partitions the envelope before SDMs are applied. This improvement was barely detectable via AUC and FPR, yet visible in FOR, FNR and the distributions of predicted probability of presence for presences versus absences.
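The four error rates can all be read off the confusion matrix obtained at a chosen threshold. The sketch below uses the standard definitions (FNR and FPR condition on the observed class, FOR and FDR on the predicted class); the observations, predicted probabilities and the 0.5 threshold are invented for illustration and are not from the Bradypus variegatus study.

```python
from typing import Dict, Sequence


def error_rates(observed: Sequence[int], prob: Sequence[float],
                threshold: float) -> Dict[str, float]:
    """Confusion-matrix error rates for presence (1) / absence (0) data.

    A site counts as a predicted presence when its predicted probability
    of presence is >= threshold.
    """
    tp = fp = fn = tn = 0
    for y, p in zip(observed, prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Undefined rates (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "FNR": ratio(fn, fn + tp),  # observed presences predicted absent
        "FPR": ratio(fp, fp + tn),  # observed absences predicted present
        "FOR": ratio(fn, fn + tn),  # predicted absences that are presences
        "FDR": ratio(fp, fp + tp),  # predicted presences that are absences
    }


obs = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.1, 0.7, 0.3, 0.2, 0.1, 0.05]
rates = error_rates(obs, probs, threshold=0.5)
```

Calling `error_rates` over a grid of thresholds traces how each rate moves as the presence/absence cut-off changes, which is the relationship the abstract emphasises.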
Abstract:
Motivated by the analysis of the Australian Grain Insect Resistance Database (AGIRD), we develop a Bayesian hurdle modelling approach to assess trends in strong resistance of stored grain insects to phosphine over time. The binary response variable from AGIRD, indicating presence or absence of strong resistance, is dominated by absence observations, and the hurdle model is a two-step approach that is useful for analyzing such a binary response dataset. In the first step, the proposed hurdle model uses Bayesian classification trees to identify the covariates and covariate levels associated with possible presence or absence of strong resistance. In the second step, generalized additive models (GAMs) with spike and slab priors for variable selection are fitted to the subset of the dataset that the Bayesian classification tree identifies as possibly having strong resistance. From the GAM, we use this variable selection approach to assess trends, biosecurity issues and site-specific variables influencing the presence of strong resistance. The proposed Bayesian hurdle model is compared with its frequentist counterpart and with a naive Bayesian approach that fits a GAM to the entire dataset. The Bayesian hurdle model has the benefit of providing a set of good trees for use in the first step, appears to represent the influence of variables on strong resistance more flexibly than the frequentist model, and captures subtle changes in the trend that are missed by the frequentist and naive Bayesian models. © 2014 Springer Science+Business Media New York.
Abstract:
We present a model of market participation in which the presence of non-negligible fixed costs leads to random censoring of the traditional double-hurdle model. Fixed costs arise when household resources must be devoted a priori to the decision to participate in the market. These costs, usually of time, manifest themselves in non-negligible minimum efficient supplies and in a supply correspondence that requires modification of the traditional Tobit regression. The costs also complicate econometric estimation of household behavior. These complications are overcome by applying the Gibbs sampler. The resulting algorithm provides robust estimates of the fixed-costs double-hurdle model. The model and procedures are demonstrated in an application to milk market participation in the Ethiopian highlands.
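For reference, the traditional double-hurdle model that this work extends can be written as a likelihood with two normal hurdles. The sketch below is the standard Cragg form with independent errors, not the fixed-costs variant or the Gibbs sampler described above; the linear predictors are passed in precomputed, and all function names are hypothetical.

```python
import math
from typing import Sequence


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def double_hurdle_loglik(y: Sequence[float], z_gamma: Sequence[float],
                         x_beta: Sequence[float], sigma: float) -> float:
    """Log-likelihood of Cragg's double-hurdle model with independent errors.

    y:       observed outcomes (0 means no market participation)
    z_gamma: linear predictor of the participation (first) hurdle
    x_beta:  linear predictor of the latent supplied quantity (second hurdle)
    sigma:   standard deviation of the quantity equation's normal error
    """
    ll = 0.0
    for yi, zg, xb in zip(y, z_gamma, x_beta):
        p_participate = norm_cdf(zg)
        if yi == 0:
            # A zero occurs if either hurdle is not crossed.
            ll += math.log(1.0 - p_participate * norm_cdf(xb / sigma))
        else:
            # A positive value: first hurdle crossed, normal density for y.
            ll += math.log(p_participate)
            ll += math.log(norm_pdf((yi - xb) / sigma) / sigma)
    return ll


ll = double_hurdle_loglik([0.0, 1.0], [0.0, 0.0], [0.0, 1.0], 1.0)
```

In this traditional form a zero can come from either hurdle; the abstract's point is that non-negligible fixed costs add a further, random censoring that this likelihood does not capture.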
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
BACKGROUND Dengue fever (DF) outbreaks in Cairns, Australia, often arise from imported DF cases. Few studies have incorporated imported DF cases when estimating the relationship between weather variability and the incidence of autochthonous DF. This study aimed to examine the impact of weather variability on autochthonous DF infection after accounting for imported DF cases, and then to explore the possibility of developing an empirical forecast system. METHODOLOGY/PRINCIPAL FINDINGS Data on weather variables, notified DF cases (including those acquired locally and overseas), and population size in Cairns were supplied by the Australian Bureau of Meteorology, Queensland Health, and the Australian Bureau of Statistics. A time-series negative binomial hurdle model was used to assess the effects of imported DF cases and weather variability on autochthonous DF incidence. Our results showed that monthly autochthonous DF incidence was significantly associated with monthly imported DF cases (relative risk (RR): 1.52; 95% confidence interval (CI): 1.01-2.28), monthly minimum temperature (°C) (RR: 2.28; 95% CI: 1.77-2.93), monthly relative humidity (%) (RR: 1.21; 95% CI: 1.06-1.37), monthly rainfall (mm) (RR: 0.50; 95% CI: 0.31-0.81) and the monthly standard deviation of daily relative humidity (%) (RR: 1.27; 95% CI: 1.08-1.50). In the zero hurdle component, the occurrence of monthly autochthonous DF cases was significantly associated with monthly minimum temperature (odds ratio (OR): 1.64; 95% CI: 1.01-2.67). CONCLUSIONS/SIGNIFICANCE Our research suggests that monthly autochthonous DF incidence was strongly positively associated with monthly imported DF cases, local minimum temperature and inter-month relative humidity variability in Cairns. Moreover, DF outbreaks in Cairns were driven by imported DF cases only under favourable seasons and weather conditions in the study.
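Both parts of such a hurdle model are usually linear on a log scale: a log link for the count component, so RR = exp(β), and a logit link for the zero hurdle, so OR = exp(γ). The sketch below converts between a log-scale coefficient and a ratio with a 95% Wald interval. It assumes the reported intervals are Wald-type, i.e. symmetric on the log scale, which the abstract does not state; the function names are hypothetical.

```python
import math
from typing import Tuple

Z95 = 1.959963984540054  # 97.5% quantile of the standard normal


def ratio_with_ci(beta: float, se: float,
                  z: float = Z95) -> Tuple[float, float, float]:
    """Exponentiate a log-scale coefficient into a ratio (RR or OR) with a Wald CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_ci(lower: float, upper: float,
                 z: float = Z95) -> Tuple[float, float]:
    """Recover the approximate log-scale coefficient and SE from a reported CI."""
    log_l, log_u = math.log(lower), math.log(upper)
    return (log_l + log_u) / 2.0, (log_u - log_l) / (2.0 * z)


# Reported for monthly imported DF cases: RR 1.52 (95% CI 1.01-2.28).
beta, se = coef_from_ci(1.01, 2.28)
rr, lo, hi = ratio_with_ci(beta, se)
```

Under the Wald assumption the point estimate is the geometric mean of the interval limits, here about 1.518, which rounds to the reported 1.52.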
Abstract:
Environmental data are spatial and temporal, and they often contain many zeros. In this paper, we included space–time random effects in zero-inflated Poisson (ZIP) and ‘hurdle’ models to investigate haulout patterns of harbor seals on glacial ice. The data consisted of counts of harbor seals hauled out on glacial ice in Disenchantment Bay, near Yakutat, Alaska, recorded for 18 dates on a lattice grid of samples. A hurdle model is similar to a ZIP model except that it does not mix zeros from the binary and count processes. Both models can be used for zero-inflated data, and we compared space–time ZIP and hurdle models in a Bayesian hierarchical model. Space–time ZIP and hurdle models were constructed by using spatial conditional autoregressive (CAR) models and temporal first-order autoregressive (AR(1)) models as random effects in ZIP and hurdle regression models. We created maps of smoothed predictions of harbor seal counts based on ice density, other covariates, and spatio-temporal random effects. For both models, predictions around the edges appeared to be positively biased. The LINEX loss function is an asymmetric loss function that penalizes overprediction more than underprediction, and we used it to correct for prediction bias and obtain the best map for the space–time ZIP and hurdle models.
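The distinction between the two models, whether zeros are mixed from both processes or come only from the binary one, shows up directly in their probability mass functions. The sketch below uses plain Poisson components with fixed parameters; in the paper the zero and count parameters would additionally carry covariates and CAR/AR(1) random effects, which are omitted here.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """Zero-inflated Poisson: zeros come from BOTH the binary process
    (probability pi) and the Poisson count process."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k: int, p_zero: float, lam: float) -> float:
    """Poisson hurdle: ALL zeros come from the binary process; positive
    counts follow a zero-truncated Poisson."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
```

With `pi = p_zero = 0.4` and `lam = 3`, the hurdle assigns exactly 0.4 to a zero count, whereas the ZIP assigns 0.4 + 0.6·e^(-3) ≈ 0.430, because the Poisson part can also produce zeros.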
Abstract:
OBJECTIVES To evaluate the long-term development of labial gingival recessions during the orthodontic treatment and retention phases. MATERIAL AND METHODS In this retrospective case-control study, the presence of gingival recession was scored (Yes or No) on plaster models of 100 orthodontic patients (cases) and 120 controls at the ages of 12 (T12), 15 (T15), 18 (T18) and 21 (T21) years. In the treated group, T12 marked the start of orthodontic treatment, and T15 marked the end of active treatment and the start of the retention phase with bonded retainers. Independent t-tests, Fisher's exact tests and a fitted two-part "hurdle" model were used to identify the effect of orthodontic treatment/retention on recessions. RESULTS The proportion of subjects with recessions was consistently higher in cases than in controls. Overall, the odds ratio of having recessions for orthodontic patients compared with controls was 4.48 (p < 0.001; 95% CI: 2.61-7.70). CONCLUSIONS Within the limits of the present research design, orthodontic treatment and/or the retention phase may be risk factors for the development of labial gingival recessions. In orthodontically treated subjects, mandibular incisors seem to be the most vulnerable to the development of gingival recessions.
Abstract:
In this study, cross-sectional data were used to analyze the effect of farmers’ demographic, socioeconomic and institutional settings, market access and physical attributes on the probability and intensity of tissue culture banana (TCB) adoption. The study was carried out between July and November 2011. Both descriptive statistics (mean, variance, proportions) and regression analysis were used, and a double-hurdle regression model was fitted to the data. Using a multistage sampling technique, four counties and eight sub-locations were randomly selected. Using random sampling, three hundred and thirty farmers were then selected from a list of banana households in the selected sub-locations. The adoption level of TCB was about 32%. The results also revealed that the likelihood of TCB adoption was significantly influenced by the availability of TCB planting material, the proportion of banana income in total farm income, per capita household expenditure and the farmer’s location in Kisii County, while the intensity of TCB adoption was significantly influenced by farmers’ occupation, family size, labour source, farm size, soil fertility, availability of and access to TCB plantlets, distance to the banana market, use of manure in planting banana, access to agricultural extension services, and an index of TCB/non-TCB banana cultivar attributes scored by farmers. Compared with farmers in West Pokot County, farmers in Bungoma County were significantly more likely to adopt TCB technology. The results therefore suggest that the probability of adoption and the intensity of use of TCB should be enhanced. This can be done by taking these variables into account in order to meet the priority needs of the smallholder farmers who were the target group, which would help alleviate banana shortages in the region and enhance food security.
Accordingly, actors along the banana value chain are encouraged to target intervention strategies at the identified farmer, farm and institutional characteristics for greater impact on food provision. Opening more TCB multiplication centres in different regions would give farmers access to TCB technology and enhance its impact on the target population.
Abstract:
A predictive model of terrorist activity is developed by examining the daily number of terrorist attacks in Indonesia from 1994 through 2007. The dynamic model employs a shot noise process to explain the self-exciting nature of terrorist activity. The model estimates the probability of future attacks as a function of the times since past attacks. In addition, the excess of non-attack days, coupled with the presence of multiple coordinated attacks on the same day, compelled the use of hurdle models to jointly model the probability of an attack day and the corresponding number of attacks. A power law distribution with a shot-noise-driven parameter best modeled the number of attacks on an attack day. Interpretation of the model parameters is discussed and the predictive performance of the models is evaluated.
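The idea that the probability of an attack depends on the times since past attacks can be illustrated with a shot noise sum, in which each past event adds an impulse that decays over time. The sketch below is hypothetical throughout: the exponential decay kernel, the logistic link from shot-noise level to the attack-day probability, and all parameter values are assumptions for illustration, not the specification used in the study. It covers only the first (attack-day) part of the hurdle and omits the power-law count component.

```python
import math
from typing import Sequence


def shot_noise(t: float, past_times: Sequence[float], decay: float) -> float:
    """Sum of exponentially decaying 'shots', one per past event before time t."""
    return sum(math.exp(-(t - tj) / decay) for tj in past_times if tj < t)


def attack_day_prob(t: float, past_times: Sequence[float],
                    a: float, b: float, decay: float) -> float:
    """Hypothetical logistic link from shot-noise level to P(attack day)."""
    eta = a + b * shot_noise(t, past_times, decay)
    return 1.0 / (1.0 + math.exp(-eta))


history = [10.0, 12.0, 13.0]  # invented attack days
p_soon = attack_day_prob(14.0, history, a=-4.0, b=1.5, decay=5.0)
p_later = attack_day_prob(60.0, history, a=-4.0, b=1.5, decay=5.0)
```

Shortly after a cluster of attacks the shot noise is high and the attack-day probability is raised; as the impulses decay it returns towards the baseline 1/(1 + e^4) ≈ 0.018 set by the hypothetical intercept `a`.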
Abstract:
Even though satellite observations are the most effective means of gathering global information in a short span of time, challenges remain over continental landmasses, although most aerosol sources are land-based. This is a hurdle in global and regional aerosol climate forcing assessment. Retrieval of aerosol properties over land is complicated by irregular terrain and by the high and largely uncertain surface reflection, which acts as 'noise' relative to the much smaller amount of radiation scattered by aerosols, the 'signal'. In this paper, we describe a satellite sensor, the Aerosol Satellite (AEROSAT), which is capable of retrieving aerosols over land with much greater accuracy and reduced dependence on models. The sensor uses a set of multi-spectral, multi-angle measurements of polarized components of radiation reflected from the Earth's surface, together with measurements of thermal infrared broadband radiance, resulting in a large reduction of the 'noise' component relative to the 'signal'. A conceptual engineering model of AEROSAT has been designed, developed and used to measure land-surface features in the visible spectral band. Analysing the received signals with a polarization radiative transfer approach, we demonstrate the superiority of this method. Satellites carrying sensors based on the AEROSAT concept are expected to be 'self-sufficient', obtaining all the information required for aerosol retrieval from their own measurements.
Abstract:
The Architecture, Engineering, Construction and Facilities Management (AEC/FM) industry is rapidly becoming a multidisciplinary, multinational and multi-billion dollar economy, involving large numbers of actors working concurrently at different locations and using heterogeneous software and hardware technologies. Since the beginning of the last decade, a great deal of effort has been spent within the field of construction IT to integrate data and information from most of the computer tools used to carry out engineering projects. For this purpose, a number of integration models have been developed, such as web-centric systems and construction project modeling, a useful approach for representing construction projects and integrating data from various civil engineering applications. In the modern, distributed and dynamic construction environment, it is important to retrieve and exchange information from different sources and in different data formats in order to improve the processes supported by these systems. Previous research demonstrated that a major hurdle in AEC/FM data integration in such systems is the variety of data types, and that a significant part of the data is stored in semi-structured or unstructured formats. Therefore, new integrative approaches are needed to handle non-structured data types such as images and text files. This research focuses on the integration of construction site images. These images are a significant part of construction documentation, with thousands stored in the site photograph logs of large-scale projects. However, locating and identifying the data needed for important decision-making processes is a very difficult and time-consuming task, and so far there are no automated methods for associating these images with other related objects. Therefore, automated methods for the integration of construction images are important for construction information management.
During this research, processes for the retrieval, classification and integration of construction images in AEC/FM model-based systems have been explored. Specifically, a combination of techniques from image and video processing, computer vision, information retrieval, statistics and content-based image and video retrieval has been deployed to develop a methodology for retrieving related construction site image data from components of a project model. This method has been tested on available construction site images from a variety of sources, such as past and current building construction and transportation projects, and is able to automatically classify, store, integrate and retrieve image data files in inter-organizational systems so that they can be used in project management tasks.