973 resultados para probability models


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of zeroes recorded. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov Random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers is suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy. This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (N Cs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the Integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter Autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extensions allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A national-level safety analysis tool is needed to complement existing analytical tools for assessment of the safety impacts of roadway design alternatives. FHWA has sponsored the development of the Interactive Highway Safety Design Model (IHSDM), which is roadway design and redesign software that estimates the safety effects of alternative designs. Considering the importance of IHSDM in shaping the future of safety-related transportation investment decisions, FHWA justifiably sponsored research with the sole intent of independently validating some of the statistical models and algorithms in IHSDM. Statistical model validation aims to accomplish many important tasks, including (a) assessment of the logical defensibility of proposed models, (b) assessment of the transferability of models over future time periods and across different geographic locations, and (c) identification of areas in which future model improvements should be made. These three activities are reported for five proposed types of rural intersection crash prediction models. The internal validation of the model revealed that the crash models potentially suffer from omitted variables that affect safety, site selection and countermeasure selection bias, poorly measured and surrogate variables, and misspecification of model functional forms. The external validation indicated the inability of models to perform on par with model estimation performance. Recommendations for improving the state of the practice from this research include the systematic conduct of carefully designed before-and-after studies, improvements in data standardization and collection practices, and the development of analytical methods to combine the results of before-and-after studies with cross-sectional studies in a meaningful and useful way.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Most statistical methods use hypothesis testing. Analysis of variance, regression, discrete choice models, contingency tables, and other analysis methods commonly used in transportation research share hypothesis testing as the means of making inferences about the population of interest. Despite the fact that hypothesis testing has been a cornerstone of empirical research for many years, various aspects of hypothesis tests commonly are incorrectly applied, misinterpreted, and ignored—by novices and expert researchers alike. On initial glance, hypothesis testing appears straightforward: develop the null and alternative hypotheses, compute the test statistic to compare to a standard distribution, estimate the probability of rejecting the null hypothesis, and then make claims about the importance of the finding. This is an oversimplification of the process of hypothesis testing. Hypothesis testing as applied in empirical research is examined here. The reader is assumed to have a basic knowledge of the role of hypothesis testing in various statistical methods. Through the use of an example, the mechanics of hypothesis testing is first reviewed. Then, five precautions surrounding the use and interpretation of hypothesis tests are developed; examples of each are provided to demonstrate how errors are made, and solutions are identified so similar errors can be avoided. Remedies are provided for common errors, and conclusions are drawn on how to use the results of this paper to improve the conduct of empirical research in transportation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

BACKGROUND: The presence of insects in stored grains is a significant problem for grain farmers, bulk grain handlers and distributors worldwide. Inspections of bulk grain commodities is essential to detect pests and therefore to reduce the risk of their presence in exported goods. It has been well documented that insect pests cluster in response to factors such as microclimatic conditions within bulk grain. Statistical sampling methodologies for grains, however, have typically considered pests and pathogens to be homogeneously distributed throughout grain commodities. In this paper we demonstrate a sampling methodology that accounts for the heterogeneous distribution of insects in bulk grains. RESULTS: We show that failure to account for the heterogeneous distribution of pests may lead to overestimates of the capacity for a sampling program to detect insects in bulk grains. Our results indicate the importance of the proportion of grain that is infested in addition to the density of pests within the infested grain. We also demonstrate that the probability of detecting pests in bulk grains increases as the number of sub-samples increases, even when the total volume or mass of grain sampled remains constant. CONCLUSION: This study demonstrates the importance of considering an appropriate biological model when developing sampling methodologies for insect pests. Accounting for a heterogeneous distribution of pests leads to a considerable improvement in the detection of pests over traditional sampling models.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper reports on the study of passenger experiences and how passengers interact with services, technology and processes at an airport. As part of our research, we have followed people through the airport from check-in to security and from security to boarding. Data was collected by approaching passengers in the departures concourse of the airport and asking for their consent to be videotaped. Data was collected and coded and the analysis focused on both discretionary and process related passenger activities. Our findings show the interdependence between activities and passenger experiences. Within all activities, passengers interact with processes, domain dependent technology, services, personnel and artifacts. These levels of interaction impact on passenger experiences and are interdependent. The emerging taxonomy of activities consists of (i) ownership related activities, (ii) group activities, (iii) individual activities (such as activities at the domain interfaces) and (iv) concurrent activities. This classification is contributing to the development of descriptive models of passenger experiences and how these activities affect the facilitation and design of future airports.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Advances in safety research—trying to improve the collective understanding of motor vehicle crash causation—rests upon the pursuit of numerous lines of inquiry. The research community has focused on analytical methods development (negative binomial specifications, simultaneous equations, etc.), on better experimental designs (before-after studies, comparison sites, etc.), on improving exposure measures, and on model specification improvements (additive terms, non-linear relations, etc.). One might think of different lines of inquiry in terms of ‘low lying fruit’—areas of inquiry that might provide significant improvements in understanding crash causation. It is the contention of this research that omitted variable bias caused by the exclusion of important variables is an important line of inquiry in safety research. In particular, spatially related variables are often difficult to collect and omitted from crash models—but offer significant ability to better understand contributing factors to crashes. This study—believed to represent a unique contribution to the safety literature—develops and examines the role of a sizeable set of spatial variables in intersection crash occurrence. In addition to commonly considered traffic and geometric variables, examined spatial factors include local influences of weather, sun glare, proximity to drinking establishments, and proximity to schools. The results indicate that inclusion of these factors results in significant improvement in model explanatory power, and the results also generally agree with expectation. The research illuminates the importance of spatial variables in safety research and also the negative consequences of their omissions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Crash prediction models are used for a variety of purposes including forecasting the expected future performance of various transportation system segments with similar traits. The influence of intersection features on safety have been examined extensively because intersections experience a relatively large proportion of motor vehicle conflicts and crashes compared to other segments in the transportation system. The effects of left-turn lanes at intersections in particular have seen mixed results in the literature. Some researchers have found that left-turn lanes are beneficial to safety while others have reported detrimental effects on safety. This inconsistency is not surprising given that the installation of left-turn lanes is often endogenous, that is, influenced by crash counts and/or traffic volumes. Endogeneity creates problems in econometric and statistical models and is likely to account for the inconsistencies reported in the literature. This paper reports on a limited-information maximum likelihood (LIML) estimation approach to compensate for endogeneity between left-turn lane presence and angle crashes. The effects of endogeneity are mitigated using the approach, revealing the unbiased effect of left-turn lanes on crash frequency for a dataset of Georgia intersections. The research shows that without accounting for endogeneity, left-turn lanes ‘appear’ to contribute to crashes; however, when endogeneity is accounted for in the model, left-turn lanes reduce angle crash frequencies as expected by engineering judgment. Other endogenous variables may lurk in crash models as well, suggesting that the method may be used to correct simultaneity problems with other variables and in other transportation modeling contexts.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Statistical modeling of traffic crashes has been of interest to researchers for decades. Over the most recent decade many crash models have accounted for extra-variation in crash counts—variation over and above that accounted for by the Poisson density. The extra-variation – or dispersion – is theorized to capture unaccounted for variation in crashes across sites. The majority of studies have assumed fixed dispersion parameters in over-dispersed crash models—tantamount to assuming that unaccounted for variation is proportional to the expected crash count. Miaou and Lord [Miaou, S.P., Lord, D., 2003. Modeling traffic crash-flow relationships for intersections: dispersion parameter, functional form, and Bayes versus empirical Bayes methods. Transport. Res. Rec. 1840, 31–40] challenged the fixed dispersion parameter assumption, and examined various dispersion parameter relationships when modeling urban signalized intersection accidents in Toronto. They suggested that further work is needed to determine the appropriateness of the findings for rural as well as other intersection types, to corroborate their findings, and to explore alternative dispersion functions. This study builds upon the work of Miaou and Lord, with exploration of additional dispersion functions, the use of an independent data set, and presents an opportunity to corroborate their findings. Data from Georgia are used in this study. A Bayesian modeling approach with non-informative priors is adopted, using sampling-based estimation via Markov Chain Monte Carlo (MCMC) and the Gibbs sampler. A total of eight model specifications were developed; four of them employed traffic flows as explanatory factors in mean structure while the remainder of them included geometric factors in addition to major and minor road traffic flows. The models were compared and contrasted using the significance of coefficients, standard deviance, chi-square goodness-of-fit, and deviance information criteria (DIC) statistics. The findings indicate that the modeling of the dispersion parameter, which essentially explains the extra-variance structure, depends greatly on how the mean structure is modeled. In the presence of a well-defined mean function, the extra-variance structure generally becomes insignificant, i.e. the variance structure is a simple function of the mean. It appears that extra-variation is a function of covariates when the mean structure (expected crash count) is poorly specified and suffers from omitted variables. In contrast, when sufficient explanatory variables are used to model the mean (expected crash count), extra-Poisson variation is not significantly related to these variables. If these results are generalizable, they suggest that model specification may be improved by testing extra-variation functions for significance. They also suggest that known influences of expected crash counts are likely to be different than factors that might help to explain unaccounted for variation in crashes across sites