996 results for Statistical maps.
Abstract:
This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of recorded zeroes. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts, and the dingo, cypress and toad case studies described in the motivation chapter are examples of it. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic maximum likelihood estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers are suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy.
This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (NCs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extension allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.
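The path sampling idea behind estimating a log NC ratio can be illustrated with a minimal sketch. It is not the thesis's IMCS implementation. It assumes a three-parameter autologistic model with canonical statistics (sum of sites, horizontal neighbour products, vertical neighbour products) on a hypothetical 3 x 3 lattice, and it uses the exponential-family identity that the derivative of log Z along a path equals the mean canonical statistic projected onto the path direction. To keep the example checkable, the mean statistic is computed exactly by enumeration; on a realistic lattice it would have to be estimated, for example by MCMC. All function names and parameter values are illustrative assumptions.

```python
import itertools
import math

ROWS, COLS = 3, 3  # tiny hypothetical lattice so all 2**9 states can be enumerated


def canonical_stats(x):
    """Return (sum of sites, horizontal neighbour products, vertical neighbour products)."""
    total = sum(sum(row) for row in x)
    horiz = sum(x[r][c] * x[r][c + 1] for r in range(ROWS) for c in range(COLS - 1))
    vert = sum(x[r][c] * x[r + 1][c] for r in range(ROWS - 1) for c in range(COLS))
    return (total, horiz, vert)


def all_states():
    """Yield every binary configuration of the lattice as a list of row tuples."""
    for bits in itertools.product((0, 1), repeat=ROWS * COLS):
        yield [bits[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


STATS = [canonical_stats(x) for x in all_states()]


def log_nc(theta):
    """Exact log normalization constant by brute-force enumeration."""
    return math.log(sum(math.exp(sum(t * s for t, s in zip(theta, st))) for st in STATS))


def mean_stats(theta):
    """Exact mean canonical statistic; an MCMC average would replace this in practice."""
    weights = [math.exp(sum(t * s for t, s in zip(theta, st))) for st in STATS]
    z = sum(weights)
    return [sum(w * st[k] for w, st in zip(weights, STATS)) / z for k in range(3)]


def path_log_nc_ratio(theta0, theta1, n_grid=200):
    """Trapezoidal integral of E[S] . (theta1 - theta0) along a straight-line path."""
    delta = [b - a for a, b in zip(theta0, theta1)]

    def integrand(t):
        theta = [a + t * d for a, d in zip(theta0, delta)]
        return sum(m * d for m, d in zip(mean_stats(theta), delta))

    h = 1.0 / n_grid
    inner = sum(integrand(i * h) for i in range(1, n_grid))
    return h * (0.5 * integrand(0.0) + inner + 0.5 * integrand(1.0))


theta0 = (0.0, 0.0, 0.0)    # independent sites, where log Z = 9 * log(2)
theta1 = (-0.5, 0.4, 0.3)   # hypothetical parameter values
estimate = path_log_nc_ratio(theta0, theta1)
exact = log_nc(theta1) - log_nc(theta0)
```

Comparing `estimate` with `exact` shows how closely the integrated mean statistic recovers the log NC ratio when the expectations are known; with MCMC estimates of the expectations, Monte Carlo error would be added to the quadrature error.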
Abstract:
Most statistical methods use hypothesis testing. Analysis of variance, regression, discrete choice models, contingency tables, and other analysis methods commonly used in transportation research share hypothesis testing as the means of making inferences about the population of interest. Despite the fact that hypothesis testing has been a cornerstone of empirical research for many years, various aspects of hypothesis tests are commonly incorrectly applied, misinterpreted, or ignored by novices and expert researchers alike. At first glance, hypothesis testing appears straightforward: develop the null and alternative hypotheses, compute the test statistic to compare to a standard distribution, estimate the probability of rejecting the null hypothesis, and then make claims about the importance of the finding. This is an oversimplification of the process of hypothesis testing. Hypothesis testing as applied in empirical research is examined here. The reader is assumed to have a basic knowledge of the role of hypothesis testing in various statistical methods. Through the use of an example, the mechanics of hypothesis testing are first reviewed. Then, five precautions surrounding the use and interpretation of hypothesis tests are developed; examples of each are provided to demonstrate how errors are made, and solutions are identified so similar errors can be avoided. Remedies are provided for common errors, and conclusions are drawn on how to use the results of this paper to improve the conduct of empirical research in transportation.
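The basic mechanics mentioned above (state the hypotheses, compute a test statistic, compare it to a standard distribution) can be sketched with a two-sided one-sample z-test. This is an illustrative assumption, not the example used in the paper: the function name, the speed data, the null mean and the known standard deviation are all hypothetical.

```python
import math


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: mean == mu0, assuming a known population sigma."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical spot speeds (km/h); H0: the mean speed is 50 km/h, sigma assumed to be 4.
speeds = [52, 55, 49, 58, 54, 53, 57, 51, 56]
z, p_value = one_sample_z_test(speeds, mu0=50.0, sigma=4.0)
```

The p-value here is the probability, under the null hypothesis, of a test statistic at least this extreme. It is not the probability that the null hypothesis is true, and a small value says nothing by itself about the practical importance of the difference.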
Abstract:
A statistical modeling method to accurately determine combustion chamber resonance is proposed and demonstrated. This method utilises Markov-chain Monte Carlo (MCMC) through the use of the Metropolis-Hastings (MH) algorithm to yield a probability density function for the combustion chamber frequency and find the best estimate of the resonant frequency, along with uncertainty. The accurate determination of combustion chamber resonance is then used to investigate various engine phenomena, with appropriate uncertainty, for a range of engine cycles. It is shown that, when operating on various ethanol/diesel fuel combinations, a 20% substitution yields the least amount of inter-cycle variability, in relation to combustion chamber resonance.
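As a rough illustration of how a Metropolis-Hastings sampler yields a probability density for a frequency, the sketch below runs a random-walk sampler on a toy posterior. It is not the authors' model: the Gaussian likelihood, flat prior, noise level, synthetic observations (in kHz) and tuning values are all assumptions made for the example.

```python
import math
import random


def log_posterior(freq, observations, noise_sd):
    """Gaussian log-likelihood with a flat prior on freq (an assumed toy model)."""
    return -sum((y - freq) ** 2 for y in observations) / (2.0 * noise_sd ** 2)


def metropolis_hastings(log_target, start, step_sd, n_samples, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    current, current_lp = start, log_target(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Hypothetical noisy resonance measurements in kHz.
observations = [6.02, 5.97, 6.05, 5.99, 6.01, 6.03, 5.98, 6.00]
rng = random.Random(42)
chain = metropolis_hastings(
    lambda f: log_posterior(f, observations, noise_sd=0.03),
    start=5.9, step_sd=0.02, n_samples=20000, rng=rng,
)
kept = chain[2000:]  # discard burn-in
post_mean = sum(kept) / len(kept)
post_sd = math.sqrt(sum((s - post_mean) ** 2 for s in kept) / (len(kept) - 1))
```

The retained samples approximate the posterior density, so `post_mean` serves as a best estimate of the frequency and `post_sd` as its uncertainty under the assumed model.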
Abstract:
The Mobile Emissions Assessment System for Urban and Regional Evaluation (MEASURE) model provides an external validation capability for the hot stabilized option; the model is one of several new modal emissions models designed to predict hot stabilized emission rates for various motor vehicle groups as a function of the conditions under which the vehicles are operating. The validation of aggregate measurements, such as speed and acceleration profile, is performed on an independent data set using three statistical criteria. The MEASURE algorithms have been shown to provide significant improvements in both average emission estimates and explanatory power over some earlier models for pollutants across almost every operating cycle tested.
Abstract:
There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each modeling approach, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings) exist in one of two states: perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of “excess” zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriately model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows a Bernoulli trial with unequal probability of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models, and note why they have become popular for modeling crash data.
A simulation experiment is then conducted to demonstrate how crash data give rise to the “excess” zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed, and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales and not an underlying dual-state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros.
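The kind of simulation described above can be sketched as follows. This is not the authors' experiment: the number of sites, the exposure levels and the crash probabilities are hypothetical. Each site's count is a sum of independent Bernoulli trials with unequal probabilities (Poisson trials), every site is in a single "unsafe" state, and exposure varies across sites, with many sites having low exposure.

```python
import math
import random


def simulate_site(n_trials, max_p, rng):
    """Count crashes over n_trials independent Bernoulli trials with unequal probabilities."""
    probs = [rng.uniform(0.0, max_p) for _ in range(n_trials)]
    return sum(1 for p in probs if rng.random() < p)


rng = random.Random(1)

# Hypothetical sites: exposure (number of trials) varies widely; no dual-state process.
exposures = [rng.choice([20, 50, 200, 1000]) for _ in range(4000)]
counts = [simulate_site(n, max_p=0.004, rng=rng) for n in exposures]

mean_count = sum(counts) / len(counts)
observed_zero_frac = sum(1 for c in counts if c == 0) / len(counts)
# Zero probability implied by a single Poisson distribution with the pooled mean.
poisson_zero_prob = math.exp(-mean_count)
```

Under these assumptions `observed_zero_frac` exceeds `poisson_zero_prob`, so the pooled data show apparent "excess" zeros even though no site is perfectly safe; the surplus comes from the low-exposure sites.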