837 results for Chi-Squared Goodness of Fit Test
Abstract:
This thesis addresses computational challenges arising from Bayesian analysis of complex real-world problems. Many of the models and algorithms designed for such analysis are ‘hybrid’ in nature, in that they are a composition of components whose individual properties may be easily described, but the performance of the model or algorithm as a whole is less well understood. The aim of this research project is to offer a better understanding of the performance of hybrid models and algorithms. The goal of this thesis is to analyse the computational aspects of hybrid models and hybrid algorithms in the Bayesian context. The first objective of the research focuses on computational aspects of hybrid models, notably a continuous finite mixture of t-distributions. In the mixture model, an inference of interest is the number of components, as this may relate to both the quality of model fit to data and the computational workload. The analysis of t-mixtures using Markov chain Monte Carlo (MCMC) is described and the model is compared to the Normal case based on goodness of fit. Through simulation studies, it is demonstrated that the t-mixture model can be more flexible and more parsimonious in terms of number of components, particularly for skewed and heavy-tailed data. The study also reveals important computational issues associated with the use of t-mixtures, which have not been adequately considered in the literature. The second objective of the research focuses on computational aspects of hybrid algorithms for Bayesian analysis. Two approaches will be considered: a formal comparison of the performance of a range of hybrid algorithms and a theoretical investigation of the performance of one of these algorithms in high dimensions.
For the first approach, the delayed rejection algorithm, the pinball sampler, the Metropolis adjusted Langevin algorithm, and the hybrid version of the population Monte Carlo (PMC) algorithm are selected as a set of examples of hybrid algorithms. In the statistical literature, statistical efficiency is often the only criterion for judging an algorithm. In this thesis the algorithms are also considered and compared from a more practical perspective. This extends to the study of how individual components contribute to the overall efficiency of hybrid algorithms, and highlights weaknesses that may be introduced by the process of combining these components in a single algorithm. The second approach to considering computational aspects of hybrid algorithms involves an investigation of the performance of the PMC algorithm in high dimensions. It is well known that as a model becomes more complex, computation may become increasingly difficult in real time. In particular, importance sampling based algorithms, including the PMC, are known to be unstable in high dimensions. This thesis examines the PMC algorithm in a simplified setting, a single step of the general sampling scheme, and explores a fundamental problem that occurs in applying importance sampling to a high-dimensional problem. The precision of the computed estimate from the simplified setting is measured by the asymptotic variance of the estimate under conditions on the importance function. Additionally, the exponential growth of the asymptotic variance with the dimension is demonstrated, and it is illustrated that the optimal covariance matrix for the importance function can be estimated in a special case.
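The weight degeneracy behind this instability can be seen in a small numerical sketch. The toy setting below (a Gaussian target with an overdispersed Gaussian importance function; not the thesis's actual PMC implementation, and all parameter choices are illustrative) tracks how the effective sample size fraction of an importance sampler collapses as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def ess_fraction(dim, n=50_000, scale=1.5):
    """Fraction of effective sample size retained when a N(0, scale^2 I)
    importance function targets a N(0, I) density."""
    x = scale * rng.standard_normal((n, dim))
    # log importance weight: log target - log proposal (normalising
    # constants independent of x cancel in the ESS ratio below)
    log_w = (-0.5 * (x ** 2).sum(axis=1)
             + 0.5 * ((x / scale) ** 2).sum(axis=1)
             + dim * np.log(scale))
    w = np.exp(log_w - log_w.max())          # numerically stabilised weights
    return (w.sum() ** 2) / (n * (w ** 2).sum())

for d in (1, 5, 20, 50):
    print(d, round(ess_fraction(d), 4))      # the fraction decays with dimension
```

Even with a well-matched proposal family, the effective sample size decays roughly geometrically in the dimension, which is the phenomenon the asymptotic-variance analysis formalises.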
Abstract:
Predicting safety on roadways is standard practice for road safety professionals and has a corresponding extensive literature. The majority of safety prediction models are estimated using roadway segment and intersection (microscale) data, while more recently efforts have been undertaken to predict safety at the planning level (macroscale). Safety prediction models typically include roadway, operations, and exposure variables, factors known to affect safety in fundamental ways. Environmental variables, in particular variables attempting to capture the effect of rain on road safety, are difficult to obtain and have rarely been considered. In the few cases where weather variables have been included, historical averages rather than the actual weather conditions under which crashes were observed have been used. Without the inclusion of weather-related variables, researchers have had difficulty explaining regional differences in the safety performance of various entities (e.g. intersections, road segments, highways, etc.). As part of the NCHRP 8-44 research effort, researchers developed PLANSAFE, or planning level safety prediction models. These models make use of socio-economic, demographic, and roadway variables for predicting planning level safety. Accounting for regional differences, similar to the experience with microscale safety models, has been problematic during the development of planning level safety prediction models. More specifically, without weather-related variables there is an insufficient set of variables for explaining safety differences across regions and states. Furthermore, omitted variable bias resulting from excluding these important variables may adversely impact the coefficients of included variables, thus contributing to difficulty in model interpretation and accuracy.
This paper summarizes the results of an effort to include weather-related variables, particularly various measures of rainfall, into models of crash frequency and of the frequency of fatal and/or injury-severity crashes. The purpose of the study was to determine whether these variables do in fact improve the overall goodness of fit of the models, whether they may explain some or all of the observed regional differences, and to identify the estimated effects of rainfall on safety. The models are based on Traffic Analysis Zone level datasets from Michigan, and Pima and Maricopa Counties in Arizona. Numerous rain-related variables were found to be statistically significant, selected rain-related variables improved the overall goodness of fit, and inclusion of these variables reduced the portion of the model explained by the constant in the base models without weather variables. Rain tends to diminish safety, as expected, in fairly complex ways, depending on rain frequency and intensity.
Abstract:
Exclusion processes on a regular lattice are used to model many biological and physical systems at a discrete level. The average properties of an exclusion process may be described by a continuum model given by a partial differential equation. We combine a general class of contact interactions with an exclusion process. We determine that many different types of contact interactions at the agent level always give rise to a nonlinear diffusion equation, with a vast variety of diffusion functions D(C). We find that these functions may be dependent on the chosen lattice and the defined neighborhood of the contact interactions. Mild to moderate contact interaction strength generally results in good agreement between discrete and continuum models, while strong interactions often show discrepancies between the two, particularly when D(C) takes on negative values. We present a measure to predict the goodness of fit between the discrete and continuum models, and thus the validity of the continuum description of a motile, contact-interacting population of agents. This work has implications for modeling cell motility and interpreting cell motility assays, giving the ability to incorporate biologically realistic cell-cell interactions and develop global measures of discrete microscopic data.
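The agent-level description can be illustrated with a minimal symmetric simple exclusion process on a one-dimensional lattice (without the contact interactions studied in the paper; lattice size, step counts and initial condition below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def simple_exclusion_1d(L=200, steps=2000):
    """Symmetric simple exclusion process: at most one agent per site,
    and a move onto an occupied or off-lattice site is aborted."""
    occ = np.zeros(L, dtype=bool)
    occ[L // 2 - 20: L // 2 + 20] = True       # initial cluster of 40 agents
    n0 = int(occ.sum())
    for _ in range(steps):
        for _ in range(n0):                    # n0 random move attempts per step
            i = rng.choice(np.flatnonzero(occ))
            j = i + rng.choice((-1, 1))
            if 0 <= j < L and not occ[j]:      # the exclusion rule
                occ[i], occ[j] = False, True
    return occ, n0

occ, n0 = simple_exclusion_1d()
print(occ.sum() == n0)                         # agent number is conserved
```

Averaging many such realisations yields a density profile whose evolution is described, in the simplest case, by the linear diffusion equation; the paper's contribution is what happens to the diffusion function D(C) once contact interactions are added.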
Abstract:
This study proposes a full Bayes (FB) hierarchical modeling approach to traffic crash hotspot identification. The FB approach is able to account for all uncertainties associated with crash risk and various risk factors by estimating a posterior distribution of site safety, on which various ranking criteria can be based. Moreover, through hierarchical model specification, the FB approach is able to flexibly take into account various heterogeneities of crash occurrence due to spatiotemporal effects on traffic safety. Using Singapore intersection crash data (1997–2006), an empirical evaluation was conducted to compare the proposed FB approach to state-of-the-art approaches. Results show that the Bayesian hierarchical models with accommodation for site-specific effects and serial correlation have better goodness-of-fit than non-hierarchical models. Furthermore, all model-based approaches perform significantly better in safety ranking than the naive approach using raw crash counts. The FB hierarchical models were found to significantly outperform the standard EB approach in correctly identifying hotspots.
Abstract:
Advances in algorithms for approximate sampling from a multivariable target function have led to solutions to challenging statistical inference problems that would otherwise not be considered by the applied scientist. Such sampling algorithms are particularly relevant to Bayesian statistics, since the target function is the posterior distribution of the unobservables given the observables. In this thesis we develop, adapt and apply Bayesian algorithms, whilst addressing substantive applied problems in biology and medicine as well as other applications. For an increasing number of high-impact research problems, the primary models of interest are often sufficiently complex that the likelihood function is computationally intractable. Rather than discard these models in favour of inferior alternatives, a class of Bayesian "likelihood-free" techniques (often termed approximate Bayesian computation (ABC)) has emerged in the last few years, which avoids direct likelihood computation by repeatedly sampling data from the model and comparing observed and simulated summary statistics. In Part I of this thesis we utilise sequential Monte Carlo (SMC) methodology to develop new algorithms for ABC that are more efficient in terms of the number of model simulations required and are almost black-box, since very little algorithmic tuning is required. In addition, we address the issue of deriving appropriate summary statistics to use within ABC via a goodness-of-fit statistic and indirect inference. Another important problem in statistics is the design of experiments. That is, how one should select the values of the controllable variables in order to achieve some design goal. The presence of parameter and/or model uncertainty is a computational obstacle when designing experiments, and can lead to inefficient designs if not accounted for correctly. The Bayesian framework accommodates such uncertainties in a coherent way.
If the amount of uncertainty is substantial, it can be of interest to perform adaptive designs in order to accrue information to make better decisions about future design points. This is of particular interest if the data can be collected sequentially. In a sense, the current posterior distribution becomes the new prior distribution for the next design decision. Part II of this thesis creates new algorithms for Bayesian sequential design to accommodate parameter and model uncertainty using SMC. The algorithms are substantially faster than previous approaches, allowing the simulation properties of various design utilities to be investigated in a more timely manner. Furthermore, the approach offers convenient estimation of Bayesian utilities and other quantities that are particularly relevant in the presence of model uncertainty. Finally, Part III of this thesis tackles a substantive medical problem. A neurological disorder known as motor neuron disease (MND) progressively causes motor neurons to lose the ability to innervate the muscle fibres, causing the muscles to eventually waste away. When this occurs the motor unit effectively ‘dies’. There is no cure for MND, and fatality often results from a lack of muscle strength to breathe. The prognosis for many forms of MND (particularly amyotrophic lateral sclerosis (ALS)) is particularly poor, with patients usually surviving only a small number of years after the initial onset of disease. Measuring the progress of diseases of the motor units, such as ALS, is a challenge for clinical neurologists. Motor unit number estimation (MUNE) is an attempt to directly assess underlying motor unit loss, rather than an indirect technique such as muscle strength assessment, which is generally unable to detect progression due to the body’s natural attempts at compensation.
Part III of this thesis builds upon a previous Bayesian technique, which develops a sophisticated statistical model that takes into account physiological information about motor unit activation and various sources of uncertainties. More specifically, we develop a more reliable MUNE method by applying marginalisation over latent variables in order to improve the performance of a previously developed reversible jump Markov chain Monte Carlo sampler. We make other subtle changes to the model and algorithm to improve the robustness of the approach.
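The likelihood-free idea at the core of Part I can be sketched with the most basic ABC rejection sampler. The toy Gaussian example below (with an assumed uniform prior and the sample mean as summary statistic; this is not the thesis's SMC-based algorithm) shows the simulate-and-compare loop:

```python
import numpy as np

rng = np.random.default_rng(2)

# "observed" data generated with an unknown mean (3.0, hidden from the sampler)
obs = rng.normal(3.0, 1.0, size=100)
s_obs = obs.mean()                          # chosen summary statistic

def abc_rejection(n_keep=500, eps=0.1):
    """ABC rejection: draw theta from the prior, simulate data from the
    model, and keep theta when the simulated summary is within eps of
    the observed summary."""
    kept = []
    while len(kept) < n_keep:
        theta = rng.uniform(-10.0, 10.0)        # vague prior
        sim = rng.normal(theta, 1.0, size=100)  # model simulation
        if abs(sim.mean() - s_obs) < eps:
            kept.append(theta)
    return np.array(kept)

post = abc_rejection()
print(abs(post.mean() - 3.0) < 0.5)             # samples concentrate near the truth
```

The acceptance rate of this naive scheme collapses as eps shrinks or the problem dimension grows, which is precisely the inefficiency the thesis's SMC-based ABC algorithms are designed to reduce.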
Abstract:
A generalised gamma bidding model is presented, which incorporates many previous models. The log-likelihood equations are provided. Using a new method of testing, variants of the model are fitted to real data for construction contract auctions to find the best-fitting models for groupings of bidders. The results are examined for simplifying assumptions, including all those in the main literature. These indicate that no one model is best for all datasets. However, some models do appear to perform significantly better than others, and it is suggested that future research would benefit from a closer examination of these.
Abstract:
This study considered the problem of predicting survival based on three alternative models: a single Weibull, a mixture of Weibulls and a cure model. Instead of the common procedure of choosing a single “best” model, where “best” is defined in terms of goodness of fit to the data, a Bayesian model averaging (BMA) approach was adopted to account for model uncertainty. This was illustrated using a case study in which the aim was the description of lymphoma cancer survival with covariates given by phenotypes and gene expression. The results of this study indicate that if the sample size is sufficiently large, one of the three models emerges as having the highest probability given the data, as indicated by a goodness-of-fit measure, the Bayesian information criterion (BIC). However, when the sample size was reduced, no single model was revealed as “best”, suggesting that a BMA approach would be appropriate. Although a BMA approach can compromise goodness of fit to the data (when compared to the true model), it can provide robust predictions and facilitate more detailed investigation of the relationships between gene expression and patient survival. Keywords: Bayesian modelling; Bayesian model averaging; Cure model; Markov chain Monte Carlo; Mixture model; Survival analysis; Weibull distribution
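The BIC-based weighting behind this kind of model averaging can be sketched as follows. The log-likelihoods, parameter counts and sample size below are hypothetical stand-ins for fitted single-Weibull, Weibull-mixture and cure models, not values from the study:

```python
import numpy as np

def bic_weights(log_liks, n_params, n_obs):
    """Approximate posterior model probabilities from BIC,
    assuming equal prior probability for each model."""
    log_liks = np.asarray(log_liks, dtype=float)
    k = np.asarray(n_params, dtype=float)
    bic = -2.0 * log_liks + k * np.log(n_obs)
    delta = bic - bic.min()                 # BIC relative to the best model
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# hypothetical fits: single Weibull, Weibull mixture, cure model
w = bic_weights(log_liks=[-410.2, -398.7, -401.5],
                n_params=[2, 5, 3], n_obs=120)
print(w.round(3))
```

A BMA prediction is then the weight-averaged prediction across the candidate models; when one weight dominates (large samples), averaging reduces to selecting that model, matching the behaviour described in the abstract.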
Abstract:
Pedestrian crashes are one of the major road safety problems in developing countries, representing about 40% of total fatal crashes in low income countries. Despite the fact that many pedestrian crashes in these countries occur at unsignalized intersections such as roundabouts, studies focussing on this issue are limited, representing a critical research gap. The objective of this study is to develop safety performance functions for pedestrian crashes at modern roundabouts in order to identify significant roadway geometric, traffic and land use characteristics related to pedestrian safety. To establish the relationship between pedestrian crashes and various causal factors, detailed data including various forms of exposure, geometric and traffic characteristics, and spatial factors such as proximity to schools and proximity to drinking establishments were collected from a sample of 22 modern roundabouts in Addis Ababa, Ethiopia, representing about 56% of such roundabouts in Addis Ababa. To account for spatial correlation resulting from multiple observations at a roundabout, both the random effect Poisson (REP) and random effect Negative Binomial (RENB) regression models were estimated and compared. Model goodness of fit statistics reveal a marginally superior fit of the REP model compared to the RENB model of pedestrian crashes at roundabouts. Pedestrian crossing volume and the product of traffic volumes along the major and minor roads had significant and positive associations with pedestrian crashes at roundabouts. The presence of a public transport (bus/taxi) terminal beside a roundabout is associated with increased pedestrian crashes. While the maximum gradient of an approach road is negatively associated with pedestrian safety, the provision of a raised median along an approach appears to increase pedestrian safety at roundabouts. Remedial measures are identified for combating pedestrian safety problems at roundabouts in the context of a developing country.
Abstract:
We report the first 3D maps of genetic effects on brain fiber complexity. We analyzed HARDI brain imaging data from 90 young adult twins using an information-theoretic measure, the Jensen-Shannon divergence (JSD), to gauge the regional complexity of the white matter fiber orientation distribution functions (ODF). HARDI data were fluidly registered using Karcher means and ODF square-roots for interpolation; each subject's JSD map was computed from the spatial coherence of the ODFs in each voxel's neighborhood. We evaluated the genetic influences on generalized fiber anisotropy (GFA) and complexity (JSD) using structural equation models (SEM). At each voxel, genetic and environmental components of data variation were estimated, and their goodness of fit tested by permutation. Color-coded maps revealed that the optimal models varied for different brain regions. Fiber complexity was predominantly under genetic control, and was higher in more highly anisotropic regions. These methods show promise for discovering factors affecting fiber connectivity in the brain.
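A minimal sketch of the Jensen-Shannon divergence used here as the complexity measure, written for discrete distributions (the paper applies it to ODFs over each voxel's neighborhood):

```python
import numpy as np

def jensen_shannon_divergence(p, q):
    """JSD between two discrete distributions, in nats.
    Symmetric in p and q, and bounded above by log(2)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)                    # the mixture distribution

    def kl(a, b):
        nz = a > 0                       # 0 * log(0) terms contribute nothing
        return float(np.sum(a[nz] * np.log(a[nz] / b[nz])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# identical distributions give 0; disjoint supports give log(2)
print(jensen_shannon_divergence([0.5, 0.5, 0.0], [0.0, 0.0, 1.0]))
```

Because the JSD is symmetric and bounded, it is well suited to summarising how much a voxel's ODF differs from those of its neighbours.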
Abstract:
Current mobile devices and streaming video services support high definition (HD) video, raising expectations for more HD content. HD video streaming generally requires large bandwidth, exerting pressure on existing networks. A new generation of video compression codecs, such as VP9 and H.265/HEVC, is expected to be more effective at reducing bandwidth. Existing studies measuring the impact of this compression on users’ perceived quality have not focused on mobile devices. Here we propose new Quality of Experience (QoE) models that consider both subjective and objective assessments of mobile video quality. We introduce novel predictors, such as the correlations between video resolution and size of coding unit, and achieve a high goodness-of-fit to the collected subjective assessment data (adjusted R-square > 83%). The performance analysis shows that H.265 can potentially achieve 44% to 59% bit rate savings compared to H.264/AVC, slightly better than VP9 at 33% to 53%, depending on video content and resolution.
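The goodness-of-fit measure quoted above, adjusted R-square, penalises ordinary R-square for the number of predictors in the model. A generic sketch (the example scores and predictions are illustrative, not the study's data):

```python
import numpy as np

def adjusted_r_squared(y, y_hat, n_predictors):
    """Adjusted R^2: R^2 corrected for the number of model predictors."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    ss_res = np.sum((y - y_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# illustrative subjective scores vs model predictions
adj = adjusted_r_squared([1, 2, 3, 4, 5], [1.1, 1.9, 3.0, 4.2, 4.8],
                         n_predictors=1)
print(round(adj, 3))
```

An adjusted R-square above 83%, as reported, means the QoE models explain most of the variance in the subjective scores even after accounting for model size.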
Abstract:
Traffic congestion has been a growing issue in many metropolitan areas during recent years, which necessitates the identification of its key contributors and the development of sustainable strategies to help decrease its adverse impacts on traffic networks. Road incidents generally, and crashes specifically, have been acknowledged as the cause of a large proportion of travel delays in urban areas and account for 25% to 60% of traffic congestion on motorways. Identifying the critical determinants of travel delays has been of significant importance to incident management systems, which constantly collect and store incident duration data. This study investigates the individual and simultaneous differential effects of the relevant determinants on motorway crash duration probabilities. In particular, it applies parametric Accelerated Failure Time (AFT) hazard-based models to develop in-depth insights into how crash-specific characteristics and the associated temporal and infrastructural determinants impact duration. AFT models with both fixed and random parameters have been calibrated on one year of traffic crash records from two major Australian motorways in South East Queensland, and the differential effects of determinants on crash survival functions have been studied on these two motorways individually. A comprehensive spectrum of commonly used parametric fixed parameter AFT models, including the generalized gamma and generalized F families, has been compared to random parameter AFT structures in terms of goodness of fit to the duration data, and as a result the random parameter Weibull AFT model has been selected as the most appropriate model. Significant determinants of motorway crash duration included traffic diversion requirement, crash injury type, number and type of vehicles involved in a crash, day of week and time of day, towing support requirement and damage to the infrastructure.
A major finding of this research is that the motorways under study are significantly different in terms of crash durations: one motorway exhibits durations that are on average 19% shorter than those on the other. The differential effects of explanatory variables on crash durations also differ between the two motorways. The detailed analysis presented confirms that treating the motorway network as a whole, neglecting the individual differences between roads, can lead to erroneous interpretations of duration and inefficient strategies for mitigating travel delays along a particular motorway.
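Under a Weibull AFT model, a covariate shifts log-duration linearly, so its exponentiated coefficient acts as a multiplicative acceleration factor on every duration quantile. The sketch below uses hypothetical coefficients (not the calibrated model) chosen so that a motorway indicator reproduces the kind of 19% reduction in median duration reported:

```python
import numpy as np

def weibull_aft_median(beta0, betas, x, shape):
    """Median duration under a Weibull AFT model:
    log T = beta0 + x . betas + error, with Weibull shape parameter
    `shape`, so the median is exp(beta0 + x . betas) * log(2)**(1/shape)."""
    scale = np.exp(beta0 + np.dot(betas, x))
    return scale * np.log(2.0) ** (1.0 / shape)

# hypothetical model: x[0] = 1 indicates the second motorway
beta0, betas, shape = 3.2, np.array([np.log(0.81)]), 1.4
m_a = weibull_aft_median(beta0, betas, np.array([0.0]), shape)
m_b = weibull_aft_median(beta0, betas, np.array([1.0]), shape)
print(round(m_b / m_a, 2))   # 0.81, i.e. 19% shorter median duration
```

The random-parameter variant selected in the study lets such coefficients vary across observations, capturing unobserved heterogeneity between crashes.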
Abstract:
Crashes at any particular transport network location consist of a chain of events arising from a multitude of potential causes and/or contributing factors whose nature is likely to reflect geometric characteristics of the road, spatial effects of the surrounding environment, and human behavioural factors. It is postulated that these potential contributing factors do not arise from the same underlying risk process, and thus should be explicitly modelled and understood. The state of the practice in road safety network management applies a safety performance function that represents a single risk process to explain crash variability across network sites. This study aims to elucidate the importance of differentiating among the various underlying risk processes contributing to the observed crash count at any particular network location. To demonstrate the principle of this theoretical and corresponding methodological approach, the study explores engineering (e.g. segment length, speed limit) and unobserved spatial factors (e.g. climatic factors, presence of schools) as two explicit sources of crash contributing factors. A Bayesian Latent Class (BLC) analysis is used to explore these two sources and to incorporate prior information about their contribution to crash occurrence. The methodology is applied to the state controlled roads in Queensland, Australia and the results are compared with the traditional Negative Binomial (NB) model. A comparison of goodness of fit measures indicates that the model with a double risk process outperforms the single risk process NB model, indicating the need for further research to capture all three crash generation processes in the SPFs.
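The idea that one observed count series mixes several distinct risk processes can be sketched with a two-component Poisson mixture fitted by EM, a simple frequentist stand-in for the Bayesian latent class analysis used in the study (all rates and sample sizes below are illustrative):

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(3)

def poisson_logpmf(y, lam):
    return y * np.log(lam) - lam - np.array([lgamma(v + 1.0) for v in y])

def two_poisson_em(y, iters=200):
    """EM for a two-component Poisson mixture: a stand-in for a latent
    class model with two distinct crash-generating processes."""
    y = np.asarray(y, dtype=float)
    lam = np.array([0.5 * y.mean(), 1.5 * y.mean()])   # component rates
    pi = np.array([0.5, 0.5])                          # mixing weights
    for _ in range(iters):
        # E-step: responsibility of each component for each count
        logp = np.stack([np.log(pi[k]) + poisson_logpmf(y, lam[k])
                         for k in range(2)])
        r = np.exp(logp - logp.max(axis=0))
        r /= r.sum(axis=0)
        # M-step: update mixing weights and component rates
        pi = r.mean(axis=1)
        lam = (r * y).sum(axis=1) / r.sum(axis=1)
    return pi, lam

# counts drawn from two latent processes with rates 1 and 8
y = np.concatenate([rng.poisson(1.0, size=300), rng.poisson(8.0, size=200)])
pi, lam = two_poisson_em(y)
print(np.sort(lam).round(2))
```

A single-process model fitted to the same counts would absorb the between-process variation into overdispersion, which is why the double-risk-process specification can fit better than a single NB model.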
Abstract:
The current state of the practice in Blackspot Identification (BSI) utilizes safety performance functions based on total crash counts to identify transport system sites with potentially high crash risk. This paper postulates that total crash count variation over a transport network is a result of multiple distinct crash generating processes, including geometric characteristics of the road, spatial features of the surrounding environment, and driver behaviour factors. However, these multiple sources are ignored in current modelling methodologies when trying either to explain or to predict crash frequencies across sites. Instead, current practice employs models that imply that a single underlying crash generating process exists. This model mis-specification may lead to correlating crashes with the incorrect sources of contributing factors (e.g. concluding a crash is predominantly caused by a geometric feature when it is a behavioural issue), which may ultimately lead to inefficient use of public funds and misidentification of true blackspots. This study aims to propose a latent class model consistent with a multiple crash process theory, and to investigate the influence this model has on correctly identifying crash blackspots. We first present the theoretical and corresponding methodological approach, in which a Bayesian Latent Class (BLC) model is estimated assuming that crashes arise from two distinct risk generating processes: engineering and unobserved spatial factors. The Bayesian model is used to incorporate prior information about the contribution of each underlying process to the total crash count. The methodology is applied to the state-controlled roads in Queensland, Australia and the results are compared to an Empirical Bayesian Negative Binomial (EB-NB) model. A comparison of goodness of fit measures illustrates significantly improved performance of the proposed model compared to the NB model.
The detection of blackspots was also improved when compared to the EB-NB model. In addition, modelling crashes as the result of two fundamentally separate underlying processes reveals more detailed information about unobserved crash causes.
Abstract:
An attempt is made in this paper to arrive at a methodology for generating building technologies appropriate to rural housing. An evaluation of 'traditional' and 'modern' technologies currently in use reveals the need for alternatives. The lacunae in the presently available technologies also lead to a definition of rural housing needs. It is emphasised that contending technologies must establish a 'goodness of fit' between the house form and the pattern of needs. A systems viewpoint, which looks at the dynamic process of building construction and the static structure of the building, is then suggested as a means to match the technologies to the needs. The process viewpoint emphasises the role of building materials production and transportation in achieving desired building performance. Examples of technological alternatives, such as the compacted soil block and the polythene-stabilised soil roof covering, are then discussed. The static structural system viewpoint is then studied to arrive at methodologies for cost reduction. An illustrative analysis is carried out using the dynamic programming technique to arrive at combinations of alternatives for the building components which lead to cost reduction. Some of the technological options are then evaluated against the need patterns. Finally, a guideline for developments in building technology is suggested.
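The dynamic programming step described above can be sketched as a search over per-component alternatives: choose one alternative for each building component so that total cost is minimised subject to a performance target. The components, costs and performance points below are purely illustrative, not values from the paper:

```python
# hypothetical building components, each with (cost, performance points) options
components = {
    "walls": [(100, 3), (140, 5), (90, 2)],
    "roof":  [(80, 2), (120, 4)],
    "floor": [(50, 1), (70, 3)],
}

def min_cost(components, min_performance):
    """DP over cumulative performance: cheapest combination of one
    alternative per component that meets the performance target."""
    best = {0: 0}                                # performance -> minimum cost
    for alts in components.values():
        nxt = {}
        for perf, cost in best.items():
            for c, p in alts:
                q = perf + p
                if q not in nxt or cost + c < nxt[q]:
                    nxt[q] = cost + c
        best = nxt
    feasible = [c for p, c in best.items() if p >= min_performance]
    return min(feasible) if feasible else None   # None if target unreachable

print(min_cost(components, 8))
```

The table `best` plays the role of the DP state: exhaustive enumeration grows multiplicatively in the number of components, while the DP only keeps one cost per achievable performance level.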
Abstract:
Biological control of exotic plant populations with native organisms appears to be increasing, even though its success to date has been limited. Although many researchers and managers feel that native organisms are easier to use and present less risk to the environment, this may not be true. Developing a successful management program with a native insect depends on a number of critical factors that need to be considered. Information is needed on the feeding preference of the agent, agent effectiveness, environmental regulation of the agent, unique requirements of the agent, population maintenance of the agent, and time to desired impact. By understanding these factors, researchers and managers can develop a detailed protocol for using the native biological control agent for a specific target plant. We found E. lecontei in 14 waterbodies, most of which were in eastern Washington. Only one lake with weevils was located in western Washington. Weevils were associated with both Eurasian (Myriophyllum spicatum L.) and northern watermilfoil (M. sibiricum K.). Waterbodies with E. lecontei had significantly higher (P < 0.05) pH (8.7 ± 0.2) (mean ± 2SE), specific conductance (0.3 ± 0.08 mS cm⁻¹) and total alkalinity (132.4 ± 30.8 mg CaCO₃ L⁻¹). We also found that weevil presence was related to surface water temperature and waterbody location (χ² = 24.3, P ≤ 0.001) and, of all the models tested, this model provided the best fit (Hosmer–Lemeshow goodness-of-fit = 4.0, P = 0.9). Our results suggest that in Washington State E. lecontei occurs primarily in eastern Washington in waterbodies with pH ≥ 8.2 and specific conductance ≥ 0.2 mS cm⁻¹. Furthermore, weevil distribution appears to be correlated with waterbody location (eastern versus western Washington) and surface water temperature.
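The Hosmer-Lemeshow statistic reported above compares observed and expected event counts in groups ordered by predicted probability; a small value (high P), as found here, indicates a well-calibrated logistic model. A generic sketch with simulated, well-calibrated predictions (not the study's data):

```python
import numpy as np

rng = np.random.default_rng(4)

def hosmer_lemeshow(y, p_hat, groups=10):
    """Hosmer-Lemeshow statistic: observed vs expected event counts in
    groups of roughly equal size, ordered by predicted probability."""
    order = np.argsort(p_hat)
    y = np.asarray(y, dtype=float)[order]
    p_hat = np.asarray(p_hat, dtype=float)[order]
    stat = 0.0
    for g in np.array_split(np.arange(y.size), groups):
        n, obs, exp = g.size, y[g].sum(), p_hat[g].sum()
        pbar = exp / n
        stat += (obs - exp) ** 2 / (n * pbar * (1.0 - pbar))
    return stat

# well-calibrated predictions: under the null the statistic is roughly
# chi-square with (groups - 2) degrees of freedom, so it stays small
p = rng.uniform(0.1, 0.9, size=2000)
y = rng.binomial(1, p)
print(round(hosmer_lemeshow(y, p), 2))
```

Large values of the statistic (small P) would instead signal lack of fit between the model's predicted presence probabilities and the observed weevil presence/absence pattern.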