188 resultados para Statistical inference

em Queensland University of Technology - ePrints Archive


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a cluster, which could have been for any one of several types of cancer or any one of several time periods. Methods This paper contrasts the frequentist approach with a Bayesian approach for dealing with silent multiple comparisons in the context of a one-off cluster reported to a health department. Two published cluster investigations were re-analysed using the Dunn-Sidak method to adjust frequentist p-values and confidence intervals for silent multiple comparisons. Bayesian methods were based on the Gamma distribution. Results Bayesian analysis with non-informative priors produced results similar to the frequentist analysis, and suggested that both clusters represented a statistical excess. In the frequentist framework, the statistical significance of both clusters was extremely sensitive to the number of silent multiple comparisons, which can only ever be a subjective "guesstimate". The Bayesian approach is also subjective: whether there is an apparent statistical excess depends on the specified prior. Conclusion In cluster investigations, the frequentist approach is just as subjective as the Bayesian approach, but the Bayesian approach is less ambitious in that it treats the analysis as a synthesis of data and personal judgements (possibly poor ones), rather than objective reality. Bayesian analysis is (arguably) a useful tool to support complicated decision-making, because it makes the uncertainty associated with silent multiple comparisons explicit.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A time series method for the determination of combustion chamber resonant frequencies is outlined. This technique employs the use of Markov-chain Monte Carlo (MCMC) to infer parameters in a chosen model of the data. The development of the model is included and the resonant frequency is characterised as a function of time. Potential applications for cycle-by-cycle analysis are discussed and the bulk temperature of the gas and the trapped mass in the combustion chamber are evaluated as a function of time from resonant frequency information.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Advances in algorithms for approximate sampling from a multivariable target function have led to solutions to challenging statistical inference problems that would otherwise not be considered by the applied scientist. Such sampling algorithms are particularly relevant to Bayesian statistics, since the target function is the posterior distribution of the unobservables given the observables. In this thesis we develop, adapt and apply Bayesian algorithms, whilst addressing substantive applied problems in biology and medicine as well as other applications. For an increasing number of high-impact research problems, the primary models of interest are often sufficiently complex that the likelihood function is computationally intractable. Rather than discard these models in favour of inferior alternatives, a class of Bayesian "likelihoodfree" techniques (often termed approximate Bayesian computation (ABC)) has emerged in the last few years, which avoids direct likelihood computation through repeated sampling of data from the model and comparing observed and simulated summary statistics. In Part I of this thesis we utilise sequential Monte Carlo (SMC) methodology to develop new algorithms for ABC that are more efficient in terms of the number of model simulations required and are almost black-box since very little algorithmic tuning is required. In addition, we address the issue of deriving appropriate summary statistics to use within ABC via a goodness-of-fit statistic and indirect inference. Another important problem in statistics is the design of experiments. That is, how one should select the values of the controllable variables in order to achieve some design goal. The presences of parameter and/or model uncertainty are computational obstacles when designing experiments but can lead to inefficient designs if not accounted for correctly. The Bayesian framework accommodates such uncertainties in a coherent way. If the amount of uncertainty is substantial, it can be of interest to perform adaptive designs in order to accrue information to make better decisions about future design points. This is of particular interest if the data can be collected sequentially. In a sense, the current posterior distribution becomes the new prior distribution for the next design decision. Part II of this thesis creates new algorithms for Bayesian sequential design to accommodate parameter and model uncertainty using SMC. The algorithms are substantially faster than previous approaches allowing the simulation properties of various design utilities to be investigated in a more timely manner. Furthermore the approach offers convenient estimation of Bayesian utilities and other quantities that are particularly relevant in the presence of model uncertainty. Finally, Part III of this thesis tackles a substantive medical problem. A neurological disorder known as motor neuron disease (MND) progressively causes motor neurons to no longer have the ability to innervate the muscle fibres, causing the muscles to eventually waste away. When this occurs the motor unit effectively ‘dies’. There is no cure for MND, and fatality often results from a lack of muscle strength to breathe. The prognosis for many forms of MND (particularly amyotrophic lateral sclerosis (ALS)) is particularly poor, with patients usually only surviving a small number of years after the initial onset of disease. Measuring the progress of diseases of the motor units, such as ALS, is a challenge for clinical neurologists. Motor unit number estimation (MUNE) is an attempt to directly assess underlying motor unit loss rather than indirect techniques such as muscle strength assessment, which generally is unable to detect progressions due to the body’s natural attempts at compensation. Part III of this thesis builds upon a previous Bayesian technique, which develops a sophisticated statistical model that takes into account physiological information about motor unit activation and various sources of uncertainties. More specifically, we develop a more reliable MUNE method by applying marginalisation over latent variables in order to improve the performance of a previously developed reversible jump Markov chain Monte Carlo sampler. We make other subtle changes to the model and algorithm to improve the robustness of the approach.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Driving on an approach to a signalized intersection while distracted is particularly dangerous, as potential vehicular conflicts and resulting angle collisions tend to be severe. Given the prevalence and importance of this particular scenario, the decisions and actions of distracted drivers during the onset of yellow lights are the focus of this study. Driving simulator data were obtained from a sample of 58 drivers under baseline and handheld mobile phone conditions at the University of Iowa - National Advanced Driving Simulator. Explanatory variables included age, gender, cell phone use, distance to stop-line, and speed. Although there is extensive research on drivers’ responses to yellow traffic signals, the examination has been conducted from a traditional regression-based approach, which does not necessary provide the underlying relations and patterns among the sampled data. In this paper, we exploit the benefits of both classical statistical inference and data mining techniques to identify the a priori relationships among main effects, non-linearities, and interaction effects. Results suggest that novice (16-17 years) and young drivers’ (18-25 years) have heightened yellow light running risk while distracted by a cell phone conversation. Driver experience captured by age has a multiplicative effect with distraction, making the combined effect of being inexperienced and distracted particularly risky. Overall, distracted drivers across most tested groups tend to reduce the propensity of yellow light running as the distance to stop line increases, exhibiting risk compensation on a critical driving situation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We consider how data from scientific research should be used for decision making in health services. Whether a hand hygiene intervention to reduce risk of nosocomial infection should be widely adopted is the case study. Improving hand hygiene has been described as the most important measure to prevent nosocomial infection. 1 Transmission of microorganisms is reduced, and fewer infections arise, which leads to a reduction in mortality2 and cost savings.3 Implementing a hand hygiene program is itself costly, so the extra investment should be tested for cost-effectiveness.4,5 The first part of our commentary is about cost-effectiveness models and how they inform decision making for health services. The second part is about how data on the effectiveness of hand hygiene programs arising from scientific studies are used, and 2 points are made: the threshold for statistical inference of .05 used to judge effectiveness studies is not important for decision making,6,7 and potentially valuable evidence about effectiveness might be excluded by decision makers because it is deemed low quality.8 The ideas put forward will help researchers and health services decision makers to appraise scientific evidence in a more powerful way.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Many model-based investigation techniques, such as sensitivity analysis, optimization, and statistical inference, require a large number of model evaluations to be performed at different input and/or parameter values. This limits the application of these techniques to models that can be implemented in computationally efficient computer codes. Emulators, by providing efficient interpolation between outputs of deterministic simulation models, can considerably extend the field of applicability of such computationally demanding techniques. So far, the dominant techniques for developing emulators have been priors in the form of Gaussian stochastic processes (GASP) that were conditioned with a design data set of inputs and corresponding model outputs. In the context of dynamic models, this approach has two essential disadvantages: (i) these emulators do not consider our knowledge of the structure of the model, and (ii) they run into numerical difficulties if there are a large number of closely spaced input points as is often the case in the time dimension of dynamic models. To address both of these problems, a new concept of developing emulators for dynamic models is proposed. This concept is based on a prior that combines a simplified linear state space model of the temporal evolution of the dynamic model with Gaussian stochastic processes for the innovation terms as functions of model parameters and/or inputs. These innovation terms are intended to correct the error of the linear model at each output step. Conditioning this prior to the design data set is done by Kalman smoothing. This leads to an efficient emulator that, due to the consideration of our knowledge about dominant mechanisms built into the simulation model, can be expected to outperform purely statistical emulators at least in cases in which the design data set is small. The feasibility and potential difficulties of the proposed approach are demonstrated by the application to a simple hydrological model.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Approximate Bayesian Computation’ (ABC) represents a powerful methodology for the analysis of complex stochastic systems for which the likelihood of the observed data under an arbitrary set of input parameters may be entirely intractable – the latter condition rendering useless the standard machinery of tractable likelihood-based, Bayesian statistical inference [e.g. conventional Markov chain Monte Carlo (MCMC) simulation]. In this paper, we demonstrate the potential of ABC for astronomical model analysis by application to a case study in the morphological transformation of high-redshift galaxies. To this end, we develop, first, a stochastic model for the competing processes of merging and secular evolution in the early Universe, and secondly, through an ABC-based comparison against the observed demographics of massive (Mgal > 1011 M⊙) galaxies (at 1.5 < z < 3) in the Cosmic Assembly Near-IR Deep Extragalatic Legacy Survey (CANDELS)/Extended Groth Strip (EGS) data set we derive posterior probability densities for the key parameters of this model. The ‘Sequential Monte Carlo’ implementation of ABC exhibited herein, featuring both a self-generating target sequence and self-refining MCMC kernel, is amongst the most efficient of contemporary approaches to this important statistical algorithm. We highlight as well through our chosen case study the value of careful summary statistic selection, and demonstrate two modern strategies for assessment and optimization in this regard. Ultimately, our ABC analysis of the high-redshift morphological mix returns tight constraints on the evolving merger rate in the early Universe and favours major merging (with disc survival or rapid reformation) over secular evolution as the mechanism most responsible for building up the first generation of bulges in early-type discs.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Driving on an approach to a signalized intersection while distracted is relatively risky, as potential vehicular conflicts and resulting angle collisions tend to be relatively more severe compared to other locations. Given the prevalence and importance of this particular scenario, the objective of this study was to examine the decisions and actions of distracted drivers during the onset of yellow lights. Driving simulator data were obtained from a sample of 69 drivers under baseline and handheld cell phone conditions at the University of Iowa – National Advanced Driving Simulator. Explanatory variables included age, gender, cell phone use, distance to stop-line, and speed. Although there is extensive research on drivers’ responses to yellow traffic signals, the examinations have been conducted from a traditional regression-based approach, which do not necessary provide the underlying relations and patterns among the sampled data. In this paper, we exploit the benefits of both classical statistical inference and data mining techniques to identify the a priori relationships among main effects, non-linearities, and interaction effects. Results suggest that the probability of yellow light running increases with the increase in driving speed at the onset of yellow. Both young (18–25 years) and middle-aged (30–45 years) drivers reveal reduced propensity for yellow light running whilst distracted across the entire speed range, exhibiting possible risk compensation during this critical driving situation. The propensity for yellow light running for both distracted male and female older (50–60 years) drivers is significantly higher. Driver experience captured by age interacts with distraction, resulting in their combined effect having slower physiological response and being distracted particularly risky.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In analysis of longitudinal data, the variance matrix of the parameter estimates is usually estimated by the 'sandwich' method, in which the variance for each subject is estimated by its residual products. We propose smooth bootstrap methods by perturbing the estimating functions to obtain 'bootstrapped' realizations of the parameter estimates for statistical inference. Our extensive simulation studies indicate that the variance estimators by our proposed methods can not only correct the bias of the sandwich estimator but also improve the confidence interval coverage. We applied the proposed method to a data set from a clinical trial of antibiotics for leprosy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Now in its second edition, this book describes tools that are commonly used in transportation data analysis. The first part of the text provides statistical fundamentals while the second part presents continuous dependent variable models. With a focus on count and discrete dependent variable models, the third part features new chapters on mixed logit models, logistic regression, and ordered probability models. The last section provides additional coverage of Bayesian statistical modeling, including Bayesian inference and Markov chain Monte Carlo methods. Data sets are available online to use with the modeling techniques discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a novel approach for developing summary statistics for use in approximate Bayesian computation (ABC) algorithms by using indirect inference. ABC methods are useful for posterior inference in the presence of an intractable likelihood function. In the indirect inference approach to ABC the parameters of an auxiliary model fitted to the data become the summary statistics. Although applicable to any ABC technique, we embed this approach within a sequential Monte Carlo algorithm that is completely adaptive and requires very little tuning. This methodological development was motivated by an application involving data on macroparasite population evolution modelled by a trivariate stochastic process for which there is no tractable likelihood function. The auxiliary model here is based on a beta–binomial distribution. The main objective of the analysis is to determine which parameters of the stochastic model are estimable from the observed data on mature parasite worms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

PySSM is a Python package that has been developed for the analysis of time series using linear Gaussian state space models (SSM). PySSM is easy to use; models can be set up quickly and efficiently and a variety of different settings are available to the user. It also takes advantage of scientific libraries Numpy and Scipy and other high level features of the Python language. PySSM is also used as a platform for interfacing between optimised and parallelised Fortran routines. These Fortran routines heavily utilise Basic Linear Algebra (BLAS) and Linear Algebra Package (LAPACK) functions for maximum performance. PySSM contains classes for filtering, classical smoothing as well as simulation smoothing.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This chapter contains sections titled: Introduction Case study: Estimating transmission rates of nosocomial pathogens Models and methods Data analysis and results Discussion References