150 results for Classification, Markov chain Monte Carlo, k-nearest neighbours


Relevance:

100.00% 100.00%

Publisher:

Abstract:

The research objectives of this thesis were to contribute to Bayesian statistical methodology by contributing to risk assessment statistical methodology, and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems, namely risk assessment analyses for wastewater, and secondly, in a four dimensional dataset, assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day's data from the agricultural dataset which satisfactorily captured the complexities of the data; to build a model for several days' data, in order to consider how the full data might be modelled; and finally to build a model for the full four dimensional dataset and to consider the time-varying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations.
This work forms five papers, two of which have been published, with two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of that data as needing to be modelled as an 'errors-in-variables' problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead, CAR models were used with neighbours only in the same depth layer. This gave flexibility to the model, allowing both the spatially structured and non-structured variances to differ at all depths. We call this model the CAR layered model. Given the experimental design, the fixed part of the model could have been modelled as a set of means by treatment and by depth, but doing so allows little insight into how the treatment effects vary with depth. Hence, a number of essentially non-parametric approaches were taken to see the effects of depth on treatment, with the model of choice incorporating an errors-in-variables approach for depth in addition to a non-parametric smooth. The statistical contribution here was the introduction of the CAR layered model; the applied contribution was the analysis of moisture over depth and estimation of the contrast of interest together with its credible intervals.
These models were fitted using WinBUGS [Lunn et al., 2000]. The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS becomes more problematic because of its highly correlated term-by-term updating. In this work, we introduce a Gibbs sampler with block updating for the CAR layered model. The Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This framework is then used to consider five days' data, and we show that moisture in the soil for all the various treatments reaches levels particular to each treatment at a depth of 200 cm and thereafter stays constant, albeit with increasing variances with depth. In an analysis across three spatial dimensions and across time, there are many interactions of time and the spatial dimensions to be considered. Hence, we chose to use a daily model and to repeat the analysis at all time points, effectively creating an interaction model of time by the daily model. Such an approach allows great flexibility. However, this approach does not allow insight into the way in which the parameter of interest varies over time. Hence, a two-stage approach was also used, with estimates from the first stage being analysed as a set of time series. We see this spatio-temporal interaction model as being a useful approach to data measured across three spatial dimensions and time, since it does not assume additivity of the random spatial or temporal effects.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, we describe an analysis for data collected on a three-dimensional spatial lattice with treatments applied at the horizontal lattice points. Spatial correlation is accounted for using a conditional autoregressive model. Observations are defined as neighbours only if they are at the same depth. This allows the corresponding variance components to vary by depth. We use the Markov chain Monte Carlo method with block updating, together with Krylov subspace methods, for efficient estimation of the model. The method is applicable to both regular and irregular horizontal lattices and hence to data collected at any set of horizontal sites for a set of depths or heights, for example, water column or soil profile data. The model for the three-dimensional data is applied to agricultural trial data for five separate days taken roughly six months apart in order to determine possible relationships over time. The purpose of the trial is to determine a form of cropping that leads to less moist soils in the root zone and beyond. We estimate moisture for each date, depth and treatment accounting for spatial correlation and determine relationships of these and other parameters over time.
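
The neighbourhood rule in this abstract (sites are neighbours only when they share a depth) can be illustrated with a minimal sketch. It assumes a regular rectangular horizontal lattice with rook adjacency; the function name `layered_neighbours` and the grid sizes are hypothetical and are not taken from the paper, which also covers irregular lattices.

```python
def layered_neighbours(n_rows, n_cols, n_depths):
    """Map each site (row, col, depth) to its neighbours in the same depth layer.

    Assumption for illustration: a regular grid with rook (edge-sharing)
    adjacency. Sites directly above or below are never neighbours.
    """
    neighbours = {}
    for d in range(n_depths):
        for r in range(n_rows):
            for c in range(n_cols):
                adjacent = []
                # Only horizontal moves are considered; the depth index d is fixed.
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        adjacent.append((rr, cc, d))
                neighbours[(r, c, d)] = adjacent
    return neighbours


# Hypothetical 3 x 3 horizontal lattice sampled at 2 depths.
nb = layered_neighbours(n_rows=3, n_cols=3, n_depths=2)
print(nb[(1, 1, 0)])  # neighbours of the centre site in the top layer
```

Because the neighbour lists never cross layers, the implied precision structure is block diagonal by depth, which is what lets each layer carry its own variance components.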

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Purpose: To explore the role of the neighborhood environment in supporting walking. Design: Cross-sectional study of 10,286 residents of 200 neighborhoods. Participants were selected using a stratified two-stage cluster design. Data were collected by mail survey (68.5% response rate). Setting: The Brisbane City Local Government Area, Australia, 2007. Subjects: Brisbane residents aged 40 to 65 years. Measures: Environmental: street connectivity, residential density, hilliness, tree coverage, bikeways, and street lights within a one kilometer circular buffer from each resident's home; and network distance to nearest river or coast, public transport, shop, and park. Walking: minutes in the previous week categorized as < 30 minutes, ≥ 30 to < 90 minutes, ≥ 90 to < 150 minutes, ≥ 150 to < 300 minutes, and ≥ 300 minutes. Analysis: The association between each neighborhood characteristic and walking was examined using multilevel multinomial logistic regression and the model parameters were estimated using Markov chain Monte Carlo simulation. Results: After adjustment for individual factors, the likelihood of walking for more than 300 minutes (relative to < 30 minutes) was highest in areas with the most connectivity (OR=1.93, 99% CI 1.32-2.80), the greatest residential density (OR=1.47, 99% CI 1.02-2.12), the least tree coverage (OR=1.69, 99% CI 1.13-2.51), the most bikeways (OR=1.60, 99% CI 1.16-2.21), and the most street lights (OR=1.50, 99% CI 1.07-2.11). The likelihood of walking for more than 300 minutes was also higher among those who lived closest to a river or the coast (OR=2.06, 99% CI 1.41-3.02). Conclusion: The likelihood of meeting (and exceeding) physical activity recommendations on the basis of walking was higher in neighborhoods with greater street connectivity and residential density, more street lights and bikeways, closer proximity to waterways, and less tree coverage. Interventions targeting these neighborhood characteristics may lead to improved environmental quality as well as lower rates of overweight and obesity and associated chronic disease.
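
As a rough illustration of how an odds ratio and a 99% interval can be read off MCMC output, the sketch below exponentiates posterior draws of a log-odds coefficient and takes empirical quantiles. This is an assumption about the general workflow, not the authors' code: the draws are simulated from a normal distribution purely as a stand-in, and `percentile` and `odds_ratio_summary` are hypothetical helper names.

```python
import math
import random


def percentile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    low = int(math.floor(position))
    high = min(low + 1, len(sorted_values) - 1)
    weight = position - low
    return sorted_values[low] * (1 - weight) + sorted_values[high] * weight


def odds_ratio_summary(log_odds_draws, level=0.99):
    """Return (median OR, lower bound, upper bound) from posterior draws."""
    odds_ratios = sorted(math.exp(b) for b in log_odds_draws)
    tail = (1 - level) / 2
    return (
        percentile(odds_ratios, 0.5),
        percentile(odds_ratios, tail),
        percentile(odds_ratios, 1 - tail),
    )


# Stand-in for MCMC output: hypothetical draws of one log-odds coefficient.
random.seed(1)
draws = [random.gauss(0.65, 0.15) for _ in range(5000)]
or_median, or_lower, or_upper = odds_ratio_summary(draws)
print(f"OR={or_median:.2f}, 99% interval {or_lower:.2f}-{or_upper:.2f}")
```

An interval whose lower bound stays above 1, as in the results reported above, indicates that the association is positive at the chosen level.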

Relevance:

100.00% 100.00%

Publisher:

Abstract:

For clinical use, in electrocardiogram (ECG) signal analysis it is important to detect not only the centre of the P wave, the QRS complex and the T wave, but also the time intervals, such as the ST segment. Much research has focused entirely on QRS complex detection, via methods such as wavelet transforms, spline fitting and neural networks. However, drawbacks include the false classification of a severe noise spike as a QRS complex, possibly requiring manual editing, or the omission of information contained in other regions of the ECG signal. While some attempts have been made to develop algorithms to detect additional signal characteristics, such as P and T waves, the reported success rates vary from person to person and from beat to beat. To address this variability we propose the use of Markov chain Monte Carlo statistical modelling to extract the key features of an ECG signal, and we report on a feasibility study to investigate the utility of the approach. The modelling approach is examined with reference to a realistic computer-generated ECG signal, where details such as wave morphology and noise levels are variable.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

An experimental study has been performed to investigate the ignition delay of a modern heavy-duty common-rail diesel engine run with fumigated ethanol substitutions up to 40% on an energy basis. The ignition delay was determined through the use of statistical modelling in a Bayesian framework. This framework allows for the accurate determination of the start of combustion from single consecutive cycles and does not require any differentiation of the in-cylinder pressure signal. At full load, the ignition delay has been shown to decrease with increasing ethanol substitutions, and evidence of combustion with high ethanol substitutions prior to diesel injection has also been shown experimentally and by modelling. At half load, by contrast, increasing ethanol substitutions increased the ignition delay. A threshold absolute air to fuel ratio (mole basis) of above ~110 for consistent operation has been determined from the inter-cycle variability of the ignition delay, a result that agrees well with previous research on other in-cylinder parameters and further highlights the correlation between the air to fuel ratio and inter-cycle variability. Numerical modelling to investigate the sensitivity of ethanol combustion has also been performed. It has been shown that ethanol combustion is sensitive to the initial air temperature around the feasible operating conditions of the engine. Moreover, a negative temperature coefficient region of approximately 900–1050 K (the approximate temperature at fuel injection) has been shown for n-heptane and n-heptane/ethanol blends in the numerical modelling. A consequence of this is that the dominant effect influencing the ignition delay under increasing ethanol substitutions may stem from an increase in chemical reactions rather than from in-cylinder temperature. Further investigation revealed that the chemical reactions at low ethanol substitutions differ from those at high (> 20%) ethanol substitutions.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, we examine approaches to estimate a Bayesian mixture model at both single and multiple time points for a sample of actual and simulated aerosol particle size distribution (PSD) data. For estimation of a mixture model at a single time point, we use Reversible Jump Markov Chain Monte Carlo (RJMCMC) to estimate mixture model parameters, including the number of components, which is assumed to be unknown. We compare the results of this approach to a commonly used estimation method in the aerosol physics literature. As PSD data are often measured over time, frequently at short intervals, we also examine the use of an informative prior for estimation of the mixture parameters which takes into account the correlated nature of the parameters. The Bayesian mixture model offers a promising approach, providing advantages both in estimation and inference.
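
For orientation, a finite mixture model with an unknown number of components has the generic form below. The notation is an assumption for illustration only: the abstract does not name the component family, so $f$ stands for an unspecified component density with parameters $\theta_k$, and $w_k$ are the mixing weights.

```latex
p(x \mid K, w, \theta) = \sum_{k=1}^{K} w_k \, f(x \mid \theta_k),
\qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1 .
```

Reversible jump MCMC treats $K$ as a parameter, so the sampler moves between models of different dimension as components are added or removed.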

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Phase-type distributions represent the time to absorption for a finite state Markov chain in continuous time, generalising the exponential distribution and providing a flexible and useful modelling tool. We present a new reversible jump Markov chain Monte Carlo scheme for performing a fully Bayesian analysis of the popular Coxian subclass of phase-type models; the convenient Coxian representation involves fewer parameters than a more general phase-type model. The key novelty of our approach is that we model covariate dependence in the mean whilst using the Coxian phase-type model as a very general residual distribution. Such incorporation of covariates into the model has not previously been attempted in the Bayesian literature. A further novelty is that we also propose a reversible jump scheme for investigating structural changes to the model brought about by the introduction of Erlang phases. Our approach addresses more questions of inference than previous Bayesian treatments of this model and is automatic in nature. We analyse an example dataset comprising lengths of hospital stays of a sample of patients collected from two Australian hospitals to produce a model for a patient's expected length of stay which incorporates the effects of several covariates. This leads to interesting conclusions about what contributes to length of hospital stay with implications for hospital planning. We compare our results with an alternative classical analysis of these data.
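
To make the Coxian structure concrete, the sketch below computes the mean time to absorption of a Coxian phase-type distribution. It assumes the usual parameterisation in which phase i either moves on to phase i+1 at rate `progress_rates[i]` or is absorbed at rate `exit_rates[i]`, with a progress rate of zero in the final phase; the function name and the numbers are hypothetical and do not come from the paper.

```python
def coxian_mean(progress_rates, exit_rates):
    """Mean time to absorption of a Coxian phase-type distribution.

    The chain starts in phase 0. From phase i it moves to phase i + 1 at
    rate progress_rates[i] or is absorbed at rate exit_rates[i]. The last
    progress rate should be 0 because there is no further phase.
    """
    reach_prob = 1.0  # probability that the chain ever enters the current phase
    mean = 0.0
    for lam, mu in zip(progress_rates, exit_rates):
        total = lam + mu
        mean += reach_prob / total   # expected sojourn time, weighted by reach probability
        reach_prob *= lam / total    # probability of progressing rather than exiting
    return mean


# Hypothetical two-phase example (rates per day): a short-stay phase that
# patients may leave directly, followed by a longer-stay phase.
print(coxian_mean(progress_rates=[1.0, 0.0], exit_rates=[1.0, 1.0]))
```

With a single phase this reduces to the mean of an exponential distribution, consistent with the phase-type family generalising the exponential.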

Relevance:

100.00% 100.00%

Publisher:

Abstract:

PURPOSE: To examine the association between neighborhood disadvantage and physical activity (PA). METHODS: We use data from the HABITAT multilevel longitudinal study of PA among mid-aged (40-65 years) men and women (n=11,037, 68.5% response rate) living in 200 neighborhoods in Brisbane, Australia. PA was measured using three questions from the Active Australia Survey (general walking, moderate, and vigorous activity), one indicator of total activity, and two questions about walking and cycling for transport. The PA measures were operationalized using multiple categories based on time and estimated energy expenditure that were interpretable with reference to the latest PA recommendations. The association between neighborhood disadvantage and PA was examined using multilevel multinomial logistic regression and Markov Chain Monte Carlo simulation. The contribution of neighborhood disadvantage to between-neighborhood variation in PA was assessed using the 80% interval odds ratio. RESULTS: After adjustment for sex, age, living arrangement, education, occupation, and household income, reported participation in all measures and levels of PA varied significantly across Brisbane's neighborhoods, and neighborhood disadvantage accounted for some of this variation. Residents of advantaged neighborhoods reported significantly higher levels of total activity, general walking, moderate, and vigorous activity; however, they were less likely to walk for transport. There was no statistically significant association between neighborhood disadvantage and cycling for transport. In terms of total PA, residents of advantaged neighborhoods were more likely to exceed PA recommendations. CONCLUSIONS: Neighborhoods may exert a contextual effect on residents' likelihood of participating in PA. The greater propensity of residents in advantaged neighborhoods to do high levels of total PA may contribute to lower rates of cardiovascular disease and obesity in these areas.
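
The 80% interval odds ratio mentioned in the methods can be sketched as follows. This assumes the commonly used definition for a neighborhood-level covariate in a random-intercept logistic model, exp(beta ± z × sqrt(2 × variance)) with z the 90th percentile of the standard normal; the abstract does not spell out the formula, and the coefficient and variance values below are hypothetical.

```python
import math
from statistics import NormalDist


def interval_odds_ratio(beta, neighbourhood_variance, coverage=0.80):
    """Interval odds ratio for a cluster-level covariate (assumed definition).

    beta: log-odds coefficient of the neighborhood-level covariate.
    neighbourhood_variance: between-neighborhood variance of the random intercept.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # about 1.2816 for 80%
    spread = z * math.sqrt(2.0 * neighbourhood_variance)
    return math.exp(beta - spread), math.exp(beta + spread)


# Hypothetical values, not taken from the study.
ior_lower, ior_upper = interval_odds_ratio(beta=0.40, neighbourhood_variance=0.05)
print(f"IOR-80: {ior_lower:.2f} to {ior_upper:.2f}")
```

A narrow interval means the covariate effect is large relative to the unexplained variation between neighborhoods; an interval containing 1 suggests that the residual between-neighborhood variation is large relative to the effect.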