150 resultados para Classification, Markov chain Monte Carlo, k-nearest neighbours

em Queensland University of Technology - ePrints Archive


Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Markov chain Monte Carlo (MCMC) estimation provides a solution to the complex integration problems that are faced in the Bayesian analysis of statistical problems. The implementation of MCMC algorithms is, however, code intensive and time consuming. We have developed a Python package, which is called PyMCMC, that aids in the construction of MCMC samplers and helps to substantially reduce the likelihood of coding error, as well as aid in the minimisation of repetitive code. PyMCMC contains classes for Gibbs, Metropolis Hastings, independent Metropolis Hastings, random walk Metropolis Hastings, orientational bias Monte Carlo and slice samplers as well as specific modules for common models such as a module for Bayesian regression analysis. PyMCMC is straightforward to optimise, taking advantage of the Python libraries Numpy and Scipy, as well as being readily extensible with C or Fortran.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motor unit number estimation (MUNE) is a method which aims to provide a quantitative indicator of progression of diseases that lead to loss of motor units, such as motor neurone disease. However the development of a reliable, repeatable and fast real-time MUNE method has proved elusive hitherto. Ridall et al. (2007) implement a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm to produce a posterior distribution for the number of motor units using a Bayesian hierarchical model that takes into account biological information about motor unit activation. However we find that the approach can be unreliable for some datasets since it can suffer from poor cross-dimensional mixing. Here we focus on improved inference by marginalising over latent variables to create the likelihood. In particular we explore how this can improve the RJMCMC mixing and investigate alternative approaches that utilise the likelihood (e.g. DIC (Spiegelhalter et al., 2002)). For this model the marginalisation is over latent variables which, for a larger number of motor units, is an intractable summation over all combinations of a set of latent binary variables whose joint sample space increases exponentially with the number of motor units. We provide a tractable and accurate approximation for this quantity and also investigate simulation approaches incorporated into RJMCMC using results of Andrieu and Roberts (2009).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Standard Monte Carlo (sMC) simulation models have been widely used in AEC industry research to address system uncertainties. Although the benefits of probabilistic simulation analyses over deterministic methods are well documented, the sMC simulation technique is quite sensitive to the probability distributions of the input variables. This phenomenon becomes highly pronounced when the region of interest within the joint probability distribution (a function of the input variables) is small. In such cases, the standard Monte Carlo approach is often impractical from a computational standpoint. In this paper, a comparative analysis of standard Monte Carlo simulation to Markov Chain Monte Carlo with subset simulation (MCMC/ss) is presented. The MCMC/ss technique constitutes a more complex simulation method (relative to sMC), wherein a structured sampling algorithm is employed in place of completely randomized sampling. Consequently, gains in computational efficiency can be made. The two simulation methods are compared via theoretical case studies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Both environmental economists and policy makers have shown a great deal of interest in the effect of pollution abatement on environmental efficiency. In line with the modern resources available, however, no contribution is brought to the environmental economics field with the Markov chain Monte Carlo (MCMC) application, which enables simulation from a distribution of a Markov chain and simulating from the chain until it approaches equilibrium. The probability density functions gained prominence with the advantages over classical statistical methods in its simultaneous inference and incorporation of any prior information on all model parameters. This paper concentrated on this point with the application of MCMC to the database of China, the largest developing country with rapid economic growth and serious environmental pollution in recent years. The variables cover the economic output and pollution abatement cost from the year 1992 to 2003. We test the causal direction between pollution abatement cost and environmental efficiency with MCMC simulation. We found that the pollution abatement cost causes an increase in environmental efficiency through the algorithm application, which makes it conceivable that the environmental policy makers should make more substantial measures to reduce pollution in the near future.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The population Monte Carlo algorithm is an iterative importance sampling scheme for solving static problems. We examine the population Monte Carlo algorithm in a simplified setting, a single step of the general algorithm, and study a fundamental problem that occurs in applying importance sampling to high-dimensional problem. The precision of the computed estimate from the simplified setting is measured by the asymptotic variance of estimate under conditions on the importance function. We demonstrate the exponential growth of the asymptotic variance with the dimension and show that the optimal covariance matrix for the importance function can be estimated in special cases.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Here we present a sequential Monte Carlo approach to Bayesian sequential design for the incorporation of model uncertainty. The methodology is demonstrated through the development and implementation of two model discrimination utilities; mutual information and total separation, but it can also be applied more generally if one has different experimental aims. A sequential Monte Carlo algorithm is run for each rival model (in parallel), and provides a convenient estimate of the marginal likelihood (of each model) given the data, which can be used for model comparison and in the evaluation of utility functions. A major benefit of this approach is that it requires very little problem specific tuning and is also computationally efficient when compared to full Markov chain Monte Carlo approaches. This research is motivated by applications in drug development and chemical engineering.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A new transdimensional Sequential Monte Carlo (SMC) algorithm called SM- CVB is proposed. In an SMC approach, a weighted sample of particles is generated from a sequence of probability distributions which â˜convergeâ to the target distribution of interest, in this case a Bayesian posterior distri- bution. The approach is based on the use of variational Bayes to propose new particles at each iteration of the SMCVB algorithm in order to target the posterior more efficiently. The variational-Bayes-generated proposals are not limited to a fixed dimension. This means that the weighted particle sets that arise can have varying dimensions thereby allowing us the option to also estimate an appropriate dimension for the model. This novel algorithm is outlined within the context of finite mixture model estimation. This pro- vides a less computationally demanding alternative to using reversible jump Markov chain Monte Carlo kernels within an SMC approach. We illustrate these ideas in a simulated data analysis and in applications.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Purpose. To explore the role of the neighborhood environment in supporting walking Design. Cross sectional study of 10,286 residents of 200 neighborhoods. Participants were selected using a stratified two-stage cluster design. Data were collected by mail survey (68.5% response rate). Setting. The Brisbane City Local Government Area, Australia, 2007. Subjects. Brisbane residents aged 40 to 65 years. Measures. Environmental: street connectivity, residential density, hilliness, tree coverage, bikeways, and street lights within a one kilometer circular buffer from each residentâs home; and network distance to nearest river or coast, public transport, shop, and park. Walking: minutes in the previous week categorized as < 30 minutes, ⥠30 < 90 minutes, ⥠90 < 150 minutes, ⥠150 < 300 minutes, and ⥠300 minutes. Analysis. The association between each neighborhood characteristic and walking was examined using multilevel multinomial logistic regression and the model parameters were estimated using Markov chain Monte Carlo simulation. Results. After adjustment for individual factors, the likelihood of walking for more than 300 minutes (relative to <30 minutes) was highest in areas with the most connectivity (OR=1.93, 99% CI 1.32-2.80), the greatest residential density (OR=1.47, 99% CI 1.02-2.12), the least tree coverage (OR=1.69, 99% CI 1.13-2.51), the most bikeways (OR=1.60, 99% CI 1.16-2.21), and the most street lights (OR=1.50, 99% CI 1.07-2.11). The likelihood of walking for more than 300 minutes was also higher among those who lived closest to a river or the coast (OR=2.06, 99% CI 1.41-3.02). Conclusion. The likelihood of meeting (and exceeding) physical activity recommendations on the basis of walking was higher in neighborhoods with greater street connectivity and residential density, more street lights and bikeways, closer proximity to waterways, and less tree coverage. Interventions targeting these neighborhood characteristics may lead to improved environmental quality as well as lower rates of overweight and obesity and associated chromic disease.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Methicillin-resistant Staphylococcus Aureus (MRSA) is a pathogen that continues to be of major concern in hospitals. We develop models and computational schemes based on observed weekly incidence data to estimate MRSA transmission parameters. We extend the deterministic model of McBryde, Pettitt, and McElwain (2007, Journal of Theoretical Biology 245, 470â481) involving an underlying population of MRSA colonized patients and health-care workers that describes, among other processes, transmission between uncolonized patients and colonized health-care workers and vice versa. We develop new bivariate and trivariate Markov models to include incidence so that estimated transmission rates can be based directly on new colonizations rather than indirectly on prevalence. Imperfect sensitivity of pathogen detection is modeled using a hidden Markov process. The advantages of our approach include (i) a discrete valued assumption for the number of colonized health-care workers, (ii) two transmission parameters can be incorporated into the likelihood, (iii) the likelihood depends on the number of new cases to improve precision of inference, (iv) individual patient records are not required, and (v) the possibility of imperfect detection of colonization is incorporated. We compare our approach with that used by McBryde et al. (2007) based on an approximation that eliminates the health-care workers from the model, uses Markov chain Monte Carlo and individual patient data. We apply these models to MRSA colonization data collected in a small intensive care unit at the Princess Alexandra Hospital, Brisbane, Australia.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between the number of correct votes and the maximum number of votes received by any incorrect label. We show that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error. We also show theoretically and experimentally that boosting is especially effective at increasing the margins of the training examples. Finally, we compare our explanation to those based on the bias-variance decomposition.