861 resultados para markov chain model
Resumo:
Intuitively, any ‘bag of words’ approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies to more useful statistics. This is done in three steps. The term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation then just the document’s initial distribution. A secondary contribution is to investigate the practical application of this representation in case the queries become increasingly verbose. In the experiments (based on Lemur’s search engine substrate) the default query model was replaced by the stable distribution of the query. Just modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par or better than more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.
Resumo:
Soil-based emissions of nitrous oxide (N2O), a well-known greenhouse gas, have been associated with changes in soil water-filled pore space (WFPS) and soil temperature in many previous studies. However, it is acknowledged that the environment-N2O relationship is complex and still relatively poorly unknown. In this article, we employed a Bayesian model selection approach (Reversible jump Markov chain Monte Carlo) to develop a data-informed model of the relationship between daily N2O emissions and daily WFPS and soil temperature measurements between March 2007 and February 2009 from a soil under pasture in Queensland, Australia, taking seasonal factors and time-lagged effects into account. The model indicates a very strong relationship between a hybrid seasonal structure and daily N2O emission, with the latter substantially increased in summer. Given the other variables in the model, daily soil WFPS, lagged by a week, had a negative influence on daily N2O; there was evidence of a nonlinear positive relationship between daily soil WFPS and daily N2O emission; and daily soil temperature tended to have a linear positive relationship with daily N2O emission when daily soil temperature was above a threshold of approximately 19°C. We suggest that this flexible Bayesian modeling approach could facilitate greater understanding of the shape of the covariate-N2O flux relation and detection of effect thresholds in the natural temporal variation of environmental variables on N2O emission.
Resumo:
Increasing global competition, rapid technological changes, advances in manufacturing and information technology and discerning customers are forcing supply chains to adopt improvement practices that enable them to deliver high quality products at a lower cost and in a shorter period of time. A lean initiative is one of the most effective approaches toward achieving this goal. In the lean improvement process, it is critical to measure current and desired performance level in order to clearly evaluate the lean implementation efforts. Many attempts have tried to measure supply chain performance incorporating both quantitative and qualitative measures but failed to provide an effective method of measuring improvements in performances for dynamic lean supply chain situations. Therefore, the necessity of appropriate measurement of lean supply chain performance has become imperative. There are many lean tools available for supply chains; however, effectiveness of a lean tool depends on the type of the product and supply chain. One tool may be highly effective for a supply chain involved in high volume products but may not be effective for low volume products. There is currently no systematic methodology available for selecting appropriate lean strategies based on the type of supply chain and market strategy This thesis develops an effective method to measure the performance of supply chain consisting of both quantitative and qualitative metrics and investigates the effects of product types and lean tool selection on the supply chain performance Supply chain performance matrices and the effects of various lean tools over performance metrics mentioned in the SCOR framework have been investigated. A lean supply chain model based on the SCOR metric framework is then developed where non- lean and lean as well as quantitative and qualitative metrics are incorporated in appropriate metrics. The values of appropriate metrics are converted into triangular fuzzy numbers using similarity rules and heuristic methods. Data have been collected from an apparel manufacturing company for multiple supply chain products and then a fuzzy based method is applied to measure the performance improvements in supply chains. Using the fuzzy TOPSIS method, which chooses an optimum alternative to maximise similarities with positive ideal solutions and to minimise similarities with negative ideal solutions, the performances of lean and non- lean supply chain situations for three different apparel products have been evaluated. To address the research questions related to effective performance evaluation method and the effects of lean tools over different types of supply chains; a conceptual framework and two hypotheses are investigated. Empirical results show that implementation of lean tools have significant effects over performance improvements in terms of time, quality and flexibility. Fuzzy TOPSIS based method developed is able to integrate multiple supply chain matrices onto a single performance measure while lean supply chain model incorporates qualitative and quantitative metrics. It can therefore effectively measure the improvements for supply chain after implementing lean tools. It is demonstrated that product types involved in the supply chain and ability to select right lean tools have significant effect on lean supply chain performance. Future study can conduct multiple case studies in different contexts.
Resumo:
This study considered the problem of predicting survival, based on three alternative models: a single Weibull, a mixture of Weibulls and a cure model. Instead of the common procedure of choosing a single “best” model, where “best” is defined in terms of goodness of fit to the data, a Bayesian model averaging (BMA) approach was adopted to account for model uncertainty. This was illustrated using a case study in which the aim was the description of lymphoma cancer survival with covariates given by phenotypes and gene expression. The results of this study indicate that if the sample size is sufficiently large, one of the three models emerge as having highest probability given the data, as indicated by the goodness of fit measure; the Bayesian information criterion (BIC). However, when the sample size was reduced, no single model was revealed as “best”, suggesting that a BMA approach would be appropriate. Although a BMA approach can compromise on goodness of fit to the data (when compared to the true model), it can provide robust predictions and facilitate more detailed investigation of the relationships between gene expression and patient survival. Keywords: Bayesian modelling; Bayesian model averaging; Cure model; Markov Chain Monte Carlo; Mixture model; Survival analysis; Weibull distribution
Resumo:
Approximate Bayesian Computation’ (ABC) represents a powerful methodology for the analysis of complex stochastic systems for which the likelihood of the observed data under an arbitrary set of input parameters may be entirely intractable – the latter condition rendering useless the standard machinery of tractable likelihood-based, Bayesian statistical inference [e.g. conventional Markov chain Monte Carlo (MCMC) simulation]. In this paper, we demonstrate the potential of ABC for astronomical model analysis by application to a case study in the morphological transformation of high-redshift galaxies. To this end, we develop, first, a stochastic model for the competing processes of merging and secular evolution in the early Universe, and secondly, through an ABC-based comparison against the observed demographics of massive (Mgal > 1011 M⊙) galaxies (at 1.5 < z < 3) in the Cosmic Assembly Near-IR Deep Extragalatic Legacy Survey (CANDELS)/Extended Groth Strip (EGS) data set we derive posterior probability densities for the key parameters of this model. The ‘Sequential Monte Carlo’ implementation of ABC exhibited herein, featuring both a self-generating target sequence and self-refining MCMC kernel, is amongst the most efficient of contemporary approaches to this important statistical algorithm. We highlight as well through our chosen case study the value of careful summary statistic selection, and demonstrate two modern strategies for assessment and optimization in this regard. Ultimately, our ABC analysis of the high-redshift morphological mix returns tight constraints on the evolving merger rate in the early Universe and favours major merging (with disc survival or rapid reformation) over secular evolution as the mechanism most responsible for building up the first generation of bulges in early-type discs.
Resumo:
Analytically or computationally intractable likelihood functions can arise in complex statistical inferential problems making them inaccessible to standard Bayesian inferential methods. Approximate Bayesian computation (ABC) methods address such inferential problems by replacing direct likelihood evaluations with repeated sampling from the model. ABC methods have been predominantly applied to parameter estimation problems and less to model choice problems due to the added difficulty of handling multiple model spaces. The ABC algorithm proposed here addresses model choice problems by extending Fearnhead and Prangle (2012, Journal of the Royal Statistical Society, Series B 74, 1–28) where the posterior mean of the model parameters estimated through regression formed the summary statistics used in the discrepancy measure. An additional stepwise multinomial logistic regression is performed on the model indicator variable in the regression step and the estimated model probabilities are incorporated into the set of summary statistics for model choice purposes. A reversible jump Markov chain Monte Carlo step is also included in the algorithm to increase model diversity for thorough exploration of the model space. This algorithm was applied to a validating example to demonstrate the robustness of the algorithm across a wide range of true model probabilities. Its subsequent use in three pathogen transmission examples of varying complexity illustrates the utility of the algorithm in inferring preference of particular transmission models for the pathogens.
Resumo:
This paper develops maximum likelihood (ML) estimation schemes for finite-state semi-Markov chains in white Gaussian noise. We assume that the semi-Markov chain is characterised by transition probabilities of known parametric from with unknown parameters. We reformulate this hidden semi-Markov model (HSM) problem in the scalar case as a two-vector homogeneous hidden Markov model (HMM) problem in which the state consist of the signal augmented by the time to last transition. With this reformulation we apply the expectation Maximumisation (EM ) algorithm to obtain ML estimates of the transition probabilities parameters, Markov state levels and noise variance. To demonstrate our proposed schemes, motivated by neuro-biological applications, we use a damped sinusoidal parameterised function for the transition probabilities.
Resumo:
Introduced in this paper is a Bayesian model for isolating the resonant frequency from combustion chamber resonance. The model shown in this paper focused on characterising the initial rise in the resonant frequency to investigate the rise of in-cylinder bulk temperature associated with combustion. By resolving the model parameters, it is possible to determine: the start of pre-mixed combustion, the start of diffusion combustion, the initial resonant frequency, the resonant frequency as a function of crank angle, the in-cylinder bulk temperature as a function of crank angle and the trapped mass as a function of crank angle. The Bayesian method allows for individual cycles to be examined without cycle-averaging|allowing inter-cycle variability studies. Results are shown for a turbo-charged, common-rail compression ignition engine run at 2000 rpm and full load.
Resumo:
In this paper we present a new method for performing Bayesian parameter inference and model choice for low count time series models with intractable likelihoods. The method involves incorporating an alive particle filter within a sequential Monte Carlo (SMC) algorithm to create a novel pseudo-marginal algorithm, which we refer to as alive SMC^2. The advantages of this approach over competing approaches is that it is naturally adaptive, it does not involve between-model proposals required in reversible jump Markov chain Monte Carlo and does not rely on potentially rough approximations. The algorithm is demonstrated on Markov process and integer autoregressive moving average models applied to real biological datasets of hospital-acquired pathogen incidence, animal health time series and the cumulative number of poison disease cases in mule deer.
Resumo:
In this paper, we examine approaches to estimate a Bayesian mixture model at both single and multiple time points for a sample of actual and simulated aerosol particle size distribution (PSD) data. For estimation of a mixture model at a single time point, we use Reversible Jump Markov Chain Monte Carlo (RJMCMC) to estimate mixture model parameters including the number of components which is assumed to be unknown. We compare the results of this approach to a commonly used estimation method in the aerosol physics literature. As PSD data is often measured over time, often at small time intervals, we also examine the use of an informative prior for estimation of the mixture parameters which takes into account the correlated nature of the parameters. The Bayesian mixture model offers a promising approach, providing advantages both in estimation and inference.
Resumo:
Markov random fields (MRF) are popular in image processing applications to describe spatial dependencies between image units. Here, we take a look at the theory and the models of MRFs with an application to improve forest inventory estimates. Typically, autocorrelation between study units is a nuisance in statistical inference, but we take an advantage of the dependencies to smooth noisy measurements by borrowing information from the neighbouring units. We build a stochastic spatial model, which we estimate with a Markov chain Monte Carlo simulation method. The smooth values are validated against another data set increasing our confidence that the estimates are more accurate than the originals.
Resumo:
This paper considers antenna selection (AS) at a receiver equipped with multiple antenna elements but only a single radio frequency chain for packet reception. As information about the channel state is acquired using training symbols (pilots), the receiver makes its AS decisions based on noisy channel estimates. Additional information that can be exploited for AS includes the time-correlation of the wireless channel and the results of the link-layer error checks upon receiving the data packets. In this scenario, the task of the receiver is to sequentially select (a) the pilot symbol allocation, i.e., how to distribute the available pilot symbols among the antenna elements, for channel estimation on each of the receive antennas; and (b) the antenna to be used for data packet reception. The goal is to maximize the expected throughput, based on the past history of allocation and selection decisions, and the corresponding noisy channel estimates and error check results. Since the channel state is only partially observed through the noisy pilots and the error checks, the joint problem of pilot allocation and AS is modeled as a partially observed Markov decision process (POMDP). The solution to the POMDP yields the policy that maximizes the long-term expected throughput. Using the Finite State Markov Chain (FSMC) model for the wireless channel, the performance of the POMDP solution is compared with that of other existing schemes, and it is illustrated through numerical evaluation that the POMDP solution significantly outperforms them.
Resumo:
Quantifying distributional behavior of extreme events is crucial in hydrologic designs. Intensity Duration Frequency (IDF) relationships are used extensively in engineering especially in urban hydrology, to obtain return level of extreme rainfall event for a specified return period and duration. Major sources of uncertainty in the IDF relationships are due to insufficient quantity and quality of data leading to parameter uncertainty due to the distribution fitted to the data and uncertainty as a result of using multiple GCMs. It is important to study these uncertainties and propagate them to future for accurate assessment of return levels for future. The objective of this study is to quantify the uncertainties arising from parameters of the distribution fitted to data and the multiple GCM models using Bayesian approach. Posterior distribution of parameters is obtained from Bayes rule and the parameters are transformed to obtain return levels for a specified return period. Markov Chain Monte Carlo (MCMC) method using Metropolis Hastings algorithm is used to obtain the posterior distribution of parameters. Twenty six CMIP5 GCMs along with four RCP scenarios are considered for studying the effects of climate change and to obtain projected IDF relationships for the case study of Bangalore city in India. GCM uncertainty due to the use of multiple GCMs is treated using Reliability Ensemble Averaging (REA) technique along with the parameter uncertainty. Scale invariance theory is employed for obtaining short duration return levels from daily data. It is observed that the uncertainty in short duration rainfall return levels is high when compared to the longer durations. Further it is observed that parameter uncertainty is large compared to the model uncertainty. (C) 2015 Elsevier Ltd. All rights reserved.
Resumo:
This work addresses the problem of estimating the optimal value function in a Markov Decision Process from observed state-action pairs. We adopt a Bayesian approach to inference, which allows both the model to be estimated and predictions about actions to be made in a unified framework, providing a principled approach to mimicry of a controller on the basis of observed data. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from theposterior distribution over the optimal value function. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.