911 resultados para Continuous-time Markov Chain


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper deals with exponential stability of discrete-time singular systems with Markov jump parameters. We propose a set of coupled generalized Lyapunov equations (CGLE) that provides sufficient conditions to check this property for this class of systems. A method for solving the obtained CGLE is also presented, based on iterations of standard singular Lyapunov equations. We present also a numerical example to illustrate the effectiveness of the approach we are proposing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work we compared the estimates of the parameters of ARCH models using a complete Bayesian method and an empirical Bayesian method in which we adopted a non-informative prior distribution and informative prior distribution, respectively. We also considered a reparameterization of those models in order to map the space of the parameters into real space. This procedure permits choosing prior normal distributions for the transformed parameters. The posterior summaries were obtained using Monte Carlo Markov chain methods (MCMC). The methodology was evaluated by considering the Telebras series from the Brazilian financial market. The results show that the two methods are able to adjust ARCH models with different numbers of parameters. The empirical Bayesian method provided a more parsimonious model to the data and better adjustment than the complete Bayesian method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper introduces a novel approach to making inference about the regression parameters in the accelerated failure time (AFT) model for current status and interval censored data. The estimator is constructed by inverting a Wald type test for testing a null proportional hazards model. A numerically efficient Markov chain Monte Carlo (MCMC) based resampling method is proposed to simultaneously obtain the point estimator and a consistent estimator of its variance-covariance matrix. We illustrate our approach with interval censored data sets from two clinical studies. Extensive numerical studies are conducted to evaluate the finite sample performance of the new estimators.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The tobacco-specific nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) is an obvious carcinogen for lung cancer. Since CBMN (Cytokinesis-blocked micronucleus) has been found to be extremely sensitive to NNK-induced genetic damage, it is a potential important factor to predict the lung cancer risk. However, the association between lung cancer and NNK-induced genetic damage measured by CBMN assay has not been rigorously examined. ^ This research develops a methodology to model the chromosomal changes under NNK-induced genetic damage in a logistic regression framework in order to predict the occurrence of lung cancer. Since these chromosomal changes were usually not observed very long due to laboratory cost and time, a resampling technique was applied to generate the Markov chain of the normal and the damaged cell for each individual. A joint likelihood between the resampled Markov chains and the logistic regression model including transition probabilities of this chain as covariates was established. The Maximum likelihood estimation was applied to carry on the statistical test for comparison. The ability of this approach to increase discriminating power to predict lung cancer was compared to a baseline "non-genetic" model. ^ Our method offered an option to understand the association between the dynamic cell information and lung cancer. Our study indicated the extent of DNA damage/non-damage using the CBMN assay provides critical information that impacts public health studies of lung cancer risk. This novel statistical method could simultaneously estimate the process of DNA damage/non-damage and its relationship with lung cancer for each individual.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Los fundamentos de la Teoría de la Decisión Bayesiana proporcionan un marco coherente en el que se pueden resolver los problemas de toma de decisiones. La creciente disponibilidad de ordenadores potentes está llevando a tratar problemas cada vez más complejos con numerosas fuentes de incertidumbre multidimensionales; varios objetivos conflictivos; preferencias, metas y creencias cambiantes en el tiempo y distintos grupos afectados por las decisiones. Estos factores, a su vez, exigen mejores herramientas de representación de problemas; imponen fuertes restricciones cognitivas sobre los decisores y conllevan difíciles problemas computacionales. Esta tesis tratará estos tres aspectos. En el Capítulo 1, proporcionamos una revisión crítica de los principales métodos gráficos de representación y resolución de problemas, concluyendo con algunas recomendaciones fundamentales y generalizaciones. Nuestro segundo comentario nos lleva a estudiar tales métodos cuando sólo disponemos de información parcial sobre las preferencias y creencias del decisor. En el Capítulo 2, estudiamos este problema cuando empleamos diagramas de influencia (DI). Damos un algoritmo para calcular las soluciones no dominadas en un DI y analizamos varios conceptos de solución ad hoc. El último aspecto se estudia en los Capítulos 3 y 4. Motivado por una aplicación de gestión de embalses, introducimos un método heurístico para resolver problemas de decisión secuenciales. Como muestra resultados muy buenos, extendemos la idea a problemas secuenciales generales y cuantificamos su bondad. Exploramos después en varias direcciones la aplicación de métodos de simulación al Análisis de Decisiones. Introducimos primero métodos de Monte Cario para aproximar el conjunto no dominado en problemas continuos. Después, proporcionamos un método de Monte Cario basado en cadenas de Markov para problemas con información completa con estructura general: las decisiones y las variables aleatorias pueden ser continuas, y la función de utilidad puede ser arbitraria. Nuestro esquema es aplicable a muchos problemas modelizados como DI. Finalizamos con un capítulo de conclusiones y problemas abiertos.---ABSTRACT---The foundations of Bayesian Decisión Theory provide a coherent framework in which decisión making problems may be solved. With the advent of powerful computers and given the many challenging problems we face, we are gradually attempting to solve more and more complex decisión making problems with high and multidimensional uncertainty, múltiple objectives, influence of time over decisión tasks and influence over many groups. These complexity factors demand better representation tools for decisión making problems; place strong cognitive demands on the decison maker judgements; and lead to involved computational problems. This thesis will deal with these three topics. In recent years, many representation tools have been developed for decisión making problems. In Chapter 1, we provide a critical review of most of them and conclude with recommendations and generalisations. Given our second query, we could wonder how may we deal with those representation tools when there is only partial information. In Chapter 2, we find out how to deal with such a problem when it is structured as an influence diagram (ID). We give an algorithm to compute nondominated solutions in ID's and analyse several ad hoc solution concepts.- The last issue is studied in Chapters 3 and 4. In a reservoir management case study, we have introduced a heuristic method for solving sequential decisión making problems. Since it shows very good performance, we extend the idea to general problems and quantify its goodness. We explore then in several directions the application of simulation based methods to Decisión Analysis. We first introduce Monte Cario methods to approximate the nondominated set in continuous problems. Then, we provide a Monte Cario Markov Chain method for problems under total information with general structure: decisions and random variables may be continuous, and the utility function may be arbitrary. Our scheme is applicable to many problems modeled as IDs. We conclude with discussions and several open problems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thesis (Master's)--University of Washington, 2016-06

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Key words: Markov-modulated queues, waiting time, heavy traffic.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.

Latent class models for the joint distribution of multivariate categorical, such as the PARAFAC decomposition, data play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Statistical methodology is proposed for comparing molecular shapes. In order to account for the continuous nature of molecules, classical shape analysis methods are combined with techniques used for predicting random fields in spatial statistics. Applying a modification of Procrustes analysis, Bayesian inference is carried out using Markov chain Monte Carlo methods for the pairwise alignment of the resulting molecular fields. Superimposing entire fields rather than the configuration matrices of nuclear positions thereby solves the problem that there is usually no clear one--to--one correspondence between the atoms of the two molecules under consideration. Using a similar concept, we also propose an adaptation of the generalised Procrustes analysis algorithm for the simultaneous alignment of multiple molecular fields. The methodology is applied to a dataset of 31 steroid molecules.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Financial processes may possess long memory and their probability densities may display heavy tails. Many models have been developed to deal with this tail behaviour, which reflects the jumps in the sample paths. On the other hand, the presence of long memory, which contradicts the efficient market hypothesis, is still an issue for further debates. These difficulties present challenges with the problems of memory detection and modelling the co-presence of long memory and heavy tails. This PhD project aims to respond to these challenges. The first part aims to detect memory in a large number of financial time series on stock prices and exchange rates using their scaling properties. Since financial time series often exhibit stochastic trends, a common form of nonstationarity, strong trends in the data can lead to false detection of memory. We will take advantage of a technique known as multifractal detrended fluctuation analysis (MF-DFA) that can systematically eliminate trends of different orders. This method is based on the identification of scaling of the q-th-order moments and is a generalisation of the standard detrended fluctuation analysis (DFA) which uses only the second moment; that is, q = 2. We also consider the rescaled range R/S analysis and the periodogram method to detect memory in financial time series and compare their results with the MF-DFA. An interesting finding is that short memory is detected for stock prices of the American Stock Exchange (AMEX) and long memory is found present in the time series of two exchange rates, namely the French franc and the Deutsche mark. Electricity price series of the five states of Australia are also found to possess long memory. For these electricity price series, heavy tails are also pronounced in their probability densities. The second part of the thesis develops models to represent short-memory and longmemory financial processes as detected in Part I. These models take the form of continuous-time AR(∞) -type equations whose kernel is the Laplace transform of a finite Borel measure. By imposing appropriate conditions on this measure, short memory or long memory in the dynamics of the solution will result. A specific form of the models, which has a good MA(∞) -type representation, is presented for the short memory case. Parameter estimation of this type of models is performed via least squares, and the models are applied to the stock prices in the AMEX, which have been established in Part I to possess short memory. By selecting the kernel in the continuous-time AR(∞) -type equations to have the form of Riemann-Liouville fractional derivative, we obtain a fractional stochastic differential equation driven by Brownian motion. This type of equations is used to represent financial processes with long memory, whose dynamics is described by the fractional derivative in the equation. These models are estimated via quasi-likelihood, namely via a continuoustime version of the Gauss-Whittle method. The models are applied to the exchange rates and the electricity prices of Part I with the aim of confirming their possible long-range dependence established by MF-DFA. The third part of the thesis provides an application of the results established in Parts I and II to characterise and classify financial markets. We will pay attention to the New York Stock Exchange (NYSE), the American Stock Exchange (AMEX), the NASDAQ Stock Exchange (NASDAQ) and the Toronto Stock Exchange (TSX). The parameters from MF-DFA and those of the short-memory AR(∞) -type models will be employed in this classification. We propose the Fisher discriminant algorithm to find a classifier in the two and three-dimensional spaces of data sets and then provide cross-validation to verify discriminant accuracies. This classification is useful for understanding and predicting the behaviour of different processes within the same market. The fourth part of the thesis investigates the heavy-tailed behaviour of financial processes which may also possess long memory. We consider fractional stochastic differential equations driven by stable noise to model financial processes such as electricity prices. The long memory of electricity prices is represented by a fractional derivative, while the stable noise input models their non-Gaussianity via the tails of their probability density. A method using the empirical densities and MF-DFA will be provided to estimate all the parameters of the model and simulate sample paths of the equation. The method is then applied to analyse daily spot prices for five states of Australia. Comparison with the results obtained from the R/S analysis, periodogram method and MF-DFA are provided. The results from fractional SDEs agree with those from MF-DFA, which are based on multifractal scaling, while those from the periodograms, which are based on the second order, seem to underestimate the long memory dynamics of the process. This highlights the need and usefulness of fractal methods in modelling non-Gaussian financial processes with long memory.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

PURPOSE: To examine the association between neighborhood disadvantage and physical activity (PA). ---------- METHODS: We use data from the HABITAT multilevel longitudinal study of PA among mid-aged (40-65 years) men and women (n=11, 037, 68.5% response rate) living in 200 neighborhoods in Brisbane, Australia. PA was measured using three questions from the Active Australia Survey (general walking, moderate, and vigorous activity), one indicator of total activity, and two questions about walking and cycling for transport. The PA measures were operationalized using multiple categories based on time and estimated energy expenditure that were interpretable with reference to the latest PA recommendations. The association between neighborhood disadvantage and PA was examined using multilevel multinomial logistic regression and Markov Chain Monte Carlo simulation. The contribution of neighborhood disadvantage to between-neighborhood variation in PA was assessed using the 80% interval odds ratio. ---------- RESULTS: After adjustment for sex, age, living arrangement, education, occupation, and household income, reported participation in all measures and levels of PA varied significantly across Brisbane’s neighborhoods, and neighborhood disadvantage accounted for some of this variation. Residents of advantaged neighborhoods reported significantly higher levels of total activity, general walking, moderate, and vigorous activity; however, they were less likely to walk for transport. There was no statistically significant association between neighborhood disadvantage and cycling for transport. In terms of total PA, residents of advantaged neighborhoods were more likely to exceed PA recommendations. ---------- CONCLUSIONS: Neighborhoods may exert a contextual effect on residents’ likelihood of participating in PA. The greater propensity of residents in advantaged neighborhoods to do high levels of total PA may contribute to lower rates of cardiovascular disease and obesity in these areas

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of zeroes recorded. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov Random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers is suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy. This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (N Cs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the Integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter Autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extensions allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Now in its second edition, this book describes tools that are commonly used in transportation data analysis. The first part of the text provides statistical fundamentals while the second part presents continuous dependent variable models. With a focus on count and discrete dependent variable models, the third part features new chapters on mixed logit models, logistic regression, and ordered probability models. The last section provides additional coverage of Bayesian statistical modeling, including Bayesian inference and Markov chain Monte Carlo methods. Data sets are available online to use with the modeling techniques discussed.