38 results for chaîne de Markov
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
This thesis is concerned with state and parameter estimation in state space models. The estimation of states and parameters is an important task when mathematical modeling is applied to areas such as global positioning systems, target tracking, navigation, brain imaging, the spread of infectious diseases, biological processes, telecommunications, audio signal processing, stochastic optimal control, machine learning, and physical systems. In Bayesian settings, the estimation of states or parameters amounts to computing the posterior probability density function. Except for a very restricted class of models, it is impossible to compute this density function in closed form. Hence, we need approximation methods. A state estimation problem involves estimating the states (latent variables) that are not directly observed in the output of the system. In this thesis, we use the Kalman filter, extended Kalman filter, Gauss–Hermite filters, and particle filters to estimate the states based on available measurements. Among these filters, particle filters are numerical methods for approximating the filtering distributions of non-linear non-Gaussian state space models via Monte Carlo. The performance of a particle filter depends heavily on the chosen importance distribution. For instance, an inappropriate choice of the importance distribution can cause the particle filter algorithm to fail to converge. In this thesis, we analyze the theoretical Lᵖ convergence of particle filters with general importance distributions, where p ≥ 2 is an integer. A parameter estimation problem is concerned with inferring the model parameters from measurements. For high-dimensional complex models, estimation of parameters can be done by Markov chain Monte Carlo (MCMC) methods. In its operation, the MCMC method requires the unnormalized posterior distribution of the parameters and a proposal distribution.
In this thesis, we show how the posterior density function of the parameters of a state space model can be computed by filtering-based methods, where the states are integrated out. This type of computation is then applied to estimate the parameters of stochastic differential equations. Furthermore, we compute the partial derivatives of the log-posterior density function and use the hybrid Monte Carlo and scaled conjugate gradient methods to infer the parameters of stochastic differential equations. The computational efficiency of MCMC methods depends heavily on the chosen proposal distribution. A commonly used proposal distribution is Gaussian. For this kind of proposal, the covariance matrix must be well tuned. To tune it, adaptive MCMC methods can be used. In this thesis, we propose a new way of updating the covariance matrix using the variational Bayesian adaptive Kalman filter algorithm.
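As a rough illustration of how a particle filter approximates a filtering distribution by Monte Carlo, the following is a minimal sketch of a bootstrap particle filter, in which the importance distribution is simply the state transition prior. The toy linear-Gaussian model, its parameter values, and the names `simulate` and `bootstrap_particle_filter` are assumptions made for this example and are not taken from the thesis.

```python
import math
import random


def simulate(n_steps, a=0.9, process_std=1.0, obs_std=0.5, seed=1):
    """Simulate the assumed toy model x_t = a*x_{t-1} + v_t, y_t = x_t + e_t."""
    rng = random.Random(seed)
    x, states, observations = 0.0, [], []
    for _ in range(n_steps):
        x = a * x + rng.gauss(0.0, process_std)
        states.append(x)
        observations.append(x + rng.gauss(0.0, obs_std))
    return states, observations


def bootstrap_particle_filter(observations, n_particles=500, a=0.9,
                              process_std=1.0, obs_std=0.5, seed=0):
    """Return the estimated filtering mean at each time step."""
    rng = random.Random(seed)
    # Initial particles from an assumed N(0, 1) prior.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # Propose from the transition prior (the bootstrap importance distribution).
        particles = [a * x + rng.gauss(0.0, process_std) for x in particles]
        # Weight each particle by the Gaussian measurement likelihood (log domain).
        log_w = [-0.5 * ((y - x) / obs_std) ** 2 for x in particles]
        max_lw = max(log_w)
        weights = [math.exp(lw - max_lw) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average approximates the filtering mean.
        means.append(sum(w * x for w, x in zip(weights, particles)))
        # Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


states, observations = simulate(50)
estimates = bootstrap_particle_filter(observations)
mean_abs_error = sum(abs(e - s) for e, s in zip(estimates, states)) / len(states)
```

Other importance distributions would change only the proposal step and the weight formula, which is where the choice discussed in the abstract enters.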
Abstract:
Financial time series tend to change their behavior abruptly and to maintain the new behavior for several consecutive periods, and commodity futures returns are no exception. This property suggests that nonlinear models, as opposed to linear models, can describe returns and volatility more accurately. Markov regime switching models are able to match this behavior and have become a popular way to model financial time series. This study uses a Markov regime switching model to describe the behavior of energy futures returns at the level of individual commodities, because studies show that commodity futures are a heterogeneous asset class. The purpose of this thesis is twofold. First, to determine how many regimes characterize individual energy commodities' returns at different return frequencies. Second, to study the characteristics of these regimes. We extend previous studies on the subject in two ways: we allow for the possibility that the number of regimes may exceed two, and we conduct the research on individual commodities rather than on commodity indices or subgroups of these indices. We use daily, weekly and monthly time series of Brent crude oil, WTI crude oil, natural gas, heating oil and gasoil futures returns over 1994–2014, where available, to carry out the study. We apply the likelihood ratio test to determine the sufficient number of regimes for each commodity and data frequency. The time series are then modeled with a Markov regime switching model to obtain the return distribution characteristics of each regime, as well as the transition probabilities of moving between regimes. The results for the number of regimes suggest that daily energy futures return series consist of three to six regimes, whereas weekly and monthly returns for all energy commodities display only two regimes. When the number of regimes exceeds two, there is a tendency for the time series of energy commodities to form groups of regimes.
These groups are usually quite persistent as a whole, because the probability of switching between regimes within the group is high. However, the individual regimes in these groups are not persistent, and the process oscillates between them frequently. Regimes that are not part of any group are generally persistent but show low ergodic probability, i.e. they rarely prevail in the market. This study also suggests that energy futures return series characterized by two regimes do not necessarily display persistent bull and bear regimes. In fact, for the majority of the time series, the bearish regime is considerably less persistent.
Financial time series tend to change their behavior unpredictably and to continue this new behavior for several periods, and commodity futures returns are no exception. Because of this property, nonlinear models, rather than linear ones, can describe for example the parameters of the return distribution more accurately. Markov regime switching models can capture this property and have therefore become popular for modeling financial time series. This study uses a Markov regime switching model to describe the behavior of the returns of individual energy futures, because studies show that commodity futures are a very heterogeneous asset class. The purpose of the study is to determine how many regimes are needed to describe energy futures returns at different return frequencies and what the characteristics of these regimes are. Previous research on the subject is extended by determining the number of regimes by means of statistical testing and by studying energy futures individually rather than at the index or sub-index level. The study uses daily, weekly and monthly time series of the returns of Brent crude oil, WTI crude oil, natural gas, heating oil and gasoil over 1994–2014, to the extent that data are available.
The number of regimes is estimated for every time series with the likelihood ratio test, after which the Markov regime switching model is used to determine the characteristics of the return distributions of the individual regimes and the transition matrix between the regimes. The results on the number of regimes show that in the daily return series of energy futures the number of regimes varies between three and six. For weekly and monthly returns, the number of regimes is two in the processes of all energy futures. When there are more than two regimes, the process tends to form groups of regimes. The process usually stays within a group for a long time, because the probability of moving between the regimes belonging to the group is high. The individual regimes within a group, however, are not very persistent. The process therefore switches frequently between the regimes within the group. Regimes that do not belong to a group are usually persistent, but the process drifts into them only rarely, because the probability of moving into them from other regimes is small. The results also show that in processes governed by two regimes, these regimes are not necessarily persistent bull and bear market states. Instead, the results show that the bear market state is clearly less persistent in energy futures.
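The difference between a regime's persistence and its ergodic probability can be made concrete with a small sketch. Persistence is governed by the diagonal of the transition matrix (the expected stay in regime i is 1 / (1 - p_ii) periods), while the ergodic probabilities are the stationary distribution of the chain. The two-regime matrix below is a hypothetical example, not an estimate from the study, and the function names are assumptions.

```python
def ergodic_probabilities(P, n_iter=10_000, tol=1e-12):
    """Stationary distribution of a row-stochastic matrix P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        # One step of pi <- pi P.
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def expected_durations(P):
    """Expected number of consecutive periods spent in each regime."""
    return [1.0 / (1.0 - P[i][i]) for i in range(len(P))]


# Hypothetical transition matrix: row i holds the probabilities of moving
# from regime i to each regime in the next period.
P = [[0.99, 0.01],
     [0.05, 0.95]]

pi = ergodic_probabilities(P)
durations = expected_durations(P)
```

In this example the second regime is persistent (an expected stay of about 20 periods) yet has an ergodic probability of only about 1/6, because the chain rarely enters it.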
Abstract:
Even though a large amount of evidence suggests that the PP2A serine/threonine protein phosphatase acts as a tumour suppressor, the genomics data supporting this claim are limited. We fit a sparse binary Markov random field, with each sample's total mutational frequency as an additional covariate, to model the dependencies between the mutations occurring in the PP2A-encoding genes. We use data from recent large-scale cancer genomics studies in which the whole genome of a human tumour biopsy has been analysed. Our results show a complex network of interactions between the occurrence of mutations in the twenty genes examined. According to our analysis, the mutations occurring in the genes PPP2R1A, PPP2R3A, and PPP2R2B are the key mutations. These genes form the core of the network of conditional dependencies between the mutations in the twenty genes investigated. Additionally, we note that the mutations occurring in PPP2R4 seem to be more influential in samples with a higher total number of mutations. The mutations occurring in the set of genes suggested by our results have been shown to contribute to the transformation of human cells. We conclude that our evidence further supports the claim that PP2A acts as a tumour suppressor and that restoring PP2A activity is an appealing therapeutic strategy.
Abstract:
One of the most central tasks in the statistical analysis of mathematical models is the estimation of the models' unknown parameters. This Master's thesis is concerned with the distributions of the unknown parameters and with numerical methods suitable for constructing them, especially in cases where the model is nonlinear with respect to the parameters. Among the various numerical methods, the main emphasis is on Markov chain Monte Carlo (MCMC) methods. These computationally intensive methods have recently grown in popularity, mainly owing to increased computing power. The theory of both Markov chains and Monte Carlo simulation is presented to the extent needed to justify that the methods work. Of the recently developed methods, adaptive MCMC methods in particular are examined. The approach of the work is practical, and various issues related to the implementation of MCMC methods are emphasized. In the empirical part of the work, the distributions of the unknown parameters of five example models are examined using the methods presented in the theory part. The models describe chemical reactions and are expressed as systems of ordinary differential equations. The models were collected from chemists at Lappeenranta University of Technology and Åbo Akademi in Turku.
Abstract:
This work presents the development of a modified matrix-geometric technique for the most general queue. The queueing system consists of several queues with limited capacities. The work also studies PH-type distributions when they are split. The resulting Markov chain has a structure made up of independent matrices with a QBD structure. Certain boundary states are also treated in this work. The result of this thesis is the presentation of the system in matrix-geometric form, obtained by modifying the matrix-geometric solution.
Abstract:
This thesis describes an application developed for solving Quasi Birth Death processes. The program is so far unique; it can be used to solve a range of problems and is needed for the analysis of communication systems. A description and a specification of the application are given. A short description of another application for solving Quasi Birth Death process problems is also given.
Abstract:
This work presents new, efficient Markov chain Monte Carlo (MCMC) simulation methods for statistical analysis in various modelling applications. When using MCMC methods, the model is simulated repeatedly to explore the probability distribution describing the uncertainties in model parameters and predictions. In adaptive MCMC methods based on the Metropolis-Hastings algorithm, the proposal distribution needed by the algorithm learns from the target distribution as the simulation proceeds. Adaptive MCMC methods have been the subject of intensive research lately, as they open the way to substantially easier use of the methodology. The lack of user-friendly computer programs has been a major obstacle to wider acceptance of the methods. This work provides two new adaptive MCMC methods: DRAM and AARJ. The DRAM method has been built especially to work in high-dimensional and non-linear problems. The AARJ method is an extension of DRAM to model selection problems, where the mathematical formulation of the model is uncertain and we want to fit several different models to the same observations simultaneously. The methods were developed with the needs of modelling applications typical of the environmental sciences in mind. The development work has been pursued alongside several application projects. The applications presented in this work are: a wintertime oxygen concentration model for Lake Tuusulanjärvi and adaptive control of the aerator; a nutrition model for Lake Pyhäjärvi and lake management planning; validation of the algorithms of the GOMOS ozone remote sensing instrument on board the Envisat satellite of the European Space Agency; and a study of the effects of aerosol model selection on the GOMOS algorithm.
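The idea of a proposal distribution that learns from the chain can be sketched in one dimension as follows. This is a minimal adaptive random-walk Metropolis example under stated assumptions, and it is not the DRAM or AARJ method itself: the function name `adaptive_metropolis`, the 2.4² scaling factor, the `adapt_start` threshold and the Gaussian toy target are all choices made for the illustration.

```python
import math
import random


def adaptive_metropolis(log_target, x0, n_samples=5000, init_std=1.0,
                        adapt_start=200, eps=1e-6, seed=0):
    """1-D random-walk Metropolis whose proposal variance tracks the chain variance."""
    rng = random.Random(seed)
    scale = 2.4 ** 2  # assumed scaling of the empirical variance for one dimension
    x, lp = x0, log_target(x0)
    mean, m2 = x0, 0.0  # running mean and sum of squared deviations (Welford)
    prop_std = init_std
    chain = [x0]
    for n in range(1, n_samples):
        cand = x + rng.gauss(0.0, prop_std)
        lp_cand = log_target(cand)
        # Metropolis acceptance step for a symmetric proposal.
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            x, lp = cand, lp_cand
        chain.append(x)
        # Update the running statistics with the new chain value.
        count = n + 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        # After an initial period, set the proposal from the chain history.
        if count > adapt_start:
            prop_std = math.sqrt(scale * m2 / (count - 1) + eps)
    return chain


# Toy target (an assumption for the example): Gaussian with mean 3 and std 0.5.
def log_target(x):
    return -0.5 * ((x - 3.0) / 0.5) ** 2


chain = adaptive_metropolis(log_target, x0=0.0)
tail = chain[len(chain) // 2:]
tail_mean = sum(tail) / len(tail)
```

Only the unnormalized log density is needed, and the user does not have to hand-tune the proposal width, which is the practical benefit the abstract points to.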
Abstract:
Construction of multiple sequence alignments is a fundamental task in Bioinformatics. Multiple sequence alignments are used as a prerequisite in many Bioinformatics methods, and the quality of such methods can therefore depend critically on the quality of the alignment. However, automatic construction of a multiple sequence alignment for a set of remotely related sequences does not always provide biologically relevant alignments. Therefore, there is a need for an objective approach to evaluating the quality of automatically aligned sequences. The profile hidden Markov model is a powerful approach in comparative genomics. In the profile hidden Markov model, the symbol probabilities are estimated at each conserved alignment position. This can increase the dimension of the parameter space and cause an overfitting problem. These two research problems are both related to conservation. We have developed statistical measures for quantifying the conservation of multiple sequence alignments. Two types of methods are considered: those identifying conserved residues in an alignment position, and those calculating positional conservation scores. The positional conservation score was exploited in a statistical prediction model for assessing the quality of multiple sequence alignments. The residue conservation score was used as part of the emission probability estimation method proposed for profile hidden Markov models. The predicted alignment quality scores correlated highly with the correct alignment quality scores, indicating that our method is reliable for assessing the quality of any multiple sequence alignment. The comparison of the emission probability estimation method with the maximum likelihood method showed that the number of estimated parameters in the model was dramatically decreased, while the same level of accuracy was maintained.
To conclude, we have shown that conservation can be successfully used in the statistical model for alignment quality assessment and in the estimation of emission probabilities in the profile hidden Markov models.
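To give a sense of what a positional conservation score can look like, the sketch below uses a normalized Shannon entropy per alignment column. This is a common textbook measure chosen as a stand-in; it is not the statistical measure developed in the thesis, and the function names, the gap handling and the toy alignment are assumptions.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Score in [0, 1]: 1 for a fully conserved column, 0 for a uniform one."""
    residues = [c for c in column if c != "-"]  # gaps are ignored (assumption)
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    # Shannon entropy of the residue frequencies in this column.
    entropy = -sum((k / n) * math.log(k / n) for k in counts.values())
    # Normalize by the maximum entropy for the assumed alphabet size.
    return 1.0 - entropy / math.log(alphabet_size)


def alignment_conservation(alignment):
    """Conservation score for every column of equally long aligned sequences."""
    return [column_conservation(col) for col in zip(*alignment)]


# Toy protein alignment, invented for the example.
alignment = ["MKVLA", "MKILA", "MRVLG", "MK-LA"]
scores = alignment_conservation(alignment)
```

Columns where every sequence carries the same residue score 1, and the score falls as the residues become more mixed.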
Abstract:
This thesis focuses on statistical analysis methods and proposes the use of Bayesian inference to extract the information contained in experimental data by estimating the parameters of an Ebola model. The model is a system of differential equations expressing the behavior and dynamics of Ebola. Two sets of data (onset and death data) were both used to estimate the parameters, which has not been done by previous researchers in (Chowell, 2004). To be able to use both data sets, a new version of the model has been built. The model parameters have been estimated and then used to calculate the basic reproduction number and to study the disease-free equilibrium. The parameter estimates were useful for determining how well the model fits the data and how good the estimates were, in terms of the information they provided about the possible relationship between variables. The solution showed that the Ebola model fits the observed onset data at 98.95% and the observed death data at 93.6%. Since Bayesian inference cannot be performed analytically, the Markov chain Monte Carlo approach has been used to generate samples from the posterior distribution over the parameters. The samples have been used to check the accuracy of the model and other characteristics of the target posteriors.
Abstract:
The optimal design of a heat exchanger system is based on given model parameters together with given standard ranges for machine design variables. The goals of minimizing the Life Cycle Cost (LCC) function, which represents the price of the saved energy, of maximizing the momentary heat recovery output with the given constraints satisfied, and of taking into account the uncertainty in the models were successfully met. The Nondominated Sorting Genetic Algorithm II (NSGA-II) for the design optimization of a system is presented and implemented in the Matlab environment. Markov Chain Monte Carlo (MCMC) methods are also used to take into account the uncertainty in the models. The results show that the price of saved energy can be optimized. A wet heat exchanger is found to be more efficient and beneficial than a dry heat exchanger, even though its construction is expensive (160 EUR/m²) compared to the construction of a dry heat exchanger (50 EUR/m²). It has been found that a longer lifetime favors higher CAPEX and lower OPEX, and vice versa, and the effect of the uncertainty in the models has been identified in a simplified case of minimizing the area of a dry heat exchanger.
Abstract:
The identifiability of the parameters of a heat exchanger model without phase change was studied in this Master's thesis using synthetic data. A fast, two-step Markov chain Monte Carlo (MCMC) method was tested with a couple of case studies and a heat exchanger model. The two-step MCMC method worked well and decreased the computation time compared to the traditional MCMC method. The effect of the measurement accuracy of certain control variables on the identifiability of the parameters was also studied. The accuracy used did not seem to have a notable effect on the identifiability of the parameters. The use of the posterior distribution of the parameters in different heat exchanger geometries was studied. It would be computationally most efficient to use the same posterior distribution across different geometries in the optimisation of heat exchanger networks. According to the results, this was possible when the frontal surface areas were the same across the geometries. In the other cases the same posterior distribution can also be used for optimisation, but that gives a wider predictive distribution as a result. For condensing surface heat exchangers, the numerical stability of the simulation model was studied. As a result, a stable algorithm was developed.
Abstract:
Identifying the order of an Autoregressive Moving Average (ARMA) model by the usual graphical method is subjective. Hence, there is a need for a technique that identifies the order without graphical inspection of the series autocorrelations. To avoid subjectivity, this thesis focuses on determining the order of the ARMA model using Reversible Jump Markov Chain Monte Carlo (RJMCMC). RJMCMC selects the model from a set of candidate models on the basis of goodness of fit, standard deviation errors and the frequency of accepted data. Alongside a detailed analysis of the classical Box-Jenkins modeling methodology, the thesis focuses on its integration with MCMC algorithms through parameter estimation and model fitting of ARMA models. This helps to verify how well the MCMC algorithms can treat ARMA models, by comparing the results with the graphical method. MCMC was found to produce better results than the classical time series approach.
Abstract:
In the power market, electricity prices play an important role at the economic level. The behavior of a price series may change over time in terms of its mean value or its volatility, a change usually known as a structural break, or it may change for a period of time before reverting to its original behavior or switching to another style of behavior; the latter is typically termed a regime shift or regime switch. Our task in this thesis is to develop an electricity price time series model that captures the fat-tailed distributions which can explain this behavior, and to analyze it for better understanding. For the NordPool data used, the obtained Markov Regime-Switching model operates on two regimes: regular and non-regular. Three criteria have been considered: a price difference criterion, a capacity/flow difference criterion and a spikes-in-Finland criterion. The suitability of GARCH modeling for simulating multi-regime behavior is also studied.