9 results for Estimateur de Bayes
in Helda - Digital Repository of University of Helsinki
Abstract:
Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or base the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, a more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling to various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two of them hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox developed here, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly to justify the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that the same reasoning can also be applied under sampling from a finite population. The main emphasis here is on probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example.
The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible also in this case.
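The remark that maximum likelihood inference can be read as a Gaussian approximation at the posterior mode under a flat prior can be illustrated with a minimal sketch. The binomial model, the function name `laplace_approximation_binomial` and the example counts are assumptions chosen for illustration; they are not taken from the thesis.

```python
import math


def laplace_approximation_binomial(successes, trials):
    """Gaussian approximation at the posterior mode for a binomial proportion.

    Assumes a flat prior on the proportion, so the log posterior equals the
    log likelihood up to a constant and the posterior mode is the maximum
    likelihood estimate.
    """
    if not 0 < successes < trials:
        raise ValueError("the mode must lie strictly inside (0, 1)")
    mode = successes / trials
    # Negative second derivative of the log posterior at the mode:
    # y / theta^2 + (n - y) / (1 - theta)^2
    curvature = successes / mode**2 + (trials - successes) / (1 - mode) ** 2
    standard_deviation = math.sqrt(1.0 / curvature)
    return mode, standard_deviation


# Hypothetical data: 30 successes in 100 trials.
mode, standard_deviation = laplace_approximation_binomial(30, 100)
print(f"approximate posterior: Normal({mode:.3f}, sd={standard_deviation:.4f})")
```

The standard deviation obtained this way coincides with the usual maximum likelihood standard error, sqrt(p(1 - p) / n), which is the sense in which the two approaches agree when the likelihood dominates the prior.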
Abstract:
In this thesis the use of the Bayesian approach to statistical inference in fisheries stock assessment is studied. The work was conducted in collaboration with the Finnish Game and Fisheries Research Institute, using the problem of monitoring and predicting the juvenile salmon population in the River Tornionjoki as an example application. The River Tornionjoki is the largest salmon river flowing into the Baltic Sea. This thesis tackles the issues of model formulation and model checking as well as computational problems related to Bayesian modelling in the context of fisheries stock assessment. Each article of the thesis provides a novel method either for extracting information from data obtained via a particular type of sampling system or for integrating the information about the fish stock from multiple sources in terms of a population dynamics model. Mark-recapture and removal sampling schemes and a random catch sampling method are covered for the estimation of the population size. In addition, a method for estimating the stock composition of a salmon catch based on DNA samples is presented. For most of the articles, Markov chain Monte Carlo (MCMC) simulation has been used as a tool to approximate the posterior distribution. Problems arising from the sampling method are also briefly discussed and potential solutions are proposed. Special emphasis in the discussion is given to the philosophical foundation of the Bayesian approach in the context of fisheries stock assessment. It is argued that the role of subjective prior knowledge needed in practically all parts of a Bayesian model should be recognized and consequently fully utilised in the process of model formulation.
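As a rough illustration of Bayesian population-size estimation from mark-recapture data, the sketch below computes a posterior over the population size on a grid. It assumes a single recapture occasion, a hypergeometric likelihood and a uniform prior up to a hypothetical `max_size`. The thesis itself relies on richer models and MCMC, so this shows only the general idea, and all names and numbers are illustrative.

```python
from math import comb


def population_size_posterior(marked, caught, recaptured, max_size):
    """Posterior over population size N under a uniform prior on a grid.

    marked:     number of marked individuals released
    caught:     size of the later catch
    recaptured: marked individuals found in that catch
    """
    # Smallest population size consistent with the observations.
    min_size = marked + caught - recaptured
    if max_size < min_size:
        raise ValueError("max_size is smaller than the data allow")
    weights = {}
    for size in range(min_size, max_size + 1):
        # Hypergeometric probability of the observed number of recaptures.
        weights[size] = (
            comb(marked, recaptured)
            * comb(size - marked, caught - recaptured)
            / comb(size, caught)
        )
    total = sum(weights.values())
    return {size: weight / total for size, weight in weights.items()}


# Hypothetical survey: 100 marked, 50 caught later, 7 of them marked.
posterior = population_size_posterior(100, 50, 7, max_size=3000)
mode = max(posterior, key=posterior.get)
mean = sum(size * prob for size, prob in posterior.items())
print(f"posterior mode: {mode}, posterior mean: {mean:.0f}")
```

The posterior is strongly right-skewed in examples like this one, so the mean depends noticeably on the chosen upper bound; this is one place where the prior knowledge discussed in the abstract matters.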
Abstract:
In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data with an application to mobile device positioning. In the second part of the Thesis, we discuss so-called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.
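The link between Bayesian network classifiers and logistic regression can be seen in the simplest case, a naive Bayes classifier with binary features, whose class posterior has exactly the logistic form. The sketch below is an illustration under that assumption only; the function names and parameter values are hypothetical and the thesis covers a more general class of models.

```python
import math


def naive_bayes_posterior(features, prior_one, prob_one, prob_zero):
    """P(class = 1 | features) for binary features.

    prob_one[i]  = P(feature i = 1 | class = 1)
    prob_zero[i] = P(feature i = 1 | class = 0)
    """
    joint_one, joint_zero = prior_one, 1.0 - prior_one
    for value, p_one, p_zero in zip(features, prob_one, prob_zero):
        joint_one *= p_one if value else 1.0 - p_one
        joint_zero *= p_zero if value else 1.0 - p_zero
    return joint_one / (joint_one + joint_zero)


def equivalent_logistic_parameters(prior_one, prob_one, prob_zero):
    """Bias and weights of the logistic model with the same posterior."""
    bias = math.log(prior_one / (1.0 - prior_one))
    weights = []
    for p_one, p_zero in zip(prob_one, prob_zero):
        absent_term = math.log((1.0 - p_one) / (1.0 - p_zero))
        bias += absent_term
        weights.append(math.log(p_one / p_zero) - absent_term)
    return bias, weights


def logistic_posterior(features, bias, weights):
    """P(class = 1 | features) under a logistic regression model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical parameters for three binary features.
prior_one, prob_one, prob_zero = 0.4, [0.9, 0.2, 0.6], [0.3, 0.5, 0.6]
bias, weights = equivalent_logistic_parameters(prior_one, prob_one, prob_zero)
example = (1, 0, 1)
print(naive_bayes_posterior(example, prior_one, prob_one, prob_zero))
print(logistic_posterior(example, bias, weights))
```

Both calls give the same probability for every feature vector. The two model families differ in how the parameters are fitted, not in the functional form of the classifier.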
Abstract:
Minimum Description Length (MDL) is an information-theoretic principle that can be used for model selection and other statistical inference tasks. There are various ways to use the principle in practice. One theoretically valid way is to use the normalized maximum likelihood (NML) criterion. Due to computational difficulties, this approach has not been used very often. This thesis presents efficient floating-point algorithms that make it possible to compute the NML for multinomial, Naive Bayes and Bayesian forest models. None of the presented algorithms rely on asymptotic analysis and with the first two model classes we also discuss how to compute exact rational number solutions.
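To make the NML criterion concrete for the multinomial case, the sketch below computes the NML normalizing constant with exact rational arithmetic. It assumes the recurrence C(K+2, n) = C(K+1, n) + (n / K) C(K, n) known from the literature on multinomial stochastic complexity; the abstract does not say which algorithms the thesis uses, so this is an illustration and not a reproduction of them. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_normalizer(sample_size):
    """NML normalizer for two categories, summed directly from the definition."""
    if sample_size == 0:
        return Fraction(1)
    total = Fraction(0)
    for count in range(sample_size + 1):
        rest = sample_size - count
        total += (
            comb(sample_size, count)
            * Fraction(count, sample_size) ** count
            * Fraction(rest, sample_size) ** rest
        )
    return total


def multinomial_normalizer(num_categories, sample_size):
    """NML normalizer for K categories via the assumed recurrence."""
    if num_categories < 1:
        raise ValueError("need at least one category")
    if num_categories == 1:
        return Fraction(1)
    previous, current = Fraction(1), binomial_normalizer(sample_size)
    for k in range(1, num_categories - 1):
        # C(k + 2, n) = C(k + 1, n) + (n / k) * C(k, n)
        previous, current = current, current + Fraction(sample_size, k) * previous
    return current


def nml_probability(counts):
    """NML probability of one data sequence with the given category counts."""
    sample_size = sum(counts)
    if sample_size == 0:
        return Fraction(1)
    maximized_likelihood = Fraction(1)
    for count in counts:
        maximized_likelihood *= Fraction(count, sample_size) ** count
    return maximized_likelihood / multinomial_normalizer(len(counts), sample_size)


print(multinomial_normalizer(3, 2))
print(nml_probability([1, 1, 0]))
```

Using `Fraction` keeps every intermediate value exact, in the spirit of the exact rational solutions mentioned in the abstract, at the cost of growing numerators and denominators for large samples.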
Abstract:
At a general level, the subject of the study was the spirituality of the Finnish Pentecostal movement. The sampling frame population consisted of people attending the meetings of the Saalem congregation in Helsinki. The data were collected with questionnaires in autumn 2004 at meetings of the Saalem congregation. A total of 230 completed questionnaires were obtained. The respondents' ages ranged from 13 to 87 years, and 36% of them were men. 70% belonged to the Saalem congregation and 17% to some other Pentecostal congregation. Non-Pentecostals made up 13% of the respondents. To a limited extent, a comparison data set of 500 respondents from the Kallio district was also available. The respondents in this so-called Case Kallio data set were, as a rule, weakly committed to Christian doctrine and to devotional practice. Of these respondents, 50% were men. Their ages ranged from 18 to 39 years. The theoretical starting point of the study was the empirical research of the American scholar Daniel Albrecht on Pentecostal-charismatic spirituality. He defines Pentecostal-charismatic spirituality as consisting of three factors: beliefs, practices and so-called sensibilities. Sensibilities refer to an orientation toward action. On the basis of Albrecht's categories, two measures were constructed for the questionnaire. One measured the basic factors describing the whole field of Pentecostal spirituality, which included beliefs, practices and sensibilities. The other measure focused on only one part of the definition of spirituality, the sensibilities. In addition to the perspective centred on Pentecostalism, the study drew on David Hay's view of spirituality. He defines spirituality as an awareness that transcends everyday reality. Hay's categories were used to map general human spirituality. The aim of the study was to identify the forms in which spirituality manifests itself in the Saalem congregation and how these differ according to the respondents' backgrounds.
In addition, the data collected from Saalem were compared with the comparison data (Case Kallio), and the relationship between the two views of spirituality, which arise from different starting points, was examined. The study was quantitative in nature. Statistical tests and factor analysis were used as research methods. Alongside factor analysis, so-called Bayesian modelling was used, which is not subject to the strict conditions of use imposed on parametric methods. For the Saalem congregation, the study yielded 11 spirituality dimensions at different levels. The seven sensibility categories proposed by Albrecht were found in the data almost as such, whereas the measures used for the basic factors of Pentecostal spirituality and for general human spirituality did not work entirely as expected. The two data sets could be compared with respect to general human spirituality. General human spirituality was not an unfamiliar phenomenon to respondents alienated from Christian doctrine and devotional practice. It nevertheless received higher response scores among Pentecostals. Two dimensions emerged to describe this spirituality: communal altruism and the beauty of everyday life. From the data collected solely from the Saalem congregation, three basic factors of Pentecostal spirituality also emerged: the dimensions of word and mission, leader-centredness, and praise. From the same data, six sensibility dimensions arose: praise, general purification, ceremoniality, spiritual gifts, goal-orientation, and spiritual purification and change. One of the praise dimensions described the meaning of praise, the other the manner of praise. The dimension describing word and mission lay at the centre of the data collected from the Saalem congregation. The goal-orientation dimension received the highest mean response, and both dimensions reflecting general human spirituality also received high mean responses.
The dimensions of Pentecostal spirituality correlated positively with the dimensions of general human spirituality. The results could be generalized to the membership of the Helsinki Saalem congregation and to Pentecostalism in the Helsinki metropolitan area. For the Finnish Pentecostal movement as a whole, the results could be regarded as indicative. Keywords: Pentecostal movement, spirituality, Saalem, quantitative research, multivariate methods, Bayesian modelling, Daniel Albrecht, David Hay
Abstract:
The purpose of this research is to draw up a clear construction of an anticipatory communicative decision-making process and a successful implementation of a Bayesian application that can be used as an anticipatory communicative decision-making support system. This study is a decision-oriented and constructive research project, and it includes examples of simulated situations. As a basis for further methodological discussion about different approaches to management research, this research uses a decision-oriented approach, which is based on mathematics and logic and is intended to develop problem-solving methods. The approach is theoretical and characteristic of normative management science research. The approach of this study is also constructive. An essential part of the constructive approach is to tie the problem to its solution with theoretical knowledge. Firstly, the basic definitions and behaviours of anticipatory management and managerial communication are provided. These descriptions include discussions of the research environment and the management processes formed. These issues define and explain the background to further research. Secondly, the study proceeds to managerial communication and anticipatory decision-making based on preparation, problem solution, and solution search, which are also related to risk management analysis. After that, a solution to the decision-making support application is formed, using four different Bayesian methods: the Bayesian network, the influence diagram, the qualitative probabilistic network, and the time-critical dynamic network. The purpose of the discussion is not to compare different theories but to explain the theories that are being implemented. Finally, an application of Bayesian networks to the research problem is presented. The usefulness of the prepared model in examining a problem, and of the presented results of the research, is shown.
The theoretical contribution includes definitions and a model of anticipatory decision-making. The main theoretical contribution of this study has been to develop a process for anticipatory decision-making that includes management with communication, problem-solving, and the improvement of knowledge. The practical contribution includes a Bayesian Decision Support Model, which is based on Bayesian influence diagrams. The main contributions of this research are two developed processes, one for anticipatory decision-making, and the other to produce a model of a Bayesian network for anticipatory decision-making. In summary, this research contributes to decision-making support by being one of the few publicly available academic descriptions of the anticipatory decision support system, by presenting a Bayesian model that is grounded on firm theoretical discussion, by publishing algorithms suitable for decision-making support, and by defining the idea of anticipatory decision-making for a parallel version. Finally, according to the results of the research, an analysis of anticipatory management for planned decision-making is presented, which is based on observation of the environment, analysis of weak signals, and alternatives to creative problem solving and communication.
Abstract:
The relationship between site characteristics and understorey vegetation composition was analysed with quantitative methods, especially from the viewpoint of site quality estimation. Theoretical models were applied to an empirical data set collected from the upland forests of southern Finland comprising 104 sites dominated by Scots pine (Pinus sylvestris L.), and 165 sites dominated by Norway spruce (Picea abies (L.) Karsten). Site index H100 was used as an independent measure of site quality. A new model for the estimation of site quality at sites with a known understorey vegetation composition was introduced. It is based on the application of Bayes' theorem to the density function of site quality within the study area combined with the species-specific presence-absence response curves. The resulting posterior probability density function may be used for calculating an estimate for the site variable. Using this method, a jackknife estimate of site index H100 was calculated separately for pine- and spruce-dominated sites. The results indicated that the cross-validation root mean squared error (RMSEcv) of the estimates improved from 2.98 m down to 2.34 m relative to the "null" model (standard deviation of the sample distribution) in pine-dominated forests. In spruce-dominated forests RMSEcv decreased from 3.94 m down to 3.16 m. In order to assess these results, four other estimation methods based on understorey vegetation composition were applied to the same data set. The results showed that none of the methods was clearly superior to the others. In pine-dominated forests, RMSEcv varied between 2.34 and 2.47 m, and the corresponding range for spruce-dominated forests was from 3.13 to 3.57 m.
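The estimation idea described above, Bayes' theorem applied to a prior density of site quality combined with species-specific presence-absence response curves, can be sketched on a grid as follows. Everything specific in the sketch is an assumption made for illustration: the logistic shape of the response curves, their parameters, the species labels, the Gaussian-shaped prior, and the treatment of species as conditionally independent given site quality.

```python
import math


def logistic_response(midpoint, slope):
    """Hypothetical response curve: P(species present | site index)."""
    return lambda site_index: 1.0 / (1.0 + math.exp(-slope * (site_index - midpoint)))


def posterior_site_index(grid, prior, response_curves, observed):
    """Posterior over grid values of the site index given presence-absence data."""
    weights = []
    for site_index, prior_weight in zip(grid, prior):
        likelihood = 1.0
        for species, present in observed.items():
            presence_probability = response_curves[species](site_index)
            likelihood *= presence_probability if present else 1.0 - presence_probability
        weights.append(prior_weight * likelihood)
    total = sum(weights)
    return [weight / total for weight in weights]


def grid_mean(grid, probabilities):
    """Mean of a discrete distribution on the grid."""
    return sum(value * prob for value, prob in zip(grid, probabilities))


# Site index grid in metres and an illustrative prior centred at 24 m.
grid = [10.0 + 0.5 * step for step in range(51)]
raw_prior = [math.exp(-0.5 * ((value - 24.0) / 3.0) ** 2) for value in grid]
prior = [weight / sum(raw_prior) for weight in raw_prior]

# Two made-up species: one favouring rich sites, one favouring poor sites.
response_curves = {
    "rich_site_herb": logistic_response(midpoint=27.0, slope=0.8),
    "poor_site_lichen": logistic_response(midpoint=20.0, slope=-0.8),
}
observed = {"rich_site_herb": True, "poor_site_lichen": False}

posterior = posterior_site_index(grid, prior, response_curves, observed)
print(f"prior mean: {grid_mean(grid, prior):.2f} m")
print(f"posterior mean: {grid_mean(grid, posterior):.2f} m")
```

Here the posterior mean serves as the point estimate. A cross-validated error of the kind reported in the abstract would be obtained by repeating the calculation for each site with that site left out when the prior and response curves are fitted.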
Abstract:
Bayesian networks are compact, flexible, and interpretable representations of a joint distribution. When the network structure is unknown but there are observational data at hand, one can try to learn the network structure. This is called structure discovery. This thesis contributes to two areas of structure discovery in Bayesian networks: space-time tradeoffs and learning ancestor relations. The fastest exact algorithms for structure discovery in Bayesian networks are based on dynamic programming and use excessive amounts of space. Motivated by the space usage, several schemes for trading space against time are presented. These schemes are presented in a general setting for a class of computational problems called permutation problems; structure discovery in Bayesian networks is seen as a challenging variant of the permutation problems. The main contribution in the area of the space-time tradeoffs is the partial order approach, in which the standard dynamic programming algorithm is extended to run over partial orders. In particular, a certain family of partial orders called parallel bucket orders is considered. A partial order scheme that provably yields an optimal space-time tradeoff within parallel bucket orders is presented. Practical issues concerning parallel bucket orders are also discussed. Learning ancestor relations, that is, directed paths between nodes, is motivated by the need for robust summaries of the network structures when there are unobserved nodes at work. Ancestor relations are nonmodular features and hence learning them is more difficult than learning modular features. A dynamic programming algorithm is presented for computing posterior probabilities of ancestor relations exactly. Empirical tests suggest that ancestor relations can be learned from observational data almost as accurately as arcs even in the presence of unobserved nodes.
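The space cost of exact structure discovery by dynamic programming can be seen in a minimal sketch of the standard subset recursion, in which the best score for a node set is built by choosing its last node (a sink) and that node's best parent set among the remaining nodes. This is a generic textbook-style formulation and not the partial order algorithm of the thesis; `local_score` and the toy scores are hypothetical.

```python
from itertools import combinations


def best_network_score(nodes, local_score):
    """Highest total score of any acyclic network over the given nodes.

    local_score(node, parents) returns the score of giving `node` the parent
    set `parents` (a frozenset); higher is better. The table `best` holds one
    entry per subset of nodes, which is the exponential space cost discussed
    in the abstract.
    """
    nodes = tuple(nodes)

    def best_parent_score(node, candidates):
        # Brute-force search over all parent sets drawn from the candidates.
        ordered = sorted(candidates)
        return max(
            local_score(node, frozenset(parents))
            for size in range(len(ordered) + 1)
            for parents in combinations(ordered, size)
        )

    best = {frozenset(): 0.0}
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            members = frozenset(subset)
            # Try each member as the sink, i.e. the last node in the ordering.
            best[members] = max(
                best[members - {sink}] + best_parent_score(sink, members - {sink})
                for sink in members
            )
    return best[frozenset(nodes)]


def toy_local_score(node, parents):
    """Made-up score: each parent costs 1, two specific arcs earn a bonus of 3."""
    score = -float(len(parents))
    if node == "b" and "a" in parents:
        score += 3.0
    if node == "c" and "b" in parents:
        score += 3.0
    return score


print(best_network_score(["a", "b", "c"], toy_local_score))
```

For the toy score the optimum is the chain from a to b to c. Schemes that trade space against time avoid keeping all 2^n table entries in memory at once, at the price of recomputing some of them.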