26 resultados para Bayesian model selection
em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Resumo:
This work presents new, efficient Markov chain Monte Carlo (MCMC) simulation methods for statistical analysis in various modelling applications. When using MCMC methods, the model is simulated repeatedly to explore the probability distribution describing the uncertainties in model parameters and predictions. In adaptive MCMC methods based on the Metropolis-Hastings algorithm, the proposal distribution needed by the algorithm learns from the target distribution as the simulation proceeds. Adaptive MCMC methods have been subject of intensive research lately, as they open a way for essentially easier use of the methodology. The lack of user-friendly computer programs has been a main obstacle for wider acceptance of the methods. This work provides two new adaptive MCMC methods: DRAM and AARJ. The DRAM method has been built especially to work in high dimensional and non-linear problems. The AARJ method is an extension to DRAM for model selection problems, where the mathematical formulation of the model is uncertain and we want simultaneously to fit several different models to the same observations. The methods were developed while keeping in mind the needs of modelling applications typical in environmental sciences. The development work has been pursued while working with several application projects. The applications presented in this work are: a winter time oxygen concentration model for Lake Tuusulanjärvi and adaptive control of the aerator; a nutrition model for Lake Pyhäjärvi and lake management planning; validation of the algorithms of the GOMOS ozone remote sensing instrument on board the Envisat satellite of European Space Agency and the study of the effects of aerosol model selection on the GOMOS algorithm.
Resumo:
The purpose of this research is to draw up a clear construction of an anticipatory communicative decision-making process and a successful implementation of a Bayesian application that can be used as an anticipatory communicative decision-making support system. This study is a decision-oriented and constructive research project, and it includes examples of simulated situations. As a basis for further methodological discussion about different approaches to management research, in this research, a decision-oriented approach is used, which is based on mathematics and logic, and it is intended to develop problem solving methods. The approach is theoretical and characteristic of normative management science research. Also, the approach of this study is constructive. An essential part of the constructive approach is to tie the problem to its solution with theoretical knowledge. Firstly, the basic definitions and behaviours of an anticipatory management and managerial communication are provided. These descriptions include discussions of the research environment and formed management processes. These issues define and explain the background to further research. Secondly, it is processed to managerial communication and anticipatory decision-making based on preparation, problem solution, and solution search, which are also related to risk management analysis. After that, a solution to the decision-making support application is formed, using four different Bayesian methods, as follows: the Bayesian network, the influence diagram, the qualitative probabilistic network, and the time critical dynamic network. The purpose of the discussion is not to discuss different theories but to explain the theories which are being implemented. Finally, an application of Bayesian networks to the research problem is presented. The usefulness of the prepared model in examining a problem and the represented results of research is shown. The theoretical contribution includes definitions and a model of anticipatory decision-making. The main theoretical contribution of this study has been to develop a process for anticipatory decision-making that includes management with communication, problem-solving, and the improvement of knowledge. The practical contribution includes a Bayesian Decision Support Model, which is based on Bayesian influenced diagrams. The main contributions of this research are two developed processes, one for anticipatory decision-making, and the other to produce a model of a Bayesian network for anticipatory decision-making. In summary, this research contributes to decision-making support by being one of the few publicly available academic descriptions of the anticipatory decision support system, by representing a Bayesian model that is grounded on firm theoretical discussion, by publishing algorithms suitable for decision-making support, and by defining the idea of anticipatory decision-making for a parallel version. Finally, according to the results of research, an analysis of anticipatory management for planned decision-making is presented, which is based on observation of environment, analysis of weak signals, and alternatives to creative problem solving and communication.
Resumo:
Coastal birds are an integral part of coastal ecosystems, which nowadays are subject to severe environmental pressures. Effective measures for the management and conservation of seabirds and their habitats call for insight into their population processes and the factors affecting their distribution and abundance. Central to national and international management and conservation measures is the availability of accurate data and information on bird populations, as well as on environmental trends and on measures taken to solve environmental problems. In this thesis I address different aspects of the occurrence, abundance, population trends and breeding success of waterbirds breeding on the Finnish coast of the Baltic Sea, and discuss the implications of the results for seabird monitoring, management and conservation. In addition, I assess the position and prospects of coastal bird monitoring data, in the processing and dissemination of biodiversity data and information in accordance with the Convention on Biological Diversity (CBD) and other national and international commitments. I show that important factors for seabird habitat selection are island area and elevation, water depth, shore openness, and the composition of island cover habitats. Habitat preferences are species-specific, with certain similarities within species groups. The occurrence of the colonial Arctic Tern (Sterna paradisaea) is partly affected by different habitat characteristics than its abundance. Using long-term bird monitoring data, I show that eutrophication and winter severity have reduced the populations of several Finnish seabird species. A major demographic factor through which environmental changes influence bird populations is breeding success. Breeding success can function as a more rapid indicator of sublethal environmental impacts than population trends, particularly for long-lived and slowbreeding species, and should therefore be included in coastal bird monitoring schemes. Among my target species, local breeding success can be shown to affect the populations of the Mallard (Anas platyrhynchos), the Eider (Somateria mollissima) and the Goosander (Mergus merganser) after a time lag corresponding to their species-specific recruitment age. For some of the target species, the number of individuals in late summer can be used as an easier and more cost-effective indicator of breeding success than brood counts. My results highlight that the interpretation and application of habitat and population studies require solid background knowledge of the ecology of the target species. In addition, the special characteristics of coastal birds, their habitats, and coastal bird monitoring data have to be considered in the assessment of their distribution and population trends. According to the results, the relationships between the occurrence, abundance and population trends of coastal birds and environmental factors can be quantitatively assessed using multivariate modelling and model selection. Spatial data sets widely available in Finland can be utilised in the calculation of several variables that are relevant to the habitat selection of Finnish coastal species. Concerning some habitat characteristics field work is still required, due to a lack of remotely sensed data or the low resolution of readily available data in relation to the fine scale of the habitat patches in the archipelago. While long-term data sets exist for water quality and weather, the lack of data concerning for instance the food resources of birds hampers more detailed studies of environmental effects on bird populations. Intensive studies of coastal bird species in different archipelago areas should be encouraged. The provision and free delivery of high-quality coastal data concerning bird populations and their habitats would greatly increase the capability of ecological modelling, as well as the management and conservation of coastal environments and communities. International initiatives that promote open spatial data infrastructures and sharing are therefore highly regarded. To function effectively, international information networks, such as the biodiversity Clearing House Mechanism (CHM) under the CBD, need to be rooted at regional and local levels. Attention should also be paid to the processing of data for higher levels of the information hierarchy, so that data are synthesized and developed into high-quality knowledge applicable to management and conservation.
Resumo:
The topic of this thesis is the simulation of a combination of several control and data assimilation methods, meant to be used for controlling the quality of paper in a paper machine. Paper making is a very complex process and the information obtained from the web is sparse. A paper web scanner can only measure a zig zag path on the web. An assimilation method is needed to process estimates for Machine Direction (MD) and Cross Direction (CD) profiles of the web. Quality control is based on these measurements. There is an increasing need for intelligent methods to assist in data assimilation. The target of this thesis is to study how such intelligent assimilation methods are affecting paper web quality. This work is based on a paper web simulator, which has been developed in the TEKES funded MASI NoTes project. The simulator is a valuable tool in comparing different assimilation methods. The thesis contains the comparison of four different assimilation methods. These data assimilation methods are a first order Bayesian model estimator, an ARMA model based on a higher order Bayesian estimator, a Fourier transform based Kalman filter estimator and a simple block estimator. The last one can be considered to be close to current operational methods. From these methods Bayesian, ARMA and Kalman all seem to have advantages over the commercial one. The Kalman and ARMA estimators seems to be best in overall performance.
Resumo:
Machine learning provides tools for automated construction of predictive models in data intensive areas of engineering and science. The family of regularized kernel methods have in the recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is from a set of past observations to learn a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings, examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods, based on this approach, has in the past proven to be challenging. Moreover, it is not clear what techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how the techniques can be implemented efficiently. The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank, that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results, Part II consists of the five original research articles that are the main contribution of this thesis.
Resumo:
The issue of selecting an appropriate healthcare information system is a very essential one. If implemented healthcare information system doesn’t fit particular healthcare institution, for example there are unnecessary functions; healthcare institution wastes its resources and its efficiency decreases. The purpose of this research is to develop a healthcare information system selection model to assist the decision-making process of choosing healthcare information system. Appropriate healthcare information system helps healthcare institutions to become more effective and efficient and keep up with the times. The research is based on comparison analysis of 50 healthcare information systems and 6 interviews with experts from St-Petersburg healthcare institutions that already have experience in healthcare information system utilization. 13 characteristics of healthcare information systems: 5 key and 7 additional features are identified and considered in the selection model development. Variables are used in the selection model in order to narrow the decision algorithm and to avoid duplication of brunches. The questions in the healthcare information systems selection model are designed to be easy-to-understand for common a decision-maker in healthcare institution without permanent establishment.
Resumo:
Globaalin talouden rakenteet muuttuvat jatkuvasti. Yritykset toimivat kansainvälisillä markkinoilla aiempaa enemmän. Tuotannon lisäämiseksi monet yritykset ovat ulkoistaneet tuotteidensa tuki- ja ylläpitotoiminnot halvan työvoiman maihin. Yritykset voivat tällöin keskittää toimintansa ydinosamiseensa. Vapautuneita resursseja voidaan käyttää yrityksen sisäisessä tuotekehityksessä ja panostaa seuraavan sukupolven tuotteiden ja teknologioiden kehittämiseen. Diplomityö esittelee Globaalisti hajautetun toimitusmallin Internet-palveluntarjoajalle jossa tuotteiden tuki- ja ylläpito on ulkoistettu Intiaan. Teoriaosassa esitellään erilaisia toimitusmalleja ja keskitytään erityisesti hajautettuun toimitusmalliin. Tämän lisäksi luetellaan valintakriteerejä joilla voidaan arvioida projektin soveltuvuutta ulkoistettavaksi sekä esitellään mahdollisuuksia ja uhkia jotka sisältyvät globaaliin ulkoistusprosessiin. Käytäntöosassa esitellään globaali palvelun toimittamisprosessi joka on kehitetty Internet-palveluntarjoajan tarpeisiin.
Resumo:
Tämän tutkielman tavoitteena oli määrittää uuden markkinan valinnan perusteita teolliselle tuotteelle. Tutkielma keskittyi jo tunnettuihin kansainvälisen markkinavalinnan lähestymistapoihin ja pyrki soveltamaan yhtä menetelmää käytäntöön tutkielman empiria osassa case-tutkimuksen avulla. Tutkimusote oli tutkiva, eksploratiivinen ja perustui sekundääri analyysiin. Käytetyt tiedon lähteet olivat suureksi osin sekundäärisiä tuottaen kvalitatiivista tietoa. Kuitenkin haastatteluita suoritettiin myös. Kattava kirjallisuus katsaus tunnetuista teoreettisista lähestymistavoista kansainväliseen markkinavalintaan oli osa tutkielmaa. Kolme tärkeintä lähestymistapaa esiteltiin tarkemmin. Yksi lähestymistavoista, ei-järjestelmällinen, muodosti viitekehyksen tutkielman empiria-osalle. Empiria pyrki soveltamaan yhtä ei-järjestelmällisen lähestymistavan malleista kansainvälisessä paperiteollisuudessa. Tarkoituksena oli tunnistaa kaikkein houkuttelevimmat maat mahdollisille markkinointitoimenpiteille tuotteen yhdellä loppukäyttöalueella. Tutkielmassa päädyttiin käyttämään ilmastollisia olosuhteita, siipikarjan päälukua sekä siipikarjan kasvuprosenttia suodattimina pyrittäessä vähentämään mahdollisten maiden lukumäärää. Tutkielman empiria-osa kärsi selkeästi relevantin tiedon puutteesta. Siten myös tutkielman reliabiliteetti ja validiteetti voidaan jossain määrin kyseenalaistaa.
Resumo:
In mathematical modeling the estimation of the model parameters is one of the most common problems. The goal is to seek parameters that fit to the measurements as well as possible. There is always error in the measurements which implies uncertainty to the model estimates. In Bayesian statistics all the unknown quantities are presented as probability distributions. If there is knowledge about parameters beforehand, it can be formulated as a prior distribution. The Bays’ rule combines the prior and the measurements to posterior distribution. Mathematical models are typically nonlinear, to produce statistics for them requires efficient sampling algorithms. In this thesis both Metropolis-Hastings (MH), Adaptive Metropolis (AM) algorithms and Gibbs sampling are introduced. In the thesis different ways to present prior distributions are introduced. The main issue is in the measurement error estimation and how to obtain prior knowledge for variance or covariance. Variance and covariance sampling is combined with the algorithms above. The examples of the hyperprior models are applied to estimation of model parameters and error in an outlier case.
Resumo:
This thesis was focussed on statistical analysis methods and proposes the use of Bayesian inference to extract information contained in experimental data by estimating Ebola model parameters. The model is a system of differential equations expressing the behavior and dynamics of Ebola. Two sets of data (onset and death data) were both used to estimate parameters, which has not been done by previous researchers in (Chowell, 2004). To be able to use both data, a new version of the model has been built. Model parameters have been estimated and then used to calculate the basic reproduction number and to study the disease-free equilibrium. Estimates of the parameters were useful to determine how well the model fits the data and how good estimates were, in terms of the information they provided about the possible relationship between variables. The solution showed that Ebola model fits the observed onset data at 98.95% and the observed death data at 93.6%. Since Bayesian inference can not be performed analytically, the Markov chain Monte Carlo approach has been used to generate samples from the posterior distribution over parameters. Samples have been used to check the accuracy of the model and other characteristics of the target posteriors.
Resumo:
In general, models of ecological systems can be broadly categorized as ’top-down’ or ’bottom-up’ models, based on the hierarchical level that the model processes are formulated on. The structure of a top-down, also known as phenomenological, population model can be interpreted in terms of population characteristics, but it typically lacks an interpretation on a more basic level. In contrast, bottom-up, also known as mechanistic, population models are derived from assumptions and processes on a more basic level, which allows interpretation of the model parameters in terms of individual behavior. Both approaches, phenomenological and mechanistic modelling, can have their advantages and disadvantages in different situations. However, mechanistically derived models might be better at capturing the properties of the system at hand, and thus give more accurate predictions. In particular, when models are used for evolutionary studies, mechanistic models are more appropriate, since natural selection takes place on the individual level, and in mechanistic models the direct connection between model parameters and individual properties has already been established. The purpose of this thesis is twofold. Firstly, a systematical way to derive mechanistic discrete-time population models is presented. The derivation is based on combining explicitly modelled, continuous processes on the individual level within a reproductive period with a discrete-time maturation process between reproductive periods. Secondly, as an example of how evolutionary studies can be carried out in mechanistic models, the evolution of the timing of reproduction is investigated. Thus, these two lines of research, derivation of mechanistic population models and evolutionary studies, are complementary to each other.
Resumo:
The objective of the thesis was to develop a competitors’ financial performance monitoring model for management reporting. The research consisted of the selections of the comparison group and the performance meters as well as the actual creation of the model. A brief analysis of the current situation was also made. The aim of the results was to improve the financial reporting quality in the case organization by adding external business environment observation to the management reports. The comparison group for the case company was selected to include five companies that were all involved in power equipment engineering and project type business. The most limiting factor related to the comparison group selection was the availability of quarterly financial reporting. The most suitable performance meters were defined to be the developments of revenue, order backlog and EBITDA. These meters should be monitored systematically on quarterly basis and reported to the company management in a brief and informative way. The monitoring model was based on spreadsheet construction with key characteristics being usability, flexibility and simplicity. The model acts as a centered storage for financial competitor information as well as a reporting tool. The current market situation is strongly affected by the economic boom in the recent years and future challenges can be clearly seen in declining order backlogs. The case company has succeeded well related to its comparison group during the observation period since its business volume and profitability have developed in the best way.
Resumo:
Forest inventories are used to estimate forest characteristics and the condition of forest for many different applications: operational tree logging for forest industry, forest health state estimation, carbon balance estimation, land-cover and land use analysis in order to avoid forest degradation etc. Recent inventory methods are strongly based on remote sensing data combined with field sample measurements, which are used to define estimates covering the whole area of interest. Remote sensing data from satellites, aerial photographs or aerial laser scannings are used, depending on the scale of inventory. To be applicable in operational use, forest inventory methods need to be easily adjusted to local conditions of the study area at hand. All the data handling and parameter tuning should be objective and automated as much as possible. The methods also need to be robust when applied to different forest types. Since there generally are no extensive direct physical models connecting the remote sensing data from different sources to the forest parameters that are estimated, mathematical estimation models are of "black-box" type, connecting the independent auxiliary data to dependent response data with linear or nonlinear arbitrary models. To avoid redundant complexity and over-fitting of the model, which is based on up to hundreds of possibly collinear variables extracted from the auxiliary data, variable selection is needed. To connect the auxiliary data to the inventory parameters that are estimated, field work must be performed. In larger study areas with dense forests, field work is expensive, and should therefore be minimized. To get cost-efficient inventories, field work could partly be replaced with information from formerly measured sites, databases. The work in this thesis is devoted to the development of automated, adaptive computation methods for aerial forest inventory. The mathematical model parameter definition steps are automated, and the cost-efficiency is improved by setting up a procedure that utilizes databases in the estimation of new area characteristics.
Resumo:
Mathematical models often contain parameters that need to be calibrated from measured data. The emergence of efficient Markov Chain Monte Carlo (MCMC) methods has made the Bayesian approach a standard tool in quantifying the uncertainty in the parameters. With MCMC, the parameter estimation problem can be solved in a fully statistical manner, and the whole distribution of the parameters can be explored, instead of obtaining point estimates and using, e.g., Gaussian approximations. In this thesis, MCMC methods are applied to parameter estimation problems in chemical reaction engineering, population ecology, and climate modeling. Motivated by the climate model experiments, the methods are developed further to make them more suitable for problems where the model is computationally intensive. After the parameters are estimated, one can start to use the model for various tasks. Two such tasks are studied in this thesis: optimal design of experiments, where the task is to design the next measurements so that the parameter uncertainty is minimized, and model-based optimization, where a model-based quantity, such as the product yield in a chemical reaction model, is optimized. In this thesis, novel ways to perform these tasks are developed, based on the output of MCMC parameter estimation. A separate topic is dynamical state estimation, where the task is to estimate the dynamically changing model state, instead of static parameters. For example, in numerical weather prediction, an estimate of the state of the atmosphere must constantly be updated based on the recently obtained measurements. In this thesis, a novel hybrid state estimation method is developed, which combines elements from deterministic and random sampling methods.
Resumo:
Statistical analyses of measurements that can be described by statistical models are of essence in astronomy and in scientific inquiry in general. The sensitivity of such analyses, modelling approaches, and the consequent predictions, is sometimes highly dependent on the exact techniques applied, and improvements therein can result in significantly better understanding of the observed system of interest. Particularly, optimising the sensitivity of statistical techniques in detecting the faint signatures of low-mass planets orbiting the nearby stars is, together with improvements in instrumentation, essential in estimating the properties of the population of such planets, and in the race to detect Earth-analogs, i.e. planets that could support liquid water and, perhaps, life on their surfaces. We review the developments in Bayesian statistical techniques applicable to detections planets orbiting nearby stars and astronomical data analysis problems in general. We also discuss these techniques and demonstrate their usefulness by using various examples and detailed descriptions of the respective mathematics involved. We demonstrate the practical aspects of Bayesian statistical techniques by describing several algorithms and numerical techniques, as well as theoretical constructions, in the estimation of model parameters and in hypothesis testing. We also apply these algorithms to Doppler measurements of nearby stars to show how they can be used in practice to obtain as much information from the noisy data as possible. Bayesian statistical techniques are powerful tools in analysing and interpreting noisy data and should be preferred in practice whenever computational limitations are not too restrictive.