5 resultados para Probabilistic choice models

em Helda - Digital Repository of University of Helsinki


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Minimum Description Length (MDL) is an information-theoretic principle that can be used for model selection and other statistical inference tasks. There are various ways to use the principle in practice. One theoretically valid way is to use the normalized maximum likelihood (NML) criterion. Due to computational difficulties, this approach has not been used very often. This thesis presents efficient floating-point algorithms that make it possible to compute the NML for multinomial, Naive Bayes and Bayesian forest models. None of the presented algorithms rely on asymptotic analysis and with the first two model classes we also discuss how to compute exact rational number solutions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Eutrophication of the Baltic Sea is a serious problem. This thesis estimates the benefit to Finns from reduced eutrophication in the Gulf of Finland, the most eutrophied part of the Baltic Sea, by applying the choice experiment method, which belongs to the family of stated preference methods. Because stated preference methods have been subject to criticism, e.g., due to their hypothetical survey context, this thesis contributes to the discussion by studying two anomalies that may lead to biased welfare estimates: respondent uncertainty and preference discontinuity. The former refers to the difficulty of stating one s preferences for an environmental good in a hypothetical context. The latter implies a departure from the continuity assumption of conventional consumer theory, which forms the basis for the method and the analysis. In the three essays of the thesis, discrete choice data are analyzed with the multinomial logit and mixed logit models. On average, Finns are willing to contribute to the water quality improvement. The probability for willingness increases with residential or recreational contact with the gulf, higher than average income, younger than average age, and the absence of dependent children in the household. On average, for Finns the relatively most important characteristic of water quality is water clarity followed by the desire for fewer occurrences of blue-green algae. For future nutrient reduction scenarios, the annual mean household willingness to pay estimates range from 271 to 448 and the aggregate welfare estimates for Finns range from 28 billion to 54 billion euros, depending on the model and the intensity of the reduction. Out of the respondents (N=726), 72.1% state in a follow-up question that they are either Certain or Quite certain about their answer when choosing the preferred alternative in the experiment. Based on the analysis of other follow-up questions and another sample (N=307), 10.4% of the respondents are identified as potentially having discontinuous preferences. In relation to both anomalies, the respondent- and questionnaire-specific variables are found among the underlying causes and a departure from standard analysis may improve the model fit and the efficiency of estimates, depending on the chosen modeling approach. The introduction of uncertainty about the future state of the Gulf increases the acceptance of the valuation scenario which may indicate an increased credibility of a proposed scenario. In conclusion, modeling preference heterogeneity is an essential part of the analysis of discrete choice data. The results regarding uncertainty in stating one s preferences and non-standard choice behavior are promising: accounting for these anomalies in the analysis may improve the precision of the estimates of benefit from reduced eutrophication in the Gulf of Finland.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. Currently the field of genetics is under a rapid development because of the recent advances in technologies by which molecular data can be obtained from living organisms. In order that most information from such data can be extracted, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo algorithm (MCMC). The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes as well as at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where also a quantitative phenotype has been measured from the contemporary individuals. In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

What can the statistical structure of natural images teach us about the human brain? Even though the visual cortex is one of the most studied parts of the brain, surprisingly little is known about how exactly images are processed to leave us with a coherent percept of the world around us, so we can recognize a friend or drive on a crowded street without any effort. By constructing probabilistic models of natural images, the goal of this thesis is to understand the structure of the stimulus that is the raison d etre for the visual system. Following the hypothesis that the optimal processing has to be matched to the structure of that stimulus, we attempt to derive computational principles, features that the visual system should compute, and properties that cells in the visual system should have. Starting from machine learning techniques such as principal component analysis and independent component analysis we construct a variety of sta- tistical models to discover structure in natural images that can be linked to receptive field properties of neurons in primary visual cortex such as simple and complex cells. We show that by representing images with phase invariant, complex cell-like units, a better statistical description of the vi- sual environment is obtained than with linear simple cell units, and that complex cell pooling can be learned by estimating both layers of a two-layer model of natural images. We investigate how a simplified model of the processing in the retina, where adaptation and contrast normalization take place, is connected to the nat- ural stimulus statistics. Analyzing the effect that retinal gain control has on later cortical processing, we propose a novel method to perform gain control in a data-driven way. Finally we show how models like those pre- sented here can be extended to capture whole visual scenes rather than just small image patches. By using a Markov random field approach we can model images of arbitrary size, while still being able to estimate the model parameters from the data.