947 results for Hierarchical Bayesian Methods


Relevance: 30.00%

Abstract:

Bayesian adaptive methods have been used extensively in psychophysics to estimate the point at which performance on a task attains an arbitrary percentage level, although the statistical properties of these estimators have never been assessed. We used simulation techniques to determine the small-sample properties of Bayesian estimators of arbitrary performance points, specifically addressing bias and precision as a function of the target percentage level. The study covered three major types of psychophysical task (yes-no detection, 2AFC discrimination, and 2AFC detection) and explored the entire range of target performance levels allowed by each task. Other factors included in the study were the form and parameters of the actual psychometric function Psi, the form and parameters of the model function M assumed in the Bayesian method, and the location of Psi within the parameter space. Our results indicate that Bayesian adaptive methods yield unbiased estimators of any arbitrary point on Psi only when M = Psi; otherwise they yield bias whose magnitude can be considerable as the target level moves away from the midpoint of the range of Psi. The standard error of the estimator also increases as the target level approaches extreme values, whether or not M = Psi. Contrary to widespread belief, neither the performance level at which bias is null nor that at which the standard error is minimal can be predicted by the sweat factor. A closed-form expression nevertheless gives a reasonable fit to data describing the dependence of the standard error on the number of trials and the target level, which allows determination of the number of trials that must be administered to obtain estimates with prescribed precision.
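
The setup lends itself to a brief illustration. Below is a minimal sketch (not the paper's code; the logistic form of M, the grid, the MAP placement rule, and all parameter values are assumptions) of the kind of Bayesian adaptive procedure being simulated: a grid posterior over the threshold of an assumed model M is updated after each simulated trial, and the point attaining a target performance level is read off the final posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

def pf(x, threshold, slope=1.0, gamma=0.5, lam=0.02):
    """2AFC psychometric function with guess rate gamma and lapse rate lam."""
    return gamma + (1 - gamma - lam) / (1 + np.exp(-slope * (x - threshold)))

def target_point(p, threshold, slope=1.0, gamma=0.5, lam=0.02):
    """Stimulus level at which performance attains proportion p (inverse of pf)."""
    s = (p - gamma) / (1 - gamma - lam)
    return threshold + np.log(s / (1 - s)) / slope

true_threshold = 2.0
grid = np.linspace(-5, 10, 301)    # candidate thresholds for the model M
log_post = np.zeros_like(grid)     # uniform prior over the grid

for trial in range(200):
    x = grid[np.argmax(log_post)]                 # place trial at the MAP point
    r = rng.random() < pf(x, true_threshold)      # simulated observer response
    p_model = pf(x, grid)                         # likelihood per candidate
    log_post += np.log(p_model if r else 1 - p_model)

post = np.exp(log_post - log_post.max())
post /= post.sum()
est = float((grid * post).sum())                  # posterior-mean threshold
print("estimated 80%-correct point:", target_point(0.8, est))
```

Repeating such runs with M deliberately mismatched to Psi is precisely the manipulation used to measure bias at each target level.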

Relevance: 30.00%

Abstract:

Fixed-step-size (FSS) and Bayesian staircases are widely used methods for estimating sensory thresholds in 2AFC tasks, although a direct comparison of the two types of procedure under identical conditions had not previously been reported. A simulation study and an empirical test were conducted to compare the performance of optimized Bayesian staircases with that of four optimized variants of the FSS staircase differing in their up-down rule. The ultimate goal was to determine whether FSS or Bayesian staircases are the better choice in experimental psychophysics. The comparison considered the properties of the estimates (i.e., bias and standard errors) in relation to their cost (i.e., the number of trials to completion). The simulation study showed that the mean estimates of Bayesian and FSS staircases are dependable when sufficient trials are given and that, in both cases, the standard deviation (SD) of the estimates decreases with the number of trials, although the SD of Bayesian estimates is always lower than that of FSS estimates (thus, Bayesian staircases are more efficient). The empirical test did not support these conclusions, as (1) neither procedure yielded estimates converging on some value, (2) standard deviations did not follow the expected pattern of decrease with the number of trials, and (3) both procedures appeared to be equally efficient. Potential factors explaining the discrepancies between simulation and empirical results are discussed and, all things considered, a sensible recommendation is for psychophysicists to run no fewer than 18 and no more than 30 reversals of an FSS staircase implementing the 1-up/3-down rule.
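
For concreteness, here is a minimal sketch (toy psychometric function and step size; not the paper's implementation) of the recommended 1-up/3-down FSS staircase: the level moves up after any error and down after three consecutive correct responses, so it converges near the 79.4%-correct point, and the threshold is estimated from the reversal levels.

```python
import numpy as np

rng = np.random.default_rng(1)

def p_correct(level, threshold=0.0, slope=1.5, gamma=0.5):
    """2AFC psychometric function for the simulated observer."""
    return gamma + (1 - gamma) / (1 + np.exp(-slope * (level - threshold)))

level, step = 5.0, 0.5
run, last_dir, reversals = 0, 0, []

while len(reversals) < 18:                    # the recommended minimum
    correct = rng.random() < p_correct(level)
    if correct:
        run += 1
        move = -1 if run == 3 else 0          # 3-down: step down after 3 correct
        if run == 3:
            run = 0
    else:
        run, move = 0, +1                     # 1-up: step up after any error
    if move:
        if last_dir and move != last_dir:
            reversals.append(level)           # direction change = reversal
        last_dir = move
        level += move * step

print("threshold estimate:", np.mean(reversals[4:]))  # drop early reversals
```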

Relevance: 30.00%

Abstract:

Constant technological advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, can be continuous numbers or counts. With the advancement of high-throughput technology, such data have become unprecedentedly abundant. Therefore, efficient statistical approaches are crucial in this big data era.

Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian-distributed multivariate vector; with this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, which assumes that the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions of these statistical methods to address challenges in big data. The new methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimating population structure from genotype data.
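
The low-rank structure a factor model yields can be seen directly. Below is a minimal sketch (simulated data; the dimensions, noise level, and number of factors are assumptions) showing that the fitted covariance decomposes into a rank-k loading term plus a diagonal noise term.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
n, p, k = 500, 20, 3
W_true = rng.normal(size=(p, k))                       # true loadings
X = rng.normal(size=(n, k)) @ W_true.T + 0.3 * rng.normal(size=(n, p))

fa = FactorAnalysis(n_components=k).fit(X)
cov_lowrank = fa.get_covariance()                      # W W^T + diag(noise)
# Subtracting the diagonal noise leaves the rank-k part (prints 3 here):
print(np.linalg.matrix_rank(cov_lowrank - np.diag(fa.noise_variance_)))
```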

Relevance: 30.00%

Abstract:

Testing for differences within data sets is an important issue across various applications. Our work is primarily motivated by the analysis of microbiome composition, which has become increasingly important with the rise of DNA sequencing. We first review classical frequentist tests that are commonly used to tackle such problems. We then propose a Bayesian Dirichlet-multinomial framework for modeling metagenomic data and for testing underlying differences between samples. The parametric Dirichlet-multinomial model uses an intuitive hierarchical structure that allows flexibility in characterizing both the within-group variation and the cross-group difference, and it provides readily interpretable parameters. A computational method for evaluating the marginal likelihoods under the null and alternative hypotheses is also given. Through simulations, we show that our Bayesian model performs competitively against its frequentist counterparts. We illustrate the method by analyzing metagenomic data from the Human Microbiome Project.
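
The marginal likelihoods at the heart of such a test have a closed form for the Dirichlet-multinomial. Here is a minimal sketch (symmetric Dirichlet priors and toy counts are assumptions; the paper's actual models and hypotheses may differ) comparing a null that pools two groups against an alternative that models them separately.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal(counts, alpha=1.0):
    """log Dirichlet-multinomial evidence, up to the multinomial coefficient
    (which cancels in the Bayes factor below)."""
    a = np.full(len(counts), alpha)
    return (gammaln(a.sum()) - gammaln(a.sum() + counts.sum())
            + np.sum(gammaln(a + counts) - gammaln(a)))

g1 = np.array([30, 5, 15, 50])     # taxon counts, group 1 (toy data)
g2 = np.array([10, 40, 20, 30])    # taxon counts, group 2 (toy data)

log_bf = (log_marginal(g1) + log_marginal(g2)   # alternative: two compositions
          - log_marginal(g1 + g2))              # null: one shared composition
print("log Bayes factor (alternative vs null):", round(log_bf, 2))
```

A large positive log Bayes factor favors a cross-group difference.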

Relevance: 30.00%

Abstract:

This dissertation contributes to the rapidly growing body of empirical research in operations management. It contains two essays tackling two different sets of operations management questions, each motivated by and built on a field data set from a very different industry: air cargo logistics and retailing.

The first essay, based on a data set obtained from a world-leading third-party logistics company, develops a novel and general Bayesian hierarchical learning framework for estimating customers' spillover learning, that is, customers' learning about the quality of a service (or product) from their previous experiences with similar yet not identical services. We then apply the model to the data set to study how customers' experiences shipping on a particular route affect their future decisions about shipping not only on that route but also on other routes serviced by the same logistics company. We find that customers indeed borrow experiences from similar but different services to update the quality beliefs that determine their future purchase decisions, and that these service-quality beliefs have a significant impact on those decisions. Moreover, customers are risk averse: they are averse not only to experience variability but also to belief uncertainty (i.e., uncertainty about their own beliefs), and belief uncertainty affects customers' utilities more than experience variability does.
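
A minimal sketch of the spillover mechanism (a stylized normal-normal model with made-up variances, not the essay's estimated specification): route-level qualities share a company-level component, so an experience on one route shifts the belief about an untried route.

```python
import numpy as np

mu0, tau2 = 0.0, 1.0   # prior mean and variance of the company-level quality
sigma2_r = 0.5         # variance of route-level deviations around that quality
sigma2_e = 0.25        # noise variance of a single service experience

def update(mu, var, outcome):
    """Conjugate update of the shared quality belief after one experience.
    Treats experiences as conditionally independent (a simplification)."""
    obs_var = sigma2_r + sigma2_e       # route deviation + experience noise
    k = var / (var + obs_var)           # Kalman-style gain
    return mu + k * (outcome - mu), (1 - k) * var

mu, var = mu0, tau2
for outcome in [1.2, 0.8, 1.0]:         # experiences on route A (toy data)
    mu, var = update(mu, var, outcome)

# The belief about an untried route B borrows strength from the shared part:
print(f"route-B belief: mean = {mu:.2f}, variance = {var + sigma2_r:.2f}")
```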

The second essay is based on a data set obtained from a large Chinese supermarket chain, containing sales as well as wholesale and retail prices of unpackaged perishable vegetables. Recognizing the special characteristics of this product category, we develop a structural estimation model in a discrete-continuous choice framework. Building on this framework, we then study an optimization model for joint pricing and inventory management of multiple products, which aims to improve the company's profit from direct sales while reducing food waste and thereby improving social welfare.

Collectively, the studies in this dissertation provide useful modeling ideas, decision tools, insights, and guidance for firms to utilize vast sales and operations data to devise more effective business strategies.

Relevance: 30.00%

Abstract:

Surveys can collect important data that inform policy decisions and drive social science research. Large government surveys collect information from the U.S. population on a wide range of topics, including demographics, education, employment, and lifestyle. Analysis of survey data presents unique challenges; in particular, one needs to account for missing data, for complex sampling designs, and for measurement error. Conceptually, a survey organization could spend substantial resources getting high-quality responses from a simple random sample, resulting in survey data that are easy to analyze. However, this scenario often is not realistic. To address these practical issues, survey organizations can leverage information available from other sources of data. For example, in longitudinal studies that suffer from attrition, they can use information from refreshment samples to correct for potential attrition bias. They can use information from known marginal distributions or from the survey design to improve inferences. And they can use information from gold standard sources to correct for measurement error.

This thesis presents novel approaches to combining information from multiple sources that address the three problems described above.

The first method addresses nonignorable unit nonresponse and attrition in a panel survey with a refreshment sample. Panel surveys typically suffer from attrition, which can lead to biased inference when the analysis is based only on cases that complete all waves of the panel. Unfortunately, the panel data alone cannot inform the extent of the bias due to attrition, so analysts must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst's ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their attrition-corrected inferences are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study.

The second method incorporates informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records. Posterior inferences can be obtained via typical MCMC algorithms for latent class models, tailored to deal efficiently with the missing values in the concatenated data.

We illustrate the approach using a variety of simulations based on data from the American Community Survey, including an example of how augmented records can be used to fit latent class models to data from stratified samples.
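
The augmentation step itself is simple to picture. Here is a minimal sketch (toy variables and an assumed prior margin, not the thesis's data or code) of appending synthetic records whose margin matches a prior belief while all other variables stay missing:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
original = pd.DataFrame({
    "educ": rng.integers(0, 3, size=200),    # observed categorical variables
    "emp":  rng.integers(0, 2, size=200),
})

prior_margin = {0: 0.2, 1: 0.5, 2: 0.3}      # prior belief about P(educ)
n_aug = 100                                   # more records = stronger prior

synthetic = pd.DataFrame({
    "educ": rng.choice(list(prior_margin), size=n_aug,
                       p=list(prior_margin.values())),
    "emp":  np.nan,                           # remaining variables left missing
})
augmented = pd.concat([original, synthetic], ignore_index=True)
print(augmented["educ"].tail(n_aug).value_counts(normalize=True))
```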

The third method leverages the information from a gold standard survey to model reporting error. Survey data are subject to reporting error when respondents misunderstand the question or accidentally select the wrong response. Sometimes survey respondents knowingly select the wrong response, for example, by reporting a higher level of education than they actually have attained. We present an approach that allows an analyst to model reporting error by incorporating information from a gold standard survey. The analyst can specify various reporting error models and assess how sensitive their conclusions are to different assumptions about the reporting error process. We illustrate the approach using simulations based on data from the 1993 National Survey of College Graduates. We use the method to impute error-corrected educational attainments in the 2010 American Community Survey using the 2010 National Survey of College Graduates as the gold standard survey.

Relevance: 30.00%

Abstract:

Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods [korhonen2exact, nie2014advances] tackle the problem by using k-trees to learn the optimal Bayesian network with tree-width up to k. Finding the best k-tree, however, is computationally intractable. In this paper, we propose a sampling method that efficiently finds representative k-trees by introducing an informative score function to characterize the quality of a k-tree. To further improve the quality of the k-trees, we propose a probabilistic hill climbing approach that locally refines the sampled k-trees. The proposed algorithm can efficiently learn a high-quality Bayesian network with tree-width at most k. Experimental results demonstrate that our approach is more computationally efficient than the exact methods with comparable accuracy, and outperforms most existing approximate methods.
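
The k-tree construction underlying such samplers is easy to state. Below is a minimal sketch (uniform random growth; the paper samples with an informative score function instead) that builds a random k-tree, and therefore a graph of tree-width exactly k, by repeatedly attaching a new vertex to an existing k-clique:

```python
import itertools
import random

def sample_ktree(n_vertices, k, seed=0):
    """Grow a random k-tree: start from a (k+1)-clique, then attach each new
    vertex to a uniformly chosen existing k-clique."""
    rng = random.Random(seed)
    edges = set(itertools.combinations(range(k + 1), 2))
    k_cliques = list(itertools.combinations(range(k + 1), k))
    for v in range(k + 1, n_vertices):
        clique = rng.choice(k_cliques)
        edges.update((u, v) for u in clique)
        # attaching v creates k new k-cliques (swap each member for v)
        k_cliques.extend(tuple(sorted((set(clique) - {u}) | {v}))
                         for u in clique)
    return edges

# A k-tree on n vertices always has k*n - k*(k+1)/2 edges: 24 here.
print(len(sample_ktree(10, 3)))
```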

Relevance: 30.00%

Abstract:

An important problem in machine learning is determining the complexity of the model to be learned. Too much complexity leads to overfitting, i.e., finding structure that does not actually exist in the data, while too little complexity leads to underfitting, i.e., the model is not expressive enough to capture all of the structure present in the data. For some probabilistic models, model complexity takes the form of one or more hidden variables whose role is to explain the generative process of the data. Various approaches exist for identifying the appropriate number of hidden variables in a model. This thesis focuses on Bayesian nonparametric methods for determining both the number of hidden variables to use and their dimensionality. The popularization of Bayesian nonparametric statistics within the machine learning community is fairly recent. Their main appeal is that they provide highly flexible models whose complexity scales with the amount of available data. In recent years, research on Bayesian nonparametric learning methods has focused on three main aspects: the construction of new models, the development of inference algorithms, and applications. This thesis presents our contributions to these three research topics in the context of learning latent variable models. First, we introduce the Pitman-Yor process mixture of Gaussians, a model for learning infinite mixtures of Gaussians. We also present an inference algorithm for discovering the model's hidden components, which we evaluate on two real-world robotics applications. Our results show that the proposed approach outperforms classical learning approaches in both performance and flexibility. Second, we propose the extended cascading Indian buffet process, a model serving as a prior probability distribution over the space of directed acyclic graphs. In the context of Bayesian networks, this prior makes it possible to identify both the presence of hidden variables and the network structure among them. A Markov chain Monte Carlo inference algorithm is used for evaluation on structure identification and density estimation problems. Finally, we propose the Indian chefs process, a model more general than the extended cascading Indian buffet process for learning graphs and orders. The advantage of the new model is that it allows connections between observable variables and takes the order of the variables into account. We present a reversible-jump Markov chain Monte Carlo inference algorithm for jointly learning graphs and orders. Evaluation is carried out on density estimation and independence testing problems. This model is the first Bayesian nonparametric model capable of learning Bayesian networks with a completely arbitrary structure.
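
The "complexity scales with the data" property can be seen in a few lines. Here is a minimal sketch (assumed discount and concentration values) of the seating scheme behind a Pitman-Yor mixture, where the number of occupied components grows with the number of observations:

```python
import numpy as np

rng = np.random.default_rng(4)
d, alpha = 0.5, 1.0                 # discount and concentration parameters
table_counts = []                   # customers per table (component sizes)

for n in range(1000):
    K = len(table_counts)
    probs = np.array([c - d for c in table_counts] + [alpha + d * K])
    probs /= n + alpha              # Pitman-Yor seating probabilities
    choice = rng.choice(K + 1, p=probs)
    if choice == K:
        table_counts.append(1)      # open a new component
    else:
        table_counts[choice] += 1

print("components used for 1000 points:", len(table_counts))
```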

Relevance: 30.00%

Abstract:

When something unfamiliar emerges, or when something familiar does something unexpected, people need to make sense of what is going on in order to act. Social representations theory suggests how individuals and society make sense of the unfamiliar, and hence how the resulting social representations (SRs) cognitively, emotionally, and actively orient people and enable communication. SRs are social constructions that emerge through individual and collective engagement with media and through everyday conversations. Recent developments in text analysis techniques, in particular topic modeling, provide a potentially powerful analytical method for examining the structure and content of SRs using large samples of narrative text. In this paper I describe the methods and results of applying topic modeling to 660 micronarratives collected from Australian academics/researchers, government employees, and members of the public in 2010-2011. The narrative fragments focused on adaptation to climate change (CC) and hence provide an example of Australian society making sense of an emerging and conflict-ridden phenomenon. The results of the topic modeling reflect elements of SRs of adaptation to CC that are consistent with findings in the literature, as well as being reasonably robust predictors of classes of action in response to CC. Bayesian network (BN) modeling was used to identify relationships among the topics (SR elements), and in particular among topics, sentiment, and action. Finally, the resulting model and the topic modeling results are used to highlight differences in the salience of SR elements among social groups. Linking topic modeling and BN modeling offers a new and encouraging approach for ongoing research on SRs.
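
As an illustration of the pipeline's first stage, here is a minimal sketch (a four-fragment toy corpus standing in for the 660 micronarratives; all settings are assumptions) that fits LDA and extracts per-narrative topic mixtures, the kind of output a BN could then relate to sentiment and action:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

fragments = [
    "water restrictions changed how we garden in the drought",
    "insurance costs are rising with flood and fire risk",
    "council planning rules should require resilient housing",
    "my farm switched crops because the rainfall has shifted",
]  # stand-ins for the 660 micronarratives

X = CountVectorizer(stop_words="english").fit_transform(fragments)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
doc_topics = lda.transform(X)      # per-narrative topic mixtures: the inputs
print(doc_topics.round(2))         # a Bayesian network could then relate
```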

Relevance: 30.00%

Abstract:

The hierarchical forest planning process currently in place on public lands risks failing at two levels. At the upper level, the process does not provide sufficient evidence that the current harvest level is sustainable. At the lower level, the process does not support realizing the full value-creation potential of the forest resource, sometimes needlessly constraining short-term harvest planning. These failures are attributable to certain assumptions implicit in the allowable-cut optimization model, which may explain why this problem is not well documented in the literature. We use agency theory to model the hierarchical forest planning process on public lands. We develop a two-stage iterative simulation framework to estimate the long-term effect of the interaction between the government and the fibre consumer, allowing us to establish conditions that can lead to stockouts. We then propose an improved formulation of the allowable-cut optimization model. The classical formulation (i.e., maximization of sustained fibre yield) does not consider that the industrial fibre consumer seeks to maximize profit; instead it assumes that the entire fibre supply is consumed in every period, regardless of its value-creation potential. We extend the classical formulation to anticipate the fibre consumer's behaviour, increasing the probability that the fibre supply is fully consumed and thereby restoring the validity of the total-consumption assumption implicit in the optimization model. We model the principal-agent relationship between the government and industry with a bilevel formulation of the optimization model, in which the upper level represents the allowable-cut determination process (the government's responsibility) and the lower level represents the fibre consumption process (industry's responsibility). We show that the bilevel formulation can mitigate the risk of stockouts, improving the credibility of the hierarchical forest planning process. Together, the bilevel allowable-cut optimization model and the methodology we developed to solve it to optimality represent an alternative to the methods currently in use. Our bilevel model and the iterative simulation framework are a step forward in value-driven forest planning technology. Explicitly integrating industrial objectives and constraints into the forest planning process, starting at the determination of the allowable cut, should foster closer collaboration between government and industry and make it possible to exploit the full value-creation potential of the forest resource.
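
To make the two-level interaction concrete, here is a minimal sketch (stylized toy dynamics, not the thesis's models): an upper level sets the allowable cut assuming full consumption, while a profit-maximizing lower level consumes only what is worth processing, so the realized state drifts away from the plan.

```python
def allowable_cut(stock, growth_rate=0.03):
    """Upper level (government): sustained-yield rule assuming full consumption."""
    return growth_rate * stock

def consumed(offered, price=1.0, unit_cost=0.8, capacity=2.5):
    """Lower level (industry): take only profitable volume, up to mill capacity."""
    return min(offered, capacity) if price > unit_cost else 0.0

stock, growth_rate = 100.0, 0.03
for year in range(50):
    offered = allowable_cut(stock, growth_rate)
    taken = consumed(offered)          # may be less than the planned cut
    stock += growth_rate * stock - taken

print(f"stock after 50 years: {stock:.1f} (the plan assumed full consumption)")
```

In the bilevel formulation, the upper level would anticipate consumed() rather than assuming the offered volume is always taken.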

Relevance: 30.00%

Abstract:

Habitat fragmentation and the consequent loss of connectivity between populations can reduce individual interchange and gene flow, increasing the chances of inbreeding and the risk of local extinction. Landscape genetics provides more and better tools to identify genetic barriers. To our knowledge, no comparison of the consistency of these methods has been made with observed data on a species with low dispersal ability. The aim of this study is to examine the consistency of the results of five methods for detecting barriers to gene flow in a Mediterranean pine vole (Microtus duodecimcostatus) population: F-statistics estimation, non-Bayesian clustering, Bayesian clustering, boundary detection, and simple/partial Mantel tests. All methods were consistent in identifying the stream as a non-barrier. However, the methods did not agree on the role of the highway as a genetic barrier. Fst, the Bayesian clustering assignment test, and the partial Mantel test identified the highway as a filter to individual interchange. The Mantel tests were the most sensitive method. The boundary detection method (Monmonier's algorithm) and the non-Bayesian approaches did not detect any genetic differentiation of the pine vole due to the highway. Based on our findings, we recommend that genetic barrier detection in low-dispersal populations be carried out with multiple methods: Mantel tests and Bayesian clustering approaches, because they are more sensitive in these scenarios, together with boundary detection methods, which aim to detect drastic changes in a variable of interest between neighbouring individuals. Although simulation studies highlight the weaknesses and strengths of each method and the factors that drive particular results, tests with real data are needed to improve the effectiveness of genetic barrier detection.
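
Since the Mantel tests proved the most sensitive, here is a minimal sketch (toy coordinates and distances; a real analysis would use genetic distances such as pairwise Fst or relatedness) of a simple Mantel test with a permutation p-value:

```python
import numpy as np

rng = np.random.default_rng(5)

def mantel(D1, D2, n_perm=999):
    """Correlation between distance matrices, significance by permutation."""
    iu = np.triu_indices_from(D1, k=1)
    r_obs = np.corrcoef(D1[iu], D2[iu])[0, 1]
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(D1.shape[0])        # relabel individuals
        r = np.corrcoef(D1[perm][:, perm][iu], D2[iu])[0, 1]
        count += r >= r_obs
    return r_obs, (count + 1) / (n_perm + 1)

n = 20
coords = rng.random((n, 2))
geo = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
gen = geo + 0.1 * rng.random((n, n))               # toy genetic distances
gen = (gen + gen.T) / 2
np.fill_diagonal(gen, 0)

r, p = mantel(gen, geo)
print(f"Mantel r = {r:.3f}, one-sided p = {p:.3f}")
```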

Relevance: 30.00%

Abstract:

The recent advent of new technologies has led to huge amounts of genomic data. With these data come new opportunities to understand the biological cellular processes underlying hidden regulation mechanisms and to identify disease-related biomarkers for informative diagnostics. However, extracting biological insights from immense amounts of genomic data is a challenging task, so effective and efficient computational techniques are needed to analyze and interpret them. In this thesis, novel computational methods are proposed to address these challenges: a Bayesian mixture model, an extended Bayesian mixture model, and an Eigen-brain approach. The Bayesian mixture framework integrates the Bayesian network with the Gaussian mixture model. Based on the proposed framework, in conjunction with K-means clustering and principal component analysis (PCA), biological insights are derived, such as context-specific/dependent relationships and nested structures within microarray data in which biological replicates are encapsulated. The Bayesian mixture framework is then extended to explore posterior distributions over network space by incorporating a Markov chain Monte Carlo (MCMC) model. The extended Bayesian mixture model summarizes the sampled network structures by extracting biologically meaningful features. Finally, an Eigen-brain approach is proposed to analyze in situ hybridization data for the identification of cell-type-specific genes, which can be useful for informative blood diagnostics. Computational results with region-based clustering reveal critical evidence of consistency with brain anatomical structure.
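
To illustrate the mixture-model building block, here is a minimal sketch (simulated expression-like data; not the thesis's integrated Bayesian network model) in which a variational Dirichlet-process prior prunes unneeded components:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, (100, 5)),
               rng.normal(4, 1, (100, 5))])     # two simulated expression groups

bgm = BayesianGaussianMixture(
    n_components=10,                            # upper bound; extras are pruned
    weight_concentration_prior_type="dirichlet_process",
).fit(X)
print("active components:", np.sum(bgm.weights_ > 0.01))
```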

Relevance: 30.00%

Abstract:

Rigid adherence to pre-specified thresholds and static graphical representations can lead to incorrect decisions about merging clusters. As an alternative to existing automated or semi-automated methods, we developed a visual analytics approach for performing hierarchical clustering analysis of short time-series gene expression data. Dynamic sliders control parameters such as the similarity threshold at which clusters are merged and the level of relative intra-cluster distinctiveness, which can be used to identify "weak edges" within clusters. An expert user can drill down to further explore the dendrogram and detect nested clusters and outliers, using the sliders and pointing and clicking on the representation to cut the branches of the tree at multiple heights. A prototype of this tool has been developed in collaboration with a small group of biologists for analysing their own datasets. Initial feedback on the tool has been positive.
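
The threshold slider's effect corresponds to cutting one linkage tree at different heights. A minimal sketch (synthetic profiles; the method and metric are assumptions) of that interaction:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
profiles = np.vstack([rng.normal(m, 0.3, (20, 8))     # 8-point expression
                      for m in (0, 1, 2)])            # profiles, 3 groups

Z = linkage(profiles, method="average", metric="correlation")
for t in (0.1, 0.5, 1.0):                             # slider positions
    labels = fcluster(Z, t=t, criterion="distance")   # cut the same tree
    print(f"threshold {t}: {labels.max()} clusters")
```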

Relevance: 30.00%

Abstract:

A flexible and multipurpose bio-inspired hierarchical model for analyzing musical timbre is presented in this paper. Inspired by findings in the fields of neuroscience, computational neuroscience, and psychoacoustics, not only does the model extract spectral and temporal characteristics of a signal, but it also analyzes amplitude modulations on different timescales. It uses a cochlear filter bank to resolve the spectral components of a sound, lateral inhibition to enhance spectral resolution, and a modulation filter bank to extract the global temporal envelope and roughness of the sound from amplitude modulations. The model was evaluated in three applications. First, it was used to simulate subjective data from two roughness experiments. Second, it was used for musical instrument classification using the k-NN algorithm and a Bayesian network. Third, it was applied to find the features that characterize sounds whose timbres were labeled in an audiovisual experiment. The successful application of the proposed model in these diverse tasks revealed its potential in capturing timbral information.
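
As a taste of the envelope-extraction stage, here is a minimal sketch (a synthetic amplitude-modulated tone; the cutoff and filter order are assumptions, not the model's cochlear or modulation filter banks) that recovers the temporal envelope of a sound via the Hilbert transform:

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
carrier = np.sin(2 * np.pi * 440 * t)          # a 440 Hz tone
am = 1 + 0.5 * np.sin(2 * np.pi * 8 * t)       # 8 Hz amplitude modulation
sound = am * carrier

envelope = np.abs(hilbert(sound))              # instantaneous amplitude
b, a = butter(4, 50 / (fs / 2))                # keep modulations below 50 Hz
global_envelope = filtfilt(b, a, envelope)
print("envelope peak:", round(float(global_envelope.max()), 2))
```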