984 results for Bayesian modeling
Abstract:
This paper proposes and demonstrates an approach, Skilloscopy, to the assessment of decision makers. In an increasingly sophisticated, connected and information-rich world, decision making is becoming both more important and more difficult. At the same time, modelling decision-making on computers is becoming more feasible and of interest, partly because the information input to those decisions is increasingly on record. The aims of Skilloscopy are to rate and rank decision makers in a domain relative to each other: the aims do not include an analysis of why a decision is wrong or suboptimal, nor the modelling of the underlying cognitive process of making the decisions. In the proposed method a decision-maker is characterised by a probability distribution of their competence in choosing among quantifiable alternatives. This probability distribution is derived by classic Bayesian inference from a combination of prior belief and the evidence of the decisions. Thus, decision-makers’ skills may be better compared, rated and ranked. The proposed method is applied and evaluated in the game domain of Chess. A large set of games by players across a broad range of the World Chess Federation (FIDE) Elo ratings has been used to infer the distribution of players’ rating directly from the moves they play rather than from game outcomes. Demonstration applications address questions frequently asked by the Chess community regarding the stability of the Elo rating scale, the comparison of players of different eras and/or leagues, and controversial incidents possibly involving fraud. The method of Skilloscopy may be applied in any decision domain where the value of the decision-options can be quantified.
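The Bayesian step described in this abstract (a prior belief about competence updated by the evidence of observed decisions) can be illustrated with a minimal grid-based sketch. The abstract does not specify the choice model, so the softmax likelihood, the function name `posterior_over_skill`, the skill scale and the example values below are all assumptions made for illustration, not the paper's method.

```python
import math

def posterior_over_skill(decisions, skill_levels, prior=None):
    """Grid-based Bayesian update of a decision maker's competence.

    decisions: list of (option_values, chosen_index) pairs.
    skill_levels: candidate competence values (hypothetical scale).
    The softmax choice model used here is an illustrative assumption.
    """
    n = len(skill_levels)
    prior = prior or [1.0 / n] * n  # uniform prior unless one is supplied
    log_post = [math.log(p) for p in prior]
    for values, chosen in decisions:
        best = max(values)
        for k, c in enumerate(skill_levels):
            # Better-valued options are more likely to be chosen,
            # and more sharply so for higher competence c.
            weights = [math.exp(c * (v - best)) for v in values]
            log_post[k] += math.log(weights[chosen] / sum(weights))
    # Normalise in a numerically stable way.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical evidence: 20 decisions, each time picking the best option.
decisions = [([0.0, -0.5, -2.0], 0)] * 20
levels = [0.5, 1.0, 2.0, 4.0]
post = posterior_over_skill(decisions, levels)
```

In this sketch the posterior mass shifts toward the highest skill level as consistently good choices accumulate; two decision makers could then be compared through their posterior distributions rather than through outcomes.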
Abstract:
Sentiment analysis has long focused on binary classification of text as either positive or negative. There has been little work on mapping sentiments or emotions into multiple dimensions. This paper studies a Bayesian modeling approach to multi-class sentiment classification and multi-dimensional sentiment distribution prediction. It proposes effective mechanisms to incorporate supervised information, such as labeled feature constraints and document-level sentiment distributions derived from the training data, into model learning. We have evaluated our approach on datasets collected from the confession section of the Experience Project website, where people share their life experiences and personal stories. Our results show that using the latent representation of the training documents derived from our approach as features to build a maximum entropy classifier outperforms other approaches on multi-class sentiment classification. In the more difficult task of multi-dimensional sentiment distribution prediction, our approach gives superior performance compared to a few competitive baselines. © 2012 ACM.
Abstract:
Surveys can collect important data that inform policy decisions and drive social science research. Large government surveys collect information from the U.S. population on a wide range of topics, including demographics, education, employment, and lifestyle. Analysis of survey data presents unique challenges. In particular, one needs to account for missing data, for complex sampling designs, and for measurement error. Conceptually, a survey organization could spend lots of resources getting high-quality responses from a simple random sample, resulting in survey data that are easy to analyze. However, this scenario often is not realistic. To address these practical issues, survey organizations can leverage the information available from other sources of data. For example, in longitudinal studies that suffer from attrition, they can use the information from refreshment samples to correct for potential attrition bias. They can use information from known marginal distributions or survey design to improve inferences. They can use information from gold standard sources to correct for measurement error.
This thesis presents novel approaches to combining information from multiple sources that address the three problems described above.
The first method addresses nonignorable unit nonresponse and attrition in a panel survey with a refreshment sample. Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. Unfortunately, the panel data alone cannot inform the extent of the bias due to attrition, so analysts must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst's ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences, corrected for panel attrition, are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study.
The second method incorporates informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records. Posterior inferences can be obtained via typical MCMC algorithms for latent class models, tailored to deal efficiently with the missing values in the concatenated data.
We illustrate the approach using a variety of simulations based on data from the American Community Survey, including an example of how augmented records can be used to fit latent class models to data from stratified samples.
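The record-augmentation idea of the second method can be sketched as follows. This is only an illustration of the data-preparation step under stated assumptions: the function name `augment_with_prior_margin`, the list-of-dicts data layout, the use of `None` for missing values and the toy survey are all hypothetical, and the latent class model and MCMC fitting described in the abstract are not shown.

```python
def augment_with_prior_margin(records, column, prior_probs, n_augmented):
    """Append synthetic records whose margin on `column` follows prior_probs.

    records: list of dicts sharing the same keys (the observed data).
    prior_probs: dict mapping each level of `column` to its prior probability.
    n_augmented: number of synthetic records; more records means a
        stronger (less uncertain) prior on the margin.
    All other variables are left missing (None) in the synthetic records.
    """
    columns = list(records[0].keys())
    synthetic = []
    for level, prob in prior_probs.items():
        # Rounding may shift the total slightly when counts are not whole.
        count = round(prob * n_augmented)
        for _ in range(count):
            row = {name: None for name in columns}
            row[column] = level
            synthetic.append(row)
    return records + synthetic

# Hypothetical toy survey with a prior belief of a 50/50 split on "sex".
survey = [
    {"sex": "F", "employed": "yes"},
    {"sex": "M", "employed": "no"},
    {"sex": "F", "employed": "yes"},
]
augmented = augment_with_prior_margin(survey, "sex", {"F": 0.5, "M": 0.5}, 10)
```

The concatenated data would then be passed to a sampler that treats the `None` entries as missing values, as the abstract describes.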
The third method leverages the information from a gold standard survey to model reporting error. Survey data are subject to reporting error when respondents misunderstand the question or accidentally select the wrong response. Sometimes survey respondents knowingly select the wrong response, for example, by reporting a higher level of education than they actually have attained. We present an approach that allows an analyst to model reporting error by incorporating information from a gold standard survey. The analyst can specify various reporting error models and assess how sensitive their conclusions are to different assumptions about the reporting error process. We illustrate the approach using simulations based on data from the 1993 National Survey of College Graduates. We use the method to impute error-corrected educational attainments in the 2010 American Community Survey using the 2010 National Survey of College Graduates as the gold standard survey.
Using monotone splines to condense mortality tables in a Bayesian context
Abstract:
In this thesis, we seek to model two-way tables that are monotone in rows and/or columns, with a view to a possible application to mortality tables. We adopt a nonparametric Bayesian approach and represent the functional form of the data with two-dimensional splines. The objective is to condense a mortality table, that is, to reduce the storage space of the table while minimizing the loss of information. We also wish to study the time needed to reconstruct the table. The approximation must retain the same properties as the reference table, in particular the monotonicity of the data. We work with a basis of monotone spline functions so as to impose monotonicity on the model more easily. Indeed, the flexible structure of splines and their easily manipulated derivatives make it easier to impose constraints on the desired model. After reviewing the one-dimensional modeling of monotone functions, we generalize the approach to the two-dimensional case. We describe how the monotonicity constraints are integrated into the prior model under the hierarchical Bayesian approach. Next, we indicate how to obtain a posterior estimator using Markov chain Monte Carlo methods. Finally, we study the behavior of our estimator by modeling a table of the normal distribution as well as a table of Student's t distribution. The estimation of our data of interest, namely the mortality table, follows in order to assess the improvement in their accessibility.
Abstract:
Predictability is related to the uncertainty in the outcome of future events during the evolution of the state of a system. Cluster weighted modeling (CWM) is interpreted as a tool to detect such uncertainty and is used in spatially distributed systems. As such, the simple prediction algorithm in conjunction with CWM forms a powerful set of methods to relate predictability and dimension.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Trichinellosis is a zoonotic disease that is caused by the nematode Trichinella spp. Both European Union regulations and guidelines from the World Organization for Animal Health foresee the possibility of conducting serological surveillance for Trichinella spp. A newly developed commercial enzyme-linked immunosorbent assay (ELISA) was evaluated against 2 existing diagnostic techniques: an in-house ELISA and an in-house Western blot. A total of 875 Trichinella larva-negative samples of pigs and 93 Trichinella larva-positive samples of both naturally and experimentally infected pigs were included in the study. Bayesian modeling techniques were used to correct for the absence of a perfect reference test. The sensitivity and specificity of the commercial ELISA were 97.1-97.8% and 99.5-99.8%, respectively. Sensitivity analysis demonstrated high stability in the models. In a serological surveillance system, ELISA-positive samples should be tested by a confirmatory test. The Western blot is a suitable test for this purpose. Based on the results of the models, the sensitivity and specificity of a test protocol using both ELISA and Western blot were 95.9% and 99.9%, respectively. The high sensitivity and specificity were achieved with a lower limit of detection than that of the routine artificial digestion test, suggesting that serological surveillance is a valuable alternative in surveillance for Trichinella spp. in pig production.
Abstract:
Trichinellosis is a zoonotic disease in humans caused by Trichinella spp. According to international regulations and guidelines, serological surveillance can be used to demonstrate the absence of Trichinella spp. in a defined domestic pig population. Most enzyme-linked immunosorbent assay (ELISA) tests presently available do not yield 100% specificity, and therefore, a complementary test is needed to confirm the diagnosis of any initial ELISA seropositivity. The goal of the present study was to evaluate the sensitivity and specificity of a Western Blot assay based on somatic Trichinella spiralis muscle stage (L1) antigen using Bayesian modeling techniques. A total of 295 meat juice and serum samples from pigs negative for Trichinella larvae by artificial digestion, including 74 potentially cross-reactive sera of pigs with other nematode infections, and 93 meat juice samples from pigs infected with Trichinella larvae were included in the study. The diagnostic sensitivity and specificity of the Western Blot ranged from 95.8% to 96.0% and from 99.5% to 99.6%, respectively. A sensitivity analysis showed that the model outcomes were hardly influenced by changes in the prior distributions, providing high confidence in the outcomes of the models. This validation study demonstrated that the Western Blot is a suitable method to confirm samples that reacted positively in an initial ELISA.
Abstract:
Mental imagery and perception are thought to rely on similar neural circuits, and many recent behavioral studies have attempted to demonstrate interactions between actual physical stimulation and sensory imagery in the corresponding sensory modality. However, there has been a lack of theoretical understanding of the nature of these interactions, and both interferential and facilitatory effects have been found. Facilitatory effects appear strikingly similar to those that arise due to experimental manipulations of expectation. Using a self-motion discrimination task, we try to disentangle the effects of mental imagery from those of expectation by using a hierarchical drift diffusion model to investigate both choice data and response times. Manipulations of expectation are reasonably well understood in terms of their selective influence on parameters of the drift diffusion model, and in this study, we make the first attempt to similarly characterize the effects of mental imagery. We investigate mental imagery within the computational framework of control theory and state estimation.
• Mental imagery and perception are thought to rely on similar neural circuits; however, on more theoretical grounds, imagery seems to be closely related to the output of forward models (sensory predictions).
• We reanalyzed data from a study of imagined self-motion.
• Bayesian modeling of response times may allow us to disentangle the effects of mental imagery on behavior from other cognitive (top-down) effects, such as expectation.
Abstract:
Foot-and-mouth disease (FMD), a disease of cloven hooved animals caused by FMD virus (FMDV), is one of the most economically devastating diseases of livestock worldwide. The global burden of disease is borne largely by livestock-keepers in areas of Africa and Asia where the disease is endemic and where many people rely on livestock for their livelihoods and food-security. Yet, there are many gaps in our knowledge of the drivers of FMDV circulation in these settings. In East Africa, FMD epidemiology is complicated by the circulation of multiple FMDV serotypes (distinct antigenic variants) and by the presence of large populations of susceptible wildlife and domestic livestock. The African buffalo (Syncerus caffer) is the only wildlife species with consistent evidence of high levels of FMDV infection, and East Africa contains the largest population of this species globally. To inform FMD control in this region, key questions relate to heterogeneities in FMD prevalence and impacts in different livestock management systems and to the role of wildlife as a potential source of FMDV for livestock. To develop FMD control strategies and make best use of vaccine control options, serotype-specific patterns of circulation need to be characterised. In this study, the impacts and epidemiology of FMD were investigated across a range of traditional livestock-keeping systems in northern Tanzania, including pastoralist, agro-pastoralist and rural smallholder systems. Data were generated through field studies and laboratory analyses between 2010 and 2015. The study involved analysis of existing household survey data and generated serological data from cross-sectional livestock and buffalo samples and longitudinal cattle samples. Serological analyses included non-structural protein ELISAs, serotype-specific solid-phase competitive ELISAs, with optimisation to detect East African FMDV variants, and virus neutralisation testing. 
Risk factors for FMDV infection and outbreaks were investigated through analysis of cross-sectional serological data in conjunction with a case-control outbreak analysis. A novel Bayesian modeling approach was developed to infer serotype-specific infection history from serological data, and combined with virus isolation data from FMD outbreaks to characterise temporal and spatial patterns of serotype-specific infection. A high seroprevalence of FMD was detected in both northern Tanzanian livestock (69%, [66.5 - 71.4%] in cattle and 48.5%, [45.7-51.3%] in small ruminants) and in buffalo (80.9%, [74.7-86.1%]). Four different serotypes of FMDV (A, O, SAT1 and SAT2) were isolated from livestock. Up to three outbreaks per year were reported by households and active surveillance highlighted up to four serial outbreaks in the same herds within three years. Agro-pastoral and pastoral livestock keepers reported more frequent FMD outbreaks compared to smallholders. Households in all three management systems reported that FMD outbreaks caused significant impacts on milk production and sales, and on animals’ draught power, hence on crop production, with implications for food security and livelihoods. Risk factor analyses showed that older livestock were more likely to be seropositive for FMD (Odds Ratio [OR] 1.4 [1.4-1.5] per extra year) and that cattle (OR 3.3 [2.7-4.0]) were more likely than sheep and goats to be seropositive. Livestock managed by agro-pastoralists (OR 8.1 [2.8-23.6]) or pastoralists (OR 7.1 [2.9-17.6]) were more likely to be seropositive compared to those managed by smallholders. Larger herds (OR: 1.02 [1.01-1.03] per extra bovine) and those that recently acquired new livestock (OR: 5.57 [1.01 – 30.91]) had increased odds of suffering an FMD outbreak. 
Measures of potential contact with buffalo or with other FMD susceptible wildlife did not increase the likelihood of FMD in livestock in either the cross-sectional serological analysis or case-control outbreak analysis. The Bayesian model was validated to correctly infer from ELISA data the most recent serotype to infect cattle. Consistent with the lack of risk factors related to wildlife contact, temporal and spatial patterns of exposure to specific FMDV serotypes were not tightly linked in cattle and buffalo. In cattle, four serial waves of different FMDV serotypes that swept through southern Kenyan and northern Tanzanian livestock populations over a four-year period dominated infection patterns. In contrast, only two serotypes (SAT1 and SAT2) dominated in buffalo populations. Key conclusions are that FMD has a substantial impact in traditional livestock systems in East Africa. Wildlife does not currently appear to act as an important source of FMDV for East African livestock, and control efforts in the region should initially focus on livestock management and vaccination strategies. A novel modeling approach greatly facilitated the interpretation of serological data and may be a potent epidemiological tool in the African setting. There was a clear temporal pattern of FMDV antigenic dominance across northern Tanzania and southern Kenya. Longer-term research to investigate whether serotype-specific FMDV sweeps are truly predictable, and to shed light on FMD post-infection immunity in animals exposed to serial FMD infections is warranted.
Abstract:
Motivation: Understanding the patterns of association between polymorphisms at different loci in a population (linkage disequilibrium, LD) is of fundamental importance in various genetic studies. Many coefficients have been proposed for measuring the degree of LD, but they provide only a static view of the current LD structure. Generative models (GMs) were proposed to go beyond these measures, giving not only a description of the actual LD structure but also a tool to help understand the process that generated such structure. GMs based on coalescent theory have been the most appealing because they link LD to evolutionary factors. Nevertheless, the inference and parameter estimation of such models is still computationally challenging. Results: We present a more practical method to build GMs that describe LD. The method is based on learning weighted Bayesian network structures from haplotype data, extracting equivalence structure classes and using them to model LD. The results obtained on public data from the HapMap database showed that the method is a promising tool for modeling LD. The associations represented by the learned models are correlated with the traditional LD measure D′. The method was able to represent LD blocks found by standard tools. The granularity of the association blocks and the readability of the models can be controlled in the method. The results suggest that the causality information gained by our method can be useful for assessing the conservability of the genetic markers and for guiding the selection of a subset of representative markers.
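For readers unfamiliar with the traditional LD measure that the learned models are compared against, the following minimal sketch computes the absolute value of the normalised coefficient (Lewontin's D′) for two biallelic loci. The function name `d_prime` and the example frequencies are illustrative assumptions; the abstract does not describe how the authors computed the coefficient.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute normalised LD coefficient |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B in the population.
    """
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # A fixed locus (allele frequency 0 or 1) carries no LD information.
    return 0.0 if d_max == 0 else abs(d) / d_max

# Hypothetical frequencies for illustration.
complete_ld = d_prime(p_ab=0.5, p_a=0.5, p_b=0.5)   # alleles always co-occur
no_ld = d_prime(p_ab=0.25, p_a=0.5, p_b=0.5)        # loci independent
```

A value near 1 indicates that the two loci are almost always inherited together, while a value near 0 indicates near independence; this is the static, pairwise view that the generative models in the abstract aim to go beyond.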