921 resultados para Bayesian nonparametric


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis addresses data assimilation, which typically refers to the estimation of the state of a physical system given a model and observations, and its application to short-term precipitation forecasting. A general introduction to data assimilation is given, both from a deterministic and' stochastic point of view. Data assimilation algorithms are reviewed, in the static case (when no dynamics are involved), then in the dynamic case. A double experiment on two non-linear models, the Lorenz 63 and the Lorenz 96 models, is run and the comparative performance of the methods is discussed in terms of quality of the assimilation, robustness "in the non-linear regime and computational time. Following the general review and analysis, data assimilation is discussed in the particular context of very short-term rainfall forecasting (nowcasting) using radar images. An extended Bayesian precipitation nowcasting model is introduced. The model is stochastic in nature and relies on the spatial decomposition of the rainfall field into rain "cells". Radar observations are assimilated using a Variational Bayesian method in which the true posterior distribution of the parameters is approximated by a more tractable distribution. The motion of the cells is captured by a 20 Gaussian process. The model is tested on two precipitation events, the first dominated by convective showers, the second by precipitation fronts. Several deterministic and probabilistic validation methods are applied and the model is shown to retain reasonable prediction skill at up to 3 hours lead time. Extensions to the model are discussed.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Control design for stochastic uncertain nonlinear systems is traditionally based on minimizing the expected value of a suitably chosen loss function. Moreover, most control methods usually assume the certainty equivalence principle to simplify the problem and make it computationally tractable. We offer an improved probabilistic framework which is not constrained by these previous assumptions, and provides a more natural framework for incorporating and dealing with uncertainty. The focus of this paper is on developing this framework to obtain an optimal control law strategy using a fully probabilistic approach for information extraction from process data, which does not require detailed knowledge of system dynamics. Moreover, the proposed control method framework allows handling the problem of input-dependent noise. A basic paradigm is proposed and the resulting algorithm is discussed. The proposed probabilistic control method is for the general nonlinear class of discrete-time systems. It is demonstrated theoretically on the affine class. A nonlinear simulation example is also provided to validate theoretical development.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Sentiment analysis has long focused on binary classification of text as either positive or negative. There has been few work on mapping sentiments or emotions into multiple dimensions. This paper studies a Bayesian modeling approach to multi-class sentiment classification and multidimensional sentiment distributions prediction. It proposes effective mechanisms to incorporate supervised information such as labeled feature constraints and document-level sentiment distributions derived from the training data into model learning. We have evaluated our approach on the datasets collected from the confession section of the Experience Project website where people share their life experiences and personal stories. Our results show that using the latent representation of the training documents derived from our approach as features to build a maximum entropy classifier outperforms other approaches on multi-class sentiment classification. In the more difficult task of multi-dimensional sentiment distributions prediction, our approach gives superior performance compared to a few competitive baselines. © 2012 ACM.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a comparative study of three closely related Bayesian models for unsupervised document level sentiment classification, namely, the latent sentiment model (LSM), the joint sentiment-topic (JST) model, and the Reverse-JST model. Extensive experiments have been conducted on two corpora, the movie review dataset and the multi-domain sentiment dataset. It has been found that while all the three models achieve either better or comparable performance on these two corpora when compared to the existing unsupervised sentiment classification approaches, both JST and Reverse-JST are able to extract sentiment-oriented topics. In addition, Reverse-JST always performs worse than JST suggesting that the JST model is more appropriate for joint sentiment topic detection.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevância:

20.00% 20.00%

Publicador:

Resumo:

DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Social streams have proven to be the mostup-to-date and inclusive information on cur-rent events. In this paper we propose a novelprobabilistic modelling framework, called violence detection model (VDM), which enables the identification of text containing violent content and extraction of violence-related topics over social media data. The proposed VDM model does not require any labeled corpora for training, instead, it only needs the in-corporation of word prior knowledge which captures whether a word indicates violence or not. We propose a novel approach of deriving word prior knowledge using the relative entropy measurement of words based on the in-tuition that low entropy words are indicative of semantically coherent topics and therefore more informative, while high entropy words indicates words whose usage is more topical diverse and therefore less informative. Our proposed VDM model has been evaluated on the TREC Microblog 2011 dataset to identify topics related to violence. Experimental results show that deriving word priors using our proposed relative entropy method is more effective than the widely-used information gain method. Moreover, VDM gives higher violence classification results and produces more coherent violence-related topics compared toa few competitive baselines.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With the proliferation of social media sites, social streams have proven to contain the most up-to-date information on current events. Therefore, it is crucial to extract events from the social streams such as tweets. However, it is not straightforward to adapt the existing event extraction systems since texts in social media are fragmented and noisy. In this paper we propose a simple and yet effective Bayesian model, called Latent Event Model (LEM), to extract structured representation of events from social media. LEM is fully unsupervised and does not require annotated data for training. We evaluate LEM on a Twitter corpus. Experimental results show that the proposed model achieves 83% in F-measure, and outperforms the state-of-the-art baseline by over 7%.© 2014 Association for Computational Linguistics.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The twin arginine translocation (TAT) system ferries folded proteins across the bacterial membrane. Proteins are directed into this system by the TAT signal peptide present at the amino terminus of the precursor protein, which contains the twin arginine residues that give the system its name. There are currently only two computational methods for the prediction of TAT translocated proteins from sequence. Both methods have limitations that make the creation of a new algorithm for TAT-translocated protein prediction desirable. We have developed TATPred, a new sequence-model method, based on a Nave-Bayesian network, for the prediction of TAT signal peptides. In this approach, a comprehensive range of models was tested to identify the most reliable and robust predictor. The best model comprised 12 residues: three residues prior to the twin arginines and the seven residues that follow them. We found a prediction sensitivity of 0.979 and a specificity of 0.942.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Membrane proteins, which constitute approximately 20% of most genomes, are poorly tractable targets for experimental structure determination, thus analysis by prediction and modelling makes an important contribution to their on-going study. Membrane proteins form two main classes: alpha helical and beta barrel trans-membrane proteins. By using a method based on Bayesian Networks, which provides a flexible and powerful framework for statistical inference, we addressed alpha-helical topology prediction. This method has accuracies of 77.4% for prokaryotic proteins and 61.4% for eukaryotic proteins. The method described here represents an important advance in the computational determination of membrane protein topology and offers a useful, and complementary, tool for the analysis of membrane proteins for a range of applications.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Membrane proteins, which constitute approximately 20% of most genomes, form two main classes: alpha helical and beta barrel transmembrane proteins. Using methods based on Bayesian Networks, a powerful approach for statistical inference, we have sought to address beta-barrel topology prediction. The beta-barrel topology predictor reports individual strand accuracies of 88.6%. The method outlined here represents a potentially important advance in the computational determination of membrane protein topology.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Calibration of stochastic traffic microsimulation models is a challenging task. This paper proposes a fast iterative probabilistic precalibration framework and demonstrates how it can be successfully applied to a real-world traffic simulation model of a section of the M40 motorway and its surrounding area in the U.K. The efficiency of the method stems from the use of emulators of the stochastic microsimulator, which provides fast surrogates of the traffic model. The use of emulators minimizes the number of microsimulator runs required, and the emulators' probabilistic construction allows for the consideration of the extra uncertainty introduced by the approximation. It is shown that automatic precalibration of this real-world microsimulator, using turn-count observational data, is possible, considering all parameters at once, and that this precalibrated microsimulator improves on the fit to observations compared with the traditional expertly tuned microsimulation. © 2000-2011 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The objective of this study was to investigate the effects of circularity, comorbidity, prevalence and presentation variation on the accuracy of differential diagnoses made in optometric primary care using a modified form of naïve Bayesian sequential analysis. No such investigation has ever been reported before. Data were collected for 1422 cases seen over one year. Positive test outcomes were recorded for case history (ethnicity, age, symptoms and ocular and medical history) and clinical signs in relation to each diagnosis. For this reason only positive likelihood ratios were used for this modified form of Bayesian analysis that was carried out with Laplacian correction and Chi-square filtration. Accuracy was expressed as the percentage of cases for which the diagnoses made by the clinician appeared at the top of a list generated by Bayesian analysis. Preliminary analyses were carried out on 10 diagnoses and 15 test outcomes. Accuracy of 100% was achieved in the absence of presentation variation but dropped by 6% when variation existed. Circularity artificially elevated accuracy by 0.5%. Surprisingly, removal of Chi-square filtering increased accuracy by 0.4%. Decision tree analysis showed that accuracy was influenced primarily by prevalence followed by presentation variation and comorbidity. Analysis of 35 diagnoses and 105 test outcomes followed. This explored the use of positive likelihood ratios, derived from the case history, to recommend signs to look for. Accuracy of 72% was achieved when all clinical signs were entered. The drop in accuracy, compared to the preliminary analysis, was attributed to the fact that some diagnoses lacked strong diagnostic signs; the accuracy increased by 1% when only recommended signs were entered. Chi-square filtering improved recommended test selection. Decision tree analysis showed that accuracy again influenced primarily by prevalence, followed by comorbidity and presentation variation. Future work will explore the use of likelihood ratios based on positive and negative test findings prior to considering naïve Bayesian analysis as a form of artificial intelligence in optometric practice.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes a constrained nonparametric method of estimating an input distance function. A regression function is estimated via kernel methods without functional form assumptions. To guarantee that the estimated input distance function satisfies its properties, monotonicity constraints are imposed on the regression surface via the constraint weighted bootstrapping method borrowed from statistics literature. The first, second, and cross partial analytical derivatives of the estimated input distance function are derived, and thus the elasticities measuring input substitutability can be computed from them. The method is then applied to a cross-section of 3,249 Norwegian timber producers.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this letter, we derive continuum equations for the generalization error of the Bayesian online algorithm (BOnA) for the one-layer perceptron with a spherical covariance matrix using the Rosenblatt potential and show, by numerical calculations, that the asymptotic performance of the algorithm is the same as the one for the optimal algorithm found by means of variational methods with the added advantage that the BOnA does not use any inaccessible information during learning. © 2007 IEEE.