956 resultados para Kullback-Leibler divergence
Resumo:
Evolutionary algorithms perform optimization using a population of sample solution points. An interesting development has been to view population-based optimization as the process of evolving an explicit, probabilistic model of the search space. This paper investigates a formal basis for continuous, population-based optimization in terms of a stochastic gradient descent on the Kullback-Leibler divergence between the model probability density and the objective function, represented as an unknown density of assumed form. This leads to an update rule that is related and compared with previous theoretical work, a continuous version of the population-based incremental learning algorithm, and the generalized mean shift clustering framework. Experimental results are presented that demonstrate the dynamics of the new algorithm on a set of simple test problems.
Resumo:
We propose and analyze two different Bayesian online algorithms for learning in discrete Hidden Markov Models and compare their performance with the already known Baldi-Chauvin Algorithm. Using the Kullback-Leibler divergence as a measure of generalization we draw learning curves in simplified situations for these algorithms and compare their performances.
Resumo:
Robust controllers for nonlinear stochastic systems with functional uncertainties can be consistently designed using probabilistic control methods. In this paper a generalised probabilistic controller design for the minimisation of the Kullback-Leibler divergence between the actual joint probability density function (pdf) of the closed loop control system, and an ideal joint pdf is presented emphasising how the uncertainty can be systematically incorporated in the absence of reliable systems models. To achieve this objective all probabilistic models of the system are estimated from process data using mixture density networks (MDNs) where all the parameters of the estimated pdfs are taken to be state and control input dependent. Based on this dependency of the density parameters on the input values, explicit formulations to the construction of optimal generalised probabilistic controllers are obtained through the techniques of dynamic programming and adaptive critic methods. Using the proposed generalised probabilistic controller, the conditional joint pdfs can be made to follow the ideal ones. A simulation example is used to demonstrate the implementation of the algorithm and encouraging results are obtained.
Resumo:
Optimal stochastic controller pushes the closed-loop behavior as close as possible to the desired one. The fully probabilistic design (FPD) uses probabilistic description of the desired closed loop and minimizes Kullback-Leibler divergence of the closed-loop description to the desired one. Practical exploitation of the fully probabilistic design control theory continues to be hindered by the computational complexities involved in numerically solving the associated stochastic dynamic programming problem. In particular very hard multivariate integration and an approximate interpolation of the involved multivariate functions. This paper proposes a new fully probabilistic contro algorithm that uses the adaptive critic methods to circumvent the need for explicitly evaluating the optimal value function, thereby dramatically reducing computational requirements. This is a main contribution of this short paper.
Resumo:
We present and analyze three different online algorithms for learning in discrete Hidden Markov Models (HMMs) and compare their performance with the Baldi-Chauvin Algorithm. Using the Kullback-Leibler divergence as a measure of the generalization error we draw learning curves in simplified situations and compare the results. The performance for learning drifting concepts of one of the presented algorithms is analyzed and compared with the Baldi-Chauvin algorithm in the same situations. A brief discussion about learning and symmetry breaking based on our results is also presented. © 2006 American Institute of Physics.
Resumo:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
Latent class models for the joint distribution of multivariate categorical, such as the PARAFAC decomposition, data play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
Resumo:
Climate change is thought to be one of the most pressing environmental problems facing humanity. However, due in part to failures in political communication and how the issue has been historically defined in American politics, discussions of climate change remain gridlocked and polarized. In this dissertation, I explore how climate change has been historically constructed as a political issue, how conflicts between climate advocates and skeptics have been communicated, and what effects polarization has had on political communication, particularly on the communication of climate change to skeptical audiences. I use a variety of methodological tools to consider these questions, including evolutionary frame analysis, which uses textual data to show how issues are framed and constructed over time; Kullback-Leibler divergence content analysis, which allows for comparison of advocate and skeptical framing over time; and experimental framing methods to test how audiences react to and process different presentations of climate change. I identify six major portrayals of climate change from 1988 to 2012, but find that no single construction of the issue has dominated the public discourse defining the problem. In addition, the construction of climate change may be associated with changes in public political sentiment, such as greater pessimism about climate action when the electorate becomes more conservative. As the issue of climate change has become more polarized in American politics, one proposed causal pathway for the observed polarization is that advocate and skeptic framing of climate change focuses on different facets of the issue and ignores rival arguments, a practice known as “talking past.” However, I find no evidence of increased talking past in 25 years of popular newsmedia reporting on the issue, suggesting both that talking past has not driven public polarization or that polarization is occurring in venues outside of the mainstream public discourse, such as blogs. To examine how polarization affects political communication on climate change, I test the cognitive processing of a variety of messages and sources that promote action against climate change among Republican individuals. Rather than identifying frames that are powerful enough to overcome polarization, I find that Republicans exhibit telltale signs of motivated skepticism on the issue, that is, they reject framing that runs counter to their party line and political identity. This result suggests that polarization constrains political communication on polarized issues, overshadowing traditional message and source effects of framing and increasing the difficulty communicators experience in reaching skeptical audiences.
Resumo:
Bayesian nonparametric models, such as the Gaussian process and the Dirichlet process, have been extensively applied for target kinematics modeling in various applications including environmental monitoring, traffic planning, endangered species tracking, dynamic scene analysis, autonomous robot navigation, and human motion modeling. As shown by these successful applications, Bayesian nonparametric models are able to adjust their complexities adaptively from data as necessary, and are resistant to overfitting or underfitting. However, most existing works assume that the sensor measurements used to learn the Bayesian nonparametric target kinematics models are obtained a priori or that the target kinematics can be measured by the sensor at any given time throughout the task. Little work has been done for controlling the sensor with bounded field of view to obtain measurements of mobile targets that are most informative for reducing the uncertainty of the Bayesian nonparametric models. To present the systematic sensor planning approach to leaning Bayesian nonparametric models, the Gaussian process target kinematics model is introduced at first, which is capable of describing time-invariant spatial phenomena, such as ocean currents, temperature distributions and wind velocity fields. The Dirichlet process-Gaussian process target kinematics model is subsequently discussed for modeling mixture of mobile targets, such as pedestrian motion patterns.
Novel information theoretic functions are developed for these introduced Bayesian nonparametric target kinematics models to represent the expected utility of measurements as a function of sensor control inputs and random environmental variables. A Gaussian process expected Kullback Leibler divergence is developed as the expectation of the KL divergence between the current (prior) and posterior Gaussian process target kinematics models with respect to the future measurements. Then, this approach is extended to develop a new information value function that can be used to estimate target kinematics described by a Dirichlet process-Gaussian process mixture model. A theorem is proposed that shows the novel information theoretic functions are bounded. Based on this theorem, efficient estimators of the new information theoretic functions are designed, which are proved to be unbiased with the variance of the resultant approximation error decreasing linearly as the number of samples increases. Computational complexities for optimizing the novel information theoretic functions under sensor dynamics constraints are studied, and are proved to be NP-hard. A cumulative lower bound is then proposed to reduce the computational complexity to polynomial time.
Three sensor planning algorithms are developed according to the assumptions on the target kinematics and the sensor dynamics. For problems where the control space of the sensor is discrete, a greedy algorithm is proposed. The efficiency of the greedy algorithm is demonstrated by a numerical experiment with data of ocean currents obtained by moored buoys. A sweep line algorithm is developed for applications where the sensor control space is continuous and unconstrained. Synthetic simulations as well as physical experiments with ground robots and a surveillance camera are conducted to evaluate the performance of the sweep line algorithm. Moreover, a lexicographic algorithm is designed based on the cumulative lower bound of the novel information theoretic functions, for the scenario where the sensor dynamics are constrained. Numerical experiments with real data collected from indoor pedestrians by a commercial pan-tilt camera are performed to examine the lexicographic algorithm. Results from both the numerical simulations and the physical experiments show that the three sensor planning algorithms proposed in this dissertation based on the novel information theoretic functions are superior at learning the target kinematics with
little or no prior knowledge
Resumo:
This paper is concerned with SIR (susceptible--infected--removed) household epidemic models in which the infection response may be either mild or severe, with the type of response also affecting the infectiousness of an individual. Two different models are analysed. In the first model, the infection status of an individual is predetermined, perhaps due to partial immunity, and in the second, the infection status of an individual depends on the infection status of its infector and on whether the individual was infected by a within- or between-household contact. The first scenario may be modelled using a multitype household epidemic model, and the second scenario by a model we denote by the infector-dependent-severity household epidemic model. Large population results of the two models are derived, with the focus being on the distribution of the total numbers of mild and severe cases in a typical household, of any given size, in the event that the epidemic becomes established. The aim of the paper is to investigate whether it is possible to determine which of the two underlying explanations is causing the varying response when given final size household outbreak data containing mild and severe cases. We conduct numerical studies which show that, given data on sufficiently many households, it is generally possible to discriminate between the two models by comparing the Kullback-Leibler divergence for the two fitted models to these data.
Resumo:
This article presents and evaluates Quantum Inspired models of Target Activation using Cued-Target Recall Memory Modelling over multiple sources of Free Association data. Two components were evaluated: Whether Quantum Inspired models of Target Activation would provide a better framework than their classical psychological counterparts and how robust these models are across the different sources of Free Association data. In previous work, a formal model of cued-target recall did not exist and as such Target Activation was unable to be assessed directly. Further to that, the data source used was suspected of suffering from temporal and geographical bias. As a consequence, Target Activation was measured against cued-target recall data as an approximation of performance. Since then, a formal model of cued-target recall (PIER3) has been developed [10] with alternative sources of data also becoming available. This allowed us to directly model target activation in cued-target recall with human cued-target recall pairs and use multiply sources of Free Association Data. Featural Characteristics known to be important to Target Activation were measured for each of the data sources to identify any major differences that may explain variations in performance for each of the models. Each of the activation models were used in the PIER3 memory model for each of the data sources and was benchmarked against cued-target recall pairs provided by the University of South Florida (USF). Two methods where used to evaluate performance. The first involved measuring the divergence between the sets of results using the Kullback Leibler (KL) divergence with the second utilizing a previous statistical analysis of the errors [9]. Of the three sources of data, two were sourced from human subjects being the USF Free Association Norms and the University of Leuven (UL) Free Association Networks. The third was sourced from a new method put forward by Galea and Bruza, 2015 in which pseudo Free Association Networks (Corpus Based Association Networks - CANs) are built using co-occurrence statistics on large text corpus. It was found that the Quantum Inspired Models of Target Activation not only outperformed the classical psychological model but was more robust across a variety of data sources.
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Discriminating Different Classes of Biological Networks by Analyzing the Graphs Spectra Distribution
Resumo:
The brain's structural and functional systems, protein-protein interaction, and gene networks are examples of biological systems that share some features of complex networks, such as highly connected nodes, modularity, and small-world topology. Recent studies indicate that some pathologies present topological network alterations relative to norms seen in the general population. Therefore, methods to discriminate the processes that generate the different classes of networks (e. g., normal and disease) might be crucial for the diagnosis, prognosis, and treatment of the disease. It is known that several topological properties of a network (graph) can be described by the distribution of the spectrum of its adjacency matrix. Moreover, large networks generated by the same random process have the same spectrum distribution, allowing us to use it as a "fingerprint". Based on this relationship, we introduce and propose the entropy of a graph spectrum to measure the "uncertainty" of a random graph and the Kullback-Leibler and Jensen-Shannon divergences between graph spectra to compare networks. We also introduce general methods for model selection and network model parameter estimation, as well as a statistical procedure to test the nullity of divergence between two classes of complex networks. Finally, we demonstrate the usefulness of the proposed methods by applying them to (1) protein-protein interaction networks of different species and (2) on networks derived from children diagnosed with Attention Deficit Hyperactivity Disorder (ADHD) and typically developing children. We conclude that scale-free networks best describe all the protein-protein interactions. Also, we show that our proposed measures succeeded in the identification of topological changes in the network while other commonly used measures (number of edges, clustering coefficient, average path length) failed.
Resumo:
This work proposes an optimization of a semi-supervised Change Detection methodology based on a combination of Change Indices (CI) derived from an image multitemporal data set. For this purpose, SPOT 5 Panchromatic images with 2.5 m spatial resolution have been used, from which three Change Indices have been calculated. Two of them are usually known indices; however the third one has been derived considering the Kullbak-Leibler divergence. Then, these three indices have been combined forming a multiband image that has been used in as input for a Support Vector Machine (SVM) classifier where four different discriminant functions have been tested in order to differentiate between change and no_change categories. The performance of the suggested procedure has been assessed applying different quality measures, reaching in each case highly satisfactory values. These results have demonstrated that the simultaneous combination of basic change indices with others more sophisticated like the Kullback-Leibler distance, and the application of non-parametric discriminant functions like those employees in the SVM method, allows solving efficiently a change detection problem.
Resumo:
The main objective of the project is to enhance the already effective health-monitoring system (HUMS) for helicopters by analysing structural vibrations to recognise different flight conditions directly from sensor information. The goal of this paper is to develop a new method to select those sensors and frequency bands that are best for detecting changes in flight conditions. We projected frequency information to a 2-dimensional space in order to visualise flight-condition transitions using the Generative Topographic Mapping (GTM) and a variant which supports simultaneous feature selection. We created an objective measure of the separation between different flight conditions in the visualisation space by calculating the Kullback-Leibler (KL) divergence between Gaussian mixture models (GMMs) fitted to each class: the higher the KL-divergence, the better the interclass separation. To find the optimal combination of sensors, they were considered in pairs, triples and groups of four sensors. The sensor triples provided the best result in terms of KL-divergence. We also found that the use of a variational training algorithm for the GMMs gave more reliable results.
Resumo:
A modeling paradigm is proposed for covariate, variance and working correlation structure selection for longitudinal data analysis. Appropriate selection of covariates is pertinent to correct variance modeling and selecting the appropriate covariates and variance function is vital to correlation structure selection. This leads to a stepwise model selection procedure that deploys a combination of different model selection criteria. Although these criteria find a common theoretical root based on approximating the Kullback-Leibler distance, they are designed to address different aspects of model selection and have different merits and limitations. For example, the extended quasi-likelihood information criterion (EQIC) with a covariance penalty performs well for covariate selection even when the working variance function is misspecified, but EQIC contains little information on correlation structures. The proposed model selection strategies are outlined and a Monte Carlo assessment of their finite sample properties is reported. Two longitudinal studies are used for illustration.