954 resultados para KL divergence
Resumo:
In this paper we study representation of KL-divergence minimization, in the cases where integer sufficient statistics exists, using tools from polynomial algebra. We show that the estimation of parametric statistical models in this case can be transformed to solving a system of polynomial equations. In particular, we also study the case of Kullback-Csiszar iteration scheme. We present implicit descriptions of these models and show that implicitization preserves specialization of prior distribution. This result leads us to a Grobner bases method to compute an implicit representation of minimum KL-divergence models.
Resumo:
In this paper we study constrained maximum entropy and minimum divergence optimization problems, in the cases where integer valued sufficient statistics exists, using tools from computational commutative algebra. We show that the estimation of parametric statistical models in this case can be transformed to solving a system of polynomial equations. We give an implicit description of maximum entropy models by embedding them in algebraic varieties for which we give a Grobner basis method to compute it. In the cases of minimum KL-divergence models we show that implicitization preserves specialization of prior distribution. This result leads us to a Grobner basis method to embed minimum KL-divergence models in algebraic varieties. (C) 2012 Elsevier Inc. All rights reserved.
Resumo:
One main challenge in developing a system for visual surveillance event detection is the annotation of target events in the training data. By making use of the assumption that events with security interest are often rare compared to regular behaviours, this paper presents a novel approach by using Kullback-Leibler (KL) divergence for rare event detection in a weakly supervised learning setting, where only clip-level annotation is available. It will be shown that this approach outperforms state-of-the-art methods on a popular real-world dataset, while preserving real time performance.
Resumo:
Video surveillance infrastructure has been widely installed in public places for security purposes. However, live video feeds are typically monitored by human staff, making the detection of important events as they occur difficult. As such, an expert system that can automatically detect events of interest in surveillance footage is highly desirable. Although a number of approaches have been proposed, they have significant limitations: supervised approaches, which can detect a specific event, ideally require a large number of samples with the event spatially and temporally localised; while unsupervised approaches, which do not require this demanding annotation, can only detect whether an event is abnormal and not specific event types. To overcome these problems, we formulate a weakly-supervised approach using Kullback-Leibler (KL) divergence to detect rare events. The proposed approach leverages the sparse nature of the target events to its advantage, and we show that this data imbalance guarantees the existence of a decision boundary to separate samples that contain the target event from those that do not. This trait, combined with the coarse annotation used by weakly supervised learning (that only indicates approximately when an event occurs), greatly reduces the annotation burden while retaining the ability to detect specific events. Furthermore, the proposed classifier requires only a decision threshold, simplifying its use compared to other weakly supervised approaches. We show that the proposed approach outperforms state-of-the-art methods on a popular real-world traffic surveillance dataset, while preserving real time performance.
Resumo:
This article presents and evaluates Quantum Inspired models of Target Activation using Cued-Target Recall Memory Modelling over multiple sources of Free Association data. Two components were evaluated: Whether Quantum Inspired models of Target Activation would provide a better framework than their classical psychological counterparts and how robust these models are across the different sources of Free Association data. In previous work, a formal model of cued-target recall did not exist and as such Target Activation was unable to be assessed directly. Further to that, the data source used was suspected of suffering from temporal and geographical bias. As a consequence, Target Activation was measured against cued-target recall data as an approximation of performance. Since then, a formal model of cued-target recall (PIER3) has been developed [10] with alternative sources of data also becoming available. This allowed us to directly model target activation in cued-target recall with human cued-target recall pairs and use multiply sources of Free Association Data. Featural Characteristics known to be important to Target Activation were measured for each of the data sources to identify any major differences that may explain variations in performance for each of the models. Each of the activation models were used in the PIER3 memory model for each of the data sources and was benchmarked against cued-target recall pairs provided by the University of South Florida (USF). Two methods where used to evaluate performance. The first involved measuring the divergence between the sets of results using the Kullback Leibler (KL) divergence with the second utilizing a previous statistical analysis of the errors [9]. Of the three sources of data, two were sourced from human subjects being the USF Free Association Norms and the University of Leuven (UL) Free Association Networks. The third was sourced from a new method put forward by Galea and Bruza, 2015 in which pseudo Free Association Networks (Corpus Based Association Networks - CANs) are built using co-occurrence statistics on large text corpus. It was found that the Quantum Inspired Models of Target Activation not only outperformed the classical psychological model but was more robust across a variety of data sources.
Resumo:
It is important to identify the ``correct'' number of topics in mechanisms like Latent Dirichlet Allocation(LDA) as they determine the quality of features that are presented as features for classifiers like SVM. In this work we propose a measure to identify the correct number of topics and offer empirical evidence in its favor in terms of classification accuracy and the number of topics that are naturally present in the corpus. We show the merit of the measure by applying it on real-world as well as synthetic data sets(both text and images). In proposing this measure, we view LDA as a matrix factorization mechanism, wherein a given corpus C is split into two matrix factors M-1 and M-2 as given by C-d*w = M1(d*t) x Q(t*w).Where d is the number of documents present in the corpus anti w is the size of the vocabulary. The quality of the split depends on ``t'', the right number of topics chosen. The measure is computed in terms of symmetric KL-Divergence of salient distributions that are derived from these matrix factors. We observe that the divergence values are higher for non-optimal number of topics - this is shown by a `dip' at the right value for `t'.
Resumo:
Non-negative matrix factorization [5](NMF) is a well known tool for unsupervised machine learning. It can be viewed as a generalization of the K-means clustering, Expectation Maximization based clustering and aspect modeling by Probabilistic Latent Semantic Analysis (PLSA). Specifically PLSA is related to NMF with KL-divergence objective function. Further it is shown that K-means clustering is a special case of NMF with matrix L2 norm based error function. In this paper our objective is to analyze the relation between K-means clustering and PLSA by examining the KL-divergence function and matrix L2 norm based error function.
Resumo:
Vector Taylor Series (VTS) model based compensation is a powerful approach for noise robust speech recognition. An important extension to this approach is VTS adaptive training (VAT), which allows canonical models to be estimated on diverse noise-degraded training data. These canonical model can be estimated using EM-based approaches, allowing simple extensions to discriminative VAT (DVAT). However to ensure a diagonal corrupted speech covariance matrix the Jacobian (loading matrix) relating the noise and clean speech is diagonalised. In this work an approach for yielding optimal diagonal loading matrices based on minimising the expected KL-divergence between the diagonal loading matrix and "correct" distributions is proposed. The performance of DVAT using the standard and optimal diagonalisation was evaluated on both in-car collected data and the Aurora4 task. © 2012 IEEE.
Resumo:
A class identification algorithms is introduced for Gaussian process(GP)models.The fundamental approach is to propose a new kernel function which leads to a covariance matrix with low rank,a property that is consequently exploited for computational efficiency for both model parameter estimation and model predictions.The objective of either maximizing the marginal likelihood or the Kullback–Leibler (K–L) divergence between the estimated output probability density function(pdf)and the true pdf has been used as respective cost functions.For each cost function,an efficient coordinate descent algorithm is proposed to estimate the kernel parameters using a one dimensional derivative free search, and noise variance using a fast gradient descent algorithm. Numerical examples are included to demonstrate the effectiveness of the new identification approaches.
Resumo:
A new class of parameter estimation algorithms is introduced for Gaussian process regression (GPR) models. It is shown that the integration of the GPR model with probability distance measures of (i) the integrated square error and (ii) Kullback–Leibler (K–L) divergence are analytically tractable. An efficient coordinate descent algorithm is proposed to iteratively estimate the kernel width using golden section search which includes a fast gradient descent algorithm as an inner loop to estimate the noise variance. Numerical examples are included to demonstrate the effectiveness of the new identification approaches.
Resumo:
The report explores the problem of detecting complex point target models in a MIMO radar system. A complex point target is a mathematical and statistical model for a radar target that is not resolved in space, but exhibits varying complex reflectivity across the different bistatic view angles. The complex reflectivity can be modeled as a complex stochastic process whose index set is the set of all the bistatic view angles, and the parameters of the stochastic process follow from an analysis of a target model comprising a number of ideal point scatterers randomly located within some radius of the targets center of mass. The proposed complex point targets may be applicable to statistical inference in multistatic or MIMO radar system. Six different target models are summarized here – three 2-dimensional (Gaussian, Uniform Square, and Uniform Circle) and three 3-dimensional (Gaussian, Uniform Cube, and Uniform Sphere). They are assumed to have different distributions on the location of the point scatterers within the target. We develop data models for the received signals from such targets in the MIMO radar system with distributed assets and partially correlated signals, and consider the resulting detection problem which reduces to the familiar Gauss-Gauss detection problem. We illustrate that the target parameter and transmit signal have an influence on the detector performance through target extent and the SNR respectively. A series of the receiver operator characteristic (ROC) curves are generated to notice the impact on the detector for varying SNR. Kullback–Leibler (KL) divergence is applied to obtain the approximate mean difference between density functions the scatterers assume inside the target models to show the change in the performance of the detector with target extent of the point scatterers.
Resumo:
In recent years there has been an increased interest in applying non-parametric methods to real-world problems. Significant research has been devoted to Gaussian processes (GPs) due to their increased flexibility when compared with parametric models. These methods use Bayesian learning, which generally leads to analytically intractable posteriors. This thesis proposes a two-step solution to construct a probabilistic approximation to the posterior. In the first step we adapt the Bayesian online learning to GPs: the final approximation to the posterior is the result of propagating the first and second moments of intermediate posteriors obtained by combining a new example with the previous approximation. The propagation of em functional forms is solved by showing the existence of a parametrisation to posterior moments that uses combinations of the kernel function at the training points, transforming the Bayesian online learning of functions into a parametric formulation. The drawback is the prohibitive quadratic scaling of the number of parameters with the size of the data, making the method inapplicable to large datasets. The second step solves the problem of the exploding parameter size and makes GPs applicable to arbitrarily large datasets. The approximation is based on a measure of distance between two GPs, the KL-divergence between GPs. This second approximation is with a constrained GP in which only a small subset of the whole training dataset is used to represent the GP. This subset is called the em Basis Vector, or BV set and the resulting GP is a sparse approximation to the true posterior. As this sparsity is based on the KL-minimisation, it is probabilistic and independent of the way the posterior approximation from the first step is obtained. We combine the sparse approximation with an extension to the Bayesian online algorithm that allows multiple iterations for each input and thus approximating a batch solution. The resulting sparse learning algorithm is a generic one: for different problems we only change the likelihood. The algorithm is applied to a variety of problems and we examine its performance both on more classical regression and classification tasks and to the data-assimilation and a simple density estimation problems.
Resumo:
The main objective of the project is to enhance the already effective health-monitoring system (HUMS) for helicopters by analysing structural vibrations to recognise different flight conditions directly from sensor information. The goal of this paper is to develop a new method to select those sensors and frequency bands that are best for detecting changes in flight conditions. We projected frequency information to a 2-dimensional space in order to visualise flight-condition transitions using the Generative Topographic Mapping (GTM) and a variant which supports simultaneous feature selection. We created an objective measure of the separation between different flight conditions in the visualisation space by calculating the Kullback-Leibler (KL) divergence between Gaussian mixture models (GMMs) fitted to each class: the higher the KL-divergence, the better the interclass separation. To find the optimal combination of sensors, they were considered in pairs, triples and groups of four sensors. The sensor triples provided the best result in terms of KL-divergence. We also found that the use of a variational training algorithm for the GMMs gave more reliable results.
Resumo:
Analysing the molecular polymorphism and interactions of DNA, RNA and proteins is of fundamental importance in biology. Predicting functions of polymorphic molecules is important in order to design more effective medicines. Analysing major histocompatibility complex (MHC) polymorphism is important for mate choice, epitope-based vaccine design and transplantation rejection etc. Most of the existing exploratory approaches cannot analyse these datasets because of the large number of molecules with a high number of descriptors per molecule. This thesis develops novel methods for data projection in order to explore high dimensional biological dataset by visualising them in a low-dimensional space. With increasing dimensionality, some existing data visualisation methods such as generative topographic mapping (GTM) become computationally intractable. We propose variants of these methods, where we use log-transformations at certain steps of expectation maximisation (EM) based parameter learning process, to make them tractable for high-dimensional datasets. We demonstrate these proposed variants both for synthetic and electrostatic potential dataset of MHC class-I. We also propose to extend a latent trait model (LTM), suitable for visualising high dimensional discrete data, to simultaneously estimate feature saliency as an integrated part of the parameter learning process of a visualisation model. This LTM variant not only gives better visualisation by modifying the project map based on feature relevance, but also helps users to assess the significance of each feature. Another problem which is not addressed much in the literature is the visualisation of mixed-type data. We propose to combine GTM and LTM in a principled way where appropriate noise models are used for each type of data in order to visualise mixed-type data in a single plot. We call this model a generalised GTM (GGTM). We also propose to extend GGTM model to estimate feature saliencies while training a visualisation model and this is called GGTM with feature saliency (GGTM-FS). We demonstrate effectiveness of these proposed models both for synthetic and real datasets. We evaluate visualisation quality using quality metrics such as distance distortion measure and rank based measures: trustworthiness, continuity, mean relative rank errors with respect to data space and latent space. In cases where the labels are known we also use quality metrics of KL divergence and nearest neighbour classifications error in order to determine the separation between classes. We demonstrate the efficacy of these proposed models both for synthetic and real biological datasets with a main focus on the MHC class-I dataset.
Resumo:
Bayesian nonparametric models, such as the Gaussian process and the Dirichlet process, have been extensively applied for target kinematics modeling in various applications including environmental monitoring, traffic planning, endangered species tracking, dynamic scene analysis, autonomous robot navigation, and human motion modeling. As shown by these successful applications, Bayesian nonparametric models are able to adjust their complexities adaptively from data as necessary, and are resistant to overfitting or underfitting. However, most existing works assume that the sensor measurements used to learn the Bayesian nonparametric target kinematics models are obtained a priori or that the target kinematics can be measured by the sensor at any given time throughout the task. Little work has been done for controlling the sensor with bounded field of view to obtain measurements of mobile targets that are most informative for reducing the uncertainty of the Bayesian nonparametric models. To present the systematic sensor planning approach to leaning Bayesian nonparametric models, the Gaussian process target kinematics model is introduced at first, which is capable of describing time-invariant spatial phenomena, such as ocean currents, temperature distributions and wind velocity fields. The Dirichlet process-Gaussian process target kinematics model is subsequently discussed for modeling mixture of mobile targets, such as pedestrian motion patterns.
Novel information theoretic functions are developed for these introduced Bayesian nonparametric target kinematics models to represent the expected utility of measurements as a function of sensor control inputs and random environmental variables. A Gaussian process expected Kullback Leibler divergence is developed as the expectation of the KL divergence between the current (prior) and posterior Gaussian process target kinematics models with respect to the future measurements. Then, this approach is extended to develop a new information value function that can be used to estimate target kinematics described by a Dirichlet process-Gaussian process mixture model. A theorem is proposed that shows the novel information theoretic functions are bounded. Based on this theorem, efficient estimators of the new information theoretic functions are designed, which are proved to be unbiased with the variance of the resultant approximation error decreasing linearly as the number of samples increases. Computational complexities for optimizing the novel information theoretic functions under sensor dynamics constraints are studied, and are proved to be NP-hard. A cumulative lower bound is then proposed to reduce the computational complexity to polynomial time.
Three sensor planning algorithms are developed according to the assumptions on the target kinematics and the sensor dynamics. For problems where the control space of the sensor is discrete, a greedy algorithm is proposed. The efficiency of the greedy algorithm is demonstrated by a numerical experiment with data of ocean currents obtained by moored buoys. A sweep line algorithm is developed for applications where the sensor control space is continuous and unconstrained. Synthetic simulations as well as physical experiments with ground robots and a surveillance camera are conducted to evaluate the performance of the sweep line algorithm. Moreover, a lexicographic algorithm is designed based on the cumulative lower bound of the novel information theoretic functions, for the scenario where the sensor dynamics are constrained. Numerical experiments with real data collected from indoor pedestrians by a commercial pan-tilt camera are performed to examine the lexicographic algorithm. Results from both the numerical simulations and the physical experiments show that the three sensor planning algorithms proposed in this dissertation based on the novel information theoretic functions are superior at learning the target kinematics with
little or no prior knowledge