863 resultados para Gaussian kernel
Resumo:
We investigate the Student-t process as an alternative to the Gaussian process as a non-parametric prior over functions. We derive closed form expressions for the marginal likelihood and predictive distribution of a Student-t process, by integrating away an inverse Wishart process prior over the co-variance kernel of a Gaussian process model. We show surprising equivalences between different hierarchical Gaussian process models leading to Student-t processes, and derive a new sampling scheme for the inverse Wishart process, which helps elucidate these equivalences. Overall, we show that a Student-t process can retain the attractive properties of a Gaussian process - a nonparamet-ric representation, analytic marginal and predictive distributions, and easy model selection through covariance kernels - but has enhanced flexibility, and predictive covariances that, unlike a Gaussian process, explicitly depend on the values of training observations. We verify empirically that a Student-t process is especially useful in situations where there are changes in covariance structure, or in applications such as Bayesian optimization, where accurate predictive covariances are critical for good performance. These advantages come at no additional computational cost over Gaussian processes.
Resumo:
In a Bayesian learning setting, the posterior distribution of a predictive model arises from a trade-off between its prior distribution and the conditional likelihood of observed data. Such distribution functions usually rely on additional hyperparameters which need to be tuned in order to achieve optimum predictive performance; this operation can be efficiently performed in an Empirical Bayes fashion by maximizing the posterior marginal likelihood of the observed data. Since the score function of this optimization problem is in general characterized by the presence of local optima, it is necessary to resort to global optimization strategies, which require a large number of function evaluations. Given that the evaluation is usually computationally intensive and badly scaled with respect to the dataset size, the maximum number of observations that can be treated simultaneously is quite limited. In this paper, we consider the case of hyperparameter tuning in Gaussian process regression. A straightforward implementation of the posterior log-likelihood for this model requires O(N^3) operations for every iteration of the optimization procedure, where N is the number of examples in the input dataset. We derive a novel set of identities that allow, after an initial overhead of O(N^3), the evaluation of the score function, as well as the Jacobian and Hessian matrices, in O(N) operations. We prove how the proposed identities, that follow from the eigendecomposition of the kernel matrix, yield a reduction of several orders of magnitude in the computation time for the hyperparameter optimization problem. Notably, the proposed solution provides computational advantages even with respect to state of the art approximations that rely on sparse kernel matrices.
Resumo:
A greedy technique is proposed to construct parsimonious kernel classifiers using the orthogonal forward selection method and boosting based on Fisher ratio for class separability measure. Unlike most kernel classification methods, which restrict kernel means to the training input data and use a fixed common variance for all the kernel terms, the proposed technique can tune both the mean vector and diagonal covariance matrix of individual kernel by incrementally maximizing Fisher ratio for class separability measure. An efficient weighted optimization method is developed based on boosting to append kernels one by one in an orthogonal forward selection procedure. Experimental results obtained using this construction technique demonstrate that it offers a viable alternative to the existing state-of-the-art kernel modeling methods for constructing sparse Gaussian radial basis function network classifiers. that generalize well.
Resumo:
A class identification algorithms is introduced for Gaussian process(GP)models.The fundamental approach is to propose a new kernel function which leads to a covariance matrix with low rank,a property that is consequently exploited for computational efficiency for both model parameter estimation and model predictions.The objective of either maximizing the marginal likelihood or the Kullback–Leibler (K–L) divergence between the estimated output probability density function(pdf)and the true pdf has been used as respective cost functions.For each cost function,an efficient coordinate descent algorithm is proposed to estimate the kernel parameters using a one dimensional derivative free search, and noise variance using a fast gradient descent algorithm. Numerical examples are included to demonstrate the effectiveness of the new identification approaches.
Resumo:
A new class of parameter estimation algorithms is introduced for Gaussian process regression (GPR) models. It is shown that the integration of the GPR model with probability distance measures of (i) the integrated square error and (ii) Kullback–Leibler (K–L) divergence are analytically tractable. An efficient coordinate descent algorithm is proposed to iteratively estimate the kernel width using golden section search which includes a fast gradient descent algorithm as an inner loop to estimate the noise variance. Numerical examples are included to demonstrate the effectiveness of the new identification approaches.
Resumo:
A particle filter method is presented for the discrete-time filtering problem with nonlinear ItA ` stochastic ordinary differential equations (SODE) with additive noise supposed to be analytically integrable as a function of the underlying vector-Wiener process and time. The Diffusion Kernel Filter is arrived at by a parametrization of small noise-driven state fluctuations within branches of prediction and a local use of this parametrization in the Bootstrap Filter. The method applies for small noise and short prediction steps. With explicit numerical integrators, the operations count in the Diffusion Kernel Filter is shown to be smaller than in the Bootstrap Filter whenever the initial state for the prediction step has sufficiently few moments. The established parametrization is a dual-formula for the analysis of sensitivity to gaussian-initial perturbations and the analysis of sensitivity to noise-perturbations, in deterministic models, showing in particular how the stability of a deterministic dynamics is modeled by noise on short times and how the diffusion matrix of an SODE should be modeled (i.e. defined) for a gaussian-initial deterministic problem to be cast into an SODE problem. From it, a novel definition of prediction may be proposed that coincides with the deterministic path within the branch of prediction whose information entropy at the end of the prediction step is closest to the average information entropy over all branches. Tests are made with the Lorenz-63 equations, showing good results both for the filter and the definition of prediction.
Resumo:
The asymptotic safety scenario allows to define a consistent theory of quantized gravity within the framework of quantum field theory. The central conjecture of this scenario is the existence of a non-Gaussian fixed point of the theory's renormalization group flow, that allows to formulate renormalization conditions that render the theory fully predictive. Investigations of this possibility use an exact functional renormalization group equation as a primary non-perturbative tool. This equation implements Wilsonian renormalization group transformations, and is demonstrated to represent a reformulation of the functional integral approach to quantum field theory.rnAs its main result, this thesis develops an algebraic algorithm which allows to systematically construct the renormalization group flow of gauge theories as well as gravity in arbitrary expansion schemes. In particular, it uses off-diagonal heat kernel techniques to efficiently handle the non-minimal differential operators which appear due to gauge symmetries. The central virtue of the algorithm is that no additional simplifications need to be employed, opening the possibility for more systematic investigations of the emergence of non-perturbative phenomena. As a by-product several novel results on the heat kernel expansion of the Laplace operator acting on general gauge bundles are obtained.rnThe constructed algorithm is used to re-derive the renormalization group flow of gravity in the Einstein-Hilbert truncation, showing the manifest background independence of the results. The well-studied Einstein-Hilbert case is further advanced by taking the effect of a running ghost field renormalization on the gravitational coupling constants into account. A detailed numerical analysis reveals a further stabilization of the found non-Gaussian fixed point.rnFinally, the proposed algorithm is applied to the case of higher derivative gravity including all curvature squared interactions. This establishes an improvement of existing computations, taking the independent running of the Euler topological term into account. Known perturbative results are reproduced in this case from the renormalization group equation, identifying however a unique non-Gaussian fixed point.rn
Resumo:
In the context of expensive numerical experiments, a promising solution for alleviating the computational costs consists of using partially converged simulations instead of exact solutions. The gain in computational time is at the price of precision in the response. This work addresses the issue of fitting a Gaussian process model to partially converged simulation data for further use in prediction. The main challenge consists of the adequate approximation of the error due to partial convergence, which is correlated in both design variables and time directions. Here, we propose fitting a Gaussian process in the joint space of design parameters and computational time. The model is constructed by building a nonstationary covariance kernel that reflects accurately the actual structure of the error. Practical solutions are proposed for solving parameter estimation issues associated with the proposed model. The method is applied to a computational fluid dynamics test case and shows significant improvement in prediction compared to a classical kriging model.
Resumo:
Purpose: A fully three-dimensional (3D) massively parallelizable list-mode ordered-subsets expectation-maximization (LM-OSEM) reconstruction algorithm has been developed for high-resolution PET cameras. System response probabilities are calculated online from a set of parameters derived from Monte Carlo simulations. The shape of a system response for a given line of response (LOR) has been shown to be asymmetrical around the LOR. This work has been focused on the development of efficient region-search techniques to sample the system response probabilities, which are suitable for asymmetric kernel models, including elliptical Gaussian models that allow for high accuracy and high parallelization efficiency. The novel region-search scheme using variable kernel models is applied in the proposed PET reconstruction algorithm. Methods: A novel region-search technique has been used to sample the probability density function in correspondence with a small dynamic subset of the field of view that constitutes the region of response (ROR). The ROR is identified around the LOR by searching for any voxel within a dynamically calculated contour. The contour condition is currently defined as a fixed threshold over the posterior probability, and arbitrary kernel models can be applied using a numerical approach. The processing of the LORs is distributed in batches among the available computing devices, then, individual LORs are processed within different processing units. In this way, both multicore and multiple many-core processing units can be efficiently exploited. Tests have been conducted with probability models that take into account the noncolinearity, positron range, and crystal penetration effects, that produced tubes of response with varying elliptical sections whose axes were a function of the crystal's thickness and angle of incidence of the given LOR. The algorithm treats the probability model as a 3D scalar field defined within a reference system aligned with the ideal LOR. Results: This new technique provides superior image quality in terms of signal-to-noise ratio as compared with the histogram-mode method based on precomputed system matrices available for a commercial small animal scanner. Reconstruction times can be kept low with the use of multicore, many-core architectures, including multiple graphic processing units. Conclusions: A highly parallelizable LM reconstruction method has been proposed based on Monte Carlo simulations and new parallelization techniques aimed at improving the reconstruction speed and the image signal-to-noise of a given OSEM algorithm. The method has been validated using simulated and real phantoms. A special advantage of the new method is the possibility of defining dynamically the cut-off threshold over the calculated probabilities thus allowing for a direct control on the trade-off between speed and quality during the reconstruction.
Resumo:
Purely data-driven approaches for machine learning present difficulties when data are scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data-driven modeling with a physical model of the system. We show how different, physically inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from motion capture, computational biology, and geostatistics.
Resumo:
The FANOVA (or “Sobol’-Hoeffding”) decomposition of multivariate functions has been used for high-dimensional model representation and global sensitivity analysis. When the objective function f has no simple analytic form and is costly to evaluate, computing FANOVA terms may be unaffordable due to numerical integration costs. Several approximate approaches relying on Gaussian random field (GRF) models have been proposed to alleviate these costs, where f is substituted by a (kriging) predictor or by conditional simulations. Here we focus on FANOVA decompositions of GRF sample paths, and we notably introduce an associated kernel decomposition into 4 d 4d terms called KANOVA. An interpretation in terms of tensor product projections is obtained, and it is shown that projected kernels control both the sparsity of GRF sample paths and the dependence structure between FANOVA effects. Applications on simulated data show the relevance of the approach for designing new classes of covariance kernels dedicated to high-dimensional kriging.
Resumo:
Based on a simple convexity lemma, we develop bounds for different types of Bayesian prediction errors for regression with Gaussian processes. The basic bounds are formulated for a fixed training set. Simpler expressions are obtained for sampling from an input distribution which equals the weight function of the covariance kernel, yielding asymptotically tight results. The results are compared with numerical experiments.
Resumo:
We develop an approach for sparse representations of Gaussian Process (GP) models (which are Bayesian types of kernel machines) in order to overcome their limitations for large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the GP model. By using an appealing parametrisation and projection techniques that use the RKHS norm, recursions for the effective parameters and a sparse Gaussian approximation of the posterior process are obtained. This allows both for a propagation of predictions as well as of Bayesian error measures. The significance and robustness of our approach is demonstrated on a variety of experiments.
Resumo:
In recent years there has been an increased interest in applying non-parametric methods to real-world problems. Significant research has been devoted to Gaussian processes (GPs) due to their increased flexibility when compared with parametric models. These methods use Bayesian learning, which generally leads to analytically intractable posteriors. This thesis proposes a two-step solution to construct a probabilistic approximation to the posterior. In the first step we adapt the Bayesian online learning to GPs: the final approximation to the posterior is the result of propagating the first and second moments of intermediate posteriors obtained by combining a new example with the previous approximation. The propagation of em functional forms is solved by showing the existence of a parametrisation to posterior moments that uses combinations of the kernel function at the training points, transforming the Bayesian online learning of functions into a parametric formulation. The drawback is the prohibitive quadratic scaling of the number of parameters with the size of the data, making the method inapplicable to large datasets. The second step solves the problem of the exploding parameter size and makes GPs applicable to arbitrarily large datasets. The approximation is based on a measure of distance between two GPs, the KL-divergence between GPs. This second approximation is with a constrained GP in which only a small subset of the whole training dataset is used to represent the GP. This subset is called the em Basis Vector, or BV set and the resulting GP is a sparse approximation to the true posterior. As this sparsity is based on the KL-minimisation, it is probabilistic and independent of the way the posterior approximation from the first step is obtained. We combine the sparse approximation with an extension to the Bayesian online algorithm that allows multiple iterations for each input and thus approximating a batch solution. The resulting sparse learning algorithm is a generic one: for different problems we only change the likelihood. The algorithm is applied to a variety of problems and we examine its performance both on more classical regression and classification tasks and to the data-assimilation and a simple density estimation problems.
Resumo:
We develop an approach for sparse representations of Gaussian Process (GP) models (which are Bayesian types of kernel machines) in order to overcome their limitations for large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the GP model. By using an appealing parametrisation and projection techniques that use the RKHS norm, recursions for the effective parameters and a sparse Gaussian approximation of the posterior process are obtained. This allows both for a propagation of predictions as well as of Bayesian error measures. The significance and robustness of our approach is demonstrated on a variety of experiments.