807 results for Machine learning experiments
Abstract:
The paper addresses the problem of learning a regression model parameterized by a fixed-rank positive semidefinite matrix. The focus is on the nonlinear nature of the search space and on scalability to high-dimensional problems. The mathematical developments rely on the theory of gradient descent algorithms adapted to the Riemannian geometry that underlies the set of fixed-rank positive semidefinite matrices. In contrast with previous contributions in the literature, no restrictions are imposed on the range space of the learned matrix. The resulting algorithms maintain a linear complexity in the problem size and enjoy important invariance properties. We apply the proposed algorithms to the problem of learning a distance function parameterized by a positive semidefinite matrix. Good performance is observed on classical benchmarks. © 2011 Gilles Meyer, Silvere Bonnabel and Rodolphe Sepulchre.
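To make the idea of a fixed-rank positive semidefinite parameterization concrete, here is a minimal sketch. It is not the authors' Riemannian algorithm: it assumes a quadratic model ŷ = xᵀWx, writes W = GGᵀ with a tall factor G, and runs ordinary gradient descent on G. The function name, step size and iteration count are hypothetical choices for illustration. The factorization keeps W positive semidefinite with rank at most `rank`, and each iteration touches the data only through X @ G, so the cost grows linearly with the input dimension.

```python
import numpy as np

def fit_fixed_rank_psd(X, y, rank, step=0.02, n_iters=500, seed=0):
    """Fit y ~ x^T W x with W = G G^T (PSD, rank <= `rank`) by gradient descent on G.

    Hypothetical helper for illustration; returns the factor G and the loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Small random start: G = 0 would be a stationary point of the loss.
    G = 0.1 * rng.normal(size=(d, rank))
    losses = []
    for _ in range(n_iters):
        Z = X @ G                            # (n, rank); cost is linear in d
        err = np.sum(Z * Z, axis=1) - y      # x_i^T G G^T x_i - y_i
        losses.append(0.5 * np.mean(err ** 2))
        # Gradient of the mean of 0.5 * err_i^2 with respect to G.
        grad = 2.0 * X.T @ (err[:, None] * Z) / n
        G -= step * grad
    return G, losses
```

The learned matrix is recovered as `W = G @ G.T`; it is never formed during training, which is what keeps the per-iteration cost low when the dimension is large.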
Abstract:
In this paper we develop a new approach to sparse principal component analysis (sparse PCA). We propose two single-unit and two block optimization formulations of the sparse PCA problem, aimed at extracting a single sparse dominant principal component of a data matrix, or more components at once, respectively. While the initial formulations involve nonconvex functions, and are therefore computationally intractable, we rewrite them into the form of an optimization program involving maximization of a convex function on a compact set. The dimension of the search space is decreased enormously if the data matrix has many more columns (variables) than rows. We then propose and analyze a simple gradient method suited for the task. It appears that our algorithm has the best convergence properties when either the objective function or the feasible set is strongly convex, which is the case with our single-unit formulations and can be enforced in the block case. Finally, we demonstrate numerically on a set of random and gene expression test problems that our approach outperforms existing algorithms both in quality of the obtained solution and in computational speed. © 2010 Michel Journée, Yurii Nesterov, Peter Richtárik and Rodolphe Sepulchre.
Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
Abstract:
Standard forms of density-functional theory (DFT) have good predictive power for many materials, but are not yet fully satisfactory for cluster, solid, and liquid forms of water. Recent work has stressed the importance of DFT errors in describing dispersion, but we note that errors in other parts of the energy may also contribute. We obtain information about the nature of DFT errors by using a many-body separation of the total energy into its 1-body, 2-body, and beyond-2-body components to analyze the deficiencies of the popular PBE and BLYP approximations for the energetics of water clusters and ice structures. The errors of these approximations are computed by using accurate benchmark energies from the coupled-cluster technique of molecular quantum chemistry and from quantum Monte Carlo calculations. The systems studied are isomers of the water hexamer cluster, the crystal structures Ih, II, XV, and VIII of ice, and two clusters extracted from ice VIII. For the binding energies of these systems, we use the machine-learning technique of Gaussian Approximation Potentials to correct successively for 1-body and 2-body errors of the DFT approximations. We find that even after correction for these errors, substantial beyond-2-body errors remain. The characteristics of the 2-body and beyond-2-body errors of PBE are completely different from those of BLYP, but the errors of both approximations disfavor the close approach of non-hydrogen-bonded monomers. We note the possible relevance of our findings to the understanding of liquid water.
Abstract:
We investigate the Student-t process as an alternative to the Gaussian process as a non-parametric prior over functions. We derive closed form expressions for the marginal likelihood and predictive distribution of a Student-t process, by integrating away an inverse Wishart process prior over the covariance kernel of a Gaussian process model. We show surprising equivalences between different hierarchical Gaussian process models leading to Student-t processes, and derive a new sampling scheme for the inverse Wishart process, which helps elucidate these equivalences. Overall, we show that a Student-t process can retain the attractive properties of a Gaussian process - a nonparametric representation, analytic marginal and predictive distributions, and easy model selection through covariance kernels - but has enhanced flexibility, and predictive covariances that, unlike a Gaussian process, explicitly depend on the values of training observations. We verify empirically that a Student-t process is especially useful in situations where there are changes in covariance structure, or in applications such as Bayesian optimization, where accurate predictive covariances are critical for good performance. These advantages come at no additional computational cost over Gaussian processes.
Abstract:
Choosing appropriate architectures and regularization strategies of deep networks is crucial to good predictive performance. To shed light on this problem, we analyze the analogous problem of constructing useful priors on compositions of functions. Specifically, we study the deep Gaussian process, a type of infinitely-wide, deep neural network. We show that in standard architectures, the representational capacity of the network tends to capture fewer degrees of freedom as the number of layers increases, retaining only a single degree of freedom in the limit. We propose an alternate network architecture which does not suffer from this pathology. We also examine deep covariance functions, obtained by composing infinitely many feature transforms. Lastly, we characterize the class of models obtained by performing dropout on Gaussian processes.
Abstract:
Copyright 2014 by the author(s). We present a nonparametric prior over reversible Markov chains. We use completely random measures, specifically gamma processes, to construct a countably infinite graph with weighted edges. By enforcing symmetry to make the edges undirected we define a prior over random walks on graphs that results in a reversible Markov chain. The resulting prior over infinite transition matrices is closely related to the hierarchical Dirichlet process but enforces reversibility. A reinforcement scheme has recently been proposed with similar properties, but the de Finetti measure is not well characterised. We take the alternative approach of explicitly constructing the mixing measure, which allows more straightforward and efficient inference at the cost of no longer having a closed form predictive distribution. We use our process to construct a reversible infinite HMM which we apply to two real datasets, one from epigenomics and one ion channel recording.
Abstract:
We present and test an extension of slow feature analysis as a novel approach to nonlinear blind source separation. The algorithm relies on temporal correlations and iteratively reconstructs a set of statistically independent sources from arbitrary nonlinear instantaneous mixtures. Simulations show that it is able to invert a complicated nonlinear mixture of two audio signals with a high reliability. The algorithm is based on a mathematical analysis of slow feature analysis for the case of input data that are generated from statistically independent sources. © 2014 Henning Sprekeler, Tiziano Zito and Laurenz Wiskott.
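For orientation, the following is a minimal sketch of plain linear slow feature analysis, the building block that the abstract extends. It is not the nonlinear extension described above; the function name is hypothetical, and the input covariance is assumed to be full rank. The data are whitened, and the directions in which the whitened signal changes most slowly (smallest variance of the time differences) are returned first.

```python
import numpy as np

def linear_sfa(X, n_components):
    """Plain linear SFA on a time series X of shape (T, d).

    Returns an array of shape (T, n_components), slowest feature first.
    Assumes the covariance of X is full rank (no PCA truncation is done).
    """
    Xc = X - X.mean(axis=0)
    # Step 1: whiten, so every projection has unit variance and
    # different projections are uncorrelated.
    cov = Xc.T @ Xc / (len(Xc) - 1)
    evals, evecs = np.linalg.eigh(cov)
    Z = Xc @ (evecs / np.sqrt(evals))
    # Step 2: covariance of the finite-difference "time derivative".
    dZ = np.diff(Z, axis=0)
    dcov = dZ.T @ dZ / (len(dZ) - 1)
    # Step 3: eigh returns eigenvalues in ascending order, so the first
    # eigenvectors are the directions of slowest variation.
    _, devecs = np.linalg.eigh(dcov)
    return Z @ devecs[:, :n_components]
```

On a linear mixture of a slow and a fast sinusoid, the first output should recover the slow source up to sign and scale. Nonlinear mixtures, the case the abstract targets, need more than this sketch provides.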
Abstract:
© 2015 John P. Cunningham and Zoubin Ghahramani. Linear dimensionality reduction methods are a cornerstone of analyzing high dimensional data, due to their simple geometric interpretations and typically attractive computational properties. These methods capture many data features of interest, such as covariance, dynamical structure, correlation between data sets, input-output relationships, and margin between data classes. Methods have been developed with a variety of names and motivations in many fields, and perhaps as a result the connections between all these methods have not been highlighted. Here we survey methods from this disparate literature as optimization programs over matrix manifolds. We discuss principal component analysis, factor analysis, linear multidimensional scaling, Fisher's linear discriminant analysis, canonical correlations analysis, maximum autocorrelation factors, slow feature analysis, sufficient dimensionality reduction, undercomplete independent component analysis, linear regression, distance metric learning, and more. This optimization framework gives insight to some rarely discussed shortcomings of well-known methods, such as the suboptimality of certain eigenvector solutions. Modern techniques for optimization over matrix manifolds enable a generic linear dimensionality reduction solver, which accepts as input data and an objective to be optimized, and returns, as output, an optimal low-dimensional projection of the data. This simple optimization framework further allows straightforward generalizations and novel variants of classical methods, which we demonstrate here by creating an orthogonal-projection canonical correlations analysis. More broadly, this survey and generic solver suggest that linear dimensionality reduction can move toward becoming a blackbox, objective-agnostic numerical technology.
Abstract:
We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hyperparameters while ES cannot. We evaluate PES in both synthetic and real-world applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.
Abstract:
Optimization on manifolds is a rapidly developing branch of nonlinear optimization. Its focus is on problems where the smooth geometry of the search space can be leveraged to design efficient numerical algorithms. In particular, optimization on manifolds is well-suited to deal with rank and orthogonality constraints. Such structured constraints appear pervasively in machine learning applications, including low-rank matrix completion, sensor network localization, camera network registration, independent component analysis, metric learning, dimensionality reduction and so on. The Manopt toolbox, available at www.manopt.org, is a user-friendly, documented piece of software dedicated to simplifying experimentation with state-of-the-art Riemannian optimization algorithms. By dealing internally with most of the differential geometry, the package aims particularly at lowering the entrance barrier. © 2014 Nicolas Boumal.
Abstract:
Seismic sensors are widely used to detect moving targets in ground sensor networks. Footstep detection is very important for security surveillance and other applications. Because of the non-stationary character of seismic signals and complex environmental conditions, footstep detection is a very challenging problem. A novel wavelet denoising method based on singular value decomposition is used to address these problems. The signal-to-noise ratio (SNR) of the raw footstep signal is greatly improved using this strategy. The feature extraction method applied after the denoising procedure is also discussed. Compared with the kurtosis statistic feature, the wavelet energy feature is more promising for seismic footstep detection, especially in long-distance surveillance.
Abstract:
Multi-frame image super-resolution (SR) aims to utilize information from a set of low-resolution (LR) images to compose a high-resolution (HR) one. As it is desirable or essential in many real applications, recent years have witnessed growing interest in the problem of multi-frame SR reconstruction. This set of algorithms commonly utilizes a linear observation model to relate the recorded LR images to the unknown reconstructed HR image estimates. Recently, regularization-based schemes have been demonstrated to be effective because SR reconstruction is actually an ill-posed problem. Working within this promising framework, this paper first proposes two new regularization items, termed locally adaptive bilateral total variation and consistency of gradients, to keep edges and flat regions, which are implicitly described in LR images, sharp and smooth, respectively. Thereafter, the combination of the proposed regularization items is superior to existing regularization items because it considers both edges and flat regions while existing ones consider only edges. Thorough experimental results show the effectiveness of the new algorithm for SR reconstruction. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
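Reinforcement learning is an important machine learning method. With the rapid development of computer networks and distributed processing technology, distributed reinforcement learning methods for multi-agent systems are receiving increasing attention. This paper groups the existing distributed reinforcement learning methods into four categories: central reinforcement learning, independent reinforcement learning, group reinforcement learning, and social reinforcement learning. It then discusses the architectural frameworks of these four categories and gives a formal definition of each.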
Abstract:
We introduce and explore an approach to estimating statistical significance of classification accuracy, which is particularly useful in scientific applications of machine learning where high dimensionality of the data and the small number of training examples render most standard convergence bounds too loose to yield a meaningful guarantee of the generalization ability of the classifier. Instead, we estimate statistical significance of the observed classification accuracy, or the likelihood of observing such accuracy by chance due to spurious correlations of the high-dimensional data patterns with the class labels in the given training set. We adopt permutation testing, a non-parametric technique previously developed in classical statistics for hypothesis testing in the generative setting (i.e., comparing two probability distributions). We demonstrate the method on real examples from neuroimaging studies and DNA microarray analysis and suggest a theoretical analysis of the procedure that relates the asymptotic behavior of the test to the existing convergence bounds.
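The permutation-testing idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the nearest-centroid classifier, the fold scheme, the number of permutations and both function names are assumptions chosen to keep the example short. The observed cross-validated accuracy is compared with the accuracies obtained after shuffling the class labels, which breaks any real association between data and labels.

```python
import numpy as np

def nearest_centroid_cv_accuracy(X, y, n_folds=5):
    """Cross-validated accuracy of a nearest-centroid classifier (stand-in model)."""
    n = len(y)
    folds = np.arange(n) % n_folds           # deterministic fold assignment
    correct = 0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        classes = np.unique(y[train])
        centroids = np.stack(
            [X[train][y[train] == c].mean(axis=0) for c in classes]
        )
        # Squared distance from every test point to every class centroid.
        dists = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        correct += np.sum(classes[dists.argmin(axis=1)] == y[test])
    return correct / n

def permutation_p_value(X, y, n_permutations=200, seed=0):
    """Return (observed accuracy, p-value) under the label-permutation null."""
    rng = np.random.default_rng(seed)
    observed = nearest_centroid_cv_accuracy(X, y)
    null = np.array([
        nearest_centroid_cv_accuracy(X, rng.permutation(y))
        for _ in range(n_permutations)
    ])
    # The +1 terms count the observed labelling as one of the permutations,
    # so the estimated p-value can never be exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, p_value
```

The whole cross-validation loop is rerun for every shuffled labelling, so the null distribution reflects the same training and evaluation procedure as the observed accuracy. The smallest attainable p-value is 1 / (1 + n_permutations), which sets how many permutations a given significance level requires.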