942 results for Blog datasets
Abstract:
EXTRACT (SEE PDF FOR FULL ABSTRACT): In this study we use ocean and atmosphere datasets from observations and from an ocean general circulation model integration to examine decadal time scale variability that is centered in the Pacific basin. We know that decadal variability is likely to have a strong expression in the Pacific basin; for example, a marked "shift" of cool season climate in the mid-1970s introduced major changes in Pacific SST and atmospheric circulation, along with many other physical and biological properties.
Abstract:
This paper presents an incremental learning solution for Linear Discriminant Analysis (LDA) and its applications to object recognition problems. We apply the sufficient spanning set approximation in three steps, i.e. updates of the total scatter matrix, the between-class scatter matrix, and the projected data matrix, which leads to an online solution that closely agrees with the batch solution in accuracy while significantly reducing the computational complexity. The algorithm yields an efficient solution to incremental LDA even when the number of classes as well as the set size is large. The incremental LDA method has also been shown useful for semi-supervised online learning. Label propagation is done by integrating the incremental LDA into an EM framework. The method has been demonstrated in the task of merging large datasets which were collected during MPEG standardization for face image retrieval, face authentication using the BANCA dataset, and object categorisation using the Caltech101 dataset. © 2010 Springer Science+Business Media, LLC.
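The incremental scatter-matrix update at the heart of such methods can be illustrated with a minimal sketch. This is not the authors' sufficient spanning set approximation (which also updates the between-class scatter and the projected data matrix); it only shows the exact merge of a running (mean, total scatter, count) summary with a new batch of data, the quantity an incremental LDA must maintain. Function and variable names are illustrative.

```python
import numpy as np

def merge_scatter(mean, scatter, n, X_new):
    """Fold a new batch X_new (m x d) into a running summary
    (mean, total scatter S_T, count) of all data seen so far."""
    m = X_new.shape[0]
    batch_mean = X_new.mean(axis=0)
    batch_scatter = (X_new - batch_mean).T @ (X_new - batch_mean)
    total = n + m
    delta = batch_mean - mean
    # Cross term accounts for the gap between the two summaries' means.
    new_mean = mean + (m / total) * delta
    new_scatter = scatter + batch_scatter + (n * m / total) * np.outer(delta, delta)
    return new_mean, new_scatter, total

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(50, 4)), rng.normal(size=(30, 4))
mean1 = X1.mean(axis=0)
S1 = (X1 - mean1).T @ (X1 - mean1)
mean_inc, S_inc, n_inc = merge_scatter(mean1, S1, 50, X2)
```

The merged result equals the batch scatter of the concatenated data exactly, which is why incremental updates of this kind can track the batch LDA solution without revisiting old samples.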
Abstract:
We introduce a new regression framework, Gaussian process regression networks (GPRN), which combines the structural properties of Bayesian neural networks with the non-parametric flexibility of Gaussian processes. This model accommodates input dependent signal and noise correlations between multiple response variables, input dependent length-scales and amplitudes, and heavy-tailed predictive distributions. We derive both efficient Markov chain Monte Carlo and variational Bayes inference procedures for this model. We apply GPRN as a multiple output regression and multivariate volatility model, demonstrating substantially improved performance over eight popular multiple output (multi-task) Gaussian process models and three multivariate volatility models on benchmark datasets, including a 1000-dimensional gene expression dataset.
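The generative structure described above can be sketched by sampling from a GPRN prior: each latent function and each entry of the mixing matrix W(x) is an independent GP draw, so the effective output correlations change with the input. This is a prior sample only, not the paper's MCMC or variational inference; the RBF kernel and dimensions are illustrative choices.

```python
import numpy as np

def rbf_kernel(x, lengthscale=1.0):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_gprn_prior(x, p=2, q=1, noise=0.05, seed=0):
    """Draw one sample from a GPRN prior: y(x) = W(x) f(x) + eps, where
    each latent function f_j and each mixing weight W_ij is an
    independent GP draw, so output correlations vary with the input."""
    rng = np.random.default_rng(seed)
    n = len(x)
    L = np.linalg.cholesky(rbf_kernel(x) + 1e-8 * np.eye(n))
    f = L @ rng.normal(size=(n, q))                          # latent functions
    W = (L @ rng.normal(size=(n, p * q))).reshape(n, p, q)   # mixing weights
    y = np.einsum('npq,nq->np', W, f) + noise * rng.normal(size=(n, p))
    return y, W, f

x = np.linspace(0.0, 5.0, 100)
y, W, f = sample_gprn_prior(x)
```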
Abstract:
In this paper a novel visualisation method for diffusion tensor MRI datasets is introduced. This is based on the use of Complex Wavelets in order to produce "stripy" textures which depict the anisotropic component of the diffusion tensors. Grey-scale pixel intensity is used to show the isotropic component. This paper also discusses enhancements of the technique for 3D visualisation. © 2004 IEEE.
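The isotropic/anisotropic split that drives this kind of visualisation can be computed directly on a single tensor. The sketch below uses the standard eigenvalue formula for fractional anisotropy; it is not the paper's complex-wavelet texture synthesis, only the tensor decomposition that feeds it.

```python
import numpy as np

def tensor_components(D):
    """Split a 3x3 diffusion tensor into isotropic and anisotropic parts
    and compute fractional anisotropy (FA) from its eigenvalues."""
    md = np.trace(D) / 3.0                          # mean diffusivity
    iso, aniso = md * np.eye(3), D - md * np.eye(3)
    lam = np.linalg.eigvalsh(D)
    fa = np.sqrt(1.5 * np.sum((lam - md) ** 2) / np.sum(lam ** 2))
    return iso, aniso, fa

# A perfectly isotropic tensor vs. a fibre-like anisotropic one.
iso_part, aniso_part, fa_iso = tensor_components(np.eye(3) * 0.7)
_, _, fa_fiber = tensor_components(np.diag([1.0, 0.1, 0.1]))
```

In the visualisation described above, a scalar like `fa_fiber` would modulate the stripy texture while the isotropic part sets the grey-scale intensity.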
Abstract:
Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that BAGEL can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data. © 2010 Association for Computational Linguistics.
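Certainty-based active learning of the kind used to improve performance on sparse data can be sketched generically: train on the labelled set, score unlabelled candidates by how ambiguous the model finds them, and request annotations for the least certain ones. The nearest-centroid scorer below is a deliberately simple stand-in, not BAGEL's dynamic Bayesian network; all names are illustrative.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """One centroid per class label."""
    classes = np.unique(y)
    return np.array([X[y == c].mean(axis=0) for c in classes])

def margin_uncertainty(X, centroids):
    """Small margin between the two nearest class centroids means the
    point lies close to the decision boundary, i.e. is uncertain."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    d.sort(axis=1)
    return d[:, 1] - d[:, 0]

def active_learning_step(X_lab, y_lab, X_pool, k=2):
    """Pick the k least certain pool points to send for annotation."""
    centroids = nearest_centroid_fit(X_lab, y_lab)
    return np.argsort(margin_uncertainty(X_pool, centroids))[:k]

X_lab = np.array([[-2.0, 0.0], [-2.2, 0.1], [2.0, 0.0], [2.1, -0.1]])
y_lab = np.array([0, 0, 1, 1])
X_pool = np.array([[0.0, 0.0], [5.0, 0.0], [-5.0, 0.0], [0.1, 0.2], [4.0, 1.0]])
picked = active_learning_step(X_lab, y_lab, X_pool, k=2)
```

The two pool points nearest the boundary between the classes are selected, which is the behaviour that lets active learning approach gold-standard quality with a fraction of the annotations.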
Abstract:
A mixture of Gaussians fit to a single curved or heavy-tailed cluster will report that the data contains many clusters. To produce more appropriate clusterings, we introduce a model which warps a latent mixture of Gaussians to produce nonparametric cluster shapes. The possibly low-dimensional latent mixture model allows us to summarize the properties of the high-dimensional clusters (or density manifolds) describing the data. The number of manifolds, as well as the shape and dimension of each manifold is automatically inferred. We derive a simple inference scheme for this model which analytically integrates out both the mixture parameters and the warping function. We show that our model is effective for density estimation, performs better than infinite Gaussian mixture models at recovering the true number of clusters, and produces interpretable summaries of high-dimensional datasets.
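The generative story of a warped mixture can be sketched in a few lines: draw latent points from an ordinary Gaussian mixture, then push them through a nonlinear warp so the observed clusters are curved. The fixed quadratic warp below stands in for the latent warping function that the paper integrates out; parameters are illustrative.

```python
import numpy as np

def sample_warped_mixture(n=500, seed=0):
    """Draw latent points from a 2-component Gaussian mixture, then push
    them through a fixed nonlinear warp. The observed clusters are
    curved even though each latent cluster is Gaussian."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)                     # component labels
    means = np.array([[-2.0, 0.0], [2.0, 0.0]])
    latent = means[z] + 0.3 * rng.normal(size=(n, 2))
    # A smooth warp bends the latent space into banana-shaped manifolds.
    x_obs = np.column_stack([latent[:, 0],
                             latent[:, 1] + 0.5 * latent[:, 0] ** 2])
    return x_obs, z, latent

x_obs, z, latent = sample_warped_mixture()
```

A plain mixture of Gaussians fit to `x_obs` would need many components per banana; inverting the warp recovers two simple latent clusters.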
Abstract:
This paper tackles the novel challenging problem of 3D object phenotype recognition from a single 2D silhouette. To bridge the large pose (articulation or deformation) and camera viewpoint changes between the gallery images and query image, we propose a novel probabilistic inference algorithm based on 3D shape priors. Our approach combines both generative and discriminative learning. We use latent probabilistic generative models to capture 3D shape and pose variations from a set of 3D mesh models. Based on these 3D shape priors, we generate a large number of projections for different phenotype classes, poses, and camera viewpoints, and implement Random Forests to efficiently solve the shape and pose inference problems. By model selection in terms of the silhouette coherency between the query and the projections of 3D shapes synthesized using the galleries, we achieve the phenotype recognition result as well as a fast approximate 3D reconstruction of the query. To verify the efficacy of the proposed approach, we present new datasets which contain over 500 images of various human and shark phenotypes and motions. The experimental results clearly show the benefits of using the 3D priors in the proposed method over previous 2D-based methods. © 2011 IEEE.
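The silhouette-coherency model selection step can be sketched with a simple overlap score: rate each synthesized projection against the query mask and keep the best. Intersection-over-union is used here as a plausible coherency measure; the abstract does not specify the exact score, so treat this as an assumption.

```python
import numpy as np

def silhouette_iou(a, b):
    """Intersection-over-union between two boolean silhouette masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def best_projection(query, projections):
    """Model selection sketch: score every synthesized projection
    against the query silhouette and keep the most coherent one."""
    scores = [silhouette_iou(query, p) for p in projections]
    return int(np.argmax(scores)), scores

query = np.zeros((10, 10), dtype=bool)
query[2:8, 2:8] = True
proj_match = query.copy()                  # correct phenotype/pose/view
proj_other = np.zeros_like(query)
proj_other[0:4, 0:4] = True                # wrong phenotype
proj_shift = np.roll(query, 3, axis=1)     # wrong viewpoint
best, scores = best_projection(query, [proj_match, proj_other, proj_shift])
```

The winning projection identifies the phenotype class and, because it came from a posed 3D mesh, also yields the approximate 3D reconstruction mentioned above.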
Abstract:
The Internet has enabled the creation of a growing number of large-scale knowledge bases in a variety of domains containing complementary information. Tools for automatically aligning these knowledge bases would make it possible to unify many sources of structured knowledge and answer complex queries. However, the efficient alignment of large-scale knowledge bases still poses a considerable challenge. Here, we present Simple Greedy Matching (SiGMa), a simple algorithm for aligning knowledge bases with millions of entities and facts. SiGMa is an iterative propagation algorithm which leverages both the structural information from the relationship graph as well as flexible similarity measures between entity properties in a greedy local search, thus making it scalable. Despite its greedy nature, our experiments indicate that SiGMa can efficiently match some of the world's largest knowledge bases with high precision. We provide additional experiments on benchmark datasets which demonstrate that SiGMa can outperform state-of-the-art approaches both in accuracy and efficiency.
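The greedy propagation idea can be sketched on a toy pair of knowledge graphs: maintain a priority queue of candidate pairs scored by property similarity plus agreement of already-matched neighbours, and lock in the best pair at each step. This is a simplified reading of SiGMa, not the published implementation; the score combination and the lazy handling of stale queue entries are assumptions.

```python
import heapq

def sigma_like_match(neighbors1, neighbors2, sim, seeds):
    """Greedy alignment in the spirit of SiGMa: candidate pairs are
    scored by property similarity plus the number of already-matched
    neighbour pairs, and the best pair is fixed at each step."""
    matched = dict(seeds)                        # entity1 -> entity2
    matched_rev = {b: a for a, b in matched.items()}

    def score(a, b):
        graph_votes = sum(1 for n in neighbors1[a]
                          if matched.get(n) in neighbors2[b])
        return sim.get((a, b), 0.0) + graph_votes

    heap = []
    def push_candidates(a, b):
        for na in neighbors1[a]:
            for nb in neighbors2[b]:
                if na not in matched and nb not in matched_rev:
                    heapq.heappush(heap, (-score(na, nb), na, nb))

    for a, b in seeds:
        push_candidates(a, b)
    while heap:
        neg, a, b = heapq.heappop(heap)
        if a in matched or b in matched_rev or -neg <= 0:
            continue                             # stale or useless candidate
        matched[a], matched_rev[b] = b, a
        push_candidates(a, b)                    # propagate along the graph
    return matched

neighbors1 = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"]}
neighbors2 = {"b1": ["b2"], "b2": ["b1", "b3"], "b3": ["b2"]}
sim = {("a2", "b2"): 0.5, ("a3", "b3"): 0.5, ("a2", "b3"): 0.1}
aligned = sigma_like_match(neighbors1, neighbors2, sim, [("a1", "b1")])
```

Each confirmed match raises the score of its neighbours' candidate pairs, which is what lets a single seed propagate across millions of entities in the full-scale setting.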
Abstract:
A combination of singular systems analysis and analytic phase techniques is used to investigate the possible occurrence in observations of coherent synchronization between quasi-biennial and semi-annual oscillations (QBOs; SAOs) in the stratosphere and troposphere. Time series of zonal mean zonal winds near the Equator are analysed from the ERA-40 and ERA-interim reanalysis datasets over a ∼ 50-year period. In the stratosphere, the QBO is found to synchronize with the SAO almost all the time, but with a frequency ratio that changes erratically between 4:1, 5:1 and 6:1. A similar variable synchronization is also evident in the tropical troposphere between semi-annual and quasi-biennial cycles (known as TBOs). Mean zonal winds from ERA-40 and ERA-interim, and also time series of indices for the Indian and West Pacific monsoons, are commonly found to exhibit synchronization, with SAO/TBO ratios that vary between 4:1 and 7:1. Coherent synchronization between the QBO and tropical TBO does not appear to persist for long intervals, however. This suggests that both the QBO and tropical TBOs may be separately synchronized to SAOs that are themselves enslaved to the seasonal cycle, or to the annual cycle itself. However, the QBO and TBOs are evidently only weakly coupled between themselves and are frequently found to lose mutual coherence when each changes its frequency ratio to its respective SAO. This suggests a need to revise a commonly cited paradigm that advocates the use of stratospheric QBO indices as a predictor for tropospheric phenomena such as monsoons and hurricanes. © 2012 Royal Meteorological Society.
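The analytic-phase technique behind these frequency-ratio estimates can be sketched on synthetic data: compute each oscillation's instantaneous phase from its analytic signal and compare phase growth rates. The toy series below (a 6-month SAO-like cycle against a 30-month QBO-like cycle, monthly sampling) are illustrative, not reanalysis data.

```python
import numpy as np

def analytic_phase(x):
    """Unwrapped instantaneous phase via an FFT-based analytic signal
    (a numpy-only stand-in for scipy.signal.hilbert)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    z = np.fft.ifft(np.fft.fft(x - np.mean(x)) * h)
    return np.unwrap(np.angle(z))

# Toy monthly series: a 6-month SAO-like cycle and a 30-month QBO-like cycle.
t = np.arange(600)
sao = np.sin(2 * np.pi * t / 6)
qbo = np.sin(2 * np.pi * t / 30)
ratio = (analytic_phase(sao)[-1] - analytic_phase(sao)[0]) / \
        (analytic_phase(qbo)[-1] - analytic_phase(qbo)[0])   # ~5:1 locking
```

In the observational analysis, the same ratio computed over sliding windows is what reveals the erratic jumps between 4:1, 5:1 and 6:1 locking.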
Abstract:
Real-time cardiac ultrasound allows monitoring of heart motion during intracardiac beating-heart procedures. Our application assists atrial septal defect (ASD) closure techniques using real-time 3D ultrasound guidance. One major image processing challenge is the processing of information at high frame rate. We present an optimized block flow technique, which combines the probability-based velocity computation for an entire block with template matching. We propose adapted similarity constraints both from frame to frame, to conserve energy, and globally, to minimize errors. We show tracking results on eight in-vivo 4D datasets acquired from porcine beating-heart procedures. Computing velocity at the block level with an optimized scheme, our technique tracks ASD motion at 41 frames/s. We analyze the errors of motion estimation and retrieve the cardiac cycle in ungated images. © 2007 IEEE.
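The template-matching core of block-based tracking can be sketched as an exhaustive search over a small displacement window. This is a minimal SSD matcher on synthetic frames, not the paper's probability-based block flow; block size, search radius, and the simulated motion are all illustrative.

```python
import numpy as np

def track_block(prev, curr, top, left, size=8, search=4):
    """Find the displacement of a block from frame `prev` within a small
    search window of frame `curr` by minimising the sum of squared
    differences (SSD) against the template."""
    block = prev[top:top + size, left:left + size]
    best, best_dy, best_dx = np.inf, 0, 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > curr.shape[0] or x + size > curr.shape[1]:
                continue
            ssd = np.sum((curr[y:y + size, x:x + size] - block) ** 2)
            if ssd < best:
                best, best_dy, best_dx = ssd, dy, dx
    return best_dy, best_dx

rng = np.random.default_rng(1)
prev = rng.normal(size=(32, 32))
curr = np.roll(prev, (2, 1), axis=(0, 1))   # simulate tissue moving by (2, 1)
dy, dx = track_block(prev, curr, top=10, left=10)
```

Computing one velocity per block rather than per pixel, as described above, is what keeps this kind of search cheap enough for real-time (41 frames/s) operation.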
Abstract:
Atlases and statistical models play important roles in the personalization and simulation of cardiac physiology. For the study of the heart, however, the construction of comprehensive atlases and spatio-temporal models is faced with a number of challenges, in particular the need to handle large and highly variable image datasets, the multi-region nature of the heart, and the presence of complex as well as small cardiovascular structures. In this paper, we present a detailed atlas and spatio-temporal statistical model of the human heart based on a large population of 3D+time multi-slice computed tomography sequences, and the framework for its construction. It uses spatial normalization based on nonrigid image registration to synthesize a population mean image and establish the spatial relationships between the mean and the subjects in the population. Temporal image registration is then applied to resolve each subject-specific cardiac motion and the resulting transformations are used to warp a surface mesh representation of the atlas to fit the images of the remaining cardiac phases in each subject. Subsequently, we demonstrate the construction of a spatio-temporal statistical model of shape such that the inter-subject and dynamic sources of variation are suitably separated. The framework is applied to a 3D+time data set of 138 subjects. The data is drawn from a variety of pathologies, which benefits its generalization to new subjects and physiological studies. The obtained level of detail and the extendability of the atlas present an advantage over most cardiac models published previously. © 1982-2012 IEEE.
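The idea of synthesizing a population mean by alternating alignment and averaging can be illustrated in one dimension. The sketch below aligns toy "subject" signals by circular cross-correlation and re-averages; it is a drastic simplification of the nonrigid image registration used above, with all signals and shifts invented for illustration.

```python
import numpy as np

def align_shift(ref, sig):
    """Circular shift k such that np.roll(sig, k) best lines up with ref,
    found by FFT cross-correlation."""
    corr = np.fft.ifft(np.fft.fft(ref) * np.conj(np.fft.fft(sig))).real
    return int(np.argmax(corr))

def population_mean(signals, iters=3):
    """Alternate between aligning every subject to the current mean and
    re-averaging, so the final mean is not biased toward one subject."""
    mean = signals[0].astype(float)
    for _ in range(iters):
        aligned = [np.roll(s, align_shift(mean, s)) for s in signals]
        mean = np.mean(aligned, axis=0)
    return mean

grid = np.arange(64)
template = np.exp(-((grid - 32.0) ** 2) / 20.0)   # a sharp "anatomy" peak
subjects = [np.roll(template, s) for s in (0, 3, -5, 11)]
atlas_mean = population_mean(subjects)
```

Without the alignment step the average would smear the peak; with it, the mean stays as sharp as any single subject, which is the property a population mean image needs.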
Abstract:
Copulas make it possible to learn marginal distributions separately from the multivariate dependence structure (copula) that links them together into a density function. Vine factorizations ease the learning of high-dimensional copulas by constructing a hierarchy of conditional bivariate copulas. However, to simplify inference, it is common to assume that each of these conditional bivariate copulas is independent of its conditioning variables. In this paper, we relax this assumption by discovering the latent functions that specify the shape of a conditional copula given its conditioning variables. We learn these functions by following a Bayesian approach based on sparse Gaussian processes with expectation propagation for scalable, approximate inference. Experiments on real-world datasets show that, when modeling all conditional dependencies, we obtain better estimates of the underlying copula of the data.
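What it means for a conditional bivariate copula to depend on its conditioning variable can be sketched with a Gaussian copula whose correlation parameter is a function of z. The tanh link below is a hand-picked stand-in for the latent function the paper would learn with a sparse Gaussian process; it is not the authors' model.

```python
import numpy as np
from statistics import NormalDist

def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho) on the unit square."""
    x, y = NormalDist().inv_cdf(u), NormalDist().inv_cdf(v)
    q = 1.0 - rho ** 2
    return np.exp(-(rho ** 2 * (x * x + y * y) - 2 * rho * x * y) / (2 * q)) / np.sqrt(q)

def conditional_rho(z):
    """Dependence parameter as a function of the conditioning variable z.
    A fixed tanh link stands in for the latent function that would be
    learned from data."""
    return np.tanh(0.8 * z)

# With z = 0 the copula collapses to independence (density 1 everywhere);
# larger z induces progressively stronger positive dependence.
flat = gaussian_copula_density(0.5, 0.5, conditional_rho(0.0))
```

Assuming the conditional copula is independent of z amounts to freezing `conditional_rho` at a constant; the paper's contribution is learning its full shape.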