984 results for Indian buffet process


Relevance:

100.00%

Publisher:

Abstract:

BACKGROUND: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the appropriate number of factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. RESULTS: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), rhinovirus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. CONCLUSIONS: Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.

Relevance:

100.00%

Publisher:

Abstract:

We propose a nonparametric Bayesian linear Poisson gamma model for count data and use it for dictionary learning. A key property of this model is that it captures a parts-based representation similar to that of nonnegative matrix factorization. We present an auxiliary variable Gibbs sampler, which turns the intractable inference into a tractable one. Combining this inference procedure with the slice sampler for the Indian buffet process, we show that our model can learn the number of factors automatically. Using synthetic and real-world datasets, we show that the proposed model outperforms other state-of-the-art nonparametric factor models.

Relevance:

100.00%

Publisher:

Abstract:

The prevalence of Autism Spectrum Disorder (ASD) is growing at a staggering rate, but little is known about the cause of this condition. Inferring learning patterns from therapeutic performance data, and subsequently clustering children with ASD into subgroups, is important for understanding this domain and, more importantly, for informing evidence-based intervention. However, this data-driven task was difficult in the past because there was too little data for reliable analysis. For the first time, using data from a recent application for early intervention in autism (TOBY Play pad), whose download count now exceeds 4500, we present in this paper the automatic discovery of learning patterns across 32 skills in sensory, imitation and language. We use unsupervised learning methods for this task, but a notorious problem with existing methods is that the number of patterns must be specified correctly in advance, which in our case is even more difficult due to the complexity of the data. To this end, we appeal to recent Bayesian nonparametric methods, in particular Bayesian Nonparametric Factor Analysis. This model uses the Indian Buffet Process (IBP) as a prior on a binary matrix with infinitely many columns to allocate groups of intervention skills to children. The optimal number of learning patterns as well as the subgroup assignments are inferred automatically from data. Our experimental results follow an exploratory approach and present several newly discovered learning patterns. To provide quantitative results, we also report a clustering evaluation against K-means and nonnegative matrix factorization (NMF). In addition to the novelty of this new problem, we were able to demonstrate the suitability of Bayesian nonparametric models over parametric rivals.
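
For readers unfamiliar with the prior mentioned in this abstract, the following is a minimal sketch of drawing a binary matrix from the Indian buffet process using its standard "restaurant" construction. It is not the authors' code: the function names `sample_ibp` and `sample_poisson`, the concentration parameter `alpha`, and the seed are illustrative assumptions, and only the Python standard library is used.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_customers, alpha, seed=0):
    """Sample a binary feature matrix from the IBP restaurant metaphor.

    Returns a list of rows; rows[i][k] == 1 means customer i took dish k.
    The number of columns (dishes) is random, not fixed in advance.
    """
    rng = random.Random(seed)
    dish_counts = []  # dish_counts[k] = customers who have taken dish k so far
    rows = []
    for i in range(1, num_customers + 1):
        # Customer i takes existing dish k with probability m_k / i.
        row = [1 if rng.random() < m / i else 0 for m in dish_counts]
        # The customer then tries Poisson(alpha / i) brand-new dishes.
        new_dishes = sample_poisson(alpha / i, rng)
        row.extend([1] * new_dishes)
        # zip stops at the old dishes; new dishes start with a count of 1.
        dish_counts = [m + z for m, z in zip(dish_counts, row)] + [1] * new_dishes
        rows.append(row)
    # Pad earlier rows with zeros so the matrix is rectangular.
    width = len(dish_counts)
    return [r + [0] * (width - len(r)) for r in rows]


if __name__ == "__main__":
    for row in sample_ibp(num_customers=6, alpha=2.0, seed=1):
        print(row)
```

In the setting of the abstract, rows would correspond to children and columns to learning patterns. A full model would pair this prior with a likelihood and a posterior sampler, which this sketch does not attempt.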

Relevance:

100.00%

Publisher:

Abstract:

The discovery of contexts is important for context-aware applications in pervasive computing. This is a challenging problem because of the streaming nature of the data and the complexity and changing nature of contexts. We propose a Bayesian nonparametric model for the detection of co-location contexts from Bluetooth signals. By using an Indian buffet process as the prior distribution, the model can discover the number of contexts automatically. We introduce a novel fixed-lag particle filter that processes data incrementally. This sampling scheme is especially suitable for pervasive computing because the computational requirements remain constant as the data grow. We examine our model on a synthetic dataset and two real-world datasets. To verify the discovered contexts, we compare them to the communities detected by the Louvain method, showing a strong correlation between the results of the two methods. The fixed-lag particle filter is compared with Gibbs sampling in terms of the normalized factorization error, which shows that the two inference methods perform similarly. Because the fixed-lag particle filter processes each small chunk of data as it arrives and does not need to be restarted, its execution time is significantly shorter than that of Gibbs sampling.

Relevance:

100.00%

Publisher:

Abstract:

One of the important problems in machine learning is determining the complexity of the model to learn. Too much complexity leads to overfitting, which amounts to finding structures that do not actually exist in the data, while too little complexity leads to underfitting, meaning that the model is not expressive enough to capture all of the structures present in the data. For some probabilistic models, model complexity takes the form of one or more hidden variables whose role is to explain the generative process of the data. Various approaches exist for identifying the appropriate number of hidden variables in a model. This thesis focuses on Bayesian nonparametric methods for determining the number of hidden variables to use as well as their dimensionality. The popularization of Bayesian nonparametric statistics within the machine learning community is fairly recent. Their main appeal is that they offer highly flexible models whose complexity scales with the amount of available data. In recent years, research on Bayesian nonparametric learning methods has focused on three main aspects: the construction of new models, the development of inference algorithms, and applications. This thesis presents our contributions to these three research topics in the context of learning hidden-variable models. First, we introduce the Pitman-Yor process mixture of Gaussians, a model for learning infinite mixtures of Gaussians. We also present an inference algorithm for discovering the hidden components of the model, which we evaluate on two concrete robotics applications. Our results show that the proposed approach outperforms classical learning approaches in both performance and flexibility. Second, we propose the extended cascading Indian buffet process, a model that serves as a prior probability distribution over the space of directed acyclic graphs. In the context of Bayesian networks, this prior makes it possible to identify both the presence of hidden variables and the network structure among them. A Markov chain Monte Carlo inference algorithm is used for evaluation on structure identification and density estimation problems. Finally, we propose the Indian chefs process, a model more general than the extended cascading Indian buffet process, for learning graphs and orders. The advantage of the new model is that it allows connections between observable variables and takes the order of the variables into account. We present a reversible-jump Markov chain Monte Carlo inference algorithm for jointly learning graphs and orders. The evaluation is carried out on density estimation and independence testing problems. This model is the first Bayesian nonparametric model capable of learning Bayesian networks with a completely arbitrary structure.

Relevance:

100.00%

Publisher:

Abstract:

The size of online image datasets is constantly increasing. For an image dataset with millions of images, image retrieval becomes a seemingly intractable problem for exhaustive similarity search algorithms. Hashing methods, which encode high-dimensional descriptors into compact binary strings, have become very popular because of their high efficiency in search and storage. In the first part, we propose a multimodal retrieval method based on latent feature models. The procedure consists of a nonparametric Bayesian framework for learning underlying semantically meaningful abstract features in a multimodal dataset, a probabilistic retrieval model that allows cross-modal queries, and an extension model for relevance feedback. In the second part, we focus on supervised hashing with kernels. We describe a flexible hashing procedure that treats binary codes and pairwise semantic similarity as latent and observed variables, respectively, in a probabilistic model based on Gaussian processes for binary classification. We present a scalable inference algorithm with the sparse pseudo-input Gaussian process (SPGP) model and distributed computing. In the last part, we define an incremental hashing strategy for dynamic databases to which new images are added frequently. The method is based on a two-stage classification framework using binary and multi-class SVMs. The proposed method also enforces balance in the binary codes through an imbalance penalty to obtain higher-quality binary codes. We learn hash functions with an efficient algorithm in which the NP-hard problem of finding optimal binary codes is solved via cyclic coordinate descent and SVMs are trained in a parallelized incremental manner. For modifications such as adding images from an unseen class, we propose an incremental procedure for effective and efficient updates to the previous hash functions. Experiments on three large-scale image datasets demonstrate that the incremental strategy can efficiently update hash functions to reach the same retrieval performance as hashing from scratch.

Relevance:

100.00%

Publisher:

Abstract:

The spectrum nature of autism spectrum disorders (ASD) and the heterogeneity within them pose a challenge for treatment. Personalising the syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of the syllabus. We investigate a data-driven approach in an attempt to disentangle this heterogeneity for personalisation of the syllabus. With the help of technology and a structured syllabus, it becomes possible to collect data while a child with ASD masters the skills. The performance data collected are, however, growing and contain missing elements that depend on the pace and the course each child takes while navigating through the syllabus. Bayesian nonparametric methods are known for automatically discovering the number of latent components and their parameters when the model involves higher complexity. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built upon the linear Poisson gamma model (LPGM) with an Indian buffet process prior and is extended to incorporate data with missing elements. In this paper, for the first time, we present learning patterns deduced automatically by data mining and machine learning methods from intervention data recorded for over 500 children with ASD. We compare the results with non-negative matrix factorisation and K-means, which, being parametric, not only require us to specify the number of learning patterns in advance but also lack a principled approach to dealing with missing data. The F1 score observed over varying thresholds of the similarity measure (Jaccard index) suggests that LPGM yields the best outcome. By observing these patterns with additional knowledge regarding the syllabus, it may be possible to monitor progress and dynamically modify the syllabus for improved learning.
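
The similarity measure named in this abstract, the Jaccard index, is the size of the intersection of two sets divided by the size of their union. The sketch below illustrates that definition only. The function name `jaccard_index`, the skill labels, and the convention for two empty sets are assumptions, and the abstract does not say how patterns are matched or thresholded to obtain the F1 score.

```python
def jaccard_index(a, b):
    """Return |A intersect B| / |A union B| for two collections of hashable items."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical skill groups: one discovered pattern versus a reference group.
    discovered = {"skill_1", "skill_2", "skill_3"}
    reference = {"skill_2", "skill_3", "skill_4"}
    print(jaccard_index(discovered, reference))  # 2 shared / 4 total = 0.5
```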

Relevance:

80.00%

Publisher:

Abstract:

Modelling is fundamental to many fields of science and engineering. A model can be thought of as a representation of possible data one could predict from a system. The probabilistic approach to modelling uses probability theory to express all aspects of uncertainty in the model. The probabilistic approach is synonymous with Bayesian modelling, which simply uses the rules of probability theory in order to make predictions, compare alternative models, and learn model parameters and structure from data. This simple and elegant framework is most powerful when coupled with flexible probabilistic models. Flexibility is achieved through the use of Bayesian non-parametrics. This article provides an overview of probabilistic modelling and an accessible survey of some of the main tools in Bayesian non-parametrics. The survey covers the use of Bayesian non-parametrics for modelling unknown functions, density estimation, clustering, time-series modelling, and representing sparsity, hierarchies, and covariance structure. More specifically, it gives brief non-technical overviews of Gaussian processes, Dirichlet processes, infinite hidden Markov models, Indian buffet processes, Kingman's coalescent, Dirichlet diffusion trees and Wishart processes.

Relevance:

40.00%

Publisher:

Abstract:

This paper probes how two small foundries in Belgaum, Karnataka State, India, have achieved technological innovations successfully based on their technological capability and customer needs, enabling them to sail through the competitive environment. This study brought out that technically qualified entrepreneurs of both the foundries have carried out technological innovations, mainly due to their self-motivation and self-efforts. Changing product designs, as desired or directed by the customers, cost reduction, quality improvement and import substitution through reverse engineering are the characteristics of these technological innovations. These incremental innovations have enabled the entrepreneurs of the two foundries to enhance competitiveness, grow in the domestic market and penetrate the international market and grow in size over time.

Relevance:

40.00%

Publisher:

Abstract:

The basic features of the climatology and interannual variations of the tropical Pacific and Indian Oceans were analyzed using a coupled general circulation model (CGCM) composed of an intermediate 2.5-layer ocean model and the atmosphere model ECHAM4. The CGCM captures well the spatial and temporal structure of the Pacific El Niño-Southern Oscillation (ENSO) and the variability features in the tropical Indian Ocean. The influence of the Pacific air-sea coupled process on Indian Ocean variability was investigated by conducting numerical experiments. Results show that the occurrence frequency of positive/negative Indian Ocean Dipole (IOD) events will decrease/increase with the presence/absence of the coupled process in the Pacific Ocean. Further analysis demonstrated that the air-sea coupled process in the Pacific Ocean affects the IOD variability mainly by influencing the zonal gradient of the thermocline through modulation of the background sea surface wind.

Relevance:

40.00%

Publisher:

Abstract:

The nature and extent of the protection secured to personal liberty has been a subject of great controversy and debate. The expression "procedure established by law" as a standard of protection for personal liberty has been looked upon as highly unsatisfactory and inadequate. For, unlike the specific attributes of liberty that are separately guaranteed under Art.19, 'personal liberty' as guaranteed by Art.21 does not obligate the Legislature to comply with the requirements of justice and reasonableness as and when it encroaches upon that right. Though the concept of 'personal liberty' has received an evolutive and expansive meaning through judicial process, the standard of protection which the judicial process could secure to personal liberty through the interpretation of Art.21 has been far from satisfactory. Even after four decades of judicial process in the interpretation of Art.21, the problem of evolving a just and adequate standard of protection for personal liberty in that Article continues to be a crucial constitutional issue, craving a satisfactory solution. The present study is a humble attempt to unravel this problem and to search for a reasonable solution.

Relevance:

40.00%

Publisher:

Abstract:

This study focuses on identifying return-generating factors and the extent of their influence on share prices. Its outcome will be a tool for investment analysis in the hands of investors, portfolio managers and mutual funds, who are mostly concerned with changing share prices. Since the study takes into account the influence of macroeconomic variables on variations in share returns, the government can use the outcome to frame suitable long-term policies that will help nurture a healthy economy and, in turn, a healthy stock market. As every company's management tries to maximize the wealth of its shareholders, a clear idea of the return-generating variables and their influence will help management frame policies to that end.

Relevance:

40.00%

Publisher:

Abstract:

Whereas the predominance of the El Niño Southern Oscillation (ENSO) mode in tropical Pacific sea surface temperature (SST) variability is well established, no such consensus seems to have been reached by climate scientists regarding the Indian Ocean. While a number of researchers think that Indian Ocean SST variability is dominated by an active dipolar-type mode of variability, similar to ENSO, others suggest that the variability is mostly passive and behaves like autocorrelated noise. For example, it has recently been suggested that Indian Ocean SST variability is consistent with the null hypothesis of a homogeneous diffusion process. However, the existence of the basin-wide warming trend represents a deviation from a homogeneous diffusion process, which needs to be considered. An efficient way of detrending, based on differencing, is introduced and applied to the Hadley Centre ice and SST data. The filtered SST anomalies over the basin (23.5N-29.5S, 30.5E-119.5E) are then analysed and found to be inconsistent with the null hypothesis on intraseasonal and interannual timescales. The same differencing method is then applied to the smaller tropical Indian Ocean domain. This smaller domain is also inconsistent with the null hypothesis on intraseasonal and interannual timescales. In particular, it is found that the leading mode of variability yields the Indian Ocean dipole, and departs significantly from the null hypothesis only in the autumn season.
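
The abstract above does not give the details of its differencing scheme, so the following is only a generic sketch of why differencing removes a trend: subtracting each value from the one `lag` steps later turns a linear trend into a constant offset. The function name `difference`, the lag, and the toy series are illustrative assumptions.

```python
def difference(series, lag=1):
    """Return the lag-differenced series x[t] - x[t - lag]."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


if __name__ == "__main__":
    # A toy series with a pure linear warming trend of 0.5 units per step.
    trend_only = [3.0 + 0.5 * t for t in range(6)]
    print(difference(trend_only))  # [0.5, 0.5, 0.5, 0.5, 0.5]
```

After differencing, the trend survives only as a constant that can be subtracted as a mean. Differencing also changes the spectrum of whatever variability remains, so any null hypothesis would need to be tested on the differenced series rather than the raw one.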