989 resultados para Positive Matrix Factorization


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Biological systems exhibit rich and complex behavior through the orchestrated interplay of a large array of components. It is hypothesized that separable subsystems with some degree of functional autonomy exist; deciphering their independent behavior and functionality would greatly facilitate understanding the system as a whole. Discovering and analyzing such subsystems are hence pivotal problems in the quest to gain a quantitative understanding of complex biological systems. In this work, using approaches from machine learning, physics and graph theory, methods for the identification and analysis of such subsystems were developed. A novel methodology, based on a recent machine learning algorithm known as non-negative matrix factorization (NMF), was developed to discover such subsystems in a set of large-scale gene expression data. This set of subsystems was then used to predict functional relationships between genes, and this approach was shown to score significantly higher than conventional methods when benchmarking them against existing databases. Moreover, a mathematical treatment was developed to treat simple network subsystems based only on their topology (independent of particular parameter values). Application to a problem of experimental interest demonstrated the need for extentions to the conventional model to fully explain the experimental data. Finally, the notion of a subsystem was evaluated from a topological perspective. A number of different protein networks were examined to analyze their topological properties with respect to separability, seeking to find separable subsystems. These networks were shown to exhibit separability in a nonintuitive fashion, while the separable subsystems were of strong biological significance. It was demonstrated that the separability property found was not due to incomplete or biased data, but is likely to reflect biological structure.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper is concerned with tensor clustering with the assistance of dimensionality reduction approaches. A class of formulation for tensor clustering is introduced based on tensor Tucker decomposition models. In this formulation, an extra tensor mode is formed by a collection of tensors of the same dimensions and then used to assist a Tucker decomposition in order to achieve data dimensionality reduction. We design two types of clustering models for the tensors: PCA Tensor Clustering model and Non-negative Tensor Clustering model, by utilizing different regularizations. The tensor clustering can thus be solved by the optimization method based on the alternative coordinate scheme. Interestingly, our experiments show that the proposed models yield comparable or even better performance compared to most recent clustering algorithms based on matrix factorization.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Joint modeling of related data sources has the potential to improve various data mining tasks such as transfer learning, multitask clustering, information retrieval etc. However, diversity among various data sources might outweigh the advantages of the joint modeling, and thus may result in performance degradations. To this end, we propose a regularized shared subspace learning framework, which can exploit the mutual strengths of related data sources while being immune to the effects of the variabilities of each source. This is achieved by further imposing a mutual orthogonality constraint on the constituent subspaces which segregates the common patterns from the source specific patterns, and thus, avoids performance degradations. Our approach is rooted in nonnegative matrix factorization and extends it further to enable joint analysis of related data sources. Experiments performed using three real world data sets for both retrieval and clustering applications demonstrate the benefits of regularization and validate the effectiveness of the model. Our proposed solution provides a formal framework appropriate for jointly analyzing related data sources and therefore, it is applicable to a wider context in data mining.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social tag-based web retrieval system. Therefore improving the precision performance of these social tag-based web retrieval systems has become an increasingly important research topic. To this end, we propose a shared subspace learning framework to leverage a secondary source to improve retrieval performance from a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing can be explicitly controlled. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. We validate the framework on image and video retrieval tasks in which tags from the LabelMe dataset are used to improve image retrieval performance from a Flickr dataset and video retrieval performance from a YouTube dataset. This has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system. Our shared subspace learning framework is applicable to a range of problems where one needs to exploit the strengths existing among multiple and heterogeneous datasets.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents a novel Bayesian formulation to exploit shared structures across multiple data sources, constructing foundations for effective mining and retrieval across disparate domains. We jointly analyze diverse data sources using a unifying piece of metadata (textual tags). We propose a method based on Bayesian Probabilistic Matrix Factorization (BPMF) which is able to explicitly model the partial knowledge common to the datasets using shared subspaces and the knowledge specific to each dataset using individual subspaces. For the proposed model, we derive an efficient algorithm for learning the joint factorization based on Gibbs sampling. The effectiveness of the model is demonstrated by social media retrieval tasks across single and multiple media. The proposed solution is applicable to a wider context, providing a formal framework suitable for exploiting individual as well as mutual knowledge present across heterogeneous data sources of many kinds.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We propose a nonparametric Bayesian, linear Poisson gamma model for count data and use it for dictionary learning. A key property of this model is that it captures the parts-based representation similar to nonnegative matrix factorization. We present an auxiliary variable Gibbs sampler, which turns the intractable inference into a tractable one. Combining this inference procedure with the slice sampler of Indian buffet process, we show that our model can learn the number of factors automatically. Using synthetic and real-world datasets, we show that the proposed model outperforms other state-of-the-art nonparametric factor models.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Unlike in general recommendation scenarios where a user has only a single role, users in trust rating network, e.g. Epinions, are associated with two different roles simultaneously: as a truster and as a trustee. With different roles, users can show distinct preferences for rating items, which the previous approaches do not involve. Moreover, based on explicit single links between two users, existing methods can not capture the implicit correlation between two users who are similar but not socially connected. In this paper, we propose to learn dual role preferences (truster/trustee-specific preferences) for trust-aware recommendation by modeling explicit interactions (e.g., rating and trust) and implicit interactions. In particular, local links structure of trust network are exploited as two regularization terms to capture the implicit user correlation, in terms of truster/trustee-specific preferences. Using a real-world and open dataset, we conduct a comprehensive experimental study to investigate the performance of the proposed model, RoRec. The results show that RoRec outperforms other trust-aware recommendation approaches, in terms of prediction accuracy. Copyright 2014 ACM.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Autism Spectrum Disorder (ASD) is growing at a staggering rate, but, little is known about the cause of this condition. Inferring learning patterns from therapeutic performance data, and subsequently clustering ASD children into subgroups, is important to understand this domain, and more importantly to inform evidence-based intervention. However, this data-driven task was difficult in the past due to insufficiency of data to perform reliable analysis. For the first time, using data from a recent application for early intervention in autism (TOBY Play pad), whose download count is now exceeding 4500, we present in this paper the automatic discovery of learning patterns across 32 skills in sensory, imitation and language. We use unsupervised learning methods for this task, but a notorious problem with existing methods is the correct specification of number of patterns in advance, which in our case is even more difficult due to complexity of the data. To this end, we appeal to recent Bayesian nonparametric methods, in particular the use of Bayesian Nonparametric Factor Analysis. This model uses Indian Buffet Process (IBP) as prior on a binary matrix of infinite columns to allocate groups of intervention skills to children. The optimal number of learning patterns as well as subgroup assignments are inferred automatically from data. Our experimental results follow an exploratory approach, present different newly discovered learning patterns. To provide quantitative results, we also report the clustering evaluation against K-means and Nonnegative matrix factorization (NMF). In addition to the novelty of this new problem, we were able to demonstrate the suitability of Bayesian nonparametric models over parametric rivals.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents a convex geometry (CG)-based method for blind separation of nonnegative sources. First, the unaccessible source matrix is normalized to be column-sum-to-one by mapping the available observation matrix. Then, its zero-samples are found by searching the facets of the convex hull spanned by the mapped observations. Considering these zero-samples, a quadratic cost function with respect to each row of the unmixing matrix, together with a linear constraint in relation to the involved variables, is proposed. Upon which, an algorithm is presented to estimate the unmixing matrix by solving a classical convex optimization problem. Unlike the traditional blind source separation (BSS) methods, the CG-based method does not require the independence assumption, nor the uncorrelation assumption. Compared with the BSS methods that are specifically designed to distinguish between nonnegative sources, the proposed method requires a weaker sparsity condition. Provided simulation results illustrate the performance of our method.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

As a popular heuristic to the matrix rank minimization problem, nuclear norm minimization attracts intensive research attentions. Matrix factorization based algorithms can reduce the expensive computation cost of SVD for nuclear norm minimization. However, most matrix factorization based algorithms fail to provide the theoretical guarantee for convergence caused by their non-unique factorizations. This paper proposes an efficient and accurate Linearized Grass-mannian Optimization (Lingo) algorithm, which adopts matrix factorization and Grassmann manifold structure to alternatively minimize the subproblems. More specially, linearization strategy makes the auxiliary variables unnecessary and guarantees the close-form solution for low periteration complexity. Lingo then converts linearized objective function into a nuclear norm minimization over Grass-mannian manifold, which could remedy the non-unique of solution for the low-rank matrix factorization. Extensive comparison experiments demonstrate the accuracy and efficiency of Lingo algorithm. The global convergence of Lingo is guaranteed with theoretical proof, which also verifies the effectiveness of Lingo.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We establish sufficient conditions for a matrix to be almost totally positive, thus extending a result of Craven and Csordas who proved that the corresponding conditions guarantee that a matrix is strictly totally positive. Then we apply our main result in order to obtain a new criteria for a real algebraic polynomial to be a Hurwitz one. The properties of the corresponding extremal Hurwitz polynomials are discussed. (C) 2004 Elsevier B.V. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Il crescente utilizzo di sistemi di analisi high-throughput per lo studio dello stato fisiologico e metabolico del corpo, ha evidenziato che una corretta alimentazione e una buona forma fisica siano fattori chiave per la salute. L'aumento dell'età media della popolazione evidenzia l'importanza delle strategie di contrasto delle patologie legate all'invecchiamento. Una dieta sana è il primo mezzo di prevenzione per molte patologie, pertanto capire come il cibo influisce sul corpo umano è di fondamentale importanza. In questo lavoro di tesi abbiamo affrontato la caratterizzazione dei sistemi di imaging radiografico Dual-energy X-ray Absorptiometry (DXA). Dopo aver stabilito una metodologia adatta per l'elaborazione di dati DXA su un gruppo di soggetti sani non obesi, la PCA ha evidenziato alcune proprietà emergenti dall'interpretazione delle componenti principali in termini delle variabili di composizione corporea restituite dalla DXA. Le prime componenti sono associabili ad indici macroscopici di descrizione corporea (come BMI e WHR). Queste componenti sono sorprendentemente stabili al variare dello status dei soggetti in età, sesso e nazionalità. Dati di analisi metabolica, ottenuti tramite Magnetic Resonance Spectroscopy (MRS) su campioni di urina, sono disponibili per circa mille anziani (provenienti da cinque paesi europei) di età compresa tra i 65 ed i 79 anni, non affetti da patologie gravi. I dati di composizione corporea sono altresì presenti per questi soggetti. L'algoritmo di Non-negative Matrix Factorization (NMF) è stato utilizzato per esprimere gli spettri MRS come combinazione di fattori di base interpretabili come singoli metaboliti. I fattori trovati sono stabili, quindi spettri metabolici di soggetti sono composti dallo stesso pattern di metaboliti indipendentemente dalla nazionalità. Attraverso un'analisi a singolo cieco sono stati trovati alti valori di correlazione tra le variabili di composizione corporea e lo stato metabolico dei soggetti. Ciò suggerisce la possibilità di derivare la composizione corporea dei soggetti a partire dal loro stato metabolico.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-04

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In recent years, the boundaries between e-commerce and social networking have become increasingly blurred. Many e-commerce websites support the mechanism of social login where users can sign on the websites using their social network identities such as their Facebook or Twitter accounts. Users can also post their newly purchased products on microblogs with links to the e-commerce product web pages. In this paper, we propose a novel solution for cross-site cold-start product recommendation, which aims to recommend products from e-commerce websites to users at social networking sites in 'cold-start' situations, a problem which has rarely been explored before. A major challenge is how to leverage knowledge extracted from social networking sites for cross-site cold-start product recommendation. We propose to use the linked users across social networking sites and e-commerce websites (users who have social networking accounts and have made purchases on e-commerce websites) as a bridge to map users' social networking features to another feature representation for product recommendation. In specific, we propose learning both users' and products' feature representations (called user embeddings and product embeddings, respectively) from data collected from e-commerce websites using recurrent neural networks and then apply a modified gradient boosting trees method to transform users' social networking features into user embeddings. We then develop a feature-based matrix factorization approach which can leverage the learnt user embeddings for cold-start product recommendation. Experimental results on a large dataset constructed from the largest Chinese microblogging service Sina Weibo and the largest Chinese B2C e-commerce website JingDong have shown the effectiveness of our proposed framework.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present in this article an automated framework that extracts product adopter information from online reviews and incorporates the extracted information into feature-based matrix factorization formore effective product recommendation. In specific, we propose a bootstrapping approach for the extraction of product adopters from review text and categorize them into a number of different demographic categories. The aggregated demographic information of many product adopters can be used to characterize both products and users in the form of distributions over different demographic categories. We further propose a graphbased method to iteratively update user- and product-related distributions more reliably in a heterogeneous user-product graph and incorporate them as features into the matrix factorization approach for product recommendation. Our experimental results on a large dataset crawled from JINGDONG, the largest B2C e-commerce website in China, show that our proposed framework outperforms a number of competitive baselines for product recommendation.