992 results for Matrix factorization


Relevance: 100.00%

Abstract:

When a document corpus is very large, we often need to reduce the number of features. However, conventional non-negative matrix factorization (NMF) cannot be applied to a billion-by-million matrix, because the matrix may not fit in memory. Here we present a novel online NMF algorithm. Using online NMF, we reduce the original high-dimensional space to a low-dimensional space and then cluster all the documents in the reduced space using the k-means algorithm. We show experimentally that good performance can be achieved by processing small subsets of documents. The proposed method outperforms existing algorithms.
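The pipeline described above (factorize in mini-batches, then cluster in the reduced space) can be sketched as follows. This is a generic mini-batch NMF with multiplicative updates and accumulated statistics, not the algorithm from the paper, whose details the abstract does not give. The function names, batch size, rank `k`, and toy data are all assumptions made for illustration.

```python
import numpy as np

def encode(X, W, iters=50, eps=1e-9, rng=None):
    """Nonnegative codes H (n_docs x k) such that X ≈ H @ W, with W held fixed."""
    if rng is None:
        rng = np.random.default_rng(0)
    H = rng.random((X.shape[0], W.shape[0])) + eps
    XWt, WWt = X @ W.T, W @ W.T
    for _ in range(iters):
        H *= XWt / (H @ WWt + eps)          # multiplicative update for H
    return H

def online_nmf(batches, n_features, k, inner_iters=50, eps=1e-9, seed=0):
    """Learn a nonnegative basis W (k x n_features) one mini-batch at a time."""
    rng = np.random.default_rng(seed)
    W = rng.random((k, n_features)) + eps
    A = np.zeros((k, k))                    # running sum of H_b^T H_b
    B = np.zeros((k, n_features))           # running sum of H_b^T X_b
    for X_b in batches:
        H_b = encode(X_b, W, inner_iters, eps, rng)
        A += H_b.T @ H_b
        B += H_b.T @ X_b
        for _ in range(inner_iters):
            W *= B / (A @ W + eps)          # only k x k and k x n_features in memory
    return W

def kmeans(Z, n_clusters, iters=20, seed=0):
    """Plain Lloyd iterations; returns one cluster label per row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), n_clusters, replace=False)]
    for _ in range(iters):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels

# Toy nonnegative document-term matrix (hypothetical data), fed in batches of 20 rows.
X = np.random.default_rng(1).random((200, 30))
batches = (X[i:i + 20] for i in range(0, 200, 20))
W = online_nmf(batches, n_features=30, k=5)
Z = encode(X, W)                            # documents in the reduced space
labels = kmeans(Z, n_clusters=3)
```

Only the small matrices `A`, `B`, and `W` persist between batches, which is what makes this style of update usable when the full matrix does not fit in memory.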

Relevance: 100.00%

Abstract:

Clustering techniques that can handle incomplete data have become increasingly important due to varied applications in marketing research, medical diagnosis and survey data analysis. Existing techniques cope with missing values either by data modification/imputation or by partial distance computation, which is often unreliable, depending on the number of features available. In this paper, we propose a novel approach for clustering data with missing values, which performs the task by Symmetric Non-Negative Matrix Factorization (SNMF) of a complete pair-wise similarity matrix computed from the given incomplete data. To accomplish this, we define a novel similarity measure, based on the Average Overlap similarity metric, which can effectively handle missing values without modification of the data. Further, the similarity measure is more reliable than partial distances and inherently possesses the properties required to perform SNMF. The experimental evaluation on real-world datasets demonstrates that the proposed approach is efficient and scalable, and shows significantly better performance than the existing techniques.
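The clustering step can be illustrated with a minimal SNMF sketch that factors a similarity matrix A as H H^T and reads cluster labels from the largest entry in each row of H. The damped multiplicative update used here is one commonly used rule for symmetric NMF and is an assumption, as are the function name and the toy similarity matrix; the paper's Average Overlap based similarity measure is not reproduced, because the abstract does not define it.

```python
import numpy as np

def symmetric_nmf(A, k, iters=300, beta=0.5, eps=1e-9, seed=0):
    """Approximate a nonnegative symmetric matrix A by H @ H.T with H >= 0."""
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], k))
    for _ in range(iters):
        # Damped multiplicative update; beta = 0.5 is a common choice.
        H *= (1 - beta) + beta * (A @ H) / (H @ (H.T @ H) + eps)
    return H

# Toy pairwise similarity (hypothetical): two groups of three items each,
# high similarity within a group and low similarity across groups.
A = np.full((6, 6), 0.1)
A[:3, :3] = 0.9
A[3:, 3:] = 0.9
np.fill_diagonal(A, 1.0)

H = symmetric_nmf(A, k=2)
labels = H.argmax(axis=1)   # cluster = column with the largest membership weight
```

Because the input is a similarity matrix rather than the raw data, any similarity that is nonnegative and symmetric can be plugged in, including one computed only from the features that two incomplete records share.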

Relevance: 100.00%

Abstract:

This paper proposes a hierarchical probabilistic model for ordinal matrix factorization. Unlike previous approaches, we model the ordinal nature of the data and take a principled approach to incorporating priors for the hidden variables. Two algorithms are presented for inference, one based on Gibbs sampling and one based on variational Bayes. Importantly, these algorithms can be applied to the factorization of very large matrices with missing entries. The model is evaluated on a collaborative filtering task, where users have rated a collection of movies and the system is asked to predict their ratings for other movies. The Netflix data set, which consists of around 100 million ratings, is used for evaluation. Using root mean-squared error (RMSE) as the evaluation metric, results show that the suggested model outperforms alternative factorization techniques. Results also show that Gibbs sampling outperforms variational Bayes on this task, despite the large number of ratings and model parameters. Matlab implementations of the proposed algorithms are available from cogsys.imm.dtu.dk/ordinalmatrixfactorization.
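The evaluation metric named above, RMSE over held-out ratings, is simple to compute. The sketch below uses a small made-up set of ratings and predictions purely for illustration; it says nothing about the model or the Netflix results.

```python
import math

def rmse(predicted, actual):
    """Root mean-squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Hypothetical held-out ratings and model predictions.
actual = [4, 3, 5, 1]
predicted = [3.5, 3.0, 4.0, 2.0]
score = rmse(predicted, actual)   # sqrt((0.25 + 0 + 1 + 1) / 4) = 0.75
```

Only rated entries enter the sum, which is why the metric remains well defined for matrices with missing entries.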

Relevance: 100.00%

Abstract:

We consider the three-particle scattering S-matrix for the Landau-Lifshitz model by directly computing the set of Feynman diagrams up to the second order. We show, following the analogous computations for the non-linear Schrödinger model [1, 2], that the three-particle S-matrix is factorizable in the first non-trivial order.

Relevance: 100.00%

Abstract:

Methods based on nonnegative matrix factorization provide one of the simplest and most effective approaches to text mining. However, their applicability is mainly limited to analyzing a single data source. In this paper, we propose a novel joint matrix factorization framework that can jointly analyze multiple data sources by exploiting their shared and individual structures. The proposed framework is flexible enough to handle arbitrary sharing configurations encountered in real-world data. We derive an efficient algorithm for learning the factorization and show that its convergence is theoretically guaranteed. We demonstrate the utility and effectiveness of the proposed framework in two real-world applications: improving social media retrieval using auxiliary sources, and cross-social media retrieval. Representing each social media source by its textual tags, we show for both applications that retrieval performance exceeds that of the existing state-of-the-art techniques. The proposed solution provides a generic framework and is applicable to a wider context in data mining, wherever one needs to exploit mutual and individual knowledge present across multiple data sources.
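The idea of shared and individual structure can be sketched for the simplest case of two sources: each data matrix is approximated with a basis that concatenates a shared part `Ws` and a source-specific part. The multiplicative updates below are the standard ones for a sum of squared Frobenius losses and are an assumption, not the paper's algorithm; the function name, ranks, and random toy data are likewise hypothetical.

```python
import numpy as np

def joint_nmf(X1, X2, k_shared, k_ind, iters=200, eps=1e-9, seed=0):
    """Factor X1 ≈ [Ws, W[0]] @ H[0] and X2 ≈ [Ws, W[1]] @ H[1], all factors >= 0."""
    rng = np.random.default_rng(seed)
    X = [X1, X2]
    m = X1.shape[0]                       # both sources share the row (term) space
    Ws = rng.random((m, k_shared))
    W = [rng.random((m, k_ind)) for _ in X]
    H = [rng.random((k_shared + k_ind, Xv.shape[1])) for Xv in X]
    errors = []
    for _ in range(iters):
        num = np.zeros_like(Ws)
        den = np.zeros_like(Ws)
        for v in range(2):
            Wfull = np.hstack([Ws, W[v]])
            # Codes for source v, with both bases fixed.
            H[v] *= (Wfull.T @ X[v]) / (Wfull.T @ Wfull @ H[v] + eps)
            Hs, Hi = H[v][:k_shared], H[v][k_shared:]
            # Individual basis of source v.
            W[v] *= (X[v] @ Hi.T) / (Wfull @ H[v] @ Hi.T + eps)
            # Accumulate the terms of the shared-basis update over sources.
            R = np.hstack([Ws, W[v]]) @ H[v]
            num += X[v] @ Hs.T
            den += R @ Hs.T
        Ws *= num / (den + eps)
        errors.append(sum(
            np.linalg.norm(X[v] - np.hstack([Ws, W[v]]) @ H[v]) ** 2
            for v in range(2)))
    return Ws, W, H, errors

# Hypothetical term-by-item matrices for two sources over the same 20 terms.
rng = np.random.default_rng(1)
X1, X2 = rng.random((20, 15)), rng.random((20, 10))
Ws, W, H, errors = joint_nmf(X1, X2, k_shared=2, k_ind=2)
```

Setting `k_ind=0` would force the sources to share everything, and `k_shared=0` would decouple them, so the two ranks act as the sharing configuration in this two-source sketch.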

Relevance: 100.00%

Abstract:

Nonnegative matrix factorization (NMF) has recently attracted growing attention because of its promise in a wide range of applications. A remaining problem, however, is that the resulting factors are not necessarily interpretable in a realistic sense. Constraints are usually added to standard NMF to obtain interpretable results. In this paper, a minimum-volume-constrained NMF is proposed, and an efficient multiplicative update algorithm is developed based on natural gradient optimization. The proposed method can be applied to the blind source separation (BSS) problem, a hot topic with many potential applications, especially when the sources are mutually dependent. Simulation results of BSS for images show the superiority of the proposed method.
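For orientation, a volume-regularized NMF objective is often written in the generic form below. The abstract does not state the authors' objective, so the symbols (data matrix X, factors W and H, weight λ > 0), the choice of which factor carries the penalty, and the determinant-based volume measure are all assumptions.

```latex
\min_{W \ge 0,\; H \ge 0} \;
\tfrac{1}{2}\,\lVert X - WH \rVert_F^2 \;+\; \lambda\,\operatorname{vol}(W),
\qquad \text{for example } \operatorname{vol}(W) = \det\!\left(W^{\top} W\right).
```

The first term is the usual NMF reconstruction error; the second favors, among factorizations that fit the data about equally well, the one whose basis vectors span the smallest volume, which reduces the non-uniqueness of plain NMF.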

Relevance: 100.00%

Abstract:

Nonnegative matrix factorization (NMF) is widely used in signal separation and image compression. Motivated by its successful applications, we propose a new cryptosystem based on NMF, in which the nonlinear mixing (NLM) model with strong noise is introduced for encryption and NMF is used for decryption. The security of the cryptosystem relies on the following two facts: 1) the constructed multivariable nonlinear function is not invertible; 2) the process of NMF is unilateral if the inverse of the constructed linear mixing matrix is not nonnegative. Compared with Lin's method (2006), which is a theoretical scheme using one-time padding, our cipher can be used repeatedly in practice, i.e., multi-time padding is used in our cryptosystem. Also, there is no restriction on the statistical characteristics of the ciphers and the plaintexts. Thus, more signals can be processed (successfully encrypted and decrypted), regardless of whether they are correlated, sparse, or Gaussian. Furthermore, instead of the method based on the number of zero crossings, which is often unstable in encryption and decryption, an improved method based on the kurtosis of the signals is introduced to resolve permutation ambiguities in waveform reconstruction. Simulations are given to illustrate the security and availability of our cryptosystem.
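How kurtosis can help with permutation ambiguity is sketched below: separated signals come back in arbitrary order and scale, but kurtosis is invariant to scaling, so it can serve as a fingerprint for reordering. The matching rule (sort both sides by kurtosis and pair them up), the function names, and the toy signals are assumptions; the abstract does not describe the authors' procedure.

```python
def kurtosis(x):
    """Excess kurtosis of a sequence (undefined for a constant signal)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 ** 2) - 3.0

def match_by_kurtosis(recovered, reference_kurtosis):
    """For each reference slot j, return the index of the recovered signal
    assigned to it, pairing signals and references in order of kurtosis."""
    by_rec = sorted(range(len(recovered)), key=lambda i: kurtosis(recovered[i]))
    by_ref = sorted(range(len(reference_kurtosis)),
                    key=lambda j: reference_kurtosis[j])
    perm = [None] * len(recovered)
    for i, j in zip(by_rec, by_ref):
        perm[j] = i
    return perm

# Hypothetical sources: a square-wave-like signal and a spiky signal.
square = [1, -1, 1, -1, 1, -1, 1, -1]   # excess kurtosis -2
spiky = [0, 0, 0, 4, 0, 0, 0, -4]       # excess kurtosis 1

# The separation step returned them swapped and rescaled.
recovered = [[0.5 * v for v in spiky], square]
perm = match_by_kurtosis(recovered, [kurtosis(square), kurtosis(spiky)])
restored = [recovered[i] for i in perm]  # back in the original order
```

This only works when the sources have clearly different kurtosis values; signals with similar kurtosis would need an additional criterion.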

Relevance: 100.00%

Abstract:

Nonnegative matrix factorization (NMF) is a widely used method for blind spectral unmixing (SU), which aims at obtaining the endmembers and the corresponding fractional abundances given only the collected mixed spectral data. The abundances may be sparse (i.e., the endmembers may have sparse distributions), and sparse NMF tends to lead to a unique result, so it is intuitive and meaningful to constrain NMF with sparseness when solving SU. However, due to the abundance sum-to-one constraint in SU, the traditional sparseness measured by the L0/L1-norm is no longer an effective constraint. This paper proposes a novel measure of sparseness (termed the S-measure) that uses higher-order norms of the signal vector and has physical significance. Using the S-measure constraint (SMC), a gradient-based sparse NMF algorithm (termed NMF-SMC) is proposed for solving the SU problem, in which the learning rate is adaptively selected and the endmembers and abundances are estimated simultaneously. The proposed NMF-SMC makes no pure index assumption and does not need to know the exact sparseness degree of the abundances a priori. Moreover, it does not require dimension-reduction preprocessing, in which some useful information may be lost. Experiments based on synthetic mixtures and real-world images collected by the AVIRIS and HYDICE sensors are performed to evaluate the validity of the proposed method.
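The reason the L1-norm stops working under the sum-to-one constraint follows in one line. Writing a for the abundance vector of a single pixel over K endmembers (notation assumed here, not taken from the abstract):

```latex
a \ge 0,\quad \mathbf{1}^{\top} a = 1
\;\Longrightarrow\;
\lVert a \rVert_1 = \sum_{i=1}^{K} \lvert a_i \rvert = \sum_{i=1}^{K} a_i = 1,
\qquad
\frac{1}{\sqrt{K}} \;\le\; \lVert a \rVert_2 \;\le\; 1 .
```

The L1-norm is therefore the same constant for every feasible abundance vector and cannot distinguish sparse from dense solutions. A higher-order norm such as the L2-norm still varies: it reaches its minimum for the uniform vector and its maximum for a vector with a single nonzero entry, which is why measures built from higher-order norms remain informative in this setting.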

Relevance: 100.00%

Abstract:

Online blind source separation (BSS) has been proposed to overcome the high computational cost that limits the practical application of traditional batch BSS algorithms. However, the existing online BSS methods are mainly used to separate independent or uncorrelated sources. Recently, nonnegative matrix factorization (NMF) has shown great potential for separating correlated sources, where constraints are often imposed to overcome the non-uniqueness of the factorization. In this paper, an incremental NMF with a volume constraint is derived and used to solve online BSS. The volume constraint on the mixing matrix enhances the identifiability of the sources, while the incremental learning mode reduces the computational cost. The proposed method takes advantage of a multiplicative update rule based on the natural gradient, and it performs especially well in the recovery of dependent sources. Simulations of BSS for dual-energy X-ray images, online encrypted speech signals, and highly correlated face images show the validity of the proposed method.

Relevance: 100.00%

Abstract:

Spectral unmixing (SU) is an emerging problem in remote sensing image processing. Since both the endmember signatures and their abundances have nonnegative values, it is a natural choice to employ nonnegative matrix factorization (NMF) methods to solve this problem. Motivated by the sparsity of the abundances, NMF with a local smoothness constraint (NMF-LSC) is proposed in this paper. In the proposed method, the smoothness constraint is used to impose sparseness, instead of the traditional L1-norm, which is restricted by the underlying column-sum-to-one requirement on the abundance matrix. Simulations show the advantages of our algorithm over the compared methods.

Relevance: 100.00%

Abstract:

Methods based on nonnegative matrix factorization provide one of the simplest and most effective approaches to text mining. However, their applicability is mainly limited to analyzing a single data source. In this chapter, we propose a novel joint matrix factorization framework that can jointly analyze multiple data sources by exploiting their shared and individual structures. The proposed framework is flexible enough to handle arbitrary sharing configurations encountered in real-world data. We derive an efficient algorithm for learning the factorization and show that its convergence is theoretically guaranteed. We demonstrate the utility and effectiveness of the proposed framework in two real-world applications: improving social media retrieval using auxiliary sources, and cross-social media retrieval. Representing each social media source by its textual tags, we show for both applications that retrieval performance exceeds that of the existing state-of-the-art techniques. The proposed solution provides a generic framework and is applicable to a wider context in data mining, wherever one needs to exploit mutual and individual knowledge present across multiple data sources.

Relevance: 100.00%

Abstract:

Medical imaging has become an essential diagnostic tool in clinical practice; pathologies can now be detected earlier than ever before. Its use is no longer confined to radiology but increasingly extends to computer-based imaging processes prior to surgery. Motion analysis, in particular, plays an important role in analyzing the activities or behaviors of live objects in medicine. This short paper presents several low-cost hardware implementation approaches, aimed at the new generation of tablets and smartphones, for estimating motion compensation and segmentation in medical images. These systems have been optimized for breast cancer diagnosis using magnetic resonance imaging, which has several advantages over traditional X-ray mammography, for example, obtaining patient information within a short period. This paper also addresses the challenge of offering a medical tool that runs on widespread portable devices, both tablets and smartphones, to aid in patient diagnostics.