992 results for MATRIX FACTORIZATION


Relevance: 100.00%

Abstract:

This paper presents a fast part-based subspace selection algorithm, termed binary sparse nonnegative matrix factorization (B-SNMF). Both the training and testing processes of B-SNMF are much faster than those of binary principal component analysis (B-PCA). In addition, B-SNMF is more robust to occlusions in images. Experimental results on face images demonstrate the effectiveness and efficiency of the proposed B-SNMF.
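The abstract does not spell out the B-SNMF updates themselves; as background, here is a minimal sketch of standard NMF with Lee-Seung multiplicative updates, the factorization that B-SNMF builds on. The rank `r`, iteration count, and random initialization are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def nmf_multiplicative(V, r=16, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF (Frobenius loss) via Lee-Seung multiplicative updates.

    V : (m, n) nonnegative data matrix (e.g. vectorized face images as columns).
    Returns W (m, r) basis images and H (r, n) coefficients, both nonnegative.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        # Update H, then W; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H
```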

Relevance: 100.00%

Abstract:

Spectral unmixing (SU) is a technique to characterize mixed pixels of the hyperspectral images measured by remote sensors. Most existing spectral unmixing algorithms are developed using linear mixing models. Since the number of endmembers/materials present at each mixed pixel is normally small compared with the total number of endmembers (the dimension of the spectral library), the problem is sparse. This thesis introduces sparse hyperspectral unmixing methods for the linear mixing model under two different scenarios. In the first scenario, the library of spectral signatures is assumed to be known and the main problem is to find the minimum number of endmembers under a reasonably small approximation error. Mathematically, this is the $\ell_0$-norm problem, which is NP-hard. The main focus of the first part of the thesis is to find more accurate and reliable approximations of the $\ell_0$-norm term and to propose sparse unmixing methods based on these approximations. The resulting methods show considerable improvements in reconstructing the fractional abundances of endmembers compared with state-of-the-art methods, such as lower reconstruction errors. In the second part of the thesis, the first scenario (i.e., the dictionary-aided semiblind unmixing scheme) is generalized to the blind unmixing scenario, in which the library of spectral signatures is also estimated. We apply the nonnegative matrix factorization (NMF) method to propose new unmixing methods, owing to its useful properties such as the nonnegativity constraints on the two factor matrices. Furthermore, we introduce new cost functions based on statistical and physical features of the spectral signatures of materials (SSoM) and of hyperspectral pixels, such as the collaborative property of hyperspectral pixels and the mathematical representation of the concentrated energy of SSoM in the first few subbands. Finally, we introduce sparse unmixing methods for the blind scenario and evaluate their efficiency via simulations on synthetic and real hyperspectral data sets. The results show considerable improvements in estimating the spectral library of materials and their fractional abundances, such as smaller values of spectral angle distance (SAD) and abundance angle distance (AAD).
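For the semiblind scenario with a known spectral library, the $\ell_0$ problem is commonly approximated by an $\ell_1$-regularized nonnegative least-squares fit per pixel. The sketch below shows that standard baseline, not the thesis's own approximation; the library `A`, pixel `y`, and regularization weight are placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_abundances(A, y, lam=1e-3):
    """Estimate sparse, nonnegative abundances x such that y ~ A @ x.

    A   : (bands, n_endmembers) spectral library.
    y   : (bands,) observed mixed-pixel spectrum.
    lam : l1 weight; larger values give sparser abundance vectors.
    """
    model = Lasso(alpha=lam, positive=True, fit_intercept=False, max_iter=10000)
    model.fit(A, y)
    return model.coef_  # mostly zeros: only a few endmembers are active
```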

Relevance: 100.00%

Abstract:

Nonnegative matrix factorization (NMF) is a hot topic in machine learning and data processing. Recently, a constrained version, non-smooth NMF (NsNMF), has shown great potential for learning meaningful sparse representations of observed data. However, it suffers from a slow, linear convergence rate, discouraging its application to large-scale data representation. In this paper, a fast NsNMF (FNsNMF) algorithm is proposed to speed up NsNMF. We first show that the cost function of the derived sub-problem is convex and that its gradient is Lipschitz continuous. Minimization of this function is then replaced by solving a proximal function, which is designed using the Lipschitz constant and can be optimized via a constructed fast-converging sequence. Owing to the use of the proximal function and its efficient optimization, our method achieves a nonlinear convergence rate, much faster than NsNMF. Simulations on both computer-generated and real-world data show the advantages of our algorithm over the compared methods.
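The sub-problem solver described is essentially a Nesterov-accelerated proximal (projected) gradient scheme. Below is a minimal sketch of that idea for one nonnegative least-squares sub-problem; it is a generic FISTA-style routine rather than the paper's exact FNsNMF update, and the names and iteration counts are illustrative.

```python
import numpy as np

def fast_nls(V, W, n_iter=100):
    """Accelerated projected-gradient solve of min_{H >= 0} ||V - W @ H||_F^2.

    The gradient W.T @ (W @ H - V) is Lipschitz with constant L = ||W.T @ W||_2,
    so each proximal step is a gradient step of size 1/L followed by projection
    onto the nonnegative orthant, combined with a Nesterov momentum sequence.
    """
    L = np.linalg.norm(W.T @ W, 2)              # Lipschitz constant of the gradient
    H = np.zeros((W.shape[1], V.shape[1]))
    Y, t = H.copy(), 1.0
    for _ in range(n_iter):
        grad = W.T @ (W @ Y - V)
        H_new = np.maximum(Y - grad / L, 0.0)   # proximal (projection) step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = H_new + ((t - 1.0) / t_new) * (H_new - H)  # momentum extrapolation
        H, t = H_new, t_new
    return H
```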

Relevance: 80.00%

Abstract:

Granger causality is increasingly being applied to multi-electrode neurophysiological and functional imaging data to characterize directional interactions between neurons and brain regions. For a multivariate dataset, one might be interested in different subsets of the recorded neurons or brain regions. According to the current estimation framework, for each subset, one conducts a separate autoregressive model fitting process, introducing the potential for unwanted variability and uncertainty. In this paper, we propose a multivariate framework for estimating Granger causality. It is based on spectral density matrix factorization and offers the advantage that the estimation of such a matrix needs to be done only once for the entire multivariate dataset. For any subset of recorded data, Granger causality can be calculated through factorizing the appropriate submatrix of the overall spectral density matrix.
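Once the spectral density matrix for a given channel pair has been factorized into a transfer function H(ω) and a noise covariance Σ, Geweke's frequency-domain Granger causality follows directly. The sketch below assumes those factors are already available (e.g., from a spectral matrix factorization routine, which is not implemented here) and all array names are placeholders.

```python
import numpy as np

def granger_causality_y_to_x(Sxx, Hxy, Sigma):
    """Geweke's spectral Granger causality from channel y to channel x.

    Sxx   : (n_freqs,) auto-spectrum of x.
    Hxy   : (n_freqs,) x<-y entry of the transfer function from the
            factorization S(w) = H(w) @ Sigma @ H(w).conj().T.
    Sigma : (2, 2) innovation covariance, ordered [x, y].
    """
    # Partial variance of y's innovations after regressing out x's innovations.
    sigma_y_given_x = Sigma[1, 1] - Sigma[0, 1] ** 2 / Sigma[0, 0]
    intrinsic = Sxx - sigma_y_given_x * np.abs(Hxy) ** 2
    return np.log(Sxx / intrinsic)
```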

Relevance: 70.00%

Abstract:

Narrative text is a useful way of identifying injury circumstances from routine emergency department data collections. Automatically classifying narratives with machine learning techniques is promising, as it can reduce the tedious manual classification process. Existing work focuses on Naive Bayes, which does not always offer the best performance. This paper proposes matrix factorization approaches, along with a learning enhancement process, for this task. The results are compared with the performance of various other classification approaches. The impact of parameter settings on the classification results for a medical text dataset is discussed. With the right choice of dimension k, the Non-negative Matrix Factorization-based method achieves a 10-fold cross-validation accuracy of 0.93.
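A minimal sketch of the kind of pipeline described, using scikit-learn: bag-of-words features reduced by NMF to k dimensions, then a supervised classifier evaluated by 10-fold cross-validation. The vectorizer, k, and the downstream classifier are assumptions for illustration, not the paper's exact configuration or its learning-enhancement step.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def evaluate_nmf_classifier(narratives, codes, k=100):
    """narratives: list of injury-description strings; codes: their labels."""
    pipeline = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        NMF(n_components=k, init="nndsvda", max_iter=400),  # reduce to k topics
        LogisticRegression(max_iter=1000),
    )
    # Mean 10-fold cross-validated accuracy.
    return cross_val_score(pipeline, narratives, codes, cv=10).mean()
```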

Relevance: 70.00%

Abstract:

Feature-track matrix factorization based methods have been attractive solutions to the Structure-from-motion (SfM) problem. Group motion of the feature points is analyzed to obtain the 3D information. It is well known that the factorization formulations give rise to rank-deficient systems of equations. Even when enough constraints exist, the extracted models are sparse due to the unavailability of pixel-level tracks. Pixel-level tracking of 3D surfaces is a difficult problem, particularly when the surface has very little texture, as in a human face. Only sparsely located feature points can be tracked, and tracking errors are inevitable along rotating low-texture surfaces. However, the 3D models of an object class lie in a subspace of the set of all possible 3D models. We propose a novel solution to the Structure-from-motion problem which utilizes high-resolution 3D data obtained from a range scanner to compute a basis for this desired subspace. Adding subspace constraints during factorization also facilitates removal of tracking noise, which causes distortions outside the subspace. We demonstrate the effectiveness of our formulation by extracting dense 3D structure of a human face and comparing it with a well-known Structure-from-motion algorithm due to Brand.
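For background, the classic rank-3 affine factorization (Tomasi-Kanade) that these methods build on can be written in a few lines: stack the centred feature tracks, take an SVD, and keep the top three singular components. This is the plain factorization, not the paper's subspace-constrained variant, and the input layout is an assumption.

```python
import numpy as np

def affine_sfm_factorization(tracks):
    """Tomasi-Kanade affine structure-from-motion factorization.

    tracks : (2*F, P) measurement matrix of P feature points tracked over
             F frames (x rows stacked over y rows per frame).
    Returns the motion matrix M (2F, 3) and structure S (3, P), up to an
    affine ambiguity, such that tracks ~ M @ S after centering.
    """
    W = tracks - tracks.mean(axis=1, keepdims=True)  # remove per-row centroid
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # The noise-free measurement matrix has rank 3.
    M = U[:, :3] * np.sqrt(s[:3])
    S = np.sqrt(s[:3])[:, None] * Vt[:3]
    return M, S
```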

Relevance: 70.00%

Abstract:

We study the application of matrix decomposition algorithms, such as Non-negative Matrix Factorization (NMF), to frequency-domain representations of musical audio signals. These algorithms, driven by a reconstruction error function, learn a set of basis functions and a corresponding set of coefficients that approximate the input signal. We compare three reconstruction error functions when NMF is applied to monophonic and harmonized scales: least squares, the Kullback-Leibler divergence, and a recently introduced phase-dependent divergence measure. New methods for interpreting the resulting decompositions are presented and compared with previously used methods that require acoustic domain knowledge. Finally, we analyze how well the learned basis functions generalize with respect to three musical parameters: amplitude, duration, and instrument type. To this end, we introduce two algorithms for labelling the basis functions, which outperform the previous approach in the majority of our tests, the instrument task on monophonic audio being the only notable exception.
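A minimal sketch of the comparison described: factorize a magnitude spectrogram with NMF under a least-squares (Frobenius) loss and under a Kullback-Leibler loss, using scipy for the STFT and scikit-learn for the factorization. The phase-dependent divergence is not available off the shelf and is omitted; window size and the number of basis functions are illustrative.

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import NMF

def decompose_audio(signal, sr, n_bases=8):
    """Factorize |STFT| into spectral bases W and activations H under two losses."""
    _, _, Z = stft(signal, fs=sr, nperseg=1024)
    V = np.abs(Z)                                   # nonnegative magnitude spectrogram
    results = {}
    for loss in ("frobenius", "kullback-leibler"):
        model = NMF(n_components=n_bases, beta_loss=loss, solver="mu",
                    init="nndsvda", max_iter=500)
        W = model.fit_transform(V)                  # (freq_bins, n_bases) basis spectra
        results[loss] = (W, model.components_)      # components_: activations over time
    return results
```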

Relevance: 70.00%

Abstract:

The least-squares problem with l1 regularization has been proposed as a promising approach to sparse signal reconstruction (e.g., basis pursuit de-noising and compressed sensing) and feature selection (e.g., the Lasso algorithm) in signal processing, statistics, and related fields. These problems can be cast as an l1-regularized least-squares program (LSP). In this paper, we propose a novel monotonic fixed-point method to solve large-scale l1-regularized LSPs, and we prove the stability and convergence of the proposed method. Furthermore, we generalize the method to the least-squares matrix problem and apply it to nonnegative matrix factorization (NMF). The method is illustrated on sparse signal reconstruction, pattern recognition and blind source separation problems, and it tends to converge faster and yield sparser solutions than other l1-regularized algorithms.
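The paper's own fixed-point iteration is not reproduced in the abstract; as a reference point, the classical iterative soft-thresholding (ISTA) fixed-point iteration for the l1-regularized least-squares program looks like this. The step size and iteration count are illustrative.

```python
import numpy as np

def ista(A, y, lam, n_iter=500):
    """Solve min_x 0.5*||A @ x - y||_2^2 + lam*||x||_1 by soft-thresholding.

    Each iteration applies the fixed-point map
        x <- soft(x - t * A.T @ (A @ x - y), t * lam)
    with step size t = 1/||A||_2^2, which guarantees monotone descent.
    """
    t = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - t * (A.T @ (A @ x - y))                         # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)   # soft-threshold
    return x
```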

Relevance: 60.00%

Abstract:

The multi-criteria decision making methods Preference Ranking Organisation METHod for Enrichment Evaluation (PROMETHEE) and Graphical Analysis for Interactive Assistance (GAIA), and the two-way Positive Matrix Factorization (PMF) receptor model, were applied to airborne fine-particle compositional data collected at three sites in Hong Kong during two monitoring campaigns held from November 2000 to October 2001 and November 2004 to October 2005. PROMETHEE/GAIA indicated that air quality at the three sites was worse during the later monitoring campaign, and that the order of air quality at the sites during each campaign was: rural site > urban site > roadside site. The PMF analysis, on the other hand, identified six common sources at all of the sites (diesel vehicles, fresh sea salt, secondary sulphate, soil, aged sea salt and oil combustion), which accounted for approximately 68.8 ± 8.7% of the fine-particle mass at the sites. In addition, road dust, gasoline vehicles, biomass burning, secondary nitrate, and metal processing were identified at some of the sites. Secondary sulphate was found to be the highest contributor to the fine-particle mass at the rural and urban sites, with vehicle emissions a major contributor at the roadside site. The PMF results are broadly similar to those obtained in a previous analysis by PCA/APCS; however, the PMF analysis resolved more factors at each site than PCA/APCS. In addition, the study demonstrated that combining results from multi-criteria decision making analysis and receptor modelling can provide more detailed information that can be used to formulate the scientific basis for mitigating air pollution in the region.
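Positive matrix factorization differs from plain NMF mainly in that each data point is weighted by its measurement uncertainty. A crude sketch of that weighted factorization with multiplicative updates is shown below; operational PMF studies use dedicated solvers (e.g., ME-2/EPA PMF) with rotational controls, which this does not reproduce. Shapes, uncertainties, and the number of factors are placeholders.

```python
import numpy as np

def weighted_pmf(X, sigma, n_factors=6, n_iter=500, eps=1e-12, seed=0):
    """Minimize sum_ij ((X - G @ F)_ij / sigma_ij)^2 with G, F >= 0.

    X     : (samples, species) concentration matrix.
    sigma : (samples, species) measurement uncertainties.
    Returns factor contributions G (samples, n_factors) and factor
    profiles F (n_factors, species).
    """
    rng = np.random.default_rng(seed)
    U = 1.0 / (sigma ** 2)                          # per-element weights
    G = rng.random((X.shape[0], n_factors))
    F = rng.random((n_factors, X.shape[1]))
    for _ in range(n_iter):
        GF = G @ F
        F *= (G.T @ (U * X)) / (G.T @ (U * GF) + eps)
        GF = G @ F
        G *= ((U * X) @ F.T) / ((U * GF) @ F.T + eps)
    return G, F
```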

Relevance: 60.00%

Abstract:

Purpose: To investigate the significance of sources around measurement sites, to assist the development of control strategies for the important sources, and to mitigate the adverse effects of air pollution due to particle size. Methods: In this study, sampling was conducted at two sites located in urban/industrial and residential areas situated at roadsides along the Brisbane Urban Corridor. Ultrafine and fine particle measurements obtained at the two sites in June-July 2002 were analysed by Positive Matrix Factorization (PMF). Results: Six sources were present, including local traffic, two traffic sources, biomass burning, and two currently unidentified sources. Secondary particles had a significant impact at Site 1, while nitrates, peak traffic hours and main roads located close to the source also affected the results for both sites. Conclusions: This significant traffic corridor exemplifies the types of sources present in heavily trafficked locations, and future attempts to control pollution in this type of environment could focus on the sources that were identified.

Relevance: 60.00%

Abstract:

The aim of this paper is to provide a comparison of various algorithms and parameters for building reduced semantic spaces. The effect of dimension reduction, the stability of the representation, and the effect of word order are examined in the context of five algorithms for constructing semantic vectors: random projection (RP), singular value decomposition (SVD), non-negative matrix factorization (NMF), permutations, and holographic reduced representations (HRR). The quality of the semantic representation was tested by means of a synonym-finding task using the TOEFL test on the TASA corpus. Dimension reduction was found to improve the quality of the semantic representation, but it is hard to find the optimal parameter settings. Even though dimension reduction by RP was found to be more generally applicable than SVD, the semantic vectors produced by RP are somewhat unstable. Encoding word order into the semantic vector representation via HRR did not lead to any increase in scores over vectors constructed from word co-occurrence-in-context information. In this regard, very small context windows resulted in better semantic vectors for the TOEFL test.
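A minimal sketch of two of the compared reductions, random projection and truncated SVD, applied to a word-by-document count matrix, with synonym selection by cosine similarity. Corpus handling, dimensionality, and the multiple-choice scoring are illustrative assumptions; the HRR and permutation encodings are not shown.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.random_projection import GaussianRandomProjection
from sklearn.metrics.pairwise import cosine_similarity

def build_word_vectors(documents, dim=300, method="svd"):
    """Reduced word vectors from a word-by-document co-occurrence matrix."""
    counts = CountVectorizer().fit(documents)
    X = counts.transform(documents).T          # rows = words, columns = documents
    reducer = (TruncatedSVD(n_components=dim) if method == "svd"
               else GaussianRandomProjection(n_components=dim))
    vectors = np.asarray(reducer.fit_transform(X))
    return {word: vectors[i] for word, i in counts.vocabulary_.items()}

def pick_synonym(vectors, probe, choices):
    """TOEFL-style item: pick the choice most similar to the probe word."""
    sims = [cosine_similarity(vectors[probe][None], vectors[c][None])[0, 0]
            for c in choices]
    return choices[int(np.argmax(sims))]
```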

Relevance: 60.00%

Abstract:

Topic recommendation can help users deal with the information overload issue in micro-blogging communities. This paper proposes to use the implicit information network formed by the multiple relationships among users, topics and micro-blogs, together with the temporal information of micro-blogs, to find semantically and temporally relevant topics for each topic and to profile users' time-drifting topic interests. Content-based, nearest-neighbourhood-based and matrix factorization models are used to make personalized recommendations. The effectiveness of the proposed approaches is demonstrated in experiments conducted on a real-world dataset collected from Twitter.com.
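A minimal sketch of the matrix factorization component only: factor a user-topic interaction matrix into latent user and topic factors by stochastic gradient descent on the observed entries, then rank unseen topics per user. The learning rate, rank, and regularization are illustrative, and the content-based, neighbourhood, and temporal parts of the proposed approach are not shown.

```python
import numpy as np

def factorize_interactions(R, rank=20, lr=0.01, reg=0.05, n_epochs=30, seed=0):
    """Factor R (users x topics, zeros = unobserved) into P @ Q.T by SGD."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((R.shape[0], rank))
    Q = 0.1 * rng.standard_normal((R.shape[1], rank))
    users, topics = np.nonzero(R)                 # train only on observed entries
    for _ in range(n_epochs):
        for u, t in zip(users, topics):
            pu = P[u].copy()
            err = R[u, t] - pu @ Q[t]
            P[u] += lr * (err * Q[t] - reg * pu)
            Q[t] += lr * (err * pu - reg * Q[t])
    return P, Q

def recommend(P, Q, R, user, top_n=5):
    """Rank topics the user has not yet interacted with by predicted score."""
    scores = P[user] @ Q.T
    scores[R[user] > 0] = -np.inf                 # mask already-seen topics
    return np.argsort(scores)[::-1][:top_n]
```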

Relevance: 60.00%

Abstract:

Robust hashing is an emerging field that can be used to hash certain data types in applications unsuitable for traditional cryptographic hashing methods. Traditional hashing functions have been used extensively for data/message integrity, data/message authentication, efficient file identification and password verification. These applications are possible because the hashing process is compressive, allowing for efficient comparisons in the hash domain, and non-invertible, meaning hashes can be used without revealing the original data. These techniques were developed with deterministic (non-changing) inputs such as files and passwords. For such data types a 1-bit or one-character change can be significant; as a result, the hashing process is sensitive to any change in the input. Unfortunately, there are certain applications where input data are not perfectly deterministic and minor changes cannot be avoided. Digital images and biometric features are two types of data where such changes exist but do not alter the meaning or appearance of the input. For such data types cryptographic hash functions cannot be usefully applied. In light of this, robust hashing has been developed as an alternative to cryptographic hashing and is designed to be robust to minor changes in the input. Although similar in name, robust hashing is fundamentally different from cryptographic hashing. Current robust hashing techniques are not based on cryptographic methods, but instead on pattern recognition techniques. Modern robust hashing algorithms consist of feature extraction followed by a randomization stage that introduces non-invertibility and compression, followed by quantization and binary encoding to produce a binary hash output. In order to preserve the robustness of the extracted features, most randomization methods are linear, and this is detrimental to the security properties required of hash functions. Furthermore, the quantization and encoding stages used to binarize real-valued features require the learning of appropriate quantization thresholds. How these thresholds are learnt has an important effect on hashing accuracy, and the mere presence of such thresholds is a source of information leakage that can reduce hashing security. This dissertation outlines a systematic investigation of the quantization and encoding stages of robust hash functions. While the existing literature has focused on the importance of the quantization scheme, this research is the first to emphasise the importance of quantizer training for both hashing accuracy and hashing security. The quantizer training process is presented in a statistical framework which allows a theoretical analysis of the effects of quantizer training on hashing performance. This is experimentally verified using a number of baseline robust image hashing algorithms over a large database of real-world images. This dissertation also proposes a new randomization method for robust image hashing based on Higher Order Spectra (HOS) and Radon projections. The method is non-linear, which is an essential requirement for non-invertibility. The method is also designed to produce features more suited to quantization and encoding. The system can operate without the need for quantizer training, is more easily encoded, and displays improved hashing performance when compared to existing robust image hashing algorithms. The dissertation also shows how the HOS method can be adapted to work with biometric features obtained from 2D and 3D face images.
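A minimal sketch of the randomize-quantize-encode stages discussed: a key-seeded random projection of a real-valued feature vector, per-bit thresholds learnt from training data (the quantizer training whose accuracy and security impact the dissertation analyses), and binary encoding. The HOS/Radon feature extraction is not reproduced, and all names and sizes are illustrative.

```python
import numpy as np

def train_thresholds(features, key=0, n_bits=64):
    """Learn a random projection and per-bit median thresholds from training data.

    features : (n_samples, d) real-valued feature vectors from training images.
    """
    rng = np.random.default_rng(key)              # key-dependent randomization
    proj = rng.standard_normal((features.shape[1], n_bits))
    projected = features @ proj
    return proj, np.median(projected, axis=0)     # thresholds learnt from the set

def robust_hash(feature, proj, thresholds):
    """Project one feature vector and binarize against the learnt thresholds."""
    return (feature @ proj > thresholds).astype(np.uint8)
```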

Relevance: 60.00%

Abstract:

An Aerodyne Aerosol Mass Spectrometer was deployed at five urban schools to examine the spatial and temporal variability of organic aerosols (OA), and positive matrix factorization (PMF) was used for the first time in the Southern Hemisphere to apportion the sources of the OA across an urban area. The sources identified included hydrocarbon-like OA (HOA), biomass burning OA (BBOA) and oxygenated OA (OOA). At all sites, the main source was OOA, which accounted for 62–73% of the total OA mass and was generally more oxidized than that reported in the Northern Hemisphere. This suggests that there are differences in aging processes or regional sources between the two hemispheres. Unlike HOA and BBOA, OOA demonstrated instructive temporal variations but no spatial variation across the urban area. Application of cluster analysis to the PMF-derived sources offered a simple and effective method for qualitative comparison of PMF sources that can be used in other studies.

Relevance: 60.00%

Abstract:

Descriptions of a patient's injuries are recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for this mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families, including decision tree, probabilistic, neural network, instance-based, ensemble-based and kernel-based linear classifiers. Extensive pre-processing is carried out to ensure the quality of the data and, hence, the quality of the classification outcome. Records with a null entry in the injury description are removed. Misspellings are corrected by finding and replacing the misspelt word with a sound-alike word. Meaningful phrases are identified and kept, instead of removing parts of a phrase as stop words. Abbreviations appearing in many different forms are manually identified and normalized to a single form. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduced the number of text features dramatically, from about 28,000 to 5,000. The medical narrative-text injury dataset under consideration is composed of many short documents. The data can be characterized as high-dimensional and sparse, i.e., few features are irrelevant but features are correlated with one another. Therefore, matrix factorization techniques such as Singular Value Decomposition (SVD) and Non-negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space, and classifiers have been built on these reduced feature spaces. In experiments, a set of tests is conducted to determine which classification method is best for this medical text classification task. The Non-negative Matrix Factorization with Support Vector Machine method achieves 93% precision, which is higher than all the tested traditional classifiers. We also found that TF-IDF weighting, which works well for long-text classification, is inferior to binary weighting for short-document classification. Another finding is that the top-n terms should be removed in consultation with medical experts, as this affects the classification performance.
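A minimal sketch of the best-performing configuration described, using scikit-learn: binary term weighting (rather than TF-IDF), NNMF dimension reduction, and a linear SVM. The vocabulary size and number of components are illustrative, and the domain-specific pre-processing steps (spell correction, phrase handling, abbreviation normalization, clustering of rare terms) are not reproduced.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import NMF
from sklearn.svm import LinearSVC

# Binary term presence worked better than TF-IDF for these short narratives.
pipeline = make_pipeline(
    CountVectorizer(binary=True, max_features=5000),
    NMF(n_components=200, init="nndsvda", max_iter=400),   # NNMF reduction
    LinearSVC(),
)

# Example usage (train_narratives/train_codes/test_narratives are placeholders):
# pipeline.fit(train_narratives, train_codes)
# predictions = pipeline.predict(test_narratives)
```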