999 results for ordinal matrix factorisation


Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis addressed issues that have prevented qualitative researchers from using thematic discovery algorithms. The central hypothesis evaluated whether allowing qualitative researchers to interact with thematic discovery algorithms and incorporate domain knowledge improved their ability to address research questions and trust the derived themes. Non-negative Matrix Factorisation and Latent Dirichlet Allocation find latent themes within document collections but these algorithms are rarely used, because qualitative researchers do not trust and cannot interact with the themes that are automatically generated. The research determined the types of interactivity that qualitative researchers require and then evaluated interactive algorithms that matched these requirements. Theoretical contributions included the articulation of design guidelines for interactive thematic discovery algorithms, the development of an Evaluation Model and a Conceptual Framework for Interactive Content Analysis.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper we formulate the nonnegative matrix factorisation (NMF) problem as a maximum likelihood estimation problem for hidden Markov models and propose online expectation-maximisation (EM) algorithms to estimate the NMF and the other unknown static parameters. We also propose a sequential Monte Carlo approximation of our online EM algorithm. We show the performance of the proposed method with two numerical examples. © 2012 IFAC.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper proposes a hierarchical probabilistic model for ordinal matrix factorization. Unlike previous approaches, we model the ordinal nature of the data and take a principled approach to incorporating priors for the hidden variables. Two algorithms are presented for inference, one based on Gibbs sampling and one based on variational Bayes. Importantly, these algorithms may be applied to the factorization of very large matrices with missing entries. The model is evaluated on a collaborative filtering task, where users have rated a collection of movies and the system is asked to predict their ratings for other movies. The Netflix data set, which consists of around 100 million ratings, is used for evaluation. Using root mean-squared error (RMSE) as an evaluation metric, results show that the suggested model outperforms alternative factorization techniques. Results also show that Gibbs sampling outperforms variational Bayes on this task, despite the large number of ratings and model parameters. Matlab implementations of the proposed algorithms are available from cogsys.imm.dtu.dk/ordinalmatrixfactorization.
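As a rough illustration of the evaluation described in this abstract, the sketch below computes RMSE over the observed entries of a factorised rating matrix and maps a latent score to an ordinal rating level. It is not the paper's model: the cumulative-threshold mapping, the function names (`predict_score`, `to_ordinal`, `rmse`) and all numeric values are assumptions chosen only for illustration.

```python
import math
from bisect import bisect_right


def predict_score(u, v):
    """Latent score for one user/item pair: dot product of their factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def to_ordinal(score, thresholds):
    """Map a latent score to a rating level 1..len(thresholds)+1.

    Assumption: ordinal levels are the intervals between sorted thresholds.
    """
    return bisect_right(thresholds, score) + 1


def rmse(observed, U, V):
    """RMSE over observed ratings only; missing entries are absent from the dict."""
    squared_errors = [
        (rating - predict_score(U[i], V[j])) ** 2
        for (i, j), rating in observed.items()
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data: 2 users, 2 items, rank-2 factors, 2 observed ratings.
U = [[1.0, 0.5], [0.2, 1.5]]
V = [[2.0, 1.0], [1.0, 2.0]]
observed = {(0, 0): 3, (1, 1): 3}
thresholds = [1.5, 2.5, 3.5, 4.5]  # five rating levels, 1 to 5

error = rmse(observed, U, V)
level = to_ordinal(predict_score(U[0], V[0]), thresholds)
```

Storing the observed ratings in a dictionary keyed by (user, item) keeps the error computation restricted to rated entries, which is the usual way to handle a sparse rating matrix at small scale.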

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Ordinal data is present in almost all multiuser-generated feedback, such as questionnaires and preferences. This paper investigates the modelling of ordinal data with Gaussian restricted Boltzmann machines (RBMs). In particular, we present the model architecture and the learning and inference procedures for both vector-variate and matrix-variate ordinal data. We show that our model is able to capture the latent opinion profiles of citizens around the world, and is competitive against state-of-the-art collaborative filtering techniques on large-scale public datasets. The model thus has the potential to extend the application of RBMs to diverse domains such as recommendation systems, product reviews and expert assessments.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Recommender systems rely heavily on numerical preferences, whereas the importance of ordinal preferences has only been recognised in recent work on Ordinal Matrix Factorisation (OMF). Although OMF can effectively exploit ordinal properties, it captures only the higher-order interactions among users and items, without properly considering the localised interactions. This paper employs Markov Random Fields (MRF) to investigate the localised interactions, and proposes a unified model called Ordinal Random Fields (ORF) to take advantage of both the representational power of the MRF and the ease of modelling ordinal preferences offered by the OMF. Experimental results on public datasets demonstrate that the proposed ORF model can capture both types of interactions, resulting in improved recommendation accuracy.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

This article explores two matrix methods to induce the "shades of meaning" (SoM) of a word. A matrix representation of a word is computed from a corpus of traces based on the given word. Non-negative Matrix Factorisation (NMF) and Singular Value Decomposition (SVD) compute a set of vectors, each corresponding to a potential shade of meaning. The two methods were evaluated based on loss of conditional entropy with respect to two sets of manually tagged data. One set reflects concepts generally appearing in text, and the second set comprises words used for investigations into word sense disambiguation. Results show that NMF consistently outperforms SVD for inducing both SoM of general concepts and word senses. The problem of inducing the shades of meaning of a word is more subtle than that of word sense induction and hence relevant to thematic analysis of opinion, where nuances of opinion can arise.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

This work applies a variety of multilinear function factorisation techniques to extract appropriate features or attributes from high dimensional multivariate time series for classification. Recently, a great deal of work has centred around designing time series classifiers using more and more complex feature extraction and machine learning schemes. This paper argues that complex learners and domain specific feature extraction schemes of this type are not necessarily needed for time series classification, as excellent classification results can be obtained by simply applying a number of existing matrix factorisation or linear projection techniques, which are simple and computationally inexpensive. We highlight this using a geometric separability measure and classification accuracies obtained through experiments on four different high dimensional multivariate time series datasets. © 2013 IEEE.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This overview focuses on the application of chemometrics techniques for the investigation of soils contaminated by polycyclic aromatic hydrocarbons (PAHs) and metals because these two important and very diverse groups of pollutants are ubiquitous in soils. The salient features of various studies carried out in the micro- and recreational environments of humans are highlighted in the context of the various multivariate statistical techniques available across discipline boundaries that have been effectively used in soil studies. Particular attention is paid to techniques employed in the geosciences that may be effectively utilized for environmental soil studies; classical multivariate approaches that may be used in isolation or as complementary methods to these are also discussed. Chemometrics techniques widely applied in atmospheric studies for identifying sources of pollutants or for determining the importance of contaminant source contributions to a particular site have seen little use in soil studies, but may be effectively employed in such investigations. Suitable programs are also available for suggesting mitigating measures in cases of soil contamination, and these are also considered. Specific techniques reviewed include pattern recognition techniques such as Principal Components Analysis (PCA), Fuzzy Clustering (FC) and Cluster Analysis (CA); geostatistical tools include variograms, Geographical Information Systems (GIS), contour mapping and kriging; source identification and contribution estimation methods reviewed include Positive Matrix Factorisation (PMF) and Principal Component Analysis on Absolute Principal Component Scores (PCA/APCS). Mitigating measures to limit or eliminate pollutant sources may be suggested through the use of ranking analysis and multi-criteria decision making methods (MCDM). 
These methods are mainly represented in this review by studies employing the Preference Ranking Organisation Method for Enrichment Evaluation (PROMETHEE) and its associated graphic output, Geometrical Analysis for Interactive Aid (GAIA).

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Airborne fine particles were collected at a suburban site in Queensland, Australia between 1995 and 2003. The samples were analysed for 21 elements, and Positive Matrix Factorisation (PMF), Preference Ranking Organisation METHods for Enrichment Evaluation (PROMETHEE) and Graphical Analysis for Interactive Assistance (GAIA) were applied to the data. PROMETHEE provided information on the ranking of pollutant levels from the sampling years while PMF provided insights into the sources of the pollutants, their chemical composition, most likely locations and relative contribution to the levels of particulate pollution at the site. PROMETHEE and GAIA found that the removal of lead from fuel in the area had a significant impact on the pollution patterns while PMF identified 6 pollution sources including: Railways (5.5%), Biomass Burning (43.3%), Soil (9.2%), Sea Salt (15.6%), Aged Sea Salt (24.4%) and Motor Vehicles (2.0%). Thus the results gave information that can assist in the formulation of mitigation measures for air pollution.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

House dust is a heterogeneous matrix, which contains a number of biological materials and particulate matter gathered from several sources. It is the accumulation of a number of semi-volatile and non-volatile contaminants. The contaminants are trapped and preserved. Therefore, house dust can be viewed as an archive of both the indoor and outdoor air pollution. There is evidence to show that on average, people tend to stay indoors most of the time and this increases exposure to house dust. The aims of this investigation were to: (i) assess the levels of Polycyclic Aromatic Hydrocarbons (PAHs), elements and pesticides in the indoor environment of the Brisbane area; (ii) identify and characterise the possible sources of elemental constituents (inorganic elements), PAHs and pesticides by means of Positive Matrix Factorisation (PMF); and (iii) establish the correlations between the levels of indoor air pollutants (PAHs, elements and pesticides) and the external and internal characteristics or attributes of the buildings and indoor activities by means of multivariate data analysis techniques. The dust samples were collected during the period of 2005-2007 from homes located in different suburbs of Brisbane, Ipswich and Toowoomba, in South East Queensland, Australia. A vacuum cleaner fitted with a paper bag was used as a sampler for collecting the house dust. A survey questionnaire, which contained information about the indoor and outdoor characteristics of their residences, was completed by the house residents. House dust samples were analysed for three different pollutants: pesticides, elements and PAHs. The analyses were carried out for samples of particle size less than 250 µm. The chemical analyses for both pesticides and PAHs were performed using Gas Chromatography-Mass Spectrometry (GC-MS), while elemental analysis was carried out using Inductively Coupled Plasma-Mass Spectrometry (ICP-MS). 
The data was subjected to multivariate data analysis techniques such as multi-criteria decision-making procedures, Preference Ranking Organisation Method for Enrichment Evaluations (PROMETHEE), coupled with Geometrical Analysis for Interactive Aid (GAIA) in order to rank the samples and to examine data display. This study showed that, compared to the results from previous works carried out in Australia and overseas, the concentrations of pollutants in house dusts in Brisbane and the surrounding areas were relatively high. The results of this work also showed significant correlations between some of the physical parameters (types of building material, floor level, distance from industrial areas and major road, and smoking) and the concentrations of pollutants. Types of building materials and the age of houses were found to be two of the primary factors that affect the concentrations of pesticides and elements in house dust. The concentrations of these two types of pollutant appear to be higher in old houses (timber houses) than in the brick ones. In contrast, the concentrations of PAHs were found to be higher in brick houses than in the timber ones. Other factors such as floor level, and distance from the main street and industrial area, also affected the concentrations of pollutants in the house dust samples. To apportion the sources and to understand mechanisms of pollutants, the Positive Matrix Factorisation (PMF) receptor model was applied. The results showed that there were significant correlations between the degree of concentration of contaminants in house dust and the physical characteristics of houses, such as the age and the type of the house, the distance from the main road and industrial areas, and smoking. Sources of pollutants were identified. 
For PAHs, the sources were cooking activities, vehicle emissions, smoking, oil fumes, natural gas combustion and traces of diesel exhaust emissions; for pesticides the sources were application of pesticides for controlling termites in buildings and fences, treating indoor furniture and in gardens for controlling pests attacking horticultural and ornamental plants; for elements the sources were soil, cooking, smoking, paints, pesticides, combustion of motor fuels, residual fuel oil, motor vehicle emissions, wearing down of brake linings and industrial activities.