992 resultados para Matrix factorization


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Joint modeling of related data sources has the potential to improve various data mining tasks such as transfer learning, multitask clustering, information retrieval etc. However, diversity among various data sources might outweigh the advantages of the joint modeling, and thus may result in performance degradations. To this end, we propose a regularized shared subspace learning framework, which can exploit the mutual strengths of related data sources while being immune to the effects of the variabilities of each source. This is achieved by further imposing a mutual orthogonality constraint on the constituent subspaces which segregates the common patterns from the source specific patterns, and thus, avoids performance degradations. Our approach is rooted in nonnegative matrix factorization and extends it further to enable joint analysis of related data sources. Experiments performed using three real world data sets for both retrieval and clustering applications demonstrate the benefits of regularization and validate the effectiveness of the model. Our proposed solution provides a formal framework appropriate for jointly analyzing related data sources and therefore, it is applicable to a wider context in data mining.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social tag-based web retrieval system. Therefore improving the precision performance of these social tag-based web retrieval systems has become an increasingly important research topic. To this end, we propose a shared subspace learning framework to leverage a secondary source to improve retrieval performance from a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing can be explicitly controlled. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. We validate the framework on image and video retrieval tasks in which tags from the LabelMe dataset are used to improve image retrieval performance from a Flickr dataset and video retrieval performance from a YouTube dataset. This has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system. Our shared subspace learning framework is applicable to a range of problems where one needs to exploit the strengths existing among multiple and heterogeneous datasets.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents a novel Bayesian formulation to exploit shared structures across multiple data sources, constructing foundations for effective mining and retrieval across disparate domains. We jointly analyze diverse data sources using a unifying piece of metadata (textual tags). We propose a method based on Bayesian Probabilistic Matrix Factorization (BPMF) which is able to explicitly model the partial knowledge common to the datasets using shared subspaces and the knowledge specific to each dataset using individual subspaces. For the proposed model, we derive an efficient algorithm for learning the joint factorization based on Gibbs sampling. The effectiveness of the model is demonstrated by social media retrieval tasks across single and multiple media. The proposed solution is applicable to a wider context, providing a formal framework suitable for exploiting individual as well as mutual knowledge present across heterogeneous data sources of many kinds.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We propose a nonparametric Bayesian, linear Poisson gamma model for count data and use it for dictionary learning. A key property of this model is that it captures the parts-based representation similar to nonnegative matrix factorization. We present an auxiliary variable Gibbs sampler, which turns the intractable inference into a tractable one. Combining this inference procedure with the slice sampler of Indian buffet process, we show that our model can learn the number of factors automatically. Using synthetic and real-world datasets, we show that the proposed model outperforms other state-of-the-art nonparametric factor models.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Unlike in general recommendation scenarios where a user has only a single role, users in trust rating network, e.g. Epinions, are associated with two different roles simultaneously: as a truster and as a trustee. With different roles, users can show distinct preferences for rating items, which the previous approaches do not involve. Moreover, based on explicit single links between two users, existing methods can not capture the implicit correlation between two users who are similar but not socially connected. In this paper, we propose to learn dual role preferences (truster/trustee-specific preferences) for trust-aware recommendation by modeling explicit interactions (e.g., rating and trust) and implicit interactions. In particular, local links structure of trust network are exploited as two regularization terms to capture the implicit user correlation, in terms of truster/trustee-specific preferences. Using a real-world and open dataset, we conduct a comprehensive experimental study to investigate the performance of the proposed model, RoRec. The results show that RoRec outperforms other trust-aware recommendation approaches, in terms of prediction accuracy. Copyright 2014 ACM.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Autism Spectrum Disorder (ASD) is growing at a staggering rate, but, little is known about the cause of this condition. Inferring learning patterns from therapeutic performance data, and subsequently clustering ASD children into subgroups, is important to understand this domain, and more importantly to inform evidence-based intervention. However, this data-driven task was difficult in the past due to insufficiency of data to perform reliable analysis. For the first time, using data from a recent application for early intervention in autism (TOBY Play pad), whose download count is now exceeding 4500, we present in this paper the automatic discovery of learning patterns across 32 skills in sensory, imitation and language. We use unsupervised learning methods for this task, but a notorious problem with existing methods is the correct specification of number of patterns in advance, which in our case is even more difficult due to complexity of the data. To this end, we appeal to recent Bayesian nonparametric methods, in particular the use of Bayesian Nonparametric Factor Analysis. This model uses Indian Buffet Process (IBP) as prior on a binary matrix of infinite columns to allocate groups of intervention skills to children. The optimal number of learning patterns as well as subgroup assignments are inferred automatically from data. Our experimental results follow an exploratory approach, present different newly discovered learning patterns. To provide quantitative results, we also report the clustering evaluation against K-means and Nonnegative matrix factorization (NMF). In addition to the novelty of this new problem, we were able to demonstrate the suitability of Bayesian nonparametric models over parametric rivals.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents a convex geometry (CG)-based method for blind separation of nonnegative sources. First, the unaccessible source matrix is normalized to be column-sum-to-one by mapping the available observation matrix. Then, its zero-samples are found by searching the facets of the convex hull spanned by the mapped observations. Considering these zero-samples, a quadratic cost function with respect to each row of the unmixing matrix, together with a linear constraint in relation to the involved variables, is proposed. Upon which, an algorithm is presented to estimate the unmixing matrix by solving a classical convex optimization problem. Unlike the traditional blind source separation (BSS) methods, the CG-based method does not require the independence assumption, nor the uncorrelation assumption. Compared with the BSS methods that are specifically designed to distinguish between nonnegative sources, the proposed method requires a weaker sparsity condition. Provided simulation results illustrate the performance of our method.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

As a popular heuristic to the matrix rank minimization problem, nuclear norm minimization attracts intensive research attentions. Matrix factorization based algorithms can reduce the expensive computation cost of SVD for nuclear norm minimization. However, most matrix factorization based algorithms fail to provide the theoretical guarantee for convergence caused by their non-unique factorizations. This paper proposes an efficient and accurate Linearized Grass-mannian Optimization (Lingo) algorithm, which adopts matrix factorization and Grassmann manifold structure to alternatively minimize the subproblems. More specially, linearization strategy makes the auxiliary variables unnecessary and guarantees the close-form solution for low periteration complexity. Lingo then converts linearized objective function into a nuclear norm minimization over Grass-mannian manifold, which could remedy the non-unique of solution for the low-rank matrix factorization. Extensive comparison experiments demonstrate the accuracy and efficiency of Lingo algorithm. The global convergence of Lingo is guaranteed with theoretical proof, which also verifies the effectiveness of Lingo.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this work, new tools in atmospheric pollutant sampling and analysis were applied in order to go deeper in source apportionment study. The project was developed mainly by the study of atmospheric emission sources in a suburban area influenced by a municipal solid waste incinerator (MSWI), a medium-sized coastal tourist town and a motorway. Two main research lines were followed. For what concerns the first line, the potentiality of the use of PM samplers coupled with a wind select sensor was assessed. Results showed that they may be a valid support in source apportionment studies. However, meteorological and territorial conditions could strongly affect the results. Moreover, new markers were investigated, particularly focusing on the processes of biomass burning. OC revealed a good biomass combustion process indicator, as well as all determined organic compounds. Among metals, lead and aluminium are well related to the biomass combustion. Surprisingly PM was not enriched of potassium during bonfire event. The second research line consists on the application of Positive Matrix factorization (PMF), a new statistical tool in data analysis. This new technique was applied to datasets which refer to different time resolution data. PMF application to atmospheric deposition fluxes identified six main sources affecting the area. The incinerator’s relative contribution seemed to be negligible. PMF analysis was then applied to PM2.5 collected with samplers coupled with a wind select sensor. The higher number of determined environmental indicators allowed to obtain more detailed results on the sources affecting the area. Vehicular traffic revealed the source of greatest concern for the study area. Also in this case, incinerator’s relative contribution seemed to be negligible. Finally, the application of PMF analysis to hourly aerosol data demonstrated that the higher the temporal resolution of the data was, the more the source profiles were close to the real one.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Il crescente utilizzo di sistemi di analisi high-throughput per lo studio dello stato fisiologico e metabolico del corpo, ha evidenziato che una corretta alimentazione e una buona forma fisica siano fattori chiave per la salute. L'aumento dell'età media della popolazione evidenzia l'importanza delle strategie di contrasto delle patologie legate all'invecchiamento. Una dieta sana è il primo mezzo di prevenzione per molte patologie, pertanto capire come il cibo influisce sul corpo umano è di fondamentale importanza. In questo lavoro di tesi abbiamo affrontato la caratterizzazione dei sistemi di imaging radiografico Dual-energy X-ray Absorptiometry (DXA). Dopo aver stabilito una metodologia adatta per l'elaborazione di dati DXA su un gruppo di soggetti sani non obesi, la PCA ha evidenziato alcune proprietà emergenti dall'interpretazione delle componenti principali in termini delle variabili di composizione corporea restituite dalla DXA. Le prime componenti sono associabili ad indici macroscopici di descrizione corporea (come BMI e WHR). Queste componenti sono sorprendentemente stabili al variare dello status dei soggetti in età, sesso e nazionalità. Dati di analisi metabolica, ottenuti tramite Magnetic Resonance Spectroscopy (MRS) su campioni di urina, sono disponibili per circa mille anziani (provenienti da cinque paesi europei) di età compresa tra i 65 ed i 79 anni, non affetti da patologie gravi. I dati di composizione corporea sono altresì presenti per questi soggetti. L'algoritmo di Non-negative Matrix Factorization (NMF) è stato utilizzato per esprimere gli spettri MRS come combinazione di fattori di base interpretabili come singoli metaboliti. I fattori trovati sono stabili, quindi spettri metabolici di soggetti sono composti dallo stesso pattern di metaboliti indipendentemente dalla nazionalità. Attraverso un'analisi a singolo cieco sono stati trovati alti valori di correlazione tra le variabili di composizione corporea e lo stato metabolico dei soggetti. Ciò suggerisce la possibilità di derivare la composizione corporea dei soggetti a partire dal loro stato metabolico.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Although previous studies report on the effect of street washing on ambient particulate matter levels, there is a lack of studies investigating the results of street washing on the emission strength of road dust. A sampling campaign was conducted in Madrid urban area during July 2009 where road dust samples were collected in two sites, namely Reference site (where the road surface was not washed) and Pelayo site (where street washing was performed daily during night). Following the chemical characterization of the road dust particles the emission sources were resolved by means of Positive Matrix Factorization, PMF (Multilinear Engine scripting) and the mass contribution of each source was calculated for the two sites. Mineral dust, brake wear, tire wear, carbonaceous emissions and construction dust were the main sources of road dust with mineral and construction dust being the major contributors to inhalable road dust load. To evaluate the effectiveness of street washing on the emission sources, the sources mass contributions between the two sites were compared. Although brake wear and tire wear had lower concentrations at the site where street washing was performed, these mass differences were not statistically significant and the temporal variation did not show the expected build-up after dust removal. It was concluded that the washing activities resulted merely in a road dust moistening, without effective removal and that mobilization of particles took place in a few hours between washing and sampling. The results also indicated that it is worth paying attention to the dust dispersed from the construction sites as they affect the emission strength in nearby streets.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In early spring the Baltic region is frequently affected by high-pollution events due to biomass burning in that area. Here we present a comprehensive study to investigate the impact of biomass/grass burning (BB) on the evolution and composition of aerosol in Preila, Lithuania, during springtime open fires. Non-refractory submicron particulate matter (NR-PM1) was measured by an Aerodyne aerosol chemical speciation monitor (ACSM) and a source apportionment with the multilinear engine (ME-2) running the positive matrix factorization (PMF) model was applied to the organic aerosol fraction to investigate the impact of biomass/grass burning. Satellite observations over regions of biomass burning activity supported the results and identification of air mass transport to the area of investigation. Sharp increases in biomass burning tracers, such as levoglucosan up to 683 ngm-3 and black carbon (BC) up to 17 μgm-3 were observed during this period. A further separation between fossil and non-fossil primary and secondary contributions was obtained by coupling ACSM PMF results and radiocarbon (14C) measurements of the elemental (EC) and organic (OC) carbon fractions. Non-fossil organic carbon (OCnf/ was the dominant fraction of PM1, with the primary (POCnf/ and secondary (SOCnf/ fractions contributing 26–44% and 13–23% to the total carbon (TC), respectively. 5–8% of the TC had a primary fossil origin (POCf/, whereas the contribution of fossil secondary organic carbon (SOCf/ was 4–13 %. Nonfossil EC (ECnf/ and fossil EC (ECf/ ranged from 13–24 and 7–13 %, respectively. Isotope ratios of stable carbon and nitrogen isotopes were used to distinguish aerosol particles associated with solid and liquid fossil fuel burning.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-04

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In recent years, the boundaries between e-commerce and social networking have become increasingly blurred. Many e-commerce websites support the mechanism of social login where users can sign on the websites using their social network identities such as their Facebook or Twitter accounts. Users can also post their newly purchased products on microblogs with links to the e-commerce product web pages. In this paper, we propose a novel solution for cross-site cold-start product recommendation, which aims to recommend products from e-commerce websites to users at social networking sites in 'cold-start' situations, a problem which has rarely been explored before. A major challenge is how to leverage knowledge extracted from social networking sites for cross-site cold-start product recommendation. We propose to use the linked users across social networking sites and e-commerce websites (users who have social networking accounts and have made purchases on e-commerce websites) as a bridge to map users' social networking features to another feature representation for product recommendation. In specific, we propose learning both users' and products' feature representations (called user embeddings and product embeddings, respectively) from data collected from e-commerce websites using recurrent neural networks and then apply a modified gradient boosting trees method to transform users' social networking features into user embeddings. We then develop a feature-based matrix factorization approach which can leverage the learnt user embeddings for cold-start product recommendation. Experimental results on a large dataset constructed from the largest Chinese microblogging service Sina Weibo and the largest Chinese B2C e-commerce website JingDong have shown the effectiveness of our proposed framework.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We present in this article an automated framework that extracts product adopter information from online reviews and incorporates the extracted information into feature-based matrix factorization formore effective product recommendation. In specific, we propose a bootstrapping approach for the extraction of product adopters from review text and categorize them into a number of different demographic categories. The aggregated demographic information of many product adopters can be used to characterize both products and users in the form of distributions over different demographic categories. We further propose a graphbased method to iteratively update user- and product-related distributions more reliably in a heterogeneous user-product graph and incorporate them as features into the matrix factorization approach for product recommendation. Our experimental results on a large dataset crawled from JINGDONG, the largest B2C e-commerce website in China, show that our proposed framework outperforms a number of competitive baselines for product recommendation.