975 results for Positive Matrix Factorization


Relevance:

80.00%

Abstract:

We present in this article an automated framework that extracts product adopter information from online reviews and incorporates the extracted information into feature-based matrix factorization for more effective product recommendation. Specifically, we propose a bootstrapping approach for extracting product adopters from review text and categorizing them into a number of demographic categories. The aggregated demographic information of many product adopters can be used to characterize both products and users as distributions over demographic categories. We further propose a graph-based method that iteratively updates user- and product-related distributions more reliably in a heterogeneous user-product graph, and we incorporate these distributions as features into the matrix factorization approach for product recommendation. Our experimental results on a large dataset crawled from JINGDONG, the largest B2C e-commerce website in China, show that our proposed framework outperforms a number of competitive baselines for product recommendation.

Relevance:

80.00%

Abstract:

As massive data sets become increasingly available, people face the problem of how to process and understand these data effectively. Traditional sequential computing models are giving way to parallel and distributed computing models such as MapReduce, due both to the large size of the data sets and to their high dimensionality. In line with other MapReduce-based research, this dissertation develops effective techniques and applications using MapReduce that can help solve large-scale problems. Three different problems are tackled. The first deals with processing terabytes of raster data in a spatial data management system: aerial imagery files are broken into tiles to enable data-parallel computation. The second and third deal with dimension reduction techniques for data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up in MapReduce, using different matrix multiplication implementations, to factorize matrices with dimensions on the order of millions. Two algorithms, which compute the CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce by carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
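The abstract builds its NMF variants on top of MapReduce matrix multiplication. As an illustration only, the sketch below simulates, in a single Python process, the textbook one-pass MapReduce scheme for multiplying two sparse matrices stored as `(row, col) -> value` dictionaries. It is not the dissertation's implementation, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `mapreduce_matmul`) are hypothetical.

```python
from collections import defaultdict


def map_phase(a, b, n_rows_a, n_cols_b):
    """Emit (output_cell, tagged_value) pairs.

    a, b: sparse matrices as dicts mapping (row, col) -> nonzero value.
    Each A[i][j] is needed by every output cell (i, k); each B[j][k] by every (i, k).
    """
    for (i, j), v in a.items():
        for k in range(n_cols_b):
            yield (i, k), ("A", j, v)
    for (j, k), v in b.items():
        for i in range(n_rows_a):
            yield (i, k), ("B", j, v)


def shuffle(pairs):
    """Group all emitted values by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Compute C[i][k] = sum_j A[i][j] * B[j][k] from the values routed to (i, k)."""
    a_row = {j: v for tag, j, v in values if tag == "A"}
    b_col = {j: v for tag, j, v in values if tag == "B"}
    return sum(a_row[j] * b_col[j] for j in a_row.keys() & b_col.keys())


def mapreduce_matmul(a, b, n_rows_a, n_cols_b):
    """Multiply sparse A (n_rows_a x n) by sparse B (n x n_cols_b); return sparse C."""
    groups = shuffle(map_phase(a, b, n_rows_a, n_cols_b))
    result = {}
    for key, values in groups.items():
        total = reduce_phase(values)
        if total != 0:  # keep the result sparse
            result[key] = total
    return result


# Usage: [[1, 2], [3, 4]] @ [[5, 6], [7, 8]]
a = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
b = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}
print(mapreduce_matmul(a, b, 2, 2))  # {(0, 0): 19, (0, 1): 22, (1, 0): 43, (1, 1): 50}
```

In this naive scheme every input entry is replicated once per output column (for A) or row (for B), so the shuffle volume grows with the matrix dimensions. That cost is one reason the choice of multiplication scheme matters when the matrices reach millions of rows.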

Relevance:

80.00%

Abstract:

In recent years, the scientific community has shown growing interest in Non-negative Matrix Factorization (NMF). This method transforms a high-dimensional data set into a small collection of elements that carry their own meaning in the context of the analysis. In bioinformatics, NMF is often used as the basis of some data-clustering methods that rely on a statistical model to determine the most suitable number of classes. This model requires a large number of NMF runs with different input parameters, which represents an enormous computational workload. Most NMF implementations have gradually become obsolete in the face of the constant growth of the data the scientific community seeks to analyze, either because computation times grow until they become infeasible or because the size of the data exceeds the system's resources. For this reason, this doctoral thesis focuses on the optimization and parallelization of NMF, not only at a theoretical level but with the goal of providing the scientific community with a new tool for analyzing data of biological origin. NMF exposes a high degree of data-level parallelism, of variable granularity, whereas the clustering methods mentioned above exhibit computation-level parallelism, since the various NMF instances they run are independent. Therefore, from a global perspective, a layered optimization model is proposed in which different high-performance technologies are used...
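To make the factorization concrete, here is a minimal sketch of NMF using the widely known Lee-Seung multiplicative updates for the squared Frobenius error. It is written in plain Python for self-containment and is not the optimized or parallel implementation developed in the thesis; `nmf`, `matmul`, `transpose` and `frobenius_error` are hypothetical helper names.

```python
import random


def matmul(a, b):
    """Dense product of list-of-lists matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Approximate non-negative V (n x m) by W (n x rank) @ H (rank x m).

    Multiplicative updates keep every entry non-negative because each factor
    is only ever multiplied by ratios of non-negative quantities.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)] for r in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)] for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


# Usage: a 3 x 3 matrix that is exactly the product of non-negative rank-2 factors.
v = [[1, 2, 0], [0, 1, 3], [1, 3, 3]]
w, h = nmf(v, rank=2)
print(round(frobenius_error(v, w, h), 4))
```

The two levels of parallelism described above map onto this sketch: the matrix products inside one `nmf` call are data-parallel, while the many calls a clustering model needs (different `rank` and `seed` values) are independent of each other.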

Relevance:

80.00%

Abstract:

Single-particle mixing state information can be a powerful tool for assessing the relative impact of local and regional sources of ambient particulate matter in urban environments. However, quantitative mixing state data are challenging to obtain using single-particle mass spectrometers. In this study, the quantitative chemical composition of carbonaceous single particles was determined using an aerosol time-of-flight mass spectrometer (ATOFMS) as part of the MEGAPOLI 2010 winter campaign in Paris, France. Relative peak areas of marker ions for elemental carbon (EC), organic aerosol (OA), ammonium, nitrate, sulfate and potassium were compared with concurrent measurements from an Aerodyne high-resolution time-of-flight aerosol mass spectrometer (HR-ToF-AMS), a thermal-optical OC/EC analyser and a particle-into-liquid sampler coupled with ion chromatography (PILS-IC). ATOFMS-derived estimated mass concentrations reproduced the variability of these species well (R² = 0.67-0.78), and 10 discrete mixing states for carbonaceous particles were identified and quantified. The chemical mixing state of HR-ToF-AMS organic aerosol factors, resolved using positive matrix factorisation, was also investigated through comparison with the ATOFMS dataset. The results indicate that hydrocarbon-like OA (HOA) detected in Paris is associated with two EC-rich mixing states which differ in their relative sulfate content, while fresh biomass burning OA (BBOA) is associated with two mixing states which differ significantly in their OA/EC ratios. Aged biomass burning OA (OOA2-BBOA) was found to be significantly internally mixed with nitrate, while secondary, oxidised OA (OOA) was associated with five particle mixing states, each exhibiting different relative secondary inorganic ion content. Externally mixed secondary organic aerosol was not observed. These findings demonstrate the range of primary and secondary organic aerosol mixing states in Paris.
Examination of the temporal behaviour and chemical composition of the ATOFMS classes also enabled estimation of the relative contribution of transported emissions of each chemical species and total particle mass in the size range investigated. Only 22% of the total ATOFMS-derived particle mass was apportioned to fresh, local emissions, with 78% apportioned to regional/continental-scale emissions.

Relevance:

80.00%

Abstract:

An aerosol time-of-flight mass spectrometer (ATOFMS) was deployed for the measurement of the size-resolved chemical composition of single particles at a site in Cork Harbour, Ireland, for three weeks in August 2008. The ATOFMS was co-located with a suite of semi-continuous instrumentation for the measurement of particle number, elemental carbon (EC), organic carbon (OC), sulfate and particulate matter smaller than 2.5 μm in diameter (PM2.5). The temporal variation of the ambient ATOFMS particle classes was subsequently used in conjunction with the semi-continuous measurements to apportion PM2.5 mass using positive matrix factorisation. The synergy of the single-particle classification procedure and positive matrix factorisation allowed the identification of six factors, corresponding to vehicular traffic, marine, long-range transport, various combustion, domestic solid fuel combustion and shipping traffic, with estimated contributions to the measured PM2.5 mass of 23%, 14%, 13%, 11%, 5% and 1.5%, respectively. Shipping traffic was found to contribute 18% of the measured particle number (20–600 nm mobility diameter), which may have important implications for human health given the size and composition of ship exhaust particles. The positive matrix factorisation procedure enabled a more refined interpretation of the single-particle results by providing source contributions to PM2.5 mass, while the single-particle data enabled the identification of additional factors not possible with typical semi-continuous measurements, including local shipping traffic.
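Positive matrix factorisation (Paatero and Tapper) factors a samples × species matrix X into non-negative factor contributions G and factor profiles F by minimising Q = Σᵢ Σⱼ ((xᵢⱼ − Σₖ gᵢₖ fₖⱼ) / σᵢⱼ)², where σᵢⱼ is the uncertainty of each measurement. In a study like this one, rows would be time samples and columns the ATOFMS class time series plus the semi-continuous measurements. The sketch below computes Q and, as a swapped-in technique, minimises it with weighted multiplicative updates; dedicated PMF programs (for example EPA PMF) use different solvers and extra features. The function names, the toy data and the uncertainty model are hypothetical and are not taken from the study.

```python
import random


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def q_value(x, sigma, g, f):
    """PMF objective Q: squared residuals of X - G F, each scaled by its uncertainty."""
    gf = matmul(g, f)
    return sum(((x[i][j] - gf[i][j]) / sigma[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))


def weighted_nmf(x, sigma, n_factors, iterations=500, seed=0, eps=1e-12):
    """Reduce Q over non-negative G (samples x factors) and F (factors x species).

    Multiplicative updates for weighted least squares with weights 1 / sigma**2;
    a stand-in for a dedicated PMF solver, not a replacement for one.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    wt = [[1.0 / sigma[i][j] ** 2 for j in range(m)] for i in range(n)]
    wx = [[wt[i][j] * x[i][j] for j in range(m)] for i in range(n)]
    g = [[rng.random() for _ in range(n_factors)] for _ in range(n)]
    f = [[rng.random() for _ in range(m)] for _ in range(n_factors)]

    def weighted_fit(g, f):
        gf = matmul(g, f)
        return [[wt[i][j] * gf[i][j] for j in range(m)] for i in range(n)]

    for _ in range(iterations):
        # F <- F * (G^T (W*X)) / (G^T (W*GF))
        gt = transpose(g)
        num, den = matmul(gt, wx), matmul(gt, weighted_fit(g, f))
        f = [[f[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)] for k in range(n_factors)]
        # G <- G * ((W*X) F^T) / ((W*GF) F^T)
        ft = transpose(f)
        num, den = matmul(wx, ft), matmul(weighted_fit(g, f), ft)
        g = [[g[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_factors)] for i in range(n)]
    return g, f


# Usage: 4 samples x 3 species (toy data), uncertainty = 10% of each value + 0.1.
x = [[2.0, 1.0, 0.5], [4.0, 2.0, 1.0], [1.0, 3.0, 2.0], [3.0, 4.0, 2.5]]
sigma = [[0.1 * val + 0.1 for val in row] for row in x]
g, f = weighted_nmf(x, sigma, n_factors=2)
print(round(q_value(x, sigma, g, f), 3))
```

Weighting by 1/σ² is what distinguishes this objective from plain NMF: measurements with large uncertainty, such as values near the detection limit, pull the fit less.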

Relevance:

80.00%

Abstract:

This work outlines the theoretical advantages of multivariate methods for biomechanical data, validates the proposed methods, and presents new clinical findings relating to knee osteoarthritis that were made possible by this approach. The new techniques were based on existing multivariate approaches, Partial Least Squares (PLS) and Non-negative Matrix Factorization (NMF), and were validated using existing data sets. The techniques developed, PCA-PLS-LDA (Principal Component Analysis – Partial Least Squares – Linear Discriminant Analysis), PCA-PLS-MLR (Principal Component Analysis – Partial Least Squares – Multiple Linear Regression) and Waveform Similarity (based on NMF), address two challenging characteristics of biomechanical data: variability and correlation. These new structure-seeking techniques revealed new clinical findings. The first relates to the relationship between pain, radiographic severity and mechanics. Simultaneous analysis of pain and radiographic severity outcomes, a first in biomechanics, revealed that the relationship of the knee adduction moment to radiographic features is mediated by pain in subjects with moderate osteoarthritis. The second was quantifying the importance of neuromuscular patterns in brace effectiveness for patients with knee osteoarthritis. I found that brace effectiveness was more closely related to the patient's unbraced neuromuscular patterns than to mechanics, and that these neuromuscular patterns were more complicated than simply increased overall muscle activity, as previously thought.

Relevance:

80.00%

Abstract:

Matrix factorization (MF) has become one of the best practices for handling sparse data in recommender systems. Funk singular value decomposition (Funk-SVD) is a variant of MF that remains a state-of-the-art method and helped win the Netflix Prize competition. With modifications, it is widely used in current recommender-systems research. Because data points can grow at very high velocity, it is prudent to devise new methods that handle such data more accurately and more efficiently than Funk-SVD. To this end, I propose a latent factor model that targets both accuracy and efficiency by reducing the number of latent features of either users or items, making it less complex than Funk-SVD, in which users and items have the same, often larger, number of latent features. A comprehensive empirical evaluation on two publicly available datasets, Amazon and ml-100k, shows that the proposed methods achieve accuracy comparable to Funk-SVD with lower complexity.
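For reference, the baseline the abstract compares against can be sketched as follows. This is the commonly used simultaneous-SGD form of Funk-SVD, with bias terms omitted and arbitrary hyperparameters. The proposed reduced-feature model is not reproduced here because the abstract does not specify it. `train_funk_svd`, `predict` and `rmse` are hypothetical names, and the ratings are toy data.

```python
import random


def train_funk_svd(ratings, n_users, n_items, n_factors=2, lr=0.01, reg=0.02,
                   epochs=500, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates r.

    ratings: list of (user, item, rating) triples for the observed entries only.
    Stochastic gradient descent on squared error with L2 regularisation.
    """
    rng = random.Random(seed)
    p = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(p, q, u, i)
            for k in range(n_factors):
                pu, qi = p[u][k], q[i][k]  # update both from the old values
                p[u][k] += lr * (err * qi - reg * pu)
                q[i][k] += lr * (err * pu - reg * qi)
    return p, q


def predict(p, q, u, i):
    return sum(a * b for a, b in zip(p[u], q[i]))


def rmse(ratings, p, q):
    return (sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


# Usage: 4 users x 4 items, 10 observed ratings on a 1-5 scale (toy data).
ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 3, 1),
           (2, 0, 1), (2, 1, 1), (2, 3, 5), (3, 2, 4), (3, 3, 4)]
p, q = train_funk_svd(ratings, n_users=4, n_items=4)
print(round(rmse(ratings, p, q), 3))
print(round(predict(p, q, 1, 1), 2))  # estimate for an unobserved (user, item) pair
```

In this sketch the model has (n_users + n_items) × n_factors parameters, with both sides sharing one latent dimension. That shared size is the quantity the abstract's complexity comparison concerns.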

Relevance:

80.00%

Abstract:

The main purpose of this thesis is to go beyond two usual assumptions that accompany theoretical analyses of spin glasses and inference: the i.i.d. (independent and identically distributed) hypothesis on the noise elements, and the finite-rank regime. The first has been present since the early days of spin-glass theory; the second instead concerns the inference viewpoint. Disordered systems and Bayesian inference have a well-established relation, evidenced by their continuous cross-fertilization. The thesis uses techniques from both the rigorous mathematical machinery of spin glasses, such as the interpolation scheme, and Statistical Physics, such as the replica method. The first chapter introduces the Sherrington-Kirkpatrick and spiked Wigner models. The former is a mean-field spin glass whose couplings are i.i.d. Gaussian random variables. The latter amounts to establishing the information-theoretic limits for reconstructing a fixed low-rank matrix, the “spike”, blurred by additive Gaussian noise. In chapters 2 and 3 the i.i.d. hypothesis on the noise is broken by assuming noise with an inhomogeneous variance profile. In spin glasses this leads to multi-species models; the inferential counterpart is called spatial coupling. All the previous models are usually studied in the Bayes-optimal setting, where everything is known about the process generating the data. In chapter 4, instead, we study the spiked Wigner model where the prior on the signal to be reconstructed is ignored. In chapter 5 we analyze the statistical limits of a spiked Wigner model in which the noise is no longer Gaussian but drawn from a random matrix ensemble, which makes its elements dependent. The thesis ends with chapter 6, which tackles the challenging problem of high-rank probabilistic matrix factorization. Here we introduce a new procedure called "decimation" and show that it is theoretically possible to perform matrix factorization through it.
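For orientation, one common way to write the spiked Wigner model is shown below, together with a block-structured variance profile of the kind that breaks the i.i.d. noise assumption. Normalization conventions vary across the literature and the thesis may use a different one. The symbols λ (signal-to-noise ratio), P_X (prior on the signal), g (assignment of indices to K groups) and Δ (block variances) are notation introduced here. In the Bayes-optimal setting, P_X and λ are known to the statistician.

```latex
\begin{aligned}
&\text{rank one:} && Y = \sqrt{\tfrac{\lambda}{n}}\,\mathbf{x}\mathbf{x}^{\top} + Z,
  \quad x_i \overset{\text{i.i.d.}}{\sim} P_X,
  \quad Z_{ij} = Z_{ji} \sim \mathcal{N}(0,1) \ \text{independent for } i \le j,\\
&\text{rank } k: && Y = \sqrt{\tfrac{\lambda}{n}}\,X X^{\top} + Z, \quad X \in \mathbb{R}^{n \times k},\\
&\text{variance profile:} && Z_{ij} \sim \mathcal{N}\bigl(0, \Delta_{g(i)\,g(j)}\bigr),
  \quad g : \{1,\dots,n\} \to \{1,\dots,K\}.
\end{aligned}
```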

Relevance:

40.00%

Abstract:

An ammonium chloride erythrocyte-lysing procedure was used to prepare a bacterial pellet from positive blood cultures for direct matrix-assisted laser desorption-ionization time of flight (MALDI-TOF) mass spectrometry analysis. Identification was obtained for 78.7% of the pellets tested. Moreover, 99% of the MALDI-TOF identifications were congruent at the species level when considering valid scores. This fast and accurate method is promising.

Relevance:

40.00%

Abstract:

Matrix sublimation has proven to be a powerful approach for high-resolution matrix-assisted laser desorption ionization (MALDI) imaging of lipids, providing very homogeneous, solvent-free deposition. This work presents a comprehensive study evaluating current and novel matrix candidates for high-spatial-resolution MALDI imaging mass spectrometry of lipids from tissue sections after deposition by sublimation. For this purpose, 12 matrices, namely 2,5-dihydroxybenzoic acid (DHB), sinapinic acid (SA), α-cyano-4-hydroxycinnamic acid (CHCA), 2,6-dihydroxyacetophenone (DHA), 2',4',6'-trihydroxyacetophenone (THAP), 3-hydroxypicolinic acid (3-HPA), 1,8-bis(dimethylamino)naphthalene (DMAN), 1,8,9-anthracenetriol (DIT), 1,5-diaminonaphthalene (DAN), p-nitroaniline (NIT), 9-aminoacridine (9-AA) and 2-mercaptobenzothiazole (MBT), were investigated for lipid detection efficiency in both positive and negative ionization modes, matrix interferences, and stability under vacuum. For the most relevant matrices, ion maps of the different lipid species were obtained from tissue sections at high spatial resolution, and the detected peaks were characterized by matrix-assisted laser desorption ionization time-of-flight/time-of-flight (MALDI-TOF/TOF) mass spectrometry. First proposed for imaging mass spectrometry (IMS) after sublimation, DAN proved highly efficient, providing rich lipid signatures in both positive and negative polarities with high vacuum stability and sub-20 μm resolution capability. Ion images from adult mouse brain were generated with a 10 μm scanning resolution. Furthermore, ion images from adult mouse brain and whole-body fish tissue sections were acquired in both polarity modes from the same tissue section at 100 μm spatial resolution. Sublimation of DAN is an attractive approach for obtaining more information than currently employed matrices, providing a deeper analysis of the lipidome by IMS.

Relevance:

40.00%

Abstract:

Modern GPUs are well suited for computationally intensive tasks and massively parallel computation. Sparse matrix-vector multiplication and the linear triangular solver are among the most important and heavily used kernels in scientific computation, and several challenges in developing a high-performance kernel with these two modules are investigated. The main interest is to solve linear systems derived from elliptic equations discretized with triangular elements. The resulting linear system has a symmetric positive definite matrix, which is stored in the compressed sparse row (CSR) format. A CUDA algorithm is proposed that performs the matrix-vector multiplication directly on the CSR format. A dependence-tree algorithm is used to determine which variables the linear triangular solver can compute in parallel. To increase the number of parallel threads, a graph-coloring algorithm reorders the mesh numbering in a pre-processing phase. The proposed method is compared with available parallel and serial libraries. The results show that the proposed method reduces the computational cost of the matrix-vector multiplication, and that the pre-processing associated with the triangular solver needs to be executed only once. The conjugate gradient method was implemented and showed a similar convergence rate for all the compared methods, while the proposed method showed a significantly smaller execution time.
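Two of the pieces named in this abstract, the CSR matrix-vector product and the conjugate gradient method, can be illustrated serially. The sketch below is plain Python rather than CUDA. It stores a symmetric positive definite matrix as the three CSR arrays (`values`, `col_idx`, `row_ptr`), multiplies directly on that format, and uses the product inside an unpreconditioned conjugate gradient loop. It omits the triangular solver, dependence tree and graph coloring, it is not the proposed GPU algorithm, and its function names are hypothetical.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x with A in CSR form: row r's nonzeros are values[row_ptr[r]:row_ptr[r+1]]."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for r in range(n):  # rows are independent, so a GPU can assign one thread per row
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y[r] = acc
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A stored in CSR form."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Usage: the 1D Poisson matrix [[2, -1, 0], [-1, 2, -1], [0, -1, 2]] in CSR form.
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
print(csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [1.0, 0.0, 1.0]
x = conjugate_gradient(values, col_idx, row_ptr, [1.0, 0.0, 1.0])
```

Because every row of the product is computed independently, the outer loop of `csr_matvec` is the natural unit of GPU parallelism. The triangular solve has no such independence between rows, which is why the abstract needs a dependence tree and graph coloring to expose parallel work there.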

Relevance:

30.00%

Abstract:

Mercury (Hg) exposure is associated with disease conditions, including cardiovascular problems. Although the mechanisms implicated in these complications have not yet been precisely defined, matrix metalloproteinases (MMPs) may be involved. The gene encoding MMP-2 presents genetic polymorphisms that affect the expression and activity level of this enzyme. A common polymorphism of the MMP-2 gene is C(-1306)T (rs243865), which is known to disrupt an Sp1-type promoter site (CCACC box), leading to lower promoter activity associated with the T allele. This study examined how this polymorphism affects circulating levels of MMP-2 and of its endogenous inhibitor, tissue inhibitor of metalloproteinase-2 (TIMP-2), in 210 subjects environmentally exposed to Hg. Total blood and plasma Hg concentrations were determined by inductively coupled plasma-mass spectrometry (ICP-MS). MMP-2 and TIMP-2 concentrations were measured in plasma samples by gelatin zymography and ELISA, respectively. Genotypes for the C(-1306)T polymorphism were determined by TaqMan® Allele Discrimination assay. We found a positive association (p = 0.0057) between plasma Hg concentrations and the MMP-2/TIMP-2 ratio (an index of net MMP-2 activity). The C(-1306)T polymorphism modified MMP-2 concentrations (p = 0.0465) and the MMP-2/TIMP-2 ratio (p = 0.0060) in subjects exposed to Hg, with higher MMP-2 levels found in subjects carrying the C allele. These findings suggest a significant interaction between the C(-1306)T polymorphism and Hg exposure, possibly increasing the risk of developing diseases in subjects with the C allele. (C) 2011 Elsevier B.V. All rights reserved.

Relevance:

30.00%

Abstract:

Mercury (Hg) exposure causes health problems, including cardiovascular diseases. Although the mechanisms have not yet been precisely defined, matrix metalloproteinases (MMPs) may be involved. The gene encoding MMP-9 presents genetic polymorphisms that affect the expression and activity level of this enzyme. Two polymorphisms in the promoter region [C(-1562)T and (CA)n] are functionally relevant and are implicated in several diseases. This study examined how these polymorphisms affect circulating levels of MMP-9 and of its endogenous inhibitor, tissue inhibitor of metalloproteinase-1 (TIMP-1), in 266 subjects environmentally exposed to Hg. Blood and plasma Hg concentrations were determined by inductively coupled plasma-mass spectrometry (ICP-MS). MMP-9 and TIMP-1 concentrations were measured in plasma samples by gelatin zymography and ELISA, respectively. Genotypes for the C(-1562)T and the microsatellite (CA)n polymorphisms were determined. We found a positive association (P<0.05) between plasma Hg concentrations and the MMP-9/TIMP-1 ratio (an index of net MMP-9 activity). When the subjects were divided into tertiles based on their plasma Hg concentrations, the (CA)n polymorphism modified MMP-9 concentrations and the MMP-9/TIMP-1 ratio in subjects with the lowest Hg concentrations (first tertile), with the highest MMP-9 levels found in subjects with genotypes including alleles with 21 or more CA repeats (H alleles) (P<0.05). Conversely, this polymorphism had no effect in subjects with intermediate or high plasma Hg levels (second and third tertiles, respectively). The C(-1562)T polymorphism had no effect on MMP-9 levels. These findings suggest a significant interaction between the (CA)n polymorphism and low levels of Hg exposure, possibly increasing the risk of developing diseases in subjects with H alleles. (c) 2010 Elsevier B.V. All rights reserved.

Relevance:

30.00%

Abstract:

Mercury (Hg) exposure causes health problems that may result from increased oxidative stress and matrix metalloproteinase (MMP) levels. We investigated whether there is an association between the circulating levels of MMP-2, MMP-9 and their endogenous inhibitors (the tissue inhibitors of metalloproteinases, TIMPs) and the circulating Hg levels in 159 subjects environmentally exposed to Hg. Blood and plasma Hg were determined by inductively coupled plasma-mass spectrometry (ICP-MS). MMP and TIMP concentrations were measured in plasma samples by gelatin zymography and ELISA, respectively. Thiobarbituric acid-reactive species (TBARS) were measured in plasma to assess oxidative stress. Levels of selenium (Se), an antioxidant, were also determined by ICP-MS. The relations between bioindicators of Hg and metalloproteinase levels were examined using multivariate regression models. While we found no relation between blood or plasma Hg and MMP-9, plasma Hg levels were negatively associated with TIMP-1 and TIMP-2 levels and, consequently, positively associated with the MMP-9/TIMP-1 and MMP-2/TIMP-2 ratios, indicating a positive association between plasma Hg and circulating net MMP-9 and MMP-2 activities. These findings provide new insight into the possible biological mechanisms of Hg toxicity, particularly in cardiovascular diseases.

Relevance:

30.00%

Abstract:

This is the first in a series of three articles that aim to derive the matrix elements of the U(2n) generators in a multishell spin-orbit basis. This basis is appropriate for many-electron systems that have a natural partitioning of the orbital space and whose Hamiltonian also includes spin-dependent terms. The method is based on a new spin-dependent unitary group approach to the many-electron correlation problem due to Gould and Paldus [M. D. Gould and J. Paldus, J. Chem. Phys. 92, 7394 (1990)]. In this approach, the matrix elements of the U(2n) generators in the U(n) × U(2)-adapted electronic Gelfand basis are determined by the matrix elements of a single U(n) adjoint tensor operator called the del-operator, denoted Δ^i_j (1 ≤ i, j ≤ n). Δ (del) is a polynomial of degree two in the U(n) matrix E = [E^i_j]. The approach of Gould and Paldus is based on the transformation properties of the U(2n) generators as an adjoint tensor operator of U(n) × U(2) and on the application of the Wigner-Eckart theorem. Hence, to generalize this approach, we need formulas for the complete set of adjoint coupling coefficients for the two-shell composite Gelfand-Paldus basis. The nonzero-shift coefficients are uniquely determined and may be evaluated by the methods of Gould et al. [see the above reference]. In this article, we define zero-shift adjoint coupling coefficients for the two-shell composite Gelfand-Paldus basis that are appropriate to the many-electron problem. By definition, these are proportional to the corresponding two-shell del-operator matrix elements, and it is shown that the Racah factorization lemma applies. Formulas for these coefficients are then obtained by applying the Racah factorization lemma. The zero-shift adjoint reduced Wigner coefficients required for this procedure are evaluated first.
All these coefficients are needed later for the multishell case, which leads directly to the two-shell del-operator matrix elements. Finally, we discuss an application to charge and spin densities in a two-shell molecular system. (C) 1998 John Wiley & Sons.