895 results for Matrix factorization
Abstract:
This paper is concerned with tensor clustering assisted by dimensionality reduction. A class of formulations for tensor clustering is introduced, based on tensor Tucker decomposition models. In this formulation, an extra tensor mode is formed by a collection of tensors of the same dimensions and is then used to assist a Tucker decomposition in order to achieve data dimensionality reduction. We design two types of clustering models for the tensors, a PCA Tensor Clustering model and a Non-negative Tensor Clustering model, by utilizing different regularizations. The tensor clustering problem can thus be solved by an optimization method based on the alternative coordinate scheme. Our experiments show that the proposed models yield comparable or even better performance compared to most recent clustering algorithms based on matrix factorization.
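The stacking idea in this abstract can be illustrated with a minimal sketch. It assumes a plain truncated higher-order SVD (HOSVD) as the Tucker step, not the regularized PCA or non-negative models and not the optimization scheme the paper proposes. The function names, tensor sizes and ranks are hypothetical choices for illustration only.

```python
import numpy as np

def unfold(T, mode):
    """Matricize tensor T so that `mode` indexes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_dot(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def hosvd(T, ranks):
    """Truncated HOSVD: one orthonormal factor per mode plus a core tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = mode_dot(core, U.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Map a core tensor back to the original space."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_dot(T, U, mode)
    return T

# Twenty synthetic 6 x 5 "samples" stacked along an extra (third) mode.
rng = np.random.default_rng(0)
samples = [rng.random((6, 5)) for _ in range(20)]
stacked = np.stack(samples, axis=-1)            # shape (6, 5, 20)

core, factors = hosvd(stacked, ranks=(3, 3, 2))
embedding = factors[-1]                         # one 2-d row per sample
```

Each row of `embedding` is a low-dimensional representation of one stacked tensor, which could then be passed to any clustering routine.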
Abstract:
Trace element measurements in PM10–2.5, PM2.5–1.0 and PM1.0–0.3 aerosol were performed with 2 h time resolution at kerbside, urban background and rural sites during the ClearfLo winter 2012 campaign in London. The environment-dependent variability of emissions was characterized using the Multilinear Engine implementation of the positive matrix factorization model, conducted on data sets comprising all three sites but segregated by size. Combining the sites enabled separation of sources with high temporal covariance but significant spatial variability. Separation of sizes improved source resolution by preventing sources occurring in only a single size fraction from having too small a contribution for the model to resolve. Anchor profiles were retrieved internally by analysing data subsets, and these profiles were used in the analyses of the complete data sets of all sites for enhanced source apportionment. A total of nine different factors were resolved (notable elements in brackets): in PM10–2.5, brake wear (Cu, Zr, Sb, Ba), other traffic-related (Fe), resuspended dust (Si, Ca), sea/road salt (Cl), aged sea salt (Na, Mg) and industrial (Cr, Ni); in PM2.5–1.0, brake wear, other traffic-related, resuspended dust, sea/road salt, aged sea salt and S-rich (S); and in PM1.0–0.3, traffic-related (Fe, Cu, Zr, Sb, Ba), resuspended dust, sea/road salt, aged sea salt, reacted Cl (Cl), S-rich and solid fuel (K, Pb). Human activities enhance the kerb-to-rural concentration gradients of coarse aged sea salt, typically considered to have a natural source, by 1.7–2.2. These site-dependent concentration differences reflect the effect of local resuspension processes in London. The anthropogenically influenced factors traffic (brake wear and other traffic-related processes), dust and sea/road salt provide further kerb-to-rural concentration enhancements by direct source emissions by a factor of 3.5–12.7. 
The traffic and dust factors are mainly emitted in PM10–2.5 and show strong diurnal variations with concentrations up to 4 times higher during rush hour than during night-time. Regionally influenced S-rich and solid fuel factors, occurring primarily in PM1.0–0.3, have negligible resuspension influences, and concentrations are similar throughout the day and across the regions.
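Positive matrix factorization, as used here, fits non-negative factor contributions and profiles to the data while weighting each residual by its measurement uncertainty. A minimal sketch of that objective follows. It assumes simple weighted multiplicative updates rather than the Multilinear Engine solver named in the abstract, and the names `pmf_sketch`, `G`, `F` and the synthetic data are hypothetical.

```python
import numpy as np

def pmf_sketch(X, sigma, n_factors, n_iter=500, eps=1e-12, seed=0):
    """Minimize Q = sum(((X - G @ F) / sigma)**2) subject to G, F >= 0.

    Uses weighted multiplicative updates (an assumption of this sketch,
    not the ME-2 algorithm)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    G = rng.random((n, n_factors)) + 0.1    # factor contributions (time series)
    F = rng.random((n_factors, m)) + 0.1    # factor profiles (composition)
    weights = 1.0 / sigma**2
    WX = weights * X
    q_history = []
    for _ in range(n_iter):
        G *= (WX @ F.T) / ((weights * (G @ F)) @ F.T + eps)
        F *= (G.T @ WX) / (G.T @ (weights * (G @ F)) + eps)
        q_history.append(float(np.sum(((X - G @ F) / sigma) ** 2)))
    return G, F, q_history

# Synthetic example: 200 time steps, 8 measured species, 2 true sources.
rng = np.random.default_rng(1)
G_true = rng.random((200, 2))
F_true = rng.random((2, 8))
X_clean = G_true @ F_true
sigma = 0.05 * X_clean + 0.01               # assumed uncertainty model
X = np.clip(X_clean + rng.normal(size=X_clean.shape) * sigma, 0.0, None)

G, F, q_history = pmf_sketch(X, sigma, n_factors=2)
```

The rows of `F` play the role of source profiles and the columns of `G` the role of source time series. Anchor profiles, as described in the abstract, would add constraints on `F` that this sketch does not implement.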
Abstract:
In this work, new tools for atmospheric pollutant sampling and analysis were applied to deepen a source apportionment study. The project focused on atmospheric emission sources in a suburban area influenced by a municipal solid waste incinerator (MSWI), a medium-sized coastal tourist town and a motorway. Two main research lines were followed. In the first, the potential of PM samplers coupled with a wind select sensor was assessed. The results showed that such samplers may be a valid support in source apportionment studies, although meteorological and territorial conditions can strongly affect the results. In addition, new markers were investigated, with a particular focus on biomass burning. OC proved to be a good indicator of biomass combustion, as did all the organic compounds determined. Among metals, lead and aluminium were closely related to biomass combustion. Surprisingly, PM was not enriched in potassium during the bonfire event. The second research line consisted of the application of Positive Matrix Factorization (PMF), a new statistical tool for data analysis. The technique was applied to datasets with different time resolutions. PMF applied to atmospheric deposition fluxes identified six main sources affecting the area; the incinerator's relative contribution appeared to be negligible. PMF was then applied to PM2.5 collected with samplers coupled with a wind select sensor. The larger number of environmental indicators determined allowed more detailed results on the sources affecting the area. Vehicular traffic proved to be the source of greatest concern for the study area, and the incinerator's relative contribution again appeared to be negligible. Finally, the application of PMF to hourly aerosol data demonstrated that the higher the temporal resolution of the data, the closer the source profiles were to the real ones.
Abstract:
The growing use of high-throughput analysis systems for studying the physiological and metabolic state of the body has shown that proper nutrition and good physical fitness are key factors for health. The rising average age of the population highlights the importance of strategies to counter ageing-related diseases. A healthy diet is the first means of prevention for many diseases, so understanding how food affects the human body is of fundamental importance. In this thesis we addressed the characterization of Dual-energy X-ray Absorptiometry (DXA) radiographic imaging systems. After a suitable methodology was established for processing DXA data on a group of healthy, non-obese subjects, PCA revealed some properties that emerge from interpreting the principal components in terms of the body composition variables returned by DXA. The first components can be associated with macroscopic indices of body description (such as BMI and WHR). These components are surprisingly stable with respect to the subjects' age, sex and nationality. Metabolic analysis data, obtained by Magnetic Resonance Spectroscopy (MRS) on urine samples, are available for about one thousand elderly people (from five European countries) aged between 65 and 79 and not affected by serious diseases. Body composition data are also available for these subjects. The Non-negative Matrix Factorization (NMF) algorithm was used to express the MRS spectra as a combination of basis factors interpretable as individual metabolites. The factors found are stable, so the subjects' metabolic spectra are composed of the same pattern of metabolites regardless of nationality. Through a single-blind analysis, high correlation values were found between the body composition variables and the metabolic state of the subjects. This suggests the possibility of deriving the subjects' body composition from their metabolic state.
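How NMF expresses each spectrum as a non-negative combination of basis factors can be shown with a minimal sketch. It assumes the classic multiplicative updates for the Frobenius-norm objective, which may differ from the algorithm used in the thesis. The synthetic "spectra" and all names are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=1000, eps=1e-9, seed=0):
    """Approximate non-negative V (samples x channels) as W @ H.

    Rows of H are basis spectra; rows of W are per-sample mixing weights."""
    rng = np.random.default_rng(seed)
    n_samples, n_channels = V.shape
    W = rng.random((n_samples, rank)) + eps
    H = rng.random((rank, n_channels)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic data: 40 spectra built from 3 hypothetical non-negative basis spectra.
rng = np.random.default_rng(1)
true_basis = rng.random((3, 60))
true_weights = rng.random((40, 3))
V = true_weights @ true_basis

W, H = nmf(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because both `W` and `H` stay non-negative, each row of `H` can be read as an additive component of the spectra, which is what makes an interpretation in terms of individual metabolites plausible.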
Abstract:
Although previous studies report on the effect of street washing on ambient particulate matter levels, there is a lack of studies investigating the results of street washing on the emission strength of road dust. A sampling campaign was conducted in Madrid urban area during July 2009 where road dust samples were collected in two sites, namely Reference site (where the road surface was not washed) and Pelayo site (where street washing was performed daily during night). Following the chemical characterization of the road dust particles the emission sources were resolved by means of Positive Matrix Factorization, PMF (Multilinear Engine scripting) and the mass contribution of each source was calculated for the two sites. Mineral dust, brake wear, tire wear, carbonaceous emissions and construction dust were the main sources of road dust with mineral and construction dust being the major contributors to inhalable road dust load. To evaluate the effectiveness of street washing on the emission sources, the sources mass contributions between the two sites were compared. Although brake wear and tire wear had lower concentrations at the site where street washing was performed, these mass differences were not statistically significant and the temporal variation did not show the expected build-up after dust removal. It was concluded that the washing activities resulted merely in a road dust moistening, without effective removal and that mobilization of particles took place in a few hours between washing and sampling. The results also indicated that it is worth paying attention to the dust dispersed from the construction sites as they affect the emission strength in nearby streets.
Abstract:
In early spring the Baltic region is frequently affected by high-pollution events due to biomass burning in that area. Here we present a comprehensive study to investigate the impact of biomass/grass burning (BB) on the evolution and composition of aerosol in Preila, Lithuania, during springtime open fires. Non-refractory submicron particulate matter (NR-PM1) was measured by an Aerodyne aerosol chemical speciation monitor (ACSM) and a source apportionment with the multilinear engine (ME-2) running the positive matrix factorization (PMF) model was applied to the organic aerosol fraction to investigate the impact of biomass/grass burning. Satellite observations over regions of biomass burning activity supported the results and identification of air mass transport to the area of investigation. Sharp increases in biomass burning tracers, such as levoglucosan up to 683 ng m⁻³ and black carbon (BC) up to 17 μg m⁻³, were observed during this period. A further separation between fossil and non-fossil primary and secondary contributions was obtained by coupling ACSM PMF results and radiocarbon (¹⁴C) measurements of the elemental (EC) and organic (OC) carbon fractions. Non-fossil organic carbon (OCnf) was the dominant fraction of PM1, with the primary (POCnf) and secondary (SOCnf) fractions contributing 26–44% and 13–23% to the total carbon (TC), respectively. 5–8% of the TC had a primary fossil origin (POCf), whereas the contribution of fossil secondary organic carbon (SOCf) was 4–13%. Non-fossil EC (ECnf) and fossil EC (ECf) ranged from 13–24% and 7–13%, respectively. Isotope ratios of stable carbon and nitrogen isotopes were used to distinguish aerosol particles associated with solid and liquid fossil fuel burning.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-04
Abstract:
In recent years, the boundaries between e-commerce and social networking have become increasingly blurred. Many e-commerce websites support social login, where users can sign in to the websites using their social network identities such as their Facebook or Twitter accounts. Users can also post their newly purchased products on microblogs with links to the e-commerce product web pages. In this paper, we propose a novel solution for cross-site cold-start product recommendation, which aims to recommend products from e-commerce websites to users at social networking sites in 'cold-start' situations, a problem which has rarely been explored before. A major challenge is how to leverage knowledge extracted from social networking sites for cross-site cold-start product recommendation. We propose to use the linked users across social networking sites and e-commerce websites (users who have social networking accounts and have made purchases on e-commerce websites) as a bridge to map users' social networking features to another feature representation for product recommendation. Specifically, we propose learning both users' and products' feature representations (called user embeddings and product embeddings, respectively) from data collected from e-commerce websites using recurrent neural networks and then applying a modified gradient boosting trees method to transform users' social networking features into user embeddings. We then develop a feature-based matrix factorization approach which can leverage the learnt user embeddings for cold-start product recommendation. Experimental results on a large dataset constructed from the largest Chinese microblogging service Sina Weibo and the largest Chinese B2C e-commerce website JingDong have shown the effectiveness of our proposed framework.
Abstract:
We present in this article an automated framework that extracts product adopter information from online reviews and incorporates the extracted information into feature-based matrix factorization for more effective product recommendation. Specifically, we propose a bootstrapping approach for the extraction of product adopters from review text and categorize them into a number of different demographic categories. The aggregated demographic information of many product adopters can be used to characterize both products and users in the form of distributions over different demographic categories. We further propose a graph-based method to iteratively update user- and product-related distributions more reliably in a heterogeneous user-product graph and incorporate them as features into the matrix factorization approach for product recommendation. Our experimental results on a large dataset crawled from JINGDONG, the largest B2C e-commerce website in China, show that our proposed framework outperforms a number of competitive baselines for product recommendation.
Abstract:
As massive data sets become increasingly available, people are facing the problem of how to effectively process and understand these data. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, due both to the large size of the data sets and to their high dimensionality. Following the same direction as other MapReduce-based research, this dissertation aims to develop effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first one deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data-parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices with dimensions on the order of millions in MapReduce, based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce by carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
Abstract:
In recent years, the scientific community's interest in Non-negative Matrix Factorization (NMF) has grown. This method transforms a high-dimensional data set into a small collection of elements that carry their own semantics in the context of the analysis. In Bioinformatics, NMF is often used as the basis of some data clustering methods, which employ a statistical model to determine the most favourable number of classes. This model requires a large number of NMF runs with different input parameters, which represents an enormous computational workload. Most NMF implementations have become obsolete in the face of the constant growth of the data that the scientific community seeks to analyse, either because computing times lengthen to the point of becoming unfeasible or because the size of the data exceeds the system's resources. This doctoral thesis therefore focuses on the optimization and parallelization of NMF, not only at the theoretical level but with the aim of providing the scientific community with a new tool for the analysis of biological data. NMF exposes a high degree of data-level parallelism of variable granularity, while the clustering methods mentioned above exhibit compute-level parallelism, since the various NMF instances that are executed are independent. From a global point of view, therefore, a layered optimization model is proposed in which different high-performance technologies are employed...
Abstract:
Receptor modelling was performed on quadrupole unit mass resolution aerosol mass spectrometer (Q-AMS) sub-micron particulate matter (PM) chemical speciation measurements from Windsor, Ontario, an industrial city situated across the Detroit River from Detroit, Michigan. Aerosol and trace gas measurements were collected on board Environment Canada’s CRUISER mobile laboratory. Positive matrix factorization (PMF) was performed on the AMS full particle-phase mass spectrum (PMFFull MS) encompassing both organic and inorganic components. This approach was compared to the more common method of analysing only the organic mass spectra (PMFOrg MS). PMF of the full mass spectrum revealed that variability in the non-refractory sub-micron aerosol concentration and composition was best explained by six factors: an amine-containing factor (Amine); an ammonium sulphate and oxygenated organic aerosol containing factor (Sulphate-OA); an ammonium nitrate and oxygenated organic aerosol containing factor (Nitrate-OA); an ammonium chloride containing factor (Chloride); a hydrocarbon like organic aerosol (HOA) factor; and a moderately oxygenated organic aerosol factor (OOA). PMF of the organic mass spectrum revealed three factors of similar composition to some of those revealed through PMFFull MS: Amine, HOA and OOA. Including both the inorganic and organic mass proved to be a beneficial approach to analysing the unit mass resolution AMS data for several reasons. First, it provided a method for potentially calculating more accurate sub-micron PM mass concentrations, particularly when unusual factors are present, in this case, an Amine factor. As this method does not rely on a priori knowledge of chemical species, it circumvents the need for any adjustments to the traditional AMS species fragmentation patterns to account for atypical species, and can thus lead to more complete factor profiles. 
It is expected that this method would be even more useful for HR-ToF-AMS data, due to the ability to better understand the chemical nature of atypical factors from high resolution mass spectra. Second, utilizing PMF to extract factors containing inorganic species allowed for the determination of extent of neutralization, which could have implications for aerosol parameterization. Third, subtler differences in organic aerosol components were resolved through the incorporation of inorganic mass into the PMF matrix. The additional temporal features provided by the inorganic aerosol components allowed for the resolution of more types of oxygenated organic aerosol than could be reliably resolved from PMF of organics alone. Comparison of findings from the PMFFull MS and PMFOrg MS methods showed that for the Windsor airshed, the PMFFull MS method enabled additional conclusions to be drawn in terms of aerosol sources and chemical processes. While performing PMFOrg MS can provide important distinctions between types of organic aerosol, it is shown that including inorganic species in the PMF analysis can permit further apportionment of organics for unit mass resolution AMS mass spectra.
Abstract:
Ambient wintertime background urban aerosol in Cork city, Ireland, was characterized using aerosol mass spectrometry. During the three-week measurement study in 2009, 93% of the ca. 1 350 000 single particles characterized by an Aerosol Time-of-Flight Mass Spectrometer (TSI ATOFMS) were classified into five organic-rich particle types, internally mixed to different proportions with elemental carbon (EC), sulphate and nitrate, while the remaining 7% was predominantly inorganic in nature. Non-refractory PM1 aerosol was characterized using a High Resolution Time-of-Flight Aerosol Mass Spectrometer (Aerodyne HR-ToF-AMS) and was also found to comprise organic aerosol as the most abundant species (62 %), followed by nitrate (15 %), sulphate (9 %) and ammonium (9 %), and chloride (5 %). Positive matrix factorization (PMF) was applied to the HR-ToF-AMS organic matrix, and a five-factor solution was found to describe the variance in the data well. Specifically, "hydrocarbon-like" organic aerosol (HOA) comprised 20% of the mass, "low-volatility" oxygenated organic aerosol (LV-OOA) comprised 18 %, "biomass burning" organic aerosol (BBOA) comprised 23 %, non-wood solid-fuel combustion "peat and coal" organic aerosol (PCOA) comprised 21 %, and finally a species type characterized by primary m/z peaks at 41 and 55, similar to previously reported "cooking" organic aerosol (COA), but possessing different diurnal variations to what would be expected for cooking activities, contributed 18 %. Correlations between the different particle types obtained by the two aerosol mass spectrometers are also discussed. 
Despite wood, coal and peat being minor fuel types used for domestic space heating in urban areas, their relatively low combustion efficiencies result in a significant contribution to PM1 aerosol mass (44% and 28% of the total organic aerosol mass and non-refractory total PM1, respectively).
Abstract:
Understanding the impact of atmospheric black carbon (BC) containing particles on human health and radiative forcing requires knowledge of the mixing state of BC, including the characteristics of the materials with which it is internally mixed. In this study, we demonstrate for the first time the capabilities of the Aerodyne Soot-Particle Aerosol Mass Spectrometer equipped with a light scattering module (LS-SP-AMS) to examine the mixing state of refractory BC (rBC) and other aerosol components in an urban environment (downtown Toronto). K-means clustering analysis was used to classify single particle mass spectra into chemically distinct groups. One resultant cluster is dominated by rBC mass spectral signals (C₁⁺ to C₅⁺) while the organic signals fall into a few major clusters, identified as hydrocarbon-like organic aerosol (HOA), oxygenated organic aerosol (OOA), and cooking emission organic aerosol (COA). A nearly external mixing is observed with small BC particles only thinly coated by HOA (28% by mass on average), while over 90% of the HOA-rich particles did not contain detectable amounts of rBC. Most of the particles classified into other inorganic and organic clusters were not significantly associated with BC. The single particle results also suggest that HOA and COA emitted from anthropogenic sources were likely major contributors to organic-rich particles with low to mid-range aerodynamic diameter (dva). The similar temporal profiles and mass spectral features of the organic clusters and the factors from a positive matrix factorization (PMF) analysis of the ensemble aerosol dataset validate the conventional interpretation of the PMF results.
Abstract:
The first long-term aerosol sampling and chemical characterization results from measurements at the Cape Verde Atmospheric Observatory (CVAO) on the island of São Vicente are presented and are discussed with respect to air mass origin and seasonal trends. In total 671 samples were collected using a high-volume PM10 sampler on quartz fiber filters from January 2007 to December 2011. The samples were analyzed for their aerosol chemical composition, including their ionic and organic constituents. Back trajectory analyses showed that the aerosol at CVAO was strongly influenced by emissions from Europe and Africa, with the latter often responsible for high mineral dust loading. Sea salt and mineral dust dominated the aerosol mass and made up in total about 80% of the aerosol mass. The 5-year PM10 mean was 47.1 ± 55.5 µg/m³, while the mineral dust and sea salt means were 27.9 ± 48.7 and 11.1 ± 5.5 µg/m³, respectively. Non-sea-salt (nss) sulfate made up 62% of the total sulfate and originated from both long-range transport from Africa or Europe and marine sources. Strong seasonal variation was observed for the aerosol components. While nitrate showed no clear seasonal variation with an annual mean of 1.1 ± 0.6 µg/m³, the aerosol mass, OC (organic carbon) and EC (elemental carbon) showed strong winter maxima due to strong influence of African air mass inflow. Additionally during summer, elevated concentrations of OM were observed originating from marine emissions. A summer maximum was observed for non-sea-salt sulfate and was connected to periods when air mass inflow was predominantly of marine origin, indicating that marine biogenic emissions were a significant source. Ammonium showed a distinct maximum in spring and coincided with ocean surface water chlorophyll a concentrations.
Good correlations were also observed between nss-sulfate and oxalate during the summer and winter seasons, indicating a likely photochemical in-cloud processing of the marine and anthropogenic precursors of these species. High temporal variability was observed in both chloride and bromide depletion, differing significantly within the seasons, air mass history and Saharan dust concentration. Chloride (bromide) depletion varied from 8.8 ± 8.5% (62 ± 42%) in Saharan-dust-dominated air mass to 30 ± 12% (87 ± 11%) in polluted Europe air masses. During summer, bromide depletion often reached 100% in marine as well as in polluted continental samples. In addition to the influence of the aerosol acidic components, photochemistry was one of the main drivers of halogenide depletion during the summer; while during dust events, displacement reaction with nitric acid was found to be the dominant mechanism. Positive matrix factorization (PMF) analysis identified three major aerosol sources: sea salt, aged sea salt and long-range transport. The ionic budget was dominated by the first two of these factors, while the long-range transport factor could only account for about 14% of the total observed ionic mass.
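The chloride depletion percentages quoted in this abstract are conventionally computed by comparing measured chloride with the amount expected from sodium at a bulk seawater ratio. A minimal sketch of that arithmetic follows. The Cl⁻/Na⁺ mass ratio of 1.8 is an assumption of this sketch (a commonly used seawater value), the abstract does not state which ratio the authors applied, and the function name is hypothetical.

```python
SEAWATER_CL_TO_NA = 1.8  # assumed Cl-/Na+ mass ratio of bulk seawater

def halide_depletion_percent(halide, sodium, seawater_ratio=SEAWATER_CL_TO_NA):
    """Percent of sea-salt halide missing relative to the seawater expectation.

    `halide` and `sodium` are mass concentrations in the same units."""
    expected = seawater_ratio * sodium
    return 100.0 * (expected - halide) / expected

# A sample with 1.0 ug/m3 sodium and 0.9 ug/m3 chloride has lost half
# of the chloride expected from sea salt under the assumed ratio.
depletion = halide_depletion_percent(halide=0.9, sodium=1.0)
```

The same function could be applied to bromide by passing the corresponding seawater Br⁻/Na⁺ ratio as `seawater_ratio`.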