966 resultados para Positive Matrix Factorization


Relevância:

100.00% 100.00%

Publicador:

Resumo:

When document corpus is very large, we often need to reduce the number of features. But it is not possible to apply conventional Non-negative Matrix Factorization(NMF) on billion by million matrix as the matrix may not fit in memory. Here we present novel Online NMF algorithm. Using Online NMF, we reduced original high-dimensional space to low-dimensional space. Then we cluster all the documents in reduced dimension using k-means algorithm. We experimentally show that by processing small subsets of documents we will be able to achieve good performance. The method proposed outperforms existing algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Clustering techniques which can handle incomplete data have become increasingly important due to varied applications in marketing research, medical diagnosis and survey data analysis. Existing techniques cope up with missing values either by using data modification/imputation or by partial distance computation, often unreliable depending on the number of features available. In this paper, we propose a novel approach for clustering data with missing values, which performs the task by Symmetric Non-Negative Matrix Factorization (SNMF) of a complete pair-wise similarity matrix, computed from the given incomplete data. To accomplish this, we define a novel similarity measure based on Average Overlap similarity metric which can effectively handle missing values without modification of data. Further, the similarity measure is more reliable than partial distances and inherently possesses the properties required to perform SNMF. The experimental evaluation on real world datasets demonstrates that the proposed approach is efficient, scalable and shows significantly better performance compared to the existing techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a hierarchical probabilistic model for ordinal matrix factorization. Unlike previous approaches, we model the ordinal nature of the data and take a principled approach to incorporating priors for the hidden variables. Two algorithms are presented for inference, one based on Gibbs sampling and one based on variational Bayes. Importantly, these algorithms may be implemented in the factorization of very large matrices with missing entries. The model is evaluated on a collaborative filtering task, where users have rated a collection of movies and the system is asked to predict their ratings for other movies. The Netflix data set is used for evaluation, which consists of around 100 million ratings. Using root mean-squared error (RMSE) as an evaluation metric, results show that the suggested model outperforms alternative factorization techniques. Results also show how Gibbs sampling outperforms variational Bayes on this task, despite the large number of ratings and model parameters. Matlab implementations of the proposed algorithms are available from cogsys.imm.dtu.dk/ordinalmatrixfactorization.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider the three-particle scattering S-matrix for the Landau-Lifshitz model by directly computing the set of the Feynman diagrams up to the second order. We show, following the analogous computations for the non-linear Schrdinger model [1, 2], that the three-particle S-matrix is factorizable in the first non-trivial order.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Medical imaging has become an absolutely essential diagnostic tool for clinical practices; at present, pathologies can be detected with an earliness never before known. Its use has not only been relegated to the field of radiology but also, increasingly, to computer-based imaging processes prior to surgery. Motion analysis, in particular, plays an important role in analyzing activities or behaviors of live objects in medicine. This short paper presents several low-cost hardware implementation approaches for the new generation of tablets and/or smartphones for estimating motion compensation and segmentation in medical images. These systems have been optimized for breast cancer diagnosis using magnetic resonance imaging technology with several advantages over traditional X-ray mammography, for example, obtaining patient information during a short period. This paper also addresses the challenge of offering a medical tool that runs on widespread portable devices, both on tablets and/or smartphones to aid in patient diagnostics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a fast part-based subspace selection algorithm, termed the binary sparse nonnegative matrix factorization (B-SNMF). Both the training process and the testing process of B-SNMF are much faster than those of binary principal component analysis (B-PCA). Besides, B-SNMF is more robust to occlusions in images. Experimental results on face images demonstrate the effectiveness and the efficiency of the proposed B-SNMF.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Spectral unmixing (SU) is a technique to characterize mixed pixels of the hyperspectral images measured by remote sensors. Most of the existing spectral unmixing algorithms are developed using the linear mixing models. Since the number of endmembers/materials present at each mixed pixel is normally scanty compared with the number of total endmembers (the dimension of spectral library), the problem becomes sparse. This thesis introduces sparse hyperspectral unmixing methods for the linear mixing model through two different scenarios. In the first scenario, the library of spectral signatures is assumed to be known and the main problem is to find the minimum number of endmembers under a reasonable small approximation error. Mathematically, the corresponding problem is called the $\ell_0$-norm problem which is NP-hard problem. Our main study for the first part of thesis is to find more accurate and reliable approximations of $\ell_0$-norm term and propose sparse unmixing methods via such approximations. The resulting methods are shown considerable improvements to reconstruct the fractional abundances of endmembers in comparison with state-of-the-art methods such as having lower reconstruction errors. In the second part of the thesis, the first scenario (i.e., dictionary-aided semiblind unmixing scheme) will be generalized as the blind unmixing scenario that the library of spectral signatures is also estimated. We apply the nonnegative matrix factorization (NMF) method for proposing new unmixing methods due to its noticeable supports such as considering the nonnegativity constraints of two decomposed matrices. Furthermore, we introduce new cost functions through some statistical and physical features of spectral signatures of materials (SSoM) and hyperspectral pixels such as the collaborative property of hyperspectral pixels and the mathematical representation of the concentrated energy of SSoM for the first few subbands. Finally, we introduce sparse unmixing methods for the blind scenario and evaluate the efficiency of the proposed methods via simulations over synthetic and real hyperspectral data sets. The results illustrate considerable enhancements to estimate the spectral library of materials and their fractional abundances such as smaller values of spectral angle distance (SAD) and abundance angle distance (AAD) as well.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The multi-criteria decision making methods, Preference METHods for Enrichment Evaluation (PROMETHEE) and Graphical Analysis for Interactive Assistance (GAIA), and the two-way Positive Matrix Factorization (PMF) receptor model were applied to airborne fine particle compositional data collected at three sites in Hong Kong during two monitoring campaigns held from November 2000 to October 2001 and November 2004 to October 2005. PROMETHEE/GAIA indicated that the three sites were worse during the later monitoring campaign, and that the order of the air quality at the sites during each campaign was: rural site > urban site > roadside site. The PMF analysis on the other hand, identified 6 common sources at all of the sites (diesel vehicle, fresh sea salt, secondary sulphate, soil, aged sea salt and oil combustion) which accounted for approximately 68.8 ± 8.7% of the fine particle mass at the sites. In addition, road dust, gasoline vehicle, biomass burning, secondary nitrate, and metal processing were identified at some of the sites. Secondary sulphate was found to be the highest contributor to the fine particle mass at the rural and urban sites with vehicle emission as a high contributor to the roadside site. The PMF results are broadly similar to those obtained in a previous analysis by PCA/APCS. However, the PMF analysis resolved more factors at each site than the PCA/APCS. In addition, the study demonstrated that combined results from multi-criteria decision making analysis and receptor modelling can provide more detailed information that can be used to formulate the scientific basis for mitigating air pollution in the region.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Purpose: To investigate the significance of sources around measurement sites, assist the development of control strategies for the important sources and mitigate the adverse effects of air pollution due to particle size. Methods: In this study, sampling was conducted at two sites located in urban/industrial and residential areas situated at roadsides along the Brisbane Urban Corridor. Ultrafine and fine particle measurements obtained at the two sites in June-July 2002 were analysed by Positive Matrix Factorization (PMF). Results: Six sources were present, including local traffic, two traffic sources, biomass burning, and two currently unidentified sources. Secondary particles had a significant impact at Site 1, while nitrates, peak traffic hours and main roads located close to the source also affected the results for both sites. Conclusions: This significant traffic corridor exemplifies the type of sources present in heavily trafficked locations and future attempts to control pollution in this type of environment could focus on the sources that were identified.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An Aerodyne Aerosol Mass Spectrometer was deployed at five urban schools to examine spatial and temporal variability of organic aerosols (OA) and positive matrix factorization (PMF) used for the first time in the Southern Hemisphere to apportion the sources of the OA across an urban area. The sources identified included hydrocarbon-like OA (HOA), biomass burning OA (BBOA) and oxygenated OA (OOA). At all sites, the main source was OOA, which accounted for 62–73% of the total OA mass and was generally more oxidized compared to those reported in the Northern Hemisphere. This suggests that there are differences in aging processes or regional sources in the two hemispheres. Unlike HOA and BBOA, OOA demonstrated instructive temporal variations but not spatial variation across the urban area. Application of cluster analysis to the PMF-derived sources offered a simple and effective method for qualitative comparison of PMF sources that can be used in other studies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Children are particularly susceptible to air pollution and schools are examples of urban microenvironments that can account for a large portion of children’s exposure to airborne particles. Thus this paper aimed to determine the sources of primary airborne particles that children are exposed to at school by analyzing selected organic molecular markers at 11 urban schools in Brisbane, Australia. Positive matrix factorization analysis identified four sources at the schools: vehicle emissions, biomass burning, meat cooking and plant wax emissions accounting for 45%, 29%, 16% and 7%, of the organic carbon respectively. Biomass burning peaked in winter due to prescribed burning of bushland around Brisbane. Overall, the results indicated that both local (traffic) and regional (biomass burning) sources of primary organic aerosols influence the levels of ambient particles that children are exposed at the schools. These results have implications for potential control strategies for mitigating exposure at schools.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis presents composition measurements for atmospherically relevant inorganic and organic aerosol from laboratory and ambient measurements using the Aerodyne aerosol mass spectrometer. Studies include the oxidation of dodecane in the Caltech environmental chambers, and several aircraft- and ground-based field studies, which include the quantification of wildfire emissions off the coast of California, and Los Angeles urban emissions.

The oxidation of dodecane by OH under low NO conditions and the formation of secondary organic aerosol (SOA) was explored using a gas-phase chemical model, gas-phase CIMS measurements, and high molecular weight ion traces from particle- phase HR-TOF-AMS mass spectra. The combination of these measurements support the hypothesis that particle-phase chemistry leading to peroxyhemiacetal formation is important. Positive matrix factorization (PMF) was applied to the AMS mass spectra which revealed three factors representing a combination of gas-particle partitioning, chemical conversion in the aerosol, and wall deposition.

Airborne measurements of biomass burning emissions from a chaparral fire on the central Californian coast were carried out in November 2009. Physical and chemical changes were reported for smoke ages 0 – 4 h old. CO2 normalized ammonium, nitrate, and sulfate increased, whereas the normalized OA decreased sharply in the first 1.5 - 2 h, and then slowly increased for the remaining 2 h (net decrease in normalized OA). Comparison to wildfire samples from the Yucatan revealed that factors such as relative humidity, incident UV radiation, age of smoke, and concentration of emissions are important for wildfire evolution.

Ground-based aerosol composition is reported for Pasadena, CA during the summer of 2009. The OA component, which dominated the submicron aerosol mass, was deconvolved into hydrocarbon-like organic aerosol (HOA), semi-volatile oxidized organic aerosol (SVOOA), and low-volatility oxidized organic aerosol (LVOOA). The HOA/OA was only 0.08–0.23, indicating that most of Pasadena OA in the summer months is dominated by oxidized OA resulting from transported emissions that have undergone photochemistry and/or moisture-influenced processing, as apposed to only primary organic aerosol emissions. Airborne measurements and model predictions of aerosol composition are reported for the 2010 CalNex field campaign.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Trace element measurements in PM10–2.5, PM2.5–1.0 and PM1.0–0.3 aerosol were performed with 2 h time resolution at kerbside, urban background and rural sites during the ClearfLo winter 2012 campaign in London. The environment-dependent variability of emissions was characterized using the Multilinear Engine implementation of the positive matrix factorization model, conducted on data sets comprising all three sites but segregated by size. Combining the sites enabled separation of sources with high temporal covariance but significant spatial variability. Separation of sizes improved source resolution by preventing sources occurring in only a single size fraction from having too small a contribution for the model to resolve. Anchor profiles were retrieved internally by analysing data subsets, and these profiles were used in the analyses of the complete data sets of all sites for enhanced source apportionment. A total of nine different factors were resolved (notable elements in brackets): in PM10–2.5, brake wear (Cu, Zr, Sb, Ba), other traffic-related (Fe), resuspended dust (Si, Ca), sea/road salt (Cl), aged sea salt (Na, Mg) and industrial (Cr, Ni); in PM2.5–1.0, brake wear, other traffic-related, resuspended dust, sea/road salt, aged sea salt and S-rich (S); and in PM1.0–0.3, traffic-related (Fe, Cu, Zr, Sb, Ba), resuspended dust, sea/road salt, aged sea salt, reacted Cl (Cl), S-rich and solid fuel (K, Pb). Human activities enhance the kerb-to-rural concentration gradients of coarse aged sea salt, typically considered to have a natural source, by 1.7–2.2. These site-dependent concentration differences reflect the effect of local resuspension processes in London. The anthropogenically influenced factors traffic (brake wear and other traffic-related processes), dust and sea/road salt provide further kerb-to-rural concentration enhancements by direct source emissions by a factor of 3.5–12.7. The traffic and dust factors are mainly emitted in PM10–2.5 and show strong diurnal variations with concentrations up to 4 times higher during rush hour than during night-time. Regionally influenced S-rich and solid fuel factors, occurring primarily in PM1.0–0.3, have negligible resuspension influences, and concentrations are similar throughout the day and across the regions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work, new tools in atmospheric pollutant sampling and analysis were applied in order to go deeper in source apportionment study. The project was developed mainly by the study of atmospheric emission sources in a suburban area influenced by a municipal solid waste incinerator (MSWI), a medium-sized coastal tourist town and a motorway. Two main research lines were followed. For what concerns the first line, the potentiality of the use of PM samplers coupled with a wind select sensor was assessed. Results showed that they may be a valid support in source apportionment studies. However, meteorological and territorial conditions could strongly affect the results. Moreover, new markers were investigated, particularly focusing on the processes of biomass burning. OC revealed a good biomass combustion process indicator, as well as all determined organic compounds. Among metals, lead and aluminium are well related to the biomass combustion. Surprisingly PM was not enriched of potassium during bonfire event. The second research line consists on the application of Positive Matrix factorization (PMF), a new statistical tool in data analysis. This new technique was applied to datasets which refer to different time resolution data. PMF application to atmospheric deposition fluxes identified six main sources affecting the area. The incinerator’s relative contribution seemed to be negligible. PMF analysis was then applied to PM2.5 collected with samplers coupled with a wind select sensor. The higher number of determined environmental indicators allowed to obtain more detailed results on the sources affecting the area. Vehicular traffic revealed the source of greatest concern for the study area. Also in this case, incinerator’s relative contribution seemed to be negligible. Finally, the application of PMF analysis to hourly aerosol data demonstrated that the higher the temporal resolution of the data was, the more the source profiles were close to the real one.