921 results for non-negative matrix factorization
Abstract:
In this article, we study a new class of non-negative distributions generated by symmetric distributions around zero. For the special case of the distribution generated using the normal distribution, properties such as the moment generating function, a stochastic representation, and reliability connections are studied, along with inference via the method of moments and maximum likelihood. Moreover, a real data set is analyzed, illustrating that good fits can be obtained.
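The abstract does not spell out the construction; as a minimal illustrative sketch, assume the normal-based special case behaves like a folded (half-)normal variable |Z| with Z ~ N(0, σ²). Then E|Z| = σ√(2/π) yields a method-of-moments estimator, and maximum likelihood is available in SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical special case: fold a N(0, sigma^2) variable at zero.
sigma_true = 2.0
x = np.abs(rng.normal(0.0, sigma_true, size=5000))  # non-negative sample

# Method of moments: E|Z| = sigma * sqrt(2/pi)  =>  sigma_hat = mean * sqrt(pi/2)
sigma_mom = x.mean() * np.sqrt(np.pi / 2.0)

# Maximum likelihood via SciPy's half-normal family (location fixed at zero).
_, sigma_mle = stats.halfnorm.fit(x, floc=0.0)

print(f"MoM: {sigma_mom:.3f}  MLE: {sigma_mle:.3f}")
```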
Abstract:
This paper presents a fast part-based subspace selection algorithm, termed binary sparse non-negative matrix factorization (B-SNMF). Both the training and testing processes of B-SNMF are much faster than those of binary principal component analysis (B-PCA). Moreover, B-SNMF is more robust to occlusions in images. Experimental results on face images demonstrate the effectiveness and efficiency of the proposed B-SNMF.
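B-SNMF itself is not part of standard libraries; as a rough baseline for the part-based subspace idea, here is plain NMF from scikit-learn applied to a stack of vectorized face images (no binary or sparsity constraints, which are the paper's actual contribution; the data are stand-ins):

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical data: 400 face images of 32x32 pixels, flattened row-wise.
rng = np.random.default_rng(0)
X = rng.random((400, 32 * 32))  # stand-in for non-negative pixel intensities

# Plain NMF: X ~ W @ H with W, H >= 0; rows of H act as part-based basis images.
model = NMF(n_components=49, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(X)   # per-image coefficients (the subspace coordinates)
H = model.components_        # 49 non-negative basis "parts"

# Project a new image into the learned subspace for recognition/testing.
x_new = rng.random((1, 32 * 32))
w_new = model.transform(x_new)
```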
Abstract:
Mathematics Subject Classification: Primary 47A60, 47D06.
Abstract:
In the last decade, large numbers of social media services have emerged and become widely used in people's daily lives as important tools for sharing and acquiring information. With a substantial amount of user-contributed text data on social media, it becomes necessary to develop methods and tools for analyzing this emerging type of text, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics over the last several decades has mainly focused on traditional types of text such as emails, news, and academic literature, and several issues critical to text data on social media have not been well explored: 1) how to detect sentiment in text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect sentiment in text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling effort for this new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating-set-based framework and a learning-to-rank-based framework. The dominating-set-based framework can be applied to different types of summarization problems, while the learning-to-rank-based framework helps utilize existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games as an example of how to better utilize social media data.
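The dual active supervision scheme is specific to the dissertation, but the underlying tri-factorization X ≈ F S Gᵀ with non-negative factors can be sketched with generic multiplicative updates for the squared Frobenius objective (a textbook version, not the authors' exact algorithm):

```python
import numpy as np

def tri_nmf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Generic non-negative tri-factorization X ~ F @ S @ G.T (all factors >= 0)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k1))
    S = rng.random((k1, k2))
    G = rng.random((m, k2))
    for _ in range(n_iter):
        # Multiplicative updates derived from the gradients of ||X - F S G^T||^2.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G

# Toy document-term matrix: F clusters documents, G clusters terms.
X = np.abs(np.random.default_rng(1).random((100, 50)))
F, S, G = tri_nmf(X, k1=5, k2=4)
```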
Abstract:
This paper is concerned with tensor clustering assisted by dimensionality reduction. A class of formulations for tensor clustering is introduced based on Tucker tensor decomposition models. In this formulation, an extra tensor mode is formed by stacking a collection of tensors of the same dimensions, which is then used to assist a Tucker decomposition in achieving dimensionality reduction. We design two types of clustering models for the tensors, a PCA Tensor Clustering model and a Non-negative Tensor Clustering model, by utilizing different regularizations. The tensor clustering problem can thus be solved by an optimization method based on an alternating coordinate scheme. Interestingly, our experiments show that the proposed models yield comparable or even better performance than most recent clustering algorithms based on matrix factorization.
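A hedged sketch of the stacking formulation, assuming the TensorLy library is available (this is generic non-negative Tucker code, not the paper's two regularized models): stack N same-sized tensors along a new first mode, decompose, and cluster the rows of the mode-0 factor, which embed the samples in the reduced space.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_tucker
from sklearn.cluster import KMeans

# Hypothetical data: 60 samples, each a 10x12 non-negative matrix,
# stacked along a new first mode as in the paper's formulation.
rng = np.random.default_rng(0)
X = tl.tensor(rng.random((60, 10, 12)))

# Non-negative Tucker: the mode-0 factor embeds each sample in 3 dimensions.
core, factors = non_negative_tucker(X, rank=[3, 4, 4])
sample_embedding = tl.to_numpy(factors[0])        # shape (60, 3)

# Cluster the samples in the reduced space.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(sample_embedding)
```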
Abstract:
Results of a systematic study concerning non-spectral interferences from sulfuric acid containing matrices on a large number of elements in inductively coupled plasma–mass spectrometry (ICP-MS) are presented in this work. The signals obtained with sulfuric acid solutions of different concentrations (up to 5% w w−1) have been compared with the corresponding signals for a 1% w w−1 nitric acid solution at different experimental conditions (i.e., sample uptake rates, nebulizer gas flows and r.f. powers). The signals observed for 128Te+, 78Se+ and 75As+ were significantly higher when using sulfuric acid matrices (up to 2.2-fold for 128Te+ and 78Se+ and 1.8-fold for 75As+ in the presence of 5% w w−1 sulfuric acid) for the whole range of experimental conditions tested. This is in agreement with previously reported observations. The signal for 31P+ is also higher (1.1-fold) in the presence of sulfuric acid. The signal enhancements for 128Te+, 78Se+, 75As+ and 31P+ are explained in relation to an increase in the analyte ion population as a result of charge transfer reactions involving S+ species in the plasma. Theoretical data suggest that Os, Sb, Pt, Ir, Zn and Hg could also be involved in sulfur-based charge transfer reactions, but no experimental evidence has been found. The presence of sulfuric acid gives rise to lower ion signals (about 10–20% lower) for the other nuclides tested, thus indicating the negative matrix effect caused by changes in the amount of analyte loading of the plasma. The elemental composition of a certified low-density polyethylene sample (ERM-EC681K) was determined by ICP-MS after two different sample digestion procedures, one of them including sulfuric acid. Element concentrations were in agreement with the certified values, irrespective of the acids used for the digestion. These results demonstrate that the use of matrix-matched standards allows the accurate determination of the tested elements in a sulfuric acid matrix.
Abstract:
In early spring the Baltic region is frequently affected by high-pollution events due to biomass burning in that area. Here we present a comprehensive study to investigate the impact of biomass/grass burning (BB) on the evolution and composition of aerosol in Preila, Lithuania, during springtime open fires. Non-refractory submicron particulate matter (NR-PM1) was measured by an Aerodyne aerosol chemical speciation monitor (ACSM), and source apportionment with the multilinear engine (ME-2) running the positive matrix factorization (PMF) model was applied to the organic aerosol fraction to investigate the impact of biomass/grass burning. Satellite observations over regions of biomass burning activity supported the results and the identification of air mass transport to the area of investigation. Sharp increases in biomass burning tracers, such as levoglucosan up to 683 ng m−3 and black carbon (BC) up to 17 μg m−3, were observed during this period. A further separation between fossil and non-fossil primary and secondary contributions was obtained by coupling ACSM PMF results with radiocarbon (14C) measurements of the elemental (EC) and organic (OC) carbon fractions. Non-fossil organic carbon (OCnf) was the dominant fraction of PM1, with the primary (POCnf) and secondary (SOCnf) fractions contributing 26–44% and 13–23% of the total carbon (TC), respectively. 5–8% of the TC had a primary fossil origin (POCf), whereas the contribution of fossil secondary organic carbon (SOCf) was 4–13%. Non-fossil EC (ECnf) and fossil EC (ECf) ranged from 13–24% and 7–13%, respectively. Ratios of stable carbon and nitrogen isotopes were used to distinguish aerosol particles associated with solid and liquid fossil fuel burning.
Abstract:
2000 Mathematics Subject Classification: 20M20, 20M10.
Abstract:
Receptor modelling was performed on quadrupole unit mass resolution aerosol mass spectrometer (Q-AMS) sub-micron particulate matter (PM) chemical speciation measurements from Windsor, Ontario, an industrial city situated across the Detroit River from Detroit, Michigan. Aerosol and trace gas measurements were collected on board Environment Canada's CRUISER mobile laboratory. Positive matrix factorization (PMF) was performed on the AMS full particle-phase mass spectrum (PMF-Full MS), encompassing both organic and inorganic components. This approach was compared to the more common method of analysing only the organic mass spectra (PMF-Org MS). PMF of the full mass spectrum revealed that variability in the non-refractory sub-micron aerosol concentration and composition was best explained by six factors: an amine-containing factor (Amine); an ammonium sulphate and oxygenated organic aerosol containing factor (Sulphate-OA); an ammonium nitrate and oxygenated organic aerosol containing factor (Nitrate-OA); an ammonium chloride containing factor (Chloride); a hydrocarbon-like organic aerosol (HOA) factor; and a moderately oxygenated organic aerosol factor (OOA). PMF of the organic mass spectrum revealed three factors of similar composition to some of those revealed through PMF-Full MS: Amine, HOA and OOA. Including both the inorganic and organic mass proved to be a beneficial approach to analysing the unit mass resolution AMS data for several reasons. First, it provided a method for potentially calculating more accurate sub-micron PM mass concentrations, particularly when unusual factors are present, in this case an Amine factor. As this method does not rely on a priori knowledge of chemical species, it circumvents the need for any adjustments to the traditional AMS species fragmentation patterns to account for atypical species, and can thus lead to more complete factor profiles. It is expected that this method would be even more useful for HR-ToF-AMS data, due to the ability to better understand the chemical nature of atypical factors from high resolution mass spectra. Second, utilizing PMF to extract factors containing inorganic species allowed for the determination of the extent of neutralization, which could have implications for aerosol parameterization. Third, subtler differences in organic aerosol components were resolved through the incorporation of inorganic mass into the PMF matrix. The additional temporal features provided by the inorganic aerosol components allowed for the resolution of more types of oxygenated organic aerosol than could be reliably resolved from PMF of organics alone. Comparison of findings from the PMF-Full MS and PMF-Org MS methods showed that for the Windsor airshed, the PMF-Full MS method enabled additional conclusions to be drawn in terms of aerosol sources and chemical processes. While performing PMF-Org MS can provide important distinctions between types of organic aerosol, it is shown that including inorganic species in the PMF analysis can permit further apportionment of organics for unit mass resolution AMS mass spectra.
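PMF is, at its core, an uncertainty-weighted non-negative factorization: it minimizes Σ_ij ((x_ij − Σ_k g_ik f_kj) / σ_ij)² with G, F ≥ 0. A minimal sketch with generic multiplicative updates for this weighted objective follows (real analyses use dedicated solvers such as ME-2; all data here are stand-ins):

```python
import numpy as np

def pmf(X, sigma, k, n_iter=500, eps=1e-12, seed=0):
    """Uncertainty-weighted NMF: minimize ||(X - G @ F) / sigma||_F^2, G, F >= 0."""
    rng = np.random.default_rng(seed)
    W = 1.0 / sigma**2                       # element-wise weights
    G = rng.random((X.shape[0], k))          # factor contributions (time x factor)
    F = rng.random((k, X.shape[1]))          # factor profiles (factor x m/z)
    for _ in range(n_iter):
        R = G @ F
        G *= ((W * X) @ F.T) / ((W * R) @ F.T + eps)
        R = G @ F
        F *= (G.T @ (W * X)) / (G.T @ (W * R) + eps)
    return G, F

# Toy mass-spectral matrix: 200 time steps x 120 m/z channels.
rng = np.random.default_rng(1)
X = rng.random((200, 120))
sigma = 0.05 + 0.1 * np.sqrt(X)              # stand-in uncertainty matrix
G, F = pmf(X, sigma, k=6)                    # six factors, as in the study
```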
Abstract:
We describe in detail the theory underpinning the measurement of density matrices of a pair of quantum two-level systems (qubits). Our particular emphasis is on qubits realized by the two polarization degrees of freedom of a pair of entangled photons generated in a down-conversion experiment; however, the discussion applies in general, regardless of the actual physical realization. Two techniques are discussed, namely, a tomographic reconstruction (in which the density matrix is linearly related to a set of measured quantities) and a maximum likelihood technique which requires numerical optimization (but has the advantage of producing density matrices that are always non-negative definite). In addition, a detailed error analysis is presented, allowing errors in quantities derived from the density matrix, such as the entropy or entanglement of formation, to be estimated. Examples based on down-conversion experiments are used to illustrate our results.
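The maximum likelihood technique can be sketched for a single qubit (the two-qubit case is analogous, with a 4×4 triangular T and 16 real parameters): writing ρ = T†T / tr(T†T) with T lower-triangular makes ρ Hermitian, non-negative definite and unit-trace by construction. The measurement settings and frequencies below are illustrative, not taken from the experiment:

```python
import numpy as np
from scipy.optimize import minimize

# Projectors for three illustrative measurement settings.
P = [np.array([[1, 0], [0, 0]], complex),                    # |0><0|
     np.array([[0.5, 0.5], [0.5, 0.5]], complex),            # |+><+|
     np.array([[0.5, -0.5j], [0.5j, 0.5]], complex)]         # |+i><+i|
n_meas = np.array([0.48, 0.90, 0.52])                        # hypothetical frequencies

def rho_from(t):
    # Lower-triangular T gives rho = T^dag T / tr(T^dag T): Hermitian,
    # positive semidefinite, unit trace by construction.
    T = np.array([[t[0], 0], [t[2] + 1j * t[3], t[1]]])
    R = T.conj().T @ T
    return R / np.trace(R).real

def neg_log_like(t):
    rho = rho_from(t)
    p = np.array([np.trace(rho @ Pk).real for Pk in P]).clip(1e-9, 1 - 1e-9)
    # Binomial log-likelihood per setting (overall count scale omitted).
    return -np.sum(n_meas * np.log(p) + (1 - n_meas) * np.log(1 - p))

res = minimize(neg_log_like, x0=[0.7, 0.7, 0.0, 0.0], method="Nelder-Mead")
rho_hat = rho_from(res.x)   # always a valid (non-negative definite) density matrix
```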
Abstract:
Master's internship report in Teaching of the 1st and 2nd Cycles of Basic Education
Abstract:
Intensity-modulated radiotherapy (IMRT) treatment plan verification by comparison with measured data requires access to the linear accelerator and is time consuming. In this paper, we propose a method for monitor unit (MU) calculation and plan comparison for step-and-shoot IMRT based on the Monte Carlo code EGSnrc/BEAMnrc. The beamlets of an IMRT treatment plan are individually simulated using Monte Carlo and converted into absorbed dose to water per MU. The dose of the whole treatment can be expressed through a linear matrix equation in the MUs and the dose per MU of every beamlet. Owing to the positivity of the absorbed dose and MU values, this equation is solved for the MU values using a non-negative least-squares fit optimization algorithm (NNLS). The Monte Carlo plan is formed by multiplying the Monte Carlo absorbed dose to water per MU by the Monte Carlo/NNLS MUs. Several treatment plans for different localizations calculated with a commercial treatment planning system (TPS) are compared with the proposed method for validation. The Monte Carlo/NNLS MUs are close to those calculated by the TPS and lead to a treatment dose distribution that is clinically equivalent to the one calculated by the TPS. This procedure can be used for IMRT QA, and further development could allow this technique to be used for other radiotherapy techniques such as tomotherapy or volumetric modulated arc therapy.
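The linear step can be sketched with SciPy's NNLS solver; the names and sizes below are hypothetical. D holds the Monte Carlo absorbed dose to water per MU for each beamlet at each voxel, d is the reference (TPS-like) dose, and the fit returns the non-negative MU vector:

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical system: 5000 voxels, 40 beamlets.
rng = np.random.default_rng(0)
D = rng.random((5000, 40))          # dose-to-water per MU, voxel x beamlet
mu_true = rng.uniform(0, 20, 40)    # unknown monitor units
d = D @ mu_true                     # TPS-like target dose per voxel

mu_fit, residual = nnls(D, d)       # least squares subject to mu >= 0
dose_mc = D @ mu_fit                # Monte Carlo plan = dose per MU times fitted MU
```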
Abstract:
Intensity-modulated radiotherapy (IMRT) is a treatment technique that uses beams with modulated fluence. IMRT is now widespread in industrialized countries, owing to its improved dose conformation around the target volume and its ability to lower doses to organs at risk in complex clinical cases. One way to carry out beam modulation is to sum smaller beams (beamlets) with the same incidence; this technique is called step-and-shoot IMRT. In a clinical context, it is necessary to verify treatment plans before the first irradiation, and plan verification is still an open issue for this technique. An independent monitor unit calculation (representative of the weight of each beamlet) cannot be performed for step-and-shoot IMRT, because the beamlet weights are not known a priori but are computed by inverse planning. Besides, treatment plan verification by comparison with measured data is time consuming and is performed in a simplified geometry, usually a cubic water phantom with all machine angles set to zero. In this work, an independent method for monitor unit calculation for step-and-shoot IMRT is described. This method is based on the Monte Carlo code EGSnrc/BEAMnrc. The Monte Carlo model of the head of the linear accelerator was validated by comparing simulated and measured dose distributions in a large range of situations. The beamlets of an IMRT treatment plan are calculated individually by Monte Carlo in the exact geometry of the treatment. The dose distributions of the beamlets are then converted into absorbed dose to water per monitor unit. The dose of the whole treatment in each volume element (voxel) can be expressed through a linear matrix equation in the monitor units and the dose per monitor unit of every beamlet. This equation is solved by a non-negative least-squares fit algorithm (NNLS). However, not every voxel inside the patient volume can be used to solve this equation, because of computational limitations. Several ways of selecting voxels were tested, and the best choice consists of using the voxels inside the Planning Target Volume (PTV), as in the sketch below. The method presented in this work was tested on eight clinical cases representative of usual radiotherapy treatments. The monitor units obtained lead to clinically equivalent global dose distributions. Thus, this independent monitor unit calculation method for step-and-shoot IMRT is validated and can be used in clinical routine. It would be possible to apply a similar method to other treatment modalities, such as tomotherapy or volumetric modulated arc therapy.
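The voxel-selection detail can be layered onto the NNLS sketch shown earlier: restrict the rows of the linear system to the voxels inside the PTV before solving (a minimal sketch with hypothetical names):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
D = rng.random((5000, 40))                 # dose-to-water per MU, voxel x beamlet
d = D @ rng.uniform(0, 20, 40)             # target dose per voxel

# Boolean mask marking voxels inside the Planning Target Volume (assumed given).
ptv_mask = rng.random(5000) < 0.1

# Solve only on the PTV voxels, the selection found to work best in this work.
mu_fit, _ = nnls(D[ptv_mask], d[ptv_mask])
```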
Abstract:
The objective of this thesis is to present multivariate time series models involving random vectors whose components are all non-negative. We consider the vMEM models (vector multiplicative error models with non-negative errors) presented by Cipollini, Engle and Gallo (2006) and Cipollini and Gallo (2010). These models generalize to the multivariate case the MEM models introduced by Engle (2002), and they find applications notably in financial time series. vMEM models can describe time series of asset volumes, durations, and conditional variances, to cite only these applications. They also allow joint modelling and the study of the dynamics between the time series forming the system under study. To model multivariate time series with non-negative components, several specifications of the vector error term have been proposed in the literature. A first approach considers random vectors whose error distribution is such that each component is non-negative. However, finding a sufficiently flexible multivariate distribution defined on the positive support is rather difficult, at least for the applications cited above. As noted by Cipollini, Engle and Gallo (2006), a possible candidate is a multivariate gamma distribution, which, however, imposes severe restrictions on the contemporaneous correlations between the variables. Given these limited possibilities, one approach is to use copula theory: marginal distributions with non-negative support are specified, and a copula function accounts for the dependence between the components. One possible estimation technique is maximum likelihood. An alternative is the generalized method of moments (GMM). The latter has the advantage of being semi-parametric, in the sense that, unlike the approach imposing a multivariate law, it is not necessary to specify a multivariate distribution for the error term. In general, estimating vMEM models is complicated: existing algorithms must cope with the large number of parameters and the elaborate nature of the likelihood function, and in the GMM case the system to be solved also requires non-linear solvers. In this thesis, considerable effort was devoted to writing computer code (in the R language) to estimate the different parameters of the model. In the first chapter, we define stationary processes, autoregressive processes, autoregressive conditionally heteroskedastic (ARCH) processes and generalized ARCH (GARCH) processes. We also present ACD duration models and MEM models. In the second chapter, we present the copula theory needed for our work, in the framework of vector multiplicative error models with non-negative errors (vMEM), and we discuss possible estimation methods. In the third chapter, we discuss simulation results for several estimation methods. In the last chapter, applications to financial series are presented. The R code is provided in an appendix. A conclusion completes the thesis.
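The thesis's R code is not reproduced here; as an illustration of the copula specification it describes, the following Python sketch simulates a bivariate vMEM(1,1) whose error vector has unit-mean gamma marginals coupled by a Gaussian copula (all parameter values are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
T, d = 1000, 2

# vMEM(1,1): x_t = mu_t * eps_t,  mu_t = omega + A x_{t-1} + B mu_{t-1}.
omega = np.array([0.1, 0.2])
A = np.array([[0.15, 0.05], [0.02, 0.20]])
B = np.array([[0.70, 0.00], [0.00, 0.65]])

# Gaussian copula with correlation 0.5; gamma marginals scaled to unit mean.
corr = np.array([[1.0, 0.5], [0.5, 1.0]])
L = np.linalg.cholesky(corr)
shape = np.array([4.0, 6.0])                 # gamma shapes; scale = 1/shape

x = np.zeros((T, d))
mu = np.linalg.solve(np.eye(d) - A - B, omega)   # unconditional mean as start
for t in range(T):
    u = stats.norm.cdf(L @ rng.standard_normal(d))          # copula uniforms
    eps = stats.gamma.ppf(u, a=shape, scale=1.0 / shape)    # unit-mean, non-negative
    x[t] = mu * eps
    mu = omega + A @ x[t] + B @ mu
```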