376 results for Collinear factorization
Abstract:
Ambient wintertime background urban aerosol in Cork city, Ireland, was characterized using aerosol mass spectrometry. During the three-week measurement study in 2009, 93% of the ca. 1 350 000 single particles characterized by an Aerosol Time-of-Flight Mass Spectrometer (TSI ATOFMS) were classified into five organic-rich particle types, internally mixed to different proportions with elemental carbon (EC), sulphate and nitrate, while the remaining 7% was predominantly inorganic in nature. Non-refractory PM1 aerosol was characterized using a High Resolution Time-of-Flight Aerosol Mass Spectrometer (Aerodyne HR-ToF-AMS) and was also found to comprise organic aerosol as the most abundant species (62 %), followed by nitrate (15 %), sulphate (9 %) and ammonium (9 %), and chloride (5 %). Positive matrix factorization (PMF) was applied to the HR-ToF-AMS organic matrix, and a five-factor solution was found to describe the variance in the data well. Specifically, "hydrocarbon-like" organic aerosol (HOA) comprised 20% of the mass, "low-volatility" oxygenated organic aerosol (LV-OOA) comprised 18 %, "biomass burning" organic aerosol (BBOA) comprised 23 %, non-wood solid-fuel combustion "peat and coal" organic aerosol (PCOA) comprised 21 %, and finally a species type characterized by primary m/z peaks at 41 and 55, similar to previously reported "cooking" organic aerosol (COA), but possessing different diurnal variations to what would be expected for cooking activities, contributed 18 %. Correlations between the different particle types obtained by the two aerosol mass spectrometers are also discussed. Despite wood, coal and peat being minor fuel types used for domestic space heating in urban areas, their relatively low combustion efficiencies result in a significant contribution to PM1 aerosol mass (44% and 28% of the total organic aerosol mass and non-refractory total PM1, respectively).
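As a rough illustration of the factor-analysis step described above, the sketch below applies plain non-negative matrix factorization to a synthetic time-by-m/z organic matrix with five factors. It is only a simplified stand-in for PMF: real PMF analyses weight the residuals by measurement uncertainties, and the matrix, factor count and variable names here are hypothetical.

```python
# Illustrative sketch only: a simplified factorization of an AMS-style organic
# mass-spectral matrix. True PMF (as in the study) minimizes an uncertainty-weighted
# residual; plain NMF is used here as an unweighted stand-in.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((500, 120))          # hypothetical matrix: 500 time steps x 120 m/z channels

model = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
G = model.fit_transform(X)          # factor time series (contributions), shape (500, 5)
F = model.components_               # factor mass-spectral profiles, shape (5, 120)

# Fractional contribution of each factor to the total reconstructed organic signal
mass_per_factor = (G * F.sum(axis=1)).sum(axis=0)
print(mass_per_factor / mass_per_factor.sum())
```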
Abstract:
Understanding the impact of atmospheric black carbon (BC) containing particles on human health and radiative forcing requires knowledge of the mixing state of BC, including the characteristics of the materials with which it is internally mixed. In this study, we demonstrate for the first time the capabilities of the Aerodyne Soot-Particle Aerosol Mass Spectrometer equipped with a light scattering module (LS-SP-AMS) to examine the mixing state of refractory BC (rBC) and other aerosol components in an urban environment (downtown Toronto). K-means clustering analysis was used to classify single particle mass spectra into chemically distinct groups. One resultant cluster is dominated by rBC mass spectral signals (C1+ to C5+) while the organic signals fall into a few major clusters, identified as hydrocarbon-like organic aerosol (HOA), oxygenated organic aerosol (OOA), and cooking emission organic aerosol (COA). Nearly external mixing is observed, with small BC particles only thinly coated by HOA (28% by mass on average), while over 90% of the HOA-rich particles did not contain detectable amounts of rBC. Most of the particles classified into other inorganic and organic clusters were not significantly associated with BC. The single particle results also suggest that HOA and COA emitted from anthropogenic sources were likely major contributors to organic-rich particles with low to mid-range aerodynamic diameter (dva). The similar temporal profiles and mass spectral features of the organic clusters and the factors from a positive matrix factorization (PMF) analysis of the ensemble aerosol dataset validate the conventional interpretation of the PMF results.
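The clustering step can be pictured with a minimal k-means sketch like the one below; the spectra matrix, the choice of six clusters and the l1 normalization are hypothetical placeholders rather than the authors' LS-SP-AMS workflow.

```python
# Minimal sketch, not the authors' pipeline: k-means clustering of normalized
# single-particle mass spectra into chemically distinct groups.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

rng = np.random.default_rng(1)
spectra = rng.random((2000, 300))            # hypothetical: 2000 particles x 300 m/z bins
spectra = normalize(spectra, norm="l1")      # scale each spectrum to unit total signal

km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(spectra)
labels = km.labels_                           # cluster assignment per particle
profiles = km.cluster_centers_                # average spectrum of each cluster

# Cluster sizes, e.g. to identify an rBC-dominated or HOA-rich class by inspection
print(np.bincount(labels))
```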
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, yet uncertainty quantification remains central in the sciences, where the number of parameters to estimate often exceeds the sample size despite the huge increases in n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the grounds that "n = all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and it is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
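For readers unfamiliar with the connection, the snippet below builds the probability tensor implied by a PARAFAC-type latent class model, P(y1,...,yp) = sum_h pi_h prod_j lambda_j[h, y_j], which is exactly a low nonnegative-rank tensor; the dimensions and Dirichlet draws are arbitrary illustrations, not the models of Chapter 2.

```python
# Sketch of the PARAFAC / latent-class representation of a categorical joint pmf
# as a sum of k nonnegative rank-one tensors.
import numpy as np

rng = np.random.default_rng(2)
p, k, levels = 4, 3, [2, 3, 2, 4]            # hypothetical: 4 variables, 3 latent classes

pi = rng.dirichlet(np.ones(k))                             # latent class weights
lam = [rng.dirichlet(np.ones(d), size=k) for d in levels]  # per-class marginals, shape (k, d_j)

# Build the full probability tensor by summing rank-one components over classes
P = np.zeros(levels)
for h in range(k):
    outer = lam[0][h]
    for j in range(1, p):
        outer = np.multiply.outer(outer, lam[j][h])
    P += pi[h] * outer

print(P.shape, P.sum())    # (2, 3, 2, 4), entries sum to 1.0 up to rounding
```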
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
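One concrete instance of an approximating kernel of the kind analyzed in Chapter 6 is a Metropolis step whose log-likelihood is replaced by a scaled random-subset estimate; the toy sketch below does this for a normal mean with a flat prior. It is purely illustrative (the data, subset size and proposal scale are made up) and is not the thesis' algorithm.

```python
# Toy sketch of an approximating transition kernel (not the thesis' algorithm):
# random-walk Metropolis for a normal mean, with the exact log-likelihood replaced
# by a rescaled log-likelihood over one fixed random subset of the data.
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(1.5, 1.0, size=100_000)          # hypothetical data, unit variance
n, m = len(y), 1_000
sub = rng.choice(n, size=m, replace=False)      # subset drawn once; kernel uses it throughout

def approx_loglik(theta):                       # flat prior; (n/m) rescales the subset sum
    return (n / m) * np.sum(-0.5 * (y[sub] - theta) ** 2)

theta = 0.0
ll = approx_loglik(theta)
chain = []
for _ in range(5_000):
    prop = theta + 0.01 * rng.normal()
    ll_prop = approx_loglik(prop)
    if np.log(rng.random()) < ll_prop - ll:     # Metropolis accept/reject
        theta, ll = prop, ll_prop
    chain.append(theta)

print(np.mean(chain[2_000:]))                    # close to the sample mean of y[sub] (~1.5)
```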
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
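For concreteness, a minimal truncated-normal (Albert-Chib style) data augmentation Gibbs sampler for probit regression is sketched below, with a rare-event intercept so that the slow mixing described above can be seen in the autocorrelation of the draws; the prior, dimensions and data are hypothetical.

```python
# Minimal Albert-Chib style data augmentation Gibbs sampler for probit regression,
# i.e. the truncated-normal sampler discussed above (illustrative, not the thesis code).
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(4)
n, p = 2_000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([-2.5, 1.0, 0.5])                 # intercept chosen so successes are rare
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

V = np.linalg.inv(X.T @ X + np.eye(p) / 100.0)         # posterior covariance, N(0, 100 I) prior
L = np.linalg.cholesky(V)
beta = np.zeros(p)
draws = []
for _ in range(2_000):
    mu = X @ beta
    # z_i | y_i, beta ~ N(mu_i, 1) truncated to (0, inf) if y_i = 1, else (-inf, 0]
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    beta = V @ (X.T @ z) + L @ rng.normal(size=p)      # beta | z ~ N(V X'z, V)
    draws.append(beta.copy())

print(np.mean(draws[500:], axis=0))                     # rough posterior mean; mixing is slow here
```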
Abstract:
Continuous variables are one of the major data types collected by survey organizations. They can be incomplete, so that the data collectors need to fill in the missing values, or they can contain sensitive information that needs protection from re-identification. One approach to protecting continuous microdata is to sum the values over different cells defined by features. In this thesis, I present novel methods of multiple imputation (MI) that can be applied to impute missing values and to synthesize confidential values for continuous and magnitude data.
The first method is for limiting the disclosure risk of the continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from the magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution so that the marginals of the synthetic data are guaranteed to sum to the original totals. At the same time, I present methods for assessing disclosure risks in releasing such synthetic magnitude microdata. The illustration on a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.
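The key conditioning fact behind fixing marginal totals can be illustrated in a few lines: independent Poisson cells conditioned on their sum are multinomial, so synthetic cells drawn as below always reproduce the published total. The cell rates stand in for fitted mixture-of-Poissons means and are hypothetical.

```python
# Sketch of the fixed-total idea: independent Poisson cells conditioned on their sum
# are multinomial, so synthetic cells can be drawn that reproduce the original total.
import numpy as np

rng = np.random.default_rng(5)
fitted_means = np.array([120.0, 35.0, 8.0, 260.0, 77.0])   # hypothetical cell-level rates
original_total = 500                                        # fixed marginal sum to preserve

synthetic = rng.multinomial(original_total, fitted_means / fitted_means.sum())
print(synthetic, synthetic.sum())                           # always sums to 500
```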
The second method is for releasing synthetic continuous microdata by a nonstandard MI method. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model. Its disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on protective intervals; its basic idea is to estimate the model parameters from these intervals rather than from the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach for limiting the posterior disclosure risk.
The third method is for imputing missing values in continuous and categorical variables. It extends a hierarchically coupled mixture model with local dependence, but separates the variables into non-focused (e.g., almost fully observed) and focused (e.g., frequently missing) ones. The sub-model structure of the focused variables is more complex than that of the non-focused ones; their cluster indicators are linked together by tensor factorization, and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving strongly associated non-focused variables to the focused side can help to improve estimation accuracy, which is examined in several simulation studies. The method is applied to data from the American Community Survey.
Abstract:
The first long-term aerosol sampling and chemical characterization results from measurements at the Cape Verde Atmospheric Observatory (CVAO) on the island of São Vicente are presented and are discussed with respect to air mass origin and seasonal trends. In total 671 samples were collected using a high-volume PM10 sampler on quartz fiber filters from January 2007 to December 2011. The samples were analyzed for their aerosol chemical composition, including their ionic and organic constituents. Back trajectory analyses showed that the aerosol at CVAO was strongly influenced by emissions from Europe and Africa, with the latter often responsible for high mineral dust loading. Sea salt and mineral dust dominated the aerosol mass and together made up about 80% of the aerosol mass. The 5-year PM10 mean was 47.1 ± 55.5 µg/m³, while the mineral dust and sea salt means were 27.9 ± 48.7 and 11.1 ± 5.5 µg/m³, respectively. Non-sea-salt (nss) sulfate made up 62% of the total sulfate and originated from both long-range transport from Africa or Europe and marine sources. Strong seasonal variation was observed for the aerosol components. While nitrate showed no clear seasonal variation, with an annual mean of 1.1 ± 0.6 µg/m³, the aerosol mass, OC (organic carbon) and EC (elemental carbon) showed strong winter maxima due to the strong influence of African air mass inflow. Additionally, during summer, elevated concentrations of OM originating from marine emissions were observed. A summer maximum was observed for non-sea-salt sulfate and was connected to periods when air mass inflow was predominantly of marine origin, indicating that marine biogenic emissions were a significant source. Ammonium showed a distinct maximum in spring and coincided with ocean surface water chlorophyll a concentrations. Good correlations were also observed between nss-sulfate and oxalate during the summer and winter seasons, indicating likely photochemical in-cloud processing of the marine and anthropogenic precursors of these species. High temporal variability was observed in both chloride and bromide depletion, differing significantly with season, air mass history and Saharan dust concentration. Chloride (bromide) depletion varied from 8.8 ± 8.5% (62 ± 42%) in Saharan-dust-dominated air masses to 30 ± 12% (87 ± 11%) in polluted European air masses. During summer, bromide depletion often reached 100% in marine as well as in polluted continental samples. In addition to the influence of the aerosol acidic components, photochemistry was one of the main drivers of halide depletion during the summer, while during dust events displacement reactions with nitric acid were found to be the dominant mechanism. Positive matrix factorization (PMF) analysis identified three major aerosol sources: sea salt, aged sea salt and long-range transport. The ionic budget was dominated by the first two of these factors, while the long-range transport factor could only account for about 14% of the total observed ionic mass.
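As a pointer to how depletion percentages of this kind are typically computed, the sketch below estimates chloride depletion relative to fresh sea salt using Na+ as the sea-salt tracer and an approximate seawater Cl-/Na+ mass ratio; the ratio, function name and example concentrations are assumptions for illustration, not values from the paper.

```python
# Illustrative calculation (not from the paper): percent chloride depletion relative
# to fresh sea salt, using Na+ as the sea-salt reference species.
SEAWATER_CL_TO_NA = 1.8          # approximate Cl-/Na+ mass ratio in bulk seawater

def chloride_depletion(cl_measured_ug_m3, na_measured_ug_m3):
    expected_cl = SEAWATER_CL_TO_NA * na_measured_ug_m3   # Cl- expected from fresh sea salt
    return 100.0 * (expected_cl - cl_measured_ug_m3) / expected_cl

# Hypothetical sample: 5.0 ug/m3 Na+ and 7.0 ug/m3 Cl-  ->  ~22% depletion
print(round(chloride_depletion(7.0, 5.0), 1))
```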
Abstract:
This work outlines the theoretical advantages of multivariate methods for biomechanical data, validates the proposed methods and presents new clinical findings relating to knee osteoarthritis that were made possible by this approach. The new techniques were based on existing multivariate approaches, Partial Least Squares (PLS) and Non-negative Matrix Factorization (NMF), and were validated using existing data sets. The techniques developed, PCA-PLS-LDA (Principal Component Analysis – Partial Least Squares – Linear Discriminant Analysis), PCA-PLS-MLR (Principal Component Analysis – Partial Least Squares – Multiple Linear Regression) and Waveform Similarity (based on NMF), were designed to address the challenging characteristics of biomechanical data, namely variability and correlation. As a result, these new structure-seeking techniques revealed new clinical findings. The first relates to the relationship between pain, radiographic severity and mechanics: simultaneous analysis of pain and radiographic severity outcomes, a first in biomechanics, revealed that the knee adduction moment’s relationship to radiographic features is mediated by pain in subjects with moderate osteoarthritis. The second clinical finding quantifies the importance of neuromuscular patterns in brace effectiveness for patients with knee osteoarthritis. I found that brace effectiveness was more related to the patient’s unbraced neuromuscular patterns than to mechanics, and that these neuromuscular patterns were more complicated than simply increased overall muscle activity, as previously thought.
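A minimal scikit-learn sketch of the PCA-PLS-LDA chain is given below on randomly generated waveform features; the component counts, labels and data are hypothetical, and the original work used its own implementation.

```python
# Minimal sketch of a PCA -> PLS -> LDA chain on randomly generated "waveforms";
# all shapes, component counts and labels are hypothetical placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 101))        # e.g. 80 subjects x 101 time-normalized gait points
y = rng.integers(0, 2, size=80)       # e.g. two clinical groups (hypothetical labels)

pca = PCA(n_components=20)                           # step 1: compress correlated features
X_pca = pca.fit_transform(X)

pls = PLSRegression(n_components=3).fit(X_pca, y)    # step 2: supervised projection
scores = pls.transform(X_pca)                        # PLS x-scores used as features

lda = LinearDiscriminantAnalysis().fit(scores, y)    # step 3: discriminate the groups
print(lda.score(scores, y))                          # in-sample accuracy of the chain
```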
Abstract:
Biodiversity loss is a global problem, with freshwater bivalves considered amongst the most endangered biota. The freshwater pearl mussel, Margaritifera margaritifera, is declining throughout its range owing to habitat degradation and overexploitation. In most of its range, populations are regarded as reproductively non-functional, which has led to the development of captive breeding programmes. A novel method of releasing M. margaritifera was trialled, with captive-bred juveniles being released into rivers caged in ‘mussel silos’ (protective concrete domes with ventilation creating upwelling to ensure water flow-through). We released 240 juvenile mussels, and survival and growth rates were monitored for 18 months post-release for three size classes: A (13.01-20.00 mm), B (10.01-13.00 mm) and C (4.01-10.00 mm). We explicitly tested two experimental treatments: one where sediment was added to each silo (allowing mussels to orientate and burrow) and one without sediment. Survival by the end of the experiment at month 18 was significantly higher for the largest size class, at 97% (though growth was lowest in this cohort), and lowest for the smallest size class, at 61% (though growth was highest in this cohort). Survival and growth were unaffected by the experimental treatment, suggesting that adding sediment offered no advantage. Growth was positively correlated with both water temperature and the particle size of suspended solids (both of which were collinear, peaking in summer). There are a large number of ex situ breeding programmes for freshwater pearl mussels throughout Europe, and our findings suggest that the use of ‘mussel silos’ could be a useful tool for protecting juvenile mussels, allowing them to be released at a relatively early stage of development and minimising the risk of domestication.
Abstract:
A primary goal of context-aware systems is delivering the right information at the right place and right time to users in order to enable them to make effective decisions and improve their quality of life. There are three key requirements for achieving this goal: determining what information is relevant, personalizing it based on the users’ context (location, preferences, behavioral history, etc.), and delivering it to them in a timely manner without an explicit request from them. These requirements create a paradigm that we term “Proactive Context-aware Computing”. Most of the existing context-aware systems fulfill only a subset of these requirements. Many of these systems focus only on personalization of the requested information based on users’ current context. Moreover, they are often designed for specific domains. In addition, most of the existing systems are reactive - the users request some information and the system delivers it to them. These systems are not proactive, i.e., they cannot anticipate users’ intent and behavior and act proactively without an explicit request from them. In order to overcome these limitations, we need to conduct a deeper analysis and enhance our understanding of context-aware systems that are generic, universal, proactive and applicable to a wide variety of domains. To support this dissertation, we explore several directions. Clearly the most significant sources of information about users today are smartphones. A large amount of users’ context can be acquired through them and they can be used as an effective means to deliver information to users. In addition, social media such as Facebook, Flickr and Foursquare provide a rich and powerful platform to mine users’ interests, preferences and behavioral history. We employ the ubiquity of smartphones and the wealth of information available from social media to address the challenge of building proactive context-aware systems. We have implemented and evaluated a few approaches, including some as part of the Rover framework, to achieve the paradigm of Proactive Context-aware Computing. Rover is a context-aware research platform which has been evolving for the last 6 years. Since location is one of the most important contexts for users, we have developed ‘Locus’, an indoor localization, tracking and navigation system for multi-story buildings. Other important dimensions of users’ context include the activities that they are engaged in. To this end, we have developed ‘SenseMe’, a system that leverages the smartphone and its multiple sensors in order to perform multidimensional context and activity recognition for users. As part of the ‘SenseMe’ project, we also conducted an exploratory study of privacy, trust, risks and other concerns of users with smart phone based personal sensing systems and applications. To determine what information would be relevant to users’ situations, we have developed ‘TellMe’ - a system that employs a new, flexible and scalable approach based on Natural Language Processing techniques to perform bootstrapped discovery and ranking of relevant information in context-aware systems. In order to personalize the relevant information, we have also developed an algorithm and system for mining a broad range of users’ preferences from their social network profiles and activities.
For recommending new information to the users based on their past behavior and context history (such as visited locations, activities and time), we have developed a recommender system and approach for performing multi-dimensional collaborative recommendations using tensor factorization. For timely delivery of personalized and relevant information, it is essential to anticipate and predict users’ behavior. To this end, we have developed a unified infrastructure, within the Rover framework, and implemented several novel approaches and algorithms that employ various contextual features and state-of-the-art machine learning techniques for building diverse behavioral models of users. Examples of generated models include classifying users’ semantic places and mobility states, predicting their availability for accepting calls on smartphones and inferring their device charging behavior. Finally, to enable proactivity in context-aware systems, we have also developed a planning framework based on Hierarchical Task Network (HTN) planning. Together, these works provide a major push in the direction of proactive context-aware computing.
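As an illustration of the tensor-factorization idea mentioned above (and not the Rover implementation), the sketch below fits a small CP-style model to hypothetical (user, item, context) ratings by stochastic gradient descent on the observed entries only.

```python
# Minimal CP-style tensor factorization for (user, item, context) ratings, fit by
# stochastic gradient descent on observed entries. Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(7)
n_users, n_items, n_ctx, k = 50, 40, 8, 5
obs = [(rng.integers(n_users), rng.integers(n_items), rng.integers(n_ctx),
        rng.uniform(1, 5)) for _ in range(2_000)]          # hypothetical observed ratings

U = 0.1 * rng.normal(size=(n_users, k))                    # user factors
V = 0.1 * rng.normal(size=(n_items, k))                    # item factors
C = 0.1 * rng.normal(size=(n_ctx, k))                      # context factors
lr, reg = 0.02, 0.01

for epoch in range(30):
    for u, i, c, r in obs:
        pred = np.sum(U[u] * V[i] * C[c])                  # CP model: sum_k U_uk V_ik C_ck
        e = r - pred
        U[u], V[i], C[c] = (U[u] + lr * (e * V[i] * C[c] - reg * U[u]),
                            V[i] + lr * (e * U[u] * C[c] - reg * V[i]),
                            C[c] + lr * (e * U[u] * V[i] - reg * C[c]))

u, i, c, r = obs[0]
print(round(float(np.sum(U[u] * V[i] * C[c])), 2), "vs observed", round(r, 2))
```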
Abstract:
The ultimate problem considered in this thesis is modeling a high-dimensional joint distribution over a set of discrete variables. For this purpose, we consider classes of context-specific graphical models and the main emphasis is on learning the structure of such models from data. Traditional graphical models compactly represent a joint distribution through a factorization justified by statements of conditional independence which are encoded by a graph structure. Context-specific independence is a natural generalization of conditional independence that only holds in a certain context, specified by the conditioning variables. We introduce context-specific generalizations of both Bayesian networks and Markov networks by including statements of context-specific independence which can be encoded as a part of the model structures. For the purpose of learning context-specific model structures from data, we derive score functions, based on results from Bayesian statistics, by which the plausibility of a structure is assessed. To identify high-scoring structures, we construct stochastic and deterministic search algorithms designed to exploit the structural decomposition of our score functions. Numerical experiments on synthetic and real-world data show that the increased flexibility of context-specific structures can more accurately emulate the dependence structure among the variables and thereby improve the predictive accuracy of the models.
Abstract:
We present efficient algorithms for solving Legendre equations over Q (equivalently, for finding rational points on rational conics) and parametrizing all solutions. Unlike existing algorithms, no integer factorization is required, provided that the prime factors of the discriminant are known.
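For reference, and assuming the standard normalization, a Legendre equation is the homogeneous ternary quadratic below, to be solved nontrivially in rationals; solving it is the same as finding a rational point on the conic it defines. Classically (for a, b, c squarefree, pairwise coprime and not all of the same sign), Legendre's criterion states that a nontrivial solution exists exactly when -bc, -ca and -ab are squares modulo |a|, |b| and |c|, respectively.

```latex
a x^2 + b y^2 + c z^2 = 0, \qquad a, b, c \in \mathbb{Z} \setminus \{0\},
\qquad (x, y, z) \in \mathbb{Q}^3 \setminus \{(0, 0, 0)\}.
```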
Abstract:
An extended formulation of a polyhedron P is a linear description of a polyhedron Q together with a linear map π such that π(Q)=P. These objects are of fundamental importance in polyhedral combinatorics and optimization theory, and the subject of a number of studies. Yannakakis’ factorization theorem (Yannakakis in J Comput Syst Sci 43(3):441–466, 1991) provides a surprising connection between extended formulations and communication complexity, showing that the smallest size of an extended formulation of P equals the nonnegative rank of its slack matrix S. Moreover, Yannakakis also shows that the nonnegative rank of S is at most 2^c, where c is the complexity of any deterministic protocol computing S. In this paper, we show that the latter result can be strengthened when we allow protocols to be randomized. In particular, we prove that the base-2 logarithm of the nonnegative rank of any nonnegative matrix equals the minimum complexity of a randomized communication protocol computing the matrix in expectation. Using Yannakakis’ factorization theorem, this implies that the base-2 logarithm of the smallest size of an extended formulation of a polytope P equals the minimum complexity of a randomized communication protocol computing the slack matrix of P in expectation. We show that allowing randomization in the protocol can be crucial for obtaining small extended formulations. Specifically, we prove that for the spanning tree and perfect matching polytopes, small variance in the protocol forces large size in the extended formulation.
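Spelling out the quantities involved (standard definitions assumed, not taken from this paper): the nonnegative rank of a nonnegative matrix S and Yannakakis' identity for the extension complexity xc(P) read

```latex
\operatorname{rank}_+(S) = \min\bigl\{\, r : S = TU,\; T \in \mathbb{R}^{m\times r}_{\ge 0},\; U \in \mathbb{R}^{r\times n}_{\ge 0} \,\bigr\},
\qquad
\operatorname{xc}(P) = \operatorname{rank}_+(S_P),
```

where S_P is the slack matrix of P; the deterministic bound quoted above is rank_+(S) <= 2^c for any deterministic protocol of complexity c computing S.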
Abstract:
Surface ozone is formed in the presence of NOx (NO + NO2) and volatile organic compounds (VOCs) and is hazardous to human health. A better understanding of these precursors is needed for developing effective policies to improve air quality. To evaluate the year-to-year changes in source contributions to total VOCs, Positive Matrix Factorization (PMF) was used to perform source apportionment using available hourly observations from June through August at a Photochemical Assessment Monitoring Station (PAMS) in Essex, MD for each year from 2007-2015. Results suggest that while gasoline and vehicle exhaust emissions have fallen, the contribution of natural gas sources to total VOCs has risen. To investigate this increasing natural gas influence, ethane measurements from PAMS sites in Essex, MD and Washington, D.C. were examined. Following a period of decline, daytime ethane concentrations have increased significantly after 2009. This trend appears to be linked with the rapid shale gas production in upwind, neighboring states, especially Pennsylvania and West Virginia. Back-trajectory analyses similarly show that ethane concentrations at these monitors were significantly greater if air parcels had passed through counties containing a high density of unconventional natural gas wells. In addition to VOC emissions, the compressors and engines involved with hydraulic fracturing operations also emit NOx and particulate matter (PM). The Community Multi-scale Air Quality (CMAQ) Model was used to simulate air quality for the Eastern U.S. in 2020, including emissions from shale gas operations in the Appalachian Basin. Predicted concentrations of ozone and PM show the largest decreases when these natural gas resources are hypothetically used to convert coal-fired power plants to natural gas, despite the increased emissions from hydraulic fracturing operations expanded into all possible shale regions in the Appalachian Basin. Although burning coal is not as clean as burning natural gas, NOx emissions from coal-fired power plants can be reduced by utilizing post-combustion controls. However, even though the capital investment has already been made, these controls are not always operated at optimal rates. CMAQ simulations for the Eastern U.S. in 2018 show ozone concentrations decrease by ~5 ppb when controls on coal-fired power plants limit NOx emissions to historically best rates.
Abstract:
The transverse momentum dependent parton distribution/fragmentation functions (TMDs) are essential in the factorization of a number of processes like Drell-Yan scattering, vector boson production, semi-inclusive deep inelastic scattering, etc. We provide a comprehensive study of unpolarized TMDs at next-to-next-to-leading order, which includes an explicit calculation of these TMDs and an extraction of their matching coefficients onto their integrated analogues, for all flavor combinations. The obtained matching coefficients are important for any kind of phenomenology involving TMDs. In the present study each individual TMD is calculated without any reference to a specific process. We recover the known results for parton distribution functions and provide new results for the fragmentation functions. The results for the gluon transverse momentum dependent fragmentation functions are presented for the first time at one and two loops. We also discuss the structure of singularities of TMD operators and TMD matrix elements, crossing relations between TMD parton distribution functions and TMD fragmentation functions, and renormalization group equations. In addition, we consider the behavior of the matching coefficients at threshold and make a conjecture on their structure to all orders in perturbation theory.
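Schematically, and with conventions assumed rather than taken from the paper, the matching of an unpolarized TMD parton distribution onto its collinear (integrated) analogue at small transverse distance b takes the convolution form

```latex
F_{f\leftarrow h}(x, b; \mu, \zeta)
  = \sum_{f'} \int_x^1 \frac{dz}{z}\,
    C_{f\leftarrow f'}(z, b; \mu, \zeta)\,
    f_{f'\leftarrow h}\!\left(\frac{x}{z}, \mu\right)
  + \mathcal{O}\!\left(b^2 \Lambda^2\right),
```

with an analogous relation (and different coefficients) for TMD fragmentation functions; the matching coefficients C are the objects computed at next-to-next-to-leading order in the work above.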
Abstract:
Multivariate orthogonal polynomials in D real dimensions are considered from the perspective of the Cholesky factorization of a moment matrix. The approach allows for the construction of corresponding multivariate orthogonal polynomials, associated second kind functions, Jacobi type matrices and associated three term relations, and also Christoffel-Darboux formulae. The multivariate orthogonal polynomials, their second kind functions and the corresponding Christoffel-Darboux kernels are shown to be quasi-determinants as well as Schur complements of bordered truncations of the moment matrix; quasi-tau functions are introduced. It is proven that the second kind functions are multivariate Cauchy transforms of the multivariate orthogonal polynomials. Discrete and continuous deformations of the measure lead to a Toda type integrable hierarchy, with the corresponding flows described through Lax and Zakharov-Shabat equations; bilinear equations are found. Varying size matrix nonlinear partial difference and differential equations of the 2D Toda lattice type are shown to be solved by matrix coefficients of the multivariate orthogonal polynomials. The discrete flows, which are shown to be connected with a Gauss-Borel factorization of the Jacobi type matrices and its quasi-determinants, lead to expressions for the multivariate orthogonal polynomials and their second kind functions in terms of shifted quasi-tau matrices, which generalize to the multidimensional realm those that relate the Baker and adjoint Baker functions to ratios of Miwa shifted tau-functions in the 1D scenario. In this context, the multivariate extension of the elementary Darboux transformation is given in terms of quasi-determinants of matrices built up by the evaluation, at a poised set of nodes lying in an appropriate hyperplane in R^D, of the multivariate orthogonal polynomials. The multivariate Christoffel formula for the iteration of m elementary Darboux transformations is given as a quasi-determinant. It is shown, using congruences in the space of semi-infinite matrices, that the discrete and continuous flows are intimately connected and determine nonlinear partial difference-differential equations that involve only one site in the integrable lattice, behaving as a Kadomtsev-Petviashvili type system. Finally, a brief discussion of measures with a particular linear isometry invariance and some of its consequences for the corresponding multivariate polynomials is given. In particular, it is shown that the Toda times that preserve the invariance condition lie in a secant variety of the Veronese variety of the fixed point set of the linear isometry.
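A compressed sketch of the central construction (with the block/graded monomial ordering and other conventions assumed here, not reproduced from the paper): for the monomial vector chi(x) one forms the moment matrix and its Cholesky (Gauss-Borel) factorization, which defines the multivariate orthogonal polynomials,

```latex
G = \int_{\mathbb{R}^D} \chi(x)\, \chi(x)^{\top}\, \mathrm{d}\mu(x),
\qquad
G = S^{-1} H \,\bigl(S^{-1}\bigr)^{\top},
\qquad
P(x) = S\, \chi(x),
```

so that, at least formally, the entries of P(x) are orthogonal with block "norms" given by H.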
Abstract:
Matrix factorization (MF) has evolved into one of the standard practices for handling sparse data in the field of recommender systems. Funk singular value decomposition (Funk-SVD) is a variant of MF that became a state-of-the-art method through the Netflix Prize competition, and it is still widely used, with modifications, in present-day recommender systems research. With data volumes growing at very high velocity, it is prudent to devise newer methods that handle such data as accurately as, and more efficiently than, Funk-SVD in the context of recommender systems. In view of the growing number of data points, I propose a latent factor model that caters to both accuracy and efficiency by reducing the number of latent features of either users or items, making it less complex than Funk-SVD, where the numbers of latent features for users and items are equal and often larger. A comprehensive empirical evaluation of accuracy on two publicly available datasets, Amazon and MovieLens 100K (ml-100k), reveals that the proposed methods achieve comparable accuracy with lower complexity than Funk-SVD.
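A minimal Funk-SVD style baseline, trained by stochastic gradient descent on hypothetical explicit ratings, is sketched below for orientation; the proposed model differs in that the numbers of user and item latent features need not be equal, which is not reflected in this sketch.

```python
# Minimal Funk-SVD style latent factor model trained by SGD on explicit ratings.
# Illustrative sketch with hypothetical data, not the paper's proposed variant.
import numpy as np

rng = np.random.default_rng(8)
n_users, n_items, k = 200, 150, 10
ratings = [(rng.integers(n_users), rng.integers(n_items), rng.uniform(1, 5))
           for _ in range(5_000)]                     # hypothetical (user, item, rating) triples

P = 0.1 * rng.normal(size=(n_users, k))               # user latent features
Q = 0.1 * rng.normal(size=(n_items, k))               # item latent features
lr, reg = 0.01, 0.05

for epoch in range(25):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]                          # prediction error for this rating
        P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                      Q[i] + lr * (err * P[u] - reg * Q[i]))

rmse = np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings]))
print(f"training RMSE: {rmse:.3f}")
```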