2 results for Chemical processes Data processing
in CORA - Cork Open Research Archive - University College Cork - Ireland
Abstract:
Receptor modelling was performed on quadrupole unit mass resolution aerosol mass spectrometer (Q-AMS) sub-micron particulate matter (PM) chemical speciation measurements from Windsor, Ontario, an industrial city situated across the Detroit River from Detroit, Michigan. Aerosol and trace gas measurements were collected on board Environment Canada’s CRUISER mobile laboratory. Positive matrix factorization (PMF) was performed on the AMS full particle-phase mass spectrum (PMFFull MS) encompassing both organic and inorganic components. This approach was compared to the more common method of analysing only the organic mass spectra (PMFOrg MS). PMF of the full mass spectrum revealed that variability in the non-refractory sub-micron aerosol concentration and composition was best explained by six factors: an amine-containing factor (Amine); an ammonium sulphate and oxygenated organic aerosol containing factor (Sulphate-OA); an ammonium nitrate and oxygenated organic aerosol containing factor (Nitrate-OA); an ammonium chloride containing factor (Chloride); a hydrocarbon-like organic aerosol (HOA) factor; and a moderately oxygenated organic aerosol factor (OOA). PMF of the organic mass spectrum revealed three factors of similar composition to some of those revealed through PMFFull MS: Amine, HOA and OOA. Including both the inorganic and organic mass proved to be a beneficial approach to analysing the unit mass resolution AMS data for several reasons. First, it provided a method for potentially calculating more accurate sub-micron PM mass concentrations, particularly when unusual factors are present, in this case, an Amine factor. As this method does not rely on a priori knowledge of chemical species, it circumvents the need for any adjustments to the traditional AMS species fragmentation patterns to account for atypical species, and can thus lead to more complete factor profiles.
It is expected that this method would be even more useful for HR-ToF-AMS data, due to the ability to better understand the chemical nature of atypical factors from high resolution mass spectra. Second, utilizing PMF to extract factors containing inorganic species allowed for the determination of the extent of neutralization, which could have implications for aerosol parameterization. Third, subtler differences in organic aerosol components were resolved through the incorporation of inorganic mass into the PMF matrix. The additional temporal features provided by the inorganic aerosol components allowed for the resolution of more types of oxygenated organic aerosol than could be reliably resolved from PMF of organics alone. Comparison of findings from the PMFFull MS and PMFOrg MS methods showed that for the Windsor airshed, the PMFFull MS method enabled additional conclusions to be drawn in terms of aerosol sources and chemical processes. While performing PMFOrg MS can provide important distinctions between types of organic aerosol, it is shown that including inorganic species in the PMF analysis can permit further apportionment of organics for unit mass resolution AMS mass spectra.
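The factorization idea in this abstract can be illustrated with a small sketch. The code below is not the solver used in the study. It is a generic non-negative factorization with multiplicative updates, weighted by measurement uncertainty so that it minimizes a PMF-style objective Q. The function names, the toy source profiles and the constant uncertainty are all hypothetical and chosen only for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def hadamard(a, b):
    """Element-wise product of two equally sized matrices."""
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def q_value(x, g, f, sigma):
    """PMF-style objective: squared residuals scaled by uncertainty."""
    model = matmul(g, f)
    return sum(((x[i][j] - model[i][j]) / sigma[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))

def weighted_nmf(x, sigma, n_factors, n_iter=200, seed=0):
    """Factor x (time x channel) into g (time x factor) and f (factor x channel).

    Both factors stay non-negative because each update multiplies the
    current value by a ratio of non-negative terms.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[1.0 / s ** 2 for s in row] for row in sigma]   # weights = 1 / sigma^2
    g = [[rng.uniform(0.1, 1.0) for _ in range(n_factors)] for _ in range(n)]
    f = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(n_factors)]
    wx = hadamard(w, x)
    eps = 1e-12                                          # guards against division by zero
    for _ in range(n_iter):
        wm = hadamard(w, matmul(g, f))
        num, den = matmul(wx, transpose(f)), matmul(wm, transpose(f))
        g = [[g[i][k] * num[i][k] / (den[i][k] + eps)
              for k in range(n_factors)] for i in range(n)]
        wm = hadamard(w, matmul(g, f))
        num, den = matmul(transpose(g), wx), matmul(transpose(g), wm)
        f = [[f[k][j] * num[k][j] / (den[k][j] + eps)
              for j in range(m)] for k in range(n_factors)]
    return g, f

# Toy "full mass spectrum": two organic channels followed by two inorganic
# channels (hypothetical sulphate and nitrate signals), mixed from two sources.
true_g = [[1.0, 0.2], [0.8, 0.4], [0.6, 0.9], [0.3, 1.2], [0.2, 0.7], [0.9, 0.1]]
true_f = [[0.6, 0.1, 0.3, 0.0],    # organics co-varying with sulphate
          [0.1, 0.5, 0.0, 0.4]]    # organics co-varying with nitrate
x = matmul(true_g, true_f)
sigma = [[0.05] * 4 for _ in range(6)]

g_early, f_early = weighted_nmf(x, sigma, 2, n_iter=5)
g, f = weighted_nmf(x, sigma, 2, n_iter=200)
q_early = q_value(x, g_early, f_early, sigma)
q_final = q_value(x, g, f, sigma)
print(f"Q after 5 iterations: {q_early:.4f}, after 200: {q_final:.4f}")
```

In this toy setting the inorganic columns carry their own time variation, which is the kind of extra temporal information the abstract credits with separating organic components that co-vary with sulphate from those that co-vary with nitrate.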
Abstract:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. Ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content, in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries, which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets.
In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
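The abstract does not describe CaPC's internals, so the sketch below only illustrates the general idea of partial compression under stated assumptions: alphabetic words are treated as "informational" content and replaced by short codes, while delimiters such as commas and newlines are treated as "functional" content and left untouched. The codebook format, the escape character and all function names are hypothetical.

```python
import re
import string
from collections import Counter

TOKEN = re.compile(r"[A-Za-z]+")             # assumption: alphabetic runs are informational
ESCAPE = "\x01"                              # assumption: never occurs in the input text
SYMBOLS = string.ascii_letters + string.digits

def build_codebook(text, min_len=4):
    """Map the most frequent long words to two-character codes."""
    counts = Counter(w for w in TOKEN.findall(text) if len(w) >= min_len)
    frequent = [w for w, _ in counts.most_common(len(SYMBOLS))]
    return {w: ESCAPE + SYMBOLS[i] for i, w in enumerate(frequent)}

def encode(text, codebook):
    """Replace codebook words; delimiters and other characters pass through."""
    return TOKEN.sub(lambda m: codebook.get(m.group(0), m.group(0)), text)

def decode(data, codebook):
    """Expand every escape-prefixed code back to its original word."""
    inverse = {code: word for word, code in codebook.items()}
    return re.sub(re.escape(ESCAPE) + "(.)", lambda m: inverse[m.group(0)], data)

records = [
    "2016-01-01,WARNING,connection timeout on node alpha",
    "2016-01-02,WARNING,connection timeout on node beta",
    "2016-01-03,ERROR,connection refused on node alpha",
]
text = "\n".join(records)

codebook = build_codebook(text)
packed = encode(text, codebook)

# Code written for the plain text still works on the packed form, because
# record and field delimiters are unchanged.
fields_per_record = [len(line.split(",")) for line in packed.split("\n")]
print(len(text), len(packed), fields_per_record)
```

Because the newline and comma survive encoding, a record reader that splits on them sees the same record and field boundaries in the packed data, which is the sense in which such a scheme stays transparent to existing libraries.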