19 resultados para COMPOSITIONAL DATA
Resumo:
The Irish and UK governments, along with other countries, have made a commitment to limit the concentrations of greenhouse gases in the atmosphere by reducing emissions from the burning of fossil fuels. This can be achieved (in part) through increasing the sequestration of CO2 from the atmosphere including monitoring the amount stored in vegetation and soils. A large proportion of soil carbon is held within peat due to the relatively high carbon density of peat and organic-rich soils. This is particularly important for a country such as Ireland, where some 16% of the land surface is covered by peat. For Northern Ireland, it has been estimated that the total amount of carbon stored in vegetation is 4.4Mt compared to 386Mt stored within peat and soils. As a result it has become increasingly important to measure and monitor changes in stores of carbon in soils. The conservation and restoration of peat covered areas, although ongoing for many years, has become increasingly important. This is summed up in current EU policy outlined by the European Commission (2012) which seeks to assess the relative contributions of the different inputs and outputs of organic carbon and organic matter to and from soil. Results are presented from the EU-funded Tellus Border Soil Carbon Project (2011 to 2013) which aimed to improve current estimates of carbon in soil and peat across Northern Ireland and the bordering counties of the Republic of Ireland.
Historical reports and previous surveys provide baseline data. To monitor change in peat depth and soil organic carbon, these historical data are integrated with more recently acquired airborne geophysical (radiometric) data and ground-based geochemical data generated by two surveys, the Tellus Project (2004-2007: covering Northern Ireland) and the EU-funded Tellus Border project (2011-2013) covering the six bordering counties of the Republic of Ireland, Donegal, Sligo, Leitrim, Cavan, Monaghan and Louth. The concept being applied is that saturated organic-rich soil and peat attenuate gamma-radiation from underlying soils and rocks. This research uses the degree of spatial correlation (coregionalization) between peat depth, soil organic carbon (SOC) and the attenuation of the radiometric signal to update a limited sampling regime of ground-based measurements with remotely acquired data. To comply with the compositional nature of the SOC data (perturbations of loss on ignition [LOI] data), a compositional data analysis approach is investigated. Contemporaneous ground-based measurements allow corroboration for the updated mapped outputs. This provides a methodology that can be used to improve estimates of soil carbon with minimal impact to sensitive habitats (like peat bogs), but with maximum output of data and knowledge.
Resumo:
Statistics are regularly used to make some form of comparison between trace evidence or deploy the exclusionary principle (Morgan and Bull, 2007) in forensic investigations. Trace evidence are routinely the results of particle size, chemical or modal analyses and as such constitute compositional data. The issue is that compositional data including percentages, parts per million etc. only carry relative information. This may be problematic where a comparison of percentages and other constraint/closed data is deemed a statistically valid and appropriate way to present trace evidence in a court of law. Notwithstanding an awareness of the existence of the constant sum problem since the seminal works of Pearson (1896) and Chayes (1960) and the introduction of the application of log-ratio techniques (Aitchison, 1986; Pawlowsky-Glahn and Egozcue, 2001; Pawlowsky-Glahn and Buccianti, 2011; Tolosana-Delgado and van den Boogaart, 2013) the problem that a constant sum destroys the potential independence of variances and covariances required for correlation regression analysis and empirical multivariate methods (principal component analysis, cluster analysis, discriminant analysis, canonical correlation) is all too often not acknowledged in the statistical treatment of trace evidence. Yet the need for a robust treatment of forensic trace evidence analyses is obvious. This research examines the issues and potential pitfalls for forensic investigators if the constant sum constraint is ignored in the analysis and presentation of forensic trace evidence. Forensic case studies involving particle size and mineral analyses as trace evidence are used to demonstrate the use of a compositional data approach using a centred log-ratio (clr) transformation and multivariate statistical analyses.
Resumo:
Conventional practice in Regional Geochemistry includes as a final step of any geochemical campaign the generation of a series of maps, to show the spatial distribution of each of the components considered. Such maps, though necessary, do not comply with the compositional, relative nature of the data, which unfortunately make any conclusion based on them sensitive
to spurious correlation problems. This is one of the reasons why these maps are never interpreted isolated. This contribution aims at gathering a series of statistical methods to produce individual maps of multiplicative combinations of components (logcontrasts), much in the flavor of equilibrium constants, which are designed on purpose to capture certain aspects of the data.
We distinguish between supervised and unsupervised methods, where the first require an external, non-compositional variable (besides the compositional geochemical information) available in an analogous training set. This external variable can be a quantity (soil density, collocated magnetics, collocated ratio of Th/U spectral gamma counts, proportion of clay particle fraction, etc) or a category (rock type, land use type, etc). In the supervised methods, a regression-like model between the external variable and the geochemical composition is derived in the training set, and then this model is mapped on the whole region. This case is illustrated with the Tellus dataset, covering Northern Ireland at a density of 1 soil sample per 2 square km, where we map the presence of blanket peat and the underlying geology. The unsupervised methods considered include principal components and principal balances
(Pawlowsky-Glahn et al., CoDaWork2013), i.e. logcontrasts of the data that are devised to capture very large variability or else be quasi-constant. Using the Tellus dataset again, it is found that geological features are highlighted by the quasi-constant ratios Hf/Nb and their ratio against SiO2; Rb/K2O and Zr/Na2O and the balance between these two groups of two variables; the balance of Al2O3 and TiO2 vs. MgO; or the balance of Cr, Ni and Co vs. V and Fe2O3. The largest variability appears to be related to the presence/absence of peat.
Resumo:
This paper is part of a special issue of Applied Geochemistry focusing on reliable applications of compositional multivariate statistical methods. This study outlines the application of compositional data analysis (CoDa) to calibration of geochemical data and multivariate statistical modelling of geochemistry and grain-size data from a set of Holocene sedimentary cores from the Ganges-Brahmaputra (G-B) delta. Over the last two decades, understanding near-continuous records of sedimentary sequences has required the use of core-scanning X-ray fluorescence (XRF) spectrometry, for both terrestrial and marine sedimentary sequences. Initial XRF data are generally unusable in ‘raw-format’, requiring data processing in order to remove instrument bias, as well as informed sequence interpretation. The applicability of these conventional calibration equations to core-scanning XRF data are further limited by the constraints posed by unknown measurement geometry and specimen homogeneity, as well as matrix effects. Log-ratio based calibration schemes have been developed and applied to clastic sedimentary sequences focusing mainly on energy dispersive-XRF (ED-XRF) core-scanning. This study has applied high resolution core-scanning XRF to Holocene sedimentary sequences from the tidal-dominated Indian Sundarbans, (Ganges-Brahmaputra delta plain). The Log-Ratio Calibration Equation (LRCE) was applied to a sub-set of core-scan and conventional ED-XRF data to quantify elemental composition. This provides a robust calibration scheme using reduced major axis regression of log-ratio transformed geochemical data. Through partial least squares (PLS) modelling of geochemical and grain-size data, it is possible to derive robust proxy information for the Sundarbans depositional environment. The application of these techniques to Holocene sedimentary data offers an improved methodological framework for unravelling Holocene sedimentation patterns.
Resumo:
This study applies spatial statistical techniques including cokriging to integrate airborne geophysical (radiometric) data with ground-based measurements of peat depth and soil organic carbon (SOC) to monitor change in peat cover for carbon stock calculations. The research is part of the EU funded Tellus Border project and is supported by the INTERREG IVA development programme of the European Regional Development Fund, which is managed by the Special EU Programmes Body (SEUPB). The premise is that saturated peat attenuates the radiometric signal from underlying soils and rocks. Contemporaneous ground-based measurements were collected to corroborate mapped estimates and develop a statistical model for volumetric carbon content (VCC) to 0.5 metres. Field measurements included ground penetrating radar, gamma ray spectrometry and a soil sampling methodology which measured bulk density and soil moisture to determine VCC. One aim of the study was to explore whether airborne radiometric survey data can be used to establish VCC across a region. To account for the footprint of airborne radiometric data, five cores were obtained at each soil sampling location: one at the centre of the ground radiometric equivalent sample location and one at each of the four corners 20 metres apart. This soil sampling strategy replicated the methodology deployed for the Tellus Border geochemistry survey. Two key issues will be discussed from this work. The first addresses the integration of different sampling supports for airborne and ground measured data and the second discusses the compositional nature of the VOC data.
Resumo:
A compositional multivariate approach is used to analyse regional scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20cm depths on a non-aligned grid at one site per 2 km2. Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic, up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or superficial deposits. To address this, the geochemical data were transformed using centered log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques including compositional Principal Component Analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis method were used to determine the influence of underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was determined by the first four principal components (PC’s) implying “significant” structure in the data. Analysis of variance showed that only 10 PC’s were necessary to classify the soil geochemical data. To consider an improvement over PCA that uses the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors. Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons and linear discriminant analysis (LDA) undertaken. Prediction accuracy for LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat covered areas since the creation of superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this in classification analysis.
Resumo:
This study reports on the geochemical and mineralogical characterization of a lateritic profile cropping out in the Balkouin area, Central Burkina Faso, aimed at obtaining a better understanding of the processes responsible for the formation of the laterite itself and the constraints to its development. The lateritic profile rests on a Paleoproterozoic basement mostly composed of granodioritic rocks related to the Eburnean magmatic cycle passing upwards to saprolite and consists of four main composite horizons (bottom to top): kaolinite and clay-rich horizons, mottled laterite and iron-rich duricrust. In order to achieve such a goal, a multi-disciplinary analytical approach was adopted, which includes inductively coupled plasma (ICP) atomic emission and mass spectrometries (ICP-AES and ICP-MS respectively), X-ray powder diffraction (XRPD), scanning electron microscopy with energy dispersive spectrometry (SEM-EDS) and micro-Raman spectroscopy.
The geochemical data, and particularly the immobile elements distribution and REE patterns, show that the Balkouin laterite is the product of an in situ lateritization process that involved a strong depletion of the more soluble elements (K, Mg, Ca, Na, Rb, Sr and Ba) and an enrichment in Fe; Si was also removed, particularly in the uppermost horizons. All along the profile the change in composition is coupled with important changes in mineralogy. In particular, the saprolite is characterized by occurrence of abundant albitic plagioclase, quartz and nontronite; kaolinite is apparently absent. The transition to the overlying lateritic profile marks the breakdown of plagioclase and nontronite, thus allowing kaolinite to become one of the major components upwards, together with goethite and quartz. The upper part of the profile is strongly enriched in hematite (+ kaolinite). Ti oxides (at least in part as anatase) and apatite are typical accessory phases, while free aluminum hydroxides are notably absent. Mass change calculations emphasize the extent of the mass loss, which exceeds 50 wt% (and often 70 wt%) for almost all horizons; only Fe was significantly concentrated in the residual system.
The geochemical and mineralogical features suggest that the lateritic profile is the product of a continuous process that gradually developed from the bedrock upwards, in agreement with the Schellmann classic genetic model. The laterite formation must have occurred at low pH (? 4.5) and high Eh (? 0.4) values, i.e., under acidic and oxidizing environments, which allowed strongly selective leaching conditions. The lack of gibbsite and bohemite is in agreement with the compositional data: the occurrence of quartz (± amorphous silica) all along the profile was an inhibiting factor for the formation of free aluminum hydroxides.
Resumo:
Research over the past two decades on the Holocene sediments from the tide dominated west side of the lower Ganges delta has focussed on constraining the sedimentary environment through grain size distributions (GSD). GSD has traditionally been assessed through the use of probability density function (PDF) methods (e.g. log-normal, log skew-Laplace functions), but these approaches do not acknowledge the compositional nature of the data, which may compromise outcomes in lithofacies interpretations. The use of PDF approaches in GSD analysis poses a series of challenges for the development of lithofacies models, such as equifinal distribution coefficients and obscuring the empirical data variability. In this study a methodological framework for characterising GSD is presented through compositional data analysis (CODA) plus a multivariate statistical framework. This provides a statistically robust analysis of the fine tidal estuary sediments from the West Bengal Sundarbans, relative to alternative PDF approaches.
Resumo:
Single component geochemical maps are the most basic representation of spatial elemental distributions and commonly used in environmental and exploration geochemistry. However, the compositional nature of geochemical data imposes several limitations on how the data should be presented. The problems relate to the constant sum problem (closure), and the inherently multivariate relative information conveyed by compositional data. Well known is, for instance, the tendency of all heavy metals to show lower values in soils with significant contributions of diluting elements (e.g., the quartz dilution effect); or the contrary effect, apparent enrichment in many elements due to removal of potassium during weathering. The validity of classical single component maps is thus investigated, and reasonable alternatives that honour the compositional character of geochemical concentrations are presented. The first recommended such method relies on knowledge-driven log-ratios, chosen to highlight certain geochemical relations or to filter known artefacts (e.g. dilution with SiO2 or volatiles). This is similar to the classical normalisation approach to a single element. The second approach uses the (so called) log-contrasts, that employ suitable statistical methods (such as classification techniques, regression analysis, principal component analysis, clustering of variables, etc.) to extract potentially interesting geochemical summaries. The caution from this work is that if a compositional approach is not used, it becomes difficult to guarantee that any identified pattern, trend or anomaly is not an artefact of the constant sum constraint. In summary the authors recommend a chain of enquiry that involves searching for the appropriate statistical method that can answer the required geological or geochemical question whilst maintaining the integrity of the compositional nature of the data. The required log-ratio transformations should be applied followed by the chosen statistical method. Interpreting the results may require a closer working relationship between statisticians, data analysts and geochemists.
Resumo:
A substantial proportion of aetiological risks for many cancers and chronic diseases remain unexplained. Using geochemical soil and stream water samples collected as part of the Tellus Project studies, current research is investigating naturally occurring background levels of potentially toxic elements (PTEs) in soils and stream sediments and their possible relationship with progressive chronic kidney disease (CKD). The Tellus geological mapping project, Geological Survey Northern Ireland, collected soil sediment and stream water samples on a grid of one sample site every 2 km2 across the rural areas of Northern Ireland resulting in an excess of 6800 soil sampling locations and more than 5800 locations for stream water sampling. Accumulation of several PTEs including arsenic, cadmium, chromium, lead and mercury have been linked with human health and implicated in renal function decline. The hypothesis is that long-term exposure will result in cumulative exposure to PTEs and act as risk factor(s) for cancer and diabetes related CKD and its progression. The ‘bioavailable’ fraction of total PTE soil concentration depends on the ‘bioaccessible’ proportion through an exposure pathway. Recent work has explored this bioaccessible fraction for a range of PTEs across Northern Ireland. In this study the compositional nature of the multivariate geochemical PTE variables and bioaccessible data is explored to augment the investigation into the potential relationship between PTEs, bioaccessibility and disease data.
Resumo:
Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains including: bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly-used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package.