13 resultados para subcomposition


Relevância:

10.00% 10.00%

Publicador:

Resumo:

The use of orthonormal coordinates in the simplex and, particularly, balance coordinates, has suggested the use of a dendrogram for the exploratory analysis of compositional data. The dendrogram is based on a sequential binary partition of a compositional vector into groups of parts. At each step of a partition, one group of parts isdivided into two new groups, and a balancing axis in the simplex between both groupsis defined. The set of balancing axes constitutes an orthonormal basis, and the projections of the sample on them are orthogonal coordinates. They can be represented in adendrogram-like graph showing: (a) the way of grouping parts of the compositional vector; (b) the explanatory role of each subcomposition generated in the partition process;(c) the decomposition of the total variance into balance components associated witheach binary partition; (d) a box-plot of each balance. This representation is useful tohelp the interpretation of balance coordinates; to identify which are the most explanatory coordinates; and to describe the whole sample in a single diagram independentlyof the number of parts of the sample

Relevância:

10.00% 10.00%

Publicador:

Resumo:

R from http://www.r-project.org/ is ‘GNU S’ – a language and environment for statistical computingand graphics. The environment in which many classical and modern statistical techniques havebeen implemented, but many are supplied as packages. There are 8 standard packages and many moreare available through the cran family of Internet sites http://cran.r-project.org .We started to develop a library of functions in R to support the analysis of mixtures and our goal isa MixeR package for compositional data analysis that provides support foroperations on compositions: perturbation and power multiplication, subcomposition with or withoutresiduals, centering of the data, computing Aitchison’s, Euclidean, Bhattacharyya distances,compositional Kullback-Leibler divergence etc.graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features:barycenter, geometric mean of the data set, the percentiles lines, marking and coloring ofsubsets of the data set, theirs geometric means, notation of individual data in the set . . .dealing with zeros and missing values in compositional data sets with R procedures for simpleand multiplicative replacement strategy,the time series analysis of compositional data.We’ll present the current status of MixeR development and illustrate its use on selected data sets

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A study of tin deposits from Priamurye (Russia) is performed to analyze the differencesbetween them based on their origin and also on commercial criteria. A particularanalysis based on their vertical zonality is also given for samples from Solnechnoedeposit. All the statistical analysis are made on the subcomposition formed by seventrace elements in cassiterite (In, Sc, Be, W, Nb, Ti and V) using the Aitchison’methodology of analysis of compositional data

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The R-package “compositions”is a tool for advanced compositional analysis. Its basicfunctionality has seen some conceptual improvement, containing now some facilitiesto work with and represent ilr bases built from balances, and an elaborated subsys-tem for dealing with several kinds of irregular data: (rounded or structural) zeroes,incomplete observations and outliers. The general approach to these irregularities isbased on subcompositions: for an irregular datum, one can distinguish a “regular” sub-composition (where all parts are actually observed and the datum behaves typically)and a “problematic” subcomposition (with those unobserved, zero or rounded parts, orelse where the datum shows an erratic or atypical behaviour). Systematic classificationschemes are proposed for both outliers and missing values (including zeros) focusing onthe nature of irregularities in the datum subcomposition(s).To compute statistics with values missing at random and structural zeros, a projectionapproach is implemented: a given datum contributes to the estimation of the desiredparameters only on the subcompositon where it was observed. For data sets withvalues below the detection limit, two different approaches are provided: the well-knownimputation technique, and also the projection approach.To compute statistics in the presence of outliers, robust statistics are adapted to thecharacteristics of compositional data, based on the minimum covariance determinantapproach. The outlier classification is based on four different models of outlier occur-rence and Monte-Carlo-based tests for their characterization. Furthermore the packageprovides special plots helping to understand the nature of outliers in the dataset.Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator,robustness, rounded zeros

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Hydrogeological research usually includes some statistical studies devised to elucidate mean background state, characterise relationships among different hydrochemical parameters, and show the influence of human activities. These goals are achieved either by means of a statistical approach or by mixing modelsbetween end-members. Compositional data analysis has proved to be effective with the first approach, but there is no commonly accepted solution to the end-member problem in a compositional framework.We present here a possible solution based on factor analysis of compositions illustrated with a case study.We find two factors on the compositional bi-plot fitting two non-centered orthogonal axes to the most representative variables. Each one of these axes defines a subcomposition, grouping those variables thatlay nearest to it. With each subcomposition a log-contrast is computed and rewritten as an equilibrium equation. These two factors can be interpreted as the isometric log-ratio coordinates (ilr) of three hiddencomponents, that can be plotted in a ternary diagram. These hidden components might be interpreted as end-members.We have analysed 14 molarities in 31 sampling stations all along the Llobregat River and its tributaries, with a monthly measure during two years. We have obtained a bi-plot with a 57% of explained totalvariance, from which we have extracted two factors: factor G, reflecting geological background enhanced by potash mining; and factor A, essentially controlled by urban and/or farming wastewater. Graphicalrepresentation of these two factors allows us to identify three extreme samples, corresponding to pristine waters, potash mining influence and urban sewage influence. To confirm this, we have available analysisof diffused and widespread point sources identified in the area: springs, potash mining lixiviates, sewage, and fertilisers. Each one of these sources shows a clear link with one of the extreme samples, exceptfertilisers due to the heterogeneity of their composition.This approach is a useful tool to distinguish end-members, and characterise them, an issue generally difficult to solve. It is worth note that the end-member composition cannot be fully estimated but only characterised through log-ratio relationships among components. Moreover, the influence of each endmember in a given sample must be evaluated in relative terms of the other samples. These limitations areintrinsic to the relative nature of compositional data

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The singular value decomposition and its interpretation as alinear biplot has proved to be a powerful tool for analysing many formsof multivariate data. Here we adapt biplot methodology to the specifficcase of compositional data consisting of positive vectors each of whichis constrained to have unit sum. These relative variation biplots haveproperties relating to special features of compositional data: the studyof ratios, subcompositions and models of compositional relationships. Themethodology is demonstrated on a data set consisting of six-part colourcompositions in 22 abstract paintings, showing how the singular valuedecomposition can achieve an accurate biplot of the colour ratios and howpossible models interrelating the colours can be diagnosed.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The use of orthonormal coordinates in the simplex and, particularly, balance coordinates, has suggested the use of a dendrogram for the exploratory analysis of compositional data. The dendrogram is based on a sequential binary partition of a compositional vector into groups of parts. At each step of a partition, one group of parts is divided into two new groups, and a balancing axis in the simplex between both groups is defined. The set of balancing axes constitutes an orthonormal basis, and the projections of the sample on them are orthogonal coordinates. They can be represented in a dendrogram-like graph showing: (a) the way of grouping parts of the compositional vector; (b) the explanatory role of each subcomposition generated in the partition process; (c) the decomposition of the total variance into balance components associated with each binary partition; (d) a box-plot of each balance. This representation is useful to help the interpretation of balance coordinates; to identify which are the most explanatory coordinates; and to describe the whole sample in a single diagram independently of the number of parts of the sample

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Hydrogeological research usually includes some statistical studies devised to elucidate mean background state, characterise relationships among different hydrochemical parameters, and show the influence of human activities. These goals are achieved either by means of a statistical approach or by mixing models between end-members. Compositional data analysis has proved to be effective with the first approach, but there is no commonly accepted solution to the end-member problem in a compositional framework. We present here a possible solution based on factor analysis of compositions illustrated with a case study. We find two factors on the compositional bi-plot fitting two non-centered orthogonal axes to the most representative variables. Each one of these axes defines a subcomposition, grouping those variables that lay nearest to it. With each subcomposition a log-contrast is computed and rewritten as an equilibrium equation. These two factors can be interpreted as the isometric log-ratio coordinates (ilr) of three hidden components, that can be plotted in a ternary diagram. These hidden components might be interpreted as end-members. We have analysed 14 molarities in 31 sampling stations all along the Llobregat River and its tributaries, with a monthly measure during two years. We have obtained a bi-plot with a 57% of explained total variance, from which we have extracted two factors: factor G, reflecting geological background enhanced by potash mining; and factor A, essentially controlled by urban and/or farming wastewater. Graphical representation of these two factors allows us to identify three extreme samples, corresponding to pristine waters, potash mining influence and urban sewage influence. To confirm this, we have available analysis of diffused and widespread point sources identified in the area: springs, potash mining lixiviates, sewage, and fertilisers. Each one of these sources shows a clear link with one of the extreme samples, except fertilisers due to the heterogeneity of their composition. This approach is a useful tool to distinguish end-members, and characterise them, an issue generally difficult to solve. It is worth note that the end-member composition cannot be fully estimated but only characterised through log-ratio relationships among components. Moreover, the influence of each endmember in a given sample must be evaluated in relative terms of the other samples. These limitations are intrinsic to the relative nature of compositional data

Relevância:

10.00% 10.00%

Publicador:

Resumo:

R from http://www.r-project.org/ is ‘GNU S’ – a language and environment for statistical computing and graphics. The environment in which many classical and modern statistical techniques have been implemented, but many are supplied as packages. There are 8 standard packages and many more are available through the cran family of Internet sites http://cran.r-project.org . We started to develop a library of functions in R to support the analysis of mixtures and our goal is a MixeR package for compositional data analysis that provides support for operations on compositions: perturbation and power multiplication, subcomposition with or without residuals, centering of the data, computing Aitchison’s, Euclidean, Bhattacharyya distances, compositional Kullback-Leibler divergence etc. graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features: barycenter, geometric mean of the data set, the percentiles lines, marking and coloring of subsets of the data set, theirs geometric means, notation of individual data in the set . . . dealing with zeros and missing values in compositional data sets with R procedures for simple and multiplicative replacement strategy, the time series analysis of compositional data. We’ll present the current status of MixeR development and illustrate its use on selected data sets

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A study of tin deposits from Priamurye (Russia) is performed to analyze the differences between them based on their origin and also on commercial criteria. A particular analysis based on their vertical zonality is also given for samples from Solnechnoe deposit. All the statistical analysis are made on the subcomposition formed by seven trace elements in cassiterite (In, Sc, Be, W, Nb, Ti and V) using the Aitchison’ methodology of analysis of compositional data

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The R-package “compositions”is a tool for advanced compositional analysis. Its basic functionality has seen some conceptual improvement, containing now some facilities to work with and represent ilr bases built from balances, and an elaborated subsys- tem for dealing with several kinds of irregular data: (rounded or structural) zeroes, incomplete observations and outliers. The general approach to these irregularities is based on subcompositions: for an irregular datum, one can distinguish a “regular” sub- composition (where all parts are actually observed and the datum behaves typically) and a “problematic” subcomposition (with those unobserved, zero or rounded parts, or else where the datum shows an erratic or atypical behaviour). Systematic classification schemes are proposed for both outliers and missing values (including zeros) focusing on the nature of irregularities in the datum subcomposition(s). To compute statistics with values missing at random and structural zeros, a projection approach is implemented: a given datum contributes to the estimation of the desired parameters only on the subcompositon where it was observed. For data sets with values below the detection limit, two different approaches are provided: the well-known imputation technique, and also the projection approach. To compute statistics in the presence of outliers, robust statistics are adapted to the characteristics of compositional data, based on the minimum covariance determinant approach. The outlier classification is based on four different models of outlier occur- rence and Monte-Carlo-based tests for their characterization. Furthermore the package provides special plots helping to understand the nature of outliers in the dataset. Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator, robustness, rounded zeros

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The Dirichlet family owes its privileged status within simplex distributions to easyness of interpretation and good mathematical properties. In particular, we recall fundamental properties for the analysis of compositional data such as closure under amalgamation and subcomposition. From a probabilistic point of view, it is characterised (uniquely) by a variety of independence relationships which makes it indisputably the reference model for expressing the non trivial idea of substantial independence for compositions. Indeed, its well known inadequacy as a general model for compositional data stems from such an independence structure together with the poorness of its parametrisation. In this paper a new class of distributions (called Flexible Dirichlet) capable of handling various dependence structures and containing the Dirichlet as a special case is presented. The new model exhibits a considerably richer parametrisation which, for example, allows to model the means and (part of) the variance-covariance matrix separately. Moreover, such a model preserves some good mathematical properties of the Dirichlet, i.e. closure under amalgamation and subcomposition with new parameters simply related to the parent composition parameters. Furthermore, the joint and conditional distributions of subcompositions and relative totals can be expressed as simple mixtures of two Flexible Dirichlet distributions. The basis generating the Flexible Dirichlet, though keeping compositional invariance, shows a dependence structure which allows various forms of partitional dependence to be contemplated by the model (e.g. non-neutrality, subcompositional dependence and subcompositional non-invariance), independence cases being identified by suitable parameter configurations. In particular, within this model substantial independence among subsets of components of the composition naturally occurs when the subsets have a Dirichlet distribution

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The complexity of modern geochemical data sets is increasing in several aspects (number of available samples, number of elements measured, number of matrices analysed, geological-environmental variability covered, etc), hence it is becoming increasingly necessary to apply statistical methods to elucidate their structure. This paper presents an exploratory analysis of one such complex data set, the Tellus geochemical soil survey of Northern Ireland (NI). This exploratory analysis is based on one of the most fundamental exploratory tools, principal component analysis (PCA) and its graphical representation as a biplot, albeit in several variations: the set of elements included (only major oxides vs. all observed elements), the prior transformation applied to the data (none, a standardization or a logratio transformation) and the way the covariance matrix between components is estimated (classical estimation vs. robust estimation). Results show that a log-ratio PCA (robust or classical) of all available elements is the most powerful exploratory setting, providing the following insights: the first two processes controlling the whole geochemical variation in NI soils are peat coverage and a contrast between “mafic” and “felsic” background lithologies; peat covered areas are detected as outliers by a robust analysis, and can be then filtered out if required for further modelling; and peat coverage intensity can be quantified with the %Br in the subcomposition (Br, Rb, Ni).