994 resultados para COMPOSITIONAL DATA
Resumo:
In an earlier investigation (Burger et al., 2000) five sediment cores near the RodriguesTriple Junction in the Indian Ocean were studied applying classical statistical methods(fuzzy c-means clustering, linear mixing model, principal component analysis) for theextraction of endmembers and evaluating the spatial and temporal variation ofgeochemical signals. Three main factors of sedimentation were expected by the marinegeologists: a volcano-genetic, a hydro-hydrothermal and an ultra-basic factor. Thedisplay of fuzzy membership values and/or factor scores versus depth providedconsistent results for two factors only; the ultra-basic component could not beidentified. The reason for this may be that only traditional statistical methods wereapplied, i.e. the untransformed components were used and the cosine-theta coefficient assimilarity measure.During the last decade considerable progress in compositional data analysis was madeand many case studies were published using new tools for exploratory analysis of thesedata. Therefore it makes sense to check if the application of suitable data transformations,reduction of the D-part simplex to two or three factors and visualinterpretation of the factor scores would lead to a revision of earlier results and toanswers to open questions . In this paper we follow the lines of a paper of R. Tolosana-Delgado et al. (2005) starting with a problem-oriented interpretation of the biplotscattergram, extracting compositional factors, ilr-transformation of the components andvisualization of the factor scores in a spatial context: The compositional factors will beplotted versus depth (time) of the core samples in order to facilitate the identification ofthe expected sources of the sedimentary process.Kew words: compositional data analysis, biplot, deep sea sediments
Resumo:
Self-organizing maps (Kohonen 1997) is a type of artificial neural network developedto explore patterns in high-dimensional multivariate data. The conventional versionof the algorithm involves the use of Euclidean metric in the process of adaptation ofthe model vectors, thus rendering in theory a whole methodology incompatible withnon-Euclidean geometries.In this contribution we explore the two main aspects of the problem:1. Whether the conventional approach using Euclidean metric can shed valid resultswith compositional data.2. If a modification of the conventional approach replacing vectorial sum and scalarmultiplication by the canonical operators in the simplex (i.e. perturbation andpowering) can converge to an adequate solution.Preliminary tests showed that both methodologies can be used on compositional data.However, the modified version of the algorithm performs poorer than the conventionalversion, in particular, when the data is pathological. Moreover, the conventional ap-proach converges faster to a solution, when data is \well-behaved".Key words: Self Organizing Map; Artificial Neural networks; Compositional data
Resumo:
Many multivariate methods that are apparently distinct can be linked by introducing oneor more parameters in their definition. Methods that can be linked in this way arecorrespondence analysis, unweighted or weighted logratio analysis (the latter alsoknown as "spectral mapping"), nonsymmetric correspondence analysis, principalcomponent analysis (with and without logarithmic transformation of the data) andmultidimensional scaling. In this presentation I will show how several of thesemethods, which are frequently used in compositional data analysis, may be linkedthrough parametrizations such as power transformations, linear transformations andconvex linear combinations. Since the methods of interest here all lead to visual mapsof data, a "movie" can be made where where the linking parameter is allowed to vary insmall steps: the results are recalculated "frame by frame" and one can see the smoothchange from one method to another. Several of these "movies" will be shown, giving adeeper insight into the similarities and differences between these methods
Resumo:
In this paper we examine the problem of compositional data from a different startingpoint. Chemical compositional data, as used in provenance studies on archaeologicalmaterials, will be approached from the measurement theory. The results will show, in avery intuitive way that chemical data can only be treated by using the approachdeveloped for compositional data. It will be shown that compositional data analysis is aparticular case in projective geometry, when the projective coordinates are in thepositive orthant, and they have the properties of logarithmic interval metrics. Moreover,it will be shown that this approach can be extended to a very large number ofapplications, including shape analysis. This will be exemplified with a case study inarchitecture of Early Christian churches dated back to the 5th-7th centuries AD
Resumo:
Factor analysis as frequent technique for multivariate data inspection is widely used also for compositional data analysis. The usual way is to use a centered logratio (clr)transformation to obtain the random vector y of dimension D. The factor model istheny = Λf + e (1)with the factors f of dimension k & D, the error term e, and the loadings matrix Λ.Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysismodel (1) can be written asCov(y) = ΛΛT + ψ (2)where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as theloadings matrix Λ are estimated from an estimation of Cov(y).Given observed clr transformed data Y as realizations of the random vectory. Outliers or deviations from the idealized model assumptions of factor analysiscan severely effect the parameter estimation. As a way out, robust estimation ofthe covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), seePison et al. (2003). Well known robust covariance estimators with good statisticalproperties, like the MCD or the S-estimators (see, e.g. Maronna et al., 2006), relyon a full-rank data matrix Y which is not the case for clr transformed data (see,e.g., Aitchison, 1986).The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves thissingularity problem. The data matrix Y is transformed to a matrix Z by usingan orthonormal basis of lower dimension. Using the ilr transformed data, a robustcovariance matrix C(Z) can be estimated. The result can be back-transformed tothe clr space byC(Y ) = V C(Z)V Twhere the matrix V with orthonormal columns comes from the relation betweenthe clr and the ilr transformation. Now the parameters in the model (2) can beestimated (Basilevsky, 1994) and the results have a direct interpretation since thelinks to the original variables are still preserved.The above procedure will be applied to data from geochemistry. Our specialinterest is on comparing the results with those of Reimann et al. (2002) for the Kolaproject data
Resumo:
We take stock of the present position of compositional data analysis, of what has beenachieved in the last 20 years, and then make suggestions as to what may be sensibleavenues of future research. We take an uncompromisingly applied mathematical view,that the challenge of solving practical problems should motivate our theoreticalresearch; and that any new theory should be thoroughly investigated to see if it mayprovide answers to previously abandoned practical considerations. Indeed a main themeof this lecture will be to demonstrate this applied mathematical approach by a number ofchallenging examples
Resumo:
The application of compositional data analysis through log ratio trans-formations corresponds to a multinomial logit model for the shares themselves.This model is characterized by the property of Independence of Irrelevant Alter-natives (IIA). IIA states that the odds ratio in this case the ratio of shares is invariant to the addition or deletion of outcomes to the problem. It is exactlythis invariance of the ratio that underlies the commonly used zero replacementprocedure in compositional data analysis. In this paper we investigate using thenested logit model that does not embody IIA and an associated zero replacementprocedure and compare its performance with that of the more usual approach ofusing the multinomial logit model. Our comparisons exploit a data set that com-bines voting data by electoral division with corresponding census data for eachdivision for the 2001 Federal election in Australia
Resumo:
This analysis was stimulated by the real data analysis problem of householdexpenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that tryto add a small amount to the zero terms are not appropriate in general as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spendingexcluding alcohol/tobacco similar for teetotal and non-teetotal households?In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether teetotal will clearly depend on the household level variables, so we need to be able to model this dependence. The other tricky question is that with zeros on more than onecomponent, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be, for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durableswithin the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually be for situations where any non-zero expenditure is not small.While this analysis is based on around economic data, the ideas carry over tomany other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables)
Resumo:
Examples of compositional data. The simplex, a suitable sample space for compositional data and Aitchison's geometry. R, a free language and environment for statistical computing and graphics
Resumo:
Compositional data naturally arises from the scientific analysis of the chemicalcomposition of archaeological material such as ceramic and glass artefacts. Data of thistype can be explored using a variety of techniques, from standard multivariate methodssuch as principal components analysis and cluster analysis, to methods based upon theuse of log-ratios. The general aim is to identify groups of chemically similar artefactsthat could potentially be used to answer questions of provenance.This paper will demonstrate work in progress on the development of a documentedlibrary of methods, implemented using the statistical package R, for the analysis ofcompositional data. R is an open source package that makes available very powerfulstatistical facilities at no cost. We aim to show how, with the aid of statistical softwaresuch as R, traditional exploratory multivariate analysis can easily be used alongside, orin combination with, specialist techniques of compositional data analysis.The library has been developed from a core of basic R functionality, together withpurpose-written routines arising from our own research (for example that reported atCoDaWork'03). In addition, we have included other appropriate publicly availabletechniques and libraries that have been implemented in R by other authors. Availablefunctions range from standard multivariate techniques through to various approaches tolog-ratio analysis and zero replacement. We also discuss and demonstrate a smallselection of relatively new techniques that have hitherto been little-used inarchaeometric applications involving compositional data. The application of the libraryto the analysis of data arising in archaeometry will be demonstrated; results fromdifferent analyses will be compared; and the utility of the various methods discussed
Resumo:
”compositions” is a new R-package for the analysis of compositional and positive data.It contains four classes corresponding to the four different types of compositional andpositive geometry (including the Aitchison geometry). It provides means for computation,plotting and high-level multivariate statistical analysis in all four geometries.These geometries are treated in an fully analogous way, based on the principle of workingin coordinates, and the object-oriented programming paradigm of R. In this way,called functions automatically select the most appropriate type of analysis as a functionof the geometry. The graphical capabilities include ternary diagrams and tetrahedrons,various compositional plots (boxplots, barplots, piecharts) and extensive graphical toolsfor principal components. Afterwards, ortion and proportion lines, straight lines andellipses in all geometries can be added to plots. The package is accompanied by ahands-on-introduction, documentation for every function, demos of the graphical capabilitiesand plenty of usage examples. It allows direct and parallel computation inall four vector spaces and provides the beginner with a copy-and-paste style of dataanalysis, while letting advanced users keep the functionality and customizability theydemand of R, as well as all necessary tools to add own analysis routines. A completeexample is included in the appendix
Resumo:
We shall call an n × p data matrix fully-compositional if the rows sum to a constant, and sub-compositional if the variables are a subset of a fully-compositional data set1. Such data occur widely in archaeometry, where it is common to determine the chemical composition of ceramic, glass, metal or other artefacts using techniques such as neutron activation analysis (NAA), inductively coupled plasma spectroscopy (ICPS), X-ray fluorescence analysis (XRF) etc. Interest often centres on whether there are distinct chemical groups within the data and whether, for example, these can be associated with different origins or manufacturing technologies
Resumo:
Presentation in CODAWORK'03, session 4: Applications to archeometry
Resumo:
Developments in the statistical analysis of compositional data over the last twodecades have made possible a much deeper exploration of the nature of variability,and the possible processes associated with compositional data sets from manydisciplines. In this paper we concentrate on geochemical data sets. First we explainhow hypotheses of compositional variability may be formulated within the naturalsample space, the unit simplex, including useful hypotheses of subcompositionaldiscrimination and specific perturbational change. Then we develop through standardmethodology, such as generalised likelihood ratio tests, statistical tools to allow thesystematic investigation of a complete lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require specialconstruction. We comment on the use of graphical methods in compositional dataanalysis and on the ordination of specimens. The recent development of the conceptof compositional processes is then explained together with the necessary tools for astaying- in-the-simplex approach, namely compositional singular value decompositions. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major-oxide and rare-element compositions of metamorphosed limestones from the Northeast and Central Highlands of Scotland.Finally we point out a number of unresolved problems in the statistical analysis ofcompositional processes
Resumo:
R from http://www.r-project.org/ is ‘GNU S’ – a language and environment for statistical computingand graphics. The environment in which many classical and modern statistical techniques havebeen implemented, but many are supplied as packages. There are 8 standard packages and many moreare available through the cran family of Internet sites http://cran.r-project.org .We started to develop a library of functions in R to support the analysis of mixtures and our goal isa MixeR package for compositional data analysis that provides support foroperations on compositions: perturbation and power multiplication, subcomposition with or withoutresiduals, centering of the data, computing Aitchison’s, Euclidean, Bhattacharyya distances,compositional Kullback-Leibler divergence etc.graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features:barycenter, geometric mean of the data set, the percentiles lines, marking and coloring ofsubsets of the data set, theirs geometric means, notation of individual data in the set . . .dealing with zeros and missing values in compositional data sets with R procedures for simpleand multiplicative replacement strategy,the time series analysis of compositional data.We’ll present the current status of MixeR development and illustrate its use on selected data sets