926 results for compositional geometry
Abstract:
Compositional data analysis motivated the introduction of a complete Euclidean structure in the simplex of D parts. This was based on the early work of J. Aitchison (1986) and completed recently when the Aitchison distance in the simplex was associated with an inner product and orthonormal bases were identified (Aitchison and others, 2002; Egozcue and others, 2003). A partition of the support of a random variable generates a composition by assigning the probability of each interval to a part of the composition. One can imagine that the partition can be refined and that the probability density would represent a kind of continuous composition of probabilities in a simplex of infinitely many parts. This intuitive idea would lead to a Hilbert space of probability densities by generalizing the Aitchison geometry for compositions in the simplex to the set of probability densities.
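As an illustration of the Euclidean structure mentioned above, the following is a minimal sketch of the centered log-ratio (clr) transform and of the Aitchison inner product and distance computed from it. The function names and the example compositions are assumptions made for illustration; they are not taken from the cited papers.

```python
import math

def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_inner(x, y):
    """Aitchison inner product, computed as the dot product of clr vectors."""
    return sum(a * b for a, b in zip(clr(x), clr(y)))

def aitchison_distance(x, y):
    """Aitchison distance, computed as the Euclidean distance of clr vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

# Hypothetical three-part compositions.
x = closure([1.0, 2.0, 7.0])
y = closure([2.0, 2.0, 6.0])

d = aitchison_distance(x, y)
# Rescaling one composition leaves the distance unchanged (scale invariance).
d_scaled = aitchison_distance([10.0 * v for v in x], y)
print(d, d_scaled)
```

The two printed values agree because the clr transform removes any common scale factor, which is why only relative information enters the distance.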
Abstract:
The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Central notions of statistics, such as information or likelihood, can be identified in the algebraic structure of A2(P), along with their corresponding notions in compositional data analysis, such as the Aitchison distance or the centered log-ratio transform. In this way very elaborate aspects of mathematical statistics can be understood easily in the light of a simple vector space structure and of compositional data analysis. For example, combinations of statistical information such as Bayesian updating, combination of likelihoods and robust M-estimation functions are simple additions/perturbations in A2(Pprior). Weighting observations corresponds to a weighted addition of the corresponding evidence. Likelihood-based statistics for general exponential families turn out to have a particularly easy interpretation in terms of A2(P). Regular exponential families form finite-dimensional linear subspaces of A2(P) and they correspond to finite-dimensional subspaces formed by their posterior in the dual information space A2(Pprior). The Aitchison norm can be identified with mean Fisher information. The closing constant itself is identified with a generalization of the cumulant function and shown to be the Kullback-Leibler directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback-Leibler information, and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P. The discussion of A2(P)-valued random variables, such as estimation functions or likelihoods, gives a further interpretation of Fisher information as the expected squared norm of evidence and a scale-free understanding of unbiased reasoning.
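The statement that Bayesian updating is a perturbation can be sketched in the simplest setting, a parameter with finitely many values, where the general space A2(P) reduces to the ordinary simplex. This restriction, the function names and the numbers below are assumptions for illustration only.

```python
import math

def closure(values):
    """Rescale positive values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def perturb(x, y):
    """Perturbation: component-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, y)])

def clr(x):
    """Centered log-ratio transform."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical prior over three parameter values and the likelihood
# of one observation under each of them.
prior = [0.5, 0.3, 0.2]
likelihood = [0.10, 0.40, 0.70]

# Bayes' theorem is the perturbation of the prior by the likelihood.
posterior = perturb(prior, likelihood)

# In clr coordinates the same update is a plain vector addition.
clr_sum = [a + b for a, b in zip(clr(prior), clr(likelihood))]
print(posterior)
print(clr(posterior), clr_sum)
```

The closing constant that `closure` divides by is the evidence (marginal likelihood), which is the quantity the abstract relates to the cumulant function in the general setting.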
Abstract:
Examples of compositional data. The simplex, a suitable sample space for compositional data and Aitchison's geometry. R, a free language and environment for statistical computing and graphics
Abstract:
"compositions" is a new R package for the analysis of compositional and positive data. It contains four classes corresponding to the four different types of compositional and positive geometry (including the Aitchison geometry). It provides means for computation, plotting and high-level multivariate statistical analysis in all four geometries. These geometries are treated in a fully analogous way, based on the principle of working in coordinates and on the object-oriented programming paradigm of R. In this way, called functions automatically select the most appropriate type of analysis as a function of the geometry. The graphical capabilities include ternary diagrams and tetrahedrons, various compositional plots (boxplots, barplots, piecharts) and extensive graphical tools for principal components. Afterwards, portion and proportion lines, straight lines and ellipses in all geometries can be added to plots. The package is accompanied by a hands-on introduction, documentation for every function, demos of the graphical capabilities and plenty of usage examples. It allows direct and parallel computation in all four vector spaces and provides the beginner with a copy-and-paste style of data analysis, while letting advanced users keep the functionality and customizability they demand of R, as well as all necessary tools to add their own analysis routines. A complete example is included in the appendix.
Abstract:
A joint distribution of two discrete random variables with finite support can be displayed as a two-way table of probabilities adding to one. Assume that this table has n rows and m columns and that all probabilities are non-null. This kind of table can be seen as an element in the simplex of n · m parts. In this context, the marginals are identified as compositional amalgams, and conditionals (rows or columns) as subcompositions. Also, simplicial perturbation appears as Bayes' theorem. However, the Euclidean elements of the Aitchison geometry of the simplex can also be translated into the table of probabilities: subspaces, orthogonal projections, distances. Two important questions are addressed: a) given a table of probabilities, which is the nearest independent table to the initial one? b) which is the largest orthogonal projection of a row onto a column? or, equivalently, which is the information in a row explained by a column, thus explaining the interaction? To answer these questions three orthogonal decompositions are presented: (1) by columns and a row-wise geometric marginal, (2) by rows and a column-wise geometric marginal, (3) by independent two-way tables and fully dependent tables representing row-column interaction. An important result is that the nearest independent table is the product of the two (row- and column-wise) geometric marginal tables. A corollary is that, in an independent table, the geometric marginals conform with the traditional (arithmetic) marginals. These decompositions can be compared with standard log-linear models. Key words: balance, compositional data, simplex, Aitchison geometry, composition, orthonormal basis, arithmetic and geometric marginals, amalgam, dependence measure, contingency table
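The result on the nearest independent table can be sketched numerically. The code below computes row- and column-wise geometric marginals, forms their product, and compares its Aitchison distance to the original table with that of the product of the arithmetic marginals. The table values and function names are hypothetical, and "geometric marginal" is taken here to mean the closed vector of row (or column) geometric means, which is an assumption about the paper's definition.

```python
import math

def closure(values):
    """Rescale positive values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def geometric_marginals(table):
    """Closed geometric means of each row and of each column."""
    rows = closure([geometric_mean(row) for row in table])
    cols = closure([geometric_mean(col) for col in zip(*table)])
    return rows, cols

def arithmetic_marginals(table):
    """Ordinary marginals: row sums and column sums."""
    return [sum(row) for row in table], [sum(col) for col in zip(*table)]

def independent_table(rows, cols):
    """Product table of two marginals that each sum to 1."""
    return [[r * c for c in cols] for r in rows]

def table_distance(t1, t2):
    """Aitchison distance between tables seen as n*m-part compositions."""
    def clr(flat):
        logs = [math.log(v) for v in flat]
        mean_log = sum(logs) / len(logs)
        return [v - mean_log for v in logs]
    a = clr([v for row in t1 for v in row])
    b = clr([v for row in t2 for v in row])
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical 2 x 3 table of strictly positive probabilities summing to 1.
table = [[0.20, 0.10, 0.05],
         [0.05, 0.25, 0.35]]

nearest = independent_table(*geometric_marginals(table))
naive = independent_table(*arithmetic_marginals(table))
print(table_distance(table, nearest), table_distance(table, naive))
```

In clr coordinates the independent tables are the additive (row effect plus column effect) tables, so the product of geometric marginals is the orthogonal projection onto that subspace; this is why its distance cannot exceed that of any other independent table.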
Abstract:
Simpson's paradox, also known as amalgamation or aggregation paradox, appears when dealing with proportions. Proportions are by construction parts of a whole, which can be interpreted as compositions assuming they only carry relative information. The Aitchison inner product space structure of the simplex, the sample space of compositions, explains the appearance of the paradox, given that amalgamation is a nonlinear operation within that structure. Here we propose to use balances, which are specific elements of this structure, to analyse situations where the paradox might appear. With the proposed approach we obtain that the centre of the tables analysed is a natural way to compare them, which avoids by construction the possibility of a paradox. Key words: Aitchison geometry, geometric mean, orthogonal projection
Abstract:
The amalgamation operation is frequently used to reduce the number of parts of compositional data but it is a non-linear operation in the simplex with the usual geometry, the Aitchison geometry. The concept of balances between groups, a particular coordinate system designed over binary partitions of the parts, could be an alternative to the amalgamation in some cases. In this work we discuss the proper application of both concepts using a real data set corresponding to behavioral measures of pregnant sows
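The contrast between amalgamation and balances can be sketched as follows. The code checks that amalgamating after a perturbation differs from perturbing the amalgamated compositions, while a balance between two groups of parts simply adds under perturbation. The balance formula used is the usual normalized log-ratio of group geometric means; the data, grouping and function names are hypothetical.

```python
import math

def closure(values):
    """Rescale positive values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def perturb(x, y):
    """Perturbation: component-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, y)])

def amalgamate(x, groups):
    """Sum the parts inside each group of indices."""
    return [sum(x[i] for i in group) for group in groups]

def balance(x, numerator, denominator):
    """sqrt(r*s/(r+s)) * ln(gm(numerator parts) / gm(denominator parts))."""
    r, s = len(numerator), len(denominator)
    mean_log_num = sum(math.log(x[i]) for i in numerator) / r
    mean_log_den = sum(math.log(x[i]) for i in denominator) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_num - mean_log_den)

# Hypothetical four-part compositions, grouped as {0, 1} versus {2, 3}.
x = closure([1.0, 2.0, 3.0, 4.0])
y = closure([4.0, 1.0, 2.0, 1.0])
groups = [[0, 1], [2, 3]]

# Amalgamation does not commute with perturbation (non-linear).
amalgam_of_perturbed = amalgamate(perturb(x, y), groups)
perturbed_amalgams = perturb(amalgamate(x, groups), amalgamate(y, groups))

# The balance between the same two groups is additive (linear).
balance_of_perturbed = balance(perturb(x, y), *groups)
sum_of_balances = balance(x, *groups) + balance(y, *groups)
print(amalgam_of_perturbed, perturbed_amalgams)
print(balance_of_perturbed, sum_of_balances)
```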
Abstract:
In this paper we examine the problem of compositional data from a different starting point. Chemical compositional data, as used in provenance studies on archaeological materials, will be approached from the standpoint of measurement theory. The results will show, in a very intuitive way, that chemical data can only be treated by using the approach developed for compositional data. It will be shown that compositional data analysis is a particular case in projective geometry, when the projective coordinates are in the positive orthant and have the properties of logarithmic interval metrics. Moreover, it will be shown that this approach can be extended to a very large number of applications, including shape analysis. This will be exemplified with a case study in the architecture of Early Christian churches dating back to the 5th-7th centuries AD.
Abstract:
A novel metric comparison of the appendicular skeleton (fore and hind limb) of different vertebrates using the Compositional Data Analysis (CDA) methodological approach is presented. 355 specimens belonging to various taxa of Dinosauria (Sauropodomorpha, Theropoda, Ornithischia and Aves) and Mammalia (Prototheria, Metatheria and Eutheria) were analyzed with CDA. A special focus has been put on sauropodomorph dinosaurs, and the Aitchison distance has been used as a measure of disparity in the proportions of limb elements to infer some aspects of functional morphology.
Abstract:
Soil aggregation is an index of soil structure measured by mean weight diameter (MWD) or scaling factors often interpreted as fragmentation fractal dimensions (D_f). However, the MWD provides a biased estimate of soil aggregation due to spurious correlations among aggregate-size fractions and scale-dependency. The scale-invariant D_f is based on weak assumptions to allow particle counts, is sensitive to the selection of the fractal domain, and may frequently exceed a value of 3, implying that D_f is a biased estimate of aggregation. Aggregation indices based on mass may be computed without bias using compositional analysis techniques. Our objective was to elaborate compositional indices of soil aggregation and to compare them to MWD and D_f using a published dataset describing the effect of 7 cropping systems on aggregation. Six aggregate-size fractions were arranged into a sequence of D-1 balances of building blocks that portray the process of soil aggregation. Isometric log-ratios (ilrs) are scale-invariant and orthogonal log contrasts or balances that possess the Euclidean geometry necessary to compute a distance between any two aggregation states, known as the Aitchison distance (A(x,y)). Close correlations (r>0.98) were observed between MWD, D_f, and the ilr when contrasting large and small aggregate sizes. Several unbiased embedded ilrs can characterize the heterogeneous nature of soil aggregates and be related to soil properties or functions. Soil bulk density and penetrometer resistance were closely related to A(x,y) with reference to bare fallow. The A(x,y) is easy to implement as an unbiased index of soil aggregation using standard sieving methods and may allow comparisons between studies. (C) 2012 Elsevier B.V. All rights reserved.
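A sequence of D-1 balances and the resulting distance can be sketched as below. The binary partition used here (each fraction against all finer ones) is an assumed example and not necessarily the partition chosen in the paper, and the sieve masses are hypothetical.

```python
import math

def pivot_balances(x):
    """D-1 orthonormal balances: each part against all the parts after it.

    For part k with s parts remaining after it, the balance is
    sqrt(s / (s + 1)) * (ln x_k - mean of ln of the remaining parts).
    """
    logs = [math.log(v) for v in x]
    coords = []
    for k in range(len(x) - 1):
        rest = logs[k + 1:]
        s = len(rest)
        coords.append(math.sqrt(s / (s + 1)) * (logs[k] - sum(rest) / s))
    return coords

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def clr(x):
    """Centered log-ratio transform, used only as a cross-check."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical masses (g) of six aggregate-size fractions, coarse to fine.
bare_fallow = [30.0, 25.0, 20.0, 12.0, 8.0, 5.0]
cropped = [10.0, 15.0, 20.0, 20.0, 20.0, 15.0]

ilr_fallow = pivot_balances(bare_fallow)
ilr_cropped = pivot_balances(cropped)

# Aitchison distance A(x, y) as the Euclidean distance between balances.
a_xy = euclidean(ilr_fallow, ilr_cropped)
# The same distance computed from clr coordinates.
a_xy_clr = euclidean(clr(bare_fallow), clr(cropped))
print(a_xy, a_xy_clr)
```

No closure step is needed before taking the balances, because each one is a log contrast and therefore unaffected by the total mass sieved.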
Abstract:
This paper is part of a special issue of Applied Geochemistry focusing on reliable applications of compositional multivariate statistical methods. This study outlines the application of compositional data analysis (CoDa) to calibration of geochemical data and multivariate statistical modelling of geochemistry and grain-size data from a set of Holocene sedimentary cores from the Ganges-Brahmaputra (G-B) delta. Over the last two decades, understanding near-continuous records of sedimentary sequences has required the use of core-scanning X-ray fluorescence (XRF) spectrometry, for both terrestrial and marine sedimentary sequences. Initial XRF data are generally unusable in 'raw format', requiring data processing in order to remove instrument bias, as well as informed sequence interpretation. The applicability of conventional calibration equations to core-scanning XRF data is further limited by the constraints posed by unknown measurement geometry and specimen homogeneity, as well as matrix effects. Log-ratio based calibration schemes have been developed and applied to clastic sedimentary sequences, focusing mainly on energy dispersive XRF (ED-XRF) core-scanning. This study has applied high-resolution core-scanning XRF to Holocene sedimentary sequences from the tide-dominated Indian Sundarbans (Ganges-Brahmaputra delta plain). The Log-Ratio Calibration Equation (LRCE) was applied to a sub-set of core-scan and conventional ED-XRF data to quantify elemental composition. This provides a robust calibration scheme using reduced major axis regression of log-ratio transformed geochemical data. Through partial least squares (PLS) modelling of geochemical and grain-size data, it is possible to derive robust proxy information for the Sundarbans depositional environment. The application of these techniques to Holocene sedimentary data offers an improved methodological framework for unravelling Holocene sedimentation patterns.
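Reduced major axis (RMA) regression of log-ratios can be sketched as follows. The abstract does not state the calibration equation, so the form assumed here, ln(W_j/W_ref) = alpha * ln(I_j/I_ref) + beta with W for concentrations and I for scanner intensities, is an assumption, as are the element choice, the numbers and the function names.

```python
import math
import statistics

def log_ratios(values, reference):
    """Natural log of each value divided by the matching reference value."""
    return [math.log(v / r) for v, r in zip(values, reference)]

def rma_fit(x, y):
    """Reduced major axis fit: slope = sign(cov) * sd(y) / sd(x)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sign = 1.0 if covariance >= 0 else -1.0
    slope = sign * statistics.stdev(y) / statistics.stdev(x)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical calibration samples: scanner counts and reference
# concentrations (wt%) for Ca, with Ti as the common denominator.
ca_counts = [1200.0, 1500.0, 900.0, 2000.0]
ti_counts = [300.0, 310.0, 280.0, 330.0]
ca_wt = [4.1, 5.0, 3.2, 6.4]
ti_wt = [0.50, 0.52, 0.47, 0.55]

x = log_ratios(ca_counts, ti_counts)   # ln(I_Ca / I_Ti)
y = log_ratios(ca_wt, ti_wt)           # ln(W_Ca / W_Ti)
alpha, beta = rma_fit(x, y)
print(alpha, beta)
```

Unlike ordinary least squares, the RMA slope treats both variables symmetrically, which suits a calibration where intensities and reference concentrations both carry measurement error.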
Abstract:
The objective of this study was to evaluate children's respiratory patterns in the mixed dentition, by means of acoustic rhinometry, and its relation to the upper arch width development. Fifty patients were examined, 25 females and 25 males with mean age of eight years and seven months. All of them were submitted to acoustic rhinometry and upper and lower arch impressions to obtain plaster models. The upper arch analysis was accomplished by measuring the interdental transverse distance of the upper teeth, deciduous canines (measurement 1), deciduous first molars (measurement 2), deciduous second molars (measurement 3) and the first molars (measurement 4). The results showed that an increased left nasal cavity area in females means an increased interdental distance of the deciduous first molars and deciduous second molars and an increased interdental distance of the deciduous canines, deciduous first and second molars in males. It was concluded that there is a correlation between the nasal cavity area and the upper arch transverse distance in the anterior and mid maxillary regions for both genders.
Abstract:
We present measurements of J/ψ yields in d + Au collisions at $\sqrt{s_{NN}} = 200$ GeV recorded by the PHENIX experiment and compare them with yields in p + p collisions at the same energy per nucleon-nucleon collision. The measurements cover a large kinematic range in J/ψ rapidity ($-2.2 < y < 2.4$) with high statistical precision and are compared with two theoretical models: one with nuclear shadowing combined with final state breakup and one with coherent gluon saturation effects. In order to remove model-dependent systematic uncertainties we also compare the data to a simple geometric model. The forward rapidity data are inconsistent with nuclear modifications that are linear or exponential in the density-weighted longitudinal thickness, such as those from the final state breakup of the bound state.
Abstract:
We have measured the azimuthal anisotropy of $\pi^0$ production for $1 < p_T < 18$ GeV/c for Au + Au collisions at $\sqrt{s_{NN}} = 200$ GeV. The observed anisotropy shows a gradual decrease for $3 \lesssim p_T \lesssim 7$-10 GeV/c, but remains positive beyond 10 GeV/c. The magnitude of this anisotropy is underpredicted, up to at least $\sim 10$ GeV/c, by current perturbative QCD (PQCD) energy-loss model calculations. An estimate of the increase in anisotropy expected from initial-geometry modification due to gluon saturation effects and fluctuations is insufficient to account for this discrepancy. Calculations that implement a path-length dependence steeper than what is implied by current PQCD energy-loss models show reasonable agreement with the data.
Abstract:
Milkfat-soybean oil blends were enzymatically interesterified (EIE) by Aspergillus niger lipase immobilized on SiO2-PVA hybrid composite in a solvent-free system. An experimental mixture design was used to study the effects of binary blends of milkfat-soybean oil (MF:SBO) at different proportions (0:100; 25:75; 33:67; 50:50; 67:33; 75:25; 100:0) on the compositional and textural properties of the EIE products, considering, as response variables, the interesterification yield (IY), consistency and hardness. Lipase-catalysed interesterification reactions increased the relative proportion of C46-C52 TAGs and decreased the concentrations of C40-C42 and C54 TAGs. The highest IY (10.8%) was attained for the EIE blend of MF:SBO 67:33, resulting in a more spreadable material at refrigerator temperature in comparison with butter, milkfat or the non-interesterified (NIE) blend. In this case, consistency and hardness values were at least 32% lower than values measured for butter. Thus, using A. niger lipase immobilized on SiO2-PVA improves the textural properties of milkfat and has potential for development of a product incorporating unsaturated and essential fatty acids from soybean oil. (C) 2010 Elsevier Ltd. All rights reserved.