9 resultados para Geodesic Compositions

em Universitat de Girona, Spain


Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose to analyze shapes as “compositions” of distances in Aitchison geometry as an alternate and complementary tool to classical shape analysis, especially when size is non-informative. Shapes are typically described by the location of user-chosen landmarks. However the shape – considered as invariant under scaling, translation, mirroring and rotation – does not uniquely define the location of landmarks. A simple approach is to use distances of landmarks instead of the locations of landmarks them self. Distances are positive numbers defined up to joint scaling, a mathematical structure quite similar to compositions. The shape fixes only ratios of distances. Perturbations correspond to relative changes of the size of subshapes and of aspect ratios. The power transform increases the expression of the shape by increasing distance ratios. In analogy to the subcompositional consistency, results should not depend too much on the choice of distances, because different subsets of the pairwise distances of landmarks uniquely define the shape. Various compositional analysis tools can be applied to sets of distances directly or after minor modifications concerning the singularity of the covariance matrix and yield results with direct interpretations in terms of shape changes. The remaining problem is that not all sets of distances correspond to a valid shape. Nevertheless interpolated or predicted shapes can be backtransformated by multidimensional scaling (when all pairwise distances are used) or free geodetic adjustment (when sufficiently many distances are used)

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Aitchison and Bacon-Shone (1999) considered convex linear combinations of compositions. In other words, they investigated compositions of compositions, where the mixing composition follows a logistic Normal distribution (or a perturbation process) and the compositions being mixed follow a logistic Normal distribution. In this paper, I investigate the extension to situations where the mixing composition varies with a number of dimensions. Examples would be where the mixing proportions vary with time or distance or a combination of the two. Practical situations include a river where the mixing proportions vary along the river, or across a lake and possibly with a time trend. This is illustrated with a dataset similar to that used in the Aitchison and Bacon-Shone paper, which looked at how pollution in a loch depended on the pollution in the three rivers that feed the loch. Here, I explicitly model the variation in the linear combination across the loch, assuming that the mean of the logistic Normal distribution depends on the river flows and relative distance from the source origins

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper examines a dataset which is modeled well by the Poisson-Log Normal process and by this process mixed with Log Normal data, which are both turned into compositions. This generates compositional data that has zeros without any need for conditional models or assuming that there is missing or censored data that needs adjustment. It also enables us to model dependence on covariates and within the composition

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The R-package “compositions”is a tool for advanced compositional analysis. Its basic functionality has seen some conceptual improvement, containing now some facilities to work with and represent ilr bases built from balances, and an elaborated subsys- tem for dealing with several kinds of irregular data: (rounded or structural) zeroes, incomplete observations and outliers. The general approach to these irregularities is based on subcompositions: for an irregular datum, one can distinguish a “regular” sub- composition (where all parts are actually observed and the datum behaves typically) and a “problematic” subcomposition (with those unobserved, zero or rounded parts, or else where the datum shows an erratic or atypical behaviour). Systematic classification schemes are proposed for both outliers and missing values (including zeros) focusing on the nature of irregularities in the datum subcomposition(s). To compute statistics with values missing at random and structural zeros, a projection approach is implemented: a given datum contributes to the estimation of the desired parameters only on the subcompositon where it was observed. For data sets with values below the detection limit, two different approaches are provided: the well-known imputation technique, and also the projection approach. To compute statistics in the presence of outliers, robust statistics are adapted to the characteristics of compositional data, based on the minimum covariance determinant approach. The outlier classification is based on four different models of outlier occur- rence and Monte-Carlo-based tests for their characterization. Furthermore the package provides special plots helping to understand the nature of outliers in the dataset. Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator, robustness, rounded zeros

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The aim of this talk is to convince the reader that there are a lot of interesting statistical problems in presentday life science data analysis which seem ultimately connected with compositional statistics. Key words: SAGE, cDNA microarrays, (1D-)NMR, virus quasispecies

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Theory of compositional data analysis is often focused on the composition only. However in practical applications we often treat a composition together with covariables with some other scale. This contribution systematically gathers and develop statistical tools for this situation. For instance, for the graphical display of the dependence of a composition with a categorical variable, a colored set of ternary diagrams might be a good idea for a first look at the data, but it will fast hide important aspects if the composition has many parts, or it takes extreme values. On the other hand colored scatterplots of ilr components could not be very instructive for the analyst, if the conventional, black-box ilr is used. Thinking on terms of the Euclidean structure of the simplex, we suggest to set up appropriate projections, which on one side show the compositional geometry and on the other side are still comprehensible by a non-expert analyst, readable for all locations and scales of the data. This is e.g. done by defining special balance displays with carefully- selected axes. Following this idea, we need to systematically ask how to display, explore, describe, and test the relation to complementary or explanatory data of categorical, real, ratio or again compositional scales. This contribution shows that it is sufficient to use some basic concepts and very few advanced tools from multivariate statistics (principal covariances, multivariate linear models, trellis or parallel plots, etc.) to build appropriate procedures for all these combinations of scales. This has some fundamental implications in their software implementation, and how might they be taught to analysts not already experts in multivariate analysis

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The preceding two editions of CoDaWork included talks on the possible consideration of densities as infinite compositions: Egozcue and D´ıaz-Barrero (2003) extended the Euclidean structure of the simplex to a Hilbert space structure of the set of densities within a bounded interval, and van den Boogaart (2005) generalized this to the set of densities bounded by an arbitrary reference density. From the many variations of the Hilbert structures available, we work with three cases. For bounded variables, a basis derived from Legendre polynomials is used. For variables with a lower bound, we standardize them with respect to an exponential distribution and express their densities as coordinates in a basis derived from Laguerre polynomials. Finally, for unbounded variables, a normal distribution is used as reference, and coordinates are obtained with respect to a Hermite-polynomials-based basis. To get the coordinates, several approaches can be considered. A numerical accuracy problem occurs if one estimates the coordinates directly by using discretized scalar products. Thus we propose to use a weighted linear regression approach, where all k- order polynomials are used as predictand variables and weights are proportional to the reference density. Finally, for the case of 2-order Hermite polinomials (normal reference) and 1-order Laguerre polinomials (exponential), one can also derive the coordinates from their relationships to the classical mean and variance. Apart of these theoretical issues, this contribution focuses on the application of this theory to two main problems in sedimentary geology: the comparison of several grain size distributions, and the comparison among different rocks of the empirical distribution of a property measured on a batch of individual grains from the same rock or sediment, like their composition

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Compositional data, also called multiplicative ipsative data, are common in survey research instruments in areas such as time use, budget expenditure and social networks. Compositional data are usually expressed as proportions of a total, whose sum can only be 1. Owing to their constrained nature, statistical analysis in general, and estimation of measurement quality with a confirmatory factor analysis model for multitrait-multimethod (MTMM) designs in particular are challenging tasks. Compositional data are highly non-normal, as they range within the 0-1 interval. One component can only increase if some other(s) decrease, which results in spurious negative correlations among components which cannot be accounted for by the MTMM model parameters. In this article we show how researchers can use the correlated uniqueness model for MTMM designs in order to evaluate measurement quality of compositional indicators. We suggest using the additive log ratio transformation of the data, discuss several approaches to deal with zero components and explain how the interpretation of MTMM designs di ers from the application to standard unconstrained data. We show an illustration of the method on data of social network composition expressed in percentages of partner, family, friends and other members in which we conclude that the faceto-face collection mode is generally superior to the telephone mode, although primacy e ects are higher in the face-to-face mode. Compositions of strong ties (such as partner) are measured with higher quality than those of weaker ties (such as other network members)

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Aquesta tesi estudia com estimar la distribució de les variables regionalitzades l'espai mostral i l'escala de les quals admeten una estructura d'espai Euclidià. Apliquem el principi del treball en coordenades: triem una base ortonormal, fem estadística sobre les coordenades de les dades, i apliquem els output a la base per tal de recuperar un resultat en el mateix espai original. Aplicant-ho a les variables regionalitzades, obtenim una aproximació única consistent, que generalitza les conegudes propietats de les tècniques de kriging a diversos espais mostrals: dades reals, positives o composicionals (vectors de components positives amb suma constant) són tractades com casos particulars. D'aquesta manera, es generalitza la geostadística lineal, i s'ofereix solucions a coneguts problemes de la no-lineal, tot adaptant la mesura i els criteris de representativitat (i.e., mitjanes) a les dades tractades. L'estimador per a dades positives coincideix amb una mitjana geomètrica ponderada, equivalent a l'estimació de la mediana, sense cap dels problemes del clàssic kriging lognormal. El cas composicional ofereix solucions equivalents, però a més permet estimar vectors de probabilitat multinomial. Amb una aproximació bayesiana preliminar, el kriging de composicions esdevé també una alternativa consistent al kriging indicador. Aquesta tècnica s'empra per estimar funcions de probabilitat de variables qualsevol, malgrat que sovint ofereix estimacions negatives, cosa que s'evita amb l'alternativa proposada. La utilitat d'aquest conjunt de tècniques es comprova estudiant la contaminació per amoníac a una estació de control automàtic de la qualitat de l'aigua de la conca de la Tordera, i es conclou que només fent servir les tècniques proposades hom pot detectar en quins instants l'amoni es transforma en amoníac en una concentració superior a la legalment permesa.