994 resultados para Universitat de Girona -- Scolarships, fellowships, etc.
Resumo:
Modern methods of compositional data analysis are not well known in biomedical research. Moreover, there appear to be few mathematical and statistical researchers working on compositional biomedical problems. Like the earth and environmental sciences, biomedicine has many problems in which the relevant scienti c information is encoded in the relative abundance of key species or categories. I introduce three problems in cancer research in which analysis of compositions plays an important role. The problems involve 1) the classi cation of serum proteomic pro les for early detection of lung cancer, 2) inference of the relative amounts of di erent tissue types in a diagnostic tumor biopsy, and 3) the subcellular localization of the BRCA1 protein, and it's role in breast cancer patient prognosis. For each of these problems I outline a partial solution. However, none of these problems is \solved". I attempt to identify areas in which additional statistical development is needed with the hope of encouraging more compositional data analysts to become involved in biomedical research
Resumo:
A version of Matheron’s discrete Gaussian model is applied to cell composition data. The examples are for map patterns of felsic metavolcanics in two different areas. Q-Q plots of the model for cell values representing proportion of 10 km x 10 km cell area underlain by this rock type are approximately linear, and the line of best fit can be used to estimate the parameters of the model. It is also shown that felsic metavolcanics in the Abitibi area of the Canadian Shield can be modeled as a fractal
Resumo:
The biplot has proved to be a powerful descriptive and analytical tool in many areas of applications of statistics. For compositional data the necessary theoretical adaptation has been provided, with illustrative applications, by Aitchison (1990) and Aitchison and Greenacre (2002). These papers were restricted to the interpretation of simple compositional data sets. In many situations the problem has to be described in some form of conditional modelling. For example, in a clinical trial where interest is in how patients’ steroid metabolite compositions may change as a result of different treatment regimes, interest is in relating the compositions after treatment to the compositions before treatment and the nature of the treatments applied. To study this through a biplot technique requires the development of some form of conditional compositional biplot. This is the purpose of this paper. We choose as a motivating application an analysis of the 1992 US President ial Election, where interest may be in how the three-part composition, the percentage division among the three candidates - Bush, Clinton and Perot - of the presidential vote in each state, depends on the ethnic composition and on the urban-rural composition of the state. The methodology of conditional compositional biplots is first developed and a detailed interpretation of the 1992 US Presidential Election provided. We use a second application involving the conditional variability of tektite mineral compositions with respect to major oxide compositions to demonstrate some hazards of simplistic interpretation of biplots. Finally we conjecture on further possible applications of conditional compositional biplots
Resumo:
We propose to analyze shapes as “compositions” of distances in Aitchison geometry as an alternate and complementary tool to classical shape analysis, especially when size is non-informative. Shapes are typically described by the location of user-chosen landmarks. However the shape – considered as invariant under scaling, translation, mirroring and rotation – does not uniquely define the location of landmarks. A simple approach is to use distances of landmarks instead of the locations of landmarks them self. Distances are positive numbers defined up to joint scaling, a mathematical structure quite similar to compositions. The shape fixes only ratios of distances. Perturbations correspond to relative changes of the size of subshapes and of aspect ratios. The power transform increases the expression of the shape by increasing distance ratios. In analogy to the subcompositional consistency, results should not depend too much on the choice of distances, because different subsets of the pairwise distances of landmarks uniquely define the shape. Various compositional analysis tools can be applied to sets of distances directly or after minor modifications concerning the singularity of the covariance matrix and yield results with direct interpretations in terms of shape changes. The remaining problem is that not all sets of distances correspond to a valid shape. Nevertheless interpolated or predicted shapes can be backtransformated by multidimensional scaling (when all pairwise distances are used) or free geodetic adjustment (when sufficiently many distances are used)
Resumo:
We compare correspondance análisis to the logratio approach based on compositional data. We also compare correspondance análisis and an alternative approach using Hellinger distance, for representing categorical data in a contingency table. We propose a coefficient which globally measures the similarity between these approaches. This coefficient can be decomposed into several components, one component for each principal dimension, indicating the contribution of the dimensions to the difference between the two representations. These three methods of representation can produce quite similar results. One illustrative example is given
Resumo:
The use of orthonormal coordinates in the simplex and, particularly, balance coordinates, has suggested the use of a dendrogram for the exploratory analysis of compositional data. The dendrogram is based on a sequential binary partition of a compositional vector into groups of parts. At each step of a partition, one group of parts is divided into two new groups, and a balancing axis in the simplex between both groups is defined. The set of balancing axes constitutes an orthonormal basis, and the projections of the sample on them are orthogonal coordinates. They can be represented in a dendrogram-like graph showing: (a) the way of grouping parts of the compositional vector; (b) the explanatory role of each subcomposition generated in the partition process; (c) the decomposition of the total variance into balance components associated with each binary partition; (d) a box-plot of each balance. This representation is useful to help the interpretation of balance coordinates; to identify which are the most explanatory coordinates; and to describe the whole sample in a single diagram independently of the number of parts of the sample
Resumo:
The application of compositional data analysis through log ratio trans- formations corresponds to a multinomial logit model for the shares themselves. This model is characterized by the property of Independence of Irrelevant Alter- natives (IIA). IIA states that the odds ratio in this case the ratio of shares is invariant to the addition or deletion of outcomes to the problem. It is exactly this invariance of the ratio that underlies the commonly used zero replacement procedure in compositional data analysis. In this paper we investigate using the nested logit model that does not embody IIA and an associated zero replacement procedure and compare its performance with that of the more usual approach of using the multinomial logit model. Our comparisons exploit a data set that com- bines voting data by electoral division with corresponding census data for each division for the 2001 Federal election in Australia
Resumo:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent –essential zeros– or because it is below detection limit –rounded zeros. Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g. by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003) where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is “natural” in the sense that it recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in the same paper a substitution method for missing values on compositional data sets is introduced
Resumo:
Starting with logratio biplots for compositional data, which are based on the principle of subcompositional coherence, and then adding weights, as in correspondence analysis, we rediscover Lewi's spectral map and many connections to analyses of two-way tables of non-negative data. Thanks to the weighting, the method also achieves the property of distributional equivalence
Resumo:
The algebraic-geometric structure of the simplex, known as Aitchison geometry, is used to look at the Dirichlet family of distributions from a new perspective. A classical Dirichlet density function is expressed with respect to the Lebesgue measure on real space. We propose here to change this measure by the Aitchison measure on the simplex, and study some properties and characteristic measures of the resulting density
Resumo:
Examples of compositional data. The simplex, a suitable sample space for compositional data and Aitchison's geometry. R, a free language and environment for statistical computing and graphics
Resumo:
All of the imputation techniques usually applied for replacing values below the detection limit in compositional data sets have adverse effects on the variability. In this work we propose a modification of the EM algorithm that is applied using the additive log-ratio transformation. This new strategy is applied to a compositional data set and the results are compared with the usual imputation techniques
Resumo:
In the eighties, John Aitchison (1986) developed a new methodological approach for the statistical analysis of compositional data. This new methodology was implemented in Basic routines grouped under the name CODA and later NEWCODA inMatlab (Aitchison, 1997). After that, several other authors have published extensions to this methodology: Marín-Fernández and others (2000), Barceló-Vidal and others (2001), Pawlowsky-Glahn and Egozcue (2001, 2002) and Egozcue and others (2003). (...)
Resumo:
Compositional data naturally arises from the scientific analysis of the chemical composition of archaeological material such as ceramic and glass artefacts. Data of this type can be explored using a variety of techniques, from standard multivariate methods such as principal components analysis and cluster analysis, to methods based upon the use of log-ratios. The general aim is to identify groups of chemically similar artefacts that could potentially be used to answer questions of provenance. This paper will demonstrate work in progress on the development of a documented library of methods, implemented using the statistical package R, for the analysis of compositional data. R is an open source package that makes available very powerful statistical facilities at no cost. We aim to show how, with the aid of statistical software such as R, traditional exploratory multivariate analysis can easily be used alongside, or in combination with, specialist techniques of compositional data analysis. The library has been developed from a core of basic R functionality, together with purpose-written routines arising from our own research (for example that reported at CoDaWork'03). In addition, we have included other appropriate publicly available techniques and libraries that have been implemented in R by other authors. Available functions range from standard multivariate techniques through to various approaches to log-ratio analysis and zero replacement. We also discuss and demonstrate a small selection of relatively new techniques that have hitherto been little-used in archaeometric applications involving compositional data. The application of the library to the analysis of data arising in archaeometry will be demonstrated; results from different analyses will be compared; and the utility of the various methods discussed
Resumo:
”compositions” is a new R-package for the analysis of compositional and positive data. It contains four classes corresponding to the four different types of compositional and positive geometry (including the Aitchison geometry). It provides means for computation, plotting and high-level multivariate statistical analysis in all four geometries. These geometries are treated in an fully analogous way, based on the principle of working in coordinates, and the object-oriented programming paradigm of R. In this way, called functions automatically select the most appropriate type of analysis as a function of the geometry. The graphical capabilities include ternary diagrams and tetrahedrons, various compositional plots (boxplots, barplots, piecharts) and extensive graphical tools for principal components. Afterwards, ortion and proportion lines, straight lines and ellipses in all geometries can be added to plots. The package is accompanied by a hands-on-introduction, documentation for every function, demos of the graphical capabilities and plenty of usage examples. It allows direct and parallel computation in all four vector spaces and provides the beginner with a copy-and-paste style of data analysis, while letting advanced users keep the functionality and customizability they demand of R, as well as all necessary tools to add own analysis routines. A complete example is included in the appendix