930 results for Compositional data analysis
Abstract:
Guava response to liming and fertilization can be monitored by tissue testing. The tissue nutrient signature is often diagnosed against nutrient concentration standards. However, this approach has been criticized for ignoring nutrient interactions and for generating numerical biases as a result of data redundancy, scale dependency and non-normal distribution. Techniques of compositional data analysis can control those biases by balancing groups of nutrients, such as those involved in liming and fertilization. Sequentially arranged, orthonormal isometric log ratios (ilr), or balances, avoid the numerical biases inherent to compositional data. The objectives were to relate tissue nutrient balances to the yield of 'Paluma' guava orchards differentially limed and fertilized, and to adjust current nutrient balance standards to the range found in the more productive guava trees. One 7-yr liming experiment and three 3-yr experiments with N, P and K were conducted in 'Paluma' orchards on an Oxisol. Plant N, P, K, Ca and Mg were monitored yearly. The [N, P, K | Ca, Mg], [N, P | K], [N | P] and [Ca | Mg] balances were selected to set apart the effects of liming (Ca-Mg) and fertilizers (N-K) on macronutrient balances. Liming largely influenced nutrient balances of guava in the Oxisol, while fertilization was less influential. The large range of guava yields and nutrient balances allowed defining balance ranges and comparing them with the critical ranges of nutrient concentration values currently used in Brazil, combined into ilr coordinates.
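The four balances named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the usual balance formula sqrt(rs/(r+s)) · ln(g(group 1)/g(group 2)), where g is the geometric mean and r, s are the group sizes. Placing the left-hand group in the numerator is an assumption about the sign convention, and the leaf concentrations are hypothetical.

```python
import math

def balance(comp, plus, minus):
    """ilr balance between two groups of parts.

    Assumption: the `plus` group goes in the numerator. The abstract does
    not state which side of "[A | B]" is the numerator.
    """
    r, s = len(plus), len(minus)
    mean_log_plus = sum(math.log(comp[k]) for k in plus) / r
    mean_log_minus = sum(math.log(comp[k]) for k in minus) / s
    # Difference of mean logs equals the log of the geometric-mean ratio.
    return math.sqrt(r * s / (r + s)) * (mean_log_plus - mean_log_minus)

# Hypothetical leaf concentrations (g/kg), for illustration only.
leaf = {"N": 22.0, "P": 1.6, "K": 17.0, "Ca": 9.0, "Mg": 2.5}

# The four balances listed in the abstract, as (left group, right group).
partition = [
    (("N", "P", "K"), ("Ca", "Mg")),
    (("N", "P"), ("K",)),
    (("N",), ("P",)),
    (("Ca",), ("Mg",)),
]

balances = {
    f"[{', '.join(p)} | {', '.join(m)}]": balance(leaf, p, m)
    for p, m in partition
}

for name, value in balances.items():
    print(f"{name}: {value:.3f}")
```

Because each balance is a log ratio, multiplying every concentration by the same factor (for example, changing units) leaves the values unchanged, which is the scale invariance the abstract refers to.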
Abstract:
Fertilizer recommendations for cranberry crops are guided by plant and soil tests. However, the critical tissue concentration ranges used for diagnostic purposes are inherently biased by nutrient interactions and physiological age. Compositional data analysis using isometric log ratios (ilr) of nutrients, together with time detrending, can avoid numerical biases. The objective was to derive unbiased nutrient signature standards for cranberry in Quebec and to compare those standards with literature data. Field trials were conducted during 3 consecutive years with varying P treatments at six commercial sites in Quebec. Leaf tissues were analyzed for N, P, K, Ca, Mg, B, Cu, Zn, Mn and Fe. The analytical results were transformed into ilr nutrient balances of parts and groups of parts. High-yield reference ilr values were computed for cranberry yielding more than 35 Mg ha⁻¹. Many cranberry fields appeared to be over-supplied with K and either under-supplied with Mn or over-supplied with Fe, as shown by their imbalanced [K | Ca, Mg] and [Mn | Fe] ratios. Nutrient concentration ranges from Maine and Wisconsin, USA, were combined into ilr values to generate ranges of balances. These nutrient ranges were either much too broad for application in Quebec or fell outside the Quebec ranges for the [Ca | Mg] and [Mn | Fe] balances, which were lower than those of high-yielding cranberry crops in Quebec.
Abstract:
Compositional data analysis motivated the introduction of a complete Euclidean structure in the simplex of D parts. This was based on the early work of J. Aitchison (1986) and was completed recently, when the Aitchison distance in the simplex was associated with an inner product and orthonormal bases were identified (Aitchison and others, 2002; Egozcue and others, 2003). A partition of the support of a random variable generates a composition by assigning the probability of each interval to a part of the composition. One can imagine that the partition can be refined, so that the probability density would represent a kind of continuous composition of probabilities in a simplex of infinitely many parts. This intuitive idea would lead to a Hilbert space of probability densities by generalizing the Aitchison geometry for compositions in the simplex to the set of probability densities.
Abstract:
The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Central notions of statistics, such as information or likelihood, can be identified in the algebraic structure of A2(P), together with their corresponding notions in compositional data analysis, such as the Aitchison distance or the centered log-ratio transform. In this way, very elaborate aspects of mathematical statistics can be understood easily in the light of a simple vector space structure and of compositional data analysis. For example, combinations of statistical information such as Bayesian updating, combination of likelihoods and robust M-estimation functions are simple additions/perturbations in A2(Pprior). Weighting observations corresponds to a weighted addition of the corresponding evidence. Likelihood-based statistics for general exponential families turn out to have a particularly easy interpretation in terms of A2(P). Regular exponential families form finite-dimensional linear subspaces of A2(P), and they correspond to finite-dimensional subspaces formed by their posteriors in the dual information space A2(Pprior). The Aitchison norm can be identified with mean Fisher information. The closing constant itself is identified with a generalization of the cumulant function and shown to be Kullback-Leibler directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback-Leibler information, and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P. The discussion of A2(P)-valued random variables, such as estimation functions or likelihoods, gives a further interpretation of Fisher information as the expected squared norm of evidence and a scale-free understanding of unbiased reasoning.
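The statement that Bayesian updating is a perturbation can be illustrated in the finite case. The sketch below assumes only three discrete hypotheses with made-up prior and likelihood values, whereas the abstract treats general spaces. It shows that the posterior is the closed elementwise product of prior and likelihood, and that this product becomes a plain vector addition after the centered log-ratio (clr) transform.

```python
import math

def closure(x):
    """Rescale positive values so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def perturb(x, y):
    """Perturbation in the simplex: closed elementwise product."""
    return closure([a * b for a, b in zip(x, y)])

def clr(x):
    """Centered log-ratio transform: logs minus their mean."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical prior and likelihood over three hypotheses (illustration only).
prior = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.4, 0.8]

# Bayes' rule: posterior is proportional to prior times likelihood.
posterior = perturb(prior, likelihood)

# In clr coordinates the update is an ordinary vector addition.
clr_sum = [a + b for a, b in zip(clr(prior), clr(likelihood))]

print(posterior)
print(clr(posterior))
print(clr_sum)
```

The normalizing constant of Bayes' rule disappears in clr coordinates because the clr transform is invariant to rescaling, which is why the likelihood does not need to sum to 1 here.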
Abstract:
The chemical composition of sediments and rocks, as well as their distribution on the Martian surface, represents a long-term archive of the processes that have formed the planetary surface. A survey of chemical compositions by means of compositional data analysis is a valuable tool for extracting direct evidence of weathering processes and makes it possible to quantify weathering and sedimentation rates. clr-biplot techniques are applied to visualize chemical relationships across the surface (“chemical maps”). The variability among individual suites of data is further analyzed by means of clr-PCA, in order to extract chemical alteration vectors between fresh rocks and their crusts and to assess the different source reservoirs accessible to soil formation. Both techniques are applied to elucidate the influence of remote weathering through combined analysis of several soil-forming branches. Vector analysis in the simplex provides the opportunity to study atmosphere-surface interactions, including the role and composition of volcanic gases.
Abstract:
There is hardly a case in exploration geology where the data studied do not include values below the detection limit and/or zero values, and since most geological data follow lognormal distributions, these “zero data” pose a mathematical challenge for interpretation. We need to start by recognizing that there are true zero values in geology. For example, the amount of quartz in a foyaite (nepheline syenite) is zero, since quartz cannot coexist with nepheline. Another common essential zero is a north azimuth, although that zero can always be replaced by the value of 360°. These are known as “essential zeros”, but what can we do with “rounded zeros”, which result from values below the detection limit of the equipment? Amalgamation, e.g. adding Na2O and K2O as total alkalis, is one solution, but sometimes we need to differentiate between sodic and potassic alteration. Pre-classification into groups requires good knowledge of the distribution of the data and of the geochemical characteristics of the groups, which is not always available. Setting the zero values equal to the detection limit of the equipment used will generate spurious distributions, especially in ternary diagrams. The same will occur if we replace the zero values by a small amount using non-parametric or parametric techniques (imputation). The method we propose takes into consideration the well-known relationships between some elements. For example, in copper porphyry deposits there is always a good direct correlation between copper and molybdenum values, but while copper will always be above the detection limit, many of the molybdenum values will be “rounded zeros”. We therefore take the lower quartile of the real molybdenum values, establish a regression equation with copper, and then estimate the “rounded zero” values of molybdenum from their corresponding copper values.
The method could be applied to any type of data, provided we first establish their correlation dependency. One of its main advantages is that we do not obtain a fixed value for the “rounded zeros”, but one that depends on the value of the other variable. Key words: compositional data analysis, treatment of zeros, essential zeros, rounded zeros, correlation dependency
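The regression-based replacement described above might be sketched as follows. Several details are assumptions, since the abstract does not give them: the regression is fitted on log-transformed values, "lower quartile" is read as the detected Mo values at or below a crude first quartile, and all names and data are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def impute_rounded_zeros(cu, mo, detection_limit):
    """Replace Mo values below the detection limit using a Cu regression."""
    detected = [(c, m) for c, m in zip(cu, mo) if m >= detection_limit]
    # Crude first quartile of the detected Mo values (assumption).
    mo_sorted = sorted(m for _, m in detected)
    q1 = mo_sorted[max(0, math.ceil(len(mo_sorted) / 4) - 1)]
    low = [(c, m) for c, m in detected if m <= q1]
    if len(low) < 2:
        raise ValueError("need at least two detected samples in the lower quartile")
    # Assumption: the regression is fitted on the log scale.
    slope, intercept = fit_line(
        [math.log(c) for c, _ in low],
        [math.log(m) for _, m in low],
    )
    return [
        math.exp(intercept + slope * math.log(c)) if m < detection_limit else m
        for c, m in zip(cu, mo)
    ]

# Hypothetical Cu and Mo values (ppm); 0 marks a rounded zero.
cu = [1000, 1200, 1500, 2000, 2500, 3000, 4000, 5000, 300, 400]
mo = [10, 12, 15, 20, 25, 30, 40, 50, 0, 0]

filled = impute_rounded_zeros(cu, mo, detection_limit=5)
print(filled)
```

Each rounded zero receives its own estimate that depends on the accompanying Cu value, which is the advantage the abstract claims over a fixed replacement value. This sketch does not check that the estimates stay below the detection limit; a real workflow would probably need that check.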
Abstract:
The Hardy-Weinberg law, formulated about 100 years ago, states that under certain assumptions, the three genotypes AA, AB and BB at a bi-allelic locus are expected to occur in the proportions p², 2pq and q² respectively, where p is the allele frequency of A, and q = 1 - p. Many statistical tests are used to check whether empirical marker data obey the Hardy-Weinberg principle. Among these are the classical chi-square test (with or without continuity correction), the likelihood ratio test, Fisher's exact test, and exact tests in combination with Monte Carlo and Markov chain algorithms. Tests for Hardy-Weinberg equilibrium (HWE) are numerical in nature, requiring the computation of a test statistic and a p-value. There is, however, ample room for the use of graphics in HWE tests, in particular for the ternary plot. Nowadays, many genetic studies use genetic markers known as Single Nucleotide Polymorphisms (SNPs). SNP data come in the form of counts, but from the counts one typically computes genotype frequencies and allele frequencies. These frequencies satisfy the unit-sum constraint, and their analysis therefore falls within the realm of compositional data analysis (Aitchison, 1986). SNPs are usually bi-allelic, which implies that the genotype frequencies can be adequately represented in a ternary plot. Compositions that are in exact HWE describe a parabola in the ternary plot. Compositions for which HWE cannot be rejected in a statistical test are typically “close” to the parabola, whereas compositions that differ significantly from HWE are “far”. By rewriting the statistics used to test for HWE in terms of heterozygote frequencies, acceptance regions for HWE can be obtained and depicted in the ternary plot. In this way, compositions can be tested for HWE purely on the basis of their position in the ternary plot (Graffelman & Morales, 2008).
This leads to clear graphical representations in which large numbers of SNPs can be tested for HWE in a single graph. Several examples of graphical tests for HWE (implemented in R software) will be shown, using SNP data from different human populations.
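The numerical side of the test can be sketched as below. It is an illustration only, not the R implementation mentioned in the abstract. It computes the classical chi-square statistic without continuity correction from genotype counts, and checks the parabola condition f_AB² = 4 · f_AA · f_BB, which follows from the proportions p², 2pq and q². The counts are hypothetical.

```python
def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square statistic for HWE, without continuity correction.

    Not defined for monomorphic markers (p = 0 or p = 1), where the
    expected counts contain zeros.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # allele frequency of A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def parabola_gap(f_aa, f_ab, f_bb):
    """Zero exactly when the genotype frequencies lie on the HWE parabola."""
    return f_ab ** 2 - 4 * f_aa * f_bb

# Critical value of the chi-square distribution with 1 df at the 5% level.
CRITICAL_5PCT = 3.841

# Hypothetical genotype counts (AA, AB, BB) for two SNPs.
in_equilibrium = hwe_chisq(25, 50, 25)
heterozygote_deficit = hwe_chisq(30, 40, 30)

print(in_equilibrium, in_equilibrium > CRITICAL_5PCT)
print(heterozygote_deficit, heterozygote_deficit > CRITICAL_5PCT)
print(parabola_gap(0.25, 0.50, 0.25))
```

The second marker lies below the parabola (fewer heterozygotes than expected) and its statistic exceeds the 5% critical value, so it would fall outside an acceptance region drawn around the parabola.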
Abstract:
The theory of compositional data analysis often focuses on the composition alone. In practical applications, however, we often treat a composition together with covariables measured on some other scale. This contribution systematically gathers and develops statistical tools for this situation. For instance, for the graphical display of the dependence of a composition on a categorical variable, a colored set of ternary diagrams might be a good idea for a first look at the data, but it will quickly hide important aspects if the composition has many parts or takes extreme values. On the other hand, colored scatterplots of ilr components may not be very instructive for the analyst if the conventional, black-box ilr is used. Thinking in terms of the Euclidean structure of the simplex, we suggest setting up appropriate projections which, on the one hand, show the compositional geometry and, on the other hand, are still comprehensible to a non-expert analyst and readable for all locations and scales of the data. This is done, for example, by defining special balance displays with carefully selected axes. Following this idea, we need to ask systematically how to display, explore, describe and test the relation to complementary or explanatory data of categorical, real, ratio or again compositional scales. This contribution shows that it is sufficient to use some basic concepts and very few advanced tools from multivariate statistics (principal covariances, multivariate linear models, trellis or parallel plots, etc.) to build appropriate procedures for all these combinations of scales. This has some fundamental implications for their software implementation and for how they might be taught to analysts who are not already experts in multivariate analysis.
Abstract:
A novel metric comparison of the appendicular skeleton (fore and hind limb) of different vertebrates using the Compositional Data Analysis (CDA) methodological approach is presented. A total of 355 specimens belonging to various taxa of Dinosauria (Sauropodomorpha, Theropoda, Ornithischia and Aves) and Mammalia (Prototheria, Metatheria and Eutheria) were analyzed with CDA. A special focus has been put on sauropodomorph dinosaurs, and the Aitchison distance has been used as a measure of disparity in limb element proportions to infer some aspects of functional morphology.
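As an illustration of how the Aitchison distance measures disparity in proportions, the sketch below computes it as the Euclidean distance between centered log-ratio (clr) vectors. The limb-element lengths are hypothetical and the choice of three elements is an assumption made for brevity, not the measurement scheme of the study.

```python
import math

def clr(x):
    """Centered log-ratio transform: logs minus their mean."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of two compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

# Hypothetical lengths (cm) of three limb elements in three specimens.
specimen_a = [30.0, 25.0, 10.0]
specimen_b = [60.0, 50.0, 20.0]   # twice as large, same proportions
specimen_c = [20.0, 30.0, 15.0]   # different proportions

print(aitchison_distance(specimen_a, specimen_b))
print(aitchison_distance(specimen_a, specimen_c))
```

Specimens that differ only in overall size are at distance zero, so the measure reflects differences in limb proportions rather than in body size.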
Abstract:
Guava response to liming and fertilization can be monitored by plant tissue analysis. The nutrient profile is defined in relation to nutrient concentration standards. However, standard nutrient concentrations are regularly criticized for not considering the interactions that occur between nutrients and for generating numerical biases arising from data redundancy, scale dependency and non-normal distribution. Compositional data analysis techniques can control these biases by balancing groups of nutrients, such as those involved in liming and fertilization. The use of sequentially arranged, orthonormal isometric log ratios (ilr) avoids the numerical biases inherent to compositional data. The objectives of this work were to relate the nutrient balance of plant tissues to the yield of 'Paluma' guava orchards differentially limed and fertilized, and to adjust current nutrient standards to the balance range of the most productive guava trees. One seven-year liming experiment and three three-year experiments with rates of N, P2O5 and K2O were conducted in 'Paluma' guava orchards on a Red-Yellow Latosol. Plant N, P, K, Ca and Mg concentrations were monitored annually. The [N, P, K | Ca, Mg], [N, P | K], [N | P] and [Ca | Mg] balances were selected to separate the effects of liming (Ca-Mg) and fertilizers (N-K) on macronutrient balances. The balances were influenced more by liming than by fertilization. Guava yield and nutrient balance allowed nutrient balance ranges to be defined and validated against the critical concentration ranges currently used in Brazil, combined into ilr coordinates.
Abstract:
Fertilization of guava relies on soil and tissue testing. Tissue tests are currently interpreted by comparing nutrient concentrations or dual ratios with critical values or ranges. The critical value approach is affected by nutrient interactions. Nutrient interactions can be described by dual ratios, where two nutrients are compressed into a single expression, or by ternary diagrams, where one redundant proportion can be computed as the difference between 100% and the sum of the other two. There are D(D-1) possible dual ratios in a D-part composition, and most of them are thus redundant. Nutrients are components of a mixture that convey relative, not absolute, information on the composition. There are D-1 balances between components or ingredients in any mixture. Compositional data are intrinsically redundant, scale dependent and non-normally distributed. Based on the principles of equilibrium and orthogonality, the nutrient balance concept projects D-1 isometric log ratio (ilr) coordinates into Euclidean space. The D-1 balances between groups of nutrients are ordered to reflect knowledge of plant physiology, soil fertility and crop management. Our objective was to evaluate the ilr approach using nutrient data from a guava orchard survey and fertilizer trials across the state of São Paulo, Brazil. Cationic balances varied widely between orchards. We found that the Redfield N/P ratio of 13 was critical for high guava yield. We present guava yield maps in ternary diagrams. Although the ratio between nutrients changing in the same direction with time is often assumed to be stationary, most guava nutrient balances and dual ratios were found to be non-stationary. The ilr model provided an unbiased nutrient diagnosis of guava. © ISHS.
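The counting argument above (D(D-1) dual ratios against D-1 balances) and the orthogonality of the balances can be checked with a short sketch. The five parts and the sequential binary partition used here are illustrative assumptions, not the partition used in the study. Each row of the sign matrix assigns +1 to the numerator group, -1 to the denominator group and 0 to parts that are not involved.

```python
import math
from itertools import permutations

parts = ["N", "P", "K", "Ca", "Mg"]
D = len(parts)

# Ordered pairs of distinct parts: D * (D - 1) dual ratios.
dual_ratios = list(permutations(parts, 2))

# Illustrative sequential binary partition (assumption): D - 1 rows.
sbp = [
    [+1, +1, +1, -1, -1],
    [+1, +1, -1, 0, 0],
    [+1, -1, 0, 0, 0],
    [0, 0, 0, +1, -1],
]

def contrast(signs):
    """Coefficients that turn log concentrations into one ilr balance."""
    r, s = signs.count(1), signs.count(-1)
    norm = math.sqrt(r * s / (r + s))
    return [norm / r if v == 1 else -norm / s if v == -1 else 0.0 for v in signs]

basis = [contrast(row) for row in sbp]

# Gram matrix of the contrasts: the identity if the balances are orthonormal.
gram = [[sum(a * b for a, b in zip(u, v)) for v in basis] for u in basis]

print(len(dual_ratios), len(basis))
for row in gram:
    print([round(v, 12) for v in row])
```

With five nutrients there are 20 ordered dual ratios but only four balances, and the Gram matrix confirms that these four carry non-overlapping information.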
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Potato is one of the most fertilizer-demanding crops and is generally grown on light soils with low reserves of N, P, K, Ca and Mg. These five elements are essential for plant growth, for achieving good yields and for obtaining good tuber quality at harvest and in storage. Finding a balance among these five elements is one of the challenges of precision agriculture practices. Collecting 168 potato leaf samples on a 40 m × 60 m grid (sampling density of 2.9 samples ha⁻¹) in a 54-ha potato field in Saguenay-Lac-Saint-Jean and determining their N, P, K, Ca and Mg contents at the early flowering stage, together with a chlorophyll index reading taken with the SPAD-502, established the following: among all foliar diagnostic indicators from the three known approaches (VMC, DRIS and CND), the log contrast between the two anionic-type nutrients (N and P) and the three cationic-type nutrients (K, Ca and Mg), denoted ilr (isometric log ratio) in CoDa (Compositional Data Analysis), is the indicator most strongly related to the SPAD-502 reading (r = 0.77). The spatial geostatistical study applied to CoDa showed great similarity between the CND-ilr (anions vs cations) and the SPAD-502 reading. This CND-ilr should be interpreted in terms of starter fertilization (N+P), in relation to cation inputs in the form of fertilizers (K, Ca and Mg) or amendments (Ca and Mg).