80 results for Compositional data analysis-roots in geosciences
Abstract:
A novel metric comparison of the appendicular skeleton (fore and hind limb) of different vertebrates using the Compositional Data Analysis (CDA) methodological approach is presented. A total of 355 specimens belonging to various taxa of Dinosauria (Sauropodomorpha, Theropoda, Ornithischia and Aves) and Mammalia (Prototheria, Metatheria and Eutheria) were analyzed with CDA. A special focus has been put on Sauropodomorpha dinosaurs, and the Aitchison distance has been used as a measure of disparity in limb element proportions to infer some aspects of functional morphology.
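As an illustrative aside (not part of the abstract), the Aitchison distance mentioned above can be computed as the Euclidean distance between centred log-ratio (clr) vectors. A minimal sketch in Python, with made-up limb-element proportions:

```python
import numpy as np

def closure(x):
    """Rescale a positive vector so its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def clr(x):
    """Centred log-ratio transform of a composition."""
    lx = np.log(closure(x))
    return lx - lx.mean()

def aitchison_distance(x, y):
    """Aitchison distance = Euclidean distance between clr vectors."""
    return np.linalg.norm(clr(x) - clr(y))

# Hypothetical limb-element proportions (humerus, radius, metacarpal)
fore_a = [0.50, 0.30, 0.20]
fore_b = [0.45, 0.35, 0.20]
print(aitchison_distance(fore_a, fore_b))
```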
Abstract:
In standard multivariate statistical analysis, common hypotheses of interest concern changes in mean vectors and subvectors. In compositional data analysis it is now well established that compositional change is most readily described in terms of the simplicial operation of perturbation, and that subcompositions replace the marginal concept of subvectors. To motivate the statistical developments of this paper we present two challenging compositional problems from food production processes. Against this background the relevance of perturbations and subcompositions can be clearly seen. Moreover, we can identify a number of hypotheses of interest involving the specification of particular perturbations or differences between perturbations, and also hypotheses of subcompositional stability. We identify the two problems as the counterparts of the analysis of paired comparison or split-plot experiments and of separate-sample comparative experiments, in the jargon of standard multivariate analysis. We then develop appropriate estimation and testing procedures for a complete lattice of relevant compositional hypotheses.
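To make the simplicial operations concrete, here is a minimal sketch of perturbation and subcomposition extraction; the food-composition values and the perturbation vector are invented for illustration:

```python
import numpy as np

def closure(x):
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def perturb(x, p):
    """Perturbation: component-wise product followed by closure."""
    return closure(np.asarray(x) * np.asarray(p))

def subcomposition(x, idx):
    """Subcomposition: select parts and re-close to unit sum."""
    return closure(np.asarray(x, dtype=float)[idx])

before = closure([0.62, 0.25, 0.13])        # e.g. fat, protein, carbohydrate
change = [1.10, 0.95, 0.98]                 # hypothetical process perturbation
after = perturb(before, change)
print(after)
print(subcomposition(after, [0, 1]))        # a 2-part subcomposition
```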
Abstract:
Most of the economic literature has presented its analysis under the assumption of a homogeneous capital stock. However, capital composition differs across countries. What has been the pattern of capital composition associated with the world's economies? We make an exploratory statistical analysis based on compositional data transformed by Aitchison log-ratio transformations, and we use tools for visualizing and measuring statistical estimators of association among the components. The goal is to detect distinctive patterns in the composition. As initial findings we can cite that:
1. Sectorial components behaved in a correlated way, building industries on one side and, less clearly, equipment industries on the other.
2. Full-sample estimation shows a negative correlation between the durable goods component and the other-buildings component, and between the transportation and building industries components.
3. Countries with zeros in some components are mainly low-income countries at the bottom of the income category; they behaved in an extreme way, distorting the main results observed in the full sample.
4. After removing these extreme cases, conclusions do not seem very sensitive to the presence of other isolated cases.
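A hedged sketch of the exploratory step described above: clr-transforming a compositional data matrix and inspecting association among components. The four-part shares below are hypothetical, not the paper's data, and clr correlation is only one simple association measure:

```python
import numpy as np

def clr(X):
    """Row-wise centred log-ratio transform of a compositional matrix."""
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=1, keepdims=True)

# Hypothetical capital-composition shares for four economies:
# (buildings, equipment, transportation, durable goods)
X = np.array([
    [0.40, 0.30, 0.20, 0.10],
    [0.35, 0.35, 0.18, 0.12],
    [0.50, 0.25, 0.15, 0.10],
    [0.30, 0.40, 0.22, 0.08],
])
Z = clr(X)
print(np.corrcoef(Z, rowvar=False))  # association among clr components
```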
Abstract:
Pounamu (NZ jade), or nephrite, is a protected mineral in its natural form following the transfer of ownership back to Ngai Tahu under the Ngai Tahu (Pounamu Vesting) Act 1997. Any theft of nephrite is prosecutable under the Crimes Act 1961. Scientific evidence is essential in cases where origin is disputed. A robust method for discrimination of this material through the use of elemental analysis and compositional data analysis is required. Initial studies have characterised the variability within a given nephrite source. This has included investigation of both in situ outcrops and alluvial material. Methods for the discrimination of two geographically close nephrite sources are being developed.
Key words: forensic, jade, nephrite, laser ablation, inductively coupled plasma mass spectrometry, multivariate analysis, elemental analysis, compositional data analysis
Abstract:
Hydrogeological research usually includes some statistical studies devised to elucidate the mean background state, characterise relationships among different hydrochemical parameters, and show the influence of human activities. These goals are achieved either by means of a statistical approach or by mixing models between end-members. Compositional data analysis has proved to be effective with the first approach, but there is no commonly accepted solution to the end-member problem in a compositional framework. We present here a possible solution, based on factor analysis of compositions, illustrated with a case study.
We find two factors on the compositional biplot by fitting two non-centred orthogonal axes to the most representative variables. Each of these axes defines a subcomposition, grouping those variables that lie nearest to it. With each subcomposition a log-contrast is computed and rewritten as an equilibrium equation. These two factors can be interpreted as the isometric log-ratio coordinates (ilr) of three hidden components, which can be plotted in a ternary diagram. These hidden components might be interpreted as end-members.
We have analysed 14 molarities at 31 sampling stations along the Llobregat River and its tributaries, measured monthly over two years. We have obtained a biplot with 57% of the total variance explained, from which we have extracted two factors: factor G, reflecting the geological background enhanced by potash mining; and factor A, essentially controlled by urban and/or farming wastewater. Graphical representation of these two factors allows us to identify three extreme samples, corresponding to pristine waters, potash mining influence and urban sewage influence. To confirm this, we have available analyses of the diffuse and widespread point sources identified in the area: springs, potash mining lixiviates, sewage, and fertilisers. Each of these sources shows a clear link with one of the extreme samples, except fertilisers, owing to the heterogeneity of their composition.
This approach is a useful tool to distinguish end-members and characterise them, an issue generally difficult to solve. It is worth noting that the end-member composition cannot be fully estimated, only characterised through log-ratio relationships among components. Moreover, the influence of each end-member in a given sample must be evaluated relative to the other samples. These limitations are intrinsic to the relative nature of compositional data.
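As a rough illustration of the log-contrast (balance) construction mentioned above, not the paper's actual factors, a balance between two groups of parts can be computed as follows; the ion grouping and concentrations are invented:

```python
import numpy as np

def balance(x, num_idx, den_idx):
    """Isometric log-ratio balance between two groups of parts."""
    x = np.asarray(x, dtype=float)
    r, s = len(num_idx), len(den_idx)
    gn = np.exp(np.mean(np.log(x[num_idx])))  # geometric mean, numerator group
    gd = np.exp(np.mean(np.log(x[den_idx])))  # geometric mean, denominator group
    return np.sqrt(r * s / (r + s)) * np.log(gn / gd)

# Hypothetical molar concentrations (Na, K, Cl, SO4, HCO3) at one station
sample = [1.2, 0.3, 1.5, 0.4, 2.1]
factor_G = balance(sample, [0, 1, 2], [4])  # e.g. mining-related ions vs HCO3
print(factor_G)
```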
Abstract:
The aim of this paper is to analyse the impact of university knowledge and technology transfer activities on academic research output. Specifically, we study whether researchers with collaborative links with the private sector publish less than their peers without such links, once controlling for other sources of heterogeneity. We report findings from a longitudinal dataset on researchers from two engineering departments in the UK between 1985 and 2006. Our results indicate that researchers with industrial links publish significantly more than their peers. Academic productivity, though, is higher for low levels of industry involvement than for high levels.
Abstract:
The objective of this paper is to examine whether informal labor markets affect the flows of Foreign Direct Investment (FDI), and whether this effect is similar in developed and developing countries. With this aim, different public data sources, such as the World Bank (WB) and the United Nations Conference on Trade and Development (UNCTAD), are used, and panel econometric models are estimated for a sample of 65 countries over a 14-year period (1996-2009). In addition, this paper uses a dynamic model as an extension of the analysis to establish whether such an effect exists and what its indicators and significance may be.
Abstract:
The theory of compositional data analysis often focuses on the composition only. However, in practical applications we often treat a composition together with covariables on some other scale. This contribution systematically gathers and develops statistical tools for this situation. For instance, for the graphical display of the dependence of a composition on a categorical variable, a colored set of ternary diagrams might be a good idea for a first look at the data, but it will quickly hide important aspects if the composition has many parts or takes extreme values. On the other hand, colored scatterplots of ilr components may not be very instructive for the analyst if the conventional, black-box ilr is used.
Thinking in terms of the Euclidean structure of the simplex, we suggest setting up appropriate projections which, on the one hand, show the compositional geometry and, on the other, are still comprehensible to a non-expert analyst and readable for all locations and scales of the data. This is done, for example, by defining special balance displays with carefully selected axes. Following this idea, we need to ask systematically how to display, explore, describe, and test the relation to complementary or explanatory data of categorical, real, ratio or again compositional scales.
This contribution shows that it is sufficient to use some basic concepts and very few advanced tools from multivariate statistics (principal covariances, multivariate linear models, trellis or parallel plots, etc.) to build appropriate procedures for all these combinations of scales. This has some fundamental implications for their software implementation, and for how they might be taught to analysts who are not already experts in multivariate analysis.
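A minimal sketch of one such display: balances of a composition plotted against a categorical covariable. The Dirichlet-simulated data, the balance axis, and the two-group covariable are assumptions for illustration only:

```python
import numpy as np
import matplotlib.pyplot as plt

def balance(x, num_idx, den_idx):
    """Balance (ilr coordinate) contrasting two groups of parts."""
    x = np.asarray(x, dtype=float)
    r, s = len(num_idx), len(den_idx)
    return np.sqrt(r * s / (r + s)) * (
        np.mean(np.log(x[num_idx])) - np.mean(np.log(x[den_idx])))

rng = np.random.default_rng(0)
groups = ["A", "B"]  # hypothetical categorical covariable
vals = [[balance(rng.dirichlet([5, 3, 2]), [0], [1, 2]) for _ in range(50)]
        for _ in groups]

plt.boxplot(vals)
plt.xticks([1, 2], groups)
plt.ylabel("balance: part 1 vs parts 2, 3")
plt.show()
```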
Abstract:
Compositional data analysis motivated the introduction of a complete Euclidean structure in the simplex of D parts. This was based on the early work of J. Aitchison (1986) and completed recently when the Aitchison distance in the simplex was associated with an inner product and orthonormal bases were identified (Aitchison and others, 2002; Egozcue and others, 2003). A partition of the support of a random variable generates a composition by assigning the probability of each interval to a part of the composition. One can imagine that the partition can be refined, so that the probability density would represent a kind of continuous composition of probabilities in a simplex of infinitely many parts. This intuitive idea leads to a Hilbert space of probability densities, obtained by generalizing the Aitchison geometry for compositions in the simplex to the set of probability densities.
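The partition construction described above is easy to make concrete. A short sketch, assuming a standard normal density purely for illustration:

```python
import numpy as np
from scipy.stats import norm

# Partition the support of a N(0,1) variable into intervals and assign
# each interval's probability to a part, giving a composition.
edges = np.array([-np.inf, -1.0, 0.0, 1.0, np.inf])
probs = np.diff(norm.cdf(edges))        # a 4-part composition, sums to 1
print(probs)

# Refining the partition refines the composition, motivating the
# infinite-dimensional (Hilbert space) limit described above.
fine_edges = np.concatenate(([-np.inf], np.linspace(-3, 3, 25), [np.inf]))
fine_probs = np.diff(norm.cdf(fine_edges))
print(fine_probs.sum())                 # still a unit-sum "composition"
```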
Abstract:
Simpson's paradox, also known as the amalgamation or aggregation paradox, appears when dealing with proportions. Proportions are by construction parts of a whole, which can be interpreted as compositions, assuming they only carry relative information. The Aitchison inner-product space structure of the simplex, the sample space of compositions, explains the appearance of the paradox, given that amalgamation is a nonlinear operation within that structure. Here we propose to use balances, which are specific elements of this structure, to analyse situations where the paradox might appear. With the proposed approach, we find that the centre of the tables analysed is a natural way to compare them, which avoids by construction the possibility of a paradox.
Key words: Aitchison geometry, geometric mean, orthogonal projection
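A small numeric illustration of the paradox itself (the counts are invented): each stratum favours treatment A, yet amalgamation favours B:

```python
# (successes, trials) per stratum for two hypothetical treatments
a = {"s1": (18, 20), "s2": (2, 10)}
b = {"s1": (85, 100), "s2": (1, 10)}

for s in ("s1", "s2"):
    # Within each stratum, A has the higher success rate.
    print(s, a[s][0] / a[s][1], b[s][0] / b[s][1])

rate = lambda d: sum(v[0] for v in d.values()) / sum(v[1] for v in d.values())
# After amalgamation the ordering reverses: Simpson's paradox.
print("amalgamated:", rate(a), rate(b))
```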
Abstract:
There is hardly a case in exploration geology where the studied data do not include below-detection-limit and/or zero values, and since most geological data follow lognormal distributions, these "zero data" represent a mathematical challenge for interpretation.
We need to start by recognizing that there are zero values in geology. For example, the amount of quartz in a foyaite (nepheline syenite) is zero, since quartz cannot coexist with nepheline. Another common essential zero is a north azimuth; however, we can always replace that zero with the value 360°. These are known as "essential zeros", but what can we do with "rounded zeros" that result from values below the detection limit of the equipment?
Amalgamation, e.g. adding Na2O and K2O as total alkalis, is a solution, but sometimes we need to differentiate between a sodic and a potassic alteration. Pre-classification into groups requires a good knowledge of the distribution of the data and the geochemical characteristics of the groups, which is not always available. Setting the zero values equal to the detection limit of the equipment used will generate spurious distributions, especially in ternary diagrams. The same occurs if we replace the zero values with a small amount using non-parametric or parametric techniques (imputation).
The method that we propose takes into consideration the well-known relationships between some elements. For example, in copper porphyry deposits there is always a good direct correlation between copper and molybdenum values, but while copper will always be above the detection limit, many of the molybdenum values will be "rounded zeros". We therefore take the lower quartile of the real molybdenum values and establish a regression equation with copper, and then estimate the rounded-zero values of molybdenum from their corresponding copper values.
The method can be applied to any type of data, provided we first establish their correlation dependency. One of the main advantages of this method is that we do not obtain a fixed value for the "rounded zeros", but one that depends on the value of the other variable.
Key words: compositional data analysis, treatment of zeros, essential zeros, rounded zeros, correlation dependency
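A minimal sketch of the proposed regression-based replacement, with invented Cu/Mo assays; for brevity it fits on all detected pairs rather than only the lower quartile the abstract specifies:

```python
import numpy as np

# Hypothetical Cu and Mo assays (ppm); 0.0 marks Mo below detection limit.
cu = np.array([1200.0, 800.0, 1500.0, 600.0, 2000.0, 950.0, 400.0, 1800.0])
mo = np.array([  30.0,  18.0,    0.0,  12.0,   55.0,  22.0,   0.0,   48.0])

detected = mo > 0
# Log-log regression of Mo on Cu using the detected values.
slope, intercept = np.polyfit(np.log(cu[detected]), np.log(mo[detected]), 1)

# Replace each rounded zero by its Cu-dependent prediction
# rather than by a fixed constant.
mo_imputed = mo.copy()
mo_imputed[~detected] = np.exp(intercept + slope * np.log(cu[~detected]))
print(mo_imputed)
```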
Abstract:
The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Central notions of statistics, such as information or likelihood, can be identified in the algebraic structure of A2(P), along with their corresponding notions in compositional data analysis, such as the Aitchison distance or the centred log-ratio transform. In this way, very elaborate aspects of mathematical statistics can be understood easily in the light of a simple vector space structure and of compositional data analysis. For example, combinations of statistical information, such as Bayesian updating or the combination of likelihood and robust M-estimation functions, are simple additions/perturbations in A2(P_prior). Weighting observations corresponds to a weighted addition of the corresponding evidence.
Likelihood-based statistics for general exponential families turn out to have a particularly easy interpretation in terms of A2(P). Regular exponential families form finite-dimensional linear subspaces of A2(P), and they correspond to finite-dimensional subspaces formed by their posteriors in the dual information space A2(P_prior). The Aitchison norm can be identified with mean Fisher information. The closing constant itself is identified with a generalization of the cumulant function and shown to be the Kullback-Leibler directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback-Leibler information, and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P.
The discussion of A2(P)-valued random variables, such as estimation functions or likelihoods, gives a further interpretation of Fisher information as the expected squared norm of evidence, and a scale-free understanding of unbiased reasoning.
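One concrete instance of "combination of information as perturbation": for a discretised prior and likelihood, Bayesian updating is exactly the Aitchison perturbation (component-wise product followed by closure). A sketch with invented numbers:

```python
import numpy as np

def closure(x):
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def perturb(p, q):
    """Aitchison perturbation: component-wise product, then closure."""
    return closure(np.asarray(p) * np.asarray(q))

# Discretised prior over three hypotheses and a likelihood for one observation.
prior = closure([0.5, 0.3, 0.2])
likelihood = [0.10, 0.60, 0.30]          # hypothetical p(x | theta_i)

# Bayes' rule: posterior ∝ prior × likelihood, i.e. a perturbation.
posterior = perturb(prior, likelihood)
bayes = closure(np.multiply(prior, likelihood))
print(np.allclose(posterior, bayes))     # True: updating is "addition" here
```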
Abstract:
Subcompositional coherence is a fundamental property of Aitchison's approach to compositional data analysis, and is the principal justification for using ratios of components. We maintain, however, that lack of subcompositional coherence, that is, incoherence, can be measured in an attempt to evaluate whether any given technique is close enough, for all practical purposes, to being subcompositionally coherent. This opens up the field to alternative methods, which might be better suited to cope with problems such as data zeros and outliers while being only slightly incoherent. The measure that we propose is based on the distance between components. We show that two-part subcompositions, which appear to be the most sensitive to subcompositional incoherence, can be used to establish a distance matrix that can be directly compared with the pairwise distances in the full composition. The closeness of these two matrices can be quantified using a stress measure that is common in multidimensional scaling, providing a measure of subcompositional incoherence. The approach is illustrated using power-transformed correspondence analysis, which has already been shown to converge to log-ratio analysis as the power transform tends to zero.
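A schematic of the idea (not the authors' exact formulation): compare a technique's pairwise component distances in the full composition with those in re-closed two-part subcompositions, and summarise the discrepancy with an MDS-type stress. The power-transform column distance below is a stand-in illustration:

```python
import numpy as np
from itertools import combinations

def closure(X):
    X = np.asarray(X, dtype=float)
    return X / X.sum(axis=1, keepdims=True)

def col_dist(X, j, k, alpha=0.5):
    """Distance between components j, k under a power transform
    (as alpha -> 0 this kind of analysis approaches log-ratio analysis)."""
    P = closure(X) ** alpha
    return np.linalg.norm(P[:, j] - P[:, k])

# Hypothetical 3-part compositions for three samples.
X = np.array([[0.6, 0.3, 0.1],
              [0.4, 0.4, 0.2],
              [0.2, 0.5, 0.3]])
D = X.shape[1]

d_full, d_sub = [], []
for j, k in combinations(range(D), 2):
    d_full.append(col_dist(X, j, k))
    d_sub.append(col_dist(X[:, [j, k]], 0, 1))  # re-closed 2-part subcomposition

d_full, d_sub = np.array(d_full), np.array(d_sub)
stress = np.sqrt(np.sum((d_full - d_sub) ** 2) / np.sum(d_full ** 2))
print(stress)  # 0 would indicate exact subcompositional coherence
```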