1000 resultados para Estadística matemàtica -- Informàtica
Resumo:
The application of compositional data analysis through log ratio trans-formations corresponds to a multinomial logit model for the shares themselves.This model is characterized by the property of Independence of Irrelevant Alter-natives (IIA). IIA states that the odds ratio in this case the ratio of shares is invariant to the addition or deletion of outcomes to the problem. It is exactlythis invariance of the ratio that underlies the commonly used zero replacementprocedure in compositional data analysis. In this paper we investigate using thenested logit model that does not embody IIA and an associated zero replacementprocedure and compare its performance with that of the more usual approach ofusing the multinomial logit model. Our comparisons exploit a data set that com-bines voting data by electoral division with corresponding census data for eachdivision for the 2001 Federal election in Australia
Resumo:
Examples of compositional data. The simplex, a suitable sample space for compositional data and Aitchison's geometry. R, a free language and environment for statistical computing and graphics
Resumo:
Compositional data naturally arises from the scientific analysis of the chemicalcomposition of archaeological material such as ceramic and glass artefacts. Data of thistype can be explored using a variety of techniques, from standard multivariate methodssuch as principal components analysis and cluster analysis, to methods based upon theuse of log-ratios. The general aim is to identify groups of chemically similar artefactsthat could potentially be used to answer questions of provenance.This paper will demonstrate work in progress on the development of a documentedlibrary of methods, implemented using the statistical package R, for the analysis ofcompositional data. R is an open source package that makes available very powerfulstatistical facilities at no cost. We aim to show how, with the aid of statistical softwaresuch as R, traditional exploratory multivariate analysis can easily be used alongside, orin combination with, specialist techniques of compositional data analysis.The library has been developed from a core of basic R functionality, together withpurpose-written routines arising from our own research (for example that reported atCoDaWork'03). In addition, we have included other appropriate publicly availabletechniques and libraries that have been implemented in R by other authors. Availablefunctions range from standard multivariate techniques through to various approaches tolog-ratio analysis and zero replacement. We also discuss and demonstrate a smallselection of relatively new techniques that have hitherto been little-used inarchaeometric applications involving compositional data. The application of the libraryto the analysis of data arising in archaeometry will be demonstrated; results fromdifferent analyses will be compared; and the utility of the various methods discussed
Resumo:
”compositions” is a new R-package for the analysis of compositional and positive data.It contains four classes corresponding to the four different types of compositional andpositive geometry (including the Aitchison geometry). It provides means for computation,plotting and high-level multivariate statistical analysis in all four geometries.These geometries are treated in an fully analogous way, based on the principle of workingin coordinates, and the object-oriented programming paradigm of R. In this way,called functions automatically select the most appropriate type of analysis as a functionof the geometry. The graphical capabilities include ternary diagrams and tetrahedrons,various compositional plots (boxplots, barplots, piecharts) and extensive graphical toolsfor principal components. Afterwards, ortion and proportion lines, straight lines andellipses in all geometries can be added to plots. The package is accompanied by ahands-on-introduction, documentation for every function, demos of the graphical capabilitiesand plenty of usage examples. It allows direct and parallel computation inall four vector spaces and provides the beginner with a copy-and-paste style of dataanalysis, while letting advanced users keep the functionality and customizability theydemand of R, as well as all necessary tools to add own analysis routines. A completeexample is included in the appendix
Resumo:
R from http://www.r-project.org/ is ‘GNU S’ – a language and environment for statistical computingand graphics. The environment in which many classical and modern statistical techniques havebeen implemented, but many are supplied as packages. There are 8 standard packages and many moreare available through the cran family of Internet sites http://cran.r-project.org .We started to develop a library of functions in R to support the analysis of mixtures and our goal isa MixeR package for compositional data analysis that provides support foroperations on compositions: perturbation and power multiplication, subcomposition with or withoutresiduals, centering of the data, computing Aitchison’s, Euclidean, Bhattacharyya distances,compositional Kullback-Leibler divergence etc.graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features:barycenter, geometric mean of the data set, the percentiles lines, marking and coloring ofsubsets of the data set, theirs geometric means, notation of individual data in the set . . .dealing with zeros and missing values in compositional data sets with R procedures for simpleand multiplicative replacement strategy,the time series analysis of compositional data.We’ll present the current status of MixeR development and illustrate its use on selected data sets
Resumo:
The statistical analysis of compositional data is commonly used in geological studies.As is well-known, compositions should be treated using logratios of parts, which aredifficult to use correctly in standard statistical packages. In this paper we describe thenew features of our freeware package, named CoDaPack, which implements most of thebasic statistical methods suitable for compositional data. An example using real data ispresented to illustrate the use of the package
Resumo:
There are two principal chemical concepts that are important for studying the naturalenvironment. The first one is thermodynamics, which describes whether a system is atequilibrium or can spontaneously change by chemical reactions. The second main conceptis how fast chemical reactions (kinetics or rate of chemical change) take place wheneverthey start. In this work we examine a natural system in which both thermodynamics andkinetic factors are important in determining the abundance of NH+4 , NO−2 and NO−3 insuperficial waters. Samples were collected in the Arno Basin (Tuscany, Italy), a system inwhich natural and antrophic effects both contribute to highly modify the chemical compositionof water. Thermodynamical modelling based on the reduction-oxidation reactionsinvolving the passage NH+4 -& NO−2 -& NO−3 in equilibrium conditions has allowed todetermine the Eh redox potential values able to characterise the state of each sample and,consequently, of the fluid environment from which it was drawn. Just as pH expressesthe concentration of H+ in solution, redox potential is used to express the tendency of anenvironment to receive or supply electrons. In this context, oxic environments, as thoseof river systems, are said to have a high redox potential because O2 is available as anelectron acceptor.Principles of thermodynamics and chemical kinetics allow to obtain a model that oftendoes not completely describe the reality of natural systems. Chemical reactions may indeedfail to achieve equilibrium because the products escape from the site of the rectionor because reactions involving the trasformation are very slow, so that non-equilibriumconditions exist for long periods. Moreover, reaction rates can be sensitive to poorly understoodcatalytic effects or to surface effects, while variables as concentration (a largenumber of chemical species can coexist and interact concurrently), temperature and pressurecan have large gradients in natural systems. By taking into account this, data of 91water samples have been modelled by using statistical methodologies for compositionaldata. The application of log–contrast analysis has allowed to obtain statistical parametersto be correlated with the calculated Eh values. In this way, natural conditions in whichchemical equilibrium is hypothesised, as well as underlying fast reactions, are comparedwith those described by a stochastic approach
Resumo:
The R-package “compositions”is a tool for advanced compositional analysis. Its basicfunctionality has seen some conceptual improvement, containing now some facilitiesto work with and represent ilr bases built from balances, and an elaborated subsys-tem for dealing with several kinds of irregular data: (rounded or structural) zeroes,incomplete observations and outliers. The general approach to these irregularities isbased on subcompositions: for an irregular datum, one can distinguish a “regular” sub-composition (where all parts are actually observed and the datum behaves typically)and a “problematic” subcomposition (with those unobserved, zero or rounded parts, orelse where the datum shows an erratic or atypical behaviour). Systematic classificationschemes are proposed for both outliers and missing values (including zeros) focusing onthe nature of irregularities in the datum subcomposition(s).To compute statistics with values missing at random and structural zeros, a projectionapproach is implemented: a given datum contributes to the estimation of the desiredparameters only on the subcompositon where it was observed. For data sets withvalues below the detection limit, two different approaches are provided: the well-knownimputation technique, and also the projection approach.To compute statistics in the presence of outliers, robust statistics are adapted to thecharacteristics of compositional data, based on the minimum covariance determinantapproach. The outlier classification is based on four different models of outlier occur-rence and Monte-Carlo-based tests for their characterization. Furthermore the packageprovides special plots helping to understand the nature of outliers in the dataset.Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator,robustness, rounded zeros
Resumo:
All of the imputation techniques usually applied for replacing values below thedetection limit in compositional data sets have adverse effects on the variability. In thiswork we propose a modification of the EM algorithm that is applied using the additivelog-ratio transformation. This new strategy is applied to a compositional data set and theresults are compared with the usual imputation techniques
Resumo:
In the eighties, John Aitchison (1986) developed a new methodological approach for the statistical analysis of compositional data. This new methodology was implemented in Basic routines grouped under the name CODA and later NEWCODA inMatlab (Aitchison, 1997). After that, several other authors have published extensions to this methodology: Marín-Fernández and others (2000), Barceló-Vidal and others (2001), Pawlowsky-Glahn and Egozcue (2001, 2002) and Egozcue and others (2003). (...)
Resumo:
First application of compositional data analysis techniques to Australian election data
Resumo:
The application of compositional data analysis through log ratio trans- formations corresponds to a multinomial logit model for the shares themselves. This model is characterized by the property of Independence of Irrelevant Alter- natives (IIA). IIA states that the odds ratio in this case the ratio of shares is invariant to the addition or deletion of outcomes to the problem. It is exactly this invariance of the ratio that underlies the commonly used zero replacement procedure in compositional data analysis. In this paper we investigate using the nested logit model that does not embody IIA and an associated zero replacement procedure and compare its performance with that of the more usual approach of using the multinomial logit model. Our comparisons exploit a data set that com- bines voting data by electoral division with corresponding census data for each division for the 2001 Federal election in Australia
Resumo:
Examples of compositional data. The simplex, a suitable sample space for compositional data and Aitchison's geometry. R, a free language and environment for statistical computing and graphics
Resumo:
All of the imputation techniques usually applied for replacing values below the detection limit in compositional data sets have adverse effects on the variability. In this work we propose a modification of the EM algorithm that is applied using the additive log-ratio transformation. This new strategy is applied to a compositional data set and the results are compared with the usual imputation techniques
Resumo:
In the eighties, John Aitchison (1986) developed a new methodological approach for the statistical analysis of compositional data. This new methodology was implemented in Basic routines grouped under the name CODA and later NEWCODA inMatlab (Aitchison, 1997). After that, several other authors have published extensions to this methodology: Marín-Fernández and others (2000), Barceló-Vidal and others (2001), Pawlowsky-Glahn and Egozcue (2001, 2002) and Egozcue and others (2003). (...)