980 results for Escuela Politécnica Superior
Abstract:
"compositions" is a new R package for the analysis of compositional and positive data. It contains four classes corresponding to the four different types of compositional and positive geometry (including the Aitchison geometry). It provides means for computation, plotting and high-level multivariate statistical analysis in all four geometries. These geometries are treated in a fully analogous way, based on the principle of working in coordinates and on the object-oriented programming paradigm of R. In this way, called functions automatically select the most appropriate type of analysis as a function of the geometry. The graphical capabilities include ternary diagrams and tetrahedrons, various compositional plots (boxplots, barplots, piecharts) and extensive graphical tools for principal components. Afterwards, portion and proportion lines, straight lines and ellipses in all geometries can be added to plots. The package is accompanied by a hands-on introduction, documentation for every function, demos of the graphical capabilities and plenty of usage examples. It allows direct and parallel computation in all four vector spaces and provides the beginner with a copy-and-paste style of data analysis, while letting advanced users keep the functionality and customizability they demand of R, as well as all necessary tools to add their own analysis routines. A complete example is included in the appendix.
Abstract:
We shall call an n × p data matrix fully-compositional if the rows sum to a constant, and sub-compositional if the variables are a subset of a fully-compositional data set. Such data occur widely in archaeometry, where it is common to determine the chemical composition of ceramic, glass, metal or other artefacts using techniques such as neutron activation analysis (NAA), inductively coupled plasma spectroscopy (ICPS), X-ray fluorescence analysis (XRF), etc. Interest often centres on whether there are distinct chemical groups within the data and whether, for example, these can be associated with different origins or manufacturing technologies.
Abstract:
Presentation in CODAWORK'03, session 4: Applications to archeometry
Abstract:
Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability, and the possible processes associated with compositional data sets from many disciplines. In this paper we concentrate on geochemical data sets. First we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of subcompositional discrimination and specific perturbational change. Then we develop through standard methodology, such as generalised likelihood ratio tests, statistical tools to allow the systematic investigation of a complete lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained together with the necessary tools for a staying-in-the-simplex approach, namely compositional singular value decompositions. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major-oxide and rare-element compositions of metamorphosed limestones from the Northeast and Central Highlands of Scotland. Finally we point out a number of unresolved problems in the statistical analysis of compositional processes.
Abstract:
The first discussion of compositional data analysis is attributable to Karl Pearson, in 1897. However, notwithstanding the recent developments on the algebraic structure of the simplex, more than twenty years after Aitchison's idea of log-transformations of closed data, the scientific literature is again full of statistical treatments of this type of data using traditional methodologies. This is particularly true in environmental geochemistry where, besides the problem of closure, the spatial structure (dependence) of the data has to be considered. In this work we propose the use of log-contrast values, obtained by a simplicial principal component analysis, as indicators of given environmental conditions. The investigation of the log-contrast frequency distributions allows pointing out the statistical laws able to generate the values and to govern their variability. The changes, if compared, for example, with the mean values of the random variables assumed as models, or other reference parameters, allow defining monitors to be used to assess the extent of possible environmental contamination. A case study on running and ground waters from Chiavenna Valley (Northern Italy), using Na+, K+, Ca2+, Mg2+, HCO3-, SO42- and Cl- concentrations, will be illustrated.
Abstract:
The use of perturbation and power transformation operations permits the investigation of linear processes in the simplex as in a vector space. When the investigated geochemical processes can be constrained by the use of a well-known starting point, the eigenvectors of the covariance matrix of a non-centred principal component analysis allow compositional changes to be modelled relative to a reference point. The results obtained for the chemistry of water collected in the River Arno (central-northern Italy) have opened new perspectives for considering relative changes of the analysed variables and for hypothesising the relative effect of the different physical-chemical processes at work, thus laying the basis for quantitative modelling.
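As an illustration of the two operations named in this abstract, the following minimal Python sketch implements perturbation and powering on closed compositions and traces a linear process from a reference point. The function names and the numerical values of the starting composition and of the direction are hypothetical; they are not taken from the River Arno study, and no principal component analysis is performed here.

```python
import math

def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def perturb(x, y):
    """Perturbation: component-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, y)])

def power(x, alpha):
    """Powering: raise every part to the power alpha, then close."""
    return closure([a ** alpha for a in x])

def clr(x):
    """Centred log-ratio transform: log of each part over the geometric mean."""
    mean_log = sum(math.log(a) for a in x) / len(x)
    return [math.log(a) - mean_log for a in x]

def linear_process(start, direction, t):
    """Point reached from `start` after moving t units along `direction`."""
    return perturb(start, power(direction, t))

# Hypothetical reference composition and direction (illustrative values only).
start = closure([0.2, 0.3, 0.5])
direction = closure([0.5, 0.3, 0.2])

path = [linear_process(start, direction, t) for t in (0.0, 0.5, 1.0)]
for point in path:
    print([round(p, 4) for p in point])
```

Under these assumptions the points of `path` lie on a straight line when expressed in clr coordinates, which is the sense in which such a process is linear in the simplex.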
Abstract:
Kriging is an interpolation technique whose optimality criteria are based on normality assumptions either for observed or for transformed data. This is the case for normal, lognormal and multigaussian kriging. When kriging is applied to transformed scores, optimality of the obtained estimators becomes a cumbersome concept: back-transformed optimal interpolations in transformed scores are not optimal in the original sample space, and vice versa. This lack of compatible criteria of optimality induces a variety of problems in both point and block estimates. For instance, lognormal kriging, widely used to interpolate positive variables, has no straightforward way to build consistent and optimal confidence intervals for estimates. These problems are ultimately linked to the assumed space structure of the data support: for instance, positive values, when modelled with lognormal distributions, are assumed to be embedded in the whole real space, with the usual real space structure and Lebesgue measure.
Abstract:
Study of the most relevant parameters in the creation of an agricultural and livestock company (project identification, founding team, market analysis, marketing plan, organisational plan, legal and tax plan, and economic and financial plan) in order to assess the viability of the company "Xai pigallat" in Ridaura (La Garrotxa).
Abstract:
Hydrogeological research usually includes some statistical studies devised to elucidate the mean background state, characterise relationships among different hydrochemical parameters, and show the influence of human activities. These goals are achieved either by means of a statistical approach or by mixing models between end-members. Compositional data analysis has proved to be effective with the first approach, but there is no commonly accepted solution to the end-member problem in a compositional framework. We present here a possible solution based on factor analysis of compositions, illustrated with a case study. We find two factors on the compositional bi-plot by fitting two non-centred orthogonal axes to the most representative variables. Each one of these axes defines a subcomposition, grouping those variables that lie nearest to it. With each subcomposition a log-contrast is computed and rewritten as an equilibrium equation. These two factors can be interpreted as the isometric log-ratio (ilr) coordinates of three hidden components, which can be plotted in a ternary diagram. These hidden components might be interpreted as end-members. We have analysed 14 molarities at 31 sampling stations all along the Llobregat River and its tributaries, with monthly measurements over two years. We have obtained a bi-plot explaining 57% of the total variance, from which we have extracted two factors: factor G, reflecting geological background enhanced by potash mining; and factor A, essentially controlled by urban and/or farming wastewater. Graphical representation of these two factors allows us to identify three extreme samples, corresponding to pristine waters, potash mining influence and urban sewage influence. To confirm this, we have analyses of the diffuse and widespread point sources identified in the area: springs, potash mining leachates, sewage, and fertilisers.
Each one of these sources shows a clear link with one of the extreme samples, except fertilisers, due to the heterogeneity of their composition. This approach is a useful tool to distinguish end-members and to characterise them, an issue that is generally difficult to solve. It is worth noting that the end-member composition cannot be fully estimated but only characterised through log-ratio relationships among components. Moreover, the influence of each end-member in a given sample must be evaluated in terms relative to the other samples. These limitations are intrinsic to the relative nature of compositional data.
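The correspondence between two real coordinates and a three-part composition mentioned in this abstract can be sketched in a few lines of Python. The orthonormal basis used below (one balance between parts 1 and 2, one between parts 1 and 2 together against part 3) is an assumption chosen for illustration; the abstract does not state which basis the authors used, and the function names and sample values are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT6 = math.sqrt(6.0)

def ilr3(x):
    """ilr coordinates of a 3-part composition for one assumed balance basis."""
    l1, l2, l3 = (math.log(a) for a in x)
    z1 = (l1 - l2) / SQRT2             # part 1 against part 2
    z2 = (l1 + l2 - 2.0 * l3) / SQRT6  # parts 1 and 2 against part 3
    return z1, z2

def ilr3_inverse(z1, z2):
    """Map two real coordinates back to a composition summing to 1."""
    # Rebuild the clr vector from the two orthonormal basis vectors.
    clr_values = [
        z1 / SQRT2 + z2 / SQRT6,
        -z1 / SQRT2 + z2 / SQRT6,
        -2.0 * z2 / SQRT6,
    ]
    expd = [math.exp(v) for v in clr_values]
    total = sum(expd)
    return [v / total for v in expd]

# Hypothetical factor scores for one sample (not values from the study).
hidden = ilr3_inverse(0.8, -0.4)
print([round(h, 4) for h in hidden])
print(ilr3(hidden))
```

Any pair of real scores maps to a valid point of the ternary diagram, and only ratios between the three parts are recovered, which is consistent with the limitation stated in the abstract that end-members are characterised through log-ratio relationships only.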
Abstract:
Hungary lies entirely within the Carpatho-Pannonian Region (CPR), a dominant tectonic unit of eastern Central Europe. The CPR consists of the Pannonian Basin system, and the arc of the Carpathian Mountains surrounding the lowlands in the north, east, and southeast. In the west, the CPR is bounded by the Eastern Alps, whereas in the south, by the Dinaridic belt. (...)
Abstract:
In standard multivariate statistical analysis common hypotheses of interest concern changes in mean vectors and subvectors. In compositional data analysis it is now well established that compositional change is most readily described in terms of the simplicial operation of perturbation and that subcompositions replace the marginal concept of subvectors. To motivate the statistical developments of this paper we present two challenging compositional problems from food production processes. Against this background the relevance of perturbations and subcompositions can be clearly seen. Moreover, we can identify a number of hypotheses of interest involving the specification of particular perturbations or differences between perturbations, and also hypotheses of subcompositional stability. We identify the two problems as being the counterparts of the analysis of paired comparison or split plot experiments and of separate sample comparative experiments in the jargon of standard multivariate analysis. We then develop appropriate estimation and testing procedures for a complete lattice of relevant compositional hypotheses.
Abstract:
R, available from http://www.r-project.org/, is 'GNU S', a language and environment for statistical computing and graphics. It is an environment in which many classical and modern statistical techniques have been implemented, many of them supplied as packages. There are 8 standard packages and many more are available through the CRAN family of Internet sites, http://cran.r-project.org . We started to develop a library of functions in R to support the analysis of mixtures, and our goal is a MixeR package for compositional data analysis that provides support for: operations on compositions (perturbation and power multiplication, subcomposition with or without residuals, centering of the data, computation of Aitchison, Euclidean and Bhattacharyya distances, compositional Kullback-Leibler divergence, etc.); graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features (barycenter, geometric mean of the data set, percentile lines, marking and coloring of subsets of the data set and their geometric means, annotation of individual data in the set, ...); dealing with zeros and missing values in compositional data sets, with R procedures for the simple and multiplicative replacement strategies; and the time series analysis of compositional data. We will present the current status of MixeR development and illustrate its use on selected data sets.
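Two of the operations listed in this abstract, the Aitchison distance and the multiplicative replacement of zeros, can be sketched as follows. This is a plain Python illustration and not MixeR code: the function names, the default replacement value `delta` and the sample data are all assumptions, and the actual interface of the package is not described in the abstract.

```python
import math

def closure(parts, total=1.0):
    """Rescale non-negative parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def clr(x):
    """Centred log-ratio transform of a composition with positive parts."""
    mean_log = sum(math.log(a) for a in x) / len(x)
    return [math.log(a) - mean_log for a in x]

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

def multiplicative_replacement(x, delta=0.005):
    """Replace zeros by `delta` and shrink the non-zero parts so the sum stays 1.

    `delta` is a hypothetical small value; in practice it would be chosen
    from the detection limit of each part.
    """
    closed = closure(x)
    n_zero = sum(1 for a in closed if a == 0)
    shrink = 1.0 - n_zero * delta
    return [delta if a == 0 else a * shrink for a in closed]

# Illustrative data: the second sample contains a zero.
sample_a = [10.0, 30.0, 60.0]
sample_b = [0.0, 25.0, 75.0]

repaired_b = multiplicative_replacement(sample_b)
print(repaired_b)
print(aitchison_distance(sample_a, repaired_b))
```

The multiplicative strategy is used in this sketch because it leaves the ratios between the non-zero parts unchanged, which a simple additive replacement followed by closure would not do.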
Abstract:
The statistical analysis of compositional data is commonly used in geological studies. As is well-known, compositions should be treated using logratios of parts, which are difficult to use correctly in standard statistical packages. In this paper we describe the new features of our freeware package, named CoDaPack, which implements most of the basic statistical methods suitable for compositional data. An example using real data is presented to illustrate the use of the package
Abstract:
Aitchison and Bacon-Shone (1999) considered convex linear combinations of compositions. In other words, they investigated compositions of compositions, where the mixing composition follows a logistic Normal distribution (or a perturbation process) and the compositions being mixed follow a logistic Normal distribution. In this paper, I investigate the extension to situations where the mixing composition varies with a number of dimensions. Examples would be where the mixing proportions vary with time or distance or a combination of the two. Practical situations include a river where the mixing proportions vary along the river, or across a lake and possibly with a time trend. This is illustrated with a dataset similar to that used in the Aitchison and Bacon-Shone paper, which looked at how pollution in a loch depended on the pollution in the three rivers that feed the loch. Here, I explicitly model the variation in the linear combination across the loch, assuming that the mean of the logistic Normal distribution depends on the river flows and relative distance from the source origins
Abstract:
The literature related to skew-normal distributions has grown rapidly in recent years, but at the moment few applications concern the description of natural phenomena with this type of probability model, or the interpretation of its parameters. The skew-normal family of distributions represents an extension of the normal family to which a parameter (λ) has been added to regulate the skewness. The development of this theoretical field has followed the general tendency in statistics towards more flexible methods that represent features of the data as adequately as possible and reduce unrealistic assumptions, such as the normality that underlies most methods of univariate and multivariate analysis. In this paper the shape of the frequency distribution of the logratio ln(Cl−/Na+), whose components are related to water composition, has been investigated for 26 wells. Samples have been collected around the active center of Vulcano island (Aeolian archipelago, southern Italy) from 1977 up to now, at time intervals of about six months. Data of the logratio have been tentatively modeled by evaluating the performance of the skew-normal model for each well. Values of the λ parameter have been compared by considering the temperature and spatial position of the sampling points. Preliminary results indicate that changes in λ values can be related to the nature of the environmental processes affecting the data.
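To make the role of λ concrete, the sketch below evaluates the standard skew-normal density, 2 φ(z) Φ(λz) with z = (x − location)/scale, together with its mean. The location and scale arguments and all function names are assumptions added for illustration; the abstract mentions only λ, and no values from the Vulcano wells are used.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def skew_normal_pdf(x, lam, loc=0.0, scale=1.0):
    """Skew-normal density with shape lam; lam = 0 gives the normal density."""
    z = (x - loc) / scale
    return 2.0 / scale * normal_pdf(z) * normal_cdf(lam * z)

def skew_normal_mean(lam, loc=0.0, scale=1.0):
    """Mean of the skew-normal distribution."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    return loc + scale * delta * math.sqrt(2.0 / math.pi)

# Hypothetical shape values: negative, zero and positive skewness.
for lam in (-3.0, 0.0, 3.0):
    print(lam, skew_normal_pdf(0.5, lam), skew_normal_mean(lam))
```

With λ = 0 the density reduces to the normal one, so fitting λ for each well, as the abstract describes, amounts to measuring how far the logratio distribution departs from symmetry and in which direction.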