436 results for ESTADÍSTICA


Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper is a first draft of the principle of statistical modelling on coordinates. Several causes, which would take too long to detail, have led to this situation close to the deadline for submitting papers to CODAWORK’03. The main one is the fast development of the approach over the last months, which has made previous drafts obsolete. The present paper contains the essential parts of the state of the art of this approach from my point of view. I would like to acknowledge many clarifying discussions with the group of people working in this field in Girona, Barcelona, Carrick Castle, Firenze, Berlin, Göttingen, and Freiberg. They have given a lot of suggestions and ideas. Nevertheless, there might still be errors or unclear aspects, which are exclusively my fault. I hope this contribution serves as a basis for further discussions and new developments.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The simplex, the sample space of compositional data, can be structured as a real Euclidean space. This allows us to work with the coefficients with respect to an orthonormal basis. Over these coefficients we apply standard real analysis; in particular, we define two different laws of probability through the density function and we study their main properties.
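As an illustration of working with coefficients in an orthonormal basis, the following is a minimal sketch, not the authors' construction. It assumes one particular basis (a pivot, Helmert-type one) and checks that the Euclidean distance between coordinate vectors matches the Aitchison distance computed from centred log-ratios. The function names and sample compositions are hypothetical.

```python
import math

def ilr(x):
    """Coordinates of a positive composition in an assumed pivot orthonormal basis."""
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(1, len(logs)):
        # Contrast the mean log of the first i parts with the log of part i+1.
        mean_first = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1)) * (mean_first - logs[i]))
    return coords

def aitchison_distance(x, y):
    """Aitchison distance computed from centred log-ratios."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    return math.sqrt(sum(((a - mx) - (b - my)) ** 2 for a, b in zip(lx, ly)))

def euclidean_distance(u, v):
    """Ordinary Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical three-part compositions.
x = [0.2, 0.3, 0.5]
y = [0.1, 0.6, 0.3]
print(euclidean_distance(ilr(x), ilr(y)), aitchison_distance(x, y))
```

Once the data are expressed in such coordinates, standard real-valued tools (densities, means, covariances) can be applied to them directly.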

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Traditionally, compositional data has been identified with closed data, and the simplex has been considered as the natural sample space of this kind of data. In our opinion, the emphasis on the constrained nature of compositional data has contributed to masking its real nature. More crucial than the constraining property of compositional data is the scale-invariant property of this kind of data. Indeed, when we are considering only a few parts of a full composition we are not working with constrained data, but our data are still compositional. We believe that it is necessary to give a more precise definition of composition. This is the aim of this oral contribution.
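The scale-invariance argument can be made concrete with a small sketch. The raw amounts and helper names below are hypothetical. The ratio between two parts is the same whether it is read from the raw amounts, from the closed full composition, or from a closed subcomposition containing only those parts.

```python
import math

def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def subcomposition(parts, indices):
    """Keep only the selected parts and re-close them."""
    return closure([parts[i] for i in indices])

# Hypothetical raw amounts in arbitrary units (not closed to a constant sum).
raw = [12.0, 30.0, 18.0, 60.0]
full = closure(raw)
sub = subcomposition(raw, [0, 1])

ratio_raw = raw[0] / raw[1]
ratio_full = full[0] / full[1]
ratio_sub = sub[0] / sub[1]
print(ratio_raw, ratio_full, ratio_sub)
```

The constant-sum constraint appears only after closure, while the ratios carry the same information in all three representations.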

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The biplot has proved to be a powerful descriptive and analytical tool in many areas of applications of statistics. For compositional data the necessary theoretical adaptation has been provided, with illustrative applications, by Aitchison (1990) and Aitchison and Greenacre (2002). These papers were restricted to the interpretation of simple compositional data sets. In many situations the problem has to be described in some form of conditional modelling. For example, in a clinical trial where interest is in how patients’ steroid metabolite compositions may change as a result of different treatment regimes, interest is in relating the compositions after treatment to the compositions before treatment and the nature of the treatments applied. To study this through a biplot technique requires the development of some form of conditional compositional biplot. This is the purpose of this paper. We choose as a motivating application an analysis of the 1992 US Presidential Election, where interest may be in how the three-part composition, the percentage division among the three candidates (Bush, Clinton and Perot) of the presidential vote in each state, depends on the ethnic composition and on the urban-rural composition of the state. The methodology of conditional compositional biplots is first developed and a detailed interpretation of the 1992 US Presidential Election provided. We use a second application involving the conditional variability of tektite mineral compositions with respect to major oxide compositions to demonstrate some hazards of simplistic interpretation of biplots. Finally we conjecture on further possible applications of conditional compositional biplots.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The use of orthonormal coordinates in the simplex and, particularly, balance coordinates, has suggested the use of a dendrogram for the exploratory analysis of compositional data. The dendrogram is based on a sequential binary partition of a compositional vector into groups of parts. At each step of a partition, one group of parts is divided into two new groups, and a balancing axis in the simplex between both groups is defined. The set of balancing axes constitutes an orthonormal basis, and the projections of the sample on them are orthogonal coordinates. They can be represented in a dendrogram-like graph showing: (a) the way of grouping parts of the compositional vector; (b) the explanatory role of each subcomposition generated in the partition process; (c) the decomposition of the total variance into balance components associated with each binary partition; (d) a box-plot of each balance. This representation is useful to help the interpretation of balance coordinates; to identify which are the most explanatory coordinates; and to describe the whole sample in a single diagram independently of the number of parts of the sample.
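The balance associated with one step of a sequential binary partition can be sketched as follows, using the usual definition: a normalised log-ratio of the geometric means of the two groups. The sign-matrix encoding, the particular partition and the sample composition are illustrative assumptions, not taken from the paper.

```python
import math

def balance(x, signs):
    """Balance between the parts marked +1 and the parts marked -1 (0 = not involved)."""
    num = [math.log(v) for v, s in zip(x, signs) if s > 0]
    den = [math.log(v) for v, s in zip(x, signs) if s < 0]
    n_num, n_den = len(num), len(den)
    # Normalising factor that makes the associated axis have unit Aitchison norm.
    scale = math.sqrt(n_num * n_den / (n_num + n_den))
    return scale * (sum(num) / n_num - sum(den) / n_den)

# Hypothetical sequential binary partition of four parts:
# step 1 splits {1,2} from {3,4}; step 2 splits 1 from 2; step 3 splits 3 from 4.
SBP = [
    [+1, +1, -1, -1],
    [+1, -1, 0, 0],
    [0, 0, +1, -1],
]

x = [0.1, 0.2, 0.3, 0.4]
balances = [balance(x, row) for row in SBP]
print(balances)
```

Because the balancing axes form an orthonormal basis, the squared balances of a composition add up to its squared Aitchison norm. This is the single-observation counterpart of the variance decomposition shown in the dendrogram.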

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The application of compositional data analysis through log-ratio transformations corresponds to a multinomial logit model for the shares themselves. This model is characterized by the property of Independence of Irrelevant Alternatives (IIA). IIA states that the odds ratio, in this case the ratio of shares, is invariant to the addition or deletion of outcomes to the problem. It is exactly this invariance of the ratio that underlies the commonly used zero replacement procedure in compositional data analysis. In this paper we investigate using the nested logit model, which does not embody IIA, and an associated zero replacement procedure, and compare its performance with that of the more usual approach of using the multinomial logit model. Our comparisons exploit a data set that combines voting data by electoral division with corresponding census data for each division for the 2001 Federal election in Australia.
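The IIA property described above can be checked numerically with a minimal sketch of multinomial logit shares. The utilities and alternative labels are hypothetical. The nested logit model studied in the paper is not implemented here.

```python
import math

def logit_shares(utilities):
    """Multinomial logit shares: exp(u_j) / sum_k exp(u_k)."""
    m = max(utilities.values())  # subtract the maximum for numerical stability
    weights = {k: math.exp(u - m) for k, u in utilities.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical utilities for three alternatives.
utilities = {"A": 1.0, "B": 0.2, "C": -0.5}

full = logit_shares(utilities)
# Delete alternative C and recompute the shares over the remaining outcomes.
reduced = logit_shares({k: u for k, u in utilities.items() if k != "C"})

print(full["A"] / full["B"], reduced["A"] / reduced["B"])
```

The ratio of the shares of A and B is the same before and after deleting C. This is the same invariance that a subcomposition preserves in compositional data analysis.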

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This analysis was stimulated by the real data analysis problem of household expenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that try to add a small amount to the zero terms are not appropriate in general, as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spending excluding alcohol/tobacco similar for teetotal and non-teetotal households? In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether a household is teetotal will clearly depend on the household level variables, so we need to be able to model this dependence. The other tricky question is that with zeros on more than one component, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be; for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durables within the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually be for situations where any non-zero expenditure is not small. While this analysis is based on economic data, the ideas carry over to many other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables).

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Examples of compositional data. The simplex, a suitable sample space for compositional data and Aitchison's geometry. R, a free language and environment for statistical computing and graphics

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Compositional data naturally arises from the scientific analysis of the chemical composition of archaeological material such as ceramic and glass artefacts. Data of this type can be explored using a variety of techniques, from standard multivariate methods such as principal components analysis and cluster analysis, to methods based upon the use of log-ratios. The general aim is to identify groups of chemically similar artefacts that could potentially be used to answer questions of provenance. This paper will demonstrate work in progress on the development of a documented library of methods, implemented using the statistical package R, for the analysis of compositional data. R is an open source package that makes available very powerful statistical facilities at no cost. We aim to show how, with the aid of statistical software such as R, traditional exploratory multivariate analysis can easily be used alongside, or in combination with, specialist techniques of compositional data analysis. The library has been developed from a core of basic R functionality, together with purpose-written routines arising from our own research (for example that reported at CoDaWork'03). In addition, we have included other appropriate publicly available techniques and libraries that have been implemented in R by other authors. Available functions range from standard multivariate techniques through to various approaches to log-ratio analysis and zero replacement. We also discuss and demonstrate a small selection of relatively new techniques that have hitherto been little-used in archaeometric applications involving compositional data. The application of the library to the analysis of data arising in archaeometry will be demonstrated; results from different analyses will be compared; and the utility of the various methods discussed.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

"compositions" is a new R package for the analysis of compositional and positive data. It contains four classes corresponding to the four different types of compositional and positive geometry (including the Aitchison geometry). It provides means for computation, plotting and high-level multivariate statistical analysis in all four geometries. These geometries are treated in a fully analogous way, based on the principle of working in coordinates and on the object-oriented programming paradigm of R. In this way, called functions automatically select the most appropriate type of analysis as a function of the geometry. The graphical capabilities include ternary diagrams and tetrahedrons, various compositional plots (boxplots, barplots, piecharts) and extensive graphical tools for principal components. Afterwards, portion and proportion lines, straight lines and ellipses in all geometries can be added to plots. The package is accompanied by a hands-on introduction, documentation for every function, demos of the graphical capabilities and plenty of usage examples. It allows direct and parallel computation in all four vector spaces and provides the beginner with a copy-and-paste style of data analysis, while letting advanced users keep the functionality and customizability they demand of R, as well as all necessary tools to add their own analysis routines. A complete example is included in the appendix.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Several methods have been suggested to estimate non-linear models with interaction terms in the presence of measurement error. Structural equation models eliminate measurement error bias, but require large samples. Ordinary least squares regression on summated scales, regression on factor scores and partial least squares are appropriate for small samples but do not correct measurement error bias. Two stage least squares regression does correct measurement error bias but the results strongly depend on the instrumental variable choice. This article discusses the old disattenuated regression method as an alternative for correcting measurement error in small samples. The method is extended to the case of interaction terms and is illustrated on a model that examines the interaction effect of innovation and style of use of budgets on business performance. Alternative reliability estimates that can be used to disattenuate the estimates are discussed. A comparison is made with the alternative methods. Methods that do not correct for measurement error bias perform very similarly and considerably worse than disattenuated regression
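For orientation, the classical correction for attenuation in its simplest form can be sketched as below. This covers only the textbook case of a single correlation or a single-predictor slope with known reliabilities, not the extension to interaction terms developed in the article. The function names and numeric values are hypothetical.

```python
import math

def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)

def disattenuated_slope(b_observed, rel_x):
    """Single-predictor case: error in x shrinks the slope by the reliability of x."""
    if not 0.0 < rel_x <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return b_observed / rel_x

# Hypothetical observed correlation and reliability estimates.
corrected_r = disattenuated_correlation(0.36, 0.81, 0.64)
corrected_b = disattenuated_slope(0.40, 0.80)
print(corrected_r, corrected_b)
```

The corrected values depend directly on which reliability estimate is plugged in, which is why the choice among alternative reliability estimates matters.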

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In standard multivariate statistical analysis common hypotheses of interest concern changes in mean vectors and subvectors. In compositional data analysis it is now well established that compositional change is most readily described in terms of the simplicial operation of perturbation and that subcompositions replace the marginal concept of subvectors. To motivate the statistical developments of this paper we present two challenging compositional problems from food production processes. Against this background the relevance of perturbations and subcompositions can be clearly seen. Moreover we can identify a number of hypotheses of interest involving the specification of particular perturbations or differences between perturbations and also hypotheses of subcompositional stability. We identify the two problems as being the counterpart of the analysis of paired comparison or split plot experiments and of separate sample comparative experiments in the jargon of standard multivariate analysis. We then develop appropriate estimation and testing procedures for a complete lattice of relevant compositional hypotheses.
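The perturbation operation that describes compositional change can be sketched in a few lines. The before and after compositions are hypothetical, and the helper names are illustrative rather than taken from the paper.

```python
import math

def closure(x):
    """Rescale positive parts to sum to one."""
    s = sum(x)
    return [v / s for v in x]

def perturb(x, p):
    """Perturbation: component-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, p)])

def perturbation_difference(y, x):
    """The perturbation p such that perturb(x, p) equals y (component-wise ratio, closed)."""
    return closure([b / a for a, b in zip(x, y)])

# Hypothetical composition of a product before and after a processing step.
before = [0.5, 0.3, 0.2]
after = [0.4, 0.4, 0.2]

change = perturbation_difference(after, before)
recovered = perturb(before, change)
print(change, recovered)
```

In a paired-comparison setting, a hypothesis of "no change" corresponds to the change perturbation being the neutral element, the composition with all parts equal.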

Relevance:

10.00% 10.00%

Publisher:

Abstract:

R, from http://www.r-project.org/, is ‘GNU S’, a language and environment for statistical computing and graphics. It is an environment in which many classical and modern statistical techniques have been implemented, but many are supplied as packages. There are 8 standard packages and many more are available through the CRAN family of Internet sites, http://cran.r-project.org . We started to develop a library of functions in R to support the analysis of mixtures, and our goal is a MixeR package for compositional data analysis that provides support for: operations on compositions (perturbation and power multiplication, subcomposition with or without residuals, centering of the data, computing Aitchison’s, Euclidean and Bhattacharyya distances, compositional Kullback-Leibler divergence, etc.); graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features (barycenter, geometric mean of the data set, the percentile lines, marking and coloring of subsets of the data set, their geometric means, notation of individual data in the set . . .); dealing with zeros and missing values in compositional data sets, with R procedures for the simple and multiplicative replacement strategies; and the time series analysis of compositional data. We’ll present the current status of MixeR development and illustrate its use on selected data sets.
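The multiplicative replacement strategy for zeros mentioned above can be sketched as follows. This is written in Python for illustration and is not MixeR code. It assumes that a single small value `delta` is imputed for every zero, whereas in practice a different value may be chosen per part.

```python
import math

def multiplicative_replacement(x, delta, total=1.0):
    """Replace zeros by `delta` and shrink the non-zero parts multiplicatively.

    Assumes `x` sums to `total` and that the same `delta` is used for all zeros.
    """
    n_zero = sum(1 for v in x if v == 0)
    shrink = 1.0 - n_zero * delta / total
    return [delta if v == 0 else v * shrink for v in x]

# Hypothetical four-part composition with two zeros.
x = [0.6, 0.4, 0.0, 0.0]
replaced = multiplicative_replacement(x, delta=0.01)
print(replaced)
```

The multiplicative form keeps the constant sum and leaves the ratios between the non-zero parts unchanged, which a purely additive adjustment would not guarantee.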

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The statistical analysis of compositional data is commonly used in geological studies. As is well-known, compositions should be treated using logratios of parts, which are difficult to use correctly in standard statistical packages. In this paper we describe the new features of our freeware package, named CoDaPack, which implements most of the basic statistical methods suitable for compositional data. An example using real data is presented to illustrate the use of the package.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

There are two principal chemical concepts that are important for studying the natural environment. The first one is thermodynamics, which describes whether a system is at equilibrium or can spontaneously change by chemical reactions. The second main concept is how fast chemical reactions (kinetics or rate of chemical change) take place whenever they start. In this work we examine a natural system in which both thermodynamic and kinetic factors are important in determining the abundance of NH4+, NO2− and NO3− in superficial waters. Samples were collected in the Arno Basin (Tuscany, Italy), a system in which natural and anthropic effects both contribute to highly modify the chemical composition of water. Thermodynamic modelling based on the reduction-oxidation reactions involving the passage NH4+ → NO2− → NO3− in equilibrium conditions has allowed us to determine the Eh redox potential values able to characterise the state of each sample and, consequently, of the fluid environment from which it was drawn. Just as pH expresses the concentration of H+ in solution, redox potential is used to express the tendency of an environment to receive or supply electrons. In this context, oxic environments, such as those of river systems, are said to have a high redox potential because O2 is available as an electron acceptor. Principles of thermodynamics and chemical kinetics allow us to obtain a model that often does not completely describe the reality of natural systems. Chemical reactions may indeed fail to achieve equilibrium because the products escape from the site of the reaction or because reactions involving the transformation are very slow, so that non-equilibrium conditions exist for long periods. Moreover, reaction rates can be sensitive to poorly understood catalytic effects or to surface effects, while variables such as concentration (a large number of chemical species can coexist and interact concurrently), temperature and pressure can have large gradients in natural systems. Taking this into account, data from 91 water samples have been modelled by using statistical methodologies for compositional data. The application of log-contrast analysis has allowed us to obtain statistical parameters to be correlated with the calculated Eh values. In this way, natural conditions in which chemical equilibrium is hypothesised, as well as underlying fast reactions, are compared with those described by a stochastic approach.