5 resultados para transformed data

em Universitat de Girona, Spain


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Factor analysis as frequent technique for multivariate data inspection is widely used also for compositional data analysis. The usual way is to use a centered logratio (clr) transformation to obtain the random vector y of dimension D. The factor model is then y = Λf + e (1) with the factors f of dimension k < D, the error term e, and the loadings matrix Λ. Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis model (1) can be written as Cov(y) = ΛΛT + ψ (2) where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as the loadings matrix Λ are estimated from an estimation of Cov(y). Given observed clr transformed data Y as realizations of the random vector y. Outliers or deviations from the idealized model assumptions of factor analysis can severely effect the parameter estimation. As a way out, robust estimation of the covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), see Pison et al. (2003). Well known robust covariance estimators with good statistical properties, like the MCD or the S-estimators (see, e.g. Maronna et al., 2006), rely on a full-rank data matrix Y which is not the case for clr transformed data (see, e.g., Aitchison, 1986). The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this singularity problem. The data matrix Y is transformed to a matrix Z by using an orthonormal basis of lower dimension. Using the ilr transformed data, a robust covariance matrix C(Z) can be estimated. The result can be back-transformed to the clr space by C(Y ) = V C(Z)V T where the matrix V with orthonormal columns comes from the relation between the clr and the ilr transformation. Now the parameters in the model (2) can be estimated (Basilevsky, 1994) and the results have a direct interpretation since the links to the original variables are still preserved. The above procedure will be applied to data from geochemistry. Our special interest is on comparing the results with those of Reimann et al. (2002) for the Kola project data

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Kriging is an interpolation technique whose optimality criteria are based on normality assumptions either for observed or for transformed data. This is the case of normal, lognormal and multigaussian kriging. When kriging is applied to transformed scores, optimality of obtained estimators becomes a cumbersome concept: back-transformed optimal interpolations in transformed scores are not optimal in the original sample space, and vice-versa. This lack of compatible criteria of optimality induces a variety of problems in both point and block estimates. For instance, lognormal kriging, widely used to interpolate positive variables, has no straightforward way to build consistent and optimal confidence intervals for estimates. These problems are ultimately linked to the assumed space structure of the data support: for instance, positive values, when modelled with lognormal distributions, are assumed to be embedded in the whole real space, with the usual real space structure and Lebesgue measure

Relevância:

60.00% 60.00%

Publicador:

Resumo:

At CoDaWork'03 we presented work on the analysis of archaeological glass composi- tional data. Such data typically consist of geochemical compositions involving 10-12 variables and approximates completely compositional data if the main component, sil- ica, is included. We suggested that what has been termed `crude' principal component analysis (PCA) of standardized data often identi ed interpretable pattern in the data more readily than analyses based on log-ratio transformed data (LRA). The funda- mental problem is that, in LRA, minor oxides with high relative variation, that may not be structure carrying, can dominate an analysis and obscure pattern associated with variables present at higher absolute levels. We investigate this further using sub- compositional data relating to archaeological glasses found on Israeli sites. A simple model for glass-making is that it is based on a `recipe' consisting of two `ingredients', sand and a source of soda. Our analysis focuses on the sub-composition of components associated with the sand source. A `crude' PCA of standardized data shows two clear compositional groups that can be interpreted in terms of di erent recipes being used at di erent periods, re ected in absolute di erences in the composition. LRA analysis can be undertaken either by normalizing the data or de ning a `residual'. In either case, after some `tuning', these groups are recovered. The results from the normalized LRA are di erently interpreted as showing that the source of sand used to make the glass di ered. These results are complementary. One relates to the recipe used. The other relates to the composition (and presumed sources) of one of the ingredients. It seems to be axiomatic in some expositions of LRA that statistical analysis of compositional data should focus on relative variation via the use of ratios. Our analysis suggests that absolute di erences can also be informative

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It is well known that regression analyses involving compositional data need special attention because the data are not of full rank. For a regression analysis where both the dependent and independent variable are components we propose a transformation of the components emphasizing their role as dependent and independent variables. A simple linear regression can be performed on the transformed components. The regression line can be depicted in a ternary diagram facilitating the interpretation of the analysis in terms of components. An exemple with time-budgets illustrates the method and the graphical features

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Planners in public and private institutions would like coherent forecasts of the components of age-specic mortality, such as causes of death. This has been di cult to achieve because the relative values of the forecast components often fail to behave in a way that is coherent with historical experience. In addition, when the group forecasts are combined the result is often incompatible with an all-groups forecast. It has been shown that cause-specic mortality forecasts are pessimistic when compared with all-cause forecasts (Wilmoth, 1995). This paper abandons the conventional approach of using log mortality rates and forecasts the density of deaths in the life table. Since these values obey a unit sum constraint for both conventional single-decrement life tables (only one absorbing state) and multiple-decrement tables (more than one absorbing state), they are intrinsically relative rather than absolute values across decrements as well as ages. Using the methods of Compositional Data Analysis pioneered by Aitchison (1986), death densities are transformed into the real space so that the full range of multivariate statistics can be applied, then back-transformed to positive values so that the unit sum constraint is honoured. The structure of the best-known, single-decrement mortality-rate forecasting model, devised by Lee and Carter (1992), is expressed in compositional form and the results from the two models are compared. The compositional model is extended to a multiple-decrement form and used to forecast mortality by cause of death for Japan