55 results for compositional heterogeneity


Relevance:

20.00%

Publisher:

Abstract:

The amalgamation operation is frequently used to reduce the number of parts of compositional data, but it is a non-linear operation in the simplex under its usual geometry, the Aitchison geometry. The concept of balances between groups, a particular coordinate system designed over binary partitions of the parts, could be an alternative to amalgamation in some cases. In this work we discuss the proper application of both concepts using a real data set corresponding to behavioral measures of pregnant sows.
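
The contrast between the two operations can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the four-part example are assumptions, and the balance uses the standard form sqrt(rs/(r+s)) · ln(g(R)/g(S)), where g denotes the geometric mean of a group of parts.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def amalgamate(x, groups):
    """Sum the parts of each group (given as index lists) into one part."""
    return [sum(x[i] for i in group) for group in groups]


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(x, group_r, group_s):
    """Balance between two disjoint groups of parts (normalised log ratio of geometric means)."""
    r, s = len(group_r), len(group_s)
    g_r = geometric_mean([x[i] for i in group_r])
    g_s = geometric_mean([x[i] for i in group_s])
    return math.sqrt(r * s / (r + s)) * math.log(g_r / g_s)


# Hypothetical four-part composition, split into two groups of two parts.
x = closure([10.0, 20.0, 30.0, 40.0])
amalgamated = amalgamate(x, [[0, 1], [2, 3]])  # two-part composition of group sums
b = balance(x, [0, 1], [2, 3])                 # one real coordinate comparing the groups
```

The amalgamation returns a shorter composition, whereas the balance returns a single real coordinate that does not depend on the scale of the input.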

Relevance:

20.00%

Publisher:

Abstract:

The quantitative estimation of Sea Surface Temperatures (SST) from fossil assemblages is a fundamental issue in palaeoclimatic and palaeoceanographic investigations. The Modern Analogue Technique, a widely adopted method based on direct comparison of fossil assemblages with modern coretop samples, was revised with the aim of conforming it to compositional data analysis. The new CODAMAT method was developed by adopting the Aitchison metric as the distance measure. Modern coretop datasets are characterised by a large number of zeros. Zero replacement was carried out with a Bayesian approach based on a posterior estimation of the parameter of the multinomial distribution. The number of modern analogues from which to reconstruct the SST was determined by a multiple approach that considers the Proxies correlation matrix, the Standardized Residual Sum of Squares and the Mean Squared Distance. This new CODAMAT method was applied to the planktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea. Key words: Modern analogues, Aitchison distance, Proxies correlation matrix, Standardized Residual Sum of Squares
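
As a rough illustration of the distance measure named in this abstract, the Aitchison distance can be computed as the Euclidean distance between centered logratio (clr) vectors. The sketch below is not the CODAMAT implementation: the helper `nearest_analogues` and the sample values are hypothetical, and all parts are assumed strictly positive (zeros already replaced).

```python
import math


def clr(x):
    """Centered logratio transform of a composition with strictly positive parts."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr vectors of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def nearest_analogues(sample, modern_samples, k):
    """Indices of the k modern samples closest to `sample` (hypothetical helper)."""
    ranked = sorted(
        range(len(modern_samples)),
        key=lambda i: aitchison_distance(sample, modern_samples[i]),
    )
    return ranked[:k]


# Made-up three-taxon assemblages, for illustration only.
fossil = [0.50, 0.30, 0.20]
modern_samples = [
    [0.25, 0.25, 0.50],
    [0.45, 0.35, 0.20],
    [0.10, 0.10, 0.80],
]
closest = nearest_analogues(fossil, modern_samples, k=2)
```

Because the clr transform takes logarithms, a single zero count makes the distance undefined, which is why a zero-replacement step has to come first.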

Relevance:

20.00%

Publisher:

Abstract:

The self-organizing map (Kohonen 1997) is a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm uses the Euclidean metric when adapting the model vectors, which in theory renders the whole methodology incompatible with non-Euclidean geometries. In this contribution we explore the two main aspects of the problem: 1. Whether the conventional approach using the Euclidean metric can yield valid results with compositional data. 2. Whether a modification of the conventional approach, replacing vector sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering), can converge to an adequate solution. Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs worse than the conventional version, in particular when the data are pathological. Moreover, the conventional approach converges faster to a solution when the data are "well-behaved". Key words: Self Organizing Map; Artificial Neural networks; Compositional data
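
The two update rules compared in this abstract can be sketched for a single model vector. This is only an assumed reading of the modification: the conventional step m + rate · (x − m) is rewritten with perturbation and powering, and the function names, the learning rate and the example vectors are illustrative, not taken from the paper.

```python
def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def perturb(x, y):
    """Perturbation: the simplex analogue of vector addition."""
    return closure([a * b for a, b in zip(x, y)])


def power(x, alpha):
    """Powering: the simplex analogue of scalar multiplication."""
    return closure([a ** alpha for a in x])


def euclidean_update(model, sample, rate):
    """Conventional SOM step: move the model vector towards the sample."""
    return [m + rate * (s - m) for m, s in zip(model, sample)]


def simplex_update(model, sample, rate):
    """Same step with perturbation and powering replacing sum and scaling."""
    inverse_model = [1.0 / m for m in model]
    difference = perturb(sample, inverse_model)  # sample "minus" model in the simplex
    return perturb(model, power(difference, rate))


# Hypothetical model vector and data sample, both compositions.
model = [0.2, 0.3, 0.5]
sample = [0.6, 0.3, 0.1]
moved_euclidean = euclidean_update(model, sample, 0.5)
moved_simplex = simplex_update(model, sample, 0.5)
```

In both cases a rate of 0 leaves the model vector unchanged and a rate of 1 moves it onto the sample; the two rules differ in the path taken between those extremes.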

Relevance:

20.00%

Publisher:

Abstract:

In Catalonia, according to the nitrate directive (91/676/EU), nine areas have been declared as vulnerable to nitrate pollution from agricultural sources (Decret 283/1998 and Decret 479/2004). Five of these areas have been studied by coupling hydrochemical data with a multi-isotopic approach (Vitòria et al. 2005, Otero et al. 2007, Puig et al. 2007), in an ongoing research project seeking an integrated application of classical hydrochemistry data with a comprehensive isotopic characterisation (δ15N and δ18O of dissolved nitrate, δ34S and δ18O of dissolved sulphate, δ13C of dissolved inorganic carbon, and δD and δ18O of water). Within this general frame, the contribution presented explores compositional ways of: (i) distinguishing agrochemical and manure N pollution, and (ii) quantifying natural attenuation of nitrate (denitrification) and identifying possible controlling factors. To achieve this two-fold goal, the following techniques have been used. Separate biplots of each suite of data show that each studied region has distinct δ34S and pH signatures, but that the regions are homogeneous with regard to NO3- related variables. Also, the geochemical variables were projected onto the compositional directions associated with the possible denitrification reactions in each region. The resulting balances can be plotted together with some isotopes to assess their likelihood of occurrence.

Relevance:

20.00%

Publisher:

Abstract:

Our essay aims at studying suitable statistical methods for the clustering of compositional data in situations where observations are constituted by trajectories of compositional data, that is, by sequences of composition measurements along a domain. Observed trajectories are known as “functional data” and several methods have been proposed for their analysis. In particular, methods for clustering functional data, known as Functional Cluster Analysis (FCA), have been applied by practitioners and scientists in many fields. To our knowledge, FCA techniques have not been extended to cope with the problem of clustering compositional data trajectories. To extend them to the analysis of compositional data, FCA clustering techniques have to be adapted by using a suitable compositional algebra. The present work centres on the following question: given a sample of compositional data trajectories, how can we formulate a segmentation procedure giving homogeneous classes? To address this problem we follow the steps described below. First, we adapt the well-known spline smoothing techniques to cope with the smoothing of compositional data trajectories. An observed curve can be thought of as the sum of a smooth part plus some noise due to measurement errors. Spline smoothing techniques are used to isolate the smooth part of the trajectory; clustering algorithms are then applied to these smooth curves. The second step consists in building suitable metrics for measuring the dissimilarity between trajectories: we propose a metric that accounts for differences in both shape and level, and a metric accounting for differences in shape only. A simulation study is performed to evaluate the proposed methodologies, using both hierarchical and partitional clustering algorithms. The quality of the obtained results is assessed by means of several indices.

Relevance:

20.00%

Publisher:

Abstract:

Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made where the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another. Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods.

Relevance:

20.00%

Publisher:

Abstract:

In this paper we examine the problem of compositional data from a different starting point. Chemical compositional data, as used in provenance studies on archaeological materials, will be approached from measurement theory. The results will show, in a very intuitive way, that chemical data can only be treated by using the approach developed for compositional data. It will be shown that compositional data analysis is a particular case in projective geometry, when the projective coordinates are in the positive orthant and have the properties of logarithmic interval metrics. Moreover, it will be shown that this approach can be extended to a very large number of applications, including shape analysis. This will be exemplified with a case study in the architecture of Early Christian churches dating back to the 5th-7th centuries AD.

Relevance:

20.00%

Publisher:

Abstract:

Factor analysis, a frequently used technique for multivariate data inspection, is also widely used for compositional data analysis. The usual way is to use a centered logratio (clr) transformation to obtain the random vector y of dimension D. The factor model is then y = Λf + e (1), with the factors f of dimension k < D, the error term e, and the loadings matrix Λ. Under the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis model (1) can be written as Cov(y) = ΛΛ^T + ψ (2), where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as the loadings matrix Λ are estimated from an estimate of Cov(y), given observed clr-transformed data Y as realizations of the random vector y. Outliers or deviations from the idealized model assumptions of factor analysis can severely affect the parameter estimation. As a way out, robust estimation of the covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), see Pison et al. (2003). Well-known robust covariance estimators with good statistical properties, like the MCD or the S-estimators (see, e.g., Maronna et al., 2006), rely on a full-rank data matrix Y, which is not the case for clr-transformed data (see, e.g., Aitchison, 1986). The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this singularity problem. The data matrix Y is transformed to a matrix Z by using an orthonormal basis of lower dimension. Using the ilr-transformed data, a robust covariance matrix C(Z) can be estimated. The result can be back-transformed to the clr space by C(Y) = V C(Z) V^T, where the matrix V with orthonormal columns comes from the relation between the clr and the ilr transformation. Now the parameters in the model (2) can be estimated (Basilevsky, 1994), and the results have a direct interpretation since the links to the original variables are still preserved. The above procedure will be applied to data from geochemistry.
Our special interest is in comparing the results with those of Reimann et al. (2002) for the Kola project data.
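
The back-transformation C(Y) = V C(Z) V^T can be checked numerically with a small sketch. Several things here are assumptions rather than the authors' procedure: the Helmert-type basis chosen for V, the made-up three-part data, and above all the classical sample covariance, which merely stands in for a robust estimator such as the MCD.

```python
import math


def clr(x):
    """Centered logratio transform of a composition with strictly positive parts."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def ilr_basis(d):
    """D x (D-1) matrix V with orthonormal columns that each sum to zero (one possible choice)."""
    v = [[0.0] * (d - 1) for _ in range(d)]
    for j in range(1, d):
        norm = math.sqrt(j * (j + 1))
        for i in range(j):
            v[i][j - 1] = 1.0 / norm
        v[j][j - 1] = -j / norm
    return v


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def covariance(rows):
    """Classical sample covariance; a stand-in (assumption) for a robust estimator."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


# Made-up three-part compositions, for illustration only.
data = [[0.20, 0.30, 0.50], [0.10, 0.60, 0.30], [0.40, 0.40, 0.20],
        [0.30, 0.20, 0.50], [0.25, 0.35, 0.40]]
V = ilr_basis(3)
Y = [clr(x) for x in data]                   # clr data, rank D-1
Z = matmul(Y, V)                             # ilr data: each row is V^T y, full rank
C_Z = covariance(Z)                          # estimated in ilr coordinates
C_Y = matmul(matmul(V, C_Z), transpose(V))   # back-transformed to clr space
```

With the classical estimator the back-transformed matrix coincides with the covariance computed directly from the clr data, and each of its rows sums to zero, which is the singularity that prevents estimators requiring full rank from being applied to Y directly.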

Relevance:

20.00%

Publisher:

Abstract:

Compositional data, also called multiplicative ipsative data, are common in survey research instruments in areas such as time use, budget expenditure and social networks. Compositional data are usually expressed as proportions of a total, whose sum can only be 1. Owing to their constrained nature, statistical analysis in general, and estimation of measurement quality with a confirmatory factor analysis model for multitrait-multimethod (MTMM) designs in particular, are challenging tasks. Compositional data are highly non-normal, as they range within the 0-1 interval. One component can only increase if some other(s) decrease, which results in spurious negative correlations among components that cannot be accounted for by the MTMM model parameters. In this article we show how researchers can use the correlated uniqueness model for MTMM designs to evaluate the measurement quality of compositional indicators. We suggest using the additive log ratio transformation of the data, discuss several approaches to deal with zero components and explain how the interpretation of MTMM designs differs from the application to standard unconstrained data. We illustrate the method on data of social network composition expressed in percentages of partner, family, friends and other members, from which we conclude that the face-to-face collection mode is generally superior to the telephone mode, although primacy effects are higher in the face-to-face mode. Compositions of strong ties (such as partner) are measured with higher quality than those of weaker ties (such as other network members).
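
The additive log ratio (alr) transformation suggested in this abstract can be sketched as follows. The choice of the last part as the reference and the example percentages are assumptions made for illustration; the abstract does not state which component the authors used as the denominator.

```python
import math


def alr(x):
    """Additive log ratio: log of each part relative to the last part (assumed reference)."""
    reference = x[-1]
    return [math.log(v / reference) for v in x[:-1]]


def alr_inverse(z):
    """Map alr coordinates back to a composition that sums to 1."""
    expanded = [math.exp(v) for v in z] + [1.0]
    total = sum(expanded)
    return [e / total for e in expanded]


# Hypothetical network composition: partner, family, friends, other members.
network = [0.10, 0.40, 0.30, 0.20]
coordinates = alr(network)           # three unconstrained real values
recovered = alr_inverse(coordinates)
```

The D parts become D − 1 unconstrained coordinates, which removes the sum constraint; a zero component makes the logarithm undefined, so zeros have to be treated before the transformation.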

Relevance:

20.00%

Publisher:

Abstract:

Dynamic optimization methods have become increasingly important in economics over recent years. Among the dynamic optimization techniques employed, optimal control has emerged as the most powerful tool for theoretical economic analysis. However, there is a need to advance further and take into account that many dynamic economic processes depend, in addition, on parameters other than time. One can think of relaxing the assumption of a representative (homogeneous) agent in macro- and micro-economic applications, allowing for heterogeneity among the agents. For instance, the optimal adaptation and diffusion of a new technology over time may depend on the age of the person who adopted the new technology. Therefore, economic models must take account of heterogeneity conditions within the dynamic framework. This thesis intends to accomplish two goals. The first goal is to analyze and revise existing environmental policies that focus on defining the optimal management of natural resources over time, by taking account of the heterogeneity of environmental conditions. Thus, the thesis makes a policy-orientated contribution in the field of environmental policy by defining the changes necessary to transform an environmental policy based on the assumption of homogeneity into one that takes account of heterogeneity. As a result, the newly defined environmental policy will be more efficient and likely also politically more acceptable, since it is tailored more specifically to the heterogeneous environmental conditions.
In addition to its policy-orientated contribution, this thesis aims to make a methodological contribution by applying a new optimization technique for solving problems where the control variables depend on two or more arguments (the so-called two-stage solution approach), and by applying a numerical method (the Escalator Boxcar Train Method) for solving distributed optimal control problems, i.e., problems where the state variables, in addition to the control variables, depend on two or more arguments. Chapter 2 presents a theoretical framework to determine optimal resource allocation over time for the production of a good by heterogeneous producers who generate a stock externality, and derives government policies to modify the behavior of competitive producers in order to achieve optimality. Chapter 3 illustrates the method in a more specific context and integrates the aspects of quality and time, presenting a theoretical model that allows the socially optimal outcome over time and space to be determined for the problem of waterlogging in irrigated agricultural production. Chapter 4 of this thesis concentrates on forestry resources and analyses the optimal selective-logging regime of a size-distributed forest.