989 resultados para Municipis -- Govern i administració -- Catalunya -- Girona
Resumo:
In 2000 the European Statistical Office published the guidelines for developing the Harmonized European Time Use Surveys system. Under such a unified framework, the first Time Use Survey of national scope was conducted in Spain during 2002– 03. The aim of these surveys is to understand human behavior and the lifestyle of people. Time allocation data are of compositional nature in origin, that is, they are subject to non-negativity and constant-sum constraints. Thus, standard multivariate techniques cannot be directly applied to analyze them. The goal of this work is to identify homogeneous Spanish Autonomous Communities with regard to the typical activity pattern of their respective populations. To this end, fuzzy clustering approach is followed. Rather than the hard partitioning of classical clustering, where objects are allocated to only a single group, fuzzy method identify overlapping groups of objects by allowing them to belong to more than one group. Concretely, the probabilistic fuzzy c-means algorithm is conveniently adapted to deal with the Spanish Time Use Survey microdata. As a result, a map distinguishing Autonomous Communities with similar activity pattern is drawn. Key words: Time use data, Fuzzy clustering; FCM; simplex space; Aitchison distance
Resumo:
In order to obtain a high-resolution Pleistocene stratigraphy, eleven continuously cored boreholes, 100 to 220m deep were drilled in the northern part of the Po Plain by Regione Lombardia in the last five years. Quantitative provenance analysis (QPA, Weltje and von Eynatten, 2004) of Pleistocene sands was carried out by using multivariate statistical analysis (principal component analysis, PCA, and similarity analysis) on an integrated data set, including high-resolution bulk petrography and heavy-mineral analyses on Pleistocene sands and of 250 major and minor modern rivers draining the southern flank of the Alps from West to East (Garzanti et al, 2004; 2006). Prior to the onset of major Alpine glaciations, metamorphic and quartzofeldspathic detritus from the Western and Central Alps was carried from the axial belt to the Po basin longitudinally parallel to the SouthAlpine belt by a trunk river (Vezzoli and Garzanti, 2008). This scenario rapidly changed during the marine isotope stage 22 (0.87 Ma), with the onset of the first major Pleistocene glaciation in the Alps (Muttoni et al, 2003). PCA and similarity analysis from core samples show that the longitudinal trunk river at this time was shifted southward by the rapid southward and westward progradation of transverse alluvial river systems fed from the Central and Southern Alps. Sediments were transported southward by braided river systems as well as glacial sediments transported by Alpine valley glaciers invaded the alluvial plain. Kew words: Detrital modes; Modern sands; Provenance; Principal Components Analysis; Similarity, Canberra Distance; palaeodrainage
Resumo:
Emergent molecular measurement methods, such as DNA microarray, qRTPCR, and many others, offer tremendous promise for the personalized treatment of cancer. These technologies measure the amount of specific proteins, RNA, DNA or other molecular targets from tumor specimens with the goal of “fingerprinting” individual cancers. Tumor specimens are heterogeneous; an individual specimen typically contains unknown amounts of multiple tissues types. Thus, the measured molecular concentrations result from an unknown mixture of tissue types, and must be normalized to account for the composition of the mixture. For example, a breast tumor biopsy may contain normal, dysplastic and cancerous epithelial cells, as well as stromal components (fatty and connective tissue) and blood and lymphatic vessels. Our diagnostic interest focuses solely on the dysplastic and cancerous epithelial cells. The remaining tissue components serve to “contaminate” the signal of interest. The proportion of each of the tissue components changes as a function of patient characteristics (e.g., age), and varies spatially across the tumor region. Because each of the tissue components produces a different molecular signature, and the amount of each tissue type is specimen dependent, we must estimate the tissue composition of the specimen, and adjust the molecular signal for this composition. Using the idea of a chemical mass balance, we consider the total measured concentrations to be a weighted sum of the individual tissue signatures, where weights are determined by the relative amounts of the different tissue types. We develop a compositional source apportionment model to estimate the relative amounts of tissue components in a tumor specimen. We then use these estimates to infer the tissuespecific concentrations of key molecular targets for sub-typing individual tumors. We anticipate these specific measurements will greatly improve our ability to discriminate between different classes of tumors, and allow more precise matching of each patient to the appropriate treatment
Resumo:
The Hardy-Weinberg law, formulated about 100 years ago, states that under certain assumptions, the three genotypes AA, AB and BB at a bi-allelic locus are expected to occur in the proportions p2, 2pq, and q2 respectively, where p is the allele frequency of A, and q = 1-p. There are many statistical tests being used to check whether empirical marker data obeys the Hardy-Weinberg principle. Among these are the classical xi-square test (with or without continuity correction), the likelihood ratio test, Fisher's Exact test, and exact tests in combination with Monte Carlo and Markov Chain algorithms. Tests for Hardy-Weinberg equilibrium (HWE) are numerical in nature, requiring the computation of a test statistic and a p-value. There is however, ample space for the use of graphics in HWE tests, in particular for the ternary plot. Nowadays, many genetical studies are using genetical markers known as Single Nucleotide Polymorphisms (SNPs). SNP data comes in the form of counts, but from the counts one typically computes genotype frequencies and allele frequencies. These frequencies satisfy the unit-sum constraint, and their analysis therefore falls within the realm of compositional data analysis (Aitchison, 1986). SNPs are usually bi-allelic, which implies that the genotype frequencies can be adequately represented in a ternary plot. Compositions that are in exact HWE describe a parabola in the ternary plot. Compositions for which HWE cannot be rejected in a statistical test are typically “close" to the parabola, whereas compositions that differ significantly from HWE are “far". By rewriting the statistics used to test for HWE in terms of heterozygote frequencies, acceptance regions for HWE can be obtained that can be depicted in the ternary plot. This way, compositions can be tested for HWE purely on the basis of their position in the ternary plot (Graffelman & Morales, 2008). This leads to nice graphical representations where large numbers of SNPs can be tested for HWE in a single graph. Several examples of graphical tests for HWE (implemented in R software), will be shown, using SNP data from different human populations
Resumo:
The amalgamation operation is frequently used to reduce the number of parts of compositional data but it is a non-linear operation in the simplex with the usual geometry, the Aitchison geometry. The concept of balances between groups, a particular coordinate system designed over binary partitions of the parts, could be an alternative to the amalgamation in some cases. In this work we discuss the proper application of both concepts using a real data set corresponding to behavioral measures of pregnant sows
Resumo:
Planners in public and private institutions would like coherent forecasts of the components of age-specic mortality, such as causes of death. This has been di cult to achieve because the relative values of the forecast components often fail to behave in a way that is coherent with historical experience. In addition, when the group forecasts are combined the result is often incompatible with an all-groups forecast. It has been shown that cause-specic mortality forecasts are pessimistic when compared with all-cause forecasts (Wilmoth, 1995). This paper abandons the conventional approach of using log mortality rates and forecasts the density of deaths in the life table. Since these values obey a unit sum constraint for both conventional single-decrement life tables (only one absorbing state) and multiple-decrement tables (more than one absorbing state), they are intrinsically relative rather than absolute values across decrements as well as ages. Using the methods of Compositional Data Analysis pioneered by Aitchison (1986), death densities are transformed into the real space so that the full range of multivariate statistics can be applied, then back-transformed to positive values so that the unit sum constraint is honoured. The structure of the best-known, single-decrement mortality-rate forecasting model, devised by Lee and Carter (1992), is expressed in compositional form and the results from the two models are compared. The compositional model is extended to a multiple-decrement form and used to forecast mortality by cause of death for Japan
Resumo:
Theory of compositional data analysis is often focused on the composition only. However in practical applications we often treat a composition together with covariables with some other scale. This contribution systematically gathers and develop statistical tools for this situation. For instance, for the graphical display of the dependence of a composition with a categorical variable, a colored set of ternary diagrams might be a good idea for a first look at the data, but it will fast hide important aspects if the composition has many parts, or it takes extreme values. On the other hand colored scatterplots of ilr components could not be very instructive for the analyst, if the conventional, black-box ilr is used. Thinking on terms of the Euclidean structure of the simplex, we suggest to set up appropriate projections, which on one side show the compositional geometry and on the other side are still comprehensible by a non-expert analyst, readable for all locations and scales of the data. This is e.g. done by defining special balance displays with carefully- selected axes. Following this idea, we need to systematically ask how to display, explore, describe, and test the relation to complementary or explanatory data of categorical, real, ratio or again compositional scales. This contribution shows that it is sufficient to use some basic concepts and very few advanced tools from multivariate statistics (principal covariances, multivariate linear models, trellis or parallel plots, etc.) to build appropriate procedures for all these combinations of scales. This has some fundamental implications in their software implementation, and how might they be taught to analysts not already experts in multivariate analysis
Resumo:
The quantitative estimation of Sea Surface Temperatures from fossils assemblages is a fundamental issue in palaeoclimatic and paleooceanographic investigations. The Modern Analogue Technique, a widely adopted method based on direct comparison of fossil assemblages with modern coretop samples, was revised with the aim of conforming it to compositional data analysis. The new CODAMAT method was developed by adopting the Aitchison metric as distance measure. Modern coretop datasets are characterised by a large amount of zeros. The zero replacement was carried out by adopting a Bayesian approach to the zero replacement, based on a posterior estimation of the parameter of the multinomial distribution. The number of modern analogues from which reconstructing the SST was determined by means of a multiple approach by considering the Proxies correlation matrix, Standardized Residual Sum of Squares and Mean Squared Distance. This new CODAMAT method was applied to the planktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea. Kew words: Modern analogues, Aitchison distance, Proxies correlation matrix, Standardized Residual Sum of Squares
Resumo:
Self-organizing maps (Kohonen 1997) is a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm involves the use of Euclidean metric in the process of adaptation of the model vectors, thus rendering in theory a whole methodology incompatible with non-Euclidean geometries. In this contribution we explore the two main aspects of the problem: 1. Whether the conventional approach using Euclidean metric can shed valid results with compositional data. 2. If a modification of the conventional approach replacing vectorial sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering) can converge to an adequate solution. Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs poorer than the conventional version, in particular, when the data is pathological. Moreover, the conventional ap- proach converges faster to a solution, when data is \well-behaved". Key words: Self Organizing Map; Artificial Neural networks; Compositional data
Resumo:
In Catalonia, according to the nitrate directive (91/676/EU), nine areas have been declared as vulnerable to nitrate pollution from agricultural sources (Decret 283/1998 and Decret 479/2004). Five of these areas have been studied coupling hydro chemical data with a multi-isotopic approach (Vitòria et al. 2005, Otero et al. 2007, Puig et al. 2007), in an ongoing research project looking for an integrated application of classical hydrochemistry data, with a comprehensive isotopic characterisation (δ15N and δ18O of dissolved nitrate, δ34S and δ18O of dissolved sulphate, δ13C of dissolved inorganic carbon, and δD and δ18O of water). Within this general frame, the contribution presented explores compositional ways of: (i) distinguish agrochemicals and manure N pollution, (ii) quantify natural attenuation of nitrate (denitrification), and identify possible controlling factors. To achieve this two-fold goal, the following techniques have been used. Separate biplots of each suite of data show that each studied region has a distinct δ34S and pH signatures, but they are homogeneous with regard to NO3- related variables. Also, the geochemical variables were projected onto the compositional directions associated with the possible denitrification reactions in each region. The resulting balances can be plot together with some isotopes, to assess their likelihood of occurrence
Resumo:
In most psychological tests and questionnaires, a test score is obtained by taking the sum of the item scores. In virtually all cases where the test or questionnaire contains multidimensional forced-choice items, this traditional scoring method is also applied. We argue that the summation of scores obtained with multidimensional forced-choice items produces uninterpretable test scores. Therefore, we propose three alternative scoring methods: a weak and a strict rank preserving scoring method, which both allow an ordinal interpretation of test scores; and a ratio preserving scoring method, which allows a proportional interpretation of test scores. Each proposed scoring method yields an index for each respondent indicating the degree to which the response pattern is inconsistent. Analysis of real data showed that with respect to rank preservation, the weak and strict rank preserving method resulted in lower inconsistency indices than the traditional scoring method; with respect to ratio preservation, the ratio preserving scoring method resulted in lower inconsistency indices than the traditional scoring method
Resumo:
Our essay aims at studying suitable statistical methods for the clustering of compositional data in situations where observations are constituted by trajectories of compositional data, that is, by sequences of composition measurements along a domain. Observed trajectories are known as “functional data” and several methods have been proposed for their analysis. In particular, methods for clustering functional data, known as Functional Cluster Analysis (FCA), have been applied by practitioners and scientists in many fields. To our knowledge, FCA techniques have not been extended to cope with the problem of clustering compositional data trajectories. In order to extend FCA techniques to the analysis of compositional data, FCA clustering techniques have to be adapted by using a suitable compositional algebra. The present work centres on the following question: given a sample of compositional data trajectories, how can we formulate a segmentation procedure giving homogeneous classes? To address this problem we follow the steps described below. First of all we adapt the well-known spline smoothing techniques in order to cope with the smoothing of compositional data trajectories. In fact, an observed curve can be thought of as the sum of a smooth part plus some noise due to measurement errors. Spline smoothing techniques are used to isolate the smooth part of the trajectory: clustering algorithms are then applied to these smooth curves. The second step consists in building suitable metrics for measuring the dissimilarity between trajectories: we propose a metric that accounts for difference in both shape and level, and a metric accounting for differences in shape only. A simulation study is performed in order to evaluate the proposed methodologies, using both hierarchical and partitional clustering algorithm. The quality of the obtained results is assessed by means of several indices
Resumo:
Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A particular case of FDA is when the observed functions are density functions, that are also an example of infinite dimensional compositional data. In this work we compare several methods for dimensionality reduction for this particular type of data: functional principal components analysis (PCA) with or without a previous data transformation and multidimensional scaling (MDS) for diferent inter-densities distances, one of them taking into account the compositional nature of density functions. The difeerent methods are applied to both artificial and real data (households income distributions)
Resumo:
Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made where where the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another. Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods
Resumo:
The preceding two editions of CoDaWork included talks on the possible consideration of densities as infinite compositions: Egozcue and D´ıaz-Barrero (2003) extended the Euclidean structure of the simplex to a Hilbert space structure of the set of densities within a bounded interval, and van den Boogaart (2005) generalized this to the set of densities bounded by an arbitrary reference density. From the many variations of the Hilbert structures available, we work with three cases. For bounded variables, a basis derived from Legendre polynomials is used. For variables with a lower bound, we standardize them with respect to an exponential distribution and express their densities as coordinates in a basis derived from Laguerre polynomials. Finally, for unbounded variables, a normal distribution is used as reference, and coordinates are obtained with respect to a Hermite-polynomials-based basis. To get the coordinates, several approaches can be considered. A numerical accuracy problem occurs if one estimates the coordinates directly by using discretized scalar products. Thus we propose to use a weighted linear regression approach, where all k- order polynomials are used as predictand variables and weights are proportional to the reference density. Finally, for the case of 2-order Hermite polinomials (normal reference) and 1-order Laguerre polinomials (exponential), one can also derive the coordinates from their relationships to the classical mean and variance. Apart of these theoretical issues, this contribution focuses on the application of this theory to two main problems in sedimentary geology: the comparison of several grain size distributions, and the comparison among different rocks of the empirical distribution of a property measured on a batch of individual grains from the same rock or sediment, like their composition