260 results for Hidrogeologia -- Catalunya -- Sant Antoni de Calonge
Abstract:
Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability, and the possible processes associated with compositional data sets from many disciplines. In this paper we concentrate on geochemical data sets. First we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of subcompositional discrimination and specific perturbational change. Then we develop through standard methodology, such as generalised likelihood ratio tests, statistical tools to allow the systematic investigation of a complete lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained together with the necessary tools for a staying-in-the-simplex approach, namely compositional singular value decompositions. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major-oxide and rare-element compositions of metamorphosed limestones from the Northeast and Central Highlands of Scotland. Finally we point out a number of unresolved problems in the statistical analysis of compositional processes.
Abstract:
The first discussion of compositional data analysis is attributable to Karl Pearson, in 1897. However, notwithstanding recent developments on the algebraic structure of the simplex, more than twenty years after Aitchison’s idea of log-transformations of closed data, the scientific literature is again full of statistical treatments of this type of data that use traditional methodologies. This is particularly true in environmental geochemistry where, besides the problem of closure, the spatial structure (dependence) of the data has to be considered. In this work we propose the use of log-contrast values, obtained by a simplicial principal component analysis, as indicators of given environmental conditions. The investigation of the log-contrast frequency distributions allows pointing out the statistical laws able to generate the values and to govern their variability. The changes, if compared, for example, with the mean values of the random variables assumed as models, or other reference parameters, allow defining monitors to be used to assess the extent of possible environmental contamination. A case study on running and ground waters from Chiavenna Valley (Northern Italy), using Na+, K+, Ca2+, Mg2+, HCO3-, SO42- and Cl- concentrations, will be illustrated.
Abstract:
The use of perturbation and power transformation operations permits the investigation of linear processes in the simplex as in a vector space. When the investigated geochemical processes can be constrained by a well-known starting point, the eigenvectors of the covariance matrix of a non-centred principal component analysis allow compositional changes to be modelled relative to a reference point. The results obtained for the chemistry of water collected in the River Arno (central-northern Italy) have opened new perspectives for considering relative changes of the analysed variables and for hypothesising the relative effect of different physical-chemical processes, thus laying the basis for quantitative modelling.
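The two simplex operations named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the example compositions `x0` and `p` are hypothetical, and the centred logratio (clr) is used only to show that a perturbation-power path is a straight line in logratio coordinates.

```python
import math

def closure(parts):
    """Rescale strictly positive parts so that they sum to 1."""
    total = sum(parts)
    return [v / total for v in parts]

def perturb(x, p):
    """Perturbation: component-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, p)])

def power(x, alpha):
    """Power transformation: component-wise power followed by closure."""
    return closure([a ** alpha for a in x])

def clr(x):
    """Centred logratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def linear_process(x0, p, t):
    """Point at 'time' t on the compositional line through x0 with direction p."""
    return perturb(x0, power(p, t))

# Hypothetical starting composition and direction (illustration only).
x0 = [0.2, 0.3, 0.5]
p = [0.5, 0.3, 0.2]
path = [linear_process(x0, p, t) for t in (0.0, 0.5, 1.0, 2.0)]
```

In clr coordinates the path satisfies clr(x(t)) = clr(x0) + t·clr(p), which is what makes such a process "linear" in the simplex.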
Abstract:
Kriging is an interpolation technique whose optimality criteria are based on normality assumptions either for observed or for transformed data. This is the case of normal, lognormal and multigaussian kriging. When kriging is applied to transformed scores, optimality of obtained estimators becomes a cumbersome concept: back-transformed optimal interpolations in transformed scores are not optimal in the original sample space, and vice-versa. This lack of compatible criteria of optimality induces a variety of problems in both point and block estimates. For instance, lognormal kriging, widely used to interpolate positive variables, has no straightforward way to build consistent and optimal confidence intervals for estimates. These problems are ultimately linked to the assumed space structure of the data support: for instance, positive values, when modelled with lognormal distributions, are assumed to be embedded in the whole real space, with the usual real space structure and Lebesgue measure
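The optimality mismatch described above already appears in the simplest possible setting, estimating a mean. The sketch below is not kriging and is not taken from the paper; it uses hypothetical positive values and hypothetical function names to show that back-transforming the average of the logs gives the geometric mean, which sits below the arithmetic mean, and that a lognormal model needs the extra term sigma^2/2.

```python
import math

def naive_back_transform(values):
    """exp of the mean of the logs: the geometric mean of the data."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

def lognormal_mean(values):
    """Mean of a fitted lognormal model, exp(mu + sigma^2 / 2)."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    mu = sum(logs) / n
    sigma2 = sum((v - mu) ** 2 for v in logs) / (n - 1)
    return math.exp(mu + sigma2 / 2.0)

# Hypothetical positive measurements (illustration only).
z = [1.2, 0.8, 5.0, 2.5, 0.4, 9.1]
arithmetic_mean = sum(z) / len(z)
geometric_mean = naive_back_transform(z)
model_mean = lognormal_mean(z)
```

The estimate that is optimal on the log scale is therefore biased low on the original scale unless it is corrected, and the correction depends on the variance model.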
Abstract:
Hungary lies entirely within the Carpatho-Pannonian Region (CPR), a dominant tectonic unit of eastern Central Europe. The CPR consists of the Pannonian Basin system, and the arc of the Carpathian Mountains surrounding the lowlands in the north, east, and southeast. In the west, the CPR is bounded by the Eastern Alps, whereas in the south, by the Dinaridic belt. (...)
Abstract:
In standard multivariate statistical analysis common hypotheses of interest concern changes in mean vectors and subvectors. In compositional data analysis it is now well established that compositional change is most readily described in terms of the simplicial operation of perturbation and that subcompositions replace the marginal concept of subvectors. To motivate the statistical developments of this paper we present two challenging compositional problems from food production processes. Against this background the relevance of perturbations and subcompositions can be clearly seen. Moreover we can identify a number of hypotheses of interest involving the specification of particular perturbations or differences between perturbations and also hypotheses of subcompositional stability. We identify the two problems as being the counterpart of the analysis of paired comparison or split plot experiments and of separate sample comparative experiments in the jargon of standard multivariate analysis. We then develop appropriate estimation and testing procedures for a complete lattice of relevant compositional hypotheses
Abstract:
The low levels of unemployment recorded in the UK in recent years are widely cited as evidence of the country’s improved economic performance, and the apparent convergence of unemployment rates across the country’s regions is used to suggest that the longstanding divide in living standards between the relatively prosperous ‘south’ and the more depressed ‘north’ has been substantially narrowed. Dissenters from these conclusions have drawn attention to the greatly increased extent of non-employment (around a quarter of the UK’s working age population are not in employment) and the marked regional dimension in its distribution across the country. Amongst these dissenters it is generally agreed that non-employment is concentrated amongst older males previously employed in the now very much smaller ‘heavy’ industries (e.g. coal, steel, shipbuilding). This paper uses the tools of compositional data analysis to provide a much richer picture of non-employment, one which challenges the conventional wisdom about UK labour market performance as well as the dissenters’ view of the nature of the problem. It is shown that, associated with the striking ‘north/south’ divide in non-employment rates, there is a statistically significant relationship between the size of the non-employment rate and the composition of non-employment. Specifically, it is shown that the share of unemployment in non-employment is negatively correlated with the overall non-employment rate: in regions where the non-employment rate is high the share of unemployment is relatively low. So the unemployment rate is not a very reliable indicator of regional disparities in labour market performance.
Even more importantly from a policy viewpoint, a significant positive relationship is found between the size of the non-employment rate and the share of those not employed by reason of sickness or disability, and it seems (contrary to the dissenters) that this connection is just as strong for women as it is for men.
Abstract:
Compositional random vectors are fundamental tools in the Bayesian analysis of categorical data. Many of the issues that are discussed with reference to the statistical analysis of compositional data have a natural counterpart in the construction of a Bayesian statistical model for categorical data. This note builds on the idea of cross-fertilization of the two areas recommended by Aitchison (1986) in his seminal book on compositional data. Particular emphasis is put on the problem of what parameterization to use
Abstract:
In human Population Genetics, routine applications of principal component techniques are often required. Population biologists make widespread use of certain discrete classifications of human samples into haplotypes, the monophyletic units of phylogenetic trees constructed from several single nucleotide bimorphisms hierarchically ordered. Compositional frequencies of the haplotypes are recorded within the different samples. Principal component techniques are then required as a dimension-reducing strategy to bring the dimension of the problem to a manageable level, say two, to allow for graphical analysis. Population biologists at large are not aware of the special features of compositional data and normally make use of the crude covariance of compositional relative frequencies to construct principal components. In this short note we present our experience with using traditional linear principal components or compositional principal components based on logratios, with reference to a specific dataset
Abstract:
The main instrument used in psychological measurement is the self-report questionnaire. One of its major drawbacks however is its susceptibility to response biases. A known strategy to control these biases has been the use of so-called ipsative items. Ipsative items are items that require the respondent to make between-scale comparisons within each item. The selected option determines to which scale the weight of the answer is attributed. Consequently in questionnaires only consisting of ipsative items every respondent is allotted an equal amount, i.e. the total score, that each can distribute differently over the scales. Therefore this type of response format yields data that can be considered compositional from its inception. Methodologically oriented psychologists have heavily criticized this type of item format, since the resulting data is also marked by the associated unfavourable statistical properties. Nevertheless, clinicians have kept using these questionnaires to their satisfaction. This investigation therefore aims to evaluate both positions and addresses the similarities and differences between the two data collection methods. The ultimate objective is to formulate a guideline on when to use which type of item format. The comparison is based on data obtained with both an ipsative and normative version of three psychological questionnaires, which were administered to 502 first-year students in psychology according to a balanced within-subjects design. Previous research only compared the direct ipsative scale scores with the derived ipsative scale scores. The use of compositional data analysis techniques also enables one to compare derived normative score ratios with direct normative score ratios. The addition of the second comparison not only offers the advantage of a better-balanced research strategy; in principle it also allows for parametric testing in the evaluation.
Abstract:
Most of the economic literature has presented its analysis under the assumption of a homogeneous capital stock. However, capital composition differs across countries. What has been the pattern of capital composition associated with world economies? We make an exploratory statistical analysis based on compositional data transformed by Aitchison logratio transformations, and we use tools for visualizing and measuring statistical estimators of association among the components. The goal is to detect distinctive patterns in the composition. The initial findings are: 1. Sectorial components behaved in a correlated way, building industries on one side and, less clearly, equipment industries on the other. 2. Full sample estimation shows a negative correlation between the durable goods component and the other buildings component, and between the transportation and building industries components. 3. Countries with zeros in some components are mainly low income countries at the bottom of the income category, and they behaved in an extreme way, distorting the main results observed in the full sample. 4. After removing these extreme cases, conclusions seem not very sensitive to the presence of other isolated cases.
Abstract:
The statistical analysis of literary style is the part of stylometry that compares measurable characteristics in a text that are rarely controlled by the author, with those in other texts. When the goal is to settle authorship questions, these characteristics should relate to the author’s style and not to the genre, epoch or editor, and they should be such that their variation between authors is larger than the variation within comparable texts from the same author. For an overview of the literature on stylometry and some of the techniques involved, see for example Mosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) or Lebart, Salem and Berry (1998). Tirant lo Blanc, a chivalry book, is the main work in Catalan literature and it was hailed as “the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writers like Vargas Llosa or Dámaso Alonso to be the first modern novel in Europe, it has been translated several times into Spanish, Italian and French, with modern English translations by Rosenthal (1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465, but it was not printed until 1490. There is an intense and long lasting debate around its authorship stemming from its first edition, where its introduction states that the whole book is the work of Martorell (1413?-1468), while at the end it is stated that the last fourth of the book is by Galba (?-1490), after the death of Martorell. Some of the authors that support the theory of single authorship are Riquer (1990), Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer (1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990). Neither of the two candidate authors left any text comparable to the one under study, and therefore discriminant analysis cannot be used to help classify chapters by author.
By using sample texts encompassing about ten percent of the book, and looking at word length and at the use of 44 conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that might indicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba and Ginebra (2000) estimate that stylistic boundary to be near chapter 383. Following the lead of the extensive literature, this paper looks into word length, the use of the most frequent words and the use of vowels in each chapter of the book. Given that the features selected are categorical, this leads to three contingency tables of ordered rows and therefore to three sequences of multinomial observations. Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3 describes the problem of estimating a sudden change-point in those sequences. In the following sections we propose various ways to estimate change-points in multinomial sequences: the method in Section 4 involves fitting models for polytomous data; the one in Section 5 fits gamma models onto the sequence of Chi-square distances between each row profile and the average profile; the one in Section 6 fits models onto the sequence of values taken by the first component of the correspondence analysis as well as onto sequences of other summary measures like the average word length. In Section 7 we fit models onto the marginal binomial sequences to identify the features that distinguish the chapters before and after that boundary. Most methods rely heavily on the use of generalized linear models.
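The Chi-square distance between a row profile and the average profile, mentioned for Section 5, can be sketched as follows. This is a minimal illustration using the standard correspondence-analysis definition, not the authors' code; the table of counts is hypothetical (rows as chapters, columns as word-length classes) and the function names are assumptions.

```python
def row_profile_distances(table):
    """Squared Chi-square distance of each row profile to the average profile."""
    row_totals = [sum(row) for row in table]
    grand_total = sum(row_totals)
    n_cols = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    # Average profile: column totals as proportions of the grand total.
    average = [c / grand_total for c in col_totals]
    distances = []
    for row, n_i in zip(table, row_totals):
        profile = [v / n_i for v in row]  # row counts as proportions
        d2 = sum((r - c) ** 2 / c for r, c in zip(profile, average))
        distances.append(d2)
    return distances

def pearson_chi_square(table):
    """Pearson statistic for independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    grand_total = sum(row_totals)
    n_cols = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    stat = 0.0
    for row, n_i in zip(table, row_totals):
        for v, c in zip(row, col_totals):
            expected = n_i * c / grand_total
            stat += (v - expected) ** 2 / expected
    return stat

# Hypothetical counts: 4 chapters (rows) by 3 word-length classes (columns).
counts = [[30, 50, 20], [28, 52, 20], [45, 40, 15], [44, 41, 15]]
d2 = row_profile_distances(counts)
```

Weighting each squared distance by its row total and summing recovers the Pearson statistic of the table, so the sequence of distances splits the overall heterogeneity chapter by chapter.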
Abstract:
In several computer graphics areas, a refinement criterion is often needed to decide whether to go on or to stop sampling a signal. When the sampled values are homogeneous enough, we assume that they represent the signal fairly well and we do not need further refinement, otherwise more samples are required, possibly with adaptive subdivision of the domain. For this purpose, a criterion which is very sensitive to variability is necessary. In this paper, we present a family of discrimination measures, the f-divergences, meeting this requirement. These convex functions have been well studied and successfully applied to image processing and several areas of engineering. Two applications to global illumination are shown: oracles for hierarchical radiosity and criteria for adaptive refinement in ray-tracing. We obtain significantly better results than with classic criteria, showing that f-divergences are worth further investigation in computer graphics. Also a discrimination measure based on entropy of the samples for refinement in ray-tracing is introduced. The recursive decomposition of entropy provides us with a natural method to deal with the adaptive subdivision of the sampling region
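A refinement test based on an f-divergence might look like the sketch below. It assumes, as one plausible reading of the abstract, that the normalised sample values are compared with the uniform distribution and that sampling continues when the divergence exceeds a threshold; the function names, the threshold and the sample values are all hypothetical, and the paper's actual oracles may differ.

```python
import math

def f_divergence(p, q, f):
    """D_f(p || q) = sum_i q_i * f(p_i / q_i), for convex f with f(1) = 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kullback_leibler(t):
    return t * math.log(t) if t > 0 else 0.0

def chi_square(t):
    return (t - 1.0) ** 2

def hellinger(t):
    return (math.sqrt(t) - 1.0) ** 2

def needs_refinement(samples, f, threshold):
    """True when non-negative sample values are too heterogeneous.

    Assumption: heterogeneity is the f-divergence between the normalised
    samples and the uniform distribution over the same number of samples.
    """
    total = sum(samples)
    if total == 0:
        return False  # all samples are zero, hence homogeneous
    n = len(samples)
    p = [s / total for s in samples]
    q = [1.0 / n] * n
    return f_divergence(p, q, f) > threshold

# Hypothetical luminance samples for two regions (illustration only).
flat_region = [0.5, 0.5, 0.5, 0.5]
edge_region = [0.1, 0.9, 0.1, 0.9]
```

With a threshold of 0.01, for instance, `flat_region` would stop subdividing under any of the three functions, whereas `edge_region` would be refined further.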
Abstract:
Usually, psychometricians apply classical factor analysis to evaluate the construct validity of rank-order scales. Nevertheless, these scales have particular characteristics that must be taken into account: total scores and rank are highly relevant.
Abstract:
The chemical composition of sediments and rocks, as well as their distribution at the Martian surface, represent a long-term archive of the processes that have formed the planetary surface. A survey of chemical compositions by means of Compositional Data Analysis is a valuable tool for extracting direct evidence of weathering processes and allows weathering and sedimentation rates to be quantified. clr-biplot techniques are applied for visualization of chemical relationships across the surface (“chemical maps”). The variability among individual suites of data is further analyzed by means of clr-PCA, in order to extract chemical alteration vectors between fresh rocks and their crusts and to assess the different source reservoirs accessible to soil formation. Both techniques are applied to elucidate the influence of remote weathering by combined analysis of several soil forming branches. Vector analysis in the simplex provides the opportunity to study atmosphere-surface interactions, including the role and composition of volcanic gases.