55 resultados para Astronautics in geology.
Resumo:
The statistical analysis of literary style is the part of stylometry that compares measurable characteristicsin a text that are rarely controlled by the author, with those in other texts. When thegoal is to settle authorship questions, these characteristics should relate to the author’s style andnot to the genre, epoch or editor, and they should be such that their variation between authors islarger than the variation within comparable texts from the same author.For an overview of the literature on stylometry and some of the techniques involved, see for exampleMosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) orLebart, Salem and Berry (1998).Tirant lo Blanc, a chivalry book, is the main work in catalan literature and it was hailed to be“the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writterslike Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translatedseveral times into Spanish, Italian and French, with modern English translations by Rosenthal(1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465,but it was not printed until 1490.There is an intense and long lasting debate around its authorship sprouting from its first edition,where its introduction states that the whole book is the work of Martorell (1413?-1468), while atthe end it is stated that the last one fourth of the book is by Galba (?-1490), after the death ofMartorell. Some of the authors that support the theory of single authorship are Riquer (1990),Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer(1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990).Neither of the two candidate authors left any text comparable to the one under study, and thereforediscriminant analysis can not be used to help classify chapters by author. By using sample textsencompassing about ten percent of the book, and looking at word length and at the use of 44conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that mightindicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba andGinebra (2000) estimates that stylistic boundary to be near chapter 383.Following the lead of the extensive literature, this paper looks into word length, the use of the mostfrequent words and into the use of vowels in each chapter of the book. Given that the featuresselected are categorical, that leads to three contingency tables of ordered rows and therefore tothree sequences of multinomial observations.Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3describes the problem of the estimation of a suden change-point in those sequences, in the followingsections we propose various ways to estimate change-points in multinomial sequences; the methodin section 4 involves fitting models for polytomous data, the one in Section 5 fits gamma modelsonto the sequence of Chi-square distances between each row profiles and the average profile, theone in Section 6 fits models onto the sequence of values taken by the first component of thecorrespondence analysis as well as onto sequences of other summary measures like the averageword length. In Section 7 we fit models onto the marginal binomial sequences to identify thefeatures that distinguish the chapters before and after that boundary. Most methods rely heavilyon the use of generalized linear models
Resumo:
Precision of released figures is not only an important quality feature of official statistics,it is also essential for a good understanding of the data. In this paper we show a casestudy of how precision could be conveyed if the multivariate nature of data has to betaken into account. In the official release of the Swiss earnings structure survey, the totalsalary is broken down into several wage components. We follow Aitchison's approachfor the analysis of compositional data, which is based on logratios of components. Wefirst present diferent multivariate analyses of the compositional data whereby the wagecomponents are broken down by economic activity classes. Then we propose a numberof ways to assess precision
Resumo:
By using suitable parameters, we present a uni¯ed aproach for describing four methods for representing categorical data in a contingency table. These methods include:correspondence analysis (CA), the alternative approach using Hellinger distance (HD),the log-ratio (LR) alternative, which is appropriate for compositional data, and theso-called non-symmetrical correspondence analysis (NSCA). We then make an appropriate comparison among these four methods and some illustrative examples are given.Some approaches based on cumulative frequencies are also linked and studied usingmatrices.Key words: Correspondence analysis, Hellinger distance, Non-symmetrical correspondence analysis, log-ratio analysis, Taguchi inertia
Resumo:
A condition needed for testing nested hypotheses from a Bayesianviewpoint is that the prior for the alternative model concentratesmass around the small, or null, model. For testing independencein contingency tables, the intrinsic priors satisfy this requirement.Further, the degree of concentration of the priors is controlled bya discrete parameter m, the training sample size, which plays animportant role in the resulting answer regardless of the samplesize.In this paper we study robustness of the tests of independencein contingency tables with respect to the intrinsic priors withdifferent degree of concentration around the null, and comparewith other “robust” results by Good and Crook. Consistency ofthe intrinsic Bayesian tests is established.We also discuss conditioning issues and sampling schemes,and argue that conditioning should be on either one margin orthe table total, but not on both margins.Examples using real are simulated data are given
Resumo:
Evolution of compositions in time, space, temperature or other covariates is frequentin practice. For instance, the radioactive decomposition of a sample changes its composition with time. Some of the involved isotopes decompose into other isotopes of thesample, thus producing a transfer of mass from some components to other ones, butpreserving the total mass present in the system. This evolution is traditionally modelledas a system of ordinary di erential equations of the mass of each component. However,this kind of evolution can be decomposed into a compositional change, expressed interms of simplicial derivatives, and a mass evolution (constant in this example). A rst result is that the simplicial system of di erential equations is non-linear, despiteof some subcompositions behaving linearly.The goal is to study the characteristics of such simplicial systems of di erential equa-tions such as linearity and stability. This is performed extracting the compositional differential equations from the mass equations. Then, simplicial derivatives are expressedin coordinates of the simplex, thus reducing the problem to the standard theory ofsystems of di erential equations, including stability. The characterisation of stabilityof these non-linear systems relays on the linearisation of the system of di erential equations at the stationary point, if any. The eigenvelues of the linearised matrix and theassociated behaviour of the orbits are the main tools. For a three component system,these orbits can be plotted both in coordinates of the simplex or in a ternary diagram.A characterisation of processes with transfer of mass in closed systems in terms of stability is thus concluded. Two examples are presented for illustration, one of them is aradioactive decay
Resumo:
Hungary lies entirely within the Carpatho-Pannonian Region (CPR), a dominant tectonic unit of eastern Central Europe. The CPR consists of the Pannonian Basin system, and the arc of the Carpathian Mountains surrounding the lowlands in the north, east, and southeast. In the west, the CPR is bounded by the Eastern Alps, whereas in the south, by the Dinaridic belt. (...)
Resumo:
The log-ratio methodology makes available powerful tools for analyzing compositionaldata. Nevertheless, the use of this methodology is only possible for those data setswithout null values. Consequently, in those data sets where the zeros are present, aprevious treatment becomes necessary. Last advances in the treatment of compositionalzeros have been centered especially in the zeros of structural nature and in the roundedzeros. These tools do not contemplate the particular case of count compositional datasets with null values. In this work we deal with \count zeros" and we introduce atreatment based on a mixed Bayesian-multiplicative estimation. We use the Dirichletprobability distribution as a prior and we estimate the posterior probabilities. Then weapply a multiplicative modi¯cation for the non-zero values. We present a case studywhere this new methodology is applied.Key words: count data, multiplicative replacement, composition, log-ratio analysis
Resumo:
In 2000 the European Statistical Office published the guidelines for developing theHarmonized European Time Use Surveys system. Under such a unified framework,the first Time Use Survey of national scope was conducted in Spain during 2002–03. The aim of these surveys is to understand human behavior and the lifestyle ofpeople. Time allocation data are of compositional nature in origin, that is, they aresubject to non-negativity and constant-sum constraints. Thus, standard multivariatetechniques cannot be directly applied to analyze them. The goal of this work is toidentify homogeneous Spanish Autonomous Communities with regard to the typicalactivity pattern of their respective populations. To this end, fuzzy clustering approachis followed. Rather than the hard partitioning of classical clustering, where objects areallocated to only a single group, fuzzy method identify overlapping groups of objectsby allowing them to belong to more than one group. Concretely, the probabilistic fuzzyc-means algorithm is conveniently adapted to deal with the Spanish Time Use Surveymicrodata. As a result, a map distinguishing Autonomous Communities with similaractivity pattern is drawn.Key words: Time use data, Fuzzy clustering; FCM; simplex space; Aitchison distance
Resumo:
Geochemical data that is derived from the whole or partial analysis of various geologic materialsrepresent a composition of mineralogies or solute species. Minerals are composed of structuredrelationships between cations and anions which, through atomic and molecular forces, keep the elementsbound in specific configurations. The chemical compositions of minerals have specific relationships thatare governed by these molecular controls. In the case of olivine, there is a well-defined relationshipbetween Mn-Fe-Mg with Si. Balances between the principal elements defining olivine composition andother significant constituents in the composition (Al, Ti) have been defined, resulting in a near-linearrelationship between the logarithmic relative proportion of Si versus (MgMnFe) and Mg versus (MnFe),which is typically described but poorly illustrated in the simplex.The present contribution corresponds to ongoing research, which attempts to relate stoichiometry andgeochemical data using compositional geometry. We describe here the approach by which stoichiometricrelationships based on mineralogical constraints can be accounted for in the space of simplicialcoordinates using olivines as an example. Further examples for other mineral types (plagioclases andmore complex minerals such as clays) are needed. Issues that remain to be dealt with include thereduction of a bulk chemical composition of a rock comprised of several minerals from which appropriatebalances can be used to describe the composition in a realistic mineralogical framework. The overallobjective of our research is to answer the question: In the cases where the mineralogy is unknown, arethere suitable proxies that can be substituted?Kew words: Aitchison geometry, balances, mineral composition, oxides
Resumo:
One of the tantalising remaining problems in compositional data analysis lies in how to deal with data sets in which there are components which are essential zeros. By anessential zero we mean a component which is truly zero, not something recorded as zero simply because the experimental design or the measuring instrument has not been sufficiently sensitive to detect a trace of the part. Such essential zeros occur inmany compositional situations, such as household budget patterns, time budgets,palaeontological zonation studies, ecological abundance studies. Devices such as nonzero replacement and amalgamation are almost invariably ad hoc and unsuccessful insuch situations. From consideration of such examples it seems sensible to build up amodel in two stages, the first determining where the zeros will occur and the secondhow the unit available is distributed among the non-zero parts. In this paper we suggest two such models, an independent binomial conditional logistic normal model and a hierarchical dependent binomial conditional logistic normal model. The compositional data in such modelling consist of an incidence matrix and a conditional compositional matrix. Interesting statistical problems arise, such as the question of estimability of parameters, the nature of the computational process for the estimation of both the incidence and compositional parameters caused by the complexity of the subcompositional structure, the formation of meaningful hypotheses, and the devising of suitable testing methodology within a lattice of such essential zero-compositional hypotheses. The methodology is illustrated by application to both simulated and real compositional data
Resumo:
The application of Discriminant function analysis (DFA) is not a new idea in the studyof tephrochrology. In this paper, DFA is applied to compositional datasets of twodifferent types of tephras from Mountain Ruapehu in New Zealand and MountainRainier in USA. The canonical variables from the analysis are further investigated witha statistical methodology of change-point problems in order to gain a betterunderstanding of the change in compositional pattern over time. Finally, a special caseof segmented regression has been proposed to model both the time of change and thechange in pattern. This model can be used to estimate the age for the unknown tephrasusing Bayesian statistical calibration
Resumo:
First discussion on compositional data analysis is attributable to Karl Pearson, in 1897. However, notwithstanding the recent developments on algebraic structure of the simplex, more than twenty years after Aitchison’s idea of log-transformations of closed data, scientific literature is again full of statistical treatments of this type of data by using traditional methodologies. This is particularly true in environmental geochemistry where besides the problem of the closure, the spatial structure (dependence) of the data have to be considered. In this work we propose the use of log-contrast values, obtained by asimplicial principal component analysis, as LQGLFDWRUV of given environmental conditions. The investigation of the log-constrast frequency distributions allows pointing out the statistical laws able togenerate the values and to govern their variability. The changes, if compared, for example, with the mean values of the random variables assumed as models, or other reference parameters, allow definingmonitors to be used to assess the extent of possible environmental contamination. Case study on running and ground waters from Chiavenna Valley (Northern Italy) by using Na+, K+, Ca2+, Mg2+, HCO3-, SO4 2- and Cl- concentrations will be illustrated
Resumo:
A problem in the archaeometric classification of Catalan Renaissance pottery is the fact, thatthe clay supply of the pottery workshops was centrally organized by guilds, and thereforeusually all potters of a single production centre produced chemically similar ceramics.However, analysing the glazes of the ware usually a large number of inclusions in the glaze isfound, which reveal technological differences between single workshops. These inclusionshave been used by the potters in order to opacify the transparent glaze and to achieve a whitebackground for further decoration.In order to distinguish different technological preparation procedures of the single workshops,at a Scanning Electron Microscope the chemical composition of those inclusions as well astheir size in the two-dimensional cut is recorded. Based on the latter, a frequency distributionof the apparent diameters is estimated for each sample and type of inclusion.Following an approach by S.D. Wicksell (1925), it is principally possible to transform thedistributions of the apparent 2D-diameters back to those of the true three-dimensional bodies.The applicability of this approach and its practical problems are examined using differentways of kernel density estimation and Monte-Carlo tests of the methodology. Finally, it istested in how far the obtained frequency distributions can be used to classify the pottery
Resumo:
The identification of compositional changes in fumarolic gases of active and quiescent volcanoes is one of the mostimportant targets in monitoring programs. From a general point of view, many systematic (often cyclic) and randomprocesses control the chemistry of gas discharges, making difficult to produce a convincing mathematical-statisticalmodelling.Changes in the chemical composition of volcanic gases sampled at Vulcano Island (Aeolian Arc, Sicily, Italy) fromeight different fumaroles located in the northern sector of the summit crater (La Fossa) have been analysed byconsidering their dependence from time in the period 2000-2007. Each intermediate chemical composition has beenconsidered as potentially derived from the contribution of the two temporal extremes represented by the 2000 and 2007samples, respectively, by using inverse modelling methodologies for compositional data. Data pertaining to fumarolesF5 and F27, located on the rim and in the inner part of La Fossa crater, respectively, have been used to achieve theproposed aim. The statistical approach has allowed us to highlight the presence of random and not random fluctuations,features useful to understand how the volcanic system works, opening new perspectives in sampling strategies and inthe evaluation of the natural risk related to a quiescent volcano
Resumo:
A key issue in the implementation of the Water Framework Directive is the classification of streams and rivers using biological quality parameters and type-specific reference conditions. Four groups of stream types were defined in NE Spain on the basis of 152 diatom samples by means of detrended correspondence analysis and classification techniques. Diatom analysis was restricted to epilithic taxa, and the sites included gradients ranging from near-natural streams to sites with poor ecological quality. The main gradient shows a clear separation of sites in relation to the degree of human influence: polluted streams (mainly located in the lowlands) differ from streams in mountainous areas and in the Pyrenees. A second gradient is related to physiographical features. Headwater streams can be distinguished by their catchment geology. The type-specific diatom taxa for the stream types studied were determined by using indicator species analysis (IndVal). The type-specific taxa from near-natural streams are coincident with the indicator taxa for high ecological status. Human impact reduced the typological heterogeneity of the diatom community composition. Overall, the diatom communities in NE Spain exhibit a regional distribution pattern that closely corresponds with that observed in river systems elsewhere. Physiographical differences are only evident in undisturbed sites, while nutrient enrichment and other human disturbances may mask the regional differences in the distribution of diatom communities