994 results for compositional data


Relevance: 70.00%

Abstract:

Low concentrations of elements in geochemical analyses have the peculiarity of being compositional data and, for a given level of significance, are likely to lie beyond the capability of laboratories to distinguish between minute concentrations and complete absence, thus preventing laboratories from reporting extremely low concentrations of the analyte. Instead, what is reported is the detection limit, which is the minimum concentration that conclusively differentiates between presence and absence of the element. A spatially distributed exhaustive sample is employed in this study to generate unbiased sub-samples, which are further censored to observe the effect that different detection limits and sample sizes have on the inference of population distributions from geochemical analyses containing specimens below the detection limit (nondetects). The isometric log-ratio transformation is used to convert the compositional data in the simplex to samples in real space, allowing the practitioner to draw on the large body of statistical techniques that are valid only in real space. The bootstrap method is used to numerically investigate the reliability of inferring several distributional parameters under different forms of imputation for the censored data. The case study illustrates that, in general, the best results are obtained when imputations are made using the distribution that best fits the readings above the detection limit, and it exposes the problems of other more widely used practices. When the sample is spatially correlated, it is necessary to combine the bootstrap with stochastic simulation.
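The workflow described in this abstract can be sketched numerically. The following Python snippet is a minimal illustration, not the authors' code: the data, the detection limit and the lognormal imputation model are assumptions made only for the example. It imputes nondetects from a distribution fitted to the readings above the detection limit, maps the compositions to real space with the isometric log-ratio (ilr) transform, and bootstraps a distributional parameter.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def ilr(x):
        """ilr coordinates of compositions (rows) for one default balance basis."""
        logx = np.log(np.asarray(x, float))
        d = logx.shape[1]
        z = np.empty((logx.shape[0], d - 1))
        for i in range(1, d):
            gm = logx[:, :i].mean(axis=1)            # log geometric mean of first i parts
            z[:, i - 1] = np.sqrt(i / (i + 1.0)) * (gm - logx[:, i])
        return z

    # Synthetic 3-part compositions standing in for geochemical analyses.
    raw = rng.lognormal(mean=[0.0, 2.0, 3.0], sigma=0.5, size=(200, 3))
    comp = raw / raw.sum(axis=1, keepdims=True)

    dl = 0.02                                        # assumed detection limit for part 0
    censored = comp[:, 0] < dl                       # nondetects

    # Impute nondetects from a lognormal fitted to the detected readings,
    # truncated below the detection limit (inverse-CDF sampling).
    det = comp[~censored, 0]
    mu, sd = np.log(det).mean(), np.log(det).std()
    u = rng.uniform(0, stats.norm.cdf((np.log(dl) - mu) / sd), size=int(censored.sum()))
    imp = comp.copy()
    imp[censored, 0] = np.exp(mu + sd * stats.norm.ppf(u))
    imp /= imp.sum(axis=1, keepdims=True)            # re-close after imputation

    # Bootstrap the mean ilr coordinates (any other parameter could be studied).
    boots = np.array([ilr(imp[rng.integers(0, len(imp), len(imp))]).mean(axis=0)
                      for _ in range(500)])
    print("bootstrap mean of ilr coordinates:", boots.mean(axis=0))
    print("bootstrap std of ilr coordinates: ", boots.std(axis=0))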

Relevance: 70.00%

Abstract:

A compositional time series is obtained when a compositional data vector is observed at different points in time. Inherently, then, a compositional time series is a multivariate time series with important constraints on the variables observed at any instant in time. Although this type of data frequently occurs in situations of real practical interest, a trawl through the statistical literature reveals that research in the field is very much in its infancy and that many theoretical and empirical issues still remain to be addressed. Any appropriate statistical methodology for the analysis of compositional time series must take into account the constraints, which are not allowed for by the usual statistical techniques available for analysing multivariate time series. One general approach to analysing compositional time series consists in applying an initial transform to break the positive and unit-sum constraints, followed by analysis of the transformed time series using multivariate ARIMA models. In this paper we discuss the use of the additive log-ratio, centred log-ratio and isometric log-ratio transforms. We also present results from an empirical study designed to explore how the selection of the initial transform affects subsequent multivariate ARIMA modelling as well as the quality of the forecasts.
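As a rough illustration of the transform-then-model approach described above, the following sketch (synthetic data, and a univariate ARIMA fitted per coordinate rather than the full multivariate ARIMA discussed in the paper) applies the additive log-ratio transform, forecasts the transformed series with statsmodels, and back-transforms the forecasts onto the simplex.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(1)

    # Synthetic compositional time series: 120 time points, 3 parts summing to 1.
    latent = np.cumsum(rng.normal(0, 0.05, size=(120, 2)), axis=0)
    comp = np.exp(np.column_stack([latent, np.zeros(120)]))
    comp /= comp.sum(axis=1, keepdims=True)

    def alr(x):                       # additive log-ratio with the last part as divisor
        return np.log(x[:, :-1] / x[:, [-1]])

    def alr_inv(y):                   # inverse alr: append a zero column and close
        z = np.exp(np.column_stack([y, np.zeros(len(y))]))
        return z / z.sum(axis=1, keepdims=True)

    y = alr(comp)

    # Fit an ARIMA(1,1,0) to each alr coordinate and forecast 12 steps ahead.
    fc = np.column_stack([
        ARIMA(y[:, j], order=(1, 1, 0)).fit().forecast(steps=12)
        for j in range(y.shape[1])
    ])

    forecast_comp = alr_inv(fc)       # forecasts back on the simplex, rows sum to 1
    print(forecast_comp[:3])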

Relevance: 70.00%

Abstract:

A joint distribution of two discrete random variables with finite support can be displayed as a two-way table of probabilities adding to one. Assume that this table has n rows and m columns and that all probabilities are non-null. Such a table can be seen as an element in the simplex of n·m parts. In this context, the marginals are identified as compositional amalgams and the conditionals (rows or columns) as subcompositions. Also, simplicial perturbation appears as Bayes' theorem. Moreover, the Euclidean elements of the Aitchison geometry of the simplex can be translated into the table of probabilities: subspaces, orthogonal projections, distances. Two important questions are addressed: (a) given a table of probabilities, which is the nearest independent table to the initial one? (b) which is the largest orthogonal projection of a row onto a column, or, equivalently, what information in a row is explained by a column, thus describing the interaction? To answer these questions three orthogonal decompositions are presented: (1) by columns and a row-wise geometric marginal, (2) by rows and a column-wise geometric marginal, (3) by independent two-way tables and fully dependent tables representing row-column interaction. An important result is that the nearest independent table is the product of the two (row- and column-wise) geometric marginal tables. A corollary is that, in an independent table, the geometric marginals conform with the traditional (arithmetic) marginals. These decompositions can be compared with standard log-linear models. Key words: balance, compositional data, simplex, Aitchison geometry, composition, orthonormal basis, arithmetic and geometric marginals, amalgam, dependence measure, contingency table.
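The stated result on the nearest independent table can be checked numerically. The sketch below uses an assumed 2x3 table of probabilities: it computes the row- and column-wise geometric marginals, forms their closed product, and compares its Aitchison distance from the original table with that of the usual arithmetic-marginal product table.

    import numpy as np

    # Assumed example: a 2 x 3 table of probabilities summing to 1.
    P = np.array([[0.10, 0.20, 0.05],
                  [0.15, 0.30, 0.20]])

    def close(x):
        return x / x.sum()

    # Geometric marginals: geometric means across columns (row marginal) and rows.
    row_gm = close(np.exp(np.log(P).mean(axis=1)))   # one value per row
    col_gm = close(np.exp(np.log(P).mean(axis=0)))   # one value per column

    nearest_indep = close(np.outer(row_gm, col_gm))  # closed product of geometric marginals

    def aitchison_dist(x, y):
        """Aitchison distance between two tables seen as compositions of n*m parts."""
        lx, ly = np.log(x).ravel(), np.log(y).ravel()
        dx, dy = lx - lx.mean(), ly - ly.mean()      # clr coefficients
        return np.linalg.norm(dx - dy)

    # The arithmetic-marginal product table is also independent, but for this
    # table it lies farther from P in the Aitchison metric than the
    # geometric-marginal product.
    arith_indep = np.outer(P.sum(axis=1), P.sum(axis=0))
    print(aitchison_dist(P, nearest_indep), aitchison_dist(P, arith_indep))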

Relevance: 70.00%

Abstract:

The amalgamation operation is frequently used to reduce the number of parts of compositional data, but it is a non-linear operation in the simplex with the usual geometry, the Aitchison geometry. The concept of balances between groups, a particular coordinate system built over binary partitions of the parts, can be an alternative to amalgamation in some cases. In this work we discuss the proper application of both concepts using a real data set corresponding to behavioral measures of pregnant sows.
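A small sketch may help contrast the two operations. The snippet below uses synthetic 4-part compositions (not the sow behaviour data) and an assumed binary partition of the parts: it computes the amalgamation of two groups and the corresponding balance between them.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.dirichlet([4, 3, 2, 1], size=5)          # five 4-part compositions

    group_a, group_b = [0, 1], [2, 3]                # assumed binary partition of the parts

    # Amalgamation: add the parts of each group (a non-linear operation in
    # Aitchison geometry).
    amalgam = np.column_stack([x[:, group_a].sum(axis=1),
                               x[:, group_b].sum(axis=1)])

    # Balance: scaled log-ratio of the geometric means of the two groups
    # (a linear ilr coordinate in Aitchison geometry).
    r, s = len(group_a), len(group_b)
    gm_a = np.exp(np.log(x[:, group_a]).mean(axis=1))
    gm_b = np.exp(np.log(x[:, group_b]).mean(axis=1))
    balance = np.sqrt(r * s / (r + s)) * np.log(gm_a / gm_b)

    print(amalgam)
    print(balance)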

Relevance: 70.00%

Abstract:

Planners in public and private institutions would like coherent forecasts of the components of age-specific mortality, such as causes of death. This has been difficult to achieve because the relative values of the forecast components often fail to behave in a way that is coherent with historical experience. In addition, when the group forecasts are combined, the result is often incompatible with an all-groups forecast. It has been shown that cause-specific mortality forecasts are pessimistic when compared with all-cause forecasts (Wilmoth, 1995). This paper abandons the conventional approach of using log mortality rates and instead forecasts the density of deaths in the life table. Since these values obey a unit-sum constraint for both conventional single-decrement life tables (only one absorbing state) and multiple-decrement tables (more than one absorbing state), they are intrinsically relative rather than absolute values across decrements as well as ages. Using the methods of Compositional Data Analysis pioneered by Aitchison (1986), death densities are transformed into real space so that the full range of multivariate statistics can be applied, then back-transformed to positive values so that the unit-sum constraint is honoured. The structure of the best-known single-decrement mortality-rate forecasting model, devised by Lee and Carter (1992), is expressed in compositional form and the results from the two models are compared. The compositional model is extended to a multiple-decrement form and used to forecast mortality by cause of death for Japan.
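The compositional forecasting idea can be sketched as follows. The snippet uses synthetic death densities and a deliberately simplified Lee-Carter-like rank-1 structure extracted by SVD of the centred log-ratio coefficients; it is an illustration of the general approach, not the paper's model.

    import numpy as np

    rng = np.random.default_rng(3)
    n_years, n_ages = 40, 20

    # Synthetic death densities: each row sums to 1 across age groups.
    base = np.exp(-0.5 * ((np.arange(n_ages) - 14) / 4.0) ** 2)
    dens = np.array([base * np.exp(0.01 * t * np.arange(n_ages)) for t in range(n_years)])
    dens += rng.uniform(0.001, 0.01, dens.shape)
    dens /= dens.sum(axis=1, keepdims=True)

    clr = np.log(dens) - np.log(dens).mean(axis=1, keepdims=True)   # clr coefficients
    alpha = clr.mean(axis=0)                                        # mean age pattern
    U, s, Vt = np.linalg.svd(clr - alpha, full_matrices=False)
    kappa = U[:, 0] * s[0]                                          # time index
    beta = Vt[0]                                                    # age loadings

    # Random walk with drift for the time index, as in Lee-Carter forecasting.
    drift = np.diff(kappa).mean()
    h = 10
    kappa_fc = kappa[-1] + drift * np.arange(1, h + 1)

    # Back-transform forecast clr coefficients to death densities (unit sum).
    clr_fc = alpha + np.outer(kappa_fc, beta)
    dens_fc = np.exp(clr_fc)
    dens_fc /= dens_fc.sum(axis=1, keepdims=True)
    print(dens_fc.shape, dens_fc[0].sum())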

Relevance: 70.00%

Abstract:

Functional Data Analysis (FDA) deals with samples in which a whole function is observed for each individual. A particular case of FDA arises when the observed functions are density functions, which are also an example of infinite-dimensional compositional data. In this work we compare several methods of dimensionality reduction for this particular type of data: functional principal component analysis (PCA) with or without a prior data transformation, and multidimensional scaling (MDS) for different inter-density distances, one of them taking into account the compositional nature of density functions. The different methods are applied to both artificial and real data (household income distributions).
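The snippet below illustrates, on synthetic densities discretised to a common grid of bins, two of the dimension-reduction routes compared in the paper: PCA on clr-transformed densities and metric MDS on Aitchison distances. The data and grid are assumptions made for the example.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.manifold import MDS

    rng = np.random.default_rng(4)
    n_obs, n_bins = 30, 15

    # Synthetic "income" densities: histograms over a common grid, rows sum to 1.
    shapes = rng.uniform(2, 6, n_obs)
    grid = np.linspace(0.1, 10, n_bins)
    dens = np.array([grid ** (a - 1) * np.exp(-grid) for a in shapes])
    dens /= dens.sum(axis=1, keepdims=True)

    clr = np.log(dens) - np.log(dens).mean(axis=1, keepdims=True)

    # Route 1: functional PCA approximated by PCA on the clr coefficients.
    scores_pca = PCA(n_components=2).fit_transform(clr)

    # Route 2: metric MDS on pairwise Aitchison distances (clr is an isometry,
    # so Euclidean distances between clr rows equal Aitchison distances).
    d = np.linalg.norm(clr[:, None, :] - clr[None, :, :], axis=2)
    scores_mds = MDS(n_components=2, dissimilarity="precomputed",
                     random_state=0).fit_transform(d)

    print(scores_pca[:3])
    print(scores_mds[:3])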

Relevance: 70.00%

Abstract:

Compositional data, also called multiplicative ipsative data, are common in survey research instruments in areas such as time use, budget expenditure and social networks. Compositional data are usually expressed as proportions of a total, whose sum can only be 1. Owing to their constrained nature, statistical analysis in general, and estimation of measurement quality with a confirmatory factor analysis model for multitrait-multimethod (MTMM) designs in particular, are challenging tasks. Compositional data are highly non-normal, as they range within the 0-1 interval. One component can only increase if some other(s) decrease, which results in spurious negative correlations among components that cannot be accounted for by the MTMM model parameters. In this article we show how researchers can use the correlated uniqueness model for MTMM designs in order to evaluate the measurement quality of compositional indicators. We suggest using the additive log-ratio transformation of the data, discuss several approaches to dealing with zero components and explain how the interpretation of MTMM designs differs from the application to standard unconstrained data. We illustrate the method on data on social network composition expressed as percentages of partner, family, friends and other members, from which we conclude that the face-to-face collection mode is generally superior to the telephone mode, although primacy effects are higher in the face-to-face mode. Compositions of strong ties (such as partner) are measured with higher quality than those of weaker ties (such as other network members).
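The data-preparation step described above can be sketched as follows. The snippet uses assumed example percentages, a simple multiplicative zero replacement and the additive log-ratio transform; the resulting coordinates would then be passed to whatever structural-equation package is used to fit the MTMM / correlated-uniqueness model.

    import numpy as np

    # Assumed network compositions in percentages: partner, family, friends, other.
    comp = np.array([[40.0, 30.0, 20.0, 10.0],
                     [ 0.0, 50.0, 25.0, 25.0],      # a zero component
                     [60.0, 10.0, 25.0,  5.0]]) / 100.0

    def multiplicative_zero_replacement(x, delta=0.005):
        """Replace zeros with a small delta and rescale the non-zero parts."""
        x = x.copy()
        for i, row in enumerate(x):
            z = row == 0
            if z.any():
                row[z] = delta
                row[~z] *= 1 - delta * z.sum()
                x[i] = row
        return x

    def alr(x):
        """Additive log-ratio using the last part ('other') as the reference."""
        return np.log(x[:, :-1] / x[:, [-1]])

    prepared = alr(multiplicative_zero_replacement(comp))
    print(prepared)   # these coordinates go into the CFA / MTMM model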

Relevance: 70.00%

Abstract:

This thesis studies how to estimate the distribution of regionalized variables whose sample space and scale admit a Euclidean space structure. We apply the principle of working in coordinates: choose an orthonormal basis, do statistics on the coordinates of the data, and apply the outputs to the basis in order to recover a result in the original space. Applied to regionalized variables, this yields a single consistent approach that generalizes the well-known properties of kriging techniques to several sample spaces: real, positive and compositional data (vectors of positive components with constant sum) are treated as particular cases. In this way, linear geostatistics is generalized, and solutions are offered to well-known problems of non-linear geostatistics by adapting the measure and the criteria of representativeness (i.e., means) to the data at hand. The estimator for positive data coincides with a weighted geometric mean, equivalent to estimating the median, without any of the problems of classical lognormal kriging. The compositional case offers equivalent solutions and, in addition, allows multinomial probability vectors to be estimated. With a preliminary Bayesian approach, compositional kriging also becomes a consistent alternative to indicator kriging. The latter technique is used to estimate probability functions of arbitrary variables, although it often yields negative estimates, something the proposed alternative avoids. The usefulness of this set of techniques is tested by studying ammonia pollution at an automatic water-quality monitoring station in the Tordera basin, and it is concluded that only by using the proposed techniques can one detect the moments at which ammonium turns into ammonia at a concentration above the legal limit.
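The principle of working in coordinates can be sketched as follows. The snippet uses synthetic 3-part compositions at random locations, a fixed exponential covariance and simple kriging (a deliberate simplification of the geostatistical machinery in the thesis): compositions are expressed in ilr coordinates, each coordinate is interpolated in real space, and the interpolated coordinates are back-transformed to the simplex.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 60
    loc = rng.uniform(0, 10, size=(n, 2))                 # sample locations
    raw = rng.lognormal(mean=[0.0, 0.5, 1.0], sigma=0.3, size=(n, 3))
    comp = raw / raw.sum(axis=1, keepdims=True)           # 3-part compositions

    # ilr for 3 parts: two balance coordinates.
    lx = np.log(comp)
    z = np.column_stack([np.sqrt(1/2) * (lx[:, 0] - lx[:, 1]),
                         np.sqrt(2/3) * (0.5 * (lx[:, 0] + lx[:, 1]) - lx[:, 2])])

    def cov(h, sill=1.0, rng_par=3.0):
        return sill * np.exp(-h / rng_par)                # assumed exponential covariance

    def simple_krige(loc, vals, target):
        h = np.linalg.norm(loc[:, None, :] - loc[None, :, :], axis=2)
        C = cov(h) + 1e-8 * np.eye(len(loc))              # jitter for numerical stability
        c0 = cov(np.linalg.norm(loc - target, axis=1))
        w = np.linalg.solve(C, c0)
        m = vals.mean()
        return m + w @ (vals - m)

    target = np.array([5.0, 5.0])
    z_hat = np.array([simple_krige(loc, z[:, j], target) for j in range(2)])

    # Back-transform the kriged ilr coordinates to a composition.
    l1 = np.sqrt(1/2) * z_hat[0] + np.sqrt(1/6) * z_hat[1]
    l2 = -np.sqrt(1/2) * z_hat[0] + np.sqrt(1/6) * z_hat[1]
    l3 = -np.sqrt(2/3) * z_hat[1]
    pred = np.exp([l1, l2, l3])
    pred /= pred.sum()
    print(pred)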

Relevance: 70.00%

Abstract:

Parent, L. E., Natale, W. and Ziadi, N. 2009. Compositional nutrient diagnosis of corn using the Mahalanobis distance as nutrient imbalance index. Can. J. Soil Sci. 89: 383-390. Compositional nutrient diagnosis (CND) provides a plant nutrient imbalance index (CND-r²) with an assumed χ² distribution. The Mahalanobis distance D², which detects outliers in compositional data sets, also has a χ² distribution. The objective of this paper was to compare the D² and CND-r² nutrient imbalance indexes in corn (Zea mays L.). We measured grain yield as well as N, P, K, Ca, Mg, Cu, Fe, Mn, and Zn concentrations in the ear leaf at silk stage for 210 calibration sites in the St. Lawrence Lowlands [2300-2700 corn thermal units (CTU)], as well as 30 phosphorus (2300-2700 CTU; 10 sites) and 10 nitrogen (1900-2100 CTU; one site) replicated fertilizer treatments for validation. We derived CND norms as the mean, standard deviation, and inverse covariance matrix of centred log ratios (clr) for high-yielding specimens (≥ 9.0 Mg grain ha⁻¹ at 150 g H₂O kg⁻¹ moisture content) in the 2300-2700 CTU zone. Using χ² = 17 (P < 0.05) with nine degrees of freedom (i.e., nine nutrients) as a rejection criterion for outliers, and a yield threshold of 8.6 Mg ha⁻¹ after Cate-Nelson partitioning between low- and high-yielders in the P validation data set, D² misclassified two specimens compared with nine for CND-r². The D² classification was not significantly different from a χ² classification (P > 0.05), but the CND-r² classification differed significantly from χ² or D² (P < 0.001). A threshold value for nutrient imbalance could thus be derived probabilistically for conducting D² diagnosis, while the CND-r² nutrient imbalance threshold must be calibrated using fertilizer trials. In the proposed CND-D² procedure, D² is first computed to classify the specimen as a possible outlier; thereafter, nutrient indices are ranked in their order of limitation. The D² norms appeared less effective in the 1900-2100 CTU zone.
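The two imbalance indices can be illustrated with synthetic numbers. In the sketch below the nutrient concentrations, the filling value and the diagnosed specimen are all assumptions made for the example; it derives CND norms from a calibration group, then computes the CND indices, CND-r² and the Mahalanobis D² for a diagnosed specimen and compares D² with a χ² threshold.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    nutrients = ["N", "P", "K", "Ca", "Mg", "Cu", "Fe", "Mn", "Zn"]

    # Synthetic tissue concentrations (g/kg-scale magnitudes) for a high-yield
    # calibration group; an assumed filling value completes the composition.
    conc = rng.lognormal(mean=[3.3, 1.0, 3.0, 1.8, 1.0, -4.0, -1.8, -2.5, -3.0],
                         sigma=0.15, size=(210, 9))
    fill = 1000.0 - conc.sum(axis=1, keepdims=True)
    full = np.hstack([conc, fill])

    clr = np.log(full) - np.log(full).mean(axis=1, keepdims=True)
    V = clr[:, :9]                                   # nutrient clr indices only

    mean, sd = V.mean(axis=0), V.std(axis=0, ddof=1) # CND norms
    cov_inv = np.linalg.inv(np.cov(V, rowvar=False)) # inverse covariance for D^2

    def diagnose(sample_conc):
        f = np.append(sample_conc, 1000.0 - sample_conc.sum())
        v = (np.log(f) - np.log(f).mean())[:9]
        I = (v - mean) / sd                          # CND nutrient indices
        cnd_r2 = float((I ** 2).sum())
        d2 = float((v - mean) @ cov_inv @ (v - mean))
        return I, cnd_r2, d2

    I, cnd_r2, d2 = diagnose(conc[0] * rng.uniform(0.7, 1.3, 9))
    threshold = stats.chi2.ppf(0.95, df=9)           # chi-squared, nine degrees of freedom
    print(cnd_r2, d2, "possible outlier" if d2 > threshold else "within norms")
    print(sorted(zip(nutrients, I), key=lambda t: t[1])[:3])   # most limiting nutrients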

Relevance: 70.00%

Abstract:

Soil aggregation is an index of soil structure measured by mean weight diameter (MWD) or by scaling factors often interpreted as fragmentation fractal dimensions (Df). However, the MWD provides a biased estimate of soil aggregation due to spurious correlations among aggregate-size fractions and scale dependency. The scale-invariant Df is based on weak assumptions to allow particle counts, is sensitive to the selection of the fractal domain, and may frequently exceed a value of 3, implying that Df is a biased estimate of aggregation. Aggregation indices based on mass may be computed without bias using compositional analysis techniques. Our objective was to elaborate compositional indices of soil aggregation and to compare them to MWD and Df using a published dataset describing the effect of seven cropping systems on aggregation. Six aggregate-size fractions were arranged into a sequence of D−1 balances of building blocks that portray the process of soil aggregation. Isometric log-ratios (ilrs) are scale-invariant and orthogonal log contrasts, or balances, that possess the Euclidean geometry necessary to compute a distance between any two aggregation states, known as the Aitchison distance (A(x,y)). Close correlations (r > 0.98) were observed between MWD, Df, and the ilr contrasting large and small aggregate sizes. Several unbiased embedded ilrs can characterize the heterogeneous nature of soil aggregates and be related to soil properties or functions. Soil bulk density and penetration resistance were closely related to A(x,y) with reference to bare fallow. A(x,y) is easy to implement as an unbiased index of soil aggregation using standard sieving methods and may allow comparisons between studies.
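The balances and the Aitchison distance A(x,y) described above are straightforward to compute. The sketch below uses made-up mass fractions for six aggregate-size classes and a bare-fallow reference; the particular sequential binary partition shown is an assumption, chosen to contrast larger against smaller fractions.

    import numpy as np

    # Six aggregate-size fractions (largest to smallest), assumed mass proportions.
    treatment = np.array([0.28, 0.22, 0.18, 0.14, 0.10, 0.08])
    bare_fallow = np.array([0.10, 0.12, 0.16, 0.20, 0.22, 0.20])   # reference state

    def sequential_balances(x):
        """D-1 balances: each contrasts the i largest fractions with the next one."""
        lx = np.log(x)
        d = len(x)
        return np.array([np.sqrt(i / (i + 1.0)) * (lx[:i].mean() - lx[i])
                         for i in range(1, d)])

    def aitchison_distance(x, y):
        cx = np.log(x) - np.log(x).mean()          # clr coefficients
        cy = np.log(y) - np.log(y).mean()
        return np.linalg.norm(cx - cy)

    print(sequential_balances(treatment))
    print("A(x,y) from bare fallow:", aitchison_distance(treatment, bare_fallow))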

Relevance: 70.00%

Abstract:

Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. As with other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, in which the sum of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
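The general idea of clustering on the simplex can be sketched with standard tools; the snippet below is not Simcluster and does not reproduce its interface. It adds a pseudo-count to handle zero counts, closes the counts to proportions, and clusters libraries on Aitchison distances with ordinary hierarchical clustering.

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(7)

    # Synthetic tag/transcript counts: 12 libraries x 50 tags, two groups.
    base = rng.lognormal(0, 1, 50)
    counts = np.vstack([rng.poisson(base * 200) for _ in range(6)] +
                       [rng.poisson(np.roll(base, 10) * 200) for _ in range(6)])

    # Pseudo-count for zeros, then closure to proportions and clr transform.
    props = (counts + 0.5) / (counts + 0.5).sum(axis=1, keepdims=True)
    clr = np.log(props) - np.log(props).mean(axis=1, keepdims=True)

    # Euclidean distance between clr rows equals the Aitchison distance.
    d = pdist(clr, metric="euclidean")
    labels = fcluster(linkage(d, method="average"), t=2, criterion="maxclust")
    print(labels)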

Relevance: 70.00%

Abstract:

New trace element, Sr-, Nd-, Pb- and Hf isotope data provide insights into the evolution of the Tonga-Lau Basin subduction system. The involvement of two separate mantle domains, namely Pacific MORB mantle in the pre-rift and early stages of back-arc basin formation, and Indian MORB mantle in the later stages, is confirmed by these results. Contrary to models proposed in recent studies on the basis of Pb isotope and other compositional data, this change in mantle wedge character best explains the shift in the isotopic composition, particularly ¹⁴³Nd/¹⁴⁴Nd ratios, of modern Tofua Arc magmas relative to all other arc products from this region. Nevertheless, significant changes in the slab-derived flux during the evolution of the arc system are also required to explain second-order variations in magma chemistry. In this region, the slab-derived flux is dominated by fluid; however, these fluids carry Pb with sediment-influenced isotopic signatures, indicating that their source is not restricted to the subducting altered mafic oceanic crust. This has been the case from the earliest magmatic activity in the arc (Eocene) until the present time, with the exception of two periods of magmatic activity recorded in samples from the Lau Islands. Both the Lau Volcanic Group and Korobasaga Volcanic Group lavas preserve trace element and isotope evidence for a contribution from subducted sediment that was not transported as a fluid, but possibly in the form of a melt. This component shares similarities with that influencing the chemistry of the northern Tofua Arc magmas, suggesting some caution may be required in the adoption of constraints for the latter dependent upon the involvement of sediments from the Louisville Ridge. A key outcome of this study is to demonstrate that the models proposed to explain subduction zone magmatism cannot afford to ignore the small but important contributions made by the mantle wedge to the incompatible trace element inventory of arc magmas.

Relevance: 70.00%

Abstract:

Researchers in ecology commonly use multivariate analyses (e.g. redundancy analysis, canonical correspondence analysis, Mantel correlation, multivariate analysis of variance) to interpret patterns in biological data and relate these patterns to environmental predictors. There has been, however, little recognition of the errors associated with biological data and the influence that these may have on predictions derived from ecological hypotheses. We present a permutational method that assesses the effects of taxonomic uncertainty on the multivariate analyses typically used in the analysis of ecological data. The procedure is based on iterative randomizations that randomly re-assign unidentified species in each site to any of the other species found in the remaining sites. After each re-assignment of species identities, the multivariate method in question is run and a parameter of interest is calculated. Consequently, one can estimate a range of plausible values for the parameter of interest under different scenarios of re-assigned species identities. We demonstrate the use of our approach in the calculation of two parameters with an example involving tropical tree species from western Amazonia: 1) the Mantel correlation between compositional similarity and environmental distances between pairs of sites, and 2) the variance explained by environmental predictors in redundancy analysis (RDA). We also investigated the effects of increasing taxonomic uncertainty (i.e. the number of unidentified species) and of the taxonomic resolution at which morphospecies are determined (genus resolution, family resolution, or fully undetermined species) on the uncertainty range of these parameters. To achieve this, we performed simulations on a tree dataset from southern Mexico by randomly selecting a portion of the species contained in the dataset and classifying them as unidentified at each level of decreasing taxonomic resolution. An analysis of covariance showed that both taxonomic uncertainty and resolution significantly influence the uncertainty range of the resulting parameters. Increasing taxonomic uncertainty expands our uncertainty about the parameters estimated in both the Mantel test and RDA. The effects of increasing taxonomic resolution, however, are not as evident. The method presented in this study improves on traditional approaches to studying compositional change in ecological communities by accounting for some of the uncertainty inherent to biological data. We hope that this approach can be routinely used to estimate any parameter of interest obtained from compositional data tables when faced with taxonomic uncertainty.
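The randomisation can be sketched as follows. The snippet is a simplified illustration on synthetic data: unidentified records in each site are repeatedly re-assigned to species observed in the remaining sites, and the parameter of interest (here the plain Mantel statistic, i.e. the correlation between community and environmental distance vectors, without its own significance test) is recomputed to obtain an uncertainty range.

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import pearsonr

    rng = np.random.default_rng(8)
    n_sites, n_species = 15, 40

    # Synthetic site-by-species abundances, unidentified records and environment.
    abund = rng.poisson(1.0, size=(n_sites, n_species)).astype(float)
    unidentified = rng.poisson(0.5, size=n_sites)      # unnamed records per site
    env = rng.normal(size=(n_sites, 3))
    env_dist = pdist(env)

    def mantel_r(table):
        comp_dist = pdist(table / table.sum(axis=1, keepdims=True), metric="braycurtis")
        return pearsonr(comp_dist, env_dist)[0]

    values = []
    for _ in range(200):
        table = abund.copy()
        for i in range(n_sites):
            # species pool observed anywhere except site i
            pool = np.where(np.delete(table, i, axis=0).sum(axis=0) > 0)[0]
            for p in rng.choice(pool, size=int(unidentified[i]), replace=True):
                table[i, p] += 1                       # re-assign one unidentified record
        values.append(mantel_r(table))

    print("uncertainty range of the statistic:", np.percentile(values, [2.5, 97.5]))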

Relevance: 70.00%

Abstract:

A compositional multivariate approach is used to analyse regional-scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey of Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20 cm depth on a non-aligned grid at one site per 2 km². Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI, initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or the superficial deposits. To address this, the geochemical data were transformed using centred log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques, including compositional principal component analysis (PCA) and the minimum/maximum autocorrelation factor (MAF) analysis method, were used to determine the influence of the underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was captured by the first four principal components (PCs), implying “significant” structure in the data. Analysis of variance showed that only 10 PCs were necessary to classify the soil geochemical data. To consider an improvement over PCA that uses the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first six dominant factors. Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons, and linear discriminant analysis (LDA) was undertaken. Prediction accuracy for the LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the six most dominant factors. The misclassification of peat may reflect degradation of peat-covered areas since the creation of the superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this in classification analysis.
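The classification workflow can be sketched on stand-in data (the Tellus measurements are not reproduced here, and MAF is omitted for brevity): clr transformation, PCA, and linear discriminant analysis of lithology class on the leading principal components.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(9)
    n, n_elem, n_lith = 600, 15, 4

    # Synthetic multi-element compositions with a lithology-dependent signature.
    lith = rng.integers(0, n_lith, n)
    centres = rng.normal(0, 1, size=(n_lith, n_elem))
    raw = np.exp(centres[lith] + rng.normal(0, 0.6, size=(n, n_elem)))
    comp = raw / raw.sum(axis=1, keepdims=True)          # closed compositions

    clr = np.log(comp) - np.log(comp).mean(axis=1, keepdims=True)

    scores = PCA(n_components=10).fit_transform(clr)     # first 10 PCs, as in the text
    acc = cross_val_score(LinearDiscriminantAnalysis(), scores, lith, cv=5)
    print("cross-validated LDA accuracy:", acc.mean())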

Relevance: 60.00%

Abstract:

Open-system pyrolysis (heating rate 10 degrees C/min) of a coal maturity (vitrinite reflectance, VR) sequence (0.5%, 0.8% and 1.4% VR) demonstrates that there are two stages of thermogenic methane generation from Bowen Basin coals. The first and major stage shows a steady increase in methane generation, maximising at 570 degrees C, corresponding to a VR of 2-2.5%. This is followed by a less intense methane generation which has not yet maximised by 800 degrees C (equivalent to a VR of 5%). Heavier (C2+) hydrocarbons are generated up to 570 degrees C, after which only the C1 gases (CH4, CO and CO2) are produced. The main phase of heavy hydrocarbon generation occurs between 420 and 510 degrees C. Over this temperature range, methane generation accounts for only a minor component, whereas the wet gases (C2-C5) are either in equal abundance or more abundant by a factor of two than the liquid hydrocarbons. The yields of the non-hydrocarbon gases CO2 and CO are greater than methane during the early stages of gas generation from an immature coal, subordinate to methane during the main phase of methane generation, after which they are again dominant. Compositional data for desorbed and produced coal seam gases from the Bowen Basin show that CO2 and wet gases are a minor component. This discrepancy between the proportion of wet gas components produced during open-system pyrolysis and that observed in naturally matured coals may be the result of preferential migration of wet gas components, of dilution by methane generated during secondary cracking of bitumen, or of kinetic effects associated with different activation energies for the production of individual hydrocarbon gases. Extrapolation of the results of artificial pyrolysis of the main organic components in coal to geologically significant heating rates suggests that isotopically light methane, with δ¹³C down to -50 parts per thousand, can be generated. Depletions in ¹³C are further enhanced, however, as a result of trapping of gases over selected rank levels (instantaneous generation), which is a probable explanation for the range of δ¹³C values we have recorded in methane desorbed from Bowen Basin coals (-51 ± 9 parts per thousand). Pervasive carbonate-rich veins in Bowen Basin coals are the product of magmatism-related hydrothermal activity. Furthermore, the pyrolysis results suggest an additional organic carbon source: CO2 released at any stage during the maturation history could mix in varying proportions with CO2 from the other sources. This interpretation is supported by C and O isotopic ratios of carbonates that indicate mixing between magmatic and meteoric fluids. Also, the steep slope of the C and O isotope correlation trend suggests that the carbonates were deposited over a very narrow temperature interval basin-wide, or at relatively high temperatures (i.e., greater than 150 degrees C) where mineral-fluid oxygen isotope fractionations are small. These temperatures are high enough for catagenic production of methane and higher hydrocarbons from the coal and coal-derived bitumen. The results suggest that a combination of thermogenic generation of methane and thermodynamic processes associated with CH4/CO2 equilibria are the two most important factors controlling the primary isotope and molecular composition of coal seam gases in the Bowen Basin. Biological processes are regionally subordinate but may be locally significant.