866 results for Compositional data analysis-roots in geosciences
Abstract:
In a seminal paper, Aitchison and Lauder (1985) introduced classical kernel density estimation techniques in the context of compositional data analysis. They gave two options for the choice of the kernel to be used in the kernel estimator. One of these kernels is based on the use of the alr transformation on the simplex S^D jointly with the normal distribution on R^(D-1). However, these authors themselves recognized that this method has some deficiencies. A method for overcoming these difficulties, based on recent developments in compositional data analysis and multivariate kernel estimation theory and combining the ilr transformation with the normal density with a full bandwidth matrix, was recently proposed in Martín-Fernández, Chacón and Mateu-Figueras (2006). Here we present an extensive simulation study that compares both methods in practice, thus exploring the finite-sample behaviour of both estimators.
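As an illustration of the ilr transformation mentioned in this abstract, the following is a minimal sketch that maps a D-part composition to D-1 real coordinates. The function name `ilr` and the particular orthonormal basis (each coordinate contrasts the geometric mean of the first i parts with part i+1) are assumptions chosen for illustration; the abstract does not say which basis the authors used. A normal kernel would then be applied to these coordinates, which is not shown here.

```python
import math


def ilr(x):
    """Isometric log-ratio coordinates of a strictly positive composition.

    Assumed basis (one of many valid orthonormal choices):
    z_i = sqrt(i / (i + 1)) * ln(gm(x_1..x_i) / x_{i+1}),  i = 1..D-1,
    where gm is the geometric mean.
    """
    if any(v <= 0 for v in x):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(v) for v in x]
    coords = []
    running = 0.0  # running sum of the first i log-parts
    for i in range(1, len(x)):
        running += logs[i - 1]
        # running / i is the log of the geometric mean of the first i parts
        coords.append(math.sqrt(i / (i + 1)) * (running / i - logs[i]))
    return coords


if __name__ == "__main__":
    # Rescaling the composition leaves the coordinates unchanged.
    print(ilr([0.1, 0.2, 0.3, 0.4]))
    print(ilr([1.0, 2.0, 3.0, 4.0]))
```

Because the basis is orthonormal, Euclidean distances between ilr coordinates equal Aitchison distances between the original compositions, which is why standard multivariate kernels can be used after the transformation.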
Abstract:
The quantitative estimation of Sea Surface Temperatures (SST) from fossil assemblages is a fundamental issue in palaeoclimatic and palaeoceanographic investigations. The Modern Analogue Technique, a widely adopted method based on direct comparison of fossil assemblages with modern coretop samples, was revised with the aim of conforming it to compositional data analysis. The new CODAMAT method was developed by adopting the Aitchison metric as the distance measure. Modern coretop datasets are characterised by a large number of zeros. Zero replacement was carried out with a Bayesian approach, based on a posterior estimation of the parameter of the multinomial distribution. The number of modern analogues from which to reconstruct the SST was determined by means of a multiple approach considering the Proxies correlation matrix, the Standardized Residual Sum of Squares and the Mean Squared Distance. This new CODAMAT method was applied to the planktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea. Key words: Modern analogues, Aitchison distance, Proxies correlation matrix, Standardized Residual Sum of Squares
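The Aitchison metric named in this abstract can be sketched as the Euclidean distance between centred log-ratio (clr) vectors. This is a minimal illustration only: the function names are hypothetical, and it assumes zeros have already been replaced, since logarithms of zero parts are undefined. It is not the CODAMAT implementation.

```python
import math


def clr(x):
    """Centred log-ratio transform of a strictly positive composition."""
    if any(v <= 0 for v in x):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr vectors of two compositions."""
    if len(x) != len(y):
        raise ValueError("compositions must have the same number of parts")
    cx, cy = clr(x), clr(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


if __name__ == "__main__":
    fossil = [0.50, 0.30, 0.20]   # illustrative relative abundances
    modern = [0.45, 0.35, 0.20]   # illustrative relative abundances
    print(aitchison_distance(fossil, modern))
```

In an analogue search, each fossil sample would be compared against every modern sample with this distance and the closest ones retained; how many to retain is the selection problem the abstract describes.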
Abstract:
Data analysis sessions are a common feature of discourse analytic communities, often involving participants ranging from those with limited expertise to those with significant expertise. Learning how to do data analysis and working with transcripts, however, are often new experiences for doctoral candidates within the social sciences. While many guides to doctoral education focus on procedures associated with data analysis (Heath, Hindmarsh, & Luff, 2010; McHoul & Rapley, 2001; Silverman, 2011; Wetherell, Taylor, & Yates, 2001), the in situ practices of doing data analysis are relatively undocumented. This chapter has been collaboratively written by members of a special interest research group, the Transcript Analysis Group (TAG), who meet regularly to examine transcripts representing audio- and video-recorded interactional data. Here, we investigate our own actual interactional practices and participation in this group, where each member is both analyst and participant. We particularly focus on the pedagogic practices enacted in the group by investigating how members engage in the scholarly practice of data analysis. A key feature of talk within the data sessions is that members work collaboratively to identify and discuss ‘noticings’ from the audio-recorded and transcribed talk being examined, produce candidate analytic observations based on these discussions, and evaluate these observations. Our investigation of how talk constructs social practices in these sessions shows that participants move fluidly between actions that demonstrate pedagogic practices and expertise. Within any one session, members can display their expertise as analysts and, at the same time, display that they have gained an understanding that they did not have before. We take an ethnomethodological position that asks, ‘what’s going on here?’ in the data analysis session.
By observing the in situ practices in fine-grained detail, we show how members participate in the data analysis sessions and make sense of a transcript.
Abstract:
A workshop providing an introduction to Bayesian data analysis and hypothesis testing using R, JAGS and the BayesFactor package.
Abstract:
A substantial proportion of aetiological risks for many cancers and chronic diseases remains unexplained. Using geochemical soil and stream water samples collected as part of the Tellus Project studies, current research is investigating naturally occurring background levels of potentially toxic elements (PTEs) in soils and stream sediments and their possible relationship with progressive chronic kidney disease (CKD). The Tellus geological mapping project, Geological Survey Northern Ireland, collected soil sediment and stream water samples on a grid of one sample site every 2 km² across the rural areas of Northern Ireland, resulting in more than 6800 soil sampling locations and more than 5800 locations for stream water sampling. Accumulation of several PTEs, including arsenic, cadmium, chromium, lead and mercury, has been linked with human health and implicated in renal function decline. The hypothesis is that long-term exposure will result in cumulative exposure to PTEs and act as risk factor(s) for cancer and diabetes-related CKD and its progression. The ‘bioavailable’ fraction of total PTE soil concentration depends on the ‘bioaccessible’ proportion through an exposure pathway. Recent work has explored this bioaccessible fraction for a range of PTEs across Northern Ireland. In this study the compositional nature of the multivariate geochemical PTE variables and bioaccessible data is explored to augment the investigation into the potential relationship between PTEs, bioaccessibility and disease data.
Abstract:
This analysis was stimulated by the real data analysis problem of household expenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that try to add a small amount to the zero terms are not appropriate in general, as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spending excluding alcohol/tobacco similar for teetotal and non-teetotal households? In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether a household is teetotal will clearly depend on the household level variables, so we need to be able to model this dependence. The other tricky question is that with zeros on more than one component, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be; for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durables within the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually be for situations where any non-zero expenditure is not small.
While this analysis is based on economic data, the ideas carry over to many other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables).
Abstract:
Precision of released figures is not only an important quality feature of official statistics, it is also essential for a good understanding of the data. In this paper we show a case study of how precision could be conveyed if the multivariate nature of data has to be taken into account. In the official release of the Swiss earnings structure survey, the total salary is broken down into several wage components. We follow Aitchison's approach for the analysis of compositional data, which is based on logratios of components. We first present different multivariate analyses of the compositional data whereby the wage components are broken down by economic activity classes. Then we propose a number of ways to assess precision.
Abstract:
It is well known that regression analyses involving compositional data need special attention because the data are not of full rank. For a regression analysis where both the dependent and independent variables are components, we propose a transformation of the components emphasizing their role as dependent and independent variables. A simple linear regression can be performed on the transformed components. The regression line can be depicted in a ternary diagram, facilitating the interpretation of the analysis in terms of components. An example with time-budgets illustrates the method and the graphical features.
Abstract:
The log-ratio methodology makes available powerful tools for analyzing compositional data. Nevertheless, the use of this methodology is only possible for data sets without null values. Consequently, in data sets where zeros are present, a previous treatment becomes necessary. Recent advances in the treatment of compositional zeros have focused especially on zeros of structural nature and on rounded zeros. These tools do not contemplate the particular case of count compositional data sets with null values. In this work we deal with "count zeros" and we introduce a treatment based on a mixed Bayesian-multiplicative estimation. We use the Dirichlet probability distribution as a prior and we estimate the posterior probabilities. Then we apply a multiplicative modification for the non-zero values. We present a case study where this new methodology is applied. Key words: count data, multiplicative replacement, composition, log-ratio analysis
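The two-step idea in this abstract (a Dirichlet-based posterior estimate for zero counts, then a multiplicative adjustment of the non-zero parts) can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the default prior (uniform prior mean 1/D with strength equal to D) is an assumed choice, not one taken from the abstract.

```python
def bayes_mult_replace(counts, strength=None, prior=None):
    """Replace count zeros in one sample and return proportions summing to 1.

    Assumptions (not from the source): the prior mean defaults to 1/D for
    every part and the prior strength defaults to D.
    """
    d = len(counts)
    n = sum(counts)
    if n <= 0:
        raise ValueError("the sample must contain at least one count")
    s = float(d) if strength is None else float(strength)
    t = [1.0 / d] * d if prior is None else list(prior)

    # Step 1: posterior estimate for each zero part, s * t_i / (n + s).
    zero_est = [s * t[i] / (n + s) if c == 0 else 0.0
                for i, c in enumerate(counts)]
    zero_mass = sum(zero_est)

    # Step 2: shrink the observed proportions multiplicatively so the
    # total is 1 and ratios among non-zero parts are preserved.
    return [zero_est[i] if c == 0 else (c / n) * (1.0 - zero_mass)
            for i, c in enumerate(counts)]


if __name__ == "__main__":
    print(bayes_mult_replace([0, 3, 7]))
```

The multiplicative step is what keeps the log-ratios among the observed parts unchanged, so only the ratios involving the replaced zeros depend on the prior.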
Abstract:
Geochemical data derived from the whole or partial analysis of various geologic materials represent a composition of mineralogies or solute species. Minerals are composed of structured relationships between cations and anions which, through atomic and molecular forces, keep the elements bound in specific configurations. The chemical compositions of minerals have specific relationships that are governed by these molecular controls. In the case of olivine, there is a well-defined relationship between Mn-Fe-Mg and Si. Balances between the principal elements defining olivine composition and other significant constituents in the composition (Al, Ti) have been defined, resulting in a near-linear relationship between the logarithmic relative proportion of Si versus (MgMnFe) and Mg versus (MnFe), which is typically described but poorly illustrated in the simplex. The present contribution corresponds to ongoing research, which attempts to relate stoichiometry and geochemical data using compositional geometry. We describe here the approach by which stoichiometric relationships based on mineralogical constraints can be accounted for in the space of simplicial coordinates, using olivines as an example. Further examples for other mineral types (plagioclases and more complex minerals such as clays) are needed. Issues that remain to be dealt with include the reduction of a bulk chemical composition of a rock comprised of several minerals from which appropriate balances can be used to describe the composition in a realistic mineralogical framework. The overall objective of our research is to answer the question: In the cases where the mineralogy is unknown, are there suitable proxies that can be substituted? Key words: Aitchison geometry, balances, mineral composition, oxides
Abstract:
This paper generalizes the HEGY-type test to detect seasonal unit roots in data at any frequency, based on the seasonal unit root tests in univariate time series by Hylleberg, Engle, Granger and Yoo (1990). We first introduce seasonal unit roots, and then derive the mechanism of the HEGY-type test for data with any frequency. Thereafter we provide the asymptotic distributions of our test statistics when different test regressions are employed. We find that the F-statistics for testing conjugate unit roots have the same asymptotic distributions. Then we compute the finite-sample and asymptotic critical values for daily and hourly data by a Monte Carlo method. The power and size properties of our test for hourly data are investigated, and we find that including lag augmentations in the auxiliary regression without lag elimination gives the smallest size distortion, and that tests with seasonal dummies included in the auxiliary regression have more power than tests without seasonal dummies. Finally, we apply our test to hourly wind power production data in Sweden and show that there are no seasonal unit roots in the series.
Abstract:
This paper explores a method of comparative analysis and classification of data through perceived design affordances. Included is a discussion of the musical potential of data forms that are derived through eco-structural analysis of musical features inherent in audio recordings of natural sounds. A system of classification of these forms is proposed based on their structural contours. The classifications include four primitive types: steady, iterative, unstable and impulse. The classification extends previous taxonomies used to describe the gestural morphology of sound. The methods presented are used to provide compositional support for eco-structuralism.
Abstract:
BACKGROUND: In Bangladesh, poor infant and young child feeding practices are contributing to the burden of infectious diseases and malnutrition. OBJECTIVE: To estimate the determinants of selected feeding practices and key indicators of breastfeeding and complementary feeding in Bangladesh. METHODS: The sample included 2482 children aged 0 to 23 months from the Bangladesh Demographic and Health Survey of 2004. The World Health Organization (WHO)-recommended infant and young child feeding indicators were estimated, and selected feeding indicators were examined against a set of individual-, household-, and community-level variables using univariate and multivariate analyses. RESULTS: Only 27.5% of mothers initiated breastfeeding within the first hour after birth, 99.9% had ever breastfed their infants, 97.3% were currently breastfeeding, and 22.4% were currently bottle-feeding. Among infants under 6 months of age, 42.5% were exclusively breastfed, and among those aged 6 to 9 months, 62.3% received complementary foods in addition to breastmilk. Among the risk factors for an infant not being exclusively breastfed were higher socioeconomic status, higher maternal education, and living in the Dhaka region. Higher birth order and female sex were associated with increased rates of exclusive breastfeeding of infants under 6 months of age. The risk factors for bottle-feeding were similar and included having a partner with a higher educational level (OR = 2.17), older maternal age (OR for age ≥ 35 years = 2.32), and being in the upper wealth quintiles (OR for the richest = 3.43). Urban mothers were at higher risk for not initiating breastfeeding within the first hour after birth (OR = 1.61). Those who made three to six visits to the antenatal clinic were at lower risk for not initiating breastfeeding within the first hour (OR = 0.61). The rate of initiating breastfeeding within the first hour was higher in mothers from richer households (OR = 0.37).
CONCLUSIONS: Most breastfeeding indicators in Bangladesh were below acceptable levels. Breastfeeding promotion programs in Bangladesh need nationwide application because of the low rates of appropriate infant feeding indicators, but they should also target women who have the main risk factors, i.e., working mothers living in urban areas (particularly in Dhaka).
Abstract:
Background: Poor feeding practices in early childhood contribute to the burden of childhood malnutrition and morbidity. Objective: To estimate the key indicators of breastfeeding and complementary feeding and the determinants of selected feeding practices in Sri Lanka. Methods: The sample consisted of 1,127 children aged 0 to 23 months from the Sri Lanka Demographic and Health Survey 2000. The key infant feeding indicators were estimated and selected indicators were examined against a set of individual-, household-, and community-level variables using univariate and multivariate analyses. Results: Breastfeeding was initiated within the first hour after birth in 56.3% of infants, 99.7% had ever been breastfed, 85.0% were currently being breastfed, and 27.2% were being bottle-fed. Of infants under 6 months of age, 60.6% were fully breastfed, and of those aged 6 to 9 months, 93.4% received complementary foods. The likelihood of not initiating breastfeeding within the first hour after birth was higher for mothers who underwent cesarean delivery (OR = 3.23) and those who were not visited by a Public Health Midwife at home during pregnancy (OR = 1.81). The rate of full breastfeeding was significantly lower among mothers who did not receive postnatal home visits by a Public Health Midwife. Bottle-feeding rates were higher among infants whose mothers had ever been employed (OR = 1.86), lived in a metropolitan area (OR = 3.99), or lived in the South-Central Hill country (OR = 3.11) and were lower among infants of mothers with secondary education (OR = 0.27). Infants from the urban (OR = 8.06) and tea estate (OR = 12.63) sectors were less likely to receive timely complementary feeding than rural infants. Conclusions: Antenatal and postnatal contacts with Public Health Midwives were associated with improved breastfeeding practices. Breastfeeding promotion strategies should specifically focus on the estate and urban or metropolitan communities.
Abstract:
Background: Childhood undernutrition and mortality are high in Nepal, and therefore interventions on infant and young child feeding practices deserve high priority. Objective: To estimate infant and young child feeding indicators and the determinants of selected feeding practices. Methods: The sample consisted of 1,906 children aged 0 to 23 months from the Demographic and Health Survey 2006. Selected indicators were examined against a set of variables using univariate and multivariate analyses. Results: Breastfeeding was initiated within the first hour after birth in 35.4% of children, 99.5% were ever breastfed, 98.1% were currently breastfed, and 3.5% were bottle-fed. The rate of exclusive breastfeeding among infants under 6 months of age was 53.1%, and the rate of timely complementary feeding among those 6 to 9 months of age was 74.7%. Mothers who made antenatal clinic visits were at a higher risk for no exclusive breastfeeding than those who made no visits. Mothers who lived in the mountains were more likely to initiate breastfeeding within 1 hour after birth and to introduce complementary feeding at 6 to 9 months of age, but less likely to exclusively breastfeed. Cesarean deliveries were associated with delay in timely initiation of breastfeeding. Higher rates of complementary feeding at 6 to 9 months were also associated with mothers with better education and those above 35 years of age. Risk factors for bottle-feeding included living in urban areas and births attended by trained health personnel. Conclusions: Most breastfeeding indicators in Nepal are below the expected levels to achieve a substantial reduction in child mortality. Breastfeeding promotion strategies should specifically target mothers who have more contact with the health care delivery system, while programs targeting the entire community should be continued.