950 results for Longitudinal Data Analysis and Time Series
Abstract:
This data set comprises time series of aboveground community plant biomass (sown plant community, weed plant community, dead plant material, and unidentified plant material, all measured as dry weight) and species-specific biomass for the sown species of several experiments at the field site of a large grassland biodiversity experiment (the Jena Experiment; see further details below). Aboveground community biomass was normally harvested twice a year, just prior to mowing at peak standing biomass (generally in May and August; in 2002 only once, in September), on all experimental plots of the Jena Experiment. This was done by clipping the vegetation 3 cm above ground in up to four rectangles of 0.2 x 0.5 m per large plot. New rectangle coordinates were randomly selected each year within the core area of the plots, and the same rectangle positions were used in all plots. The harvested biomass was sorted into categories: individual species for the sown plant species; weed plant species (species not sown at the particular plot); detached dead plant material (i.e., dead plant material in the data file); and remaining plant material that could not be assigned to any category (i.e., unidentified plant material in the data file). All biomass was dried to constant weight (70°C, >= 48 h) and weighed. Sown plant community biomass was calculated as the sum of the biomass of the individual sown species. Both the individual samples and the mean over samples are given for the community-level biomass measures. Overall, analyses of the community biomass data have identified species richness as well as functional group composition as important drivers of a positive biodiversity-productivity relationship. This collection contains the following series of datasets:
1. Plant biomass from the Main Experiment: In the Main Experiment, 82 grassland plots of 20 x 20 m were established from a pool of 60 species belonging to four functional groups (grasses, legumes, tall herbs, and small herbs). In May 2002, varying numbers of plant species from this species pool were sown into the plots to create gradients of plant species richness (1, 2, 4, 8, 16, and 60 species) and functional group richness (1, 2, 3, and 4 functional groups).
2. Plant biomass from the Dominance Experiment: In the Dominance Experiment, 206 grassland plots of 3.5 x 3.5 m were established from a pool of 9 species that can be dominant in semi-natural grassland communities of the study region. In May 2002, varying numbers of plant species from this species pool were sown into the plots to create a gradient of plant species richness (1, 2, 3, 4, 6, and 9 species).
3. Plant biomass from the monoculture plots: In the monoculture plots the sown plant community contains only a single species per plot, a different one for each plot; which species was sown in which plot is stated in the plot information table for monocultures (see further details below). The monoculture plots of 3.5 x 3.5 m were established in May 2002, like the other experiments, for all 60 plant species of the Jena Experiment species pool, with two replicates per species.
All plots were maintained by bi-annual weeding and mowing.
Abstract:
In real-life information sets, pieces of the whole set are often unavailable. This problem can have various origins and therefore describes different patterns; in the literature it is known as Missing Data. The issue can be handled in various ways: discarding incomplete observations, estimating what the missing values originally were, or simply ignoring the fact that some values are missing. The methods used to estimate missing data are called Imputation Methods. The work presented in this thesis has two main goals. The first is to determine whether any interactions exist between missing data, imputation methods, and supervised classification algorithms when they are applied together. For this first problem we consider a scenario in which the databases used are discrete, meaning that no relation between observations is assumed. These datasets underwent processes involving different combinations of the three components mentioned. The outcome showed that the missing data pattern strongly influences the results produced by a classifier. Also, in some cases, the complex imputation techniques investigated in the thesis obtained better results than simple ones. The second goal of this work is to propose a new imputation strategy, this time constraining the previous problem to a special kind of dataset, the multivariate time series. We designed new imputation techniques for this particular domain and combined them with some of the contrasted strategies tested in the previous chapter of this thesis. The time series were likewise subjected to processes involving missing data and imputation, in order to finally propose an overall better imputation method. In the final chapter of this work, a real-world example is presented, describing a water quality prediction problem.
The databases that characterized this problem had their own original missing values, which provides a real-world benchmark to test the algorithms developed in this thesis.
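The thesis's own imputation techniques are not reproduced in the abstract, but the two ends of the spectrum it describes, simple versus structure-aware imputation, can be sketched. A minimal Python illustration (the function names and toy data are hypothetical, and `None` stands in for a missing value):

```python
def mean_impute(values):
    """Simple imputation: replace missing entries with the mean
    of the observed values, ignoring any ordering."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def locf_impute(series):
    """Last-observation-carried-forward: fill each gap with the most
    recent observed value, exploiting the temporal ordering that makes
    time series a special case."""
    filled, last = [], None
    for v in series:
        if v is None:
            filled.append(last)
        else:
            filled.append(v)
            last = v
    return filled

print(mean_impute([1.0, None, 3.0]))        # [1.0, 2.0, 3.0]
print(locf_impute([1.0, None, None, 4.0]))  # [1.0, 1.0, 1.0, 4.0]
```

Mean imputation treats observations as unrelated, matching the "discrete" scenario of the first problem, while carrying the last observation forward uses exactly the structure that distinguishes the time-series setting.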
Abstract:
Data analysis sessions are a common feature of discourse analytic communities, often involving participants whose expertise ranges from novice to highly experienced. Learning how to do data analysis and working with transcripts, however, are often new experiences for doctoral candidates within the social sciences. While many guides to doctoral education focus on procedures associated with data analysis (Heath, Hindmarsh, & Luff, 2010; McHoul & Rapley, 2001; Silverman, 2011; Wetherell, Taylor, & Yates, 2001), the in situ practices of doing data analysis are relatively undocumented. This chapter has been collaboratively written by members of a special interest research group, the Transcript Analysis Group (TAG), who meet regularly to examine transcripts representing audio- and video-recorded interactional data. Here, we investigate our own actual interactional practices and participation in this group, where each member is both analyst and participant. We particularly focus on the pedagogic practices enacted in the group by investigating how members engage in the scholarly practice of data analysis. A key feature of talk within the data sessions is that members work collaboratively to identify and discuss ‘noticings’ from the audio-recorded and transcribed talk being examined, produce candidate analytic observations based on these discussions, and evaluate these observations. Our investigation of how talk constructs social practices in these sessions shows that participants move fluidly between actions that demonstrate pedagogic practices and expertise. Within any one session, members can display their expertise as analysts and, at the same time, display that they have gained an understanding that they did not have before. We take an ethnomethodological position that asks, ‘what’s going on here?’ in the data analysis session.
By observing these in situ practices in fine-grained detail, we show how members participate in the data analysis sessions and make sense of a transcript.
Abstract:
Despite being used since 1976, the Delusions-Symptoms-States Inventory/states of Anxiety and Depression (DSSI/sAD) has not yet been validated for use among people with diabetes. The aim of this study was to examine the validity of this personal disturbance scale among women with diabetes, using data from the Mater-University of Queensland Study of Pregnancy (MUSP) cohort. The DSSI subscales were compared against DSM-IV disorders, the Mental Component Score of the Short Form 36 (SF-36 MCS), and the Center for Epidemiologic Studies Depression Scale (CES-D). Factor analyses, odds ratios, receiver operating characteristic (ROC) analyses, and diagnostic efficiency tests were used to report findings. Exploratory factor analysis and fit indices confirmed the hypothesized two-factor model of the DSSI/sAD. We found significant variations in the DSSI/sAD domain scores that could be explained by the CES-D (DSSI-Anxiety: 55%, DSSI-Depression: 46%) and the SF-36 MCS (DSSI-Anxiety: 66%, DSSI-Depression: 56%). The DSSI subscales predicted DSM-IV diagnosed depression and anxiety disorders. The ROC analyses show that, although the DSSI symptoms and DSM-IV disorders were measured concurrently, the estimates of concordance remained only moderate. The findings demonstrate that the DSSI/sAD items have similar relationships to one another in both the diabetes and non-diabetes data sets, suggesting that they have similar interpretations.
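The study's ROC results are not reproduced here, but the concordance they quantify has a simple probabilistic reading: the area under the ROC curve is the probability that a randomly chosen case scores higher on the scale than a randomly chosen non-case. A minimal Python sketch (the labels and scores below are invented, not study data):

```python
def roc_auc(labels, scores):
    """AUC computed as the Mann-Whitney probability that a randomly
    chosen case (label 1) receives a higher score than a randomly
    chosen non-case (label 0); ties count as half a win."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation gives AUC = 1.0; chance-level scoring gives 0.5,
# and "only moderate" concordance falls between the two.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
print(roc_auc([1, 0], [0.5, 0.5]))                   # 0.5
```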
Abstract:
The mapping and geospatial analysis of benthic environments are multidisciplinary tasks that have become more accessible in recent years because of advances in technology and cost reductions in survey systems. The complex relationships that exist among physical, biological, and chemical seafloor components require advanced, integrated analysis techniques to enable scientists and others to visualize patterns and, in so doing, allow inferences to be made about benthic processes. Effective mapping, analysis, and visualization of marine habitats are particularly important because the subtidal seafloor environment is not readily viewed directly by eye. Research in benthic environments relies heavily, therefore, on remote sensing techniques to collect effective data. Because many benthic scientists are not mapping professionals, they may not adequately consider the links between data collection, data analysis, and data visualization. Projects often start with clear goals, but may be hampered by the technical details and skills required for maintaining data quality through the entire process from collection through analysis and presentation. The lack of technical understanding of the entire data handling process can represent a significant impediment to success. While many benthic mapping efforts have detailed their methodology as it relates to the overall scientific goals of a project, only a few published papers and reports focus on the analysis and visualization components (Paton et al. 1997, Weihe et al. 1999, Basu and Saxena 1999, Bruce et al. 1997). In particular, the benthic mapping literature often briefly describes data collection and analysis methods, but fails to provide sufficiently detailed explanation of particular analysis techniques or display methodologies so that others can employ them. 
In general, such techniques are in large part guided by the data acquisition methods, which can include both aerial and water-based remote sensing methods to map the seafloor without physical disturbance, as well as physical sampling methodologies (e.g., grab or core sampling). The terms benthic mapping and benthic habitat mapping are often used synonymously to describe seafloor mapping conducted for the purpose of benthic habitat identification. There is a subtle yet important difference, however, between general benthic mapping and benthic habitat mapping. The distinction is important because it dictates the sequential analysis and visualization techniques that are employed following data collection. In this paper general seafloor mapping for identification of regional geologic features and morphology is defined as benthic mapping. Benthic habitat mapping incorporates the regional scale geologic information but also includes higher resolution surveys and analysis of biological communities to identify the biological habitats. In addition, this paper adopts the definition of habitats established by Kostylev et al. (2001) as a “spatially defined area where the physical, chemical, and biological environment is distinctly different from the surrounding environment.” (PDF contains 31 pages)
Abstract:
EXTRACT (SEE PDF FOR FULL ABSTRACT): Zooplankton biomass and species composition have been sampled since 1985 at a set of standard locations off Vancouver Island. From these data, I have estimated multi-year average seasonal cycles and time series of anomalies from these averages.
Abstract:
Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability, and the possible processes associated with compositional data sets from many disciplines. In this paper we concentrate on geochemical data sets. First we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of subcompositional discrimination and specific perturbational change. Then we develop through standard methodology, such as generalised likelihood ratio tests, statistical tools to allow the systematic investigation of a complete lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained together with the necessary tools for a staying-in-the-simplex approach, namely compositional singular value decompositions. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major-oxide and rare-element compositions of metamorphosed limestones from the Northeast and Central Highlands of Scotland. Finally we point out a number of unresolved problems in the statistical analysis of compositional processes.
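The paper's machinery operates on the unit simplex via log-ratio methods; the basic transform underlying that family of techniques (a general sketch, not the paper's own procedure) can be illustrated in a few lines of Python, with a hypothetical 3-part composition:

```python
import math

def clr(composition):
    """Centred log-ratio transform: maps a vector of positive parts
    from the simplex into unconstrained real space, where standard
    multivariate tools (e.g. singular value decomposition) apply."""
    logs = [math.log(part) for part in composition]
    centre = sum(logs) / len(logs)   # log of the geometric mean
    return [l - centre for l in logs]

# A 3-part composition, e.g. proportions of three major oxides.
z = clr([0.7, 0.2, 0.1])
# clr coordinates always sum to zero, reflecting the unit-sum constraint.
print(abs(sum(z)) < 1e-12)  # True
```

Working in these coordinates and mapping back is what "staying in the simplex" approaches formalise: operations such as perturbation become ordinary addition in the transformed space.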
Abstract:
A presentation on the collection and analysis of data, taken from SOES 6018. This module aims to ensure that MSc Oceanography, MSc Marine Science, Policy & Law and MSc Marine Resource Management students are equipped with the skills they need to function as professional marine scientists, in addition to, and in conjunction with, the skills training in other MSc modules. The module covers training in fieldwork techniques, communication & research skills, IT & data analysis, and professional development.
Abstract:
In this review paper we collect several results about copula-based models, especially concerning regression models, by focusing on some insurance applications. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
Cluster randomized trials (CRTs) use clusters as the unit of randomization; a cluster is usually defined as a collection of individuals sharing some common characteristic. Common examples of clusters include entire dental practices, hospitals, schools, school classes, villages, and towns. Additionally, several measurements (repeated measurements) taken on the same individual at different time points are also considered clusters. In dentistry, CRTs are applicable because patients may be treated as clusters containing several individual teeth. CRTs require certain methodological procedures during sample size calculation, randomization, data analysis, and reporting, which are often ignored in dental research publications. In general, because of the similarity of observations within a cluster, each individual within a cluster provides less information than an individual in a non-clustered trial. Therefore, clustered designs require larger sample sizes than non-clustered randomized designs, and special statistical analyses that account for the fact that observations within clusters are correlated. The purpose of this article is to highlight, with relevant examples, the important methodological characteristics of cluster randomized designs as they may be applied in orthodontics, and to explain the problems that may arise if clustered observations are erroneously treated and analysed as independent (non-clustered).
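The sample-size inflation described above is conventionally quantified by the design effect, DEFF = 1 + (m - 1) * ICC, where m is the cluster size and ICC the intracluster correlation coefficient. A minimal Python sketch (the numeric values are illustrative, not taken from the article):

```python
import math

def design_effect(cluster_size, icc):
    """Variance inflation factor for a cluster randomized design:
    DEFF = 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def clustered_sample_size(n_independent, cluster_size, icc):
    """Total sample size needed to match the power of a non-clustered
    trial enrolling n_independent individuals."""
    return math.ceil(n_independent * design_effect(cluster_size, icc))

# With 5 teeth measured per patient and an intracluster correlation of
# 0.25, each observation carries half the information of an independent
# one, so the required sample size doubles.
print(design_effect(5, 0.25))               # 2.0
print(clustered_sample_size(128, 5, 0.25))  # 256
```

An ICC of zero recovers the non-clustered sample size, which is why analysing correlated teeth as if they were independent patients silently overstates the power of a trial.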