968 results for Longitudinal Data Analysis and Time Series
Abstract:
The quality of water level time series varies strongly, with alternating periods of high- and low-quality sensor data. In this paper we present the processing steps used to generate high-quality water level data from water pressure measured at the Time Series Station (TSS) Spiekeroog. The TSS is positioned in a tidal inlet between the islands of Spiekeroog and Langeoog in the East Frisian Wadden Sea (southern North Sea). The processing steps cover sensor drift, outlier identification, interpolation of data gaps, and quality control. A central step is the removal of outliers. For this process an absolute threshold of 0.25 m/10 min was selected, which still retains the water level increase and decrease during extreme events, as verified during quality control. A second important feature of the processing is the interpolation of data gaps, which is accomplished with high confidence that the generated data are trustworthy. Applying these methods, a 10-year dataset (December 2002-December 2012) of water level measurements at the TSS was processed, resulting in a seven-year time series (2005-2011).
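The abstract does not give the filter's implementation; the sketch below is one plausible reading of an absolute rate-of-change filter, in which only the 0.25 m per 10 min threshold comes from the text, while the function name, pandas layout, and example values are illustrative assumptions.

```python
import math
import pandas as pd

MAX_STEP_M = 0.25  # absolute threshold from the abstract: 0.25 m per 10 min

def remove_rate_outliers(level: pd.Series) -> pd.Series:
    """Set to NaN every sample that jumps more than MAX_STEP_M from the
    last accepted sample, so flagged values can be interpolated later."""
    cleaned = level.copy()
    last_valid = None
    for ts, value in level.items():
        if math.isnan(value):
            continue
        if last_valid is not None and abs(value - last_valid) > MAX_STEP_M:
            cleaned.loc[ts] = float("nan")  # reject as outlier
        else:
            last_valid = value              # accept; new reference point
    return cleaned

# Example: a synthetic 10-minute series with one spike the filter removes.
t = pd.date_range("2005-01-01", periods=6, freq="10min")
series = pd.Series([1.00, 1.05, 2.50, 1.10, 1.12, 1.15], index=t)
print(remove_rate_outliers(series))
```

Comparing against the last accepted sample, rather than the immediately preceding raw sample, avoids wrongly flagging the first good value after a spike.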
Abstract:
The paper investigates a Bayesian hierarchical model for the analysis of categorical longitudinal data from a large social survey of immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey. One feature of the data set is that observations for some variables are missing in at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of the hierarchy, a multinomial model for the response; subsequent terms are then introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and the explanatory variables to be imputed at each iteration of the algorithm, given appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.
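The impute-within-Gibbs mechanism the authors describe can be illustrated with a deliberately simplified toy model (a binary employment indicator, one binary covariate with missing entries, and conjugate Beta(1, 1) priors). This is far simpler than the paper's multinomial hierarchical model, but it shows the same alternation between parameter draws and imputation of missing values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = employed indicator, x = binary covariate, NaN where missing.
y = np.array([1, 0, 1, 1, 0, 1, 0, 1], dtype=float)
x = np.array([1, 0, np.nan, 1, 0, np.nan, 0, 1], dtype=float)
missing = np.isnan(x)
x[missing] = rng.integers(0, 2, missing.sum())  # crude starting imputation

for _ in range(2000):
    # Sample q = P(x = 1) from its Beta posterior given the completed x.
    q = rng.beta(1 + x.sum(), 1 + (1 - x).sum())
    # Sample p[k] = P(y = 1 | x = k) from their Beta posteriors.
    p = np.array([
        rng.beta(1 + y[x == k].sum(), 1 + (1 - y[x == k]).sum())
        for k in (0, 1)
    ])
    # Impute each missing x from its full conditional given y, q, and p.
    for i in np.flatnonzero(missing):
        w1 = q * p[1] ** y[i] * (1 - p[1]) ** (1 - y[i])
        w0 = (1 - q) * p[0] ** y[i] * (1 - p[0]) ** (1 - y[i])
        x[i] = float(rng.random() < w1 / (w0 + w1))
```

Each sweep redraws the parameters from their conditional posteriors given the completed data and then redraws the missing entries, which mirrors in miniature the mechanism the paper applies at scale.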
Abstract:
This data set comprises time series of aboveground community plant biomass (Sown plant community, Weed plant community, Dead plant material, and Unidentified plant material; all measured as dry weight) and species-specific biomass from the sown species of several experiments at the field site of a large grassland biodiversity experiment (the Jena Experiment; see further details below). Aboveground community biomass was normally harvested just prior to mowing, during peak standing biomass twice a year (generally in May and August; in 2002 only once, in September), on all experimental plots in the Jena Experiment. This was done by clipping the vegetation at 3 cm above ground in up to four rectangles of 0.2 x 0.5 m per large plot. The location of these rectangles was assigned by random selection of new coordinates every year within the core area of the plots. The positions of the rectangles within plots were identical for all plots. The harvested biomass was sorted into categories: individual species for the sown plant species, weed plant species (species not sown at the particular plot), detached dead plant material (i.e., dead plant material in the data file), and remaining plant material that could not be assigned to any category (i.e., unidentified plant material in the data file). All biomass was dried to constant weight (70°C, >= 48 h) and weighed. Sown plant community biomass was calculated as the sum of the biomass of the individual sown species. The data for individual samples and the mean over samples are given for the biomass measures on the community level. Overall, analyses of the community biomass data have identified species richness as well as functional group composition as important drivers of a positive biodiversity-productivity relationship. The following series of datasets are contained in this collection: 1. Plant biomass from the Main Experiment: In the Main Experiment, 82 grassland plots of 20 x 20 m were established from a pool of 60 species belonging to four functional groups (grasses, legumes, tall herbs, and small herbs). In May 2002, varying numbers of plant species from this species pool were sown into the plots to create a gradient of plant species richness (1, 2, 4, 8, 16, and 60 species) and functional richness (1, 2, 3, and 4 functional groups). 2. Plant biomass from the Dominance Experiment: In the Dominance Experiment, 206 grassland plots of 3.5 x 3.5 m were established from a pool of 9 species that can be dominant in semi-natural grassland communities of the study region. In May 2002, varying numbers of plant species from this species pool were sown into the plots to create a gradient of plant species richness (1, 2, 3, 4, 6, and 9 species). 3. Plant biomass from the monoculture plots: In the monoculture plots the sown plant community contains only a single species per plot, a different one for each plot. Which species was sown in which plot is stated in the plot information table for monocultures (see further details below). The monoculture plots of 3.5 x 3.5 m were established, like the other experiments, in May 2002, for all of the 60 plant species of the Jena Experiment species pool, with two replicates per species. All plots were maintained by bi-annual weeding and mowing.
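The community-level calculation described above (sown community biomass as the sum of the sown species' dry weights) can be sketched with pandas; the column names and values here are hypothetical, not those of the published data files.

```python
import pandas as pd

# Hypothetical long-format harvest table; the actual column names in
# the published data files may differ.
harvest = pd.DataFrame({
    "plot":     ["B1A01", "B1A01", "B1A01", "B1A02"],
    "category": ["sown", "sown", "weed", "sown"],
    "species":  ["Festuca rubra", "Trifolium repens",
                 "Poa annua", "Festuca rubra"],
    "dry_weight_g": [12.4, 8.1, 0.9, 15.2],
})

# Sown community biomass per plot = sum of the sown species' dry weights.
sown_community = (harvest[harvest["category"] == "sown"]
                  .groupby("plot")["dry_weight_g"]
                  .sum())
print(sown_community)
```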
Abstract:
When it comes to information sets in real life, pieces of the whole set are often unavailable. This problem can arise for various reasons and therefore follows different patterns. In the literature, this problem is known as Missing Data. It can be handled in various ways: discarding incomplete observations, estimating what the missing values originally were, or simply ignoring the fact that some values are missing. The methods used to estimate missing data are called Imputation Methods. The work presented in this thesis has two main goals. The first is to determine whether any interactions exist between Missing Data, Imputation Methods, and Supervised Classification algorithms when they are applied together. For this first problem we consider a scenario in which the databases used are discrete, where discrete means that observations are assumed to be unrelated to one another. These datasets underwent processes involving different combinations of the three components mentioned. The outcome showed that the missing data pattern strongly influences the results produced by a classifier. Also, in some cases, the complex imputation techniques investigated in the thesis obtained better results than simple ones. The second goal of this work is to propose a new imputation strategy, this time constraining the specifications of the previous problem to a special kind of dataset, the multivariate time series. We designed new imputation techniques for this particular domain and combined them with some of the contrasted strategies tested in the previous chapter of this thesis. The time series were also subjected to processes involving missing data and imputation, in order to finally propose an overall better imputation method. In the final chapter of this work, a real-world example is presented, describing a water quality prediction problem. The databases that characterize this problem contain their own naturally occurring missing values, which provides a real-world benchmark to test the algorithms developed in this thesis.
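Two of the simple baseline imputation strategies such a study would contrast can be sketched in a few lines of pandas (the series and variable names are invented; the thesis's own techniques are not reproduced here).

```python
import numpy as np
import pandas as pd

# Synthetic multivariate time series with gaps, a stand-in for the
# water quality data; variable names are illustrative only.
t = pd.date_range("2020-01-01", periods=8, freq="D")
data = pd.DataFrame({
    "temperature":  [14.1, np.nan, 14.9, 15.3, np.nan, np.nan, 16.0, 16.2],
    "dissolved_o2": [9.8, 9.7, np.nan, 9.4, 9.3, np.nan, 9.0, 8.9],
}, index=t)

# Simple baseline: replace each gap with the column mean.
mean_imputed = data.fillna(data.mean())

# Time-aware baseline: linear interpolation along the time axis, which
# exploits the serial correlation that mean imputation ignores.
interp_imputed = data.interpolate(method="time")
```

The gap between these two baselines already illustrates why time series demand dedicated imputation techniques: mean imputation discards the ordering of observations entirely.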
Abstract:
Intuitively, music has both predictable and unpredictable components. In this work we assess this qualitative statement in a quantitative way using common time series models fitted to state-of-the-art music descriptors. These descriptors cover different musical facets and are extracted from a large collection of real audio recordings comprising a variety of musical genres. Our findings show that music descriptor time series exhibit a certain predictability not only for short time intervals, but also for mid-term and relatively long intervals. This holds independently of the descriptor, musical facet, and time series model we consider. Moreover, we show that our findings are not only of theoretical relevance but can also have practical impact. To this end we demonstrate that music predictability at relatively long time intervals can be exploited in a real-world application, namely the automatic identification of cover songs (i.e. different renditions or versions of the same musical piece). Importantly, this prediction strategy yields a parameter-free approach for cover song identification that is substantially faster, requires less storage, and still maintains highly competitive accuracy when compared to state-of-the-art systems.
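As a sketch of the general approach (fitting a common time series model to a descriptor sequence and scoring its predictions), the following fits an autoregressive model by least squares on synthetic data; the paper's actual descriptors, model orders, and evaluation protocol are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "descriptor" series with some serial structure.
n, p = 500, 4                       # series length, AR order
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.8 * series[t - 1] + rng.normal(scale=0.1)

# Lagged design matrix: predict x[t] from x[t-1], ..., x[t-p].
X = np.column_stack([series[p - k - 1 : n - k - 1] for k in range(p)])
y = series[p:]

coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# One-step-ahead predictions and their error, a crude predictability score.
pred = X @ coef
rmse = np.sqrt(np.mean((y - pred) ** 2))
print(f"AR({p}) one-step RMSE: {rmse:.4f}")
```

A low prediction error relative to the series' variance indicates the kind of descriptor predictability the paper quantifies; extending the prediction horizon probes the mid- and long-term regimes.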
Abstract:
Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability, and the possible processes associated with compositional data sets from many disciplines. In this paper we concentrate on geochemical data sets. First we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of subcompositional discrimination and specific perturbational change. Then we develop through standard methodology, such as generalised likelihood ratio tests, statistical tools to allow the systematic investigation of a complete lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained together with the necessary tools for a staying-in-the-simplex approach, namely compositional singular value decompositions. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major-oxide and rare-element compositions of metamorphosed limestones from the Northeast and Central Highlands of Scotland. Finally we point out a number of unresolved problems in the statistical analysis of compositional processes.
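The staying-in-the-simplex toolkit mentioned above works through log-ratio coordinates; a minimal sketch of a compositional singular value decomposition via the centred log-ratio (clr) transform follows, on toy compositions rather than the Scottish limestone data.

```python
import numpy as np

# Toy compositional data: each row is a 4-part composition summing to 1.
comps = np.array([
    [0.60, 0.20, 0.15, 0.05],
    [0.55, 0.25, 0.12, 0.08],
    [0.50, 0.30, 0.14, 0.06],
])

# Centred log-ratio transform: log of each part relative to the row's
# geometric mean, mapping the simplex into ordinary Euclidean space.
log_c = np.log(comps)
clr = log_c - log_c.mean(axis=1, keepdims=True)

# Centre the clr coordinates and take the SVD; the right singular
# vectors give the dominant directions of compositional variability.
clr_centred = clr - clr.mean(axis=0)
U, s, Vt = np.linalg.svd(clr_centred, full_matrices=False)
print("singular values:", np.round(s, 4))
```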
Abstract:
A presentation on the collection and analysis of data, taken from SOES 6018. This module aims to ensure that MSc Oceanography, MSc Marine Science, Policy & Law and MSc Marine Resource Management students are equipped with the skills they need to function as professional marine scientists, in addition to / in conjunction with the skills training in other MSc modules. The module covers training in fieldwork techniques, communication & research skills, IT & data analysis and professional development.
Abstract:
In this review paper we collect several results about copula-based models, especially concerning regression models, by focusing on some insurance applications.
Abstract:
Cluster randomized trials (CRTs) use clusters as the unit of randomization; a cluster is usually defined as a collection of individuals sharing some common characteristic. Common examples of clusters include entire dental practices, hospitals, schools, school classes, villages, and towns. Several measurements (repeated measurements) taken on the same individual at different time points are also considered to be clusters. In dentistry, CRTs are applicable as patients may be treated as clusters containing several individual teeth. CRTs require certain methodological procedures during sample size calculation, randomization, data analysis, and reporting, which are often ignored in dental research publications. In general, due to the similarity of observations within clusters, each individual within a cluster provides less information than an individual in a non-clustered trial. Therefore, clustered designs require larger sample sizes than non-clustered randomized designs, and special statistical analyses that account for the fact that observations within clusters are correlated. The purpose of this article is to highlight, with relevant examples, the important methodological characteristics of cluster randomized designs as they may be applied in orthodontics, and to explain the problems that may arise if clustered observations are erroneously treated and analysed as independent (non-clustered).
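The sample size inflation described above is conventionally quantified by the design effect, DEFF = 1 + (m - 1) * ICC, where m is the average cluster size and ICC is the intraclass correlation. A minimal sketch follows; the numbers are invented for illustration and the article's own examples are not reproduced.

```python
import math

def inflated_sample_size(n_individual: int, m: int, icc: float) -> int:
    """Observations needed by a clustered design, given the number
    required by an equivalent individually randomized design."""
    deff = 1 + (m - 1) * icc   # design effect
    return math.ceil(n_individual * deff)

# Example: 200 observations under individual randomization, clusters of
# 10 teeth per patient, intraclass correlation 0.05.
print(inflated_sample_size(200, 10, 0.05))  # -> 290
```

Even a modest within-cluster correlation of 0.05 inflates the required sample size by 45% here, which is why ignoring clustering in the analysis overstates the effective sample size and understates p-values.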