968 results for Longitudinal Data Analysis and Time Series
Abstract:
In this paper we develop an evolutionary kernel-based time update algorithm to recursively estimate subset discrete lag models (including full-order models) with a forgetting factor and a constant term, using the exact-windowed case. The algorithm applies to causality detection when the true relationship occurs with a continuous or a random delay. We then demonstrate the use of the proposed evolutionary algorithm to study monthly mutual fund data, which come from the 'CRSP Survivor-bias free US Mutual Fund Database'. The results show that the NAV is an influential player on the international stage of global bond and stock markets.
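To make the idea of recursive estimation with a forgetting factor concrete, here is a minimal sketch of ordinary recursive least squares (RLS) for a single-lag model with a constant term, y[t] = c + b * x[t - lag]. It is not the evolutionary kernel-based algorithm described in the abstract; the function name `rls_forgetting`, the parameters `lam` and `delta`, and the synthetic data are all illustrative assumptions.

```python
import math


def rls_forgetting(xs, ys, lag, lam=0.98, delta=1000.0):
    """Recursively estimate y[t] = c + b * x[t - lag] with forgetting factor lam.

    Generic RLS sketch (an assumption for illustration, not the paper's method).
    """
    theta = [0.0, 0.0]                    # [constant term, lag coefficient]
    P = [[delta, 0.0], [0.0, delta]]      # large initial matrix = weak prior
    for t in range(lag, len(ys)):
        phi = [1.0, xs[t - lag]]          # regressor: constant and lagged input
        # P is symmetric, so P @ phi also gives (phi^T P)^T.
        p_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                 P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = lam + phi[0] * p_phi[0] + phi[1] * p_phi[1]
        gain = [p_phi[0] / denom, p_phi[1] / denom]
        err = ys[t] - (theta[0] * phi[0] + theta[1] * phi[1])
        theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
        # Discount old information by dividing by lam.
        P = [[(P[i][j] - gain[i] * p_phi[j]) / lam for j in range(2)]
             for i in range(2)]
    return theta


# Synthetic, noise-free data with a true delay of 3 samples (hypothetical values).
xs = [math.sin(0.3 * t) for t in range(300)]
ys = [0.0] * 3 + [2.0 + 0.5 * xs[t - 3] for t in range(3, 300)]
estimate = rls_forgetting(xs, ys, lag=3)
print(estimate)
```

A forgetting factor below 1 down-weights older observations geometrically, which is what lets such an estimator track coefficients that drift over time.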
Abstract:
We demonstrate that the process of generating smooth transitions can be viewed as a natural result of the filtering operations implied in the generation of discrete-time series observations from the sampling of data from an underlying continuous time process that has undergone a process of structural change. In order to focus discussion, we utilize the problem of estimating the location of abrupt shifts in some simple time series models. This approach will permit us to address salient issues relating to distortions induced by the inherent aggregation associated with discrete-time sampling of continuous time processes experiencing structural change. We also address the issue of how time irreversible structures may be generated within the smooth transition processes. (c) 2005 Elsevier Inc. All rights reserved.
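The aggregation effect described above can be sketched with a toy case: a continuous-time level that jumps from `mu0` to `mu1` at time `tau`, observed only as averages over unit intervals. The box-average filter, the function name `aggregate_step`, and the numbers are assumptions chosen for illustration, not the models used in the paper.

```python
def aggregate_step(mu0, mu1, tau, n_obs):
    """Average a step function (mu0 before tau, mu1 after) over intervals [k, k+1)."""
    observations = []
    for k in range(n_obs):
        # Share of the interval [k, k+1) that lies before the break at tau.
        frac_before = min(max(tau - k, 0.0), 1.0)
        observations.append(mu0 * frac_before + mu1 * (1.0 - frac_before))
    return observations


# A break at tau = 2.25 falls inside the third sampling interval.
series = aggregate_step(0.0, 1.0, 2.25, 5)
print(series)
```

Whenever the break does not coincide with a sampling boundary, the interval that contains it takes a value strictly between the two regimes, so the abrupt shift appears in the discrete series as a gradual transition.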
Abstract:
The consensus from published studies is that plasma lipids are each influenced by genetic factors, and that this contributes to genetic variation in risk of cardiovascular disease. Heritability estimates for lipids and lipoproteins are in the range .48 to .87, when measured once per study participant. However, this ignores the confounding effects of biological variation, measurement error and ageing, and a truer assessment of genetic effects on cardiovascular risk may be obtained from analysis of longitudinal twin or family data. We have analyzed information on plasma high-density lipoprotein (HDL) and low-density lipoprotein (LDL) cholesterol, and triglycerides, from 415 adult twins who provided blood on two to five occasions over 10 to 17 years. Multivariate modeling of genetic and environmental contributions to variation within and across occasions was used to assess the extent to which genetic and environmental factors have long-term effects on plasma lipids. Results indicated that more than one genetic factor influenced HDL and LDL components of cholesterol, and triglycerides over time in all studies. Nonshared environmental factors did not have significant long-term effects except for HDL. We conclude that when heritability of lipid risk factors is estimated on only one occasion, the existence of biological variation and measurement errors leads to underestimation of the importance of genetic factors as a cause of variation in long-term risk within the population. In addition, our data suggest that different genes may affect the risk profile at different ages.
Abstract:
We present in this paper ideas to tackle the problem of analysing and forecasting nonstationary time series within the financial domain. Accepting the stochastic nature of the underlying data generator, we assume that the evolution of the generator's parameters is restricted to a deterministic manifold. Therefore we propose methods for determining the characteristics of the time-localised distribution. Starting with the assumption of a static normal distribution, we refine this hypothesis according to the empirical results obtained with the methods and conclude with the indication of a dynamic non-Gaussian behaviour with varying dependency for the time series under consideration.
Abstract:
Amongst all the objectives in the study of time series, uncovering the dynamic law of its generation is probably the most important. When the underlying dynamics are not available, time series modelling consists of developing a model which best explains a sequence of observations. In this thesis, we consider hidden space models for analysing and describing time series. We first provide an introduction to the principal concepts of hidden state models and draw an analogy between hidden Markov models and state space models. Central ideas such as hidden state inference or parameter estimation are reviewed in detail. A key part of multivariate time series analysis is identifying the delay between different variables. We present a novel approach for time delay estimation in a non-stationary environment. The technique makes use of hidden Markov models and we demonstrate its application for estimating a crucial parameter in the oil industry. We then focus on hybrid models that we call dynamical local models. These models combine and generalise hidden Markov models and state space models. Probabilistic inference is unfortunately computationally intractable and we show how to make use of variational techniques for approximating the posterior distribution over the hidden state variables. Experimental simulations on synthetic and real-world data demonstrate the application of dynamical local models for segmenting a time series into regimes and providing predictive distributions.
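As a small illustration of hidden state inference in a discrete hidden Markov model, the following sketch implements the standard forward (filtering) recursion. It covers only the basic HMM case, not the dynamical local models or the variational approximations of the thesis; the function name `forward_filter` and the two-state example parameters are assumptions.

```python
def forward_filter(init, trans, emit, observations):
    """Return P(state_t | obs_1..t) for each t in a discrete HMM.

    init[i]     : prior probability of state i at the first step
    trans[i][j] : probability of moving from state i to state j
    emit[j][o]  : probability of observing symbol o in state j
    """
    n = len(init)
    belief = list(init)
    filtered = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Prediction step: propagate the belief through the transition matrix.
            belief = [sum(belief[i] * trans[i][j] for i in range(n))
                      for j in range(n)]
        # Update step: weight by the observation likelihood, then normalise.
        belief = [belief[j] * emit[j][obs] for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
        filtered.append(belief)
    return filtered


# Hypothetical two-state model with "sticky" regimes and noisy observations.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]
filtered = forward_filter(init, trans, emit, [0, 0, 1])
print(filtered)
```

Taking the most probable state at each step gives a crude segmentation of the series into regimes, which is the kind of task the abstract describes for its richer models.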
Abstract:
This thesis addresses the problem of information hiding in low dimensional digital data, focussing on issues of privacy and security in Electronic Patient Health Records (EPHRs). The thesis proposes a new security protocol based on data hiding techniques for EPHRs. This thesis contends that embedding of sensitive patient information inside the EPHR is the most appropriate solution currently available to resolve the issues of security in EPHRs. Watermarking techniques are applied to one-dimensional time series data such as the electroencephalogram (EEG) to show that they add a level of confidence (in terms of privacy and security) in an individual’s diverse bio-profile (the digital fingerprint of an individual’s medical history), ensure belief that the data being analysed does indeed belong to the correct person, and also that it is not being accessed by unauthorised personnel. Embedding information inside single channel biomedical time series data is more difficult than the standard application for images due to the reduced redundancy. A data hiding approach which has an in-built capability to protect against illegal data snooping is developed. The capability of this secure method is enhanced by embedding not just a single message but multiple messages into an example one-dimensional EEG signal. Embedding multiple messages of similar characteristics, for example identities of clinicians accessing the medical record, helps in creating a log of access, while embedding multiple messages of dissimilar characteristics into an EPHR enhances confidence in the use of the EPHR. The novel method of embedding multiple messages of both similar and dissimilar characteristics into a single channel EEG demonstrated in this thesis shows how this embedding of data boosts the implementation and use of the EPHR securely.
Abstract:
In this paper, we discuss some practical implications for implementing adaptable network algorithms applied to non-stationary time series problems. Two real world data sets, containing electricity load demands and foreign exchange market prices, are used to test several different methods, ranging from linear models with fixed parameters, to non-linear models which adapt both parameters and model order on-line. Training with the extended Kalman filter, we demonstrate that the dynamic model-order increment procedure of the resource allocating RBF network (RAN) is highly sensitive to the parameters of the novelty criterion. We investigate the use of system noise for increasing the plasticity of the Kalman filter training algorithm, and discuss the consequences for on-line model order selection. The results of our experiments show that there are advantages to be gained in tracking real world non-stationary data through the use of more complex adaptive models.
Abstract:
An application of the heterogeneous variables system prediction method to the time series analysis problem, with respect to the sample size, is considered in this work. A logical-and-probabilistic correlation is constructed from the class of logical decision functions. Two cases are considered: when the information about the event is preserved in the process itself, and when it is preserved in a dependent process.
Abstract:
* The work is supported by RFBR, grant 04-01-00858-a
Abstract:
2000 Mathematics Subject Classification: 62M20, 62M10, 62-07.
Abstract:
This paper presents the results of our data mining study of Pb-Zn (lead-zinc) ore assay records from a mine enterprise in Bulgaria. We examined the dataset, cleaned outliers, visualized the data, and created dataset statistics. A Pb-Zn cluster data mining model was created for segmentation and prediction of Pb-Zn ore assay data. The Pb-Zn cluster data model consists of five clusters and DMX queries. We analyzed the Pb-Zn cluster content, size, structure, and characteristics. The set of DMX queries allows for browsing and managing the clusters, as well as predicting ore assay records. Testing and validation of the Pb-Zn cluster data mining model were carried out in order to show its reasonable accuracy before being used in a production environment. The Pb-Zn cluster data mining model can be used for changes of the mine grinding and floatation processing parameters in almost real-time, which is important for the efficiency of the Pb-Zn ore beneficiation process. ACM Computing Classification System (1998): H.2.8, H.3.3.
Abstract:
The exponential growth of studies on the biological response to ocean acidification over the last few decades has generated a large amount of data. To facilitate data comparison, a data compilation hosted at the data publisher PANGAEA was initiated in 2008 and is updated on a regular basis (doi:10.1594/PANGAEA.149999). By January 2015, a total of 581 data sets (over 4 000 000 data points) from 539 papers had been archived. Here we present the developments of this data compilation five years since its first description by Nisumaa et al. (2010). Most of the study sites from which data are archived are still in the Northern Hemisphere, and the number of archived data from studies in the Southern Hemisphere and polar oceans is still relatively low. Data from 60 studies that investigated the response of a mix of organisms or natural communities were all added after 2010, indicating a welcome shift from the study of individual organisms to communities and ecosystems. The initial imbalance of considerably more data archived on calcification and primary production than on other processes has improved. There is also a clear tendency towards more data archived from multifactorial studies after 2010. For easier and more effective access to ocean acidification data, the ocean acidification community is strongly encouraged to contribute to the data archiving effort, and to help develop standard vocabularies describing the variables and define best practices for archiving ocean acidification data.