927 resultados para Time-series Analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multi-output Gaussian processes provide a convenient framework for multi-task problems. An illustrative and motivating example of a multi-task problem is multi-region electrophysiological time-series data, where experimentalists are interested in both power and phase coherence between channels. Recently, the spectral mixture (SM) kernel was proposed to model the spectral density of a single task in a Gaussian process framework. This work develops a novel covariance kernel for multiple outputs, called the cross-spectral mixture (CSM) kernel. This new, flexible kernel represents both the power and phase relationship between multiple observation channels. The expressive capabilities of the CSM kernel are demonstrated through implementation of 1) a Bayesian hidden Markov model, where the emission distribution is a multi-output Gaussian process with a CSM covariance kernel, and 2) a Gaussian process factor analysis model, where factor scores represent the utilization of cross-spectral neural circuits. Results are presented for measured multi-region electrophysiological data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent papers, Wied and his coauthors have introduced change-point procedures to detect and estimate structural breaks in the correlation between time series. To prove the asymptotic distribution of the test statistic and stopping time as well as the change-point estimation rate, they use an extended functional Delta method and assume nearly constant expectations and variances of the time series. In this thesis, we allow asymptotically infinitely many structural breaks in the means and variances of the time series. For this setting, we present test statistics and stopping times which are used to determine whether or not the correlation between two time series is and stays constant, respectively. Additionally, we consider estimates for change-points in the correlations. The employed nonparametric statistics depend on the means and variances. These (nuisance) parameters are replaced by estimates in the course of this thesis. We avoid assuming a fixed form of these estimates but rather we use "blackbox" estimates, i.e. we derive results under assumptions that these estimates fulfill. These results are supplement with examples. This thesis is organized in seven sections. In Section 1, we motivate the issue and present the mathematical model. In Section 2, we consider a posteriori and sequential testing procedures, and investigate convergence rates for change-point estimation, always assuming that the means and the variances of the time series are known. In the following sections, the assumptions of known means and variances are relaxed. In Section 3, we present the assumptions for the mean and variance estimates that we will use for the mean in Section 4, for the variance in Section 5, and for both parameters in Section 6. Finally, in Section 7, a simulation study illustrates the finite sample behaviors of some testing procedures and estimates.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Min/max autocorrelation factor analysis (MAFA) and dynamic factor analysis (DFA) are complementary techniques for analysing short (> 15-25 y), non-stationary, multivariate data sets. We illustrate the two techniques using catch rate (cpue) time-series (1982-2001) for 17 species caught during trawl surveys off Mauritania, with the NAO index, an upwelling index, sea surface temperature, and an index of fishing effort as explanatory variables. Both techniques gave coherent results, the most important common trend being a decrease in cpue during the latter half of the time-series, and the next important being an increase during the first half. A DFA model with SST and UPW as explanatory variables and two common trends gave good fits to most of the cpue time-series. (c) 2004 International Council for the Exploration of the Sea. Published by Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Financial processes may possess long memory and their probability densities may display heavy tails. Many models have been developed to deal with this tail behaviour, which reflects the jumps in the sample paths. On the other hand, the presence of long memory, which contradicts the efficient market hypothesis, is still an issue for further debates. These difficulties present challenges with the problems of memory detection and modelling the co-presence of long memory and heavy tails. This PhD project aims to respond to these challenges. The first part aims to detect memory in a large number of financial time series on stock prices and exchange rates using their scaling properties. Since financial time series often exhibit stochastic trends, a common form of nonstationarity, strong trends in the data can lead to false detection of memory. We will take advantage of a technique known as multifractal detrended fluctuation analysis (MF-DFA) that can systematically eliminate trends of different orders. This method is based on the identification of scaling of the q-th-order moments and is a generalisation of the standard detrended fluctuation analysis (DFA) which uses only the second moment; that is, q = 2. We also consider the rescaled range R/S analysis and the periodogram method to detect memory in financial time series and compare their results with the MF-DFA. An interesting finding is that short memory is detected for stock prices of the American Stock Exchange (AMEX) and long memory is found present in the time series of two exchange rates, namely the French franc and the Deutsche mark. Electricity price series of the five states of Australia are also found to possess long memory. For these electricity price series, heavy tails are also pronounced in their probability densities. The second part of the thesis develops models to represent short-memory and longmemory financial processes as detected in Part I. These models take the form of continuous-time AR(∞) -type equations whose kernel is the Laplace transform of a finite Borel measure. By imposing appropriate conditions on this measure, short memory or long memory in the dynamics of the solution will result. A specific form of the models, which has a good MA(∞) -type representation, is presented for the short memory case. Parameter estimation of this type of models is performed via least squares, and the models are applied to the stock prices in the AMEX, which have been established in Part I to possess short memory. By selecting the kernel in the continuous-time AR(∞) -type equations to have the form of Riemann-Liouville fractional derivative, we obtain a fractional stochastic differential equation driven by Brownian motion. This type of equations is used to represent financial processes with long memory, whose dynamics is described by the fractional derivative in the equation. These models are estimated via quasi-likelihood, namely via a continuoustime version of the Gauss-Whittle method. The models are applied to the exchange rates and the electricity prices of Part I with the aim of confirming their possible long-range dependence established by MF-DFA. The third part of the thesis provides an application of the results established in Parts I and II to characterise and classify financial markets. We will pay attention to the New York Stock Exchange (NYSE), the American Stock Exchange (AMEX), the NASDAQ Stock Exchange (NASDAQ) and the Toronto Stock Exchange (TSX). The parameters from MF-DFA and those of the short-memory AR(∞) -type models will be employed in this classification. We propose the Fisher discriminant algorithm to find a classifier in the two and three-dimensional spaces of data sets and then provide cross-validation to verify discriminant accuracies. This classification is useful for understanding and predicting the behaviour of different processes within the same market. The fourth part of the thesis investigates the heavy-tailed behaviour of financial processes which may also possess long memory. We consider fractional stochastic differential equations driven by stable noise to model financial processes such as electricity prices. The long memory of electricity prices is represented by a fractional derivative, while the stable noise input models their non-Gaussianity via the tails of their probability density. A method using the empirical densities and MF-DFA will be provided to estimate all the parameters of the model and simulate sample paths of the equation. The method is then applied to analyse daily spot prices for five states of Australia. Comparison with the results obtained from the R/S analysis, periodogram method and MF-DFA are provided. The results from fractional SDEs agree with those from MF-DFA, which are based on multifractal scaling, while those from the periodograms, which are based on the second order, seem to underestimate the long memory dynamics of the process. This highlights the need and usefulness of fractal methods in modelling non-Gaussian financial processes with long memory.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia related complications. Stress electrocardiography/exercise testing is predictive of 10 year risk of CVD events and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to this data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). Because of the unbalanced class distribution in the data, majority class under-sampling and Kappa statistic together with misclassification rate and area under the ROC curve (AUC) are used for evaluation of models generated using different prediction algorithms. The performance based on models derived from feature reduced datasets reveal the filter method, Cfs subset evaluation, to be most consistently effective although Consistency derived subsets tended to slightly increased accuracy but markedly increased complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This could be eliminated by consideration of the AUC or Kappa statistic as well by evaluation of subsets with under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62 with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection results in reduction in MR to 9.8 to 10.16 with time segmented summary data (dataset F) MR being 9.8 and raw time-series summary data (dataset A) being 9.92. However, for all time-series only based datasets, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series alone datasets but models derived from these subsets are of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-cross validation method. For models based on Cfs selected time-series derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36 with dataset RF_A (raw time-series data and RF) being 8.85 and dataset RF_F (time segmented time-series variables and RF) being 9.09. The models based on counts of outliers and counts of data points outside normal range (Dataset RF_E) and derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (Dataset RF_ G) perform the least well with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine based method, SMO, have the highest MR of 10.1 and 10.28 while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are most comprehensible and clinically relevant. The predictive accuracy increase achieved by addition of risk factor variables to time-series variable based models is significant. The addition of time-series derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature reduced, anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables when compared to use of risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The effect of the pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input. In the absence of risk factor input, the use of time-series variables after outlier removal and time series variables based on physiological variable values’ being outside the accepted normal range is associated with some improvement in model performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract. For interactive systems, recognition, reproduction, and generalization of observed motion data are crucial for successful interaction. In this paper, we present a novel method for analysis of motion data that we refer to as K-OMM-trees. K-OMM-trees combine Ordered Means Models (OMMs) a model-based machine learning approach for time series with an hierarchical analysis technique for very large data sets, the K-tree algorithm. The proposed K-OMM-trees enable unsupervised prototype extraction of motion time series data with hierarchical data representation. After introducing the algorithmic details, we apply the proposed method to a gesture data set that includes substantial inter-class variations. Results from our studies show that K-OMM-trees are able to substantially increase the recognition performance and to learn an inherent data hierarchy with meaningful gesture abstractions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A satellite based observation system can continuously or repeatedly generate a user state vector time series that may contain useful information. One typical example is the collection of International GNSS Services (IGS) station daily and weekly combined solutions. Another example is the epoch-by-epoch kinematic position time series of a receiver derived by a GPS real time kinematic (RTK) technique. Although some multivariate analysis techniques have been adopted to assess the noise characteristics of multivariate state time series, statistic testings are limited to univariate time series. After review of frequently used hypotheses test statistics in univariate analysis of GNSS state time series, the paper presents a number of T-squared multivariate analysis statistics for use in the analysis of multivariate GNSS state time series. These T-squared test statistics have taken the correlation between coordinate components into account, which is neglected in univariate analysis. Numerical analysis was conducted with the multi-year time series of an IGS station to schematically demonstrate the results from the multivariate hypothesis testing in comparison with the univariate hypothesis testing results. The results have demonstrated that, in general, the testing for multivariate mean shifts and outliers tends to reject less data samples than the testing for univariate mean shifts and outliers under the same confidence level. It is noted that neither univariate nor multivariate data analysis methods are intended to replace physical analysis. Instead, these should be treated as complementary statistical methods for a prior or posteriori investigations. Physical analysis is necessary subsequently to refine and interpret the results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present a new simulation methodology in order to obtain exact or approximate Bayesian inference for models for low-valued count time series data that have computationally demanding likelihood functions. The algorithm fits within the framework of particle Markov chain Monte Carlo (PMCMC) methods. The particle filter requires only model simulations and, in this regard, our approach has connections with approximate Bayesian computation (ABC). However, an advantage of using the PMCMC approach in this setting is that simulated data can be matched with data observed one-at-a-time, rather than attempting to match on the full dataset simultaneously or on a low-dimensional non-sufficient summary statistic, which is common practice in ABC. For low-valued count time series data we find that it is often computationally feasible to match simulated data with observed data exactly. Our particle filter maintains $N$ particles by repeating the simulation until $N+1$ exact matches are obtained. Our algorithm creates an unbiased estimate of the likelihood, resulting in exact posterior inferences when included in an MCMC algorithm. In cases where exact matching is computationally prohibitive, a tolerance is introduced as per ABC. A novel aspect of our approach is that we introduce auxiliary variables into our particle filter so that partially observed and/or non-Markovian models can be accommodated. We demonstrate that Bayesian model choice problems can be easily handled in this framework.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Most studies examining the temperature–mortality association in a city used temperatures from one site or the average from a network of sites. This may cause measurement error as temperature varies across a city due to effects such as urban heat islands. We examined whether spatiotemporal models using spatially resolved temperatures produced different associations between temperature and mortality compared with time series models that used non-spatial temperatures. We obtained daily mortality data in 163 areas across Brisbane city, Australia from 2000 to 2004. We used ordinary kriging to interpolate spatial temperature variation across the city based on 19 monitoring sites. We used a spatiotemporal model to examine the impact of spatially resolved temperatures on mortality. Also, we used a time series model to examine non-spatial temperatures using a single site and the average temperature from three sites. We used squared Pearson scaled residuals to compare model fit. We found that kriged temperatures were consistent with observed temperatures. Spatiotemporal models using kriged temperature data yielded slightly better model fit than time series models using a single site or the average of three sites' data. Despite this better fit, spatiotemporal and time series models produced similar associations between temperature and mortality. In conclusion, time series models using non-spatial temperatures were equally good at estimating the city-wide association between temperature and mortality as spatiotemporal models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Few studies have formally examined the relationship between meteorological factors and the incidence of child pneumonia in the tropics, despite the fact that most child pneumonia deaths occur there. We examined the association between four meteorological exposures (rainy days, sunshine, relative humidity, temperature) and the incidence of clinical pneumonia in young children in the Philippines using three time-series methods: correlation of seasonal patterns, distributed lag regression, and case-crossover. Lack of sunshine was most strongly associated with pneumonia in both lagged regression [overall relative risk over the following 60 days for a 1-h increase in sunshine per day was 0·67 (95% confidence interval (CI) 0·51–0·87)] and case-crossover analysis [odds ratio for a 1-h increase in mean daily sunshine 8–14 days earlier was 0·95 (95% CI 0·91–1·00)]. This association is well known in temperate settings but has not been noted previously in the tropics. Further research to assess causality is needed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background Heat-related impacts may have greater public health implications as climate change continues. It is important to appropriately characterize the relationship between heatwave and health outcomes. However, it is unclear whether a case-crossover design can be effectively used to assess the event- or episode-related health effects. This study examined the association between exposure to heatwaves and mortality and emergency hospital admissions (EHAs) from non-external causes in Brisbane, Australia, using both case-crossover and time series analyses approaches. Methods Poisson generalised additive model (GAM) and time-stratified case-crossover analyses were used to assess the short-term impact of heatwaves on mortality and EHAs. Heatwaves exhibited a significant impact on mortality and EHAs after adjusting for air pollution, day of the week, and season. Results For time-stratified case-crossover analysis, odds ratios of mortality and EHAs during heatwaves were 1.62 (95% confidence interval (CI): 1.36–1.94) and 1.22 (95% CI: 1.14–1.30) at lag 1, respectively. Time series GAM models gave similar results. Relative risks of mortality and EHAs ranged from 1.72 (95% CI: 1.40–2.11) to 1.81 (95% CI: 1.56–2.10) and from 1.14 (95% CI: 1.06–1.23) to 1.28 (95% CI: 1.21–1.36) at lag 1, respectively. The risk estimates gradually attenuated after the lag of one day for both case-crossover and time series analyses. Conclusions The risk estimates from both case-crossover and time series models were consistent and comparable. This finding may have implications for future research on the assessment of event- or episode-related (e.g., heatwave) health effects.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present a new method for performing Bayesian parameter inference and model choice for low count time series models with intractable likelihoods. The method involves incorporating an alive particle filter within a sequential Monte Carlo (SMC) algorithm to create a novel pseudo-marginal algorithm, which we refer to as alive SMC^2. The advantages of this approach over competing approaches is that it is naturally adaptive, it does not involve between-model proposals required in reversible jump Markov chain Monte Carlo and does not rely on potentially rough approximations. The algorithm is demonstrated on Markov process and integer autoregressive moving average models applied to real biological datasets of hospital-acquired pathogen incidence, animal health time series and the cumulative number of poison disease cases in mule deer.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The multifractal properties of daily rainfall time series at the stations in Pearl River basin of China over periods of up to 45 years are examined using the universal multifractal approach based on the multiplicative cascade model and the multifractal detrended fluctuation analysis (MF-DFA). The results from these two kinds of multifractal analyses show that the daily rainfall time series in this basin have multifractal behavior in two different time scale ranges. It is found that the empirical multifractal moment function K(q)K(q) of the daily rainfall time series can be fitted very well by the universal multifractal model (UMM). The estimated values of the conservation parameter HH from UMM for these daily rainfall data are close to zero indicating that they correspond to conserved fields. After removing the seasonal trend in the rainfall data, the estimated values of the exponent h(2)h(2) from MF-DFA indicate that the daily rainfall time series in Pearl River basin exhibit no long-term correlations. It is also found that K(2)K(2) and elevation series are negatively correlated. It shows a relationship between topography and rainfall variability.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cereal grain is one of the main export commodities of Australian agriculture. Over the past decade, crop yield forecasts for wheat and sorghum have shown appreciable utility for industry planning at shire, state, and national scales. There is now an increasing drive from industry for more accurate and cost-effective crop production forecasts. In order to generate production estimates, accurate crop area estimates are needed by the end of the cropping season. Multivariate methods for analysing remotely sensed Enhanced Vegetation Index (EVI) from 16-day Moderate Resolution Imaging Spectroradiometer (MODIS) satellite imagery within the cropping period (i.e. April-November) were investigated to estimate crop area for wheat, barley, chickpea, and total winter cropped area for a case study region in NE Australia. Each pixel classification method was trained on ground truth data collected from the study region. Three approaches to pixel classification were examined: (i) cluster analysis of trajectories of EVI values from consecutive multi-date imagery during the crop growth period; (ii) harmonic analysis of the time series (HANTS) of the EVI values; and (iii) principal component analysis (PCA) of the time series of EVI values. Images classified using these three approaches were compared with each other, and with a classification based on the single MODIS image taken at peak EVI. Imagery for the 2003 and 2004 seasons was used to assess the ability of the methods to determine wheat, barley, chickpea, and total cropped area estimates. The accuracy at pixel scale was determined by the percent correct classification metric by contrasting all pixel scale samples with independent pixel observations. At a shire level, aggregated total crop area estimates were compared with surveyed estimates. All multi-temporal methods showed significant overall capability to estimate total winter crop area. There was high accuracy at pixel scale (>98% correct classification) for identifying overall winter cropping. However, discrimination among crops was less accurate. Although the use of single-date EVI data produced high accuracy for estimates of wheat area at shire scale, the result contradicted the poor pixel-scale accuracy associated with this approach, due to fortuitous compensating errors. Further studies are needed to extrapolate the multi-temporal approaches to other geographical areas and to improve the lead time for deriving cropped-area estimates before harvest.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Accurate and stable time series of geodetic parameters can be used to help in understanding the dynamic Earth and its response to global change. The Global Positioning System, GPS, has proven to be invaluable in modern geodynamic studies. In Fennoscandia the first GPS networks were set up in 1993. These networks form the basis of the national reference frames in the area, but they also provide long and important time series for crustal deformation studies. These time series can be used, for example, to better constrain the ice history of the last ice age and the Earth s structure, via existing glacial isostatic adjustment models. To improve the accuracy and stability of the GPS time series, the possible nuisance parameters and error sources need to be minimized. We have analysed GPS time series to study two phenomena. First, we study the refraction in the neutral atmosphere of the GPS signal, and, second, we study the surface loading of the crust by environmental factors, namely the non-tidal Baltic Sea, atmospheric load and varying continental water reservoirs. We studied the atmospheric effects on the GPS time series by comparing the standard method to slant delays derived from a regional numerical weather model. We have presented a method for correcting the atmospheric delays at the observational level. The results show that both standard atmosphere modelling and the atmospheric delays derived from a numerical weather model by ray-tracing provide a stable solution. The advantage of the latter is that the number of unknowns used in the computation decreases and thus, the computation may become faster and more robust. The computation can also be done with any processing software that allows the atmospheric correction to be turned off. The crustal deformation due to loading was computed by convolving Green s functions with surface load data, that is to say, global hydrology models, global numerical weather models and a local model for the Baltic Sea. The result was that the loading factors can be seen in the GPS coordinate time series. Reducing the computed deformation from the vertical time series of GPS coordinates reduces the scatter of the time series; however, the long term trends are not influenced. We show that global hydrology models and the local sea surface can explain up to 30% of the GPS time series variation. On the other hand atmospheric loading admittance in the GPS time series is low, and different hydrological surface load models could not be validated in the present study. In order to be used for GPS corrections in the future, both atmospheric loading and hydrological models need further analysis and improvements.