952 resultados para financial time series prediction
Resumo:
In this paper, we present approximate distributions for the ratio of the cumulative wavelet periodograms considering stationary and non-stationary time series generated from independent Gaussian processes. We also adapt an existing procedure to use this statistic and its approximate distribution in order to test if two regularly or irregularly spaced time series are realizations of the same generating process. Simulation studies show good size and power properties for the test statistic. An application with financial microdata illustrates the test usefulness. We conclude advocating the use of these approximate distributions instead of the ones obtained through randomizations, mainly in the case of irregular time series. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
This work proposes a system for classification of industrial steel pieces by means of magnetic nondestructive device. The proposed classification system presents two main stages, online system stage and off-line system stage. In online stage, the system classifies inputs and saves misclassification information in order to perform posterior analyses. In the off-line optimization stage, the topology of a Probabilistic Neural Network is optimized by a Feature Selection algorithm combined with the Probabilistic Neural Network to increase the classification rate. The proposed Feature Selection algorithm searches for the signal spectrogram by combining three basic elements: a Sequential Forward Selection algorithm, a Feature Cluster Grow algorithm with classification rate gradient analysis and a Sequential Backward Selection. Also, a trash-data recycling algorithm is proposed to obtain the optimal feedback samples selected from the misclassified ones.
Resumo:
Visualization and exploratory analysis is an important part of any data analysis and is made more challenging when the data are voluminous and high-dimensional. One such example is environmental monitoring data, which are often collected over time and at multiple locations, resulting in a geographically indexed multivariate time series. Financial data, although not necessarily containing a geographic component, present another source of high-volume multivariate time series data. We present the mvtsplot function which provides a method for visualizing multivariate time series data. We outline the basic design concepts and provide some examples of its usage by applying it to a database of ambient air pollution measurements in the United States and to a hypothetical portfolio of stocks.
Resumo:
The rank-based nonlinear predictability score was recently introduced as a test for determinism in point processes. We here adapt this measure to time series sampled from time-continuous flows. We use noisy Lorenz signals to compare this approach against a classical amplitude-based nonlinear prediction error. Both measures show an almost identical robustness against Gaussian white noise. In contrast, when the amplitude distribution of the noise has a narrower central peak and heavier tails than the normal distribution, the rank-based nonlinear predictability score outperforms the amplitude-based nonlinear prediction error. For this type of noise, the nonlinear predictability score has a higher sensitivity for deterministic structure in noisy signals. It also yields a higher statistical power in a surrogate test of the null hypothesis of linear stochastic correlated signals. We show the high relevance of this improved performance in an application to electroencephalographic (EEG) recordings from epilepsy patients. Here the nonlinear predictability score again appears of higher sensitivity to nonrandomness. Importantly, it yields an improved contrast between signals recorded from brain areas where the first ictal EEG signal changes were detected (focal EEG signals) versus signals recorded from brain areas that were not involved at seizure onset (nonfocal EEG signals).
Resumo:
Seizure freedom in patients suffering from pharmacoresistant epilepsies is still not achieved in 20–30% of all cases. Hence, current therapies need to be improved, based on a more complete understanding of ictogenesis. In this respect, the analysis of functional networks derived from intracranial electroencephalographic (iEEG) data has recently become a standard tool. Functional networks however are purely descriptive models and thus are conceptually unable to predict fundamental features of iEEG time-series, e.g., in the context of therapeutical brain stimulation. In this paper we present some first steps towards overcoming the limitations of functional network analysis, by showing that its results are implied by a simple predictive model of time-sliced iEEG time-series. More specifically, we learn distinct graphical models (so called Chow–Liu (CL) trees) as models for the spatial dependencies between iEEG signals. Bayesian inference is then applied to the CL trees, allowing for an analytic derivation/prediction of functional networks, based on thresholding of the absolute value Pearson correlation coefficient (CC) matrix. Using various measures, the thus obtained networks are then compared to those which were derived in the classical way from the empirical CC-matrix. In the high threshold limit we find (a) an excellent agreement between the two networks and (b) key features of periictal networks as they have previously been reported in the literature. Apart from functional networks, both matrices are also compared element-wise, showing that the CL approach leads to a sparse representation, by setting small correlations to values close to zero while preserving the larger ones. Overall, this paper shows the validity of CL-trees as simple, spatially predictive models for periictal iEEG data. Moreover, we suggest straightforward generalizations of the CL-approach for modeling also the temporal features of iEEG signals.
Resumo:
A discussion of nonlinear dynamics, demonstrated by the familiar automobile, is followed by the development of a systematic method of analysis of a possibly nonlinear time series using difference equations in the general state-space format. This format allows recursive state-dependent parameter estimation after each observation thereby revealing the dynamics inherent in the system in combination with random external perturbations.^ The one-step ahead prediction errors at each time period, transformed to have constant variance, and the estimated parametric sequences provide the information to (1) formally test whether time series observations y(,t) are some linear function of random errors (ELEM)(,s), for some t and s, or whether the series would more appropriately be described by a nonlinear model such as bilinear, exponential, threshold, etc., (2) formally test whether a statistically significant change has occurred in structure/level either historically or as it occurs, (3) forecast nonlinear system with a new and innovative (but very old numerical) technique utilizing rational functions to extrapolate individual parameters as smooth functions of time which are then combined to obtain the forecast of y and (4) suggest a measure of resilience, i.e. how much perturbation a structure/level can tolerate, whether internal or external to the system, and remain statistically unchanged. Although similar to one-step control, this provides a less rigid way to think about changes affecting social systems.^ Applications consisting of the analysis of some familiar and some simulated series demonstrate the procedure. Empirical results suggest that this state-space or modified augmented Kalman filter may provide interesting ways to identify particular kinds of nonlinearities as they occur in structural change via the state trajectory.^ A computational flow-chart detailing computations and software input and output is provided in the body of the text. IBM Advanced BASIC program listings to accomplish most of the analysis are provided in the appendix. ^
Resumo:
The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU" lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrests, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model, and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is in defining the duration and resolution of time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data is represented by the standard one value per variable paradigm and is widely employed in a host of clinical models and tools. These are often represented by a number present in a given cell of a table. Clinical latent features derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The second two classes are unique to the time series data elements. The first of these is the raw data elements. These are represented by multiple values per variable, and constitute the measured observations that are typically available to end users when they review time series data. These are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood that a representation of the time series data elements is produced that is able to distinguish between two or more classes of outcomes. The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU" provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time series based models are unfeasible due to the relatively large number of data elements and the complexity of preprocessing that must occur before data can be presented to the model. Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies of each of the steps, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) are issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances. The final manuscript, entitled: "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit" presents the results that were obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the Receiver Operating Characteristic curve increased from a baseline of 87% to 98% by including the trend analysis. In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy as compared to the baseline multivariate model, but diminished classification accuracy as compared to when just the trend analysis features were added (ie, without adding the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve the performance beyond that which was achieved by exclusion of the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.
Resumo:
Vector error-correction models (VECMs) have become increasingly important in their application to financial markets. Standard full-order VECM models assume non-zero entries in all their coefficient matrices. However, applications of VECM models to financial market data have revealed that zero entries are often a necessary part of efficient modelling. In such cases, the use of full-order VECM models may lead to incorrect inferences. Specifically, if indirect causality or Granger non-causality exists among the variables, the use of over-parameterised full-order VECM models may weaken the power of statistical inference. In this paper, it is argued that the zero–non-zero (ZNZ) patterned VECM is a more straightforward and effective means of testing for both indirect causality and Granger non-causality. For a ZNZ patterned VECM framework for time series of integrated order two, we provide a new algorithm to select cointegrating and loading vectors that can contain zero entries. Two case studies are used to demonstrate the usefulness of the algorithm in tests of purchasing power parity and a three-variable system involving the stock market.
Resumo:
In this paper we develop an evolutionary kernel-based time update algorithm to recursively estimate subset discrete lag models (including fullorder models) with a forgetting factor and a constant term, using the exactwindowed case. The algorithm applies to causality detection when the true relationship occurs with a continuous or a random delay. We then demonstrate the use of the proposed evolutionary algorithm to study the monthly mutual fund data, which come from the 'CRSP Survivor-bias free US Mutual Fund Database'. The results show that the NAV is an influential player on the international stage of global bond and stock markets.
Resumo:
An application of the heterogeneous variables system prediction method to solving the time series analysis problem with respect to the sample size is considered in this work. It is created a logical-and-probabilistic correlation from the logical decision function class. Two ways is considered. When the information about event is kept safe in the process, and when it is kept safe in depending process.
Resumo:
We examine how the most prevalent stochastic properties of key financial time series have been affected during the recent financial crises. In particular we focus on changes associated with the remarkable economic events of the last two decades in the volatility dynamics, including the underlying volatility persistence and volatility spillover structure. Using daily data from several key stock market indices, the results of our bivariate GARCH models show the existence of time varying correlations as well as time varying shock and volatility spillovers between the returns of FTSE and DAX, and those of NIKKEI and Hang Seng, which became more prominent during the recent financial crisis. Our theoretical considerations on the time varying model which provides the platform upon which we integrate our multifaceted empirical approaches are also of independent interest. In particular, we provide the general solution for time varying asymmetric GARCH specifications, which is a long standing research topic. This enables us to characterize these models by deriving, first, their multistep ahead predictors, second, the first two time varying unconditional moments, and third, their covariance structure.
Resumo:
Limited literature regarding parameter estimation of dynamic systems has been identified as the central-most reason for not having parametric bounds in chaotic time series. However, literature suggests that a chaotic system displays a sensitive dependence on initial conditions, and our study reveals that the behavior of chaotic system: is also sensitive to changes in parameter values. Therefore, parameter estimation technique could make it possible to establish parametric bounds on a nonlinear dynamic system underlying a given time series, which in turn can improve predictability. By extracting the relationship between parametric bounds and predictability, we implemented chaos-based models for improving prediction in time series. ^ This study describes work done to establish bounds on a set of unknown parameters. Our research results reveal that by establishing parametric bounds, it is possible to improve the predictability of any time series, although the dynamics or the mathematical model of that series is not known apriori. In our attempt to improve the predictability of various time series, we have established the bounds for a set of unknown parameters. These are: (i) the embedding dimension to unfold a set of observation in the phase space, (ii) the time delay to use for a series, (iii) the number of neighborhood points to use for avoiding detection of false neighborhood and, (iv) the local polynomial to build numerical interpolation functions from one region to another. Using these bounds, we are able to get better predictability in chaotic time series than previously reported. In addition, the developments of this dissertation can establish a theoretical framework to investigate predictability in time series from the system-dynamics point of view. ^ In closing, our procedure significantly reduces the computer resource usage, as the search method is refined and efficient. Finally, the uniqueness of our method lies in its ability to extract chaotic dynamics inherent in non-linear time series by observing its values. ^
Resumo:
Due to the variability and stochastic nature of wind power system, accurate wind power forecasting has an important role in developing reliable and economic power system operation and control strategies. As wind variability is stochastic, Gaussian Process regression has recently been introduced to capture the randomness of wind energy. However, the disadvantages of Gaussian Process regression include its computation complexity and incapability to adapt to time varying time-series systems. A variant Gaussian Process for time series forecasting is introduced in this study to address these issues. This new method is shown to be capable of reducing computational complexity and increasing prediction accuracy. It is further proved that the forecasting result converges as the number of available data approaches innite. Further, a teaching learning based optimization (TLBO) method is used to train the model and to accelerate
the learning rate. The proposed modelling and optimization method is applied to forecast both the wind power generation of Ireland and that from a single wind farm to show the eectiveness of the proposed method.