967 results for Time series modeling
Abstract:
The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU," lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrests, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is defining the duration and resolution of time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data is represented by the standard one-value-per-variable paradigm and is widely employed in a host of clinical models and tools; it is often represented by a number in a given cell of a table. Clinical latent features are derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The remaining two classes are unique to time series data elements. The first of these is the raw data elements. These are represented by multiple values per variable and constitute the measured observations that are typically available to end users when they review time series data; they are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood of producing a representation of the time series data elements that is able to distinguish between two or more classes of outcomes.
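A minimal sketch of this core idea, turning a time series window into a single latent candidate feature via an explicit mathematical operation (here a least-squares trend slope). The variable names, the 10-minute window, and the sampling resolution are illustrative assumptions, not the authors' specification:

```python
# Sketch: a time series analysis result (trend slope) used as a latent feature.
import numpy as np

def trend_feature(times, values):
    """Least-squares slope of a monitored variable over a window.

    times:  observation times in minutes relative to the window start
    values: observations (e.g., systolic blood pressure) within the window
    """
    slope, _intercept = np.polyfit(times, values, deg=1)
    return slope

# Hypothetical example: a 10-minute blood pressure window, 1-minute resolution.
times = np.arange(10, dtype=float)
sbp = np.array([110, 109, 108, 108, 106, 105, 103, 102, 100, 98], dtype=float)

# The scalar result (mmHg per minute) joins the other candidate features.
print(trend_feature(times, sbp))  # negative slope: a deteriorating trend
```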
The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU," provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time-series-based models are infeasible due to the relatively large number of data elements and the complexity of the preprocessing that must occur before data can be presented to the model. The process comprises seventeen steps, and each is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies for each of the steps, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) are issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances. The final manuscript, entitled "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit," presents the results obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the receiver operating characteristic curve increased from a baseline of 87% to 98% when the trend analysis was included. In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy compared to the baseline multivariate model, but diminished classification accuracy compared to adding the trend analysis features alone (i.e., without the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve performance beyond that achieved by excluding the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.
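A hedged sketch of two of the preprocessing issues named above: aligning observations to a reference time, and imputing and reducing an irregular series to a predefined duration and resolution. The column names, the mean-per-bin reduction, and the forward-fill imputation are assumptions for illustration; the paper's chosen strategies may differ:

```python
# Sketch: reference-time alignment plus imputation/reduction to a fixed grid.
import pandas as pd

def align_and_reduce(obs: pd.DataFrame, reference_time: pd.Timestamp,
                     duration_min: int = 60, resolution_min: int = 5) -> pd.Series:
    """Take one variable's observations, keep the window ending at the
    reference time (e.g., the arrest or a matched control time), then
    reduce to fixed bins and impute across gaps."""
    window = obs[(obs["time"] > reference_time - pd.Timedelta(minutes=duration_min))
                 & (obs["time"] <= reference_time)]
    series = window.set_index("time")["value"].sort_index()
    grid = series.resample(f"{resolution_min}min").mean()  # reduce: mean per bin
    return grid.ffill().bfill()                            # impute: fill gaps

# Hypothetical usage: one variable family for one patient episode.
obs = pd.DataFrame({"time": pd.to_datetime(["2024-01-01 09:03",
                                            "2024-01-01 09:17",
                                            "2024-01-01 09:41"]),
                    "value": [92.0, 97.0, 110.0]})
print(align_and_reduce(obs, pd.Timestamp("2024-01-01 10:00")))
```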
Abstract:
A good understanding of land surface processes is considered a key subject in environmental sciences. The spatio-temporal coverage of remote sensing data provides continuous observations with a high temporal frequency, allowing the assessment of ecosystem evolution at different temporal and spatial scales. Although the value of remote sensing time series has been firmly established, only a few methods have been developed for analyzing these data in a quantitative and continuous manner. In the present dissertation, a working framework to exploit remote sensing time series is proposed, based on the combination of time series analysis and a phenometric approach. The main goal is to demonstrate the use of remote sensing time series to analyze environmental variable dynamics quantitatively. The specific objectives are (1) to assess environmental variables based on remote sensing time series and (2) to develop empirical models to forecast them. These objectives are realized in four applications whose specific aims are: (1) assessing and mapping cotton crop phenological stages using spectral and phenometric analyses; (2) assessing and modeling fire seasonality in two different ecoregions using dynamic models; (3) forecasting forest fire risk on a pixel basis using dynamic models; and (4) assessing vegetation functioning based on temporal autocorrelation and phenometric analysis. The results of this dissertation show the usefulness of function-fitting procedures to model the spectral indices AS1 and AS2. Phenometrics derived from the function-fitting procedure make it possible to identify cotton crop phenological stages. Spectral analysis has demonstrated quantitatively the presence of one cycle in AS2 and two in AS1, as well as the unimodal and bimodal behaviour of fire seasonality in the Mediterranean and temperate ecoregions, respectively. Autoregressive models have been used to characterize the dynamics of fire seasonality in the two ecoregions and to forecast fire risk accurately on a pixel basis. The usefulness of temporal autocorrelation to define and characterize land surface functioning has been demonstrated. Finally, the "Optical Functional Type" concept has been proposed, in which pixels should be considered as temporal units and analyzed in terms of their temporal dynamics or functioning.
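A minimal sketch of the function-fitting step behind such phenometrics: a double-logistic seasonal curve is fitted to one pixel's annual index series, and its transition parameters are read off as phenological markers. The synthetic series, the parameter names, and the curve form are assumptions; the dissertation's AS1/AS2 indices are only stand-ins here:

```python
# Sketch: deriving phenometrics by fitting a double-logistic seasonal profile.
import numpy as np
from scipy.optimize import curve_fit

def double_logistic(t, base, amp, sos, rate_up, eos, rate_down):
    """Baseline plus a green-up transition (around sos) and a senescence
    transition (around eos)."""
    return base + amp * (1.0 / (1.0 + np.exp(-rate_up * (t - sos)))
                         - 1.0 / (1.0 + np.exp(-rate_down * (t - eos))))

doy = np.arange(1, 366, 8, dtype=float)  # 8-day composites over one year
true = double_logistic(doy, 0.15, 0.55, 120.0, 0.08, 270.0, 0.06)
series = true + np.random.default_rng(0).normal(0, 0.02, doy.size)

p0 = [0.1, 0.5, 100.0, 0.05, 250.0, 0.05]  # rough initial guess
params, _cov = curve_fit(double_logistic, doy, series, p0=p0, maxfev=10000)
base, amp, sos, rate_up, eos, rate_down = params
print(f"start of season ~ day {sos:.0f}, end of season ~ day {eos:.0f}")
```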
Abstract:
Vector error-correction models (VECMs) have become increasingly important in their application to financial markets. Standard full-order VECMs assume non-zero entries in all their coefficient matrices. However, applications of VECMs to financial market data have revealed that zero entries are often a necessary part of efficient modelling. In such cases, the use of full-order VECMs may lead to incorrect inferences. Specifically, if indirect causality or Granger non-causality exists among the variables, the use of over-parameterised full-order VECMs may weaken the power of statistical inference. In this paper, it is argued that the zero–non-zero (ZNZ) patterned VECM is a more straightforward and effective means of testing for both indirect causality and Granger non-causality. For a ZNZ patterned VECM framework for time series integrated of order two, we provide a new algorithm to select cointegrating and loading vectors that can contain zero entries. Two case studies are used to demonstrate the usefulness of the algorithm in tests of purchasing power parity and in a three-variable system involving the stock market.
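For orientation, a hedged sketch of the full-order baseline the paper argues against: statsmodels fits all VECM coefficients as non-zero, so ZNZ restrictions would have to be imposed on top. The paper's ZNZ selection algorithm itself is not part of any standard library, and the cointegrated data below are synthetic assumptions:

```python
# Sketch: a full-order VECM fit; alpha/beta are where ZNZ patterns would apply.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import VECM

rng = np.random.default_rng(1)
n = 400
common = np.cumsum(rng.normal(size=n))             # shared stochastic trend
y = np.column_stack([common + rng.normal(size=n),  # two cointegrated series
                     0.5 * common + rng.normal(size=n)])

res = VECM(y, k_ar_diff=1, coint_rank=1).fit()
print("loading vectors alpha:\n", res.alpha)       # candidate zero entries here...
print("cointegrating vectors beta:\n", res.beta)   # ...and here
```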
Abstract:
The last three decades have seen quite dramatic changes in the way we model time-dependent data. Linear processes have taken center stage in the modeling of time series, and as far as second-order properties are concerned, the theory and methodology are very adequate. However, there is more and more evidence that linear models are not sufficiently flexible or rich for modeling purposes, and that failure to account for non-linearities can be very misleading and have undesired consequences.
Abstract:
Dissertation submitted in fulfillment of the requirements for the Degree of Master in Biomedical Engineering
Abstract:
Identification of the order of an autoregressive moving average (ARMA) model by the usual graphical method is subjective. Hence, there is a need to develop a technique that identifies the order without employing graphical investigation of the series autocorrelations. To avoid subjectivity, this thesis focuses on determining the order of the ARMA model using Reversible Jump Markov Chain Monte Carlo (RJMCMC). The RJMCMC selects the model from a set of candidate models according to goodness of fit, the standard deviation of the errors, and the acceptance frequency. Together with a deep analysis of the classical Box-Jenkins modeling methodology, the integration with MCMC algorithms is examined through parameter estimation and model fitting of ARMA models. This helps to verify how well the MCMC algorithms can treat ARMA models, by comparing the results with the graphical method. The MCMC approach was seen to produce better results than the classical time series approach.
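The thesis's RJMCMC sampler is beyond a short sketch, but the same goal, choosing the ARMA order without reading autocorrelation plots, can be illustrated with an information-criterion scan. This is a common automatic baseline, not the method developed in the thesis, and the simulated series is an assumption:

```python
# Sketch: objective ARMA order selection via an AIC scan (not RJMCMC).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

rng = np.random.default_rng(7)
# Simulate an ARMA(2,1) process; lag polynomials include the zero-lag term.
proc = ArmaProcess(ar=[1, -0.6, 0.2], ma=[1, 0.4])
y = proc.generate_sample(nsample=500, distrvs=rng.standard_normal)

# Fit every (p, q) in a small grid and keep the order minimizing AIC.
best = min(((p, q) for p in range(4) for q in range(4)),
           key=lambda pq: ARIMA(y, order=(pq[0], 0, pq[1])).fit().aic)
print("order selected by AIC:", best)
```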
Abstract:
In the power market, electricity prices play an important role at the economic level. The behavior of a price series may change over time in terms of its mean value or its volatility (usually known as a structural break), or it may change for a period of time before reverting to its original behavior or switching to another style of behavior; the latter is typically termed a regime shift or regime switch. Our task in this thesis is to develop an electricity price time series model that captures fat-tailed distributions, can explain this behavior, and allows it to be analyzed for better understanding. For the NordPool data used, the obtained Markov regime-switching model operates on two regimes: regular and non-regular. Three criteria have been considered: a price difference criterion, a capacity/flow difference criterion, and a spikes-in-Finland criterion. The suitability of GARCH modeling for simulating multi-regime behavior is also studied.
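A minimal sketch of a two-regime Markov-switching model of the kind described: a low-volatility "regular" regime and a spiky "non-regular" regime, distinguished by switching mean and variance. The simulated price series is an assumption standing in for the real data:

```python
# Sketch: two-regime Markov-switching model with switching variance.
import numpy as np
import pandas as pd
from statsmodels.tsa.regime_switching.markov_regression import MarkovRegression

rng = np.random.default_rng(3)
calm  = rng.normal(30.0, 2.0, size=300)   # regular regime
spiky = rng.normal(55.0, 12.0, size=60)   # non-regular regime, wide dispersion
prices = pd.Series(np.concatenate([calm[:200], spiky, calm[200:]]))

model = MarkovRegression(prices, k_regimes=2, switching_variance=True)
res = model.fit()
# Probability of being in the high-variance (non-regular) regime at each hour.
print(res.smoothed_marginal_probabilities[1].head())
```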
Abstract:
Due to its non-storability, electricity must be produced at the same time that it is consumed; as a result, prices are determined on an hourly basis, which makes analysis more challenging. Moreover, seasonal fluctuations in demand and supply lead to seasonal behavior in electricity spot prices. The purpose of this thesis is to identify and remove all causal effects from electricity spot prices, leaving pure prices for modeling purposes. To achieve this, we use Qlucore Omics Explorer (QOE) for visualization and exploration of the data set, and the time series decomposition method to estimate and extract the deterministic components from the series. To obtain the target series, we use regression on the background variables (water reservoir and temperature). The result is three price series (for Sweden, Norway, and system prices) with no apparent pattern.
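A hedged sketch of that deseasonalize-then-regress pipeline: a classical decomposition removes a deterministic seasonal component, and a regression on background variables strips their effect, leaving a residual "pure" price series. All data below are synthetic assumptions, and QOE (a visualization tool) is not reproduced:

```python
# Sketch: remove seasonality, then regress out background variables.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(5)
n = 365
idx = pd.date_range("2010-01-01", periods=n, freq="D")
temperature = 10 + 12 * np.sin(2 * np.pi * np.arange(n) / 365) + rng.normal(0, 1, n)
reservoir = 70 + 15 * np.cos(2 * np.pi * np.arange(n) / 365) + rng.normal(0, 2, n)
price = 40 - 0.8 * temperature - 0.2 * reservoir + rng.normal(0, 2, n)
spot = pd.Series(price, index=idx)

decomp = seasonal_decompose(spot, model="additive", period=7)  # weekly cycle
deseason = spot - decomp.seasonal

# Regress the deseasonalized prices on the background variables.
X = np.column_stack([np.ones(n), temperature, reservoir])
coef, *_ = np.linalg.lstsq(X, deseason.to_numpy(), rcond=None)
pure = deseason - X @ coef                                     # residual series
print(pure.describe())
```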
Abstract:
This work aims to combine the postulates of chaos theory with the classification and predictive capabilities of artificial neural networks in the field of financial time series prediction. Chaos theory provides valuable qualitative and quantitative tools for deciding on the predictability of a chaotic system. Quantitative measurements based on chaos theory are used to decide a priori whether a time series, or a portion of a time series, is predictable, while qualitative tools based on chaos theory are used to provide further observations and analysis of predictability in cases where the measurements give negative answers. Phase space reconstruction is achieved by time-delay embedding, resulting in multiple embedded vectors. The suggested cognitive approach is inspired by the capability of some chartists to predict the direction of an index by looking at the price time series. Thus, in this work, the calculation of the embedding dimension and the separation in Takens' embedding theorem for phase space reconstruction is not limited to False Nearest Neighbor, Differential Entropy, or any other specific method; rather, this work is interested in all embedding dimensions and separations, regarded as different ways of looking at a time series by different chartists, based on their expectations. Prior to prediction, the embedded vectors of the phase space are classified with Fuzzy ART; then, for each class, a backpropagation neural network is trained to predict the last element of each vector, with all previous elements of the vector used as features.
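A minimal sketch of the time-delay embedding used for phase space reconstruction: each embedded vector stacks m samples separated by a lag tau. The work deliberately sweeps many (m, tau) pairs rather than fixing one; the example values here are arbitrary assumptions:

```python
# Sketch: time-delay embedding of a scalar series into m-dimensional vectors.
import numpy as np

def delay_embed(x, m, tau):
    """Return the (len(x) - (m-1)*tau) x m matrix of embedded vectors."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(m)])

x = np.sin(0.3 * np.arange(200)) + 0.05 * np.random.default_rng(2).normal(size=200)
vectors = delay_embed(x, m=4, tau=5)
print(vectors.shape)  # one row per embedded vector; the last column is the
                      # element a network would be trained to predict
```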
Abstract:
This thesis consists of four manuscripts in the area of nonlinear time series econometrics, on topics of testing, modeling, and forecasting nonlinear common features. The aim of this thesis is to develop new econometric contributions for hypothesis testing and forecasting in these areas. Both stationary and nonstationary time series are considered. A definition of common features is proposed in a way appropriate to each class. Based on the definition, a vector nonlinear time series model with common features is set up for testing for common features. Once well specified, the proposed models are also available for forecasting. The first paper addresses a testing procedure for nonstationary time series. A class of nonlinear cointegration, smooth-transition (ST) cointegration, is examined. ST cointegration nests the previously developed linear and threshold cointegration. An F-type test for examining ST cointegration is derived when stationary transition variables are imposed rather than nonstationary ones. The latter make the test standard, while the former make it nonstandard. This has important implications for empirical work: it is crucial to distinguish between the cases with stationary and nonstationary transition variables so that the correct test can be used. The second and fourth papers develop testing approaches for stationary time series. In particular, the vector ST autoregressive (VSTAR) model is extended to allow for common nonlinear features (CNFs). These two papers propose a modeling procedure and derive tests for the presence of CNFs. Building on model specification using the testing contributions above, the third paper considers forecasting with vector nonlinear time series models and extends the procedures available for univariate nonlinear models. The VSTAR model with CNFs and the ST cointegration model from the previous papers are exemplified in detail, and thereafter illustrated with two corresponding macroeconomic data sets.
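A small sketch of the smooth-transition mechanism underlying ST cointegration and VSTAR models: a logistic function G moves the system continuously between two regimes as the transition variable s crosses the location c. The parameter values and the scalar two-regime form are illustrative assumptions, much simpler than the vector models in the thesis:

```python
# Sketch: the logistic transition function of a smooth-transition (STAR) model.
import numpy as np

def logistic_transition(s, gamma, c):
    """G(s; gamma, c) in [0, 1]; gamma sets the speed of transition at c."""
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

def star_step(y_lag, s, phi1, phi2, gamma, c):
    """One step of a scalar two-regime STAR model:
    y_t = (1 - G) * phi1 * y_{t-1} + G * phi2 * y_{t-1}."""
    G = logistic_transition(s, gamma, c)
    return (1 - G) * phi1 * y_lag + G * phi2 * y_lag

# As gamma grows, the smooth transition approaches a sharp threshold model.
s = np.linspace(-2, 2, 5)
print(logistic_transition(s, gamma=10.0, c=0.0))
```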
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
This work provides a forward step in the study and comprehension of the relationships between stochastic processes and a certain class of integro-partial differential equations, which can be used to model anomalous diffusion and transport in statistical physics. In the first part, we take the reader through the fundamental notions of probability and stochastic processes, as well as stochastic integration and stochastic differential equations. In particular, within the study of H-sssi processes, we focus on fractional Brownian motion (fBm) and its discrete-time increment process, the fractional Gaussian noise (fGn), which provide examples of non-Markovian Gaussian processes. The fGn, together with stationary FARIMA processes, is widely used in the modeling and estimation of long memory, or long-range dependence (LRD). Time series manifesting long-range dependence are often observed in nature, especially in physics, meteorology, and climatology, but also in hydrology, geophysics, economics, and many other fields. We studied LRD in depth, giving many real-data examples, providing statistical analysis, and introducing parametric methods of estimation. Then, we introduced the theory of fractional integrals and derivatives, which indeed turns out to be very appropriate for studying and modeling systems with long-memory properties. After having introduced the basic concepts, we provided many examples and applications. For instance, we investigated the relaxation equation with distributed-order time-fractional derivatives, which describes models characterized by a strong memory component and can be used to model relaxation in complex systems deviating from the classical exponential Debye pattern. Then, we focused on the study of generalizations of the standard diffusion equation, passing through the preliminary study of the fractional forward drift equation. Such generalizations have been obtained by using fractional integrals and derivatives of distributed orders. In order to find a connection between the anomalous diffusion described by these equations and long-range dependence, we introduced and studied the generalized grey Brownian motion (ggBm), a parametric class of H-sssi processes whose marginal probability density functions evolve in time according to a partial integro-differential equation of fractional type. The ggBm is of course non-Markovian. Throughout the work, we remark that, starting from a master equation for a probability density function f(x,t), it is always possible to define an equivalence class of stochastic processes with the same marginal density function f(x,t); all these processes provide suitable stochastic models for the starting equation. Studying the ggBm, we focused on a subclass made up of processes with stationary increments. The ggBm has been defined canonically in the so-called grey noise space; however, we have been able to provide a characterization independent of the underlying probability space. We also pointed out that the generalized grey Brownian motion is a direct generalization of a Gaussian process; in particular, it generalizes both Brownian motion and fractional Brownian motion. Finally, we introduced and analyzed a more general class of diffusion-type equations related to certain non-Markovian stochastic processes. We started from the forward drift equation, which has been made non-local in time by the introduction of a suitably chosen memory kernel K(t). The resulting non-Markovian equation has been interpreted in a natural way as the evolution equation of the marginal density function of a random time process l(t). We then considered the subordinated process Y(t)=X(l(t)), where X(t) is a Markovian diffusion. The corresponding time evolution of the marginal density function of Y(t) is governed by a non-Markovian Fokker-Planck equation which involves the same memory kernel K(t). We developed several applications and derived the exact solutions. Moreover, we considered different stochastic models for the given equations, providing path simulations.
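A hedged sketch of one classical estimation method for long-range dependence of the kind mentioned above: the aggregated-variance method. For the increments of an H-sssi process such as fGn, the variance of block means scales as m^(2H-2), so a log-log regression slope yields an estimate of the Hurst exponent H. The white-noise input (true H = 0.5) is an assumption for illustration:

```python
# Sketch: Hurst exponent estimation via the aggregated-variance method.
import numpy as np

def hurst_aggregated_variance(x, block_sizes):
    """Estimate H from the scaling of the variance of block means:
    Var(block mean) ~ m^(2H - 2)."""
    variances = []
    for m in block_sizes:
        k = len(x) // m
        means = x[: k * m].reshape(k, m).mean(axis=1)
        variances.append(means.var())
    slope, _ = np.polyfit(np.log(block_sizes), np.log(variances), 1)
    return 1.0 + slope / 2.0  # slope = 2H - 2

x = np.random.default_rng(4).normal(size=2**14)  # no memory: expect H ~ 0.5
print(hurst_aggregated_variance(x, block_sizes=[4, 8, 16, 32, 64, 128]))
```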
Abstract:
Prediction of radiated fields from transmission lines has not previously been studied from a panoptical power system perspective. The application of BPL technologies to overhead transmission lines would benefit greatly from an ability to simulate real power system environments, not limited to the transmission lines themselves. Presently, circuit-based transmission line models used by EMTP-type programs utilize Carson's formula for a waveguide parallel to an interface. This formula is not valid for calculations at high frequencies, considering effects of earth return currents. This thesis explains the challenges of developing such improved models, explores an approach to combining circuit-based and electromagnetics modeling to predict radiated fields from transmission lines, exposes inadequacies of simulation tools, and suggests methods of extending the validity of transmission line models into very high frequency ranges. Electromagnetics programs are commonly used to study radiated fields from transmission lines. However, an approach is proposed here which is also able to incorporate the components of a power system through the combined use of EMTP-type models. Carson's formulas address the series impedance of electrical conductors above and parallel to the earth. These equations have been analyzed to show their inherent assumptions and what the implications are. Additionally, the lack of validity at higher frequencies has been demonstrated, showing the need to replace Carson's formulas for these types of studies. This body of work leads to several conclusions about the relatively new study of BPL. Foremost, there is a gap in modeling capabilities which has been bridged through integration of circuit-based and electromagnetics modeling, allowing more realistic prediction of BPL performance and radiated fields. The proposed approach is limited in its scope of validity due to the formulas used by EMTP-type software. To extend the range of validity, a new set of equations must be identified and implemented in the approach. Several potential methods of implementation have been explored. Though an appropriate set of equations has not yet been identified, further research in this area will benefit from a clear depiction of the next important steps and how they can be accomplished.
Abstract:
Seizure freedom in patients suffering from pharmacoresistant epilepsies is still not achieved in 20–30% of all cases. Hence, current therapies need to be improved, based on a more complete understanding of ictogenesis. In this respect, the analysis of functional networks derived from intracranial electroencephalographic (iEEG) data has recently become a standard tool. Functional networks, however, are purely descriptive models and thus are conceptually unable to predict fundamental features of iEEG time series, e.g., in the context of therapeutic brain stimulation. In this paper we present some first steps towards overcoming the limitations of functional network analysis, by showing that its results are implied by a simple predictive model of time-sliced iEEG time series. More specifically, we learn distinct graphical models (so-called Chow–Liu (CL) trees) as models for the spatial dependencies between iEEG signals. Bayesian inference is then applied to the CL trees, allowing for an analytic derivation/prediction of functional networks, based on thresholding of the absolute-value Pearson correlation coefficient (CC) matrix. Using various measures, the networks thus obtained are then compared to those derived in the classical way from the empirical CC matrix. In the high-threshold limit we find (a) an excellent agreement between the two networks and (b) key features of periictal networks as they have previously been reported in the literature. Apart from functional networks, both matrices are also compared element-wise, showing that the CL approach leads to a sparse representation, by setting small correlations to values close to zero while preserving the larger ones. Overall, this paper shows the validity of CL trees as simple, spatially predictive models for periictal iEEG data. Moreover, we suggest straightforward generalizations of the CL approach for modeling the temporal features of iEEG signals as well.
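A compact sketch of learning a Chow–Liu tree over channels: for (approximately) Gaussian signals, pairwise mutual information is a monotone function of the absolute Pearson correlation, so the CL tree can be obtained as the maximum spanning tree of the |CC| matrix. The random "iEEG" matrix below is an assumption; real inputs would be time-sliced channel recordings:

```python
# Sketch: Chow-Liu tree as a maximum spanning tree over |correlation| weights.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(6)
signals = rng.normal(size=(8, 1000))   # 8 channels x 1000 samples
signals[3] += 0.8 * signals[2]         # inject one strong dependency
cc = np.abs(np.corrcoef(signals))      # absolute Pearson CC matrix
np.fill_diagonal(cc, 0.0)

# scipy offers only a minimum spanning tree, so negate the edge weights.
mst = minimum_spanning_tree(-cc)
edges = np.transpose(np.nonzero(mst.toarray()))
print("Chow-Liu tree edges (channel pairs):", edges.tolist())
```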
Long-term time-series of mesozooplankton biomass in the Gulf of Naples, data for the years 2005-2006