966 results for Multivariate Heterogeneous Time Series
Abstract:
This study presents new evidence concerning the uneven processes of industrialization in nineteenth-century Spain and Italy based on a disaggregate analysis of the productive sectors from which the behaviour of the aggregate indices is comprised. The use of multivariate time-series analysis techniques can aid our understanding and characterization of these two processes of industrialization. The identification of those sectors with key roles in leading industrial growth provides new evidence concerning the factors that governed the behaviour of the aggregates in the two economies. In addition, the analysis of the existence of interindustry linkages reveals the scale of the industrialization process and, where significant differences exist, accounts for many of the divergences recorded in the historiography for the period 1850-1913.
Abstract:
This thesis adds to the existing literature by investigating the relationship between economic growth and outward foreign direct investment (OFDI) in a set of 16 emerging countries. Two different econometric techniques are employed: a panel data regression analysis and a time-series causality analysis. Results from the regression analysis indicate a positive and significant correlation between OFDI and economic growth. Additionally, the coefficient for the OFDI variable is robust in the sense specified by the Extreme Bound Analysis (EBA). On the other hand, the findings of the causality analysis are particularly heterogeneous. The vector autoregression (VAR) and the vector error correction model (VECM) approaches identify unidirectional Granger causality running either from OFDI to GDP or from GDP to OFDI in six countries. In four economies causality between the two variables is bidirectional, whereas in five countries no causal relationship between OFDI and GDP seems to be present.
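For readers unfamiliar with the VAR-based Granger causality test mentioned above, a minimal sketch follows, assuming statsmodels and synthetic placeholder series in place of the thesis's country data; lag selection by AIC is an illustrative choice, not necessarily the author's.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Hypothetical annual series for one country; stand-ins for the
# thesis's actual OFDI and GDP data.
rng = np.random.default_rng(0)
data = pd.DataFrame({"gdp_growth": rng.normal(size=40),
                     "ofdi": rng.normal(size=40)})

model = VAR(data)
res = model.fit(maxlags=4, ic="aic")  # lag order chosen by AIC (an assumption)

# Granger causality in both directions; a VECM would replace the VAR
# for cointegrated pairs.
print(res.test_causality("gdp_growth", ["ofdi"], kind="f").summary())
print(res.test_causality("ofdi", ["gdp_growth"], kind="f").summary())
```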
Abstract:
The scope of this paper was to analyze the association between homicides and public security indicators in Sao Paulo between 1996 and 2008, after controlling for the unemployment rate and the proportion of youths in the population. A time-series ecological study covering 1996 to 2008 was conducted with Sao Paulo as the unit of analysis. Dependent variable: number of deaths by homicide per year. Main independent variables: arrest-incarceration rate, access to firearms, police activity. Data analysis was conducted using Stata IC 10.0 software. Simple and multivariate negative binomial regression models were created. Deaths by homicide and arrest-incarceration, as well as police activity, were significantly associated in the simple regression analysis. Access to firearms was not significantly associated with a reduction in the number of deaths by homicide (p>0.05). After adjustment, the associations with both public security indicators were not significant. In Sao Paulo the public security indicators are less important as explanatory factors for the reduction in homicide rates after adjustment for the unemployment rate and the reduction in the proportion of youths. The results reinforce the importance of socioeconomic and demographic factors for a change in the public security scenario in Sao Paulo.
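A minimal sketch of the simple-versus-adjusted negative binomial comparison described above, assuming statsmodels and synthetic yearly data in place of the Sao Paulo series; the variable names are placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic stand-ins for the study's 13 yearly observations (1996-2008).
rng = np.random.default_rng(1)
n = 13
df = pd.DataFrame({
    "homicides": rng.poisson(400, n),
    "incarceration_rate": rng.normal(300, 30, n),
    "police_activity": rng.normal(50, 5, n),
    "firearm_access": rng.normal(10, 2, n),
    "unemployment": rng.normal(12, 2, n),
    "youth_share": rng.normal(0.25, 0.02, n),
})

# Simple model: one security indicator at a time.
simple = smf.glm("homicides ~ incarceration_rate", data=df,
                 family=sm.families.NegativeBinomial()).fit()

# Adjusted model: security indicators plus socio-demographic controls.
adjusted = smf.glm("homicides ~ incarceration_rate + police_activity"
                   " + firearm_access + unemployment + youth_share",
                   data=df, family=sm.families.NegativeBinomial()).fit()
print(adjusted.summary())
```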
Abstract:
Granger causality (GC) is a statistical technique used to estimate temporal associations in multivariate time series. Many applications and extensions of GC have been proposed since its formulation by Granger in 1969. Here we control for potentially mediating or confounding associations between time series in the context of event-related electrocorticographic (ECoG) time series. A pruning approach to remove spurious connections and simultaneously reduce the required number of estimations to fit the effective connectivity graph is proposed. Additionally, we consider the potential of adjusted GC applied to independent components as a method to explore temporal relationships between underlying source signals. Both approaches overcome limitations encountered when estimating many parameters in multivariate time-series data, an increasingly common predicament in today's brain mapping studies.
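The adjusted (conditional) GC idea above can be sketched with ordinary regressions: a pairwise edge is pruned when it loses significance once the lags of a potentially mediating channel enter the model. This is a simplified illustration, not the paper's ECoG pipeline; the lag order and F-test formulation are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def gc_fstat(df, target, source, conditioning, lags=2):
    """F-test for GC of `source` on `target`, conditioning on the lags
    of the `conditioning` channels (a simple adjusted-GC variant)."""
    lagged = {f"{c}_l{k}": df[c].shift(k)
              for c in [target, source] + list(conditioning)
              for k in range(1, lags + 1)}
    X = pd.DataFrame(lagged).dropna()
    y = df[target].loc[X.index]
    restr_cols = [c for c in X.columns if not c.startswith(f"{source}_l")]
    full = sm.OLS(y, sm.add_constant(X)).fit()
    restr = sm.OLS(y, sm.add_constant(X[restr_cols])).fit()
    f, p, _ = full.compare_f_test(restr)
    return f, p

# A common driver z makes x appear to cause y pairwise; conditioning on z
# prunes the spurious edge.
rng = np.random.default_rng(10)
n = 500
df = pd.DataFrame({"z": rng.normal(size=n)})
df["x"] = df["z"].shift(1).fillna(0) + rng.normal(0, 0.5, n)
df["y"] = df["z"].shift(2).fillna(0) + rng.normal(0, 0.5, n)
print(gc_fstat(df, "y", "x", conditioning=[]))     # significant (spurious)
print(gc_fstat(df, "y", "x", conditioning=["z"]))  # pruned after adjustment
```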
Abstract:
The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU," lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrests, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model, and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is defining the duration and resolution of time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data is represented by the standard one-value-per-variable paradigm and is widely employed in a host of clinical models and tools; it is often represented by a number present in a given cell of a table. Clinical latent features are derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The second two classes are unique to the time series data elements. The first of these consists of the raw data elements. These are represented by multiple values per variable, and constitute the measured observations that are typically available to end users when they review time series data; they are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood that a representation of the time series data elements is produced that is able to distinguish between two or more classes of outcomes.
The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU," provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time series based models are infeasible due to the relatively large number of data elements and the complexity of preprocessing that must occur before data can be presented to the model. Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies of each of the steps, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) are issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances. The final manuscript, entitled "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit," presents the results that were obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the Receiver Operating Characteristic curve increased from a baseline of 87% to 98% with the inclusion of the trend analysis. In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy as compared to the baseline multivariate model, but diminished classification accuracy as compared to when just the trend analysis features were added (i.e., without adding the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve the performance beyond that which was achieved by exclusion of the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.
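A tiny sketch of the core idea in these manuscripts: turning a raw vital-sign window into a trend-analysis latent feature. The window length, the slope-based trend measure, and the variable names are illustrative assumptions, not the authors' exact operations.

```python
import numpy as np

def trend_slope(values):
    """Least-squares slope over a window: one explicit mathematical
    operation that turns raw time series data into a latent feature."""
    values = np.asarray(values, dtype=float)
    t = np.arange(len(values))
    mask = ~np.isnan(values)  # tolerate missing observations
    if mask.sum() < 2:
        return np.nan
    return np.polyfit(t[mask], values[mask], 1)[0]

# Hypothetical 60-minute systolic blood pressure window, one value/minute.
rng = np.random.default_rng(2)
window = 110 + np.cumsum(rng.normal(-0.3, 1.0, 60))

features = {
    "sbp_last": window[-1],                   # traditional one-value-per-variable
    "sbp_trend_60min": trend_slope(window),   # time series analysis result
}
print(features)
```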
Abstract:
The analysis of time-dependent data is an important problem in many application domains, and interactive visualization of time-series data can help in understanding patterns in large time series data. Many effective approaches already exist for visual analysis of univariate time series, supporting tasks such as assessment of data quality, detection of outliers, or identification of periodically or frequently occurring patterns. However, far fewer approaches exist that support multivariate time series. The existence of multiple values per time stamp makes the analysis task per se harder, and existing visualization techniques often do not scale well. We introduce an approach for visual analysis of large multivariate time-dependent data based on the idea of projecting multivariate measurements to a 2D display, visualizing the time dimension by trajectories. We use visual data aggregation metaphors based on grouping of similar data elements to scale with multivariate time series. Aggregation procedures can either be based on statistical properties of the data or on data clustering routines. Appropriately defined user controls allow the user to navigate and explore the data and interactively steer the parameters of the data aggregation to enhance data analysis. We present an implementation of our approach and apply it to a comprehensive data set from the field of Earth observation, demonstrating the applicability and usefulness of our approach.
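A minimal sketch of the projection-plus-trajectory idea, using PCA as a stand-in for the paper's projection and omitting the aggregation metaphors and interactive controls; the data are synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Synthetic multivariate time series: T time stamps x d measurements.
rng = np.random.default_rng(3)
T, d = 500, 8
X = np.cumsum(rng.normal(size=(T, d)), axis=0)

# Project each multivariate measurement to 2D; time becomes a trajectory.
Z = PCA(n_components=2).fit_transform(X)
plt.plot(Z[:, 0], Z[:, 1], lw=0.8, alpha=0.7)                    # trajectory
plt.scatter(Z[::50, 0], Z[::50, 1], c=np.arange(0, T, 50),
            cmap="viridis")                                      # time markers
plt.xlabel("component 1")
plt.ylabel("component 2")
plt.title("Multivariate time series as a 2D trajectory")
plt.show()
```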
Abstract:
The Tertiary detritic aquifer of Madrid (TDAM), with an average thickness of 1500 m and a heterogeneous, anisotropic structure, supplies water to Madrid, the most populated city of Spain (3.2 million inhabitants in the metropolitan area). Besides its complex structure, a previous work focused on the north-northwest of Madrid city showed that the aquifer behaves quasi-elastically through extraction/recovery cycles, and ground uplift during recovery periods compensates most of the ground subsidence measured during previous extraction periods (Ezquerro et al., 2014). Therefore, the relationship between ground deformation and groundwater level through time can be simulated using simple elastic models. In this work, we model the temporal evolution of the piezometric level in 19 wells of the TDAM in the period 1997–2010. Using InSAR and piezometric time series spanning the studied period, we first estimate the elastic storage coefficient (Ske) for every well. Both the Ske of each well and the average Ske of all wells are used to predict hydraulic heads at the different well locations during the study period and compared against the measured hydraulic heads, leading to very similar errors in the two cases: 14% and 16% on average, respectively. This result suggests that an average Ske can be used to estimate piezometric level variations at all the points where ground deformation has been measured by InSAR, thus allowing production of piezometric level maps for the different extraction/recovery cycles in the TDAM.
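A sketch of the elastic-model step described above, under the simplifying assumption that InSAR displacement is proportional to head change (displacement approximately Ske times head), with synthetic series in place of the TDAM data:

```python
import numpy as np

# Synthetic stand-ins for one well: piezometric head (m) and co-located
# InSAR vertical displacement (m) over extraction/recovery cycles.
rng = np.random.default_rng(4)
head = 20 * np.sin(np.linspace(0, 6 * np.pi, 120))
insar_disp = 1.5e-3 * head + rng.normal(0, 2e-4, 120)  # quasi-elastic response

# Elastic model: displacement ~ Ske * head change, so Ske is the LS slope.
ske = np.polyfit(head, insar_disp, 1)[0]

# Inverted, the model predicts heads wherever InSAR coverage exists.
head_pred = insar_disp / ske
err = np.mean(np.abs(head_pred - head)) / np.ptp(head)
print(f"Ske ~ {ske:.2e}; mean relative error ~ {100 * err:.1f}%")
```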
Abstract:
Amongst all the objectives in the study of time series, uncovering the dynamic law of its generation is probably the most important. When the underlying dynamics are not available, time series modelling consists of developing a model which best explains a sequence of observations. In this thesis, we consider hidden space models for analysing and describing time series. We first provide an introduction to the principal concepts of hidden state models and draw an analogy between hidden Markov models and state space models. Central ideas such as hidden state inference or parameter estimation are reviewed in detail. A key part of multivariate time series analysis is identifying the delay between different variables. We present a novel approach for time delay estimation in a non-stationary environment. The technique makes use of hidden Markov models, and we demonstrate its application for estimating a crucial parameter in the oil industry. We then focus on hybrid models that we call dynamical local models. These models combine and generalise hidden Markov models and state space models. Probabilistic inference is unfortunately computationally intractable, and we show how to make use of variational techniques for approximating the posterior distribution over the hidden state variables. Experimental simulations on synthetic and real-world data demonstrate the application of dynamical local models for segmenting a time series into regimes and providing predictive distributions.
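The regime-segmentation use of hidden Markov models can be sketched with an off-the-shelf library; this assumes hmmlearn and synthetic data, and it does not reproduce the thesis's dynamical local models or their variational inference.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# Synthetic series that switches between two regimes.
rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(0, 1.0, 200),
                    rng.normal(4, 1.5, 200),
                    rng.normal(0, 1.0, 200)]).reshape(-1, 1)

hmm = GaussianHMM(n_components=2, covariance_type="diag",
                  n_iter=100, random_state=0)
hmm.fit(x)               # EM parameter estimation
states = hmm.predict(x)  # Viterbi segmentation into regimes
print(np.bincount(states))
```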
Abstract:
This work considers an application of the heterogeneous variables system prediction method to the time series analysis problem with respect to sample size. A logical-and-probabilistic correlation is constructed from the class of logical decision functions. Two cases are considered: when the information about an event is preserved in the process itself, and when it is preserved in a dependent process.
Abstract:
When it comes to information sets in real life, pieces of the whole set may often not be available. This problem can have various origins, and therefore presents different patterns. In the literature, this problem is known as Missing Data. The issue can be handled in various ways, from not taking incomplete observations into consideration, to guessing what those values originally were, or simply ignoring the fact that some values are missing. The methods used to estimate missing data are called Imputation Methods. The work presented in this thesis has two main goals. The first is to determine whether any kind of interaction exists between Missing Data, Imputation Methods and Supervised Classification algorithms when they are applied together. For this first problem we consider a scenario in which the databases used are discrete, where discrete is understood to mean that no relation between observations is assumed. These datasets underwent processes involving different combinations of the three components mentioned. The outcome showed that the missing data pattern strongly influences the result produced by a classifier. Also, in some of the cases, the complex imputation techniques investigated in the thesis were able to obtain better results than simple ones. The second goal of this work is to propose a new imputation strategy, but this time we constrain the specifications of the previous problem to a special kind of dataset, the multivariate time series. We designed new imputation techniques for this particular domain and combined them with some of the contrasted strategies tested in the previous chapter of this thesis. The time series were also subjected to processes involving missing data and imputation in order to finally propose an overall better imputation method. In the final chapter of this work, a real-world example is presented, describing a water quality prediction problem. The databases that characterize this problem contain their own naturally missing values, which provides a real-world benchmark for testing the algorithms developed in this thesis.
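A minimal sketch of how imputation methods can be compared on multivariate time series: mask known values, impute, and score against the ground truth. The simple candidates below (forward fill, linear interpolation, column mean) are stand-ins for the thesis's techniques.

```python
import numpy as np
import pandas as pd

# Synthetic multivariate series with 10% of values masked at random.
rng = np.random.default_rng(6)
full = pd.DataFrame(np.cumsum(rng.normal(size=(300, 3)), axis=0),
                    columns=["a", "b", "c"])
mask = rng.random(full.shape) < 0.1
observed = full.mask(mask)

candidates = {
    "ffill": observed.ffill().bfill(),
    "linear": observed.interpolate(method="linear", limit_direction="both"),
    "mean": observed.fillna(observed.mean()),
}
for name, imputed in candidates.items():
    err = (imputed - full).to_numpy()[mask]  # error on the masked cells only
    print(f"{name:7s} RMSE = {np.sqrt(np.mean(err ** 2)):.3f}")
```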
Abstract:
This thesis provides a necessary and sufficient condition for asymptotic efficiency of a nonparametric estimator of the generalised autocovariance function of a Gaussian stationary random process. The generalised autocovariance function is the inverse Fourier transform of a power transformation of the spectral density, and encompasses the traditional and inverse autocovariance functions. Its nonparametric estimator is based on the inverse discrete Fourier transform of the same power transformation of the pooled periodogram. The general result is then applied to the class of Gaussian stationary ARMA processes and its implications are discussed. We illustrate that for a class of contrast functionals and spectral densities, the minimum contrast estimator of the spectral density satisfies a Yule-Walker system of equations in the generalised autocovariance estimator. Selection of the pooling parameter, which characterizes the nonparametric estimator of the generalised autocovariance and controls its resolution, is addressed by using a multiplicative periodogram bootstrap to estimate the finite-sample distribution of the estimator. A multivariate extension of recently introduced spectral models for univariate time series is considered, and an algorithm for the coefficients of a power transformation of matrix polynomials is derived, which allows one to obtain the Wold coefficients from the matrix coefficients characterizing the generalised matrix cepstral models. This algorithm also allows the definition of the matrix variance profile, providing important quantities for vector time series analysis. A nonparametric estimator based on a transformation of the smoothed periodogram is proposed for estimating the matrix variance profile.
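The estimator's construction can be sketched directly from the definition quoted above (inverse DFT of a power of the pooled periodogram); the normalisation constants and the crude pooling scheme below are assumptions for illustration, not the thesis's exact conventions.

```python
import numpy as np

def generalised_autocovariance(x, p=1.0, pool=1):
    """Sketch: inverse DFT of the p-th power of the (pooled) periodogram
    of a zero-mean series; p=1, pool=1 recovers the circular sample
    autocovariance."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    I = np.abs(np.fft.fft(x)) ** 2 / (2 * np.pi * n)  # periodogram
    if pool > 1:  # average adjacent ordinates to control resolution
        m = (n // pool) * pool
        pooled = I[:m].reshape(-1, pool).mean(axis=1).repeat(pool)
        I = np.concatenate([pooled, I[m:]])
    return 2 * np.pi * np.real(np.fft.ifft(I ** p))  # lag-k value at index k

rng = np.random.default_rng(11)
x = rng.normal(size=512)
print(generalised_autocovariance(x, p=1.0)[:3])   # ~ sample autocovariances
print(generalised_autocovariance(x, p=-1.0)[:3])  # ~ inverse autocovariances
```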
Abstract:
The objective of the study is to evaluate the effect of the daily variation in concentrations of fine particulate matter (diameter less than 2.5µm - PM2.5) resulting from the burning of biomass on the daily number of hospitalizations of children and elderly people for respiratory diseases in Alta Floresta and Tangará da Serra, in the Brazilian Amazon, in 2005. This is an ecological time-series study that uses data on the daily number of hospitalizations of children and the elderly for respiratory diseases, and the estimated concentration of PM2.5. In Alta Floresta, the percentage increases in the relative risk (%RR) of hospitalization for respiratory diseases in children were significant for the whole year and for the dry season with 3-4 day lags. In the dry season these measurements reach 6% (95%CI: 1.4-10.8). The associations were significant for moving averages of 3-5 days. The %RR for the elderly was significant for the current day of the drought, with a 6.8% increase (95%CI: 0.5-13.5) for each additional 10µg/m3 of PM2.5. No associations were found for Tangará da Serra. The PM2.5 from the burning of biomass increased hospitalizations for respiratory diseases in children and the elderly.
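The %RR-per-10µg/m3 computation reported above can be sketched with a lagged count regression; the Poisson family, the single 3-day lag, and the synthetic data are assumptions (the study's actual model and confounder set are not reproduced here).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic daily series standing in for the study's data.
rng = np.random.default_rng(7)
n = 365
df = pd.DataFrame({"admissions": rng.poisson(2, n),
                   "pm25": rng.gamma(4.0, 8.0, n),
                   "temp": rng.normal(28, 3, n)})
df["pm25_lag3"] = df["pm25"].shift(3)  # 3-day lag of exposure
df = df.dropna()

fit = smf.glm("admissions ~ pm25_lag3 + temp", data=df,
              family=sm.families.Poisson()).fit()

# Percentage increase in relative risk per additional 10 ug/m3 of PM2.5.
beta = fit.params["pm25_lag3"]
print(f"%RR per 10 ug/m3 ~ {100 * (np.exp(10 * beta) - 1):.1f}%")
```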
Abstract:
A susceptible-infective-recovered (SIR) epidemiological model based on a probabilistic cellular automaton (PCA) is employed for simulating the temporal evolution of the registered cases of chickenpox in Arizona, USA, between 1994 and 2004. At each time step, every individual is in one of the states S, I, or R. The parameters of this model are the probabilities of each individual (each cell forming the PCA lattice) passing from one state to another. Here, the values of these probabilities are identified by using a genetic algorithm. If nonrealistic values are allowed for the parameters, the predictions present better agreement with the historical series than if they are forced to take realistic values. A discussion of how the size of the PCA lattice affects the quality of the model predictions is presented.
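One step of such a probabilistic cellular automaton can be sketched as follows; the neighbourhood, boundary conditions, and the three transition probabilities are illustrative assumptions (in the paper these probabilities are the parameters fitted by the genetic algorithm).

```python
import numpy as np

# States on the lattice: 0 = S, 1 = I, 2 = R.
def pca_step(grid, p_inf, p_cure, p_renew, rng):
    """One synchronous update of a PCA-based SIR model with a von
    Neumann neighbourhood and periodic boundaries."""
    inf = (grid == 1).astype(int)
    neigh = (np.roll(inf, 1, 0) + np.roll(inf, -1, 0)
             + np.roll(inf, 1, 1) + np.roll(inf, -1, 1))
    r = rng.random(grid.shape)
    new = grid.copy()
    # S -> I: at least one infective neighbour transmits.
    new[(grid == 0) & (r < 1 - (1 - p_inf) ** neigh)] = 1
    # I -> R: recovery with probability p_cure.
    new[(grid == 1) & (r < p_cure)] = 2
    # R -> S: renewal / loss of immunity with probability p_renew.
    new[(grid == 2) & (r < p_renew)] = 0
    return new

rng = np.random.default_rng(8)
g = rng.choice([0, 1, 2], size=(50, 50), p=[0.90, 0.05, 0.05])
cases = []
for _ in range(100):
    g = pca_step(g, 0.10, 0.30, 0.02, rng)
    cases.append(int((g == 1).sum()))  # simulated case series to fit
print(cases[:10])
```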
Abstract:
Background: Microarray techniques have become an important tool for the investigation of genetic relationships and the assignment of different phenotypes. Since microarrays are still very expensive, most experiments are performed with small samples. This paper introduces a method to quantify dependency between data series composed of few sample points. The method is used to construct gene co-expression subnetworks of highly significant edges. Results: The results shown here are for an adapted subset of a Saccharomyces cerevisiae gene expression data set with low temporal resolution and poor statistics. The method reveals common transcription factors with a high confidence level and allows the construction of subnetworks with high biological relevance that reveal characteristic features of the processes driving the organism's adaptations to specific environmental conditions. Conclusion: Our method allows a reliable and sophisticated analysis of microarray data even under severe constraints. The utilization of systems biology improves the biologists' ability to elucidate the mechanisms underlying cellular processes and to formulate new hypotheses.
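A toy version of edge selection by significance for short series, using absolute correlation with a permutation null as a stand-in for the paper's dependency measure; the gene count, series length, and threshold are arbitrary.

```python
import numpy as np

def edge_significance(x, y, n_perm=1000, rng=None):
    """Permutation p-value for the dependency between two short
    expression series (|Pearson correlation| as the test statistic)."""
    rng = np.random.default_rng() if rng is None else rng
    obs = abs(np.corrcoef(x, y)[0, 1])
    null = np.array([abs(np.corrcoef(rng.permutation(x), y)[0, 1])
                     for _ in range(n_perm)])
    return (1 + np.sum(null >= obs)) / (1 + n_perm)

# Keep only highly significant edges in the co-expression subnetwork.
rng = np.random.default_rng(9)
genes = rng.normal(size=(10, 6))  # 10 genes, 6 time points
edges = [(i, j) for i in range(10) for j in range(i + 1, 10)
         if edge_significance(genes[i], genes[j], rng=rng) < 0.01]
print(edges)
```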