958 resultados para Structural time-series model
Resumo:
Background: A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results: In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions: This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.
Resumo:
The ubiquity of time series data across almost all human endeavors has produced a great interest in time series data mining in the last decade. While dozens of classification algorithms have been applied to time series, recent empirical evidence strongly suggests that simple nearest neighbor classification is exceptionally difficult to beat. The choice of distance measure used by the nearest neighbor algorithm is important, and depends on the invariances required by the domain. For example, motion capture data typically requires invariance to warping, and cardiology data requires invariance to the baseline (the mean value). Similarly, recent work suggests that for time series clustering, the choice of clustering algorithm is much less important than the choice of distance measure used.In this work we make a somewhat surprising claim. There is an invariance that the community seems to have missed, complexity invariance. Intuitively, the problem is that in many domains the different classes may have different complexities, and pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This fact introduces errors in nearest neighbor classification, where some complex objects may be incorrectly assigned to a simpler class. Similarly, for clustering this effect can introduce errors by “suggesting” to the clustering algorithm that subjectively similar, but complex objects belong in a sparser and larger diameter cluster than is truly warranted.We introduce the first complexity-invariant distance measure for time series, and show that it generally produces significant improvements in classification and clustering accuracy. We further show that this improvement does not compromise efficiency, since we can lower bound the measure and use a modification of triangular inequality, thus making use of most existing indexing and data mining algorithms. We evaluate our ideas with the largest and most comprehensive set of time series mining experiments ever attempted in a single work, and show that complexity-invariant distance measures can produce improvements in classification and clustering in the vast majority of cases.
Resumo:
This work proposes a system for classification of industrial steel pieces by means of magnetic nondestructive device. The proposed classification system presents two main stages, online system stage and off-line system stage. In online stage, the system classifies inputs and saves misclassification information in order to perform posterior analyses. In the off-line optimization stage, the topology of a Probabilistic Neural Network is optimized by a Feature Selection algorithm combined with the Probabilistic Neural Network to increase the classification rate. The proposed Feature Selection algorithm searches for the signal spectrogram by combining three basic elements: a Sequential Forward Selection algorithm, a Feature Cluster Grow algorithm with classification rate gradient analysis and a Sequential Backward Selection. Also, a trash-data recycling algorithm is proposed to obtain the optimal feedback samples selected from the misclassified ones.
A Phase Space Box-counting based Method for Arrhythmia Prediction from Electrocardiogram Time Series
Resumo:
Arrhythmia is one kind of cardiovascular diseases that give rise to the number of deaths and potentially yields immedicable danger. Arrhythmia is a life threatening condition originating from disorganized propagation of electrical signals in heart resulting in desynchronization among different chambers of the heart. Fundamentally, the synchronization process means that the phase relationship of electrical activities between the chambers remains coherent, maintaining a constant phase difference over time. If desynchronization occurs due to arrhythmia, the coherent phase relationship breaks down resulting in chaotic rhythm affecting the regular pumping mechanism of heart. This phenomenon was explored by using the phase space reconstruction technique which is a standard analysis technique of time series data generated from nonlinear dynamical system. In this project a novel index is presented for predicting the onset of ventricular arrhythmias. Analysis of continuously captured long-term ECG data recordings was conducted up to the onset of arrhythmia by the phase space reconstruction method, obtaining 2-dimensional images, analysed by the box counting method. The method was tested using the ECG data set of three different kinds including normal (NR), Ventricular Tachycardia (VT), Ventricular Fibrillation (VF), extracted from the Physionet ECG database. Statistical measures like mean (μ), standard deviation (σ) and coefficient of variation (σ/μ) for the box-counting in phase space diagrams are derived for a sliding window of 10 beats of ECG signal. From the results of these statistical analyses, a threshold was derived as an upper bound of Coefficient of Variation (CV) for box-counting of ECG phase portraits which is capable of reliably predicting the impeding arrhythmia long before its actual occurrence. As future work of research, it was planned to validate this prediction tool over a wider population of patients affected by different kind of arrhythmia, like atrial fibrillation, bundle and brunch block, and set different thresholds for them, in order to confirm its clinical applicability.
Resumo:
The main objective of this thesis is to explore the short and long run causality patterns in the finance – growth nexus and finance-growth-trade nexus before and after the global financial crisis, in the case of Albania. To this end we use quarterly data on real GDP, 13 proxy measures for financial development and the trade openness indicator for the period 1998Q1 – 2013Q2 and 1998Q1-2008Q3. Causality patterns will be explored in a VAR-VECM framework. For this purpose we will proceed as follows: (i) testing for the integration order of the variables; (ii) cointegration analysis and (iii) performing Granger causality tests in a VAR-VECM framework. In the finance-growth nexus, empirical evidence suggests for a positive long run relationship between finance and economic growth, with causality running from financial development to economic growth. The global financial crisis seems to have not affected the causality direction in the finance and growth nexus, thus supporting the finance led growth hypothesis in the long run in the case of Albania. In the finance-growth-trade openness nexus, we found evidence for a positive long run relationship the variables, with causality direction depending on the proxy used for financial development. When the pre-crisis sample is considered, we find evidence for causality running from financial development and trade openness to economic growth. The global financial crisis seems to have affected somewhat the causality direction in the finance-growth-trade nexus, which has become sensible to the proxy used for financial development. On the short run, empirical evidence suggests for a clear unidirectional relationship between finance and growth, with causality mostly running from economic growth to financial development. When we consider the per-crisis sub sample results are mixed, depending on the proxy used for financial development. The same results are confirmed when trade openness is taken into account.
Resumo:
Zeitreihen sind allgegenwärtig. Die Erfassung und Verarbeitung kontinuierlich gemessener Daten ist in allen Bereichen der Naturwissenschaften, Medizin und Finanzwelt vertreten. Das enorme Anwachsen aufgezeichneter Datenmengen, sei es durch automatisierte Monitoring-Systeme oder integrierte Sensoren, bedarf außerordentlich schneller Algorithmen in Theorie und Praxis. Infolgedessen beschäftigt sich diese Arbeit mit der effizienten Berechnung von Teilsequenzalignments. Komplexe Algorithmen wie z.B. Anomaliedetektion, Motivfabfrage oder die unüberwachte Extraktion von prototypischen Bausteinen in Zeitreihen machen exzessiven Gebrauch von diesen Alignments. Darin begründet sich der Bedarf nach schnellen Implementierungen. Diese Arbeit untergliedert sich in drei Ansätze, die sich dieser Herausforderung widmen. Das umfasst vier Alignierungsalgorithmen und ihre Parallelisierung auf CUDA-fähiger Hardware, einen Algorithmus zur Segmentierung von Datenströmen und eine einheitliche Behandlung von Liegruppen-wertigen Zeitreihen.rnrnDer erste Beitrag ist eine vollständige CUDA-Portierung der UCR-Suite, die weltführende Implementierung von Teilsequenzalignierung. Das umfasst ein neues Berechnungsschema zur Ermittlung lokaler Alignierungsgüten unter Verwendung z-normierten euklidischen Abstands, welches auf jeder parallelen Hardware mit Unterstützung für schnelle Fouriertransformation einsetzbar ist. Des Weiteren geben wir eine SIMT-verträgliche Umsetzung der Lower-Bound-Kaskade der UCR-Suite zur effizienten Berechnung lokaler Alignierungsgüten unter Dynamic Time Warping an. Beide CUDA-Implementierungen ermöglichen eine um ein bis zwei Größenordnungen schnellere Berechnung als etablierte Methoden.rnrnAls zweites untersuchen wir zwei Linearzeit-Approximierungen für das elastische Alignment von Teilsequenzen. Auf der einen Seite behandeln wir ein SIMT-verträgliches Relaxierungschema für Greedy DTW und seine effiziente CUDA-Parallelisierung. Auf der anderen Seite führen wir ein neues lokales Abstandsmaß ein, den Gliding Elastic Match (GEM), welches mit der gleichen asymptotischen Zeitkomplexität wie Greedy DTW berechnet werden kann, jedoch eine vollständige Relaxierung der Penalty-Matrix bietet. Weitere Verbesserungen umfassen Invarianz gegen Trends auf der Messachse und uniforme Skalierung auf der Zeitachse. Des Weiteren wird eine Erweiterung von GEM zur Multi-Shape-Segmentierung diskutiert und auf Bewegungsdaten evaluiert. Beide CUDA-Parallelisierung verzeichnen Laufzeitverbesserungen um bis zu zwei Größenordnungen.rnrnDie Behandlung von Zeitreihen beschränkt sich in der Literatur in der Regel auf reellwertige Messdaten. Der dritte Beitrag umfasst eine einheitliche Methode zur Behandlung von Liegruppen-wertigen Zeitreihen. Darauf aufbauend werden Distanzmaße auf der Rotationsgruppe SO(3) und auf der euklidischen Gruppe SE(3) behandelt. Des Weiteren werden speichereffiziente Darstellungen und gruppenkompatible Erweiterungen elastischer Maße diskutiert.
Resumo:
La tesi tratta una panoramica generale sui Time Series database e relativi gestori. Successivamente l'attenzione è focalizzata sul DBMS InfluxDB. Infine viene mostrato un progetto che implementa InfluxDB