963 results for "Time series classification"


Relevance: 100.00%

Abstract:

Time series classification has been extensively explored in many fields of study. Most methods are based on historical or current information extracted from the data. However, if interest lies in a specific future time period, methods that relate directly to forecasts of the time series are much more appropriate. An approach to time series classification is proposed based on a polarization measure of the forecast densities of time series. By fitting autoregressive models, forecast replicates of each time series are obtained via the bias-corrected bootstrap, and a stationarity correction is considered when necessary. Kernel estimators are then employed to approximate the forecast densities, and discrepancies between the forecast densities of pairs of time series are estimated by a polarization measure, which evaluates the extent to which two densities overlap. Following the distributional properties of the polarization measure, a discriminant rule and a clustering method are proposed to conduct supervised and unsupervised classification, respectively. The proposed methodology is applied to both simulated and real data sets, and the results show desirable properties.
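A rough sketch of the pipeline described above, under simplifying assumptions: an AR(1) model fitted by least squares, a plain residual bootstrap in place of the bias-corrected bootstrap, and the overlap coefficient (the integral of the pointwise minimum of two densities) standing in for the paper's exact polarization measure. All function names are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

def ar1_bootstrap_forecasts(x, n_boot=500, seed=0):
    """Fit an AR(1) by least squares and bootstrap one-step-ahead
    forecast replicates by resampling centred residuals.
    (Simplified stand-in for the paper's bias-corrected bootstrap.)"""
    rng = np.random.default_rng(seed)
    y, z = x[1:], x[:-1]
    phi = z @ y / (z @ z)          # least-squares AR(1) coefficient
    resid = y - phi * z
    resid = resid - resid.mean()   # centre the residuals
    return phi * x[-1] + rng.choice(resid, size=n_boot, replace=True)

def density_overlap(reps_a, reps_b, grid_size=512):
    """Overlap coefficient of two kernel-smoothed forecast densities:
    the integral of the pointwise minimum (1 = identical, 0 = disjoint)."""
    ka, kb = gaussian_kde(reps_a), gaussian_kde(reps_b)
    lo = min(reps_a.min(), reps_b.min())
    hi = max(reps_a.max(), reps_b.max())
    grid = np.linspace(lo, hi, grid_size)
    step = grid[1] - grid[0]
    return float(np.minimum(ka(grid), kb(grid)).sum() * step)
```

Pairs of series whose forecast densities overlap heavily would then be grouped together by the discriminant rule or the clustering step.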

Relevance: 100.00%

Abstract:

In this paper we propose a novel family of kernels for multivariate time-series classification problems. Each time series is approximated by a linear combination of piecewise polynomial functions in a Reproducing Kernel Hilbert Space via a novel kernel interpolation technique. Using the associated kernel function, a large-margin classification formulation is proposed that can discriminate between two classes. The formulation leads to kernels, between two multivariate time series, that can be efficiently computed. The kernels have been successfully applied to writer-independent handwritten character recognition.

Relevance: 100.00%

Abstract:

In this paper, we consider the problem of time series classification. Using piecewise linear interpolation, various novel kernels are obtained that can be used with support vector machines to design classifiers capable of deciding the class of a given time series. The approach is general and applicable in many scenarios. We apply the method to the task of online Tamil handwritten character recognition, with promising results.
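A minimal sketch of the interpolation idea (illustrative, with hypothetical names; the paper's exact kernel construction is not reproduced here): approximate each series by its piecewise linear interpolant on a common time interval and take the L2 inner product of the two interpolants as a kernel value, which can then be fed to an SVM as a precomputed Gram matrix.

```python
import numpy as np

def pl_kernel(ta, ya, tb, yb, grid_size=200):
    """Approximate L2 inner product of the piecewise-linear interpolants
    of two (possibly irregularly sampled) time series."""
    lo = max(ta[0], tb[0])          # common support of the two series
    hi = min(ta[-1], tb[-1])
    grid = np.linspace(lo, hi, grid_size)
    fa = np.interp(grid, ta, ya)    # piecewise-linear interpolation
    fb = np.interp(grid, tb, yb)
    return float(np.mean(fa * fb) * (hi - lo))   # ~ integral of fa(t) * fb(t)
```

Being an (approximate) inner product, the resulting Gram matrix is symmetric and can serve directly as a precomputed SVM kernel.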

Relevance: 100.00%

Abstract:

Time series classification deals with the problem of classifying data that are multivariate in nature, meaning that one or more of the attributes take the form of a sequence. The notion of similarity or distance used for time series data is significant and affects the accuracy, time, and space complexity of the classification algorithm. Numerous similarity measures exist for time series data, but each has its own disadvantages. Instead of relying on a single similarity measure, our aim is to find a near-optimal solution to the classification problem by combining different similarity measures. In this work, we use genetic algorithms to combine the similarity measures so as to obtain the best performance. The weights given to the different similarity measures evolve over a number of generations so as to reach the best combination. We test our approach on a number of benchmark time series datasets and present promising results.
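One way to picture the evolutionary weighting is the toy sketch below, under assumptions the abstract does not specify: only two candidate measures (raw Euclidean and Euclidean on first differences), a single mixing weight, truncation selection with Gaussian mutation as the genetic algorithm, and leave-one-out 1-NN accuracy as the fitness. All names are illustrative.

```python
import numpy as np

def nn1_accuracy(D, labels):
    """Leave-one-out 1-NN accuracy from a pairwise distance matrix."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)           # exclude self-matches
    pred = labels[np.argmin(D, axis=1)]
    return float(np.mean(pred == labels))

def evolve_weight(X, labels, pop=20, gens=30, rng=None):
    """Evolve the weight mixing two similarity measures so as to
    maximise leave-one-out 1-NN accuracy."""
    rng = np.random.default_rng(rng)
    d_raw = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    dX = np.diff(X, axis=1)
    d_diff = np.linalg.norm(dX[:, None, :] - dX[None, :, :], axis=2)

    def fitness(w):
        return nn1_accuracy(w * d_raw + (1 - w) * d_diff, labels)

    weights = rng.random(pop)             # initial population of weights in [0, 1]
    for _ in range(gens):
        scores = np.array([fitness(w) for w in weights])
        parents = weights[np.argsort(scores)[-pop // 2:]]   # truncation selection
        children = np.clip(parents + rng.normal(0, 0.05, parents.size), 0, 1)
        weights = np.concatenate([parents, children])       # mutate into next generation
    scores = np.array([fitness(w) for w in weights])
    return float(weights[np.argmax(scores)]), float(scores.max())
```

A full version would evolve one weight per similarity measure (a weight vector) rather than a single scalar.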

Relevance: 100.00%

Abstract:

The problem of classification of time series data is an interesting problem in the field of data mining. Even though several algorithms have been proposed for time series classification, we have developed an innovative algorithm that is computationally fast and, in several cases, accurate when compared with the 1NN classifier. In our method, we calculate the fuzzy membership of each test pattern in each class. We have experimented with six benchmark datasets and compared our method with the 1NN classifier.
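The abstract does not give the membership function, so as a hedged illustration, a fuzzy k-NN-style membership built from inverse squared distances to the training series might look as follows (function names and the weighting scheme are assumptions, not the paper's method):

```python
import numpy as np

def fuzzy_memberships(x, X_train, y_train, m=2.0, eps=1e-12):
    """Fuzzy class memberships of a test series x: inverse-distance
    weighting of all training series, normalised to sum to 1."""
    d = np.linalg.norm(X_train - x, axis=1)
    w = 1.0 / (d ** (2.0 / (m - 1.0)) + eps)   # closer series weigh more
    classes = np.unique(y_train)
    u = np.array([w[y_train == c].sum() for c in classes])
    return classes, u / u.sum()

def classify(x, X_train, y_train):
    """Assign the class with the largest fuzzy membership."""
    classes, u = fuzzy_memberships(x, X_train, y_train)
    return classes[np.argmax(u)]
```

Unlike a hard 1NN decision, the membership vector also conveys how confidently a pattern belongs to its predicted class.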

Relevance: 100.00%

Abstract:

A new hypothesis test for classifying stationary time series, based on the bias-adjusted estimators of the fitted autoregressive model, is proposed. It is shown theoretically that the proposed test has desirable properties. Simulation results show that when time series are short, the size and power estimates of the proposed test are reasonably good, and thus the test is reliable in discriminating between short time series. As the length of the time series increases, the performance of the proposed test improves, but the benefit of the bias adjustment diminishes. The proposed hypothesis test is applied to two real data sets: the annual real GDP per capita of six European countries, and the quarterly real GDP per capita of five European countries. The results demonstrate that the proposed test performs reasonably well in classifying relatively short time series.

Relevance: 100.00%

Abstract:

This work applies a variety of multilinear function factorisation techniques to extract appropriate features or attributes from high-dimensional multivariate time series for classification. Recently, a great deal of work has centred on designing time series classifiers using ever more complex feature extraction and machine learning schemes. This paper argues that complex learners and domain-specific feature extraction schemes of this type are not necessarily needed for time series classification, as excellent classification results can be obtained by simply applying a number of existing matrix factorisation or linear projection techniques, which are simple and computationally inexpensive. We highlight this using a geometric separability measure and classification accuracies obtained through experiments on four different high-dimensional multivariate time series datasets. © 2013 IEEE.
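A minimal sketch of the "simple linear projection" route: PCA via SVD to project the high-dimensional series, followed by a nearest-centroid classifier. This is an illustration of the general approach, not a reproduction of the paper's specific factorisations or classifier.

```python
import numpy as np

def pca_fit(X, k):
    """Fit a rank-k PCA (via SVD) on the rows of X; return the
    training mean and the top-k principal directions."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_transform(X, mean, components):
    """Project series onto the fitted principal directions."""
    return (X - mean) @ components.T

def nearest_centroid(Z_train, y_train, Z_test):
    """Classify projected series by the nearest class centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]
```

The whole pipeline is a couple of matrix operations, which is exactly the computational-cost argument the paper makes against heavyweight learners.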

Relevance: 100.00%

Abstract:

The ubiquity of time series data across almost all human endeavors has produced a great interest in time series data mining in the last decade. While dozens of classification algorithms have been applied to time series, recent empirical evidence strongly suggests that simple nearest neighbor classification is exceptionally difficult to beat. The choice of distance measure used by the nearest neighbor algorithm is important, and depends on the invariances required by the domain. For example, motion capture data typically requires invariance to warping, and cardiology data requires invariance to the baseline (the mean value). Similarly, recent work suggests that for time series clustering, the choice of clustering algorithm is much less important than the choice of distance measure used.

In this work we make a somewhat surprising claim. There is an invariance that the community seems to have missed: complexity invariance. Intuitively, the problem is that in many domains the different classes may have different complexities, and pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This fact introduces errors in nearest neighbor classification, where some complex objects may be incorrectly assigned to a simpler class. Similarly, for clustering this effect can introduce errors by “suggesting” to the clustering algorithm that subjectively similar, but complex objects belong in a sparser and larger diameter cluster than is truly warranted.

We introduce the first complexity-invariant distance measure for time series, and show that it generally produces significant improvements in classification and clustering accuracy. We further show that this improvement does not compromise efficiency, since we can lower bound the measure and use a modification of the triangular inequality, thus making use of most existing indexing and data mining algorithms.
We evaluate our ideas with the largest and most comprehensive set of time series mining experiments ever attempted in a single work, and show that complexity-invariant distance measures can produce improvements in classification and clustering in the vast majority of cases.
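A common formulation of a complexity-invariant distance, consistent with the description above, scales the Euclidean distance by the ratio of the two series' complexity estimates, where complexity is measured as the length of the "stretched-out" series, i.e. the norm of its first differences. A direct sketch (the exact correction factor used in the paper should be taken from the paper itself):

```python
import numpy as np

def complexity(x):
    """Complexity estimate: the length of the line obtained by
    'stretching out' the series, via its first differences."""
    return float(np.sqrt(np.sum(np.diff(x) ** 2)))

def cid(x, y, eps=1e-12):
    """Complexity-invariant distance: Euclidean distance scaled by the
    ratio of the two series' complexities (always >= Euclidean)."""
    ce_x, ce_y = complexity(x), complexity(y)
    cf = max(ce_x, ce_y) / (min(ce_x, ce_y) + eps)   # correction factor >= 1
    return float(np.linalg.norm(x - y) * cf)
```

When the two series have equal complexity the correction factor is 1 and the measure reduces to plain Euclidean distance; pairing a complex series with a simple one inflates their distance, which counteracts the bias described above.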

Relevance: 100.00%

Abstract:

The applicability of ultra-short-term wind power prediction (USTWPP) models is reviewed. The proposed USTWPP method extracts features from historical wind power time series (WPTS) data and classifies every short WPTS into one of several subsets, each well defined by a stationary pattern. All WPTS that cannot match any of the stationary patterns are sorted into the subset of the nonstationary pattern. Each of these WPTS subsets requires a USTWPP model specially optimized for it offline. In on-line application, the pattern of the most recent short WPTS is recognized, and the corresponding prediction model is then called for USTWPP. The validity of the proposed method is verified by simulations.

Relevance: 100.00%

Abstract:

This work proposes a system for the classification of industrial steel pieces by means of a magnetic nondestructive device. The proposed classification system comprises two main stages: an online stage and an off-line stage. In the online stage, the system classifies inputs and saves misclassification information in order to perform posterior analyses. In the off-line optimization stage, the topology of a Probabilistic Neural Network is optimized by a Feature Selection algorithm combined with the Probabilistic Neural Network to increase the classification rate. The proposed Feature Selection algorithm searches the signal spectrogram by combining three basic elements: a Sequential Forward Selection algorithm, a Feature Cluster Grow algorithm with classification-rate gradient analysis, and a Sequential Backward Selection. In addition, a trash-data recycling algorithm is proposed to obtain optimal feedback samples selected from the misclassified ones.

Relevance: 100.00%

Abstract:

Financial processes may possess long memory and their probability densities may display heavy tails. Many models have been developed to deal with this tail behaviour, which reflects the jumps in the sample paths. On the other hand, the presence of long memory, which contradicts the efficient market hypothesis, is still an issue for further debate. These difficulties present challenges for memory detection and for modelling the co-presence of long memory and heavy tails. This PhD project aims to respond to these challenges. The first part aims to detect memory in a large number of financial time series on stock prices and exchange rates using their scaling properties. Since financial time series often exhibit stochastic trends, a common form of nonstationarity, strong trends in the data can lead to false detection of memory. We will take advantage of a technique known as multifractal detrended fluctuation analysis (MF-DFA) that can systematically eliminate trends of different orders. This method is based on the identification of the scaling of the q-th-order moments and is a generalisation of the standard detrended fluctuation analysis (DFA), which uses only the second moment; that is, q = 2. We also consider the rescaled-range (R/S) analysis and the periodogram method to detect memory in financial time series and compare their results with the MF-DFA. An interesting finding is that short memory is detected for stock prices of the American Stock Exchange (AMEX) and long memory is found present in the time series of two exchange rates, namely the French franc and the Deutsche mark. Electricity price series of the five states of Australia are also found to possess long memory. For these electricity price series, heavy tails are also pronounced in their probability densities. The second part of the thesis develops models to represent short-memory and long-memory financial processes as detected in Part I.
These models take the form of continuous-time AR(∞)-type equations whose kernel is the Laplace transform of a finite Borel measure. By imposing appropriate conditions on this measure, short memory or long memory in the dynamics of the solution will result. A specific form of the models, which has a good MA(∞)-type representation, is presented for the short-memory case. Parameter estimation for this type of model is performed via least squares, and the models are applied to the stock prices in the AMEX, which were established in Part I to possess short memory. By selecting the kernel in the continuous-time AR(∞)-type equations to have the form of a Riemann-Liouville fractional derivative, we obtain a fractional stochastic differential equation driven by Brownian motion. This type of equation is used to represent financial processes with long memory, whose dynamics are described by the fractional derivative in the equation. These models are estimated via quasi-likelihood, namely via a continuous-time version of the Gauss-Whittle method. The models are applied to the exchange rates and the electricity prices of Part I with the aim of confirming their possible long-range dependence established by MF-DFA. The third part of the thesis provides an application of the results established in Parts I and II to characterise and classify financial markets. We pay attention to the New York Stock Exchange (NYSE), the American Stock Exchange (AMEX), the NASDAQ Stock Exchange (NASDAQ) and the Toronto Stock Exchange (TSX). The parameters from MF-DFA and those of the short-memory AR(∞)-type models are employed in this classification. We propose the Fisher discriminant algorithm to find a classifier in the two- and three-dimensional spaces of the data sets and then provide cross-validation to verify discriminant accuracies. This classification is useful for understanding and predicting the behaviour of different processes within the same market.
The fourth part of the thesis investigates the heavy-tailed behaviour of financial processes which may also possess long memory. We consider fractional stochastic differential equations driven by stable noise to model financial processes such as electricity prices. The long memory of electricity prices is represented by a fractional derivative, while the stable-noise input models their non-Gaussianity via the tails of their probability density. A method using the empirical densities and MF-DFA is provided to estimate all the parameters of the model and to simulate sample paths of the equation. The method is then applied to analyse daily spot prices for five states of Australia. Comparisons with the results obtained from the R/S analysis, the periodogram method and MF-DFA are provided. The results from the fractional SDEs agree with those from MF-DFA, which are based on multifractal scaling, while those from the periodograms, which are based on the second-order moment, seem to underestimate the long-memory dynamics of the process. This highlights the need for, and usefulness of, fractal methods in modelling non-Gaussian financial processes with long memory.
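The standard DFA that MF-DFA generalises (the q = 2 case mentioned above) can be sketched as follows. Scale choices and the linear (first-order) detrending are illustrative; MF-DFA additionally varies the moment order q and the polynomial order of the fit.

```python
import numpy as np

def dfa(x, scales):
    """Standard (second-moment, q = 2) detrended fluctuation analysis.
    Returns the scaling exponent from a log-log fit of F(n) against n."""
    y = np.cumsum(x - np.mean(x))              # profile (integrated series)
    F = []
    for n in scales:
        n_seg = len(y) // n
        segs = y[:n_seg * n].reshape(n_seg, n)
        t = np.arange(n)
        f2 = []
        for seg in segs:
            coef = np.polyfit(t, seg, 1)       # detrend each window linearly
            f2.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        F.append(np.sqrt(np.mean(f2)))         # fluctuation function F(n)
    slope, _ = np.polyfit(np.log(scales), np.log(F), 1)
    return float(slope)                        # ~0.5 white noise, ~1.5 random walk
```

Exponents near 0.5 indicate no memory, above 0.5 persistence (long memory), and values near 1.5 a random-walk-like nonstationary profile; strong trends are removed by the per-window polynomial fit, which is what protects against the false detection of memory discussed above.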

Relevance: 100.00%

Abstract:

Introduction
The acute health effects of heatwaves in a subtropical climate and their impact on emergency departments (EDs) are not well known. The purpose of this study is to examine overt heat-related presentations to EDs associated with heatwaves in Brisbane.

Methods
Data were obtained for the summer seasons (December to February) from 2000-2012. Heatwave events were defined as two or more successive days with a daily maximum temperature ≥ 34 °C (HWD1) or ≥ 37 °C (HWD2). A Poisson generalised additive model was used to assess the effect of heatwaves on heat-related visits (International Classification of Diseases (ICD) 10 codes T67 and X30; ICD 9 codes 992 and E900.0).

Results
Overall, 628 cases presented with heat-related illnesses. Presentations increased significantly on heatwave days based on HWD1 (relative risk (RR) = 4.9, 95% confidence interval (CI): 3.8, 6.3) and HWD2 (RR = 18.5, 95% CI: 12.0, 28.4). The RRs in different age groups ranged between 3-9.2 (HWD1) and 7.5-37.5 (HWD2). High-acuity visits increased significantly based on HWD1 (RR = 4.7, 95% CI: 2.3, 9.6) and HWD2 (RR = 81.7, 95% CI: 21.5, 310.0). The average length of stay in the ED increased significantly, by more than 1 hour (HWD1) and more than 2 hours (HWD2).

Conclusions
Heatwaves significantly increase ED visits and workload even in a subtropical climate. The degree of impact is directly related to the extent of the temperature increase and varies by the socio-demographic characteristics of the patients. Heatwave action plans should be tailored to the population's needs and level of vulnerability. EDs should have plans to increase their surge capacity during heatwaves.
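The heatwave definition used in the study (two or more successive days at or above a threshold daily maximum temperature) is straightforward to operationalise; a small sketch with hypothetical names:

```python
from typing import List

def heatwave_days(tmax: List[float], threshold: float = 34.0) -> List[bool]:
    """Flag days belonging to a heatwave: runs of two or more successive
    days with daily maximum temperature >= threshold (HWD1-style)."""
    hot = [t >= threshold for t in tmax]
    flags = [False] * len(tmax)
    i = 0
    while i < len(hot):
        if hot[i]:
            j = i
            while j < len(hot) and hot[j]:
                j += 1                    # extend the run of hot days
            if j - i >= 2:                # a run of at least two days qualifies
                for k in range(i, j):
                    flags[k] = True
            i = j
        else:
            i += 1
    return flags
```

Passing `threshold=37.0` gives the stricter HWD2 definition; an isolated hot day is deliberately not flagged.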

Relevance: 100.00%

Abstract:

Cereal grain is one of the main export commodities of Australian agriculture. Over the past decade, crop yield forecasts for wheat and sorghum have shown appreciable utility for industry planning at shire, state, and national scales. There is now an increasing drive from industry for more accurate and cost-effective crop production forecasts. In order to generate production estimates, accurate crop area estimates are needed by the end of the cropping season. Multivariate methods for analysing remotely sensed Enhanced Vegetation Index (EVI) from 16-day Moderate Resolution Imaging Spectroradiometer (MODIS) satellite imagery within the cropping period (i.e. April-November) were investigated to estimate crop area for wheat, barley, chickpea, and total winter cropped area for a case study region in NE Australia. Each pixel classification method was trained on ground truth data collected from the study region. Three approaches to pixel classification were examined: (i) cluster analysis of trajectories of EVI values from consecutive multi-date imagery during the crop growth period; (ii) harmonic analysis of the time series (HANTS) of the EVI values; and (iii) principal component analysis (PCA) of the time series of EVI values. Images classified using these three approaches were compared with each other, and with a classification based on the single MODIS image taken at peak EVI. Imagery for the 2003 and 2004 seasons was used to assess the ability of the methods to determine wheat, barley, chickpea, and total cropped area estimates. The accuracy at pixel scale was determined by the percent correct classification metric by contrasting all pixel scale samples with independent pixel observations. At a shire level, aggregated total crop area estimates were compared with surveyed estimates. All multi-temporal methods showed significant overall capability to estimate total winter crop area. 
There was high accuracy at pixel scale (>98% correct classification) for identifying overall winter cropping. However, discrimination among crops was less accurate. Although the use of single-date EVI data produced high accuracy for estimates of wheat area at shire scale, the result contradicted the poor pixel-scale accuracy associated with this approach, due to fortuitous compensating errors. Further studies are needed to extrapolate the multi-temporal approaches to other geographical areas and to improve the lead time for deriving cropped-area estimates before harvest.
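The harmonic-analysis route (ii) amounts to fitting a mean plus a few sine/cosine pairs to each pixel's EVI trajectory and using the fitted coefficients as classification features. A simplified sketch, without the iterative outlier rejection of full HANTS, and with illustrative parameter names:

```python
import numpy as np

def harmonic_features(t, y, n_harm=2, period=365.0):
    """Least-squares fit of a mean plus n_harm sine/cosine pairs to a
    time series; the coefficients serve as per-pixel features."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harm + 1):
        w = 2.0 * np.pi * k * t / period
        cols += [np.cos(w), np.sin(w)]     # k-th harmonic pair
    A = np.stack(cols, axis=1)             # design matrix
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef                            # [mean, c1, s1, c2, s2, ...]
```

Each pixel is thus reduced to a handful of amplitude/phase features, which a standard classifier can then separate into crop types.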

Relevance: 100.00%

Abstract:

Understanding the regulatory mechanisms that are responsible for an organism's response to environmental change is an important issue in molecular biology. A first and important step towards this goal is to detect genes whose expression levels are affected by altered external conditions. A range of methods to test for differential gene expression, in both static and time-course experiments, has been proposed. While these tests answer the question of whether a gene is differentially expressed, they do not explicitly address the question of when a gene is differentially expressed, although this information may provide insights into the course and causal structure of regulatory programs. In this article, we propose a two-sample test for identifying intervals of differential gene expression in microarray time series. Our approach is based on Gaussian process regression, can deal with arbitrary numbers of replicates, and is robust with respect to outliers. We apply our algorithm to study the response of Arabidopsis thaliana genes to infection by a fungal pathogen, using a microarray time series dataset covering 30,336 gene probes at 24 observed time points. In classification experiments, our test compares favorably with existing methods and provides additional insights into time-dependent differential expression.
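To illustrate the flavour of the approach (not the paper's actual test statistic): fit a Gaussian process regression per condition and flag the time points where the posterior means differ by more than the pooled posterior uncertainty. The RBF kernel, the pointwise z-score, and all names are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, ls=1.0, var=1.0):
    """Squared-exponential (RBF) covariance between two sets of inputs."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(t_train, y_train, t_test, ls=1.0, var=1.0, noise=0.1):
    """Posterior mean and pointwise variance of GP regression."""
    K = rbf(t_train, t_train, ls, var) + noise * np.eye(len(t_train))
    Ks = rbf(t_test, t_train, ls, var)
    Kss = rbf(t_test, t_test, ls, var)
    mu = Ks @ np.linalg.solve(K, y_train)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mu, np.diag(cov)

def differential_intervals(t, y_a, y_b, z_thresh=2.0, **gp_kw):
    """Flag time points where two expression profiles differ: fit one GP
    per condition, then score |mu_a - mu_b| against the pooled
    posterior standard deviation (a pointwise illustrative score)."""
    mu_a, v_a = gp_posterior(t, y_a, t, **gp_kw)
    mu_b, v_b = gp_posterior(t, y_b, t, **gp_kw)
    z = np.abs(mu_a - mu_b) / np.sqrt(v_a + v_b + 1e-12)
    return z > z_thresh
```

Contiguous runs of flagged points would then be reported as intervals of differential expression; replicates could enter by averaging, or by stacking observations at repeated inputs.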