984 resultados para Time-series data
Resumo:
Data Mining (DM) methods are being increasingly used in prediction with time series data, in addition to traditional statistical approaches. This paper presents a literature review of the use of DM with time series data, focusing on short- time stocks prediction. This is an area that has been attracting a great deal of attention from researchers in the field. The main contribution of this paper is to provide an outline of the use of DM with time series data, using mainly examples related with short-term stocks prediction. This is important to a better understanding of the field. Some of the main trends and open issues will also be introduced.
Resumo:
In this paper we try to fit a threshold autoregressive (TAR) model to time series data of monthly coconut oil prices at Cochin market. The procedure proposed by Tsay [7] for fitting the TAR model is briefly presented. The fitted model is compared with a simple autoregressive (AR) model. The results are in favour of TAR process. Thus the monthly coconut oil prices exhibit a type of non-linearity which can be accounted for by a threshold model.
Resumo:
Abstract Background A popular model for gene regulatory networks is the Boolean network model. In this paper, we propose an algorithm to perform an analysis of gene regulatory interactions using the Boolean network model and time-series data. Actually, the Boolean network is restricted in the sense that only a subset of all possible Boolean functions are considered. We explore some mathematical properties of the restricted Boolean networks in order to avoid the full search approach. The problem is modeled as a Constraint Satisfaction Problem (CSP) and CSP techniques are used to solve it. Results We applied the proposed algorithm in two data sets. First, we used an artificial dataset obtained from a model for the budding yeast cell cycle. The second data set is derived from experiments performed using HeLa cells. The results show that some interactions can be fully or, at least, partially determined under the Boolean model considered. Conclusions The algorithm proposed can be used as a first step for detection of gene/protein interactions. It is able to infer gene relationships from time-series data of gene expression, and this inference process can be aided by a priori knowledge available.
Resumo:
This work proposes a system for classification of industrial steel pieces by means of magnetic nondestructive device. The proposed classification system presents two main stages, online system stage and off-line system stage. In online stage, the system classifies inputs and saves misclassification information in order to perform posterior analyses. In the off-line optimization stage, the topology of a Probabilistic Neural Network is optimized by a Feature Selection algorithm combined with the Probabilistic Neural Network to increase the classification rate. The proposed Feature Selection algorithm searches for the signal spectrogram by combining three basic elements: a Sequential Forward Selection algorithm, a Feature Cluster Grow algorithm with classification rate gradient analysis and a Sequential Backward Selection. Also, a trash-data recycling algorithm is proposed to obtain the optimal feedback samples selected from the misclassified ones.
Resumo:
Visualization and exploratory analysis is an important part of any data analysis and is made more challenging when the data are voluminous and high-dimensional. One such example is environmental monitoring data, which are often collected over time and at multiple locations, resulting in a geographically indexed multivariate time series. Financial data, although not necessarily containing a geographic component, present another source of high-volume multivariate time series data. We present the mvtsplot function which provides a method for visualizing multivariate time series data. We outline the basic design concepts and provide some examples of its usage by applying it to a database of ambient air pollution measurements in the United States and to a hypothetical portfolio of stocks.
Resumo:
Traffic flow time series data are usually high dimensional and very complex. Also they are sometimes imprecise and distorted due to data collection sensor malfunction. Additionally, events like congestion caused by traffic accidents add more uncertainty to real-time traffic conditions, making traffic flow forecasting a complicated task. This article presents a new data preprocessing method targeting multidimensional time series with a very high number of dimensions and shows its application to real traffic flow time series from the California Department of Transportation (PEMS web site). The proposed method consists of three main steps. First, based on a language for defining events in multidimensional time series, mTESL, we identify a number of types of events in time series that corresponding to either incorrect data or data with interference. Second, each event type is restored utilizing an original method that combines real observations, local forecasted values and historical data. Third, an exponential smoothing procedure is applied globally to eliminate noise interference and other random errors so as to provide good quality source data for future work.
Resumo:
In this study, a methodology based in a dynamical framework is proposed to incorporate additional sources of information to normalized difference vegetation index (NDVI) time series of agricultural observations for a phenological state estimation application. The proposed implementation is based on the particle filter (PF) scheme that is able to integrate multiple sources of data. Moreover, the dynamics-led design is able to conduct real-time (online) estimations, i.e., without requiring to wait until the end of the campaign. The evaluation of the algorithm is performed by estimating the phenological states over a set of rice fields in Seville (SW, Spain). A Landsat-5/7 NDVI series of images is complemented with two distinct sources of information: SAR images from the TerraSAR-X satellite and air temperature information from a ground-based station. An improvement in the overall estimation accuracy is obtained, especially when the time series of NDVI data is incomplete. Evaluations on the sensitivity to different development intervals and on the mitigation of discontinuities of the time series are also addressed in this work, demonstrating the benefits of this data fusion approach based on the dynamic systems.
Resumo:
Mode of access: Internet.
Resumo:
This thesis addresses the problem of information hiding in low dimensional digital data focussing on issues of privacy and security in Electronic Patient Health Records (EPHRs). The thesis proposes a new security protocol based on data hiding techniques for EPHRs. This thesis contends that embedding of sensitive patient information inside the EPHR is the most appropriate solution currently available to resolve the issues of security in EPHRs. Watermarking techniques are applied to one-dimensional time series data such as the electroencephalogram (EEG) to show that they add a level of confidence (in terms of privacy and security) in an individual’s diverse bio-profile (the digital fingerprint of an individual’s medical history), ensure belief that the data being analysed does indeed belong to the correct person, and also that it is not being accessed by unauthorised personnel. Embedding information inside single channel biomedical time series data is more difficult than the standard application for images due to the reduced redundancy. A data hiding approach which has an in built capability to protect against illegal data snooping is developed. The capability of this secure method is enhanced by embedding not just a single message but multiple messages into an example one-dimensional EEG signal. Embedding multiple messages of similar characteristics, for example identities of clinicians accessing the medical record helps in creating a log of access while embedding multiple messages of dissimilar characteristics into an EPHR enhances confidence in the use of the EPHR. The novel method of embedding multiple messages of both similar and dissimilar characteristics into a single channel EEG demonstrated in this thesis shows how this embedding of data boosts the implementation and use of the EPHR securely.
Resumo:
In the face of global population growth and the uneven distribution of water supply, a better knowledge of the spatial and temporal distribution of surface water resources is critical. Remote sensing provides a synoptic view of ongoing processes, which addresses the intricate nature of water surfaces and allows an assessment of the pressures placed on aquatic ecosystems. However, the main challenge in identifying water surfaces from remotely sensed data is the high variability of spectral signatures, both in space and time. In the last 10 years only a few operational methods have been proposed to map or monitor surface water at continental or global scale, and each of them show limitations. The objective of this study is to develop and demonstrate the adequacy of a generic multi-temporal and multi-spectral image analysis method to detect water surfaces automatically, and to monitor them in near-real-time. The proposed approach, based on a transformation of the RGB color space into HSV, provides dynamic information at the continental scale. The validation of the algorithm showed very few omission errors and no commission errors. It demonstrates the ability of the proposed algorithm to perform as effectively as human interpretation of the images. The validation of the permanent water surface product with an independent dataset derived from high resolution imagery, showed an accuracy of 91.5% and few commission errors. Potential applications of the proposed method have been identified and discussed. The methodology that has been developed 27 is generic: it can be applied to sensors with similar bands with good reliability, and minimal effort. Moreover, this experiment at continental scale showed that the methodology is efficient for a large range of environmental conditions. Additional preliminary tests over other continents indicate that the proposed methodology could also be applied at the global scale without too many difficulties
Resumo:
Machine learning is widely adopted to decode multi-variate neural time series, including electroencephalographic (EEG) and single-cell recordings. Recent solutions based on deep learning (DL) outperformed traditional decoders by automatically extracting relevant discriminative features from raw or minimally pre-processed signals. Convolutional Neural Networks (CNNs) have been successfully applied to EEG and are the most common DL-based EEG decoders in the state-of-the-art (SOA). However, the current research is affected by some limitations. SOA CNNs for EEG decoding usually exploit deep and heavy structures with the risk of overfitting small datasets, and architectures are often defined empirically. Furthermore, CNNs are mainly validated by designing within-subject decoders. Crucially, the automatically learned features mainly remain unexplored; conversely, interpreting these features may be of great value to use decoders also as analysis tools, highlighting neural signatures underlying the different decoded brain or behavioral states in a data-driven way. Lastly, SOA DL-based algorithms used to decode single-cell recordings rely on more complex, slower to train and less interpretable networks than CNNs, and the use of CNNs with these signals has not been investigated. This PhD research addresses the previous limitations, with reference to P300 and motor decoding from EEG, and motor decoding from single-neuron activity. CNNs were designed light, compact, and interpretable. Moreover, multiple training strategies were adopted, including transfer learning, which could reduce training times promoting the application of CNNs in practice. Furthermore, CNN-based EEG analyses were proposed to study neural features in the spatial, temporal and frequency domains, and proved to better highlight and enhance relevant neural features related to P300 and motor states than canonical EEG analyses. Remarkably, these analyses could be used, in perspective, to design novel EEG biomarkers for neurological or neurodevelopmental disorders. Lastly, CNNs were developed to decode single-neuron activity, providing a better compromise between performance and model complexity.
Resumo:
Moderate resolution remote sensing data, as provided by MODIS, can be used to detect and map active or past wildfires from daily records of suitable combinations of reflectance bands. The objective of the present work was to develop and test simple algorithms and variations for automatic or semiautomatic detection of burnt areas from time series data of MODIS biweekly vegetation indices for a Mediterranean region. MODIS-derived NDVI 250m time series data for the Valencia region, East Spain, were subjected to a two-step process for the detection of candidate burnt areas, and the results compared with available fire event records from the Valencia Regional Government. For each pixel and date in the data series, a model was fitted to both the previous and posterior time series data. Combining drops between two consecutive points and 1-year average drops, we used discrepancies or jumps between the pre and post models to identify seed pixels, and then delimitated fire scars for each potential wildfire using an extension algorithm from the seed pixels. The resulting maps of the detected burnt areas showed a very good agreement with the perimeters registered in the database of fire records used as reference. Overall accuracies and indices of agreement were very high, and omission and commission errors were similar or lower than in previous studies that used automatic or semiautomatic fire scar detection based on remote sensing. This supports the effectiveness of the method for detecting and mapping burnt areas in the Mediterranean region.
Resumo:
Background: The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to the short time series data in face of the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRNs inference methods based on entropy (mutual information), a new criterion function is here proposed. Results: In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transference function is obtained from random drawing on the set of possible Boolean functions, thus creating its dynamics. On the other hand, DREAM time series data presents variation of network size and its topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. Conclusions: A remarkable improvement of accuracy was observed in the experimental results by reducing the number of false connections in the inferred topology by the non-Shannon entropy. The obtained best free parameter of the Tsallis entropy was on average in the range 2.5 <= q <= 3.5 (hence, subextensive entropy), which opens new perspectives for GRNs inference methods based on information theory and for investigation of the nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/.