Biblioteca Digital

96 resultados para Naive Bayes Classifier

Probabilistic physically-based cloud screening of satellite infra-red imagery for operational sea surface temperature retrieval

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We propose and demonstrate a fully probabilistic (Bayesian) approach to the detection of cloudy pixels in thermal infrared (TIR) imagery observed from satellite over oceans. Using this approach, we show how to exploit the prior information and the fast forward modelling capability that are typically available in the operational context to obtain improved cloud detection. The probability of clear sky for each pixel is estimated by applying Bayes' theorem, and we describe how to apply Bayes' theorem to this problem in general terms. Joint probability density functions (PDFs) of the observations in the TIR channels are needed; the PDFs for clear conditions are calculable from forward modelling and those for cloudy conditions have been obtained empirically. Using analysis fields from numerical weather prediction as prior information, we apply the approach to imagery representative of imagers on polar-orbiting platforms. In comparison with the established cloud-screening scheme, the new technique decreases both the rate of failure to detect cloud contamination and the false-alarm rate by one quarter. The rate of occurrence of cloud-screening-related errors of >1 K in area-averaged SSTs is reduced by 83%. Copyright © 2005 Royal Meteorological Society.

Linear and non-linear (non-)forecastability of high-frequency exchange rates

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper forecasts Daily Sterling exchange rate returns using various naive, linear and non-linear univariate time-series models. The accuracy of the forecasts is evaluated using mean squared error and sign prediction criteria. These show only a very modest improvement over forecasts generated by a random walk model. The Pesaran–Timmerman test and a comparison with forecasts generated artificially shows that even the best models have no evidence of market timing ability.

Sea surface temperature climate change initiative: alternative image classification algorithms for sea-ice affected oceans

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We present a Bayesian image classification scheme for discriminating cloud, clear and sea-ice observations at high latitudes to improve identification of areas of clear-sky over ice-free ocean for SST retrieval. We validate the image classification against a manually classified dataset using Advanced Along Track Scanning Radiometer (AATSR) data. A three way classification scheme using a near-infrared textural feature improves classifier accuracy by 9.9 % over the nadir only version of the cloud clearing used in the ATSR Reprocessing for Climate (ARC) project in high latitude regions. The three way classification gives similar numbers of cloud and ice scenes misclassified as clear but significantly more clear-sky cases are correctly identified (89.9 % compared with 65 % for ARC). We also demonstrate the poetential of a Bayesian image classifier including information from the 0.6 micron channel to be used in sea-ice extent and ice surface temperature retrieval with 77.7 % of ice scenes correctly identified and an overall classifier accuracy of 96 %.

PDFOS: PDF estimation based over-sampling for imbalanced two-class problems

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This contribution proposes a novel probability density function (PDF) estimation based over-sampling (PDFOS) approach for two-class imbalanced classification problems. The classical Parzen-window kernel function is adopted to estimate the PDF of the positive class. Then according to the estimated PDF, synthetic instances are generated as the additional training data. The essential concept is to re-balance the class distribution of the original imbalanced data set under the principle that synthetic data sample follows the same statistical properties. Based on the over-sampled training data, the radial basis function (RBF) classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier’s structure and the parameters of RBF kernels are determined using a particle swarm optimisation algorithm based on the criterion of minimising the leave-one-out misclassification rate. The effectiveness of the proposed PDFOS approach is demonstrated by the empirical study on several imbalanced data sets.

Novel single trial movement classification based on temporal dynamics of EEG

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Various complex oscillatory processes are involved in the generation of the motor command. The temporal dynamics of these processes were studied for movement detection from single trial electroencephalogram (EEG). Autocorrelation analysis was performed on the EEG signals to find robust markers of movement detection. The evolution of the autocorrelation function was characterised via the relaxation time of the autocorrelation by exponential curve fitting. It was observed that the decay constant of the exponential curve increased during movement, indicating that the autocorrelation function decays slowly during motor execution. Significant differences were observed between movement and no moment tasks. Additionally, a linear discriminant analysis (LDA) classifier was used to identify movement trials with a peak accuracy of 74%.

Smart-phone based electrocardiogram wavelet decomposition and neural network classification

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper discusses ECG classification after parametrizing the ECG waveforms in the wavelet domain. The aim of the work is to develop an accurate classification algorithm that can be used to diagnose cardiac beat abnormalities detected using a mobile platform such as smart-phones. Continuous time recurrent neural network classifiers are considered for this task. Records from the European ST-T Database are decomposed in the wavelet domain using discrete wavelet transform (DWT) filter banks and the resulting DWT coefficients are filtered and used as inputs for training the neural network classifier. Advantages of the proposed methodology are the reduced memory requirement for the signals which is of relevance to mobile applications as well as an improvement in the ability of the neural network in its generalization ability due to the more parsimonious representation of the signal to its inputs.

On the application of optimal wavelet filter banks for ECG signal classification

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper discusses ECG signal classification after parametrizing the ECG waveforms in the wavelet domain. Signal decomposition using perfect reconstruction quadrature mirror filter banks can provide a very parsimonious representation of ECG signals. In the current work, the filter parameters are adjusted by a numerical optimization algorithm in order to minimize a cost function associated to the filter cut-off sharpness. The goal consists of achieving a better compromise between frequency selectivity and time resolution at each decomposition level than standard orthogonal filter banks such as those of the Daubechies and Coiflet families. Our aim is to optimally decompose the signals in the wavelet domain so that they can be subsequently used as inputs for training to a neural network classifier.

Complex extreme learning machine applications in terahertz pulsed signals feature sets

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents a novel approach to the automatic classification of very large data sets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. Firstly a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem of spectra from six different powder samples that although have fairly indistinguishable features in the optical spectrum, they also possess a few discernable spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude as well as the phase of the recorded spectra. Classification speed and accuracy are contrasted with that achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for classification of the very large data sets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object as would be required within a tomographic setting and is sufficiently robust to detect patterns hidden inside noisy terahertz data sets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinic diagnosis. Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large datasets.

Representation errors and retrievals in linear and nonlinear data assimilation

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This article shows how one can formulate the representation problem starting from Bayes’ theorem. The purpose of this article is to raise awareness of the formal solutions,so that approximations can be placed in a proper context. The representation errors appear in the likelihood, and the different possibilities for the representation of reality in model and observations are discussed, including nonlinear representation probability density functions. Specifically, the assumptions needed in the usual procedure to add a representation error covariance to the error covariance of the observations are discussed,and it is shown that, when several sub-grid observations are present, their mean still has a representation error ; socalled ‘superobbing’ does not resolve the issue. Connection is made to the off-line or on-line retrieval problem, providing a new simple proof of the equivalence of assimilating linear retrievals and original observations. Furthermore, it is shown how nonlinear retrievals can be assimilated without loss of information. Finally we discuss how errors in the observation operator model can be treated consistently in the Bayesian framework, connecting to previous work in this area.

Computationally efficient rule-based classification for continuous streaming data

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Advances in hardware and software technologies allow to capture streaming data. The area of Data Stream Mining (DSM) is concerned with the analysis of these vast amounts of data as it is generated in real-time. Data stream classification is one of the most important DSM techniques allowing to classify previously unseen data instances. Different to traditional classifiers for static data, data stream classifiers need to adapt to concept changes (concept drift) in the stream in real-time in order to reflect the most recent concept in the data as accurately as possible. A recent addition to the data stream classifier toolbox is eRules which induces and updates a set of expressive rules that can easily be interpreted by humans. However, like most rule-based data stream classifiers, eRules exhibits a poor computational performance when confronted with continuous attributes. In this work, we propose an approach to deal with continuous data effectively and accurately in rule-based classifiers by using the Gaussian distribution as heuristic for building rule terms on continuous attributes. We show on the example of eRules that incorporating our method for continuous attributes indeed speeds up the real-time rule induction process while maintaining a similar level of accuracy compared with the original eRules classifier. We termed this new version of eRules with our approach G-eRules.

Towards a parallel computationally efficient approach to scaling up data stream classification

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Advances in hardware technologies allow to capture and process data in real-time and the resulting high throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real-time. The creation and real-time adaption of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even so these algorithms are fast, they are challenged by high velocity data streams, where data instances are incoming at a fast rate. This is problematic if the applications desire that there is no or only a very little delay between changes in the patterns of the stream and absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.

Application of complex extreme learning machine to multiclass classification problems with high dimensionality: a THz spectra classification problem

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We extend extreme learning machine (ELM) classifiers to complex Reproducing Kernel Hilbert Spaces (RKHS) where the input/output variables as well as the optimization variables are complex-valued. A new family of classifiers, called complex-valued ELM (CELM) suitable for complex-valued multiple-input–multiple-output processing is introduced. In the proposed method, the associated Lagrangian is computed using induced RKHS kernels, adopting a Wirtinger calculus approach formulated as a constrained optimization problem similarly to the conventional ELM classifier formulation. When training the CELM, the Karush–Khun–Tuker (KKT) theorem is used to solve the dual optimization problem that consists of satisfying simultaneously smallest training error as well as smallest norm of output weights criteria. The proposed formulation also addresses aspects of quaternary classification within a Clifford algebra context. For 2D complex-valued inputs, user-defined complex-coupled hyper-planes divide the classifier input space into four partitions. For 3D complex-valued inputs, the formulation generates three pairs of complex-coupled hyper-planes through orthogonal projections. The six hyper-planes then divide the 3D space into eight partitions. It is shown that the CELM problem formulation is equivalent to solving six real-valued ELM tasks, which are induced by projecting the chosen complex kernel across the different user-defined coordinate planes. A classification example of powdered samples on the basis of their terahertz spectral signatures is used to demonstrate the advantages of the CELM classifiers compared to their SVM counterparts. The proposed classifiers retain the advantages of their ELM counterparts, in that they can perform multiclass classification with lower computational complexity than SVM classifiers. Furthermore, because of their ability to perform classification tasks fast, the proposed formulations are of interest to real-time applications.

A scalable expressive ensemble learning using Random Prism: a MapReduce approach

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The induction of classification rules from previously unseen examples is one of the most important data mining tasks in science as well as commercial applications. In order to reduce the influence of noise in the data, ensemble learners are often applied. However, most ensemble learners are based on decision tree classifiers which are affected by noise. The Random Prism classifier has recently been proposed as an alternative to the popular Random Forests classifier, which is based on decision trees. Random Prism is based on the Prism family of algorithms, which is more robust to noise. However, like most ensemble classification approaches, Random Prism also does not scale well on large training data. This paper presents a thorough discussion of Random Prism and a recently proposed parallel version of it called Parallel Random Prism. Parallel Random Prism is based on the MapReduce programming paradigm. The paper provides, for the first time, novel theoretical analysis of the proposed technique and in-depth experimental study that show that Parallel Random Prism scales well on a large number of training examples, a large number of data features and a large number of processors. Expressiveness of decision rules that our technique produces makes it a natural choice for Big Data applications where informed decision making increases the user’s trust in the system.

Motor imagery-induced EEG patterns in individuals with spinal cord injury and their impact on brain–computer interface accuracy

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Objective. Assimilating the diagnosis complete spinal cord injury (SCI) takes time and is not easy, as patients know that there is no ‘cure’ at the present time. Brain–computer interfaces (BCIs) can facilitate daily living. However, inter-subject variability demands measurements with potential user groups and an understanding of how they differ to healthy users BCIs are more commonly tested with. Thus, a three-class motor imagery (MI) screening (left hand, right hand, feet) was performed with a group of 10 able-bodied and 16 complete spinal-cord-injured people (paraplegics, tetraplegics) with the objective of determining what differences were present between the user groups and how they would impact upon the ability of these user groups to interact with a BCI. Approach. Electrophysiological differences between patient groups and healthy users are measured in terms of sensorimotor rhythm deflections from baseline during MI, electroencephalogram microstate scalp maps and strengths of inter-channel phase synchronization. Additionally, using a common spatial pattern algorithm and a linear discriminant analysis classifier, the classification accuracy was calculated and compared between groups. Main results. It is seen that both patient groups (tetraplegic and paraplegic) have some significant differences in event-related desynchronization strengths, exhibit significant increases in synchronization and reach significantly lower accuracies (mean (M) = 66.1%) than the group of healthy subjects (M = 85.1%). Significance. The results demonstrate significant differences in electrophysiological correlates of motor control between healthy individuals and those individuals who stand to benefit most from BCI technology (individuals with SCI). They highlight the difficulty in directly translating results from healthy subjects to participants with SCI and the challenges that, therefore, arise in providing BCIs to such individuals

Motor imagery induced EEG patterns in spinal cord injury patients and their impact on Brain-Computer Interface accuracy

Relevância:

10.00% 10.00%

Publicador:

Resumo:

OBJECTIVE: Assimilating the diagnosis complete spinal cord injury (SCI) takes time and is not easy, as patients know that there is no 'cure' at the present time. Brain-computer interfaces (BCIs) can facilitate daily living. However, inter-subject variability demands measurements with potential user groups and an understanding of how they differ to healthy users BCIs are more commonly tested with. Thus, a three-class motor imagery (MI) screening (left hand, right hand, feet) was performed with a group of 10 able-bodied and 16 complete spinal-cord-injured people (paraplegics, tetraplegics) with the objective of determining what differences were present between the user groups and how they would impact upon the ability of these user groups to interact with a BCI. APPROACH: Electrophysiological differences between patient groups and healthy users are measured in terms of sensorimotor rhythm deflections from baseline during MI, electroencephalogram microstate scalp maps and strengths of inter-channel phase synchronization. Additionally, using a common spatial pattern algorithm and a linear discriminant analysis classifier, the classification accuracy was calculated and compared between groups. MAIN RESULTS: It is seen that both patient groups (tetraplegic and paraplegic) have some significant differences in event-related desynchronization strengths, exhibit significant increases in synchronization and reach significantly lower accuracies (mean (M) = 66.1%) than the group of healthy subjects (M = 85.1%). SIGNIFICANCE: The results demonstrate significant differences in electrophysiological correlates of motor control between healthy individuals and those individuals who stand to benefit most from BCI technology (individuals with SCI). They highlight the difficulty in directly translating results from healthy subjects to participants with SCI and the challenges that, therefore, arise in providing BCIs to such individuals.

«
1
2
3
4
5
6
7
»