968 resultados para Unsupervised classification
Resumo:
Advances in hardware technologies allow to capture and process data in real-time and the resulting high throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real-time. The creation and real-time adaption of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even so these algorithms are fast, they are challenged by high velocity data streams, where data instances are incoming at a fast rate. This is problematic if the applications desire that there is no or only a very little delay between changes in the patterns of the stream and absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
Resumo:
Three coupled knowledge transfer partnerships used pattern recognition techniques to produce an e-procurement system which, the National Audit Office reports, could save the National Health Service £500 m per annum. An extension to the system, GreenInsight, allows the environmental impact of procurements to be assessed and savings made. Both systems require suitable products to be discovered and equivalent products recognised, for which classification is a key component. This paper describes the innovative work done for product classification, feature selection and reducing the impact of mislabelled data.
Resumo:
We extend extreme learning machine (ELM) classifiers to complex Reproducing Kernel Hilbert Spaces (RKHS) where the input/output variables as well as the optimization variables are complex-valued. A new family of classifiers, called complex-valued ELM (CELM) suitable for complex-valued multiple-input–multiple-output processing is introduced. In the proposed method, the associated Lagrangian is computed using induced RKHS kernels, adopting a Wirtinger calculus approach formulated as a constrained optimization problem similarly to the conventional ELM classifier formulation. When training the CELM, the Karush–Khun–Tuker (KKT) theorem is used to solve the dual optimization problem that consists of satisfying simultaneously smallest training error as well as smallest norm of output weights criteria. The proposed formulation also addresses aspects of quaternary classification within a Clifford algebra context. For 2D complex-valued inputs, user-defined complex-coupled hyper-planes divide the classifier input space into four partitions. For 3D complex-valued inputs, the formulation generates three pairs of complex-coupled hyper-planes through orthogonal projections. The six hyper-planes then divide the 3D space into eight partitions. It is shown that the CELM problem formulation is equivalent to solving six real-valued ELM tasks, which are induced by projecting the chosen complex kernel across the different user-defined coordinate planes. A classification example of powdered samples on the basis of their terahertz spectral signatures is used to demonstrate the advantages of the CELM classifiers compared to their SVM counterparts. The proposed classifiers retain the advantages of their ELM counterparts, in that they can perform multiclass classification with lower computational complexity than SVM classifiers. Furthermore, because of their ability to perform classification tasks fast, the proposed formulations are of interest to real-time applications.
Resumo:
We present a method for the recognition of complex actions. Our method combines automatic learning of simple actions and manual definition of complex actions in a single grammar. Contrary to the general trend in complex action recognition that consists in dividing recognition into two stages, our method performs recognition of simple and complex actions in a unified way. This is performed by encoding simple action HMMs within the stochastic grammar that models complex actions. This unified approach enables a more effective influence of the higher activity layers into the recognition of simple actions which leads to a substantial improvement in the classification of complex actions. We consider the recognition of complex actions based on person transits between areas in the scene. As input, our method receives crossings of tracks along a set of zones which are derived using unsupervised learning of the movement patterns of the objects in the scene. We evaluate our method on a large dataset showing normal, suspicious and threat behaviour on a parking lot. Experiments show an improvement of ~ 30% in the recognition of both high-level scenarios and their composing simple actions with respect to a two-stage approach. Experiments with synthetic noise simulating the most common tracking failures show that our method only experiences a limited decrease in performance when moderate amounts of noise are added.
Resumo:
Involuntary musical imagery (INMI) is the subject of much recent research interest. INMI covers a number of experience types such as musical obsessions and musical hallucinations. One type of experience has been called earworms, for which the literature provides a number of definitions. In this paper we consider the origins of the term earworm in the German language literature and compare that usage with the English language literature. We consider the published literature on earworms and conclude that there is merit in distinguishing between earworms and other types of types of involuntary musical imagery described in the scientific literature: e.g. musical hallucinations, musical obsessions. We also describe other experiences that can be considered under the term INMI. The aim of future research could be to ascertain similarities and differences between types of INMI with a view to refining the classification scheme proposed here.
Resumo:
The objective of this article is to study the problem of pedestrian classification across different light spectrum domains (visible and far-infrared (FIR)) and modalities (intensity, depth and motion). In recent years, there has been a number of approaches for classifying and detecting pedestrians in both FIR and visible images, but the methods are difficult to compare, because either the datasets are not publicly available or they do not offer a comparison between the two domains. Our two primary contributions are the following: (1) we propose a public dataset, named RIFIR , containing both FIR and visible images collected in an urban environment from a moving vehicle during daytime; and (2) we compare the state-of-the-art features in a multi-modality setup: intensity, depth and flow, in far-infrared over visible domains. The experiments show that features families, intensity self-similarity (ISS), local binary patterns (LBP), local gradient patterns (LGP) and histogram of oriented gradients (HOG), computed from FIR and visible domains are highly complementary, but their relative performance varies across different modalities. In our experiments, the FIR domain has proven superior to the visible one for the task of pedestrian classification, but the overall best results are obtained by a multi-domain multi-modality multi-feature fusion.
Resumo:
An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data imposes unique challenges to data mining algorithms, such as concept drifts, the need to analyse the data on the fly due to unbounded data streams and scalable algorithms due to potentially high throughput of data. Real-time classification algorithms that are adaptive to concept drifts and fast exist, however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN is based on an adaptive statistical data summary based on Micro-Clusters. MC-NN is very fast and adaptive to concept drift whilst maintaining the parallel properties of the base KNN classifier. Also MC-NN is competitive compared with existing data stream classifiers in terms of accuracy and speed.
Resumo:
Parkinson is a neurodegenerative disease, in which tremor is the main symptom. This paper investigates the use of different classification methods to identify tremors experienced by Parkinsonian patients.Some previous research has focussed tremor analysis on external body signals (e.g., electromyography, accelerometer signals, etc.). Our advantage is that we have access to sub-cortical data, which facilitates the applicability of the obtained results into real medical devices since we are dealing with brain signals directly. Local field potentials (LFP) were recorded in the subthalamic nucleus of 7 Parkinsonian patients through the implanted electrodes of a deep brain stimulation (DBS) device prior to its internalization. Measured LFP signals were preprocessed by means of splinting, down sampling, filtering, normalization and rec-tification. Then, feature extraction was conducted through a multi-level decomposition via a wavelettrans form. Finally, artificial intelligence techniques were applied to feature selection, clustering of tremor types, and tremor detection.The key contribution of this paper is to present initial results which indicate, to a high degree of certainty, that there appear to be two distinct subgroups of patients within the group-1 of patients according to the Consensus Statement of the Movement Disorder Society on Tremor. Such results may well lead to different resultant treatments for the patients involved, depending on how their tremor has been classified. Moreover, we propose a new approach for demand driven stimulation, in which tremor detection is also based on the subtype of tremor the patient has. Applying this knowledge to the tremor detection problem, it can be concluded that the results improve when patient clustering is applied prior to detection.
Resumo:
An efficient and robust method to measure vitamin D (25-hydroxy vitamin D3 (25(OH)D3) and 25-hydroxy vitamin D2 in dried blood spots (DBS) has been developed and applied in the pan-European multi-centre, internet-based, personalised nutrition intervention study Food4Me. The method includes calibration with blood containing endogenous 25(OH)D3, spotted as DBS and corrected for haematocrit content. The methodology was validated following international standards. The performance characteristics did not reach those of the current gold standard liquid chromatography-MS/MS in plasma for all parameters, but were found to be very suitable for status-level determination under field conditions. DBS sample quality was very high, and 3778 measurements of 25(OH)D3 were obtained from 1465 participants. The study centre and the season within the study centre were very good predictors of 25(OH)D3 levels (P<0·001 for each case). Seasonal effects were modelled by fitting a sine function with a minimum 25(OH)D3 level on 20 January and a maximum on 21 July. The seasonal amplitude varied from centre to centre. The largest difference between winter and summer levels was found in Germany and the smallest in Poland. The model was cross-validated to determine the consistency of the predictions and the performance of the DBS method. The Pearson's correlation between the measured values and the predicted values was r 0·65, and the sd of their differences was 21·2 nmol/l. This includes the analytical variation and the biological variation within subjects. Overall, DBS obtained by unsupervised sampling of the participants at home was a viable methodology for obtaining vitamin D status information in a large nutritional study.
Resumo:
This work investigates the problem of feature selection in neuroimaging features from structural MRI brain images for the classification of subjects as healthy controls, suffering from Mild Cognitive Impairment or Alzheimer’s Disease. A Genetic Algorithm wrapper method for feature selection is adopted in conjunction with a Support Vector Machine classifier. In very large feature sets, feature selection is found to be redundant as the accuracy is often worsened when compared to an Support Vector Machine with no feature selection. However, when just the hippocampal subfields are used, feature selection shows a significant improvement of the classification accuracy. Three-class Support Vector Machines and two-class Support Vector Machines combined with weighted voting are also compared with the former and found more useful. The highest accuracy achieved at classifying the test data was 65.5% using a genetic algorithm for feature selection with a three-class Support Vector Machine classifier.
Resumo:
The personalised conditioning system (PCS) is widely studied. Potentially, it is able to reduce energy consumption while securing occupants’ thermal comfort requirements. It has been suggested that automatic optimised operation schemes for PCS should be introduced to avoid energy wastage and discomfort caused by inappropriate operation. In certain automatic operation schemes, personalised thermal sensation models are applied as key components to help in setting targets for PCS operation. In this research, a novel personal thermal sensation modelling method based on the C-Support Vector Classification (C-SVC) algorithm has been developed for PCS control. The personal thermal sensation modelling has been regarded as a classification problem. During the modelling process, the method ‘learns’ an occupant’s thermal preferences from his/her feedback, environmental parameters and personal physiological and behavioural factors. The modelling method has been verified by comparing the actual thermal sensation vote (TSV) with the modelled one based on 20 individual cases. Furthermore, the accuracy of each individual thermal sensation model has been compared with the outcomes of the PMV model. The results indicate that the modelling method presented in this paper is an effective tool to model personal thermal sensations and could be integrated within the PCS for optimised system operation and control.
Resumo:
Sea-ice concentrations in the Laptev Sea simulated by the coupled North Atlantic-Arctic Ocean-Sea-Ice Model and Finite Element Sea-Ice Ocean Model are evaluated using sea-ice concentrations from Advanced Microwave Scanning Radiometer-Earth Observing System satellite data and a polynya classification method for winter 2007/08. While developed to simulate largescale sea-ice conditions, both models are analysed here in terms of polynya simulation. The main modification of both models in this study is the implementation of a landfast-ice mask. Simulated sea-ice fields from different model runs are compared with emphasis placed on the impact of this prescribed landfast-ice mask. We demonstrate that sea-ice models are not able to simulate flaw polynyas realistically when used without fast-ice description. Our investigations indicate that without landfast ice and with coarse horizontal resolution the models overestimate the fraction of open water in the polynya. This is not because a realistic polynya appears but due to a larger-scale reduction of ice concentrations and smoothed ice-concentration fields. After implementation of a landfast-ice mask, the polynya location is realistically simulated but the total open-water area is still overestimated in most cases. The study shows that the fast-ice parameterization is essential for model improvements. However, further improvements are necessary in order to progress from the simulation of large-scale features in the Arctic towards a more detailed simulation of smaller-scaled features (here polynyas) in an Arctic shelf sea.
Resumo:
Epidendrum L. is the largest genus of Orchidaceae in the Neotropical region; it has an impressive morphological diversification, which imposes difficulties in delimitation of both infrageneric and interspecific boundaries. In this study, we review infrageneric boundaries within the subgenus Amphiglottium and try to contribute to the understanding of morphological diversification and taxa delimitation within this group. We tested the monophyly of the subgenus Amphiglottium sect. Amphiglottium, expanding previous phylogenetic investigations and reevaluated previous infrageneric classifications proposed. Sequence data from the trnL-trnF region were analyzed with both parsimony and maximum likelihood criteria. AFLP markers were also obtained and analyzed with phylogenetic and principal coordinate analyses. Additionally, we obtained chromosome numbers for representative species within the group. The results strengthen the monophyly of the subgenus Amphiglottium but do not support the current classification system proposed by previous authors. Only section Tuberculata comprises a well-supported monophyletic group, with sections Carinata and Integra not supported. Instead of morphology, biogeographical and ecological patterns are reflected in the phylogenetic signal in this group. This study also confirms the large variability of chromosome numbers for the subgenus Amphiglottium (numbers ranging from 2n = 24 to 2n = 240), suggesting that polyploidy and hybridization are probably important mechanisms of speciation within the group.
Resumo:
Predictive performance evaluation is a fundamental issue in design, development, and deployment of classification systems. As predictive performance evaluation is a multidimensional problem, single scalar summaries such as error rate, although quite convenient due to its simplicity, can seldom evaluate all the aspects that a complete and reliable evaluation must consider. Due to this, various graphical performance evaluation methods are increasingly drawing the attention of machine learning, data mining, and pattern recognition communities. The main advantage of these types of methods resides in their ability to depict the trade-offs between evaluation aspects in a multidimensional space rather than reducing these aspects to an arbitrarily chosen (and often biased) single scalar measure. Furthermore, to appropriately select a suitable graphical method for a given task, it is crucial to identify its strengths and weaknesses. This paper surveys various graphical methods often used for predictive performance evaluation. By presenting these methods in the same framework, we hope this paper may shed some light on deciding which methods are more suitable to use in different situations.