839 resultados para Geomechanic classification
Resumo:
Advances in hardware technologies allow to capture and process data in real-time and the resulting high throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real-time. The creation and real-time adaption of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even so these algorithms are fast, they are challenged by high velocity data streams, where data instances are incoming at a fast rate. This is problematic if the applications desire that there is no or only a very little delay between changes in the patterns of the stream and absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
Resumo:
Three coupled knowledge transfer partnerships used pattern recognition techniques to produce an e-procurement system which, the National Audit Office reports, could save the National Health Service £500 m per annum. An extension to the system, GreenInsight, allows the environmental impact of procurements to be assessed and savings made. Both systems require suitable products to be discovered and equivalent products recognised, for which classification is a key component. This paper describes the innovative work done for product classification, feature selection and reducing the impact of mislabelled data.
Resumo:
We extend extreme learning machine (ELM) classifiers to complex Reproducing Kernel Hilbert Spaces (RKHS) where the input/output variables as well as the optimization variables are complex-valued. A new family of classifiers, called complex-valued ELM (CELM) suitable for complex-valued multiple-input–multiple-output processing is introduced. In the proposed method, the associated Lagrangian is computed using induced RKHS kernels, adopting a Wirtinger calculus approach formulated as a constrained optimization problem similarly to the conventional ELM classifier formulation. When training the CELM, the Karush–Khun–Tuker (KKT) theorem is used to solve the dual optimization problem that consists of satisfying simultaneously smallest training error as well as smallest norm of output weights criteria. The proposed formulation also addresses aspects of quaternary classification within a Clifford algebra context. For 2D complex-valued inputs, user-defined complex-coupled hyper-planes divide the classifier input space into four partitions. For 3D complex-valued inputs, the formulation generates three pairs of complex-coupled hyper-planes through orthogonal projections. The six hyper-planes then divide the 3D space into eight partitions. It is shown that the CELM problem formulation is equivalent to solving six real-valued ELM tasks, which are induced by projecting the chosen complex kernel across the different user-defined coordinate planes. A classification example of powdered samples on the basis of their terahertz spectral signatures is used to demonstrate the advantages of the CELM classifiers compared to their SVM counterparts. The proposed classifiers retain the advantages of their ELM counterparts, in that they can perform multiclass classification with lower computational complexity than SVM classifiers. Furthermore, because of their ability to perform classification tasks fast, the proposed formulations are of interest to real-time applications.
Resumo:
Involuntary musical imagery (INMI) is the subject of much recent research interest. INMI covers a number of experience types such as musical obsessions and musical hallucinations. One type of experience has been called earworms, for which the literature provides a number of definitions. In this paper we consider the origins of the term earworm in the German language literature and compare that usage with the English language literature. We consider the published literature on earworms and conclude that there is merit in distinguishing between earworms and other types of types of involuntary musical imagery described in the scientific literature: e.g. musical hallucinations, musical obsessions. We also describe other experiences that can be considered under the term INMI. The aim of future research could be to ascertain similarities and differences between types of INMI with a view to refining the classification scheme proposed here.
Resumo:
The objective of this article is to study the problem of pedestrian classification across different light spectrum domains (visible and far-infrared (FIR)) and modalities (intensity, depth and motion). In recent years, there has been a number of approaches for classifying and detecting pedestrians in both FIR and visible images, but the methods are difficult to compare, because either the datasets are not publicly available or they do not offer a comparison between the two domains. Our two primary contributions are the following: (1) we propose a public dataset, named RIFIR , containing both FIR and visible images collected in an urban environment from a moving vehicle during daytime; and (2) we compare the state-of-the-art features in a multi-modality setup: intensity, depth and flow, in far-infrared over visible domains. The experiments show that features families, intensity self-similarity (ISS), local binary patterns (LBP), local gradient patterns (LGP) and histogram of oriented gradients (HOG), computed from FIR and visible domains are highly complementary, but their relative performance varies across different modalities. In our experiments, the FIR domain has proven superior to the visible one for the task of pedestrian classification, but the overall best results are obtained by a multi-domain multi-modality multi-feature fusion.
Resumo:
An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data imposes unique challenges to data mining algorithms, such as concept drifts, the need to analyse the data on the fly due to unbounded data streams and scalable algorithms due to potentially high throughput of data. Real-time classification algorithms that are adaptive to concept drifts and fast exist, however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN is based on an adaptive statistical data summary based on Micro-Clusters. MC-NN is very fast and adaptive to concept drift whilst maintaining the parallel properties of the base KNN classifier. Also MC-NN is competitive compared with existing data stream classifiers in terms of accuracy and speed.
Resumo:
Parkinson is a neurodegenerative disease, in which tremor is the main symptom. This paper investigates the use of different classification methods to identify tremors experienced by Parkinsonian patients.Some previous research has focussed tremor analysis on external body signals (e.g., electromyography, accelerometer signals, etc.). Our advantage is that we have access to sub-cortical data, which facilitates the applicability of the obtained results into real medical devices since we are dealing with brain signals directly. Local field potentials (LFP) were recorded in the subthalamic nucleus of 7 Parkinsonian patients through the implanted electrodes of a deep brain stimulation (DBS) device prior to its internalization. Measured LFP signals were preprocessed by means of splinting, down sampling, filtering, normalization and rec-tification. Then, feature extraction was conducted through a multi-level decomposition via a wavelettrans form. Finally, artificial intelligence techniques were applied to feature selection, clustering of tremor types, and tremor detection.The key contribution of this paper is to present initial results which indicate, to a high degree of certainty, that there appear to be two distinct subgroups of patients within the group-1 of patients according to the Consensus Statement of the Movement Disorder Society on Tremor. Such results may well lead to different resultant treatments for the patients involved, depending on how their tremor has been classified. Moreover, we propose a new approach for demand driven stimulation, in which tremor detection is also based on the subtype of tremor the patient has. Applying this knowledge to the tremor detection problem, it can be concluded that the results improve when patient clustering is applied prior to detection.
Resumo:
This work investigates the problem of feature selection in neuroimaging features from structural MRI brain images for the classification of subjects as healthy controls, suffering from Mild Cognitive Impairment or Alzheimer’s Disease. A Genetic Algorithm wrapper method for feature selection is adopted in conjunction with a Support Vector Machine classifier. In very large feature sets, feature selection is found to be redundant as the accuracy is often worsened when compared to an Support Vector Machine with no feature selection. However, when just the hippocampal subfields are used, feature selection shows a significant improvement of the classification accuracy. Three-class Support Vector Machines and two-class Support Vector Machines combined with weighted voting are also compared with the former and found more useful. The highest accuracy achieved at classifying the test data was 65.5% using a genetic algorithm for feature selection with a three-class Support Vector Machine classifier.
Resumo:
The personalised conditioning system (PCS) is widely studied. Potentially, it is able to reduce energy consumption while securing occupants’ thermal comfort requirements. It has been suggested that automatic optimised operation schemes for PCS should be introduced to avoid energy wastage and discomfort caused by inappropriate operation. In certain automatic operation schemes, personalised thermal sensation models are applied as key components to help in setting targets for PCS operation. In this research, a novel personal thermal sensation modelling method based on the C-Support Vector Classification (C-SVC) algorithm has been developed for PCS control. The personal thermal sensation modelling has been regarded as a classification problem. During the modelling process, the method ‘learns’ an occupant’s thermal preferences from his/her feedback, environmental parameters and personal physiological and behavioural factors. The modelling method has been verified by comparing the actual thermal sensation vote (TSV) with the modelled one based on 20 individual cases. Furthermore, the accuracy of each individual thermal sensation model has been compared with the outcomes of the PMV model. The results indicate that the modelling method presented in this paper is an effective tool to model personal thermal sensations and could be integrated within the PCS for optimised system operation and control.
Resumo:
Sea-ice concentrations in the Laptev Sea simulated by the coupled North Atlantic-Arctic Ocean-Sea-Ice Model and Finite Element Sea-Ice Ocean Model are evaluated using sea-ice concentrations from Advanced Microwave Scanning Radiometer-Earth Observing System satellite data and a polynya classification method for winter 2007/08. While developed to simulate largescale sea-ice conditions, both models are analysed here in terms of polynya simulation. The main modification of both models in this study is the implementation of a landfast-ice mask. Simulated sea-ice fields from different model runs are compared with emphasis placed on the impact of this prescribed landfast-ice mask. We demonstrate that sea-ice models are not able to simulate flaw polynyas realistically when used without fast-ice description. Our investigations indicate that without landfast ice and with coarse horizontal resolution the models overestimate the fraction of open water in the polynya. This is not because a realistic polynya appears but due to a larger-scale reduction of ice concentrations and smoothed ice-concentration fields. After implementation of a landfast-ice mask, the polynya location is realistically simulated but the total open-water area is still overestimated in most cases. The study shows that the fast-ice parameterization is essential for model improvements. However, further improvements are necessary in order to progress from the simulation of large-scale features in the Arctic towards a more detailed simulation of smaller-scaled features (here polynyas) in an Arctic shelf sea.
Resumo:
Epidendrum L. is the largest genus of Orchidaceae in the Neotropical region; it has an impressive morphological diversification, which imposes difficulties in delimitation of both infrageneric and interspecific boundaries. In this study, we review infrageneric boundaries within the subgenus Amphiglottium and try to contribute to the understanding of morphological diversification and taxa delimitation within this group. We tested the monophyly of the subgenus Amphiglottium sect. Amphiglottium, expanding previous phylogenetic investigations and reevaluated previous infrageneric classifications proposed. Sequence data from the trnL-trnF region were analyzed with both parsimony and maximum likelihood criteria. AFLP markers were also obtained and analyzed with phylogenetic and principal coordinate analyses. Additionally, we obtained chromosome numbers for representative species within the group. The results strengthen the monophyly of the subgenus Amphiglottium but do not support the current classification system proposed by previous authors. Only section Tuberculata comprises a well-supported monophyletic group, with sections Carinata and Integra not supported. Instead of morphology, biogeographical and ecological patterns are reflected in the phylogenetic signal in this group. This study also confirms the large variability of chromosome numbers for the subgenus Amphiglottium (numbers ranging from 2n = 24 to 2n = 240), suggesting that polyploidy and hybridization are probably important mechanisms of speciation within the group.
Resumo:
Predictive performance evaluation is a fundamental issue in design, development, and deployment of classification systems. As predictive performance evaluation is a multidimensional problem, single scalar summaries such as error rate, although quite convenient due to its simplicity, can seldom evaluate all the aspects that a complete and reliable evaluation must consider. Due to this, various graphical performance evaluation methods are increasingly drawing the attention of machine learning, data mining, and pattern recognition communities. The main advantage of these types of methods resides in their ability to depict the trade-offs between evaluation aspects in a multidimensional space rather than reducing these aspects to an arbitrarily chosen (and often biased) single scalar measure. Furthermore, to appropriately select a suitable graphical method for a given task, it is crucial to identify its strengths and weaknesses. This paper surveys various graphical methods often used for predictive performance evaluation. By presenting these methods in the same framework, we hope this paper may shed some light on deciding which methods are more suitable to use in different situations.
Resumo:
This work proposes and discusses an approach for inducing Bayesian classifiers aimed at balancing the tradeoff between the precise probability estimates produced by time consuming unrestricted Bayesian networks and the computational efficiency of Naive Bayes (NB) classifiers. The proposed approach is based on the fundamental principles of the Heuristic Search Bayesian network learning. The Markov Blanket concept, as well as a proposed ""approximate Markov Blanket"" are used to reduce the number of nodes that form the Bayesian network to be induced from data. Consequently, the usually high computational cost of the heuristic search learning algorithms can be lessened, while Bayesian network structures better than NB can be achieved. The resulting algorithms, called DMBC (Dynamic Markov Blanket Classifier) and A-DMBC (Approximate DMBC), are empirically assessed in twelve domains that illustrate scenarios of particular interest. The obtained results are compared with NB and Tree Augmented Network (TAN) classifiers, and confinn that both proposed algorithms can provide good classification accuracies and better probability estimates than NB and TAN, while being more computationally efficient than the widely used K2 Algorithm.
Resumo:
The substitution of missing values, also called imputation, is an important data preparation task for many domains. Ideally, the substitution of missing values should not insert biases into the dataset. This aspect has been usually assessed by some measures of the prediction capability of imputation methods. Such measures assume the simulation of missing entries for some attributes whose values are actually known. These artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it does not allow the influence of imputed values in the ultimate modelling task (e.g. in classification) to be inferred. We argue that imputation cannot be properly evaluated apart from the modelling task. Thus, alternative approaches are needed. This article elaborates on the influence of imputed values in classification. In particular, a practical procedure for estimating the inserted bias is described. As an additional contribution, we have used such a procedure to empirically illustrate the performance of three imputation methods (majority, naive Bayes and Bayesian networks) in three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) have been used as modelling tools in our experiments. The achieved results illustrate a variety of situations that can take place in the data preparation practice.