1000 results for 319-C0010A


Relevance:

10.00% 10.00%

Publisher:

Abstract:

Advances in hardware and software technologies make it possible to capture streaming data. The area of Data Stream Mining (DSM) is concerned with the analysis of these vast amounts of data as they are generated in real time. Data stream classification is one of the most important DSM techniques, allowing previously unseen data instances to be classified. Unlike traditional classifiers for static data, data stream classifiers need to adapt to concept changes (concept drift) in the stream in real time in order to reflect the most recent concept in the data as accurately as possible. A recent addition to the data stream classifier toolbox is eRules, which induces and updates a set of expressive rules that can easily be interpreted by humans. However, like most rule-based data stream classifiers, eRules exhibits poor computational performance when confronted with continuous attributes. In this work, we propose an approach to deal with continuous data effectively and accurately in rule-based classifiers by using the Gaussian distribution as a heuristic for building rule terms on continuous attributes. Using eRules as an example, we show that incorporating our method for continuous attributes indeed speeds up the real-time rule induction process while maintaining a level of accuracy similar to that of the original eRules classifier. We call this new version of eRules, which incorporates our approach, G-eRules.
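The abstract does not spell out how the Gaussian heuristic turns a continuous attribute into a rule term, so the following is only a minimal sketch under stated assumptions, not the G-eRules procedure itself. It assumes a rule term is an interval of `k` standard deviations around the mean of the attribute values observed for a target class. The names `gaussian_rule_term`, `term_precision` and the parameter `k` are hypothetical. The point it illustrates is that a mean and a standard deviation can be computed in one pass, without testing every candidate split point.

```python
from statistics import mean, pstdev


def gaussian_rule_term(values, labels, target, k=1.0):
    """Return (low, high) bounds of a candidate rule term low <= x <= high.

    Assumption: the bounds are mean +/- k standard deviations of the
    attribute values that belong to the target class.
    """
    in_class = [v for v, y in zip(values, labels) if y == target]
    mu = mean(in_class)
    sigma = pstdev(in_class)  # population standard deviation
    return mu - k * sigma, mu + k * sigma


def term_precision(values, labels, target, low, high):
    """Fraction of instances covered by the term that carry the target class."""
    covered = [y for v, y in zip(values, labels) if low <= v <= high]
    if not covered:
        return 0.0
    return sum(1 for y in covered if y == target) / len(covered)


if __name__ == "__main__":
    # Toy data: one continuous attribute, two classes.
    values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
    labels = ["a", "a", "a", "b", "b", "b"]
    low, high = gaussian_rule_term(values, labels, "a", k=2.0)
    print(low, high, term_precision(values, labels, "a", low, high))
```

In a streaming setting the mean and variance would typically be maintained incrementally rather than recomputed from a stored list; the list-based version is used here only to keep the sketch short.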

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Advances in hardware technologies make it possible to capture and process data in real time, and the resulting high-throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) develops data mining algorithms that allow us to analyse these continuous streams of data in real time. The creation and real-time adaptation of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even though these algorithms are fast, they are challenged by high-velocity data streams, in which data instances arrive at a fast rate. This is problematic if an application requires little or no delay between changes in the patterns of the stream and the absorption of these patterns by the classifier. Problems of scalability to Big Data in traditional data mining algorithms for static (non-streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
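To make the idea of a real-time adaptive KNN concrete, here is a minimal single-threaded sketch. It assumes adaptation is achieved with a fixed-size sliding window of the most recent labelled instances, which is one common way to handle concept drift and not necessarily the paper's design. The class name `SlidingWindowKNN` and its parameters are hypothetical, and the parallelisation that the abstract investigates is not shown.

```python
import math
from collections import Counter, deque


class SlidingWindowKNN:
    """KNN over the most recent labelled instances of a stream."""

    def __init__(self, k=3, window_size=100):
        self.k = k
        # A bounded deque drops the oldest instance automatically,
        # so outdated concepts fade out of the training set.
        self.window = deque(maxlen=window_size)

    def learn(self, x, y):
        """Absorb one labelled instance from the stream."""
        self.window.append((tuple(x), y))

    def predict(self, x):
        """Majority vote among the k nearest instances in the window."""
        if not self.window:
            return None
        nearest = sorted(self.window, key=lambda item: math.dist(item[0], x))
        votes = Counter(label for _, label in nearest[: self.k])
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    clf = SlidingWindowKNN(k=3, window_size=4)
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
    for p in points:
        clf.learn(p, "a")
    print(clf.predict((0.05, 0.05)))
    # Simulated concept drift: the same region is now labelled "b".
    for p in points:
        clf.learn(p, "b")
    print(clf.predict((0.05, 0.05)))
```

Each prediction scans the whole window, so its cost grows with the window size. That distance computation is the part a parallel design would most naturally distribute.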

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The urban heat island is a well-known phenomenon that affects a wide variety of city operations. With the greater availability of cheap meteorological sensors, it is possible to measure the spatial patterns of urban atmospheric characteristics at higher resolution. To develop robust and resilient networks, recognizing that sensors may malfunction, it is important to know when measurement points are providing additional information and what minimum number of sensors is needed to provide spatial information for particular applications. Here we consider the example of temperature data and the urban heat island through analysis of a network of sensors in the Tokyo metropolitan area (Extended METROS). The effect of reducing the number of observation points in an existing meteorological measurement network is considered, using random sampling and sampling with clustering. The results indicate that sampling with hierarchical clustering can yield similar temperature patterns with up to a 30% reduction in measurement sites in Tokyo. The methods presented have broader utility in evaluating the robustness and resilience of existing urban temperature networks and in showing how networks can be enhanced by new mobile and open data sources.
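The sampling-with-clustering idea can be sketched as follows. This is an illustration under assumptions, not the study's procedure: the abstract does not state the linkage criterion, the distance measure, or how a site is chosen from each cluster. Here sensors are grouped by average-linkage agglomerative clustering on their temperature series, and the sensor closest to each cluster's mean series is kept. The function names and the toy readings are hypothetical.

```python
import math


def average_linkage(a, b, series):
    """Mean pairwise Euclidean distance between two clusters of sensors."""
    total = sum(math.dist(series[i], series[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def cluster_sensors(series, n_clusters):
    """Agglomerative clustering; assumes 1 <= n_clusters <= len(series)."""
    clusters = [[sensor_id] for sensor_id in series]
    while len(clusters) > n_clusters:
        best = None
        # Find the closest pair of clusters and merge it.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], series)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def pick_representatives(series, n_clusters):
    """Keep one sensor per cluster: the one nearest the cluster mean series."""
    kept = []
    for members in cluster_sensors(series, n_clusters):
        steps = len(series[members[0]])
        centroid = [
            sum(series[m][t] for m in members) / len(members)
            for t in range(steps)
        ]
        kept.append(min(members, key=lambda m: math.dist(series[m], centroid)))
    return sorted(kept)


if __name__ == "__main__":
    # Hypothetical temperature series (degrees Celsius) for four sensors.
    readings = {
        "s1": [20.0, 21.0, 22.0],
        "s2": [20.1, 21.1, 22.1],
        "s3": [25.0, 26.0, 27.0],
        "s4": [25.2, 26.1, 27.0],
    }
    print(pick_representatives(readings, n_clusters=2))
```

Reducing four sensors to two in this toy case keeps one site from each temperature regime, which is the property the clustered sampling is meant to preserve.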

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Learning a low-dimensional manifold from highly nonlinear, high-dimensional data has become increasingly important for discovering intrinsic representations that can be used for data visualization and preprocessing. The autoencoder is a powerful dimensionality reduction technique based on minimizing reconstruction error, and it has regained popularity because it has been used efficiently for greedy pretraining of deep neural networks. Gaussian Processes (GPs) have been shown to be superior to Neural Networks (NNs) in model inference, optimization and performance. GPs have been successfully applied in nonlinear Dimensionality Reduction (DR) algorithms, such as the Gaussian Process Latent Variable Model (GPLVM). In this paper we propose the Gaussian Processes Autoencoder Model (GPAM) for dimensionality reduction by extending the classic NN-based autoencoder to a GP-based autoencoder. More interestingly, the novel model can also be viewed as a back-constrained GPLVM (BC-GPLVM) in which the smooth back-constraint function is represented by a GP. Experiments verify the performance of the newly proposed model.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In the last decade, several research results have presented formulations for the auto-calibration problem. Most of these have relied on the evaluation of vanishing points to extract the camera parameters. Normally, vanishing points are evaluated using pedestrians or the Manhattan World assumption, i.e. it is assumed that the scene is necessarily composed of orthogonal planar surfaces. In this work, we present a robust framework for auto-calibration, with improved results and generalisability for real-life situations. This framework is capable of handling problems such as occlusions and the presence of unexpected objects in the scene. In our tests, we compare our formulation with the state of the art in auto-calibration using pedestrians and Manhattan World-based assumptions. This paper reports on experiments conducted using publicly available datasets; the results show that our formulation represents an improvement over the state of the art.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Human brain imaging techniques, such as Magnetic Resonance Imaging (MRI) or Diffusion Tensor Imaging (DTI), have been established as scientific and diagnostic tools, and their adoption is growing. Statistical methods, machine learning and data mining algorithms have been successfully adopted to extract predictive and descriptive models from neuroimage data. However, the knowledge discovery process typically also requires pre-processing, post-processing and visualisation techniques in complex data workflows. Currently, a main problem for the integrated pre-processing and mining of MRI data is the lack of comprehensive platforms that avoid the manual invocation of pre-processing and mining tools, which leads to an error-prone and inefficient process. In this work we present K-Surfer, a novel plug-in for the Konstanz Information Miner (KNIME) workbench that automates the pre-processing of brain images and leverages the mining capabilities of KNIME in an integrated way. K-Surfer supports the importing, filtering, merging and pre-processing of neuroimage data from FreeSurfer, a tool for human brain MRI feature extraction and interpretation. K-Surfer automates the steps for importing FreeSurfer data, reducing time costs, eliminating human errors and enabling the design of complex analytics workflows for neuroimage data by leveraging the rich functionality available in the KNIME workbench.