4 resultados para k-nearest neighbours

em CentAUR: Central Archive University of Reading - UK


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Advances in hardware technologies allow to capture and process data in real-time and the resulting high throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real-time. The creation and real-time adaption of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even so these algorithms are fast, they are challenged by high velocity data streams, where data instances are incoming at a fast rate. This is problematic if the applications desire that there is no or only a very little delay between changes in the patterns of the stream and absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We agree with Duckrow and Albano [Phys. Rev. E 67, 063901 (2003)] and Quian Quiroga et al. [Phys. Rev. E 67, 063902 (2003)] that mutual information (MI) is a useful measure of dependence for electroencephalogram (EEG) data, but we show that the improvement seen in the performance of MI on extracting dependence trends from EEG is more dependent on the type of MI estimator rather than any embedding technique used. In an independent study we conducted in search for an optimal MI estimator, and in particular for EEG applications, we examined the performance of a number of MI estimators on the data set used by Quian Quiroga et al. in their original study, where the performance of different dependence measures on real data was investigated [Phys. Rev. E 65, 041903 (2002)]. We show that for EEG applications the best performance among the investigated estimators is achieved by k-nearest neighbors, which supports the conjecture by Quian Quiroga et al. in Phys. Rev. E 67, 063902 (2003) that the nearest neighbor estimator is the most precise method for estimating MI.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We agree with Duckrow and Albano [Phys. Rev. E 67, 063901 (2003)] and Quian Quiroga [Phys. Rev. E 67, 063902 (2003)] that mutual information (MI) is a useful measure of dependence for electroencephalogram (EEG) data, but we show that the improvement seen in the performance of MI on extracting dependence trends from EEG is more dependent on the type of MI estimator rather than any embedding technique used. In an independent study we conducted in search for an optimal MI estimator, and in particular for EEG applications, we examined the performance of a number of MI estimators on the data set used by Quian Quiroga in their original study, where the performance of different dependence measures on real data was investigated [Phys. Rev. E 65, 041903 (2002)]. We show that for EEG applications the best performance among the investigated estimators is achieved by k-nearest neighbors, which supports the conjecture by Quian Quiroga in Phys. Rev. E 67, 063902 (2003) that the nearest neighbor estimator is the most precise method for estimating MI.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In Peer-to-Peer (P2P) networks, it is often desirable to assign node IDs which preserve locality relationships in the underlying topology. Node locality can be embedded into node IDs by utilizing a one dimensional mapping by a Hilbert space filling curve on a vector of network distances from each node to a subset of reference landmark nodes within the network. However this approach is fundamentally limited because while robustness and accuracy might be expected to improve with the number of landmarks, the effectiveness of 1 dimensional Hilbert Curve mapping falls for the curse of dimensionality. This work proposes an approach to solve this issue using Landmark Multidimensional Scaling (LMDS) to reduce a large set of landmarks to a smaller set of virtual landmarks. This smaller set of landmarks has been postulated to represent the intrinsic dimensionality of the network space and therefore a space filling curve applied to these virtual landmarks is expected to produce a better mapping of the node ID space. The proposed approach, the Virtual Landmarks Hilbert Curve (VLHC), is particularly suitable for decentralised systems like P2P networks. In the experimental simulations the effectiveness of the methods is measured by means of the locality preservation derived from node IDs in terms of latency to nearest neighbours. A variety of realistic network topologies are simulated and this work provides strong evidence to suggest that VLHC performs better than either Hilbert Curves or LMDS use independently of each other.