125 results for agglomerative clustering


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Recent years have seen extensive work on statistics-based network traffic classification using machine learning (ML) techniques. In the particular scenario of learning from unlabeled traffic data, some classic unsupervised clustering algorithms (e.g. K-Means and EM) have been applied, but the reported accuracy is unsatisfactorily low. This paper presents a novel approach for the task, which performs clustering based on Random Forest (RF) proximities instead of Euclidean distances. The approach consists of two steps. In the first step, we derive a proximity measure for each pair of data points by performing an RF classification on the original data and a set of synthetic data. In the second step, we perform a K-Medoids clustering to partition the data points into K groups based on the proximity matrix. Evaluations have been conducted on real-world Internet traffic traces, and the experimental results indicate that the proposed approach is more accurate than the previous methods.
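The two steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the common definition of RF proximity (the fraction of trees in which two points fall into the same leaf), takes a small hand-written table of leaf indices (`leaf_ids`, hypothetical) in place of a trained forest, and uses a simple alternating K-Medoids with a deterministic farthest-point start.

```python
def rf_proximity(leaf_ids):
    """Proximity of i and j = fraction of trees where both land in the same leaf.

    leaf_ids[t][i] is the leaf reached by point i in tree t (assumed input).
    """
    n_trees, n = len(leaf_ids), len(leaf_ids[0])
    counts = [[0] * n for _ in range(n)]
    for tree in leaf_ids:
        for i in range(n):
            for j in range(n):
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


def k_medoids(dissim, k, max_iter=100):
    """Alternating K-Medoids on a precomputed dissimilarity matrix."""
    n = len(dissim)
    # Deterministic start: the most central point, then repeatedly the point
    # farthest from the medoids chosen so far (an illustrative choice).
    medoids = [min(range(n), key=lambda i: sum(dissim[i]))]
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(dissim[i][m] for m in medoids)))

    def assign(meds):
        # Each point joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dissim[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster emptied
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total dissimilarity.
            new_medoids.append(
                min(members, key=lambda m: sum(dissim[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Hypothetical leaf assignments: 3 trees, 4 data points.
leaf_ids = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
prox = rf_proximity(leaf_ids)
dissim = [[1.0 - p for p in row] for row in prox]  # proximity -> dissimilarity
medoids, labels = k_medoids(dissim, k=2)
print(medoids, labels)
```

In practice the leaf indices would come from a forest trained to separate the original data from synthetic data, as the abstract describes; only the clustering step on the resulting proximity matrix is shown in full here.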

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The internet age has fuelled an enormous explosion in the amount of information generated by humanity. Much of this information is transient in nature, created to be immediately consumed and built upon (or discarded). The field of data mining is surprisingly short of algorithms geared towards unsupervised knowledge extraction from such dynamic data streams. This chapter describes a new neural network algorithm inspired by self-organising maps. The new algorithm is a hybrid of the growing self-organising map (GSOM) and the cellular probabilistic self-organising map (CPSOM). The result is an algorithm that generates a dynamically growing feature map for clustering dynamic data streams and tracking clusters as they evolve in the data stream.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Understanding neural functions requires knowledge gained from analysing electrophysiological data. The process of assigning the spikes of a multichannel signal to clusters, called spike sorting, is one of the important problems in such analysis. Various automated spike sorting techniques exist, each with advantages and disadvantages regarding accuracy and computational cost. Developing spike sorting methods that are highly accurate and computationally inexpensive therefore remains a challenge in biomedical engineering practice.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents a comparison of different clustering algorithms applied to a point cloud constructed from the depth maps captured by an RGBD camera such as the Microsoft Kinect. The depth sensor returns images in which each pixel represents the distance to its corresponding point rather than RGB data. This is considered the real novelty of the RGBD camera in computer vision compared to the common video-based and stereo-based products. Depth sensors capture depth data without using markers, 2D-to-3D transition, or feature point determination. The 3D points of the captured depth map are then grouped into clusters to determine the different limbs of the human body. The 3D point clustering is achieved using different clustering techniques. Our experiments show good performance in using clustering to determine the different human-body limbs.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Biomedical time series clustering, which automatically groups a collection of time series according to their internal similarity, is important for medical record management and inspection, such as bio-signal archiving and retrieval. In this paper, a novel framework that automatically groups a set of unlabelled multichannel biomedical time series according to their internal structural similarity is proposed. Specifically, we treat a multichannel biomedical time series as a document and extract local segments from the time series as words. We extend a topic model, the Hierarchical probabilistic Latent Semantic Analysis (H-pLSA), which was originally developed for visual motion analysis, to cluster a set of unlabelled multichannel time series. The H-pLSA models each channel of the multichannel time series using a local pLSA in the first layer. The topics learned in the local pLSA are then fed to a global pLSA in the second layer to discover the categories of multichannel time series. Experiments on a dataset extracted from multichannel Electrocardiography (ECG) signals demonstrate that the proposed method performs better than previous state-of-the-art approaches and is relatively robust to variations in parameters, including the length of local segments and the dictionary size. Although the experimental evaluation used multichannel ECG signals in a biometric scenario, the proposed algorithm is a universal framework for clustering multichannel biomedical time series according to their structural similarity, which has many applications in biomedical time series management.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Failure mode and effect analysis (FMEA) is a popular safety and reliability analysis tool for examining potential failures of products, processes, designs, or services in a wide range of industries. While FMEA is a popular tool, the limitations of the traditional Risk Priority Number (RPN) model in FMEA have been highlighted in the literature. Even though many alternatives to the traditional RPN model have been proposed, there are not many investigations on the use of clustering techniques in FMEA. The main aim of this paper was to examine the use of a new Euclidean distance-based similarity measure and an incremental-learning clustering model, i.e., the fuzzy adaptive resonance theory neural network, for similarity analysis and clustering of failure modes in FMEA, thereby allowing the failure modes to be analyzed, visualized, and clustered. In this paper, the concept of a risk interval encompassing a group of failure modes is investigated. In addition, a new approach to analyzing the risk ordering of different failure groups is introduced. The proposed methods are evaluated using a case study related to the edible bird nest industry in Sarawak, Malaysia. In short, the contributions of this paper are threefold: (1) a new Euclidean distance-based similarity measure, (2) a new risk interval measure for a group of failure modes, and (3) a new analysis of risk ordering of different failure groups. © 2014 The Natural Computing Applications Forum.
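One commonly cited limitation of the traditional RPN model can be illustrated with a short sketch. It assumes the conventional definition, which the abstract does not spell out: RPN is the product of severity, occurrence, and detection ratings, each on a 1 to 10 scale. The similarity function is a generic normalized Euclidean similarity chosen for illustration; it is not the measure proposed in the paper, and the two failure modes are made-up values.

```python
import math


def rpn(severity, occurrence, detection):
    """Conventional Risk Priority Number: product of the three ratings."""
    return severity * occurrence * detection


def euclidean_similarity(a, b, scale_min=1, scale_max=10):
    """Generic similarity in [0, 1]: 1 minus distance over the largest possible distance.

    Illustrative stand-in only, not the paper's similarity measure.
    """
    max_dist = math.sqrt(len(a)) * (scale_max - scale_min)
    return 1.0 - math.dist(a, b) / max_dist


# Two hypothetical failure modes given as (severity, occurrence, detection).
fm_a = (9, 2, 1)
fm_b = (2, 3, 3)

rpn_a, rpn_b = rpn(*fm_a), rpn(*fm_b)
similarity = euclidean_similarity(fm_a, fm_b)

# Same RPN, yet the rating profiles are clearly different.
print(rpn_a, rpn_b, round(similarity, 3))
```

Both failure modes receive the same RPN even though one is far more severe, which is the kind of information a distance-based comparison of the full rating vectors can retain.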

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Understanding neural functions requires the observation of the activities of single neurons that are represented via electrophysiological data. Processing and understanding these data are challenging problems in biomedical engineering. A microelectrode commonly records the activity of multiple neurons. Spike sorting is the process of assigning every single action potential (spike) to a particular neuron. This paper proposes a combination of diffusion maps (DM) and the mean shift clustering method for spike sorting. DM is utilized to extract spike features, which are highly capable of discriminating different spike shapes. Mean shift clustering provides automatic unsupervised clustering, taking the features extracted by DM as inputs. Experimental results show that the features extracted by DM noticeably outperform those selected by wavelet transformation (WT). Accordingly, the proposed integrated method is significantly superior to the popular existing combination of WT and superparamagnetic clustering regarding spike sorting accuracy.
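The clustering stage mentioned in this abstract can be sketched with a minimal flat-kernel mean shift. This is an illustration under stated assumptions rather than the paper's pipeline: the 2D points stand in for spike features (the diffusion map step is not shown), and the bandwidth and the merge threshold of half the bandwidth are arbitrary choices.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift: move each point to the mean of its neighbours
    until it stops moving, then merge nearby end points into cluster centres."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(max_iter):
            neighbours = [q for q in points if math.dist(x, q) <= bandwidth]
            if not neighbours:  # defensive guard against rounding edge cases
                break
            new_x = [sum(coord) / len(neighbours) for coord in zip(*neighbours)]
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)

    # Merge modes closer than half a bandwidth (an illustrative threshold).
    centres, labels = [], []
    for m in modes:
        for idx, c in enumerate(centres):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return centres, labels


# Hypothetical 2D feature vectors forming two well-separated groups.
features = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
]
centres, labels = mean_shift(features, bandwidth=1.0)
print(len(centres), labels)
```

Note that the number of clusters is not passed in; it emerges from the bandwidth, which is what makes mean shift attractive when the number of neurons is unknown.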

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Spike sorting plays an important role in analysing electrophysiological data and understanding neural functions. Developing spike sorting methods that are highly accurate and computationally inexpensive is always a challenge in biomedical engineering practice. This paper proposes an automatic unsupervised spike sorting method using the landmark-based spectral clustering (LSC) method in connection with features extracted by the locality preserving projection (LPP) technique. The gap statistic is employed to estimate the number of clusters before LSC is performed. Experimental results show that LPP spike features are more discriminative than those of the popular wavelet transformation (WT). Accordingly, the proposed LPP-LSC method significantly outperforms the existing method that combines WT feature extraction with superparamagnetic clustering. LPP and LSC are both linear algorithms that help reduce the computational burden, and thus their combination can be applied to real-time spike analysis.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Statistics-based Internet traffic classification using machine learning techniques has attracted extensive research interest lately, because of the increasing ineffectiveness of traditional port-based and payload-based approaches. In particular, unsupervised learning, that is, traffic clustering, is very important in real-life applications, where labeled training data are difficult to obtain and new patterns keep emerging. Although previous studies have applied some classic clustering algorithms such as K-Means and EM for the task, the quality of resultant traffic clusters was far from satisfactory. In order to improve the accuracy of traffic clustering, we propose a constrained clustering scheme that makes decisions with consideration of some background information in addition to the observed traffic statistics. Specifically, we make use of equivalence set constraints indicating that particular sets of flows are using the same application layer protocols, which can be efficiently inferred from packet headers according to the background knowledge of TCP/IP networking. We model the observed data and constraints using Gaussian mixture density and adapt an approximate algorithm for the maximum likelihood estimation of model parameters. Moreover, we study the effects of unsupervised feature discretization on traffic clustering by using a fundamental binning method. A number of real-world Internet traffic traces have been used in our evaluation, and the results show that the proposed approach not only improves the quality of traffic clusters in terms of overall accuracy and per-class metrics, but also speeds up the convergence.
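The abstract mentions unsupervised feature discretization with a fundamental binning method but does not say which one. As a rough sketch, assuming equal-width binning (one of the simplest options), a continuous flow statistic could be discretized as below. The function name and the packet-size values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1 using equal-width intervals
    spanning the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical: everything falls into a single bin
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical packet sizes in bytes for six flows.
packet_sizes = [40, 60, 1500, 576, 1460, 52]
binned = equal_width_bins(packet_sizes, n_bins=4)
print(binned)  # -> [0, 0, 3, 1, 3, 0]
```

The bin edges depend only on the feature values, not on class labels, which is what makes the discretization unsupervised.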

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Internet traffic classification is a critical and essential functionality for network management and security systems. Due to the limitations of traditional port-based and payload-based classification approaches, the past several years have seen extensive research on utilizing machine learning techniques to classify Internet traffic based on packet and flow level characteristics. For the purpose of learning from unlabeled traffic data, some classic clustering methods have been applied in previous studies but the reported accuracy results are unsatisfactory. In this paper, we propose a semi-supervised approach for accurate Internet traffic clustering, which is motivated by the observation of widely existing partial equivalence relationships among Internet traffic flows. In particular, we formulate the problem using a Gaussian Mixture Model (GMM) with set-based equivalence constraint and propose a constrained Expectation Maximization (EM) algorithm for clustering. Experiments with real-world packet traces show that the proposed approach can significantly improve the quality of resultant traffic clusters. © 2014 Elsevier Inc.
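The idea of a GMM with set-based equivalence constraints can be sketched in one dimension. This is a simplified illustration, not the algorithm from the paper: it assumes every flow in an equivalence set must share one mixture component, so the E-step computes a single responsibility vector per set from the product of its members' likelihoods. The data, the initial parameters, and the variance floor are all made up.

```python
import math


def log_normal(x, mu, var):
    """Log density of a 1D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def constrained_em(sets, mus, variances, weights, n_iter=50, min_var=1e-3):
    """EM for a 1D Gaussian mixture where each equivalence set shares a component."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    k = len(mus)
    for _ in range(n_iter):
        # E-step: one responsibility vector per set (computed in log space).
        resp = []
        for s in sets:
            logp = [
                math.log(weights[c]) + sum(log_normal(x, mus[c], variances[c]) for x in s)
                for c in range(k)
            ]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: sets vote for the weights, individual points for mean and variance.
        for c in range(k):
            set_mass = sum(r[c] for r in resp)
            point_mass = sum(r[c] * len(s) for r, s in zip(resp, sets))
            weights[c] = max(set_mass / len(sets), 1e-12)
            if point_mass > 0:
                mus[c] = sum(r[c] * sum(s) for r, s in zip(resp, sets)) / point_mass
                spread = sum(
                    r[c] * sum((x - mus[c]) ** 2 for x in s) for r, s in zip(resp, sets)
                )
                variances[c] = max(min_var, spread / point_mass)
    # Hard label per set, taken from the final E-step.
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return mus, variances, weights, labels


# Hypothetical 1D flow statistic; each inner list is one equivalence set.
flow_sets = [[1.0, 1.2, 0.9], [1.1], [5.0, 5.3], [4.8], [3.2, 0.8]]
mus, variances, weights, labels = constrained_em(
    flow_sets, mus=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print(labels)
```

On its own, the value 3.2 lies closer to the second initial mean (6.0) than to the first (0.0), but because it is tied to 0.8 in the same set, the whole set is assigned to the first component. That is the kind of background information the constraints are meant to inject.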