125 resultados para constrained clustering

em Deakin Research Online - Australia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Statistics-based Internet traffic classification using machine learning techniques has attracted extensive research interest lately, because of the increasing ineffectiveness of traditional port-based and payload-based approaches. In particular, unsupervised learning, that is, traffic clustering, is very important in real-life applications, where labeled training data are difficult to obtain and new patterns keep emerging. Although previous studies have applied some classic clustering algorithms such as K-Means and EM for the task, the quality of resultant traffic clusters was far from satisfactory. In order to improve the accuracy of traffic clustering, we propose a constrained clustering scheme that makes decisions with consideration of some background information in addition to the observed traffic statistics. Specifically, we make use of equivalence set constraints indicating that particular sets of flows are using the same application layer protocols, which can be efficiently inferred from packet headers according to the background knowledge of TCP/IP networking. We model the observed data and constraints using Gaussian mixture density and adapt an approximate algorithm for the maximum likelihood estimation of model parameters. Moreover, we study the effects of unsupervised feature discretization on traffic clustering by using a fundamental binning method. A number of real-world Internet traffic traces have been used in our evaluation, and the results show that the proposed approach not only improves the quality of traffic clusters in terms of overall accuracy and per-class metrics, but also speeds up the convergence.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Network traffic classification is an essential component for network management and security systems. To address the limitations of traditional port-based and payload-based methods, recent studies have been focusing on alternative approaches. One promising direction is applying machine learning techniques to classify traffic flows based on packet and flow level statistics. In particular, previous papers have illustrated that clustering can achieve high accuracy and discover unknown application classes. In this work, we present a novel semi-supervised learning method using constrained clustering algorithms. The motivation is that in network domain a lot of background information is available in addition to the data instances themselves. For example, we might know that flow ƒ1 and ƒ2 are using the same application protocol because they are visiting the same host address at the same port simultaneously. In this case, ƒ1 and ƒ2 shall be grouped into the same cluster ideally. Therefore, we describe these correlations in the form of pair-wise must-link constraints and incorporate them in the process of clustering. We have applied three constrained variants of the K-Means algorithm, which perform hard or soft constraint satisfaction and metric learning from constraints. A number of real-world traffic traces have been used to show the availability of constraints and to test the proposed approach. The experimental results indicate that by incorporating constraints in the course of clustering, the overall accuracy and cluster purity can be significantly improved.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Due to the limitations of the traditional port-based and payload-based traffic classification approaches, the past decade has seen extensive work on utilizing machine learning techniques to classify network traffic based on packet and flow level features. In particular, previous studies have shown that the unsupervised clustering approach is both accurate and capable of discovering previously unknown application classes. In this paper, we explore the utility of side information in the process of traffic clustering. Specifically, we focus on the flow correlation information that can be efficiently extracted from packet headers and expressed as instance-level constraints, which indicate that particular sets of flows are using the same application and thus should be put into the same cluster. To incorporate the constraints, we propose a modified constrained K-Means algorithm. A variety of real-world traffic traces are used to show that the constraints are widely available. The experimental results indicate that the constrained approach not only improves the quality of the resulted clusters, but also speeds up the convergence of the clustering process.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Internet traffic classification is a critical and essential functionality for network management and security systems. Due to the limitations of traditional port-based and payload-based classification approaches, the past several years have seen extensive research on utilizing machine learning techniques to classify Internet traffic based on packet and flow level characteristics. For the purpose of learning from unlabeled traffic data, some classic clustering methods have been applied in previous studies but the reported accuracy results are unsatisfactory. In this paper, we propose a semi-supervised approach for accurate Internet traffic clustering, which is motivated by the observation of widely existing partial equivalence relationships among Internet traffic flows. In particular, we formulate the problem using a Gaussian Mixture Model (GMM) with set-based equivalence constraint and propose a constrained Expectation Maximization (EM) algorithm for clustering. Experiments with real-world packet traces show that the proposed approach can significantly improve the quality of resultant traffic clusters. © 2014 Elsevier Inc.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The Dirichlet process mixture (DPM) model, a typical Bayesian nonparametric model, can infer the number of clusters automatically, and thus performing priority in data clustering. This paper investigates the influence of pairwise constraints in the DPM model. The pairwise constraint, known as two types: must-link (ML) and cannot-link (CL) constraints, indicates the relationship between two data points. We have proposed two relevant models which incorporate pairwise constraints: the constrained DPM (C-DPM) and the constrained DPM with selected constraints (SC-DPM). In C-DPM, the concept of chunklet is introduced. ML constraints are compiled into chunklets and CL constraints exist between chunklets. We derive the Gibbs sampling of the C-DPM based on chunklets. We further propose a principled approach to select the most useful constraints, which will be incorporated into the SC-DPM. We evaluate the proposed models based on three real datasets: 20 Newsgroups dataset, NUS-WIDE image dataset and Facebook comments datasets we collected by ourselves. Our SC-DPM performs priority in data clustering. In addition, our SC-DPM can be potentially used for short-text clustering.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The past decade has seen a lot of research on statistics-based network protocol identification using machine learning techniques. Prior studies have shown promising results in terms of high accuracy and fast classification speed. However, most works have embodied an implicit assumption that all protocols are known in advance and presented in the training data, which is unrealistic since real-world networks constantly witness emerging traffic patterns as well as unknown protocols in the wild. In this paper, we revisit the problem by proposing a learning scheme with unknown pattern extraction for statistical protocol identification. The scheme is designed with a more realistic setting, where the training dataset contains labeled samples from a limited number of protocols, and the goal is to tell these known protocols apart from each other and from potential unknown ones. Preliminary results derived from real-world traffic are presented to show the effectiveness of the scheme.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

One reason for semi-supervised clustering fail to deliver satisfactory performance in document clustering is that the transformed optimization problem could have many candidate solutions, but existing methods provide no mechanism to select a suitable one from all those candidates. This paper alleviates this problem by posing the same task as a soft-constrained optimization problem, and introduces the salient degree measure as an information guide to control the searching of an optimal solution. Experimental results show the effectiveness of the proposed method in the improvement of the performance, especially when the amount of priori domain knowledge is limited.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recognition of multiple moving objects is a very important task for achieving user-cared knowledge to send to the base station in wireless video-based sensor networks. However, video based sensor nodes, which have constrained resources and produce huge amount of video streams continuously, bring a challenge to segment multiple moving objects from the video stream online. Traditional efficient clustering algorithms such as DBSCAN cannot run time-efficiently and even fail to run on limited memory space on sensor nodes, because the number of pixel points is too huge. This paper provides a novel algorithm named Inter-Frame Change Directing Online clustering (IFCDO clustering) for segmenting multiple moving objects from video stream on sensor nodes. IFCDO clustering only needs to group inter-frame different pixels, thus it reduces both space and time complexity while achieves robust clusters the same as DBSCAN. Experiment results show IFCDO clustering excels DBSCAN in terms of both time and space efficiency. © 2008 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The paper presents a simple approach to the problem of designing low-order output feedback controllers for linear continuous systems. The controller can place all of the closed-loop poles within a circle, C(- , 1/ β) , with centre at - and radius of 1/ β in the left half s-plane. The design method is based on transformation of the original system and then applying the bounded-real-lemma to the transformed system. It is shown that subjected to the solvability of an algebraic Riccati equation (ARE), output feedback controllers can then be systematically derived. Furthermore, the order of the controller is low and equals only the number of the open-loop poles lying outside the circle. A step-by-step design algorithm is given. Numerical examples are given to illustrate the design method.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Cutting angle method (CAM) is a deterministic global optimization technique applicable to Lipschitz functions f: Rn → R. The method builds a sequence of piecewise linear lower approximations to the objective function f. The sequence of solutions to these relaxed problems converges to the global minimum of f. This article adapts CAM to the case of linear constraints on the feasible domain. We show how the relaxed problems are modified, and how the numerical efficiency of solving these problems can be preserved. A number of numerical experiments confirms the improved numerical efficiency.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Clustering is a difficult problem especially when we consider the task in the context of a data stream of categorical attributes. In this paper, we propose SCLOPE, a novel algorithm based on CLOPErsquos intuitive observation about cluster histograms. Unlike CLOPE however, our algo- rithm is very fast and operates within the constraints of a data stream environment. In particular, we designed SCLOPE according to the recent CluStream framework. Our evaluation of SCLOPE shows very promising results. It consistently outperforms CLOPE in speed and scalability tests on our data sets while maintaining high cluster purity; it also supports cluster analysis that other algorithms in its class do not.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Clustering is a difficult problem especially when we consider the task in the context of a data stream of categorical attributes. In this paper, we propose σ-SCLOPE, a novel algorithm based on SCLOPE’s intuitive observation about cluster histograms. Unlike SCLOPE however, our algorithm consumes less memory per window and has a better clustering runtime for the same data stream in a given window. This positions σ-SCLOPE as a more attractive option over SCLOPE if a minor lost of clustering accuracy is insignificant in the application.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a new technique to perform unsupervised data classification (clustering) based on density induced metric and non-smooth optimization. Our goal is to automatically recognize multidimensional clusters of non-convex shape. We present a modification of the fuzzy c-means algorithm, which uses the data induced metric, defined with the help of Delaunay triangulation. We detail computation of the distances in such a metric using graph algorithms. To find optimal positions of cluster prototypes we employ the discrete gradient method of non-smooth optimization. The new clustering method is capable to identify non-convex overlapped d-dimensional clusters.


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper considers the Cardinality Constrained Quadratic Knapsack Problem (QKP) and the Quadratic Selective Travelling Salesman Problem (QSTSP). The QKP is a generalization of the Knapsack Problem and the QSTSP is a generalization of the Travelling Salesman Problem. Thus, both problems are NP hard. The QSTSP and the QKP can be solved using branch-and-cut methods. Good bounds can be obtained if strong constraints are used. Hence it is important to identify strong or even facet-defining constraints. This paper studies the polyhedral combinatorics of the QSTSP and the QKP, i.e. amongst others we identify facet-defining constraints for the QSTSP and the QKP, and provide mathematical proofs that they do indeed define facets.

Relevância:

20.00% 20.00%

Publicador: