Internet traffic classification using constrained clustering


Author(s): Wang, Y; Xiang, Y; Zhang, J; Zhou, W; Wei, G; Yang, LT
Date(s)

09/10/2014

Abstract

Statistics-based Internet traffic classification using machine learning techniques has attracted extensive research interest lately, because of the increasing ineffectiveness of traditional port-based and payload-based approaches. In particular, unsupervised learning, that is, traffic clustering, is very important in real-life applications, where labeled training data are difficult to obtain and new patterns keep emerging. Although previous studies have applied some classic clustering algorithms such as K-Means and EM for the task, the quality of the resulting traffic clusters was far from satisfactory. In order to improve the accuracy of traffic clustering, we propose a constrained clustering scheme that makes decisions by considering background information in addition to the observed traffic statistics. Specifically, we make use of equivalence set constraints indicating that particular sets of flows are using the same application layer protocols, which can be efficiently inferred from packet headers according to the background knowledge of TCP/IP networking. We model the observed data and constraints using Gaussian mixture density and adapt an approximate algorithm for the maximum likelihood estimation of model parameters. Moreover, we study the effects of unsupervised feature discretization on traffic clustering by using a fundamental binning method. A number of real-world Internet traffic traces have been used in our evaluation, and the results show that the proposed approach not only improves the quality of traffic clusters in terms of overall accuracy and per-class metrics, but also speeds up the convergence.
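The constrained clustering idea in the abstract can be illustrated with a minimal Python/NumPy sketch. It is not the authors' algorithm. It assumes a diagonal-covariance Gaussian mixture and a common chunklet-style approximation in which every flow in an equivalence set shares a single posterior over mixture components. It also assumes a simple farthest-first initialization. The helper `equivalence_sets` groups flows by a hypothetical key, for example (server IP, server port, transport protocol); the abstract states only that such sets are inferred from packet headers. `equal_width_bins` is one assumed form of unsupervised binning, included for illustration.

```python
import numpy as np
from collections import defaultdict


def equivalence_sets(flow_keys):
    """Group flow indices sharing a key. The key choice (e.g. server IP,
    server port, protocol) is a hypothetical stand-in for header-based inference."""
    groups = defaultdict(list)
    for i, key in enumerate(flow_keys):
        groups[key].append(i)
    return [np.array(g) for g in groups.values()]


def equal_width_bins(X, n_bins=10):
    """Assumed binning: map each feature column to integer bins 0..n_bins-1."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    width = np.where(hi > lo, (hi - lo) / n_bins, 1.0)  # constant columns -> bin 0
    return np.clip(np.floor((X - lo) / width).astype(int), 0, n_bins - 1)


def log_gauss_diag(X, mu, var):
    """Log-density of a diagonal-covariance Gaussian for every row of X."""
    return -0.5 * (np.sum(np.log(2 * np.pi * var)) + np.sum((X - mu) ** 2 / var, axis=1))


def constrained_em(X, sets, k, n_iter=50, seed=0, eps=1e-6):
    """EM for a diagonal Gaussian mixture in which all flows of one set share
    a component. `sets` must partition range(len(X)); singletons are allowed."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Farthest-first initialization of the k means (an assumed, simple choice).
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    mu = np.array(centers, dtype=float)
    var = np.tile(X.var(axis=0) + eps, (k, 1))
    pi = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: per-flow log-likelihood under each component, shape (n, k).
        log_p = np.stack([log_gauss_diag(X, mu[c], var[c]) for c in range(k)], axis=1)
        # One posterior per set: log pi_c plus the summed member log-likelihoods.
        R = np.empty((len(sets), k))
        for j, idx in enumerate(sets):
            s = np.log(pi + eps) + log_p[idx].sum(axis=0)
            s = np.exp(s - s.max())  # subtract the max for numerical stability
            R[j] = s / s.sum()
        # Every flow inherits the posterior of its equivalence set.
        resp = np.zeros((n, k))
        for j, idx in enumerate(sets):
            resp[idx] = R[j]
        # M-step: mixing weights from set posteriors, moments from flow weights.
        pi = R.mean(axis=0)
        nk = resp.sum(axis=0) + eps
        mu = resp.T @ X / nk[:, None]
        var = np.maximum(resp.T @ (X ** 2) / nk[:, None] - mu ** 2, 0.0) + eps

    labels = np.full(n, -1, dtype=int)
    for j, idx in enumerate(sets):
        labels[idx] = R[j].argmax()
    return labels, mu, var, pi
```

Because each set receives a single posterior, flows assumed to share a protocol can never be split across clusters. A large set's summed log-likelihood also makes its assignment sharper than that of any single flow. Binned features from `equal_width_bins` can be passed to `constrained_em` in place of raw statistics to try a discretize-then-cluster variant. The sketch makes no claim about the accuracy or convergence results reported for the paper.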

Identifier

http://hdl.handle.net/10536/DRO/DU:30071857

Language(s)

eng

Publisher

IEEE Computer Society

Relation

http://dro.deakin.edu.au/eserv/DU:30071857/wang-yu-internettrafficclassific-2014.pdf

http://www.dx.doi.org/10.1109/TPDS.2013.307

Rights

2014, Springer

Keywords

#Algorithms #clustering #machine learning #network security #traffic analysis #Science & Technology #Technology #Computer Science, Theory & Methods #Engineering, Electrical & Electronic #Computer Science #Engineering

Type

Journal Article