999 results for traffic classification


Abstract:

Traffic classification using machine learning continues to be an active research area. The majority of work in this area uses off-the-shelf machine learning tools and treats them as black-box classifiers. This approach turns all the modelling complexity into a feature selection problem. In this paper, we build a problem-specific solution to the traffic classification problem by designing a custom probabilistic graphical model. Graphical models are a modular framework to design classifiers which incorporate domain-specific knowledge. More specifically, our solution introduces semi-supervised learning which means we learn from both labelled and unlabelled traffic flows. We show that our solution performs competitively compared to previous approaches while using less data and simpler features. Copyright © 2010 ACM.

Abstract:

Due to the increasing unreliability of traditional port-based methods, Internet traffic classification has attracted considerable research effort in recent years. Many previous papers have focused on using statistical characteristics as discriminators and applying machine learning techniques to classify traffic flows. In this paper, we propose a novel machine-learning-based approach in which the features are extracted from the packet payload instead of flow statistics. Specifically, every flow is represented by a feature vector in which each item indicates the occurrence of a particular token, i.e., a common substring, in the payload. We have applied various machine learning algorithms to evaluate the idea and used different feature selection schemes to identify the critical tokens. Experimental results based on a real-world traffic data set show that the approach can achieve high accuracy with low overhead.
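
The token-occurrence representation described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the token list, the `payload_to_features` helper, and the sample payloads are all hypothetical, and a real system would mine the common substrings from traffic traces.

```python
from typing import List, Sequence


def payload_to_features(payload: bytes, tokens: Sequence[bytes]) -> List[int]:
    """Return a binary vector: 1 if the token occurs in the payload, else 0."""
    return [1 if token in payload else 0 for token in tokens]


# Hypothetical tokens (common substrings), chosen by hand for this example.
TOKENS = [b"HTTP/1.1", b"GET ", b"SSH-2.0", b"EHLO"]

# Hypothetical payloads of two flows.
http_flow = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"
ssh_flow = b"SSH-2.0-OpenSSH_8.9\r\n"

print(payload_to_features(http_flow, TOKENS))  # [1, 1, 0, 0]
print(payload_to_features(ssh_flow, TOKENS))   # [0, 0, 1, 0]
```

Vectors of this form can then be passed to any standard classifier, and feature selection reduces to choosing which tokens to keep.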

Abstract:

This paper presents a novel traffic classification scheme to improve classification performance when only a small amount of training data is available. In the proposed scheme, traffic flows are described using discretized statistical features, and flow correlation information is modeled by bag-of-flow (BoF). We solve BoF-based traffic classification in a classifier combination framework and theoretically analyze the performance benefit. Furthermore, a new BoF-based traffic classification method is proposed to aggregate the naive Bayes (NB) predictions of the correlated flows. We also present an analysis of the prediction error sensitivity of the aggregation strategies. Finally, a large number of experiments are carried out on two large-scale real-world traffic datasets to evaluate the proposed scheme. The experimental results show that the proposed scheme can achieve much better classification performance than existing state-of-the-art traffic classification methods.
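
To make the idea of aggregating per-flow predictions within a bag concrete, here is a minimal sketch. The abstract does not say which aggregation strategies the paper uses, so the sum rule and majority vote below are assumptions taken from the general classifier-combination literature. The posterior values are invented for illustration and stand in for naive Bayes outputs.

```python
from collections import Counter
from typing import Dict, List

Posterior = Dict[str, float]  # class label -> P(class | flow)


def aggregate_sum(posteriors: List[Posterior]) -> str:
    """Sum rule: add per-flow posteriors and pick the class with the largest total."""
    totals: Dict[str, float] = {}
    for posterior in posteriors:
        for label, prob in posterior.items():
            totals[label] = totals.get(label, 0.0) + prob
    return max(totals, key=lambda label: totals[label])


def aggregate_vote(posteriors: List[Posterior]) -> str:
    """Majority vote: each flow votes for its own most probable class."""
    votes = Counter(max(p, key=lambda label: p[label]) for p in posteriors)
    return votes.most_common(1)[0][0]


# Hypothetical posteriors for three correlated flows in one bag.
bag = [
    {"web": 0.55, "p2p": 0.45},
    {"web": 0.60, "p2p": 0.40},
    {"web": 0.10, "p2p": 0.90},
]

print(aggregate_sum(bag))   # p2p (totals: web 1.25, p2p 1.75)
print(aggregate_vote(bag))  # web (two of three flows favour web)
```

The two rules disagree on this bag, which shows why the sensitivity of each aggregation strategy to per-flow prediction errors is worth analyzing.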

Abstract:

Traffic classification has wide applications in network management, from security monitoring to quality of service measurements. Recent research tends to apply machine learning techniques to classification methods based on flow statistical features. The nearest neighbor (NN)-based method has exhibited superior classification performance. It also has several important advantages: it requires no training procedure, carries no risk of overfitting parameters, and can naturally handle a huge number of classes. However, the performance of the NN classifier can be severely affected if the training data set is small. In this paper, we propose a novel nonparametric approach for traffic classification, which can improve the classification performance effectively by incorporating correlated information into the classification process. We analyze the new classification approach and its performance benefit from both theoretical and empirical perspectives. A large number of experiments are carried out on two real-world traffic data sets to validate the proposed approach. The results show that traffic classification performance can be improved significantly even under the extremely difficult circumstance of very few training samples.

Abstract:

Traffic classification is an essential tool for network and system security in complex environments such as cloud computing. State-of-the-art traffic classification methods aim to take advantage of flow statistical features and machine learning techniques; however, classification performance is severely affected by limited supervised information and unknown applications. To achieve effective network traffic classification, we propose a new method to tackle the problem of unknown applications in the critical situation of a small supervised training set. The proposed method is capable of detecting unknown flows generated by unknown applications and of utilizing the correlation information among real-world network traffic to boost classification performance. A theoretical analysis is provided to confirm the performance benefit of the proposed method. Moreover, a comprehensive performance evaluation conducted on two real-world network traffic datasets shows that the proposed scheme outperforms existing methods in this critical network environment.

Abstract:

The thesis addresses a number of critical problems in fully automating the process of network traffic classification and protocol identification. Several effective solutions based on statistical analysis and machine learning techniques are proposed, which significantly reduce the need for human intervention in network traffic classification systems.

Abstract:

Statistics-based Internet traffic classification using machine learning techniques has attracted extensive research interest lately, because of the increasing ineffectiveness of traditional port-based and payload-based approaches. In particular, unsupervised learning, that is, traffic clustering, is very important in real-life applications, where labeled training data are difficult to obtain and new patterns keep emerging. Although previous studies have applied some classic clustering algorithms such as K-Means and EM for the task, the quality of resultant traffic clusters was far from satisfactory. In order to improve the accuracy of traffic clustering, we propose a constrained clustering scheme that makes decisions with consideration of some background information in addition to the observed traffic statistics. Specifically, we make use of equivalence set constraints indicating that particular sets of flows are using the same application layer protocols, which can be efficiently inferred from packet headers according to the background knowledge of TCP/IP networking. We model the observed data and constraints using Gaussian mixture density and adapt an approximate algorithm for the maximum likelihood estimation of model parameters. Moreover, we study the effects of unsupervised feature discretization on traffic clustering by using a fundamental binning method. A number of real-world Internet traffic traces have been used in our evaluation, and the results show that the proposed approach not only improves the quality of traffic clusters in terms of overall accuracy and per-class metrics, but also speeds up the convergence.
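
As an illustration of unsupervised feature discretization, the sketch below applies equal-width binning to one flow feature. The abstract only mentions "a fundamental binning method", so the choice of equal-width bins is an assumption, and the `equal_width_bins` helper and the sample packet sizes are hypothetical.

```python
from typing import List


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Map each value to a bin index in [0, n_bins - 1] using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into the first bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical packet sizes (bytes) observed for five flows.
packet_sizes = [40.0, 52.0, 576.0, 1500.0, 1460.0]

print(equal_width_bins(packet_sizes, 4))  # [0, 0, 1, 3, 3]
```

No class labels are used to place the bin boundaries, which is what makes the discretization unsupervised.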

Abstract:

With the arrival of the big data era, Internet traffic is growing exponentially. A wide variety of applications have arisen on the Internet, and traffic classification has been introduced to help manage them for security monitoring and quality of service purposes. A large number of Machine Learning (ML) algorithms have been introduced to deal with traffic classification. A significant challenge to classification performance comes from the imbalanced distribution of data in traffic classification systems. In this paper, we propose an Optimised Distance-based Nearest Neighbor (ODNN) approach, which is capable of improving the classification performance on imbalanced traffic data. We analyze the proposed ODNN approach and its performance benefit from both theoretical and empirical perspectives. A large number of experiments were carried out on a real-world traffic dataset. The results show that the performance of “small classes” can be improved significantly even with only a small amount of training data, while the performance of “large classes” remains stable.

Abstract:

As a fundamental tool for network management and security, traffic classification has attracted increasing attention in recent years. A significant challenge to the robustness of classification performance comes from zero-day applications previously unknown in traffic classification systems. In this paper, we propose a new scheme of Robust statistical Traffic Classification (RTC) by combining supervised and unsupervised machine learning techniques to meet this challenge. The proposed RTC scheme has the capability of identifying the traffic of zero-day applications as well as accurately discriminating predefined application classes. In addition, we develop a new method for automating the RTC scheme parameters optimization process. The empirical study on real-world traffic data confirms the effectiveness of the proposed scheme. When zero-day applications are present, the classification performance of the new scheme is significantly better than four state-of-the-art methods: random forest, correlation-based classification, semi-supervised clustering, and one-class SVM.

Abstract:

We propose a novel methodology to generate realistic network flow traces to enable systematic evaluation of network monitoring systems in various traffic conditions. Our technique uses a graph-based approach to model the communication structure observed in real-world traces and to extract traffic templates. By combining extracted and user-defined traffic templates, realistic network flow traces that comprise normal traffic and customized conditions are generated in a scalable manner. A proof-of-concept implementation demonstrates the utility and simplicity of our method to produce a variety of evaluation scenarios. We show that the extraction of templates from real-world traffic leads to a manageable number of templates that still enable accurate re-creation of the original communication properties on the network flow level.

Abstract:

Internet traffic classification is a relevant and mature research field that is nevertheless growing in importance and still poses open technical challenges, partly because of the pervasive presence of Internet-connected devices in everyday life. We claim the need for innovative traffic classification solutions that are lightweight, adopt a domain-based approach, and not only categorize application-level protocols but also classify Internet traffic by subject. To this end, this paper proposes a classification solution that leverages domain name information extracted from IPFIX summaries, DNS logs, and DHCP leases, and that can be applied to any kind of traffic. Our solution is based on an extension of Word2vec unsupervised learning techniques running on a specialized Apache Spark cluster. In particular, the learning techniques are used to generate word embeddings from a mixed dataset composed of domain names and natural language corpora, in a lightweight way and with general applicability. The paper also reports lessons learnt from our implementation and deployment experience, which demonstrates that our solution can process 5500 IPFIX summaries per second on an Apache Spark cluster with 1 slave instance in Amazon EC2 at a cost of $3860 per year. Reported experimental results for Precision, Recall, F-Measure, Accuracy, and Cohen's Kappa show the feasibility and effectiveness of the proposal. The experiments prove that words contained in domain names are related to the kind of traffic directed towards them; therefore, using specifically trained word embeddings, we are able to classify them into customizable categories. We also show that training word embeddings on larger natural language corpora leads to improvements in precision of up to 180%.
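
For a rough sense of scale, the throughput and cost figures quoted in this abstract can be combined into a cost per million summaries. The calculation below rests on two assumptions that the abstract does not state: that the quoted cost is per year, and that the cluster sustains its peak rate over a full 365-day year.

```python
# Figures quoted in the abstract.
SUMMARIES_PER_SECOND = 5500
COST_USD_PER_YEAR = 3860  # assumption: the quoted cost is per year

# Assumption: sustained peak throughput over a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 3600

summaries_per_year = SUMMARIES_PER_SECOND * SECONDS_PER_YEAR
cost_per_million = COST_USD_PER_YEAR / (summaries_per_year / 1_000_000)

print(f"{summaries_per_year:,} summaries per year")       # 173,448,000,000 summaries per year
print(f"${cost_per_million:.4f} per million summaries")   # $0.0223 per million summaries
```

Under these assumptions the cost is roughly two cents per million summaries. Real deployments rarely run at peak rate continuously, so this is a lower bound on the unit cost.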

Abstract:

Traffic classification plays a significant role in network security and management. However, accurate classification is challenging if the training data is contaminated with unclean traffic. Recent research often assumes clean training data, and hence performance degrades on real-time network traffic. To meet this challenge, we propose a robust method, Unclean Traffic Classification (UTC), which incorporates noise elimination and suspected-noise reweighting. First, UTC eliminates strongly noisy training data identified by consensus filtering with multiple classifiers. Then, UTC estimates the relevance of the remaining training data and learns a robust traffic classifier. Through a number of experiments on a real-world traffic dataset, we show that the new method outperforms existing state-of-the-art traffic classification methods under the extremely difficult circumstance of unclean training data.
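
The noise-elimination step can be illustrated with a generic consensus filter: a training sample is discarded only when every classifier disagrees with its recorded label. This is a sketch of the general technique, not the UTC implementation. The `consensus_filter` helper and the data are hypothetical, and the predictions are assumed to come from classifiers that did not train on the sample they are judging (for example, via cross-validation).

```python
from typing import List, Sequence


def consensus_filter(
    labels: Sequence[str], predictions: Sequence[Sequence[str]]
) -> List[int]:
    """Return indices of samples to keep.

    A sample is dropped as noise only if all classifiers disagree with its label.
    `predictions[k][i]` is classifier k's predicted label for sample i.
    """
    kept = []
    for i, label in enumerate(labels):
        if any(pred[i] == label for pred in predictions):
            kept.append(i)
    return kept


# Hypothetical recorded labels for four training flows.
labels = ["web", "web", "p2p", "mail"]

# Hypothetical held-out predictions from three classifiers.
clf_a = ["web", "p2p", "p2p", "web"]
clf_b = ["web", "p2p", "p2p", "mail"]
clf_c = ["web", "p2p", "web", "web"]

print(consensus_filter(labels, [clf_a, clf_b, clf_c]))  # [0, 2, 3]
```

Sample 1 is removed because all three classifiers predict `p2p` against its `web` label. Samples with only partial disagreement are kept, and these are the ones a later reweighting step would have to handle.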