862 resultados para Payload-based traffic classifiers.
Resumo:
Q. Shen. Rough feature selection for intelligent classifiers. LNCS Transactions on Rough Sets, 7:244-255, 2007.
Resumo:
R. Jensen and Q. Shen, 'Tolerance-based and Fuzzy-Rough Feature Selection,' Proceedings of the 16th International Conference on Fuzzy Systems (FUZZ-IEEE'07), pp. 877-882, 2007.
Resumo:
Accurate knowledge of traffic demands in a communication network enables or enhances a variety of traffic engineering and network management tasks of paramount importance for operational networks. Directly measuring a complete set of these demands is prohibitively expensive because of the huge amounts of data that must be collected and the performance impact that such measurements would impose on the regular behavior of the network. As a consequence, we must rely on statistical techniques to produce estimates of actual traffic demands from partial information. The performance of such techniques is however limited due to their reliance on limited information and the high amount of computations they incur, which limits their convergence behavior. In this paper we study strategies to improve the convergence of a powerful statistical technique based on an Expectation-Maximization iterative algorithm. First we analyze modeling approaches to generating starting points. We call these starting points informed priors since they are obtained using actual network information such as packet traces and SNMP link counts. Second we provide a very fast variant of the EM algorithm which extends its computation range, increasing its accuracy and decreasing its dependence on the quality of the starting point. Finally, we study the convergence characteristics of our EM algorithm and compare it against a recently proposed Weighted Least Squares approach.
Resumo:
The best-effort nature of the Internet poses a significant obstacle to the deployment of many applications that require guaranteed bandwidth. In this paper, we present a novel approach that enables two edge/border routers-which we call Internet Traffic Managers (ITM)-to use an adaptive number of TCP connections to set up a tunnel of desirable bandwidth between them. The number of TCP connections that comprise this tunnel is elastic in the sense that it increases/decreases in tandem with competing cross traffic to maintain a target bandwidth. An origin ITM would then schedule incoming packets from an application requiring guaranteed bandwidth over that elastic tunnel. Unlike many proposed solutions that aim to deliver soft QoS guarantees, our elastic-tunnel approach does not require any support from core routers (as with IntServ and DiffServ); it is scalable in the sense that core routers do not have to maintain per-flow state (as with IntServ); and it is readily deployable within a single ISP or across multiple ISPs. To evaluate our approach, we develop a flow-level control-theoretic model to study the transient behavior of established elastic TCP-based tunnels. The model captures the effect of cross-traffic connections on our bandwidth allocation policies. Through extensive simulations, we confirm the effectiveness of our approach in providing soft bandwidth guarantees. We also outline our kernel-level ITM prototype implementation.
Resumo:
Internet Traffic Managers (ITMs) are special machines placed at strategic places in the Internet. itmBench is an interface that allows users (e.g. network managers, service providers, or experimental researchers) to register different traffic control functionalities to run on one ITM or an overlay of ITMs. Thus itmBench offers a tool that is extensible and powerful yet easy to maintain. ITM traffic control applications could be developed either using a kernel API so they run in kernel space, or using a user-space API so they run in user space. We demonstrate the flexibility of itmBench by showing the implementation of both a kernel module that provides a differentiated network service, and a user-space module that provides an overlay routing service. Our itmBench Linux-based prototype is free software and can be obtained from http://www.cs.bu.edu/groups/itm/.
Resumo:
Anomalies are unusual and significant changes in a network's traffic levels, which can often involve multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of high-dimensional, noisy data. In this paper we propose a general method to diagnose anomalies. This method is based on a separation of the high-dimensional space occupied by a set of network traffic measurements into disjoint subspaces corresponding to normal and anomalous network conditions. We show that this separation can be performed effectively using Principal Component Analysis. Using only simple traffic measurements from links, we study volume anomalies and show that the method can: (1) accurately detect when a volume anomaly is occurring; (2) correctly identify the underlying origin-destination (OD) flow which is the source of the anomaly; and (3) accurately estimate the amount of traffic involved in the anomalous OD flow. We evaluate the method's ability to diagnose (i.e., detect, identify, and quantify) both existing and synthetically injected volume anomalies in real traffic from two backbone networks. Our method consistently diagnoses the largest volume anomalies, and does so with a very low false alarm rate.
Resumo:
We analyzed the logs of our departmental HTTP server http://cs-www.bu.edu as well as the logs of the more popular Rolling Stones HTTP server http://www.stones.com. These servers have very different purposes; the former caters primarily to local clients, whereas the latter caters exclusively to remote clients all over the world. In both cases, our analysis showed that remote HTTP accesses were confined to a very small subset of documents. Using a validated analytical model of server popularity and file access profiles, we show that by disseminating the most popular documents on servers (proxies) closer to the clients, network traffic could be reduced considerably, while server loads are balanced. We argue that this process could be generalized so as to provide for an automated demand-based duplication of documents. We believe that such server-based information dissemination protocols will be more effective at reducing both network bandwidth and document retrieval times than client-based caching protocols [2].
Resumo:
The explosion of WWW traffic necessitates an accurate picture of WWW use, and in particular requires a good understanding of client requests for WWW documents. To address this need, we have collected traces of actual executions of NCSA Mosaic, reflecting over half a million user requests for WWW documents. In this paper we describe the methods we used to collect our traces, and the formats of the collected data. Next, we present a descriptive statistical summary of the traces we collected, which identifies a number of trends and reference patterns in WWW use. In particular, we show that many characteristics of WWW use can be modelled using power-law distributions, including the distribution of document sizes, the popularity of documents as a function of size, the distribution of user requests for documents, and the number of references to documents as a function of their overall rank in popularity (Zipf's law). Finally, we show how the power-law distributions derived from our traces can be used to guide system designers interested in caching WWW documents.
Resumo:
Recently the notion of self-similarity has been shown to apply to wide-area and local-area network traffic. In this paper we examine the mechanisms that give rise to self-similar network traffic. We present an explanation for traffic self-similarity by using a particular subset of wide area traffic: traffic due to the World Wide Web (WWW). Using an extensive set of traces of actual user executions of NCSA Mosaic, reflecting over half a million requests for WWW documents, we show evidence that WWW traffic is self-similar. Then we show that the self-similarity in such traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in file transfer, the effect of user "think time", and the superimposition of many such transfers in a local area network. To do this we rely on empirically measured distributions both from our traces and from data independently collected at over thirty WWW sites.
Resumo:
Recent measurements of local-area and wide-area traffic have shown that network traffic exhibits variability at a wide range of scales self-similarity. In this paper, we examine a mechanism that gives rise to self-similar network traffic and present some of its performance implications. The mechanism we study is the transfer of files or messages whose size is drawn from a heavy-tailed distribution. We examine its effects through detailed transport-level simulations of multiple TCP streams in an internetwork. First, we show that in a "realistic" client/server network environment i.e., one with bounded resources and coupling among traffic sources competing for resources the degree to which file sizes are heavy-tailed can directly determine the degree of traffic self-similarity at the link level. We show that this causal relationship is not significantly affected by changes in network resources (bottleneck bandwidth and buffer capacity), network topology, the influence of cross-traffic, or the distribution of interarrival times. Second, we show that properties of the transport layer play an important role in preserving and modulating this relationship. In particular, the reliable transmission and flow control mechanisms of TCP (Reno, Tahoe, or Vegas) serve to maintain the long-range dependency structure induced by heavy-tailed file size distributions. In contrast, if a non-flow-controlled and unreliable (UDP-based) transport protocol is used, the resulting traffic shows little self-similar characteristics: although still bursty at short time scales, it has little long-range dependence. If flow-controlled, unreliable transport is employed, the degree of traffic self-similarity is positively correlated with the degree of throttling at the source. Third, in exploring the relationship between file sizes, transport protocols, and self-similarity, we are also able to show some of the performance implications of self-similarity. We present data on the relationship between traffic self-similarity and network performance as captured by performance measures including packet loss rate, retransmission rate, and queueing delay. Increased self-similarity, as expected, results in degradation of performance. Queueing delay, in particular, exhibits a drastic increase with increasing self-similarity. Throughput-related measures such as packet loss and retransmission rate, however, increase only gradually with increasing traffic self-similarity as long as reliable, flow-controlled transport protocol is used.
Resumo:
The congestion control mechanisms of TCP make it vulnerable in an environment where flows with different congestion-sensitivity compete for scarce resources. With the increasing amount of unresponsive UDP traffic in today's Internet, new mechanisms are needed to enforce fairness in the core of the network. We propose a scalable Diffserv-like architecture, where flows with different characteristics are classified into separate service queues at the routers. Such class-based isolation provides protection so that flows with different characteristics do not negatively impact one another. In this study, we examine different aspects of UDP and TCP interaction and possible gains from segregating UDP and TCP into different classes. We also investigate the utility of further segregating TCP flows into two classes, which are class of short and class of long flows. Results are obtained analytically for both Tail-drop and Random Early Drop (RED) routers. Class-based isolation have the following salient features: (1) better fairness, (2) improved predictability for all kinds of flows, (3) lower transmission delay for delay-sensitive flows, and (4) better control over Quality of Service (QoS) of a particular traffic type.
Resumo:
Internet measurements show that the size distribution of Web-based transactions is usually very skewed; a few large requests constitute most of the total traffic. Motivated by the advantages of scheduling algorithms which favor short jobs, we propose to perform differentiated control over Web-based transactions to give preferential service to short web requests. The control is realized through service semantics provided by Internet Traffic Managers, a Diffserv-like architecture. To evaluate the performance of such a control system, it is necessary to have a fast but accurate analytical method. To this end, we model the Internet as a time-shared system and propose a numerical approach which utilizes Kleinrock's conservation law to solve the model. The numerical results are shown to match well those obtained by packet-level simulation, which runs orders of magnitude slower than our numerical method.
Resumo:
A number of problems in network operations and engineering call for new methods of traffic analysis. While most existing traffic analysis methods are fundamentally temporal, there is a clear need for the analysis of traffic across multiple network links — that is, for spatial traffic analysis. In this paper we give examples of problems that can be addressed via spatial traffic analysis. We then propose a formal approach to spatial traffic analysis based on the wavelet transform. Our approach (graph wavelets) generalizes the traditional wavelet transform so that it can be applied to data elements connected via an arbitrary graph topology. We explore the necessary and desirable properties of this approach and consider some of its possible realizations. We then apply graph wavelets to measurements from an operating network. Our results show that graph wavelets are very useful for our motivating problems; for example, they can be used to form highly summarized views of an entire network's traffic load, to gain insight into a network's global traffic response to a link failure, and to localize the extent of a failure event within the network.
Resumo:
Quality of Service (QoS) guarantees are required by an increasing number of applications to ensure a minimal level of fidelity in the delivery of application data units through the network. Application-level QoS does not necessarily follow from any transport-level QoS guarantees regarding the delivery of the individual cells (e.g. ATM cells) which comprise the application's data units. The distinction between application-level and transport-level QoS guarantees is due primarily to the fragmentation that occurs when transmitting large application data units (e.g. IP packets, or video frames) using much smaller network cells, whereby the partial delivery of a data unit is useless; and, bandwidth spent to partially transmit the data unit is wasted. The data units transmitted by an application may vary in size while being constant in rate, which results in a variable bit rate (VBR) data flow. That data flow requires QoS guarantees. Statistical multiplexing is inadequate, because no guarantees can be made and no firewall property exists between different data flows. In this paper, we present a novel resource management paradigm for the maintenance of application-level QoS for VBR flows. Our paradigm is based on Statistical Rate Monotonic Scheduling (SRMS), in which (1) each application generates its variable-size data units at a fixed rate, (2) the partial delivery of data units is of no value to the application, and (3) the QoS guarantee extended to the application is the probability that an arbitrary data unit will be successfully transmitted through the network to/from the application.
Resumo:
The performance of different classification approaches is evaluated using a view-based approach for motion representation. The view-based approach uses computer vision and image processing techniques to register and process the video sequence. Two motion representations called Motion Energy Images and Motion History Image are then constructed. These representations collapse the temporal component in a way that no explicit temporal analysis or sequence matching is needed. Statistical descriptions are then computed using moment-based features and dimensionality reduction techniques. For these tests, we used 7 Hu moments, which are invariant to scale and translation. Principal Components Analysis is used to reduce the dimensionality of this representation. The system is trained using different subjects performing a set of examples of every action to be recognized. Given these samples, K-nearest neighbor, Gaussian, and Gaussian mixture classifiers are used to recognize new actions. Experiments are conducted using instances of eight human actions (i.e., eight classes) performed by seven different subjects. Comparisons in the performance among these classifiers under different conditions are analyzed and reported. Our main goals are to test this dimensionality-reduced representation of actions, and more importantly to use this representation to compare the advantages of different classification approaches in this recognition task.