25 resultados para network traffic analysis

em Boston University Digital Common


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Network traffic arises from the superposition of Origin-Destination (OD) flows. Hence, a thorough understanding of OD flows is essential for modeling network traffic, and for addressing a wide variety of problems including traffic engineering, traffic matrix estimation, capacity planning, forecasting and anomaly detection. However, to date, OD flows have not been closely studied, and there is very little known about their properties. We present the first analysis of complete sets of OD flow timeseries, taken from two different backbone networks (Abilene and Sprint-Europe). Using Principal Component Analysis (PCA), we find that the set of OD flows has small intrinsic dimension. In fact, even in a network with over a hundred OD flows, these flows can be accurately modeled in time using a small number (10 or less) of independent components or dimensions. We also show how to use PCA to systematically decompose the structure of OD flow timeseries into three main constituents: common periodic trends, short-lived bursts, and noise. We provide insight into how the various constituents contribute to the overall structure of OD flows and explore the extent to which this decomposition varies over time.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A number of problems in network operations and engineering call for new methods of traffic analysis. While most existing traffic analysis methods are fundamentally temporal, there is a clear need for the analysis of traffic across multiple network links — that is, for spatial traffic analysis. In this paper we give examples of problems that can be addressed via spatial traffic analysis. We then propose a formal approach to spatial traffic analysis based on the wavelet transform. Our approach (graph wavelets) generalizes the traditional wavelet transform so that it can be applied to data elements connected via an arbitrary graph topology. We explore the necessary and desirable properties of this approach and consider some of its possible realizations. We then apply graph wavelets to measurements from an operating network. Our results show that graph wavelets are very useful for our motivating problems; for example, they can be used to form highly summarized views of an entire network's traffic load, to gain insight into a network's global traffic response to a link failure, and to localize the extent of a failure event within the network.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Accurate knowledge of traffic demands in a communication network enables or enhances a variety of traffic engineering and network management tasks of paramount importance for operational networks. Directly measuring a complete set of these demands is prohibitively expensive because of the huge amounts of data that must be collected and the performance impact that such measurements would impose on the regular behavior of the network. As a consequence, we must rely on statistical techniques to produce estimates of actual traffic demands from partial information. The performance of such techniques is however limited due to their reliance on limited information and the high amount of computations they incur, which limits their convergence behavior. In this paper we study strategies to improve the convergence of a powerful statistical technique based on an Expectation-Maximization iterative algorithm. First we analyze modeling approaches to generating starting points. We call these starting points informed priors since they are obtained using actual network information such as packet traces and SNMP link counts. Second we provide a very fast variant of the EM algorithm which extends its computation range, increasing its accuracy and decreasing its dependence on the quality of the starting point. Finally, we study the convergence characteristics of our EM algorithm and compare it against a recently proposed Weighted Least Squares approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Accurate knowledge of traffic demands in a communication network enables or enhances a variety of traffic engineering and network management tasks of paramount importance for operational networks. Directly measuring a complete set of these demands is prohibitively expensive because of the huge amounts of data that must be collected and the performance impact that such measurements would impose on the regular behavior of the network. As a consequence, we must rely on statistical techniques to produce estimates of actual traffic demands from partial information. The performance of such techniques is however limited due to their reliance on limited information and the high amount of computations they incur, which limits their convergence behavior. In this paper we study a two-step approach for inferring network traffic demands. First we elaborate and evaluate a modeling approach for generating good starting points to be fed to iterative statistical inference techniques. We call these starting points informed priors since they are obtained using actual network information such as packet traces and SNMP link counts. Second we provide a very fast variant of the EM algorithm which extends its computation range, increasing its accuracy and decreasing its dependence on the quality of the starting point. Finally, we evaluate and compare alternative mechanisms for generating starting points and the convergence characteristics of our EM algorithm against a recently proposed Weighted Least Squares approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent measurements of local-area and wide-area traffic have shown that network traffic exhibits variability at a wide range of scales self-similarity. In this paper, we examine a mechanism that gives rise to self-similar network traffic and present some of its performance implications. The mechanism we study is the transfer of files or messages whose size is drawn from a heavy-tailed distribution. We examine its effects through detailed transport-level simulations of multiple TCP streams in an internetwork. First, we show that in a "realistic" client/server network environment i.e., one with bounded resources and coupling among traffic sources competing for resources the degree to which file sizes are heavy-tailed can directly determine the degree of traffic self-similarity at the link level. We show that this causal relationship is not significantly affected by changes in network resources (bottleneck bandwidth and buffer capacity), network topology, the influence of cross-traffic, or the distribution of interarrival times. Second, we show that properties of the transport layer play an important role in preserving and modulating this relationship. In particular, the reliable transmission and flow control mechanisms of TCP (Reno, Tahoe, or Vegas) serve to maintain the long-range dependency structure induced by heavy-tailed file size distributions. In contrast, if a non-flow-controlled and unreliable (UDP-based) transport protocol is used, the resulting traffic shows little self-similar characteristics: although still bursty at short time scales, it has little long-range dependence. If flow-controlled, unreliable transport is employed, the degree of traffic self-similarity is positively correlated with the degree of throttling at the source. Third, in exploring the relationship between file sizes, transport protocols, and self-similarity, we are also able to show some of the performance implications of self-similarity. We present data on the relationship between traffic self-similarity and network performance as captured by performance measures including packet loss rate, retransmission rate, and queueing delay. Increased self-similarity, as expected, results in degradation of performance. Queueing delay, in particular, exhibits a drastic increase with increasing self-similarity. Throughput-related measures such as packet loss and retransmission rate, however, increase only gradually with increasing traffic self-similarity as long as reliable, flow-controlled transport protocol is used.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Anomalies are unusual and significant changes in a network's traffic levels, which can often involve multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of high-dimensional, noisy data. In this paper we propose a general method to diagnose anomalies. This method is based on a separation of the high-dimensional space occupied by a set of network traffic measurements into disjoint subspaces corresponding to normal and anomalous network conditions. We show that this separation can be performed effectively using Principal Component Analysis. Using only simple traffic measurements from links, we study volume anomalies and show that the method can: (1) accurately detect when a volume anomaly is occurring; (2) correctly identify the underlying origin-destination (OD) flow which is the source of the anomaly; and (3) accurately estimate the amount of traffic involved in the anomalous OD flow. We evaluate the method's ability to diagnose (i.e., detect, identify, and quantify) both existing and synthetically injected volume anomalies in real traffic from two backbone networks. Our method consistently diagnoses the largest volume anomalies, and does so with a very low false alarm rate.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In a recent paper, Structural Analysis of Network Traffic Flows, we analyzed the set of Origin Destination traffic flows from the Sprint-Europe and Abilene backbone networks. This report presents the complete set of results from analyzing data from both networks. The results in this report are specific to the Sprint-1 and Abilene datasets studied in the above paper. The following results are presented here: 1 Rows of Principal Matrix (V) 2 1.1 Sprint-1 Dataset ................................ 2 1.2 Abilene Dataset.................................. 9 2 Set of Eigenflows 14 2.1 Sprint-1 Dataset.................................. 14 2.2 Abilene Dataset................................... 21 3 Classifying Eigenflows 26 3.1 Sprint-1 Dataset.................................. 26 3.2 Abilene Datase.................................... 44

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Previous studies have shown that giving preferential treatment to short jobs helps reduce the average system response time, especially when the job size distribution possesses the heavy-tailed property. Since it has been shown that the TCP flow length distribution also has the same property, it is natural to let short TCP flows enjoy better service inside the network. Analyzing such discriminatory system requires modification to traditional job scheduling models since usually network traffic managers do not have detailed knowledge about individual flows such as their lengths. The Multi-Level (ML) queue, proposed by Kleinrock, can b e used to characterize such system. In an ML queueing system, the priority of a flow is reduced as the flow stays longer. We present an approximate analysis of the ML queueing system to obtain a closed-form solution of the average system response time function for general flow size distributions. We show that the response time of short flows can be significantly reduced without penalizing long flows.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The increasing practicality of large-scale flow capture makes it possible to conceive of traffic analysis methods that detect and identify a large and diverse set of anomalies. However the challenge of effectively analyzing this massive data source for anomaly diagnosis is as yet unmet. We argue that the distributions of packet features (IP addresses and ports) observed in flow traces reveals both the presence and the structure of a wide range of anomalies. Using entropy as a summarization tool, we show that the analysis of feature distributions leads to significant advances on two fronts: (1) it enables highly sensitive detection of a wide range of anomalies, augmenting detections by volume-based methods, and (2) it enables automatic classification of anomalies via unsupervised learning. We show that using feature distributions, anomalies naturally fall into distinct and meaningful clusters. These clusters can be used to automatically classify anomalies and to uncover new anomaly types. We validate our claims on data from two backbone networks (Abilene and Geant) and conclude that feature distributions show promise as a key element of a fairly general network anomaly diagnosis framework.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We analyzed the logs of our departmental HTTP server http://cs-www.bu.edu as well as the logs of the more popular Rolling Stones HTTP server http://www.stones.com. These servers have very different purposes; the former caters primarily to local clients, whereas the latter caters exclusively to remote clients all over the world. In both cases, our analysis showed that remote HTTP accesses were confined to a very small subset of documents. Using a validated analytical model of server popularity and file access profiles, we show that by disseminating the most popular documents on servers (proxies) closer to the clients, network traffic could be reduced considerably, while server loads are balanced. We argue that this process could be generalized so as to provide for an automated demand-based duplication of documents. We believe that such server-based information dissemination protocols will be more effective at reducing both network bandwidth and document retrieval times than client-based caching protocols [2].

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Recently the notion of self-similarity has been shown to apply to wide-area and local-area network traffic. In this paper we examine the mechanisms that give rise to self-similar network traffic. We present an explanation for traffic self-similarity by using a particular subset of wide area traffic: traffic due to the World Wide Web (WWW). Using an extensive set of traces of actual user executions of NCSA Mosaic, reflecting over half a million requests for WWW documents, we show evidence that WWW traffic is self-similar. Then we show that the self-similarity in such traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in file transfer, the effect of user "think time", and the superimposition of many such transfers in a local area network. To do this we rely on empirically measured distributions both from our traces and from data independently collected at over thirty WWW sites.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The exploding demand for services like the World Wide Web reflects the potential that is presented by globally distributed information systems. The number of WWW servers world-wide has doubled every 3 to 5 months since 1993, outstripping even the growth of the Internet. At each of these self-managed sites, the Common Gateway Interface (CGI) and Hypertext Transfer Protocol (HTTP) already constitute a rudimentary basis for contributing local resources to remote collaborations. However, the Web has serious deficiencies that make it unsuited for use as a true medium for metacomputing --- the process of bringing hardware, software, and expertise from many geographically dispersed sources to bear on large scale problems. These deficiencies are, paradoxically, the direct result of the very simple design principles that enabled its exponential growth. There are many symptoms of the problems exhibited by the Web: disk and network resources are consumed extravagantly; information search and discovery are difficult; protocols are aimed at data movement rather than task migration, and ignore the potential for distributing computation. However, all of these can be seen as aspects of a single problem: as a distributed system for metacomputing, the Web offers unpredictable performance and unreliable results. The goal of our project is to use the Web as a medium (within either the global Internet or an enterprise intranet) for metacomputing in a reliable way with performance guarantees. We attack this problem one four levels: (1) Resource Management Services: Globally distributed computing allows novel approaches to the old problems of performance guarantees and reliability. Our first set of ideas involve setting up a family of real-time resource management models organized by the Web Computing Framework with a standard Resource Management Interface (RMI), a Resource Registry, a Task Registry, and resource management protocols to allow resource needs and availability information be collected and disseminated so that a family of algorithms with varying computational precision and accuracy of representations can be chosen to meet realtime and reliability constraints. (2) Middleware Services: Complementary to techniques for allocating and scheduling available resources to serve application needs under realtime and reliability constraints, the second set of ideas aim at reduce communication latency, traffic congestion, server work load, etc. We develop customizable middleware services to exploit application characteristics in traffic analysis to drive new server/browser design strategies (e.g., exploit self-similarity of Web traffic), derive document access patterns via multiserver cooperation, and use them in speculative prefetching, document caching, and aggressive replication to reduce server load and bandwidth requirements. (3) Communication Infrastructure: Finally, to achieve any guarantee of quality of service or performance, one must get at the network layer that can provide the basic guarantees of bandwidth, latency, and reliability. Therefore, the third area is a set of new techniques in network service and protocol designs. (4) Object-Oriented Web Computing Framework A useful resource management system must deal with job priority, fault-tolerance, quality of service, complex resources such as ATM channels, probabilistic models, etc., and models must be tailored to represent the best tradeoff for a particular setting. This requires a family of models, organized within an object-oriented framework, because no one-size-fits-all approach is appropriate. This presents a software engineering challenge requiring integration of solutions at all levels: algorithms, models, protocols, and profiling and monitoring tools. The framework captures the abstract class interfaces of the collection of cooperating components, but allows the concretization of each component to be driven by the requirements of a specific approach and environment.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The data streaming model provides an attractive framework for one-pass summarization of massive data sets at a single observation point. However, in an environment where multiple data streams arrive at a set of distributed observation points, sketches must be computed remotely and then must be aggregated through a hierarchy before queries may be conducted. As a result, many sketch-based methods for the single stream case do not apply directly, as either the error introduced becomes large, or because the methods assume that the streams are non-overlapping. These limitations hinder the application of these techniques to practical problems in network traffic monitoring and aggregation in sensor networks. To address this, we develop a general framework for evaluating and enabling robust computation of duplicate-sensitive aggregate functions (e.g., SUM and QUANTILE), over data produced by distributed sources. We instantiate our approach by augmenting the Count-Min and Quantile-Digest sketches to apply in this distributed setting, and analyze their performance. We conclude with experimental evaluation to validate our analysis.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

MPLS (Multi-Protocol Label Switching) has recently emerged to facilitate the engineering of network traffic. This can be achieved by directing packet flows over paths that satisfy multiple requirements. MPLS has been regarded as an enhancement to traditional IP routing, which has the following problems: (1) all packets with the same IP destination address have to follow the same path through the network; and (2) paths have often been computed based on static and single link metrics. These problems may cause traffic concentration, and thus degradation in quality of service. In this paper, we investigate by simulations a range of routing solutions and examine the tradeoff between scalability and performance. At one extreme, IP packet routing using dynamic link metrics provides a stateless solution but may lead to routing oscillations. At the other extreme, we consider a recently proposed Profile-based Routing (PBR), which uses knowledge of potential ingress-egress pairs as well as the traffic profile among them. Minimum Interference Routing (MIRA) is another recently proposed MPLS-based scheme, which only exploits knowledge of potential ingress-egress pairs but not their traffic profile. MIRA and the more conventional widest-shortest path (WSP) routing represent alternative MPLS-based approaches on the spectrum of routing solutions. We compare these solutions in terms of utility, bandwidth acceptance ratio as well as their scalability (routing state and computational overhead) and load balancing capability. While the simplest of the per-flow algorithms we consider, the performance of WSP is close to dynamic per-packet routing, without the potential instabilities of dynamic routing.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Version 1.1 of the Hyper Text Transfer Protocol (HTTP) was principally developed as a means for reducing both document transfer latency and network traffic. The rationale for the performance enhancements in HTTP/1.1 is based on the assumption that the network is the bottleneck in Web transactions. In practice, however, the Web server can be the primary source of document transfer latency. In this paper, we characterize and compare the performance of HTTP/1.0 and HTTP/1.1 in terms of throughput at the server and transfer latency at the client. Our approach is based on considering a broader set of bottlenecks in an HTTP transfer; we examine how bottlenecks in the network, CPU, and in the disk system affect the relative performance of HTTP/1.0 versus HTTP/1.1. We show that the network demands under HTTP/1.1 are somewhat lower than HTTP/1.0, and we quantify those differences in terms of packets transferred, server congestion window size and data bytes per packet. We show that when the CPU is the bottleneck, there is relatively little difference in performance between HTTP/1.0 and HTTP/1.1. Surprisingly, we show that when the disk system is the bottleneck, performance using HTTP/1.1 can be much worse than with HTTP/1.0. Based on these observations, we suggest a connection management policy for HTTP/1.1 that can improve throughput, decrease latency, and keep network traffic low when the disk system is the bottleneck.