45 resultados para National Biodiversity Network
em Boston University Digital Common
Resumo:
Numerous problems exist that can be modeled as traffic through a network in which constraints exist to regulate flow. Vehicular road travel, computer networks, and cloud based resource distribution, among others all have natural representations in this manner. As these networks grow in size and/or complexity, analysis and certification of the safety invariants becomes increasingly costly. The NetSketch formalism introduces a lightweight verification framework that allows for greater scalability than traditional analysis methods. The NetSketch tool was developed to provide the power of this formalism in an easy to use and intuitive user interface.
Resumo:
The Transmission Control Protocol (TCP) has been the protocol of choice for many Internet applications requiring reliable connections. The design of TCP has been challenged by the extension of connections over wireless links. We ask a fundamental question: What is the basic predictive power of TCP of network state, including wireless error conditions? The goal is to improve or readily exploit this predictive power to enable TCP (or variants) to perform well in generalized network settings. To that end, we use Maximum Likelihood Ratio tests to evaluate TCP as a detector/estimator. We quantify how well network state can be estimated, given network response such as distributions of packet delays or TCP throughput that are conditioned on the type of packet loss. Using our model-based approach and extensive simulations, we demonstrate that congestion-induced losses and losses due to wireless transmission errors produce sufficiently different statistics upon which an efficient detector can be built; distributions of network loads can provide effective means for estimating packet loss type; and packet delay is a better signal of network state than short-term throughput. We demonstrate how estimation accuracy is influenced by different proportions of congestion versus wireless losses and penalties on incorrect estimation.
Resumo:
Formal correctness of complex multi-party network protocols can be difficult to verify. While models of specific fixed compositions of agents can be checked against design constraints, protocols which lend themselves to arbitrarily many compositions of agents-such as the chaining of proxies or the peering of routers-are more difficult to verify because they represent potentially infinite state spaces and may exhibit emergent behaviors which may not materialize under particular fixed compositions. We address this challenge by developing an algebraic approach that enables us to reduce arbitrary compositions of network agents into a behaviorally-equivalent (with respect to some correctness property) compact, canonical representation, which is amenable to mechanical verification. Our approach consists of an algebra and a set of property-preserving rewrite rules for the Canonical Homomorphic Abstraction of Infinite Network protocol compositions (CHAIN). Using CHAIN, an expression over our algebra (i.e., a set of configurations of network protocol agents) can be reduced to another behaviorally-equivalent expression (i.e., a smaller set of configurations). Repeated applications of such rewrite rules produces a canonical expression which can be checked mechanically. We demonstrate our approach by characterizing deadlock-prone configurations of HTTP agents, as well as establishing useful properties of an overlay protocol for scheduling MPEG frames, and of a protocol for Web intra-cache consistency.
Resumo:
Network traffic arises from the superposition of Origin-Destination (OD) flows. Hence, a thorough understanding of OD flows is essential for modeling network traffic, and for addressing a wide variety of problems including traffic engineering, traffic matrix estimation, capacity planning, forecasting and anomaly detection. However, to date, OD flows have not been closely studied, and there is very little known about their properties. We present the first analysis of complete sets of OD flow timeseries, taken from two different backbone networks (Abilene and Sprint-Europe). Using Principal Component Analysis (PCA), we find that the set of OD flows has small intrinsic dimension. In fact, even in a network with over a hundred OD flows, these flows can be accurately modeled in time using a small number (10 or less) of independent components or dimensions. We also show how to use PCA to systematically decompose the structure of OD flow timeseries into three main constituents: common periodic trends, short-lived bursts, and noise. We provide insight into how the various constituents contribute to the overall structure of OD flows and explore the extent to which this decomposition varies over time.
Resumo:
Anomalies are unusual and significant changes in a network's traffic levels, which can often involve multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of high-dimensional, noisy data. In this paper we propose a general method to diagnose anomalies. This method is based on a separation of the high-dimensional space occupied by a set of network traffic measurements into disjoint subspaces corresponding to normal and anomalous network conditions. We show that this separation can be performed effectively using Principal Component Analysis. Using only simple traffic measurements from links, we study volume anomalies and show that the method can: (1) accurately detect when a volume anomaly is occurring; (2) correctly identify the underlying origin-destination (OD) flow which is the source of the anomaly; and (3) accurately estimate the amount of traffic involved in the anomalous OD flow. We evaluate the method's ability to diagnose (i.e., detect, identify, and quantify) both existing and synthetically injected volume anomalies in real traffic from two backbone networks. Our method consistently diagnoses the largest volume anomalies, and does so with a very low false alarm rate.
Resumo:
Accurate knowledge of traffic demands in a communication network enables or enhances a variety of traffic engineering and network management tasks of paramount importance for operational networks. Directly measuring a complete set of these demands is prohibitively expensive because of the huge amounts of data that must be collected and the performance impact that such measurements would impose on the regular behavior of the network. As a consequence, we must rely on statistical techniques to produce estimates of actual traffic demands from partial information. The performance of such techniques is however limited due to their reliance on limited information and the high amount of computations they incur, which limits their convergence behavior. In this paper we study a two-step approach for inferring network traffic demands. First we elaborate and evaluate a modeling approach for generating good starting points to be fed to iterative statistical inference techniques. We call these starting points informed priors since they are obtained using actual network information such as packet traces and SNMP link counts. Second we provide a very fast variant of the EM algorithm which extends its computation range, increasing its accuracy and decreasing its dependence on the quality of the starting point. Finally, we evaluate and compare alternative mechanisms for generating starting points and the convergence characteristics of our EM algorithm against a recently proposed Weighted Least Squares approach.
Resumo:
Detecting and understanding anomalies in IP networks is an open and ill-defined problem. Toward this end, we have recently proposed the subspace method for anomaly diagnosis. In this paper we present the first large-scale exploration of the power of the subspace method when applied to flow traffic. An important aspect of this approach is that it fuses information from flow measurements taken throughout a network. We apply the subspace method to three different types of sampled flow traffic in a large academic network: multivariate timeseries of byte counts, packet counts, and IP-flow counts. We show that each traffic type brings into focus a different set of anomalies via the subspace method. We illustrate and classify the set of anomalies detected. We find that almost all of the anomalies detected represent events of interest to network operators. Furthermore, the anomalies span a remarkably wide spectrum of event types, including denial of service attacks (single-source and distributed), flash crowds, port scanning, downstream traffic engineering, high-rate flows, worm propagation, and network outage.
Resumo:
Recent measurements of local-area and wide-area traffic have shown that network traffic exhibits variability at a wide range of scales self-similarity. In this paper, we examine a mechanism that gives rise to self-similar network traffic and present some of its performance implications. The mechanism we study is the transfer of files or messages whose size is drawn from a heavy-tailed distribution. We examine its effects through detailed transport-level simulations of multiple TCP streams in an internetwork. First, we show that in a "realistic" client/server network environment i.e., one with bounded resources and coupling among traffic sources competing for resources the degree to which file sizes are heavy-tailed can directly determine the degree of traffic self-similarity at the link level. We show that this causal relationship is not significantly affected by changes in network resources (bottleneck bandwidth and buffer capacity), network topology, the influence of cross-traffic, or the distribution of interarrival times. Second, we show that properties of the transport layer play an important role in preserving and modulating this relationship. In particular, the reliable transmission and flow control mechanisms of TCP (Reno, Tahoe, or Vegas) serve to maintain the long-range dependency structure induced by heavy-tailed file size distributions. In contrast, if a non-flow-controlled and unreliable (UDP-based) transport protocol is used, the resulting traffic shows little self-similar characteristics: although still bursty at short time scales, it has little long-range dependence. If flow-controlled, unreliable transport is employed, the degree of traffic self-similarity is positively correlated with the degree of throttling at the source. Third, in exploring the relationship between file sizes, transport protocols, and self-similarity, we are also able to show some of the performance implications of self-similarity. We present data on the relationship between traffic self-similarity and network performance as captured by performance measures including packet loss rate, retransmission rate, and queueing delay. Increased self-similarity, as expected, results in degradation of performance. Queueing delay, in particular, exhibits a drastic increase with increasing self-similarity. Throughput-related measures such as packet loss and retransmission rate, however, increase only gradually with increasing traffic self-similarity as long as reliable, flow-controlled transport protocol is used.
Resumo:
One role for workload generation is as a means for understanding how servers and networks respond to variation in load. This enables management and capacity planning based on current and projected usage. This paper applies a number of observations of Web server usage to create a realistic Web workload generation tool which mimics a set of real users accessing a server. The tool, called Surge (Scalable URL Reference Generator) generates references matching empirical measurements of 1) server file size distribution; 2) request size distribution; 3) relative file popularity; 4) embedded file references; 5) temporal locality of reference; and 6) idle periods of individual users. This paper reviews the essential elements required in the generation of a representative Web workload. It also addresses the technical challenges to satisfying this large set of simultaneous constraints on the properties of the reference stream, the solutions we adopted, and their associated accuracy. Finally, we present evidence that Surge exercises servers in a manner significantly different from other Web server benchmarks.
Resumo:
The development and deployment of distributed network-aware applications and services over the Internet require the ability to compile and maintain a model of the underlying network resources with respect to (one or more) characteristic properties of interest. To be manageable, such models must be compact, and must enable a representation of properties along temporal, spatial, and measurement resolution dimensions. In this paper, we propose a general framework for the construction of such metric-induced models using end-to-end measurements. We instantiate our approach using one such property, packet loss rates, and present an analytical framework for the characterization of Internet loss topologies. From the perspective of a server the loss topology is a logical tree rooted at the server with clients at its leaves, in which edges represent lossy paths between a pair of internal network nodes. We show how end-to-end unicast packet probing techniques could b e used to (1) infer a loss topology and (2) identify the loss rates of links in an existing loss topology. Correct, efficient inference of loss topology information enables new techniques for aggregate congestion control, QoS admission control, connection scheduling and mirror site selection. We report on simulation, implementation, and Internet deployment results that show the effectiveness of our approach and its robustness in terms of its accuracy and convergence over a wide range of network conditions.
Resumo:
In our previous work, we developed TRAFFIC(X), a specification language for modeling bi-directional network flows featuring a type system with constrained polymorphism. In this paper, we present two ways to customize the constraint system: (1) when using linear inequality constraints for the constraint system, TRAFFIC(X) can describe flows with numeric properties such as MTU (maximum transmission unit), RTT (round trip time), traversal order, and bandwidth allocation over parallel paths; (2) when using Boolean predicate constraints for the constraint system, TRAFFIC(X) can describe routing policies of an IP network. These examples illustrate how to use the customized type system.
Resumo:
Internet streaming applications are adversely affected by network conditions such as high packet loss rates and long delays. This paper aims at mitigating such effects by leveraging the availability of client-side caching proxies. We present a novel caching architecture (and associated cache management algorithms) that turn edge caches into accelerators of streaming media delivery. A salient feature of our caching algorithms is that they allow partial caching of streaming media objects and joint delivery of content from caches and origin servers. The caching algorithms we propose are both network-aware and stream-aware; they take into account the popularity of streaming media objects, their bit-rate requirements, and the available bandwidth between clients and servers. Using realistic models of Internet bandwidth (derived from proxy cache logs and measured over real Internet paths), we have conducted extensive simulations to evaluate the performance of various cache management alternatives. Our experiments demonstrate that network-aware caching algorithms can significantly reduce service delay and improve overall stream quality. Also, our experiments show that partial caching is particularly effective when bandwidth variability is not very high.
Resumo:
Formal tools like finite-state model checkers have proven useful in verifying the correctness of systems of bounded size and for hardening single system components against arbitrary inputs. However, conventional applications of these techniques are not well suited to characterizing emergent behaviors of large compositions of processes. In this paper, we present a methodology by which arbitrarily large compositions of components can, if sufficient conditions are proven concerning properties of small compositions, be modeled and completely verified by performing formal verifications upon only a finite set of compositions. The sufficient conditions take the form of reductions, which are claims that particular sequences of components will be causally indistinguishable from other shorter sequences of components. We show how this methodology can be applied to a variety of network protocol applications, including two features of the HTTP protocol, a simple active networking applet, and a proposed web cache consistency algorithm. We also doing discuss its applicability to framing protocol design goals and to representing systems which employ non-model-checking verification methodologies. Finally, we briefly discuss how we hope to broaden this methodology to more general topological compositions of network applications.
Resumo:
Overlay networks have been used for adding and enhancing functionality to the end-users without requiring modifications in the Internet core mechanisms. Overlay networks have been used for a variety of popular applications including routing, file sharing, content distribution, and server deployment. Previous work has focused on devising practical neighbor selection heuristics under the assumption that users conform to a specific wiring protocol. This is not a valid assumption in highly decentralized systems like overlay networks. Overlay users may act selfishly and deviate from the default wiring protocols by utilizing knowledge they have about the network when selecting neighbors to improve the performance they receive from the overlay. This thesis goes against the conventional thinking that overlay users conform to a specific protocol. The contributions of this thesis are threefold. It provides a systematic evaluation of the design space of selfish neighbor selection strategies in real overlays, evaluates the performance of overlay networks that consist of users that select their neighbors selfishly, and examines the implications of selfish neighbor and server selection to overlay protocol design and service provisioning respectively. This thesis develops a game-theoretic framework that provides a unified approach to modeling Selfish Neighbor Selection (SNS) wiring procedures on behalf of selfish users. The model is general, and takes into consideration costs reflecting network latency and user preference profiles, the inherent directionality in overlay maintenance protocols, and connectivity constraints imposed on the system designer. Within this framework the notion of user’s "best response" wiring strategy is formalized as a k-median problem on asymmetric distance and is used to obtain overlay structures in which no node can re-wire to improve the performance it receives from the overlay. Evaluation results presented in this thesis indicate that selfish users can reap substantial performance benefits when connecting to overlay networks composed of non-selfish users. In addition, in overlays that are dominated by selfish users, the resulting stable wirings are optimized to such great extent that even non-selfish newcomers can extract near-optimal performance through naïve wiring strategies. To capitalize on the performance advantages of optimal neighbor selection strategies and the emergent global wirings that result, this thesis presents EGOIST: an SNS-inspired overlay network creation and maintenance routing system. Through an extensive measurement study on the deployed prototype, results presented in this thesis show that EGOIST’s neighbor selection primitives outperform existing heuristics on a variety of performance metrics, including delay, available bandwidth, and node utilization. Moreover, these results demonstrate that EGOIST is competitive with an optimal but unscalable full-mesh approach, remains highly effective under significant churn, is robust to cheating, and incurs minimal overheads. This thesis also studies selfish neighbor selection strategies for swarming applications. The main focus is on n-way broadcast applications where each of n overlay user wants to push its own distinct file to all other destinations as well as download their respective data files. Results presented in this thesis demonstrate that the performance of our swarming protocol for n-way broadcast on top of overlays of selfish users is far superior than the performance on top of existing overlays. In the context of service provisioning, this thesis examines the use of distributed approaches that enable a provider to determine the number and location of servers for optimal delivery of content or services to its selfish end-users. To leverage recent advances in virtualization technologies, this thesis develops and evaluates a distributed protocol to migrate servers based on end-users demand and only on local topological knowledge. Results under a range of network topologies and workloads suggest that the performance of the distributed deployment is comparable to that of the optimal but unscalable centralized deployment.
Resumo:
NetSketch is a tool for the specification of constrained-flow applications and the certification of desirable safety properties imposed thereon. NetSketch is conceived to assist system integrators in two types of activities: modeling and design. As a modeling tool, it enables the abstraction of an existing system while retaining sufficient information about it to carry out future analysis of safety properties. As a design tool, NetSketch enables the exploration of alternative safe designs as well as the identification of minimal requirements for outsourced subsystems. NetSketch embodies a lightweight formal verification philosophy, whereby the power (but not the heavy machinery) of a rigorous formalism is made accessible to users via a friendly interface. NetSketch does so by exposing tradeoffs between exactness of analysis and scalability, and by combining traditional whole-system analysis with a more flexible compositional analysis. The compositional analysis is based on a strongly-typed Domain-Specific Language (DSL) for describing and reasoning about constrained-flow networks at various levels of sketchiness along with invariants that need to be enforced thereupon. In this paper, we define the formal system underlying the operation of NetSketch, in particular the DSL behind NetSketch's user-interface when used in "sketch mode", and prove its soundness relative to appropriately-defined notions of validity. In a companion paper [6], we overview NetSketch, highlight its salient features, and illustrate how it could be used in two applications: the management/shaping of traffic flows in a vehicular network (as a proxy for CPS applications) and in a streaming media network (as a proxy for Internet applications).