24 results for 13627-005
in Boston University Digital Common
Abstract:
The proliferation of inexpensive workstations and networks has prompted several researchers to use such distributed systems for parallel computing. Attempts have been made to offer a shared-memory programming model on such distributed memory computers. Most systems provide a shared memory that is coherent in that all processes that use it agree on the order of all memory events. This dissertation explores the possibility of a significant improvement in the performance of some applications when they use non-coherent memory. First, a new formal model to describe existing non-coherent memories is developed. I use this model to prove that certain problems can be solved using asynchronous iterative algorithms on shared memory in which the coherence constraints are substantially relaxed. In the course of the development of the model I discovered a new type of non-coherent behavior called Local Consistency. Second, a programming model, Mermera, is proposed. It provides programmers with a choice of hierarchically related non-coherent behaviors along with one coherent behavior. Thus, one can trade off the ease of programming with coherent memory for improved performance with non-coherent memory. As an example, I present a program to solve a linear system of equations using an asynchronous iterative algorithm. This program uses all the behaviors offered by Mermera. Third, I describe the implementation of Mermera on a BBN Butterfly TC2000 and on a network of workstations. The performance of a version of the equation-solving program that uses all the behaviors of Mermera is compared with that of a version that uses coherent behavior only. For a system of 1000 equations, the former exhibits at least a 5-fold improvement in convergence time over the latter. The version using coherent behavior only does not benefit from employing more than one workstation to solve the problem, while the program using non-coherent behavior continues to achieve improved performance as the number of workstations is increased from 1 to 6. This measurement corroborates our belief that non-coherent shared memory can be a performance boon for some applications.
Abstract:
By utilizing structure sharing among its parse trees, a GB parser can increase its efficiency dramatically. Using a GB parser whose phrase structure recovery component is an implementation of Tomita's algorithm (as described in [Tom86]), we investigate how a GB parser can preserve the structure sharing output by Tomita's algorithm. In this report, we discuss the implications of using Tomita's algorithm in GB parsing, and we give some details of the structure-sharing parser currently under construction. We also discuss a method of parallelizing a GB parser, and relate it to the existing literature on parallel GB parsing. Our approach to preserving sharing within a shared-packed forest is applicable not only to GB parsing, but also whenever we want to preserve structure sharing in a parse forest in the presence of features.
Abstract:
Recent studies have noted that vertex degree in the autonomous system (AS) graph exhibits a highly variable distribution [15, 22]. The most prominent explanatory model for this phenomenon is the Barabási-Albert (B-A) model [5, 2]. A central feature of the B-A model is preferential connectivity—meaning that the likelihood a new node in a growing graph will connect to an existing node is proportional to the existing node’s degree. In this paper we ask whether a more general explanation than the B-A model, and absent the assumption of preferential connectivity, is consistent with empirical data. We are motivated by two observations: first, AS degree and AS size are highly correlated [11]; and second, highly variable AS size can arise simply through exponential growth. We construct a model incorporating exponential growth in the size of the Internet, and in the number of ASes. We then show via analysis that such a model yields a size distribution exhibiting a power-law tail. In such a model, if an AS’s link formation is roughly proportional to its size, then AS degree will also show high variability. We instantiate such a model with empirically derived estimates of growth rates and show that the resulting degree distribution is in good agreement with that of real AS graphs.
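The step from exponential growth to a power-law tail can be illustrated with a short derivation. This is a simplified sketch, not the paper's model: it assumes the number of ASes grows at a constant exponential rate and that every AS grows at the same exponential rate from unit size. The symbols $\alpha$, $\beta$, $N_0$, $A$, and $X$ are introduced here for illustration only.

```latex
% Assumptions (ours, for illustration):
%   number of ASes at time t:            N(t) = N_0 e^{\beta t}
%   size at time t of an AS born at s:   X = e^{\alpha (t - s)}
% Age A = t - s of an AS chosen uniformly at random at time t (0 <= a <= t):
\[
  \Pr[A > a] \;=\; \frac{N(t-a)}{N(t)} \;=\; e^{-\beta a}.
\]
% Size tail, obtained by substituting a = (\ln x)/\alpha:
\[
  \Pr[X > x] \;=\; \Pr\!\left[A > \frac{\ln x}{\alpha}\right]
            \;=\; e^{-\beta \ln x / \alpha}
            \;=\; x^{-\beta/\alpha},
  \qquad 1 \le x \le e^{\alpha t}.
\]
```

Under these assumptions the size tail is a power law with exponent $\beta/\alpha$, and no preferential attachment is needed to obtain it.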
Abstract:
In this paper, we expose an unorthodox adversarial attack that exploits the transients of a system's adaptive behavior, as opposed to its limited steady-state capacity. We show that a well-orchestrated attack could introduce significant inefficiencies that could potentially deprive a network element of much of its capacity, or significantly reduce its service quality, while evading detection by consuming an unsuspicious, small fraction of that element's hijacked capacity. This type of attack stands in sharp contrast to traditional brute-force, sustained high-rate DoS attacks, as well as recently proposed attacks that exploit specific protocol settings such as TCP timeouts. We exemplify what we term Reduction of Quality (RoQ) attacks by exposing the vulnerabilities of common adaptation mechanisms. We develop control-theoretic models and associated metrics to quantify these vulnerabilities. We present numerical and simulation results, which we validate with observations from real Internet experiments. Our findings motivate the need for the development of adaptation mechanisms that are resilient to these new forms of attacks.
Abstract:
A problem with Speculative Concurrency Control (SCC) algorithms and other common concurrency control schemes using forward validation is that committing a transaction as soon as it finishes validating may result in a value loss to the system. Haritsa showed that by making a lower priority transaction wait after it is validated, the number of transactions meeting their deadlines is increased, which may result in a higher value-added to the system. SCC-based protocols can benefit from the introduction of such delays by giving optimistic shadows with high value-added to the system more time to execute and commit instead of being aborted in favor of other validating transactions whose value-added to the system is lower. In this paper we present and evaluate an extension to SCC algorithms that allows for commit deferments.
Abstract:
Programmers of parallel processes that communicate through shared globally distributed data structures (DDS) face a difficult choice. Either they must explicitly program DDS management, by partitioning or replicating it over multiple distributed memory modules, or be content with a high latency coherent (sequentially consistent) memory abstraction that hides the DDS' distribution. We present Mermera, a new formalism and system that enable a smooth spectrum of noncoherent shared memory behaviors to coexist between the above two extremes. Our approach allows us to define known noncoherent memories in a new simple way, to identify new memory behaviors, and to characterize generic mixed-behavior computations. The latter are useful for programming using multiple behaviors that complement each other's advantages. On the practical side, we show that the large class of programs that use asynchronous iterative methods (AIM) can run correctly on slow memory, one of the weakest, and hence most efficient and fault-tolerant, noncoherence conditions. An example AIM program to solve linear equations is developed to illustrate (1) the need for concurrently mixing memory behaviors and (2) the performance gains attainable via noncoherence. Other program classes tolerate weak memory consistency by synchronizing in such a way as to yield executions indistinguishable from coherent ones. AIM computations on noncoherent memory yield noncoherent, yet correct, computations. We report performance data that exemplifies the potential benefits of noncoherence, in terms of raw memory performance, as well as application speed.
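The claim that asynchronous iterative methods tolerate stale reads can be illustrated with a small single-threaded simulation. The sketch below is not Mermera's API or the authors' program: the function name `async_jacobi`, its parameters, and the staleness model (each worker may read a value up to `max_staleness` sweeps old, but never one older than it has already seen) are assumptions made for illustration. It relies on the matrix being strictly diagonally dominant.

```python
import random


def async_jacobi(A, b, sweeps=200, max_staleness=3, seed=0):
    """Jacobi iteration for A x = b in which reads may return stale values.

    Hypothetical illustration: worker i updates x[i] and reads every other
    x[j] from a version that may lag by up to `max_staleness` sweeps.
    """
    rng = random.Random(seed)
    n = len(b)
    versions = [[0.0] * n]                # versions[k][j]: x[j] after k sweeps
    seen = [[0] * n for _ in range(n)]    # seen[i][j]: newest version of x[j] read by worker i
    for k in range(sweeps):
        new = [0.0] * n
        for i in range(n):
            acc = 0.0
            for j in range(n):
                if j == i:
                    continue
                # Never read older than what worker i already saw,
                # and never older than the staleness bound allows.
                oldest = max(seen[i][j], k - max_staleness)
                seen[i][j] = rng.randint(oldest, k)
                acc += A[i][j] * versions[seen[i][j]][j]
            new[i] = (b[i] - acc) / A[i][i]
        versions.append(new)
    return versions[-1]


# Strictly diagonally dominant example whose exact solution is [1, 2, 3].
A = [[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]]
b = [6.0, 17.0, 22.0]
print(async_jacobi(A, b))
```

Setting `max_staleness=0` recovers ordinary synchronous Jacobi, which makes it easy to compare the two behaviors on the same system.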
Abstract:
ImageRover is a search-by-image-content navigation tool for the World Wide Web. To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, computing the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that selects the distance metrics appropriate for a particular query.
Abstract:
Recent measurement-based studies reveal that most Internet connections are short in terms of the amount of traffic they carry (mice), while a small fraction of the connections carry a large portion of the traffic (elephants). A careful study of the TCP protocol shows that without help from an Active Queue Management (AQM) policy, short connections tend to lose to long connections in their competition for bandwidth. This is because short connections do not gain detailed knowledge of the network state, and therefore they are doomed to be less competitive due to the conservative nature of the TCP congestion control algorithm. Inspired by the Differentiated Services (Diffserv) architecture, we propose to give preferential treatment to short connections inside the bottleneck queue, so that short connections experience a lower packet drop rate than long connections. This is done by employing the RIO (RED with In and Out) queue management policy, which uses different drop functions for different classes of traffic. Our simulation results show that: (1) in a highly loaded network, preferential treatment is necessary to provide short TCP connections with better response time and fairness without hurting the performance of long TCP connections; (2) the proposed scheme still delivers packets in a FIFO manner at each link, thus it maintains statistical multiplexing gain and does not misorder packets; (3) choosing a smaller default initial timeout value for TCP can help enhance the performance of short TCP flows, though not as effectively as our scheme and at the risk of congestion collapse; (4) in the worst case, our proposal works as well as a regular RED scheme, in terms of response time and goodput.
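The idea of using different drop functions for different traffic classes can be sketched as follows. This is a minimal illustration of a RIO-style drop decision, not the paper's configuration: the threshold values, the names `RedParams`, `SHORT`, `LONG`, and `rio_drop_probability`, and the mapping of short flows to the "In" class are all assumptions. The averaged queue lengths are taken as inputs; how they are estimated is outside the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedParams:
    min_th: float  # below this average queue length, never drop
    max_th: float  # at or above this average queue length, always drop
    max_p: float   # drop probability reached just below max_th


def red_drop_probability(avg_queue: float, p: RedParams) -> float:
    """Classic RED drop curve: 0, then a linear ramp up to max_p, then 1."""
    if avg_queue < p.min_th:
        return 0.0
    if avg_queue >= p.max_th:
        return 1.0
    return p.max_p * (avg_queue - p.min_th) / (p.max_th - p.min_th)


# Hypothetical parameters: short flows ("In") get the more lenient curve.
SHORT = RedParams(min_th=40, max_th=80, max_p=0.02)
LONG = RedParams(min_th=10, max_th=40, max_p=0.10)


def rio_drop_probability(is_short: bool, avg_short_queue: float,
                         avg_total_queue: float) -> float:
    """RIO-style decision: 'In' packets are judged against the average queue
    of 'In' packets only, 'Out' packets against the average total queue."""
    if is_short:
        return red_drop_probability(avg_short_queue, SHORT)
    return red_drop_probability(avg_total_queue, LONG)


print(rio_drop_probability(True, 25, 25), rio_drop_probability(False, 25, 25))
```

Because both classes share one queue and only the drop decision differs, packets that are admitted still leave in FIFO order, which matches point (2) of the abstract.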
Abstract:
Growing interest in inference and prediction of network characteristics is justified by its importance for a variety of network-aware applications. One widely adopted strategy to characterize network conditions relies on active, end-to-end probing of the network. Active end-to-end probing techniques differ in (1) the structural composition of the probes they use (e.g., number and size of packets, the destination of various packets, the protocols used, etc.), (2) the entity making the measurements (e.g., sender vs. receiver), and (3) the techniques used to combine measurements in order to infer specific metrics of interest. In this paper, we present PeriScope: a Linux API that enables the definition of new probing structures and inference techniques from user space through a flexible interface. PeriScope requires no support from clients beyond the ability to respond to ICMP ECHO REQUESTs and is designed to minimize user/kernel crossings and to ensure various constraints (e.g., back-to-back packet transmissions, fine-grained timing measurements). We show how to use PeriScope for two different probing purposes, namely the measurement of shared packet losses between pairs of endpoints and the measurement of subpath bandwidth. Results from Internet experiments for both of these goals are also presented.
Abstract:
Forwarding in DTNs is a challenging problem. We focus on the specific issue of forwarding in an environment where mobile devices are carried by people in a restricted physical space (e.g. a conference) and contact patterns are not predictable. We show for the first time a path explosion phenomenon between most pairs of nodes. This means that, once the first path reaches the destination, the number of subsequent paths grows rapidly with time, so there usually exist many near-optimal paths. We study the path explosion phenomenon both analytically and empirically. Our results highlight the importance of unequal contact rates across nodes for understanding the performance of forwarding algorithms. We also find that a variety of well-known forwarding algorithms show surprisingly similar performance in our setting and we interpret this fact in light of the path explosion phenomenon.
Abstract:
We define and construct efficient depth-universal and almost size-universal quantum circuits. Such circuits can be viewed as general purpose simulators for central classes of quantum circuits and can be used to capture the computational power of the circuit class being simulated. For depth, we construct universal circuits whose depth is of the same order as that of the circuits being simulated. For size, there is a log factor blow-up in the universal circuits constructed here. We prove that this construction is nearly optimal. Our results apply to a number of well-studied quantum circuit classes.
Abstract:
We propose a multi-object multi-camera framework for tracking large numbers of tightly-spaced objects that rapidly move in three dimensions. We formulate the problem of finding correspondences across multiple views as a multidimensional assignment problem and use a greedy randomized adaptive search procedure to solve this NP-hard problem efficiently. To account for occlusions, we relax the one-to-one constraint that one measurement corresponds to one object and iteratively solve the relaxed assignment problem. After correspondences are established, object trajectories are estimated by stereoscopic reconstruction using an epipolar-neighborhood search. We embedded our method into a tracker-to-tracker multi-view fusion system that not only obtains the three-dimensional trajectories of closely-moving objects but also accurately settles track uncertainties that could not be resolved from single views due to occlusion. We conducted experiments to validate our greedy assignment procedure and our technique to recover from occlusions. We successfully track hundreds of flying bats and provide an analysis of their group behavior based on 150 reconstructed 3D trajectories.
Abstract:
A number of recent studies have pointed out that TCP's performance over ATM networks tends to suffer, especially under congestion and switch buffer limitations. Switch-level enhancements and link-level flow control have been proposed to improve TCP's performance in ATM networks. Selective Cell Discard (SCD) and Early Packet Discard (EPD) ensure that partial packets are discarded from the network "as early as possible", thus reducing wasted bandwidth. While such techniques improve the achievable throughput, their effectiveness tends to degrade in multi-hop networks. In this paper, we introduce Lazy Packet Discard (LPD), an AAL-level enhancement that improves effective throughput, reduces response time, and minimizes wasted bandwidth for TCP/IP over ATM. In contrast to the SCD and EPD policies, LPD delays as much as possible the removal from the network of cells belonging to a partially communicated packet. We outline the implementation of LPD and show the performance advantage of TCP/LPD, compared to plain TCP and TCP/EPD through analysis and simulations.
Abstract:
An improved technique for 3D head tracking under varying illumination conditions is proposed. The head is modeled as a texture mapped cylinder. Tracking is formulated as an image registration problem in the cylinder's texture map image. The resulting dynamic texture map provides a stabilized view of the face that can be used as input to many existing 2D techniques for face recognition, facial expression analysis, lip reading, and eye tracking. To solve the registration problem in the presence of lighting variation and head motion, the residual error of registration is modeled as a linear combination of texture warping templates and orthogonal illumination templates. Fast and stable on-line tracking is achieved via regularized, weighted least squares minimization of the registration error. The regularization term tends to limit potential ambiguities that arise in the warping and illumination templates. It enables stable tracking over extended sequences. Tracking does not require a precise initial fit of the model; the system is initialized automatically using a simple 2D face detector. The only assumption is that the target is facing the camera in the first frame of the sequence. The formulation is tailored to take advantage of texture mapping hardware available in many workstations, PCs, and game consoles. The non-optimized implementation runs at about 15 frames per second on an SGI O2 graphic workstation. Extensive experiments evaluating the effectiveness of the formulation are reported. The sensitivity of the technique to illumination, regularization parameters, errors in the initial positioning, and internal camera parameters is analyzed. Examples and applications of tracking are reported.
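The regularized, weighted least squares step can be written in a generic form. The notation below is not taken from the paper: $r$ (the registration residual), $B$ (a matrix whose columns are the warping and illumination templates), $q$ (the template coefficients), $W$ (a diagonal matrix of per-pixel weights), and $\Lambda$ (a positive semidefinite regularization matrix) are symbols assumed here for illustration.

```latex
% Generic regularized weighted least squares (notation assumed, not the paper's):
\[
  \hat{q} \;=\; \arg\min_{q}\;
     (r - Bq)^{\top} W \,(r - Bq) \;+\; q^{\top} \Lambda\, q .
\]
% Setting the gradient with respect to q to zero gives the normal equations
\[
  \left(B^{\top} W B + \Lambda\right)\hat{q} \;=\; B^{\top} W r
  \quad\Longrightarrow\quad
  \hat{q} \;=\; \left(B^{\top} W B + \Lambda\right)^{-1} B^{\top} W r .
\]
```

In this form the role of the regularization term is visible: adding $\Lambda$ keeps $B^{\top} W B + \Lambda$ well conditioned even when some warping and illumination templates are nearly collinear, which is one way to read the abstract's remark about limiting ambiguities.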
Abstract:
This paper focuses on an efficient user-level method for the deployment of application-specific extensions, using commodity operating systems and hardware. A sandboxing technique is described that supports multiple extensions within a shared virtual address space. Applications can register sandboxed code with the system, so that it may be executed in the context of any process. Such code may be used to implement generic routines and handlers for a class of applications, or system service extensions that complement the functionality of the core kernel. Using our approach, application-specific extensions can be written like conventional user-level code, utilizing libraries and system calls, with the advantage that they may be executed without the traditional costs of scheduling and context-switching between process-level protection domains. No special hardware support such as segmentation or tagged translation look-aside buffers (TLBs) is required. Instead, our "user-level sandboxing" mechanism requires only page-based virtual memory support, given that sandboxed extensions are either written by a trusted source or are guaranteed to be memory-safe (e.g., using type-safe languages). Using a fast method of upcalls, we show how our mechanism provides significant performance improvements over traditional methods of invoking user-level services. As an application of our approach, we have implemented a user-level network subsystem that avoids data copying via the kernel and, in many cases, yields far greater network throughput than kernel-level approaches.