14 resultados para Probabilistic Algorithms
em Boston University Digital Common
Resumo:
Large probabilistic graphs arise in various domains spanning from social networks to biological and communication networks. An important query in these graphs is the k nearest-neighbor query, which involves finding and reporting the k closest nodes to a specific node. This query assumes the existence of a measure of the "proximity" or the "distance" between any two nodes in the graph. To that end, we propose various novel distance functions that extend well known notions of classical graph theory, such as shortest paths and random walks. We argue that many meaningful distance functions are computationally intractable to compute exactly. Thus, in order to process nearest-neighbor queries, we resort to Monte Carlo sampling and exploit novel graph-transformation ideas and pruning opportunities. In our extensive experimental analysis, we explore the trade-offs of our approximation algorithms and demonstrate that they scale well on real-world probabilistic graphs with tens of millions of edges.
Resumo:
The Google AdSense Program is a successful internet advertisement program where Google places contextual adverts on third-party websites and shares the resulting revenue with each publisher. Advertisers have budgets and bid on ad slots while publishers set reserve prices for the ad slots on their websites. Following previous modelling efforts, we model the program as a two-sided market with advertisers on one side and publishers on the other. We show a reduction from the Generalised Assignment Problem (GAP) to the problem of computing the revenue maximising allocation and pricing of publisher slots under a first-price auction. GAP is APX-hard but a (1-1/e) approximation is known. We compute truthful and revenue-maximizing prices and allocation of ad slots to advertisers under a second-price auction. The auctioneer's revenue is within (1-1/e) second-price optimal.
Resumo:
For communication-intensive parallel applications, the maximum degree of concurrency achievable is limited by the communication throughput made available by the network. In previous work [HPS94], we showed experimentally that the performance of certain parallel applications running on a workstation network can be improved significantly if a congestion control protocol is used to enhance network performance. In this paper, we characterize and analyze the communication requirements of a large class of supercomputing applications that fall under the category of fixed-point problems, amenable to solution by parallel iterative methods. This results in a set of interface and architectural features sufficient for the efficient implementation of the applications over a large-scale distributed system. In particular, we propose a direct link between the application and network layer, supporting congestion control actions at both ends. This in turn enhances the system's responsiveness to network congestion, improving performance. Measurements are given showing the efficacy of our scheme to support large-scale parallel computations.
Resumo:
This paper investigates the power of genetic algorithms at solving the MAX-CLIQUE problem. We measure the performance of a standard genetic algorithm on an elementary set of problem instances consisting of embedded cliques in random graphs. We indicate the need for improvement, and introduce a new genetic algorithm, the multi-phase annealed GA, which exhibits superior performance on the same problem set. As we scale up the problem size and test on \hard" benchmark instances, we notice a degraded performance in the algorithm caused by premature convergence to local minima. To alleviate this problem, a sequence of modi cations are implemented ranging from changes in input representation to systematic local search. The most recent version, called union GA, incorporates the features of union cross-over, greedy replacement, and diversity enhancement. It shows a marked speed-up in the number of iterations required to find a given solution, as well as some improvement in the clique size found. We discuss issues related to the SIMD implementation of the genetic algorithms on a Thinking Machines CM-5, which was necessitated by the intrinsically high time complexity (O(n3)) of the serial algorithm for computing one iteration. Our preliminary conclusions are: (1) a genetic algorithm needs to be heavily customized to work "well" for the clique problem; (2) a GA is computationally very expensive, and its use is only recommended if it is known to find larger cliques than other algorithms; (3) although our customization e ort is bringing forth continued improvements, there is no clear evidence, at this time, that a GA will have better success in circumventing local minima.
Resumo:
The performance of a randomized version of the subgraph-exclusion algorithm (called Ramsey) for CLIQUE by Boppana and Halldorsson is studied on very large graphs. We compare the performance of this algorithm with the performance of two common heuristic algorithms, the greedy heuristic and a version of simulated annealing. These algorithms are tested on graphs with up to 10,000 vertices on a workstation and graphs as large as 70,000 vertices on a Connection Machine. Our implementations establish the ability to run clique approximation algorithms on very large graphs. We test our implementations on a variety of different graphs. Our conclusions indicate that on randomly generated graphs minor changes to the distribution can cause dramatic changes in the performance of the heuristic algorithms. The Ramsey algorithm, while not as good as the others for the most common distributions, seems more robust and provides a more even overall performance. In general, and especially on deterministically generated graphs, a combination of simulated annealing with either the Ramsey algorithm or the greedy heuristic seems to perform best. This combined algorithm works particularly well on large Keller and Hamming graphs and has a competitive overall performance on the DIMACS benchmark graphs.
Resumo:
The exploding demand for services like the World Wide Web reflects the potential that is presented by globally distributed information systems. The number of WWW servers world-wide has doubled every 3 to 5 months since 1993, outstripping even the growth of the Internet. At each of these self-managed sites, the Common Gateway Interface (CGI) and Hypertext Transfer Protocol (HTTP) already constitute a rudimentary basis for contributing local resources to remote collaborations. However, the Web has serious deficiencies that make it unsuited for use as a true medium for metacomputing --- the process of bringing hardware, software, and expertise from many geographically dispersed sources to bear on large scale problems. These deficiencies are, paradoxically, the direct result of the very simple design principles that enabled its exponential growth. There are many symptoms of the problems exhibited by the Web: disk and network resources are consumed extravagantly; information search and discovery are difficult; protocols are aimed at data movement rather than task migration, and ignore the potential for distributing computation. However, all of these can be seen as aspects of a single problem: as a distributed system for metacomputing, the Web offers unpredictable performance and unreliable results. The goal of our project is to use the Web as a medium (within either the global Internet or an enterprise intranet) for metacomputing in a reliable way with performance guarantees. We attack this problem one four levels: (1) Resource Management Services: Globally distributed computing allows novel approaches to the old problems of performance guarantees and reliability. Our first set of ideas involve setting up a family of real-time resource management models organized by the Web Computing Framework with a standard Resource Management Interface (RMI), a Resource Registry, a Task Registry, and resource management protocols to allow resource needs and availability information be collected and disseminated so that a family of algorithms with varying computational precision and accuracy of representations can be chosen to meet realtime and reliability constraints. (2) Middleware Services: Complementary to techniques for allocating and scheduling available resources to serve application needs under realtime and reliability constraints, the second set of ideas aim at reduce communication latency, traffic congestion, server work load, etc. We develop customizable middleware services to exploit application characteristics in traffic analysis to drive new server/browser design strategies (e.g., exploit self-similarity of Web traffic), derive document access patterns via multiserver cooperation, and use them in speculative prefetching, document caching, and aggressive replication to reduce server load and bandwidth requirements. (3) Communication Infrastructure: Finally, to achieve any guarantee of quality of service or performance, one must get at the network layer that can provide the basic guarantees of bandwidth, latency, and reliability. Therefore, the third area is a set of new techniques in network service and protocol designs. (4) Object-Oriented Web Computing Framework A useful resource management system must deal with job priority, fault-tolerance, quality of service, complex resources such as ATM channels, probabilistic models, etc., and models must be tailored to represent the best tradeoff for a particular setting. This requires a family of models, organized within an object-oriented framework, because no one-size-fits-all approach is appropriate. This presents a software engineering challenge requiring integration of solutions at all levels: algorithms, models, protocols, and profiling and monitoring tools. The framework captures the abstract class interfaces of the collection of cooperating components, but allows the concretization of each component to be driven by the requirements of a specific approach and environment.
Resumo:
The increased diversity of Internet application requirements has spurred recent interests in transport protocols with flexible transmission controls. In window-based congestion control schemes, increase rules determine how to probe available bandwidth, whereas decrease rules determine how to back off when losses due to congestion are detected. The parameterization of these control rules is done so as to ensure that the resulting protocol is TCP-friendly in terms of the relationship between throughput and loss rate. In this paper, we define a new spectrum of window-based congestion control algorithms that are TCP-friendly as well as TCP-compatible under RED. Contrary to previous memory-less controls, our algorithms utilize history information in their control rules. Our proposed algorithms have two salient features: (1) They enable a wider region of TCP-friendliness, and thus more flexibility in trading off among smoothness, aggressiveness, and responsiveness; and (2) they ensure a faster convergence to fairness under a wide range of system conditions. We demonstrate analytically and through extensive ns simulations the steady-state and transient behaviors of several instances of this new spectrum of algorithms. In particular, SIMD is one instance in which the congestion window is increased super-linearly with time since the detection of the last loss. Compared to recently proposed TCP-friendly AIMD and binomial algorithms, we demonstrate the superiority of SIMD in: (1) adapting to sudden increases in available bandwidth, while maintaining competitive smoothness and responsiveness; and (2) rapidly converging to fairness and efficiency.
Resumo:
The increasing diversity of Internet application requirements has spurred recent interest in transport protocols with flexible transmission controls. In window-based congestion control schemes, increase rules determine how to probe available bandwidth, whereas decrease rules determine how to back off when losses due to congestion are detected. The control rules are parameterized so as to ensure that the resulting protocol is TCP-friendly in terms of the relationship between throughput and loss rate. This paper presents a comprehensive study of a new spectrum of window-based congestion controls, which are TCP-friendly as well as TCP-compatible under RED. Our controls utilize history information in their control rules. By doing so, they improve the transient behavior, compared to recently proposed slowly-responsive congestion controls such as general AIMD and binomial controls. Our controls can achieve better tradeoffs among smoothness, aggressiveness, and responsiveness, and they can achieve faster convergence. We demonstrate analytically and through extensive ns simulations the steady-state and transient behavior of several instances of this new spectrum.
Resumo:
In this paper we present Statistical Rate Monotonic Scheduling (SRMS), a generalization of the classical RMS results of Liu and Layland that allows scheduling periodic tasks with highly variable execution times and statistical QoS requirements. Similar to RMS, SRMS has two components: a feasibility test and a scheduling algorithm. The feasibility test for SRMS ensures that using SRMS' scheduling algorithms, it is possible for a given periodic task set to share a given resource (e.g. a processor, communication medium, switching device, etc.) in such a way that such sharing does not result in the violation of any of the periodic tasks QoS constraints. The SRMS scheduling algorithm incorporates a number of unique features. First, it allows for fixed priority scheduling that keeps the tasks' value (or importance) independent of their periods. Second, it allows for job admission control, which allows the rejection of jobs that are not guaranteed to finish by their deadlines as soon as they are released, thus enabling the system to take necessary compensating actions. Also, admission control allows the preservation of resources since no time is spent on jobs that will miss their deadlines anyway. Third, SRMS integrates reservation-based and best-effort resource scheduling seamlessly. Reservation-based scheduling ensures the delivery of the minimal requested QoS; best-effort scheduling ensures that unused, reserved bandwidth is not wasted, but rather used to improve QoS further. Fourth, SRMS allows a system to deal gracefully with overload conditions by ensuring a fair deterioration in QoS across all tasks---as opposed to penalizing tasks with longer periods, for example. Finally, SRMS has the added advantage that its schedulability test is simple and its scheduling algorithm has a constant overhead in the sense that the complexity of the scheduler is not dependent on the number of the tasks in the system. We have evaluated SRMS against a number of alternative scheduling algorithms suggested in the literature (e.g. RMS and slack stealing), as well as refinements thereof, which we describe in this paper. Consistently throughout our experiments, SRMS provided the best performance. In addition, to evaluate the optimality of SRMS, we have compared it to an inefficient, yet optimal scheduler for task sets with harmonic periods.
Resumo:
Statistical Rate Monotonic Scheduling (SRMS) is a generalization of the classical RMS results of Liu and Layland [LL73] for periodic tasks with highly variable execution times and statistical QoS requirements. The main tenet of SRMS is that the variability in task resource requirements could be smoothed through aggregation to yield guaranteed QoS. This aggregation is done over time for a given task and across multiple tasks for a given period of time. Similar to RMS, SRMS has two components: a feasibility test and a scheduling algorithm. SRMS feasibility test ensures that it is possible for a given periodic task set to share a given resource without violating any of the statistical QoS constraints imposed on each task in the set. The SRMS scheduling algorithm consists of two parts: a job admission controller and a scheduler. The SRMS scheduler is a simple, preemptive, fixed-priority scheduler. The SRMS job admission controller manages the QoS delivered to the various tasks through admit/reject and priority assignment decisions. In particular, it ensures the important property of task isolation, whereby tasks do not infringe on each other. In this paper we present the design and implementation of SRMS within the KURT Linux Operating System [HSPN98, SPH 98, Sri98]. KURT Linux supports conventional tasks as well as real-time tasks. It provides a mechanism for transitioning from normal Linux scheduling to a mixed scheduling of conventional and real-time tasks, and to a focused mode where only real-time tasks are scheduled. We overview the technical issues that we had to overcome in order to integrate SRMS into KURT Linux and present the API we have developed for scheduling periodic real-time tasks using SRMS.
Resumo:
Web caching aims to reduce network traffic, server load, and user-perceived retrieval delays by replicating "popular" content on proxy caches that are strategically placed within the network. While key to effective cache utilization, popularity information (e.g. relative access frequencies of objects requested through a proxy) is seldom incorporated directly in cache replacement algorithms. Rather, other properties of the request stream (e.g. temporal locality and content size), which are easier to capture in an on-line fashion, are used to indirectly infer popularity information, and hence drive cache replacement policies. Recent studies suggest that the correlation between these secondary properties and popularity is weakening due in part to the prevalence of efficient client and proxy caches (which tend to mask these correlations). This trend points to the need for proxy cache replacement algorithms that directly capture and use popularity information. In this paper, we (1) present an on-line algorithm that effectively captures and maintains an accurate popularity profile of Web objects requested through a caching proxy, (2) propose a novel cache replacement policy that uses such information to generalize the well-known GreedyDual-Size algorithm, and (3) show the superiority of our proposed algorithm by comparing it to a host of recently-proposed and widely-used algorithms using extensive trace-driven simulations and a variety of performance metrics.
Resumo:
A non-linear supervised learning architecture, the Specialized Mapping Architecture (SMA) and its application to articulated body pose reconstruction from single monocular images is described. The architecture is formed by a number of specialized mapping functions, each of them with the purpose of mapping certain portions (connected or not) of the input space, and a feedback matching process. A probabilistic model for the architecture is described along with a mechanism for learning its parameters. The learning problem is approached using a maximum likelihood estimation framework; we present Expectation Maximization (EM) algorithms for two different instances of the likelihood probability. Performance is characterized by estimating human body postures from low level visual features, showing promising results.
Resumo:
Before choosing, it helps to know both the expected value signaled by a predictive cue and the associated uncertainty that the reward will be forthcoming. Recently, Fiorillo et al. (2003) found the dopamine (DA) neurons of the SNc exhibit sustained responses related to the uncertainty that a cure will be followed by reward, in addition to phasic responses related to reward prediction errors (RPEs). This suggests that cue-dependent anticipations of the timing, magnitude, and uncertainty of rewards are learned and reflected in components of the DA signals broadcast by SNc neurons. What is the minimal local circuit model that can explain such multifaceted reward-related learning? A new computational model shows how learned uncertainty responses emerge robustly on single trial along with phasic RPE responses, such that both types of DA responses exhibit the empirically observed dependence on conditional probability, expected value of reward, and time since onset of the reward-predicting cue. The model includes three major pathways for computing: immediate expected values of cures, timed predictions of reward magnitudes (and RPEs), and the uncertainty associated with these predictions. The first two model pathways refine those previously modeled by Brown et al. (1999). A third, newly modeled, pathway is formed by medium spiny projection neurons (MSPNs) of the matrix compartment of the striatum, whose axons co-release GABA and a neuropeptide, substance P, both at synapses with GABAergic neurons in the SNr and with the dendrites (in SNr) of DA neurons whose somas are in ventral SNc. Co-release enables efficient computation of sustained DA uncertainty responses that are a non-monotonic function of the conditonal probability that a reward will follow the cue. The new model's incorporation of a striatal microcircuit allowed it to reveals that variability in striatal cholinergic transmission can explain observed difference, between monkeys, in the amplitutude of the non-monotonic uncertainty function. Involvement of matriceal MSPNs and striatal cholinergic transmission implpies a relation between uncertainty in the cue-reward contigency and action-selection functions of the basal ganglia. The model synthesizes anatomical, electrophysiological and behavioral data regarding the midbrain DA system in a novel way, by relating the ability to compute uncertainty, in parallel with other aspects of reward contingencies, to the unique distribution of SP inputs in ventral SN.
Resumo:
A new family of neural network architectures is presented. This family of architectures solves the problem of constructing and training minimal neural network classification expert systems by using switching theory. The primary insight that leads to the use of switching theory is that the problem of minimizing the number of rules and the number of IF statements (antecedents) per rule in a neural network expert system can be recast into the problem of minimizing the number of digital gates and the number of connections between digital gates in a Very Large Scale Integrated (VLSI) circuit. The rules that the neural network generates to perform a task are readily extractable from the network's weights and topology. Analysis and simulations on the Mushroom database illustrate the system's performance.