911 resultados para 1145
Resumo:
Smoothed functional (SF) schemes for gradient estimation are known to be efficient in stochastic optimization algorithms, especially when the objective is to improve the performance of a stochastic system However, the performance of these methods depends on several parameters, such as the choice of a suitable smoothing kernel. Different kernels have been studied in the literature, which include Gaussian, Cauchy, and uniform distributions, among others. This article studies a new class of kernels based on the q-Gaussian distribution, which has gained popularity in statistical physics over the last decade. Though the importance of this family of distributions is attributed to its ability to generalize the Gaussian distribution, we observe that this class encompasses almost all existing smoothing kernels. This motivates us to study SF schemes for gradient estimation using the q-Gaussian distribution. Using the derived gradient estimates, we propose two-timescale algorithms for optimization of a stochastic objective function in a constrained setting with a projected gradient search approach. We prove the convergence of our algorithms to the set of stationary points of an associated ODE. We also demonstrate their performance numerically through simulations on a queuing model.
Resumo:
Regions in video streams attracting human interest contribute significantly to human understanding of the video. Being able to predict salient and informative Regions of Interest (ROIs) through a sequence of eye movements is a challenging problem. Applications such as content-aware retargeting of videos to different aspect ratios while preserving informative regions and smart insertion of dialog (closed-caption text) into the video stream can significantly be improved using the predicted ROIs. We propose an interactive human-in-the-loop framework to model eye movements and predict visual saliency into yet-unseen frames. Eye tracking and video content are used to model visual attention in a manner that accounts for important eye-gaze characteristics such as temporal discontinuities due to sudden eye movements, noise, and behavioral artifacts. A novel statistical-and algorithm-based method gaze buffering is proposed for eye-gaze analysis and its fusion with content-based features. Our robust saliency prediction is instantiated for two challenging and exciting applications. The first application alters video aspect ratios on-the-fly using content-aware video retargeting, thus making them suitable for a variety of display sizes. The second application dynamically localizes active speakers and places dialog captions on-the-fly in the video stream. Our method ensures that dialogs are faithful to active speaker locations and do not interfere with salient content in the video stream. Our framework naturally accommodates personalisation of the application to suit biases and preferences of individual users.
Resumo:
Representatives of several Internet service providers (ISPs) have expressed their wish to see a substantial change in the pricing policies of the Internet. In particular, they would like to see content providers (CPs) pay for use of the network, given the large amount of resources they use. This would be in clear violation of the ``network neutrality'' principle that had characterized the development of the wireline Internet. Our first goal in this article is to propose and study possible ways of implementing such payments and of regulating their amount. We introduce a model that includes the users' behavior, the utilities of the ISP and of the CPs, and, the monetary flow that involves the content users, the ISP and CP, and, in pUrticular, the CP's revenues from advertisements. We consider various game models and study the resulting equilibria; they are all combinations of a noncooperative game (in which the ISPs and CPs determine how much they will charge the users) with a ``cooperative'' one on how the CP and the ISP share the payments. We include in our model a possible asymmetric weighting parameter (that varies between zero to one). We also study equilibria that arise when one of the CPs colludes with the TSP. We also study two dynamic game models as well as the convergence of prices to the equilibrium values.
Resumo:
Today's programming languages are supported by powerful third-party APIs. For a given application domain, it is common to have many competing APIs that provide similar functionality. Programmer productivity therefore depends heavily on the programmer's ability to discover suitable APIs both during an initial coding phase, as well as during software maintenance. The aim of this work is to support the discovery and migration of math APIs. Math APIs are at the heart of many application domains ranging from machine learning to scientific computations. Our approach, called MATHFINDER, combines executable specifications of mathematical computations with unit tests (operational specifications) of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code comprised of API methods to compute the expression by mining unit tests of the API methods. We present a sequential version of our unit test mining algorithm and also design a more scalable data-parallel version. We perform extensive evaluation of MATHFINDER (1) for API discovery, where math algorithms are to be implemented from scratch and (2) for API migration, where client programs utilizing a math API are to be migrated to another API. We evaluated the precision and recall of MATHFINDER on a diverse collection of math expressions, culled from algorithms used in a wide range of application areas such as control systems and structural dynamics. In a user study to evaluate the productivity gains obtained by using MATHFINDER for API discovery, the programmers who used MATHFINDER finished their programming tasks twice as fast as their counterparts who used the usual techniques like web and code search, IDE code completion, and manual inspection of library documentation. For the problem of API migration, as a case study, we used MATHFINDER to migrate Weka, a popular machine learning library. Overall, our evaluation shows that MATHFINDER is easy to use, provides highly precise results across several math APIs and application domains even with a small number of unit tests per method, and scales to large collections of unit tests.
Resumo:
Programming environments for smartphones expose a concurrency model that combines multi-threading and asynchronous event-based dispatch. While this enables the development of efficient and feature-rich applications, unforeseen thread interleavings coupled with non-deterministic reorderings of asynchronous tasks can lead to subtle concurrency errors in the applications. In this paper, we formalize the concurrency semantics of the Android programming model. We further define the happens-before relation for Android applications, and develop a dynamic race detection technique based on this relation. Our relation generalizes the so far independently studied happens-before relations for multi-threaded programs and single-threaded event-driven programs. Additionally, our race detection technique uses a model of the Android runtime environment to reduce false positives. We have implemented a tool called DROIDRACER. It generates execution traces by systematically testing Android applications and detects data races by computing the happens-before relation on the traces. We analyzed 1 5 Android applications including popular applications such as Facebook, Twitter and K-9 Mail. Our results indicate that data races are prevalent in Android applications, and that DROIDRACER is an effective tool to identify data races.
Resumo:
It is shown how to use non-commutative stopping times in order to stop the CCR flow of arbitrary index and also its isometric cocycles, i.e. left operator Markovian cocycles on Boson Fock space. Stopping the CCR flow yields a homomorphism from the semigroup of stopping times, equipped with the convolution product, into the semigroup of unital endomorphisms of the von Neumann algebra of bounded operators on the ambient Fock space. The operators produced by stopping cocycles themselves satisfy a cocycle relation.
Resumo:
Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for intertask synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallelmodels in turn enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth, and, as a side effect, reduced power consumption. In this article, we develop a systematic approach to compile loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of interband parallelism (cross-barrier parallelism) in a number of numerical computations that range from linear algebra to structured grids and image processing. The proposed static analysis and code generation alleviates the burden of a full-blown dependence resolver to track the readiness of tasks at runtime. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative dataflow task-parallel language with explicit intertask dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.
Resumo:
Designing and implementing thread-safe multithreaded libraries can be a daunting task as developers of these libraries need to ensure that their implementations are free from concurrency bugs, including deadlocks. The usual practice involves employing software testing and/or dynamic analysis to detect. deadlocks. Their effectiveness is dependent on well-designed multithreaded test cases. Unsurprisingly, developing multithreaded tests is significantly harder than developing sequential tests for obvious reasons. In this paper, we address the problem of automatically synthesizing multithreaded tests that can induce deadlocks. The key insight to our approach is that a subset of the properties observed when a deadlock manifests in a concurrent execution can also be observed in a single threaded execution. We design a novel, automatic, scalable and directed approach that identifies these properties and synthesizes a deadlock revealing multithreaded test. The input to our approach is the library implementation under consideration and the output is a set of deadlock revealing multithreaded tests. We have implemented our approach as part of a tool, named OMEN1. OMEN is able to synthesize multithreaded tests on many multithreaded Java libraries. Applying a dynamic deadlock detector on the execution of the synthesized tests results in the detection of a number of deadlocks, including 35 real deadlocks in classes documented as thread-safe. Moreover, our experimental results show that dynamic analysis on multithreaded tests that are either synthesized randomly or developed by third-party programmers are ineffective in detecting the deadlocks.
Resumo:
Dynamic analysis techniques have been proposed to detect potential deadlocks. Analyzing and comprehending each potential deadlock to determine whether the deadlock is feasible in a real execution requires significant programmer effort. Moreover, empirical evidence shows that existing analyses are quite imprecise. This imprecision of the analyses further void the manual effort invested in reasoning about non-existent defects. In this paper, we address the problems of imprecision of existing analyses and the subsequent manual effort necessary to reason about deadlocks. We propose a novel approach for deadlock detection by designing a dynamic analysis that intelligently leverages execution traces. To reduce the manual effort, we replay the program by making the execution follow a schedule derived based on the observed trace. For a real deadlock, its feasibility is automatically verified if the replay causes the execution to deadlock. We have implemented our approach as part of WOLF and have analyzed many large (upto 160KLoC) Java programs. Our experimental results show that we are able to identify 74% of the reported defects as true (or false) positives automatically leaving very few defects for manual analysis. The overhead of our approach is negligible making it a compelling tool for practical adoption.
Resumo:
In geographical forwarding of packets in a large wireless sensor network (WSN) with sleep-wake cycling nodes, we are interested in the local decision problem faced by a node that has ``custody'' of a packet and has to choose one among a set of next-hop relay nodes to forward the packet toward the sink. Each relay is associated with a ``reward'' that summarizes the benefit of forwarding the packet through that relay. We seek a solution to this local problem, the idea being that such a solution, if adopted by every node, could provide a reasonable heuristic for the end-to-end forwarding problem. Toward this end, we propose a local relay selection problem consisting of a forwarding node and a collection of relay nodes, with the relays waking up sequentially at random times. At each relay wake-up instant, the forwarder can choose to probe a relay to learn its reward value, based on which the forwarder can then decide whether to stop (and forward its packet to the chosen relay) or to continue to wait for further relays to wake up. The forwarder's objective is to select a relay so as to minimize a combination of waiting delay, reward, and probing cost. The local decision problem can be considered as a variant of the asset selling problem studied in the operations research literature. We formulate the local problem as a Markov decision process (MDP) and characterize the solution in terms of stopping sets and probing sets. We provide results illustrating the structure of the stopping sets, namely, the (lower bound) threshold and the stage independence properties. Regarding the probing sets, we make an interesting conjecture that these sets are characterized by upper bounds. Through simulation experiments, we provide valuable insights into the performance of the optimal local forwarding and its use as an end-to-end forwarding heuristic.
Resumo:
The Lattice-Boltzmann method (LBM), a promising new particle-based simulation technique for complex and multiscale fluid flows, has seen tremendous adoption in recent years in computational fluid dynamics. Even with a state-of-the-art LBM solver such as Palabos, a user has to still manually write the program using library-supplied primitives. We propose an automated code generator for a class of LBM computations with the objective to achieve high performance on modern architectures. Few studies have looked at time tiling for LBM codes. We exploit a key similarity between stencils and LBM to enable polyhedral optimizations and in turn time tiling for LBM. We also characterize the performance of LBM with the Roofline performance model. Experimental results for standard LBM simulations like Lid Driven Cavity, Flow Past Cylinder, and Poiseuille Flow show that our scheme consistently outperforms Palabos-on average by up to 3x while running on 16 cores of an Intel Xeon (Sandybridge). We also obtain an improvement of 2.47x on the SPEC LBM benchmark.
Resumo:
Subtle concurrency errors in multithreaded libraries that arise because of incorrect or inadequate synchronization are often difficult to pinpoint precisely using only static techniques. On the other hand, the effectiveness of dynamic race detectors is critically dependent on multithreaded test suites whose execution can be used to identify and trigger races. Usually, such multithreaded tests need to invoke a specific combination of methods with objects involved in the invocations being shared appropriately to expose a race. Without a priori knowledge of the race, construction of such tests can be challenging. In this paper, we present a lightweight and scalable technique for synthesizing precisely these kinds of tests. Given a multithreaded library and a sequential test suite, we describe a fully automated analysis that examines sequential execution traces, and produces as its output a concurrent client program that drives shared objects via library method calls to states conducive for triggering a race. Experimental results on a variety of well-tested Java libraries yield 101 synthesized multithreaded tests in less than four minutes. Analyzing the execution of these tests using an off-the-shelf race detector reveals 187 harmful races, including several previously unreported ones.
Resumo:
Graph algorithms have been shown to possess enough parallelism to keep several computing resources busy-even hundreds of cores on a GPU. Unfortunately, tuning their implementation for efficient execution on a particular hardware configuration of heterogeneous systems consisting of multicore CPUs and GPUs is challenging, time consuming, and error prone. To address these issues, we propose a domain-specific language (DSL), Falcon, for implementing graph algorithms that (i) abstracts the hardware, (ii) provides constructs to write explicitly parallel programs at a higher level, and (iii) can work with general algorithms that may change the graph structure (morph algorithms). We illustrate the usage of our DSL to implement local computation algorithms (that do not change the graph structure) and morph algorithms such as Delaunay mesh refinement, survey propagation, and dynamic SSSP on GPU and multicore CPUs. Using a set of benchmark graphs, we illustrate that the generated code performs close to the state-of-the-art hand-tuned implementations.
Resumo:
This paper presents the design and implementation of PolyMage, a domain-specific language and compiler for image processing pipelines. An image processing pipeline can be viewed as a graph of interconnected stages which process images successively. Each stage typically performs one of point-wise, stencil, reduction or data-dependent operations on image pixels. Individual stages in a pipeline typically exhibit abundant data parallelism that can be exploited with relative ease. However, the stages also require high memory bandwidth preventing effective utilization of parallelism available on modern architectures. For applications that demand high performance, the traditional options are to use optimized libraries like OpenCV or to optimize manually. While using libraries precludes optimization across library routines, manual optimization accounting for both parallelism and locality is very tedious. The focus of our system, PolyMage, is on automatically generating high-performance implementations of image processing pipelines expressed in a high-level declarative language. Our optimization approach primarily relies on the transformation and code generation capabilities of the polyhedral compiler framework. To the best of our knowledge, this is the first model-driven compiler for image processing pipelines that performs complex fusion, tiling, and storage optimization automatically. Experimental results on a modern multicore system show that the performance achieved by our automatic approach is up to 1.81x better than that achieved through manual tuning in Halide, a state-of-the-art language and compiler for image processing pipelines. For a camera raw image processing pipeline, our performance is comparable to that of a hand-tuned implementation.
Resumo:
We propose the generation of Standard Model fermion hierarchy by the extension of renormalizable SO(10) GUT with O(N (g) ) family gauge symmetry. In this scenario, Higgs representations of SO(10) also carry family indices and are called Yukawons. Vacuum expectation values of these Yukawon fields break GUT and family symmetry and generate MSSM Yukawa couplings dynamically. We have demonstrated this idea using Higgs irrep, ignoring the contribution of 1 2 0-plet which is, however, required for complete fitting of fermion mass-mixing data. The effective MSSM matter fermion couplings to the light Higgs pair are determined by the null eigenvectors of the MSSM-type Higgs doublet superfield mass matrix . A consistency condition on the doublet (1,2,+/- 1]) mass matrix ( 0) is required to keep one pair of Higgs doublets light in the effective MSSM. We show that the Yukawa structure generated by null eigenvectors of are of generic kind required by the MSSM. A hidden sector with a pair of (S (a b) ; I center dot (a b) ) fields breaks supersymmetry and facilitates 0. SUSY breaking is communicated via supergravity. In this scenario, matter fermion Yukawa couplings are reduced from 15 to just 3 parameters in MSGUT with three generations.