970 results for Parallel programming
Abstract:
Graph analytics is an important and computationally demanding class of data analytics. It is essential to balance scalability, ease of use and high performance in large-scale graph analytics. As such, it is necessary to hide the complexity of parallelism, data distribution and memory locality behind an abstract interface. The aim of this work is to build a NUMA-aware, scalable graph analytics framework that does not demand significant parallel programming experience.
The realization of such a system faces two key problems:
(i) how to develop a scale-free parallel programming framework that scales efficiently across NUMA domains; (ii) how to efficiently apply graph partitioning in order to create separate and largely independent work items that can be distributed among threads.
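As a rough illustration of problem (ii), the C++ sketch below splits the vertex set of a CSR graph into contiguous ranges and processes each range on its own thread. The CSR layout, the `row_ptr`/`col_idx` names and the per-range degree computation are illustrative assumptions, not the framework described in the abstract, which would additionally pin threads and place data per NUMA domain.

```cpp
// Minimal sketch: partition vertices of a CSR graph into contiguous ranges
// and process each range on its own thread (illustrative only; a real
// NUMA-aware framework also pins threads and places memory per domain).
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

struct CsrGraph {
    std::vector<std::size_t> row_ptr;  // size = num_vertices + 1
    std::vector<std::size_t> col_idx;  // adjacency targets
    std::size_t num_vertices() const { return row_ptr.size() - 1; }
};

int main() {
    // Toy graph: 4 vertices, edges 0->1, 0->2, 1->2, 2->3, 3->0.
    CsrGraph g{{0, 2, 3, 4, 5}, {1, 2, 2, 3, 0}};

    const std::size_t num_threads = 2;
    const std::size_t n = g.num_vertices();
    std::vector<std::size_t> degree_sum(num_threads, 0);

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Each thread owns a contiguous, largely independent vertex range.
            std::size_t begin = t * n / num_threads;
            std::size_t end = (t + 1) * n / num_threads;
            for (std::size_t v = begin; v < end; ++v)
                degree_sum[t] += g.row_ptr[v + 1] - g.row_ptr[v];
        });
    }
    for (auto& w : workers) w.join();

    for (std::size_t t = 0; t < num_threads; ++t)
        std::cout << "partition " << t << ": degree sum " << degree_sum[t] << '\n';
}
```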
Abstract:
Visualization of vector fields plays an important role in research activities nowadays. Web applications allow fast, multi-platform and multi-device access to data, which results in the need for applications optimized for both high-performance and low-performance devices. Point trajectory calculation procedures usually perform repeated calculations because several points may lie on the same trajectory. This paper presents a new methodology to calculate point trajectories over a highly dense and uniformly distributed grid of points in which the trajectories are forced to lie on the points of the grid. Its advantages lie in a highly parallel computing architecture implementation and in the reduction of the computational effort needed to calculate the stream paths, since unnecessary calculations are avoided by reusing data across iterations. As a case study, the visualization of oceanic currents on the web platform is presented and analyzed, using WebGL as both the parallel computing architecture and the rendering Application Programming Interface.
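The abstract gives no code, so the CPU-side C++ sketch below only illustrates the reuse idea under assumed data: each grid point stores the index of the next grid point along the (pre-snapped) vector field, and any point whose trace reaches an already-labelled point joins that trajectory instead of recomputing it. The `next` array and the labelling scheme are invented for illustration; the WebGL implementation is not reproduced here.

```cpp
// Minimal sketch of trajectory reuse on a grid: a point whose trace reaches
// an already-labelled point reuses that trajectory instead of recomputing it.
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    // next[i] = index of the grid point the flow leads to from point i
    // (an assumed pre-computed snapping of the vector field to the grid).
    std::vector<int> next = {1, 2, 3, 3, 2, 1};

    std::vector<int> trajectory(next.size(), -1);  // -1 = not yet computed
    int next_id = 0;

    for (std::size_t start = 0; start < next.size(); ++start) {
        if (trajectory[start] != -1) continue;  // reuse previous work
        // Trace forward, collecting unlabelled points on the path.
        std::vector<std::size_t> path;
        std::size_t p = start;
        while (trajectory[p] == -1) {
            trajectory[p] = -2;  // mark "in progress" to stop on cycles
            path.push_back(p);
            p = static_cast<std::size_t>(next[p]);
        }
        // If we hit an existing trajectory, join it; otherwise open a new one.
        int id = (trajectory[p] >= 0) ? trajectory[p] : next_id++;
        for (std::size_t q : path) trajectory[q] = id;
    }

    for (std::size_t i = 0; i < trajectory.size(); ++i)
        std::cout << "point " << i << " -> trajectory " << trajectory[i] << '\n';
}
```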
Abstract:
This paper presents the implementation of a high-quality real-time 3D video system intended for 3D videoconferencing. Basically, the system is able to extract depth information from a pair of images coming from a short-baseline camera setup. The system is based on a variant of the adaptive support-weight algorithm applied on GPU-based architectures. The reason for doing so is to obtain real-time results without compromising accuracy, and also to reduce costs by using commodity hardware. The complete system runs over the GStreamer multimedia software platform to make it even more flexible. Moreover, an autostereoscopic display has been used as the end terminal for 3D content visualization.
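For orientation only, here is a heavily simplified grayscale CPU sketch of adaptive support-weight cost aggregation in C++ (weights taken from the reference image only, absolute-difference matching cost, invented image and parameters). It shows the weighting-and-aggregation pattern the abstract refers to, not the paper's GPU/GStreamer pipeline.

```cpp
// Grayscale toy of adaptive support-weight aggregation: each neighbour's
// matching cost is weighted by its intensity similarity and spatial
// proximity to the window centre; winner-takes-all over disparities.
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <vector>

int main() {
    const int W = 16, H = 8, radius = 3, max_disp = 4;
    const double gamma_sim = 10.0, gamma_prox = 7.0;  // assumed parameters

    // Toy rectified pair: a vertical edge shifted by 2 pixels.
    std::vector<int> left(W * H), right(W * H);
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            left[y * W + x] = (x < 8) ? 50 : 200;
            right[y * W + x] = (x < 6) ? 50 : 200;
        }

    auto at = [&](const std::vector<int>& img, int x, int y) {
        x = std::max(0, std::min(W - 1, x));
        y = std::max(0, std::min(H - 1, y));
        return img[y * W + x];
    };

    // Weighted aggregation and winner-takes-all disparity for one pixel.
    auto disparity = [&](int px, int py) {
        int best_d = 0;
        double best_cost = 1e30;
        for (int d = 0; d <= max_disp; ++d) {
            double num = 0.0, den = 0.0;
            for (int dy = -radius; dy <= radius; ++dy)
                for (int dx = -radius; dx <= radius; ++dx) {
                    int qx = px + dx, qy = py + dy;
                    double sim = std::abs(at(left, qx, qy) - at(left, px, py));
                    double prox = std::sqrt(double(dx * dx + dy * dy));
                    double w = std::exp(-(sim / gamma_sim + prox / gamma_prox));
                    double cost = std::abs(at(left, qx, qy) - at(right, qx - d, qy));
                    num += w * cost;
                    den += w;
                }
            if (num / den < best_cost) { best_cost = num / den; best_d = d; }
        }
        return best_d;
    };

    std::cout << "disparity at (8,4): " << disparity(8, 4) << '\n';  // expect 2
}
```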
Abstract:
String searching within a large corpus of data is an important component of digital forensic (DF) analysis techniques such as file carving. The continuing increase in capacity of consumer storage devices requires corresponding improvements to the performance of string searching techniques. As string searching is a trivially-parallelisable problem, GPGPU approaches are a natural fit, but previous studies have found that local storage presents an insurmountable performance bottleneck. We show that this need not be the case with modern hardware, and demonstrate substantial performance improvements from the use of single and multiple GPUs when searching for strings within a typical forensic disk image.
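A minimal CPU analogue of the chunked-search pattern, assuming a std::thread implementation rather than the authors' GPU code: the buffer is split into per-thread chunks with a (pattern length - 1) overlap so that matches straddling chunk borders are not missed.

```cpp
// Multi-threaded substring search over a buffer split into overlapping
// chunks; only matches that *start* inside a chunk are counted by its thread.
#include <algorithm>
#include <atomic>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

int main() {
    const std::string image = "....MZ......PK..MZ.....";  // stand-in for a disk image
    const std::string pattern = "MZ";
    const std::size_t num_threads = 4;

    std::atomic<std::size_t> total_matches{0};
    std::vector<std::thread> workers;

    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            std::size_t begin = t * image.size() / num_threads;
            std::size_t end = (t + 1) * image.size() / num_threads;
            // Extend the readable region so a match crossing the boundary is
            // still found, but only count matches starting in [begin, end).
            std::size_t limit = std::min(image.size(), end + pattern.size() - 1);
            for (std::size_t i = begin; i < end && i + pattern.size() <= limit; ++i)
                if (image.compare(i, pattern.size(), pattern) == 0)
                    total_matches.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();

    std::cout << "matches: " << total_matches.load() << '\n';  // expect 2
}
```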
Abstract:
Fast restoration of critical loads and non-black-start generators can significantly reduce the economic losses caused by power system blackouts. In a parallel power system restoration scenario, the sectionalization of restoration subsystems plays a very important role in determining the pickup of critical loads before synchronization. Most existing research mainly focuses on the startup of non-black-start generators. The restoration of critical loads, especially loads with cold load characteristics, has not yet been addressed in optimizing the subsystem divisions. As a result, sectionalized restoration subsystems cannot achieve the best coordination between the pickup of loads and the ramping of generators. In order to generate sectionalizing strategies that consider the pickup of critical loads in parallel power system restoration scenarios, this paper proposes an optimization model that accounts for power system constraints, the characteristics of cold load pickup and the features of generator startup. A bi-level programming approach is employed to solve the proposed sectionalizing model: the upper level addresses the optimal sectionalizing problem for the restoration subsystems, while the lower level minimizes the outage durations of critical loads. The proposed sectionalizing model has been validated on the New England 39-bus system and the IEEE 118-bus system, and further comparisons with some existing methods are carried out as well.
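As a generic illustration of the bi-level structure only (not the paper's model, and with invented load sizes, criticality weights and ramp rates), the toy C++ sketch below lets an upper level enumerate assignments of loads to two restoration subsystems, while a lower level schedules load pickup as generation capacity ramps and returns the weighted outage time.

```cpp
// Toy bi-level skeleton: upper level = choice of sectionalization,
// lower level = weighted outage time of that choice under a ramp limit.
#include <algorithm>
#include <iostream>
#include <vector>

struct Load { double mw; double weight; };  // weight ~ criticality

// Lower level: greedy pickup of loads (most critical first) as capacity ramps.
double weighted_outage(std::vector<Load> loads, double ramp_mw_per_min) {
    std::sort(loads.begin(), loads.end(),
              [](const Load& a, const Load& b) { return a.weight > b.weight; });
    double capacity_used = 0.0, total = 0.0;
    for (const Load& l : loads) {
        capacity_used += l.mw;
        double pickup_time = capacity_used / ramp_mw_per_min;  // minutes
        total += l.weight * pickup_time;
    }
    return total;
}

int main() {
    std::vector<Load> loads = {{30, 3.0}, {50, 1.0}, {20, 2.0}, {40, 1.5}};
    const double ramp_a = 5.0, ramp_b = 4.0;  // MW/min per subsystem

    double best = 1e30;
    unsigned best_mask = 0;
    // Upper level: brute force over assignments (bit i = 1 -> subsystem A).
    for (unsigned mask = 0; mask < (1u << loads.size()); ++mask) {
        std::vector<Load> a, b;
        for (std::size_t i = 0; i < loads.size(); ++i)
            (mask >> i & 1 ? a : b).push_back(loads[i]);
        double cost = weighted_outage(a, ramp_a) + weighted_outage(b, ramp_b);
        if (cost < best) { best = cost; best_mask = mask; }
    }
    std::cout << "best assignment mask: " << best_mask
              << ", weighted outage: " << best << '\n';
}
```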
Abstract:
A new method of specifying the syntax of programming languages, known as hierarchical language specifications (HLS), is proposed. Efficient parallel algorithms for parsing languages generated by HLS are presented. These algorithms run on an exclusive-read exclusive-write parallel random-access machine. They require O(n) processors and O(log² n) time, where n is the length of the string to be parsed. The most important feature of these algorithms is that they do not use a stack.
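The abstract does not spell out the HLS parsing algorithms, so the C++ sketch below instead illustrates a classic stack-free parallel parsing primitive: checking bracket nesting by computing per-chunk (depth delta, minimum prefix depth) summaries in parallel and combining them with an associative operator, which on a PRAM could be done as a log-depth reduction. It is an assumption-laden stand-in, not the algorithms of the paper.

```cpp
// Stack-free parallel check of bracket nesting via per-chunk summaries.
#include <algorithm>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

struct Summary { long delta; long min_prefix; };

int main() {
    const std::string input = "((a)(b(c)))";
    const std::size_t num_threads = 4;
    std::vector<Summary> part(num_threads);

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            std::size_t begin = t * input.size() / num_threads;
            std::size_t end = (t + 1) * input.size() / num_threads;
            long depth = 0, min_depth = 0;
            for (std::size_t i = begin; i < end; ++i) {
                if (input[i] == '(') ++depth;
                if (input[i] == ')') --depth;
                min_depth = std::min(min_depth, depth);
            }
            part[t] = {depth, min_depth};
        });
    }
    for (auto& w : workers) w.join();

    // Combine the per-chunk summaries left to right; the operator is
    // associative, so this step could itself be a parallel reduction.
    long depth = 0;
    bool ok = true;
    for (const Summary& s : part) {
        if (depth + s.min_prefix < 0) ok = false;  // some ')' had no match
        depth += s.delta;
    }
    ok = ok && depth == 0;
    std::cout << (ok ? "balanced" : "not balanced") << '\n';
}
```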
Abstract:
This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies that generate many, very large datasets and require increasingly high-dimensional mixture models with large numbers of mixture components. We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in the ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms and software design can lead to vast speed-up and, critically, enable statistical analyses that presently will not be performed due to compute time limitations in traditional computational environments. Supplemental materials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
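To make the data-parallel pattern concrete, here is a small CPU-thread sketch in C++ of a Gaussian mixture E-step, where each observation's responsibilities depend only on that observation and can therefore be computed independently. The two components and their parameters are invented, and this is not the article's GPU code.

```cpp
// Data-parallel E-step toy: per-observation responsibilities computed on
// independent threads (the same independence a GPU kernel would exploit).
#include <cmath>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    const std::vector<double> x = {-2.1, -1.8, 0.2, 1.9, 2.4, 2.2};
    const double mu[2] = {-2.0, 2.0}, sigma[2] = {0.5, 0.5}, pi[2] = {0.5, 0.5};

    auto normal_pdf = [](double v, double m, double s) {
        const double kPi = 3.14159265358979323846;
        return std::exp(-0.5 * (v - m) * (v - m) / (s * s)) /
               (s * std::sqrt(2.0 * kPi));
    };

    std::vector<double> resp0(x.size());  // responsibility of component 0
    const std::size_t num_threads = 2;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Each thread handles a disjoint, strided slice of observations.
            for (std::size_t i = t; i < x.size(); i += num_threads) {
                double p0 = pi[0] * normal_pdf(x[i], mu[0], sigma[0]);
                double p1 = pi[1] * normal_pdf(x[i], mu[1], sigma[1]);
                resp0[i] = p0 / (p0 + p1);
            }
        });
    }
    for (auto& w : workers) w.join();

    for (std::size_t i = 0; i < x.size(); ++i)
        std::cout << "x=" << x[i] << "  P(component 0) = " << resp0[i] << '\n';
}
```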
Abstract:
Functional and non-functional concerns require different programming effort, different techniques and different methodologies when attempting to program efficient parallel/distributed applications. In this work we present a "programmer oriented" methodology based on formal tools that permits reasoning about parallel/distributed program development and refinement. The proposed methodology is semi-formal in that it does not require the exploitation of highly formal tools and techniques, while providing palatable and effective support to programmers developing parallel/distributed applications, in particular when handling non-functional concerns.
Abstract:
Discrete optimization problems are very difficult to solve, even when the dimension is small. For most of them, the problem of finding an ε-approximate solution is already NP-hard.
Abstract:
We show a method for parallelizing top-down dynamic programs in a straightforward way by a careful choice of a lock-free shared hash table implementation and randomization of the order in which the dynamic program computes its subproblems. This generic approach is applied to dynamic programs for knapsack, shortest paths, and RNA structure alignment, as well as to a state-of-the-art solution for minimizing the maximum number of open stacks. Experimental results are provided on three different modern multicore architectures which show that this parallelization is effective and reasonably scalable. In particular, we obtain over 10 times speedup for 32 threads on the open stacks problem.
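A simplified C++ sketch of this approach for 0/1 knapsack follows: several threads run the same top-down recursion over a shared memo table built from 64-bit compare-and-swap slots (a very small stand-in for a production lock-free hash table), and each thread randomizes the order in which it explores the take/skip branches so the threads tend to memoise different subproblems first. The item data and table layout are invented for illustration.

```cpp
// Parallel top-down knapsack with a tiny CAS-based shared memo table.
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <iostream>
#include <random>
#include <thread>
#include <vector>

constexpr std::size_t kSlots = 1 << 14;
std::atomic<std::uint64_t> memo[kSlots];  // zero-initialised; 0 = empty slot

// Pack (key, value) into one word; value is stored as value + 1 so a stored
// entry is never the all-zero "empty" word.
std::uint64_t pack(std::uint32_t key, std::uint32_t value) {
    return (std::uint64_t(key) << 32) | std::uint64_t(value + 1u);
}

bool lookup(std::uint32_t key, std::uint32_t& value) {
    for (std::size_t i = 0; i < kSlots; ++i) {
        std::uint64_t e = memo[(key + i) % kSlots].load(std::memory_order_acquire);
        if (e == 0) return false;  // reached an empty slot: key not present
        if (std::uint32_t(e >> 32) == key) {
            value = std::uint32_t(e & 0xffffffffu) - 1u;
            return true;
        }
    }
    return false;
}

void insert(std::uint32_t key, std::uint32_t value) {
    for (std::size_t i = 0; i < kSlots; ++i) {
        std::uint64_t expected = 0;
        if (memo[(key + i) % kSlots].compare_exchange_strong(
                expected, pack(key, value),
                std::memory_order_release, std::memory_order_acquire))
            return;                                        // claimed empty slot
        if (std::uint32_t(expected >> 32) == key) return;  // already memoised
    }
}

const std::vector<int> weights = {3, 4, 5, 8, 9};
const std::vector<int> values = {4, 5, 6, 10, 12};

// Top-down knapsack; each thread randomises its branch order.
std::uint32_t solve(std::uint32_t item, std::uint32_t cap, std::mt19937& rng) {
    if (item == weights.size()) return 0;
    std::uint32_t key = item * 65536u + cap, cached = 0;
    if (lookup(key, cached)) return cached;

    std::uint32_t best = 0;
    bool take_first = rng() % 2 == 0;
    for (int branch = 0; branch < 2; ++branch) {
        bool take = (branch == 0) == take_first;
        if (!take)
            best = std::max(best, solve(item + 1, cap, rng));
        else if (cap >= std::uint32_t(weights[item]))
            best = std::max(best, std::uint32_t(values[item]) +
                                      solve(item + 1, cap - weights[item], rng));
    }
    insert(key, best);
    return best;
}

int main() {
    std::vector<std::thread> workers;
    for (unsigned seed = 0; seed < 4; ++seed)
        workers.emplace_back([seed] { std::mt19937 rng(seed); solve(0, 17, rng); });
    for (auto& w : workers) w.join();

    std::mt19937 rng(0);
    std::cout << "optimum value: " << solve(0, 17, rng) << '\n';  // 22, from the memo
}
```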
Abstract:
Compilation techniques such as those portrayed by the Warren Abstract Machine (WAM) have greatly improved the speed of execution of logic programs. The research presented herein is geared towards providing additional performance to logic programs through the use of parallelism, while preserving the conventional semantics of logic languages. Two areas to which special attention is given are the preservation of sequential performance and storage efficiency, and the use of low overhead mechanisms for controlling parallel execution. Accordingly, the techniques used for supporting parallelism are efficient extensions of those which have brought high inferencing speeds to sequential implementations. At a lower level, special attention is also given to design and simulation detail and to the architectural implications of the execution model behavior. This paper offers an overview of the basic concepts and techniques used in the parallel design, the simulation tools used, and some of the results obtained to date.
Abstract:
We present a technique to estimate accurate speedups for parallel logic programs with relative independence from characteristics of a given implementation or underlying parallel hardware. The proposed technique is based on gathering accurate data describing one execution at run-time, which is fed to a simulator. Alternative schedulings are then simulated and estimates computed for the corresponding speedups. A tool implementing the aforementioned techniques is presented, and its predictions are compared to the performance of real systems, showing good correlation.
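As a generic, assumption-laden illustration of replaying recorded execution data under alternative schedulings (not the paper's tool, which targets parallel logic programs), the C++ sketch below greedily schedules an invented task trace on a fixed number of processors and reports the resulting speedup estimates.

```cpp
// Estimate speedup by simulating a recorded task trace under a greedy
// scheduler with a fixed number of processors.
#include <algorithm>
#include <iostream>
#include <vector>

struct Task { double duration; std::vector<int> deps; };

// Earliest-finish-time list scheduling of a topologically ordered task list.
double simulate(const std::vector<Task>& tasks, int processors) {
    std::vector<double> finish(tasks.size(), 0.0);
    std::vector<double> proc_free(processors, 0.0);
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        double ready = 0.0;  // when all dependencies have finished
        for (int d : tasks[i].deps) ready = std::max(ready, finish[d]);
        // Pick the processor that becomes free first.
        auto it = std::min_element(proc_free.begin(), proc_free.end());
        double start = std::max(ready, *it);
        finish[i] = start + tasks[i].duration;
        *it = finish[i];
    }
    return *std::max_element(finish.begin(), finish.end());
}

int main() {
    // Toy trace: task 0 forks tasks 1, 2, 3; task 4 joins them.
    std::vector<Task> tasks = {
        {2.0, {}}, {4.0, {0}}, {3.0, {0}}, {3.0, {0}}, {1.0, {1, 2, 3}}};

    double seq = 0.0;
    for (const Task& t : tasks) seq += t.duration;

    for (int p : {1, 2, 4}) {
        double par = simulate(tasks, p);
        std::cout << p << " processors: simulated time " << par
                  << ", estimated speedup " << seq / par << '\n';
    }
}
```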
Abstract:
Incorporating the possibility of attaching attributes to variables in a logic programming system has been shown to allow the addition of general constraint solving capabilities to it. This approach is very attractive in that by adding a few primitives any logic programming system can be turned into a generic constraint logic programming system in which constraint solving can be user defined, and at source level - an extreme example of the "glass box" approach. In this paper we propose a different and novel use for the concept of attributed variables: developing a generic parallel/concurrent (constraint) logic programming system, using the same "glass box" flavor. We argue that a system which implements attributed variables and a few additional primitives can be easily customized at source level to implement many of the languages and execution models of parallelism and concurrency currently proposed, in both shared memory and distributed systems. We illustrate this through examples and report on an implementation of our ideas.