193 resultados para parallelization
Resumo:
This report presents an overview of the current work performed by us in the context of the efficient parallel implementation of traditional logic programming systems. The work is based on the &-Prolog System, a system for the automatic parallelization and execution of logic programming languages within the Independent And-parallelism model, and the global analysis and parallelization tools which have been developed for this system. In order to make the report self-contained, we first describe the "classical" tools of the &-Prolog system. We then explain in detail the work performed in improving and generalizing the global analysis and parallelization tools. Also, we describe the objectives which will drive our future work in this area.
Resumo:
This paper focuses on the parallelization of an ocean model applying current multicore processor-based cluster architectures to an irregular computational mesh. The aim is to maximize the efficiency of the computational resources used. To make the best use of the resources offered by these architectures, this parallelization has been addressed at all the hardware levels of modern supercomputers: firstly, exploiting the internal parallelism of the CPU through vectorization; secondly, taking advantage of the multiple cores of each node using OpenMP; and finally, using the cluster nodes to distribute the computational mesh, using MPI for communication within the nodes. The speedup obtained with each parallelization technique as well as the combined overall speedup have been measured for the western Mediterranean Sea for different cluster configurations, achieving a speedup factor of 73.3 using 256 processors. The results also show the efficiency achieved in the different cluster nodes and the advantages obtained by combining OpenMP and MPI versus using only OpenMP or MPI. Finally, the scalability of the model has been analysed by examining computation and communication times as well as the communication and synchronization overhead due to parallelization.
Resumo:
The diversity of the networks (wired/wireless) prefers a TCP solution robust across a wide range of networks rather than fine-tuned for a particular one at the cost of another. TCP parallelization uses multiple virtual TCP connections to transfer data for an application process and opens a way to improve TCP performance across a wide range of environments - high bandwidth-delay product (BDP), wireless as well as conventional networks. In particular, it can significantly benefit the emerging high-speed wireless networks. Despite its potential to work well over a wide range of networks, it is not fully understood how TCP parallelization performs when experiencing various packet losses in the heterogeneous environment. This paper examines the current TCP parallelization related methods under various packet losses and shows how to improve the performance of TCP parallelization.
Resumo:
A new method for solving some hard combinatorial optimization problems is suggested, admitting a certain reformulation. Considering such a problem, several different similar problems are prepared which have the same set of solutions. They are solved on computer in parallel until one of them will be solved, and that solution is accepted. Notwithstanding the evident overhead, the whole run-time could be significantly reduced due to dispersion of velocities of combinatorial search in regarded cases. The efficiency of this approach is investigated on the concrete problem of finding short solutions of non-deterministic system of linear logical equations.
Resumo:
* This paper was made according to the program № 14 of fundamental scientific research of the Presidium of the Russian Academy of Sciences, the project 06-I-П14-052
Resumo:
The Mobile Network Optimization (MNO) technologies have advanced at a tremendous pace in recent years. And the Dynamic Network Optimization (DNO) concept emerged years ago, aimed to continuously optimize the network in response to variations in network traffic and conditions. Yet, DNO development is still at its infancy, mainly hindered by a significant bottleneck of the lengthy optimization runtime. This paper identifies parallelism in greedy MNO algorithms and presents an advanced distributed parallel solution. The solution is designed, implemented and applied to real-life projects whose results yield a significant, highly scalable and nearly linear speedup up to 6.9 and 14.5 on distributed 8-core and 16-core systems respectively. Meanwhile, optimization outputs exhibit self-consistency and high precision compared to their sequential counterpart. This is a milestone in realizing the DNO. Further, the techniques may be applied to similar greedy optimization algorithm based applications.
Resumo:
It has been years since the introduction of the Dynamic Network Optimization (DNO) concept, yet the DNO development is still at its infant stage, largely due to a lack of breakthrough in minimizing the lengthy optimization runtime. Our previous work, a distributed parallel solution, has achieved a significant speed gain. To cater for the increased optimization complexity pressed by the uptake of smartphones and tablets, however, this paper examines the potential areas for further improvement and presents a novel asynchronous distributed parallel design that minimizes the inter-process communications. The new approach is implemented and applied to real-life projects whose results demonstrate an augmented acceleration of 7.5 times on a 16-core distributed system compared to 6.1 of our previous solution. Moreover, there is no degradation in the optimization outcome. This is a solid sprint towards the realization of DNO.
Resumo:
Many-core systems are emerging from the need of more computational power and power efficiency. However there are many issues which still revolve around the many-core systems. These systems need specialized software before they can be fully utilized and the hardware itself may differ from the conventional computational systems. To gain efficiency from many-core system, programs need to be parallelized. In many-core systems the cores are small and less powerful than cores used in traditional computing, so running a conventional program is not an efficient option. Also in Network-on-Chip based processors the network might get congested and the cores might work at different speeds. In this thesis is, a dynamic load balancing method is proposed and tested on Intel 48-core Single-Chip Cloud Computer by parallelizing a fault simulator. The maximum speedup is difficult to obtain due to severe bottlenecks in the system. In order to exploit all the available parallelism of the Single-Chip Cloud Computer, a runtime approach capable of dynamically balancing the load during the fault simulation process is used. The proposed dynamic fault simulation approach on the Single-Chip Cloud Computer shows up to 45X speedup compared to a serial fault simulation approach. Many-core systems can draw enormous amounts of power, and if this power is not controlled properly, the system might get damaged. One way to manage power is to set power budget for the system. But if this power is drawn by just few cores of the many, these few cores get extremely hot and might get damaged. Due to increase in power density multiple thermal sensors are deployed on the chip area to provide realtime temperature feedback for thermal management techniques. Thermal sensor accuracy is extremely prone to intra-die process variation and aging phenomena. These factors lead to a situation where thermal sensor values drift from the nominal values. This necessitates efficient calibration techniques to be applied before the sensor values are used. In addition, in modern many-core systems cores have support for dynamic voltage and frequency scaling. Thermal sensors located on cores are sensitive to the core's current voltage level, meaning that dedicated calibration is needed for each voltage level. In this thesis a general-purpose software-based auto-calibration approach is also proposed for thermal sensors to calibrate thermal sensors on different range of voltages.
Resumo:
With the emergence of multi-cores into the mainstream, there is a growing need for systems to allow programmers and automated systems to reason about data dependencies and inherent parallelismin imperative object-oriented languages. In this paper we exploit the structure of object-oriented programs to abstract computational side-effects. We capture and validate these effects using a static type system. We use these as the basis of sufficient conditions for several different data and task parallelism patterns. We compliment our static type system with a lightweight runtime system to allow for parallelization in the presence of complex data flows. We have a functioning compiler and worked examples to demonstrate the practicality of our solution.
Resumo:
With the emergence of multi-core processors into the mainstream, parallel programming is no longer the specialized domain it once was. There is a growing need for systems to allow programmers to more easily reason about data dependencies and inherent parallelism in general purpose programs. Many of these programs are written in popular imperative programming languages like Java and C]. In this thesis I present a system for reasoning about side-effects of evaluation in an abstract and composable manner that is suitable for use by both programmers and automated tools such as compilers. The goal of developing such a system is to both facilitate the automatic exploitation of the inherent parallelism present in imperative programs and to allow programmers to reason about dependencies which may be limiting the parallelism available for exploitation in their applications. Previous work on languages and type systems for parallel computing has tended to focus on providing the programmer with tools to facilitate the manual parallelization of programs; programmers must decide when and where it is safe to employ parallelism without the assistance of the compiler or other automated tools. None of the existing systems combine abstraction and composition with parallelization and correctness checking to produce a framework which helps both programmers and automated tools to reason about inherent parallelism. In this work I present a system for abstractly reasoning about side-effects and data dependencies in modern, imperative, object-oriented languages using a type and effect system based on ideas from Ownership Types. I have developed sufficient conditions for the safe, automated detection and exploitation of a number task, data and loop parallelism patterns in terms of ownership relationships. To validate my work, I have applied my ideas to the C] version 3.0 language to produce a language extension called Zal. I have implemented a compiler for the Zal language as an extension of the GPC] research compiler as a proof of concept of my system. I have used it to parallelize a number of real-world applications to demonstrate the feasibility of my proposed approach. In addition to this empirical validation, I present an argument for the correctness of the type system and language semantics I have proposed as well as sketches of proofs for the correctness of the sufficient conditions for parallelization proposed.
Resumo:
X-ray microtomography (micro-CT) with micron resolution enables new ways of characterizing microstructures and opens pathways for forward calculations of multiscale rock properties. A quantitative characterization of the microstructure is the first step in this challenge. We developed a new approach to extract scale-dependent characteristics of porosity, percolation, and anisotropic permeability from 3-D microstructural models of rocks. The Hoshen-Kopelman algorithm of percolation theory is employed for a standard percolation analysis. The anisotropy of permeability is calculated by means of the star volume distribution approach. The local porosity distribution and local percolation probability are obtained by using the local porosity theory. Additionally, the local anisotropy distribution is defined and analyzed through two empirical probability density functions, the isotropy index and the elongation index. For such a high-resolution data set, the typical data sizes of the CT images are on the order of gigabytes to tens of gigabytes; thus an extremely large number of calculations are required. To resolve this large memory problem parallelization in OpenMP was used to optimally harness the shared memory infrastructure on cache coherent Non-Uniform Memory Access architecture machines such as the iVEC SGI Altix 3700Bx2 Supercomputer. We see adequate visualization of the results as an important element in this first pioneering study.
Resumo:
A parallel authentication and public-key encryption is introduced and exemplified on joint encryption and signing which compares favorably with sequential Encrypt-then-Sign (ɛtS) or Sign-then-Encrypt (Stɛ) schemes as far as both efficiency and security are concerned. A security model for signcryption, and thus joint encryption and signing, has been recently defined which considers possible attacks and security goals. Such a scheme is considered secure if the encryption part guarantees indistinguishability and the signature part prevents existential forgeries, for outsider but also insider adversaries. We propose two schemes of parallel signcryption, which are efficient alternative to Commit-then-Sign-and- Encrypt (Ct&G3&S). They are both provably secure in the random oracle model. The first one, called generic parallel encrypt and sign, is secure if the encryption scheme is semantically secure against chosen-ciphertext attacks and the signature scheme prevents existential forgeries against random-message attacks. The second scheme, called optimal parallel encrypt. and sign, applies random oracles similar to the OAEP technique in order to achieve security using encryption and signature components with very weak security requirements — encryption is expected to be one-way under chosen-plaintext attacks while signature needs to be secure against universal forgeries under random-plaintext attack, that is actually the case for both the plain-RSA encryption and signature under the usual RSA assumption. Both proposals are generic in the sense that any suitable encryption and signature schemes (i.e. which simply achieve required security) can be used. Furthermore they allow both parallel encryption and signing, as well as parallel decryption and verification. Properties of parallel encrypt and sign schemes are considered and a new security standard for parallel signcryption is proposed.
Resumo:
Many complex aeronautical design problems can be formulated with efficient multi-objective evolutionary optimization methods and game strategies. This book describes the role of advanced innovative evolution tools in the solution, or the set of solutions of single or multi disciplinary optimization. These tools use the concept of multi-population, asynchronous parallelization and hierarchical topology which allows different models including precise, intermediate and approximate models with each node belonging to the different hierarchical layer handled by a different Evolutionary Algorithm. The efficiency of evolutionary algorithms for both single and multi-objective optimization problems are significantly improved by the coupling of EAs with games and in particular by a new dynamic methodology named “Hybridized Nash-Pareto games”. Multi objective Optimization techniques and robust design problems taking into account uncertainties are introduced and explained in detail. Several applications dealing with civil aircraft and UAV, UCAV systems are implemented numerically and discussed. Applications of increasing optimization complexity are presented as well as two hands-on test cases problems. These examples focus on aeronautical applications and will be useful to the practitioner in the laboratory or in industrial design environments. The evolutionary methods coupled with games presented in this volume can be applied to other areas including surface and marine transport, structures, biomedical engineering, renewable energy and environmental problems.
Resumo:
Computational docking of ligands to protein structures is a key step in structure-based drug design. Currently, the time required for each docking run is high and thus limits the use of docking in a high-throughput manner, warranting parallelization of docking algorithms. AutoDock, a widely used tool, has been chosen for parallelization. Near-linear increases in speed were observed with 96 processors, reducing the time required for docking ligands to HIV-protease from 81 min, as an example, on a single IBM Power-5 processor ( 1.65 GHz), to about 1 min on an IBM cluster, with 96 such processors. This implementation would make it feasible to perform virtual ligand screening using AutoDock.