829 resultados para Parallel programming
Resumo:
"Supported in part by the Advanced Research Projects Agency ... under Contract no. US AF 30(602) 4144."
Resumo:
-scale vary from a planetary scale and million years for convection problems to 100km and 10 years for fault systems simulations. Various techniques are in use to deal with the time dependency (e.g. Crank-Nicholson), with the non-linearity (e.g. Newton-Raphson) and weakly coupled equations (e.g. non-linear Gauss-Seidel). Besides these high-level solution algorithms discretization methods (e.g. finite element method (FEM), boundary element method (BEM)) are used to deal with spatial derivatives. Typically, large-scale, three dimensional meshes are required to resolve geometrical complexity (e.g. in the case of fault systems) or features in the solution (e.g. in mantel convection simulations). The modelling environment escript allows the rapid implementation of new physics as required for the development of simulation codes in earth sciences. Its main object is to provide a programming language, where the user can define new models and rapidly develop high-level solution algorithms. The current implementation is linked with the finite element package finley as a PDE solver. However, the design is open and other discretization technologies such as finite differences and boundary element methods could be included. escript is implemented as an extension of the interactive programming environment python (see www.python.org). Key concepts introduced are Data objects, which are holding values on nodes or elements of the finite element mesh, and linearPDE objects, which are defining linear partial differential equations to be solved by the underlying discretization technology. In this paper we will show the basic concepts of escript and will show how escript is used to implement a simulation code for interacting fault systems. We will show some results of large-scale, parallel simulations on an SGI Altix system. Acknowledgements: Project work is supported by Australian Commonwealth Government through the Australian Computational Earth Systems Simulator Major National Research Facility, Queensland State Government Smart State Research Facility Fund, The University of Queensland and SGI.
The effective use of implicit parallelism through the use of an object-oriented programming language
Resumo:
This thesis explores translating well-written sequential programs in a subset of the Eiffel programming language - without syntactic or semantic extensions - into parallelised programs for execution on a distributed architecture. The main focus is on constructing two object-oriented models: a theoretical self-contained model of concurrency which enables a simplified second model for implementing the compiling process. There is a further presentation of principles that, if followed, maximise the potential levels of parallelism. Model of Concurrency. The concurrency model is designed to be a straightforward target for mapping sequential programs onto, thus making them parallel. It aids the compilation process by providing a high level of abstraction, including a useful model of parallel behaviour which enables easy incorporation of message interchange, locking, and synchronization of objects. Further, the model is sufficient such that a compiler can and has been practically built. Model of Compilation. The compilation-model's structure is based upon an object-oriented view of grammar descriptions and capitalises on both a recursive-descent style of processing and abstract syntax trees to perform the parsing. A composite-object view with an attribute grammar style of processing is used to extract sufficient semantic information for the parallelisation (i.e. code-generation) phase. Programming Principles. The set of principles presented are based upon information hiding, sharing and containment of objects and the dividing up of methods on the basis of a command/query division. When followed, the level of potential parallelism within the presented concurrency model is maximised. Further, these principles naturally arise from good programming practice. Summary. In summary this thesis shows that it is possible to compile well-written programs, written in a subset of Eiffel, into parallel programs without any syntactic additions or semantic alterations to Eiffel: i.e. no parallel primitives are added, and the parallel program is modelled to execute with equivalent semantics to the sequential version. If the programming principles are followed, a parallelised program achieves the maximum level of potential parallelisation within the concurrency model.
Resumo:
Image segmentation is one of the most computationally intensive operations in image processing and computer vision. This is because a large volume of data is involved and many different features have to be extracted from the image data. This thesis is concerned with the investigation of practical issues related to the implementation of several classes of image segmentation algorithms on parallel architectures. The Transputer is used as the basic building block of hardware architectures and Occam is used as the programming language. The segmentation methods chosen for implementation are convolution, for edge-based segmentation; the Split and Merge algorithm for segmenting non-textured regions; and the Granlund method for segmentation of textured images. Three different convolution methods have been implemented. The direct method of convolution, carried out in the spatial domain, uses the array architecture. The other two methods, based on convolution in the frequency domain, require the use of the two-dimensional Fourier transform. Parallel implementations of two different Fast Fourier Transform algorithms have been developed, incorporating original solutions. For the Row-Column method the array architecture has been adopted, and for the Vector-Radix method, the pyramid architecture. The texture segmentation algorithm, for which a system-level design is given, demonstrates a further application of the Vector-Radix Fourier transform. A novel concurrent version of the quad-tree based Split and Merge algorithm has been implemented on the pyramid architecture. The performance of the developed parallel implementations is analysed. Many of the obtained speed-up and efficiency measures show values close to their respective theoretical maxima. Where appropriate comparisons are drawn between different implementations. The thesis concludes with comments on general issues related to the use of the Transputer system as a development tool for image processing applications; and on the issues related to the engineering of concurrent image processing applications.
Resumo:
* The research is supported partly by INTAS: 04-77-7173 project, http://www.intas.be
Resumo:
Femtosecond laser microfabrication has emerged over the last decade as a 3D flexible technology in photonics. Numerical simulations provide an important insight into spatial and temporal beam and pulse shaping during the course of extremely intricate nonlinear propagation (see e.g. [1,2]). Electromagnetics of such propagation is typically described in the form of the generalized Non-Linear Schrdinger Equation (NLSE) coupled with Drude model for plasma [3]. In this paper we consider a multi-threaded parallel numerical solution for a specific model which describes femtosecond laser pulse propagation in transparent media [4, 5]. However our approach can be extended to similar models. The numerical code is implemented in NVIDIA Graphics Processing Unit (GPU) which provides an effitient hardware platform for multi-threded computing. We compare the performance of the described below parallel code implementated for GPU using CUDA programming interface [3] with a serial CPU version used in our previous papers [4,5]. © 2011 IEEE.
Resumo:
This book constitutes the refereed proceedings of the 14th International Conference on Parallel Problem Solving from Nature, PPSN 2016, held in Edinburgh, UK, in September 2016. The total of 93 revised full papers were carefully reviewed and selected from 224 submissions. The meeting began with four workshops which offered an ideal opportunity to explore specific topics in intelligent transportation Workshop, landscape-aware heuristic search, natural computing in scheduling and timetabling, and advances in multi-modal optimization. PPSN XIV also included sixteen free tutorials to give us all the opportunity to learn about new aspects: gray box optimization in theory; theory of evolutionary computation; graph-based and cartesian genetic programming; theory of parallel evolutionary algorithms; promoting diversity in evolutionary optimization: why and how; evolutionary multi-objective optimization; intelligent systems for smart cities; advances on multi-modal optimization; evolutionary computation in cryptography; evolutionary robotics - a practical guide to experiment with real hardware; evolutionary algorithms and hyper-heuristics; a bridge between optimization over manifolds and evolutionary computation; implementing evolutionary algorithms in the cloud; the attainment function approach to performance evaluation in EMO; runtime analysis of evolutionary algorithms: basic introduction; meta-model assisted (evolutionary) optimization. The papers are organized in topical sections on adaption, self-adaption and parameter tuning; differential evolution and swarm intelligence; dynamic, uncertain and constrained environments; genetic programming; multi-objective, many-objective and multi-level optimization; parallel algorithms and hardware issues; real-word applications and modeling; theory; diversity and landscape analysis.
Resumo:
A scenario-based two-stage stochastic programming model for gas production network planning under uncertainty is usually a large-scale nonconvex mixed-integer nonlinear programme (MINLP), which can be efficiently solved to global optimality with nonconvex generalized Benders decomposition (NGBD). This paper is concerned with the parallelization of NGBD to exploit multiple available computing resources. Three parallelization strategies are proposed, namely, naive scenario parallelization, adaptive scenario parallelization, and adaptive scenario and bounding parallelization. Case study of two industrial natural gas production network planning problems shows that, while the NGBD without parallelization is already faster than a state-of-the-art global optimization solver by an order of magnitude, the parallelization can improve the efficiency by several times on computers with multicore processors. The adaptive scenario and bounding parallelization achieves the best overall performance among the three proposed parallelization strategies.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
Embedded systems are increasingly integral to daily life, improving and facilitating the efficiency of modern Cyber-Physical Systems which provide access to sensor data, and actuators. As modern architectures become increasingly complex and heterogeneous, their optimization becomes a challenging task. Additionally, ensuring platform security is important to avoid harm to individuals and assets. This study primarily addresses challenges in contemporary Embedded Systems, focusing on platform optimization and security enforcement. The initial section of this study delves into the application of machine learning methods to efficiently determine the optimal number of cores for a parallel RISC-V cluster to minimize energy consumption using static source code analysis. Results demonstrate that automated platform configuration is not only viable but also that there is a moderate performance trade-off when relying solely on static features. The second part focuses on addressing the problem of heterogeneous device mapping, which involves assigning tasks to the most suitable computational device in a heterogeneous platform for optimal runtime. The contribution of this section lies in the introduction of novel pre-processing techniques, along with a training framework called Siamese Networks, that enhances the classification performance of DeepLLVM, an advanced approach for task mapping. Importantly, these proposed approaches are independent from the specific deep-learning model used. Finally, this research work focuses on addressing issues concerning the binary exploitation of software running in modern Embedded Systems. It proposes an architecture to implement Control-Flow Integrity in embedded platforms with a Root-of-Trust, aiming to enhance security guarantees with limited hardware modifications. The approach involves enhancing the architecture of a modern RISC-V platform for autonomous vehicles by implementing a side-channel communication mechanism that relays control-flow changes executed by the process running on the host core to the Root-of-Trust. This approach has limited impact on performance and it is effective in enhancing the security of embedded platforms.
Resumo:
Modern High-Performance Computing HPC systems are gradually increasing in size and complexity due to the correspondent demand of larger simulations requiring more complicated tasks and higher accuracy. However, as side effects of the Dennard’s scaling approaching its ultimate power limit, the efficiency of software plays also an important role in increasing the overall performance of a computation. Tools to measure application performance in these increasingly complex environments provide insights into the intricate ways in which software and hardware interact. The monitoring of the power consumption in order to save energy is possible through processors interfaces like Intel Running Average Power Limit RAPL. Given the low level of these interfaces, they are often paired with an application-level tool like Performance Application Programming Interface PAPI. Since several problems in many heterogeneous fields can be represented as a complex linear system, an optimized and scalable linear system solver algorithm can decrease significantly the time spent to compute its resolution. One of the most widely used algorithms deployed for the resolution of large simulation is the Gaussian Elimination, which has its most popular implementation for HPC systems in the Scalable Linear Algebra PACKage ScaLAPACK library. However, another relevant algorithm, which is increasing in popularity in the academic field, is the Inhibition Method. This thesis compares the energy consumption of the Inhibition Method and Gaussian Elimination from ScaLAPACK to profile their execution during the resolution of linear systems above the HPC architecture offered by CINECA. Moreover, it also collates the energy and power values for different ranks, nodes, and sockets configurations. The monitoring tools employed to track the energy consumption of these algorithms are PAPI and RAPL, that will be integrated with the parallel execution of the algorithms managed with the Message Passing Interface MPI.
Resumo:
Prince Maximilian zu Wied's great exploration of coastal Brazil in 1815-1817 resulted in important collections of reptiles, amphibians, birds, and mammals, many of which were new species later described by Wied himself The bulk of his collection was purchased for the American Museum of Natural History in 1869, although many ""type specimens"" had disappeared earlier. Wied carefully identified his localities but did not designate type specimens or type localities, which are taxonomic concepts that were not yet established. Information and manuscript names on a fraction (17 species) of his Brazilian reptiles and amphibians were transmitted by Wied to Prof. Heinrich Rudolf Schinz at the University of Zurich. Schinz included these species (credited to their discoverer ""Princ. Max."") in the second volume of Das Thierreich ... (1822). Most are junior objective synonyms of names published by Wied. However, six of the 17 names used by Schinz predate Wied's own publications. Three were manuscript names never published by Wied because he determined the species to be previously known. (1) Lacerta vittata Schinz, 1822 (a nomen oblitum) = Lacerta striata sensu Wied (a misidentification, non Linnaeus nec sensu Merrem) = Kentropyx calcarata Spix, 1825, herein qualified as a nomen protectum. (2) Polychrus virescens Schinz, 1822 = Lacerta marmorata Linnaeus, 1758 (now Polychrus marmoratus). (3) Scincus cyanurus Schinz, 1822 (a nomen oblitum) = Gymnophthalmus quadrilineatus sensu Wied (a misidentification, non Linnaeus nec sensu Merrem) = Micrablepharus maximiliani (Reinhardt and Lutken, ""1861"" [1862]), herein qualified as a nomen protectum. Qualifying Scincus cyanurus Schinz, 1822, as a nomen oblitum also removes the problem of homonymy with the later-named Pacific skink Scincus cyanurus Lesson (= Emoia cyanura). The remaining three names used by Schinz are senior objective synonyms that take priority over Wied's names. (4) Bufo cinctus Schinz, 1822, is senior to Bufo cinctus Wied, 1823; both, however, are junior synonyms of Bufo crucifer Wied, 1821 = Chaunus crucifer (Wied). (5) Agama picta Schinz, 1822, is senior to Agama picta Wied, 1823, requiring a change of authorship for this poorly known species, to be known as Enyalius pictus (Schinz). (6) Lacerta cyanomelas Schinz, 1822, predates Teius cyanomelas Wied, 1824 (1822-1831) both nomina oblita. Wied's illustration and description shows cyanomelas as apparently conspecific with the recently described but already well-known Cnemidophorus nativo Rocha et al., 1997, which is the valid name because of its qualification herein as a nomen protectum. The preceding specific name cyanomelas (as corrected in an errata section) is misspelled several ways in different copies of Schinz's original description (""cyanom las,"" ""cyanomlas,"" and cyanom""). Loosening, separation, and final loss of the last three letters of movable type in the printing chase probably accounts for the variant misspellings.
Resumo:
In this paper, artificial neural networks are employed in a novel approach to identify harmonic components of single-phase nonlinear load currents, whose amplitude and phase angle are subject to unpredictable changes, even in steady-state. The first six harmonic current components are identified through the variation analysis of waveform characteristics. The effectiveness of this method is tested by applying it to the model of a single-phase active power filter, dedicated to the selective compensation of harmonic current drained by an AC controller. Simulation and experimental results are presented to validate the proposed approach. (C) 2010 Elsevier B. V. All rights reserved.
Resumo:
This paper discusses the integrated design of parallel manipulators, which exhibit varying dynamics. This characteristic affects the machine stability and performance. The design methodology consists of four main steps: (i) the system modeling using flexible multibody technique, (ii) the synthesis of reduced-order models suitable for control design, (iii) the systematic flexible model-based input signal design, and (iv) the evaluation of some possible machine designs. The novelty in this methodology is to take structural flexibilities into consideration during the input signal design; therefore, enhancing the standard design process which mainly considers rigid bodies dynamics. The potential of the proposed strategy is exploited for the design evaluation of a two degree-of-freedom high-speed parallel manipulator. The results are experimentally validated. (C) 2010 Elsevier Ltd. All rights reserved.