765 resultados para Parallel computing


Relevância:

30.00% 30.00%

Publicador:

Resumo:

ACKNOWLEDGEMENTS This research is based upon work supported in part by the U.S. ARL and U.K. Ministry of Defense under Agreement Number W911NF-06-3-0001, and by the NSF under award CNS-1213140. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views or represent the official policies of the NSF, the U.S. ARL, the U.S. Government, the U.K. Ministry of Defense or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

ACKNOWLEDGEMENTS This research is based upon work supported in part by the U.S. ARL and U.K. Ministry of Defense under Agreement Number W911NF-06-3-0001, and by the NSF under award CNS-1213140. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views or represent the official policies of the NSF, the U.S. ARL, the U.S. Government, the U.K. Ministry of Defense or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This book constitutes the refereed proceedings of the 14th International Conference on Parallel Problem Solving from Nature, PPSN 2016, held in Edinburgh, UK, in September 2016. The total of 93 revised full papers were carefully reviewed and selected from 224 submissions. The meeting began with four workshops which offered an ideal opportunity to explore specific topics in intelligent transportation Workshop, landscape-aware heuristic search, natural computing in scheduling and timetabling, and advances in multi-modal optimization. PPSN XIV also included sixteen free tutorials to give us all the opportunity to learn about new aspects: gray box optimization in theory; theory of evolutionary computation; graph-based and cartesian genetic programming; theory of parallel evolutionary algorithms; promoting diversity in evolutionary optimization: why and how; evolutionary multi-objective optimization; intelligent systems for smart cities; advances on multi-modal optimization; evolutionary computation in cryptography; evolutionary robotics - a practical guide to experiment with real hardware; evolutionary algorithms and hyper-heuristics; a bridge between optimization over manifolds and evolutionary computation; implementing evolutionary algorithms in the cloud; the attainment function approach to performance evaluation in EMO; runtime analysis of evolutionary algorithms: basic introduction; meta-model assisted (evolutionary) optimization. The papers are organized in topical sections on adaption, self-adaption and parameter tuning; differential evolution and swarm intelligence; dynamic, uncertain and constrained environments; genetic programming; multi-objective, many-objective and multi-level optimization; parallel algorithms and hardware issues; real-word applications and modeling; theory; diversity and landscape analysis.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we develop a fast implementation of an hyperspectral coded aperture (HYCA) algorithm on different platforms using OpenCL, an open standard for parallel programing on heterogeneous systems, which includes a wide variety of devices, from dense multicore systems from major manufactures such as Intel or ARM to new accelerators such as graphics processing units (GPUs), field programmable gate arrays (FPGAs), the Intel Xeon Phi and other custom devices. Our proposed implementation of HYCA significantly reduces its computational cost. Our experiments have been conducted using simulated data and reveal considerable acceleration factors. This kind of implementations with the same descriptive language on different architectures are very important in order to really calibrate the possibility of using heterogeneous platforms for efficient hyperspectral imaging processing in real remote sensing missions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A parallel method for dynamic partitioning of unstructured meshes is described. The method employs a new iterative optimisation technique which both balances the workload and attempts to minimise the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more quickly. Perhaps more importantly, the algorithm results in only a small fraction of the amount of data migration compared to the static partitioners.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In many areas of simulation, a crucial component for efficient numerical computations is the use of solution-driven adaptive features: locally adapted meshing or re-meshing; dynamically changing computational tasks. The full advantages of high performance computing (HPC) technology will thus only be able to be exploited when efficient parallel adaptive solvers can be realised. The resulting requirement for HPC software is for dynamic load balancing, which for many mesh-based applications means dynamic mesh re-partitioning. The DRAMA project has been initiated to address this issue, with a particular focus being the requirements of industrial Finite Element codes, but codes using Finite Volume formulations will also be able to make use of the project results.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A method is outlined for optimising graph partitions which arise in mapping unstructured mesh calculations to parallel computers. The method employs a relative gain iterative technique to both evenly balance the workload and minimise the number and volume of interprocessor communications. A parallel graph reduction technique is also briefly described and can be used to give a global perspective to the optimisation. The algorithms work efficiently in parallel as well as sequentially and when combined with a fast direct partitioning technique (such as the Greedy algorithm) to give an initial partition, the resulting two-stage process proves itself to be both a powerful and flexible solution to the static graph-partitioning problem. Experiments indicate that the resulting parallel code can provide high quality partitions, independent of the initial partition, within a few seconds. The algorithms can also be used for dynamic load-balancing, reusing existing partitions and in this case the procedures are much faster than static techniques, provide partitions of similar or higher quality and, in comparison, involve the migration of a fraction of the data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Solving linear systems is an important problem for scientific computing. Exploiting parallelism is essential for solving complex systems, and this traditionally involves writing parallel algorithms on top of a library such as MPI. The SPIKE family of algorithms is one well-known example of a parallel solver for linear systems. The Hierarchically Tiled Array data type extends traditional data-parallel array operations with explicit tiling and allows programmers to directly manipulate tiles. The tiles of the HTA data type map naturally to the block nature of many numeric computations, including the SPIKE family of algorithms. The higher level of abstraction of the HTA enables the same program to be portable across different platforms. Current implementations target both shared-memory and distributed-memory models. In this thesis we present a proof-of-concept for portable linear solvers. We implement two algorithms from the SPIKE family using the HTA library. We show that our implementations of SPIKE exploit the abstractions provided by the HTA to produce a compact, clean code that can run on both shared-memory and distributed-memory models without modification. We discuss how we map the algorithms to HTA programs as well as examine their performance. We compare the performance of our HTA codes to comparable codes written in MPI as well as current state-of-the-art linear algebra routines.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Embedding intelligence in extreme edge devices allows distilling raw data acquired from sensors into actionable information, directly on IoT end-nodes. This computing paradigm, in which end-nodes no longer depend entirely on the Cloud, offers undeniable benefits, driving a large research area (TinyML) to deploy leading Machine Learning (ML) algorithms on micro-controller class of devices. To fit the limited memory storage capability of these tiny platforms, full-precision Deep Neural Networks (DNNs) are compressed by representing their data down to byte and sub-byte formats, in the integer domain. However, the current generation of micro-controller systems can barely cope with the computing requirements of QNNs. This thesis tackles the challenge from many perspectives, presenting solutions both at software and hardware levels, exploiting parallelism, heterogeneity and software programmability to guarantee high flexibility and high energy-performance proportionality. The first contribution, PULP-NN, is an optimized software computing library for QNN inference on parallel ultra-low-power (PULP) clusters of RISC-V processors, showing one order of magnitude improvements in performance and energy efficiency, compared to current State-of-the-Art (SoA) STM32 micro-controller systems (MCUs) based on ARM Cortex-M cores. The second contribution is XpulpNN, a set of RISC-V domain specific instruction set architecture (ISA) extensions to deal with sub-byte integer arithmetic computation. The solution, including the ISA extensions and the micro-architecture to support them, achieves energy efficiency comparable with dedicated DNN accelerators and surpasses the efficiency of SoA ARM Cortex-M based MCUs, such as the low-end STM32M4 and the high-end STM32H7 devices, by up to three orders of magnitude. To overcome the Von Neumann bottleneck while guaranteeing the highest flexibility, the final contribution integrates an Analog In-Memory Computing accelerator into the PULP cluster, creating a fully programmable heterogeneous fabric that demonstrates end-to-end inference capabilities of SoA MobileNetV2 models, showing two orders of magnitude performance improvements over current SoA analog/digital solutions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Continuum parallel robots (CPRs) are manipulators employing multiple flexible beams arranged in parallel and connected to a rigid end-effector. CPRs promise higher payload and accuracy than serial CRs while keeping great flexibility. As the risk of injury during accidental contacts between a human and a CPR should be reduced, CPRs may be used in large-scale collaborative tasks or assisted robotic surgery. There exist various CPR designs, but the prototype conception is rarely based on performance considerations, and the CPRs realization in mainly based on intuitions or rigid-link parallel manipulators architectures. This thesis focuses on the performance analysis of CPRs, and the tools needed for such evaluation, such as workspace computation algorithms. In particular, workspace computation strategies for CPRs are essential for the performance assessment, since the CPRs workspace may be used as a performance index or it can serve for optimal-design tools. Two new workspace computation algorithms are proposed in this manuscript, the former focusing on the workspace volume computation and the certification of its numerical results, while the latter aims at computing the workspace boundary only. Due to the elastic nature of CPRs, a key performance indicator for these robots is the stability of their equilibrium configurations. This thesis proposes the experimental validation of the equilibrium stability assessment on a real prototype, demonstrating limitations of some commonly used assumptions. Additionally, a performance index measuring the distance to instability is originally proposed in this manuscript. Differently from the majority of the existing approaches, the clear advantage of the proposed index is a sound physical meaning; accordingly, the index can be used for a more straightforward performance quantification, and to derive robot specifications.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Modern High-Performance Computing HPC systems are gradually increasing in size and complexity due to the correspondent demand of larger simulations requiring more complicated tasks and higher accuracy. However, as side effects of the Dennard’s scaling approaching its ultimate power limit, the efficiency of software plays also an important role in increasing the overall performance of a computation. Tools to measure application performance in these increasingly complex environments provide insights into the intricate ways in which software and hardware interact. The monitoring of the power consumption in order to save energy is possible through processors interfaces like Intel Running Average Power Limit RAPL. Given the low level of these interfaces, they are often paired with an application-level tool like Performance Application Programming Interface PAPI. Since several problems in many heterogeneous fields can be represented as a complex linear system, an optimized and scalable linear system solver algorithm can decrease significantly the time spent to compute its resolution. One of the most widely used algorithms deployed for the resolution of large simulation is the Gaussian Elimination, which has its most popular implementation for HPC systems in the Scalable Linear Algebra PACKage ScaLAPACK library. However, another relevant algorithm, which is increasing in popularity in the academic field, is the Inhibition Method. This thesis compares the energy consumption of the Inhibition Method and Gaussian Elimination from ScaLAPACK to profile their execution during the resolution of linear systems above the HPC architecture offered by CINECA. Moreover, it also collates the energy and power values for different ranks, nodes, and sockets configurations. The monitoring tools employed to track the energy consumption of these algorithms are PAPI and RAPL, that will be integrated with the parallel execution of the algorithms managed with the Message Passing Interface MPI.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Universidade Estadual de Campinas . Faculdade de Educação Física

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Prince Maximilian zu Wied's great exploration of coastal Brazil in 1815-1817 resulted in important collections of reptiles, amphibians, birds, and mammals, many of which were new species later described by Wied himself The bulk of his collection was purchased for the American Museum of Natural History in 1869, although many ""type specimens"" had disappeared earlier. Wied carefully identified his localities but did not designate type specimens or type localities, which are taxonomic concepts that were not yet established. Information and manuscript names on a fraction (17 species) of his Brazilian reptiles and amphibians were transmitted by Wied to Prof. Heinrich Rudolf Schinz at the University of Zurich. Schinz included these species (credited to their discoverer ""Princ. Max."") in the second volume of Das Thierreich ... (1822). Most are junior objective synonyms of names published by Wied. However, six of the 17 names used by Schinz predate Wied's own publications. Three were manuscript names never published by Wied because he determined the species to be previously known. (1) Lacerta vittata Schinz, 1822 (a nomen oblitum) = Lacerta striata sensu Wied (a misidentification, non Linnaeus nec sensu Merrem) = Kentropyx calcarata Spix, 1825, herein qualified as a nomen protectum. (2) Polychrus virescens Schinz, 1822 = Lacerta marmorata Linnaeus, 1758 (now Polychrus marmoratus). (3) Scincus cyanurus Schinz, 1822 (a nomen oblitum) = Gymnophthalmus quadrilineatus sensu Wied (a misidentification, non Linnaeus nec sensu Merrem) = Micrablepharus maximiliani (Reinhardt and Lutken, ""1861"" [1862]), herein qualified as a nomen protectum. Qualifying Scincus cyanurus Schinz, 1822, as a nomen oblitum also removes the problem of homonymy with the later-named Pacific skink Scincus cyanurus Lesson (= Emoia cyanura). The remaining three names used by Schinz are senior objective synonyms that take priority over Wied's names. (4) Bufo cinctus Schinz, 1822, is senior to Bufo cinctus Wied, 1823; both, however, are junior synonyms of Bufo crucifer Wied, 1821 = Chaunus crucifer (Wied). (5) Agama picta Schinz, 1822, is senior to Agama picta Wied, 1823, requiring a change of authorship for this poorly known species, to be known as Enyalius pictus (Schinz). (6) Lacerta cyanomelas Schinz, 1822, predates Teius cyanomelas Wied, 1824 (1822-1831) both nomina oblita. Wied's illustration and description shows cyanomelas as apparently conspecific with the recently described but already well-known Cnemidophorus nativo Rocha et al., 1997, which is the valid name because of its qualification herein as a nomen protectum. The preceding specific name cyanomelas (as corrected in an errata section) is misspelled several ways in different copies of Schinz's original description (""cyanom las,"" ""cyanomlas,"" and cyanom""). Loosening, separation, and final loss of the last three letters of movable type in the printing chase probably accounts for the variant misspellings.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper discusses the integrated design of parallel manipulators, which exhibit varying dynamics. This characteristic affects the machine stability and performance. The design methodology consists of four main steps: (i) the system modeling using flexible multibody technique, (ii) the synthesis of reduced-order models suitable for control design, (iii) the systematic flexible model-based input signal design, and (iv) the evaluation of some possible machine designs. The novelty in this methodology is to take structural flexibilities into consideration during the input signal design; therefore, enhancing the standard design process which mainly considers rigid bodies dynamics. The potential of the proposed strategy is exploited for the design evaluation of a two degree-of-freedom high-speed parallel manipulator. The results are experimentally validated. (C) 2010 Elsevier Ltd. All rights reserved.