978 resultados para Android, Componenti, Sensori, IPC, Shared memory


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The shared-memory programming model can be an effective way to achieve parallelism on shared memory parallel computers. Historically however, the lack of a programming standard using directives and the limited scalability have affected its take-up. Recent advances in hardware and software technologies have resulted in improvements to both the performance of parallel programs with compiler directives and the issue of portability with the introduction of OpenMP. In this study, the Computer Aided Parallelisation Toolkit has been extended to automatically generate OpenMP-based parallel programs with nominal user assistance. We categorize the different loop types and show how efficient directives can be placed using the toolkit's in-depth interprocedural analysis. Examples are taken from the NAS parallel benchmarks and a number of real-world application codes. This demonstrates the great potential of using the toolkit to quickly parallelise serial programs as well as the good performance achievable on up to 300 processors for hybrid message passing-directive parallelisations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Despite the apparent simplicity of the OpenMP directive shared memory programming model and the sophisticated dependence analysis and code generation capabilities of the ParaWise/CAPO tools, experience shows that a level of expertise is required to produce efficient parallel code. In a real world application the investigation of a single loop in a generated parallel code can soon become an in-depth inspection of numerous dependencies in many routines. The additional understanding of dependencies is also needed to effectively interpret the information provided and supply the required feedback. The ParaWise Expert Assistant has been developed to automate this investigation and present questions to the user about, and in the context of, their application code. In this paper, we demonstrate that knowledge of dependence information and OpenMP are no longer essential to produce efficient parallel code with the Expert Assistant. It is hoped that this will enable a far wider audience to use the tools and subsequently, exploit the benefits of large parallel systems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Code parallelization using OpenMP for shared memory systems is relatively easier than using message passing for distributed memory systems. Despite this, it is still a challenge to use OpenMP to parallelize application codes in a way that yields effective scalable performance when executed on a shared memory parallel system. We describe an environment that will assist the programmer in the various tasks of code parallelization and this is achieved in a greatly reduced time frame and level of skill required. The parallelization environment includes a number of tools that address the main tasks of parallelism detection, OpenMP source code generation, debugging and optimization. These tools include a high quality, fully interprocedural dependence analysis with user interaction capabilities to facilitate the generation of efficient parallel code, an automatic relative debugging tool to identify erroneous user decisions in that interaction and also performance profiling to identify bottlenecks. Finally, experiences of parallelizing some NASA application codes are presented to illustrate some of the benefits of using the evolving environment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The scalability of a computer system is its response to growth. It is also depended on its hardware, its operating system and the applications it is running. Most distributed systems technology today still depends on bus-based shared memory which do not scale well, and systems based on the grid or hypercube scheme requires significantly less connections than a full inter-connection that would exhibit a quadratic growth rate. The rapid convergence of mobile communication, digital broadcasting and network infrastructures calls for rich multimedia content that is adaptive and responsive to the needs of individuals, businesses and the public organisations. This paper will discuss the emergence of mobile Multimedia systems and provides an overview of the issues regarding design and delivery of multimedia content to mobile devices.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many scientific applications are programmed using hybrid programming models that use both message passing and shared memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared memory or message passing, in isolation. The potential solution space, thus the challenge, increases substantially when optimizing hybrid models since the possible resource configurations increase exponentially. Nonetheless, with the accelerating adoption of hybrid programming models, we increasingly need improved energy efficiency in hybrid parallel applications on large-scale systems. In this work, we present new software-controlled execution schemes that consider the effects of dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) in the context of hybrid programming models. Specifically, we present predictive models and novel algorithms based on statistical analysis that anticipate application power and time requirements under different concurrency and frequency configurations. We apply our models and methods to the NPB MZ benchmarks and selected applications from the ASC Sequoia codes. Overall, we achieve substantial energy savings (8.74 percent on average and up to 13.8 percent) with some performance gain (up to 7.5 percent) or negligible performance loss.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

FastFlow is a structured parallel programming framework targeting shared memory multi-core architectures. In this paper we introduce a FastFlow extension aimed at supporting also a network of multi-core workstations. The extension supports the execution of FastFlow programs by coordinating-in a structured way-the fine grain parallel activities running on a single workstation. We discuss the design and the implementation of this extension presenting preliminary experimental results validating it on state-of-the-art networked multi-core nodes. © 2013 Springer-Verlag.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

FastFlow is a programming framework specifically targeting cache-coherent shared-memory multi-cores. It is implemented as a stack of C++ template libraries built on top of lock-free (and memory fence free) synchronization mechanisms. Its philosophy is to combine programmability with performance. In this paper a new FastFlow programming methodology aimed at supporting parallelization of existing sequential code via offloading onto a dynamically created software accelerator is presented. The new methodology has been validated using a set of simple micro-benchmarks and some real applications. © 2011 Springer-Verlag.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent trends in computing systems, such as multi-core processors and cloud computing, expose tens to thousands of processors to the software. Software developers must respond by introducing parallelism in their software. To obtain highest performance, it is not only necessary to identify parallelism, but also to reason about synchronization between threads and the communication of data from one thread to another. This entry gives an overview on some of the most common abstractions that are used in parallel programming, namely explicit vs. implicit expression of parallelism and shared and distributed memory. Several parallel programming models are reviewed and categorized by means of these abstractions. The pros and cons of parallel programming models from the perspective of performance and programmability are discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The inherent difficulty of thread-based shared-memory programming has recently motivated research in high-level, task-parallel programming models. Recent advances of Task-Parallel models add implicit synchronization, where the system automatically detects and satisfies data dependencies among spawned tasks. However, dynamic dependence analysis incurs significant runtime overheads, because the runtime must track task resources and use this information to schedule tasks while avoiding conflicts and races.
We present SCOOP, a compiler that effectively integrates static and dynamic analysis in code generation. SCOOP combines context-sensitive points-to, control-flow, escape, and effect analyses to remove redundant dependence checks at runtime. Our static analysis can work in combination with existing dynamic analyses and task-parallel runtimes that use annotations to specify tasks and their memory footprints. We use our static dependence analysis to detect non-conflicting tasks and an existing dynamic analysis to handle the remaining dependencies. We evaluate the resulting hybrid dependence analysis on a set of task-parallel programs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper introduces hybrid address spaces as a fundamental design methodology for implementing scalable runtime systems on many-core architectures without hardware support for cache coherence. We use hybrid address spaces for an implementation of MapReduce, a programming model for large-scale data processing, and the implementation of a remote memory access (RMA) model. Both implementations are available on the Intel SCC and are portable to similar architectures. We present the design and implementation of HyMR, a MapReduce runtime system whereby different stages and the synchronization operations between them alternate between a distributed memory address space and a shared memory address space, to improve performance and scalability. We compare HyMR to a reference implementation and we find that HyMR improves performance by a factor of 1.71× over a set of representative MapReduce benchmarks. We also compare HyMR with Phoenix++, a state-of-art implementation for systems with hardware-managed cache coherence in terms of scalability and sustained to peak data processing bandwidth, where HyMR demon- strates improvements of a factor of 3.1× and 3.2× respectively. We further evaluate our hybrid remote memory access (HyRMA) programming model and assess its performance to be superior of that of message passing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Power, and consequently energy, has recently attained first-class system resource status, on par with conventional metrics such as CPU time. To reduce energy consumption, many hardware- and OS-level solutions have been investigated. However, application-level information - which can provide the system with valuable insights unattainable otherwise - was only considered in a handful of cases. We introduce OpenMPE, an extension to OpenMP designed for power management. OpenMP is the de-facto standard for programming parallel shared memory systems, but does not yet provide any support for power control. Our extension exposes (i) per-region multi-objective optimization hints and (ii) application-level adaptation parameters, in order to create energy-saving opportunities for the whole system stack. We have implemented OpenMPE support in a compiler and runtime system, and empirically evaluated its performance on two architectures, mobile and desktop. Our results demonstrate the effectiveness of OpenMPE with geometric mean energy savings across 9 use cases of 15 % while maintaining full quality of service.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As data analytics are growing in importance they are also quickly becoming one of the dominant application domains that require parallel processing. This paper investigates the applicability of OpenMP, the dominant shared-memory parallel programming model in high-performance computing, to the domain of data analytics. We contrast the performance and programmability of key data analytics benchmarks against Phoenix++, a state-of-the-art shared memory map/reduce programming system. Our study shows that OpenMP outperforms the Phoenix++ system by a large margin for several benchmarks. In other cases, however, the programming model is lacking support for this application domain.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Interest on using teams of mobile robots has been growing, due to their potential to cooperate for diverse purposes, such as rescue, de-mining, surveillance or even games such as robotic soccer. These applications require a real-time middleware and wireless communication protocol that can support an efficient and timely fusion of the perception data from different robots as well as the development of coordinated behaviours. Coordinating several autonomous robots towards achieving a common goal is currently a topic of high interest, which can be found in many application domains. Despite these different application domains, the technical problem of building an infrastructure to support the integration of the distributed perception and subsequent coordinated action is similar. This problem becomes tougher with stronger system dynamics, e.g., when the robots move faster or interact with fast objects, leading to tighter real-time constraints. This thesis work addressed computing architectures and wireless communication protocols to support efficient information sharing and coordination strategies taking into account the real-time nature of robot activities. The thesis makes two main claims. Firstly, we claim that despite the use of a wireless communication protocol that includes arbitration mechanisms, the self-organization of the team communications in a dynamic round that also accounts for variable team membership, effectively reduces collisions within the team, independently of its current composition, significantly improving the quality of the communications. We will validate this claim in terms of packet losses and communication latency. We show how such self-organization of the communications can be achieved in an efficient way with the Reconfigurable and Adaptive TDMA protocol. Secondly, we claim that the development of distributed perception, cooperation and coordinated action for teams of mobile robots can be simplified by using a shared memory middleware that replicates in each cooperating robot all necessary remote data, the Real-Time Database (RTDB) middleware. These remote data copies, which are updated in the background by the selforganizing communications protocol, are extended with age information automatically computed by the middleware and are locally accessible through fast primitives. We validate our claim showing a parsimonious use of the communication medium, improved timing information with respect to the shared data and the simplicity of use and effectiveness of the proposed middleware shown in several use cases, reinforced with a reasonable impact in the Middle Size League of RoboCup.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The motion instability is an important issue that occurs during the operation of towed underwater vehicles (TUV), which considerably affects the accuracy of high precision acoustic instrumentations housed inside the same. Out of the various parameters responsible for this, the disturbances from the tow-ship are the most significant one. The present study focus on the motion dynamics of an underwater towing system with ship induced disturbances as the input. The study focus on an innovative system called two-part towing. The methodology involves numerical modeling of the tow system, which consists of modeling of the tow-cables and vehicles formulation. Previous study in this direction used a segmental approach for the modeling of the cable. Even though, the model was successful in predicting the heave response of the tow-body, instabilities were observed in the numerical solution. The present study devises a simple approach called lumped mass spring model (LMSM) for the cable formulation. In this work, the traditional LMSM has been modified in two ways. First, by implementing advanced time integration procedures and secondly, use of a modified beam model which uses only translational degrees of freedoms for solving beam equation. A number of time integration procedures, such as Euler, Houbolt, Newmark and HHT-α were implemented in the traditional LMSM and the strength and weakness of each scheme were numerically estimated. In most of the previous studies, hydrodynamic forces acting on the tow-system such as drag and lift etc. are approximated as analytical expression of velocities. This approach restricts these models to use simple cylindrical shaped towed bodies and may not be applicable modern tow systems which are diversed in shape and complexity. Hence, this particular study, hydrodynamic parameters such as drag and lift of the tow-system are estimated using CFD techniques. To achieve this, a RANS based CFD code has been developed. Further, a new convection interpolation scheme for CFD simulation, called BNCUS, which is blend of cell based and node based formulation, was proposed in the study and numerically tested. To account for the fact that simulation takes considerable time in solving fluid dynamic equations, a dedicated parallel computing setup has been developed. Two types of computational parallelisms are explored in the current study, viz; the model for shared memory processors and distributed memory processors. In the present study, shared memory model was used for structural dynamic analysis of towing system, distributed memory one was devised in solving fluid dynamic equations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and shows how thread prioritization can both maintain high processor utilization, and limit increases in critical path runtime caused by multithreading. The model also shows that in order to be effective in bandwidth limited applications, thread prioritization must be extended to prioritize memory requests. We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.