222 resultados para multicore


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes the ParaPhrase project, a new 3-year targeted research project funded under EU Framework 7 Objective 3.4 (Computer Systems), starting in October 2011. ParaPhrase aims to follow a new approach to introducing parallelism using advanced refactoring techniques coupled with high-level parallel design patterns. The refactoring approach will use these design patterns to restructure programs defined as networks of software components into other forms that are more suited to parallel execution. The programmer will be aided by high-level cost information that will be integrated into the refactoring tools. The implementation of these patterns will then use a well-understood algorithmic skeleton approach to achieve good parallelism. A key ParaPhrase design goal is that parallel components are intended to match heterogeneous architectures, defined in terms of CPU/GPU combinations, for example. In order to achieve this, the ParaPhrase approach will map components at link time to the available hardware, and will then re-map them during program execution, taking account of multiple applications, changes in hardware resource availability, the desire to reduce communication costs etc. In this way, we aim to develop a new approach to programming that will be able to produce software that can adapt to dynamic changes in the system environment. Moreover, by using a strong component basis for parallelism, we can achieve potentially significant gains in terms of reducing sharing at a high level of abstraction, and so in reducing or even eliminating the costs that are usually associated with cache management, locking, and synchronisation. © 2013 Springer-Verlag Berlin Heidelberg.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The end of Dennard scaling has pushed power consumption into a first order concern for current systems, on par with performance. As a result, near-threshold voltage computing (NTVC) has been proposed as a potential means to tackle the limited cooling capacity of CMOS technology. Hardware operating in NTV consumes significantly less power, at the cost of lower frequency, and thus reduced performance, as well as increased error rates. In this paper, we investigate if a low-power systems-on-chip, consisting of ARM's asymmetric big.LITTLE technology, can be an alternative to conventional high performance multicore processors in terms of power/energy in an unreliable scenario. For our study, we use the Conjugate Gradient solver, an algorithm representative of the computations performed by a large range of scientific and engineering codes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The end of Dennard scaling has promoted low power consumption into a firstorder concern for computing systems. However, conventional power conservation schemes such as voltage and frequency scaling are reaching their limits when used in performance-constrained environments. New technologies are required to break the power wall while sustaining performance on future processors. Low-power embedded processors and near-threshold voltage computing (NTVC) have been proposed as viable solutions to tackle the power wall in future computing systems. Unfortunately, these technologies may also compromise per-core performance and, in the case of NTVC, xreliability. These limitations would make them unsuitable for HPC systems and datacenters. In order to demonstrate that emerging low-power processing technologies can effectively replace conventional technologies, this study relies on ARM’s big.LITTLE processors as both an actual and emulation platform, and state-of-the-art implementations of the CG solver. For NTVC in particular, the paper describes how efficient algorithm-based fault tolerance schemes preserve the power and energy benefits of very low voltage operation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Tese de doutoramento, Informática (Engenharia Informática), Universidade de Lisboa, Faculdade de Ciências, 2014

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Contention on the memory bus in COTS based multicore systems is becoming a major determining factor of the execution time of a task. Analyzing this extra execution time is non-trivial because (i) bus arbitration protocols in such systems are often undocumented and (ii) the times when the memory bus is requested to be used are not explicitly controlled by the operating system scheduler; they are instead a result of cache misses. We present a method for finding an upper bound on the extra execution time of a task due to contention on the memory bus in COTS based multicore systems. This method makes no assumptions on the bus arbitration protocol (other than assuming that it is work-conserving).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With the transition to multicore processors almost complete, the parallel processing community is seeking efficient ways to port legacy message passing applications on shared memory and multicore processors. MPJ Express is our reference implementation of Message Passing Interface (MPI)-like bindings for the Java language. Starting with the current release, the MPJ Express software can be configured in two modes: the multicore and the cluster mode. In the multicore mode, parallel Java applications execute on shared memory or multicore processors. In the cluster mode, Java applications parallelized using MPJ Express can be executed on distributed memory platforms like compute clusters and clouds. The multicore device has been implemented using Java threads in order to satisfy two main design goals of portability and performance. We also discuss the challenges of integrating the multicore device in the MPJ Express software. This turned out to be a challenging task because the parallel application executes in a single JVM in the multicore mode. On the contrary in the cluster mode, the parallel user application executes in multiple JVMs. Due to these inherent architectural differences between the two modes, the MPJ Express runtime is modified to ensure correct semantics of the parallel program. Towards the end, we compare performance of MPJ Express (multicore mode) with other C and Java message passing libraries---including mpiJava, MPJ/Ibis, MPICH2, MPJ Express (cluster mode)---on shared memory and multicore processors. We found out that MPJ Express performs signicantly better in the multicore mode than in the cluster mode. Not only this but the MPJ Express software also performs better in comparison to other Java messaging libraries including mpiJava and MPJ/Ibis when used in the multicore mode on shared memory or multicore processors. We also demonstrate effectiveness of the MPJ Express multicore device in Gadget-2, which is a massively parallel astrophysics N-body siimulation code.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The complexity of current and emerging high performance architectures provides users with options about how best to use the available resources, but makes predicting performance challenging. In this work a benchmark-driven performance modelling approach is outlined that is appro- priate for modern multicore architectures. The approach is demonstrated by constructing a model of a simple shallow water code on a Cray XE6 system, from application-specific benchmarks that illustrate precisely how architectural char- acteristics impact performance. The model is found to recre- ate observed scaling behaviour up to 16K cores, and used to predict optimal rank-core affinity strategies, exemplifying the type of problem such a model can be used for.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Previous work, in the area of defense systems has focused on developing a firewall like structure, in order to protect applications from attacks. The major drawback for implementing security in general, is that it affects the performance of the application they are trying to protect. In fact, most developers avoid implementing security at all. With the coming of new multicore systems, we might at last be able to minimize the performance issues that security places on applications. In our bodyguard framework we propose a new kind of defense that acts alongside, not in front, of applications. This means that performance issues that effect system applications are kept to a minimum, but at the same time still provide high grade security. Our experimental results demonstrate that a ten to fifteen percent speedup in performance is possible, with the potential of greater speedup.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

 Multicore network processors have been playing an increasingly important role in computational processes, which emphasize on scalability and parallelism of the systems, in distributed environments especially in Internet-based delay-sensitive applications. It is an important but unsolved issue, however, to efficiently schedule tasks in network processors with multicore and multithread for improving the system throughput as much as possible. Profiling can gather runtime environment information and guide the compiler to optimize programs through scheduling tasks based on the runtime context. This paper proposes a profiling-based task scheduling approach, targeting on improving the throughput of multicore network processor (Intel IXP) systems in the balanced pipeline way. In this work, we investigate a profiling-based task scheduling framework, a task scheduling algorithm, and a set of performance models. Our task allocation scheme maps tasks onto the pipeline architecture and multiple threads of network processors in parallel, which incorporates the profiling context and global thread refinement. We evaluate our task scheduling algorithm by implementing representative network applications on the Intel IXP network processor. Experimental results demonstrate that our algorithm is able to schedule tasks in a balanced pipeline fashion and achieve the high throughput and data transmission rate. Copyright © 2012 John Wiley & Sons, Ltd.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

For more than a decade now, multimedia developers have usually "ride the waves", so to speak, with the coming of each generation of microprocessors, which allows their applications, designs and programs to usually running more proficiently, efficiently and effectively. This so-called "free" ride seems to be coming to an end, with results of increases clock speeds, the widening of the gap in processor and memory performance, and the tradeoffs that are needed to meet the former two points, with the new multi-core systems. In this paper, we build upon our previous work within multi-core systems, by proposing a ubiquitous multi-core (UM) design. The goal of such a framework is help researchers to plan and implement their multimedia applications so they can take advantage of speed up computations of multi-core systems and allow real-time multimedia. As our experiments show, our UM system increases performance speeds at an average of 100%, with the average execution cost of 1.4ms, showing that multimedia can use multi-core resources efficiently and effectively.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Multicore processors are widely used in today's computer systems. Multicore virtualization technology provides an elastic solution to more efficiently utilize the multicore system. However, the Lock Holder Preemption (LHP) problem in the virtualized multicore systems causes significant CPU cycles wastes, which hurt virtual machine (VM) performance and reduces response latency. The system consolidates more VMs, the LHP problem becomes worse. In this paper, we propose an efficient consolidation-aware vCPU (CVS) scheduling scheme on multicore virtualization platform. Based on vCPU over-commitment rate, the CVS scheduling scheme adaptively selects one algorithm among three vCPU scheduling algorithms: co-scheduling, yield-to-head, and yield-to-tail based on the vCPU over-commitment rate because the actions of vCPU scheduling are split into many single steps such as scheduling vCPUs simultaneously or inserting one vCPU into the run-queue from the head or tail. The CVS scheme can effectively improve VM performance in the low, middle, and high VM consolidation scenarios. Using real-life parallel benchmarks, our experimental results show that the proposed CVS scheme improves the overall system performance while the optimization overhead remains low.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Artificial neural networks are usually applied to solve complex problems. In problems with more complexity, by increasing the number of layers and neurons, it is possible to achieve greater functional efficiency. Nevertheless, this leads to a greater computational effort. The response time is an important factor in the decision to use neural networks in some systems. Many argue that the computational cost is higher in the training period. However, this phase is held only once. Once the network trained, it is necessary to use the existing computational resources efficiently. In the multicore era, the problem boils down to efficient use of all available processing cores. However, it is necessary to consider the overhead of parallel computing. In this sense, this paper proposes a modular structure that proved to be more suitable for parallel implementations. It is proposed to parallelize the feedforward process of an RNA-type MLP, implemented with OpenMP on a shared memory computer architecture. The research consistes on testing and analizing execution times. Speedup, efficiency and parallel scalability are analyzed. In the proposed approach, by reducing the number of connections between remote neurons, the response time of the network decreases and, consequently, so does the total execution time. The time required for communication and synchronization is directly linked to the number of remote neurons in the network, and so it is necessary to investigate which one is the best distribution of remote connections

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work presents a scalable and efficient parallel implementation of the Standard Simplex algorithm in the multicore architecture to solve large scale linear programming problems. We present a general scheme explaining how each step of the standard Simplex algorithm was parallelized, indicating some important points of the parallel implementation. Performance analysis were conducted by comparing the sequential time using the Simplex tableau and the Simplex of the CPLEXR IBM. The experiments were executed on a shared memory machine with 24 cores. The scalability analysis was performed with problems of different dimensions, finding evidence that our parallel standard Simplex algorithm has a better parallel efficiency for problems with more variables than constraints. In comparison with CPLEXR , the proposed parallel algorithm achieved a efficiency of up to 16 times better

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper the authors describe three cases of multicore myopathy in the same family. Case J was a white 77-year-old patient with proximal muscular atrophy and weakness, global hypotonia and global hypoactive deep tendon reflexes. Motor and sensory conduction studies were normal in all limbs. EMG examination showed a myopathic pattern with frequent spontaneous activity consisting of fibrillations and positive sharp waves. Histochemical reactions showed typical oxidative alterations of multicore myopathy. Cases 2 and 3 were the son and the daughter of case 1 respectively. They were both non-symptomatic patients with minimal EMG and histochemical alterations. These three patients illustrated the great clinical variability of this condition.