193 resultados para multicore
Resumo:
This work presents synopsis of efficient strategies used in power managements for achieving the most economical power and energy consumption in multicore systems, FPGA and NoC Platforms. In this work, a practical approach was taken, in an effort to validate the significance of the proposed Adaptive Power Management Algorithm (APMA), proposed for system developed, for this thesis project. This system comprise arithmetic and logic unit, up and down counters, adder, state machine and multiplexer. The essence of carrying this project firstly, is to develop a system that will be used for this power management project. Secondly, to perform area and power synopsis of the system on these various scalable technology platforms, UMC 90nm nanotechnology 1.2v, UMC 90nm nanotechnology 1.32v and UMC 0.18 μmNanotechnology 1.80v, in order to examine the difference in area and power consumption of the system on the platforms. Thirdly, to explore various strategies that can be used to reducing system’s power consumption and to propose an adaptive power management algorithm that can be used to reduce the power consumption of the system. The strategies introduced in this work comprise Dynamic Voltage Frequency Scaling (DVFS) and task parallelism. After the system development, it was run on FPGA board, basically NoC Platforms and on these various technology platforms UMC 90nm nanotechnology1.2v, UMC 90nm nanotechnology 1.32v and UMC180 nm nanotechnology 1.80v, the system synthesis was successfully accomplished, the simulated result analysis shows that the system meets all functional requirements, the power consumption and the area utilization were recorded and analyzed in chapter 7 of this work. This work extensively reviewed various strategies for managing power consumption which were quantitative research works by many researchers and companies, it's a mixture of study analysis and experimented lab works, it condensed and presents the whole basic concepts of power management strategy from quality technical papers.
Resumo:
With the transition to multicore processors almost complete, the parallel processing community is seeking efficient ways to port legacy message passing applications on shared memory and multicore processors. MPJ Express is our reference implementation of Message Passing Interface (MPI)-like bindings for the Java language. Starting with the current release, the MPJ Express software can be configured in two modes: the multicore and the cluster mode. In the multicore mode, parallel Java applications execute on shared memory or multicore processors. In the cluster mode, Java applications parallelized using MPJ Express can be executed on distributed memory platforms like compute clusters and clouds. The multicore device has been implemented using Java threads in order to satisfy two main design goals of portability and performance. We also discuss the challenges of integrating the multicore device in the MPJ Express software. This turned out to be a challenging task because the parallel application executes in a single JVM in the multicore mode. On the contrary in the cluster mode, the parallel user application executes in multiple JVMs. Due to these inherent architectural differences between the two modes, the MPJ Express runtime is modified to ensure correct semantics of the parallel program. Towards the end, we compare performance of MPJ Express (multicore mode) with other C and Java message passing libraries---including mpiJava, MPJ/Ibis, MPICH2, MPJ Express (cluster mode)---on shared memory and multicore processors. We found out that MPJ Express performs signicantly better in the multicore mode than in the cluster mode. Not only this but the MPJ Express software also performs better in comparison to other Java messaging libraries including mpiJava and MPJ/Ibis when used in the multicore mode on shared memory or multicore processors. We also demonstrate effectiveness of the MPJ Express multicore device in Gadget-2, which is a massively parallel astrophysics N-body siimulation code.
Resumo:
The complexity of current and emerging high performance architectures provides users with options about how best to use the available resources, but makes predicting performance challenging. In this work a benchmark-driven performance modelling approach is outlined that is appro- priate for modern multicore architectures. The approach is demonstrated by constructing a model of a simple shallow water code on a Cray XE6 system, from application-specific benchmarks that illustrate precisely how architectural char- acteristics impact performance. The model is found to recre- ate observed scaling behaviour up to 16K cores, and used to predict optimal rank-core affinity strategies, exemplifying the type of problem such a model can be used for.
Resumo:
Artificial neural networks are usually applied to solve complex problems. In problems with more complexity, by increasing the number of layers and neurons, it is possible to achieve greater functional efficiency. Nevertheless, this leads to a greater computational effort. The response time is an important factor in the decision to use neural networks in some systems. Many argue that the computational cost is higher in the training period. However, this phase is held only once. Once the network trained, it is necessary to use the existing computational resources efficiently. In the multicore era, the problem boils down to efficient use of all available processing cores. However, it is necessary to consider the overhead of parallel computing. In this sense, this paper proposes a modular structure that proved to be more suitable for parallel implementations. It is proposed to parallelize the feedforward process of an RNA-type MLP, implemented with OpenMP on a shared memory computer architecture. The research consistes on testing and analizing execution times. Speedup, efficiency and parallel scalability are analyzed. In the proposed approach, by reducing the number of connections between remote neurons, the response time of the network decreases and, consequently, so does the total execution time. The time required for communication and synchronization is directly linked to the number of remote neurons in the network, and so it is necessary to investigate which one is the best distribution of remote connections
Resumo:
This work presents a scalable and efficient parallel implementation of the Standard Simplex algorithm in the multicore architecture to solve large scale linear programming problems. We present a general scheme explaining how each step of the standard Simplex algorithm was parallelized, indicating some important points of the parallel implementation. Performance analysis were conducted by comparing the sequential time using the Simplex tableau and the Simplex of the CPLEXR IBM. The experiments were executed on a shared memory machine with 24 cores. The scalability analysis was performed with problems of different dimensions, finding evidence that our parallel standard Simplex algorithm has a better parallel efficiency for problems with more variables than constraints. In comparison with CPLEXR , the proposed parallel algorithm achieved a efficiency of up to 16 times better
Resumo:
In this paper the authors describe three cases of multicore myopathy in the same family. Case J was a white 77-year-old patient with proximal muscular atrophy and weakness, global hypotonia and global hypoactive deep tendon reflexes. Motor and sensory conduction studies were normal in all limbs. EMG examination showed a myopathic pattern with frequent spontaneous activity consisting of fibrillations and positive sharp waves. Histochemical reactions showed typical oxidative alterations of multicore myopathy. Cases 2 and 3 were the son and the daughter of case 1 respectively. They were both non-symptomatic patients with minimal EMG and histochemical alterations. These three patients illustrated the great clinical variability of this condition.
Resumo:
This thesis explores the capabilities of heterogeneous multi-core systems, based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated desk side computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing “supercomputing to the masses”, this opens up new possibilities for application fields where investing in HPC resources had been considered unfeasible before. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing an crucial kernel in computation of multi-mesh head models of encephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. In terms of computational throughput, it outperforms not only an highly optimized multi-CPU reference, but related GPU-based work as well. Finally, a GPU-accelerated graphical EEG real-time source localization software was implemented. Thanks to acceleration, it can meet real-time requirements in unpreceeded anatomical detail running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scansions has been included.
Resumo:
Next generation electronic devices have to guarantee high performance while being less power-consuming and highly reliable for several application domains ranging from the entertainment to the business. In this context, multicore platforms have proven the most efficient design choice but new challenges have to be faced. The ever-increasing miniaturization of the components produces unexpected variations on technological parameters and wear-out characterized by soft and hard errors. Even though hardware techniques, which lend themselves to be applied at design time, have been studied with the objective to mitigate these effects, they are not sufficient; thus software adaptive techniques are necessary. In this thesis we focus on multicore task allocation strategies to minimize the energy consumption while meeting performance constraints. We firstly devise a technique based on an Integer Linear Problem formulation which provides the optimal solution but cannot be applied on-line since the algorithm it needs is time-demanding; then we propose a sub-optimal technique based on two steps which can be applied on-line. We demonstrate the effectiveness of the latter solution through an exhaustive comparison against the optimal solution, state-of-the-art policies, and variability-agnostic task allocations by running multimedia applications on the virtual prototype of a next generation industrial multicore platform. We also face the problem of the performance and lifetime degradation. We firstly focus on embedded multicore platforms and propose an idleness distribution policy that increases core expected lifetimes by duty cycling their activity; then, we investigate the use of micro thermoelectrical coolers in general-purpose multicore processors to control the temperature of the cores at runtime with the objective of meeting lifetime constraints without performance loss.
Resumo:
We demonstrate a multicore multidopant fiber which, when pumped with a single pump source around ∼800 nm , emits a more than one octave-spanning fluorescence spectrum ranging from 925 to 2300 nm . The fiber preform is manufactured from granulated oxides and the individual cores are doped with five different rare earths, i.e., Nd3+ , Yb3+ , Er3+ , Ho3+ , and Tm3+ .