988 results for Multi-core


Relevance:

60.00%

Publisher:

Abstract:

The traffic carried by core optical networks grows at a steady but remarkable pace of 30-40% year-over-year. Advances in optical transmission and networking continue to satisfy these traffic requirements by delivering content over the network infrastructure in a cost- and energy-efficient manner. Such core optical networks serve information traffic demands dynamically, in response to demands that shift both temporally (day/night) and spatially (business district/residential). However, as we approach the fundamental spectral-efficiency limits of single-mode fibers, the scientific community has recently been pursuing the development of an innovative, all-optical network architecture that introduces the spatial degree of freedom into the design and operation of future transport networks. Space-division multiplexing, through the use of bundled single-mode fibers, multi-core fibers and/or few-mode fibers, can offer up to a 100-fold capacity increase in future optical networks. The EU INSPACE project is working on the development of a complete spatial-spectral flexible optical networking solution, offering the ultra-high capacity, flexibility and energy efficiency required to meet the challenge of delivering the Internet's exponentially growing traffic over the next twenty years. In this paper we present the motivation and main research activities of the INSPACE consortium towards the realization of the overall project solution. © 2014 SPIE.

Relevance:

60.00%

Publisher:

Abstract:

Since the development of large-scale power grid interconnections and power markets, research on available transfer capability (ATC) has attracted great attention. The challenges in accurately assessing ATC originate from the numerous uncertainties in the electricity generation, transmission, distribution and utilization sectors. Power system uncertainties can mainly be described as two types: randomness and fuzziness. However, the traditional transmission reliability margin (TRM) approach only considers randomness. Based on credibility theory, this paper first builds models of generators, transmission lines and loads that capture both their randomness and their fuzziness. A random fuzzy simulation is then applied, together with a novel ATC assessment method in which both randomness and fuzziness are considered. The bootstrap method and multi-core parallel computing are introduced to enhance the processing speed. Simulations of the IEEE 30-bus system and a real-life system located in Northwest China verify the viability of the models and the proposed method.
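
A minimal sketch of the assessment idea described above, random-fuzzy Monte Carlo sampling run in parallel across cores with a bootstrap estimate of the result, is given below. It is illustrative only, not the paper's model: the line-availability probability, the triangular (fuzzy) load factor and the 400 MW base capability are invented placeholders.

```python
import numpy as np
from multiprocessing import Pool

def sample_atc(seed):
    """One random-fuzzy trial: random line availability and a fuzzy
    (triangular) load factor feed a placeholder transfer-capability model."""
    rng = np.random.default_rng(seed)
    lines_up = rng.random(5) > 0.02               # random outages (randomness)
    load_factor = rng.triangular(0.9, 1.0, 1.1)   # fuzzy load level (fuzziness)
    base_ttc = 400.0                              # MW, illustrative value only
    ttc = base_ttc * lines_up.mean() / load_factor
    return max(ttc - 250.0, 0.0)                  # minus existing commitments

def bootstrap_ci(samples, n_boot=2000, alpha=0.05, seed=1):
    """Bootstrap confidence interval for the mean ATC estimate."""
    rng = np.random.default_rng(seed)
    means = [rng.choice(samples, size=samples.size, replace=True).mean()
             for _ in range(n_boot)]
    return np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])

if __name__ == "__main__":
    with Pool() as pool:                          # multi-core parallel trials
        samples = np.array(pool.map(sample_atc, range(10_000)))
    lo, hi = bootstrap_ci(samples)
    print(f"mean ATC ~ {samples.mean():.1f} MW, 95% CI [{lo:.1f}, {hi:.1f}] MW")
```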

Relevance:

60.00%

Publisher:

Abstract:

In this talk we will review some of the key enabling technologies of optical communications and potential future bottlenecks. Single-mode fibre (SMF) has long been the preferred waveguide for long-distance communication, largely due to its low loss, low cost and relative linearity over a wide bandwidth. As capacity demands have grown, SMF has largely been able to keep pace. Several groups have identified the possibility of exhausting the bandwidth provided by SMF [1,2,3]. This so-called "capacity crunch" has potentially vast economic and social consequences and will be discussed in detail. As demand grows, the optical power launched into the fibre can cause nonlinearities that are detrimental to transmission. Considerable work has been done on identifying this nonlinear limit [4,5], with strong research interest currently focused on nonlinear compensation [6,7]. Embracing and compensating for nonlinear transmission is one potential solution that may extend the lifetime of the current waveguide technology. However, at sufficiently high powers the waveguide will fail due to heat-induced mechanical failure. Moving forward, it becomes necessary to address the waveguide itself, with several promising contenders discussed, including few-mode fibre and multi-core fibre.
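
For context, the shape of the nonlinear limit referred to above is often summarised with the Gaussian-noise (GN) model; the formulation below is a generic textbook sketch, not taken from the talk, and the nonlinearity coefficient and amplifier-noise power are link-dependent constants left unspecified.

```latex
% GN-model sketch of the nonlinear limit (illustrative; link constants unspecified):
% nonlinear interference grows cubically with launch power P, so the effective
% SNR, and hence capacity, peaks at a finite optimum launch power.
\[
  \mathrm{SNR}(P) = \frac{P}{P_{\mathrm{ASE}} + \eta P^{3}},
  \qquad
  C \approx 2B \log_{2}\!\bigl(1 + \mathrm{SNR}(P)\bigr),
  \qquad
  \frac{d\,\mathrm{SNR}}{dP} = 0 \;\Rightarrow\;
  P_{\mathrm{opt}} = \left(\frac{P_{\mathrm{ASE}}}{2\eta}\right)^{1/3}.
\]
% Beyond P_opt, raising the launch power reduces capacity: the "nonlinear Shannon
% limit" that motivates nonlinear compensation and, ultimately, new waveguides.
```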

Relevance:

60.00%

Publisher:

Abstract:

Catering to society's demand for high-performance computing, billions of transistors are now integrated on IC chips to deliver unprecedented performance. With increasing transistor density, power consumption and power density are growing exponentially. The increasing power consumption translates directly into high chip temperatures, which not only raise packaging/cooling costs, but also degrade the performance, reliability and life span of computing systems. Moreover, high chip temperature also greatly increases leakage power consumption, which is becoming more and more significant with the continuous scaling of transistor sizes. As the semiconductor industry continues to evolve, power and thermal challenges have become the most critical challenges in the design of new generations of computing systems. In this dissertation, we addressed the power/thermal issues from the system-level perspective. Specifically, we sought to employ real-time scheduling methods to optimize the power/thermal efficiency of real-time computing systems, with the leakage/temperature dependency taken into consideration. In our research, we first explored the fundamental principles of how to employ dynamic voltage scaling (DVS) techniques to reduce the peak operating temperature when running a real-time application on a single-core platform. We further proposed a novel real-time scheduling method, "M-Oscillations", to reduce the peak temperature when scheduling a hard real-time periodic task set. We also developed three checking methods to guarantee the feasibility of a periodic real-time schedule under a peak temperature constraint. We then extended our research from single-core platforms to multi-core platforms: we investigated the energy estimation problem on multi-core platforms and developed a lightweight and accurate method to calculate the energy consumption of a given voltage schedule on a multi-core platform. Finally, we concluded the dissertation with a detailed discussion of future extensions of our research.
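
As a rough illustration of why oscillating between two speeds can lower the peak temperature, the sketch below integrates a first-order thermal RC model with a temperature-dependent leakage term. The constants and the power model are arbitrary, and this is not the dissertation's M-Oscillations algorithm or its feasibility checks; a real schedule must also meet the task deadlines.

```python
import numpy as np

T_AMB, R, C = 45.0, 0.8, 30.0   # ambient temp (C), thermal resistance, capacitance (arbitrary)

def power(speed, temp):
    """Dynamic power ~ speed^3 plus a leakage term that grows with temperature
    (a linearised stand-in for the leakage/temperature dependency)."""
    return 20.0 * speed ** 3 + 0.05 * temp

def simulate(speed_of_t, t_end=600.0, dt=0.1):
    """Integrate the first-order thermal model C*dT/dt = P - (T - T_amb)/R."""
    temps, T = [], T_AMB
    for t in np.arange(0.0, t_end, dt):
        T += dt * (power(speed_of_t(t), T) - (T - T_AMB) / R) / C
        temps.append(T)
    return np.array(temps)

constant  = simulate(lambda t: 1.0)                                 # full speed throughout
oscillate = simulate(lambda t: 1.0 if (t // 30) % 2 == 0 else 0.6)  # alternate high/low speed

print(f"peak T, constant full speed: {constant.max():.1f} C")
print(f"peak T, speed oscillation:   {oscillate.max():.1f} C")
```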

Relevance:

60.00%

Publisher:

Abstract:

In this paper we advocate the Loop-of-stencil-reduce pattern as a way to simplify the parallel programming of heterogeneous platforms (multicore + GPUs). Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce and, crucially, their usage in a loop. It transparently targets combinations of CPU cores and GPUs (via OpenCL), and it makes it possible to simplify the deployment of a single stencil computation kernel on different GPUs. The paper discusses the implementation of Loop-of-stencil-reduce within the FastFlow parallel framework, using a simple iterative data-parallel application (Game of Life) as a running example and a highly effective parallel filter for visual data restoration to assess performance. Thanks to the high-level design of Loop-of-stencil-reduce, it was possible to run the filter seamlessly on a multicore machine, on multiple GPUs, and on both.
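
The control flow of the pattern can be conveyed in a few lines of plain Python; the actual implementation is in C++ on top of FastFlow and OpenCL, so this sketch only mirrors the structure: a stencil (map) phase over the grid, a reduce phase, and a loop whose continuation condition comes from the reduction, with Game of Life as the running example.

```python
import numpy as np

def stencil_step(grid):
    """Stencil phase: each cell looks at its 8 neighbours (toroidal wrap-around)."""
    n = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(np.uint8)

def loop_of_stencil_reduce(grid, max_iters=1000):
    """Loop-of-stencil-reduce: apply the stencil, then a reduction that decides
    whether to keep iterating (here: did any cell change?)."""
    for it in range(max_iters):
        new = stencil_step(grid)             # stencil (map) phase
        changed = int(np.sum(new != grid))   # reduce phase
        grid = new
        if changed == 0:                     # loop condition computed by the reduce
            break
    return grid, it + 1

rng = np.random.default_rng(42)
world = (rng.random((64, 64)) < 0.3).astype(np.uint8)
final, iters = loop_of_stencil_reduce(world)
print(f"stopped after {iters} iterations with {int(final.sum())} live cells")
```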

Relevance:

60.00%

Publisher:

Abstract:

We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce and, crucially, their usage in a loop in both data-parallel and streaming applications, or a combination of the two. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for implementing applications based on parallel patterns. Experiments are presented to illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels running on heterogeneous systems.

Relevance:

60.00%

Publisher:

Abstract:

Due to the growth of design size and complexity, design verification is an important aspect of the logic circuit development process. The purpose of verification is to validate that the design meets the system requirements and specification. This is done by either functional or formal verification. The most popular approach to functional verification is the use of simulation-based techniques, where models replicate the behaviour of the actual system. In this thesis, a software/data-structure architecture without explicit locks is proposed to accelerate logic gate circuit simulation. We call this system ZSIM. The ZSIM software architecture simulator targets low-cost SIMD multi-core machines. Its performance was evaluated on the Intel Xeon Phi and two other machines (Intel Xeon and AMD Opteron). The aims of these experiments were to:
• Verify that the data structure used allows SIMD acceleration, particularly on machines with gather instructions (section 5.3.1).
• Verify that, on sufficiently large circuits, substantial gains can be made from multi-core parallelism (section 5.3.2).
• Show that a simulator using this approach out-performs an existing commercial simulator on a standard workstation (section 5.3.3).
• Show that the performance on a cheap Xeon Phi card is competitive with results reported elsewhere on much more expensive supercomputers (section 5.3.5).
To evaluate ZSIM, two types of test circuits were used:
1. Circuits from the IWLS benchmark suite [1], which allow direct comparison with other published studies of parallel simulators.
2. Circuits generated by a parametrised circuit synthesizer. The synthesizer used an algorithm that has been shown to generate circuits that are statistically representative of real logic circuits, and it allowed testing of a range of very large circuits, larger than those for which open-source files could be obtained.
The experimental results show that, with SIMD acceleration and multi-core parallelism, ZSIM achieved a peak parallelisation factor of 300 on the Intel Xeon Phi and 11 on the Intel Xeon. With only SIMD enabled, ZSIM achieved a maximum parallelisation gain of 10 on the Intel Xeon Phi and 4 on the Intel Xeon. Furthermore, it was shown that this software architecture simulator running on a SIMD machine is much faster than, and can handle much bigger circuits than, a widely used commercial simulator (Xilinx) running on a workstation. The performance achieved by ZSIM was also compared with similar pre-existing work on logic simulation targeting GPUs and supercomputers. It was shown that the ZSIM simulator running on a Xeon Phi machine gives simulation performance comparable to the IBM Blue Gene supercomputer at very much lower cost. The experimental results also show that the Xeon Phi is competitive with simulation on GPUs and allows the handling of much larger circuits than have been reported for GPU simulation. When targeting the Xeon Phi architecture, the Xeon Phi's automatic cache management handles the on-chip local store without any explicit mention of the local store in the architecture of the simulator itself, whereas targeting GPUs requires explicit cache management in the program, which increases the complexity of the software architecture. Furthermore, one of the strongest points of the ZSIM simulator is its portability: the same code was tested on both AMD and Xeon Phi machines, and the same architecture that performs efficiently on the Xeon Phi was ported to a 64-core NUMA AMD Opteron.
To conclude, the two main achievements are restated as follows. The primary achievement of this work was proving that the ZSIM architecture is faster than previously published logic simulators on low-cost platforms. The secondary achievement was the development of a synthetic testing suite that went beyond the scale range previously publicly available, based on prior work showing that the synthesis technique is valid.
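
ZSIM's internal data structures are not reproduced in this abstract; the sketch below only illustrates the general idea behind lock-free, data-parallel gate evaluation: gates are grouped into topological levels so that all gates within a level are independent, and each gate is evaluated as one vector operation across a whole batch of input vectors (the dimension along which SIMD and gather instructions pay off). The three-gate circuit is a made-up example, not an IWLS benchmark.

```python
import numpy as np

# Toy netlist: each gate reads two signal columns and writes its own output
# column. Signals 0..3 are primary inputs, 4..6 are gate outputs.
GATES = [                           # (out, in_a, in_b, op), hypothetical circuit
    (4, 0, 1, np.bitwise_and),
    (5, 2, 3, np.bitwise_or),
    (6, 4, 5, np.bitwise_xor),
]
LEVELS = [[0, 1], [2]]              # gate indices grouped by topological level

def simulate(inputs):
    """Evaluate the circuit for a batch of input vectors of shape (n, 4)."""
    sig = np.zeros((inputs.shape[0], 7), dtype=np.uint8)
    sig[:, :4] = inputs
    for level in LEVELS:            # levels run in order ...
        for g in level:             # ... gates within a level need no locks
            out, a, b, op = GATES[g]
            sig[:, out] = op(sig[:, a], sig[:, b])   # one vector op per gate
    return sig[:, 4:]

vecs = np.array([[0, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1]], dtype=np.uint8)
print(simulate(vecs))               # gate outputs for each input vector
```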

Relevance:

60.00%

Publisher:

Abstract:

As the semiconductor industry struggles to maintain its momentum along the path set by Moore's Law, three-dimensional integrated circuit (3D IC) technology has emerged as a promising solution to achieve higher integration density, better performance and lower power consumption. However, despite its significant improvement in electrical performance, 3D IC presents several serious physical design challenges. In this dissertation, we investigate physical design methodologies for 3D ICs with a primary focus on two areas: low-power 3D clock tree design, and reliability degradation modeling and management. Clock trees are essential parts of digital systems and dissipate a large amount of power due to their high capacitive loads. The majority of existing 3D clock tree designs focus on minimizing the total wire length, which produces sub-optimal results for power optimization. In this dissertation, we formulate a 3D clock tree design flow that directly optimizes for clock power. We also investigate the design methodology for clock gating a 3D clock tree, which uses shutdown gates to selectively turn off unnecessary clock activity. Unlike the common assumption in 2D ICs that shutdown gates are cheap and can therefore be applied at every clock node, shutdown gates in 3D ICs introduce additional control TSVs, which compete with clock TSVs for placement resources. We explore design methodologies that produce the optimal allocation and placement of clock and control TSVs so that the clock power is minimized, and show that the proposed synthesis flow saves significant clock power while accounting for the available TSV placement area. Vertical integration also brings new reliability challenges, including TSV electromigration (EM) and several other reliability loss mechanisms caused by TSV-induced stress. These reliability loss models involve complex inter-dependencies between electrical and thermal conditions, which have not been investigated in the past. In this dissertation we set up an electrical/thermal/reliability co-simulation framework to capture the transient behavior of reliability loss in 3D ICs. We further derive and validate an analytical reliability objective function that can be integrated into the 3D placement design flow. The reliability-aware placement scheme enables co-design and co-optimization of the electrical and reliability properties, improving both the circuit's performance and its lifetime, and avoids unnecessary design cycles or ad-hoc fixes that lead to sub-optimal performance. Vertical integration also enables stacking DRAM on top of CPUs, providing high bandwidth and short latency. However, non-uniform voltage fluctuation and local thermal hotspots in the CPU layers are coupled into the DRAM layers, causing a non-uniform bit-cell leakage (and thereby bit-flip) distribution. We propose a performance-power-resilience simulation framework to capture DRAM soft errors in 3D multi-core CPU systems. In addition, a dynamic resilience management (DRM) scheme is investigated, which adaptively tunes the CPU's operating points to adjust the DRAM's voltage noise and thermal condition during runtime. The DRM uses dynamic frequency scaling to achieve a resilience borrow-in strategy, which effectively enhances the DRAM's resilience without sacrificing performance. The proposed physical design methodologies should act as important building blocks for 3D ICs and push 3D ICs toward mainstream acceptance in the near future.
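
The trade-off behind the clock-gating formulation can be seen with a back-of-the-envelope dynamic-power model: gated sinks stop switching, but every shutdown gate needs a control TSV that competes with clock TSVs for placement resources. All constants below are invented for illustration; this is not the dissertation's synthesis flow or its power model.

```python
# Illustrative clock-power estimate, P = C_switched * Vdd^2 * f_clk, with and
# without gating part of the tree (all values hypothetical).
F_CLK = 1.0e9                       # Hz
VDD = 1.0                           # V
C_SINK, C_TSV = 15e-15, 40e-15      # F: per clock sink vs. per (clock/control) TSV

def clock_power(n_sinks, n_clock_tsvs, gated_fraction=0.0, n_control_tsvs=0):
    """Gated sinks are assumed idle (no switching), but each group of gated
    sinks costs an extra control TSV on the capacitive load."""
    switching_sinks = n_sinks * (1.0 - gated_fraction)
    c_switched = switching_sinks * C_SINK + (n_clock_tsvs + n_control_tsvs) * C_TSV
    return c_switched * VDD ** 2 * F_CLK

p_plain = clock_power(n_sinks=20000, n_clock_tsvs=64)
p_gated = clock_power(n_sinks=20000, n_clock_tsvs=64,
                      gated_fraction=0.4, n_control_tsvs=32)
print(f"ungated: {p_plain * 1e3:.1f} mW   gated: {p_gated * 1e3:.1f} mW")
```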

Relevance:

60.00%

Publisher:

Abstract:

In the multi-core CPU world, transactional memory (TM) has emerged as an alternative to lock-based programming for thread synchronization. Recent research proposes the use of TM in GPU architectures, where a high number of computing threads, organized in SIMT fashion, requires an effective synchronization method. In contrast to CPUs, GPUs offer two memory spaces: global memory and local memory. The local memory space serves as a shared scratch-pad for a subset of the computing threads, and programmers use it to speed up their applications thanks to its low latency. Prior work by the authors proposed lightweight hardware TM (HTM) support based on the local memory, modifying the SIMT execution model and adding a conflict detection mechanism. An efficient implementation of these features is key to providing an effective synchronization mechanism at the local memory level. After a brief description of the main features of our HTM design for GPU local memory, in this work we gather together a number of proposals aimed at improving the mechanisms with the highest impact on performance. Firstly, the SIMT execution model is modified to increase the parallelism of the application when transactions must be serialized in order to make forward progress. Secondly, the conflict detection mechanism is optimized depending on application characteristics, such as the read/write sets, the probability of conflict between transactions and the existence of read-only transactions. As these features can be present in hardware simultaneously, it is the task of the compiler and runtime to determine which ones are most important for a given application. This work includes a discussion of the analysis to be done in order to choose the best configuration.
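
The conflict-detection rule that such a mechanism enforces can be stated compactly in software terms: two transactions conflict when one writes an address that the other reads or writes, so read-only transactions never conflict with one another. The sketch below illustrates that rule only; the actual design is in hardware and tied to the SIMT execution model.

```python
def conflicts(tx_a, tx_b):
    """Each transaction is a (read_set, write_set) pair of local-memory
    addresses; a conflict is any write/read or write/write overlap."""
    ra, wa = tx_a
    rb, wb = tx_b
    return bool(wa & (rb | wb)) or bool(wb & (ra | wa))

t1 = ({0x10, 0x14}, {0x14})   # reads 0x10, 0x14; writes 0x14
t2 = ({0x14}, set())          # read-only
t3 = ({0x20}, {0x14})         # writes 0x14
t4 = ({0x10}, set())          # read-only

print(conflicts(t1, t2))      # True: t1 writes what t2 reads
print(conflicts(t2, t4))      # False: two read-only transactions never conflict
print(conflicts(t1, t3))      # True: write/write overlap on 0x14
```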

Relevance:

60.00%

Publisher:

Abstract:

Hardware vendors make an important effort to create low-power CPUs that keep battery life and durability at acceptable levels. In order to achieve this goal and provide a good performance-energy trade-off for a wide variety of applications, ARM designed the big.LITTLE architecture. This heterogeneous multi-core architecture features two different types of cores: big cores oriented towards performance, and little cores that are slower and aimed at saving energy. As all the cores have access to the same memory, multi-threaded applications must resort to some mutual exclusion mechanism to coordinate access to shared data by concurrent threads. Transactional Memory (TM) represents an optimistic approach to shared-memory synchronization. To take full advantage of the features offered by software TM, but also benefit from the characteristics of the heterogeneous big.LITTLE architectures, our focus is to propose TM solutions that take into account the power/performance requirements of the application and what is offered by the architecture. In order to understand the current state of the art and obtain useful information for future power-aware software TM solutions, we have performed an analysis of a popular TM library running on top of an ARM big.LITTLE processor. Experiments show, in general, better scalability on the LITTLE cores for most of the applications, except for one that requires the computing performance offered by the big cores.

Relevance:

60.00%

Publisher:

Abstract:

Catering to society's demand for high-performance computing, billions of transistors are now integrated on IC chips to deliver unprecedented performance. With increasing transistor density, power consumption and power density are growing exponentially. The increasing power consumption translates directly into high chip temperatures, which not only raise packaging/cooling costs, but also degrade the performance, reliability and life span of computing systems. Moreover, high chip temperature also greatly increases leakage power consumption, which is becoming more and more significant with the continuous scaling of transistor sizes. As the semiconductor industry continues to evolve, power and thermal challenges have become the most critical challenges in the design of new generations of computing systems. In this dissertation, we addressed the power/thermal issues from the system-level perspective. Specifically, we sought to employ real-time scheduling methods to optimize the power/thermal efficiency of real-time computing systems, with the leakage/temperature dependency taken into consideration. In our research, we first explored the fundamental principles of how to employ dynamic voltage scaling (DVS) techniques to reduce the peak operating temperature when running a real-time application on a single-core platform. We further proposed a novel real-time scheduling method, "M-Oscillations", to reduce the peak temperature when scheduling a hard real-time periodic task set. We also developed three checking methods to guarantee the feasibility of a periodic real-time schedule under a peak temperature constraint. We then extended our research from single-core platforms to multi-core platforms: we investigated the energy estimation problem on multi-core platforms and developed a lightweight and accurate method to calculate the energy consumption of a given voltage schedule on a multi-core platform. Finally, we concluded the dissertation with a detailed discussion of future extensions of our research.

Relevance:

40.00%

Publisher:

Abstract:

Multislice computed tomography (MSCT) for the noninvasive detection of coronary artery stenoses is a promising candidate for widespread clinical application because of its noninvasive nature and the high sensitivity and negative predictive value found in several previous studies using 16 to 64 simultaneous detector rows. A multi-centre study of CT coronary angiography using 16 simultaneous detector rows has shown that 16-slice CT is limited by a high number of nondiagnostic cases and a high false-positive rate. A recent meta-analysis indicated a significant interaction between the size of the study sample and the diagnostic odds ratios, suggestive of small-study bias, highlighting the importance of evaluating MSCT using 64 simultaneous detector rows in a multi-centre approach with a larger sample size. In this manuscript we detail the objectives and methods of the prospective "CORE-64" trial ("Coronary Evaluation Using Multidetector Spiral Computed Tomography Angiography Using 64 Detectors"). This multi-centre trial was unique in that it assessed the diagnostic performance of 64-slice CT coronary angiography in nine centres worldwide in comparison with conventional coronary angiography. In conclusion, the multi-centre, multi-institutional and multi-continental CORE-64 trial has great potential to ultimately assess the per-patient diagnostic performance of coronary CT angiography using 64 simultaneous detector rows.

Relevance:

40.00%

Publisher:

Abstract:

In this work we show that, unlike the bilateral case, for the multilateral assignment markets known as Böhm-Bawerk assignment games the nucleolus and the core-center, i.e. the center of mass of the core, do not coincide in general. To prove this, we show that, given an m-sided Böhm-Bawerk assignment game, the two solutions above can be obtained, respectively, from the nucleolus and the core-center of a convex game defined on the set of the m sectors. Moreover, we prove that, to compute the nucleolus of this latter game, only the coalitions formed by one player or by m-1 players matter. These results simplify the computation of the nucleolus of a multi-sided Böhm-Bawerk assignment market with a very large number of agents.

Relevance:

40.00%

Publisher:

Abstract:

In this work we introduce the class of multi-sided Böhm-Bawerk assignment games, which generalizes the well-known class of bilateral Böhm-Bawerk assignment games to situations with an arbitrary number of sectors. We obtain the extreme points of the core of any multi-sided Böhm-Bawerk assignment game from a convex game defined on the set of sectors rather than on the set of sellers and buyers. In addition, we study when the core of these assignment games is stable in the sense of von Neumann-Morgenstern.