986 results for Multi-core processor


Relevance: 100.00%

Abstract:

Efficiently emulating a many-core architecture is a challenging task. Each core could be emulated by a dedicated thread, with the threads interleaved on a single-core or multi-core processor, but the high number of context switches would result in unacceptable performance. To support this kind of application, the computational power of the GPU is exploited by scheduling the emulation threads on the GPU cores. This raises a non-trivial divergence issue, since GPU computational power is offered through SIMD processing elements that are forced to execute the same instruction synchronously on different memory portions. A new emulation technique is therefore introduced to overcome this limitation: instead of providing a routine for each ISA opcode, the emulator mimics the behavior of the microarchitecture level, where instructions are data that a single routine takes as input. The new technique has been implemented and compared with the classic emulation approach in order to investigate the feasibility of a hybrid solution.
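The idea of treating instructions as data for a single routine can be illustrated with a small sketch. This is not the authors' emulator: the four-opcode toy ISA, the `CONTROL` table (standing in for microcode control signals), and the names `step` and `run_lockstep` are all assumptions made for illustration. The point is that `step` never branches on the opcode, so every emulated core follows the same instruction path, which is the property SIMD lanes need.

```python
# Hypothetical toy ISA; CONTROL plays the role of a microcode ROM.
# Control word per opcode: (use_add, use_sub, use_and, use_imm)
CONTROL = {
    0: (1, 0, 0, 0),  # ADD  rd, ra, rb
    1: (0, 1, 0, 0),  # SUB  rd, ra, rb
    2: (0, 0, 1, 0),  # AND  rd, ra, rb
    3: (1, 0, 0, 1),  # ADDI rd, ra, imm
}
MASK = 0xFFFFFFFF  # emulate 32-bit registers


def step(regs, instr):
    """Single routine for every opcode: the instruction is only input data."""
    op, rd, ra, rb_or_imm = instr
    use_add, use_sub, use_and, use_imm = CONTROL[op]
    a = regs[ra]
    # Select the second operand arithmetically instead of branching on the opcode.
    # The register read is always performed (index wrapped to stay in range).
    reg_b = regs[rb_or_imm % len(regs)]
    b = use_imm * rb_or_imm + (1 - use_imm) * reg_b
    # Compute every candidate result and keep the one the control word selects.
    result = use_add * (a + b) + use_sub * (a - b) + use_and * (a & b)
    regs[rd] = result & MASK


def run_lockstep(cores, programs, steps):
    """Advance all emulated cores one instruction at a time, in lockstep."""
    for pc in range(steps):
        for regs, prog in zip(cores, programs):
            step(regs, prog[pc])
```

Two cores running different programs still execute identical Python statements at each step; only the data (control words and operands) differ. A classic emulator would instead dispatch to one routine per opcode, which is where divergence would appear on SIMD hardware.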

Relevance: 100.00%

Abstract:

Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase data locality. It is therefore the programmer's responsibility to manage memory transfers explicitly, which makes programming these platforms cumbersome. Moreover, complex modern applications must be adequately parallelized before they can turn the parallel potential of the platform into actual performance. To support this, programming languages have been proposed that work at a high level of abstraction and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budgets are constrained. This dissertation explores the applicability of the shared-memory paradigm to modern many-core systems, focusing on ease of programming. It concentrates on OpenMP, the de facto standard for shared-memory programming. In the first part, the costs of algorithms for synchronization and data partitioning are analyzed, and the algorithms are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude gains in speed and energy efficiency compared to the "pure software" version. However, three main issues arise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which shares the memory banks with the cores, and a template for a scalable architecture is shown, which integrates the HWPUs through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploit the accelerators of the platform. The OpenMP frontend is extended to interact with it.

Relevance: 100.00%

Abstract:

This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities. Hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and power consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application whose execution time is dominated by a high number of floating-point instructions. It then addresses the central problem of efficiently managing power peaks in heterogeneous computing systems. Finally, it discusses a memory-bound problem, where the execution time is dominated by memory latency. Specifically, the following main contributions have been made. First, a novel framework for the design and analysis of solar fields for Central Receiver Systems (CRS) has been developed. The implementation, based on a desktop workstation equipped with multiple Graphics Processing Units (GPUs), is motivated by the need for an accurate and fast simulation environment for studying mirror imperfections and non-planar geometries. Second, a power-aware scheduling algorithm for heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload across the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. This work makes two main contributions: the approach reduces the supply cost due to high peak power while having a negligible impact on the parallelism of the computational nodes, and, from another point of view, the developed model allows designers to increase the number of cores without increasing the capacity of the power supply unit. Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration by reducing the number of random memory accesses.

Relevance: 100.00%

Abstract:

In this paper we present several contributions to collision detection optimization centered on hardware performance. We focus on the broad phase, which is the first step of the collision detection process, and propose three new ways of parallelizing the well-known Sweep and Prune algorithm. We first developed a multi-core model that takes into account the number of available cores. A multi-core architecture enables us to distribute geometric computations using multi-threading. Critical write sections and thread idling have been minimized by introducing new data structures for each thread. Programming with directives, as in OpenMP, appears to be a good compromise for code portability. We then propose a new GPU-based algorithm, also based on Sweep and Prune, that has been adapted to multi-GPU architectures. Our technique is based on a spatial subdivision method used to distribute computations among GPUs. Results show that a significant speed-up can be obtained by going from 1 to 4 GPUs in a large-scale environment.
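For readers unfamiliar with the broad phase, the following is a minimal sequential sketch of Sweep and Prune on 2D axis-aligned bounding boxes. It is the textbook form of the algorithm, not the parallel variants the paper proposes; the function name `sweep_and_prune` and the box layout `(min_x, max_x, min_y, max_y)` are assumptions chosen for illustration.

```python
def sweep_and_prune(boxes):
    """Return the set of index pairs (i, j), i < j, whose boxes overlap.

    boxes: list of (min_x, max_x, min_y, max_y) tuples (assumed layout).
    """
    # Sweep: visit boxes in order of their lower bound on the x axis.
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    active = []   # boxes whose x interval may still overlap later boxes
    pairs = set()
    for i in order:
        min_x = boxes[i][0]
        # Prune: drop boxes that end before the current box begins.
        active = [j for j in active if boxes[j][1] >= min_x]
        for j in active:
            # x intervals overlap by construction; confirm overlap on y.
            if boxes[i][2] <= boxes[j][3] and boxes[j][2] <= boxes[i][3]:
                pairs.add((min(i, j), max(i, j)))
        active.append(i)
    return pairs
```

One plausible way to parallelize this, in the spirit of the spatial subdivision mentioned above, is to run the same routine independently on each region of space and merge the resulting pair sets; how boxes that straddle region boundaries are handled is a design choice the abstract does not specify.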

Relevance: 100.00%

Abstract:

Single- and multi-core passive and active germanate and tellurite glass fibers represent a new class of fiber host for in-fiber photonic devices and applications in the mid-IR wavelength range, which are in increasing demand. Fiber Bragg grating (FBG) structures have proven to be among the most functional in-fiber devices and have been mass-produced in silicate fibers by UV inscription for a very wide range of laser and sensor applications. However, because of the strong UV absorption in germanate and tellurite fibers, FBG structures cannot be produced in them by UV inscription. In recent years femtosecond (fs) lasers have been developed for laser machining and microstructuring in a variety of glass fibers and planar substrates. A number of papers have reported on the fabrication of FBGs and long-period gratings in optical fibers, and also on the photosensitivity mechanism, using 800 nm fs lasers. In this paper, we demonstrate for the first time the fabrication of FBG structures in passive and active single- and three-core germanate and tellurite glass fibers using 800 nm fs inscription and the phase mask technique. With an fs peak power intensity on the order of 10^11 W/cm^2, FBG spectra were observed with 2nd and 3rd order resonances at 1540 nm and 1033 nm in a single-core germanate glass fiber, and with 2nd order resonances between ~1694 nm and ~1677 nm with strengths up to 14 dB in all three cores of three-core passive and active tellurite fibers. Thermal and strain properties of the FBGs made in these mid-IR glass fibers were characterized, showing an average temperature responsivity of ~20 pm/°C and a strain sensitivity of 1.219±0.003 pm/µε.

Relevance: 100.00%

Abstract:

For the first time, fiber Bragg grating (FBG) structures have been inscribed in single-core passive germanate and three-core passive and active tellurite glass fibers using an 800 nm femtosecond (fs) laser and the phase mask technique. With an fs peak power intensity on the order of 10^11 W/cm^2, FBG spectra were observed with 2nd and 3rd order resonances at 1540 and 1033 nm in the germanate glass fiber, and with 2nd order resonances at ~1694 and ~1677 nm with strengths up to 14 dB in all three cores of the tellurite fiber. Thermal responsivities of the FBGs made in these mid-IR glass fibers were characterized, showing an average temperature responsivity of ~20 pm/°C. The strain responsivity of the FBGs in the germanate glass fiber was measured to be 1.219 pm/µε.
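The reported figures can be related through the standard Bragg condition, m·λ = 2·n_eff·Λ, where m is the grating order, n_eff the effective index and Λ the grating pitch. The short sketch below is only back-of-the-envelope arithmetic on the numbers quoted above; the function names are hypothetical, and summing the temperature and strain contributions linearly is a first-order assumption, not a result from the paper.

```python
def bragg_product_nm(wavelength_nm, order):
    """Return 2 * n_eff * pitch (in nm) implied by m * lambda = 2 * n_eff * pitch."""
    return order * wavelength_nm


def wavelength_shift_pm(delta_t_c=0.0, strain_ue=0.0,
                        temp_resp_pm_per_c=20.0,      # ~20 pm/degC, from the abstract
                        strain_resp_pm_per_ue=1.219):  # pm per microstrain, from the abstract
    """First-order Bragg wavelength shift in pm (linear superposition assumed)."""
    return temp_resp_pm_per_c * delta_t_c + strain_resp_pm_per_ue * strain_ue


# 2nd-order resonance at 1540 nm implies 2 * n_eff * pitch = 3080 nm.
product_2nd = bragg_product_nm(1540.0, 2)
# With a wavelength-independent n_eff, the 3rd order would sit at product / 3.
predicted_3rd_nm = product_2nd / 3
```

With a constant effective index the third-order resonance would fall near 1026.7 nm, a few nanometres below the observed 1033 nm; the gap is presumably due to the wavelength dependence of n_eff. A 50 °C temperature rise would shift the resonance by about 1 nm, and 1000 µε of strain by about 1.2 nm.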

Relevance: 100.00%

Abstract:

For the first time, fiber Bragg grating (FBG) structures have been inscribed in single-core passive germanate and three-core passive and active tellurite glass fibers using an 800 nm femtosecond (fs) laser and the phase mask technique. With an fs peak power intensity on the order of 10^11 W/cm^2, FBG spectra were observed with 2nd and 3rd order resonances at 1540 and 1033 nm in the germanate glass fiber, and with 2nd order resonances at approximately 1694 and approximately 1677 nm with strengths up to 14 dB in all three cores of the tellurite fiber. Thermal responsivities of the FBGs made in these mid-IR glass fibers were characterized, showing an average temperature responsivity of approximately 20 pm/°C. The strain responsivity of the FBGs in the germanate glass fiber was measured to be 1.219 pm/µε.

Relevance: 100.00%

Abstract:

We demonstrate light pulse combining and pulse compression using a continuous-discrete nonlinear system implemented in a multi-core fiber (MCF). It is shown that pulses initially injected into all of the cores of a ring MCF are combined by nonlinearity into a small number of cores, with simultaneous pulse compression. We demonstrate the combining of 77% of the energy into one core, with pulse compression by a factor of more than 14, in a 20-core MCF. We also demonstrate that the suggested scheme is insensitive to phase perturbations. Nonlinear spatio-temporal pulse manipulation in multi-core fibers can be exploited for various applications, including pulse compression, switching, and combining.

Relevance: 100.00%

Abstract:

Heterogeneous multi-core FPGAs contain different types of cores, which can improve efficiency when used with an effective online task scheduler. However, it is not easy to find the right cores for tasks when there are multiple objectives or dozens of cores. Inappropriate scheduling may cause hot spots, which decrease the reliability of the chip. Given that, our research builds a simulation platform to evaluate all kinds of scheduling algorithms on a variety of architectures. On this platform, we provide an online scheduler that uses a multi-objective evolutionary algorithm (EA). Comparing the EA with current algorithms such as Predictive Dynamic Thermal Management (PDTM) and Adaptive Temperature Threshold Dynamic Thermal Management (ATDTM), we find some drawbacks in previous work. First, current algorithms are overly dependent on manually set constant parameters. Second, those algorithms neglect optimization for heterogeneous architectures. Third, they use single-objective methods, or use a linear weighting method to convert a multi-objective optimization into a single-objective optimization. Unlike other algorithms, the EA is adaptive and does not require parameters to be reset when the workload switches from one to another. EAs also improve performance when used on heterogeneous architectures. An efficient Pareto front can be obtained with EAs when multiple objectives are pursued.
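The notion of a Pareto front, as opposed to a single weighted score, can be made concrete with a small sketch. It shows only the non-dominated filtering step that multi-objective EAs rely on, not the scheduler described above; the function names and the example objectives (completion time, peak temperature, both minimized) are assumptions for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives are assumed to be minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate schedules: (completion time in s, peak temperature in degC).
candidates = [(1.0, 80), (2.0, 60), (1.5, 90), (3.0, 60)]
front = pareto_front(candidates)
```

Here the front keeps `(1.0, 80)` and `(2.0, 60)`: neither is better in both objectives, so the choice between them is a genuine trade-off. A linear weighting method would collapse this set to a single point determined by the chosen weights.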

Relevance: 100.00%

Abstract:

Over the past several decades, real-time embedded systems have grown tremendously in both scale and scope, thanks largely to advances in IC technology. However, the traditional approach of boosting performance by increasing CPU frequency has become a thing of the past. Researchers from both industry and academia are turning their focus to multi-core architectures for continued improvement of computing performance. In our research, we seek to develop efficient scheduling algorithms and analysis methods for the design of real-time embedded systems on multi-core platforms. Real-time systems are those in which response time is as critical as the logical correctness of the computational results. In addition, a variety of stringent constraints, such as power/energy consumption, peak temperature and reliability, are also imposed on these systems. Therefore, real-time scheduling plays a critical role in the system-level design of such computing systems. We started our research by addressing timing constraints for real-time applications on multi-core platforms, and developed both partitioned and semi-partitioned scheduling algorithms to schedule fixed-priority, periodic, hard real-time tasks on multi-core platforms. We then extended our research by taking temperature constraints into consideration. We developed a closed-form solution to capture the temperature dynamics of a given periodic voltage schedule on multi-core platforms, and also developed three methods to check the feasibility of a periodic real-time schedule under a peak temperature constraint. We further extended our research by incorporating the power/energy constraint with thermal awareness into our research problem. We investigated the energy estimation problem on multi-core platforms, and developed a computationally efficient method to calculate the energy consumption of a given voltage schedule on a multi-core platform. In this dissertation, we present our research in detail and demonstrate the effectiveness and efficiency of our approaches with extensive experimental results.
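As a rough illustration of what partitioned fixed-priority scheduling involves, the sketch below assigns periodic tasks to cores with a first-fit heuristic and accepts a task on a core only if the rate-monotonic utilization bound of Liu and Layland, n(2^(1/n) - 1), still holds. This is a textbook baseline swapped in for illustration, not the algorithms developed in the dissertation; the task format `(wcet, period)` and the function names are assumptions.

```python
def rm_bound(n):
    """Liu-Layland utilization bound for n tasks under rate-monotonic priorities."""
    return n * (2 ** (1.0 / n) - 1)


def partition_first_fit(tasks, num_cores):
    """Assign (wcet, period) tasks to cores; return per-core lists or None.

    Tasks are placed in order of decreasing utilization. The bound is a
    sufficient (not necessary) schedulability test, so None means only that
    this heuristic failed, not that no feasible partition exists.
    """
    cores = [[] for _ in range(num_cores)]
    by_utilization = sorted(tasks, key=lambda t: t[0] / t[1], reverse=True)
    for task in by_utilization:
        for core in cores:
            candidate = core + [task]
            utilization = sum(wcet / period for wcet, period in candidate)
            if utilization <= rm_bound(len(candidate)):
                core.append(task)
                break
        else:
            return None  # no core could accept this task
    return cores
```

For example, the task set `[(1, 2), (1, 4), (2, 8), (3, 6)]` fits on two cores under this test but not on one. Semi-partitioned schemes relax the rule that each task is bound to exactly one core, which is one way to recover capacity that a pure partitioning heuristic leaves unused.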

Relevance: 100.00%

Abstract:

It is well known that technology evolves rapidly nowadays. New architectures are created to overcome particular limitations or problems. Sometimes this evolution is smooth and requires no adaptation; at other times it may demand change. Programming languages have always been the main communication link between the programmer and the computer. New languages keep appearing, and others keep being developed to adapt to new concepts and paradigms. This requires extra effort from programmers, who must always stay aware of these changes. Visual Programming may be a solution to this problem. Expressing functions as modules that receive a given input and return a given output may help programmers across the world by allowing them to abstract away from low-level details tied to a specific architecture. This thesis not only shows how the capabilities of the Cell/B.E. (which has a heterogeneous multi-core architecture) can be combined with OpenDX (which has a visual programming environment), but also demonstrates that this can be done without losing much performance.

Relevance: 100.00%

Abstract:

Almost all components of the FIVR (a Buck voltage regulator that supplies power to multi-core microprocessors) are implemented on the SoC die and therefore suffer from reliability problems associated with the scaling of microelectronic technology, in particular process parameter variations during fabrication and faults in the switching devices (open circuits or short circuits). This thesis was carried out within a research project in collaboration with Intel Corporation and was developed in two parts. First, the fault analysis work on the FIVR was extended with a thorough study of the main effects of aging on the outputs of on-chip integrated voltage regulators. Then, a low-cost monitoring scheme was developed that can detect the effects of the most likely FIVR faults in the field. In addition, the scheme can detect, during the lifetime of the FIVR, aging effects that cause the FIVR to malfunction. The monitoring scheme was designed to be self-checking with respect to its own internal faults, so that such faults cannot compromise the correct signaling of faults in the FIVR.

Relevance: 100.00%

Abstract:

Modern high-performance multi-core processors are powered by voltage regulators integrated directly on the chip. These regulators supply each power domain with the optimal voltage based on its activity, which is monitored by a Power Control Unit. This makes it possible both to reduce power consumption and to boost performance in particular contexts. Such on-die regulators are affected by faults and aging phenomena, which can compromise the correct operation of the circuit. These problems cannot be tolerated in contexts with high reliability requirements, such as autonomous driving. A monitor was therefore developed to detect, on-line, faults that may occur during normal operation in the field. When a fault occurs, the monitor provides an error indication, which can be used to trigger recovery procedures. The proposed solution, based on an approach completely different from the one suggested by the ISO 26262 standard, offers significantly lower cost and higher performance than the latter. The monitor can be calibrated automatically to compensate for process parameter variations and aging phenomena that may affect the monitor itself. The self-checking ability of the monitor was verified with respect to transistor stuck-on, transistor stuck-open, and resistive bridging faults, and the monitor was found to be Totally Self-Checking with respect to the set of faults considered.