936 resultados para CORE SYSTEMS
Resumo:
The use of efficient synchronization mechanisms is crucial for implementing fine grained parallel programs on modern shared cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single- Consumer (SPSC) coordination using unbounded queues. A novel unbounded SPSC algorithm capable of reducing the row synchronization latency and speeding up Producer-Consumer coordination is presented. The algorithm has been extensively tested on a shared-cache multi-core platform and a sketch proof of correctness is presented. The queues proposed have been used as basic building blocks to implement the FastFlow parallel framework, which has been demonstrated to offer very good performance for fine-grain parallel applications. © 2012 Springer-Verlag.
Resumo:
The recent technological advancements and market trends are causing an interesting phenomenon towards the convergence of High-Performance Computing (HPC) and Embedded Computing (EC) domains. On one side, new kinds of HPC applications are being required by markets needing huge amounts of information to be processed within a bounded amount of time. On the other side, EC systems are increasingly concerned with providing higher performance in real-time, challenging the performance capabilities of current architectures. The advent of next-generation many-core embedded platforms has the chance of intercepting this converging need for predictable high-performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures integrating general-purpose processors with many-core computing fabrics. To this end, it is of paramount importance to develop new techniques for exploiting the massively parallel computation capabilities of such platforms in a predictable way. P-SOCRATES will tackle this important challenge by merging leading research groups from the HPC and EC communities. The time-criticality and parallelisation challenges common to both areas will be addressed by proposing an integrated framework for executing workload-intensive applications with real-time requirements on top of next-generation commercial-off-the-shelf (COTS) platforms based on many-core accelerated architectures. The project will investigate new HPC techniques that fulfil real-time requirements. The main sources of indeterminism will be identified, proposing efficient mapping and scheduling algorithms, along with the associated timing and schedulability analysis, to guarantee the real-time and performance requirements of the applications.
Resumo:
Despite the several issues faced in the past, the evolutionary trend of silicon has kept its constant pace. Today an ever increasing number of cores is integrated onto the same die. Unfortunately, the extraordinary performance achievable by the many-core paradigm is limited by several factors. Memory bandwidth limitation, combined with inefficient synchronization mechanisms, can severely overcome the potential computation capabilities. Moreover, the huge HW/SW design space requires accurate and flexible tools to perform architectural explorations and validation of design choices. In this thesis we focus on the aforementioned aspects: a flexible and accurate Virtual Platform has been developed, targeting a reference many-core architecture. Such tool has been used to perform architectural explorations, focusing on instruction caching architecture and hybrid HW/SW synchronization mechanism. Beside architectural implications, another issue of embedded systems is considered: energy efficiency. Near Threshold Computing is a key research area in the Ultra-Low-Power domain, as it promises a tenfold improvement in energy efficiency compared to super-threshold operation and it mitigates thermal bottlenecks. The physical implications of modern deep sub-micron technology are severely limiting performance and reliability of modern designs. Reliability becomes a major obstacle when operating in NTC, especially memory operation becomes unreliable and can compromise system correctness. In the present work a novel hybrid memory architecture is devised to overcome reliability issues and at the same time improve energy efficiency by means of aggressive voltage scaling when allowed by workload requirements. Variability is another great drawback of near-threshold operation. The greatly increased sensitivity to threshold voltage variations in today a major concern for electronic devices. We introduce a variation-tolerant extension of the baseline many-core architecture. By means of micro-architectural knobs and a lightweight runtime control unit, the baseline architecture becomes dynamically tolerant to variations.
Resumo:
AIM: To assess survival rates and complications of root-filled teeth restored with or without post-and-core systems over a mean observation period of >or=4 years. METHODOLOGY: A total of 325 single- and multirooted teeth in 183 subjects treated in a private practice were root filled and restored with either a cast post-and-core or with a prefabricated titanium post and composite core. Root-filled teeth without post-retained restorations served as controls. The restored teeth served as abutments for single unit metal-ceramic or composite crowns or fixed bridges. Teeth supporting cantilever bridges, overdentures or telescopic crowns were excluded. RESULTS: Seventeen teeth in 17 subjects were lost to follow-up (17/325: 5.2%). The mean observation period was 5.2 +/- 1.8 (SD) years for restorations with titanium posts, 6.2 +/- 2.0 (SD) years for cast post-and-cores and 4.4 +/- 1.7 (SD) years for teeth without posts. Overall, 54% of build-ups included the incorporation of a titanium post and 26.5% the cementation of a cast post-and-core. The remaining 19.5% of the teeth were restored without intraradicular retention. The adjusted 5-year tooth survival rate amounted to 92.5% for teeth restored with titanium posts, to 97.1% for teeth restored with cast post-and-cores and to 94.3% for teeth without post restorations, respectively. The most frequent complications included root fracture (6.2%), recurrent caries (1.9%), post-treatment periradicular disease (1.6%) and loss of retention (1.3%). CONCLUSION: Provided that high-quality root canal treatment and restorative protocols are implemented, high survival and low complication rates of single- and multirooted root-filled teeth used as abutments for fixed restorations can be expected after a mean observation period of >or=4 years.
Resumo:
Aqueous core/polymer shell microcapsules with mommuclear and polynuclear core morphologies have been formed by internal phase separation from water-in-oil emulsions. The water-in-oil emulsions were prepared with the shell polymer dissolved in the aqueous phase by adding a low boiling point cosolvent. Subsequent removal of this cosolvent (by evaporation) leads to phase separation of the polymer and, if the spreading conditions are correct, formation of a polymer shell encapsulating the aqueous core. Poly(tetrahydrofuran) (PTHF) shell/aqueous core microcapsules, with a single (mononuclear) core, have been prepared, but the low T-g (-84 degreesC) of PTHF makes characterization of the particles more difficult. Poly(methyl methacrylate) and poly(isobutyl methacrylate) have higher T-g values (105 and 55 degreesC, respectively) and can be dissolved in water at sufficiently high acetone concentrations, but evaporation of the acetone from the emulsion droplets in these cases mostly resulted in polynuclear capsules, that is, having cores with many very small water droplets contained within the polymer matrix. Microcapsules with fewer, larger aqueous droplets in the core could be produced by reducing the rate of evaporation of the acetone. A possible mechanism for the formation of these polynuclear cores is suggested. These microcapsules were prepared dispersed in an oil-continuous phase. They could, however, be successfully transferred to a water-continuous phase, using a simple centrifugation technique. In this way, microcapsules with aqueous cores, dispersed in an aqueous medium, could be made. It would appear that a real challenge with the water-core systems, compared to the previous oil-core systems, is to obtain the correct order of magnitude of the three interfacial tensions, between the polymer, the aqueous phase, and the continuous oil phase; these control the spreading conditions necessary to produce shells rather than "acorns".
Resumo:
Real-time systems demand guaranteed and predictable run-time behaviour in order to ensure that no task has missed its deadline. Over the years we are witnessing an ever increasing demand for functionality enhancements in the embedded real-time systems. Along with the functionalities, the design itself grows more complex. Posed constraints, such as energy consumption, time, and space bounds, also require attention and proper handling. Additionally, efficient scheduling algorithms, as proven through analyses and simulations, often impose requirements that have significant run-time cost, specially in the context of multi-core systems. In order to further investigate the behaviour of such systems to quantify and compare these overheads involved, we have developed the SPARTS, a simulator of a generic embedded real- time device. The tasks in the simulator are described by externally visible parameters (e.g. minimum inter-arrival, sporadicity, WCET, BCET, etc.), rather than the code of the tasks. While our current implementation is primarily focused on our immediate needs in the area of power-aware scheduling, it is designed to be extensible to accommodate different task properties, scheduling algorithms and/or hardware models for the application in wide variety of simulations. The source code of the SPARTS is available for download at [1].
Resumo:
The financial crisis of 2007–2009 and the resultant pressures exerted on policymakers to prevent future crises have precipitated coordinated regulatory responses globally. A key focus of the new wave of regulation is to ensure the removal of practices now deemed problematic with new controls for conducting transactions and maintaining holdings. There is increasing pressure on organizations to retire manual processes and adopt core systems, such as Investment Management Systems (IMS). These systems facilitate trading and ensure transactions are compliant by transcribing regulatory requirements into automated rules and applying them to trades. The motivation of this study is to explore the extent to which such systems may enable the alteration of previously embedded practices. We researched implementations of an IMS at eight global financial organizations and found that overall the IMS encourages responsible trading through surveillance, monitoring and the automation of regulatory rules and that such systems are likely to become further embedded within financial organizations. We found evidence that some older practices persisted. Our study suggests that the institutionalization of technology-induced compliant behaviour is still uncertain.
Resumo:
This thesis explores the capabilities of heterogeneous multi-core systems, based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated desk side computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing “supercomputing to the masses”, this opens up new possibilities for application fields where investing in HPC resources had been considered unfeasible before. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing an crucial kernel in computation of multi-mesh head models of encephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. In terms of computational throughput, it outperforms not only an highly optimized multi-CPU reference, but related GPU-based work as well. Finally, a GPU-accelerated graphical EEG real-time source localization software was implemented. Thanks to acceleration, it can meet real-time requirements in unpreceeded anatomical detail running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scansions has been included.
Resumo:
Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.
Resumo:
During the last few decades an unprecedented technological growth has been at the center of the embedded systems design paramount, with Moore’s Law being the leading factor of this trend. Today in fact an ever increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the new many-core design paradigm. Despite the extraordinarily high computing power, the complexity of many-core chips opens the door to several challenges. As a result of the increased silicon density of modern Systems-on-a-Chip (SoC), the design space exploration needed to find the best design has exploded and hardware designers are in fact facing the problem of a huge design space. Virtual Platforms have always been used to enable hardware-software co-design, but today they are facing with the huge complexity of both hardware and software systems. In this thesis two different research works on Virtual Platforms are presented: the first one is intended for the hardware developer, to easily allow complex cycle accurate simulations of many-core SoCs. The second work exploits the parallel computing power of off-the-shelf General Purpose Graphics Processing Units (GPGPUs), with the goal of an increased simulation speed. The term Virtualization can be used in the context of many-core systems not only to refer to the aforementioned hardware emulation tools (Virtual Platforms), but also for two other main purposes: 1) to help the programmer to achieve the maximum possible performance of an application, by hiding the complexity of the underlying hardware. 2) to efficiently exploit the high parallel hardware of many-core chips in environments with multiple active Virtual Machines. This thesis is focused on virtualization techniques with the goal to mitigate, and overtake when possible, some of the challenges introduced by the many-core design paradigm.
Resumo:
The development of mixed-criticality virtualized multi-core systems poses new challenges that are being subject of active research work. There is an additional complexity: it is now required to identify a set of partitions, and allocate applications to partitions. In this job, a number of issues have to be considered, such as the criticality level of the application, security and dependability requirements, time requirements granularity, etc. MultiPARTES [11] toolset relies on Model Driven Engineering (MDE), which is a suitable approach in this setting, as it helps to bridge the gap between design issues and partitioning concerns. MDE is changing the way systems are developed nowadays, reducing development time. In general, modelling approaches have shown their benefits when applied to embedded systems. These benefits have been achieved by fostering reuse with an intensive use of abstractions, or automating the generation of boiler-plate code.