13 resultados para Hardware Acceleration
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
The evolution of the electronics embedded applications forces electronics systems designers to match their ever increasing requirements. This evolution pushes the computational power of digital signal processing systems, as well as the energy required to accomplish the computations, due to the increasing mobility of such applications. Current approaches used to match these requirements relies on the adoption of application specific signal processors. Such kind of devices exploits powerful accelerators, which are able to match both performance and energy requirements. On the other hand, the too high specificity of such accelerators often results in a lack of flexibility which affects non-recurrent engineering costs, time to market, and market volumes too. The state of the art mainly proposes two solutions to overcome these issues with the ambition of delivering reasonable performance and energy efficiency: reconfigurable computing and multi-processors computing. All of these solutions benefits from the post-fabrication programmability, that definitively results in an increased flexibility. Nevertheless, the gap between these approaches and dedicated hardware is still too high for many application domains, especially when targeting the mobile world. In this scenario, flexible and energy efficient acceleration can be achieved by merging these two computational paradigms, in order to address all the above introduced constraints. This thesis focuses on the exploration of the design and application spectrum of reconfigurable computing, exploited as application specific accelerators for multi-processors systems on chip. More specifically, it introduces a reconfigurable digital signal processor featuring a heterogeneous set of reconfigurable engines, and a homogeneous multi-core system, exploiting three different flavours of reconfigurable and mask-programmable technologies as implementation platform for applications specific accelerators. In this work, the various trade-offs concerning the utilization multi-core platforms and the different configuration technologies are explored, characterizing the design space of the proposed approach in terms of programmability, performance, energy efficiency and manufacturing costs.
Resumo:
Galaxy clusters occupy a special position in the cosmic hierarchy as they are the largest bound structures in the Universe. There is now general agreement on a hierarchical picture for the formation of cosmic structures, in which galaxy clusters are supposed to form by accretion of matter and merging between smaller units. During merger events, shocks are driven by the gravity of the dark matter in the diffuse barionic component, which is heated up to the observed temperature. Radio and hard-X ray observations have discovered non-thermal components mixed with the thermal Intra Cluster Medium (ICM) and this is of great importance as it calls for a “revision” of the physics of the ICM. The bulk of present information comes from the radio observations which discovered an increasing number of Mpcsized emissions from the ICM, Radio Halos (at the cluster center) and Radio Relics (at the cluster periphery). These sources are due to synchrotron emission from ultra relativistic electrons diffusing through µG turbulent magnetic fields. Radio Halos are the most spectacular evidence of non-thermal components in the ICM and understanding the origin and evolution of these sources represents one of the most challenging goal of the theory of the ICM. Cluster mergers are the most energetic events in the Universe and a fraction of the energy dissipated during these mergers could be channelled into the amplification of the magnetic fields and into the acceleration of high energy particles via shocks and turbulence driven by these mergers. Present observations of Radio Halos (and possibly of hard X-rays) can be best interpreted in terms of the reacceleration scenario in which MHD turbulence injected during these cluster mergers re-accelerates high energy particles in the ICM. The physics involved in this scenario is very complex and model details are difficult to test, however this model clearly predicts some simple properties of Radio Halos (and resulting IC emission in the hard X-ray band) which are almost independent of the details of the adopted physics. In particular in the re-acceleration scenario MHD turbulence is injected and dissipated during cluster mergers and thus Radio Halos (and also the resulting hard X-ray IC emission) should be transient phenomena (with a typical lifetime <» 1 Gyr) associated with dynamically disturbed clusters. The physics of the re-acceleration scenario should produce an unavoidable cut-off in the spectrum of the re-accelerated electrons, which is due to the balance between turbulent acceleration and radiative losses. The energy at which this cut-off occurs, and thus the maximum frequency at which synchrotron radiation is produced, depends essentially on the efficiency of the acceleration mechanism so that observations at high frequencies are expected to catch only the most efficient phenomena while, in principle, low frequency radio surveys may found these phenomena much common in the Universe. These basic properties should leave an important imprint in the statistical properties of Radio Halos (and of non-thermal phenomena in general) which, however, have not been addressed yet by present modellings. The main focus of this PhD thesis is to calculate, for the first time, the expected statistics of Radio Halos in the context of the re-acceleration scenario. In particular, we shall address the following main questions: • Is it possible to model “self-consistently” the evolution of these sources together with that of the parent clusters? • How the occurrence of Radio Halos is expected to change with cluster mass and to evolve with redshift? How the efficiency to catch Radio Halos in galaxy clusters changes with the observing radio frequency? • How many Radio Halos are expected to form in the Universe? At which redshift is expected the bulk of these sources? • Is it possible to reproduce in the re-acceleration scenario the observed occurrence and number of Radio Halos in the Universe and the observed correlations between thermal and non-thermal properties of galaxy clusters? • Is it possible to constrain the magnetic field intensity and profile in galaxy clusters and the energetic of turbulence in the ICM from the comparison between model expectations and observations? Several astrophysical ingredients are necessary to model the evolution and statistical properties of Radio Halos in the context of re-acceleration model and to address the points given above. For these reason we deserve some space in this PhD thesis to review the important aspects of the physics of the ICM which are of interest to catch our goals. In Chapt. 1 we discuss the physics of galaxy clusters, and in particular, the clusters formation process; in Chapt. 2 we review the main observational properties of non-thermal components in the ICM; and in Chapt. 3 we focus on the physics of magnetic field and of particle acceleration in galaxy clusters. As a relevant application, the theory of Alfv´enic particle acceleration is applied in Chapt. 4 where we report the most important results from calculations we have done in the framework of the re-acceleration scenario. In this Chapter we show that a fraction of the energy of fluid turbulence driven in the ICM by the cluster mergers can be channelled into the injection of Alfv´en waves at small scales and that these waves can efficiently re-accelerate particles and trigger Radio Halos and hard X-ray emission. The main part of this PhD work, the calculation of the statistical properties of Radio Halos and non-thermal phenomena as expected in the context of the re-acceleration model and their comparison with observations, is presented in Chapts.5, 6, 7 and 8. In Chapt.5 we present a first approach to semi-analytical calculations of statistical properties of giant Radio Halos. The main goal of this Chapter is to model cluster formation, the injection of turbulence in the ICM and the resulting particle acceleration process. We adopt the semi–analytic extended Press & Schechter (PS) theory to follow the formation of a large synthetic population of galaxy clusters and assume that during a merger a fraction of the PdV work done by the infalling subclusters in passing through the most massive one is injected in the form of magnetosonic waves. Then the processes of stochastic acceleration of the relativistic electrons by these waves and the properties of the ensuing synchrotron (Radio Halos) and inverse Compton (IC, hard X-ray) emission of merging clusters are computed under the assumption of a constant rms average magnetic field strength in emitting volume. The main finding of these calculations is that giant Radio Halos are naturally expected only in the more massive clusters, and that the expected fraction of clusters with Radio Halos is consistent with the observed one. In Chapt. 6 we extend the previous calculations by including a scaling of the magnetic field strength with cluster mass. The inclusion of this scaling allows us to derive the expected correlations between the synchrotron radio power of Radio Halos and the X-ray properties (T, LX) and mass of the hosting clusters. For the first time, we show that these correlations, calculated in the context of the re-acceleration model, are consistent with the observed ones for typical µG strengths of the average B intensity in massive clusters. The calculations presented in this Chapter allow us to derive the evolution of the probability to form Radio Halos as a function of the cluster mass and redshift. The most relevant finding presented in this Chapter is that the luminosity functions of giant Radio Halos at 1.4 GHz are expected to peak around a radio power » 1024 W/Hz and to flatten (or cut-off) at lower radio powers because of the decrease of the electron re-acceleration efficiency in smaller galaxy clusters. In Chapt. 6 we also derive the expected number counts of Radio Halos and compare them with available observations: we claim that » 100 Radio Halos in the Universe can be observed at 1.4 GHz with deep surveys, while more than 1000 Radio Halos are expected to be discovered in the next future by LOFAR at 150 MHz. This is the first (and so far unique) model expectation for the number counts of Radio Halos at lower frequency and allows to design future radio surveys. Based on the results of Chapt. 6, in Chapt.7 we present a work in progress on a “revision” of the occurrence of Radio Halos. We combine past results from the NVSS radio survey (z » 0.05 − 0.2) with our ongoing GMRT Radio Halos Pointed Observations of 50 X-ray luminous galaxy clusters (at z » 0.2−0.4) and discuss the possibility to test our model expectations with the number counts of Radio Halos at z » 0.05 − 0.4. The most relevant limitation in the calculations presented in Chapt. 5 and 6 is the assumption of an “averaged” size of Radio Halos independently of their radio luminosity and of the mass of the parent clusters. This assumption cannot be released in the context of the PS formalism used to describe the formation process of clusters, while a more detailed analysis of the physics of cluster mergers and of the injection process of turbulence in the ICM would require an approach based on numerical (possible MHD) simulations of a very large volume of the Universe which is however well beyond the aim of this PhD thesis. On the other hand, in Chapt.8 we report our discovery of novel correlations between the size (RH) of Radio Halos and their radio power and between RH and the cluster mass within the Radio Halo region, MH. In particular this last “geometrical” MH − RH correlation allows us to “observationally” overcome the limitation of the “average” size of Radio Halos. Thus in this Chapter, by making use of this “geometrical” correlation and of a simplified form of the re-acceleration model based on the results of Chapt. 5 and 6 we are able to discuss expected correlations between the synchrotron power and the thermal cluster quantities relative to the radio emitting region. This is a new powerful tool of investigation and we show that all the observed correlations (PR − RH, PR − MH, PR − T, PR − LX, . . . ) now become well understood in the context of the re-acceleration model. In addition, we find that observationally the size of Radio Halos scales non-linearly with the virial radius of the parent cluster, and this immediately means that the fraction of the cluster volume which is radio emitting increases with cluster mass and thus that the non-thermal component in clusters is not self-similar.
Resumo:
ALICE, that is an experiment held at CERN using the LHC, is specialized in analyzing lead-ion collisions. ALICE will study the properties of quarkgluon plasma, a state of matter where quarks and gluons, under conditions of very high temperatures and densities, are no longer confined inside hadrons. Such a state of matter probably existed just after the Big Bang, before particles such as protons and neutrons were formed. The SDD detector, one of the ALICE subdetectors, is part of the ITS that is composed by 6 cylindrical layers with the innermost one attached to the beam pipe. The ITS tracks and identifies particles near the interaction point, it also aligns the tracks of the articles detected by more external detectors. The two ITS middle layers contain the whole 260 SDD detectors. A multichannel readout board, called CARLOSrx, receives at the same time the data coming from 12 SDD detectors. In total there are 24 CARLOSrx boards needed to read data coming from all the SDD modules (detector plus front end electronics). CARLOSrx packs data coming from the front end electronics through optical link connections, it stores them in a large data FIFO and then it sends them to the DAQ system. Each CARLOSrx is composed by two boards. One is called CARLOSrx data, that reads data coming from the SDD detectors and configures the FEE; the other one is called CARLOSrx clock, that sends the clock signal to all the FEE. This thesis contains a description of the hardware design and firmware features of both CARLOSrx data and CARLOSrx clock boards, which deal with all the SDD readout chain. A description of the software tools necessary to test and configure the front end electronics will be presented at the end of the thesis.
Resumo:
This work describes the development of a simulation tool which allows the simulation of the Internal Combustion Engine (ICE), the transmission and the vehicle dynamics. It is a control oriented simulation tool, designed in order to perform both off-line (Software In the Loop) and on-line (Hardware In the Loop) simulation. In the first case the simulation tool can be used in order to optimize Engine Control Unit strategies (as far as regard, for example, the fuel consumption or the performance of the engine), while in the second case it can be used in order to test the control system. In recent years the use of HIL simulations has proved to be very useful in developing and testing of control systems. Hardware In the Loop simulation is a technology where the actual vehicles, engines or other components are replaced by a real time simulation, based on a mathematical model and running in a real time processor. The processor reads ECU (Engine Control Unit) output signals which would normally feed the actuators and, by using mathematical models, provides the signals which would be produced by the actual sensors. The simulation tool, fully designed within Simulink, includes the possibility to simulate the only engine, the transmission and vehicle dynamics and the engine along with the vehicle and transmission dynamics, allowing in this case to evaluate the performance and the operating conditions of the Internal Combustion Engine, once it is installed on a given vehicle. Furthermore the simulation tool includes different level of complexity, since it is possible to use, for example, either a zero-dimensional or a one-dimensional model of the intake system (in this case only for off-line application, because of the higher computational effort). Given these preliminary remarks, an important goal of this work is the development of a simulation environment that can be easily adapted to different engine types (single- or multi-cylinder, four-stroke or two-stroke, diesel or gasoline) and transmission architecture without reprogramming. Also, the same simulation tool can be rapidly configured both for off-line and real-time application. The Matlab-Simulink environment has been adopted to achieve such objectives, since its graphical programming interface allows building flexible and reconfigurable models, and real-time simulation is possible with standard, off-the-shelf software and hardware platforms (such as dSPACE systems).
Resumo:
The term Ambient Intelligence (AmI) refers to a vision on the future of the information society where smart, electronic environment are sensitive and responsive to the presence of people and their activities (Context awareness). In an ambient intelligence world, devices work in concert to support people in carrying out their everyday life activities, tasks and rituals in an easy, natural way using information and intelligence that is hidden in the network connecting these devices. This promotes the creation of pervasive environments improving the quality of life of the occupants and enhancing the human experience. AmI stems from the convergence of three key technologies: ubiquitous computing, ubiquitous communication and natural interfaces. Ambient intelligent systems are heterogeneous and require an excellent cooperation between several hardware/software technologies and disciplines, including signal processing, networking and protocols, embedded systems, information management, and distributed algorithms. Since a large amount of fixed and mobile sensors embedded is deployed into the environment, the Wireless Sensor Networks is one of the most relevant enabling technologies for AmI. WSN are complex systems made up of a number of sensor nodes which can be deployed in a target area to sense physical phenomena and communicate with other nodes and base stations. These simple devices typically embed a low power computational unit (microcontrollers, FPGAs etc.), a wireless communication unit, one or more sensors and a some form of energy supply (either batteries or energy scavenger modules). WNS promises of revolutionizing the interactions between the real physical worlds and human beings. Low-cost, low-computational power, low energy consumption and small size are characteristics that must be taken into consideration when designing and dealing with WSNs. To fully exploit the potential of distributed sensing approaches, a set of challengesmust be addressed. Sensor nodes are inherently resource-constrained systems with very low power consumption and small size requirements which enables than to reduce the interference on the physical phenomena sensed and to allow easy and low-cost deployment. They have limited processing speed,storage capacity and communication bandwidth that must be efficiently used to increase the degree of local ”understanding” of the observed phenomena. A particular case of sensor nodes are video sensors. This topic holds strong interest for a wide range of contexts such as military, security, robotics and most recently consumer applications. Vision sensors are extremely effective for medium to long-range sensing because vision provides rich information to human operators. However, image sensors generate a huge amount of data, whichmust be heavily processed before it is transmitted due to the scarce bandwidth capability of radio interfaces. In particular, in video-surveillance, it has been shown that source-side compression is mandatory due to limited bandwidth and delay constraints. Moreover, there is an ample opportunity for performing higher-level processing functions, such as object recognition that has the potential to drastically reduce the required bandwidth (e.g. by transmitting compressed images only when something ‘interesting‘ is detected). The energy cost of image processing must however be carefully minimized. Imaging could play and plays an important role in sensing devices for ambient intelligence. Computer vision can for instance be used for recognising persons and objects and recognising behaviour such as illness and rioting. Having a wireless camera as a camera mote opens the way for distributed scene analysis. More eyes see more than one and a camera system that can observe a scene from multiple directions would be able to overcome occlusion problems and could describe objects in their true 3D appearance. In real-time, these approaches are a recently opened field of research. In this thesis we pay attention to the realities of hardware/software technologies and the design needed to realize systems for distributed monitoring, attempting to propose solutions on open issues and filling the gap between AmI scenarios and hardware reality. The physical implementation of an individual wireless node is constrained by three important metrics which are outlined below. Despite that the design of the sensor network and its sensor nodes is strictly application dependent, a number of constraints should almost always be considered. Among them: • Small form factor to reduce nodes intrusiveness. • Low power consumption to reduce battery size and to extend nodes lifetime. • Low cost for a widespread diffusion. These limitations typically result in the adoption of low power, low cost devices such as low powermicrocontrollers with few kilobytes of RAMand tenth of kilobytes of program memory with whomonly simple data processing algorithms can be implemented. However the overall computational power of the WNS can be very large since the network presents a high degree of parallelism that can be exploited through the adoption of ad-hoc techniques. Furthermore through the fusion of information from the dense mesh of sensors even complex phenomena can be monitored. In this dissertation we present our results in building several AmI applications suitable for a WSN implementation. The work can be divided into two main areas:Low Power Video Sensor Node and Video Processing Alghoritm and Multimodal Surveillance . Low Power Video Sensor Nodes and Video Processing Alghoritms In comparison to scalar sensors, such as temperature, pressure, humidity, velocity, and acceleration sensors, vision sensors generate much higher bandwidth data due to the two-dimensional nature of their pixel array. We have tackled all the constraints listed above and have proposed solutions to overcome the current WSNlimits for Video sensor node. We have designed and developed wireless video sensor nodes focusing on the small size and the flexibility of reuse in different applications. The video nodes target a different design point: the portability (on-board power supply, wireless communication), a scanty power budget (500mW),while still providing a prominent level of intelligence, namely sophisticated classification algorithmand high level of reconfigurability. We developed two different video sensor node: The device architecture of the first one is based on a low-cost low-power FPGA+microcontroller system-on-chip. The second one is based on ARM9 processor. Both systems designed within the above mentioned power envelope could operate in a continuous fashion with Li-Polymer battery pack and solar panel. Novel low power low cost video sensor nodes which, in contrast to sensors that just watch the world, are capable of comprehending the perceived information in order to interpret it locally, are presented. Featuring such intelligence, these nodes would be able to cope with such tasks as recognition of unattended bags in airports, persons carrying potentially dangerous objects, etc.,which normally require a human operator. Vision algorithms for object detection, acquisition like human detection with Support Vector Machine (SVM) classification and abandoned/removed object detection are implemented, described and illustrated on real world data. Multimodal surveillance: In several setup the use of wired video cameras may not be possible. For this reason building an energy efficient wireless vision network for monitoring and surveillance is one of the major efforts in the sensor network community. Energy efficiency for wireless smart camera networks is one of the major efforts in distributed monitoring and surveillance community. For this reason, building an energy efficient wireless vision network for monitoring and surveillance is one of the major efforts in the sensor network community. The Pyroelectric Infra-Red (PIR) sensors have been used to extend the lifetime of a solar-powered video sensor node by providing an energy level dependent trigger to the video camera and the wireless module. Such approach has shown to be able to extend node lifetime and possibly result in continuous operation of the node.Being low-cost, passive (thus low-power) and presenting a limited form factor, PIR sensors are well suited for WSN applications. Moreover techniques to have aggressive power management policies are essential for achieving long-termoperating on standalone distributed cameras needed to improve the power consumption. We have used an adaptive controller like Model Predictive Control (MPC) to help the system to improve the performances outperforming naive power management policies.
Resumo:
This thesis explores the capabilities of heterogeneous multi-core systems, based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated desk side computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing “supercomputing to the masses”, this opens up new possibilities for application fields where investing in HPC resources had been considered unfeasible before. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing an crucial kernel in computation of multi-mesh head models of encephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. In terms of computational throughput, it outperforms not only an highly optimized multi-CPU reference, but related GPU-based work as well. Finally, a GPU-accelerated graphical EEG real-time source localization software was implemented. Thanks to acceleration, it can meet real-time requirements in unpreceeded anatomical detail running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scansions has been included.
Resumo:
The laser driven ion acceleration is a burgeoning field of resarch and is attracting a growing number of scientists since the first results reported in 2000 obtained irradiating thin solid foils by high power laser pulses. The growing interest is driven by the peculiar characteristics of the produced bunches, the compactness of the whole accelerating system and the very short accelerating length of this all-optical accelerators. A fervent theoretical and experimental work has been done since then. An important part of the theoretical study is done by means of numerical simulations and the most widely used technique exploits PIC codes (“Particle In Cell'”). In this thesis the PIC code AlaDyn, developed by our research group considering innovative algorithms, is described. My work has been devoted to the developement of the code and the investigation of the laser driven ion acceleration for different target configurations. Two target configurations for the proton acceleration are presented together with the results of the 2D and 3D numerical investigation. One target configuration consists of a solid foil with a low density layer attached on the irradiated side. The nearly critical plasma of the foam layer allows a very high energy absorption by the target and an increase of the proton energy up to a factor 3, when compared to the ``pure'' TNSA configuration. The differences of the regime with respect to the standard TNSA are described The case of nearly critical density targets has been investigated with 3D simulations. In this case the laser travels throughout the plasma and exits on the rear side. During the propagation, the laser drills a channel and induce a magnetic vortex that expanding on the rear side of the targer is source of a very intense electric field. The protons of the plasma are strongly accelerated up to energies of 100 MeV using a 200PW laser.
Resumo:
Cost, performance and availability considerations are forcing even the most conservative high-integrity embedded real-time systems industry to migrate from simple hardware processors to ones equipped with caches and other acceleration features. This migration disrupts the practices and solutions that industry had developed and consolidated over the years to perform timing analysis. Industry that are confident with the efficiency/effectiveness of their verification and validation processes for old-generation processors, do not have sufficient insight on the effects of the migration to cache-equipped processors. Caches are perceived as an additional source of complexity, which has potential for shattering the guarantees of cost- and schedule-constrained qualification of their systems. The current industrial approach to timing analysis is ill-equipped to cope with the variability incurred by caches. Conversely, the application of advanced WCET analysis techniques on real-world industrial software, developed without analysability in mind, is hardly feasible. We propose a development approach aimed at minimising the cache jitters, as well as at enabling the application of advanced WCET analysis techniques to industrial systems. Our approach builds on:(i) identification of those software constructs that may impede or complicate timing analysis in industrial-scale systems; (ii) elaboration of practical means, under the model-driven engineering (MDE) paradigm, to enforce the automated generation of software that is analyzable by construction; (iii) implementation of a layout optimisation method to remove cache jitters stemming from the software layout in memory, with the intent of facilitating incremental software development, which is of high strategic interest to industry. The integration of those constituents in a structured approach to timing analysis achieves two interesting properties: the resulting software is analysable from the earliest releases onwards - as opposed to becoming so only when the system is final - and more easily amenable to advanced timing analysis by construction, regardless of the system scale and complexity.
Resumo:
The new generation of multicore processors opens new perspectives for the design of embedded systems. Multiprocessing, however, poses new challenges to the scheduling of real-time applications, in which the ever-increasing computational demands are constantly flanked by the need of meeting critical time constraints. Many research works have contributed to this field introducing new advanced scheduling algorithms. However, despite many of these works have solidly demonstrated their effectiveness, the actual support for multiprocessor real-time scheduling offered by current operating systems is still very limited. This dissertation deals with implementative aspects of real-time schedulers in modern embedded multiprocessor systems. The first contribution is represented by an open-source scheduling framework, which is capable of realizing complex multiprocessor scheduling policies, such as G-EDF, on conventional operating systems exploiting only their native scheduler from user-space. A set of experimental evaluations compare the proposed solution to other research projects that pursue the same goals by means of kernel modifications, highlighting comparable scheduling performances. The principles that underpin the operation of the framework, originally designed for symmetric multiprocessors, have been further extended first to asymmetric ones, which are subjected to major restrictions such as the lack of support for task migrations, and later to re-programmable hardware architectures (FPGAs). In the latter case, this work introduces a scheduling accelerator, which offloads most of the scheduling operations to the hardware and exhibits extremely low scheduling jitter. The realization of a portable scheduling framework presented many interesting software challenges. One of these has been represented by timekeeping. In this regard, a further contribution is represented by a novel data structure, called addressable binary heap (ABH). Such ABH, which is conceptually a pointer-based implementation of a binary heap, shows very interesting average and worst-case performances when addressing the problem of tick-less timekeeping of high-resolution timers.
Resumo:
In the race to obtain protons with higher energies, using more compact systems at the same time, laser-driven plasma accelerators are becoming an interesting possibility. But for now, only beams with extremely broad energy spectra and high divergence have been produced. The driving line of this PhD thesis was the study and design of a compact system to extract a high quality beam out of the initial bunch of protons produced by the interaction of a laser pulse with a thin solid target, using experimentally reliable technologies in order to be able to test such a system as soon as possible. In this thesis, different transport lines are analyzed. The first is based on a high field pulsed solenoid, some collimators and, for perfect filtering and post-acceleration, a high field high frequency compact linear accelerator, originally designed to accelerate a 30 MeV beam extracted from a cyclotron. The second one is based on a quadruplet of permanent magnetic quadrupoles: thanks to its greater simplicity and reliability, it has great interest for experiments, but the effectiveness is lower than the one based on the solenoid; in fact, the final beam intensity drops by an order of magnitude. An additional sensible decrease in intensity is verified in the third case, where the energy selection is achieved using a chicane, because of its very low efficiency for off-axis protons. The proposed schemes have all been analyzed with 3D simulations and all the significant results are presented. Future experimental work based on the outcome of this thesis can be planned and is being discussed now.
Resumo:
The aim of this work is to present various aspects of numerical simulation of particle and radiation transport for industrial and environmental protection applications, to enable the analysis of complex physical processes in a fast, reliable, and efficient way. In the first part we deal with speed-up of numerical simulation of neutron transport for nuclear reactor core analysis. The convergence properties of the source iteration scheme of the Method of Characteristics applied to be heterogeneous structured geometries has been enhanced by means of Boundary Projection Acceleration, enabling the study of 2D and 3D geometries with transport theory without spatial homogenization. The computational performances have been verified with the C5G7 2D and 3D benchmarks, showing a sensible reduction of iterations and CPU time. The second part is devoted to the study of temperature-dependent elastic scattering of neutrons for heavy isotopes near to the thermal zone. A numerical computation of the Doppler convolution of the elastic scattering kernel based on the gas model is presented, for a general energy dependent cross section and scattering law in the center of mass system. The range of integration has been optimized employing a numerical cutoff, allowing a faster numerical evaluation of the convolution integral. Legendre moments of the transfer kernel are subsequently obtained by direct quadrature and a numerical analysis of the convergence is presented. In the third part we focus our attention to remote sensing applications of radiative transfer employed to investigate the Earth's cryosphere. The photon transport equation is applied to simulate reflectivity of glaciers varying the age of the layer of snow or ice, its thickness, the presence or not other underlying layers, the degree of dust included in the snow, creating a framework able to decipher spectral signals collected by orbiting detectors.
Resumo:
Theories and numerical modeling are fundamental tools for understanding, optimizing and designing present and future laser-plasma accelerators (LPAs). Laser evolution and plasma wave excitation in a LPA driven by a weakly relativistically intense, short-pulse laser propagating in a preformed parabolic plasma channel, is studied analytically in 3D including the effects of pulse steepening and energy depletion. At higher laser intensities, the process of electron self-injection in the nonlinear bubble wake regime is studied by means of fully self-consistent Particle-in-Cell simulations. Considering a non-evolving laser driver propagating with a prescribed velocity, the geometrical properties of the non-evolving bubble wake are studied. For a range of parameters of interest for laser plasma acceleration, The dependence of the threshold for self-injection in the non-evolving wake on laser intensity and wake velocity is characterized. Due to the nonlinear and complex nature of the Physics involved, computationally challenging numerical simulations are required to model laser-plasma accelerators operating at relativistic laser intensities. The numerical and computational optimizations, that combined in the codes INF&RNO and INF&RNO/quasi-static give the possibility to accurately model multi-GeV laser wakefield acceleration stages with present supercomputing architectures, are discussed. The PIC code jasmine, capable of efficiently running laser-plasma simulations on Graphics Processing Units (GPUs) clusters, is presented. GPUs deliver exceptional performance to PIC codes, but the core algorithms had to be redesigned for satisfying the constraints imposed by the intrinsic parallelism of the architecture. The simulation campaigns, run with the code jasmine for modeling the recent LPA experiments with the INFN-FLAME and CNR-ILIL laser systems, are also presented.
Resumo:
During the last few decades an unprecedented technological growth has been at the center of the embedded systems design paramount, with Moore’s Law being the leading factor of this trend. Today in fact an ever increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the new many-core design paradigm. Despite the extraordinarily high computing power, the complexity of many-core chips opens the door to several challenges. As a result of the increased silicon density of modern Systems-on-a-Chip (SoC), the design space exploration needed to find the best design has exploded and hardware designers are in fact facing the problem of a huge design space. Virtual Platforms have always been used to enable hardware-software co-design, but today they are facing with the huge complexity of both hardware and software systems. In this thesis two different research works on Virtual Platforms are presented: the first one is intended for the hardware developer, to easily allow complex cycle accurate simulations of many-core SoCs. The second work exploits the parallel computing power of off-the-shelf General Purpose Graphics Processing Units (GPGPUs), with the goal of an increased simulation speed. The term Virtualization can be used in the context of many-core systems not only to refer to the aforementioned hardware emulation tools (Virtual Platforms), but also for two other main purposes: 1) to help the programmer to achieve the maximum possible performance of an application, by hiding the complexity of the underlying hardware. 2) to efficiently exploit the high parallel hardware of many-core chips in environments with multiple active Virtual Machines. This thesis is focused on virtualization techniques with the goal to mitigate, and overtake when possible, some of the challenges introduced by the many-core design paradigm.