20 resultados para hardware computing

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

30.00% 30.00%

Publicador:

Resumo:

I moderni sistemi embedded sono equipaggiati con risorse hardware che consentono l’esecuzione di applicazioni molto complesse come il decoding audio e video. La progettazione di simili sistemi deve soddisfare due esigenze opposte. Da un lato è necessario fornire un elevato potenziale computazionale, dall’altro bisogna rispettare dei vincoli stringenti riguardo il consumo di energia. Uno dei trend più diffusi per rispondere a queste esigenze opposte è quello di integrare su uno stesso chip un numero elevato di processori caratterizzati da un design semplificato e da bassi consumi. Tuttavia, per sfruttare effettivamente il potenziale computazionale offerto da una batteria di processoriè necessario rivisitare pesantemente le metodologie di sviluppo delle applicazioni. Con l’avvento dei sistemi multi-processore su singolo chip (MPSoC) il parallel programming si è diffuso largamente anche in ambito embedded. Tuttavia, i progressi nel campo della programmazione parallela non hanno mantenuto il passo con la capacità di integrare hardware parallelo su un singolo chip. Oltre all’introduzione di multipli processori, la necessità di ridurre i consumi degli MPSoC comporta altre soluzioni architetturali che hanno l’effetto diretto di complicare lo sviluppo delle applicazioni. Il design del sottosistema di memoria, in particolare, è un problema critico. Integrare sul chip dei banchi di memoria consente dei tempi d’accesso molto brevi e dei consumi molto contenuti. Sfortunatamente, la quantità di memoria on-chip che può essere integrata in un MPSoC è molto limitata. Per questo motivo è necessario aggiungere dei banchi di memoria off-chip, che hanno una capacità molto maggiore, come maggiori sono i consumi e i tempi d’accesso. La maggior parte degli MPSoC attualmente in commercio destina una parte del budget di area all’implementazione di memorie cache e/o scratchpad. Le scratchpad (SPM) sono spesso preferite alle cache nei sistemi MPSoC embedded, per motivi di maggiore predicibilità, minore occupazione d’area e – soprattutto – minori consumi. Per contro, mentre l’uso delle cache è completamente trasparente al programmatore, le SPM devono essere esplicitamente gestite dall’applicazione. Esporre l’organizzazione della gerarchia di memoria ll’applicazione consente di sfruttarne in maniera efficiente i vantaggi (ridotti tempi d’accesso e consumi). Per contro, per ottenere questi benefici è necessario scrivere le applicazioni in maniera tale che i dati vengano partizionati e allocati sulle varie memorie in maniera opportuna. L’onere di questo compito complesso ricade ovviamente sul programmatore. Questo scenario descrive bene l’esigenza di modelli di programmazione e strumenti di supporto che semplifichino lo sviluppo di applicazioni parallele. In questa tesi viene presentato un framework per lo sviluppo di software per MPSoC embedded basato su OpenMP. OpenMP è uno standard di fatto per la programmazione di multiprocessori con memoria shared, caratterizzato da un semplice approccio alla parallelizzazione tramite annotazioni (direttive per il compilatore). La sua interfaccia di programmazione consente di esprimere in maniera naturale e molto efficiente il parallelismo a livello di loop, molto diffuso tra le applicazioni embedded di tipo signal processing e multimedia. OpenMP costituisce un ottimo punto di partenza per la definizione di un modello di programmazione per MPSoC, soprattutto per la sua semplicità d’uso. D’altra parte, per sfruttare in maniera efficiente il potenziale computazionale di un MPSoC è necessario rivisitare profondamente l’implementazione del supporto OpenMP sia nel compilatore che nell’ambiente di supporto a runtime. Tutti i costrutti per gestire il parallelismo, la suddivisione del lavoro e la sincronizzazione inter-processore comportano un costo in termini di overhead che deve essere minimizzato per non comprometterre i vantaggi della parallelizzazione. Questo può essere ottenuto soltanto tramite una accurata analisi delle caratteristiche hardware e l’individuazione dei potenziali colli di bottiglia nell’architettura. Una implementazione del task management, della sincronizzazione a barriera e della condivisione dei dati che sfrutti efficientemente le risorse hardware consente di ottenere elevate performance e scalabilità. La condivisione dei dati, nel modello OpenMP, merita particolare attenzione. In un modello a memoria condivisa le strutture dati (array, matrici) accedute dal programma sono fisicamente allocate su una unica risorsa di memoria raggiungibile da tutti i processori. Al crescere del numero di processori in un sistema, l’accesso concorrente ad una singola risorsa di memoria costituisce un evidente collo di bottiglia. Per alleviare la pressione sulle memorie e sul sistema di connessione vengono da noi studiate e proposte delle tecniche di partizionamento delle strutture dati. Queste tecniche richiedono che una singola entità di tipo array venga trattata nel programma come l’insieme di tanti sotto-array, ciascuno dei quali può essere fisicamente allocato su una risorsa di memoria differente. Dal punto di vista del programma, indirizzare un array partizionato richiede che ad ogni accesso vengano eseguite delle istruzioni per ri-calcolare l’indirizzo fisico di destinazione. Questo è chiaramente un compito lungo, complesso e soggetto ad errori. Per questo motivo, le nostre tecniche di partizionamento sono state integrate nella l’interfaccia di programmazione di OpenMP, che è stata significativamente estesa. Specificamente, delle nuove direttive e clausole consentono al programmatore di annotare i dati di tipo array che si vuole partizionare e allocare in maniera distribuita sulla gerarchia di memoria. Sono stati inoltre sviluppati degli strumenti di supporto che consentono di raccogliere informazioni di profiling sul pattern di accesso agli array. Queste informazioni vengono sfruttate dal nostro compilatore per allocare le partizioni sulle varie risorse di memoria rispettando una relazione di affinità tra il task e i dati. Più precisamente, i passi di allocazione nel nostro compilatore assegnano una determinata partizione alla memoria scratchpad locale al processore che ospita il task che effettua il numero maggiore di accessi alla stessa.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The evolution of the electronics embedded applications forces electronics systems designers to match their ever increasing requirements. This evolution pushes the computational power of digital signal processing systems, as well as the energy required to accomplish the computations, due to the increasing mobility of such applications. Current approaches used to match these requirements relies on the adoption of application specific signal processors. Such kind of devices exploits powerful accelerators, which are able to match both performance and energy requirements. On the other hand, the too high specificity of such accelerators often results in a lack of flexibility which affects non-recurrent engineering costs, time to market, and market volumes too. The state of the art mainly proposes two solutions to overcome these issues with the ambition of delivering reasonable performance and energy efficiency: reconfigurable computing and multi-processors computing. All of these solutions benefits from the post-fabrication programmability, that definitively results in an increased flexibility. Nevertheless, the gap between these approaches and dedicated hardware is still too high for many application domains, especially when targeting the mobile world. In this scenario, flexible and energy efficient acceleration can be achieved by merging these two computational paradigms, in order to address all the above introduced constraints. This thesis focuses on the exploration of the design and application spectrum of reconfigurable computing, exploited as application specific accelerators for multi-processors systems on chip. More specifically, it introduces a reconfigurable digital signal processor featuring a heterogeneous set of reconfigurable engines, and a homogeneous multi-core system, exploiting three different flavours of reconfigurable and mask-programmable technologies as implementation platform for applications specific accelerators. In this work, the various trade-offs concerning the utilization multi-core platforms and the different configuration technologies are explored, characterizing the design space of the proposed approach in terms of programmability, performance, energy efficiency and manufacturing costs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

During the last few decades an unprecedented technological growth has been at the center of the embedded systems design paramount, with Moore’s Law being the leading factor of this trend. Today in fact an ever increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the new many-core design paradigm. Despite the extraordinarily high computing power, the complexity of many-core chips opens the door to several challenges. As a result of the increased silicon density of modern Systems-on-a-Chip (SoC), the design space exploration needed to find the best design has exploded and hardware designers are in fact facing the problem of a huge design space. Virtual Platforms have always been used to enable hardware-software co-design, but today they are facing with the huge complexity of both hardware and software systems. In this thesis two different research works on Virtual Platforms are presented: the first one is intended for the hardware developer, to easily allow complex cycle accurate simulations of many-core SoCs. The second work exploits the parallel computing power of off-the-shelf General Purpose Graphics Processing Units (GPGPUs), with the goal of an increased simulation speed. The term Virtualization can be used in the context of many-core systems not only to refer to the aforementioned hardware emulation tools (Virtual Platforms), but also for two other main purposes: 1) to help the programmer to achieve the maximum possible performance of an application, by hiding the complexity of the underlying hardware. 2) to efficiently exploit the high parallel hardware of many-core chips in environments with multiple active Virtual Machines. This thesis is focused on virtualization techniques with the goal to mitigate, and overtake when possible, some of the challenges introduced by the many-core design paradigm.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Embedding intelligence in extreme edge devices allows distilling raw data acquired from sensors into actionable information, directly on IoT end-nodes. This computing paradigm, in which end-nodes no longer depend entirely on the Cloud, offers undeniable benefits, driving a large research area (TinyML) to deploy leading Machine Learning (ML) algorithms on micro-controller class of devices. To fit the limited memory storage capability of these tiny platforms, full-precision Deep Neural Networks (DNNs) are compressed by representing their data down to byte and sub-byte formats, in the integer domain. However, the current generation of micro-controller systems can barely cope with the computing requirements of QNNs. This thesis tackles the challenge from many perspectives, presenting solutions both at software and hardware levels, exploiting parallelism, heterogeneity and software programmability to guarantee high flexibility and high energy-performance proportionality. The first contribution, PULP-NN, is an optimized software computing library for QNN inference on parallel ultra-low-power (PULP) clusters of RISC-V processors, showing one order of magnitude improvements in performance and energy efficiency, compared to current State-of-the-Art (SoA) STM32 micro-controller systems (MCUs) based on ARM Cortex-M cores. The second contribution is XpulpNN, a set of RISC-V domain specific instruction set architecture (ISA) extensions to deal with sub-byte integer arithmetic computation. The solution, including the ISA extensions and the micro-architecture to support them, achieves energy efficiency comparable with dedicated DNN accelerators and surpasses the efficiency of SoA ARM Cortex-M based MCUs, such as the low-end STM32M4 and the high-end STM32H7 devices, by up to three orders of magnitude. To overcome the Von Neumann bottleneck while guaranteeing the highest flexibility, the final contribution integrates an Analog In-Memory Computing accelerator into the PULP cluster, creating a fully programmable heterogeneous fabric that demonstrates end-to-end inference capabilities of SoA MobileNetV2 models, showing two orders of magnitude performance improvements over current SoA analog/digital solutions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Analog In-memory Computing (AIMC) has been proposed in the context of Beyond Von Neumann architectures as a valid strategy to reduce internal data transfers energy consumption and latency, and to improve compute efficiency. The aim of AIMC is to perform computations within the memory unit, typically leveraging the physical features of memory devices. Among resistive Non-volatile Memories (NVMs), Phase-change Memory (PCM) has become a promising technology due to its intrinsic capability to store multilevel data. Hence, PCM technology is currently investigated to enhance the possibilities and the applications of AIMC. This thesis aims at exploring the potential of new PCM-based architectures as in-memory computational accelerators. In a first step, a preliminar experimental characterization of PCM devices has been carried out in an AIMC perspective. PCM cells non-idealities, such as time-drift, noise, and non-linearity have been studied to develop a dedicated multilevel programming algorithm. Measurement-based simulations have been then employed to evaluate the feasibility of PCM-based operations in the fields of Deep Neural Networks (DNNs) and Structural Health Monitoring (SHM). Moreover, a first testchip has been designed and tested to evaluate the hardware implementation of Multiply-and-Accumulate (MAC) operations employing PCM cells. This prototype experimentally demonstrates the possibility to reach a 95% MAC accuracy with a circuit-level compensation of cells time drift and non-linearity. Finally, empirical circuit behavior models have been included in simulations to assess the use of this technology in specific DNN applications, and to enhance the potentiality of this innovative computation approach.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Service Oriented Computing is a new programming paradigm for addressing distributed system design issues. Services are autonomous computational entities which can be dynamically discovered and composed in order to form more complex systems able to achieve different kinds of task. E-government, e-business and e-science are some examples of the IT areas where Service Oriented Computing will be exploited in the next years. At present, the most credited Service Oriented Computing technology is that of Web Services, whose specifications are enriched day by day by industrial consortia without following a precise and rigorous approach. This PhD thesis aims, on the one hand, at modelling Service Oriented Computing in a formal way in order to precisely define the main concepts it is based upon and, on the other hand, at defining a new approach, called bipolar approach, for addressing system design issues by synergically exploiting choreography and orchestration languages related by means of a mathematical relation called conformance. Choreography allows us to describe systems of services from a global view point whereas orchestration supplies a means for addressing such an issue from a local perspective. In this work we present SOCK, a process algebra based language inspired by the Web Service orchestration language WS-BPEL which catches the essentials of Service Oriented Computing. From the definition of SOCK we will able to define a general model for dealing with Service Oriented Computing where services and systems of services are related to the design of finite state automata and process algebra concurrent systems, respectively. Furthermore, we introduce a formal language for dealing with choreography. Such a language is equipped with a formal semantics and it forms, together with a subset of the SOCK calculus, the bipolar framework. Finally, we present JOLIE which is a Java implentation of a subset of the SOCK calculus and it is part of the bipolar framework we intend to promote.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

ALICE, that is an experiment held at CERN using the LHC, is specialized in analyzing lead-ion collisions. ALICE will study the properties of quarkgluon plasma, a state of matter where quarks and gluons, under conditions of very high temperatures and densities, are no longer confined inside hadrons. Such a state of matter probably existed just after the Big Bang, before particles such as protons and neutrons were formed. The SDD detector, one of the ALICE subdetectors, is part of the ITS that is composed by 6 cylindrical layers with the innermost one attached to the beam pipe. The ITS tracks and identifies particles near the interaction point, it also aligns the tracks of the articles detected by more external detectors. The two ITS middle layers contain the whole 260 SDD detectors. A multichannel readout board, called CARLOSrx, receives at the same time the data coming from 12 SDD detectors. In total there are 24 CARLOSrx boards needed to read data coming from all the SDD modules (detector plus front end electronics). CARLOSrx packs data coming from the front end electronics through optical link connections, it stores them in a large data FIFO and then it sends them to the DAQ system. Each CARLOSrx is composed by two boards. One is called CARLOSrx data, that reads data coming from the SDD detectors and configures the FEE; the other one is called CARLOSrx clock, that sends the clock signal to all the FEE. This thesis contains a description of the hardware design and firmware features of both CARLOSrx data and CARLOSrx clock boards, which deal with all the SDD readout chain. A description of the software tools necessary to test and configure the front end electronics will be presented at the end of the thesis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis deals with Context Aware Services, Smart Environments, Context Management and solutions for Devices and Service Interoperability. Multi-vendor devices offer an increasing number of services and end-user applications that base their value on the ability to exploit the information originating from the surrounding environment by means of an increasing number of embedded sensors, e.g. GPS, compass, RFID readers, cameras and so on. However, usually such devices are not able to exchange information because of the lack of a shared data storage and common information exchange methods. A large number of standards and domain specific building blocks are available and are heavily used in today's products. However, the use of these solutions based on ready-to-use modules is not without problems. The integration and cooperation of different kinds of modules can be daunting because of growing complexity and dependency. In this scenarios it might be interesting to have an infrastructure that makes the coexistence of multi-vendor devices easy, while enabling low cost development and smooth access to services. This sort of technologies glue should reduce both software and hardware integration costs by removing the trouble of interoperability. The result should also lead to faster and simplified design, development and, deployment of cross-domain applications. This thesis is mainly focused on SW architectures supporting context aware service providers especially on the following subjects: - user preferences service adaptation - context management - content management - information interoperability - multivendor device interoperability - communication and connectivity interoperability Experimental activities were carried out in several domains including Cultural Heritage, indoor and personal smart spaces – all of which are considered significant test-beds in Context Aware Computing. The work evolved within european and national projects: on the europen side, I carried out my research activity within EPOCH, the FP6 Network of Excellence on “Processing Open Cultural Heritage” and within SOFIA, a project of the ARTEMIS JU on embedded systems. I worked in cooperation with several international establishments, including the University of Kent, VTT (the Technical Reserarch Center of Finland) and Eurotech. On the national side I contributed to a one-to-one research contract between ARCES and Telecom Italia. The first part of the thesis is focused on problem statement and related work and addresses interoperability issues and related architecture components. The second part is focused on specific architectures and frameworks: - MobiComp: a context management framework that I used in cultural heritage applications - CAB: a context, preference and profile based application broker which I designed within EPOCH Network of Excellence - M3: "Semantic Web based" information sharing infrastructure for smart spaces designed by Nokia within the European project SOFIA - NoTa: a service and transport independent connectivity framework - OSGi: the well known Java based service support framework The final section is dedicated to the middleware, the tools and, the SW agents developed during my Doctorate time to support context-aware services in smart environments.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work describes the development of a simulation tool which allows the simulation of the Internal Combustion Engine (ICE), the transmission and the vehicle dynamics. It is a control oriented simulation tool, designed in order to perform both off-line (Software In the Loop) and on-line (Hardware In the Loop) simulation. In the first case the simulation tool can be used in order to optimize Engine Control Unit strategies (as far as regard, for example, the fuel consumption or the performance of the engine), while in the second case it can be used in order to test the control system. In recent years the use of HIL simulations has proved to be very useful in developing and testing of control systems. Hardware In the Loop simulation is a technology where the actual vehicles, engines or other components are replaced by a real time simulation, based on a mathematical model and running in a real time processor. The processor reads ECU (Engine Control Unit) output signals which would normally feed the actuators and, by using mathematical models, provides the signals which would be produced by the actual sensors. The simulation tool, fully designed within Simulink, includes the possibility to simulate the only engine, the transmission and vehicle dynamics and the engine along with the vehicle and transmission dynamics, allowing in this case to evaluate the performance and the operating conditions of the Internal Combustion Engine, once it is installed on a given vehicle. Furthermore the simulation tool includes different level of complexity, since it is possible to use, for example, either a zero-dimensional or a one-dimensional model of the intake system (in this case only for off-line application, because of the higher computational effort). Given these preliminary remarks, an important goal of this work is the development of a simulation environment that can be easily adapted to different engine types (single- or multi-cylinder, four-stroke or two-stroke, diesel or gasoline) and transmission architecture without reprogramming. Also, the same simulation tool can be rapidly configured both for off-line and real-time application. The Matlab-Simulink environment has been adopted to achieve such objectives, since its graphical programming interface allows building flexible and reconfigurable models, and real-time simulation is possible with standard, off-the-shelf software and hardware platforms (such as dSPACE systems).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work presents exact, hybrid algorithms for mixed resource Allocation and Scheduling problems; in general terms, those consist into assigning over time finite capacity resources to a set of precedence connected activities. The proposed methods have broad applicability, but are mainly motivated by applications in the field of Embedded System Design. In particular, high-performance embedded computing recently witnessed the shift from single CPU platforms with application-specific accelerators to programmable Multi Processor Systems-on-Chip (MPSoCs). Those allow higher flexibility, real time performance and low energy consumption, but the programmer must be able to effectively exploit the platform parallelism. This raises interest in the development of algorithmic techniques to be embedded in CAD tools; in particular, given a specific application and platform, the objective if to perform optimal allocation of hardware resources and to compute an execution schedule. On this regard, since embedded systems tend to run the same set of applications for their entire lifetime, off-line, exact optimization approaches are particularly appealing. Quite surprisingly, the use of exact algorithms has not been well investigated so far; this is in part motivated by the complexity of integrated allocation and scheduling, setting tough challenges for ``pure'' combinatorial methods. The use of hybrid CP/OR approaches presents the opportunity to exploit mutual advantages of different methods, while compensating for their weaknesses. In this work, we consider in first instance an Allocation and Scheduling problem over the Cell BE processor by Sony, IBM and Toshiba; we propose three different solution methods, leveraging decomposition, cut generation and heuristic guided search. Next, we face Allocation and Scheduling of so-called Conditional Task Graphs, explicitly accounting for branches with outcome not known at design time; we extend the CP scheduling framework to effectively deal with the introduced stochastic elements. Finally, we address Allocation and Scheduling with uncertain, bounded execution times, via conflict based tree search; we introduce a simple and flexible time model to take into account duration variability and provide an efficient conflict detection method. The proposed approaches achieve good results on practical size problem, thus demonstrating the use of exact approaches for system design is feasible. Furthermore, the developed techniques bring significant contributions to combinatorial optimization methods.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The term Ambient Intelligence (AmI) refers to a vision on the future of the information society where smart, electronic environment are sensitive and responsive to the presence of people and their activities (Context awareness). In an ambient intelligence world, devices work in concert to support people in carrying out their everyday life activities, tasks and rituals in an easy, natural way using information and intelligence that is hidden in the network connecting these devices. This promotes the creation of pervasive environments improving the quality of life of the occupants and enhancing the human experience. AmI stems from the convergence of three key technologies: ubiquitous computing, ubiquitous communication and natural interfaces. Ambient intelligent systems are heterogeneous and require an excellent cooperation between several hardware/software technologies and disciplines, including signal processing, networking and protocols, embedded systems, information management, and distributed algorithms. Since a large amount of fixed and mobile sensors embedded is deployed into the environment, the Wireless Sensor Networks is one of the most relevant enabling technologies for AmI. WSN are complex systems made up of a number of sensor nodes which can be deployed in a target area to sense physical phenomena and communicate with other nodes and base stations. These simple devices typically embed a low power computational unit (microcontrollers, FPGAs etc.), a wireless communication unit, one or more sensors and a some form of energy supply (either batteries or energy scavenger modules). WNS promises of revolutionizing the interactions between the real physical worlds and human beings. Low-cost, low-computational power, low energy consumption and small size are characteristics that must be taken into consideration when designing and dealing with WSNs. To fully exploit the potential of distributed sensing approaches, a set of challengesmust be addressed. Sensor nodes are inherently resource-constrained systems with very low power consumption and small size requirements which enables than to reduce the interference on the physical phenomena sensed and to allow easy and low-cost deployment. They have limited processing speed,storage capacity and communication bandwidth that must be efficiently used to increase the degree of local ”understanding” of the observed phenomena. A particular case of sensor nodes are video sensors. This topic holds strong interest for a wide range of contexts such as military, security, robotics and most recently consumer applications. Vision sensors are extremely effective for medium to long-range sensing because vision provides rich information to human operators. However, image sensors generate a huge amount of data, whichmust be heavily processed before it is transmitted due to the scarce bandwidth capability of radio interfaces. In particular, in video-surveillance, it has been shown that source-side compression is mandatory due to limited bandwidth and delay constraints. Moreover, there is an ample opportunity for performing higher-level processing functions, such as object recognition that has the potential to drastically reduce the required bandwidth (e.g. by transmitting compressed images only when something ‘interesting‘ is detected). The energy cost of image processing must however be carefully minimized. Imaging could play and plays an important role in sensing devices for ambient intelligence. Computer vision can for instance be used for recognising persons and objects and recognising behaviour such as illness and rioting. Having a wireless camera as a camera mote opens the way for distributed scene analysis. More eyes see more than one and a camera system that can observe a scene from multiple directions would be able to overcome occlusion problems and could describe objects in their true 3D appearance. In real-time, these approaches are a recently opened field of research. In this thesis we pay attention to the realities of hardware/software technologies and the design needed to realize systems for distributed monitoring, attempting to propose solutions on open issues and filling the gap between AmI scenarios and hardware reality. The physical implementation of an individual wireless node is constrained by three important metrics which are outlined below. Despite that the design of the sensor network and its sensor nodes is strictly application dependent, a number of constraints should almost always be considered. Among them: • Small form factor to reduce nodes intrusiveness. • Low power consumption to reduce battery size and to extend nodes lifetime. • Low cost for a widespread diffusion. These limitations typically result in the adoption of low power, low cost devices such as low powermicrocontrollers with few kilobytes of RAMand tenth of kilobytes of program memory with whomonly simple data processing algorithms can be implemented. However the overall computational power of the WNS can be very large since the network presents a high degree of parallelism that can be exploited through the adoption of ad-hoc techniques. Furthermore through the fusion of information from the dense mesh of sensors even complex phenomena can be monitored. In this dissertation we present our results in building several AmI applications suitable for a WSN implementation. The work can be divided into two main areas:Low Power Video Sensor Node and Video Processing Alghoritm and Multimodal Surveillance . Low Power Video Sensor Nodes and Video Processing Alghoritms In comparison to scalar sensors, such as temperature, pressure, humidity, velocity, and acceleration sensors, vision sensors generate much higher bandwidth data due to the two-dimensional nature of their pixel array. We have tackled all the constraints listed above and have proposed solutions to overcome the current WSNlimits for Video sensor node. We have designed and developed wireless video sensor nodes focusing on the small size and the flexibility of reuse in different applications. The video nodes target a different design point: the portability (on-board power supply, wireless communication), a scanty power budget (500mW),while still providing a prominent level of intelligence, namely sophisticated classification algorithmand high level of reconfigurability. We developed two different video sensor node: The device architecture of the first one is based on a low-cost low-power FPGA+microcontroller system-on-chip. The second one is based on ARM9 processor. Both systems designed within the above mentioned power envelope could operate in a continuous fashion with Li-Polymer battery pack and solar panel. Novel low power low cost video sensor nodes which, in contrast to sensors that just watch the world, are capable of comprehending the perceived information in order to interpret it locally, are presented. Featuring such intelligence, these nodes would be able to cope with such tasks as recognition of unattended bags in airports, persons carrying potentially dangerous objects, etc.,which normally require a human operator. Vision algorithms for object detection, acquisition like human detection with Support Vector Machine (SVM) classification and abandoned/removed object detection are implemented, described and illustrated on real world data. Multimodal surveillance: In several setup the use of wired video cameras may not be possible. For this reason building an energy efficient wireless vision network for monitoring and surveillance is one of the major efforts in the sensor network community. Energy efficiency for wireless smart camera networks is one of the major efforts in distributed monitoring and surveillance community. For this reason, building an energy efficient wireless vision network for monitoring and surveillance is one of the major efforts in the sensor network community. The Pyroelectric Infra-Red (PIR) sensors have been used to extend the lifetime of a solar-powered video sensor node by providing an energy level dependent trigger to the video camera and the wireless module. Such approach has shown to be able to extend node lifetime and possibly result in continuous operation of the node.Being low-cost, passive (thus low-power) and presenting a limited form factor, PIR sensors are well suited for WSN applications. Moreover techniques to have aggressive power management policies are essential for achieving long-termoperating on standalone distributed cameras needed to improve the power consumption. We have used an adaptive controller like Model Predictive Control (MPC) to help the system to improve the performances outperforming naive power management policies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Electronic applications are nowadays converging under the umbrella of the cloud computing vision. The future ecosystem of information and communication technology is going to integrate clouds of portable clients and embedded devices exchanging information, through the internet layer, with processing clusters of servers, data-centers and high performance computing systems. Even thus the whole society is waiting to embrace this revolution, there is a backside of the story. Portable devices require battery to work far from the power plugs and their storage capacity does not scale as the increasing power requirement does. At the other end processing clusters, such as data-centers and server farms, are build upon the integration of thousands multiprocessors. For each of them during the last decade the technology scaling has produced a dramatic increase in power density with significant spatial and temporal variability. This leads to power and temperature hot-spots, which may cause non-uniform ageing and accelerated chip failure. Nonetheless all the heat removed from the silicon translates in high cooling costs. Moreover trend in ICT carbon footprint shows that run-time power consumption of the all spectrum of devices accounts for a significant slice of entire world carbon emissions. This thesis work embrace the full ICT ecosystem and dynamic power consumption concerns by describing a set of new and promising system levels resource management techniques to reduce the power consumption and related issues for two corner cases: Mobile Devices and High Performance Computing.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis explores the capabilities of heterogeneous multi-core systems, based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated desk side computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing “supercomputing to the masses”, this opens up new possibilities for application fields where investing in HPC resources had been considered unfeasible before. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing an crucial kernel in computation of multi-mesh head models of encephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. In terms of computational throughput, it outperforms not only an highly optimized multi-CPU reference, but related GPU-based work as well. Finally, a GPU-accelerated graphical EEG real-time source localization software was implemented. Thanks to acceleration, it can meet real-time requirements in unpreceeded anatomical detail running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scansions has been included.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Throughout the twentieth century statistical methods have increasingly become part of experimental research. In particular, statistics has made quantification processes meaningful in the soft sciences, which had traditionally relied on activities such as collecting and describing diversity rather than timing variation. The thesis explores this change in relation to agriculture and biology, focusing on analysis of variance and experimental design, the statistical methods developed by the mathematician and geneticist Ronald Aylmer Fisher during the 1920s. The role that Fisher’s methods acquired as tools of scientific research, side by side with the laboratory equipment and the field practices adopted by research workers, is here investigated bottom-up, beginning with the computing instruments and the information technologies that were the tools of the trade for statisticians. Four case studies show under several perspectives the interaction of statistics, computing and information technologies, giving on the one hand an overview of the main tools – mechanical calculators, statistical tables, punched and index cards, standardised forms, digital computers – adopted in the period, and on the other pointing out how these tools complemented each other and were instrumental for the development and dissemination of analysis of variance and experimental design. The period considered is the half-century from the early 1920s to the late 1960s, the institutions investigated are Rothamsted Experimental Station and the Galton Laboratory, and the statisticians examined are Ronald Fisher and Frank Yates.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The new generation of multicore processors opens new perspectives for the design of embedded systems. Multiprocessing, however, poses new challenges to the scheduling of real-time applications, in which the ever-increasing computational demands are constantly flanked by the need of meeting critical time constraints. Many research works have contributed to this field introducing new advanced scheduling algorithms. However, despite many of these works have solidly demonstrated their effectiveness, the actual support for multiprocessor real-time scheduling offered by current operating systems is still very limited. This dissertation deals with implementative aspects of real-time schedulers in modern embedded multiprocessor systems. The first contribution is represented by an open-source scheduling framework, which is capable of realizing complex multiprocessor scheduling policies, such as G-EDF, on conventional operating systems exploiting only their native scheduler from user-space. A set of experimental evaluations compare the proposed solution to other research projects that pursue the same goals by means of kernel modifications, highlighting comparable scheduling performances. The principles that underpin the operation of the framework, originally designed for symmetric multiprocessors, have been further extended first to asymmetric ones, which are subjected to major restrictions such as the lack of support for task migrations, and later to re-programmable hardware architectures (FPGAs). In the latter case, this work introduces a scheduling accelerator, which offloads most of the scheduling operations to the hardware and exhibits extremely low scheduling jitter. The realization of a portable scheduling framework presented many interesting software challenges. One of these has been represented by timekeeping. In this regard, a further contribution is represented by a novel data structure, called addressable binary heap (ABH). Such ABH, which is conceptually a pointer-based implementation of a binary heap, shows very interesting average and worst-case performances when addressing the problem of tick-less timekeeping of high-resolution timers.