939 resultados para embedded system
Resumo:
Power efficiency is one of the most important constraints in the design of embedded systems since such systems are generally driven by batteries with limited energy budget or restricted power supply. In every embedded system, there are one or more processor cores to run the software and interact with the other hardware components of the system. The power consumption of the processor core(s) has an important impact on the total power dissipated in the system. Hence, the processor power optimization is crucial in satisfying the power consumption constraints, and developing low-power embedded systems. A key aspect of research in processor power optimization and management is “power estimation”. Having a fast and accurate method for processor power estimation at design time helps the designer to explore a large space of design possibilities, to make the optimal choices for developing a power efficient processor. Likewise, understanding the processor power dissipation behaviour of a specific software/application is the key for choosing appropriate algorithms in order to write power efficient software. Simulation-based methods for measuring the processor power achieve very high accuracy, but are available only late in the design process, and are often quite slow. Therefore, the need has arisen for faster, higher-level power prediction methods that allow the system designer to explore many alternatives for developing powerefficient hardware and software. The aim of this thesis is to present fast and high-level power models for the prediction of processor power consumption. Power predictability in this work is achieved in two ways: first, using a design method to develop power predictable circuits; second, analysing the power of the functions in the code which repeat during execution, then building the power model based on average number of repetitions. In the first case, a design method called Asynchronous Charge Sharing Logic (ACSL) is used to implement the Arithmetic Logic Unit (ALU) for the 8051 microcontroller. The ACSL circuits are power predictable due to the independency of their power consumption to the input data. Based on this property, a fast prediction method is presented to estimate the power of ALU by analysing the software program, and extracting the number of ALU-related instructions. This method achieves less than 1% error in power estimation and more than 100 times speedup in comparison to conventional simulation-based methods. In the second case, an average-case processor energy model is developed for the Insertion sort algorithm based on the number of comparisons that take place in the execution of the algorithm. The average number of comparisons is calculated using a high level methodology called MOdular Quantitative Analysis (MOQA). The parameters of the energy model are measured for the LEON3 processor core, but the model is general and can be used for any processor. The model has been validated through the power measurement experiments, and offers high accuracy and orders of magnitude speedup over the simulation-based method.
Resumo:
A significant issue encountered when fusing data received from multiple sensors is the accuracy of the timestamp associated with each piece of data. This is particularly important in applications such as Simultaneous Localisation and Mapping (SLAM) where vehicle velocity forms an important part of the mapping algorithms; on fastmoving vehicles, even millisecond inconsistencies in data timestamping can produce errors which need to be compensated for. The timestamping problem is compounded in a robot swarm environment due to the use of non-deterministic readily-available hardware (such as 802.11-based wireless) and inaccurate clock synchronisation protocols (such as Network Time Protocol (NTP)). As a result, the synchronisation of the clocks between robots can be out by tens-to-hundreds of milliseconds making correlation of data difficult and preventing the possibility of the units performing synchronised actions such as triggering cameras or intricate swarm manoeuvres. In this thesis, a complete data fusion unit is designed, implemented and tested. The unit, named BabelFuse, is able to accept sensor data from a number of low-speed communication buses (such as RS232, RS485 and CAN Bus) and also timestamp events that occur on General Purpose Input/Output (GPIO) pins referencing a submillisecondaccurate wirelessly-distributed "global" clock signal. In addition to its timestamping capabilities, it can also be used to trigger an attached camera at a predefined start time and frame rate. This functionality enables the creation of a wirelessly-synchronised distributed image acquisition system over a large geographic area; a real world application for this functionality is the creation of a platform to facilitate wirelessly-distributed 3D stereoscopic vision. A ‘best-practice’ design methodology is adopted within the project to ensure the final system operates according to its requirements. Initially, requirements are generated from which a high-level architecture is distilled. This architecture is then converted into a hardware specification and low-level design, which is then manufactured. The manufactured hardware is then verified to ensure it operates as designed and firmware and Linux Operating System (OS) drivers are written to provide the features and connectivity required of the system. Finally, integration testing is performed to ensure the unit functions as per its requirements. The BabelFuse System comprises of a single Grand Master unit which is responsible for maintaining the absolute value of the "global" clock. Slave nodes then determine their local clock o.set from that of the Grand Master via synchronisation events which occur multiple times per-second. The mechanism used for synchronising the clocks between the boards wirelessly makes use of specific hardware and a firmware protocol based on elements of the IEEE-1588 Precision Time Protocol (PTP). With the key requirement of the system being submillisecond-accurate clock synchronisation (as a basis for timestamping and camera triggering), automated testing is carried out to monitor the o.sets between each Slave and the Grand Master over time. A common strobe pulse is also sent to each unit for timestamping; the correlation between the timestamps of the di.erent units is used to validate the clock o.set results. Analysis of the automated test results show that the BabelFuse units are almost threemagnitudes more accurate than their requirement; clocks of the Slave and Grand Master units do not di.er by more than three microseconds over a running time of six hours and the mean clock o.set of Slaves to the Grand Master is less-than one microsecond. The common strobe pulse used to verify the clock o.set data yields a positive result with a maximum variation between units of less-than two microseconds and a mean value of less-than one microsecond. The camera triggering functionality is verified by connecting the trigger pulse output of each board to a four-channel digital oscilloscope and setting each unit to output a 100Hz periodic pulse with a common start time. The resulting waveform shows a maximum variation between the rising-edges of the pulses of approximately 39¥ìs, well below its target of 1ms.
Resumo:
Distributed Wireless Smart Camera (DWSC) network is a special type of Wireless Sensor Network (WSN) that processes captured images in a distributed manner. While image processing on DWSCs sees a great potential for growth, with its applications possessing a vast practical application domain such as security surveillance and health care, it suffers from tremendous constraints. In addition to the limitations of conventional WSNs, image processing on DWSCs requires more computational power, bandwidth and energy that presents significant challenges for large scale deployments. This dissertation has developed a number of algorithms that are highly scalable, portable, energy efficient and performance efficient, with considerations of practical constraints imposed by the hardware and the nature of WSN. More specifically, these algorithms tackle the problems of multi-object tracking and localisation in distributed wireless smart camera net- works and optimal camera configuration determination. Addressing the first problem of multi-object tracking and localisation requires solving a large array of sub-problems. The sub-problems that are discussed in this dissertation are calibration of internal parameters, multi-camera calibration for localisation and object handover for tracking. These topics have been covered extensively in computer vision literatures, however new algorithms must be invented to accommodate the various constraints introduced and required by the DWSC platform. A technique has been developed for the automatic calibration of low-cost cameras which are assumed to be restricted in their freedom of movement to either pan or tilt movements. Camera internal parameters, including focal length, principal point, lens distortion parameter and the angle and axis of rotation, can be recovered from a minimum set of two images of the camera, provided that the axis of rotation between the two images goes through the camera's optical centre and is parallel to either the vertical (panning) or horizontal (tilting) axis of the image. For object localisation, a novel approach has been developed for the calibration of a network of non-overlapping DWSCs in terms of their ground plane homographies, which can then be used for localising objects. In the proposed approach, a robot travels through the camera network while updating its position in a global coordinate frame, which it broadcasts to the cameras. The cameras use this, along with the image plane location of the robot, to compute a mapping from their image planes to the global coordinate frame. This is combined with an occupancy map generated by the robot during the mapping process to localised objects moving within the network. In addition, to deal with the problem of object handover between DWSCs of non-overlapping fields of view, a highly-scalable, distributed protocol has been designed. Cameras that follow the proposed protocol transmit object descriptions to a selected set of neighbours that are determined using a predictive forwarding strategy. The received descriptions are then matched at the subsequent camera on the object's path using a probability maximisation process with locally generated descriptions. The second problem of camera placement emerges naturally when these pervasive devices are put into real use. The locations, orientations, lens types etc. of the cameras must be chosen in a way that the utility of the network is maximised (e.g. maximum coverage) while user requirements are met. To deal with this, a statistical formulation of the problem of determining optimal camera configurations has been introduced and a Trans-Dimensional Simulated Annealing (TDSA) algorithm has been proposed to effectively solve the problem.
Resumo:
High end network security applications demand high speed operation and large rule set support. Packet classification is the core functionality that demands high throughput in such applications. This paper proposes a packet classification architecture to meet such high throughput. We have implemented a Firewall with this architecture in reconflgurable hardware. We propose an extension to Distributed Crossproducting of Field Labels (DCFL) technique to achieve scalable and high performance architecture. The implemented Firewall takes advantage of inherent structure and redundancy of rule set by using our DCFL Extended (DCFLE) algorithm. The use of DCFLE algorithm results in both speed and area improvement when it is implemented in hardware. Although we restrict ourselves to standard 5-tuple matching, the architecture supports additional fields. High throughput classification invariably uses Ternary Content Addressable Memory (TCAM) for prefix matching, though TCAM fares poorly in terms of area and power efficiency. Use of TCAM for port range matching is expensive, as the range to prefix conversion results in large number of prefixes leading to storage inefficiency. Extended TCAM (ETCAM) is fast and the most storage efficient solution for range matching. We present for the first time a reconfigurable hardware implementation of ETCAM. We have implemented our Firewall as an embedded system on Virtex-II Pro FPGA based platform, running Linux with the packet classification in hardware. The Firewall was tested in real time with 1 Gbps Ethernet link and 128 sample rules. The packet classification hardware uses a quarter of logic resources and slightly over one third of memory resources of XC2VP30 FPGA. It achieves a maximum classification throughput of 50 million packet/s corresponding to 16 Gbps link rate for the worst case packet size. The Firewall rule update involves only memory re-initialization in software without any hardware change.
Resumo:
High end network security applications demand high speed operation and large rule set support. Packet classification is the core functionality that demands high throughput in such applications. This paper proposes a packet classification architecture to meet such high throughput. We have Implemented a Firewall with this architecture in reconfigurable hardware. We propose an extension to Distributed Crossproducting of Field Labels (DCFL) technique to achieve scalable and high performance architecture. The implemented Firewall takes advantage of inherent structure and redundancy of rule set by using, our DCFL Extended (DCFLE) algorithm. The use of DCFLE algorithm results In both speed and area Improvement when It is Implemented in hardware. Although we restrict ourselves to standard 5-tuple matching, the architecture supports additional fields.High throughput classification Invariably uses Ternary Content Addressable Memory (TCAM) for prefix matching, though TCAM fares poorly In terms of area and power efficiency. Use of TCAM for port range matching is expensive, as the range to prefix conversion results in large number of prefixes leading to storage inefficiency. Extended TCAM (ETCAM) is fast and the most storage efficient solution for range matching. We present for the first time a reconfigurable hardware Implementation of ETCAM. We have implemented our Firewall as an embedded system on Virtex-II Pro FPGA based platform, running Linux with the packet classification in hardware. The Firewall was tested in real time with 1 Gbps Ethernet link and 128 sample rules. The packet classification hardware uses a quarter of logic resources and slightly over one third of memory resources of XC2VP30 FPGA. It achieves a maximum classification throughput of 50 million packet/s corresponding to 16 Gbps link rate for file worst case packet size. The Firewall rule update Involves only memory re-initialiization in software without any hardware change.
Resumo:
344 p.
Resumo:
Embedded system design (VHDL description) based on Xilinx's Spartan3E Development Kit to perform real-time PID control and monitoring of DC motors.
Resumo:
提出了一种嵌入式DSP系统的存储优化方法。该方法基于同步数据流模型SDF(Synchronous Data Flow)。针对其他优化算法不适用于存在反馈环的同步数据流模型的问题,该方法为反馈环的空间优化设计实现了启发式的调度算法,并提出了将SAS(Single Appearance Schedules)和Non-SAS类型调度序列相结合的层次化的空间优化方案,为同步数据流模型调度序列的空间优化提供一个通用的解决方案。实验结果证实了该方案的有效性。
Resumo:
为了解决空间辐射对嵌入式计算机系统正确性的影响越来越明显的问题,基于典型的编译级容错技术,在编译器LCC上实现了基于有向无环图的编译级容错检测方法VarBIFT。该方法可以有效的保护由于粒子效应所引起的瞬时硬件故障,并可针对不同的目标机自动生成容错代码。实验结果表明,VarBIFT使源程序的平均段错误率从32.3%降到了13.9%,平均错误输出率从28.6%降到了9.2%;而其时间开销和空间开销仅为0.7%和36%。
Resumo:
近年来,随着微纳米科技的迅速发展,机电产品有望向更微观化、高性能化发展,这将促进材料、制造、电子、生物医学、信息等领域新的科学技术出现,在新的科学技术层次上为可持续发展的理论提供物质和技术保障。微纳米科技最终目标是研究和发现微纳尺度物质所具有的新颖的物理、化学和生物学现象与特性。并以此为基础来设计、制作、组装成新的材料、组件或系统,实现与之相应的特定功能,促进新的科学技术发展与变革,这无疑具有十分重要的科学意义和经济价值。而实现这个目标的使能技术便是微纳米尺度下观测、操作和装配的科学方法与相关的技术和装备,因此开展微纳米操作研究具有特别重要的意义。微纳米操作是微纳米制造科学技术的重要内容之一,使用探针模式的机器人化微纳操作方法,实现在微纳米尺度物体的可控操作,对促进我国微纳米科学技术发展具有特别重要的意义。 目前已有的基于探针的纳米技术装置如SPM (Scanning Probe Microscope)是基于探针模式的纳米观测基本装置。在此基础上研究发展的基于探针的纳米操作已成为纳米科技研究的新领域,是目前世界上各国正在大力开发的前沿研究课题。但目前市场上的SPM等纳米观测设备缺乏驱动控制与信息交互功能和开放界面,限制了用户在此基础上开发纳米操作、装配等功能的能力,因而研究具有信息交互能力的、可进行在线操作控制与宏-微-纳观信息交互的纳米操作监控系统,进而发展成具有自动化/机器人化功能的纳米作业系统队纳米科学技术发展、纳米制造的实现无疑具有重要意义。本论文的科研内容是以面向纳米制造的机器人化系统为研究背景,在自主技术的基础上,开展应用ARM嵌入式系统构成纳米作业系统的实时控制器研究。实时多任务的操作控制系统是纳米作业系统的核心技术,可以实时进行基于探针的传感信息采集、状态反馈控制、形貌观测数据生成、作业运动轨迹生成、位置反馈控制等功能的数据处理与实现。本论文重点介绍以SAMSUNG公司的ARM9处理器芯片S3C2410为嵌入式控制器系统的核心,在移植嵌入式Linux作为操作系统的基础上,开发具有实时数据采集与控制指令、通信功能的人机交互界面。基于ARM的实时控制器的研究为探针模式的纳米观测与操作系统开发提供了关键技术,可以提供开放的AFM系统,促进操作型纳米系统的研究与实现,可以保证纳米观测与操作控制的实时性,可以为纳米作业控制方法提供方便的编程、开发功能。本论文主要研究了面向纳米作业的基于ARM嵌入式实时控制器硬件结构及软件系统的研究与开发过程。首先介绍嵌入式系统的基本概念和特点;其次介绍基于SPM模式的纳米操作系统性能与技术特点;第三,根据纳米作业系统的技术功能要求,详细介绍了具有实时多任务管理功能的硬件系统的设计,重点解决核心板和扩展板各部分功能模块的设计;第四,详细介绍了嵌入式Linux操作系统下的应用程序开发模式及开发过程;最后,详细介绍了嵌入式Linux操作系统下的应用程序开发,主要工作是完成SPM纳米操作系统中的ARM开发平台的功能接口模块的调试及Linux系统下多线程技术在本系统中的应用。本次毕业设计已完成ARM开发平台在整个SPM纳米操作系统中要实现的各个功能模块,结合SPM纳米操作系统的实时性问题,进行了ARM开发平台的系统软件架构分析和利用多线程技术的以太网通信实验,在一定程度上提高了纳米操作系统中的实时性和成像质量。
Resumo:
The efficient development of multi-threaded software has, for many years, been an unsolved problem in computer science. Finding a solution to this problem has become urgent with the advent of multi-core processors. Furthermore, the problem has become more complicated because multi-cores are everywhere (desktop, laptop, embedded system). As such, they execute generic programs which exhibit very different characteristics than the scientific applications that have been the focus of parallel computing in the past.
Implicitly parallel programming is an approach to parallel pro- gramming that promises high productivity and efficiency and rules out synchronization errors and race conditions by design. There are two main ingredients to implicitly parallel programming: (i) a con- ventional sequential programming language that is extended with annotations that describe the semantics of the program and (ii) an automatic parallelizing compiler that uses the annotations to in- crease the degree of parallelization.
It is extremely important that the annotations and the automatic parallelizing compiler are designed with the target application do- main in mind. In this paper, we discuss the Paralax approach to im- plicitly parallel programming and we review how the annotations and the compiler design help to successfully parallelize generic programs. We evaluate Paralax on SPECint benchmarks, which are a model for such programs, and demonstrate scalable speedups, up to a factor of 6 on 8 cores.
Resumo:
Os sistemas distribuídos embarcados (Distributed Embedded Systems – DES) têm sido usados ao longo dos últimos anos em muitos domínios de aplicação, da robótica, ao controlo de processos industriais passando pela aviónica e pelas aplicações veiculares, esperando-se que esta tendência continue nos próximos anos. A confiança no funcionamento é uma propriedade importante nestes domínios de aplicação, visto que os serviços têm de ser executados em tempo útil e de forma previsível, caso contrário, podem ocorrer danos económicos ou a vida de seres humanos poderá ser posta em causa. Na fase de projecto destes sistemas é impossível prever todos os cenários de falhas devido ao não determinismo do ambiente envolvente, sendo necessária a inclusão de mecanismos de tolerância a falhas. Adicionalmente, algumas destas aplicações requerem muita largura de banda, que também poderá ser usada para a evolução dos sistemas, adicionandolhes novas funcionalidades. A flexibilidade de um sistema é uma propriedade importante, pois permite a sua adaptação às condições e requisitos envolventes, contribuindo também para a simplicidade de manutenção e reparação. Adicionalmente, nos sistemas embarcados, a flexibilidade também é importante por potenciar uma melhor utilização dos, muitas vezes escassos, recursos existentes. Uma forma evidente de aumentar a largura de banda e a tolerância a falhas dos sistemas embarcados distribuídos é a replicação dos barramentos do sistema. Algumas soluções existentes, quer comerciais quer académicas, propõem a replicação dos barramentos para aumento da largura de banda ou para aumento da tolerância a falhas. No entanto e quase invariavelmente, o propósito é apenas um, sendo raras as soluções que disponibilizam uma maior largura de banda e um aumento da tolerância a falhas. Um destes raros exemplos é o FlexRay, com a limitação de apenas ser permitido o uso de dois barramentos. Esta tese apresentada e discute uma proposta para usar a replicação de barramentos de uma forma flexível com o objectivo duplo de aumentar a largura de banda e a tolerância a falhas. A flexibilidade dos protocolos propostos também permite a gestão dinâmica da topologia da rede, sendo o número de barramentos apenas limitado pelo hardware/software. As propostas desta tese foram validadas recorrendo ao barramento de campo CAN – Controller Area Network, escolhido devido à sua grande implantação no mercado. Mais especificamente, as soluções propostas foram implementadas e validadas usando um paradigma que combina flexibilidade com comunicações event-triggered e time-triggered: o FTT – Flexible Time- Triggered. No entanto, uma generalização para CAN nativo é também apresentada e discutida. A inclusão de mecanismos de replicação do barramento impõe a alteração dos antigos protocolos de replicação e substituição do nó mestre, bem como a definição de novos protocolos para esta finalidade. Este trabalho tira partido da arquitectura centralizada e da replicação do nó mestre para suportar de forma eficiente e flexível a replicação de barramentos. Em caso de ocorrência de uma falta num barramento (ou barramentos) que poderia provocar uma falha no sistema, os protocolos e componentes propostos nesta tese fazem com que o sistema reaja, mudando para um modo de funcionamento degradado. As mensagens que estavam a ser transmitidas nos barramentos onde ocorreu a falta são reencaminhadas para os outros barramentos. A replicação do nó mestre baseia-se numa estratégia líder-seguidores (leaderfollowers), onde o líder (leader) controla todo o sistema enquanto os seguidores (followers) servem como nós de reserva. Se um erro ocorrer no nó líder, um dos nós seguidores passará a controlar o sistema de uma forma transparente e mantendo as mesmas funcionalidades. As propostas desta tese foram também generalizadas para CAN nativo, tendo sido para tal propostos dois componentes adicionais. É, desta forma possível ter as mesmas capacidades de tolerância a falhas ao nível dos barramentos juntamente com a gestão dinâmica da topologia de rede. Todas as propostas desta tese foram implementadas e avaliadas. Uma implementação inicial, apenas com um barramento foi avaliada recorrendo a uma aplicação real, uma equipa de futebol robótico onde o protocolo FTT-CAN foi usado no controlo de movimento e da odometria. A avaliação do sistema com múltiplos barramentos foi feita numa plataforma de teste em laboratório. Para tal foi desenvolvido um sistema de injecção de faltas que permite impor faltas nos barramentos e nos nós mestre, e um sistema de medida de atrasos destinado a medir o tempo de resposta após a ocorrência de uma falta.
Resumo:
Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia de Electrónica e Telecomunicações
Resumo:
Trabalho de Projeto para obtenção do grau de Mestre em Engenharia de Eletrónica e Telecomunicações
Resumo:
Trabalho de Projeto para obtenção do grau de Mestre em Engenharia Informática e de Computadores