847 resultados para Distributed embedded systems
Resumo:
As ubiquitous computing becomes a reality, sensitive information is increasingly processed and transmitted by smart cards, mobile devices and various types of embedded systems. This has led to the requirement of a new class of lightweight cryptographic algorithm to ensure security in these resource constrained environments. The International Organization for Standardization (ISO) has recently standardised two low-cost block ciphers for this purpose, Clefia and Present. In this paper we provide the first comprehensive hardware architecture comparison between these ciphers, as well as a comparison with the current National Institute of Standards and Technology (NIST) standard, the Advanced Encryption Standard.
Resumo:
Low-power processors and accelerators that were originally designed for the embedded systems market are emerging as building blocks for servers. Power capping has been actively explored as a technique to reduce the energy footprint of high-performance processors. The opportunities and limitations of power capping on the new low-power processor and accelerator ecosystem are less understood. This paper presents an efficient power capping and management infrastructure for heterogeneous SoCs based on hybrid ARM/FPGA designs. The infrastructure coordinates dynamic voltage and frequency scaling with task allocation on a customised Linux system for the Xilinx Zynq SoC. We present a compiler-assisted power model to guide voltage and frequency scaling, in conjunction with workload allocation between the ARM cores and the FPGA, under given power caps. The model achieves less than 5% estimation bias to mean power consumption. In an FFT case study, the proposed power capping schemes achieve on average 97.5% of the performance of the optimal execution and match the optimal execution in 87.5% of the cases, while always meeting power constraints.
Resumo:
This paper describes middleware-level support for agent mobility, targeted at hierarchically structured wireless sensor and actuator network applications. Agent mobility enables a dynamic deployment and adaptation of the application on top of the wireless network at runtime, while allowing the middleware to optimize the placement of agents, e.g., to reduce wireless network traffic, transparently to the application programmer. The paper presents the design of the mechanisms and protocols employed to instantiate agents on nodes and to move agents between nodes. It also gives an evaluation of a middleware prototype running on Imote2 nodes that communicate over ZigBee. The results show that our implementation is reasonably efficient and fast enough to support the envisioned functionality on top of a commodity multi-hop wireless technology. Our work is to a large extent platform-neutral, thus it can inform the design of other systems that adopt a hierarchical structuring of mobile components. © 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering.
Resumo:
Realising memory intensive applications such as image and video processing on FPGA requires creation of complex, multi-level memory hierarchies to achieve real-time performance; however commerical High Level Synthesis tools are unable to automatically derive such structures and hence are unable to meet the demanding bandwidth and capacity constraints of these applications. Current approaches to solving this problem can only derive either single-level memory structures or very deep, highly inefficient hierarchies, leading in either case to one or more of high implementation cost and low performance. This paper presents an enhancement to an existing MC-HLS synthesis approach which solves this problem; it exploits and eliminates data duplication at multiple levels levels of the generated hierarchy, leading to a reduction in the number of levels and ultimately higher performance, lower cost implementations. When applied to synthesis of C-based Motion Estimation, Matrix Multiplication and Sobel Edge Detection applications, this enables reductions in Block RAM and Look Up Table (LUT) cost of up to 25%, whilst simultaneously increasing throughput.
Resumo:
Wearable devices performing advanced bio-signal analysis algorithms are aimed to foster a revolution in healthcare provision of chronic cardiac diseases. In this context, energy efficiency is of paramount importance, as long-term monitoring must be ensured while relying on a tiny power source. Operating at a scaled supply voltage, just above the threshold voltage, effectively helps in saving substantial energy, but it makes circuits, and especially memories, more prone to errors, threatening the correct execution of algorithms. The use of error detection and correction codes may help to protect the entire memory content, however it incurs in large area and energy overheads which may not be compatible with the tight energy budgets of wearable systems. To cope with this challenge, in this paper we propose to limit the overhead of traditional schemes by selectively detecting and correcting errors only in data highly impacting the end-to-end quality of service of ultra-low power wearable electrocardiogram (ECG) devices. This partition adopts the protection of either significant words or significant bits of each data element, according to the application characteristics (statistical properties of the data in the application buffers), and its impact in determining the output. The proposed heterogeneous error protection scheme in real ECG signals allows substantial energy savings (11% in wearable devices) compared to state-of-the-art approaches, like ECC, in which the whole memory is protected against errors. At the same time, it also results in negligible output quality degradation in the evaluated power spectrum analysis application of ECG signals.
Resumo:
Emerging web applications like cloud computing, Big Data and social networks have created the need for powerful centres hosting hundreds of thousands of servers. Currently, the data centres are based on general purpose processors that provide high flexibility buts lack the energy efficiency of customized accelerators. VINEYARD aims to develop an integrated platform for energy-efficient data centres based on new servers with novel, coarse-grain and fine-grain, programmable hardware accelerators. It will, also, build a high-level programming framework for allowing end-users to seamlessly utilize these accelerators in heterogeneous computing systems by employing typical data-centre programming frameworks (e.g. MapReduce, Storm, Spark, etc.). This programming framework will, further, allow the hardware accelerators to be swapped in and out of the heterogeneous infrastructure so as to offer high flexibility and energy efficiency. VINEYARD will foster the expansion of the soft-IP core industry, currently limited in the embedded systems, to the data-centre market. VINEYARD plans to demonstrate the advantages of its approach in three real use-cases (a) a bio-informatics application for high-accuracy brain modeling, (b) two critical financial applications, and (c) a big-data analysis application.
Resumo:
Power capping is a fundamental method for reducing the energy consumption of a wide range of modern computing environments, ranging from mobile embedded systems to datacentres. Unfortunately, maximising performance and system efficiency under static power caps remains challenging, while maximising performance under dynamic power caps has been largely unexplored. We present an adaptive power capping method that reduces the power consumption and maximizes the performance of heterogeneous SoCs for mobile and server platforms. Our technique combines power capping with coordinated DVFS, data partitioning and core allocations on a heterogeneous SoC with ARM processors and FPGA resources. We design our framework as a run-time system based on OpenMP and OpenCL to utilise the heterogeneous resources. We evaluate it through five data-parallel benchmarks on the Xilinx SoC which allows fully voltage and frequency control. Our experiments show a significant performance boost of 30% under dynamic power caps with concurrent execution on ARM and FPGA, compared to a naive separate approach.
Resumo:
The performance of real-time networks is under continuous improvement as a result of several trends in the digital world. However, these tendencies not only cause improvements, but also exacerbates a series of unideal aspects of real-time networks such as communication latency, jitter of the latency and packet drop rate. This Thesis focuses on the communication errors that appear on such realtime networks, from the point-of-view of automatic control. Specifically, it investigates the effects of packet drops in automatic control over fieldbuses, as well as the architectures and optimal techniques for their compensation. Firstly, a new approach to address the problems that rise in virtue of such packet drops, is proposed. This novel approach is based on the simultaneous transmission of several values in a single message. Such messages can be from sensor to controller, in which case they are comprised of several past sensor readings, or from controller to actuator in which case they are comprised of estimates of several future control values. A series of tests reveal the advantages of this approach. The above-explained approach is then expanded as to accommodate the techniques of contemporary optimal control. However, unlike the aforementioned approach, that deliberately does not send certain messages in order to make a more efficient use of network resources; in the second case, the techniques are used to reduce the effects of packet losses. After these two approaches that are based on data aggregation, it is also studied the optimal control in packet dropping fieldbuses, using generalized actuator output functions. This study ends with the development of a new optimal controller, as well as the function, among the generalized functions that dictate the actuator’s behaviour in the absence of a new control message, that leads to the optimal performance. The Thesis also presents a different line of research, related with the output oscillations that take place as a consequence of the use of classic co-design techniques of networked control. The proposed algorithm has the goal of allowing the execution of such classical co-design algorithms without causing an output oscillation that increases the value of the cost function. Such increases may, under certain circumstances, negate the advantages of the application of the classical co-design techniques. A yet another line of research, investigated algorithms, more efficient than contemporary ones, to generate task execution sequences that guarantee that at least a given number of activated jobs will be executed out of every set composed by a predetermined number of contiguous activations. This algorithm may, in the future, be applied to the generation of message transmission patterns in the above-mentioned techniques for the efficient use of network resources. The proposed task generation algorithm is better than its predecessors in the sense that it is capable of scheduling systems that cannot be scheduled by its predecessor algorithms. The Thesis also presents a mechanism that allows to perform multi-path routing in wireless sensor networks, while ensuring that no value will be counted in duplicate. Thereby, this technique improves the performance of wireless sensor networks, rendering them more suitable for control applications. As mentioned before, this Thesis is centered around techniques for the improvement of performance of distributed control systems in which several elements are connected through a fieldbus that may be subject to packet drops. The first three approaches are directly related to this topic, with the first two approaching the problem from an architectural standpoint, whereas the third one does so from more theoretical grounds. The fourth approach ensures that the approaches to this and similar problems that can be found in the literature that try to achieve goals similar to objectives of this Thesis, can do so without causing other problems that may invalidate the solutions in question. Then, the thesis presents an approach to the problem dealt with in it, which is centered in the efficient generation of the transmission patterns that are used in the aforementioned approaches.
Resumo:
Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia de Electrónica e Telecomunicações
Resumo:
Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia de Electrónica e Telecomunicações
Resumo:
The use of multicores is becoming widespread inthe field of embedded systems, many of which have real-time requirements. Hence, ensuring that real-time applications meet their timing constraints is a pre-requisite before deploying them on these systems. This necessitates the consideration of the impact of the contention due to shared lowlevel hardware resources like the front-side bus (FSB) on the Worst-CaseExecution Time (WCET) of the tasks. Towards this aim, this paper proposes a method to determine an upper bound on the number of bus requests that tasks executing on a core can generate in a given time interval. We show that our method yields tighter upper bounds in comparison with the state of-the-art. We then apply our method to compute the extra contention delay incurred by tasks, when they are co-scheduled on different cores and access the shared main memory, using a shared bus, access to which is granted using a round-robin arbitration (RR) protocol.
Resumo:
In embedded systems, the timing behaviour of the control mechanisms are sometimes of critical importance for the operational safety. These high criticality systems require strict compliance with the offline predicted task execution time. The execution of a task when subject to preemption may vary significantly in comparison to its non-preemptive execution. Hence, when preemptive scheduling is required to operate the workload, preemption delay estimation is of paramount importance. In this paper a preemption delay estimation method for floating non-preemptive scheduling policies is presented. This work builds on [1], extending the model and optimising it considerably. The preemption delay function is subject to a major tightness improvement, considering the WCET analysis context. Moreover more information is provided as well in the form of an extrinsic cache misses function, which enables the method to provide a solution in situations where the non-preemptive regions sizes are small. Finally experimental results from the implementation of the proposed solutions in Heptane are provided for real benchmarks which validate the significance of this work.
Resumo:
The usage of COTS-based multicores is becoming widespread in the field of embedded systems. Providing realtime guarantees at design-time is a pre-requisite to deploy real-time systems on these multicores. This necessitates the consideration of the impact of the contention due to shared low-level hardware resources on the Worst-Case Execution Time (WCET) of the tasks. As a step towards this aim, this paper first identifies the different factors that make the WCET analysis a challenging problem in a typical COTS-based multicore system. Then, we propose and prove, a mathematically correct method to determine tight upper bounds on the WCET of the tasks, when they are co-scheduled on different cores.
Resumo:
The current industry trend is towards using Commercially available Off-The-Shelf (COTS) based multicores for developing real time embedded systems, as opposed to the usage of custom-made hardware. In typical implementation of such COTS-based multicores, multiple cores access the main memory via a shared bus. This often leads to contention on this shared channel, which results in an increase of the response time of the tasks. Analyzing this increased response time, considering the contention on the shared bus, is challenging on COTS-based systems mainly because bus arbitration protocols are often undocumented and the exact instants at which the shared bus is accessed by tasks are not explicitly controlled by the operating system scheduler; they are instead a result of cache misses. This paper makes three contributions towards analyzing tasks scheduled on COTS-based multicores. Firstly, we describe a method to model the memory access patterns of a task. Secondly, we apply this model to analyze the worst case response time for a set of tasks. Although the required parameters to obtain the request profile can be obtained by static analysis, we provide an alternative method to experimentally obtain them by using performance monitoring counters (PMCs). We also compare our work against an existing approach and show that our approach outperforms it by providing tighter upper-bound on the number of bus requests generated by a task.