49 resultados para Worst-case execution-time
em Universidad Politécnica de Madrid
Resumo:
This paper describes a case study in WCET analysis of an on-board spacecraft software system. The attitude control system of UPMSat-2, an experimental micro-satellite which is scheduled to be launched in 2013, is used for an experiment on analysing the worst-case execution time of code automatically generated from a Simulink model. In order to properly test the code, a hardware-in-the-loop configuration with a simulation model of the spacecraft environment has been used as a test bench. The code has been analysed with RapiTime, with some modifications to the original instrumentation routines, in order to take into account the particularities of the test configuration. Results from the experiment are described and commented in the paper.
Resumo:
This paper describes the authors? experience with static analysis of both WCET and stack usage of a satellite on-board software subsystem. The work is a continuation of a previous case study that used a dynamic WCET analysis tool on an earlier version of the same software system. In particular, the AbsInt aiT tool has been evaluated by analysing both C and Ada code generated by Simulink within the UPMSat-2 project. Some aspects of the aiT tool, specifically those dealing with SPARC register windows, are compared to another static analysis tool, Bound-T. The results of the analysis are discussed, and some conclusions on the use of static WCET analysis tools on the SPARC architecture are commented in the paper.
Resumo:
Abstract machines provide a certain separation between platformdependent and platform-independent concerns in compilation. Many of the differences between architectures are encapsulated in the speciflc abstract machine implementation and the bytecode is left largely architecture independent. Taking advantage of this fact, we present a framework for estimating upper and lower bounds on the execution times of logic programs running on a bytecode-based abstract machine. Our approach includes a one-time, programindependent proflling stage which calculates constants or functions bounding the execution time of each abstract machine instruction. Then, a compile-time cost estimation phase, using the instruction timing information, infers expressions giving platform-dependent upper and lower bounds on actual execution time as functions of input data sizes for each program. Working at the abstract machine level makes it possible to take into account low-level issues in new architectures and platforms by just reexecuting the calibration stage instead of having to tailor the analysis for each architecture and platform. Applications of such predicted execution times include debugging/veriflcation of time properties, certiflcation of time properties in mobile code, granularity control in parallel/distributed computing, and resource-oriented specialization.
Resumo:
Predicting statically the running time of programs has many applications ranging from task scheduling in parallel execution to proving the ability of a program to meet strict time constraints. A starting point in order to attack this problem is to infer the computational complexity of such programs (or fragments thereof). This is one of the reasons why the development of static analysis techniques for inferring cost-related properties of programs (usually upper and/or lower bounds of actual costs) has received considerable attention.
Resumo:
Effective static analyses have been proposed which infer bounds on the number of resolutions or reductions. These have the advantage of being independent from the platform on which the programs are executed and have been shown to be useful in a number of applications, such as granularity control in parallel execution. On the other hand, in distributed computation scenarios where platforms with different capabilities come into play, it is necessary to express costs in metrics that include the characteristics of the platform. In particular, it is specially interesting to be able to infer upper and lower bounds on actual execution times. With this objective in mind, we propose an approach which combines compile-time analysis for cost bounds with a one-time profiling of the platform in order to determine the valúes of certain parameters for a given platform. These parameters calíbrate a cost model which, from then on, is able to compute statically time bound functions for procedures and to predict with a significant degree of accuracy the execution times of such procedures in the given platform. The approach has been implemented and integrated in the CiaoPP system.
Resumo:
Systems relying on fixed hardware components with a static level of parallelism can suffer from an underuse of logical resources, since they have to be designed for the worst-case scenario. This problem is especially important in video applications due to the emergence of new flexible standards, like Scalable Video Coding (SVC), which offer several levels of scalability. In this paper, Dynamic and Partial Reconfiguration (DPR) of modern FPGAs is used to achieve run-time variable parallelism, by using scalable architectures where the size can be adapted at run-time. Based on this proposal, a scalable Deblocking Filter core (DF), compliant with the H.264/AVC and SVC standards has been designed. This scalable DF allows run-time addition or removal of computational units working in parallel. Scalability is offered together with a scalable parallelization strategy at the macroblock (MB) level, such that when the size of the architecture changes, MB filtering order is modified accordingly
Resumo:
This paper presents some fundamental properties of independent and-parallelism and extends its applicability by enlarging the class of goals eligible for parallel execution. A simple model of (independent) and-parallel execution is proposed and issues of correctness and efficiency discussed in the light of this model. Two conditions, "strict" and "non-strict" independence, are defined and then proved sufficient to ensure correctness and efñciency of parallel execution: if goals which meet these conditions are executed in parallel the solutions obtained are the same as those produced by standard sequential execution. Also, in absence of failure, the parallel proof procedure does not genérate any additional work (with respect to standard SLD-resolution) while the actual execution time is reduced. Finally, in case of failure of any of the goals no slow down will occur. For strict independence the results are shown to hold independently of whether the parallel goals execute in the same environment or in sepárate environments. In addition, a formal basis is given for the automatic compile-time generation of independent and-parallelism: compile-time conditions to efficiently check goal independence at run-time are proposed and proved sufficient. Also, rules are given for constructing simpler conditions if information regarding the binding context of the goals to be executed in parallel is available to the compiler.
Resumo:
This paper presents and develops a generalized concept of Non-Strict Independent And Parallelism (NSIAP). NSIAP extends the applicability of Independent And- Parallelism (IAP) by enlarging the class of goals which are eligible for parallel execution. At the same time it maintains IAP's ability to run non-deterministic goals in parallel and to preserve the computational complexity expected in the execution of the program by the programmer. First, a parallel execution framework is defined and some fundamental correctness results, in the sense of equivalence of solutions with the sequential model, are discussed for this framework. The issue of efficiency is then considered. Two new definitions of NSI are given for the cases of puré and impure goals respectively and efficiency results are provided for programs parallelized under these definitions which include treatment of the case of goal failure: not only is reduction of execution time guaranteed (modulo run-time overheads) in the absence of failure but it is also shown that in the worst case of failure no speed-down will occur. In addition to applying to NSI, these results carry over and complete previous results shown in the context of IAP which did not deal with the case of goal failure. Finally, some practical examples of the application of the NSIAP concept to the parallelization of a set of programs are presented and performance results, showing the advantage of using NSI, are given.
Resumo:
Software architectural evaluation is a key discipline used to identify, at early stages of a real-time system (RTS) development, the problems that may arise during its operation. Typical mechanisms supporting concurrency, such as semaphores, mutexes or monitors, usually lead to concurrency problems in execution time that are difficult to be identified, reproduced and solved. For this reason, it is crucial to understand the root causes of these problems and to provide support to identify and mitigate them at early stages of the system lifecycle. This paper aims to present the results of a research work oriented to the development of the tool called ‘Deadlock Risk Evaluation of Architectural Models’ (DREAM) to assess deadlock risk in architectural models of an RTS. A particular architectural style, Pipelines of Processes in Object-Oriented Architectures–UML (PPOOA) was used to represent platform-independent models of an RTS architecture supported by the PPOOA –Visio tool. We validated the technique presented here by using several case studies related to RTS development and comparing our results with those from other deadlock detection approaches, supported by different tools. Here we present two of these case studies, one related to avionics and the other to planetary exploration robotics. Copyright © 2011 John Wiley & Sons, Ltd.
Resumo:
Modern FPGAs with Dynamic and Partial Reconfiguration (DPR) feature allow the implementation of complex, yet flexible, hardware systems. Combining this flexibility with evolvable hardware techniques, real adaptive systems, able to reconfigure themselves according to environmental changes, can be envisaged. In this paper, a highly regular and modular architecture combined with a fast reconfiguration mechanism is proposed, allowing the introduction of dynamic and partial reconfiguration in the evolvable hardware loop. Results and use case show that, following this approach, evolvable processing IP Cores can be built, providing intensive data processing capabilities, improving data and delay overheads with respect to previous proposals. Results also show that, in the worst case (maximum mutation rate), average reconfiguration time is 5 times lower than evaluation time.
Resumo:
Although several profiling techniques for identifying performance bottlenecks in logic programs have been developed, they are generally not automatic and in most cases they do not provide enough information for identifying the root causes of such bottlenecks. This complicates using their results for guiding performance improvement. We present a profiling method and tool that provides such explanations. Our profiler associates cost centers to certain program elements and can measure different types of resource-related properties that affect performance, preserving the precedence of cost centers in the cali graph. It includes an automatic method for detecting procedures that are performance bottlenecks. The profiling tool has been integrated in a previously developed run-time checking framework to allow verification of certain properties when they cannot be verified statically. The approach allows checking global computational properties which require complex instrumentation tracking information about previous execution states, such as, e.g., that the execution time accumulated by a given procedure is not greater than a given bound. We have built a prototype implementation, integrated it in the Ciao/CiaoPP system and successfully applied it to performance improvement, automatic optimization (e.g., resource-aware specialization of programs), run-time checking, and debugging of global computational properties (e.g., resource usage) in Prolog programs.
Resumo:
Although several profiling techniques for identifying performance bottlenecks in logic programs have been developed, they are generally not automatic and in most cases they do not provide enough information for identifying the root causes of such bottlenecks. This complicates using their results for guiding performance improvement. We present a profiling method and tool that provides such explanations. Our profiler associates cost centers to certain program elements and can measure different types of resource-related properties that affect performance, preserving the precedence of cost centers in the call graph. It includes an automatic method for detecting procedures that are performance bottlenecks. The profiling tool has been integrated in a previously developed run-time checking framework to allow verification of certain properties when they cannot be verified statically. The approach allows checking global computational properties which require complex instrumentation tracking information about previous execution states, such as, e.g., that the execution time accumulated by a given procedure is not greater than a given bound. We have built a prototype implementation, integrated it in the Ciao/CiaoPP system and successfully applied it to performance improvement, automatic optimization (e.g., resource-aware specialization of programs), run-time checking, and debugging of global computational properties (e.g., resource usage) in Prolog programs.
Resumo:
Effective static analyses have been proposed which allow inferring functions which bound the number of resolutions or reductions. These have the advantage of being independent from the platform on which the programs are executed and such bounds have been shown useful in a number of applications, such as granularity control in parallel execution. On the other hand, in certain distributed computation scenarios where different platforms come into play, with each platform having different capabilities, it is more interesting to express costs in metrics that include the characteristics of the platform. In particular, it is specially interesting to be able to infer upper and lower bounds on actual execution time. With this objective in mind, we propose a method which allows inferring upper and lower bounds on the execution times of procedures of a program in a given execution platform. The approach combines compile-time cost bounds analysis with a one-time profiling of the platform in order to determine the values of certain constants for that platform. These constants calibrate a cost model which from then on is able to compute statically time bound functions for procedures and to predict with a significant degree of accuracy the execution times of such procedures in the given platform. The approach has been implemented and integrated in the CiaoPP system.
Resumo:
Cada vez es más frecuente que los sistemas de comunicaciones realicen buena parte de sus funciones (modulación y demodulación, codificación y decodificación...) mediante software en lugar de utilizar hardware dedicado. Esta técnica se denomina “Radio software”. El objetivo de este PFC es estudiar un algoritmo implementado en C empleado en sistemas de comunicaciones modernos, en concreto la decodificación de Viterbi, el cual se encarga de corregir los posibles errores producidos a lo largo de la comunicación, para poder trasladarlo a sistemas empotrados multiprocesador. Partiendo de un código en C para el decodificador que realiza todas sus operaciones en serie, en este Proyecto fin de carrera se ha paralelizado dicho código, es decir, que el trabajo que realizaba un solo hilo para el caso del código serie, es procesado por un número de hilos configurables por el usuario, persiguiendo que el tiempo de ejecución se reduzca, es decir, que el programa paralelizado se ejecute de una manera más rápida. El trabajo se ha realizado en un PC con sistema operativo Linux, pero la versión paralelizada del código puede ser empleada en un sistema empotrado multiprocesador en el cual cada procesador ejecuta el código correspondiente a uno de los hilos de la versión de PC. ABSTRACT It is increasingly common for communications systems to perform most of its functions (modulation and demodulation, coding and decoding) by software instead of than using dedicated hardware. This technique is called: “Software Radio”. The aim of the PFC is to study an implemented algorithm in C language used in modern communications systems, particularly Viterbi decoding, which amends any possible error produced during the communication, in order to be able to move multiprocessor embedded systems. Starting from a C code of the decoder that performs every single operation in serial, in this final project, this code has been parallelized, which means that the work used to be done by just a single thread in the case of serial code, is processed by a number of threads configured by the user, in order to decrease the execution time, meaning that the parallelized program is executed faster. The work has been carried out on a PC using Linux operating system, but the parallelized version of the code could also be used in an embedded multiprocessor system in which each processor executes the corresponding code to every single one of the threads of the PC version.
Resumo:
Modern object oriented languages like C# and JAVA enable developers to build complex application in less time. These languages are based on selecting heap allocated pass-by-reference objects for user defined data structures. This simplifies programming by automatically managing memory allocation and deallocation in conjunction with automated garbage collection. This simplification of programming comes at the cost of performance. Using pass-by-reference objects instead of lighter weight pass-by value structs can have memory impact in some cases. These costs can be critical when these application runs on limited resource environments such as mobile devices and cloud computing systems. We explore the problem by using the simple and uniform memory model to improve the performance. In this work we address this problem by providing an automated and sounds static conversion analysis which identifies if a by reference type can be safely converted to a by value type where the conversion may result in performance improvements. This works focus on C# programs. Our approach is based on a combination of syntactic and semantic checks to identify classes that are safe to convert. We evaluate the effectiveness of our work in identifying convertible types and impact of this transformation. The result shows that the transformation of reference type to value type can have substantial performance impact in practice. In our case studies we optimize the performance in Barnes-Hut program which shows total memory allocation decreased by 93% and execution time also reduced by 15%.