876 resultados para compiler optimization


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Los algoritmos basados en registros de desplazamiento con realimentación (en inglés FSR) se han utilizado como generadores de flujos pseudoaleatorios en aplicaciones con recursos limitados como los sistemas de apertura sin llave. Se considera canal primario a aquel que se utiliza para realizar una transmisión de información. La aparición de los ataques de canal auxiliar (en inglés SCA), que explotan información filtrada inintencionadamente a través de canales laterales como el consumo, las emisiones electromagnéticas o el tiempo empleado, supone una grave amenaza para estas aplicaciones, dado que los dispositivos son accesibles por un atacante. El objetivo de esta tesis es proporcionar un conjunto de protecciones que se puedan aplicar de forma automática y que utilicen recursos ya disponibles, evitando un incremento sustancial en los costes y alargando la vida útil de aplicaciones que puedan estar desplegadas. Explotamos el paralelismo existente en algoritmos FSR, ya que sólo hay 1 bit de diferencia entre estados de rondas consecutivas. Realizamos aportaciones en tres niveles: a nivel de sistema, utilizando un coprocesador reconfigurable, a través del compilador y a nivel de bit, aprovechando los recursos disponibles en el procesador. Proponemos un marco de trabajo que nos permite evaluar implementaciones de un algoritmo incluyendo los efectos introducidos por el compilador considerando que el atacante es experto. En el campo de los ataques, hemos propuesto un nuevo ataque diferencial que se adapta mejor a las condiciones de las implementaciones software de FSR, en las que el consumo entre rondas es muy similar. SORU2 es un co-procesador vectorial reconfigurable propuesto para reducir el consumo energético en aplicaciones con paralelismo y basadas en el uso de bucles. Proponemos el uso de SORU2, además, para ejecutar algoritmos basados en FSR de forma segura. Al ser reconfigurable, no supone un sobrecoste en recursos, ya que no está dedicado en exclusiva al algoritmo de cifrado. Proponemos una configuración que ejecuta múltiples algoritmos de cifrado similares de forma simultánea, con distintas implementaciones y claves. A partir de una implementación sin protecciones, que demostramos que es completamente vulnerable ante SCA, obtenemos una implementación segura a los ataques que hemos realizado. A nivel de compilador, proponemos un mecanismo para evaluar los efectos de las secuencias de optimización del compilador sobre una implementación. El número de posibles secuencias de optimizaciones de compilador es extremadamente alto. El marco de trabajo propuesto incluye un algoritmo para la selección de las secuencias de optimización a considerar. Debido a que las optimizaciones del compilador transforman las implementaciones, se pueden generar automáticamente implementaciones diferentes combinamos para incrementar la seguridad ante SCA. Proponemos 2 mecanismos de aplicación de estas contramedidas, que aumentan la seguridad de la implementación original sin poder considerarse seguras. Finalmente hemos propuesto la ejecución paralela a nivel de bit del algoritmo en un procesador. Utilizamos la forma algebraica normal del algoritmo, que automáticamente se paraleliza. La implementación sobre el algoritmo evaluado mejora en rendimiento y evita que se filtre información por una ejecución dependiente de datos. Sin embargo, es más vulnerable ante ataques diferenciales que la implementación original. Proponemos una modificación del algoritmo para obtener una implementación segura, descartando parcialmente ejecuciones del algoritmo, de forma aleatoria. Esta implementación no introduce una sobrecarga en rendimiento comparada con las implementaciones originales. En definitiva, hemos propuesto varios mecanismos originales a distintos niveles para introducir aleatoridad en implementaciones de algoritmos FSR sin incrementar sustancialmente los recursos necesarios. ABSTRACT Feedback Shift Registers (FSR) have been traditionally used to implement pseudorandom sequence generators. These generators are used in Stream ciphers in systems with tight resource constraints, such as Remote Keyless Entry. When communicating electronic devices, the primary channel is the one used to transmit the information. Side-Channel Attack (SCA) use additional information leaking from the actual implementation, including power consumption, electromagnetic emissions or timing information. Side-Channel Attacks (SCA) are a serious threat to FSR-based applications, as an attacker usually has physical access to the devices. The main objective of this Ph.D. thesis is to provide a set of countermeasures that can be applied automatically using the available resources, avoiding a significant cost overhead and extending the useful life of deployed systems. If possible, we propose to take advantage of the inherent parallelism of FSR-based algorithms, as the state of a FSR differs from previous values only in 1-bit. We have contributed in three different levels: architecture (using a reconfigurable co-processor), using compiler optimizations, and at bit level, making the most of the resources available at the processor. We have developed a framework to evaluate implementations of an algorithm including the effects introduced by the compiler. We consider the presence of an expert attacker with great knowledge on the application and the device. Regarding SCA, we have presented a new differential SCA that performs better than traditional SCA on software FSR-based algorithms, where the leaked values are similar between rounds. SORU2 is a reconfigurable vector co-processor. It has been developed to reduce energy consumption in loop-based applications with parallelism. In addition, we propose its use for secure implementations of FSR-based algorithms. The cost overhead is discarded as the co-processor is not exclusively dedicated to the encryption algorithm. We present a co-processor configuration that executes multiple simultaneous encryptions, using different implementations and keys. From a basic implementation, which is proved to be vulnerable to SCA, we obtain an implementation where the SCA applied were unsuccessful. At compiler level, we use the framework to evaluate the effect of sequences of compiler optimization passes on a software implementation. There are many optimization passes available. The optimization sequences are combinations of the available passes. The amount of sequences is extremely high. The framework includes an algorithm for the selection of interesting sequences that require detailed evaluation. As existing compiler optimizations transform the software implementation, using different optimization sequences we can automatically generate different implementations. We propose to randomly switch between the generated implementations to increase the resistance against SCA.We propose two countermeasures. The results show that, although they increase the resistance against SCA, the resulting implementations are not secure. At bit level, we propose to exploit bit level parallelism of FSR-based implementations using pseudo bitslice implementation in a wireless node processor. The bitslice implementation is automatically obtained from the Algebraic Normal Form of the algorithm. The results show a performance improvement, avoiding timing information leakage, but increasing the vulnerability against differential SCA.We provide a secure version of the algorithm by randomly discarding part of the data obtained. The overhead in performance is negligible when compared to the original implementations. To summarize, we have proposed a set of original countermeasures at different levels that introduce randomness in FSR-based algorithms avoiding a heavy overhead on the resources required.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

It has been widely known that a significant part of the bits are useless or even unused during the program execution. Bit-width analysis targets at finding the minimum bits needed for each variable in the program, which ensures the execution correctness and resources saving. In this paper, we proposed a static analysis method for bit-widths in general applications, which approximates conservatively at compile time and is independent of runtime conditions. While most related work focus on integer applications, our method is also tailored and applicable to floating point variables, which could be extended to transform floating point number into fixed point numbers together with precision analysis. We used more precise representations for data value ranges of both scalar and array variables. Element level analysis is carried out for arrays. We also suggested an alternative for the standard fixed-point iterations in bi-directional range analysis. These techniques are implemented on the Trimaran compiler structure and tested on a set of benchmarks to show the results.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Reuse distance analysis, the prediction of how many distinct memory addresses will be accessed between two accesses to a given address, has been established as a useful technique in profile-based compiler optimization, but the cost of collecting the memory reuse profile has been prohibitive for some applications. In this report, we propose using the hardware monitoring facilities available in existing CPUs to gather an approximate reuse distance profile. The difficulties associated with this monitoring technique are discussed, most importantly that there is no obvious link between the reuse profile produced by hardware monitoring and the actual reuse behavior. Potential applications which would be made viable by a reliable hardware-based reuse distance analysis are identified.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Recent approaches to mobile code safety, like proof- arrying code, involve associating safety information to programs. The code supplier provides a program and also includes with it a certifícate (or proof) whose validity entails compliance with a predefined safety policy. The intended benefit is that the program consumer can locally validate the certifícate w.r.t. the "untrusted" program by means of a certifícate checker—a process which should be much simpler, eflicient, and automatic than generating the original proof. We herein introduce a novel approach to mobile code safety which follows a similar scheme, but which is based throughout on the use of abstract interpretation techniques. In our framework the safety policy is specified by using an expressive assertion language defined over abstract domains. We identify a particular slice of the abstract interpretation-based static analysis results which is especially useful as a certifícate. We propose an algorithm for checking the validity of the certifícate on the consumer side which is itself in fact a very simplified and eflicient specialized abstract-interpreter. Our ideas are illustrated through an example implemented in the CiaoPP system. Though further experimentation is still required, we believe the proposed approach is of interest for bringing the automation and expressiveness which is inherent in the abstract interpretation techniques to the área of mobile code safety.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Most current-generation Wireless Sensor Network (WSN) nodes are equipped with multiple sensors of various types, and therefore support for multi-tasking and multiple concurrent applications is becoming increasingly common. This trend has been fostering the design of WSNs allowing several concurrent users to deploy applications with dissimilar requirements. In this paper, we extend the advantages of a holistic programming scheme by designing a novel compiler-assisted scheduling approach (called REIS) able to identify and eliminate redundancies across applications. To achieve this useful high-level optimization, we model each user application as a linear sequence of executable instructions. We show how well-known string-matching algorithms such as the Longest Common Subsequence (LCS) and the Shortest Common Super-sequence (SCS) can be used to produce an optimal merged monolithic sequence of the deployed applications that takes into account embedded scheduling information. We show that our approach can help in achieving about 60% average energy savings in processor usage compared to the normal execution of concurrent applications.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Embedded systems are usually designed for a single or a specified set of tasks. This specificity means the system design as well as its hardware/software development can be highly optimized. Embedded software must meet the requirements such as high reliability operation on resource-constrained platforms, real time constraints and rapid development. This necessitates the adoption of static machine codes analysis tools running on a host machine for the validation and optimization of embedded system codes, which can help meet all of these goals. This could significantly augment the software quality and is still a challenging field.Embedded systems are usually designed for a single or a specified set of tasks. This specificity means the system design as well as its hardware/software development can be highly optimized. Embedded software must meet the requirements such as high reliability operation on resource-constrained platforms, real time constraints and rapid development. This necessitates the adoption of static machine codes analysis tools running on a host machine for the validation and optimization of embedded system codes, which can help meet all of these goals. This could significantly augment the software quality and is still a challenging field.Embedded systems are usually designed for a single or a specified set of tasks. This specificity means the system design as well as its hardware/software development can be highly optimized. Embedded software must meet the requirements such as high reliability operation on resource-constrained platforms, real time constraints and rapid development. This necessitates the adoption of static machine codes analysis tools running on a host machine for the validation and optimization of embedded system codes, which can help meet all of these goals. This could significantly augment the software quality and is still a challenging field.Embedded systems are usually designed for a single or a specified set of tasks. This specificity means the system design as well as its hardware/software development can be highly optimized. Embedded software must meet the requirements such as high reliability operation on resource-constrained platforms, real time constraints and rapid development. This necessitates the adoption of static machine codes analysis tools running on a host machine for the validation and optimization of embedded system codes, which can help meet all of these goals. This could significantly augment the software quality and is still a challenging field.This dissertation contributes to an architecture oriented code validation, error localization and optimization technique assisting the embedded system designer in software debugging, to make it more effective at early detection of software bugs that are otherwise hard to detect, using the static analysis of machine codes. The focus of this work is to develop methods that automatically localize faults as well as optimize the code and thus improve the debugging process as well as quality of the code.Validation is done with the help of rules of inferences formulated for the target processor. The rules govern the occurrence of illegitimate/out of place instructions and code sequences for executing the computational and integrated peripheral functions. The stipulated rules are encoded in propositional logic formulae and their compliance is tested individually in all possible execution paths of the application programs. An incorrect sequence of machine code pattern is identified using slicing techniques on the control flow graph generated from the machine code.An algorithm to assist the compiler to eliminate the redundant bank switching codes and decide on optimum data allocation to banked memory resulting in minimum number of bank switching codes in embedded system software is proposed. A relation matrix and a state transition diagram formed for the active memory bank state transition corresponding to each bank selection instruction is used for the detection of redundant codes. Instances of code redundancy based on the stipulated rules for the target processor are identified.This validation and optimization tool can be integrated to the system development environment. It is a novel approach independent of compiler/assembler, applicable to a wide range of processors once appropriate rules are formulated. Program states are identified mainly with machine code pattern, which drastically reduces the state space creation contributing to an improved state-of-the-art model checking. Though the technique described is general, the implementation is architecture oriented, and hence the feasibility study is conducted on PIC16F87X microcontrollers. The proposed tool will be very useful in steering novices towards correct use of difficult microcontroller features in developing embedded systems.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Bank switching in embedded processors having partitioned memory architecture results in code size as well as run time overhead. An algorithm and its application to assist the compiler in eliminating the redundant bank switching codes introduced and deciding the optimum data allocation to banked memory is presented in this work. A relation matrix formed for the memory bank state transition corresponding to each bank selection instruction is used for the detection of redundant codes. Data allocation to memory is done by considering all possible permutation of memory banks and combination of data. The compiler output corresponding to each data mapping scheme is subjected to a static machine code analysis which identifies the one with minimum number of bank switching codes. Even though the method is compiler independent, the algorithm utilizes certain architectural features of the target processor. A prototype based on PIC 16F87X microcontrollers is described. This method scales well into larger number of memory blocks and other architectures so that high performance compilers can integrate this technique for efficient code generation. The technique is illustrated with an example

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis describes Optimist, an optimizing compiler for the Concurrent Smalltalk language developed by the Concurrent VLSI Architecture Group. Optimist compiles Concurrent Smalltalk to the assembly language of the Message-Driven Processor (MDP). The compiler includes numerous optimization techniques such as dead code elimination, dataflow analysis, constant folding, move elimination, concurrency analysis, duplicate code merging, tail forwarding, use of register variables, as well as various MDP-specific optimizations in the code generator. The MDP presents some unique challenges and opportunities for compilation. Due to the MDP's small memory size, it is critical that the size of the generated code be as small as possible. The MDP is an inherently concurrent processor with efficient mechanisms for sending and receiving messages; the compiler takes advantage of these mechanisms. The MDP's tagged architecture allows very efficient support of object-oriented languages such as Concurrent Smalltalk. The initial goals for the MDP were to have the MDP execute about twenty instructions per method and contain 4096 words of memory. This compiler shows that these goals are too optimistic -- most methods are longer, both in terms of code size and running time. Thus, the memory size of the MDP should be increased.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Compiler optimizations help to make code run faster at runtime. When the compilation is done before the program is run, compilation time is less of an issue, but how do on-the-fly compilation and optimization impact the overall runtime? If the compiler must compete with the running application for resources, the running application will take more time to complete. This paper investigates the impact of specific compiler optimizations on the overall runtime of an application. A foldover Plackett and Burman design is used to choose compiler optimizations that appear to contribute to shorter overall runtimes. These selected optimizations are compared with the default optimization levels in the Jikes RVM. This method selects optimizations that result in a shorter overall runtime than the default O0, O1, and O2 levels. This shows that careful selection of compiler optimizations can have a significant, positive impact on overall runtime.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Dynamic scheduling increases the expressive power of logic programming languages, but also introduces some overhead. In this paper we present two classes of program transformations designed to reduce this additional overhead, while preserving the operational semantics of the original programs, modulo ordering of literals woken at the same time. The first class of transformations simplifies the delay conditions while the second class moves delayed literals later in the rule body. Application of the program transformations can be automated using information provided by compile-time analysis. We provide experimental results obtained from an implementation of the proposed techniques using the CIAO prototype compiler. Our results show that the techniques can lead to substantial performance improvement.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this report we discuss some of the issues involved in the specialization and optimization of constraint logic programs with dynamic scheduling. Dynamic scheduling, as any other form of concurrency, increases the expressive power of constraint logic programs, but also introduces run-time overhead. The objective of the specialization and optimization is to reduce as much as possible such overhead automatically, while preserving the semantics of the original programs. This is done by program transformation based on global analysis. We present implementation techniques for this purpose and report on experimental results obtained from an implementation of the techniques in the context of the CIAO compiler.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The design of fault tolerant systems is gaining importance in large domains of embedded applications where design constrains are as important as reliability. New software techniques, based on selective application of redundancy, have shown remarkable fault coverage with reduced costs and overheads. However, the large number of different solutions provided by these techniques, and the costly process to assess their reliability, make the design space exploration a very difficult and time-consuming task. This paper proposes the integration of a multi-objective optimization tool with a software hardening environment to perform an automatic design space exploration in the search for the best trade-offs between reliability, cost, and performance. The first tool is commanded by a genetic algorithm which can simultaneously fulfill many design goals thanks to the use of the NSGA-II multi-objective algorithm. The second is a compiler-based infrastructure that automatically produces selective protected (hardened) versions of the software and generates accurate overhead reports and fault coverage estimations. The advantages of our proposal are illustrated by means of a complex and detailed case study involving a typical embedded application, the AES (Advanced Encryption Standard).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Modern compilers present a great and ever increasing number of options which can modify the features and behavior of a compiled program. Many of these options are often wasted due to the required comprehensive knowledge about both the underlying architecture and the internal processes of the compiler. In this context, it is usual, not having a single design goal but a more complex set of objectives. In addition, the dependencies between different goals are difficult to be a priori inferred. This paper proposes a strategy for tuning the compilation of any given application. This is accomplished by using an automatic variation of the compilation options by means of multi-objective optimization and evolutionary computation commanded by the NSGA-II algorithm. This allows finding compilation options that simultaneously optimize different objectives. The advantages of our proposal are illustrated by means of a case study based on the well-known Apache web server. Our strategy has demonstrated an ability to find improvements up to 7.5% and up to 27% in context switches and L2 cache misses, respectively, and also discovers the most important bottlenecks involved in the application performance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Insulin was used as model protein to developed innovative Solid Lipid Nanoparticles (SLNs) for the delivery of hydrophilic biotech drugs, with potential use in medicinal chemistry. SLNs were prepared by double emulsion with the purpose of promoting stability and enhancing the protein bioavailability. Softisan(®)100 was selected as solid lipid matrix. The surfactants (Tween(®)80, Span(®)80 and Lipoid(®)S75) and insulin were chosen applying a 2(2) factorial design with triplicate of central point, evaluating the influence of dependents variables as polydispersity index (PI), mean particle size (z-AVE), zeta potential (ZP) and encapsulation efficiency (EE) by factorial design using the ANOVA test. Therefore, thermodynamic stability, polymorphism and matrix crystallinity were checked by Differential Scanning Calorimetry (DSC) and Wide Angle X-ray Diffraction (WAXD), whereas the effect of toxicity of SLNs was check in HepG2 and Caco-2 cells. Results showed a mean particle size (z-AVE) width between 294.6 nm and 627.0 nm, a PI in the range of 0.425-0.750, ZP about -3 mV, and the EE between 38.39% and 81.20%. After tempering the bulk lipid (mimicking the end process of production), the lipid showed amorphous characteristics, with a melting point of ca. 30 °C. The toxicity of SLNs was evaluated in two distinct cell lines (HEPG-2 and Caco-2), showing to be dependent on the concentration of particles in HEPG-2 cells, while no toxicity in was reported in Caco-2 cells. SLNs were stable for 24 h in in vitro human serum albumin (HSA) solution. The resulting SLNs fabricated by double emulsion may provide a promising approach for administration of protein therapeutics and antigens.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Response surface methodology based on Box-Behnken (BBD) design was successfully applied to the optimization in the operating conditions of the electrochemical oxidation of sanitary landfill leachate aimed for making this method feasible for scale up. Landfill leachate was treated in continuous batch-recirculation system, where a dimensional stable anode (DSA(©)) coated with Ti/TiO2 and RuO2 film oxide were used. The effects of three variables, current density (milliampere per square centimeter), time of treatment (minutes), and supporting electrolyte dosage (moles per liter) upon the total organic carbon removal were evaluated. Optimized conditions were obtained for the highest desirability at 244.11 mA/cm(2), 41.78 min, and 0.07 mol/L of NaCl and 242.84 mA/cm(2), 37.07 min, and 0.07 mol/L of Na2SO4. Under the optimal conditions, 54.99 % of chemical oxygen demand (COD) and 71.07 ammonia nitrogen (NH3-N) removal was achieved with NaCl and 45.50 of COD and 62.13 NH3-N with Na2SO4. A new kinetic model predicted obtained from the relation between BBD and the kinetic model was suggested.