997 resultados para Code optimization


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Embedded systems are usually designed for a single or a specified set of tasks. This specificity means the system design as well as its hardware/software development can be highly optimized. Embedded software must meet the requirements such as high reliability operation on resource-constrained platforms, real time constraints and rapid development. This necessitates the adoption of static machine codes analysis tools running on a host machine for the validation and optimization of embedded system codes, which can help meet all of these goals. This could significantly augment the software quality and is still a challenging field.Embedded systems are usually designed for a single or a specified set of tasks. This specificity means the system design as well as its hardware/software development can be highly optimized. Embedded software must meet the requirements such as high reliability operation on resource-constrained platforms, real time constraints and rapid development. This necessitates the adoption of static machine codes analysis tools running on a host machine for the validation and optimization of embedded system codes, which can help meet all of these goals. This could significantly augment the software quality and is still a challenging field.Embedded systems are usually designed for a single or a specified set of tasks. This specificity means the system design as well as its hardware/software development can be highly optimized. Embedded software must meet the requirements such as high reliability operation on resource-constrained platforms, real time constraints and rapid development. This necessitates the adoption of static machine codes analysis tools running on a host machine for the validation and optimization of embedded system codes, which can help meet all of these goals. This could significantly augment the software quality and is still a challenging field.Embedded systems are usually designed for a single or a specified set of tasks. This specificity means the system design as well as its hardware/software development can be highly optimized. Embedded software must meet the requirements such as high reliability operation on resource-constrained platforms, real time constraints and rapid development. This necessitates the adoption of static machine codes analysis tools running on a host machine for the validation and optimization of embedded system codes, which can help meet all of these goals. This could significantly augment the software quality and is still a challenging field.This dissertation contributes to an architecture oriented code validation, error localization and optimization technique assisting the embedded system designer in software debugging, to make it more effective at early detection of software bugs that are otherwise hard to detect, using the static analysis of machine codes. The focus of this work is to develop methods that automatically localize faults as well as optimize the code and thus improve the debugging process as well as quality of the code.Validation is done with the help of rules of inferences formulated for the target processor. The rules govern the occurrence of illegitimate/out of place instructions and code sequences for executing the computational and integrated peripheral functions. The stipulated rules are encoded in propositional logic formulae and their compliance is tested individually in all possible execution paths of the application programs. An incorrect sequence of machine code pattern is identified using slicing techniques on the control flow graph generated from the machine code.An algorithm to assist the compiler to eliminate the redundant bank switching codes and decide on optimum data allocation to banked memory resulting in minimum number of bank switching codes in embedded system software is proposed. A relation matrix and a state transition diagram formed for the active memory bank state transition corresponding to each bank selection instruction is used for the detection of redundant codes. Instances of code redundancy based on the stipulated rules for the target processor are identified.This validation and optimization tool can be integrated to the system development environment. It is a novel approach independent of compiler/assembler, applicable to a wide range of processors once appropriate rules are formulated. Program states are identified mainly with machine code pattern, which drastically reduces the state space creation contributing to an improved state-of-the-art model checking. Though the technique described is general, the implementation is architecture oriented, and hence the feasibility study is conducted on PIC16F87X microcontrollers. The proposed tool will be very useful in steering novices towards correct use of difficult microcontroller features in developing embedded systems.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The Lattice-Boltzmann method (LBM), a promising new particle-based simulation technique for complex and multiscale fluid flows, has seen tremendous adoption in recent years in computational fluid dynamics. Even with a state-of-the-art LBM solver such as Palabos, a user has to still manually write the program using library-supplied primitives. We propose an automated code generator for a class of LBM computations with the objective to achieve high performance on modern architectures. Few studies have looked at time tiling for LBM codes. We exploit a key similarity between stencils and LBM to enable polyhedral optimizations and in turn time tiling for LBM. We also characterize the performance of LBM with the Roofline performance model. Experimental results for standard LBM simulations like Lid Driven Cavity, Flow Past Cylinder, and Poiseuille Flow show that our scheme consistently outperforms Palabos-on average by up to 3x while running on 16 cores of an Intel Xeon (Sandybridge). We also obtain an improvement of 2.47x on the SPEC LBM benchmark.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper, we perform a thorough analysis of a spectral phase-encoded time spreading optical code division multiple access (SPECTS-OCDMA) system based on Walsh-Hadamard (W-H) codes aiming not only at finding optimal code-set selections but also at assessing its loss of security due to crosstalk. We prove that an inadequate choice of codes can make the crosstalk between active users to become large enough so as to cause the data from the user of interest to be detected by other user. The proposed algorithm for code optimization targets code sets that produce minimum bit error rate (BER) among all codes for a specific number of simultaneous users. This methodology allows us to find optimal code sets for any OCDMA system, regardless the code family used and the number of active users. This procedure is crucial for circumventing the unexpected lack of security due to crosstalk. We also show that a SPECTS-OCDMA system based on W-H 32(64) fundamentally limits the number of simultaneous users to 4(8) with no security violation due to crosstalk. More importantly, we prove that only a small fraction of the available code sets is actually immune to crosstalk with acceptable BER (<10(-9)) i.e., approximately 0.5% for W-H 32 with four simultaneous users, and about 1 x 10(-4)% for W-H 64 with eight simultaneous users.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Multi-GPU machines are being increasingly used in high-performance computing. Each GPU in such a machine has its own memory and does not share the address space either with the host CPU or other GPUs. Hence, applications utilizing multiple GPUs have to manually allocate and manage data on each GPU. Existing works that propose to automate data allocations for GPUs have limitations and inefficiencies in terms of allocation sizes, exploiting reuse, transfer costs, and scalability. We propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines. We call it the Bounding-Box-based Memory Manager (BBMM). BBMM can perform at runtime, during standard set operations like union, intersection, and difference, finding subset and superset relations on hyperrectangular regions of array data (bounding boxes). It uses these operations along with some compiler assistance to identify, allocate, and manage data required by applications in terms of disjoint bounding boxes. This allows it to (1) allocate exactly or nearly as much data as is required by computations running on each GPU, (2) efficiently track buffer allocations and hence maximize data reuse across tiles and minimize data transfer overhead, and (3) and as a result, maximize utilization of the combined memory on multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling schemes, whether static or dynamic. Experiments run on a four-GPU machine with various scientific programs showed that BBMM reduces data allocations on each GPU by up to 75% compared to current allocation schemes, yields performance of at least 88% of manually written code, and allows excellent weak scaling.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for intertask synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallelmodels in turn enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth, and, as a side effect, reduced power consumption. In this article, we develop a systematic approach to compile loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of interband parallelism (cross-barrier parallelism) in a number of numerical computations that range from linear algebra to structured grids and image processing. The proposed static analysis and code generation alleviates the burden of a full-blown dependence resolver to track the readiness of tasks at runtime. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative dataflow task-parallel language with explicit intertask dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Graph algorithms have been shown to possess enough parallelism to keep several computing resources busy-even hundreds of cores on a GPU. Unfortunately, tuning their implementation for efficient execution on a particular hardware configuration of heterogeneous systems consisting of multicore CPUs and GPUs is challenging, time consuming, and error prone. To address these issues, we propose a domain-specific language (DSL), Falcon, for implementing graph algorithms that (i) abstracts the hardware, (ii) provides constructs to write explicitly parallel programs at a higher level, and (iii) can work with general algorithms that may change the graph structure (morph algorithms). We illustrate the usage of our DSL to implement local computation algorithms (that do not change the graph structure) and morph algorithms such as Delaunay mesh refinement, survey propagation, and dynamic SSSP on GPU and multicore CPUs. Using a set of benchmark graphs, we illustrate that the generated code performs close to the state-of-the-art hand-tuned implementations.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Simultaneous multithreading processors dynamically share processor resources between multiple threads. In general, shared SMT resources may be managed explicitly, for instance, by dynamically setting queue occupation bounds for each thread as in the DCRA and Hill-Climbing policies. Alternatively, resources may be managed implicitly; that is, resource usage is controlled by placing the desired instruction mix in the resources. In this case, the main resource management tool is the instruction fetch policy which must predict the behavior of each thread (branch mispredictions, long-latency loads, etc.) as it fetches instructions.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Branch prediction feeds a speculative execution processor core with instructions. Branch mispredictions are inevitable and have negative effects on performance and energy consumption. With the advent of highly accurate conditional branch predictors, nonconditional branch instructions are gaining importance.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Efficiently exploring exponential-size architectural design spaces with many interacting parameters remains an open problem: the sheer number of experiments required renders detailed simulation intractable.We attack this via an automated approach that builds accurate predictive models. We simulate sampled points, using results to teach our models the function describing relationships among design parameters. The models can be queried and are very fast, enabling efficient design tradeoff discovery. We validate our approach via two uniprocessor sensitivity studies, predicting IPC with only 1–2% error. In an experimental study using the approach, training on 1% of a 250-K-point CMP design space allows our models to predict performance with only 4–5% error. Our predictive modeling combines well with techniques that reduce the time taken by each simulation experiment, achieving net time savings of three-four orders of magnitude.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Processor architectures has taken a turn towards many-core processors, which integrate multiple processing cores on a single chip to increase overall performance, and there are no signs that this trend will stop in the near future. Many-core processors are harder to program than multi-core and single-core processors due to the need of writing parallel or concurrent programs with high degrees of parallelism. Moreover, many-cores have to operate in a mode of strong scaling because of memory bandwidth constraints. In strong scaling increasingly finer-grain parallelism must be extracted in order to keep all processing cores busy.

Task dataflow programming models have a high potential to simplify parallel program- ming because they alleviate the programmer from identifying precisely all inter-task de- pendences when writing programs. Instead, the task dataflow runtime system detects and enforces inter-task dependences during execution based on the description of memory each task accesses. The runtime constructs a task dataflow graph that captures all tasks and their dependences. Tasks are scheduled to execute in parallel taking into account dependences specified in the task graph.

Several papers report important overheads for task dataflow systems, which severely limits the scalability and usability of such systems. In this paper we study efficient schemes to manage task graphs and analyze their scalability. We assume a programming model that supports input, output and in/out annotations on task arguments, as well as commutative in/out and reductions. We analyze the structure of task graphs and identify versions and generations as key concepts for efficient management of task graphs. Then, we present three schemes to manage task graphs building on graph representations, hypergraphs and lists. We also consider a fourth edge-less scheme that synchronizes tasks using integers. Analysis using micro-benchmarks shows that the graph representation is not always scalable and that the edge-less scheme introduces least overhead in nearly all situations.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In recent decades, full electric and hybrid electric vehicles have emerged as an alternative to conventional cars due to a range of factors, including environmental and economic aspects. These vehicles are the result of considerable efforts to seek ways of reducing the use of fossil fuel for vehicle propulsion. Sophisticated technologies such as hybrid and electric powertrains require careful study and optimization. Mathematical models play a key role at this point. Currently, many advanced mathematical analysis tools, as well as computer applications have been built for vehicle simulation purposes. Given the great interest of hybrid and electric powertrains, along with the increasing importance of reliable computer-based models, the author decided to integrate both aspects in the research purpose of this work. Furthermore, this is one of the first final degree projects held at the ETSII (Higher Technical School of Industrial Engineers) that covers the study of hybrid and electric propulsion systems. The present project is based on MBS3D 2.0, a specialized software for the dynamic simulation of multibody systems developed at the UPM Institute of Automobile Research (INSIA). Automobiles are a clear example of complex multibody systems, which are present in nearly every field of engineering. The work presented here benefits from the availability of MBS3D software. This program has proven to be a very efficient tool, with a highly developed underlying mathematical formulation. On this basis, the focus of this project is the extension of MBS3D features in order to be able to perform dynamic simulations of hybrid and electric vehicle models. This requires the joint simulation of the mechanical model of the vehicle, together with the model of the hybrid or electric powertrain. These sub-models belong to completely different physical domains. In fact the powertrain consists of energy storage systems, electrical machines and power electronics, connected to purely mechanical components (wheels, suspension, transmission, clutch…). The challenge today is to create a global vehicle model that is valid for computer simulation. Therefore, the main goal of this project is to apply co-simulation methodologies to a comprehensive model of an electric vehicle, where sub-models from different areas of engineering are coupled. The created electric vehicle (EV) model consists of a separately excited DC electric motor, a Li-ion battery pack, a DC/DC chopper converter and a multibody vehicle model. Co-simulation techniques allow car designers to simulate complex vehicle architectures and behaviors, which are usually difficult to implement in a real environment due to safety and/or economic reasons. In addition, multi-domain computational models help to detect the effects of different driving patterns and parameters and improve the models in a fast and effective way. Automotive designers can greatly benefit from a multidisciplinary approach of new hybrid and electric vehicles. In this case, the global electric vehicle model includes an electrical subsystem and a mechanical subsystem. The electrical subsystem consists of three basic components: electric motor, battery pack and power converter. A modular representation is used for building the dynamic model of the vehicle drivetrain. This means that every component of the drivetrain (submodule) is modeled separately and has its own general dynamic model, with clearly defined inputs and outputs. Then, all the particular submodules are assembled according to the drivetrain configuration and, in this way, the power flow across the components is completely determined. Dynamic models of electrical components are often based on equivalent circuits, where Kirchhoff’s voltage and current laws are applied to draw the algebraic and differential equations. Here, Randles circuit is used for dynamic modeling of the battery and the electric motor is modeled through the analysis of the equivalent circuit of a separately excited DC motor, where the power converter is included. The mechanical subsystem is defined by MBS3D equations. These equations consider the position, velocity and acceleration of all the bodies comprising the vehicle multibody system. MBS3D 2.0 is entirely written in MATLAB and the structure of the program has been thoroughly studied and understood by the author. MBS3D software is adapted according to the requirements of the applied co-simulation method. Some of the core functions are modified, such as integrator and graphics, and several auxiliary functions are added in order to compute the mathematical model of the electrical components. By coupling and co-simulating both subsystems, it is possible to evaluate the dynamic interaction among all the components of the drivetrain. ‘Tight-coupling’ method is used to cosimulate the sub-models. This approach integrates all subsystems simultaneously and the results of the integration are exchanged by function-call. This means that the integration is done jointly for the mechanical and the electrical subsystem, under a single integrator and then, the speed of integration is determined by the slower subsystem. Simulations are then used to show the performance of the developed EV model. However, this project focuses more on the validation of the computational and mathematical tool for electric and hybrid vehicle simulation. For this purpose, a detailed study and comparison of different integrators within the MATLAB environment is done. Consequently, the main efforts are directed towards the implementation of co-simulation techniques in MBS3D software. In this regard, it is not intended to create an extremely precise EV model in terms of real vehicle performance, although an acceptable level of accuracy is achieved. The gap between the EV model and the real system is filled, in a way, by introducing the gas and brake pedals input, which reflects the actual driver behavior. This input is included directly in the differential equations of the model, and determines the amount of current provided to the electric motor. For a separately excited DC motor, the rotor current is proportional to the traction torque delivered to the car wheels. Therefore, as it occurs in the case of real vehicle models, the propulsion torque in the mathematical model is controlled through acceleration and brake pedal commands. The designed transmission system also includes a reduction gear that adapts the torque coming for the motor drive and transfers it. The main contribution of this project is, therefore, the implementation of a new calculation path for the wheel torques, based on performance characteristics and outputs of the electric powertrain model. Originally, the wheel traction and braking torques were input to MBS3D through a vector directly computed by the user in a MATLAB script. Now, they are calculated as a function of the motor current which, in turn, depends on the current provided by the battery pack across the DC/DC chopper converter. The motor and battery currents and voltages are the solutions of the electrical ODE (Ordinary Differential Equation) system coupled to the multibody system. Simultaneously, the outputs of MBS3D model are the position, velocity and acceleration of the vehicle at all times. The motor shaft speed is computed from the output vehicle speed considering the wheel radius, the gear reduction ratio and the transmission efficiency. This motor shaft speed, somehow available from MBS3D model, is then introduced in the differential equations corresponding to the electrical subsystem. In this way, MBS3D and the electrical powertrain model are interconnected and both subsystems exchange values resulting as expected with tight-coupling approach.When programming mathematical models of complex systems, code optimization is a key step in the process. A way to improve the overall performance of the integration, making use of C/C++ as an alternative programming language, is described and implemented. Although this entails a higher computational burden, it leads to important advantages regarding cosimulation speed and stability. In order to do this, it is necessary to integrate MATLAB with another integrated development environment (IDE), where C/C++ code can be generated and executed. In this project, C/C++ files are programmed in Microsoft Visual Studio and the interface between both IDEs is created by building C/C++ MEX file functions. These programs contain functions or subroutines that can be dynamically linked and executed from MATLAB. This process achieves reductions in simulation time up to two orders of magnitude. The tests performed with different integrators, also reveal the stiff character of the differential equations corresponding to the electrical subsystem, and allow the improvement of the cosimulation process. When varying the parameters of the integration and/or the initial conditions of the problem, the solutions of the system of equations show better dynamic response and stability, depending on the integrator used. Several integrators, with variable and non-variable step-size, and for stiff and non-stiff problems are applied to the coupled ODE system. Then, the results are analyzed, compared and discussed. From all the above, the project can be divided into four main parts: 1. Creation of the equation-based electric vehicle model; 2. Programming, simulation and adjustment of the electric vehicle model; 3. Application of co-simulation methodologies to MBS3D and the electric powertrain subsystem; and 4. Code optimization and study of different integrators. Additionally, in order to deeply understand the context of the project, the first chapters include an introduction to basic vehicle dynamics, current classification of hybrid and electric vehicles and an explanation of the involved technologies such as brake energy regeneration, electric and non-electric propulsion systems for EVs and HEVs (hybrid electric vehicles) and their control strategies. Later, the problem of dynamic modeling of hybrid and electric vehicles is discussed. The integrated development environment and the simulation tool are also briefly described. The core chapters include an explanation of the major co-simulation methodologies and how they have been programmed and applied to the electric powertrain model together with the multibody system dynamic model. Finally, the last chapters summarize the main results and conclusions of the project and propose further research topics. In conclusion, co-simulation methodologies are applicable within the integrated development environments MATLAB and Visual Studio, and the simulation tool MBS3D 2.0, where equation-based models of multidisciplinary subsystems, consisting of mechanical and electrical components, are coupled and integrated in a very efficient way.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A rápida evolução do hardware demanda uma evolução contínua dos compiladores. Um processo de ajuste deve ser realizado pelos projetistas de compiladores para garantir que o código gerado pelo compilador mantenha uma determinada qualidade, seja em termos de tempo de processamento ou outra característica pré-definida. Este trabalho visou automatizar o processo de ajuste de compiladores por meio de técnicas de aprendizado de máquina. Como resultado os planos de compilação obtidos usando aprendizado de máquina com as características propostas produziram código para programas cujos valores para os tempos de execução se aproximaram daqueles seguindo o plano padrão utilizado pela LLVM.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Tese de doutoramento, Bioquimica, Faculdade de Ciências e Tecnologia, Universidade do Algarve, 2015