991 results for Efficient implementation


Relevance:

100.00%

Publisher:

Abstract:

Master's degree in Electrical and Computer Engineering

Relevance:

100.00%

Publisher:

Abstract:

This paper presents a single-precision floating-point arithmetic unit with support for multiplication, addition, fused multiply-add, reciprocal, square-root and inverse square-root, with high performance and low resource usage. The design uses a piecewise 2nd-order polynomial approximation to implement reciprocal, square-root and inverse square-root. The unit can be configured with any number of operations and can calculate any function with a throughput of one operation per cycle. The floating-point multiplier of the unit is also used to implement the polynomial approximation and the fused multiply-add operation. We have compared our implementation with other state-of-the-art proposals, including the Xilinx Core-Gen operators, and conclude that the approach has a high relative performance/area efficiency. © 2014 Technical University of Munich (TUM).
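The piecewise 2nd-order polynomial approximation mentioned above can be illustrated in software. The following is a minimal sketch, not the paper's hardware design: it assumes equal-width segments over the mantissa range [1, 2), coefficients obtained by interpolating three sample points per segment, and Horner evaluation (two multiply-adds per result). The names `build_table` and `evaluate` and the choice of 64 segments are hypothetical.

```python
def build_table(f, lo, hi, segments):
    """Build per-segment quadratic coefficients (c0, c1, c2) for f on [lo, hi).

    Assumption: each quadratic interpolates f at the left end, midpoint and
    right end of its segment, expressed in the local variable t = x - x0.
    """
    width = (hi - lo) / segments
    h = width / 2
    table = []
    for s in range(segments):
        x0 = lo + s * width
        y0, y1, y2 = f(x0), f(x0 + h), f(x0 + 2 * h)
        d1 = (y1 - y0) / h                       # first divided difference
        d2 = (y2 - 2 * y1 + y0) / (2 * h * h)    # second divided difference
        # Newton form y0 + d1*t + d2*t*(t - h), expanded in powers of t.
        table.append((y0, d1 - d2 * h, d2))
    return table


def evaluate(table, lo, hi, x):
    """Evaluate the piecewise quadratic at x using Horner's rule."""
    segments = len(table)
    width = (hi - lo) / segments
    s = min(int((x - lo) / width), segments - 1)  # segment index (table lookup)
    t = x - (lo + s * width)
    c0, c1, c2 = table[s]
    return c0 + t * (c1 + t * c2)                 # two multiply-adds


# Reciprocal over the mantissa range [1, 2), 64 segments (hypothetical choice).
RECIP_TABLE = build_table(lambda v: 1.0 / v, 1.0, 2.0, 64)
approx = evaluate(RECIP_TABLE, 1.0, 2.0, 1.37)
```

The same table builder can be reused for square-root or inverse square-root by passing a different function, which loosely mirrors how one multiplier datapath can serve several operations.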

Relevance:

100.00%

Publisher:

Abstract:

Due to various advantages such as flexibility, scalability and updatability, software intensive systems are increasingly embedded in everyday life. The constantly growing number of functions executed by these systems requires a high level of performance from the underlying platform. The main approach to increasing performance has been to raise the operating frequency of a chip. However, this has led to the problem of power dissipation, which has shifted the focus of research to parallel and distributed computing. Parallel many-core platforms can provide the required level of computational power along with low power consumption. On the one hand, this enables parallel execution of highly intensive applications. With their computational power, these platforms are likely to be used in various application domains: from home use electronics (e.g., video processing) to complex critical control systems. On the other hand, the utilization of the resources has to be efficient in terms of performance and power consumption. However, the high level of on-chip integration increases the probability of various faults and of hotspots that lead to thermal problems. Additionally, radiation, which is frequent in space but is also becoming an issue at ground level, can cause transient faults. This can eventually induce a faulty execution of applications. Therefore, it is crucial to develop methods that enable efficient as well as resilient execution of applications. The main objective of the thesis is to propose an approach to designing agent-based systems for many-core platforms in a rigorous manner. When designing such a system, we explore and integrate various dynamic reconfiguration mechanisms into the agents' functionality. The use of these mechanisms enhances resilience of the underlying platform whilst maintaining performance at an acceptable level.
The design of the system proceeds according to a formal refinement approach which allows us to ensure correct behaviour of the system with respect to postulated properties. To enable analysis of the proposed system in terms of area overhead as well as performance, we explore an approach where the developed rigorous models are transformed into a high-level implementation language. Specifically, we investigate methods for deriving fault-free implementations from these models in a hardware description language, namely VHDL.

Relevance:

100.00%

Publisher:

Abstract:

This paper presents an efficient tabu search algorithm (TSA) to solve the problem of feeder reconfiguration of distribution systems. The main characteristics that make the proposed TSA particularly efficient are a) the way in which the neighborhood of the current solution was defined; b) the way in which the objective function value was estimated; and c) the reduction of the neighborhood using heuristic criteria. Four electrical systems, described in detail in the specialized literature, were used to test the proposed TSA. The results demonstrate that it is computationally very fast and finds the best solutions known in the specialized literature. © 2012 IEEE.

Relevance:

100.00%

Publisher:

Abstract:

An alternative way is provided to define the discrete Pascal transform using difference operators to reveal the fundamental concept of the transform, in both one- and two-dimensional cases, which is extended to cover non-square two-dimensional applications. Efficient modularised implementations are proposed.
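To make the link between the discrete Pascal transform and difference operators concrete, here is a minimal one-dimensional sketch. It assumes the commonly used definition X(k) = Σ_{j=0..k} (-1)^j C(k, j) x(j), which the abstract does not state, and it is not the modularised implementation proposed in the paper. Under that definition, X(k) equals the k-th repeated difference (I - E)^k x evaluated at index 0, where E is the shift operator, so the transform can be computed with subtractions only. The function names are hypothetical.

```python
from math import comb


def pascal_transform_direct(x):
    """Direct evaluation of X(k) = sum_j (-1)^j * C(k, j) * x[j] (assumed definition)."""
    return [
        sum((-1) ** j * comb(k, j) * x[j] for j in range(k + 1))
        for k in range(len(x))
    ]


def pascal_transform_diff(x):
    """Same transform via repeated differences, using subtractions only.

    After stage s, position i >= s holds the s-th difference starting at
    index i - s, so position k ends up holding ((I - E)^k x)(0) = X(k).
    """
    d = list(x)
    n = len(d)
    for stage in range(1, n):
        # Sweep downwards so d[i - 1] still holds the previous stage's value.
        for i in range(n - 1, stage - 1, -1):
            d[i] = d[i - 1] - d[i]
    return d


signal = [1, 2, 3, 4]
spectrum = pascal_transform_diff(signal)      # [1, -1, 0, 0]
restored = pascal_transform_diff(spectrum)    # [1, 2, 3, 4]
```

With this definition the transform matrix is its own inverse, which is why applying the same routine twice restores the input.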

Relevance:

100.00%

Publisher:

Abstract:

While negation has been a very active area of research in logic programming, comparatively few papers have been devoted to implementation issues. Furthermore, the negation-related capabilities of current Prolog systems are limited. We recently presented a novel method for incorporating negation in a Prolog compiler which takes a number of existing methods (some modified and improved) and uses them in a combined fashion. The method makes use of information provided by a global analysis of the source code. Our previous work focused on the systematic description of the techniques and the reasoning about correctness and completeness of the method, but provided no experimental evidence to evaluate the proposal. In this paper, after proposing some extensions to the method, we provide experimental data which indicate that the method is not only feasible but also quite promising from the efficiency point of view. In addition, the tests have provided new insight as to how to improve the proposal further. Abstract interpretation techniques (in particular those included in the Ciao Prolog system preprocessor) have had a significant role in the success of the technique.

Relevance:

100.00%

Publisher:

Abstract:

The term "Logic Programming" refers to a variety of computer languages and execution models which are based on the traditional concept of Symbolic Logic. The expressive power of these languages offers promise to be of great assistance in facing the programming challenges of present and future symbolic processing applications in Artificial Intelligence, Knowledge-based systems, and many other areas of computing. The sequential execution speed of logic programs has been greatly improved since the advent of the first interpreters. However, higher inference speeds are still required in order to meet the demands of applications such as those contemplated for next generation computer systems. The execution of logic programs in parallel is currently considered a promising strategy for attaining such inference speeds. Logic Programming in turn appears as a suitable programming paradigm for parallel architectures because of the many opportunities for parallel execution present in the implementation of logic programs. This dissertation presents an efficient parallel execution model for logic programs. The model is described from the source language level down to an "Abstract Machine" level suitable for direct implementation on existing parallel systems or for the design of special purpose parallel architectures. Few assumptions are made at the source language level and therefore the techniques developed and the general Abstract Machine design are applicable to a variety of logic (and also functional) languages. These techniques offer efficient solutions to several areas of parallel Logic Programming implementation previously considered problematic or a source of considerable overhead, such as the detection and handling of variable binding conflicts in AND-Parallelism, the specification of control and management of the execution tree, the treatment of distributed backtracking, and goal scheduling and memory management issues.
A parallel Abstract Machine design is offered, specifying data areas, operation, and a suitable instruction set. This design is based on extending to a parallel environment the techniques introduced by the Warren Abstract Machine, which have already made very fast and space efficient sequential systems a reality. Therefore, the model herein presented is capable of retaining sequential execution speed similar to that of high performance sequential systems, while extracting additional gains in speed by efficiently implementing parallel execution. These claims are supported by simulations of the Abstract Machine on sample programs.

Relevance:

70.00%

Publisher:

Abstract:

For the last decade, elliptic curve cryptography has gained increasing interest in industry and in the academic community. This is especially due to the high level of security it provides with relatively small keys and to its ability to create very efficient and multifunctional cryptographic schemes by means of bilinear pairings. Pairings require pairing-friendly elliptic curves and among the possible choices, Barreto-Naehrig (BN) curves arguably constitute one of the most versatile families. In this paper, we further expand the potential of the BN curve family. We describe BN curves that are not only computationally very simple to generate, but also especially suitable for efficient implementation in a very broad range of scenarios. We also present implementation results of the optimal ate pairing using such a curve defined over a 254-bit prime field. (C) 2001 Elsevier Inc. All rights reserved.
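As background on why BN curves are simple to generate, the field prime p, group order n and Frobenius trace t of a BN curve are given by fixed polynomials in a single integer parameter u. The sketch below evaluates the standard BN polynomials. The parameter value `U` is an assumption: it is a value often quoted in the pairing literature for a 254-bit BN curve and is not taken from the abstract. The sketch does not test primality or construct the curve.

```python
def bn_parameters(u):
    """Evaluate the standard BN polynomials at the integer parameter u.

    Returns (p, n, t): field characteristic, group order and Frobenius trace.
    A usable curve additionally needs p and n to be prime (not checked here).
    """
    p = 36 * u**4 + 36 * u**3 + 24 * u**2 + 6 * u + 1
    n = 36 * u**4 + 36 * u**3 + 18 * u**2 + 6 * u + 1
    t = 6 * u**2 + 1
    return p, n, t


# Assumption: a sparse (low Hamming weight) parameter often cited for a
# 254-bit BN curve; it does not appear in the abstract above.
U = -(2**62 + 2**55 + 1)

p, n, t = bn_parameters(U)
print(p.bit_length())      # size of the prime field in bits
print(n == p + 1 - t)      # group order relation n = p + 1 - t
```

A sparse u keeps the polynomial evaluations and the pairing's main loop cheap, which is one common reason for preferring such parameters.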

Relevance:

70.00%

Publisher:

Abstract:

Hyperspectral imaging can be used for object detection and for discriminating between different objects based on their spectral characteristics. One of the main problems of hyperspectral data analysis is the presence of mixed pixels, due to the low spatial resolution of such images. This means that several spectrally pure signatures (endmembers) are combined into the same mixed pixel. Linear spectral unmixing follows an unsupervised approach which aims at inferring pure spectral signatures and their material fractions at each pixel of the scene. The huge data volumes acquired by such sensors put stringent requirements on processing and unmixing methods. This paper proposes an efficient implementation of an unsupervised linear unmixing method on GPUs using CUDA. The method finds the smallest simplex by solving a sequence of nonsmooth convex subproblems using variable splitting to obtain a constraint formulation, and then applying an augmented Lagrangian technique. The parallel implementation of SISAL presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory. The results herein presented indicate that the GPU implementation can significantly accelerate the method's execution over big datasets while maintaining the method's accuracy.

Relevance:

70.00%

Publisher:

Abstract:

Ultrafast 2D NMR is a powerful methodology that allows recording of a 2D NMR spectrum in a fraction of a second. However, due to the numerous non-conventional parameters involved in this methodology, its implementation is no trivial task. Here, an optimized experimental protocol is carefully described to ensure efficient implementation of ultrafast NMR. The ultrafast spectra resulting from this implementation are presented based on the example of two widely used 2D NMR experiments, COSY and HSQC, obtained in 0.2 s and 41 s, respectively.

Relevance:

70.00%

Publisher:

Abstract:

In the present work we describe a method which allows the incorporation of surface tension into the GENSMAC2D code. This is achieved on two scales. First, on the scale of a cell, the surface tension effects are incorporated into the free surface boundary conditions through the computation of the capillary pressure. The required curvature is estimated by fitting a least-squares circle to the free surface using the tracking particles in the cell and in its close neighbors. On a sub-cell scale, short wavelength perturbations are filtered out using a local 4-point stencil which is mass conservative. An efficient implementation is obtained through a dual representation of the cell data, using both a matrix representation, for ease of identifying neighbouring cells, and also a tree data structure, which permits the representation of specific groups of cells with additional information pertaining to that group. The resulting code is shown to be robust, and to produce accurate results when compared with exact solutions of selected fluid dynamic problems involving surface tension.
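The curvature estimate described above can be sketched as follows. This is an illustration under stated assumptions, not the GENSMAC2D routine: it uses the algebraic (Kåsa) least-squares circle fit, which minimises the sum of (x² + y² + A·x + B·y + C)² over the marker particles, and it takes the 2D capillary pressure as σ/r with an unspecified sign convention. The function names and the surface tension value are hypothetical.

```python
import math


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = a[r][3] - sum(a[r][c] * sol[c] for c in range(r + 1, 3))
        sol[r] = s / a[r][r]
    return sol


def fit_circle(points):
    """Algebraic least-squares circle through 2D points; returns (cx, cy, r)."""
    sx = sy = sxx = syy = sxy = sxz = syz = sz = 0.0
    for x, y in points:
        z = x * x + y * y
        sx += x; sy += y
        sxx += x * x; syy += y * y; sxy += x * y
        sxz += x * z; syz += y * z; sz += z
    # Normal equations of the linear least-squares problem in (A, B, C).
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(points))]]
    A, B, C = solve3(m, [-sxz, -syz, -sz])
    cx, cy = -A / 2.0, -B / 2.0
    return cx, cy, math.sqrt(cx * cx + cy * cy - C)


# Hypothetical marker particles on an arc of a circle centred at (1, 2), r = 0.5.
particles = [(1 + 0.5 * math.cos(a), 2 + 0.5 * math.sin(a))
             for a in (0.2, 0.5, 0.9, 1.3, 1.7)]
cx, cy, radius = fit_circle(particles)
curvature = 1.0 / radius
sigma = 0.07                               # hypothetical surface tension coefficient
capillary_pressure = sigma * curvature     # sign convention is an assumption
```

The fit needs at least three non-collinear particles; nearly straight interfaces make the system ill-conditioned and would need separate handling.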

Relevance:

70.00%

Publisher:

Abstract:

An international seminar-workshop entitled "Facilitation of trade and transport in Latin America: situation and outlook" was held at the headquarters of the Economic Commission for Latin America and the Caribbean (ECLAC) on 29 and 30 November 2005, organized jointly by the ECLAC Division of International Trade and Integration and the United Nations Conference on Trade and Development (UNCTAD). The event was attended by about 50 persons involved in customs modernization and/or the implementation of single window systems for foreign trade in 20 Ibero-American countries. The main purpose of the seminar-workshop was to exchange ideas, opinions and proposals concerning the efficient implementation of trade facilitation instruments. The conclusions reached at this event point to the need to seek convergence among the existing trade agreements associated with trade facilitation in Latin America. Customs modernization requires the re-design of processes and procedures in order to achieve interoperability among the systems, and single window systems for foreign trade can only be implemented successfully if clear political leadership is established with broad participation from both public and private organizations.

Relevance:

60.00%

Publisher:

Abstract:

Starting from the Durbin algorithm in polynomial space with an inner product defined by the signal autocorrelation matrix, an isometric transformation is defined that maps this vector space into another one where the Levinson algorithm is performed. Alternatively, for iterative algorithms such as discrete all-pole (DAP), an efficient implementation of a Gohberg-Semencul (GS) relation is developed for the inversion of the autocorrelation matrix which considers its centrosymmetry. In the solution of the autocorrelation equations, the Levinson algorithm is found to be less complex operationally than the procedures based on GS inversion for up to a minimum of five iterations at various linear prediction (LP) orders.
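For readers unfamiliar with the recursion the abstract builds on, the following is a minimal sketch of the textbook Levinson-Durbin recursion for solving the autocorrelation (normal) equations of linear prediction. It is not the isometric-transformation formulation or the Gohberg-Semencul inversion developed in the paper. The function name and the sign convention (prediction-error filter with a[0] = 1) are assumptions.

```python
def levinson_durbin(r, order):
    """Solve the LP autocorrelation equations by the Levinson-Durbin recursion.

    r     : autocorrelation values r[0], ..., r[order]
    order : linear prediction order
    Returns (a, err): prediction-error filter coefficients with a[0] == 1,
    satisfying sum_k a[k] * r[|i - k|] == 0 for i = 1..order, and the final
    prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for m in range(1, order + 1):
        # Correlation of the current filter with the next autocorrelation lag.
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err                    # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m              # error power shrinks at each order
    return a, err


# Autocorrelation of a first-order process with lag-1 correlation 0.5.
coeffs, error_power = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

Each order costs O(m) operations, so the whole solve is O(order²), compared with O(order³) for a general linear solver; this is the baseline against which the abstract compares GS-based inversion.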