970 results for Efficient implementation
Abstract:
142 p.
Abstract:
We present a new software framework for the implementation of applications that use stencil computations on block-structured grids to solve partial differential equations. A key feature of the framework is the extensive use of automatic source code generation which is used to achieve high performance on a range of leading multi-core processors. Results are presented for a simple model stencil running on Intel and AMD CPUs as well as the NVIDIA GT200 GPU. The generality of the framework is demonstrated through the implementation of a complete application consisting of many different stencil computations, taken from the field of computational fluid dynamics. © 2010 IEEE.
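For readers unfamiliar with the term, a stencil computation updates each grid point from a fixed pattern of its neighbours. The following is a minimal sketch only, assuming a 5-point averaging stencil on a small 2D grid; the function name `jacobi_step` and the grid values are illustrative and are not taken from the framework described above, which generates optimized code rather than plain loops.

```python
def jacobi_step(grid):
    """Apply one sweep of a 5-point averaging stencil to the interior of a 2D grid."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Each interior point becomes the mean of its four neighbours.
            new[i][j] = 0.25 * (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            )
    return new


# Hypothetical 4x4 grid with a fixed top boundary of 1.0.
grid = [[0.0] * 4 for _ in range(4)]
grid[0] = [1.0] * 4
result = jacobi_step(grid)
```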
Abstract:
Coupled Monte Carlo depletion systems provide a versatile and accurate tool for analyzing advanced thermal and fast reactor designs for a variety of fuel compositions and geometries. The main drawback of Monte Carlo-based systems is a long calculation time, which imposes significant restrictions on the complexity and amount of design-oriented calculations. This paper presents an alternative approach to interfacing the Monte Carlo and depletion modules aimed at addressing this problem. The main idea is to calculate the one-group cross sections for all relevant isotopes required by the depletion module in a separate module external to Monte Carlo calculations. Thus, the Monte Carlo module will produce the criticality and neutron spectrum only, without tallying of the individual isotope reaction rates. The one-group cross sections for all isotopes will be generated in a separate module by collapsing a universal multigroup (MG) cross-section library using the Monte Carlo calculated flux. Here, the term "universal" means that a single MG cross-section set will be applicable for all reactor systems and is independent of reactor characteristics such as neutron spectrum; fuel composition; and fuel cell, assembly, and core geometries. This approach was originally proposed by Haeck et al. and implemented in the ALEPH code. Implementation of the proposed approach to Monte Carlo burnup interfacing was carried out through the BGCORE system. One-group cross sections generated by the BGCORE system were compared with those tallied directly by the MCNP code. Analysis of this comparison was carried out and led to the conclusion that in order to achieve the accuracy required for a reliable core and fuel cycle analysis, accounting for the background cross section (σ0) in the unresolved resonance energy region is essential. An extension of the one-group cross-section generation model was implemented and tested by tabulating and interpolating by a simplified σ0 model. A significant improvement of the one-group cross-section accuracy was demonstrated.
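The collapsing step mentioned above is commonly written as a flux-weighted average of the multigroup cross sections. The sketch below assumes that standard form, sum(σ_g · φ_g) / sum(φ_g); the function name and the three-group numbers are hypothetical, and the σ0 (background cross section) correction described in the abstract is not modelled here.

```python
def collapse_one_group(sigma_mg, flux_mg):
    """Collapse multigroup cross sections to one group using flux weighting."""
    if len(sigma_mg) != len(flux_mg):
        raise ValueError("cross-section and flux vectors must have equal length")
    total_flux = sum(flux_mg)
    if total_flux <= 0:
        raise ValueError("total flux must be positive")
    # Flux-weighted average: sum(sigma_g * phi_g) / sum(phi_g)
    return sum(s * f for s, f in zip(sigma_mg, flux_mg)) / total_flux


# Hypothetical three-group data (arbitrary units), for illustration only.
sigma = [1.0, 2.0, 4.0]
flux = [2.0, 1.0, 1.0]
one_group = collapse_one_group(sigma, flux)
```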
Abstract:
Coherent shared memory is a convenient, but inefficient, method of inter-process communication for parallel programs. By contrast, message passing can be less convenient, but more efficient. To get the benefits of both models, several non-coherent memory behaviors have recently been proposed in the literature. We present an implementation of Mermera, a shared memory system that supports both coherent and non-coherent behaviors in a manner that enables programmers to mix multiple behaviors in the same program [HS93]. A programmer can debug a Mermera program using coherent memory, and then improve its performance by selectively reducing the level of coherence in the parts that are critical to performance. Mermera permits a trade-off of coherence for performance. We analyze this trade-off through measurements of our implementation, and by an example that illustrates the style of programming needed to exploit non-coherence. We find that, even on a small network of workstations, the performance advantage of non-coherence is compelling. Raw non-coherent memory operations perform 20-40 times better than coherent memory operations. An example application program is shown to run 5-11 times faster when permitted to exploit non-coherence. We conclude by commenting on our use of the Isis Toolkit of multicast protocols in implementing Mermera.
Abstract:
Current low-level networking abstractions on modern operating systems are commonly implemented in the kernel to provide sufficient performance for general purpose applications. However, it is desirable for high performance applications to have more control over the networking subsystem to support optimizations for their specific needs. One approach is to allow networking services to be implemented at user-level. Unfortunately, this typically incurs costs due to scheduling overheads and unnecessary data copying via the kernel. In this paper, we describe a method to implement efficient application-specific network service extensions at user-level, that removes the cost of scheduling and provides protected access to lower-level system abstractions. We present a networking implementation that, with minor modifications to the Linux kernel, passes data between "sandboxed" extensions and the Ethernet device without copying or processing in the kernel. Using this mechanism, we put a customizable networking stack into a user-level sandbox and show how it can be used to efficiently process and forward data via proxies, or intermediate hosts, in the communication path of high performance data streams. Unlike other user-level networking implementations, our method makes no special hardware requirements to avoid unnecessary data copies. Results show that we achieve a substantial increase in throughput over comparable user-space methods using our networking stack implementation.
Abstract:
In work that involves mathematical rigor, there are numerous benefits to adopting a representation of models and arguments that can be supplied to a formal reasoning or verification system: reusability, automatic evaluation of examples, and verification of consistency and correctness. However, accessibility has not been a priority in the design of formal verification tools that can provide these benefits. In earlier work [Lap09a], we attempt to address this broad problem by proposing several specific design criteria organized around the notion of a natural context: the sphere of awareness a working human user maintains of the relevant constructs, arguments, experiences, and background materials necessary to accomplish the task at hand. This work expands one aspect of the earlier work by considering more extensively an essential capability for any formal reasoning system whose design is oriented around simulating the natural context: native support for a collection of mathematical relations that deal with common constructs in arithmetic and set theory. We provide a formal definition for a context of relations that can be used to both validate and assist formal reasoning activities. We provide a proof that any algorithm that implements this formal structure faithfully will necessarily converge. Finally, we consider the efficiency of an implementation of this formal structure that leverages modular implementations of well-known data structures: balanced search trees and transitive closures of hypergraphs.
Abstract:
Consumer demand is revolutionizing the way products are being produced, distributed and marketed. In relation to the dairy sector in developing countries, aspects of milk quality are receiving more attention from both society and the government. However, milk quality management needs to be better addressed in dairy production systems to guarantee the access of stakeholders, mainly small-holders, into dairy markets. The present study is focused on an analysis of the interaction of the upstream part of the dairy supply chain (farmers and dairies) in the Mantaro Valley (Peruvian central Andes), in order to understand possible constraints both stakeholders face in implementing milk quality controls and practices, and to evaluate “ex-ante” how different strategies suggested to improve milk quality could affect farmers’ and processors’ profits. The analysis is based on three complementary field studies conducted between 2012 and 2013. Our work has shown that the presence of a dual supply chain combining both formal and informal markets has a direct impact on dairy production at the technical and organizational levels, affecting small formal dairy processors’ possibilities to implement contracts, including agreements on milk quality standards. The analysis of milk quality management from farms to dairy plants highlighted the poor hygiene in the study area, even when average values of milk composition were usually high. Some husbandry practices evaluated at farm level demonstrated cost effectiveness and a big impact on hygienic quality; however, regular application of these practices was limited, since small-scale farmers do not receive a bonus for producing hygienic milk. On the basis of these two results, we co-designed with formal small-scale dairy processors a simulation tool to show prospective scenarios, in which they could select their best product portfolio but also design milk payment systems to reward farmers with high milk quality performance. This type of approach allowed dairy processors to realize the importance of including milk quality management in their collection and manufacturing processes, especially in a context of high competition for milk supply. We concluded that the improvement of milk quality in a smallholder farming context requires a more coordinated effort among stakeholders. Successful implementation of strategies will depend on the willingness of small-scale dairy processors to reward farmers producing high-quality milk, but also on the support from the State to provide incentives to the stakeholders in the formal sector.
A policy-definition language and prototype implementation library for policy-based autonomic systems
Abstract:
This paper presents work towards generic policy toolkit support for autonomic computing systems in which the policies themselves can be adapted dynamically and automatically. The work is motivated by three needs: the need for longer-term policy-based adaptation where the policy itself is dynamically adapted to continually maintain or improve its effectiveness despite changing environmental conditions; the need to enable non autonomics-expert practitioners to embed self-managing behaviours with low cost and risk; and the need for adaptive policy mechanisms that are easy to deploy into legacy code. A policy definition language is presented, designed to permit powerful expression of self-managing behaviours. The language is very flexible through the use of simple yet expressive syntax and semantics, and facilitates a very diverse policy behaviour space through both hierarchical and recursive uses of language elements. A prototype library implementation of the policy support mechanisms is described. The library reads and writes policies in well-formed XML script. The implementation extends the state of the art in policy-based autonomics through innovations which include support for multiple policy versions of a given policy type, multiple configuration templates, and meta-policies to dynamically select between policy instances and templates. Most significantly, the scheme supports hot-swapping between policy instances. To illustrate the feasibility and generalised applicability of these tools, two dissimilar example deployment scenarios are examined. The first is taken from an exploratory implementation of self-managing parallel processing, and is used to demonstrate the simple and efficient use of the tools. The second example demonstrates more-advanced functionality, in the context of an envisioned multi-policy stock trading scheme which is sensitive to environmental volatility.
Abstract:
A zone based systems design framework is described and utilised in the implementation of a message authentication code (MAC) algorithm based on symmetric key block ciphers. The resulting block cipher based MAC algorithm may be used to provide assurance of the authenticity and, hence, the integrity of binary data. Using software simulation to benchmark against the de facto cipher block chaining MAC (CBC-MAC) variant used in the TinySec security protocol for wireless sensor networks and the NIST cipher block chaining MAC standard, CMAC, we show that our zone based systems design framework can lead to block cipher based MAC constructs that point to improvements in message processing efficiency, processing throughput and processing latency.
Abstract:
This paper presents a multi-language framework for FPGA hardware development which aims to satisfy the dual requirement of high-level hardware design and efficient hardware implementation. The central idea of this framework is the integration of different hardware languages in a way that harnesses the best features of each language. This is illustrated in this paper by the integration of two hardware languages in the form of HIDE: a structured hardware language which provides more abstract and elegant hardware descriptions and compositions than are possible in traditional hardware description languages such as VHDL or Verilog, and Handel-C: an ANSI C-like hardware language which allows software and hardware engineers alike to target FPGAs from high-level algorithmic descriptions. On the one hand, HIDE has proven to be very successful in the description and generation of highly optimised parameterisable FPGA circuits from geometric descriptions. On the other hand, Handel-C has also proven to be very successful in the rapid design and prototyping of FPGA circuits from algorithmic application descriptions. The proposed integrated framework hence harnesses HIDE for the generation of highly optimised circuits for regular parts of algorithms, while Handel-C is used as a top-level design language from which HIDE functionality is dynamically invoked. The overall message of this paper is that there need not be an exclusive choice between different hardware design flows. Rather, an integrated framework where different design flows can seamlessly interoperate should be adopted. Although the idea might seem simple prima facie, it could have serious implications for the design of future generations of hardware languages.
Abstract:
A novel application-specific instruction set processor (ASIP) for use in the construction of modern signal processing systems is presented. This is a flexible device that can be used in the construction of array processor systems for the real-time implementation of functions such as singular-value decomposition (SVD) and QR decomposition (QRD), as well as other important matrix computations. It uses a coordinate rotation digital computer (CORDIC) module to perform arithmetic operations and several approaches are adopted to achieve high performance including pipelining of the micro-rotations, the use of parallel instructions and a dual-bus architecture. In addition, a novel method for scale factor correction is presented which only needs to be applied once at the end of the computation. This also reduces computation time and enhances performance. Methods are described which allow this processor to be used in reduced dimension (i.e., folded) array processor structures that allow tradeoffs between hardware and performance. The net result is a flexible matrix computational processing element (PE) whose functionality can be changed under program control for use in a wider range of scenarios than previous work. Details are presented of the results of a design study, which considers the application of this decomposition PE architecture in a combined SVD/QRD system and demonstrates that a combination of high performance and efficient silicon implementation is achievable. © 2005 IEEE.
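As background on the CORDIC module mentioned above, the sketch below shows the textbook rotation-mode CORDIC iteration in floating point, with the accumulated scale factor divided out once at the end. It is an illustration under assumptions, not the processor's novel correction method: the function name, the iteration count, and the use of floating-point arithmetic in place of hardware shift-and-add are all choices made here.

```python
import math


def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) by `angle` radians using CORDIC micro-rotations.

    Converges for |angle| up to roughly 1.74 rad, the sum of all atan(2**-i).
    """
    z = angle      # residual angle still to be rotated
    gain = 1.0     # accumulated scale factor of the micro-rotations
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
        gain *= math.sqrt(1.0 + shift * shift)
    # Scale factor correction applied once, after all micro-rotations.
    return x / gain, y / gain


cx, cy = cordic_rotate(1.0, 0.0, math.pi / 4)
```

Rotating the unit vector (1, 0) by π/4 should give values close to (cos π/4, sin π/4), which is a quick way to check the sketch.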
Abstract:
Computationally efficient sequential learning algorithms are developed for direct-link resource-allocating networks (DRANs). These are achieved by decomposing existing recursive training algorithms on a layer by layer and neuron by neuron basis. This allows network weights to be updated in an efficient parallel manner and facilitates the implementation of minimal update extensions that yield a significant reduction in computation load per iteration compared to existing sequential learning methods employed in resource-allocation network (RAN) and minimal RAN (MRAN) approaches. The new algorithms, which also incorporate a pruning strategy to control network growth, are evaluated on three different system identification benchmark problems and shown to outperform existing methods both in terms of training error convergence and computational efficiency. (c) 2005 Elsevier B.V. All rights reserved.
Abstract:
A scalable large vocabulary, speaker independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation, which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1 reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms, which, coupled with a backend search of 5000 words, has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133 MHz.
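To make the "Gaussian calculation" concrete, the sketch below evaluates the log-likelihood of one feature vector under a single Gaussian. It assumes a diagonal covariance, which is a common choice in HMM acoustic models but is not stated in the abstract; the function name and the two-dimensional example values are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance `var`."""
    # Each dimension contributes log(2*pi*var_d) + (x_d - mean_d)^2 / var_d.
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


# Hypothetical 2-dimensional feature vector scored against a unit Gaussian.
score = log_gaussian_diag([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```

A hardware accelerator would evaluate this kind of sum for every acoustic model and every frame, which is why the per-dimension terms lend themselves to pipelining.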
Abstract:
The Kyoto Protocol and the European Energy Performance of Buildings Directive put an onus on governments and organisations to lower their carbon footprint in order to contribute towards reducing global warming. A key parameter to consider for energy and cost savings in buildings is indoor lighting, which has a major impact on overall energy usage and carbon dioxide emissions. Lighting control in buildings using Passive Infrared sensors is a reliable and well established approach; however, the use of Passive Infrared alone does not offer substantial savings in carbon, energy, and cost. Accurate occupancy monitoring information can greatly affect a building’s lighting control strategy towards a greener usage. This paper presents an approach for data fusion of Passive Infrared sensors and passive Radio Frequency Identification (RFID) based occupancy monitoring. The idea is to have efficient, need-based, and reliable control of lighting towards a green indoor environment, all while considering the visual comfort of occupants. The proposed approach provides an estimated 13% electrical energy savings in one open-plan office of a University building in one working day. Practical implementation of RFID gateways provides real-world occupancy profiling data to be fused with Passive Infrared sensing towards analysis and improvement of building lighting usage and control.
Abstract:
The inclusion of the Discrete Wavelet Transform in the JPEG-2000 standard has added impetus to the research of hardware architectures for the two-dimensional wavelet transform. In this paper, a VLSI architecture for performing the symmetrically extended two-dimensional transform is presented. This architecture conforms to the JPEG-2000 standard and is capable of near-optimal performance when dealing with the image boundaries. The architecture also achieves efficient processor utilization. Implementation results based on a Xilinx Virtex-2 FPGA device are included.
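The role of symmetric extension at the signal boundaries can be illustrated in one dimension. The sketch below assumes the reversible 5/3 lifting filter from JPEG-2000 and whole-sample symmetric extension, applied to a 1D integer signal with at least two samples; it is a software illustration with hypothetical function names, not the 2D VLSI architecture described above.

```python
def reflect(i, n):
    """Whole-sample symmetric extension: map index i back into [0, n)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def dwt53_forward(x):
    """One level of the 1D reversible 5/3 lifting transform (len(x) >= 2)."""
    n = len(x)
    y = list(x)
    # Predict step: odd samples become high-pass (detail) coefficients.
    for i in range(1, n, 2):
        y[i] -= (y[reflect(i - 1, n)] + y[reflect(i + 1, n)]) // 2
    # Update step: even samples become low-pass (approximation) coefficients.
    for i in range(0, n, 2):
        y[i] += (y[reflect(i - 1, n)] + y[reflect(i + 1, n)] + 2) // 4
    return y[0::2], y[1::2]


low, high = dwt53_forward([5, 5, 5, 5, 5, 5, 5, 5])
```

For a constant signal the detail coefficients are all zero, including at the two ends, which shows that the mirrored boundary introduces no artificial edge.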