182 resultados para Compute Unified Device Architecture(CUDA)
Resumo:
Soft error has become one of the major areas of attention with the device scaling and large scale integration. Lot of variants for superscalar architecture were proposed with focus on program re-execution, thread re-execution and instruction re-execution. In this paper we proposed a fault tolerant micro-architecture of pipelined RISC. The proposed architecture, Floating Resources Extended pipeline (FREP), re-executes the instructions using extended pipeline stages. The instructions are re-executed by hybrid architecture with a suitable combination of space and time redundancy.
Resumo:
In this paper we explore an implementation of a high-throughput, streaming application on REDEFINE-v2, which is an enhancement of REDEFINE. REDEFINE is a polymorphic ASIC combining the flexibility of a programmable solution with the execution speed of an ASIC. In REDEFINE Compute Elements are arranged in an 8x8 grid connected via a Network on Chip (NoC) called RECONNECT, to realize the various macrofunctional blocks of an equivalent ASIC. For a 1024-FFT we carry out an application-architecture design space exploration by examining the various characterizations of Compute Elements in terms of the size of the instruction store. We further study the impact by using application specific, vectorized FUs. By setting up different partitions of the FFT algorithm for persistent execution on REDEFINE-v2, we derive the benefits of setting up pipelined execution for higher performance. The impact of the REDEFINE-v2 micro-architecture for any arbitrary N-point FFT (N > 4096) FFT is also analyzed. We report the various algorithm-architecture tradeoffs in terms of area and execution speed with that of an ASIC implementation. In addition we compare the performance gain with respect to a GPP.
Resumo:
Precision, sophistication and economic factors in many areas of scientific research that demand very high magnitude of compute power is the order of the day. Thus advance research in the area of high performance computing is getting inevitable. The basic principle of sharing and collaborative work by geographically separated computers is known by several names such as metacomputing, scalable computing, cluster computing, internet computing and this has today metamorphosed into a new term known as grid computing. This paper gives an overview of grid computing and compares various grid architectures. We show the role that patterns can play in architecting complex systems, and provide a very pragmatic reference to a set of well-engineered patterns that the practicing developer can apply to crafting his or her own specific applications. We are not aware of pattern-oriented approach being applied to develop and deploy a grid. There are many grid frameworks that are built or are in the process of being functional. All these grids differ in some functionality or the other, though the basic principle over which the grids are built is the same. Despite this there are no standard requirements listed for building a grid. The grid being a very complex system, it is mandatory to have a standard Software Architecture Specification (SAS). We attempt to develop the same for use by any grid user or developer. Specifically, we analyze the grid using an object oriented approach and presenting the architecture using UML. This paper will propose the usage of patterns at all levels (analysis. design and architectural) of the grid development.
Resumo:
Simultaneous consideration of both performance and reliability issues is important in the choice of computer architectures for real-time aerospace applications. One of the requirements for such a fault-tolerant computer system is the characteristic of graceful degradation. A shared and replicated resources computing system represents such an architecture. In this paper, a combinatorial model is used for the evaluation of the instruction execution rate of a degradable, replicated resources computing system such as a modular multiprocessor system. Next, a method is presented to evaluate the computation reliability of such a system utilizing a reliability graph model and the instruction execution rate. Finally, this computation reliability measure, which simultaneously describes both performance and reliability, is applied as a constraint in an architecture optimization model for such computing systems. Index Terms-Architecture optimization, computation
Resumo:
The properties of the manifold of a Lie groupG, fibered by the cosets of a sub-groupH, are exploited to obtain a geometrical description of gauge theories in space-timeG/H. Gauge potentials and matter fields are pullbacks of equivariant fields onG. Our concept of a connection is more restricted than that in the similar scheme of Ne'eman and Regge, so that its degrees of freedom are just those of a set of gauge potentials forG, onG/H, with no redundant components. The ldquotranslationalrdquo gauge potentials give rise in a natural way to a nonsingular tetrad onG/H. The underlying groupG to be gauged is the groupG of left translations on the manifoldG and is associated with a ldquotrivialrdquo connection, namely the Maurer-Cartan form. Gauge transformations are all those diffeomorphisms onG that preserve the fiber-bundle structure.
Resumo:
Experimental results pertaining to the initiation, dynamics and mechanism of cavitation erosion on poly(methyl methacrylate) specimens tested in a rotating disk device are described in detail. Erosion normally starts at the location nearest to the center of rotation (CR). As the exposure time to cavitation increases, additional erosion areas or sites appear away from the CR and secondary erosion (induced by eroded pits) spreads upstream and merges with the main pit. The microcracks increase in density towards the end of the incubation period and transform into macrocracks in most cases. A study of light optical photographs and scanning electron micrographs of the eroded area shows that material particles are removed from the network of cracks because of crack joining and pits indicate particle debris. Optical degradation (loss of transmittance) is observed to be greater on the back of the specimen than on the front.
Resumo:
This paper presents the architecture and the VHDL design of an integer 2-D DCT used in the H.264/AVC. The 2-D DCT computation is performed by exploiting it’s orthogonality and separability property. The symmetry of the forward and inverse transform is used in this implementation. To reduce the computation overhead for the addition, subtraction and multiplication operations, we analyze the suitability of carry-free position independent residue number system (RNS) for the implementation of 2-D DCT. The implementation has been carried out in VHDL for Altera FPGA. We used the negative number representation in RNS, bit width analysis of the transforms and dedicated registers present in the Logic element of the FPGA to optimize the area. The complexity and efficiency analysis show that the proposed architecture could provide higher through-put.
Resumo:
We discuss micro ring resonator based optical logic gates using Kerr-type nonlinearity. Resonant wavelength selectivity is one key factor in achieving the desired gate. Based on basic gates like AND gate, OR gate etc. We proceed to propose a 3-bit binary adder circuit.Due to the presence of more than a single wavelength, the system gets complicated as we increase the number of components in the circuit. Hence it has been observed that for efficient designing and functioning of digital circuits in optical domain, we need a device which can give single wavelength output, filtering out all other wavelengths and at the same time preserve the digital characteristics of the output. We propose such filter-preserver device based on micro ring resonator.
Resumo:
This paper presents the architecture of a fault-tolerant, special-purpose multi-microprocessor system for solving Partial Differential Equations (PDEs). The modular nature of the architecture allows the use of hundreds of Processing Elements (PEs) for high throughput. Its performance is evaluated by both analytical and simulation methods. The results indicate that the system can achieve high operation rates and is not sensitive to inter-processor communication delay.
Resumo:
A recent article on the unified theory of Elementary Particle Forces by Howard Georgi and Sheldon Glashow (September 1980, page 30) points out that the unification of strong, weak and electromagnetic interactions involves the appearance of particles having almost macroscopic masses of about a nanogram (~1014 GeV). Such superheavy particles seem to be an inevitable feature of most grand unified theories Gravitation is still, however, left out of these various schemes.
Resumo:
The short duration of the Doppler signal and noise content in it necessitate a validation scheme to be incorporated in the electronic processor used for frequency measurement, There are several different validation schemes that can be employed in period timing devices. A detailed study of the influence of these validation schemes on the measured frequency has been reported here. These studies were carried out by using a combination of a fast A/D converter and computer. Doppler bursts obtained from an air flow were digitised and stored on magnetic discs. Suitable computer programs were then used to simulate the performance of period timing devices with different validation schemes and the frequency of the stored bursts were evaluated. It is found that best results are obtained when the validation scheme enables frequency measurement to be made over a large number of cycles within the burst.
Resumo:
In this paper, three parallel polygon scan conversion algorithms have been proposed, and their performance when executed on a shared bus architecture has been compared. It has been shown that the parallel algorithm that does not use edge coherence performs better than those that use edge coherence. Further, a multiprocessing architecture has been proposed to execute the parallel polygon scan conversion algorithms more efficiently than a single shared bus architecture.
Resumo:
In our earlier work ([1]) we proposed WLAN Manager (or WM) a centralised controller for QoS management of infrastructure WLANs based on the IEEE 802.11 DCF standards. The WM approach is based on queueing and scheduling packets in a device that sits between all traffic flowing between the APs and the wireline LAN, requires no changes to the AP or the STAs, and can be viewed as implementing a "Split-MAC" architecture. The objectives of WM were to manage various TCP performance related issues (such as the throughput "anomaly" when STAs associate with an AP with mixed PHY rates, and upload-download unfairness induced by finite AP buffers), and also to serve as the controller for VoIP admission control and handovers, and for other QoS management measures. In this paper we report our experiences in implementing the proposals in [1]: the insights gained, new control techniques developed, and the effectiveness of the WM approach in managing TCP performance in an infrastructure WLAN. We report results from a hybrid experiment where a physical WM manages actual TCP controlled packet flows between a server and clients, with the WLAN being simulated, and also from a small physical testbed with an actual AP.
Resumo:
A period timing device suitable for processing laser Doppler anemometer signals has been described here. The important features of this instrument are: it is inexpensive, simple to operate, and easy to fabricate. When the concentration of scattering particles is low the Doppler signal is in the form of a burst and the Doppler frequency is measured by timing the zero crossings of the signal. But the presence of noise calls for the use of validation criterion, and a 5–8 cycles comparison has been used in this instrument. Validation criterion requires the differential count between the 5 and 8 cycles to be multiplied by predetermined numbers that prescribe the accuracy of measurement. By choosing these numbers to be binary numbers, much simplification in circuit design has been accomplished since this permits the use of shift registers for multiplication. Validation accuracies of 1.6%, 3.2%, 6.3%, and 12.5% are possible with this device. The design presented here is for a 16-bit processor and uses TTL components. By substituting Schottky barrier TTLs the clock frequency can be increased from about 10 to 30 MHz resulting in an extension in the range of the instrument. Review of Scientific Instruments is copyrighted by The American Institute of Physics.