Biblioteca Digital

176 resultados para Graphics hardware

An interactive graphics system for 2-D drawing and design

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper is about a software system, GRASS-Graphic Software System for 2-D drawing and design—which has been implemented on a PDP-11/35 system with RSX-11M operating system. It is a low cost interactive graphics system for the design of two dimensional drawings and uses a minimum of hardware. It provides comprehensive facilities for creating, editing, storing and retrieving pictures. It has been implemented in the language Pascal and has the potential to be used as a powerful data-imputting tool for a design-automation system. The important features of the system are its low cost, software character generation and a user-trainable character recognizer, which has been included.

Design of a Multibus-compatible colour graphics subsystem

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Colour graphics subsystems can be used in a variety of applications such as high-end business graphics, low-end scientific computations, and for realtime display of process control diagrams. The design of such a subsystem is shown. This subsystem can be added to any Multibus-compatible microcomputer system. The use of an NEC 7220 graphics display controller chip has simplified the design to a considerable extent. CGRAM (CORE graphics on Multibus), a comprehensive subset of the CORE graphics standard package, is supported on the subsystem.

Implementation of an interactive relational graphics database

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The benefits that accrue from the use of design database include (i) reduced costs of preparing data for application programs and of producing the final specification, and (ii) possibility of later usage of data stored in the database for other applications related to Computer Aided Engineering (CAE). An INTEractive Relational GRAphics Database (INTERGRAD) based on relational models has been developed to create, store, retrieve and update the data related to two dimensional drawings. INTERGRAD provides two languages, Picture Definition Language (PDL) and Picture Manipulation Language (PML). The software package has been implemented on a PDP 11/35 system under the RSX-11M version 3.1 operating system and uses the graphics facility consisting of a VT-11 graphics terminal, the DECgraphic 11 software and an input device, a lightpen.

Synergistic Execution of Stream Programs on Multicores with Accelerators

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The StreamIt programming model has been proposed to exploit parallelism in streaming applications oil general purpose multicore architectures. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on accelerators such as Graphics Processing Units (GPUs) or CellBE which support abundant parallelism in hardware. In this paper, we describe a novel method to orchestrate the execution of if StreamIt program oil a multicore platform equipped with an accelerator. The proposed approach identifies, using profiling, the relative benefits of executing a task oil the superscalar CPU cores and the accelerator. We formulate the problem of partitioning the work between the CPU cores and the GPU, taking into account the latencies for data transfers and the required buffer layout transformations associated with the partitioning, as all integrated Integer Linear Program (ILP) which can then be solved by an ILP solver. We also propose an efficient heuristic algorithm for the work-partitioning between the CPU and the GPU, which provides solutions which are within 9.05% of the optimal solution on an average across the benchmark Suite. The partitioned tasks are then software pipelined to execute oil the multiple CPU cores and the Streaming Multiprocessors (SMs) of the GPU. The software pipelining algorithm orchestrates the execution between CPU cores and the GPU by emitting the code for the CPU and the GPU, and the code for the required data transfers. Our experiments on a platform with 8 CPU cores and a GeForce 8800 GTS 512 GPU show a geometric mean speedup of 6.94X with it maximum of 51.96X over it single threaded CPU execution across the StreamIt benchmarks. This is a 18.9% improvement over it partitioning strategy that maps only the filters that cannot be executed oil the GPU - the filters with state that is persistent across firings - onto the CPU.

Software Pipelined Execution of Stream Programs on GPUs

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, and a set of communication channels between them. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on modern Graphics Processing Units (GPUs), as they support abundant parallelism in hardware. In this paper, we describe the challenges in mapping StreamIt to GPUs and propose an efficient technique to software pipeline the execution of stream programs on GPUs. We formulate this problem - both scheduling and assignment of filters to processors - as an efficient Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling utilizes both the scalar units in GPU, to exploit data parallelism, and multiprocessors, to exploit task and pipelin parallelism. Further it takes into consideration the synchronization and bandwidth limitations of GPUs, and yields speedups between 1.87X and 36.83X over a single threaded CPU.

A Hardware Grid Simulator to Test Grid-Connected Inverter Systems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Grid-connected systems when put to use at the site would experience scenarios like voltage sag, voltage swell, frequency deviations and unbalance which are common in the real world grid. When these systems are tested at laboratory, these scenarios do not exist and an almost stiff voltage source is what is usually seen. But, to qualify the grid-connected systems to operate at the site, it becomes essential to test them under the grid conditions mentioned earlier. The grid simulator is a hardware that can be programmed to generate some of the typical conditions experienced by the grid-connected systems at site. It is an inverter that is controlled to act like a voltage source in series with a grid impedance. The series grid impedance is emulated virtually within the inverter control rather than through physical components, thus avoiding the losses and the need for bulky reactive components. This paper describes the design of a grid simulator. Control implementation issues are highlighted in the experimental results.

Parallel min-cut placement on reduced hardware SIMD architecture.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Massively parallel SIMD computing is applied to obtain an order of magnitude improvement in the executional speed of an important algorithm in VLSI design automation. The physical design of a VLSI circuit involves logic module placement as a subtask. The paper is concerned with accelerating the well known Min-cut placement technique for logic cell placement. The inherent parallelism of the Min-cut algorithm is identified, and it is shown that a parallel machine based on the efficient execution of the placement procedure.

Toward Effective Scalar Hardware for Highly Vectorizable Applications

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The performance of a program will ultimately be limited by its serial (scalar) portion, as pointed out by Amdahl′s Law. Reported studies thus far of instruction-level parallelism have mixed data-parallel program portions with scalar program portions, often leading to contradictory and controversial results. We report an instruction-level behavioral characterization of scalar code containing minimal data-parallelism, extracted from highly vectorized programs of the PERFECT benchmark suite running on a Cray Y-MP system. We classify scalar basic blocks according to their instruction mix, characterize the data dependencies seen in each class, and, as a first step, measure the maximum intrablock instruction-level parallelism available. We observe skewed rather than balanced instruction distributions in scalar code and in individual basic block classes of scalar code; nonuniform distribution of parallelism across instruction classes; and, as expected, limited available intrablock parallelism. We identify frequently occurring data-dependence patterns and discuss new instructions to reduce latency. Toward effective scalar hardware, we study latency-pipelining trade-offs and restricted multiple instruction issue mechanisms.

Accelerating frequency-domain diffuse optical tomographic image reconstruction using graphics processing units

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Diffuse optical tomographic image reconstruction uses advanced numerical models that are computationally costly to be implemented in the real time. The graphics processing units (GPUs) offer desktop massive parallelization that can accelerate these computations. An open-source GPU-accelerated linear algebra library package is used to compute the most intensive matrix-matrix calculations and matrix decompositions that are used in solving the system of linear equations. These open-source functions were integrated into the existing frequency-domain diffuse optical image reconstruction algorithms to evaluate the acceleration capability of the GPUs (NVIDIA Tesla C 1060) with increasing reconstruction problem sizes. These studies indicate that single precision computations are sufficient for diffuse optical tomographic image reconstruction. The acceleration per iteration can be up to 40, using GPUs compared to traditional CPUs in case of three-dimensional reconstruction, where the reconstruction problem is more underdetermined, making the GPUs more attractive in the clinical settings. The current limitation of these GPUs in the available onboard memory (4 GB) that restricts the reconstruction of a large set of optical parameters, more than 13, 377. (C) 2010 Society of Photo-Optical Instrumentation Engineers. DOI: 10.1117/1.3506216]

Graphics Processing Units for the Real-time Linear Elastostatic Simulation of Liver

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Biomedical engineering solutions like surgical simulators need High Performance Computing (HPC) to achieve real-time performance. Graphics Processing Units (GPUs) offer HPC capabilities at low cost and low power consumption. In this work, it is demonstrated that a liver which is discretized by about 2500 finite element nodes, can be graphically simulated in realtime, by making use of a GPU. Present work takes into consideration the time needed for the data transfer from CPU to GPU and back from GPU to CPU. Although behaviour of liver is very complicated, present computer simulation assumes linear elastostatics. One needs to use the commercial software ANSYS to obtain the global stiffness matrix of the liver. Results show that GPUs are useful for the real-time graphical simulation of liver, which in turn is needed in simulators that are used for training surgeons in laparoscopic surgery. Although the computer simulation should involve rendering also, neither rendering, nor the time needed for rendering and displaying the liver on a screen, is considered in the present work. The present work is just a demonstration of a concept; the concept is not really implemented and validated. Future work is to develop software which can accomplish real-time and very realistic graphical simulation of liver, with rendered image of liver on the screen changing in real-time according to the position of the surgical tool tip approximated as the mouse cursor in 3D.

Hardware implementation of 4x4 DCT/Quantization block using multiplication and error-free algorithm

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The 4ÃÂ4 discrete cosine transform is one of the most important building blocks for the emerging video coding standard, viz. H.264. The conventional implementation does some approximation to the transform matrix elements to facilitate integer arithmetic, for which hardware is suitably prepared. Though the transform coding does not involve any multiplications, quantization process requires sixteen 16-bit multiplications. The algorithm used here eliminates the process of approximation in transform coding and multiplication in the quantization process, by usage of algebraic integer coding. We propose an area-efficient implementation of the transform and quantization blocks based on the algebraic integer coding. The designs were synthesized with 90 nm TSMC CMOS technology and were also implemented on a Xilinx FPGA. The gate counts and throughput achievable in this case are 7000 and 125 Msamples/sec.

Derating Based Hardware Optimizations in Soft Error Tolerant Designs

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Ensuring reliable operation over an extended period of time is one of the biggest challenges facing present day electronic systems. The increased vulnerability of the components to atmospheric particle strikes poses a big threat in attaining the reliability required for various mission critical applications. Various soft error mitigation methodologies exist to address this reliability challenge. A general solution to this problem is to arrive at a soft error mitigation methodology with an acceptable implementation overhead and error tolerance level. This implementation overhead can then be reduced by taking advantage of various derating effects like logical derating, electrical derating and timing window derating, and/or making use of application redundancy, e. g. redundancy in firmware/software executing on the so designed robust hardware. In this paper, we analyze the impact of various derating factors and show how they can be profitably employed to reduce the hardware overhead to implement a given level of soft error robustness. This analysis is performed on a set of benchmark circuits using the delayed capture methodology. Experimental results show upto 23% reduction in the hardware overhead when considering individual and combined derating factors.

Interplay of software bloat, hardware energy proportionality and system bottlenecks

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In large flexible software systems, bloat occurs in many forms, causing excess resource utilization and resource bottlenecks. This results in lost throughput and wasted joules. However, mitigating bloat is not easy; efforts are best applied where savings would be substantial. To aid this we develop an analytical model establishing the relation between bottleneck in resources, bloat, performance and power. Analyses with the model places into perspective results from the first experimental study of the power-performance implications of bloat. In the experiments we find that while bloat reduction can provide as much as 40% energy savings, the degree of impact depends on hardware and software characteristics. We confirm predictions from our model with selected results from our experimental study. Our findings show that a software-only view is inadequate when assessing the effects of bloat. The impact of bloat on physical resource usage and power should be understood for a full systems perspective to properly deploy bloat reduction solutions and reap their power-performance benefits.

USHA: unified software and hardware architecture for video decoding

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Video decoders used in emerging applications need to be flexible to handle a large variety of video formats and deliver scalable performance to handle wide variations in workloads. In this paper we propose a unified software and hardware architecture for video decoding to achieve scalable performance with flexibility. The light weight processor tiles and the reconfigurable hardware tiles in our architecture enable software and hardware implementations to co-exist, while a programmable interconnect enables dynamic interconnection of the tiles. Our process network oriented compilation flow achieves realization agnostic application partitioning and enables seamless migration across uniprocessor, multi-processor, semi hardware and full hardware implementations of a video decoder. An application quality of service aware scheduler monitors and controls the operation of the entire system. We prove the concept through a prototype of the architecture on an off-the-shelf FPGA. The FPGA prototype shows a scaling in performance from QCIF to 1080p resolutions in four discrete steps. We also demonstrate that the reconfiguration time is short enough to allow migration from one configuration to the other without any frame loss.

A study of the speed and the accuracy of the Boundary Element Method as applied to the computational simulation of biological organs

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this work, first a Fortran code is developed for three dimensional linear elastostatics using constant boundary elements; the code is based on a MATLAB code developed by the author earlier. Next, the code is parallelized using BLACS, MPI, and ScaLAPACK. Later, the parallelized code is used to demonstrate the usefulness of the Boundary Element Method (BEM) as applied to the realtime computational simulation of biological organs, while focusing on the speed and accuracy offered by BEM. A computer cluster is used in this part of the work. The commercial software package ANSYS is used to obtain the `exact' solution against which the solution from BEM is compared; analytical solutions, wherever available, are also used to establish the accuracy of BEM. A pig liver is the biological organ considered. Next, instead of the computer cluster, a Graphics Processing Unit (GPU) is used as the parallel hardware. Results indicate that BEM is an interesting choice for the simulation of biological organs. Although the use of BEM for the simulation of biological organs is not new, the results presented in the present study are not found elsewhere in the literature. Also, a serial MATLAB code, and both serial and parallel versions of a Fortran code, which can solve three dimensional (3D) linear elastostatic problems using constant boundary elements, are provided as supplementary files that can be freely downloaded.

«
1
2
3
4
5
6
7
8
...
11
12
»