21 results for perforated vent tiles
Abstract:
Video decoders used in emerging applications need to be flexible enough to handle a large variety of video formats and must deliver scalable performance to handle wide variations in workload. In this paper we propose a unified software and hardware architecture for video decoding that achieves scalable performance with flexibility. The lightweight processor tiles and the reconfigurable hardware tiles in our architecture enable software and hardware implementations to coexist, while a programmable interconnect enables dynamic interconnection of the tiles. Our process-network-oriented compilation flow achieves realization-agnostic application partitioning and enables seamless migration across uniprocessor, multiprocessor, semi-hardware, and full-hardware implementations of a video decoder. An application quality-of-service-aware scheduler monitors and controls the operation of the entire system. We prove the concept through a prototype of the architecture on an off-the-shelf FPGA. The FPGA prototype scales in performance from QCIF to 1080p resolutions in four discrete steps. We also demonstrate that the reconfiguration time is short enough to allow migration from one configuration to another without any frame loss.
Abstract:
It has recently been shown that the acoustic performance of extended-tube expansion chambers can be improved substantially by making the extended inlet and outlet equal to half and quarter chamber lengths, duly incorporating the end corrections due to the evanescent higher-order modes that are generated at the discontinuities. Such chambers, however, suffer from two disadvantages: high back pressure and the generation of aerodynamic noise at the area discontinuities. Both can be overcome by means of a perforated bridge between the extended inlet and the extended outlet. This paper deals with the design, or tuning, of these extended concentric tube resonators.
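The tuning rule in this abstract can be illustrated with a small calculation. The sketch below is not the paper's design procedure. It assumes that the inlet extension is the one tuned to half the chamber length and the outlet extension to a quarter, and that the acoustic length of each extension equals its geometric length plus an end correction. The end-correction values are inputs here, because the abstract gives no formula for them; the function name and parameters are hypothetical.

```python
def tuned_extension_lengths(chamber_length, inlet_end_correction, outlet_end_correction):
    """Return the geometric (inlet, outlet) extension lengths.

    Assumption: acoustic length = geometric length + end correction, with the
    inlet tuned to L/2 and the outlet tuned to L/4. All lengths share one unit.
    """
    inlet = chamber_length / 2.0 - inlet_end_correction
    outlet = chamber_length / 4.0 - outlet_end_correction
    if inlet <= 0 or outlet <= 0:
        raise ValueError("end corrections exceed the tuned acoustic lengths")
    return inlet, outlet


if __name__ == "__main__":
    # Illustrative numbers only (metres); not taken from the paper.
    print(tuned_extension_lengths(0.40, 0.010, 0.008))
```

The geometric lengths come out slightly shorter than L/2 and L/4, which is the practical effect of incorporating the end corrections.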
Abstract:
Most stencil computations allow tile-wise concurrent start, i.e., there always exists a face of the iteration space and a set of tiling hyperplanes such that all tiles along that face can be started concurrently. This provides load balance and maximizes parallelism. However, existing automatic tiling frameworks often choose hyperplanes that lead to pipelined start-up and load imbalance. We address this issue with a new tiling technique that ensures concurrent start-up as well as perfect load balance whenever possible. We first provide necessary and sufficient conditions on tiling hyperplanes to enable concurrent start for programs with affine data accesses. We then provide an approach to find such hyperplanes. Experimental evaluation on a 12-core Intel Westmere shows that our code outperforms a tuned domain-specific stencil code generator by 4% to 27%, and previous compiler techniques by a factor of 2x to 10.14x.
Abstract:
Exhaust noise in engines has always been a major source of automotive noise. Challenges for muffler design have been constraints on size, back pressure, and, of course, the cost. Designing for sufficient insertion loss at the engine firing frequency and the first few harmonics has been the biggest challenge. Most advances in the design of efficient mufflers have resulted from linear plane wave theory, making use of the transfer matrix method. This review paper deals with evaluating approximate source characteristics required for prediction of the unmuffled intake and exhaust noise, making use of the electroacoustical analogies. In the last few years, significant advances have been made in the analysis of variable area perforated ducts, transverse plane wave analysis of short elliptical as well as circular chambers, double-tuned expansion chambers and concentric tube resonators, catalytic converters, diesel particulate filters, air cleaners, etc. The development of long strand fibrous materials that can be used in hot exhaust systems without binders has led to the use of combination mufflers in exhaust systems. Breakthroughs have been achieved in the prediction and control of breakout noise from the elliptical and circular muffler shell as well as the end plates of typical mufflers. Diesel particulate filters and inlet air cleaners have also been modeled acoustically. Some of these recent advances are the subject of this review paper.
Abstract:
The complexity of mufflers generally introduces considerable pressure drop, which adversely affects engine performance. Little literature is available on the pressure drop across perforates. In this paper, the stagnation pressure drop across perforated muffler elements has been measured experimentally, and generalized expressions have been developed for the pressure loss across cross-flow expansion and cross-flow contraction elements. A flow resistance model available in the literature has been used to determine analytically the flow distribution, and thereby the pressure drop, of mufflers. A generalized expression has been derived here for evaluating the equivalent flow resistance of parallel flow paths. Expressions for the flow resistance across perforated elements, derived by means of flow experiments, have been implemented in the flow resistance network. The results have been validated with experimental data. The newly developed integrated flow resistance networks make it possible to determine the normalized stagnation pressure drop of commercial automotive mufflers, thus enabling an efficient flow-acoustic design of silencing systems.
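The abstract does not reproduce the paper's generalized expression, so the following is only a sketch of how parallel flow paths combine under one common assumption: that each element obeys a quadratic law, pressure drop = R × Q², where Q is the volume flow through it. Under that assumption, branches that share the same pressure drop combine as 1/sqrt(R_eq) = Σ 1/sqrt(R_i), and elements in series simply add. The function names are hypothetical and are not taken from the paper.

```python
import math


def series_resistance(resistances):
    """Elements in series carry the same flow, so quadratic resistances add."""
    return sum(resistances)


def parallel_resistance(resistances):
    """Equivalent resistance of parallel branches, assuming dp = R * Q**2.

    Every branch sees the same dp, so Q_i = sqrt(dp / R_i) and the branch
    flows add up, which gives 1/sqrt(R_eq) = sum(1/sqrt(R_i)).
    """
    return 1.0 / sum(1.0 / math.sqrt(r) for r in resistances) ** 2


def flow_split(total_flow, resistances):
    """Flow through each parallel branch for a given total flow."""
    weights = [1.0 / math.sqrt(r) for r in resistances]
    scale = total_flow / sum(weights)
    return [w * scale for w in weights]


if __name__ == "__main__":
    branches = [4.0, 4.0]  # illustrative values in arbitrary consistent units
    print(parallel_resistance(branches))
    print(flow_split(1.0, branches))
```

With a different pressure-flow law (for example, a linear one), the combination rule changes, so the quadratic exponent should be treated as a modelling choice rather than a given.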
Abstract:
Multi-GPU machines are increasingly used in high-performance computing. Each GPU in such a machine has its own memory and does not share an address space with either the host CPU or the other GPUs. Hence, applications that use multiple GPUs have to allocate and manage data on each GPU manually. Existing works that propose to automate data allocation for GPUs have limitations and inefficiencies in terms of allocation sizes, exploiting reuse, transfer costs, and scalability. We propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines. We call it the Bounding-Box-based Memory Manager (BBMM). At runtime, BBMM can perform standard set operations such as union, intersection, and difference, and can find subset and superset relations, on hyperrectangular regions of array data (bounding boxes). It uses these operations, along with some compiler assistance, to identify, allocate, and manage the data required by applications in terms of disjoint bounding boxes. This allows it to (1) allocate exactly or nearly as much data as is required by the computations running on each GPU, (2) efficiently track buffer allocations and hence maximize data reuse across tiles and minimize data transfer overhead, and (3) as a result, maximize utilization of the combined memory on multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling schemes, whether static or dynamic. Experiments run on a four-GPU machine with various scientific programs showed that BBMM reduces data allocations on each GPU by up to 75% compared to current allocation schemes, yields performance of at least 88% of manually written code, and allows excellent weak scaling.
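The set operations on bounding boxes mentioned in this abstract can be sketched as follows. This is a minimal illustration, not BBMM's implementation: the representation (one inclusive `(lo, hi)` index range per array dimension) and all function names are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one inclusive (lo, hi) range per dimension.
Box = Tuple[Tuple[int, int], ...]


def volume(box: Box) -> int:
    """Number of array elements covered by the box."""
    n = 1
    for lo, hi in box:
        n *= hi - lo + 1
    return n


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Overlap of two boxes, or None when they are disjoint."""
    out = []
    for (alo, ahi), (blo, bhi) in zip(a, b):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo > hi:
            return None
        out.append((lo, hi))
    return tuple(out)


def is_subset(a: Box, b: Box) -> bool:
    """True when box a lies entirely inside box b."""
    return all(blo <= alo and ahi <= bhi for (alo, ahi), (blo, bhi) in zip(a, b))


def difference(a: Box, b: Box) -> List[Box]:
    """Split a minus b into disjoint boxes, peeling one dimension at a time."""
    cut = intersect(a, b)
    if cut is None:
        return [a]
    pieces: List[Box] = []
    rest = list(a)  # dimensions already processed are narrowed to the overlap
    for d, ((lo, hi), (clo, chi)) in enumerate(zip(a, cut)):
        if lo < clo:  # slab below the overlap in dimension d
            pieces.append(tuple(rest[:d] + [(lo, clo - 1)] + rest[d + 1:]))
        if chi < hi:  # slab above the overlap in dimension d
            pieces.append(tuple(rest[:d] + [(chi + 1, hi)] + rest[d + 1:]))
        rest[d] = (clo, chi)
    return pieces


if __name__ == "__main__":
    resident = ((0, 9), (0, 9))    # region already held in a GPU buffer
    needed = ((5, 14), (5, 14))    # region the next tile reads
    print(difference(needed, resident))  # parts that would still need a transfer
```

In this illustration, the difference between the region a tile needs and the region already resident is the data that still has to be transferred, while the intersection is the data that can be reused.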