63 resultados para 120100 ARCHITECTURE
em Indian Institute of Science - Bangalore - Índia
Resumo:
Simultaneous consideration of both performance and reliability issues is important in the choice of computer architectures for real-time aerospace applications. One of the requirements for such a fault-tolerant computer system is the characteristic of graceful degradation. A shared and replicated resources computing system represents such an architecture. In this paper, a combinatorial model is used for the evaluation of the instruction execution rate of a degradable, replicated resources computing system such as a modular multiprocessor system. Next, a method is presented to evaluate the computation reliability of such a system utilizing a reliability graph model and the instruction execution rate. Finally, this computation reliability measure, which simultaneously describes both performance and reliability, is applied as a constraint in an architecture optimization model for such computing systems. Index Terms-Architecture optimization, computation
Resumo:
This paper presents the architecture and the VHDL design of an integer 2-D DCT used in the H.264/AVC. The 2-D DCT computation is performed by exploiting it’s orthogonality and separability property. The symmetry of the forward and inverse transform is used in this implementation. To reduce the computation overhead for the addition, subtraction and multiplication operations, we analyze the suitability of carry-free position independent residue number system (RNS) for the implementation of 2-D DCT. The implementation has been carried out in VHDL for Altera FPGA. We used the negative number representation in RNS, bit width analysis of the transforms and dedicated registers present in the Logic element of the FPGA to optimize the area. The complexity and efficiency analysis show that the proposed architecture could provide higher through-put.
Resumo:
This paper presents the architecture of a fault-tolerant, special-purpose multi-microprocessor system for solving Partial Differential Equations (PDEs). The modular nature of the architecture allows the use of hundreds of Processing Elements (PEs) for high throughput. Its performance is evaluated by both analytical and simulation methods. The results indicate that the system can achieve high operation rates and is not sensitive to inter-processor communication delay.
Resumo:
Packet forwarding is a memory-intensive application requiring multiple accesses through a trie structure. With the requirement to process packets at line rates, high-performance routers need to forward millions of packets every second with each packet needing up to seven memory accesses. Earlier work shows that a single cache for the nodes of a trie can reduce the number of external memory accesses. It is observed that the locality characteristics of the level-one nodes of a trie are significantly different from those of lower level nodes. Hence, we propose a heterogeneously segmented cache architecture (HSCA) which uses separate caches for level-one and lower level nodes, each with carefully chosen sizes. Besides reducing misses, segmenting the cache allows us to focus on optimizing the more frequently accessed level-one node segment. We find that due to the nonuniform distribution of nodes among cache sets, the level-one nodes cache is susceptible t high conflict misses. We reduce conflict misses by introducing a novel two-level mapping-based cache placement framework. We also propose an elegant way to fit the modified placement function into the cache organization with minimal increase in access time. Further, we propose an attribute preserving trace generation methodology which emulates real traces and can generate traces with varying locality. Performanc results reveal that our HSCA scheme results in a 32 percent speedup in average memory access time over a unified nodes cache. Also, HSC outperforms IHARC, a cache for lookup results, with as high as a 10-fold speedup in average memory access time. Two-level mappin further enhances the performance of the base HSCA by up to 13 percent leading to an overall improvement of up to 40 percent over the unified scheme.
Resumo:
In this paper, three parallel polygon scan conversion algorithms have been proposed, and their performance when executed on a shared bus architecture has been compared. It has been shown that the parallel algorithm that does not use edge coherence performs better than those that use edge coherence. Further, a multiprocessing architecture has been proposed to execute the parallel polygon scan conversion algorithms more efficiently than a single shared bus architecture.
Resumo:
With the intent of probing the feasibility of employing annulation as a tactic to engender axial rich conformations in nucleoside analogues, two adenine-derived, ``conformationally restricted'' nucleocylitols, 9 and 10, have been conceptualized as representatives of a hitherto unexplored class of nucleic acid base-cyclitol hybrids. A general synthetic strategy, with an inherent scope for diversification, allowed rapid functionalization of indane and tetralin to furnish 9 and 10 respectively in fair yield. Single-crystal X-ray diffraction analysis revealed that the two nucleocyclitols under study, though homologous, present completely dissimilar modes of molecular packing, marked, in particular, by the nature of involvement of the adenynyl NH2 group in the supramolecular assembly. In addition, the crystal structures of 9 and 10 also exhibit two different conformations of the functionalized cyclohexane ring. Thus, while the six-membered carbocycle in cyclopenta-annulated 9 exists in the expected chair (C) conformation that in cyclohexaannulated 10, which crystallizes as a dihydrate, shows an unusual twist-boat (TB) conformation. From a close analysis of the (HNMR)-H-1 spectroscopic data recorded for 9 and 10 in CD3OD, it was possible to put forth a putative explanation for the uncanny conformational preferences of crystalline 9 and 10.
Resumo:
Run-time interoperability between different applications based on H.264/AVC is an emerging need in networked infotainment, where media delivery must match the desired resolution and quality of the end terminals. In this paper, we describe the architecture and design of a polymorphic ASIC to support this. The H.264 decoding flow is partitioned into modules, such that the polymorphic ASIC meets the design goals of low-power, low-area, high flexibility, high throughput and fast interoperability between different profiles and levels of H.264. We demonstrate the idea with a multi-mode decoder that can decode baseline, main and high profile H.264 streams and can interoperate at run.time across these profiles. The decoder is capable of processing frame sizes of up to 1024 times 768 at 30 fps. The design synthesized with UMC 0.13 mum technology, occupies 250 k gates and runs at 100 MHz.
Resumo:
Copper(I) complexes with {Cu(μ2-S)N}4 and {Cu(μ3-S)N}12 core portions of butterfly-shaped or double wheel architectures have been isolated in the reaction of Cu(I) with the Schiff base ligand C6H4(CHNC6H4S)2, aiso-abtâ, under different conditions. View the MathML source containing the tetranuclear electroneutral complex View the MathML source is formed by the reaction of CuI in acetonitrilic solution and recrystallization from DMF, whereas View the MathML source containing dodecanuclear View the MathML source wheels is accessible starting from CuBF4. Complexes 2 and 4 represent the first examples of cyclic complexes with the same overall stoichiometry but different ring sizes. The ligand induces two different coordination environments around copper(I) by switching between μ2- and μ3-sulfur bridging modes.
Resumo:
In modern wireline and wireless communication systems, Viterbi decoder is one of the most compute intensive and essential elements. Each standard requires a different configuration of Viterbi decoder. Hence there is a need to design a flexible reconfigurable Viterbi decoder to support different configurations on a single platform. In this paper we present a reconfigurable Viterbi decoder which can be reconfigured for standards such as WCDMA, CDMA2000, IEEE 802.11, DAB, DVB, and GSM. Different parameters like code rate, constraint length, polynomials and truncation length can be configured to map any of the above mentioned standards. Our design provides higher throughput and scalable power consumption in various configuration of the reconfigurable Viterbi decoder. The power and throughput can also be optimized for different standards.
Resumo:
REDEFINE is a reconfigurable SoC architecture that provides a unique platform for high performance and low power computing by exploiting the synergistic interaction between coarse grain dynamic dataflow model of computation (to expose abundant parallelism in applications) and runtime composition of efficient compute structures (on the reconfigurable computation resources). We propose and study the throttling of execution in REDEFINE to maximize the architecture efficiency. A feature specific fast hybrid (mixed level) simulation framework for early in design phase study is developed and implemented to make the huge design space exploration practical. We do performance modeling in terms of selection of important performance criteria, ranking of the explored throttling schemes and investigate effectiveness of the design space exploration using statistical hypothesis testing. We find throttling schemes which give appreciable (24.8%) overall performance gain in the architecture and 37% resource usage gain in the throttling unit simultaneously.
Resumo:
H.264 video standard achieves high quality video along with high data compression when compared to other existing video standards. H.264 uses context-based adaptive variable length coding (CAVLC) to code residual data in Baseline profile. In this paper we describe a novel architecture for CAVLC decoder including coeff-token decoder, level decoder total-zeros decoder and run-before decoder UMC library in 0.13 mu CMOS technology is used to synthesize the proposed design. The proposed design reduces chip area and improves critical path performance of CAVLC decoder in comparison with [1]. Macroblock level (including luma and chroma) pipeline processing for CAVLC is implemented with an average of 141 cycles (including pipeline buffering) per macroblock at 250MHz clock frequency. To compare our results with [1] clock frequency is constrained to 125MHz. The area required for the proposed architecture is 17586 gates, which is 22.1% improvement in comparison to [1]. We obtain a throughput of 1.73 * 10(6) macroblocks/second, which is 28% higher than that reported in [1]. The proposed design meets the processing requirement of 1080HD [5] video at 30frames/seconds.
Resumo:
The physical design of a VLSI circuit involves circuit partitioning as a subtask. Typically, it is necessary to partition a large electrical circuit into several smaller circuits such that the total cross-wiring is minimized. This problem is a variant of the more general graph partitioning problem, and it is known that there does not exist a polynomial time algorithm to obtain an optimal partition. The heuristic procedure proposed by Kernighan and Lin1,2 requires O(n2 log2n) time to obtain a near-optimal two-way partition of a circuit with n modules. In the VLSI context, due to the large problem size involved, this computational requirement is unacceptably high. This paper is concerned with the hardware acceleration of the Kernighan-Lin procedure on an SIMD architecture. The proposed parallel partitioning algorithm requires O(n) processors, and has a time complexity of O(n log2n). In the proposed scheme, the reduced array architecture is employed with due considerations towards cost effectiveness and VLSI realizability of the architecture.The authors are not aware of any earlier attempts to parallelize a circuit partitioning algorithm in general or the Kernighan-Lin algorithm in particular. The use of the reduced array architecture is novel and opens up the possibilities of using this computing structure for several other applications in electronic design automation.
Resumo:
Modern wireline and wireless communication devices are multimode and multifunctional communication devices. In order to support multiple standards on a single platform, it is necessary to develop a reconfigurable architecture that can provide the required flexibility and performance. The Channel decoder is one of the most compute intensive and essential elements of any communication system. Most of the standards require a reconfigurable Channel decoder that is capable of performing Viterbi decoding and Turbo decoding. Furthermore, the Channel decoder needs to support different configurations of Viterbi and Turbo decoders. In this paper, we propose a reconfigurable Channel decoder that can be reconfigured for standards such as WCDMA, CDMA2000, IEEE802.11, DAB, DVB and GSM. Different parameters like code rate, constraint length, polynomials and truncation length can be configured to map any of the above mentioned standards. A multiprocessor approach has been followed to provide higher throughput and scalable power consumption in various configurations of the reconfigurable Viterbi decoder and Turbo decoder. We have proposed A Hybrid register exchange approach for multiprocessor architecture to minimize power consumption.
Resumo:
In this paper, we propose a systolic architecture for hidden-surface removal. Systolic architecture is a kind of parallel architecture best known for its easy VLSI implementability. After discussing the design details of the architecture, we present the results of the simulation experiments conducted in order to evaluate the performance of the architecture.