37 resultados para Architecture design
Resumo:
Molecular self-assembly is of key importance for the rational design of advanced materials. To investigate the causal relation between molecular structure and the consequent self-assembled microstructure, self-assembled tubules of diacetylenic lipids were studied. Circular-dichroism studies give experimental evidence that the formation of tubules is driven by chiral molecular packing, in agreement with recent theories of tubules. On the basis of these results, a molecular mechanism for the formation of tubules is proposed.
Resumo:
Wave pipelining is a design technique for increasing the throughput of a digital circuit or system without introducing pipelining registers between adjacent combinational logic blocks in the circuit/system. However, this requires balancing of the delays along all the paths from the input to the output which comes the way of its implementation. Static CMOS is inherently susceptible to delay variation with input data, and hence, receives a low priority for wave pipelined digital design. On the other hand, ECL and CML, which are amenable to wave pipelining, lack the compactness and low power attributes of CMOS. In this paper we attempt to exploit wave pipelining in CMOS technology. We use a single generic building block in Normal Process Complementary Pass Transistor Logic (NPCPL), modeled after CPL, to achieve equal delay along all the propagation paths in the logic structure. An 8×8 b multiplier is designed using this logic in a 0.8 ?m technology. The carry-save multiplier architecture is modified suitably to support wave pipelining, viz., the logic depth of all the paths are made identical. The 1 mm×0.6 mm multiplier core supports a throughput of 400 MHz and dissipates a total power of 0.6 W. We develop simple enhancements to the NPCPL building blocks that allow the multiplier to sustain throughputs in excess of 600 MHz. The methodology can be extended to introduce wave pipelining in other circuits as well
Resumo:
We describe the design of a directory-based shared memory architecture on a hierarchical network of hypercubes. The distributed directory scheme comprises two separate hierarchical networks for handling cache requests and transfers. Further, the scheme assumes a single address space and each processing element views the entire network as contiguous memory space. The size of individual directories stored at each node of the network remains constant throughout the network. Although the size of the directory increases with the network size, the architecture is scalable. The results of the analytical studies demonstrate superior performance characteristics of our scheme compared with those of other schemes.
Resumo:
Regular Expressions are generic representations for a string or a collection of strings. This paper focuses on implementation of a regular expression matching architecture on reconfigurable fabric like FPGA. We present a Nondeterministic Finite Automata based implementation with extended regular expression syntax set compared to previous approaches. We also describe a dynamically reconfigurable generic block that implements the supported regular expression syntax. This enables formation of the regular expression hardware by a simple cascade of generic blocks as well as a possibility for reconfiguring the generic blocks to change the regular expression being matched. Further,we have developed an HDL code generator to obtain the VHDL description of the hardware for any regular expression set. Our optimized regular expression engine achieves a throughput of 2.45 Gbps. Our dynamically reconfigurable regular expression engine achieves a throughput of 0.8 Gbps using 12 FPGA slices per generic block on Xilinx Virtex2Pro FPGA.
Resumo:
Today's feature-rich multimedia products require embedded system solution with complex System-on-Chip (SoC) to meet market expectations of high performance at a low cost and lower energy consumption. The memory architecture of the embedded system strongly influences critical system design objectives like area, power and performance. Hence the embedded system designer performs a complete memory architecture exploration to custom design a memory architecture for a given set of applications. Further, the designer would be interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforces short design cycle time. In this paper we address the multi-level multi-objective memory architecture exploration problem through a combination of exhaustive-search based memory exploration at the outer level and a two step based integrated data layout for SPRAM-Cache based architectures at the inner level. We present a two step integrated approach for data layout for SPRAM-Cache based hybrid architectures with the first step as data-partitioning that partitions data between SPRAM and Cache, and the second step is the cache conscious data layout. We formulate the cache-conscious data layout as a graph partitioning problem and show that our approach gives up to 34% improvement over an existing approach and also optimizes the off-chip memory address space. We experimented our approach with 3 embedded multimedia applications and our approach explores several hundred memory configurations for each application, yielding several optimal design points in a few hours of computation on a standard desktop.
Resumo:
This paper presents the design of the area optimized integer two dimensional discrete cosine transform (2-D DCT) used in H.264/AVC codecs. The 2-D DCT calculation is performed by utilizing the separability property, in such a way that 2-D DCT is divided into two 1-D DCT calculation that are joined through a common memory. Due to its area optimized approach, the design will find application in mobile devices. Verilog hardware description language (HDL) in cadence environment has been used for design, compilation, simulation and synthesis of transform block in 0.18 mu TSMC technology.
Resumo:
Soft error has become one of the major areas of attention with the device scaling and large scale integration. Lot of variants for superscalar architecture were proposed with focus on program re-execution, thread re-execution and instruction re-execution. In this paper we proposed a fault tolerant micro-architecture of pipelined RISC. The proposed architecture, Floating Resources Extended pipeline (FREP), re-executes the instructions using extended pipeline stages. The instructions are re-executed by hybrid architecture with a suitable combination of space and time redundancy.
Resumo:
A novel comparator architecture is proposed for speed operation in low voltage environment. Performance comparison with a conventional regenerative comparator shows a speed-up of 41%. The proposed comparator is embedded in a continuous time sigma-delta ADC so as to reduce the quantizer delay and hence minimizes the excess loop delay problem. A performance enhancement of 1dB in the dynamic range of the ADC is achieved with this new comparator. We have implemented this ADC in a standard single-poly 8-Metal 0.13 mum UMC process. The entire system operates at 1.2 V supply providing a dynamic range of 32 dB consuming 720 muW of power and occupies an area of 0.1 mm2.
Resumo:
Today's feature-rich multimedia products require embedded system solution with complex System-on-Chip (SoC) to meet market expectations of high performance at a low cost and lower energy consumption. The memory architecture of the embedded system strongly influences these parameters. Hence the embedded system designer performs a complete memory architecture exploration. This problem is a multi-objective optimization problem and can be tackled as a two-level optimization problem. The outer level explores various memory architecture while the inner level explores placement of data sections (data layout problem) to minimize memory stalls. Further, the designer would be interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforces short design cycle time. In this paper we address the multi-level multi-objective memory architecture exploration problem through a combination of Multi-objective Genetic Algorithm (Memory Architecture exploration) and an efficient heuristic data placement algorithm. At the outer level the memory architecture exploration is done by picking memory modules directly from a ASIC memory Library. This helps in performing the memory architecture exploration in a integrated framework, where the memory allocation, memory exploration and data layout works in a tightly coupled way to yield optimal design points with respect to area, power and performance. We experimented our approach for 3 embedded applications and our approach explores several thousand memory architecture for each application, yielding a few hundred optimal design points in a few hours of computation time on a standard desktop.
Resumo:
A new scheme for robust estimation of the partial state of linear time-invariant multivariable systems is presented, and it is shown how this may be used for the detection of sensor faults in such systems. We consider an observer to be robust if it generates a faithful estimate of the plant state in the face of modelling uncertainty or plant perturbations. Using the Stable Factorization approach we formulate the problem of optimal robust observer design by minimizing an appropriate norm on the estimation error. A logical candidate is the 2-norm, corresponding to an H�¿ optimization problem, for which solutions are readily available. In the special case of a stable plant, the optimal fault diagnosis scheme reduces to an internal model control architecture.
Resumo:
Precision, sophistication and economic factors in many areas of scientific research that demand very high magnitude of compute power is the order of the day. Thus advance research in the area of high performance computing is getting inevitable. The basic principle of sharing and collaborative work by geographically separated computers is known by several names such as metacomputing, scalable computing, cluster computing, internet computing and this has today metamorphosed into a new term known as grid computing. This paper gives an overview of grid computing and compares various grid architectures. We show the role that patterns can play in architecting complex systems, and provide a very pragmatic reference to a set of well-engineered patterns that the practicing developer can apply to crafting his or her own specific applications. We are not aware of pattern-oriented approach being applied to develop and deploy a grid. There are many grid frameworks that are built or are in the process of being functional. All these grids differ in some functionality or the other, though the basic principle over which the grids are built is the same. Despite this there are no standard requirements listed for building a grid. The grid being a very complex system, it is mandatory to have a standard Software Architecture Specification (SAS). We attempt to develop the same for use by any grid user or developer. Specifically, we analyze the grid using an object oriented approach and presenting the architecture using UML. This paper will propose the usage of patterns at all levels (analysis. design and architectural) of the grid development.
Resumo:
Real-Time services are traditionally supported on circuit switched network. However, there is a need to port these services on packet switched network. Architecture for audio conferencing application over the Internet in the light of ITU-T H.323 recommendations is considered. In a conference, considering packets only from a set of selected clients can reduce speech quality degradation because mixing packets from all clients can lead to lack of speech clarity. A distributed algorithm and architecture for selecting clients for mixing is suggested here based on a new quantifier of the voice activity called “Loudness Number” (LN). The proposed system distributes the computation load and reduces the load on client terminals. The highlights of this architecture are scalability, bandwidth saving and speech quality enhancement. Client selection for playing out tries to mimic a physical conference where the most vocal participants attract more attention. The contributions of the paper are expected to aid H.323 recommendations implementations for Multipoint Processors (MP). A working prototype based on the proposed architecture is already functional.
Resumo:
Electrostatic self-assembly of colloidal and nanoparticles has attracted a lot of attention in recent years, since it offers the possibility of producing novel crystalline structures that have the potential to be used as advanced materials for photonic and other applications. The stoichiometry of these crystals is not constrained by charge neutrality of the two types of particles due to the presence of counterions, and hence a variety of three-dimensional structures have been observed depending on the relative sizes of the particles and their charge. Here we report structural polymorphism of two-dimensional crystals of oppositely charged linear macroions, namely DNA and self-assembled cylindrical micelles of cationic amphiphiles. Our system differs from those studied earlier in terms of the presence of a strongly binding counterion that competes with DNA to bind to the micelle. The presence of these counterions leads to novel structures of these crystals, such as a square lattice and a root 3 x root 3 superlattice of an underlying hexagonal lattice, determined from a detailed analysis of the small-angle diffraction data. These lower-dimensional equilibrium systems can play an important role in developing a deeper theoretical understanding of the stability of crystals of oppositely charged particles. Further, it should be possible to use the same design principles to fabricate structures on a longer length-scale by an appropriate choice of the two macroions.
Resumo:
Today's SoCs are complex designs with multiple embedded processors, memory subsystems, and application specific peripherals. The memory architecture of embedded SoCs strongly influences the power and performance of the entire system. Further, the memory subsystem constitutes a major part (typically up to 70%) of the silicon area for the current day SoC. In this article, we address the on-chip memory architecture exploration for DSP processors which are organized as multiple memory banks, where banks can be single/dual ported with non-uniform bank sizes. In this paper we propose two different methods for physical memory architecture exploration and identify the strengths and applicability of these methods in a systematic way. Both methods address the memory architecture exploration for a given target application by considering the application's data access characteristics and generates a set of Pareto-optimal design points that are interesting from a power, performance and VLSI area perspective. To the best of our knowledge, this is the first comprehensive work on memory space exploration at physical memory level that integrates data layout and memory exploration to address the system objectives from both hardware design and application software development perspective. Further we propose an automatic framework that explores the design space identifying 100's of Pareto-optimal design points within a few hours of running on a standard desktop configuration.
Resumo:
This paper presents a decentralized/peer-to-peer architecture-based parallel version of the vector evaluated particle swarm optimization (VEPSO) algorithm for multi-objective design optimization of laminated composite plates using message passing interface (MPI). The design optimization of laminated composite plates being a combinatorially explosive constrained non-linear optimization problem (CNOP), with many design variables and a vast solution space, warrants the use of non-parametric and heuristic optimization algorithms like PSO. Optimization requires minimizing both the weight and cost of these composite plates, simultaneously, which renders the problem multi-objective. Hence VEPSO, a multi-objective variant of the PSO algorithm, is used. Despite the use of such a heuristic, the application problem, being computationally intensive, suffers from long execution times due to sequential computation. Hence, a parallel version of the PSO algorithm for the problem has been developed to run on several nodes of an IBM P720 cluster. The proposed parallel algorithm, using MPI's collective communication directives, establishes a peer-to-peer relationship between the constituent parallel processes, deviating from the more common master-slave approach, in achieving reduction of computation time by factor of up to 10. Finally we show the effectiveness of the proposed parallel algorithm by comparing it with a serial implementation of VEPSO and a parallel implementation of the vector evaluated genetic algorithm (VEGA) for the same design problem. (c) 2012 Elsevier Ltd. All rights reserved.