82 resultados para minimalist hardware architecture


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Soft error has become one of the major areas of attention with the device scaling and large scale integration. Lot of variants for superscalar architecture were proposed with focus on program re-execution, thread re-execution and instruction re-execution. In this paper we proposed a fault tolerant micro-architecture of pipelined RISC. The proposed architecture, Floating Resources Extended pipeline (FREP), re-executes the instructions using extended pipeline stages. The instructions are re-executed by hybrid architecture with a suitable combination of space and time redundancy.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we explore an implementation of a high-throughput, streaming application on REDEFINE-v2, which is an enhancement of REDEFINE. REDEFINE is a polymorphic ASIC combining the flexibility of a programmable solution with the execution speed of an ASIC. In REDEFINE Compute Elements are arranged in an 8x8 grid connected via a Network on Chip (NoC) called RECONNECT, to realize the various macrofunctional blocks of an equivalent ASIC. For a 1024-FFT we carry out an application-architecture design space exploration by examining the various characterizations of Compute Elements in terms of the size of the instruction store. We further study the impact by using application specific, vectorized FUs. By setting up different partitions of the FFT algorithm for persistent execution on REDEFINE-v2, we derive the benefits of setting up pipelined execution for higher performance. The impact of the REDEFINE-v2 micro-architecture for any arbitrary N-point FFT (N > 4096) FFT is also analyzed. We report the various algorithm-architecture tradeoffs in terms of area and execution speed with that of an ASIC implementation. In addition we compare the performance gain with respect to a GPP.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A novel comparator architecture is proposed for speed operation in low voltage environment. Performance comparison with a conventional regenerative comparator shows a speed-up of 41%. The proposed comparator is embedded in a continuous time sigma-delta ADC so as to reduce the quantizer delay and hence minimizes the excess loop delay problem. A performance enhancement of 1dB in the dynamic range of the ADC is achieved with this new comparator. We have implemented this ADC in a standard single-poly 8-Metal 0.13 mum UMC process. The entire system operates at 1.2 V supply providing a dynamic range of 32 dB consuming 720 muW of power and occupies an area of 0.1 mm2.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Today's feature-rich multimedia products require embedded system solution with complex System-on-Chip (SoC) to meet market expectations of high performance at a low cost and lower energy consumption. The memory architecture of the embedded system strongly influences these parameters. Hence the embedded system designer performs a complete memory architecture exploration. This problem is a multi-objective optimization problem and can be tackled as a two-level optimization problem. The outer level explores various memory architecture while the inner level explores placement of data sections (data layout problem) to minimize memory stalls. Further, the designer would be interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforces short design cycle time. In this paper we address the multi-level multi-objective memory architecture exploration problem through a combination of Multi-objective Genetic Algorithm (Memory Architecture exploration) and an efficient heuristic data placement algorithm. At the outer level the memory architecture exploration is done by picking memory modules directly from a ASIC memory Library. This helps in performing the memory architecture exploration in a integrated framework, where the memory allocation, memory exploration and data layout works in a tightly coupled way to yield optimal design points with respect to area, power and performance. We experimented our approach for 3 embedded applications and our approach explores several thousand memory architecture for each application, yielding a few hundred optimal design points in a few hours of computation time on a standard desktop.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Miniaturization of devices and the ensuing decrease in the threshold voltage has led to a substantial increase in the leakage component of the total processor energy consumption. Relatively simpler issue logic and the presence of a large number of function units in the VLIW and the clustered VLIW architectures attribute a large fraction of this leakage energy consumption in the functional units. However, functional units are not fully utilized in the VLIW architectures because of the inherent variations in the ILP of the programs. This underutilization is even more pronounced in the context of clustered VLIW architectures because of the contentions for the limited number of slow intercluster communication channels which lead to many short idle cycles.In the past, some architectural schemes have been proposed to obtain leakage energy bene .ts by aggressively exploiting the idleness of functional units. However, presence of many short idle cycles cause frequent transitions from the active mode to the sleep mode and vice-versa and adversely a ffects the energy benefits of a purely hardware based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assist such a hardware based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slacks of instructions to orchestrate the functional unit mapping with the objective of reducing the number of transitions in functional units thereby keeping them off for a longer duration. The proposed compiler-assisted scheme obtains a further 12% reduction of energy consumption of functional units with negligible performance degradation over a hardware-only scheme for a VLIW architecture. The benefits are 15% and 17% in the context of a 2-clustered and a 4-clustered VLIW architecture respectively. Our test bed uses the Trimaran compiler infrastructure.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With the advent of Internet, video over IP is gaining popularity. In such an environment, scalability and fault tolerance will be the key issues. Existing video on demand (VoD) service systems are usually neither scalable nor tolerant to server faults and hence fail to comply to multi-user, failure-prone networks such as the Internet. Current research areas concerning VoD often focus on increasing the throughput and reliability of single server, but rarely addresses the smooth provision of service during server as well as network failures. Reliable Server Pooling (RSerPool), being capable of providing high availability by using multiple redundant servers as single source point, can be a solution to overcome the above failures. During a possible server failure, the continuity of service is retained by another server. In order to achieve transparent failover, efficient state sharing is an important requirement. In this paper, we present an elegant, simple, efficient and scalable approach which has been developed to facilitate the transfer of state by the client itself, using extended cookie mechanism, which ensures that there is no noticeable change in disruption or the video quality.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

H.264 is a video codec standard which delivers high resolution video even at low bit rates. To provide high throughput at low bit rates hardware implementations are essential. In this paper, we propose hardware implementations for speed and area optimized DCT and quantizer modules. To target above criteria we propose two architectures. First architecture is speed optimized which gives a high throughput and can meet requirements of 4096x2304 frame at 30 frames/sec. Second architecture is area optimized and occupies 2009 LUTs in Altera’s stratix-II and can meet the requirements of 1080HD at 30 frames/sec.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes a Petri net model for a commercial network processor (Intel iXP architecture) which is a multithreaded multiprocessor architecture. We consider and model three different applications viz., IPv4 forwarding, network address translation, and IP security running on IXP 2400/2850. A salient feature of the Petri net model is its ability to model the application, architecture and their interaction in great detail. The model is validated using the Intel proprietary tool (SDK 3.51 for IXP architecture) over a range of configurations. We conduct a detailed performance evaluation, identify the bottleneck resource, and propose a few architectural extensions and evaluate them in detail.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Precision, sophistication and economic factors in many areas of scientific research that demand very high magnitude of compute power is the order of the day. Thus advance research in the area of high performance computing is getting inevitable. The basic principle of sharing and collaborative work by geographically separated computers is known by several names such as metacomputing, scalable computing, cluster computing, internet computing and this has today metamorphosed into a new term known as grid computing. This paper gives an overview of grid computing and compares various grid architectures. We show the role that patterns can play in architecting complex systems, and provide a very pragmatic reference to a set of well-engineered patterns that the practicing developer can apply to crafting his or her own specific applications. We are not aware of pattern-oriented approach being applied to develop and deploy a grid. There are many grid frameworks that are built or are in the process of being functional. All these grids differ in some functionality or the other, though the basic principle over which the grids are built is the same. Despite this there are no standard requirements listed for building a grid. The grid being a very complex system, it is mandatory to have a standard Software Architecture Specification (SAS). We attempt to develop the same for use by any grid user or developer. Specifically, we analyze the grid using an object oriented approach and presenting the architecture using UML. This paper will propose the usage of patterns at all levels (analysis. design and architectural) of the grid development.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Real-Time services are traditionally supported on circuit switched network. However, there is a need to port these services on packet switched network. Architecture for audio conferencing application over the Internet in the light of ITU-T H.323 recommendations is considered. In a conference, considering packets only from a set of selected clients can reduce speech quality degradation because mixing packets from all clients can lead to lack of speech clarity. A distributed algorithm and architecture for selecting clients for mixing is suggested here based on a new quantifier of the voice activity called “Loudness Number” (LN). The proposed system distributes the computation load and reduces the load on client terminals. The highlights of this architecture are scalability, bandwidth saving and speech quality enhancement. Client selection for playing out tries to mimic a physical conference where the most vocal participants attract more attention. The contributions of the paper are expected to aid H.323 recommendations implementations for Multipoint Processors (MP). A working prototype based on the proposed architecture is already functional.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Summary form only given. A scheme for code compression that has a fast decompression algorithm, which can be implemented using simple hardware, is proposed. The effectiveness of the scheme on the TMS320C62x architecture that includes the overheads of a line address table (LAT) is evaluated and obtained compression rates ranging from 70% to 80%. Two schemes for decompression are proposed. The basic idea underlying the scheme is a simple clustering algorithm that partially maps a block of instructions into a set of clusters. The clustering algorithm is a greedy algorithm based on the frequency of occurrence of various instructions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The highest levels of security can be achieved through the use of more than one type of cryptographic algorithm for each security function. In this paper, the REDEFINE polymorphic architecture is presented as an architecture framework that can optimally support a varied set of crypto algorithms without losing high performance. The presented solution is capable of accelerating the advanced encryption standard (AES) and elliptic curve cryptography (ECC) cryptographic protocols, while still supporting different flavors of these algorithms as well as different underlying finite field sizes. The compelling feature of this cryptosystem is the ability to provide acceleration support for new field sizes as well as new (possibly proprietary) cryptographic algorithms decided upon after the cryptosystem is deployed.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Electrostatic self-assembly of colloidal and nanoparticles has attracted a lot of attention in recent years, since it offers the possibility of producing novel crystalline structures that have the potential to be used as advanced materials for photonic and other applications. The stoichiometry of these crystals is not constrained by charge neutrality of the two types of particles due to the presence of counterions, and hence a variety of three-dimensional structures have been observed depending on the relative sizes of the particles and their charge. Here we report structural polymorphism of two-dimensional crystals of oppositely charged linear macroions, namely DNA and self-assembled cylindrical micelles of cationic amphiphiles. Our system differs from those studied earlier in terms of the presence of a strongly binding counterion that competes with DNA to bind to the micelle. The presence of these counterions leads to novel structures of these crystals, such as a square lattice and a root 3 x root 3 superlattice of an underlying hexagonal lattice, determined from a detailed analysis of the small-angle diffraction data. These lower-dimensional equilibrium systems can play an important role in developing a deeper theoretical understanding of the stability of crystals of oppositely charged particles. Further, it should be possible to use the same design principles to fabricate structures on a longer length-scale by an appropriate choice of the two macroions.