67 resultados para Architecture nest
Resumo:
Regular Expressions are generic representations for a string or a collection of strings. This paper focuses on implementation of a regular expression matching architecture on reconfigurable fabric like FPGA. We present a Nondeterministic Finite Automata based implementation with extended regular expression syntax set compared to previous approaches. We also describe a dynamically reconfigurable generic block that implements the supported regular expression syntax. This enables formation of the regular expression hardware by a simple cascade of generic blocks as well as a possibility for reconfiguring the generic blocks to change the regular expression being matched. Further,we have developed an HDL code generator to obtain the VHDL description of the hardware for any regular expression set. Our optimized regular expression engine achieves a throughput of 2.45 Gbps. Our dynamically reconfigurable regular expression engine achieves a throughput of 0.8 Gbps using 12 FPGA slices per generic block on Xilinx Virtex2Pro FPGA.
Resumo:
Today's feature-rich multimedia products require embedded system solution with complex System-on-Chip (SoC) to meet market expectations of high performance at a low cost and lower energy consumption. The memory architecture of the embedded system strongly influences critical system design objectives like area, power and performance. Hence the embedded system designer performs a complete memory architecture exploration to custom design a memory architecture for a given set of applications. Further, the designer would be interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforces short design cycle time. In this paper we address the multi-level multi-objective memory architecture exploration problem through a combination of exhaustive-search based memory exploration at the outer level and a two step based integrated data layout for SPRAM-Cache based architectures at the inner level. We present a two step integrated approach for data layout for SPRAM-Cache based hybrid architectures with the first step as data-partitioning that partitions data between SPRAM and Cache, and the second step is the cache conscious data layout. We formulate the cache-conscious data layout as a graph partitioning problem and show that our approach gives up to 34% improvement over an existing approach and also optimizes the off-chip memory address space. We experimented our approach with 3 embedded multimedia applications and our approach explores several hundred memory configurations for each application, yielding several optimal design points in a few hours of computation on a standard desktop.
Resumo:
This paper presents the design of the area optimized integer two dimensional discrete cosine transform (2-D DCT) used in H.264/AVC codecs. The 2-D DCT calculation is performed by utilizing the separability property, in such a way that 2-D DCT is divided into two 1-D DCT calculation that are joined through a common memory. Due to its area optimized approach, the design will find application in mobile devices. Verilog hardware description language (HDL) in cadence environment has been used for design, compilation, simulation and synthesis of transform block in 0.18 mu TSMC technology.
Resumo:
The prevalent virtualization technologies provide QoS support within the software layers of the virtual machine monitor(VMM) or the operating system of the virtual machine(VM). The QoS features are mostly provided as extensions to the existing software used for accessing the I/O device because of which the applications sharing the I/O device experience loss of performance due to crosstalk effects or usable bandwidth. In this paper we examine the NIC sharing effects across VMs on a Xen virtualized server and present an alternate paradigm that improves the shared bandwidth and reduces the crosstalk effect on the VMs. We implement the proposed hardwaresoftware changes in a layered queuing network (LQN) model and use simulation techniques to evaluate the architecture. We find that simple changes in the device architecture and associated system software lead to application throughput improvement of up to 60%. The architecture also enables finer QoS controls at device level and increases the scalability of device sharing across multiple virtual machines. We find that the performance improvement derived using LQN model is comparable to that reported by similar but real implementations.
Resumo:
Soft error has become one of the major areas of attention with the device scaling and large scale integration. Lot of variants for superscalar architecture were proposed with focus on program re-execution, thread re-execution and instruction re-execution. In this paper we proposed a fault tolerant micro-architecture of pipelined RISC. The proposed architecture, Floating Resources Extended pipeline (FREP), re-executes the instructions using extended pipeline stages. The instructions are re-executed by hybrid architecture with a suitable combination of space and time redundancy.
Resumo:
In this paper we explore an implementation of a high-throughput, streaming application on REDEFINE-v2, which is an enhancement of REDEFINE. REDEFINE is a polymorphic ASIC combining the flexibility of a programmable solution with the execution speed of an ASIC. In REDEFINE Compute Elements are arranged in an 8x8 grid connected via a Network on Chip (NoC) called RECONNECT, to realize the various macrofunctional blocks of an equivalent ASIC. For a 1024-FFT we carry out an application-architecture design space exploration by examining the various characterizations of Compute Elements in terms of the size of the instruction store. We further study the impact by using application specific, vectorized FUs. By setting up different partitions of the FFT algorithm for persistent execution on REDEFINE-v2, we derive the benefits of setting up pipelined execution for higher performance. The impact of the REDEFINE-v2 micro-architecture for any arbitrary N-point FFT (N > 4096) FFT is also analyzed. We report the various algorithm-architecture tradeoffs in terms of area and execution speed with that of an ASIC implementation. In addition we compare the performance gain with respect to a GPP.
Resumo:
This study in Western Ghats, India, investigates the relation between nesting sites of ants and a single remotely sensed variable: the Normalised Difference Vegetation Index (NDVI). We carried out sampling in 60 plots each measuring 30 x 30 m and recorded nest sites of 13 ant species. We found that NDVI values at the nesting sites varied considerably between individual species and also between the six functional groups the ants belong to. The functional groups Cryptic Species, Tropical Climate Specialists and Specialist Predators were present in regions with high NDVI whereas Hot Climate Specialists and Opportunists were found in sites with low NDVI. As expected we found that low NDVI values were associated with scrub jungles and high NDVI values with evergreen forests. Interestingly, we found that Pachycondyla rufipes, an ant species found only in deciduous and evergreen forests, established nests only in sites with low NDVI (range = 0.015 - 0.1779). Our results show that these low NDVI values in deciduous and evergreen forests correspond to canopy gaps in otherwise closed deciduous and evergreen forests. Subsequent fieldwork confirmed the observed high prevalence of P. rufipes in these NDVI-constrained areas. We discuss the value of using NDVI for the remote detection and distinction of ant nest sites.
Resumo:
A novel comparator architecture is proposed for speed operation in low voltage environment. Performance comparison with a conventional regenerative comparator shows a speed-up of 41%. The proposed comparator is embedded in a continuous time sigma-delta ADC so as to reduce the quantizer delay and hence minimizes the excess loop delay problem. A performance enhancement of 1dB in the dynamic range of the ADC is achieved with this new comparator. We have implemented this ADC in a standard single-poly 8-Metal 0.13 mum UMC process. The entire system operates at 1.2 V supply providing a dynamic range of 32 dB consuming 720 muW of power and occupies an area of 0.1 mm2.
Resumo:
Today's feature-rich multimedia products require embedded system solution with complex System-on-Chip (SoC) to meet market expectations of high performance at a low cost and lower energy consumption. The memory architecture of the embedded system strongly influences these parameters. Hence the embedded system designer performs a complete memory architecture exploration. This problem is a multi-objective optimization problem and can be tackled as a two-level optimization problem. The outer level explores various memory architecture while the inner level explores placement of data sections (data layout problem) to minimize memory stalls. Further, the designer would be interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforces short design cycle time. In this paper we address the multi-level multi-objective memory architecture exploration problem through a combination of Multi-objective Genetic Algorithm (Memory Architecture exploration) and an efficient heuristic data placement algorithm. At the outer level the memory architecture exploration is done by picking memory modules directly from a ASIC memory Library. This helps in performing the memory architecture exploration in a integrated framework, where the memory allocation, memory exploration and data layout works in a tightly coupled way to yield optimal design points with respect to area, power and performance. We experimented our approach for 3 embedded applications and our approach explores several thousand memory architecture for each application, yielding a few hundred optimal design points in a few hours of computation time on a standard desktop.
Resumo:
In this paper we propose the architecture of a SoC fabric onto which applications described in a HLL are synthesized. The fabric is a homogeneous layout of computation, storage and communication resources on silicon. Through a process of composition of resources (as opposed to decomposition of applications), application specific computational structures are defined on the fabric at runtime to realize different modules of the applications in hardware. Applications synthesized on this fabric offers performance comparable to ASICs while retaining the programmability of processing cores. We outline the application synthesis methodology through examples, and compare our results with software implementations on traditional platforms with unbounded resources.
Resumo:
With the advent of Internet, video over IP is gaining popularity. In such an environment, scalability and fault tolerance will be the key issues. Existing video on demand (VoD) service systems are usually neither scalable nor tolerant to server faults and hence fail to comply to multi-user, failure-prone networks such as the Internet. Current research areas concerning VoD often focus on increasing the throughput and reliability of single server, but rarely addresses the smooth provision of service during server as well as network failures. Reliable Server Pooling (RSerPool), being capable of providing high availability by using multiple redundant servers as single source point, can be a solution to overcome the above failures. During a possible server failure, the continuity of service is retained by another server. In order to achieve transparent failover, efficient state sharing is an important requirement. In this paper, we present an elegant, simple, efficient and scalable approach which has been developed to facilitate the transfer of state by the client itself, using extended cookie mechanism, which ensures that there is no noticeable change in disruption or the video quality.
Resumo:
This paper proposes a Petri net model for a commercial network processor (Intel iXP architecture) which is a multithreaded multiprocessor architecture. We consider and model three different applications viz., IPv4 forwarding, network address translation, and IP security running on IXP 2400/2850. A salient feature of the Petri net model is its ability to model the application, architecture and their interaction in great detail. The model is validated using the Intel proprietary tool (SDK 3.51 for IXP architecture) over a range of configurations. We conduct a detailed performance evaluation, identify the bottleneck resource, and propose a few architectural extensions and evaluate them in detail.
Resumo:
Precision, sophistication and economic factors in many areas of scientific research that demand very high magnitude of compute power is the order of the day. Thus advance research in the area of high performance computing is getting inevitable. The basic principle of sharing and collaborative work by geographically separated computers is known by several names such as metacomputing, scalable computing, cluster computing, internet computing and this has today metamorphosed into a new term known as grid computing. This paper gives an overview of grid computing and compares various grid architectures. We show the role that patterns can play in architecting complex systems, and provide a very pragmatic reference to a set of well-engineered patterns that the practicing developer can apply to crafting his or her own specific applications. We are not aware of pattern-oriented approach being applied to develop and deploy a grid. There are many grid frameworks that are built or are in the process of being functional. All these grids differ in some functionality or the other, though the basic principle over which the grids are built is the same. Despite this there are no standard requirements listed for building a grid. The grid being a very complex system, it is mandatory to have a standard Software Architecture Specification (SAS). We attempt to develop the same for use by any grid user or developer. Specifically, we analyze the grid using an object oriented approach and presenting the architecture using UML. This paper will propose the usage of patterns at all levels (analysis. design and architectural) of the grid development.
Resumo:
Real-Time services are traditionally supported on circuit switched network. However, there is a need to port these services on packet switched network. Architecture for audio conferencing application over the Internet in the light of ITU-T H.323 recommendations is considered. In a conference, considering packets only from a set of selected clients can reduce speech quality degradation because mixing packets from all clients can lead to lack of speech clarity. A distributed algorithm and architecture for selecting clients for mixing is suggested here based on a new quantifier of the voice activity called “Loudness Number” (LN). The proposed system distributes the computation load and reduces the load on client terminals. The highlights of this architecture are scalability, bandwidth saving and speech quality enhancement. Client selection for playing out tries to mimic a physical conference where the most vocal participants attract more attention. The contributions of the paper are expected to aid H.323 recommendations implementations for Multipoint Processors (MP). A working prototype based on the proposed architecture is already functional.
Resumo:
The highest levels of security can be achieved through the use of more than one type of cryptographic algorithm for each security function. In this paper, the REDEFINE polymorphic architecture is presented as an architecture framework that can optimally support a varied set of crypto algorithms without losing high performance. The presented solution is capable of accelerating the advanced encryption standard (AES) and elliptic curve cryptography (ECC) cryptographic protocols, while still supporting different flavors of these algorithms as well as different underlying finite field sizes. The compelling feature of this cryptosystem is the ability to provide acceleration support for new field sizes as well as new (possibly proprietary) cryptographic algorithms decided upon after the cryptosystem is deployed.