965 resultados para Computer Architecture


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Very Long Instruction Word (VLIW) architectures exploit instruction level parallelism (ILP) with the help of the compiler to achieve higher instruction throughput with minimal hardware. However, control and data dependencies between operations limit the available ILP, which not only hinders the scalability of VLIW architectures, but also result in code size expansion. Although speculation and predicated execution mitigate ILP limitations due to control dependencies to a certain extent, they increase hardware cost and exacerbate code size expansion. Simultaneous multistreaming (SMS) can significantly improve operation throughput by allowing interleaved execution of operations from multiple instruction streams. In this paper we study SMS for VLIW architectures and quantify the benefits associated with it using a case study of the MPEG-2 video decoder. We also propose the notion of virtual resources for VLIW architectures, which decouple architectural resources (resources exposed to the compiler) from the microarchitectural resources, to limit code size expansion. Our results for a VLIW architecture demonstrate that: (1) SMS delivers much higher throughput than that achieved by speculation and predicated execution, (2) the increase in performance due to the addition of speculation and predicated execution support over SMS averages around 12%. The minor increase in performance might not warrant the additional hardware complexity involved, and (3) the notion of virtual resources is very effective in reducing no-operations (NOPs) and consequently reduce code size with little or no impact on performance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Hybrid wireless networks are extensively used in the superstores, market places, malls, etc. and provide high QoS (Quality of Service) to the end-users has become a challenging task. In this paper, we propose a policy-based transaction-aware QoS management architecture in a hybrid wireless superstore environment. The proposed scheme operates at the transaction level, for the downlink QoS management. We derive a policy for the estimation of QoS parameters, like, delay, jitter, bandwidth, availability, packet loss for every transaction before scheduling on the downlink. We also propose a QoS monitor which monitors the specified QoS and automatically adjusts the QoS according to the requirement. The proposed scheme has been simulated in hybrid wireless superstore environment and tested for various superstore transactions. The results shows that the policy-based transaction QoS management is enhance the performance and utilize network resources efficiently at the peak time of the superstore business.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Regular Expressions are generic representations for a string or a collection of strings. This paper focuses on implementation of a regular expression matching architecture on reconfigurable fabric like FPGA. We present a Nondeterministic Finite Automata based implementation with extended regular expression syntax set compared to previous approaches. We also describe a dynamically reconfigurable generic block that implements the supported regular expression syntax. This enables formation of the regular expression hardware by a simple cascade of generic blocks as well as a possibility for reconfiguring the generic blocks to change the regular expression being matched. Further,we have developed an HDL code generator to obtain the VHDL description of the hardware for any regular expression set. Our optimized regular expression engine achieves a throughput of 2.45 Gbps. Our dynamically reconfigurable regular expression engine achieves a throughput of 0.8 Gbps using 12 FPGA slices per generic block on Xilinx Virtex2Pro FPGA.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Today's feature-rich multimedia products require embedded system solution with complex System-on-Chip (SoC) to meet market expectations of high performance at a low cost and lower energy consumption. The memory architecture of the embedded system strongly influences critical system design objectives like area, power and performance. Hence the embedded system designer performs a complete memory architecture exploration to custom design a memory architecture for a given set of applications. Further, the designer would be interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforces short design cycle time. In this paper we address the multi-level multi-objective memory architecture exploration problem through a combination of exhaustive-search based memory exploration at the outer level and a two step based integrated data layout for SPRAM-Cache based architectures at the inner level. We present a two step integrated approach for data layout for SPRAM-Cache based hybrid architectures with the first step as data-partitioning that partitions data between SPRAM and Cache, and the second step is the cache conscious data layout. We formulate the cache-conscious data layout as a graph partitioning problem and show that our approach gives up to 34% improvement over an existing approach and also optimizes the off-chip memory address space. We experimented our approach with 3 embedded multimedia applications and our approach explores several hundred memory configurations for each application, yielding several optimal design points in a few hours of computation on a standard desktop.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper elucidates the methodology of applying artificial neural network model (ANNM) to predict the percent swell of calcitic soil in sulphuric acid solutions, a complex phenomenon involving many parameters. Swell data required for modelling is experimentally obtained using conventional oedometer tests under nominal surcharge. The phases in ANN include optimal design of architecture, operation and training of architecture. The designed optimal neural model (3-5-1) is a fully connected three layer feed forward network with symmetric sigmoid activation function and trained by the back propagation algorithm to minimize a quadratic error criterion.The used model requires parameters such as duration of interaction, calcite mineral content and acid concentration for prediction of swell. The observed strong correlation coefficient (R2 = 0.9979) between the values determined by the experiment and predicted using the developed model demonstrates that the network can provide answers to complex problems in geotechnical engineering.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents the design of the area optimized integer two dimensional discrete cosine transform (2-D DCT) used in H.264/AVC codecs. The 2-D DCT calculation is performed by utilizing the separability property, in such a way that 2-D DCT is divided into two 1-D DCT calculation that are joined through a common memory. Due to its area optimized approach, the design will find application in mobile devices. Verilog hardware description language (HDL) in cadence environment has been used for design, compilation, simulation and synthesis of transform block in 0.18 mu TSMC technology.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The prevalent virtualization technologies provide QoS support within the software layers of the virtual machine monitor(VMM) or the operating system of the virtual machine(VM). The QoS features are mostly provided as extensions to the existing software used for accessing the I/O device because of which the applications sharing the I/O device experience loss of performance due to crosstalk effects or usable bandwidth. In this paper we examine the NIC sharing effects across VMs on a Xen virtualized server and present an alternate paradigm that improves the shared bandwidth and reduces the crosstalk effect on the VMs. We implement the proposed hardwaresoftware changes in a layered queuing network (LQN) model and use simulation techniques to evaluate the architecture. We find that simple changes in the device architecture and associated system software lead to application throughput improvement of up to 60%. The architecture also enables finer QoS controls at device level and increases the scalability of device sharing across multiple virtual machines. We find that the performance improvement derived using LQN model is comparable to that reported by similar but real implementations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Sub-pixel classification is essential for the successful description of many land cover (LC) features with spatial resolution less than the size of the image pixels. A commonly used approach for sub-pixel classification is linear mixture models (LMM). Even though, LMM have shown acceptable results, pragmatically, linear mixtures do not exist. A non-linear mixture model, therefore, may better describe the resultant mixture spectra for endmember (pure pixel) distribution. In this paper, we propose a new methodology for inferring LC fractions by a process called automatic linear-nonlinear mixture model (AL-NLMM). AL-NLMM is a three step process where the endmembers are first derived from an automated algorithm. These endmembers are used by the LMM in the second step that provides abundance estimation in a linear fashion. Finally, the abundance values along with the training samples representing the actual proportions are fed to multi-layer perceptron (MLP) architecture as input to train the neurons which further refines the abundance estimates to account for the non-linear nature of the mixing classes of interest. AL-NLMM is validated on computer simulated hyperspectral data of 200 bands. Validation of the output showed overall RMSE of 0.0089±0.0022 with LMM and 0.0030±0.0001 with the MLP based AL-NLMM, when compared to actual class proportions indicating that individual class abundances obtained from AL-NLMM are very close to the real observations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Soft error has become one of the major areas of attention with the device scaling and large scale integration. Lot of variants for superscalar architecture were proposed with focus on program re-execution, thread re-execution and instruction re-execution. In this paper we proposed a fault tolerant micro-architecture of pipelined RISC. The proposed architecture, Floating Resources Extended pipeline (FREP), re-executes the instructions using extended pipeline stages. The instructions are re-executed by hybrid architecture with a suitable combination of space and time redundancy.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we explore an implementation of a high-throughput, streaming application on REDEFINE-v2, which is an enhancement of REDEFINE. REDEFINE is a polymorphic ASIC combining the flexibility of a programmable solution with the execution speed of an ASIC. In REDEFINE Compute Elements are arranged in an 8x8 grid connected via a Network on Chip (NoC) called RECONNECT, to realize the various macrofunctional blocks of an equivalent ASIC. For a 1024-FFT we carry out an application-architecture design space exploration by examining the various characterizations of Compute Elements in terms of the size of the instruction store. We further study the impact by using application specific, vectorized FUs. By setting up different partitions of the FFT algorithm for persistent execution on REDEFINE-v2, we derive the benefits of setting up pipelined execution for higher performance. The impact of the REDEFINE-v2 micro-architecture for any arbitrary N-point FFT (N > 4096) FFT is also analyzed. We report the various algorithm-architecture tradeoffs in terms of area and execution speed with that of an ASIC implementation. In addition we compare the performance gain with respect to a GPP.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A novel comparator architecture is proposed for speed operation in low voltage environment. Performance comparison with a conventional regenerative comparator shows a speed-up of 41%. The proposed comparator is embedded in a continuous time sigma-delta ADC so as to reduce the quantizer delay and hence minimizes the excess loop delay problem. A performance enhancement of 1dB in the dynamic range of the ADC is achieved with this new comparator. We have implemented this ADC in a standard single-poly 8-Metal 0.13 mum UMC process. The entire system operates at 1.2 V supply providing a dynamic range of 32 dB consuming 720 muW of power and occupies an area of 0.1 mm2.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Prior work on modeling interconnects has focused on optimizing the wire and repeater design for trading off energy and delay, and is largely based on low level circuit parameters. Hence these models are hard to use directly to make high level microarchitectural trade-offs in the initial exploration phase of a design. In this paper, we propose INTACTE, a tool that can be used by architects toget reasonably accurate interconnect area, delay, and power estimates based on a few architecture level parameters for the interconnect such as length, width (in number of bits), frequency, and latency for a specified technology and voltage. The tool uses well known models of interconnect delay and energy taking into account the wire pitch, repeater size, and spacing for a range of voltages and technologies.It then solves an optimization problem of finding the lowest energy interconnect design in terms of the low level circuit parameters, which meets the architectural constraintsgiven as inputs. In addition, the tool also provides the area, energy, and delay for a range of supply voltages and degrees of pipelining, which can be used for micro-architectural exploration of a chip. The delay and energy models used by the tool have been validated against low level circuit simulations. We discuss several potential applications of the tool and present an example of optimizing interconnect design in the context of clustered VLIW architectures. Copyright 2007 ACM.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Today's feature-rich multimedia products require embedded system solution with complex System-on-Chip (SoC) to meet market expectations of high performance at a low cost and lower energy consumption. The memory architecture of the embedded system strongly influences these parameters. Hence the embedded system designer performs a complete memory architecture exploration. This problem is a multi-objective optimization problem and can be tackled as a two-level optimization problem. The outer level explores various memory architecture while the inner level explores placement of data sections (data layout problem) to minimize memory stalls. Further, the designer would be interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforces short design cycle time. In this paper we address the multi-level multi-objective memory architecture exploration problem through a combination of Multi-objective Genetic Algorithm (Memory Architecture exploration) and an efficient heuristic data placement algorithm. At the outer level the memory architecture exploration is done by picking memory modules directly from a ASIC memory Library. This helps in performing the memory architecture exploration in a integrated framework, where the memory allocation, memory exploration and data layout works in a tightly coupled way to yield optimal design points with respect to area, power and performance. We experimented our approach for 3 embedded applications and our approach explores several thousand memory architecture for each application, yielding a few hundred optimal design points in a few hours of computation time on a standard desktop.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we propose the architecture of a SoC fabric onto which applications described in a HLL are synthesized. The fabric is a homogeneous layout of computation, storage and communication resources on silicon. Through a process of composition of resources (as opposed to decomposition of applications), application specific computational structures are defined on the fabric at runtime to realize different modules of the applications in hardware. Applications synthesized on this fabric offers performance comparable to ASICs while retaining the programmability of processing cores. We outline the application synthesis methodology through examples, and compare our results with software implementations on traditional platforms with unbounded resources.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Real-time simulation of deformable solids is essential for some applications such as biological organ simulations for surgical simulators. In this work, deformable solids are approximated to be linear elastic, and an easy and straight forward numerical technique, the Finite Point Method (FPM), is used to model three dimensional linear elastostatics. Graphics Processing Unit (GPU) is used to accelerate computations. Results show that the Finite Point Method, together with GPU, can compute three dimensional linear elastostatic responses of solids at rates suitable for real-time graphics, for solids represented by reasonable number of points.