911 results for Modular Architectures


Relevance:

20.00% 20.00%

Publisher:

Abstract:

In recent years, parallel computers have been attracting attention for simulating artificial neural networks (ANN). This is due to the inherent parallelism in ANN. This work studies ways of parallelizing adaptive resonance theory (ART), a popular neural network algorithm. The core computations of ART are separated and different strategies for parallelizing ART are discussed. We present mapping strategies for the ART2-A neural network onto ring and mesh architectures. The required parallel architecture is simulated using a parallel architectural simulator, PROTEUS, and parallel programs are written using a superset of C for the algorithms presented. A simulation-based scalability study of the algorithm-architecture match is carried out. The various overheads are identified in order to suggest ways of improving the performance. Our main objective is to evaluate the performance of the ART2-A network on different parallel architectures. (C) 1999 Elsevier Science B.V. All rights reserved.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Protein-protein docking programs typically perform four major tasks: (i) generation of docking poses, (ii) selection of a subset of poses, (iii) their structural refinement and (iv) scoring and ranking for the final assessment of the true quaternary structure. Although the tasks can be integrated or performed in a serial order, they are by nature modular, allowing an opportunity to substitute one algorithm with another. We have implemented two modular web services, (i) PRUNE: to select a subset of docking poses generated during sampling search (http://pallab.serc.iisc.ernet.in/prune) and (ii) PROBE: to refine, score and rank them (http://pallab.serc.iisc.ernet.in/probe). The former uses a new interface-area-based edge-scoring function to eliminate > 95% of the poses generated during docking search. In contrast to other multi-parameter-based screening functions, this single-parameter-based elimination reduces the computational time significantly, in addition to increasing the chances of selecting native-like models in the top rank list. The PROBE server ranks the pruned poses after structure refinement and scoring using a regression model for geometric compatibility and normalized interaction energy. While web services similar to PROBE are rare, no web service akin to PRUNE has been described before. Both servers are publicly accessible and free to use.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper we consider the problem of scheduling expression trees on delayed-load architectures. The problem tackled here takes root from the one considered in [Proceedings of the ACM SIGPLAN '91 Conf. on Programming Language Design and Implementation, 1991. p. 256] in which the leaves of the expression trees all refer to memory locations. A generalization of this involves the situation in which the trees may contain register variables, with the registers being used only at the leaves. Solutions to this generalization are given in [ACM Trans. Prog. Lang. Syst. 17 (1995) 740, Microproc. Microprog. 40 (1994) 577]. This paper considers the most general case in which the registers are reusable. This problem is tackled in [Comput. Lang. 21 (1995) 49], which gives an approximate solution to the problem under certain assumptions about the contiguity of the evaluation order. Here we propose an optimal solution (which may involve even a non-contiguous evaluation of the tree). The schedule generated by the algorithm given in this paper is optimal in the sense that it is an interlock-free schedule which uses the minimum number of registers required. An extension to the algorithm incorporates spilling. The problem as stated in this paper is an instruction scheduling problem. However, the problem could also be rephrased as an operations research problem with a difference in terminology. (C) 2002 Elsevier Science B.V. All rights reserved.
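For background on what "minimum number of registers" means for an expression tree, the following is a minimal sketch of the classic Sethi-Ullman labelling, which counts the registers needed to evaluate a tree when load delays are ignored. It is not the algorithm of the paper above, which additionally handles delayed loads, reusable registers and spilling. The `Node` class and the function name are hypothetical, introduced only for illustration, and every leaf is assumed to need one register.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical expression-tree node: a leaf has no children."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def registers_needed(node: Node) -> int:
    """Sethi-Ullman number of a tree, ignoring load delays.

    Assumption: every leaf must be loaded into a register (need = 1).
    """
    children = [c for c in (node.left, node.right) if c is not None]
    if not children:
        return 1
    if len(children) == 1:
        # A unary operator can reuse its operand's register.
        return registers_needed(children[0])
    first, second = (registers_needed(c) for c in children)
    # Evaluate the more demanding subtree first; a tie costs one extra
    # register to hold the first result while the second is computed.
    return max(first, second) if first != second else first + 1


# (a + b) * (c + d): a balanced tree with four leaves.
balanced = Node("*",
                Node("+", Node("a"), Node("b")),
                Node("+", Node("c"), Node("d")))
# a + (b * c): a skewed tree with three leaves.
skewed = Node("+", Node("a"), Node("*", Node("b"), Node("c")))
```

The labelling also fixes an evaluation order (heavier subtree first), which is the kind of contiguous order that the abstract contrasts with its non-contiguous optimal schedule.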

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Very Long Instruction Word (VLIW) architectures exploit instruction level parallelism (ILP) with the help of the compiler to achieve higher instruction throughput with minimal hardware. However, control and data dependencies between operations limit the available ILP, which not only hinders the scalability of VLIW architectures, but also results in code size expansion. Although speculation and predicated execution mitigate ILP limitations due to control dependencies to a certain extent, they increase hardware cost and exacerbate code size expansion. Simultaneous multistreaming (SMS) can significantly improve operation throughput by allowing interleaved execution of operations from multiple instruction streams. In this paper we study SMS for VLIW architectures and quantify the benefits associated with it using a case study of the MPEG-2 video decoder. We also propose the notion of virtual resources for VLIW architectures, which decouple architectural resources (resources exposed to the compiler) from the microarchitectural resources, to limit code size expansion. Our results for a VLIW architecture demonstrate that: (1) SMS delivers much higher throughput than that achieved by speculation and predicated execution; (2) the increase in performance due to the addition of speculation and predicated execution support over SMS averages around 12%, a minor increase that might not warrant the additional hardware complexity involved; and (3) the notion of virtual resources is very effective in reducing no-operations (NOPs) and consequently reduces code size with little or no impact on performance.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Verification is one of the important stages in designing an SoC (system on chip), consuming up to 70% of the design time. In this work, we present a methodology to automatically generate verification test-cases to verify a class of SoCs and also enable re-use of verification resources created from one SoC to another. A prototype implementation for generating the test-cases is also presented.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Frequent accesses to the register file make it one of the major sources of energy consumption in ILP architectures. The large number of functional units connected to a large unified register file in VLIW architectures makes power dissipation in the register file even worse because of the need for a large number of ports. High power dissipation in the relatively small area occupied by a register file leads to a high power density in the register file and makes it one of the prime hot-spots. This makes it highly susceptible to the possibility of a catastrophic heatstroke. This in turn impacts performance, because of the need for periodic cool down, and cost, because of the need for sophisticated packaging and cooling techniques. Clustered VLIW architectures partition the register file among clusters of functional units and reduce the number of ports required, thereby reducing the power dissipation. However, we observe that the aggregate accesses to register files in clustered VLIW architectures (and associated energy consumption) become very high compared to the centralized VLIW architectures, and this can be attributed to a large number of explicit inter-cluster communications. Snooping-based clustered VLIW architectures provide a very limited but very fast means of inter-cluster communication by allowing some of the functional units to directly read some of the operands from the register file of some of the other clusters. In this paper, we propose instruction scheduling algorithms that exploit the limited snooping capability to reduce the register file energy consumption on average by 12% and 18% and improve the overall performance by 5% and 11% for a 2-clustered and a 4-clustered machine respectively, over an earlier state-of-the-art clustered scheduling algorithm when evaluated in the context of snooping-based clustered VLIW architectures.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper, the design and development of a micro electro mechanical systems (MEMS) based pressure sensor with triple modular redundancy (TMR) for space applications are presented. In order to minimize the mass of the system and also to avoid the uncertainty in the pressure measurement of three independent hardware units, an integrated approach with TMR is adopted. The sequential steps of the TMR logic followed and the test results obtained are included.
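The abstract does not describe the TMR logic itself. As an illustration of the general idea, the sketch below uses mid-value selection, a common voting scheme for three redundant analog readings, and flags any channel that deviates from the voted value by more than a tolerance. The function names, the tolerance parameter and the sample values are assumptions made for this example, not details from the paper.

```python
from typing import List, Sequence, Tuple


def mid_value_select(a: float, b: float, c: float) -> float:
    """Return the median of three readings (mid-value selection)."""
    return sorted((a, b, c))[1]


def tmr_vote(readings: Sequence[float],
             tolerance: float) -> Tuple[float, List[int]]:
    """Vote over exactly three redundant sensor readings.

    Returns the voted value and the indices of channels whose reading
    differs from it by more than `tolerance` (hypothetical parameter).
    """
    if len(readings) != 3:
        raise ValueError("TMR voting needs exactly three readings")
    voted = mid_value_select(*readings)
    suspects = [i for i, r in enumerate(readings)
                if abs(r - voted) > tolerance]
    return voted, suspects


# Hypothetical pressure readings in kPa; the third channel has failed high.
voted, suspects = tmr_vote((101.2, 101.3, 250.0), tolerance=1.0)
```

With a median voter, a single faulty channel can never be selected as the output, whichever direction it fails in, so one fault is masked without needing to identify it first.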

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Miniaturization of devices and the ensuing decrease in the threshold voltage have led to a substantial increase in the leakage component of the total processor energy consumption. Because VLIW and clustered VLIW architectures have relatively simple issue logic and a large number of functional units, a large fraction of this leakage energy is consumed in the functional units. However, functional units are not fully utilized in VLIW architectures because of the inherent variations in the ILP of the programs. This underutilization is even more pronounced in the context of clustered VLIW architectures because of contention for the limited number of slow inter-cluster communication channels, which leads to many short idle cycles. In the past, some architectural schemes have been proposed to obtain leakage energy benefits by aggressively exploiting the idleness of functional units. However, the presence of many short idle cycles causes frequent transitions from the active mode to the sleep mode and vice versa, and adversely affects the energy benefits of a purely hardware-based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assists such a hardware-based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slacks of instructions to orchestrate the functional unit mapping with the objective of reducing the number of transitions in functional units, thereby keeping them off for a longer duration. The proposed compiler-assisted scheme obtains a further 12% reduction in the energy consumption of functional units with negligible performance degradation over a hardware-only scheme for a VLIW architecture. The benefits are 15% and 17% in the context of a 2-clustered and a 4-clustered VLIW architecture respectively. Our test bed uses the Trimaran compiler infrastructure.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

ASICs offer the best realization of DSP algorithms in terms of performance, but the cost is prohibitive, especially when the volumes involved are low. However, if the architecture synthesis trajectory for such algorithms is such that the target architecture can be identified as an interconnection of elementary parameterized computational structures, then it is possible to attain a close match, both in terms of performance and power with respect to an ASIC, for any algorithmic parameters of the given algorithm. Such an architecture is weakly programmable (configurable) and can be viewed as an application specific instruction-set processor (ASIP). In this work, we present a methodology to synthesize ASIPs for DSP algorithms.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We propose a novel technique for reducing the power consumed by the on-chip cache in an SNUCA chip multicore platform. This is achieved by what we call a "remap table", which maps accesses to the cache banks that are as close as possible to the cores on which the processes are scheduled. With this technique, instead of using all the available cache, we use a portion of the cache and allocate less cache to the application. We formulate the problem as an energy-delay (ED) minimization problem and solve it offline using a scalable genetic algorithm approach. Our experiments show up to 40% savings in the memory sub-system power consumption and 47% savings in energy-delay product (ED).
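To make the energy-delay objective concrete, here is a minimal sketch that scores a few candidate cache allocations by their ED product (energy multiplied by delay) and picks the smallest. The paper solves this with a genetic algorithm; the exhaustive `min` below is a deliberately simpler stand-in. The `Candidate` type and all numeric values are hypothetical, chosen only to show how a smaller allocation can win on ED even though it runs slower.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    """Hypothetical cache allocation with its measured cost."""
    banks: int        # number of cache banks given to the application
    energy_j: float   # energy consumed, in joules (made-up value)
    delay_s: float    # execution time, in seconds (made-up value)


def energy_delay_product(c: Candidate) -> float:
    """ED metric: lower is better; penalizes both energy and slowdown."""
    return c.energy_j * c.delay_s


def best_by_ed(candidates: Iterable[Candidate]) -> Candidate:
    """Exhaustively pick the candidate with the smallest ED product."""
    return min(candidates, key=energy_delay_product)


candidates = [
    Candidate(banks=16, energy_j=10.0, delay_s=1.0),  # all banks: fast, costly
    Candidate(banks=8, energy_j=6.0, delay_s=1.2),    # fewer banks
    Candidate(banks=4, energy_j=4.0, delay_s=2.0),    # too few: slowdown dominates
]
chosen = best_by_ed(candidates)
```

Exhaustive search is only practical for a handful of candidates, which is presumably why the abstract turns to a genetic algorithm for the full allocation space.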
