960 results for middleware architectures


Relevance:

20.00%

Publisher:

Abstract:

Very Long Instruction Word (VLIW) architectures exploit instruction level parallelism (ILP) with the help of the compiler to achieve higher instruction throughput with minimal hardware. However, control and data dependencies between operations limit the available ILP, which not only hinders the scalability of VLIW architectures but also results in code size expansion. Although speculation and predicated execution mitigate ILP limitations due to control dependencies to a certain extent, they increase hardware cost and exacerbate code size expansion. Simultaneous multistreaming (SMS) can significantly improve operation throughput by allowing interleaved execution of operations from multiple instruction streams. In this paper we study SMS for VLIW architectures and quantify the benefits associated with it using a case study of the MPEG-2 video decoder. We also propose the notion of virtual resources for VLIW architectures, which decouple architectural resources (resources exposed to the compiler) from the microarchitectural resources, to limit code size expansion. Our results for a VLIW architecture demonstrate that: (1) SMS delivers much higher throughput than that achieved by speculation and predicated execution; (2) adding speculation and predicated execution support on top of SMS increases performance by only around 12% on average, a minor gain that might not warrant the additional hardware complexity involved; and (3) the notion of virtual resources is very effective in reducing no-operations (NOPs), and consequently reduces code size, with little or no impact on performance.
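To make the link between unfilled issue slots and code size concrete, here is a minimal sketch that measures the share of NOPs in a schedule. The bundle representation (a list of slots, with `None` for an empty slot) and the operation names are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Optional

# Hypothetical representation: each VLIW instruction (bundle) is a list of
# issue slots; None marks an empty slot that must be encoded as a NOP.
Bundle = List[Optional[str]]


def nop_fraction(schedule: List[Bundle]) -> float:
    """Return the fraction of issue slots in the schedule that hold NOPs."""
    total_slots = sum(len(bundle) for bundle in schedule)
    if total_slots == 0:
        return 0.0
    nops = sum(1 for bundle in schedule for op in bundle if op is None)
    return nops / total_slots


# Illustrative 4-wide schedule: dependencies leave many slots empty.
schedule: List[Bundle] = [
    ["load", "add", None, None],
    ["mul", None, None, None],
    ["store", "sub", "branch", None],
]

if __name__ == "__main__":
    print(f"NOP fraction: {nop_fraction(schedule):.2f}")
```

A lower fraction for the same set of operations means fewer wasted slots to encode, which is the effect the abstract attributes to virtual resources.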

Relevance:

20.00%

Publisher:

Abstract:

Computational grids with multiple batch systems (batch grids) can be powerful infrastructures for executing long-running multicomponent parallel applications. In this paper, we have constructed a middleware framework for executing such long-running applications spanning multiple submissions to the queues on multiple batch systems. We have used our framework to execute a leading long-running multicomponent application for climate modeling, the Community Climate System Model (CCSM). Our framework coordinates the distribution, execution, migration and restart of the components of CCSM on the multiple queues, where the component jobs of the different queues can have different queue waiting and startup times.

Relevance:

20.00%

Publisher:

Abstract:

Frequent accesses to the register file make it one of the major sources of energy consumption in ILP architectures. The large number of functional units connected to a large unified register file in VLIW architectures makes power dissipation in the register file even worse because of the need for a large number of ports. High power dissipation in the relatively small area occupied by a register file leads to a high power density in the register file and makes it one of the prime hot-spots. This makes it highly susceptible to the possibility of a catastrophic heatstroke, which in turn impacts performance, because of the need for periodic cool down, and cost, because of the need for sophisticated packaging and cooling techniques. Clustered VLIW architectures partition the register file among clusters of functional units and reduce the number of ports required, thereby reducing the power dissipation. However, we observe that the aggregate accesses to register files in clustered VLIW architectures (and the associated energy consumption) become very high compared to centralized VLIW architectures, and this can be attributed to a large number of explicit inter-cluster communications. Snooping based clustered VLIW architectures provide a very limited but very fast means of inter-cluster communication by allowing some of the functional units to directly read some of the operands from the register files of some of the other clusters. In this paper, we propose instruction scheduling algorithms that exploit the limited snooping capability to reduce the register file energy consumption on average by 12% and 18%, and improve the overall performance by 5% and 11%, for a 2-clustered and a 4-clustered machine respectively, over an earlier state-of-the-art clustered scheduling algorithm when evaluated in the context of snooping based clustered VLIW architectures.

Relevance:

20.00%

Publisher:

Abstract:

Miniaturization of devices and the ensuing decrease in the threshold voltage have led to a substantial increase in the leakage component of the total processor energy consumption. Because of the relatively simple issue logic and the large number of functional units in VLIW and clustered VLIW architectures, a large fraction of this leakage energy is consumed in the functional units. However, functional units are not fully utilized in VLIW architectures because of the inherent variations in the ILP of the programs. This underutilization is even more pronounced in clustered VLIW architectures because of contention for the limited number of slow inter-cluster communication channels, which leads to many short idle cycles. In the past, some architectural schemes have been proposed to obtain leakage energy benefits by aggressively exploiting the idleness of functional units. However, the presence of many short idle cycles causes frequent transitions from the active mode to the sleep mode and vice versa, and adversely affects the energy benefits of a purely hardware based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assists such a hardware based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slacks of instructions to orchestrate the functional unit mapping with the objective of reducing the number of transitions in functional units, thereby keeping them off for a longer duration. The proposed compiler-assisted scheme obtains a further 12% reduction in the energy consumption of functional units with negligible performance degradation over a hardware-only scheme for a VLIW architecture. The benefits are 15% and 17% in the context of a 2-clustered and a 4-clustered VLIW architecture respectively. Our test bed uses the Trimaran compiler infrastructure.
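The quantity the scheduler tries to reduce, the number of active/sleep transitions per functional unit, can be sketched as follows. This is only an illustration of the metric, not the authors' scheduling algorithm; the per-cycle busy lists and the two example mappings are hypothetical.

```python
from typing import List


def count_transitions(busy: List[bool]) -> int:
    """Count active<->idle changes between consecutive cycles for one unit."""
    return sum(1 for prev, cur in zip(busy, busy[1:]) if prev != cur)


def total_transitions(mapping: List[List[bool]]) -> int:
    """Sum the transition counts over all functional units in a mapping."""
    return sum(count_transitions(unit) for unit in mapping)


# Two hypothetical mappings of the same six operations onto two functional
# units over six cycles (True = unit executes an operation in that cycle).
scattered = [
    [True, False, True, False, True, False],
    [False, True, False, True, False, True],
]
packed = [
    [True, True, True, True, True, True],
    [False, False, False, False, False, False],
]

if __name__ == "__main__":
    print("scattered:", total_transitions(scattered))
    print("packed:", total_transitions(packed))
```

Both mappings execute the same amount of work, but the packed one lets the second unit stay asleep for the whole window, which is the kind of outcome the abstract describes as keeping units off for a longer duration.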

Relevance:

20.00%

Publisher:

Abstract:

Video streaming applications have hitherto been supported by single server systems. A major drawback of such a solution is that it increases the server load, and the server's limited bandwidth restricts the number of clients that can be supported simultaneously. The constraints of a single server system can be overcome in video streaming if we exploit the endless resources available in a distributed and networked system. We explore a P2P system for streaming video applications. In this paper we build a P2P streaming video (SVP2P) service in which multiple peers co-operate to serve video segments for new requests, thereby reducing server load and bandwidth used. Our simulation shows that the playback latency using SVP2P is roughly one quarter of the latency incurred when the server directly streams the video. Bandwidth consumed for control messages (overhead) is as low as 1.5% of the total data transferred. The most important observation is that the capacity of the SVP2P grows dynamically.

Relevance:

20.00%

Publisher:

Abstract:

ASICs offer the best realization of DSP algorithms in terms of performance, but the cost is prohibitive, especially when the volumes involved are low. However, if the architecture synthesis trajectory for such algorithms is such that the target architecture can be identified as an interconnection of elementary parameterized computational structures, then it is possible to attain a close match, both in terms of performance and power with respect to an ASIC, for any algorithmic parameters of the given algorithm. Such an architecture is weakly programmable (configurable) and can be viewed as an application specific instruction-set processor (ASIP). In this work, we present a methodology to synthesize ASIPs for DSP algorithms.

Relevance:

20.00%

Publisher:

Abstract:

We propose a novel technique for reducing the power consumed by the on-chip cache in an SNUCA chip multicore platform. This is achieved by what we call a "remap table", which maps accesses to the cache banks that are as close as possible to the cores on which the processes are scheduled. With this technique, instead of using all the available cache, we use a portion of the cache and allocate less cache to the application. We formulate the problem as an energy-delay (ED) minimization problem and solve it offline using a scalable genetic algorithm approach. Our experiments show savings of up to 40% in the memory sub-system power consumption and 47% in the energy-delay product (ED).

Relevance:

20.00%

Publisher:

Abstract:

We have developed a graphical user interface based dendrimer builder toolkit (DBT) which can be used to generate the dendrimer configuration of a desired generation for various dendrimer architectures. The structures generated by this tool were validated by studying the structural properties of two well known classes of dendrimers: ethylenediamine cored poly(amidoamine) (PAMAM) dendrimer and diaminobutyl cored poly(propylene imine) (PPI) dendrimer. Using fully atomistic molecular dynamics (MD) simulation we have calculated the radius of gyration, shape tensor and monomer density distribution for PAMAM and PPI dendrimers at neutral and high pH. Good agreement was observed between the calculated radius of gyration and the available simulation and experimental (small angle X-ray and neutron scattering; SAXS, SANS) results. With this validation we have used DBT to build another new class of nitrogen cored poly(propyl ether imine) dendrimer and study its structural features using all-atom MD simulation. DBT is a versatile tool and can be easily used to generate other dendrimer structures with different chemistry and topology. The use of the general AMBER force field to describe the intra-molecular interactions allows us to integrate this tool easily with the widely used molecular dynamics software AMBER. This makes our tool a very useful utility which can help to facilitate the study of dendrimer interaction with nucleic acids, proteins and lipid bilayers for various biological applications. © 2012 Wiley Periodicals, Inc.
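As a rough illustration of two of the structural quantities named above, the sketch below computes a mass-weighted radius of gyration and gyration tensor from a single set of atomic coordinates. It assumes that the "shape tensor" refers to the mass-weighted gyration tensor, which is a common convention but is not confirmed by the abstract; the function names are hypothetical and this is not DBT's or AMBER's code.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def center_of_mass(coords: Sequence[Vec], masses: Sequence[float]) -> Vec:
    """Mass-weighted mean position of the atoms."""
    total = sum(masses)
    x, y, z = (
        sum(m * r[k] for m, r in zip(masses, coords)) / total for k in range(3)
    )
    return (x, y, z)


def gyration_tensor(coords: Sequence[Vec], masses: Sequence[float]) -> List[List[float]]:
    """3x3 mass-weighted gyration tensor; its trace equals Rg squared."""
    total = sum(masses)
    com = center_of_mass(coords, masses)
    return [
        [
            sum(m * (r[a] - com[a]) * (r[b] - com[b]) for m, r in zip(masses, coords))
            / total
            for b in range(3)
        ]
        for a in range(3)
    ]


def radius_of_gyration(coords: Sequence[Vec], masses: Sequence[float]) -> float:
    """Rg = sqrt(sum_i m_i |r_i - r_cm|^2 / sum_i m_i)."""
    g = gyration_tensor(coords, masses)
    return math.sqrt(g[0][0] + g[1][1] + g[2][2])


if __name__ == "__main__":
    # Toy input: two unit-mass atoms placed symmetrically about the origin.
    coords = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    masses = [1.0, 1.0]
    print(radius_of_gyration(coords, masses))
```

In an MD study these quantities would normally be averaged over many trajectory frames; the sketch handles one frame only.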

Relevance:

20.00%

Publisher:

Abstract:

We report on multifunctional devices based on CNT arrays-ZnO nanowires hybrid architectures. The hybrid structure exhibits excellent high current Schottky-like behavior, with ZnO as p-type and an ideality factor close to the ideal value. Further, the CNT-ZnO hybrid structures can be used as high current p-type field effect transistors that can deliver currents of the order of milliamperes, and as ultraviolet detectors with controllable current on-off ratio and response time. The p-type nature of ZnO and a possible mechanism for the rectifying characteristics of CNT-ZnO are presented.
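For readers unfamiliar with the term, the ideality factor is conventionally defined through the diode current-voltage relation below. This is the standard textbook form, given here as background; the abstract does not state which model or fitting procedure the authors used, and the symbols are the usual ones rather than notation from the source.

```latex
I = I_s \left[ \exp\!\left( \frac{qV}{n k_B T} \right) - 1 \right]
```

Here $I_s$ is the saturation current, $q$ the elementary charge, $V$ the applied bias, $k_B$ the Boltzmann constant, $T$ the absolute temperature, and $n$ the ideality factor, with $n = 1$ corresponding to the ideal value.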

Relevance:

20.00%

Publisher:

Abstract:

We report the synthesis and aggregation behaviour of new water-soluble, bile acid derived tripodal architectures based on a core derived from triphenylphosphine oxide. We employed the well-established copper-catalysed [1,3]-dipolar cycloaddition (CuAAC) for the construction of these tripodal molecules. The aggregation behaviour of these molecules in aqueous media was studied by different analytical methods such as dye solubilisation, dynamic light scattering, NMR and AFM. These molecular architectures also offer an additional advantage in aiding understanding of the influence of the nature of the bile acid backbone and of the configuration at the steroid C-3 position in these architectures; to the best of our knowledge this has not been reported in the literature. The unique gelation properties of the -derivatives were explained through molecular modelling studies and the mechanical behaviour of these gels was studied by rheology experiments.