55 results for deep architectures


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Very Long Instruction Word (VLIW) architectures exploit instruction-level parallelism (ILP) with the help of the compiler to achieve higher instruction throughput with minimal hardware. However, control and data dependencies between operations limit the available ILP, which not only hinders the scalability of VLIW architectures but also results in code size expansion. Although speculation and predicated execution mitigate ILP limitations due to control dependencies to a certain extent, they increase hardware cost and exacerbate code size expansion. Simultaneous multistreaming (SMS) can significantly improve operation throughput by allowing interleaved execution of operations from multiple instruction streams. In this paper we study SMS for VLIW architectures and quantify the benefits associated with it using a case study of the MPEG-2 video decoder. We also propose the notion of virtual resources for VLIW architectures, which decouple architectural resources (resources exposed to the compiler) from the microarchitectural resources, to limit code size expansion. Our results for a VLIW architecture demonstrate that: (1) SMS delivers much higher throughput than that achieved by speculation and predicated execution; (2) the increase in performance due to the addition of speculation and predicated execution support over SMS averages around 12%, a minor increase that might not warrant the additional hardware complexity involved; and (3) the notion of virtual resources is very effective in reducing no-operations (NOPs) and consequently reduces code size with little or no impact on performance.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Frequent accesses to the register file make it one of the major sources of energy consumption in ILP architectures. The large number of functional units connected to a large unified register file in VLIW architectures makes power dissipation in the register file even worse because of the need for a large number of ports. High power dissipation in the relatively small area occupied by a register file leads to a high power density in the register file and makes it one of the prime hot-spots. This makes it highly susceptible to the possibility of a catastrophic heatstroke. This in turn impacts performance and cost because of the need for periodic cool-down and for sophisticated packaging and cooling techniques, respectively. Clustered VLIW architectures partition the register file among clusters of functional units and reduce the number of ports required, thereby reducing the power dissipation. However, we observe that the aggregate accesses to register files in clustered VLIW architectures (and the associated energy consumption) become very high compared to centralized VLIW architectures, and this can be attributed to a large number of explicit inter-cluster communications. Snooping-based clustered VLIW architectures provide a very limited but very fast means of inter-cluster communication by allowing some of the functional units to directly read some of the operands from the register files of some of the other clusters. In this paper, we propose instruction scheduling algorithms that exploit the limited snooping capability to reduce the register file energy consumption on average by 12% and 18% and improve the overall performance by 5% and 11% for a 2-clustered and a 4-clustered machine, respectively, over an earlier state-of-the-art clustered scheduling algorithm when evaluated in the context of snooping-based clustered VLIW architectures.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Miniaturization of devices and the ensuing decrease in the threshold voltage have led to a substantial increase in the leakage component of the total processor energy consumption. Because of the relatively simple issue logic and the large number of functional units in VLIW and clustered VLIW architectures, a large fraction of this leakage energy is consumed in the functional units. However, functional units are not fully utilized in VLIW architectures because of the inherent variations in the ILP of programs. This underutilization is even more pronounced in clustered VLIW architectures because of contention for the limited number of slow inter-cluster communication channels, which leads to many short idle cycles. In the past, some architectural schemes have been proposed to obtain leakage energy benefits by aggressively exploiting the idleness of functional units. However, the presence of many short idle cycles causes frequent transitions from the active mode to the sleep mode and vice versa, and adversely affects the energy benefits of a purely hardware-based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assists such a hardware-based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slacks of instructions to orchestrate the functional unit mapping with the objective of reducing the number of transitions in functional units, thereby keeping them off for a longer duration. The proposed compiler-assisted scheme obtains a further 12% reduction in the energy consumption of functional units, with negligible performance degradation, over a hardware-only scheme for a VLIW architecture. The benefits are 15% and 17% for a 2-clustered and a 4-clustered VLIW architecture, respectively. Our test bed uses the Trimaran compiler infrastructure.
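
To illustrate why the mapping of operations to functional units matters for sleep-mode transitions, the following minimal sketch counts active/idle switches for two hypothetical mappings of the same six operations onto two units. The unit names, the busy patterns, and the function `count_transitions` are assumptions made for illustration; this is not the scheduling algorithm proposed in the paper.

```python
def count_transitions(busy):
    """Count switches between busy (1) and idle (0) in a per-cycle sequence."""
    return sum(1 for prev, cur in zip(busy, busy[1:]) if prev != cur)


def total_transitions(mapping):
    """Sum the busy/idle switches over all functional units in a mapping."""
    return sum(count_transitions(pattern) for pattern in mapping.values())


# Hypothetical schedules: six operations, one per cycle, on two units.
# "scattered" alternates units each cycle; "packed" keeps one unit busy
# so the other can stay in sleep mode for the whole window.
scattered = {"FU0": [1, 0, 1, 0, 1, 0], "FU1": [0, 1, 0, 1, 0, 1]}
packed = {"FU0": [1, 1, 1, 1, 1, 1], "FU1": [0, 0, 0, 0, 0, 0]}

print(total_transitions(scattered), total_transitions(packed))
```

Both mappings execute the same number of operations per cycle; only the number of mode switches differs, which is the quantity the abstract says the compiler tries to reduce.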

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A deep-level transient spectroscopy (DLTS) technique is reported for determining the capture cross-section activation energy directly. Conventionally, the capture activation energy is obtained from the temperature dependence of the capture cross section. Capture cross-section measurements are often unreliable because of many intrinsic errors, a problem that is more critical for nonexponential capture kinetics. The essence of this technique is to use an emission pulse to allow the defects to emit electrons and to analyze the transient signal from the capture process due to a large capture barrier, in contrast with the emission signal analyzed in conventional DLTS. This technique has been applied to determine the capture barrier for silicon-related DX centers in AlxGa1−xAs for different AlAs mole fractions.
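
For context, the conventional route mentioned in this abstract is usually written with a thermally activated capture cross section. The form below is the standard textbook expression; the symbols ($\sigma_\infty$ for the high-temperature limit, $E_\sigma$ for the capture activation energy, $k_B$ for Boltzmann's constant) are conventional notation assumed here, not taken from the abstract.

```latex
\sigma(T) = \sigma_\infty \exp\!\left(-\frac{E_\sigma}{k_B T}\right)
\quad\Longrightarrow\quad
\ln \sigma(T) = \ln \sigma_\infty - \frac{E_\sigma}{k_B}\,\frac{1}{T}
```

Under this assumption, a plot of $\ln\sigma$ against $1/T$ is a straight line with slope $-E_\sigma/k_B$, so any error in the measured cross sections carries over directly into the extracted activation energy.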

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ~1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with the residue depth, and to identify active-site residues. Using these correlations, ~98% of correct models of CcdB (RMSD ≤ 4 Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain the relative activities of each mutant in a large pool and to derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

ASICs offer the best realization of DSP algorithms in terms of performance, but the cost is prohibitive, especially when the volumes involved are low. However, if the architecture synthesis trajectory for such algorithms is such that the target architecture can be identified as an interconnection of elementary parameterized computational structures, then it is possible to attain a close match, both in terms of performance and power with respect to an ASIC, for any algorithmic parameters of the given algorithm. Such an architecture is weakly programmable (configurable) and can be viewed as an application specific instruction-set processor (ASIP). In this work, we present a methodology to synthesize ASIPs for DSP algorithms.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The theme of the session is New Concept and Applications for both grouting and deep mixing technologies. Nineteen papers were submitted to this session, covering a variety of topics: (1) new concepts and development, (2) refinement of techniques, and (3) analysis and applications. Eight of these papers were presented orally.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We propose a novel technique for reducing the power consumed by the on-chip cache in an SNUCA chip multicore platform. This is achieved by what we call a "remap table", which maps accesses to the cache banks that are as close as possible to the cores on which the processes are scheduled. With this technique, instead of using all the available cache, we use a portion of the cache and allocate less cache to the application. We formulate the problem as an energy-delay (ED) minimization problem and solve it offline using a scalable genetic algorithm approach. Our experiments show up to 40% savings in the memory sub-system power consumption and 47% savings in energy-delay product (ED).
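
The energy-delay formulation can be sketched with a toy model. Everything below is an assumption made for illustration: the hop counts, the cost constants, the delay model, and the function names are hypothetical, and a brute-force search over bank subsets stands in for the genetic algorithm used in the paper.

```python
from itertools import combinations

# Hypothetical toy model: every number below is illustrative, not from the paper.
HOPS = [1, 1, 2, 2, 3, 3, 4, 4]   # hops from the core to each cache bank
BANK_POWER = 1.0                  # static power per active bank (arbitrary units)
BASE_DELAY = 100.0                # run time with unlimited cache (arbitrary units)
MISS_COST = 400.0                 # extra delay, divided by the active bank count
HOP_COST = 5.0                    # extra delay per average hop


def energy_delay(active):
    """Return the energy-delay product for a tuple of active bank indices."""
    avg_hops = sum(HOPS[b] for b in active) / len(active)
    delay = BASE_DELAY + MISS_COST / len(active) + HOP_COST * avg_hops
    energy = BANK_POWER * len(active) * delay   # power times run time
    return energy * delay


def best_banks(n_banks=len(HOPS)):
    """Exhaustively pick the active-bank subset with the lowest ED product."""
    candidates = (
        subset
        for k in range(1, n_banks + 1)
        for subset in combinations(range(n_banks), k)
    )
    return min(candidates, key=energy_delay)


def remap_table(active, n_banks=len(HOPS)):
    """Map every logical bank index onto one of the active physical banks."""
    return {logical: active[logical % len(active)] for logical in range(n_banks)}


active = best_banks()
print(active, remap_table(active))
```

In this toy model the search keeps only the four banks nearest the core, trading a slightly longer run time for lower static power. Exhaustive search is feasible only for a handful of banks, which is presumably why a scalable heuristic is needed at realistic sizes.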

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We have developed a graphical-user-interface-based dendrimer builder toolkit (DBT) which can be used to generate the dendrimer configuration of a desired generation for various dendrimer architectures. The structures generated by this tool were validated by studying the structural properties of two well-known classes of dendrimers: ethylenediamine-cored poly(amidoamine) (PAMAM) dendrimer and diaminobutyl-cored poly(propylene imine) (PPI) dendrimer. Using fully atomistic molecular dynamics (MD) simulation, we have calculated the radius of gyration, shape tensor and monomer density distribution for PAMAM and PPI dendrimers at neutral and high pH. Good agreement was observed between the calculated radius of gyration and the available simulation and experimental (small-angle X-ray and neutron scattering; SAXS, SANS) results. With this validation, we have used DBT to build another new class of nitrogen-cored poly(propyl ether imine) dendrimer and study its structural features using all-atom MD simulation. DBT is a versatile tool and can easily be used to generate other dendrimer structures with different chemistry and topology. The use of the general AMBER force field to describe the intra-molecular interactions allows us to integrate this tool easily with the widely used molecular dynamics software AMBER. This makes our tool a very useful utility which can help to facilitate the study of dendrimer interaction with nucleic acids, proteins and lipid bilayers for various biological applications. © 2012 Wiley Periodicals, Inc.
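
The radius of gyration used for validation has a standard definition: the mass-weighted root-mean-square distance of the atoms from their center of mass. A minimal sketch follows; the function name and the two-atom example are assumptions for illustration and are not part of DBT or AMBER.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of 3D points.

    coords: sequence of (x, y, z) tuples; masses: optional per-atom masses
    (defaults to equal weights).
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Center of mass, one coordinate axis at a time.
    center = [
        sum(m * r[axis] for m, r in zip(masses, coords)) / total
        for axis in range(3)
    ]
    # Mass-weighted sum of squared distances from the center of mass.
    weighted_sq = sum(
        m * sum((r[axis] - center[axis]) ** 2 for axis in range(3))
        for m, r in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total)


# Two equal-mass atoms two length units apart: each is one unit from the center.
print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))
```

For a trajectory, the same function would be applied frame by frame and the results averaged.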

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We report on multifunctional devices based on CNT arrays-ZnO nanowires hybrid architectures. The hybrid structure exhibits excellent high-current Schottky-like behavior, with ZnO as p-type and an ideality factor close to the ideal value. Further, the CNT-ZnO hybrid structures can be used as high-current p-type field-effect transistors that deliver currents of the order of milliamperes, and also as ultraviolet detectors with controllable current on-off ratio and response time. The p-type nature of ZnO and a possible mechanism for the rectifying characteristics of CNT-ZnO are presented.
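
For readers unfamiliar with the ideality factor, it is commonly defined through the diode equation below. This is the standard textbook form; the symbols ($I_s$ for saturation current, $n$ for ideality factor, $q$ for elementary charge, $k_B$ for Boltzmann's constant, $T$ for temperature) are conventional notation assumed here, and the abstract does not state which model the authors fitted.

```latex
I = I_s \left[ \exp\!\left( \frac{qV}{n k_B T} \right) - 1 \right]
```

An ideal junction has $n = 1$, so "close to the ideal value" would correspond to a fitted $n$ near 1 under this assumption.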

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Various leg exercises have been recommended to prevent deep vein thrombosis (DVT), a condition where a blood clot forms in the deep veins, especially during long-haul flights. Assessing the benefit of each of these exercises in avoiding DVT, which can be fatal, is important for suggesting the correct and most beneficial exercises. The present work aims at demonstrating a fiber Bragg grating (FBG)-based sensing methodology for measuring surface strains generated on the skin of the calf muscle to evaluate the suggested airline exercises for avoiding DVT. As the dataset in the experiment involves multiple subjects performing these exercises, an inertial measurement unit has been used to validate the repeatability of each of the exercises. The surface strain on the calf muscle obtained using the FBG sensor, which is a measure of the calf muscle deformation, has been compared against the variation of blood velocity in the femoral vein of the thigh measured using a commercial electronic phased-array color Doppler ultrasound system. Apart from analyzing the effectiveness of the suggested exercises, a new exercise which is more effective in terms of the strain generated to avoid DVT is proposed and evaluated. © 2013 Society of Photo-Optical Instrumentation Engineers (SPIE)