983 results for architectures


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Frequent accesses to the register file make it one of the major sources of energy consumption in ILP architectures. In VLIW architectures, the large number of functional units connected to a large unified register file makes power dissipation in the register file even worse because of the need for a large number of ports. High power dissipation in the relatively small area occupied by a register file leads to high power density and makes the register file one of the prime hot-spots, which leaves it highly susceptible to a catastrophic heatstroke. This in turn impacts performance, because of the need for periodic cool-down, and cost, because of the need for sophisticated packaging and cooling techniques. Clustered VLIW architectures partition the register file among clusters of functional units and reduce the number of ports required, thereby reducing power dissipation. However, we observe that the aggregate accesses to register files in clustered VLIW architectures (and the associated energy consumption) become very high compared to centralized VLIW architectures, which can be attributed to a large number of explicit inter-cluster communications. Snooping-based clustered VLIW architectures provide a very limited but very fast means of inter-cluster communication by allowing some of the functional units to directly read some of their operands from the register files of some of the other clusters. In this paper, we propose instruction scheduling algorithms that exploit this limited snooping capability to reduce register file energy consumption by an average of 12% and 18% and improve overall performance by 5% and 11% for a 2-clustered and a 4-clustered machine respectively, over an earlier state-of-the-art clustered scheduling algorithm, when evaluated in the context of snooping-based clustered VLIW architectures.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Miniaturization of devices and the ensuing decrease in the threshold voltage have led to a substantial increase in the leakage component of the total processor energy consumption. Because of the relatively simple issue logic and the large number of functional units in VLIW and clustered VLIW architectures, a large fraction of this leakage energy is consumed in the functional units. However, functional units are not fully utilized in VLIW architectures because of the inherent variations in the ILP of programs. This underutilization is even more pronounced in clustered VLIW architectures because of contention for the limited number of slow inter-cluster communication channels, which leads to many short idle cycles. In the past, some architectural schemes have been proposed to obtain leakage energy benefits by aggressively exploiting the idleness of functional units. However, the presence of many short idle cycles causes frequent transitions from the active mode to the sleep mode and vice versa, and adversely affects the energy benefits of a purely hardware-based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assists such a hardware-based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slack of instructions to orchestrate the functional unit mapping, with the objective of reducing the number of transitions in functional units and thereby keeping them off for longer durations. The proposed compiler-assisted scheme obtains a further 12% reduction in the energy consumption of functional units, with negligible performance degradation, over a hardware-only scheme for a VLIW architecture. The benefits are 15% and 17% for a 2-clustered and a 4-clustered VLIW architecture respectively. Our test bed uses the Trimaran compiler infrastructure.
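The effect of functional unit mapping on mode transitions can be illustrated with a minimal Python sketch. It is not the scheduling algorithm of the abstract: the unit names, the six-cycle busy vectors, and the helpers `count_transitions` and `total_transitions` are hypothetical, chosen only to show that the same operations produce fewer active/sleep switches when they are packed onto one unit.

```python
def count_transitions(busy):
    """Count active<->sleep switches in a per-cycle busy vector."""
    return sum(1 for prev, cur in zip(busy, busy[1:]) if prev != cur)


def total_transitions(mapping):
    """Sum the mode switches over all functional units of a mapping."""
    return sum(count_transitions(busy) for busy in mapping.values())


# Hypothetical example: six operations, one issued per cycle.
# Mapping A alternates between two units, so each unit keeps toggling.
alternating = {
    "FU0": [True, False, True, False, True, False],
    "FU1": [False, True, False, True, False, True],
}
# Mapping B packs all operations onto FU0, so FU1 can stay asleep.
packed = {
    "FU0": [True, True, True, True, True, True],
    "FU1": [False, False, False, False, False, False],
}

print(total_transitions(alternating), total_transitions(packed))
```

In this toy case the alternating mapping switches modes ten times within the window, while the packed mapping never switches.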

Relevance:

20.00% 20.00%

Publisher:

Abstract:

ASICs offer the best realization of DSP algorithms in terms of performance, but the cost is prohibitive, especially when the volumes involved are low. However, if the architecture synthesis trajectory for such algorithms is such that the target architecture can be identified as an interconnection of elementary parameterized computational structures, then it is possible to attain a close match, both in terms of performance and power with respect to an ASIC, for any algorithmic parameters of the given algorithm. Such an architecture is weakly programmable (configurable) and can be viewed as an application specific instruction-set processor (ASIP). In this work, we present a methodology to synthesize ASIPs for DSP algorithms.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We propose a novel technique for reducing the power consumed by the on-chip cache in an SNUCA chip multicore platform. This is achieved by what we call a "remap table", which maps accesses to the cache banks that are as close as possible to the cores on which the processes are scheduled. With this technique, instead of using all the available cache, we use a portion of the cache and allocate less cache to the application. We formulate the problem as an energy-delay (ED) minimization problem and solve it offline using a scalable genetic algorithm approach. Our experiments show savings of up to 40% in the memory sub-system power consumption and 47% in the energy-delay product (ED).
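The two ideas in this abstract, a remap table and an energy-delay objective, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate configurations and their energy and delay values are invented, the helper names are hypothetical, the round-robin remapping is a placeholder for a distance-aware policy, and an exhaustive minimum is swapped in for the genetic algorithm that the abstract describes.

```python
def energy_delay(energy, delay):
    """Energy-delay product, the objective being minimized."""
    return energy * delay


def best_config(candidates):
    """Pick the candidate with the smallest ED product (exhaustive search,
    used here in place of the genetic algorithm in the abstract)."""
    return min(candidates, key=lambda c: energy_delay(c["energy"], c["delay"]))


def build_remap_table(num_banks, active_banks):
    """Map every original bank index onto one of the active banks.
    Round-robin is an assumption; a real table would prefer banks near the core."""
    return {bank: active_banks[bank % len(active_banks)] for bank in range(num_banks)}


# Invented numbers: fewer banks save energy but lengthen execution time.
candidates = [
    {"banks": 16, "energy": 10.0, "delay": 1.0},
    {"banks": 8, "energy": 6.0, "delay": 1.1},
    {"banks": 4, "energy": 4.0, "delay": 1.8},
]

chosen = best_config(candidates)
remap = build_remap_table(num_banks=4, active_banks=[0, 1])
print(chosen["banks"], remap)
```

With these made-up values the 8-bank configuration wins, because the 4-bank configuration saves energy but loses more in delay.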

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We have developed a graphical user interface based dendrimer builder toolkit (DBT) which can be used to generate the dendrimer configuration of a desired generation for various dendrimer architectures. The structures generated by this tool were validated by studying the structural properties of two well-known classes of dendrimers: the ethylenediamine-cored poly(amidoamine) (PAMAM) dendrimer and the diaminobutyl-cored poly(propylene imine) (PPI) dendrimer. Using fully atomistic molecular dynamics (MD) simulation, we have calculated the radius of gyration, shape tensor and monomer density distribution for PAMAM and PPI dendrimers at neutral and high pH. Good agreement was observed between the calculated radius of gyration and the available simulation and experimental (small angle X-ray and neutron scattering; SAXS, SANS) results. With this validation, we have used DBT to build another new class of nitrogen-cored poly(propyl ether imine) dendrimer and to study its structural features using all-atomistic MD simulation. DBT is a versatile tool and can easily be used to generate other dendrimer structures with different chemistry and topology. The use of the general AMBER force field to describe the intra-molecular interactions allows us to integrate this tool easily with the widely used molecular dynamics software AMBER. This makes our tool a very useful utility which can help to facilitate the study of dendrimer interactions with nucleic acids, proteins and lipid bilayers for various biological applications. © 2012 Wiley Periodicals, Inc.
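The radius of gyration mentioned above has a standard mass-weighted definition, Rg^2 = (1/M) * sum_i m_i |r_i - r_cm|^2, where M is the total mass and r_cm the centre of mass. A minimal Python sketch of that formula follows; the function name and the two-atom test structure are illustrative assumptions and are not part of DBT or AMBER.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of a set of 3-D points."""
    total_mass = sum(masses)
    # Centre of mass, one component at a time.
    com = [
        sum(m * r[k] for r, m in zip(coords, masses)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted mean squared distance from the centre of mass.
    rg_squared = sum(
        m * sum((r[k] - com[k]) ** 2 for k in range(3))
        for r, m in zip(coords, masses)
    ) / total_mass
    return math.sqrt(rg_squared)


# Two equal masses one unit either side of the origin.
rg = radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], [1.0, 1.0])
print(rg)
```

For a trajectory, the same function would be applied to each frame and the results averaged.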

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We report on multifunctional devices based on hybrid architectures of CNT arrays and ZnO nanowires. The hybrid structure exhibits excellent high-current Schottky-like behavior, with ZnO as the p-type material and an ideality factor close to the ideal value. Further, the CNT-ZnO hybrid structures can be used as high-current p-type field effect transistors that can deliver currents of the order of milliamperes, and as ultraviolet detectors with controllable current on-off ratio and response time. The p-type nature of ZnO and a possible mechanism for the rectifying characteristics of CNT-ZnO are presented.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We report the synthesis and aggregation behaviour of new water-soluble, bile acid derived tripodal architectures based on a core derived from triphenylphosphine oxide. We employed the well-established copper-catalysed [1,3]-dipolar cycloaddition (CuAAC) for the construction of these tripodal molecules. The aggregation behaviour of these molecules in aqueous media was studied by different analytical methods such as dye solubilisation, dynamic light scattering, NMR and AFM. These molecular architectures also offer an additional advantage in aiding understanding of the influence of the nature of the bile acid backbone and of the configuration at the steroid C-3 position in these architectures; to the best of our knowledge this has not been reported in the literature. The unique gelation properties of the -derivatives were explained through molecular modelling studies and the mechanical behaviour of these gels was studied by rheology experiments.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between the memories of different compute devices. A heterogeneous system with CPUs and multiple GPUs and a distributed-memory cluster are examples of such systems. Past works that try to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that results in the minimum volume for a vast majority of cases. The techniques are applicable to any sequence of affine loop nests and work on top of any choice of loop transformations, parallelization, and computation placement. The data movement code generated minimizes the volume of communication for a particular configuration of these. We use a combination of powerful static analyses relying on the polyhedral compiler framework and the lightweight runtime routines they generate to build a source-to-source transformation tool that automatically generates communication code. We demonstrate that the tool is scalable and leads to substantial gains in efficiency. On a heterogeneous system, the communication volume is reduced by a factor of 11X to 83X over the state of the art, translating into a mean execution time speedup of 1.53X. On a distributed-memory cluster, our scheme reduces the communication volume by a factor of 1.4X to 63.5X over the state of the art, resulting in a mean speedup of 1.55X. In addition, our scheme yields a mean speedup of 2.19X over hand-optimized UPC codes.
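The source of redundant communication can be shown with a deliberately simplified Python sketch. It does not reproduce the dependence partitioning or the polyhedral analysis of the abstract; it assumes a one-dimensional array, explicit index sets, and a hypothetical helper `cells_to_send`, and only contrasts sending everything a device wrote with sending just the cells another device will read.

```python
def cells_to_send(written_by_sender, read_by_receiver):
    """Only cells that the sender wrote AND the receiver reads need to move."""
    return written_by_sender & read_by_receiver


# Hypothetical 1-D example: device A writes the first half of a 100-element
# array, and device B later reads a small window that straddles the boundary.
written_by_a = set(range(0, 50))
read_by_b = set(range(45, 55))

naive_volume = len(written_by_a)  # send everything A wrote
needed = cells_to_send(written_by_a, read_by_b)

print(naive_volume, sorted(needed))
```

Here the naive transfer moves 50 elements, while only the five elements 45 to 49 are actually needed by the receiver.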

Relevance:

20.00% 20.00%

Publisher:

Abstract:

To harvest solar energy more efficiently, novel Ag2S/Bi2WO6 heterojunctions were synthesized by a hydrothermal route. This novel photocatalyst was synthesized by impregnating Ag2S into a Bi2WO6 semiconductor without any surfactants or templates. The as-prepared structures were characterized by multiple techniques such as X-ray diffraction (XRD), X-ray photoelectron spectroscopy (XPS), Brunauer-Emmett-Teller (BET) analysis, scanning electron microscopy (SEM), transmission electron microscopy (TEM), energy dispersive X-ray spectrometry (EDS), UV-vis diffuse reflection spectroscopy (DRS) and photoluminescence (PL). The characterization results suggest mesoporous hierarchical spherical structures with a high surface area and improved photoresponse in the visible spectrum. Compared to bare Bi2WO6, Ag2S/Bi2WO6 exhibited much higher photocatalytic activity towards the degradation of the dye Rhodamine B (RhB). Although silver-based catalysts are easily eroded by photogenerated holes, the Ag2S/Bi2WO6 photocatalyst was found to be highly stable in the cyclic experiments. Based on the results of BET, PL and DRS analysis, two possible reasons have been proposed for the enhanced visible light activity and stability of this novel photocatalyst: (1) broadening of the photoabsorption range and (2) efficient separation of photoinduced charge carriers, which does not allow the photoexcited electrons to accumulate on the conduction band of Ag2S and hence prevents photocorrosion.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We investigate the problem of timing recovery for 2-D magnetic recording (TDMR) channels. We develop a timing error model for the TDMR channel that considers phase and frequency offsets with noise. We propose a 2-D data-aided phase-locked loop (PLL) architecture for tracking variations in the position and movement of the read head in the down-track and cross-track directions, and analyze the convergence of the algorithm under non-separable timing errors. We further develop a 2-D interpolation-based timing recovery scheme that works in conjunction with the 2-D PLL. We quantify the efficiency of our proposed algorithms by simulations over a 2-D magnetic recording channel with timing errors.
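How a PLL tracks both a phase and a frequency offset can be seen in a minimal one-dimensional Python sketch. It is not the 2-D architecture of the abstract: it assumes a noise-free channel, an idealized data-aided detector that observes the timing error directly, and hypothetical loop gains `alpha` and `beta`.

```python
def track_timing(offsets, alpha=0.1, beta=0.01):
    """Second-order loop: return the estimate held before each sample."""
    estimate, freq = 0.0, 0.0
    estimates = []
    for tau in offsets:
        estimates.append(estimate)
        err = tau - estimate            # idealized data-aided timing error
        freq += beta * err              # integral path tracks frequency offset
        estimate += freq + alpha * err  # proportional path corrects phase
    return estimates


# True timing offset: initial phase 0.3 plus a drift of 0.002 per sample.
true_offsets = [0.3 + 0.002 * k for k in range(500)]
est = track_timing(true_offsets)

final_error = abs(true_offsets[-1] - est[-1])
print(final_error)
```

The integral path is what removes the steady-state lag: with `beta = 0` the loop would still follow the drift, but with a constant residual error.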