94 results for Processor architecture


Relevance:

20.00%

Publisher:

Abstract:

The unending quest for performance improvement, coupled with advancements in integrated circuit technology, has led to the development of new architectural paradigms. The speculative multithreaded (SpMT) architecture philosophy relies on aggressive speculative execution for improved performance. However, aggressive speculative execution is a mixed blessing: it improves performance when successful, but adversely affects energy consumption (and performance) through useless computation in the event of mis-speculation. Dynamic instruction criticality information can be usefully applied to control and guide such aggressive speculative execution. In this paper, we present a model of micro-execution for the SpMT architecture that we have developed to determine dynamic instruction criticality. We have also developed two novel techniques that use the criticality information, namely delaying non-critical loads and criticality-based thread prediction, to reduce useless computation and energy consumption. Experimental results showing the break-up of critical instructions and the effectiveness of the proposed techniques in reducing energy consumption are presented in the context of a multiscalar processor that implements the SpMT architecture. Our experiments show 17.7% and 11.6% reductions in dynamic energy for criticality-based thread prediction and the criticality-based delayed load scheme, respectively, while the improvements in dynamic energy-delay product are 13.9% and 5.5%, respectively. (c) 2012 Published by Elsevier B.V.

Relevance:

20.00%

Publisher:

Abstract:

Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving the clock speed, reducing the energy consumption of the logic, and simplifying the design, it introduces extra overhead in the form of inter-cluster communication. This communication happens over long global wires with high load capacitance, which leads to execution delays and significantly higher energy consumption. Inter-cluster communication also introduces many short idle cycles, thereby significantly increasing the overall leakage energy consumption in the functional units. The trend towards miniaturization of devices (and the associated reduction in threshold voltage) makes energy consumption in interconnects and functional units even worse, and limits the usability of clustered architectures in smaller technologies. However, technological advancements now permit the design of interconnects and functional units with varying performance and power modes. In this paper, we propose scheduling algorithms that aggregate the scheduling slack of instructions and the communication slack of data values to exploit the low-power modes of functional units and interconnects. Finally, we present a synergistic combination of these algorithms that simultaneously saves energy in functional units and interconnects to improve the usability of clustered architectures by achieving better overall energy-performance trade-offs. Even with conservative estimates of the contribution of the functional units and interconnects to the overall processor energy consumption, the proposed combined scheme obtains on average 8% and 10% improvement in overall energy-delay product with 3.5% and 2% performance degradation for a 2-clustered and a 4-clustered machine, respectively. We present a detailed experimental evaluation of the proposed schemes. Our test bed uses the Trimaran compiler infrastructure. (C) 2012 Elsevier Inc. All rights reserved.
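
The reported figures can be related with a little arithmetic. The sketch below is not from the paper: it assumes that energy-delay product is simply energy multiplied by execution time, and that "performance degradation" means an equal fractional increase in execution time. Under those assumptions it derives the energy ratio implied by each reported pair. The function name `implied_energy_ratio` is hypothetical.

```python
def implied_energy_ratio(edp_improvement: float, slowdown: float) -> float:
    """Return E_new / E_old implied by an EDP improvement and a slowdown.

    Assumes EDP = energy * delay, so
    EDP_new / EDP_old = (E_new / E_old) * (D_new / D_old).
    Both arguments are fractions (0.08 means 8%).
    """
    edp_ratio = 1.0 - edp_improvement   # EDP_new / EDP_old
    delay_ratio = 1.0 + slowdown        # D_new / D_old (assumed meaning of "degradation")
    return edp_ratio / delay_ratio


# Figures quoted in the abstract: (clusters, EDP improvement, performance degradation)
for clusters, edp_gain, slowdown in [(2, 0.08, 0.035), (4, 0.10, 0.02)]:
    ratio = implied_energy_ratio(edp_gain, slowdown)
    print(f"{clusters}-clustered: implied energy saving {100 * (1 - ratio):.1f}%")
```

Under these assumptions the implied overall energy savings are roughly 11.1% and 11.8% for the 2-clustered and 4-clustered machines.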

Relevance:

20.00%

Publisher:

Abstract:

High-performance video standards use prediction techniques to achieve high picture quality at low bit rates. The type of prediction determines the bit rate and the image quality. Intra prediction achieves high video quality with a significant reduction in bit rate. This paper presents a novel area-optimized architecture for intra prediction in H.264 decoding at HDTV resolution. The architecture has been validated on a Xilinx Virtex-5 FPGA-based platform and achieved a frame rate of 64 fps. The architecture is based on a multi-level memory hierarchy to reduce latency and ensure optimum resource utilization. It removes redundancy by reusing the same functional blocks across different modes. The proposed architecture uses only 13% of the total LUTs available on the Xilinx FPGA XC5VLX50T.
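
For readers unfamiliar with intra prediction, the following is a minimal software sketch of three of the H.264 Intra 4x4 luma modes (vertical, horizontal and DC). It is an illustration only, not the paper's hardware architecture; it assumes that both the top and left neighbouring pixels are available, and the function name `predict_4x4` is hypothetical.

```python
def predict_4x4(mode, top, left):
    """Predict a 4x4 luma block from already-decoded neighbouring pixels.

    top  -- the 4 reconstructed pixels directly above the block
    left -- the 4 reconstructed pixels directly to the left of the block
    Returns the predicted block as a list of 4 rows.
    Assumes all neighbours are available (no picture-edge handling).
    """
    if len(top) != 4 or len(left) != 4:
        raise ValueError("top and left must each hold 4 pixels")
    if mode == "vertical":
        # Every row repeats the pixels above the block.
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        # Every row is filled with the pixel to its left.
        return [[left[row]] * 4 for row in range(4)]
    if mode == "dc":
        # Rounded mean of the 8 neighbours: (sum + 4) >> 3.
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


block = predict_4x4("dc", [10, 20, 30, 40], [50, 60, 70, 80])
print(block[0])  # [45, 45, 45, 45]
```

All three modes read the same eight neighbouring pixels, which gives some intuition for why a decoder can share hardware blocks across modes.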

Relevance:

20.00%

Publisher:

Abstract:

Advances in technology have increased the number of cores and the size of caches present on chip multicore platforms (CMPs). As a result, leakage in on-chip caches has already become a major component of the power consumed by the memory subsystem. We propose to reduce leakage power consumption in a static nonuniform cache architecture (SNUCA) on a tiled CMP by dynamically varying the number of cache slices used and switching off unused cache slices. A cache slice in a tile includes all cache banks present in that tile. Switched-off cache slices are remapped with the communication costs taken into account, so that cache usage is reduced with minimal impact on execution time. This saves leakage power consumption in switched-off L2 cache slices. On average, the remap policy achieves 41% and 49% higher EDP savings compared to static and dynamic NUCA (DNUCA) cache policies on a scalable tiled CMP, respectively.
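
The remapping idea can be pictured with a toy model. The sketch below is an assumption-laden illustration, not the policy evaluated in the paper: it models the tiled CMP as a 2D mesh, uses Manhattan hop distance as a stand-in for communication cost, and redirects each switched-off slice to the nearest slice that is still powered on. The function name `remap_slices` and the tie-breaking rule are hypothetical.

```python
def remap_slices(mesh_width, mesh_height, active):
    """Map every tile's home slice to a powered-on slice.

    active -- collection of (x, y) tiles whose L2 slice stays on.
    Returns a dict {tile: tile_whose_slice_serves_it}.
    Cost model (assumed): Manhattan distance on the mesh; ties are
    broken by the smaller (x, y) coordinate to keep the result deterministic.
    """
    active_sorted = sorted(active)
    if not active_sorted:
        raise ValueError("at least one slice must remain powered on")
    active_set = set(active_sorted)
    mapping = {}
    for y in range(mesh_height):
        for x in range(mesh_width):
            tile = (x, y)
            if tile in active_set:
                mapping[tile] = tile  # slice is on: no remapping needed
            else:
                mapping[tile] = min(
                    active_sorted,
                    key=lambda a: (abs(a[0] - x) + abs(a[1] - y), a),
                )
    return mapping


# 4x1 mesh with only the two end slices powered on.
print(remap_slices(4, 1, [(0, 0), (3, 0)]))
```

In this example the slice at (1, 0) is served by (0, 0) and the slice at (2, 0) by (3, 0), so each request travels a single hop.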

Relevance:

20.00%

Publisher:

Abstract:

We propose a power-scalable digital baseband for a low-IF receiver for IEEE 802.15.4-2006. The digital section's sampling frequency and bit width are used as knobs to reduce power under favorable signal and interference scenarios, thus recovering the design margins introduced to handle worst-case conditions. We propose tuning these knobs based on measurements of the signal and interference levels. We show that in a 0.13 µm CMOS technology, for an adaptive digital baseband section of the receiver designed to meet the 802.15.4 standard specification, the power saving can be up to nearly 85% (0.49 mW against 3.3 mW) in favorable interference and signal conditions.
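
As a quick consistency check on the quoted figures, the saving follows directly from the two power numbers. The helper name `power_saving` is hypothetical.

```python
def power_saving(scaled_mw: float, worst_case_mw: float) -> float:
    """Fractional power saving of the scaled mode relative to the worst-case mode."""
    return 1.0 - scaled_mw / worst_case_mw


# Values quoted in the abstract: 0.49 mW (favorable) against 3.3 mW (worst case).
print(f"{100 * power_saving(0.49, 3.3):.1f}%")  # 85.2%
```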

Relevance:

20.00%

Publisher:

Abstract:

A novel composite architecture consisting of a periodic arrangement of closely spaced spheres of a stiff material embedded in a soft matrix is proposed for extremely high damping and shock absorption capacity. The efficacy of this architecture is demonstrated by compression loading a composite, in which multiple steel balls were stacked upon each other in a polydimethylsiloxane (PDMS) matrix, at a low strain rate of 0.05 s⁻¹ and a very high strain rate of >2400 s⁻¹. The balls slide over each other upon loading, and revert to their original positions when the load is removed. Because this reversible, constrained movement of the balls imposes additional strains on the matrix, the composite absorbs significantly more energy and endures much less permanent damage than monolithic PDMS during both quasi-static and impact loading. During impact loading, the energy absorbed per unit weight by the composite was 8 times larger than that absorbed by monolithic PDMS.

Relevance:

20.00%

Publisher:

Abstract:

In this paper we present the design of "e-SURAKSHAK," a novel cyber-physical health care management system of Wireless Embedded Internet Devices (WEIDs) that sense vital health parameters. The system is capable of sensing body temperature, heart rate and oxygen saturation level, and also allows noninvasive blood pressure (NIBP) measurement. End-to-end internet connectivity is provided by a 6LoWPAN-based wireless network that uses the 802.15.4 radio. A service-oriented architecture (SOA) [1] is implemented to extract meaningful information and present it to the end user in an easy-to-understand form, instead of the raw data made available by the sensors. A central electronic database and health care management software have been developed. Vital health parameters are measured and stored periodically in the database. Further, support for real-time measurement of health parameters is provided through a web-based GUI. The system has been implemented completely and demonstrated with multiple users and multiple WEIDs.

Relevance:

20.00%

Publisher:

Abstract:

Gene expression is the most fundamental biological process, and it is essential for phenotypic variation. It is regulated by various external (environment and evolution) and internal (genetic) factors. The level of gene expression depends on promoter architecture, along with other external factors. The presence of sequence motifs, such as transcription factor binding sites (TFBSs) and the TATA-box, or DNA methylation in vertebrates, has been implicated in the regulation of expression of some genes in eukaryotes, but a large number of genes lack these sequences. On the other hand, several experimental and computational studies have shown that promoter sequences possess some special structural properties, such as low stability, less bendability, low nucleosome occupancy, and more curvature, which are prevalent across all organisms. These structural features may play a role in transcription initiation and regulation of gene expression. We have studied the relationship between the structural features of promoter DNA, promoter directionality and gene expression variability in S. cerevisiae. This relationship has been analyzed for seven different measures of gene expression variability, along with two different regulatory effect measures. We find that a few of the variability measures of gene expression are linked to DNA structural properties, nucleosome occupancy, TATA-box presence, and bidirectionality of promoter regions. Interestingly, gene responsiveness is most closely correlated with DNA structural features and promoter architecture.

Relevance:

20.00%

Publisher:

Abstract:

Many meteorological phenomena occur at different locations simultaneously. These phenomena vary temporally and spatially. It is essential to track these multiple phenomena for accurate weather prediction. Efficient analysis requires high-resolution simulations, which can be conducted by introducing finer-resolution nested simulations (nests) at the locations of these phenomena. Simultaneous tracking of these multiple weather phenomena requires simultaneous execution of the nests on different subsets of the maximum number of processors for the main weather simulation. Dynamic variation in the number of these nests requires efficient processor reallocation strategies. In this paper, we have developed strategies for efficient partitioning and repartitioning of the nests among the processors. As a case study, we consider an application that tracks multiple organized cloud clusters in tropical weather systems. We first present a parallel data analysis algorithm to detect such clouds. We have developed a tree-based hierarchical diffusion method that reallocates processors for the nests such that the redistribution cost is low. We achieve this with a novel tree reorganization approach. We show that our approach exhibits up to 25% lower redistribution cost and 53% fewer hop-bytes than a processor reallocation strategy that does not consider the existing processor allocation.
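
Hop-bytes is a common communication metric: each message's size is weighted by the number of network hops it travels, and the products are summed. The sketch below illustrates the metric only. It assumes a 2D mesh with Manhattan hop counts, which may differ from the network topology used in the paper, and the names `hop_bytes`, `coords` and `messages` are hypothetical.

```python
def hop_bytes(messages, coords):
    """Sum of (message size in bytes) x (hops travelled) over all messages.

    messages -- iterable of (source_rank, destination_rank, size_in_bytes)
    coords   -- dict mapping each rank to its (x, y) position on a 2D mesh
    Hop count is the Manhattan distance between the two positions (assumed).
    """
    total = 0
    for src, dst, nbytes in messages:
        (x1, y1), (x2, y2) = coords[src], coords[dst]
        total += nbytes * (abs(x1 - x2) + abs(y1 - y2))
    return total


coords = {0: (0, 0), 1: (1, 0), 2: (1, 1)}
messages = [(0, 1, 100), (0, 2, 50)]
print(hop_bytes(messages, coords))  # 100*1 + 50*2 = 200
```

A reallocation that keeps data close to where it already resides moves the same bytes over fewer hops, which lowers this figure.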

Relevance:

20.00%

Publisher:

Abstract:

In this paper we present a framework for realizing arbitrary instruction set extensions (IEs) that are identified post-silicon. The proposed framework has two components, viz., an IE synthesis methodology and the architecture of a reconfigurable data-path for the realization of such IEs. The IE synthesis methodology ensures maximal utilization of resources on the reconfigurable data-path. In this context we present the techniques used to realize IEs for applications that demand high throughput or that must process data streams. The reconfigurable hardware, called HyperCell, comprises a reconfigurable execution fabric. The fabric is a collection of interconnected compute units. A typical use case of HyperCell is where it acts as a co-processor with a host and accelerates the execution of IEs that are defined post-silicon. We demonstrate the effectiveness of our approach by evaluating the performance of some well-known integer kernels that are realized as IEs on HyperCell. Our methodology for realizing IEs through HyperCells permits overlapping of potentially all memory transactions with computations. We show significant improvement in performance for streaming applications over general-purpose processor-based solutions, by fully pipelining the data-path. (C) 2014 Elsevier B.V. All rights reserved.

Relevance:

20.00%

Publisher:

Abstract:

This manuscript describes the conformational analysis of 15-membered cyclic tetrapeptides (CTPs) with α3δ architecture, containing sugar amino acids (SAAs) that vary in the stereocenter at the C5 carbon. Conformational analyses of both series, in protected and deprotected forms, were carried out in DMSO-d6 using various NMR techniques, supported by restrained MD calculations. Intriguingly, the α3δ macrocycles were stabilized by both a 10-membered β-turn and a seven-membered γ-turn, fused within the same macrocycle. The presence of fused sub-structures within a 15-membered macrocycle is rare. Also, the stereocenter variation at C5 did not affect the fused turn structures, and both series exhibited similar conformations. The design is highly advantageous because fused reverse-turn structures occur in a cyclic structure with a minimal macrocycle size, and this can be applied to develop suitable pharmacophores in the drug development process. (C) 2014 Elsevier Ltd. All rights reserved.

Relevance:

20.00%

Publisher:

Abstract:

The correctness of a hard real-time system depends on its ability to meet all its deadlines. Existing real-time systems use either a pure real-time scheduler or a real-time scheduler embedded as a real-time scheduling class in the scheduler of an operating system (OS). Existing implementations of schedulers in multicore systems that support real-time and non-real-time tasks permit the execution of non-real-time tasks on all the cores with priorities lower than those of real-time tasks, but interrupts and softirqs associated with these non-real-time tasks can execute on any core with priorities higher than those of real-time tasks. As a result, the execution overhead of real-time tasks is quite large in these systems, which, in turn, affects their runtime. So that hard real-time tasks can be executed in such systems with minimal interference from other Linux tasks, we propose, in this paper, an integrated scheduler architecture, called SchedISA, which aims to considerably reduce the execution overhead of real-time tasks in these systems. To test the efficacy of the proposed scheduler, we implemented the partitioned earliest deadline first (P-EDF) scheduling algorithm in SchedISA on Linux kernel version 3.8, and conducted experiments on an Intel Core i7 processor with eight logical cores. We compared the execution overhead of real-time tasks in the above implementation of SchedISA with that in SCHED_DEADLINE's P-EDF implementation, which concurrently executes real-time and non-real-time tasks in Linux OS on all the cores. The experimental results show that the execution overhead of real-time tasks in the above implementation of SchedISA is considerably less than that in SCHED_DEADLINE. We believe that, with further refinement of SchedISA, the execution overhead of real-time tasks in SchedISA can be reduced to a predictable maximum, making it suitable for scheduling hard real-time tasks without affecting the CPU share of Linux tasks.
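
Partitioned EDF first assigns each task to one core and then runs earliest-deadline-first independently on every core. The sketch below shows only the partitioning step, using a generic first-fit-decreasing heuristic; it is not SchedISA or the SCHED_DEADLINE code. It assumes periodic tasks with deadlines equal to their periods, for which EDF on a single core is schedulable when total utilization does not exceed 1. The function name `partition_first_fit` and the task set are hypothetical.

```python
from fractions import Fraction


def partition_first_fit(tasks, num_cores):
    """Assign tasks to cores so that per-core utilization stays <= 1.

    tasks -- dict {name: (wcet, period)} with integer time units and
             implicit deadlines (deadline == period, an assumption).
    Returns a dict {name: core_index}.
    Raises ValueError if the heuristic cannot place a task.
    """
    load = [Fraction(0)] * num_cores
    assignment = {}
    # Place the heaviest tasks first (first-fit decreasing).
    ordered = sorted(
        tasks.items(),
        key=lambda item: Fraction(item[1][0], item[1][1]),
        reverse=True,
    )
    for name, (wcet, period) in ordered:
        utilization = Fraction(wcet, period)
        for core in range(num_cores):
            # Uniprocessor EDF bound for implicit deadlines: U <= 1.
            if load[core] + utilization <= 1:
                load[core] += utilization
                assignment[name] = core
                break
        else:
            raise ValueError(f"task {name!r} does not fit on any core")
    return assignment


tasks = {"a": (6, 10), "b": (5, 10), "c": (4, 10), "d": (3, 10)}
print(partition_first_fit(tasks, 2))  # {'a': 0, 'b': 1, 'c': 0, 'd': 1}
```

Exact fractions are used instead of floats so that a core loaded to exactly 100% is not rejected because of rounding.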

Relevance:

20.00%

Publisher:

Abstract:

The steady-state negative supercoiling of eubacterial genomes is maintained by the action of DNA topoisomerases. Topoisomerase distribution varies among different species of mycobacteria. While Mycobacterium tuberculosis (Mtb) contains a single type I (TopoI) and a single type II (Gyrase) enzyme, Mycobacterium smegmatis (Msm) and other members harbour additional relaxases. TopoI is essential for Mtb survival. However, the necessity of TopoI or other relaxases in Msm has not been investigated. To understand the importance of TopoI for the growth, physiology and gene expression of Msm, we have developed a conditional knock-down strain of TopoI in Msm. The TopoI-depleted strain exhibited extremely slow growth and drastic changes in phenotypic characteristics. The cessation of growth indicates that the enzyme is essential for the organism in spite of the additional DNA relaxation enzymes in the cell. Notably, the imbalance in TopoI level led to altered expression of topology modulatory proteins, resulting in a diffused nucleoid architecture. Proteomic and transcript analysis of the mutant indicated reduced expression of the genes involved in central metabolic pathways and core DNA transaction processes. RNA polymerase (RNAP) distribution on the transcription units was affected in the TopoI-depleted cells, suggesting global alteration in transcription. The study thus highlights the essential requirement of TopoI in the maintenance of cellular phenotype, growth characteristics and gene expression in mycobacteria. A decrease in TopoI level led to altered RNAP occupancy and impaired transcription elongation, causing severe downstream effects.

Relevance:

20.00%

Publisher:

Abstract:

We propose an architecture for dramatically enhancing the stress-bearing and energy absorption capacities of a polymer-based composite. Different weight fractions of iron oxide nanoparticles (NPs) are mixed into a poly(dimethylsiloxane) (PDMS) matrix either uniformly or in several vertically aligned cylindrical pillars. These composites are compressed up to a strain of 60% at a strain rate of 0.01 s⁻¹, after which they are fully unloaded at the same rate. The load-bearing and energy absorption capacities of the composite with a uniform distribution of NPs increase by approximately 50% upon addition of 5 wt% of NPs; however, these properties decrease monotonically with further addition of NPs, so much so that the load-bearing capacity of the composite becomes 1/6th that of PDMS upon addition of 20 wt% of NPs. In contrast, the stress at a strain of 60% and the energy absorption capacity of the composites with the pillar configuration increase monotonically with the weight fraction of NPs in the pillars, with the load-bearing capacity becoming 1.5 times that of PDMS when the pillars consist of 20 wt% of NPs. In situ mechanical testing of composites with pillars reveals outward bending of the pillars, in which the pillars and the PDMS between two pillars located along a radius are significantly compressed. Reasoning based on the effects of compressive hydrostatic stress and the shape of the fillers is developed to explain the observed anomalous strengthening of the composite with pillar architecture.