10 results for MICROARCHITECTURE
at Indian Institute of Science - Bangalore - India
Abstract:
This paper presents a power, latency and throughput trade-off study on NoCs by varying microarchitectural (e.g. pipelining) and circuit-level (e.g. frequency and voltage) parameters. We change pipelining depth, operating frequency and supply voltage for 3 example NoCs - 16 node 2D Torus, Tree network and Reduced 2D Torus. We use an in-house NoC exploration framework capable of topology generation and comparison using parameterized models of routers and links developed in SystemC. The framework utilizes interconnect power and delay models from a low-level modelling tool called Intacte [1]. We find that increased pipelining can actually reduce latency. We also find that there exists an optimal degree of pipelining which is the most energy efficient in terms of minimizing energy-delay product.
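The energy-delay product criterion named in this abstract can be illustrated with a small sketch. The energy and latency figures below are invented purely for illustration and are not results from the paper; `energy_delay_product` and `best_depth` are hypothetical helper names.

```python
# Hypothetical data: pipeline depth -> (energy per transfer in nJ, latency in ns).
# These numbers are made up for illustration; they are NOT taken from the paper.
candidates = {
    1: (10.0, 8.0),
    2: (11.0, 6.0),
    4: (13.0, 5.0),
    8: (18.0, 4.5),
}


def energy_delay_product(energy, delay):
    """Energy-delay product: lower is better."""
    return energy * delay


def best_depth(points):
    """Return the pipeline depth whose (energy, delay) pair minimizes EDP."""
    return min(points, key=lambda depth: energy_delay_product(*points[depth]))


if __name__ == "__main__":
    for depth, (energy, delay) in sorted(candidates.items()):
        print(depth, energy_delay_product(energy, delay))
    print("depth with minimum EDP:", best_depth(candidates))
```

With this made-up data, deeper pipelining first lowers the product (latency falls faster than energy rises) and then raises it again, which is the shape of trade-off the abstract describes.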
Abstract:
This paper proposes the use of empirical modeling techniques for building microarchitecture-sensitive models for compiler optimizations. The models we build relate program performance to settings of compiler optimization flags, associated heuristics and key microarchitectural parameters. Unlike traditional analytical modeling methods, this relationship is learned entirely from data obtained by measuring performance at a small number of carefully selected compiler/microarchitecture configurations. We evaluate three different learning techniques in this context, viz. linear regression, adaptive regression splines and radial basis function networks. We use the generated models to a) predict program performance at arbitrary compiler/microarchitecture configurations, b) quantify the significance of complex interactions between optimizations and the microarchitecture, and c) efficiently search for 'optimal' settings of optimization flags and heuristics for any given microarchitectural configuration. Our evaluation using benchmarks from the SPEC CPU2000 suite suggests that accurate models (< 5% average error in prediction) can be generated using a reasonable number of simulations. We also find that using compiler settings prescribed by a model-based search can improve program performance by as much as 19% (with an average of 9.5%) over highly optimized binaries.
Abstract:
We conducted the present study to investigate the therapeutic effects of the antiresorptive agent zoledronic acid (ZOL), alone and in combination with alfacalcidol (ALF), in a rat model of postmenopausal osteoporosis. Female Wistar rats were ovariectomized (OVX) or sham-operated at 3 months of age. Twelve weeks post surgery, rats were randomized into six groups: (1) sham + vehicle, (2) OVX + vehicle, (3) OVX + ZOL (100 μg/kg, i.v. single dose), (4) OVX + ZOL (50 μg/kg, i.v. single dose), (5) OVX + ALF (0.5 μg/kg, oral gavage daily) and (6) OVX + ZOL (50 μg/kg, i.v. single dose) + ALF (0.5 μg/kg, oral gavage daily) for 12 weeks. After treatment, we evaluated the mechanical properties of the lumbar vertebra and femoral mid-shaft. Femurs were also tested for bone density, porosity and trabecular micro-architecture. Biochemical markers in serum and urine were also determined. With respect to improvement in the mechanical strength of the lumbar spine and the femoral mid-shaft, the combination treatment of ZOL and ALF was more effective than each administered as a monotherapy. Moreover, combination therapy using ZOL and ALF preserved the trabecular micro-architecture and cortical bone porosity. Furthermore, the combination treatment of ZOL and ALF corrected the decrease in serum calcium and the increase in serum alkaline phosphatase and tartrate-resistant acid phosphatase levels better than single-drug therapy using ZOL or ALF in OVX rats. In addition, the combination treatment of ZOL and ALF corrected the increase in urine calcium, phosphorus and creatinine levels better than single-drug therapy using ZOL or ALF in OVX rats. These data suggest that the combination treatment of ZOL and ALF has a therapeutic advantage over each monotherapy for the treatment of osteoporosis.
Abstract:
CMPs enable simultaneous execution of multiple applications on the same platform, with the applications sharing cache resources. Diversity in the cache access patterns of these simultaneously executing applications can potentially trigger inter-application interference, leading to cache pollution. Whereas a large cache can ameliorate this problem, the larger power consumption that comes with increasing cache size, amplified at sub-100nm technologies, makes this solution prohibitive. In this paper, to address the issues relating to power-aware performance of caches, we propose a caching structure that provides the following: 1. Definition of application-specific cache partitions as an aggregation of caching units (molecules). The parameters of each molecule, namely size, associativity and line size, are chosen so that the power consumed by it and its access time are optimal for the given technology. 2. Application-specific resizing of cache partitions with variable and adaptive associativity per cache line, way size and variable line size. 3. A replacement policy that is transparent to the partition in terms of size, heterogeneity in associativity and line size. Through simulation studies we establish the superiority of the molecular cache (a cache built as an aggregation of molecules), which offers a 29% power advantage over an equivalently performing traditional cache.
Abstract:
This paper describes the design of a power-efficient microarchitecture for transient fault detection in chip multiprocessors (CMPs). We introduce a new per-core dynamic voltage and frequency scaling (DVFS) algorithm for our architecture that significantly reduces power dissipation for redundant execution with a minimal performance overhead. Using cycle-accurate simulation combined with a simple first-order power model, we estimate that our architecture reduces dynamic power dissipation in the redundant core by a mean value of 79% and a maximum of 85%, with an associated mean performance overhead of only 1.2%.
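The abstract does not spell out its first-order power model. As an assumption, the sketch below uses the textbook CMOS relation in which dynamic power is proportional to activity × capacitance × V² × f, to show why lowering voltage and frequency together cuts dynamic power so sharply. The function names and the operating point are hypothetical and do not come from the paper.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """Textbook first-order CMOS dynamic power: activity * C * V^2 * f.

    Assumed model for illustration; the paper's exact model is not given here.
    """
    return activity * c_eff * vdd ** 2 * freq


def relative_saving(v_scale, f_scale):
    """Fraction of dynamic power saved when Vdd and f are scaled by the given
    factors relative to nominal (capacitance and activity cancel out)."""
    return 1.0 - (v_scale ** 2) * f_scale


if __name__ == "__main__":
    # Hypothetical operating point: 80% of nominal voltage, 50% of nominal frequency.
    print(relative_saving(0.8, 0.5))
```

Because voltage enters quadratically, a modest voltage reduction combined with a frequency reduction removes most of the dynamic power under this model.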
Abstract:
The inherent temporal locality in memory accesses is filtered out by the L1 cache. As a consequence, an L2 cache with LRU replacement incurs significantly more misses than the optimal replacement policy (OPT). We propose to narrow this gap through a novel replacement strategy that mimics the replacement decisions of OPT. The L2 cache is logically divided into two components, a Shepherd Cache (SC) with a simple FIFO replacement and a Main Cache (MC) with an emulation of optimal replacement. The SC plays the dual role of caching lines and guiding the replacement decisions in MC. Our proposed organization can cover 40% of the gap between OPT and LRU for a 2MB cache, resulting in a 7% overall speedup. Comparison with the dynamic insertion policy, a victim buffer, a V-Way cache and an LRU-based fully associative cache demonstrates that our scheme performs better than all these strategies.
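The gap between LRU and OPT that this abstract targets can be seen on a toy trace. The sketch below is not the Shepherd Cache: it only compares plain LRU against Belady's offline OPT on a small fully associative cache, and the function names and trace are illustrative assumptions.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses for a fully associative cache with LRU replacement."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)  # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[addr] = True
    return misses


def opt_misses(trace, capacity):
    """Count misses for Belady's OPT: evict the line reused farthest in the future."""
    cache = set()
    misses = 0
    for i, addr in enumerate(trace):
        if addr in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            def next_use(line):
                # Index of the next reference to `line`, or infinity if none.
                try:
                    return trace.index(line, i + 1)
                except ValueError:
                    return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(addr)
    return misses


if __name__ == "__main__":
    trace = [1, 2, 3, 1, 2, 3, 1, 2, 3]  # cyclic pattern larger than the cache
    print("LRU misses:", lru_misses(trace, 2))
    print("OPT misses:", opt_misses(trace, 2))
```

On a cyclic pattern slightly larger than the cache, LRU misses on every access while OPT keeps part of the working set resident. OPT needs future knowledge, which is why a practical scheme can only emulate it.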
Abstract:
Designing and optimizing high-performance microprocessors is an increasingly difficult task due to the size and complexity of the processor design space, the high cost of detailed simulation and the several constraints that a processor design must satisfy. In this paper, we propose the use of empirical non-linear modeling techniques to assist processor architects in making design decisions and resolving complex trade-offs. We propose a procedure for building accurate non-linear models that consists of the following steps: (i) selection of a small set of representative design points spread across the processor design space using Latin hypercube sampling, (ii) obtaining performance measures at the selected design points using detailed simulation, (iii) building non-linear models for performance using the function approximation capabilities of radial basis function networks, and (iv) validating the models using an independently and randomly generated set of design points. We evaluate our model building procedure by constructing non-linear performance models for programs from the SPEC CPU2000 benchmark suite with a microarchitectural design space that consists of 9 key parameters. Our results show that the models, built using a relatively small number of simulations, achieve high prediction accuracy (only 2.8% error in CPI estimates on average) across a large processor design space. Our models can potentially replace detailed simulation for common tasks such as the analysis of key microarchitectural trends or searches for optimal processor design points.
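Step (i) of the procedure, Latin hypercube sampling, can be sketched as follows. This is a minimal standard-library version over the unit hypercube, not the authors' implementation; the function name, the seed and the sample count are assumptions, and mapping each coordinate onto a real microarchitectural parameter range is left out.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Draw n_samples points in [0, 1)^n_dims so that, in every dimension,
    each of the n_samples equal-width strata contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One random point inside each stratum [k/n, (k+1)/n) ...
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # ... then shuffle so strata are paired randomly across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


if __name__ == "__main__":
    # 9 dimensions, matching the 9-parameter design space in the abstract;
    # the sample count of 5 is arbitrary.
    for point in latin_hypercube(5, 9):
        print(["%.2f" % x for x in point])
```

Compared with plain random sampling, this guarantees that every parameter's range is covered evenly even with few design points, which matters when each point costs a detailed simulation.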
Abstract:
Currently, beta-adrenergic receptor blockers are under investigation as potential drugs for a preventive or therapeutic effect in osteoporosis. However, there are no published data comparing beta-blockers with well-accepted agents for the treatment of osteoporosis. To address this question, we compared the effects of propranolol with well-accepted treatments like zoledronic acid and alfacalcidol in an animal model of postmenopausal osteoporosis. Five days after ovariectomy, 36 ovariectomized (OVX) rats were divided into 6 equal groups, randomized to treatments zoledronic acid (100 μg/kg, intravenous single dose); alfacalcidol (0.5 μg/kg, oral gavage daily); propranolol (0.1 mg/kg, subcutaneously 5 days per week) for 12 weeks. Untreated OVX and sham OVX were used as controls. At the end of treatment, serum calcium and alkaline phosphatase were assayed. Femurs were removed and tested for bone density, bone porosity, bone mechanical properties and trabecular micro-architecture. Propranolol showed a significant decrease in alkaline phosphatase levels and bone porosity in comparison to OVX control. Moreover, propranolol significantly improved bone density and bone mechanical properties and inhibited the deterioration of trabecular microarchitecture when compared with OVX control. The osteoprotective effect of propranolol was comparable with that of zoledronic acid and alfacalcidol. Based on this comparative study, the results strongly suggest that propranolol can be a candidate therapeutic drug for the management of postmenopausal osteoporosis.
Abstract:
We investigated the potential of using a novel zoledronic acid (ZOL)-hydroxyapatite (HA) nanoparticle-based drug formulation in a rat model of postmenopausal osteoporosis. By a classical adsorption method, nanoparticles of HA loaded with ZOL (HNLZ) drug formulation with a size range of 100-130 nm were prepared. 56 female Wistar rats were ovariectomized (OVX) or sham-operated at 3 months of age. Twelve weeks post surgery, rats were randomized into seven groups and treated with various doses of HNLZ (100, 50 and 25 μg/kg, intravenous single dose), ZOL (100 μg/kg, intravenous single dose) and HA nanoparticle (100 μg/kg, intravenous single dose). Untreated OVX and sham OVX served as controls. After a three-month treatment period, we evaluated the mechanical properties of the lumbar vertebra and femoral mid-shaft. Femurs were also tested for trabecular microarchitecture. Sensitive biochemical markers of bone formation and bone resorption in serum were also determined. With respect to improvement in the mechanical strength of the lumbar spine and the femoral mid-shaft, the therapy with HNLZ drug formulation was more effective than ZOL therapy in OVX rats. Moreover, HNLZ drug therapy preserved the trabecular microarchitecture better than ZOL therapy in OVX rats. Furthermore, the HNLZ drug formulation corrected the increase in serum levels of bone-specific alkaline phosphatase, procollagen type I N-terminal propeptide, osteocalcin, tartrate-resistant acid phosphatase 5b and C-telopeptide of type 1 collagen better than ZOL therapy in OVX rats. The results strongly suggest that the novel HNLZ drug formulation appears to be a more effective approach for treating severe osteoporosis in humans.
Abstract:
In this paper, we present Bi-Modal Cache - a flexible stacked DRAM cache organization which simultaneously achieves several objectives: (i) improved cache hit ratio, (ii) moving the tag storage overhead to DRAM, (iii) lower cache hit latency than tags-in-SRAM, and (iv) reduction in off-chip bandwidth wastage. The Bi-Modal Cache addresses the miss rate versus off-chip bandwidth dilemma by organizing the data in a bi-modal fashion - blocks with high spatial locality are organized as large blocks and those with little spatial locality as small blocks. By adaptively selecting the right granularity of storage for individual blocks at run-time, the proposed DRAM cache organization is able to make judicious use of the available DRAM cache capacity as well as reduce the off-chip memory bandwidth consumption. The Bi-Modal Cache improves cache hit latency despite moving the metadata to DRAM by means of a small SRAM-based Way Locator. Further, by leveraging the tremendous internal bandwidth and capacity that stacked DRAM organizations provide, the Bi-Modal Cache enables efficient concurrent accesses to tags and data to reduce hit time. Through detailed simulations, we demonstrate that the Bi-Modal Cache achieves an overall performance improvement (in terms of Average Normalized Turnaround Time (ANTT)) of 10.8%, 13.8% and 14.0% in 4-core, 8-core and 16-core workloads respectively.