22 results for Architecture and state
Abstract:
Advances in technology have increased the number of cores and the size of caches on chip multicore platforms (CMPs). As a result, leakage power consumption of on-chip caches has become a major power-consuming component of the memory subsystem. We propose to reduce leakage power consumption in a static nonuniform cache architecture (SNUCA) on a tiled CMP by dynamically varying the number of cache slices used and switching off unused cache slices. A cache slice in a tile includes all cache banks present in that tile. Switched-off cache slices are remapped considering the communication costs to reduce cache usage with minimal impact on execution time. This saves leakage power consumption in switched-off L2 cache slices. On average, the remap policy achieves 41% and 49% higher EDP savings compared to static and dynamic NUCA (DNUCA) cache policies on a scalable tiled CMP, respectively.
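The remapping idea can be illustrated with a minimal sketch. It assumes a 2D mesh of tiles and uses hop (Manhattan) distance as a stand-in for communication cost; the function name `remap_slices` and the nearest-active-slice rule are hypothetical illustrations, not the policy evaluated in the paper.

```python
from typing import Dict, Set, Tuple

Tile = Tuple[int, int]  # (x, y) coordinate of a tile on the mesh


def remap_slices(grid_w: int, grid_h: int, active: Set[Tile]) -> Dict[Tile, Tile]:
    """Map every tile's home slice to an active (powered-on) slice.

    Assumption: communication cost is approximated by the number of mesh
    hops between tiles. Ties are broken by sorted tile order so the
    result is deterministic.
    """
    if not active:
        raise ValueError("at least one cache slice must stay switched on")
    candidates = sorted(active)
    mapping: Dict[Tile, Tile] = {}
    for y in range(grid_h):
        for x in range(grid_w):
            tile = (x, y)
            if tile in active:
                # An active slice keeps serving its own addresses.
                mapping[tile] = tile
            else:
                # A switched-off slice is redirected to the closest active one.
                mapping[tile] = min(
                    candidates,
                    key=lambda s: abs(s[0] - x) + abs(s[1] - y),
                )
    return mapping
```

For example, on a 4x1 mesh with only the two end slices powered on, each inner tile is redirected to the end slice one hop away.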
Abstract:
Multi-GPU machines are being increasingly used in high-performance computing. Each GPU in such a machine has its own memory and does not share the address space with either the host CPU or other GPUs. Hence, applications utilizing multiple GPUs have to manually allocate and manage data on each GPU. Existing works that propose to automate data allocations for GPUs have limitations and inefficiencies in terms of allocation sizes, exploiting reuse, transfer costs, and scalability. We propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines. We call it the Bounding-Box-based Memory Manager (BBMM). At runtime, BBMM can perform standard set operations such as union, intersection, and difference, and can find subset and superset relations, on hyperrectangular regions of array data (bounding boxes). It uses these operations along with some compiler assistance to identify, allocate, and manage data required by applications in terms of disjoint bounding boxes. This allows it to (1) allocate exactly or nearly as much data as is required by computations running on each GPU, (2) efficiently track buffer allocations and hence maximize data reuse across tiles and minimize data transfer overhead, and (3) as a result, maximize utilization of the combined memory on multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling schemes, whether static or dynamic. Experiments run on a four-GPU machine with various scientific programs showed that BBMM reduces data allocations on each GPU by up to 75% compared to current allocation schemes, yields performance of at least 88% of manually written code, and allows excellent weak scaling.
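The set operations on bounding boxes mentioned above can be sketched as follows. This is a minimal illustration assuming inclusive integer bounds per array dimension; the `Box` class and the function names are hypothetical and are not BBMM's actual interface.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Hyperrectangular array region with inclusive bounds per dimension."""
    lo: Tuple[int, ...]
    hi: Tuple[int, ...]

    def size(self) -> int:
        """Number of array elements covered by the box."""
        n = 1
        for l, h in zip(self.lo, self.hi):
            n *= h - l + 1
        return n


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of a and b, or None if they are disjoint."""
    lo = tuple(max(x, y) for x, y in zip(a.lo, b.lo))
    hi = tuple(min(x, y) for x, y in zip(a.hi, b.hi))
    if any(l > h for l, h in zip(lo, hi)):
        return None
    return Box(lo, hi)


def is_subset(a: Box, b: Box) -> bool:
    """True if every element of a also lies in b."""
    return all(bl <= al and ah <= bh
               for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi))


def difference(a: Box, b: Box) -> List[Box]:
    """Return a minus b as a list of pairwise disjoint boxes."""
    inter = intersect(a, b)
    if inter is None:
        return [a]
    pieces: List[Box] = []
    lo, hi = list(a.lo), list(a.hi)  # remaining region, shrunk per dimension
    for d in range(len(lo)):
        if lo[d] < inter.lo[d]:
            # Slab of the remaining region lying below the overlap in dim d.
            piece_hi = hi.copy()
            piece_hi[d] = inter.lo[d] - 1
            pieces.append(Box(tuple(lo), tuple(piece_hi)))
            lo[d] = inter.lo[d]
        if hi[d] > inter.hi[d]:
            # Slab of the remaining region lying above the overlap in dim d.
            piece_lo = lo.copy()
            piece_lo[d] = inter.hi[d] + 1
            pieces.append(Box(tuple(piece_lo), tuple(hi)))
            hi[d] = inter.hi[d]
    return pieces
```

In this sketch, a buffer manager could allocate only the pieces returned by `difference(needed, already_resident)` and reuse the overlap, which is the kind of reuse tracking the abstract describes.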
Abstract:
Gray water treatment and reuse is an immediate option to counter the upcoming water shortages in various parts of the world, especially urban areas. Anaerobic treatment of gray water in houses is an alternative low-cost, low-energy and low-sludge-generating option that can meet this challenge. Typical problems of fluctuating VFA, low pH and sludge washout at low loading rates with gray water feedstock were overcome in two-chambered anaerobic biofilm reactors using natural fibers as the biofilm support. The long-term performance of natural-fiber-based biofilms at moderate and low organic loading rates (OLR) has been examined. Biofilms raised on natural fibers (coir, ridge-gourd) performed similarly to those on synthetic media (PVC, polyethylene) at lower OLR when operated in pulse-fed mode without effluent recirculation, and achieved 80-90% COD removal at an HRT of 2 d, showing a small variability during start-up. Confocal microscopy of the biofilms on natural fibers indicated thinner biofilms, dense cell architecture and low extracellular polymeric substances (EPS) compared to synthetic supports, and this is believed to be a key factor in high performance at low OLR and with low-strength gray water. Natural fibers are thus shown to be an effective biofilm support that withstands the fluctuating characteristics of domestic gray water. © 2013 The Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.
Abstract:
We conducted the present study to investigate the therapeutic effects of the antiresorptive agent zoledronic acid (ZOL), alone and in combination with alfacalcidol (ALF), in a rat model of postmenopausal osteoporosis. Female Wistar rats were ovariectomized (OVX) or sham-operated at 3 months of age. Twelve weeks post surgery, rats were randomized into six groups: (1) sham + vehicle, (2) OVX + vehicle, (3) OVX + ZOL (100 μg/kg, i.v. single dose), (4) OVX + ZOL (50 μg/kg, i.v. single dose), (5) OVX + ALF (0.5 μg/kg, oral gavage daily) and (6) OVX + ZOL (50 μg/kg, i.v. single dose) + ALF (0.5 μg/kg, oral gavage daily) for 12 weeks. After treatment, we evaluated the mechanical properties of the lumbar vertebra and femoral mid-shaft. Femurs were also tested for bone density, porosity and trabecular micro-architecture. Biochemical markers in serum and urine were also determined. With respect to improvement in the mechanical strength of the lumbar spine and the femoral mid-shaft, the combination treatment of ZOL and ALF was more effective than each administered as a monotherapy. Moreover, combination therapy using ZOL and ALF preserved the trabecular micro-architecture and cortical bone porosity. Furthermore, the combination treatment of ZOL and ALF corrected the decrease in serum calcium and the increase in serum alkaline phosphatase and tartrate-resistant acid phosphatase levels better than single-drug therapy using ZOL or ALF in OVX rats. In addition, the combination treatment of ZOL and ALF corrected the increase in urine calcium, phosphorus and creatinine levels better than single-drug therapy using ZOL or ALF in OVX rats. These data suggest that the combination treatment of ZOL and ALF has a therapeutic advantage over each monotherapy for the treatment of osteoporosis.
Abstract:
The Lattice-Boltzmann method (LBM), a promising new particle-based simulation technique for complex and multiscale fluid flows, has seen tremendous adoption in recent years in computational fluid dynamics. Even with a state-of-the-art LBM solver such as Palabos, a user still has to manually write the program using library-supplied primitives. We propose an automated code generator for a class of LBM computations with the objective of achieving high performance on modern architectures. Few studies have looked at time tiling for LBM codes. We exploit a key similarity between stencils and LBM to enable polyhedral optimizations and, in turn, time tiling for LBM. We also characterize the performance of LBM with the Roofline performance model. Experimental results for standard LBM simulations like Lid Driven Cavity, Flow Past Cylinder, and Poiseuille Flow show that our scheme consistently outperforms Palabos, on average by up to 3x, while running on 16 cores of an Intel Xeon (Sandy Bridge). We also obtain an improvement of 2.47x on the SPEC LBM benchmark.
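For readers unfamiliar with the Roofline model mentioned above, the following minimal sketch shows its core bound: attainable performance is the smaller of the machine's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity. The function names are illustrative, and any machine or kernel numbers passed in are placeholders rather than measurements from the paper.

```python
def roofline_gflops(peak_gflops: float,
                    bandwidth_gb_per_s: float,
                    flops_per_byte: float) -> float:
    """Upper bound on performance (GFLOP/s) under the Roofline model.

    A kernel is memory bound when bandwidth * intensity is below the
    compute peak, and compute bound otherwise.
    """
    return min(peak_gflops, bandwidth_gb_per_s * flops_per_byte)


def ridge_point(peak_gflops: float, bandwidth_gb_per_s: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which the two bounds meet."""
    return peak_gflops / bandwidth_gb_per_s
```

Optimizations such as time tiling aim to raise the effective arithmetic intensity by reusing data in cache, which moves a memory-bound kernel to the right along the bandwidth slope of this model.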
Abstract:
Human transthyretin (hTTR) is a multifunctional protein that is involved in several neurodegenerative diseases. Besides the transportation of thyroxine and vitamin A, it is also involved in the proteolysis of apolipoprotein A1 and Aβ peptide. Extensive analyses of 32 high-resolution X-ray and neutron diffraction structures of hTTR followed by molecular-dynamics simulation studies using a set of 15 selected structures affirmed the presence of 44 conserved water molecules in its dimeric structure. They are found to play several important roles in the structure and function of the protein. Eight water molecules stabilize the dimeric structure through an extensive hydrogen-bonding network. The absence of some of these water molecules in highly acidic conditions (pH ≤ 4.0) severely affects the interfacial hydrogen-bond network, which may destabilize the native tetrameric structure, leading to its dissociation. Three pairs of conserved water molecules contribute to maintaining the geometry of the ligand-binding cavities. Some other water molecules control the orientation and dynamics of different structural elements of hTTR. This systematic study of the location, absence, networking and interactions of the conserved water molecules may shed some light on various structural and functional aspects of the protein. The present study may also provide some rational clues about the conserved water-mediated architecture and stability of hTTR.
Abstract:
Graph algorithms have been shown to possess enough parallelism to keep several computing resources busy, even hundreds of cores on a GPU. Unfortunately, tuning their implementation for efficient execution on a particular hardware configuration of heterogeneous systems consisting of multicore CPUs and GPUs is challenging, time consuming, and error prone. To address these issues, we propose a domain-specific language (DSL), Falcon, for implementing graph algorithms that (i) abstracts the hardware, (ii) provides constructs to write explicitly parallel programs at a higher level, and (iii) can work with general algorithms that may change the graph structure (morph algorithms). We illustrate the usage of our DSL to implement local computation algorithms (that do not change the graph structure) and morph algorithms such as Delaunay mesh refinement, survey propagation, and dynamic SSSP on GPU and multicore CPUs. Using a set of benchmark graphs, we illustrate that the generated code performs close to the state-of-the-art hand-tuned implementations.
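As an illustration of a local computation algorithm (one that does not change the graph structure), the sketch below computes single-source shortest paths with a sequential worklist of edge relaxations. It is plain Python, not Falcon syntax; the function name `sssp` and the edge-list input format are assumptions made for this example, and edge weights are assumed to be non-negative.

```python
import math
from collections import deque
from typing import List, Tuple


def sssp(num_nodes: int,
         edges: List[Tuple[int, int, float]],
         source: int) -> List[float]:
    """Shortest distances from source over directed (u, v, weight) edges."""
    # Build adjacency lists once; the graph itself is never modified.
    adj: List[List[Tuple[int, float]]] = [[] for _ in range(num_nodes)]
    for u, v, w in edges:
        adj[u].append((v, w))

    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    worklist = deque([source])
    queued = [False] * num_nodes
    queued[source] = True

    while worklist:
        u = worklist.popleft()
        queued[u] = False
        for v, w in adj[u]:
            # Relax edge (u, v): keep the shorter of the two known paths.
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if not queued[v]:
                    worklist.append(v)
                    queued[v] = True
    return dist
```

Each relaxation touches only one node's distance, which is why algorithms of this shape expose the per-node and per-edge parallelism that the abstract refers to; a parallel version would additionally need atomic updates of `dist`.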