48 results for Conservative pact


Relevance:

10.00%

Publisher:

Abstract:

Packet forwarding is a memory-intensive application requiring multiple accesses through a trie structure. The efficiency of a cache for this application critically depends on the placement function to reduce conflict misses. Traditional placement functions use a one-level mapping that naively partitions trie nodes into cache sets. However, as a significant percentage of trie nodes are not useful, these schemes suffer from a non-uniform distribution of useful nodes to sets, which in turn results in increased conflict misses. Newer organizations such as variable-associativity caches achieve flexibility in placement at the expense of increased hit latency, which makes them unsuitable for L1 caches. We propose a novel two-level mapping framework that retains the hit latency of one-level mapping yet incurs fewer conflict misses. This is achieved by introducing a second-level mapping which reorganizes the nodes in the naive initial partitions into refined partitions with a near-uniform distribution of nodes. Further, as this remapping is accomplished by simply adapting the index bits to a given routing table, the hit latency is not affected. We propose three new schemes which result in up to 16% reduction in the number of misses and 13% speedup in memory access time. In comparison, an XOR-based placement scheme, known to perform extremely well for general-purpose architectures, obtains up to 2% speedup in memory access time.
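
A minimal sketch of the two-level idea, using a direct-mapped cache and an explicit remap table; the set count, the round-robin refinement, and the remap-table representation are illustrative assumptions, since the paper derives its refined mapping by adapting index bits to the routing table rather than building a table.

```python
from collections import defaultdict

SETS = 256  # number of cache sets (assumed for illustration)

def one_level_set(addr: int) -> int:
    """Naive first-level mapping: low-order address bits pick the set."""
    return addr & (SETS - 1)

def build_second_level(useful_nodes):
    """Spread the useful nodes of each initial partition evenly over the sets,
    approximating the refined near-uniform partitions described above."""
    groups = defaultdict(list)
    for addr in sorted(useful_nodes):
        groups[one_level_set(addr)].append(addr)
    remap, next_set = {}, 0
    for _, members in sorted(groups.items()):
        for addr in members:
            remap[addr] = next_set % SETS   # round-robin refinement (assumption)
            next_set += 1
    return remap

def two_level_set(addr: int, remap) -> int:
    """Useful nodes use the refined mapping; others fall back to level one."""
    return remap.get(addr, one_level_set(addr))
```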

Relevance:

10.00%

Publisher:

Abstract:

Frequency-domain scheduling and rate adaptation enable next generation wireless cellular systems such as Long Term Evolution (LTE) to achieve significantly higher downlink throughput. LTE assigns subcarriers in chunks, called physical resource blocks (PRBs), to users to reduce control signaling overhead. To reduce the enormous feedback overhead, the channel quality indicator (CQI) report that is used to feed back channel state information is averaged over a subband, which, in turn, is a group of multiple PRBs. In this paper, we develop closed-form expressions for the throughput achieved by the subband-level CQI feedback mechanism of LTE. We show that the coarse frequency resolution of the CQI incurs a significant loss in throughput and limits the multi-user gains achievable by the system. We then show that the performance can be improved by means of an offset mechanism that effectively makes the users more conservative in reporting their CQI.
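
A toy numerical sketch of the subband-averaging-plus-offset idea; real LTE quantizes the report to a discrete CQI index via block-error-rate targets, so the dB-domain averaging and the offset value here are simplifying assumptions.

```python
def subband_report(per_prb_sinr_db, offset_db=0.0):
    """Average the per-PRB channel quality over a subband, then back off by
    offset_db; a larger offset makes the user's report more conservative."""
    avg = sum(per_prb_sinr_db) / len(per_prb_sinr_db)
    return avg - offset_db

# Example: a 4-PRB subband with uneven channel quality
print(subband_report([12.0, 7.5, 9.0, 10.5]))                 # plain average
print(subband_report([12.0, 7.5, 9.0, 10.5], offset_db=1.5))  # conservative report
```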

Relevance:

10.00%

Publisher:

Abstract:

Avoidance of collision between moving objects in a 3-D environment is fundamental to the problem of planning safe trajectories in dynamic environments. This problem appears in several diverse fields including robotics, air vehicles, underwater vehicles, and computer animation. Most of the existing literature on collision prediction assumes objects to be modelled as spheres. While the conservative spherical bounding box is valid in many cases, in others, where objects operate in close proximity, a less conservative approach that models objects using analytic surfaces closely mimicking their shape is more desirable. In this paper, a collision cone approach (previously developed only for objects moving on a plane) is used to determine collision between objects moving in 3-D space whose shapes can be modelled by general quadric surfaces. Exact collision conditions for such quadric surfaces are obtained and used to derive dynamic-inversion-based avoidance strategies.
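
For reference, a general quadric surface (ellipsoid, cylinder, cone, etc.) can be written in the form below; the paper's exact collision conditions and the collision-cone construction for such surfaces are not reproduced here.

```latex
% General quadric surface in 3-D, with a symmetric coefficient matrix A.
\[
  \mathbf{x}^{\mathsf T} A\,\mathbf{x} \;+\; \mathbf{b}^{\mathsf T}\mathbf{x} \;+\; c \;=\; 0,
  \qquad \mathbf{x}\in\mathbb{R}^{3},\quad A = A^{\mathsf T}.
\]
```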

Relevance:

10.00%

Publisher:

Abstract:

Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving the clock speed, reducing the energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires with high load capacitance, which leads to execution delays and significantly higher energy consumption. Inter-cluster communication also introduces many short idle cycles, thereby significantly increasing the overall leakage energy consumption in the functional units. The trend towards miniaturization of devices (and the associated reduction in threshold voltage) makes energy consumption in interconnects and functional units even worse, and limits the usability of clustered architectures in smaller technologies. However, technological advancements now permit the design of interconnects and functional units with varying performance and power modes. In this paper, we propose scheduling algorithms that aggregate the scheduling slack of instructions and the communication slack of data values to exploit the low-power modes of functional units and interconnects. Finally, we present a synergistic combination of these algorithms that simultaneously saves energy in functional units and interconnects, improving the usability of clustered architectures by achieving better overall energy-performance trade-offs. Even with conservative estimates of the contribution of the functional units and interconnects to the overall processor energy consumption, the proposed combined scheme obtains, on average, 8% and 10% improvement in overall energy-delay product with 3.5% and 2% performance degradation for a 2-clustered and a 4-clustered machine, respectively. We present a detailed experimental evaluation of the proposed schemes. Our test bed uses the Trimaran compiler infrastructure. (C) 2012 Elsevier Inc. All rights reserved.
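
A conceptual sketch of using an instruction's scheduling slack to pick a functional-unit power mode; the mode names, the extra latency of the slow mode, and the threshold rule are assumptions for illustration, not values or policies from the paper.

```python
SLOW_EXTRA_CYCLES = 2  # extra latency of the low-power mode (assumed)

def choose_fu_mode(earliest_cycle: int, latest_cycle: int) -> str:
    """If the instruction's slack can absorb the slower mode without
    stretching the schedule, run it on a low-power functional unit."""
    slack = latest_cycle - earliest_cycle
    return "low-power" if slack >= SLOW_EXTRA_CYCLES else "fast"

print(choose_fu_mode(10, 14))  # slack 4 -> 'low-power'
print(choose_fu_mode(10, 11))  # slack 1 -> 'fast'
```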

Relevance:

10.00%

Publisher:

Abstract:

Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs. In order for STMs to be adopted widely for performance critical software, understanding and improving the cache performance of applications running on STM becomes increasingly crucial, as the performance gap between processor and memory continues to grow. In this paper, we present the most detailed experimental evaluation to date of the cache behavior of STM applications and quantify the impact of the different STM factors on the cache misses experienced by the applications. We find that STMs are not cache friendly, with the data cache stall cycles contributing to more than 50% of the execution cycles in a majority of the benchmarks. We find that, on average, misses occurring inside the STM account for 62% of the total data cache miss latency cycles experienced by the applications, and that the cache performance is impacted adversely due to certain inherent characteristics of the STM itself. The above observations motivate us to propose a set of specific compiler transformations targeted at making the STMs cache friendly. We find that STM's fine-grained and application-unaware locking is a major contributor to its poor cache behavior. Hence we propose selective Lock Data co-location (LDC) and Redundant Lock Access Removal (RLAR) to address the lock access misses. We find that even transactions that are completely disjoint-access parallel suffer from costly coherence misses caused by the centralized global time stamp updates, and hence we propose the Selective Per-Partition Time Stamp (SPTS) transformation to address this. We show that our transformations are effective in improving the cache behavior of STM applications by reducing the data cache miss latency by 20.15% to 37.14% and improving execution time by 18.32% to 33.12% in five of the eight STAMP applications.
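
A conceptual sketch of the per-partition timestamp idea: disjoint-access parallel transactions advance only their partitions' clocks instead of contending on one global word. The partition count and the address-to-partition mapping are assumptions made purely for illustration.

```python
NUM_PARTITIONS = 16  # assumed

class PartitionedClock:
    def __init__(self):
        self.stamps = [0] * NUM_PARTITIONS

    def partition(self, addr: int) -> int:
        return addr % NUM_PARTITIONS  # hypothetical address-to-partition map

    def commit(self, write_set):
        """Advance only the clocks of partitions the transaction wrote,
        avoiding coherence traffic on a single centralized timestamp."""
        for addr in write_set:
            self.stamps[self.partition(addr)] += 1

clock = PartitionedClock()
clock.commit({0x1000, 0x2040})   # touches two partition clocks, not a global one
```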

Relevance:

10.00%

Publisher:

Abstract:

We examine the large-order behavior of a recently proposed renormalization-group-improved expansion of the Adler function in perturbative QCD, which sums in an analytically closed form the leading logarithms accessible from renormalization-group invariance. The expansion is first written as an effective series in powers of the one-loop coupling, and its leading singularities in the Borel plane are shown to be identical to those of the standard "contour-improved" expansion. Applying the technique of conformal mappings for the analytic continuation in the Borel plane, we define a class of improved expansions, which implement both the renormalization-group invariance and the knowledge about the large-order behavior of the series. Detailed numerical studies of specific models for the Adler function indicate that the new expansions have remarkable convergence properties up to high orders. Using these expansions for the determination of the strong coupling from the hadronic width of the tau lepton we obtain, with a conservative estimate of the uncertainty due to the nonperturbative corrections, $\alpha_s(M_\tau^2) = 0.3189^{+0.0173}_{-0.0151}$, which translates to $\alpha_s(M_Z^2) = 0.1184^{+0.0021}_{-0.0018}$. DOI: 10.1103/PhysRevD.87.014008
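
A commonly used mapping of this kind, assuming the nearest Borel-plane singularities of the Adler function sit at $u=-1$ (ultraviolet) and $u=2$ (infrared), sends the doubly cut $u$-plane onto the unit disk $|w|<1$; the series is then re-expanded in powers of $w$. The specific variants studied in the paper may differ.

```latex
% Conformal mapping of the Borel plane (cuts along u <= -1 and u >= 2)
% onto the unit disk; the singularities are pushed to the disk's boundary.
\[
  w(u) \;=\; \frac{\sqrt{1+u}\;-\;\sqrt{1-u/2}}{\sqrt{1+u}\;+\;\sqrt{1-u/2}} .
\]
```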

Relevance:

10.00%

Publisher:

Abstract:

In this paper, based on the temporal and spatial locality characteristics of memory accesses in multicores, we propose a reorganization of the existing single large row buffer in a DRAM bank into multiple smaller row buffers. The proposed configuration helps improve the row hit rates and also brings down the energy required for row activations. The major contribution of this work is proposing such a reorganization without requiring any significant changes to the existing widely accepted DRAM specifications. Our proposed reorganization improves performance by 35.8%, 14.5%, and 21.6% in quad-, eight-, and sixteen-core workloads, respectively, along with 42%, 28%, and 31% reductions in DRAM energy. Additionally, we introduce a Need Based Allocation scheme for buffer management that shows additional performance improvement.
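
A toy model of a bank whose single row buffer is split into several smaller segment buffers, so that segments of different rows can stay open and only a fraction of a row is activated on a miss. The segment count, column geometry, and indexing policy are illustrative assumptions, not the paper's configuration.

```python
COLS_PER_ROW = 1024               # columns per DRAM row (assumed)
NUM_SEGMENTS = 4                  # smaller row buffers per bank (assumed)
SEG_COLS = COLS_PER_ROW // NUM_SEGMENTS

class Bank:
    def __init__(self):
        # Row currently held by each small buffer (None = nothing open).
        self.open_row = [None] * NUM_SEGMENTS

    def access(self, row: int, col: int) -> str:
        seg = col // SEG_COLS                 # small buffer covering this column
        if self.open_row[seg] == row:
            return "row hit"                  # no activation energy spent
        self.open_row[seg] = row              # activate only this row segment
        return "row miss"

bank = Bank()
print(bank.access(7, 10), bank.access(7, 20), bank.access(9, 300))
```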

Relevance:

10.00%

Publisher:

Abstract:

Phototaxis is a directed swimming response dependent upon the light intensity sensed by micro-organisms. Positive (negative) phototaxis denotes motion directed towards (away from) the source of light. Using the phototaxis model of Ghorai, Panda, and Hill ["Bioconvection in a suspension of isotropically scattering phototactic algae," Phys. Fluids 22, 071901 (2010)], we investigate two-dimensional phototactic bioconvection in an absorbing and isotropically scattering suspension in the nonlinear regime. The suspension is confined by a rigid bottom boundary and stress-free top and lateral boundaries. The governing equations for phototactic bioconvection consist of the Navier-Stokes equations for an incompressible fluid coupled with a conservation equation for the micro-organisms and the radiative transfer equation for light transport. The governing system is solved efficiently using a semi-implicit, second-order accurate, conservative finite-difference method. The radiative transfer equation is solved by the finite volume method using a suitable step scheme. The resulting bioconvective patterns differ qualitatively from those found by Ghorai and Hill ["Penetrative phototactic bioconvection," Phys. Fluids 17, 074101 (2005)] at a higher critical wavelength due to the effects of scattering. The solutions show a transition from steady state to periodic oscillations as the governing parameters are varied. Also, for some governing parameters at a higher scattering albedo, the mean swimming orientation profile reveals the accumulation of micro-organisms in two horizontal layers at two different depths. (C) 2013 AIP Publishing LLC.
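
A schematic form of the micro-organism conservation equation used in models of this kind, with cell concentration $n$, fluid velocity $\mathbf{u}$, swimming speed $W_c$, mean swimming orientation $\langle\mathbf{p}\rangle$, and diffusivity tensor $\mathbf{D}$; the exact closure and nondimensionalization in the cited model may differ.

```latex
% Cell conservation: advection by the flow, directed swimming along the mean
% orientation (set here by the light field), and diffusive spreading.
\[
  \frac{\partial n}{\partial t}
  \;+\; \nabla\!\cdot\!\Big[\, n\,\mathbf{u} \;+\; n\,W_c\,\langle\mathbf{p}\rangle
  \;-\; \mathbf{D}\,\nabla n \,\Big] \;=\; 0 .
\]
```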

Relevance:

10.00%

Publisher:

Abstract:

Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently, which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to identify potential stale accesses and uses a runtime to initiate transfers as necessary. This allows us to avoid the redundant transfers that are exhibited by all other existing automatic memory management proposals. We integrate our automatic memory manager into the X10 compiler and runtime, and find that it not only results in smaller and simpler programs, but also eliminates redundant memory transfers. Tested on eight programs ported from the Rodinia benchmark suite, it achieves (i) a 1.06x speedup over hand-tuned manual memory management, and (ii) a 1.29x speedup over another recently proposed compiler-runtime automatic memory management system. Compared to other existing runtime-only and compiler-only proposals, it also transfers 2.2x to 13.3x less data on average.
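
A minimal sketch of the runtime side of such a coherence scheme: track which device last wrote each array and copy only when the other side would otherwise read a stale copy. The class, method names, and transfer callback are hypothetical placeholders, not the X10 implementation.

```python
class ArrayCoherence:
    def __init__(self):
        self.last_writer = {}   # array id -> "cpu" or "gpu"

    def before_access(self, array_id, device, is_write, copy_fn):
        """Call before a kernel launch (device="gpu") or host access ("cpu")."""
        owner = self.last_writer.get(array_id)
        if owner is not None and owner != device:
            copy_fn(array_id, src=owner, dst=device)   # transfer only if stale
        if is_write:
            self.last_writer[array_id] = device

coh = ArrayCoherence()
coh.before_access("A", "gpu", is_write=True,  copy_fn=lambda *a, **k: None)
coh.before_access("A", "cpu", is_write=False,
                  copy_fn=lambda a, src, dst: print(f"copy {a}: {src} -> {dst}"))
```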

Relevance:

10.00%

Publisher:

Abstract:

Restriction enzyme KpnI is an HNH superfamily endonuclease requiring divalent metal ions for DNA cleavage but not for binding. The active site of KpnI can accommodate metal ions of different atomic radii for DNA cleavage. Although Mg2+ at concentrations higher than 500 μM mediates promiscuous activity, Ca2+ suppresses the promiscuity and induces high cleavage fidelity. Here, we report that a conservative mutation of the metal-coordinating residue D148 to Glu eliminates the Ca2+-mediated cleavage while imparting high cleavage fidelity with Mg2+. The high cleavage fidelity of the mutant D148E is achieved through better discrimination of the target site at the binding and cleavage steps. Biochemical experiments and molecular dynamics simulations suggest that the mutation inhibits Ca2+-mediated cleavage activity by altering the geometry of the Ca2+-bound HNH active site. Although the D148E mutation reduces the specific activity of the enzyme, we identified a suppressor mutation that increases the turnover rate to restore the specific activity of the high-fidelity mutant to the wild-type level. Our results show that active-site plasticity in coordinating different metal ions is related to KpnI's promiscuous activity, and that tinkering with the metal ion coordination is a plausible way to reduce the promiscuous activity of metalloenzymes.

Relevance:

10.00%

Publisher:

Abstract:

There have been attempts at obtaining robust guidance laws to ensure zero miss distance (ZMD) for interceptors with parametric uncertainties. All these laws require the plant to be of minimum phase type to enable the overall guidance loop transfer function to satisfy strict positive realness (SPR). The SPR property implies absolute stability of the closed-loop system, and has been shown in the literature to lead to ZMD because it avoids saturation of lateral acceleration. In these works, higher order interceptors are reduced to lower order equivalent models for which control laws are designed to ensure ZMD. However, it has also been shown that when the original system with right half plane (RHP) zeros is considered, the resulting miss distances, using such strategies, can be quite high. In this paper, an alternative approach using the circle criterion establishes the conditions for absolute stability of the guidance loop and relaxes the conservative nature of some earlier results arising from the assumption of infinite engagement time. Further, a feedforward scheme in conjunction with a lead-lag compensator is used as one control strategy, while a generalized sampled hold function is used as a second strategy, to shift the RHP transmission zeros, thereby achieving ZMD. It is observed that merely shifting the RHP zero(s) to the left half plane reduces miss distances significantly even when no additional controllers are used to ensure SPR conditions.
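
For reference, a standard statement of the circle criterion in the setting assumed here (a linear element $G(s)$ with no poles in the open right half plane in feedback with a memoryless nonlinearity $\phi$, sector bounds $0 < k_1 < k_2$): if $\phi$ satisfies the sector condition below and the Nyquist plot of $G(j\omega)$ neither enters nor encircles the critical disk $D(k_1,k_2)$, the loop is absolutely stable. The paper's contribution lies in applying this machinery to the specific guidance loop, which is not reproduced here.

```latex
% Sector condition on the nonlinearity, and the critical disk whose diameter
% is the real-axis segment [-1/k_1, -1/k_2].
\[
  k_1\,\sigma^{2} \;\le\; \sigma\,\phi(\sigma) \;\le\; k_2\,\sigma^{2}
  \qquad \forall\,\sigma \in \mathbb{R},
\]
\[
  D(k_1,k_2) \;=\; \left\{\, z \in \mathbb{C} \;:\;
  \left| z + \tfrac{1}{2}\!\left(\tfrac{1}{k_1}+\tfrac{1}{k_2}\right) \right|
  \;\le\; \tfrac{1}{2}\!\left(\tfrac{1}{k_1}-\tfrac{1}{k_2}\right) \right\}.
\]
```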

Relevance:

10.00%

Publisher:

Abstract:

Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between memories of different compute devices. Examples of such systems include a heterogeneous system with CPUs and multiple GPUs, and a distributed-memory cluster. Past works that try to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that results in the minimum volume for a vast majority of cases. The techniques are applicable to any sequence of affine loop nests and work on top of any choice of loop transformations, parallelization, and computation placement. The data movement code generated minimizes the volume of communication for a particular configuration of these. We use a combination of powerful static analyses relying on the polyhedral compiler framework and lightweight runtime routines they generate to build a source-to-source transformation tool that automatically generates communication code. We demonstrate that the tool is scalable and leads to substantial gains in efficiency. On a heterogeneous system, the communication volume is reduced by a factor of 11X to 83X over the state-of-the-art, translating into a mean execution time speedup of 1.53X. On a distributed-memory cluster, our scheme reduces the communication volume by a factor of 1.4X to 63.5X over the state-of-the-art, resulting in a mean speedup of 1.55X. In addition, our scheme yields a mean speedup of 2.19X over hand-optimized UPC codes.
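
A schematic of the decision such a scheme ultimately makes at runtime: move only those values whose latest writer is a different device than the one that reads them next. The plain Python sets stand in for the dependence-partition sets that the polyhedral analysis computes at compile time; all names here are illustrative.

```python
def required_transfers(last_writer, reads_next_phase):
    """last_writer: data element -> device that produced its latest value.
       reads_next_phase: device -> set of elements it will read next."""
    transfers = []
    for device, reads in reads_next_phase.items():
        for elem in reads:
            owner = last_writer.get(elem)
            if owner is not None and owner != device:
                transfers.append((elem, owner, device))   # move only stale reads
    return transfers

last_writer = {("A", 0): "gpu0", ("A", 1): "gpu0", ("A", 2): "cpu"}
reads_next  = {"cpu": {("A", 0)}, "gpu0": {("A", 2)}, "gpu1": set()}
print(required_transfers(last_writer, reads_next))   # exactly two elements move
```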

Relevance:

10.00%

Publisher:

Abstract:

Precise pointer analysis is a problem of interest to both the compiler and the program verification community. Flow-sensitivity is an important dimension of pointer analysis that affects the precision of the final result computed. Scaling flow-sensitive pointer analysis to millions of lines of code is a major challenge. Recently, staged flow-sensitive pointer analysis has been proposed, which exploits a sparse representation of program code created by staged analysis. In this paper we formulate the staged flow-sensitive pointer analysis as a graph-rewriting problem. Graph rewriting has already been used for flow-insensitive analysis. However, formulating flow-sensitive pointer analysis as a graph-rewriting problem adds additional challenges due to the nature of flow-sensitivity. We implement our parallel algorithm using Intel Threading Building Blocks and demonstrate considerable scaling (up to 2.6x) for 8 threads on a set of 10 benchmarks. Compared to the sequential implementation of staged flow-sensitive analysis, a single-threaded execution of our implementation performs better on 8 of the 10 benchmarks.
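
A sequential worklist sketch of fixed-point propagation over a sparse def-use graph, the kind of structure staged flow-sensitive analysis operates on; it elides the load/store transfer functions and the parallel Threading Building Blocks scheduling described above, and the tiny graph in the example is purely illustrative.

```python
from collections import defaultdict, deque

def propagate(def_use_edges, initial_pts):
    """def_use_edges: node -> list of successor nodes along def-use chains.
       initial_pts: node -> initial points-to set. Iterates to a fixed point,
       forwarding facts only along edges where something new arrives."""
    pts = defaultdict(set, {n: set(s) for n, s in initial_pts.items()})
    work = deque(initial_pts)
    while work:
        n = work.popleft()
        for succ in def_use_edges.get(n, ()):
            if not pts[n] <= pts[succ]:      # new facts flow along this edge
                pts[succ] |= pts[n]
                work.append(succ)
    return pts

edges = {"n1": ["n2"], "n2": ["n3"]}
print(dict(propagate(edges, {"n1": {"obj_a"}})))
```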

Relevance:

10.00%

Publisher:

Abstract:

The estimation of water and solute transit times in catchments is crucial for predicting the response of hydrosystems to external forcings (climatic or anthropogenic). The hydrogeochemical signatures of tracers (either natural or anthropogenic) in streams have been widely used to estimate transit times in catchments, as they integrate the various processes at stake. However, most of these tracers are well suited for catchments with mean transit times lower than about 4-5 years. Since the second half of the 20th century, the intensification of agriculture has led to a general increase of the nitrogen load in rivers. As nitrate is mainly transported by groundwater in agricultural catchments, this signal can be used to estimate transit times greater than several years, even though nitrate is not a conservative tracer. Conceptual hydrological models can be used to estimate catchment transit times provided their consistency is demonstrated, based on their ability to simulate the stream chemical signatures at various time scales and catchment internal processes such as N storage in groundwater. The objective of this study was to assess whether a conceptual lumped model was able to simulate the observed patterns of nitrogen concentration at various time scales, from seasonal to pluriannual, and thus whether it was suitable for estimating the nitrogen transit times in headwater catchments. A conceptual lumped model, representing shallow groundwater flow as two parallel linear stores with double porosity and riparian processes by a constant nitrogen removal function, was applied to two paired agricultural catchments which belong to the Research Observatory ORE AgrHys. The Generalized Likelihood Uncertainty Estimation (GLUE) approach was used to estimate parameter values and uncertainties. The model performance was assessed on (i) its ability to simulate the contrasted patterns of stream flow and stream nitrate concentrations at seasonal and inter-annual time scales, (ii) its ability to simulate the patterns observed in groundwater at the same temporal scales, and (iii) the consistency of long-term simulations using the calibrated model with the general pattern of the nitrate concentration increase in the region since the beginning of the intensification of agriculture in the 1960s. The simulated nitrate transit times were found to be more sensitive to climate variability than to parameter uncertainty, and average values were found to be consistent with results from other studies in the same region involving modeling and groundwater dating. This study shows that a simple model can be used to simulate the main dynamics of nitrogen in an intensively polluted catchment and then be used to estimate the transit times of these pollutants in the system, which is crucial to guide the design and assessment of mitigation plans. (C) 2015 Elsevier B.V. All rights reserved.
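
A minimal sketch of the model's skeleton under stated assumptions: two parallel linear stores for water and nitrate, a fixed split of recharge between them, and a constant riparian removal fraction applied to the nitrate export. All parameter values and the daily time-stepping are illustrative, not the calibrated configuration of the study.

```python
def step(state, recharge_mm, recharge_conc,
         k_fast=0.05, k_slow=0.005, split=0.7, riparian_removal=0.2):
    """One daily step. state = (S_fast, S_slow, N_fast, N_slow): water (mm)
    and nitrate mass held in the fast (shallow) and slow (deep) stores."""
    S1, S2, N1, N2 = state
    S1 += split * recharge_mm
    N1 += split * recharge_mm * recharge_conc
    S2 += (1 - split) * recharge_mm
    N2 += (1 - split) * recharge_mm * recharge_conc
    Q1, Q2 = k_fast * S1, k_slow * S2          # linear-store water outflows
    F1, F2 = k_fast * N1, k_slow * N2          # proportional nitrate export
    stream_flow = Q1 + Q2
    stream_nitrate = (1 - riparian_removal) * (F1 + F2)   # constant removal
    return (S1 - Q1, S2 - Q2, N1 - F1, N2 - F2), stream_flow, stream_nitrate

state = (100.0, 500.0, 50.0, 250.0)
state, q, n = step(state, recharge_mm=5.0, recharge_conc=10.0)
print(round(q, 2), round(n, 2))
```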