Biblioteca Digital

Energy efficiency is an essential requirement for all contemporary computing systems. We thus need tools to measure the energy consumption of computing systems and to understand how workloads affect it. Significant recent research effort has targeted direct power measurements on production computing systems using on-board sensors or external instruments. These direct methods have in turn guided studies of software techniques to reduce energy consumption via workload allocation and scaling. Unfortunately, direct energy measurements are hampered by the low power sampling frequency of power sensors. The coarse granularity of power sensing limits our understanding of how power is allocated in systems and our ability to optimize energy efficiency via workload allocation.
We present ALEA, a tool to measure power and energy consumption at the granularity of basic blocks, using a probabilistic approach. ALEA provides fine-grained energy profiling via statistical sampling, which overcomes the limitations of power sensing instruments. Compared to state-of-the-art energy measurement tools, ALEA provides finer granularity without sacrificing accuracy. ALEA achieves low-overhead energy measurements with mean error rates between 1.4% and 3.5% in 14 sequential and parallel benchmarks tested on both Intel and ARM platforms. The sampling method caps execution time overhead at approximately 1%. ALEA is thus suitable for online energy monitoring and optimization. Finally, ALEA is a user-space tool with a portable, machine-independent sampling method. We demonstrate two use cases of ALEA, where we reduce the energy consumption of a k-means computational kernel by 37% and an ocean modelling code by 33%, compared to high-performance execution baselines, by varying the power optimization strategy between basic blocks.

Storage of sediment-associated nutrients and contaminants in river channel and floodplain systems

Relevance: 90.00%

Abstract:

Samples of fine-grained channel bed sediment and overbank floodplain deposits were collected along the main channels of the Rivers Aire (and its main tributary, the River Calder) and Swale, in Yorkshire, UK, in order to investigate downstream changes in the storage and deposition of heavy metals (Cr, Cu, Pb, Zn), total P and the sum of selected PCB congeners, and to estimate the total storage of these contaminants within the main channels and floodplains of these river systems. Downstream trends in the contaminant content of the <63 μm fraction of channel bed and floodplain sediment in the study rivers are controlled mainly by the location of the main sources of the contaminants, which varies between rivers. In the Rivers Aire and Calder, the contaminant content of the <63 μm fraction of channel bed and floodplain sediment generally increases in a downstream direction, reflecting the location of the main urban and industrialized areas in the middle and lower parts of the basin. In the River Swale, the concentrations of most of the contaminants examined are approximately constant along the length of the river, due to the relatively unpolluted nature of this river. However, the Pb and Zn content of fine channel bed sediment decreases downstream, due to the location of historic metal mines in the headwaters of this river, and the effect of downstream dilution with uncontaminated sediment. The magnitude and spatial variation of contaminant storage and deposition on channel beds and floodplains are also controlled by the amount of <63 μm sediment stored on the channel bed and deposited on the floodplain during overbank events. Consequently, contaminant deposition and storage are strongly influenced by the surface area of the floodplain and channel bed. Contaminant storage on the channel beds of the study rivers is, therefore, generally greatest in the middle and lower reaches of the rivers, since channel width increases downstream. 
Comparisons of the estimates of total storage of specific contaminants on the channel beds of the main channel systems of the study rivers with the annual contaminant flux at the catchment outlets indicate that channel storage represents <3% of the outlet flux and is, therefore, of limited importance in regulating that flux. Similar comparisons between the annual deposition flux of specific contaminants to the floodplains of the study rivers and the annual contaminant flux at the catchment outlet emphasise the potential importance of floodplain deposition as a conveyance loss. In the case of the River Aire, the floodplain deposition flux is equivalent to between ca. 2% (PCBs) and 36% (Pb) of the outlet flux. With the exception of PCBs, for which the value is ≅0, the equivalent values for the River Swale range between 18% (P) and 95% (Pb). The study emphasises that knowledge of the fine-grained sediment delivery system operating in a river basin is an essential prerequisite for understanding the transport and storage of sediment-associated contaminants in river systems, and that conveyance losses associated with floodplain deposition exert an important control on downstream contaminant fluxes and the fate of such contaminants. © 2003 Elsevier Science Ltd. All rights reserved.

Late-Pleistocene palaeoclimate and glacial activity recorded from lake sediments in the Eastern Alps

Relevance: 90.00%

Abstract:

Greenland ice core data show that the last glaciation in the Northern Hemisphere was characterized by relatively short and rapid warming-cooling cycles. While the Last Glacial Maximum (LGM) and the following Late Glacial are well documented in the Eastern Alps, continuous and well-dated records of the time period preceding the LGM are only known from stalagmites. Although most of the sediment that filled the Alpine valleys prior to the LGM was eroded, thick successions have been locally preserved as terraces along the flanks of large longitudinal valleys. The Inn valley in Tyrol (Austria) offers the most striking examples of Pleistocene terraces in the Eastern Alps. A large number of drill cores provides the opportunity to study these sediments in great detail for the first time. Our study focuses on the river terrace of Unterangerberg near Wörgl, where LGM gravel and till were deposited on top of (glacio)lacustrine sediments. The cores comprise mostly silty material, ranging from organic-rich to organic-poor and dropstone-rich beds. A diamictic layer classified as basal till is present at the bottom of the lake sediments. Radiocarbon ages of plant macroremains from the lake sequences indicate deposition between ~40 and >50 cal. ka BP. Luminescence ages obtained from fine-grain polymineral (4-11 μm) samples suggest an age of the lake deposits between ~40 and 60 ka and are consistent with the radiocarbon dates. Sedimentological analyses indicate that sedimentation in these palaeolakes was driven by local processes, but also by climatically induced changes in nearby glacier activity. These observations strongly hint at a significant ice advance in the Eastern Alps during the early last glacial and subsequent mild interstadial conditions, supporting a local coniferous forest vegetation.

Abundance and diversity of sedimentary bacterial communities in a coastal productive setting in the Western Irish Sea

Relevance: 90.00%

Abstract:

The bacterial community composition and biomass abundance from a depositional mud belt in the western Irish Sea and regional sands were investigated by phospholipid ester-linked fatty acid profiling, denaturing gradient gel electrophoresis and barcoded pyrosequencing of 16S rRNA genes. The study area varied by water depth (12-111 m), organic carbon content (0.09-1.57% TOC), grain size, hydrographic regime (well-mixed vs. stratified), and water column phytodetrital input (represented by algal polyunsaturated PLFA). The relative abundance of bacterial-derived PLFA (sum of methyl-branched, cyclopropyl and odd-carbon number PLFA) was positively correlated with fine-grained sediment, and was highest in the depositional mud belt. A strong association between bacterial biomass and eukaryote primary production was suggested based on observed positive correlations with total nitrogen and algal polyunsaturated fatty acids. In addition, 16S rRNA genes affiliated to the classes Clostridia and Flavobacteria represented a major proportion of total 16S rRNA gene sequences. This suggests that benthic bacterial communities are also important degraders of phytodetrital organic matter and closely coupled to water column productivity in the western Irish Sea.

Germanium bonding to Al2O3

Relevance: 80.00%

Abstract:

Germanium has been bonded to both single-crystal Al2O3 (sapphire) and fine-grain Al2O3. A germanium-to-sapphire bonding energy of 3 J/m2 has been measured after a 200 °C bond anneal. Micro voids formed at the germanium/sapphire interface can be removed by employing an interfacial layer of silicon dioxide on either surface. Patterning the sapphire into a grid pattern prior to bonding creates an escape path for trapped gas or moisture, allowing micro-void-free direct bonding to be achieved. Modifying the surface of the fine-grain Al2O3 with a polycrystalline silicon deposition and polish creates a surface with an rms roughness of 1.5 nm (measured over a 250 μm2 area), suitable for bonding. Techniques employed in the germanium-sapphire bonding can then be used in the bonding of fine-grain Al2O3 to germanium. © The Electrochemical Society.

Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs

Relevance: 80.00%

Abstract:

Per-core scratchpad memories (or local stores) allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. We have designed cache-integrated network interfaces, appropriate for scalable multicores, that combine the best of two worlds, the flexibility of caches and the efficiency of scratchpad memories: on-chip SRAM is configurably shared among caching, scratchpad, and virtualized network interface (NI) functions. This paper presents our architecture, which provides local and remote scratchpad access to either individual words or multiword blocks through RDMA copy. Furthermore, we introduce event responses, a technique that enables software-configurable communication and synchronization primitives. We present three event response mechanisms that expose NI functionality to software: multiword transfer initiation, completion notifications for software-selected sets of arbitrary-size transfers, and multi-party synchronization queues. We implemented these mechanisms in a four-core FPGA prototype, and measure the logic overhead over a cache-only design for basic NI functionality to be less than 20%. We also evaluate the on-chip communication performance on the prototype, as well as the performance of synchronization functions with simulation of CMPs with up to 128 cores. We demonstrate efficient synchronization, low-overhead communication, and amortized-overhead bulk transfers, which allow parallelization gains for fine-grain tasks and efficient exploitation of the hardware bandwidth.

High performance VLSI architecture for Wave Digital Filtering

Relevance: 80.00%

Abstract:

The application of fine grain pipelining techniques in the design of high performance Wave Digital Filters (WDFs) is described. It is shown that significant increases in the sampling rate of bit parallel circuits can be achieved using most significant bit (msb) first arithmetic. A novel VLSI architecture for implementing two-port adaptor circuits is described which embodies these ideas. The circuit in question is highly regular, uses msb first arithmetic and is implemented using simple carry-save adders. © 1992 Kluwer Academic Publishers.

Pipelined two-port adaptor for wave digital filtering

Relevance: 80.00%

Abstract:

The application of fine-grain pipelining techniques in the design of high-performance wave digital filters (WDFs) is described. The problems of latency in feedback loops can be significantly reduced if computations are organized most significant, as opposed to least significant, bit first and if the results are fed back as soon as they are formed. The result is that chips can be designed which offer significantly higher sampling rates than otherwise can be obtained using conventional methods. How these concepts can be extended to the more challenging problem of WDFs is discussed. It is shown that significant increases in the sampling rate of bit-parallel circuits can be achieved using most significant bit first arithmetic.

Targeting distributed systems in FastFlow

Relevance: 80.00%

Abstract:

FastFlow is a structured parallel programming framework targeting shared-memory multi-core architectures. In this paper we introduce a FastFlow extension aimed at also supporting a network of multi-core workstations. The extension supports the execution of FastFlow programs by coordinating, in a structured way, the fine-grain parallel activities running on a single workstation. We discuss the design and implementation of this extension and present preliminary experimental results validating it on state-of-the-art networked multi-core nodes. © 2013 Springer-Verlag.

An efficient unbounded lock-free queue for multi-core systems

Relevance: 80.00%

Abstract:

The use of efficient synchronization mechanisms is crucial for implementing fine-grained parallel programs on modern shared-cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single-Consumer (SPSC) coordination using unbounded queues. A novel unbounded SPSC algorithm capable of reducing the row synchronization latency and speeding up Producer-Consumer coordination is presented. The algorithm has been extensively tested on a shared-cache multi-core platform and a sketch proof of correctness is presented. The queues proposed have been used as basic building blocks to implement the FastFlow parallel framework, which has been demonstrated to offer very good performance for fine-grain parallel applications. © 2012 Springer-Verlag.

Prefetching and Cache Management using Task Lifetimes

Relevance: 80.00%

Abstract:

Task-based dataflow programming models and runtimes are emerging as promising candidates for programming multicore and manycore architectures. These programming models dynamically analyze task dependencies at runtime and schedule independent tasks concurrently to the processing elements. In such models, cache locality, which is critical for performance, becomes more challenging in the presence of fine-grain tasks and in architectures with many simple cores.

This paper presents a combined hardware-software approach to improve cache locality and offer better performance in terms of execution time and energy in the memory system. We propose the explicit bulk prefetcher (EBP) and epoch-based cache management (ECM) to help runtimes prefetch task data and guide the replacement decisions in caches. The runtime software can use this hardware support to expose its internal knowledge about the tasks to the architecture and achieve more efficient task-based execution. Our combined scheme outperforms HW-only prefetchers and state-of-the-art replacement policies, improves performance by an average of 17%, generates on average 26% fewer L2 misses, and consumes on average 28% less energy in the components of the memory system.

TProf: An energy profiler for task-parallel programs

Relevance: 80.00%

Abstract:

We present TProf, an energy profiling tool for OpenMP-like task-parallel programs. To compute the energy consumed by each task in a parallel application, TProf dynamically traces the parallel execution and uses a novel technique to estimate the per-task energy consumption. To achieve this estimation, TProf apportions the total processor energy among cores and overcomes a limitation of existing approaches that would otherwise make parallel accounting impossible. We demonstrate the value of TProf by characterizing a set of task-parallel programs, where we find that data locality, memory access patterns and task working sets are responsible for significant variance in energy consumption between seemingly homogeneous tasks. In addition, we identify opportunities for fine-grain energy optimization by applying per-task Dynamic Voltage and Frequency Scaling (DVFS).

20 results for Fine grain sediment