218 results for 291605 Processor Architectures
Abstract:
Sensor nodes with energy harvesting sources are gaining popularity because they can improve network lifetime, and they are becoming a preferred choice for supporting "green communication". We study such a sensor node with an energy harvesting source and compare various architectures by which the harvested energy is used. We find its Shannon capacity when it is transmitting its observations over an AWGN channel and show that the capacity-achieving energy management policies are related to the throughput-optimal policies. We also obtain the capacity when energy-conserving sleep-wake modes are supported and an achievable rate for the system with inefficiencies in energy storage.
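For orientation, the classical AWGN capacity under an average power constraint, which is the usual baseline for energy-harvesting results, can be written as below together with one common form of the energy-causality constraint. The symbols P (average transmit power), \sigma^2 (noise variance), X_k (symbol sent in slot k) and Y_k (energy harvested up to and including slot k's use) are notation assumed here for illustration; none of them is taken from the abstract.

```latex
C_{\mathrm{AWGN}}(P) = \frac{1}{2}\log_2\!\left(1 + \frac{P}{\sigma^2}\right)
\quad \text{bits per channel use},
\qquad
\sum_{k=1}^{n} X_k^2 \;\le\; \sum_{k=1}^{n} Y_k \quad \text{for all } n .
```

The second expression only says that the node cannot spend energy it has not yet harvested; how storage inefficiencies and sleep-wake modes modify it is specific to the paper and is not reproduced here.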
Abstract:
Video decoders used in emerging applications need to be flexible to handle a large variety of video formats and deliver scalable performance to handle wide variations in workloads. In this paper we propose a unified software and hardware architecture for video decoding to achieve scalable performance with flexibility. The lightweight processor tiles and the reconfigurable hardware tiles in our architecture enable software and hardware implementations to co-exist, while a programmable interconnect enables dynamic interconnection of the tiles. Our process-network-oriented compilation flow achieves realization-agnostic application partitioning and enables seamless migration across uniprocessor, multi-processor, semi-hardware and full-hardware implementations of a video decoder. An application quality-of-service-aware scheduler monitors and controls the operation of the entire system. We prove the concept through a prototype of the architecture on an off-the-shelf FPGA. The FPGA prototype shows a scaling in performance from QCIF to 1080p resolutions in four discrete steps. We also demonstrate that the reconfiguration time is short enough to allow migration from one configuration to the other without any frame loss.
Abstract:
An in situ seeding growth methodology for the preparation of core-shell nanoparticles composed of noble metals has been developed by employing trimethylamine borane (TMAB) as the reducing agent. Being a weak reducing agent, TMAB is able to distinguish the smallest reduction potential window of any two metals, which renders selective reduction of metal ions, thus affording a core-shell architecture of the nanoparticles. A dramatic effect of solvent was noted during the reduction of Ag+ ions: an immediate reduction took place at room temperature when dry THF was used as solvent; however, use of wet THF (THF used directly from the bottle) brings about the reduction only at reflux conditions. In the case of Au and Pd nanoparticles, preparation was found to be independent of the quality of solvent used. Au nanoparticles are realized at room temperature whereas reflux conditions are required in the case of Pd nanoparticles. This difference in behavior of the monometallic nanoparticles was successfully exploited to construct different noble metal nanoparticles with core-shell architectures such as Au@Ag, Ag@Au, and Ag@Pd. Transformation of these core-shell nanoparticles to their thermodynamically stable alloy counterparts is also demonstrated under conditions that are very mild relative to those reported to date.
Abstract:
Managing heat produced by computer processors is an important issue today, especially when the size of processors is decreasing rapidly while the number of transistors in the processor is increasing rapidly. This poster describes a preliminary study of the process of adding carbon nanotubes (CNTs) to a standard silicon paste covering a CPU. Measurements were made in two rounds of tests to compare the rate of cool-down with and without CNTs present. The silicon paste acts as an interface between the CPU and the heat sink, increasing the heat transfer rate away from the CPU. CNTs (0.05% by weight, not aligned) were added to the silicon paste. A series of K-type thermocouples was used to measure the temperature as a function of time in the vicinity of the CPU, following its shut-off. An Omega data acquisition system was attached to the thermocouples. The CPU temperature was not measured directly because attachment of a thermocouple would have prevented its automatic shut-off. A thermocouple in the paste containing the CNTs actually reached a higher temperature than the standard paste, an effect easily explained. But the rate of cooling with the CNTs was about 4.55% better.
Abstract:
Single-carrier frequency division multiple access (SC-FDMA) has become a popular alternative to orthogonal frequency division multiple access (OFDMA) in multiuser communication on the uplink. This is mainly due to the low peak-to-average power ratio (PAPR) of SC-FDMA compared to that of OFDMA. Long-term evolution (LTE) uses SC-FDMA on the uplink to exploit this PAPR advantage to reduce transmit power amplifier backoff in user terminals. In this paper, we show that SC-FDMA can be beneficially used for multiuser communication on the downlink as well. We present SC-FDMA transmit and receive signaling architectures for multiuser communication on the downlink. The benefits of using SC-FDMA on the downlink are that SC-FDMA can achieve i) significantly better bit error rate (BER) performance at the user terminal compared to OFDMA, and ii) improved PAPR compared to OFDMA, which reduces base station (BS) power amplifier backoff (making BSs more green). The SC-FDMA receiver needs to perform joint equalization, which can be carried out using low-complexity equalization techniques. For this, we present a local neighborhood search based equalization algorithm for SC-FDMA. This algorithm is very attractive in both complexity and performance. We present simulation results that establish the PAPR and BER performance advantage of SC-FDMA over OFDMA in multiuser SISO/MIMO downlink as well as in large-scale multiuser MISO downlink with tens to hundreds of antennas at the BS.
Abstract:
Estimating program worst case execution time (WCET) accurately and efficiently is a challenging task. Several programs exhibit phase behavior wherein cycles per instruction (CPI) varies in phases during execution. Recent work has suggested the use of phases in such programs to estimate WCET with minimal instrumentation. However, the suggested model uses a function of mean CPI that has no probabilistic guarantees. We propose to use Chebyshev's inequality, which can be applied to any arbitrary distribution of CPI samples, to probabilistically bound the CPI of a phase. Applying Chebyshev's inequality to phases that exhibit high CPI variation leads to pessimistic upper bounds. We propose a mechanism that refines such phases into sub-phases based on program counter (PC) signatures collected using profiling and also allows the user to control the variance of CPI within a sub-phase. We describe a WCET analyzer built on these lines and evaluate it with standard WCET and embedded benchmark suites on two different architectures for three chosen probabilities, p = {0.9, 0.95 and 0.99}. For p = 0.99, refinement based on PC signatures alone reduces average pessimism of the WCET estimate by 36% (77%) on Arch1 (Arch2). Compared to Chronos, an open source static WCET analyzer, the average improvement in estimates obtained by refinement is 5% (125%) on Arch1 (Arch2). On limiting variance of CPI within a sub-phase to {50%, 10%, 5% and 1%} of its original value, average accuracy of the WCET estimate improves further to {9%, 11%, 12% and 13%} respectively, on Arch1. On Arch2, average accuracy of WCET improves to 159% when CPI variance is limited to 50% of its original value and improvement is marginal beyond that point.
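The Chebyshev-based bound described above can be illustrated with a minimal sketch. It assumes the two-sided form of the inequality, P(|X - mean| >= k*std) <= 1/k^2, and treats the sample mean and standard deviation as if they were the true ones; the function names `chebyshev_cpi_bound` and `phase_wcet_cycles` are hypothetical and the abstract does not say which variant of the inequality the analyzer actually uses.

```python
import math
import statistics


def chebyshev_cpi_bound(cpi_samples, p):
    """Return a CPI value exceeded with probability at most 1 - p.

    Assumption: two-sided Chebyshev, so k is chosen with 1/k^2 = 1 - p.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    mean = statistics.fmean(cpi_samples)
    std = statistics.pstdev(cpi_samples)  # population standard deviation
    k = 1.0 / math.sqrt(1.0 - p)
    return mean + k * std


def phase_wcet_cycles(instruction_count, cpi_samples, p):
    """Bound the cycles of one phase as instructions times bounded CPI."""
    return instruction_count * chebyshev_cpi_bound(cpi_samples, p)


if __name__ == "__main__":
    samples = [1.0, 1.0, 3.0, 3.0]  # made-up CPI samples for one phase
    for p in (0.9, 0.95, 0.99):
        print(p, chebyshev_cpi_bound(samples, p))
```

Because k grows as p approaches 1 and multiplies the standard deviation, a phase with widely varying CPI gets a loose bound, which is the motivation the abstract gives for splitting such phases into lower-variance sub-phases.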
Abstract:
Accurate and timely prediction of weather phenomena, such as hurricanes and flash floods, requires high-fidelity, compute-intensive simulations of multiple finer regions of interest within a coarse simulation domain. Current weather applications execute these nested simulations sequentially using all the available processors, which is sub-optimal due to their sub-linear scalability. In this work, we present a strategy for parallel execution of multiple nested domain simulations based on partitioning the 2-D processor grid into disjoint rectangular regions associated with each domain. We propose a novel combination of performance prediction, processor allocation methods and topology-aware mapping of the regions on torus interconnects. Experiments on IBM Blue Gene systems using WRF show that the proposed strategies result in a performance improvement of up to 33% with topology-oblivious mapping and up to an additional 7% with topology-aware mapping over the default sequential strategy.
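One way to picture the partitioning of a 2-D processor grid into disjoint rectangles is a recursive bisection that cuts the longer side in proportion to each domain's predicted cost. The sketch below is an illustration under that assumption only; `partition_grid` and its weight-based split are hypothetical and are not the allocation method of the paper, which also involves performance prediction and topology-aware mapping.

```python
def partition_grid(x0, y0, width, height, weights):
    """Split a width x height processor grid into disjoint rectangles.

    Returns one (x, y, width, height) tuple per entry of `weights`,
    with areas roughly proportional to the weights (hypothetical
    stand-ins for predicted per-domain execution cost).
    """
    if len(weights) == 1:
        return [(x0, y0, width, height)]
    if max(width, height) < 2:
        raise ValueError("grid too small for the number of domains")
    mid = len(weights) // 2
    left, right = weights[:mid], weights[mid:]
    share = sum(left) / sum(weights)
    if width >= height:
        # Cut the longer (horizontal) side, keeping both parts non-empty.
        cut = min(max(round(width * share), 1), width - 1)
        return (partition_grid(x0, y0, cut, height, left)
                + partition_grid(x0 + cut, y0, width - cut, height, right))
    cut = min(max(round(height * share), 1), height - 1)
    return (partition_grid(x0, y0, width, cut, left)
            + partition_grid(x0, y0 + cut, width, height - cut, right))


if __name__ == "__main__":
    # Three nested domains with relative costs 1:1:2 on an 8 x 4 grid.
    for rect in partition_grid(0, 0, 8, 4, [1, 1, 2]):
        print(rect)
```

Running the domains side by side on such rectangles, instead of one after another on the full grid, is what lets each simulation use a processor count at which it still scales well.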
Abstract:
Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently, which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to identify potential stale accesses and uses a runtime to initiate transfers as necessary. This allows us to avoid redundant transfers that are exhibited by all other existing automatic memory management proposals. We integrate our automatic memory manager into the X10 compiler and runtime, and find that it not only results in smaller and simpler programs, but also eliminates redundant memory transfers. Tested on eight programs ported from the Rodinia benchmark suite, it achieves (i) a 1.06x speedup over hand-tuned manual memory management, and (ii) a 1.29x speedup over another recently proposed compiler-runtime automatic memory management system. Compared to other existing runtime-only and compiler-only proposals, it also transfers 2.2x to 13.3x less data on average.
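The runtime half of such a coherence scheme can be sketched as per-buffer validity tracking: a transfer is issued only when the device about to access a buffer holds a stale copy. The class `CoherentBuffer` and its methods are hypothetical names for illustration; the real mechanism lives in the X10 compiler and runtime, and the calls shown here stand in for checks that compiler analysis would insert at potentially stale accesses.

```python
class CoherentBuffer:
    """Tracks which of the CPU and GPU copies of a buffer is up to date."""

    def __init__(self, name):
        self.name = name
        # Assumption: data starts out valid on the CPU only.
        self.valid = {"cpu": True, "gpu": False}
        self.transfers = []  # log of (source, destination) copies

    def _ensure_valid(self, device):
        """Copy from the other device only if this copy is stale."""
        if not self.valid[device]:
            source = "gpu" if device == "cpu" else "cpu"
            self.transfers.append((source, device))
            self.valid[device] = True

    def read(self, device):
        self._ensure_valid(device)

    def write(self, device):
        # Writes may be partial, so the local copy must be current first;
        # afterwards the other device's copy is stale.
        self._ensure_valid(device)
        other = "gpu" if device == "cpu" else "cpu"
        self.valid[other] = False


if __name__ == "__main__":
    buf = CoherentBuffer("grid")
    buf.write("cpu")   # initialise on the host
    buf.read("gpu")    # first kernel: one host-to-device copy
    buf.read("gpu")    # second kernel: no copy, GPU copy still valid
    buf.write("gpu")   # kernel updates the data
    buf.read("cpu")    # host reads results: one device-to-host copy
    print(buf.transfers)
```

The second GPU read issuing no copy is the kind of redundant transfer that a scheme without validity tracking would perform unconditionally.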
Abstract:
The twin demands of energy efficiency and higher performance on DRAM are highly emphasized in multicore architectures. A variety of schemes have been proposed to address either the latency or the energy consumption of DRAMs. These schemes typically require non-trivial hardware changes and end up improving latency at the cost of energy or vice versa. One specific DRAM performance problem in multicores is that interleaved accesses from different cores can potentially degrade row-buffer locality. In this paper, based on the temporal and spatial locality characteristics of memory accesses, we propose a reorganization of the existing single large row-buffer in a DRAM bank into multiple sub-row buffers (MSRB). This reorganization not only improves row hit rates, and hence the average memory latency, but also brings down the energy consumed by the DRAM. The first major contribution of this work is proposing such a reorganization without requiring any significant changes to the existing widely accepted DRAM specifications. Our proposed reorganization improves weighted speedup by 35.8%, 14.5% and 21.6% in quad-, eight- and sixteen-core workloads, along with a 42%, 28% and 31% reduction in DRAM energy. The proposed MSRB organization enables opportunities for the management of multiple row-buffers at the memory controller level. As the memory controller is aware of the behaviour of individual cores, it allows us to implement coordinated buffer allocation schemes for different cores that take into account program behaviour. We demonstrate two such schemes, namely Fairness Oriented Allocation and Performance Oriented Allocation, which show the flexibility that memory controllers can now exploit in our MSRB organization to improve overall performance and/or fairness. Further, the MSRB organization enables additional opportunities for DRAM intra-bank parallelism and selective early precharging of the LRU row-buffer to further improve memory access latencies. These two optimizations together provide an additional 5.9% performance improvement.
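The effect of splitting one large row-buffer into several sub-row buffers can be illustrated with a toy hit-rate model. The sketch assumes that each sub-row buffer caches one fixed-size segment of one row and that buffers are replaced in LRU order; `row_hit_rate`, the 1024-column row and the access trace are all hypothetical, and the model ignores timing, energy and the allocation schemes described in the abstract.

```python
from collections import OrderedDict


def row_hit_rate(accesses, num_buffers, columns_per_buffer):
    """Fraction of (row, column) accesses served from an open buffer.

    Each buffer holds one segment of `columns_per_buffer` columns of a
    single row; the least recently used buffer is replaced on a miss.
    """
    if not accesses:
        raise ValueError("need at least one access")
    buffers = OrderedDict()  # (row, segment) keys, oldest first
    hits = 0
    for row, column in accesses:
        key = (row, column // columns_per_buffer)
        if key in buffers:
            hits += 1
            buffers.move_to_end(key)         # mark as most recently used
        else:
            if len(buffers) >= num_buffers:
                buffers.popitem(last=False)  # evict the LRU buffer
            buffers[key] = True
    return hits / len(accesses)


if __name__ == "__main__":
    # Two cores streaming through different rows, interleaved.
    trace = [(0, 0), (7, 0), (0, 1), (7, 1), (0, 2), (7, 2), (0, 3), (7, 3)]
    print(row_hit_rate(trace, num_buffers=1, columns_per_buffer=1024))
    print(row_hit_rate(trace, num_buffers=4, columns_per_buffer=256))
```

With a single full-row buffer the two interleaved streams evict each other on every access, while four quarter-row buffers keep both rows' active segments open, which is the row-buffer locality problem and remedy the abstract describes.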
Abstract:
An organometallic building block, 1,3,5-tris(4-trans-Pt(PEt3)2I(ethynyl)phenyl)benzene (1), incorporating Pt-ethynyl functionality has been synthesized and characterized. [2 + 3] self-assembly of its nitrate analogue 1,3,5-tris(4-trans-Pt(PEt3)2(ONO2)(ethynyl)phenyl)benzene (2) with "clip"-type bidentate donors (L1-L3) separately afforded three trigonal prismatic architectures (3a-3c), respectively. All these prisms were characterized and their shapes/sizes are predicted through geometry optimization employing molecular mechanics universal force field (MMUFF) simulation. The extended π-conjugation including the presence of Pt-ethynyl functionality makes them electron rich as well as luminescent in nature. Macrocycles 3b and 3c exhibit fluorescence quenching in solution upon addition of picric acid [PA], which is a common constituent of many explosives. Interestingly, the non-responsive nature of fluorescent intensity towards other electron-deficient nitro-aromatic explosives (NAEs) makes them promising selective sensors for PA with a detection limit predicted to be at the ppb level. Furthermore, solid-state quenching of fluorescent intensity of the thin film of 3b upon exposure to saturated vapor of picric acid has drawn special attention for infield applications.
Abstract:
Energy research is to a large extent materials research, encompassing the physics and chemistry of materials, including their synthesis, processing toward components and design toward architectures, allowing for their functionality as energy devices, extending toward their operation parameters and environment, including also their degradation, limited life, ultimate failure and potential recycling. In all these stages, X-ray and electron spectroscopy are helpful methods for analysis, characterization and diagnostics for the engineer and for the researcher working in basic science. This paper gives a short overview of experiments with X-ray and electron spectroscopy for solar energy and water splitting materials and addresses also the issue of solar fuel, a relatively new topic in energy research. The featured systems are iron oxide and tungsten oxide as photoanodes, and hydrogenases as molecular systems. We present surface and subsurface studies with ambient pressure XPS and hard X-ray XPS, resonant photoemission, light induced effects in resonant photoemission experiments and a photo-electrochemical in situ/operando NEXAFS experiment in a liquid cell, and nuclear resonant vibrational spectroscopy (NRVS). (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
Two Pd6 molecular aggregates (1 and 2), self-sorted via a template-free three-component self-assembly process, represent new examples of discrete architectures exhibiting very high proton conductivity [0.78 × 10⁻³ S cm⁻¹ (1) and 0.22 × 10⁻³ S cm⁻¹ (2)] at 300 K at low relative humidity (~46%) with low activation energy comparable to that of currently used Nafion in fuel cells.
Abstract:
Recent years have seen a tremendous increase in the interest in constructing hollowed-out molecular frameworks for their potential uses. Metal-ligand coordination-driven self-assembly has provided multitudes of opportunities in the formation of molecular architectures of desired shapes and sizes, with the help of the information already coded in the components. This article summarizes the recent developments in the construction of multicomponent molecular cages through this process, with a focus on the decreasing relevance of templates, and the use of these systems in catalysis/host-guest chemistry.
Abstract:
We discuss the computational bottlenecks in molecular dynamics (MD) and describe the challenges in parallelizing the computation-intensive tasks. We present a hybrid algorithm using MPI (Message Passing Interface) with OpenMP threads for parallelizing a generalized MD computation scheme for systems with short-range interatomic interactions. The algorithm is discussed in the context of nano-indentation of chromium films with carbon indenters using the Embedded Atom Method potential for Cr-Cr interaction and the Morse potential for Cr-C interactions. We study the performance of our algorithm for a range of MPI-thread combinations and find the performance to depend strongly on the computational task and load sharing in the multi-core processor. The algorithm scaled poorly with MPI and our hybrid schemes were observed to outperform the pure message passing scheme, despite utilizing the same number of processors or cores in the cluster. Speed-up achieved by our algorithm compared favorably with that achieved by standard MD packages. (C) 2013 Elsevier Inc. All rights reserved.
Abstract:
We model communication of bursty sources: 1) over multiaccess channels, with either independent decoding or joint decoding and 2) over degraded broadcast channels, by a discrete-time multiclass processor sharing queue. We utilize error exponents to give a characterization of the processor sharing queue. We analyze the processor sharing queue model for the stable region of message arrival rates, and show the existence of scheduling policies for which the stability region converges to the information-theoretic capacity region in an appropriate limiting sense.