994 results for Adaptive reuse
Abstract:
Long running multi-physics coupled parallel applications have gained prominence in recent years. The high computational requirements and long durations of simulations of these applications necessitate the use of multiple systems of a Grid for execution. In this paper, we have built an adaptive middleware framework for execution of long running multi-physics coupled applications across multiple batch systems of a Grid. Our framework, apart from coordinating the executions of the component jobs of an application on different batch systems, also automatically resubmits the jobs multiple times to the batch queues to continue and sustain long running executions. As the set of active batch systems available for execution changes, our framework performs migration and rescheduling of components using a robust rescheduling decision algorithm. We have used our framework for improving the application throughput of a foremost long running multi-component application for climate modeling, the Community Climate System Model (CCSM). Our real multi-site experiments with CCSM indicate that Grid executions can lead to improved application throughput for climate models.
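The coordination described above can be pictured with a small driver-loop sketch (purely illustrative; the data structures and the least-loaded rescheduling rule are assumptions, not the framework's actual algorithm):

```python
import random

def coordinate(components, batch_systems, max_steps=100):
    """Toy coordination loop: keep every component job of the coupled
    application running by resubmitting jobs whose queue allocation has
    expired and rescheduling components whose batch system has dropped
    out of the active set. Hypothetical sketch, not the paper's framework."""
    for _ in range(max_steps):
        active = [s for s in batch_systems if s["up"]]
        if not active:
            continue
        for comp in components:
            if comp["system"] not in active:
                # Rescheduling decision: here simply the least-loaded active system.
                comp["system"] = min(active, key=lambda s: s["load"])
                comp["running"] = False
            if not comp["running"] and comp["work_left"] > 0:
                comp["running"] = True            # resubmit to the batch queue
            if comp["running"]:
                comp["work_left"] -= 1            # one unit of simulated progress
                if random.random() < 0.05:        # queue allocation expires
                    comp["running"] = False
        if all(c["work_left"] <= 0 for c in components):
            break
```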
Abstract:
Chronic recording of neural signals is indispensable for designing efficient brain–machine interfaces and for elucidating human neurophysiology. The advent of multichannel micro-electrode arrays has driven the need for electronics that can record neural signals from many neurons. The dynamic range of the system can vary over time due to changes in electrode–neuron distance and background noise. We propose a neural amplifier in UMC 130 nm, 1P8M complementary metal–oxide–semiconductor (CMOS) technology. It can be biased adaptively from 200 nA to 2 µA, modulating the input-referred noise from 9.92 µV to 3.9 µV. We also describe a low noise design technique which minimizes the noise contribution of the load circuitry. Optimum sizing of the input transistors minimizes the accentuation of the amplifier's input-referred noise and obviates the need for a large input capacitance. The amplifier achieves a noise efficiency factor of 2.58. It passes signals from 5 Hz to 7 kHz, and its bandwidth can be tuned to reject local field potentials (LFP) and power line interference. The amplifier achieves a mid-band voltage gain of 37 dB. In vitro experiments are performed to validate the applicability of the low noise neural amplifier in neural recording systems.
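For reference, the reported value can be read against the conventional definition of the noise efficiency factor (assuming the standard bio-amplifier definition is used):

$$\mathrm{NEF} = V_{\mathrm{rms,in}}\,\sqrt{\frac{2\,I_{\mathrm{tot}}}{\pi \cdot U_T \cdot 4kT \cdot \mathrm{BW}}}$$

where $V_{\mathrm{rms,in}}$ is the input-referred rms noise, $I_{\mathrm{tot}}$ the total supply current, $U_T$ the thermal voltage, and $\mathrm{BW}$ the amplifier bandwidth.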
Abstract:
High frequency PWM inverters produce an output voltage spectrum at the fundamental reference frequency and around the switching frequency; thus, ideally, PWM inverters do not introduce any significant lower order harmonics. However, in real systems, lower order harmonics are present due to the dead-time effect, device voltage drops and other non-idealities. In order to attenuate these lower order harmonics, and hence improve the quality of the output current, this paper presents an adaptive harmonic elimination technique. The technique uses an adaptive filter to estimate a particular harmonic that is to be attenuated and generates a voltage reference which is added to the voltage reference produced by the current control loop of the inverter, thereby cancelling the voltage that was producing that harmonic. The effectiveness and the limitations of the technique are verified experimentally in a single phase PWM inverter in stand-alone as well as grid-interactive modes of operation.
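The adaptive estimation of a single harmonic can be sketched with an LMS filter acting on the in-phase and quadrature components of the harmonic to be attenuated (a simplified illustration; the gain and sign from the estimated harmonic to the injected voltage reference depend on the plant and are not modeled here):

```python
import numpy as np

def harmonic_compensation(i_meas, theta, k, mu=0.01):
    """Estimate the k-th harmonic of the measured current with a two-tap
    LMS adaptive filter and return a compensating voltage reference.
    Illustrative sketch only; scaling into volts depends on the plant."""
    w = np.zeros(2)                      # weights for cos/sin of the k-th harmonic
    v_comp = np.zeros_like(i_meas, dtype=float)
    for n in range(len(i_meas)):
        x = np.array([np.cos(k * theta[n]), np.sin(k * theta[n])])
        est = w @ x                      # estimated k-th harmonic content
        err = i_meas[n] - est            # LMS error
        w += 2 * mu * err * x            # weight update
        v_comp[n] = -est                 # injected with opposite sign to cancel it
    return v_comp
```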
Abstract:
We address the problem of estimating the instantaneous frequency (IF) of a real-valued, constant amplitude, time-varying sinusoid. Estimation of a polynomial IF is formulated using the zero-crossings of the signal. We propose an algorithm to estimate a nonpolynomial IF by local approximation with a low-order polynomial over a short segment of the signal. This involves choosing the window length to minimize the mean square error (MSE). The optimal window length found by directly minimizing the MSE is a function of the higher-order derivatives of the IF, which are not available a priori. However, an optimum solution is formulated using an adaptive window technique based on the concept of intersection of confidence intervals. The adaptive algorithm enables minimum-MSE IF (MMSE-IF) estimation without requiring a priori information about the IF. Simulation results show that the adaptive window zero-crossing-based IF estimation method is superior to fixed window methods, and is also better than adaptive spectrogram and adaptive Wigner-Ville distribution (WVD) based IF estimators, at different signal-to-noise ratios (SNRs).
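The intersection-of-confidence-intervals (ICI) rule underlying the adaptive window choice can be sketched as follows (a generic illustration; the confidence-interval widths and the threshold kappa are placeholders, not the paper's derived values):

```python
import numpy as np

def ici_select(estimates, sigmas, kappa=2.0):
    """ICI rule: given IF estimates obtained with increasing window lengths
    (and correspondingly decreasing standard deviations), pick the largest
    window whose confidence interval still intersects all previous ones.
    Illustrative sketch only."""
    lo, hi = -np.inf, np.inf
    best = 0
    for j, (est, sig) in enumerate(zip(estimates, sigmas)):
        lo = max(lo, est - kappa * sig)
        hi = min(hi, est + kappa * sig)
        if lo > hi:          # intervals no longer share a common point
            break
        best = j             # largest window so far that remains consistent
    return best
```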
Abstract:
Instruction reuse is a microarchitectural technique that improves the execution time of a program by removing redundant computations at run-time. Although this is the job of an optimizing compiler, compilers often do not succeed because of their limited knowledge of run-time data. In this paper we examine instruction reuse of integer ALU and load instructions in network processing applications. Specifically, this paper attempts to answer the following questions: (1) How much instruction reuse is inherent in network processing applications? (2) Can reuse be improved by reducing interference in the reuse buffer? (3) What characteristics of network applications can be exploited to improve reuse? (4) What is the effect of reuse on resource contention and memory accesses? We propose an aggregation scheme that combines the high-level concept of network traffic, i.e., "flows", with a low-level microarchitectural feature of programs, i.e., repetition of instructions and data, along with an architecture that exploits temporal locality in incoming packet data to improve reuse. We find that for the benchmarks considered, 1% to 50% of instructions are reused, while the speedup achieved varies between 1% and 24%. As a side effect, instruction reuse reduces memory traffic and can therefore also be considered a scheme for low power.
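The mechanism can be pictured with a toy model of a reuse buffer (illustrative; the field names, capacity and eviction policy are assumptions):

```python
class ReuseBuffer:
    """Toy reuse buffer indexed by (PC, source operand values): before an
    integer ALU or load instruction executes, a hit returns the previously
    computed result so the computation can be skipped. The paper's
    aggregation scheme additionally tags entries with the packet's flow
    to reduce interference between flows; that is omitted here."""
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.table = {}

    def lookup(self, pc, operands):
        return self.table.get((pc, operands))       # None on a miss

    def insert(self, pc, operands, result):
        if len(self.table) >= self.capacity:
            self.table.pop(next(iter(self.table)))  # naive FIFO-style eviction
        self.table[(pc, operands)] = result
```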
Abstract:
Exascale systems of the future are predicted to have a mean time between failures (MTBF) of less than one hour. Malleable applications, in which the number of processors on which an application executes can be changed during execution, can use their malleability to better tolerate high failure rates. We present AdFT, an adaptive fault tolerance framework for long running malleable applications that maximizes application performance in the presence of failures. The AdFT framework includes cost models for evaluating the benefits of various fault tolerance actions, including checkpointing, live migration and rescheduling, and runtime decisions for dynamically selecting the fault tolerance actions at different points of application execution to maximize performance. Simulations with real and synthetic failure traces show that our approach outperforms existing fault tolerance mechanisms for malleable applications, yielding up to 23% improvement in application performance, and is effective even for petascale systems and beyond.
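The runtime selection can be pictured with a toy cost model (the action names follow the abstract, but the cost structure and numbers below are hypothetical, not the paper's models):

```python
def choose_action(costs, p_fail, remaining_work):
    """Pick the fault-tolerance action with the lowest expected overhead.
    `costs` maps an action name to (one-time overhead, rework saved if a
    failure occurs). Hypothetical cost model, for illustration only."""
    def expected_overhead(action):
        overhead, rework_saved = costs[action]
        return overhead - p_fail * min(rework_saved, remaining_work)
    return min(costs, key=expected_overhead)

# Example: with a 30% failure probability, checkpointing wins over doing
# nothing because the rework it saves outweighs its one-time overhead.
actions = {"none": (0.0, 0.0), "checkpoint": (5.0, 60.0),
           "live_migrate": (12.0, 80.0), "reschedule": (30.0, 100.0)}
print(choose_action(actions, p_fail=0.3, remaining_work=90.0))  # -> "checkpoint"
```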
Abstract:
This paper presents a singular edge-based smoothed finite element method (sES-FEM) for mechanics problems with singular stress fields of arbitrary order. The sES-FEM uses a basic mesh of three-noded linear triangular (T3) elements and a special layer of five-noded singular triangular elements (sT5) connected to the singular point of the stress field. The sT5 element has an additional node on each of the two edges connected to the singular point, which allows simple and efficient enrichment of the displacement field near the singular point with the desired terms while satisfying the partition-of-unity property. The stiffness matrix of the discretized system is then obtained using the assumed displacement values (not the derivatives) over smoothing domains associated with the edges of the elements. An adaptive procedure for the sES-FEM is proposed to enhance the quality of the solution with a minimal number of nodes. Several numerical examples are provided to validate the reliability of the present sES-FEM.
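For context, near a singular point of order $\lambda$ the fields behave (in polar coordinates centred at that point, assuming the usual convention) as

$$u(r,\theta) \approx u_0 + r^{\lambda}\, g(\theta), \qquad \sigma(r,\theta) \sim r^{\lambda-1}\, f(\theta), \qquad 0 < \lambda < 1,$$

with $\lambda = 1/2$ at a crack tip. The extra mid-edge nodes of the sT5 element are what allow an $r^{\lambda}$ term to be represented in the interpolation along the edges meeting the singular point; the precise shape functions are those of the paper and are not reproduced here.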
Abstract:
A low complexity, essentially-ML decoding technique for the Golden code and the three antenna Perfect code was introduced by Sirianunpiboon, Howard and Calderbank. Though no theoretical analysis of the decoder was given, simulations showed that this decoding technique has almost maximum-likelihood (ML) performance. Inspired by this technique, in this paper we introduce two new low complexity decoders for Space-Time Block Codes (STBCs): the Adaptive Conditional Zero-Forcing (ACZF) decoder and the ACZF decoder with successive interference cancellation (ACZF-SIC), which include the decoding technique of Sirianunpiboon et al. as a special case. We show that both the ACZF and ACZF-SIC decoders are capable of achieving full diversity, and we give a set of sufficient conditions for an STBC to give full diversity with these decoders. We then show that the Golden code, the three and four antenna Perfect codes, the three antenna Threaded Algebraic Space-Time code and the four antenna rate 2 code of Srinath and Rajan are all full-diversity ACZF/ACZF-SIC decodable with complexity strictly less than that of their ML decoders. Simulations show that the proposed decoding method performs identically to ML decoding for all five of these codes. These STBCs, together with the proposed decoding algorithm, have the least decoding complexity and best error performance among all known codes for these numbers of transmit antennas. We further provide a lower bound on the complexity of full-diversity ACZF/ACZF-SIC decoding. All five codes listed above achieve this lower bound and hence are optimal in terms of minimizing the ACZF/ACZF-SIC decoding complexity. Both the ACZF and ACZF-SIC decoders are amenable to sphere decoding implementation.
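A conditional zero-forcing decoder of this family can be sketched generically as follows (a simplified illustration, not the paper's ACZF/ACZF-SIC algorithm: the symbol partition is fixed here rather than chosen adaptively from the channel, and no interference-cancellation stage is included):

```python
import numpy as np
from itertools import product

def conditional_zf_decode(y, H, const, k):
    """Enumerate the first k symbols, zero-force the remaining symbols
    conditioned on each choice, and keep the candidate with the smallest
    residual. Illustrative sketch of conditional ZF decoding only."""
    H1, H2 = H[:, :k], H[:, k:]
    H2p = np.linalg.pinv(H2)
    best, best_metric = None, np.inf
    for cand in product(const, repeat=k):
        x1 = np.array(cand)
        x2_soft = H2p @ (y - H1 @ x1)                       # ZF on the rest
        x2 = np.array([min(const, key=lambda s: abs(s - v)) for v in x2_soft])
        metric = np.linalg.norm(y - H1 @ x1 - H2 @ x2) ** 2  # residual energy
        if metric < best_metric:
            best, best_metric = np.concatenate([x1, x2]), metric
    return best
```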
Abstract:
Most Java programmers would agree that Java is a language that promotes a philosophy of “create and go forth”. By design, temporary objects are meant to be created on the heap, possibly used, and then abandoned to be collected by the garbage collector. Excessive generation of temporary objects is termed “object churn” and is a form of software bloat that often leads to performance and memory problems. To mitigate this problem, many compiler optimizations aim at identifying objects that may be allocated on the stack. However, most such optimizations miss large opportunities for memory reuse when dealing with objects inside loops or with container objects. In this paper, we describe a novel algorithm that detects bloat caused by the creation of temporary container and String objects within a loop. Our analysis determines which objects created within a loop can be reused, and we then describe a source-to-source transformation that efficiently reuses such objects. Empirical evaluation indicates that our solution can eliminate up to 40% of temporary object allocations in large programs, resulting in a performance improvement that can be as high as a 20% reduction in run time, specifically when a program has a high churn rate or when the program is memory intensive and needs to run the GC often.
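The paper targets Java containers and Strings; the hoist-and-reuse pattern its transformation automates can be sketched with a language-neutral analogue (Python here, purely illustrative):

```python
# Before: a fresh temporary container is created on every iteration (object churn).
def tokens_per_line_churny(lines):
    counts = []
    for line in lines:
        buf = []                  # new temporary object each iteration
        buf.extend(line.split())
        counts.append(len(buf))
    return counts

# After: the temporary container is allocated once, hoisted out of the loop,
# and reset instead of reallocated on each iteration.
def tokens_per_line_reuse(lines):
    counts = []
    buf = []                      # reused across iterations
    for line in lines:
        buf.clear()
        buf.extend(line.split())
        counts.append(len(buf))
    return counts
```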
Abstract:
Service systems are labor intensive, and their workload tends to vary greatly with time. Adapting staffing levels to the workloads in such systems is nontrivial due to the large number of parameters and operational variations, but it is crucial for business objectives such as minimal labor inventory. One of the central challenges is to optimize the staffing while maintaining system steady-state and compliance with aggregate SLA constraints. We formulate this problem as a parametrized constrained Markov process and propose a novel stochastic optimization algorithm for solving it. Our algorithm is a multi-timescale stochastic approximation scheme that incorporates an SPSA-based algorithm for ‘primal descent’ and couples it with a ‘dual ascent’ scheme for the Lagrange multipliers. We validate this optimization scheme on five real-life service systems and compare it with a state-of-the-art optimization toolkit, OptQuest. Being two orders of magnitude faster than OptQuest, our scheme is particularly suitable for adaptive labor staffing. Moreover, we observe that it guarantees convergence and finds better solutions than OptQuest in many cases.
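The two-timescale primal-dual idea can be sketched as follows (the step-size schedules and the cost/constraint oracles are placeholders, not the paper's tuned algorithm):

```python
import numpy as np

def spsa_constrained(cost, constraint, theta0, iters=1000, a=0.5, c=0.1, b=0.05):
    """Primal-dual SPSA sketch. `cost(theta)` and `constraint(theta)` are
    noisy oracles (placeholders); the constraint is treated as
    constraint(theta) <= 0. The staffing parameters theta follow SPSA
    gradient descent on the Lagrangian, while the Lagrange multiplier is
    updated by dual ascent on a slower timescale."""
    theta = np.asarray(theta0, dtype=float)
    lam = 0.0
    for n in range(1, iters + 1):
        a_n = a / n ** 0.6        # primal step size (faster timescale)
        b_n = b / n               # dual step size decays faster -> slower timescale
        c_n = c / n ** 0.1        # SPSA perturbation size
        delta = np.random.choice([-1.0, 1.0], size=theta.shape)
        lagr = lambda th: cost(th) + lam * constraint(th)
        grad = (lagr(theta + c_n * delta) - lagr(theta - c_n * delta)) / (2 * c_n) * delta
        theta = theta - a_n * grad                       # primal descent
        lam = max(0.0, lam + b_n * constraint(theta))    # dual ascent
    return theta, lam
```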
Abstract:
There are many wireless sensor network (WSN) applications which require reliable data transfer between nodes. Several techniques, including link level retransmission, error correction methods and hybrid Automatic Repeat reQuest (ARQ), have been introduced into wireless sensor networks to ensure reliability. In this paper, we use an Automatic reSend reQuest (ASQ) technique with regular acknowledgement to design a reliable end-to-end communication protocol for WSNs, called the Adaptive Reliable Transport (ARTP) protocol. Besides ensuring reliability, the objective of the ARTP protocol is to provide a message-stream FIFO at the receiver side instead of the byte-stream FIFO used in the TCP/IP protocol suite. To realize this objective, a new protocol stack is used in the ARTP protocol. The ARTP protocol saves energy without affecting throughput by sending three different types of acknowledgements, viz. ACK, NACK and FNACK, with semantics different from those existing in the literature, and by adapting to network conditions. Additionally, the protocol performs flow control based on the receiver's feedback and congestion control by holding ACK messages. To the best of our knowledge, there has been little or no attempt to build a receiver-controlled, regularly acknowledged reliable communication protocol. We have carried out extensive simulation studies of our protocol using the Castalia simulator, and the study shows that our protocol performs better than related protocols in wireless/wireline networks in terms of throughput and energy efficiency.
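As a rough illustration of receiver-driven, regularly acknowledged, message-stream FIFO delivery (the actual ACK/NACK/FNACK semantics of ARTP differ from the literature and are not reproduced here; the sequence numbering, `send` callback and ACK period below are hypothetical):

```python
class MessageStreamReceiver:
    """Toy receiver: buffers out-of-order messages, delivers them to the
    application in message-stream FIFO order, sends a cumulative ACK every
    `ack_every` messages and a NACK as soon as a gap is detected.
    Illustrative only; not ARTP's actual state machine (FNACK not modeled)."""
    def __init__(self, send, ack_every=8):
        self.send = send            # callback used to transmit control messages
        self.ack_every = ack_every
        self.expected = 0           # next in-order message sequence number
        self.buffer = {}            # out-of-order messages keyed by sequence number
        self.since_ack = 0

    def on_message(self, seq, payload, deliver):
        if seq > self.expected:
            self.buffer[seq] = payload
            self.send(("NACK", self.expected))       # request the missing message
            return
        if seq == self.expected:
            deliver(payload)
            self.expected += 1
            while self.expected in self.buffer:      # flush now-in-order messages
                deliver(self.buffer.pop(self.expected))
                self.expected += 1
        self.since_ack += 1
        if self.since_ack >= self.ack_every:
            self.send(("ACK", self.expected - 1))    # regular cumulative ACK
            self.since_ack = 0
```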
Abstract:
Advances in technology have increased the number of cores and the size of caches on chip multicore platforms (CMPs). As a result, the leakage power consumption of on-chip caches has become a major power-consuming component of the memory subsystem. We propose to reduce leakage power consumption in a static non-uniform cache architecture (SNUCA) on a tiled CMP by dynamically varying the number of cache slices used and switching off unused cache slices. A cache slice in a tile includes all cache banks present in that tile. Switched-off cache slices are remapped considering the communication costs, so as to reduce cache usage with minimal impact on execution time. This saves leakage power in the switched-off L2 cache slices. On average, the remap policy achieves 41% and 49% higher EDP savings compared to static and dynamic NUCA (DNUCA) cache policies, respectively, on a scalable tiled CMP.
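The remapping decision can be sketched as follows (the helper names and the nearest-active-slice rule are illustrative assumptions, not the paper's exact policy):

```python
def remap_slice(addr, active_tiles, requester, home_of, hop_cost):
    """Toy remapping sketch: if the static home slice of an address has
    been switched off, serve the address from the active slice with the
    lowest communication cost to the requesting tile, so leakage is saved
    with minimal impact on execution time. `home_of` and `hop_cost` are
    hypothetical helpers (static address-to-tile map and on-chip hop count)."""
    home = home_of(addr)                 # static SNUCA home slice of this address
    if home in active_tiles:
        return home
    return min(active_tiles, key=lambda tile: hop_cost(requester, tile))
```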