84 results for Endurance running
Abstract:
Long-running multi-physics coupled parallel applications have gained prominence in recent years. The high computational requirements and long durations of simulations of these applications necessitate the use of multiple systems of a Grid for execution. In this paper, we have built an adaptive middleware framework for execution of long-running multi-physics coupled applications across multiple batch systems of a Grid. Our framework, apart from coordinating the executions of the component jobs of an application on different batch systems, also automatically resubmits the jobs multiple times to the batch queues to sustain long-running executions. As the set of active batch systems available for execution changes, our framework performs migration and rescheduling of components using a robust rescheduling decision algorithm. We have used our framework to improve the application throughput of a prominent long-running multi-component climate modeling application, the Community Climate System Model (CCSM). Our real multi-site experiments with CCSM indicate that Grid executions can lead to improved application throughput for climate models.
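The coordinate/resubmit/reschedule behaviour described above can be pictured as a small control loop. The sketch below is only illustrative: submit_job, job_state, queue_alive and reschedule are hypothetical placeholders standing in for whatever batch-system interface and decision algorithm the actual middleware uses.

```python
import time

# Hypothetical sketch of the sustain-long-running-executions loop; the
# callables passed in stand in for the real middleware's batch-system API
# and rescheduling decision algorithm (not taken from the paper).
def sustain_components(components, queues, reschedule, submit_job,
                       job_state, queue_alive, poll_interval=300):
    placement = reschedule(components, queues)            # initial mapping
    jobs = {c: submit_job(c, placement[c]) for c in components}
    while True:
        active = [q for q in queues if queue_alive(q)]
        if set(placement.values()) - set(active):
            # a batch system dropped out: migrate the affected components
            placement = reschedule(components, active)
            jobs = {c: submit_job(c, placement[c]) for c in components}
        for c, j in jobs.items():
            if job_state(j) in ("finished", "killed"):    # wall-clock limit hit
                jobs[c] = submit_job(c, placement[c])     # resubmit to continue
        time.sleep(poll_interval)
```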
Abstract:
Electrical switching, which has applications in areas such as information storage and power control, is a scientifically interesting and technologically important phenomenon exhibited by glassy chalcogenide semiconductors. Phase change memories based on electrical switching appear to be the most promising next-generation non-volatile memories, owing to attributes that include high endurance in write/read operations, shorter write/read times, high scalability, multi-bit capability, lower cost and compatibility with complementary metal oxide semiconductor technology. Studies on the electrical switching behavior of chalcogenide glasses help in identifying newer glasses which could be used for phase change memory applications. In particular, studies on the composition dependence of electrical switching parameters and investigations of the correlation between switching behavior and other material properties are necessary for selecting compositions that make good memory materials. In this review, an attempt has been made to summarize the dependence of the electrical switching behavior of chalcogenide glasses on other material properties such as network topological effects, glass transition and crystallization temperatures, activation energy for crystallization, thermal diffusivity, electrical resistivity and others.
Abstract:
This paper proposes a Petri net model for a commercial network processor (the Intel IXP architecture), which is a multithreaded multiprocessor architecture. We consider and model three different applications, viz. IPv4 forwarding, network address translation, and IP security, running on the IXP 2400/2850. A salient feature of the Petri net model is its ability to model the application, the architecture and their interaction in great detail. The model is validated using the Intel proprietary tool (SDK 3.51 for the IXP architecture) over a range of configurations. We conduct a detailed performance evaluation, identify the bottleneck resource, and propose and evaluate a few architectural extensions in detail.
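To give a feel for the formalism (and nothing like the paper's detailed IXP model), here is a minimal Petri net interpreter; the places and transitions in the toy example are invented.

```python
import random

# Tiny Petri net interpreter: a transition is (inputs, outputs), each a dict
# mapping place -> token count. Purely illustrative of the modeling style.
def enabled(marking, transition):
    inputs, _ = transition
    return all(marking.get(p, 0) >= k for p, k in inputs.items())

def fire(marking, transition):
    inputs, outputs = transition
    m = dict(marking)
    for p, k in inputs.items():
        m[p] -= k
    for p, k in outputs.items():
        m[p] = m.get(p, 0) + k
    return m

def simulate(marking, transitions, steps=20):
    for _ in range(steps):
        ready = [t for t in transitions if enabled(marking, t)]
        if not ready:
            break
        marking = fire(marking, random.choice(ready))
    return marking

# Invented example: packets arrive, an idle thread picks one up, processes it.
net = [({"packet_in": 1, "idle_thread": 1}, {"busy_thread": 1}),
       ({"busy_thread": 1}, {"idle_thread": 1, "packet_out": 1})]
print(simulate({"packet_in": 5, "idle_thread": 2}, net))
```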
Abstract:
We consider the problem of computing a minimum cycle basis in a directed graph G. The input to this problem is a directed graph whose arcs have positive weights. In this problem a {−1, 0, 1} incidence vector is associated with each cycle and the vector space over Q generated by these vectors is the cycle space of G. A set of cycles is called a cycle basis of G if it forms a basis for its cycle space. A cycle basis where the sum of weights of the cycles is minimum is called a minimum cycle basis of G. The current fastest algorithm for computing a minimum cycle basis in a directed graph with m arcs and n vertices runs in O(m^{ω+1} n) time (where ω < 2.376 is the exponent of matrix multiplication). If one allows randomization, then an Õ(m³n) algorithm is known for this problem. In this paper we present a simple Õ(m²n) randomized algorithm for this problem. The problem of computing a minimum cycle basis in an undirected graph has been well studied. In this problem a {0, 1} incidence vector is associated with each cycle and the vector space over F₂ generated by these vectors is the cycle space of the graph. The fastest known algorithm for computing a minimum cycle basis in an undirected graph runs in O(m²n + mn² log n) time and our randomized algorithm for directed graphs almost matches this running time.
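The definitions above can be illustrated with a brute-force sketch that is nothing like the paper's Õ(m²n) algorithm: given the cycles of a small digraph as signed arc lists, greedily keep the cheapest cycles whose {−1, 0, 1} incidence vectors are linearly independent over Q (greedy selection over all cycles yields a minimum basis by the matroid property; enumerating all cycles is exponential and only feasible for tiny graphs).

```python
from fractions import Fraction

# Brute-force illustration of the definitions, not the paper's algorithm.
def incidence(cycle, m):
    v = [Fraction(0)] * m
    for arc_index, sign in cycle:      # sign = +1 forward, -1 backward
        v[arc_index] = Fraction(sign)
    return v

def reduce_against(v, basis):
    """Gaussian elimination of v against the stored (echelonized) basis."""
    v = list(v)
    for b in basis:
        p = next(i for i, x in enumerate(b) if x != 0)
        if v[p] != 0:
            f = v[p] / b[p]
            v = [x - f * y for x, y in zip(v, b)]
    return v

def min_cycle_basis(cycles, weights, m):
    """cycles: signed arc lists; the basis has m - n + c vectors in total."""
    basis, chosen = [], []
    for c, w in sorted(zip(cycles, weights), key=lambda t: t[1]):
        r = reduce_against(incidence(c, m), basis)
        if any(x != 0 for x in r):     # independent over Q: keep this cycle
            basis.append(r)
            chosen.append((c, w))
    return chosen
```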
Abstract:
In developing countries, a high rate of growth in the demand for electric energy is felt, and so the addition of new generating units becomes inevitable. In deregulated power systems, private generating stations are encouraged to add new generation. Some of the factors considered while placing a new generating unit are the availability of resources, the ease of transmitting power, the distance from the load centre, etc. Finding the most appropriate locations for generation expansion can be done by running repeated power flows and carrying out system studies such as analyzing the voltage profile, voltage stability, and losses. In this paper a new methodology is proposed which mainly considers the existing network topology. A concept of T-index is introduced, which considers the electrical distances between generator and load nodes. This index is used for ranking the most significant new generation expansion locations and also indicates the amount of permissible generation that can be installed at these new locations. This concept facilitates medium- and long-term planning of power generation expansion within the available transmission corridors. Studies carried out on an EHV equivalent 10-bus system and the IEEE 30-bus system are presented for illustration purposes.
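The abstract does not define the T-index itself; as background, one common notion of electrical distance between buses i and j is the Thevenin impedance Z_ii + Z_jj − 2·Z_ij read off the bus impedance matrix. The sketch below computes that quantity for an invented 4-bus network (line list and reactances are made up), and is not necessarily the measure used in the paper.

```python
import numpy as np

# Invented 4-bus example: (from, to, series reactance in p.u.).
lines = [(0, 1, 0.10), (1, 2, 0.20), (2, 3, 0.15), (0, 3, 0.25)]
n, slack = 4, 0

Y = np.zeros((n, n), dtype=complex)          # bus admittance matrix
for f, t, x in lines:
    y = 1 / complex(0, x)
    Y[f, f] += y; Y[t, t] += y
    Y[f, t] -= y; Y[t, f] -= y

keep = [i for i in range(n) if i != slack]   # ground the slack bus
Z = np.zeros((n, n), dtype=complex)
Z[np.ix_(keep, keep)] = np.linalg.inv(Y[np.ix_(keep, keep)])

def electrical_distance(i, j):
    # Thevenin impedance between buses i and j (one standard definition).
    return abs(Z[i, i] + Z[j, j] - 2 * Z[i, j])

print(electrical_distance(1, 3))
```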
Abstract:
There has been substantial public debate recently on a host of issues such as climate change, genetically modified crops and nuclear power; a common theme running through these issues involves science and public policy. This note is based broadly on three interlinked themes: the growth of specialization in science, the significant commercial interests pushing science and technology, and the checkered track record of promises made versus the reality delivered.
Abstract:
Today's SoCs are complex designs with multiple embedded processors, memory subsystems, and application-specific peripherals. The memory architecture of embedded SoCs strongly influences the power and performance of the entire system. Further, the memory subsystem constitutes a major part (typically up to 70%) of the silicon area of a current-day SoC. In this article, we address on-chip memory architecture exploration for DSP processors which are organized as multiple memory banks, where banks can be single/dual ported with non-uniform bank sizes. We propose two different methods for physical memory architecture exploration and identify the strengths and applicability of these methods in a systematic way. Both methods address memory architecture exploration for a given target application by considering the application's data access characteristics, and generate a set of Pareto-optimal design points that are interesting from a power, performance and VLSI area perspective. To the best of our knowledge, this is the first comprehensive work on memory space exploration at the physical memory level that integrates data layout and memory exploration to address the system objectives from both the hardware design and the application software development perspectives. Further, we propose an automatic framework that explores the design space, identifying hundreds of Pareto-optimal design points within a few hours of running on a standard desktop configuration.
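The Pareto-filtering step at the heart of such an exploration is easy to show in isolation; the design points below (power in mW, cycles, area in mm²) are invented, and the real framework would generate them from the application's data access characteristics.

```python
# Invented design points: (power mW, cycles, area mm^2), all three minimized.
points = [(120, 9.0e6, 1.8), (150, 7.5e6, 1.6), (110, 9.5e6, 2.1),
          (150, 8.0e6, 1.9), (100, 1.2e7, 1.5)]

def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

pareto = [p for p in points if not any(dominates(q, p) for q in points)]
print(pareto)
```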
Abstract:
We present two online algorithms for maintaining a topological order of a directed n-vertex acyclic graph as arcs are added, and detecting a cycle when one is created. Our first algorithm handles m arc additions in O(m^{3/2}) time. For sparse graphs (m/n = O(1)), this bound improves the best previous bound by a logarithmic factor, and is tight to within a constant factor among algorithms satisfying a natural locality property. Our second algorithm handles an arbitrary sequence of arc additions in O(n^{5/2}) time. For sufficiently dense graphs, this bound improves the best previous bound by a polynomial factor. Our bound may be far from tight: we show that the algorithm can take Ω(n² 2^{√(2 lg n)}) time by relating its performance to a generalization of the k-levels problem of combinatorial geometry. A completely different algorithm running in Θ(n² log n) time was given recently by Bender, Fineman, and Gilbert. We extend both of our algorithms to the maintenance of strong components, without affecting the asymptotic time bounds.
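For context, the naive baseline these algorithms improve on simply recomputes a topological order from scratch after every arc insertion; the sketch below implements that O(n + m)-per-arc baseline with Kahn's algorithm and cycle detection, and is not one of the paper's algorithms.

```python
from collections import defaultdict, deque

# Naive baseline (not the paper's algorithms): re-run Kahn's algorithm after
# each insertion; a cycle exists iff not every vertex can be ordered.
class NaiveIncrementalTopo:
    def __init__(self, n):
        self.n = n
        self.adj = defaultdict(list)

    def add_arc(self, u, v):
        self.adj[u].append(v)
        order = self._toposort()
        if order is None:
            self.adj[u].pop()            # reject the cycle-creating arc
            return None
        return order

    def _toposort(self):
        indeg = [0] * self.n
        for u in self.adj:
            for v in self.adj[u]:
                indeg[v] += 1
        q = deque(i for i in range(self.n) if indeg[i] == 0)
        order = []
        while q:
            u = q.popleft()
            order.append(u)
            for v in self.adj[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    q.append(v)
        return order if len(order) == self.n else None
```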
Abstract:
Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs. For STMs to be adopted widely for performance-critical software, understanding and improving the cache performance of applications running on STM becomes increasingly crucial, as the performance gap between processor and memory continues to grow. In this paper, we present the most detailed experimental evaluation to date of the cache behavior of STM applications and quantify the impact of the different STM factors on the cache misses experienced by the applications. We find that STMs are not cache friendly, with data cache stall cycles contributing more than 50% of the execution cycles in a majority of the benchmarks. We find that, on average, misses occurring inside the STM account for 62% of the total data cache miss latency cycles experienced by the applications, and that the cache performance is impacted adversely by certain inherent characteristics of the STM itself. These observations motivate us to propose a set of specific compiler transformations targeted at making STMs cache friendly. We find that the STM's fine-grained and application-unaware locking is a major contributor to its poor cache behavior. Hence we propose selective Lock Data co-location (LDC) and Redundant Lock Access Removal (RLAR) to address the lock access misses. We find that even transactions that are completely disjoint-access parallel suffer from costly coherence misses caused by the centralized global time stamp updates, and hence we propose the Selective Per-Partition Time Stamp (SPTS) transformation to address this. We show that our transformations are effective in improving the cache behavior of STM applications, reducing the data cache miss latency by 20.15% to 37.14% and improving execution time by 18.32% to 33.12% in five of the eight STAMP applications.
Abstract:
Exascale systems of the future are predicted to have a mean time between failures (MTBF) of less than one hour. Malleable applications, in which the number of processors on which an application executes can be changed during execution, can use their malleability to better tolerate high failure rates. We present AdFT, an adaptive fault tolerance framework for long-running malleable applications that maximizes application performance in the presence of failures. The AdFT framework includes cost models for evaluating the benefits of various fault tolerance actions, including checkpointing, live migration and rescheduling, and runtime decisions for dynamically selecting the fault tolerance actions at different points of application execution to maximize performance. Simulations with real and synthetic failure traces show that our approach outperforms existing fault tolerance mechanisms for malleable applications, yielding up to 23% improvement in application performance, and is effective even for petascale systems and beyond.
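The abstract does not spell out AdFT's cost models; the sketch below only shows the general shape of such a runtime decision, using the classic Young/Daly first-order checkpoint-interval approximation and invented per-action costs, so the numbers and the comparison rule are illustrative assumptions rather than the paper's method.

```python
import math

def checkpoint_overhead(mtbf, ckpt_cost):
    # Young/Daly first-order approximation: interval tau = sqrt(2*C*MTBF),
    # expected fraction of time lost = C/tau + tau/(2*MTBF).
    tau = math.sqrt(2 * ckpt_cost * mtbf)
    return ckpt_cost / tau + tau / (2 * mtbf)

def pick_action(mtbf, costs, remaining_work):
    # Invented, simplified comparison of expected overheads (seconds).
    candidates = {
        "checkpoint": checkpoint_overhead(mtbf, costs["checkpoint"]) * remaining_work,
        "live_migrate": costs["migrate"],      # pay once to leave a failing node
        "reschedule": costs["reschedule"],     # shrink onto healthy processors
    }
    return min(candidates, key=candidates.get)

print(pick_action(mtbf=3600,
                  costs={"checkpoint": 60, "migrate": 300, "reschedule": 900},
                  remaining_work=7200))
```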
Abstract:
The presence of new matter fields charged under the Standard Model gauge group at intermediate scales below the Grand Unification scale modifies the renormalization group evolution of the gauge couplings. This can in turn significantly change the running of the Minimal Supersymmetric Standard Model parameters, in particular the gaugino and scalar masses. In the absence of new large Yukawa couplings, we can parameterise all the intermediate-scale models in terms of only two parameters controlling the size of the unified gauge coupling. As a consequence of the modified running, the low energy spectrum can be strongly affected, with interesting phenomenological consequences. In particular, we show that scalar-over-gaugino mass ratios tend to increase and the regions of the parameter space with neutralino Dark Matter compatible with cosmological observations are drastically modified. Moreover, we discuss some observables that can be used to test intermediate-scale physics at the LHC in a wide class of models.
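At one loop the effect can be made concrete with the standard running of the gauge couplings (this is textbook material, not the paper's specific parameterisation): matter in complete SU(5) multiplets above an intermediate scale M_I shifts every beta-function coefficient b_i, with MSSM values (b_1, b_2, b_3) = (33/5, 1, −3), by the same amount Δb, so unification is preserved while the unified coupling grows.

```latex
% One-loop gauge coupling running with an intermediate threshold M_I
% (standard result; \Delta b is the common shift from complete SU(5) multiplets):
\frac{d g_i}{d\ln\mu} = \frac{b_i}{16\pi^2}\, g_i^3 ,
\qquad
\alpha_i^{-1}(\mu) = \alpha_i^{-1}(M_Z)
  - \frac{b_i}{2\pi}\ln\frac{\mu}{M_Z}
  - \frac{\Delta b}{2\pi}\,\theta(\mu - M_I)\ln\frac{\mu}{M_I}.
```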
Abstract:
Low-density parity-check (LDPC) codes are a class of linear block codes that are decoded by running the belief propagation (BP) algorithm, or log-likelihood ratio belief propagation (LLR-BP), over the factor graph of the code. One of the disadvantages of LDPC codes is the onset of an error floor at high values of signal-to-noise ratio, caused by trapping sets. In this paper, we propose a two-stage decoder to deal with different types of trapping sets. Oscillating trapping sets are handled by the first stage of the decoder and elementary trapping sets are handled by the second stage. Simulation results on the regular PEG (504,252,3,6) code and the irregular PEG (1024,518,15,8) code show that the proposed two-stage decoder performs significantly better than the standard decoder.
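As background, here is a textbook min-sum LLR-BP decoder for a small parity-check matrix; it is not the paper's two-stage decoder, but a two-stage scheme would wrap a loop like this, tracking the unsatisfied-check count across iterations to distinguish oscillating from elementary trapping sets. The toy check matrix and LLRs are invented.

```python
import numpy as np

def min_sum_decode(H, llr, max_iter=50):
    """Min-sum LLR-BP over the factor graph of H (0/1 matrix)."""
    m, n = H.shape
    C2V = np.zeros((m, n))                        # check-to-variable messages
    hard = (llr < 0).astype(int)
    for _ in range(max_iter):
        total = llr + C2V.sum(axis=0)
        V2C = np.where(H == 1, total - C2V, 0.0)  # extrinsic var-to-check
        for i in range(m):
            idx = np.flatnonzero(H[i])
            v = V2C[i, idx]
            signs = np.where(v >= 0, 1.0, -1.0)
            prod_sign = signs.prod()
            absv = np.abs(v)
            for k, j in enumerate(idx):
                others = np.delete(absv, k)
                C2V[i, j] = prod_sign * signs[k] * others.min()
        hard = ((llr + C2V.sum(axis=0)) < 0).astype(int)
        if not ((H @ hard) % 2).any():
            return hard, True                     # valid codeword found
    return hard, False                            # decoding failure

# Invented toy example: small check matrix, one unreliable received bit.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
llr = np.array([2.0, 1.5, -0.5, 2.2, 1.8, 1.1, 0.9])
print(min_sum_decode(H, llr))
```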
Abstract:
We implement two energy models that accurately and comprehensively estimate the system energy cost and the communication energy cost of using the Bluetooth and Wi-Fi interfaces. The energy models, running on a system, are used to pick the most energy-optimal network interface so that data transfer between two end points is maximized.
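A minimal sketch of such an interface-selection decision is shown below; the fixed (ramp/tail) and per-byte energy numbers are invented placeholders, not the paper's measured models.

```python
# Invented energy parameters: fixed cost in joules plus joules per byte.
INTERFACES = {
    "bluetooth": {"fixed_j": 0.3, "per_byte_j": 0.7e-6},
    "wifi":      {"fixed_j": 1.2, "per_byte_j": 0.1e-6},
}

def transfer_energy(iface, nbytes):
    p = INTERFACES[iface]
    return p["fixed_j"] + p["per_byte_j"] * nbytes

def pick_interface(nbytes):
    # Choose the interface with the lowest estimated energy for this transfer.
    return min(INTERFACES, key=lambda i: transfer_energy(i, nbytes))

for size in (10e3, 1e6, 100e6):
    print(int(size), "bytes ->", pick_interface(size))
```

With these placeholder numbers small transfers go to Bluetooth and large ones to Wi-Fi, which is the kind of crossover such models are meant to capture.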
Abstract:
The problem of human detection is challenging, more so when faced with adverse conditions such as occlusion and background clutter. This paper addresses the problem of human detection by representing an extracted feature of an image as a sparse linear combination of chosen dictionary atoms. The detection, along with the scale finding, is done using the coefficients obtained from the sparse representation. This is of particular interest as we address the problem of scale using a scale-embedded dictionary, whereas conventional methods detect the object by running the detection window at all scales.
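The following sketch only conveys the idea of reading scale off sparse coefficients: a random dictionary whose atoms carry invented scale tags, a plain orthogonal matching pursuit coder, and a vote over the dominant coefficients. None of it is the paper's feature, dictionary or decision rule.

```python
import numpy as np

def omp(D, y, k):
    """Plain orthogonal matching pursuit: greedily pick k atoms."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 30))
D /= np.linalg.norm(D, axis=0)                   # unit-norm atoms
atom_scale = np.repeat([0.5, 1.0, 2.0], 10)      # invented scale tag per atom
y = D[:, 12] * 2.0 + 0.05 * rng.standard_normal(64)  # feature near a scale-1 atom

x = omp(D, y, k=3)
scores = {s: np.abs(x[atom_scale == s]).sum() for s in (0.5, 1.0, 2.0)}
print(max(scores, key=scores.get))               # dominant scale -> 1.0
```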
Abstract:
Most of the existing WCET estimation methods directly estimate the execution time, ET, in cycles. We propose to study ET as a product of two factors, ET = IC * CPI, where IC is the instruction count and CPI is the cycles per instruction. Directly estimating ET may lead to a highly pessimistic estimate, since implicitly these methods may be using the worst-case IC and the worst-case CPI. We hypothesize that there exists a functional relationship between CPI and IC such that CPI = f(IC). This is ascertained by computing the covariance matrix and studying the scatter plots of CPI versus IC. IC and CPI values are obtained by running benchmarks with a large number of inputs using the cycle-accurate architectural simulator SimpleScalar on two different architectures. It is shown that the benchmarks can be grouped into different classes based on the CPI versus IC relationship. For some benchmarks, such as FFT and FIR, both IC and CPI are almost constant irrespective of the input. Other benchmarks exhibit a direct or an inverse relationship between CPI and IC. In such cases, one can predict CPI for a given IC as CPI = f(IC). We derive the theoretical worst-case IC for a program, denoted SWIC, using integer linear programming (ILP) and estimate WCET as SWIC * f(SWIC). However, if CPI decreases sharply with IC, then the measured maximum cycles is observed to be a better estimate. For certain other benchmarks, the CPI versus IC relationship is either random or CPI remains constant with varying IC. In such cases, WCET is estimated as the product of SWIC and the measured maximum CPI. It is observed that the proposed method results in tighter WCET estimates than Chronos, a static WCET analyzer, for most benchmarks on the two architectures considered in this paper.
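The WCET = SWIC * f(SWIC) step can be illustrated with a small numerical sketch: fit a linear CPI = f(IC) model from profiled runs and evaluate it at a given SWIC. The measurements and the SWIC value below are invented, and obtaining SWIC itself requires the ILP formulation, which is not shown.

```python
import numpy as np

# Invented profiling data: (IC, CPI) pairs from runs with different inputs.
ic  = np.array([1.00e6, 1.20e6, 1.45e6, 1.70e6, 2.00e6])   # instruction counts
cpi = np.array([1.42,   1.38,   1.35,   1.31,   1.27])     # measured CPI

slope, intercept = np.polyfit(ic, cpi, 1)        # linear fit: CPI = f(IC)
f = lambda n: slope * n + intercept

swic = 2.4e6                                     # worst-case IC from ILP (assumed given)
wcet_cycles = swic * f(swic)                     # WCET estimate = SWIC * f(SWIC)
print(f"estimated WCET = {wcet_cycles:.3e} cycles")
```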