396 resultados para Supercomputer
Resumo:
Relentless CMOS scaling coupled with lower design tolerances is making ICs increasingly susceptible to wear-out related permanent faults and transient faults, necessitating on-chip fault tolerance in future chip microprocessors (CMPs). In this paper, we describe a power-efficient architecture for redundant execution on chip multiprocessors (CMPs) which when coupled with our per-core dynamic voltage and frequency scaling (DVFS) algorithm significantly reduces the energy overhead of redundant execution without sacrificing performance. Our evaluation shows that this architecture has a performance overhead of only 0.3% and consumes only 1.48 times the energy of a non-fault-tolerant baseline.
Resumo:
Superscalar processors currently have the potential to fetch multiple basic blocks per cycle by employing one of several recently proposed instruction fetch mechanisms. However, this increased fetch bandwidth cannot be exploited unless pipeline stages further downstream correspondingly improve. In particular,register renaming a large number of instructions per cycle is diDcult. A large instruction window, needed to receive multiple basic blocks per cycle, will slow down dependence resolution and instruction issue. This paper addresses these and related issues by proposing (i) partitioning of the instruction window into multiple blocks, each holding a dynamic code sequence; (ii) logical partitioning of the registerjle into a global file and several local jles, the latter holding registers local to a dynamic code sequence; (iii) the dynamic recording and reuse of register renaming information for registers local to a dynamic code sequence. Performance studies show these mechanisms improve performance over traditional superscalar processors by factors ranging from 1.5 to a little over 3 for the SPEC Integer programs. Next, it is observed that several of the loops in the benchmarks display vector-like behavior during execution, even if the static loop bodies are likely complex for compile-time vectorization. A dynamic loop vectorization mechanism that builds on top of the above mechanisms is briefly outlined. The mechanism vectorizes up to 60% of the dynamic instructions for some programs, albeit the average number of iterations per loop is quite small.
Resumo:
Long running multi-physics coupled parallel applications have gained prominence in recent years. The high computational requirements and long durations of simulations of these applications necessitate the use of multiple systems of a Grid for execution. In this paper, we have built an adaptive middleware framework for execution of long running multi-physics coupled applications across multiple batch systems of a Grid. Our framework, apart from coordinating the executions of the component jobs of an application on different batch systems, also automatically resubmits the jobs multiple times to the batch queues to continue and sustain long running executions. As the set of active batch systems available for execution changes, our framework performs migration and rescheduling of components using a robust rescheduling decision algorithm. We have used our framework for improving the application throughput of a foremost long running multi-component application for climate modeling, the Community Climate System Model (CCSM). Our real multi-site experiments with CCSM indicate that Grid executions can lead to improved application throughput for climate models.
Resumo:
Video streaming applications have hitherto been supported by single server systems. A major drawback of such a solution is that it increases the server load. The server restricts the number of clients that can be simultaneously supported due to limitation in bandwidth. The constraints of a single server system can be overcome in video streaming if we exploit the endless resources available in a distributed and networked system. We explore a P2P system for streaming video applications. In this paper we build a P2P streaming video (SVP2P) service in which multiple peers co-operate to serve video segments for new requests, thereby reducing server load and bandwidth used. Our simulation shows the playback latency using SVP2P is roughly 1/4th of the latency incurred when the server directly streams the video. Bandwidth consumed for control messages (overhead) is as low as 1.5% of the total data transfered. The most important observation is that the capacity of the SVP2P grows dynamically.
Resumo:
The Morse-Smale complex is a useful topological data structure for the analysis and visualization of scalar data. This paper describes an algorithm that processes all mesh elements of the domain in parallel to compute the Morse-Smale complex of large two-dimensional data sets at interactive speeds. We employ a reformulation of the Morse-Smale complex using Forman's Discrete Morse Theory and achieve scalability by computing the discrete gradient using local accesses only. We also introduce a novel approach to merge gradient paths that ensures accurate geometry of the computed complex. We demonstrate that our algorithm performs well on both multicore environments and on massively parallel architectures such as the GPU.
Resumo:
Many of the research institutions and universities across the world are facilitating open-access (OA) to their intellectual outputs through their respective OA institutional repositories (IRs) or through the centralized subject-based repositories. The registry of open access repositories (ROAR) lists more than 2850 such repositories across the world. The awareness about the benefits of OA to scholarly literature and OA publishing is picking up in India, too. As per the ROAR statistics, to date, there are more than 90 OA repositories in the country. India is doing particularly well in publishing open-access journals (OAJ). As per the directory of open-access journals (DOAJ), to date, India with 390 OAJs, is ranked 5th in the world in terms of numbers of OAJs being published. Much of the research done in India is reported in the journals published from India. These journals have limited readership and many of them are not being indexed by Web of Science, Scopus or other leading international abstracting and indexing databases. Consequently, research done in the country gets hidden not only from the fellow countrymen, but also from the international community. This situation can be easily overcome if all the researchers facilitate OA to their publications. One of the easiest ways to facilitate OA to scientific literature is through the institutional repositories. If every research institution and university in India set up an open-access IR and ensure that copies of the final accepted versions of all the research publications are uploaded in the IRs, then the research done in India will get far better visibility. The federation of metadata from all the distributed, interoperable OA repositories in the country will serve as a window to the research done across the country. Federation of metadata from the distributed OAI-compliant repositories can be easily achieved by setting up harvesting software like the PKP Harvester. In this paper, we share our experience in setting up a prototype metadata harvesting service using the PKP harvesting software for the OAI-compliant repositories in India.
Resumo:
Mathematical models have provided key insights into the pathogenesis of hepatitis C virus (HCV) in vivo, suggested predominant mechanism(s) of drug action, explained confounding patterns of viral load changes in HCV infected patients undergoing therapy, and presented a framework for therapy optimization. In this article, I present an overview of the major advances in the mathematical modeling of HCV dynamics.
Resumo:
A finite element method for solving multidimensional population balance systems is proposed where the balance of fluid velocity, temperature and solute partial density is considered as a two-dimensional system and the balance of particle size distribution as a three-dimensional one. The method is based on a dimensional splitting into physical space and internal property variables. In addition, the operator splitting allows to decouple the equations for temperature, solute partial density and particle size distribution. Further, a nodal point based parallel finite element algorithm for multi-dimensional population balance systems is presented. The method is applied to study a crystallization process assuming, for simplicity, a size independent growth rate and neglecting agglomeration and breakage of particles. Simulations for different wall temperatures are performed to show the effect of cooling on the crystal growth. Although the method is described in detail only for the case of d=2 space and s=1 internal property variables it has the potential to be extendable to d+s variables, d=2, 3 and s >= 1. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
A computational framework for modeling the respiratory motion of lung tumors provides a 4D parametric representation that tracks, analyzes, and models movement to provide more accurate guidance in the planning and delivery of lung tumor radiotherapy.
Resumo:
This paper proposes a Petri net model for a commercial network processor (Intel iXP architecture) which is a multithreaded multiprocessor architecture. We consider and model three different applications viz., IPv4 forwarding, network address translation, and IP security running on IXP 2400/2850. A salient feature of the Petri net model is its ability to model the application, architecture and their interaction in great detail. The model is validated using the Intel proprietary tool (SDK 3.51 for IXP architecture) over a range of configurations. We conduct a detailed performance evaluation, identify the bottleneck resource, and propose a few architectural extensions and evaluate them in detail.
Resumo:
The questions that one should answer in engineering computations - deterministic, probabilistic/randomized, as well as heuristic - are (i) how good the computed results/outputs are and (ii) how much the cost in terms of amount of computation and the amount of storage utilized in getting the outputs is. The absolutely errorfree quantities as well as the completely errorless computations done in a natural process can never be captured by any means that we have at our disposal. While the computations including the input real quantities in nature/natural processes are exact, all the computations that we do using a digital computer or are carried out in an embedded form are never exact. The input data for such computations are also never exact because any measuring instrument has inherent error of a fixed order associated with it and this error, as a matter of hypothesis and not as a matter of assumption, is not less than 0.005 per cent. Here by error we imply relative error bounds. The fact that exact error is never known under any circumstances and any context implies that the term error is nothing but error-bounds. Further, in engineering computations, it is the relative error or, equivalently, the relative error-bounds (and not the absolute error) which is supremely important in providing us the information regarding the quality of the results/outputs. Another important fact is that inconsistency and/or near-consistency in nature, i.e., in problems created from nature is completely nonexistent while in our modelling of the natural problems we may introduce inconsistency or near-inconsistency due to human error or due to inherent non-removable error associated with any measuring device or due to assumptions introduced to make the problem solvable or more easily solvable in practice. Thus if we discover any inconsistency or possibly any near-inconsistency in a mathematical model, it is certainly due to any or all of the three foregoing factors. We do, however, go ahead to solve such inconsistent/near-consistent problems and do get results that could be useful in real-world situations. The talk considers several deterministic, probabilistic, and heuristic algorithms in numerical optimisation, other numerical and statistical computations, and in PAC (probably approximately correct) learning models. It highlights the quality of the results/outputs through specifying relative error-bounds along with the associated confidence level, and the cost, viz., amount of computations and that of storage through complexity. It points out the limitation in error-free computations (wherever possible, i.e., where the number of arithmetic operations is finite and is known a priori) as well as in the usage of interval arithmetic. Further, the interdependence among the error, the confidence, and the cost is discussed.
Resumo:
Instruction reuse is a microarchitectural technique that improves the execution time of a program by removing redundant computations at run-time. Although this is the job of an optimizing compiler, they do not succeed many a time due to limited knowledge of run-time data. In this paper we examine instruction reuse of integer ALU and load instructions in network processing applications. Specifically, this paper attempts to answer the following questions: (1) How much of instruction reuse is inherent in network processing applications?, (2) Can reuse be improved by reducing interference in the reuse buffer?, (3) What characteristics of network applications can be exploited to improve reuse?, and (4) What is the effect of reuse on resource contention and memory accesses? We propose an aggregation scheme that combines the high-level concept of network traffic i.e. "flows" with a low level microarchitectural feature of programs i.e. repetition of instructions and data along with an architecture that exploits temporal locality in incoming packet data to improve reuse. We find that for the benchmarks considered, 1% to 50% of instructions are reused while the speedup achieved varies between 1% and 24%. As a side effect, instruction reuse reduces memory traffic and can therefore be considered as a scheme for low power.
Resumo:
ASICs offer the best realization of DSP algorithms in terms of performance, but the cost is prohibitive, especially when the volumes involved are low. However, if the architecture synthesis trajectory for such algorithms is such that the target architecture can be identified as an interconnection of elementary parameterized computational structures, then it is possible to attain a close match, both in terms of performance and power with respect to an ASIC, for any algorithmic parameters of the given algorithm. Such an architecture is weakly programmable (configurable) and can be viewed as an application specific instruction-set processor (ASIP). In this work, we present a methodology to synthesize ASIPs for DSP algorithms.
Resumo:
The highest levels of security can be achieved through the use of more than one type of cryptographic algorithm for each security function. In this paper, the REDEFINE polymorphic architecture is presented as an architecture framework that can optimally support a varied set of crypto algorithms without losing high performance. The presented solution is capable of accelerating the advanced encryption standard (AES) and elliptic curve cryptography (ECC) cryptographic protocols, while still supporting different flavors of these algorithms as well as different underlying finite field sizes. The compelling feature of this cryptosystem is the ability to provide acceleration support for new field sizes as well as new (possibly proprietary) cryptographic algorithms decided upon after the cryptosystem is deployed.
Resumo:
In this paper, we consider the problem of computing numerical solutions for Ito stochastic differential equations (SDEs). The five-stage Milstein (FSM) methods are constructed for solving SDEs driven by an m-dimensional Wiener process. The FSM methods are fully explicit methods. It is proved that the FSM methods are convergent with strong order 1 for SDEs driven by an m-dimensional Wiener process. The analysis of stability (with multidimensional Wiener process) shows that the mean-square stable regions of the FSM methods are unbounded. The analysis of stability shows that the mean-square stable regions of the methods proposed in this paper are larger than the Milstein method and three-stage Milstein methods.