23 resultados para Maths lessons execution
em Indian Institute of Science - Bangalore - Índia
Resumo:
The StreamIt programming model has been proposed to exploit parallelism in streaming applications oil general purpose multicore architectures. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on accelerators such as Graphics Processing Units (GPUs) or CellBE which support abundant parallelism in hardware. In this paper, we describe a novel method to orchestrate the execution of if StreamIt program oil a multicore platform equipped with an accelerator. The proposed approach identifies, using profiling, the relative benefits of executing a task oil the superscalar CPU cores and the accelerator. We formulate the problem of partitioning the work between the CPU cores and the GPU, taking into account the latencies for data transfers and the required buffer layout transformations associated with the partitioning, as all integrated Integer Linear Program (ILP) which can then be solved by an ILP solver. We also propose an efficient heuristic algorithm for the work-partitioning between the CPU and the GPU, which provides solutions which are within 9.05% of the optimal solution on an average across the benchmark Suite. The partitioned tasks are then software pipelined to execute oil the multiple CPU cores and the Streaming Multiprocessors (SMs) of the GPU. The software pipelining algorithm orchestrates the execution between CPU cores and the GPU by emitting the code for the CPU and the GPU, and the code for the required data transfers. Our experiments on a platform with 8 CPU cores and a GeForce 8800 GTS 512 GPU show a geometric mean speedup of 6.94X with it maximum of 51.96X over it single threaded CPU execution across the StreamIt benchmarks. This is a 18.9% improvement over it partitioning strategy that maps only the filters that cannot be executed oil the GPU - the filters with state that is persistent across firings - onto the CPU.
Resumo:
An important question which has to be answered in evaluting the suitability of a microcomputer for a control application is the time it would take to execute the specified control algorithm. In this paper, we present a method of obtaining closed-form formulas to estimate this time. These formulas are applicable to control algorithms in which arithmetic operations and matrix manipulations dominate. The method does not require writing detailed programs for implementing the control algorithm. Using this method, the execution times of a variety of control algorithms on a range of 16-bit mini- and recently announced microcomputers are calculated. The formulas have been verified independently by an analysis program, which computes the execution time bounds of control algorithms coded in Pascal when they are run on a specified micro- or minicomputer.
Resumo:
The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, and a set of communication channels between them. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on modern Graphics Processing Units (GPUs), as they support abundant parallelism in hardware. In this paper, we describe the challenges in mapping StreamIt to GPUs and propose an efficient technique to software pipeline the execution of stream programs on GPUs. We formulate this problem - both scheduling and assignment of filters to processors - as an efficient Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling utilizes both the scalar units in GPU, to exploit data parallelism, and multiprocessors, to exploit task and pipelin parallelism. Further it takes into consideration the synchronization and bandwidth limitations of GPUs, and yields speedups between 1.87X and 36.83X over a single threaded CPU.
Resumo:
We carry out systematic and high-resolution studies of dynamo action in a shell model for magnetohydro-dynamic (MHD) turbulence over wide ranges of the magnetic Prandtl number Pr-M and the magnetic Reynolds number Re-M. Our study suggests that it is natural to think of dynamo onset as a nonequilibrium first-order phase transition between two different turbulent, but statistically steady, states. The ratio of the magnetic and kinetic energies is a convenient order parameter for this transition. By using this order parameter, we obtain the stability diagram (or nonequilibrium phase diagram) for dynamo formation in our MHD shell model in the (Pr-M(-1), Re-M) plane. The dynamo boundary, which separates dynamo and no-dynamo regions, appears to have a fractal character. We obtain a hysteretic behavior of the order parameter across this boundary and suggestions of nucleation-type phenomena.
Resumo:
REDEFINE is a reconfigurable SoC architecture that provides a unique platform for high performance and low power computing by exploiting the synergistic interaction between coarse grain dynamic dataflow model of computation (to expose abundant parallelism in applications) and runtime composition of efficient compute structures (on the reconfigurable computation resources). We propose and study the throttling of execution in REDEFINE to maximize the architecture efficiency. A feature specific fast hybrid (mixed level) simulation framework for early in design phase study is developed and implemented to make the huge design space exploration practical. We do performance modeling in terms of selection of important performance criteria, ranking of the explored throttling schemes and investigate effectiveness of the design space exploration using statistical hypothesis testing. We find throttling schemes which give appreciable (24.8%) overall performance gain in the architecture and 37% resource usage gain in the throttling unit simultaneously.
Resumo:
Ramakrishnan A, Chokhandre S, Murthy A. Voluntary control of multisaccade gaze shifts during movement preparation and execution. J Neurophysiol 103: 2400-2416, 2010. First published February 17, 2010; doi: 10.1152/jn.00843.2009. Although the nature of gaze control regulating single saccades is relatively well documented, how such control is implemented to regulate multisaccade gaze shifts is not known. We used highly eccentric targets to elicit multisaccade gaze shifts and tested the ability of subjects to control the saccade sequence by presenting a second target on random trials. Their response allowed us to test the nature of control at many levels: before, during, and between saccades. Although the saccade sequence could be inhibited before it began, we observed clear signs of truncation of the first saccade, which confirmed that it could be inhibited in midflight as well. Using a race model that explains the control of single saccades, we estimated that it took about 100 ms to inhibit a planned saccade but took about 150 ms to inhibit a saccade during its execution. Although the time taken to inhibit was different, the high subject-wise correlation suggests a unitary inhibitory control acting at different levels in the oculomotor system. We also frequently observed responses that consisted of hypometric initial saccades, followed by secondary saccades to the initial target. Given the estimates of the inhibitory process provided by the model that also took into account the variances of the processes as well, the secondary saccades (average latency similar to 215 ms) should have been inhibited. Failure to inhibit the secondary saccade suggests that the intersaccadic interval in a multisaccade response is a ballistic stage. Collectively, these data indicate that the oculomotor system can control a response until a very late stage in its execution. However, if the response consists of multiple movements then the preparation of the second movement becomes refractory to new visual input, either because it is part of a preprogrammed sequence or as a consequence of being a corrective response to a motor error.
Resumo:
We discuss the key issues in the deployment of sparse sensor networks. The network monitors several environment parameters and is deployed in a semi-arid region for the benefit of small and marginal farmers. We begin by discussing the problems of an existing unreliable 1 sq km sparse network deployed in a village. The proposed solutions are implemented in a new cluster. The new cluster is a reliable 5 sq km network. Our contributions are two fold. Firstly, we describe a. novel methodology to deploy a sparse reliable data gathering sensor network and evaluate the ``safe distance'' or ``reliable'' distance between nodes using propagation models. Secondly, we address the problem of transporting data from rural aggregation servers to urban data centres. This paper tracks our steps in deploying a sensor network in a village,in India, trying to provide better diagnosis for better crop management. Keywords - Rural, Agriculture, CTRS, Sparse.
Resumo:
Implementation details of efficient schemes for lenient execution and concurrent execution of re-entrant routines in a data flow model have been discussed in this paper. The proposed schemes require no extra hardware support and utilise the existing hardware resources such as the Matching Unit and Memory Network Interface, effectively to achieve the above mentioned goals.
Resumo:
Cardiac arrhythmias, such as ventricular tachycardia (VT) and ventricular fibrillation (VF), are among the leading causes of death in the industrialized world. These are associated with the formation of spiral and scroll waves of electrical activation in cardiac tissue; single spiral and scroll waves are believed to be associated with VT whereas their turbulent analogs are associated with VF. Thus, the study of these waves is an important biophysical problem. We present a systematic study of the combined effects of muscle-fiber rotation and inhomogeneities on scroll-wave dynamics in the TNNP (ten Tusscher Noble Noble Panfilov) model for human cardiac tissue. In particular, we use the three-dimensional TNNP model with fiber rotation and consider both conduction and ionic inhomogeneities. We find that, in addition to displaying a sensitive dependence on the positions, sizes, and types of inhomogeneities, scroll-wave dynamics also depends delicately upon the degree of fiber rotation. We find that the tendency of scroll waves to anchor to cylindrical conduction inhomogeneities increases with the radius of the inhomogeneity. Furthermore, the filament of the scroll wave can exhibit drift or meandering, transmural bending, twisting, and break-up. If the scroll-wave filament exhibits weak meandering, then there is a fine balance between the anchoring of this wave at the inhomogeneity and a disruption of wave-pinning by fiber rotation. If this filament displays strong meandering, then again the anchoring is suppressed by fiber rotation; also, the scroll wave can be eliminated from most of the layers only to be regenerated by a seed wave. Ionic inhomogeneities can also lead to an anchoring of the scroll wave; scroll waves can now enter the region inside an ionic inhomogeneity and can display a coexistence of spatiotemporal chaos and quasi-periodic behavior in different parts of the simulation domain. We discuss the experimental implications of our study.
Resumo:
This paper sets out the motivation for carrying out an observational experiment on the atmospheric boundary layer along the monsoon trough, in the light of earlier studies of the atmospheric boundary layer in India and elsewhere, and the significant role that the trough has been shown to play as a key semi-permanent feature of the southwest monsoon. The scientific objectives of the experiment are set out, and its planning and execution are touched upon. Some of the gains resulting from the experiment are mentioned, and lessons for the future about the conduct of such programmes are drawn.
Resumo:
MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.
Resumo:
Fault-tolerance is due to the semiconductor technology development important, not only for safety-critical systems but also for general-purpose (non-safety critical) systems. However, instead of guaranteeing that deadlines always are met, it is for general-purpose systems important to minimize the average execution time (AET) while ensuring fault-tolerance. For a given job and a soft (transient) error probability, we define mathematical formulas for AET that includes bus communication overhead for both voting (active replication) and rollback-recovery with checkpointing (RRC). And, for a given multi-processor system-on-chip (MPSoC), we define integer linear programming (ILP) models that minimize AET including bus communication overhead when: (1) selecting the number of checkpoints when using RRC, (2) finding the number of processors and job-to-processor assignment when using voting, and (3) defining fault-tolerance scheme (voting or RRC) per job and defining its usage for each job. Experiments demonstrate significant savings in AET.
Resumo:
This paper describes the design of a power efficient microarchitecture for transient fault detection in chip multiprocessors (CMPs) We introduce a new per-core dynamic voltage and frequency scaling (DVFS) algorithm for our architecture that significantly reduces power dissipation for redundant execution with a minimal performance overhead. Using cycle accurate simulation combined with a simple first order power model, we estimate that our architecture reduces dynamic power dissipation in the redundant core by an mean value of 79% and a maximum of 85% with an associated mean performance overhead of only 1:2%