190 resultados para PARALLEL COMPUTING
Resumo:
In this paper, a wireless control strategy for parallel operation of three-phase four-wire inverters is proposed. A generalized situation is considered where the inverters are of unequal power ratings and the loads are nonlinear and unbalanced in nature. The proposed control algorithm exploits the potential of sinusoidal domain proportional+multiresonant controller ( in the inner voltage regulation loop) to make the system suitable for nonlinear and unbalanced loads with a simple and generalized structure of virtual output-impedance loop. The decentralized operation is achieved by using three-phase P/Q droop characteristics. The overall control algorithm helps to limit the harmonic contents and the degree of unbalance in the output-voltage waveform and to achieve excellent power-sharing accuracy in spite of mismatch in the inverter output impedances. Moreover, a synchronized turn on with consequent change over to the droop mode is applied for the new incoming unit in order to limit the circulating current completely. The simulation and experimental results from-1 kVA and -0.5 kVA paralleled units validate the effectiveness of the scheme.
Resumo:
In this paper, we introduce an analytical technique based on queueing networks and Petri nets for making a performance analysis of dataflow computations when executed on the Manchester machine. This technique is also applicable for the analysis of parallel computations on multiprocessors. We characterize the parallelism in dataflow computations through a four-parameter characterization, namely, the minimum parallelism, the maximum parallelism, the average parallelism and the variance in parallelism. We observe through detailed investigation of our analytical models that the average parallelism is a good characterization of the dataflow computations only as long as the variance in parallelism is small. However, significant difference in performance measures will result when the variance in parallelism is comparable to or higher than the average parallelism.
Analyzing Cache Performance Bottlenecks of STM Applications and addressing them with Compiler's help
Resumo:
Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs as an alternative to traditional lock based synchronization. However adoption of STM in mainstream software has been quite low due to its considerable overheads and its poor cache/memory performance. In this paper, we perform a detailed study of the cache behavior of STM applications and quantify the impact of different STM factors on the cache misses experienced by the applications. Based on our analysis, we propose a compiler driven Lock-Data Colocation (LDC), targeted at reducing the cache overheads on STM. We show that LDC is effective in improving the cache behavior of STM applications by reducing the dcache miss latency and improving execution time performance.
Resumo:
In this work, we propose a new organization for the last level shared cache of a rnulticore system. Our design is based on the observation that the Next-Use distance, measured in terms of intervening misses between the eviction of a line and its next use, for lines brought in by a given delinquent PC falls within a predictable range of values. We exploit this correlation to improve the performance of shared caches in multi-core architectures by proposing the NUcache organization.
Resumo:
The performance of a program will ultimately be limited by its serial (scalar) portion, as pointed out by Amdahl′s Law. Reported studies thus far of instruction-level parallelism have mixed data-parallel program portions with scalar program portions, often leading to contradictory and controversial results. We report an instruction-level behavioral characterization of scalar code containing minimal data-parallelism, extracted from highly vectorized programs of the PERFECT benchmark suite running on a Cray Y-MP system. We classify scalar basic blocks according to their instruction mix, characterize the data dependencies seen in each class, and, as a first step, measure the maximum intrablock instruction-level parallelism available. We observe skewed rather than balanced instruction distributions in scalar code and in individual basic block classes of scalar code; nonuniform distribution of parallelism across instruction classes; and, as expected, limited available intrablock parallelism. We identify frequently occurring data-dependence patterns and discuss new instructions to reduce latency. Toward effective scalar hardware, we study latency-pipelining trade-offs and restricted multiple instruction issue mechanisms.
Resumo:
Adsorption of n-alkane mixtures in the zeolite LTA-5A under liquid-phase conditions has been studied using grand canonical Monte Carlo (GCMC) simulations combined with parallel tempering. Normal GCMC techniques fail for some of these systems due to the preference of linear molecules to coil within a single cage in the zeolite. The narrow zeolite windows severerly restrict interactions of the molecules, making it difficult to simulate cooperative rearrangements necessary to explore configuration space. Because of these reasons, normal GCMC simulations results show poor reproducibility in some cases. These problems were overcome with parallel tempering techniques. Even with parallel tempering, these are very challenging systems for molecular simulation. Similar problems may arise for other zeolites such as CHA, AFX, ERI, KFI, and RHO having cages connected by narrow windows. The simulations capture the complex selectivity behavior observed in experiments such as selectivity inversion and azeotrope formation.
Resumo:
We address the problem of computing the level-crossings of an analog signal from samples measured on a uniform grid. Such a problem is important, for example, in multilevel analog-to-digital (A/D) converters. The first operation in such sampling modalities is a comparator, which gives rise to a bilevel waveform. Since bilevel signals are not bandlimited, measuring the level-crossing times exactly becomes impractical within the conventional framework of Shannon sampling. In this paper, we propose a novel sub-Nyquist sampling technique for making measurements on a uniform grid and thereby for exactly computing the level-crossing times from those samples. The computational complexity of the technique is low and comprises simple arithmetic operations. We also present a finite-rate-of-innovation sampling perspective of the proposed approach and also show how exponential splines fit in naturally into the proposed sampling framework. We also discuss some concrete practical applications of the sampling technique.
Resumo:
We describe a compiler for the Flat Concurrent Prolog language on a message passing multiprocessor architecture. This compiler permits symbolic and declarative programming in the syntax of Guarded Horn Rules, The implementation has been verified and tested on the 64-node PARAM parallel computer developed by C-DAC (Centre for the Development of Advanced Computing, India), Flat Concurrent Prolog (FCP) is a logic programming language designed for concurrent programming and parallel execution, It is a process oriented language, which embodies dataflow synchronization and guarded-command as its basic control mechanisms. An identical algorithm is executed on every processor in the network, We assume regular network topologies like mesh, ring, etc, Each node has a local memory, The algorithm comprises of two important parts: reduction and communication, The most difficult task is to integrate the solutions of problems that arise in the implementation in a coherent and efficient manner. We have tested the efficacy of the compiler on various benchmark problems of the ICOT project that have been reported in the recent book by Evan Tick, These problems include Quicksort, 8-queens, and Prime Number Generation, The results of the preliminary tests are favourable, We are currently examining issues like indexing and load balancing to further optimize our compiler.
Resumo:
In this paper, we have developed a method to compute fractal dimension (FD) of discrete time signals, in the time domain, by modifying the box-counting method. The size of the box is dependent on the sampling frequency of the signal. The number of boxes required to completely cover the signal are obtained at multiple time resolutions. The time resolutions are made coarse by decimating the signal. The loglog plot of total number of boxes required to cover the curve versus size of the box used appears to be a straight line, whose slope is taken as an estimate of FD of the signal. The results are provided to demonstrate the performance of the proposed method using parametric fractal signals. The estimation accuracy of the method is compared with that of Katz, Sevcik, and Higuchi methods. In ddition, some properties of the FD are discussed.
Resumo:
A new class of nets, called S-nets, is introduced for the performance analysis of scheduling algorithms used in real-time systems Deterministic timed Petri nets do not adequately model the scheduling of resources encountered in real-time systems, and need to be augmented with resource places and signal places, and a scheduler block, to facilitate the modeling of scheduling algorithms. The tokens are colored, and the transition firing rules are suitably modified. Further, the concept of transition folding is used, to get intuitively simple models of multiframe real-time systems. Two generic performance measures, called �load index� and �balance index,� which characterize the resource utilization and the uniformity of workload distribution, respectively, are defined. The utility of S-nets for evaluating heuristic-based scheduling schemes is illustrated by considering three heuristics for real-time scheduling. S-nets are useful in tuning the hardware configuration and the underlying scheduling policy, so that the system utilization is maximized, and the workload distribution among the computing resources is balanced.
Resumo:
in this short note, we determine precisely which operators have the property that their (full, symmetric or antisymmetric) second quantisation is an operator which is bounded or belongs to one of the various Schatten ideals; we also note that in 'the interior' of the natural domain, the second quantisation is a continuous map.