918 results for parallel execution


Relevance:

20.00%

Abstract:

Artificial Neural Networks (ANNs) are being used to solve a variety of problems in pattern recognition, robotic control, VLSI CAD and other areas. In most of these applications, a speedy response from the ANNs is imperative. However, ANNs comprise a large number of artificial neurons and a massive interconnection network among them. Hence, implementation of these ANNs involves execution of compute-intensive operations, and the use of multiprocessor systems becomes necessary. In this article, we present the implementation of ART1 and ART2 ANNs on ring and mesh architectures. The overall system design and implementation aspects are presented. The performance of the algorithm on ring, 2-dimensional mesh and n-dimensional mesh topologies is presented. The parallel algorithm presented for implementation of ART1 is not specific to any particular architecture. The parallel algorithm for ART2 is more suitable for a ring architecture.

Relevance:

20.00%

Abstract:

We consider the problem of deciding whether the output of a boolean circuit is determined by a partial assignment to its inputs. This problem is easily shown to be hard, i.e., co-NP-complete. However, many of the consequences of a partial input assignment may be determined in linear time, by iterating the following step: if we know the values of some inputs to a gate, we can deduce the values of some outputs of that gate. This process of iteratively deducing some of the consequences of a partial assignment is called propagation. This paper explores the parallel complexity of propagation, i.e., the complexity of determining whether the output of a given boolean circuit is determined by propagating a given partial input assignment. We give a complete classification of the problem into those cases that are P-complete and those that are unlikely to be P-complete.
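The propagation step described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the gate encoding, the names `evaluate_gate` and `propagate`, and the restriction to AND, OR and NOT gates are assumptions made for illustration, and the simple fixed-point loop below is quadratic rather than linear-time.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: a gate is (kind, input wire names, output wire name).
Gate = Tuple[str, List[str], str]


def evaluate_gate(kind: str, values: List[Optional[bool]]) -> Optional[bool]:
    """Return the gate output if the known inputs force it, else None."""
    if kind == "NOT":
        return None if values[0] is None else not values[0]
    if kind == "AND":
        if any(v is False for v in values):
            return False  # a single false input forces the output to false
        if all(v is True for v in values):
            return True
        return None
    if kind == "OR":
        if any(v is True for v in values):
            return True  # a single true input forces the output to true
        if all(v is False for v in values):
            return False
        return None
    raise ValueError(f"unknown gate kind: {kind}")


def propagate(gates: List[Gate], partial: Dict[str, bool]) -> Dict[str, bool]:
    """Repeat the deduction step until no new wire value can be deduced."""
    known = dict(partial)
    changed = True
    while changed:
        changed = False
        for kind, inputs, output in gates:
            if output in known:
                continue
            result = evaluate_gate(kind, [known.get(name) for name in inputs])
            if result is not None:
                known[output] = result
                changed = True
    return known


# out = (a AND b) OR d: fixing a = d = False determines the output.
circuit = [("AND", ["a", "b"], "c"), ("OR", ["c", "d"], "out")]
print(propagate(circuit, {"a": False, "d": False})["out"])  # False

# out = a OR (NOT a) is always true, yet propagation alone deduces nothing.
tautology = [("NOT", ["a"], "na"), ("OR", ["a", "na"], "out")]
print("out" in propagate(tautology, {}))  # False
```

The second example shows why propagation is weaker than the full decision problem: the output is determined for every input, but no single gate's known inputs force a value.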

Relevance:

20.00%

Abstract:

The paper presents two new algorithms for the direct parallel solution of systems of linear equations. The algorithms employ a novel recursive doubling technique to obtain solutions to an nth-order system in n steps with no more than 2n(n − 1) processors. Comparing their performance with the Gaussian elimination algorithm (GE), we show that they are almost 100% faster than the latter. This speedup is achieved by dispensing with all the computation involved in the back-substitution phase of GE. It is also shown that the new algorithms exhibit error characteristics which are superior to GE. An n(n + 1) systolic array structure is proposed for the implementation of the new algorithms. We show that complete solutions can be obtained, through these single-phase solution methods, in 5n − log₂ n − 4 computational steps, without the need for intermediate I/O operations.

Relevance:

20.00%

Abstract:

A new method of specifying the syntax of programming languages, known as hierarchical language specifications (HLS), is proposed. Efficient parallel algorithms for parsing languages generated by HLS are presented. These algorithms run on an exclusive-read exclusive-write parallel random-access machine. They require O(n) processors and O(log2n) time, where n is the length of the string to be parsed. The most important feature of these algorithms is that they do not use a stack.

Relevance:

20.00%

Abstract:

In this paper, the design and implementation of a single shared bus, shared memory multiprocessing system using Intel's single board computers is presented. The hardware configuration and the operating system developed to execute the parallel algorithms are discussed. The performance evaluation studies carried out on Image are outlined.

Relevance:

20.00%

Abstract:

Abstract is not available.

Relevance:

20.00%

Abstract:

In the modern business environment, meeting due dates and avoiding delay penalties are very important goals that can be accomplished by minimizing total weighted tardiness. We consider a scheduling problem in a system of parallel processors with the objective of minimizing total weighted tardiness. Our aim in the present work is to develop an algorithm that solves the parallel processor problem more efficiently than the heuristics available in the literature, and we propose an ant colony optimization (ACO) approach for this problem. Extensive experimentation is conducted to evaluate the performance of the ACO approach on different problem sizes with varied tardiness factors. Our experimentation shows that the proposed ant colony optimization algorithm gives promising results compared to the best of the available heuristics.
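The objective named in this abstract, total weighted tardiness, can be made concrete with a short Python sketch. It only evaluates a given schedule and is not the ACO algorithm; the job tuple layout, the function name `total_weighted_tardiness`, and the assumptions of identical processors with all jobs available at time zero are illustrative choices not taken from the abstract.

```python
from typing import List, Tuple

# Hypothetical job encoding: (processing_time, due_date, weight).
Job = Tuple[float, float, float]


def total_weighted_tardiness(schedule: List[List[Job]]) -> float:
    """Sum weight * max(0, completion - due_date) over all jobs.

    schedule[i] is the ordered list of jobs run on processor i,
    back to back, starting at time zero.
    """
    total = 0.0
    for jobs in schedule:
        clock = 0.0
        for processing_time, due_date, weight in jobs:
            clock += processing_time  # completion time of this job
            total += weight * max(0.0, clock - due_date)
    return total


# Two processors, illustrative data only.
schedule = [
    [(3, 4, 2), (2, 4, 1)],  # completions 3 and 5: tardiness 0 and 1
    [(4, 2, 3)],             # completion 4: tardiness 2, weighted 6
]
print(total_weighted_tardiness(schedule))  # 7.0
```

A search heuristic such as ACO would call an evaluation like this on each candidate assignment and ordering, keeping the schedule with the smallest value.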

Relevance:

20.00%

Abstract:

It is shown that the conclusions arrived at regarding the instability of an incompressible fluid cylinder in the presence of the magnetic field and the streaming velocity in a recent communication easily follow from the study of propagation characteristics of Alfvén surface waves along cylindrical plasma columns made earlier.

Relevance:

20.00%

Abstract:

X-ray crystal structure analysis of 7-methoxycoumarin reveals that the reactive double bonds are rotated by about 65° with respect to each other, the centre-to-centre distance between the double bonds being 3.83 Å. In spite of this unfavourable arrangement, photodimerization occurs in the crystalline state yielding the syn-head-tail dimer as the only product. Lattice energy calculations on ground-state molecules in crystals throw light on the mechanism of the reaction.

Relevance:

20.00%

Abstract:

Tridiagonal diagonally dominant linear systems arise in many scientific and engineering applications. The standard Thomas algorithm for solving such systems is inherently serial, forming a bottleneck in computation. Algorithms such as cyclic reduction and SPIKE reduce a single large tridiagonal system into multiple small independent systems which can be solved in parallel. We have developed portable OpenCL implementations of the cyclic reduction and SPIKE algorithms, intended to target a range of co-processors in a heterogeneous computing environment, including Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs) and other multi-core processors. In this paper, we evaluate these designs in the context of solver performance, resource efficiency and numerical accuracy.
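For reference, the serial baseline mentioned in this abstract, the Thomas algorithm, can be sketched in a few lines of Python. This is a textbook formulation rather than the authors' OpenCL code; the function name `thomas_solve` and the array layout (with `lower[0]` and `upper[-1]` unused) are assumptions. No pivoting is performed, which is safe for diagonally dominant systems.

```python
from typing import List


def thomas_solve(lower: List[float], diag: List[float],
                 upper: List[float], rhs: List[float]) -> List[float]:
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c_prime = [0.0] * n  # modified super-diagonal
    d_prime = [0.0] * n  # modified right-hand side

    # Forward sweep: each step needs the result of the previous one.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution: again a chain of dependent steps.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


# 3x3 system with 4 on the diagonal and 1 off it; exact solution is [1, 2, 3].
solution = thomas_solve([0.0, 1.0, 1.0], [4.0, 4.0, 4.0],
                        [1.0, 1.0, 0.0], [6.0, 12.0, 14.0])
print([round(v, 6) for v in solution])  # [1.0, 2.0, 3.0]
```

Both loops carry a dependency from one iteration to the next, which is the serial bottleneck that cyclic reduction and SPIKE are designed to break.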

Relevance:

20.00%

Abstract:

In this paper, three parallel polygon scan conversion algorithms have been proposed, and their performance when executed on a shared bus architecture has been compared. It has been shown that the parallel algorithm that does not use edge coherence performs better than those that use edge coherence. Further, a multiprocessing architecture has been proposed to execute the parallel polygon scan conversion algorithms more efficiently than a single shared bus architecture.

Relevance:

20.00%

Abstract:

Multi-access techniques are widely used in computer networking and distributed multiprocessor systems. On-the-fly arbitration schemes permit one of the many contenders to access the medium without collisions. Serial arbitration is cost effective but is slow and hence unsuitable for high-speed multiprocessor environments supporting very high data transfer rates. A fully parallel arbitration scheme takes less time but is not practically realisable for large numbers of contenders. In this paper, a generalised parallel-serial scheme is proposed which significantly reduces the arbitration time and is practically realisable.

Relevance:

20.00%

Abstract:

The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multicore architectures. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on accelerators such as Graphics Processing Units (GPUs) or CellBE which support abundant parallelism in hardware. In this paper, we describe a novel method to orchestrate the execution of a StreamIt program on a multicore platform equipped with an accelerator. The proposed approach identifies, using profiling, the relative benefits of executing a task on the superscalar CPU cores and the accelerator. We formulate the problem of partitioning the work between the CPU cores and the GPU, taking into account the latencies for data transfers and the required buffer layout transformations associated with the partitioning, as an integrated Integer Linear Program (ILP) which can then be solved by an ILP solver. We also propose an efficient heuristic algorithm for the work-partitioning between the CPU and the GPU, which provides solutions which are within 9.05% of the optimal solution on average across the benchmark suite. The partitioned tasks are then software pipelined to execute on the multiple CPU cores and the Streaming Multiprocessors (SMs) of the GPU. The software pipelining algorithm orchestrates the execution between CPU cores and the GPU by emitting the code for the CPU and the GPU, and the code for the required data transfers. Our experiments on a platform with 8 CPU cores and a GeForce 8800 GTS 512 GPU show a geometric mean speedup of 6.94X with a maximum of 51.96X over a single-threaded CPU execution across the StreamIt benchmarks. This is an 18.9% improvement over a partitioning strategy that maps only the filters that cannot be executed on the GPU (the filters with state that is persistent across firings) onto the CPU.

Relevance:

20.00%

Abstract:

An apolar synthetic analog of the first 10 residues at the NH2-terminal end of zervamicin IIA crystallizes in the triclinic space group P1 with cell dimensions a = 10.206 ± 0.002 Å, b = 12.244 ± 0.002 Å, c = 15.049 ± 0.002 Å, α = 93.94 ± 0.01°, β = 95.10 ± 0.01°, γ = 104.56 ± 0.01°, Z = 1, C60H97N11O13·2H2O. Despite the relatively few α-aminoisobutyric acid residues, the peptide maintains a helical form. The first intrahelical hydrogen bond is of the 3₁₀ type between N(3) and O(0), followed by five α-helix-type hydrogen bonds. Solution ¹H NMR studies in chloroform also favor a helical conformation, with seven solvent-shielded NH groups. Continuous columns are formed by head-to-tail hydrogen bonds between the helical molecules along the helix axis. The absence of polar side chains precludes any lateral hydrogen bonds. Since the peptide crystallizes with one molecule in a triclinic space group, aggregation of the helical columns must necessarily be parallel rather than antiparallel. The packing of the columns is rather inefficient, as indicated by very few good van der Waals contacts and the occurrence of voids between the molecules.

Relevance:

20.00%

Abstract:

The implementation of CSP-S (a subset of CSP), a high-level language for distributed programming, is presented in this paper. The language CSP-S features a parallel command, communication by message passing and the use of guarded commands. The implementation consists of a compiler translating the CSP-S constructs into an intermediate language. The execution is carried out by a scheduler which creates an illusion of concurrency. Using the CSP-S language constructs, distributed algorithms are written, executed and tested with the compiler designed.