459 results for Parallelism
Abstract:
Mode of access: Internet.
Abstract:
Includes bibliographical references.
The effective use of implicit parallelism through the use of an object-oriented programming language
Abstract:
This thesis explores translating well-written sequential programs in a subset of the Eiffel programming language - without syntactic or semantic extensions - into parallelised programs for execution on a distributed architecture. The main focus is on constructing two object-oriented models: a theoretical, self-contained model of concurrency, which in turn enables a simplified second model for implementing the compilation process. The thesis also presents principles that, if followed, maximise the potential level of parallelism. Model of Concurrency. The concurrency model is designed to be a straightforward target onto which sequential programs can be mapped, thus making them parallel. It aids the compilation process by providing a high level of abstraction, including a useful model of parallel behaviour that enables easy incorporation of message interchange, locking, and synchronisation of objects. Further, the model is sufficient for a compiler to be built in practice, and one has been. Model of Compilation. The compilation model's structure is based upon an object-oriented view of grammar descriptions and capitalises on both a recursive-descent style of processing and abstract syntax trees to perform the parsing. A composite-object view with an attribute-grammar style of processing is used to extract sufficient semantic information for the parallelisation (i.e. code-generation) phase. Programming Principles. The principles presented are based upon information hiding, sharing and containment of objects, and the division of methods into commands and queries. When they are followed, the level of potential parallelism within the presented concurrency model is maximised. Further, these principles arise naturally from good programming practice. Summary. In summary, this thesis shows that it is possible to compile well-written programs, written in a subset of Eiffel, into parallel programs without any syntactic additions or semantic alterations to Eiffel: i.e. no parallel primitives are added, and the parallel program is modelled to execute with equivalent semantics to the sequential version. If the programming principles are followed, a parallelised program achieves the maximum level of potential parallelisation within the concurrency model.
Abstract:
Transition P-systems are based on biological membranes and try to emulate cell behavior and its evolution due to the presence of chemical elements. These systems perform computation through transitions between two consecutive configurations, each of which consists of an m-tuple of multisets present at a given moment in the m regions of the system. A transition between two configurations is performed by applying the evolution rules that are also present in each region. Among the main characteristics of Transition P-systems are massive parallelism and non-determinism. This work is part of a very large project and aims to design a hardware circuit that can remarkably improve the process involved in the evolution of a membrane. Processing in biological cells has two different levels of parallelism: the first, obviously, is the evolution of each cell within the whole set, and the second is the application of the rules inside one membrane. This paper presents an evolution of previous work and includes an improvement that uses massive parallelism to perform the transition between two states. To achieve this, the initial set of rules is transformed into a new set consisting of all their possible combinations, and each combination is treated as a new rule (the participating antecedents are added to generate a new multiset). A single rule application thereby becomes a form of parallelism, in the sense that several rules are applied at the same time. In this paper, we present a circuit that is able to process this kind of rule and to decode the result, taking advantage of all the potential that hardware offers for implementing P systems compared with previously proposed sequential solutions.
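The rule-combination idea in this abstract can be illustrated with a minimal software sketch. It is not the circuit described by the authors. It assumes that a rule is a pair of multisets (antecedent, consequent), that "all possible combinations" means every non-empty subset of the rule set with each rule used at most once, and that a region is a single multiset of symbols. The names `combine`, `applicable`, and `apply_rule` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def combine(rules):
    """Build one combined rule per non-empty subset of `rules`.

    Assumption: each rule is (antecedent, consequent), both Counters,
    and each rule appears at most once in a combination.
    """
    combined = []
    for size in range(1, len(rules) + 1):
        for subset in combinations(rules, size):
            ante, cons = Counter(), Counter()
            for a, c in subset:
                ante += a  # antecedents are added into one multiset
                cons += c  # consequents are added the same way
            combined.append((ante, cons))
    return combined


def applicable(rule, region):
    """A rule is applicable if the region holds every antecedent symbol."""
    ante, _ = rule
    return all(region[symbol] >= count for symbol, count in ante.items())


def apply_rule(rule, region):
    """Consume the antecedent and produce the consequent in one step."""
    ante, cons = rule
    return (region - ante) + cons


# Two hypothetical rules: a -> b, and bc -> d.
rules = [
    (Counter("a"), Counter("b")),
    (Counter("bc"), Counter("d")),
]
combined = combine(rules)
region = Counter("abc")
both = combined[-1]  # the combination that contains both rules
if applicable(both, region):
    region = apply_rule(both, region)
```

Applying the last combined rule once has the same effect as applying both original rules at the same time, which is the sense of parallelism the abstract describes. The number of combined rules grows exponentially with the size of the rule set under this subset assumption.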
Abstract:
Heterogeneous computing systems have become common in modern processor architectures. These systems, such as those released by AMD, Intel, and Nvidia, include both CPU and GPU cores on a single die available with reduced communication overhead compared to their discrete predecessors. Currently, discrete CPU/GPU systems are limited, requiring larger, regular, highly-parallel workloads to overcome the communication costs of the system. Without the traditional communication delay assumed between GPUs and CPUs, we believe non-traditional workloads could be targeted for GPU execution. Specifically, this thesis focuses on the execution model of nested parallel workloads on heterogeneous systems. We have designed a simulation flow which utilizes widely used CPU and GPU simulators to model heterogeneous computing architectures. We then applied this simulator to non-traditional GPU workloads using different execution models. We also have proposed a new execution model for nested parallelism allowing users to exploit these heterogeneous systems to reduce execution time.
Abstract:
The Streaming SIMD Extensions (SSE) are a special feature embedded in the Intel Pentium III and IV classes of microprocessors. They enable the execution of SIMD-type operations to exploit data parallelism. This article presents the use of SSE to improve the computational performance of a railway network simulator. The voltage and current at various points of the supply system of an electrified railway line are crucial for design, daily operation and planning. With computer simulation, their time variations can be obtained by solving a matrix equation whose size depends mainly on the number of trains present in the system. A large coefficient matrix, resulting from a congested railway line, inevitably leads to heavier computational demand and hence jeopardizes the simulation speed. With the special architectural features of the latest processors on PC platforms, significant speed-up in computations can be achieved.
Abstract:
Streaming SIMD Extensions (SSE) is a unique feature embedded in the Pentium III and IV classes of microprocessors. By fully exploiting SSE, parallel algorithms can be implemented on a standard personal computer and a theoretical speedup of four can be achieved. In this paper, we demonstrate the implementation of a parallel LU matrix decomposition algorithm for solving linear systems with SSE and discuss advantages and disadvantages of this approach based on our experimental study.
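The theoretical speedup of four follows from a 128-bit SSE register holding four 32-bit floats, so one instruction can update four matrix elements. The sketch below is not the authors' implementation and uses no SSE instructions. It is a plain Doolittle LU decomposition without pivoting, assuming non-zero pivots, in which the row update is written in chunks of four to show which part of the algorithm a SIMD version would vectorise. The names `LANES` and `lu_decompose` are hypothetical.

```python
LANES = 4  # a 128-bit SSE register holds four 32-bit floats


def lu_decompose(a):
    """Doolittle LU decomposition without pivoting.

    Assumption: every pivot u[k][k] is non-zero.
    Returns (l, u) with l unit lower triangular and u upper triangular.
    """
    n = len(a)
    u = [row[:] for row in a]
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            factor = u[i][k] / u[k][k]
            l[i][k] = factor
            # Row update in chunks of LANES elements. Each chunk is the
            # work that a single SIMD multiply and subtract could cover.
            for start in range(k, n, LANES):
                stop = min(start + LANES, n)
                u[i][start:stop] = [
                    u[i][j] - factor * u[k][j] for j in range(start, stop)
                ]
    return l, u


l, u = lu_decompose([[4.0, 3.0], [6.0, 3.0]])
```

Only the innermost row update is data parallel in this form. The loop over `k` remains sequential because each elimination step depends on the previous one, which is one reason a measured speedup would be expected to fall short of four.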
Abstract:
There are many applications in aeronautics where strong couplings exist between disciplines. One practical example arises in Unmanned Aerial Vehicle (UAV) automation, where there is strong coupling between operational constraints, aerodynamics, vehicle dynamics, and mission and path planning. UAV path planning can be done either online or offline. The current state of online path-planning optimisation on board UAVs with high-performance computation is not at the same level as that of its ground-based offline counterpart; this is mainly due to the volume, power and weight limitations of the UAV. Some small UAVs do not have the computational power needed for some optimisation and path-planning tasks. In this paper, we describe an optimisation method that can be applied to Multi-disciplinary Design Optimisation problems and UAV path-planning problems. Hardware-based design optimisation techniques are used. The power and physical limitations of a UAV, which may not be a problem in PC-based solutions, can be addressed by using a Field Programmable Gate Array (FPGA) as an algorithm accelerator. The inevitable latency produced by the iterative process of an Evolutionary Algorithm (EA) is concealed by exploiting the parallelism within the dataflow paradigm of the EA on an FPGA architecture. Results compare software PC-based solutions and hardware-based solutions for benchmark mathematical problems as well as a simple real-world engineering problem. Results also indicate the practicality of the method, which can be used for more complex single- and multi-objective coupled problems in aeronautical applications.
Abstract:
Experimental and theoretical studies have shown the importance of stochastic processes in genetic regulatory networks and cellular processes. Cellular networks and genetic circuits often involve small numbers of key proteins such as transcriptional factors and signaling proteins. In recent years stochastic models have been used successfully for studying noise in biological pathways, and stochastic modelling of biological systems has become a very important research field in computational biology. One of the challenging problems in this field is the reduction of the huge computing time in stochastic simulations. Based on the system of the mitogen-activated protein kinase cascade that is activated by epidermal growth factor, this work gives a parallel implementation using OpenMP and parallelism across the simulation. Special attention is paid to the independence of the generated random numbers in parallel computing, which is a key criterion for the success of stochastic simulations. Numerical results indicate that parallel computers can be used as an efficient tool for simulating the dynamics of large-scale genetic regulatory networks and cellular processes.
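"Parallelism across the simulation" means running many independent realisations at once, each with its own random number stream. The sketch below illustrates that pattern only. It is not the authors' OpenMP code and does not model the MAPK cascade. It assumes a toy birth-death process simulated with the Gillespie algorithm, and it gives each run a private `random.Random` generator with its own seed. The names `birth_death` and `run_ensemble` and all rate values are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def birth_death(seed, birth=10.0, death=1.0, t_end=5.0):
    """One Gillespie realisation of a toy birth-death process.

    Assumption: molecules are created at constant rate `birth` and each
    molecule decays at rate `death`. Returns the count at time `t_end`.
    """
    rng = random.Random(seed)  # private generator, never shared
    t, n = 0.0, 0
    while True:
        total = birth + death * n       # total propensity
        t += rng.expovariate(total)     # waiting time to next reaction
        if t > t_end:
            return n
        if rng.random() * total < birth:
            n += 1                      # birth reaction fires
        else:
            n -= 1                      # death reaction fires


def run_ensemble(seeds, workers=4):
    """Run one realisation per seed, distributed over a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(birth_death, seeds))


counts = run_ensemble(range(8))
```

A thread pool keeps the sketch short but gives no real speedup for CPU-bound work in CPython, so `ProcessPoolExecutor` would be the likely substitute in practice. Distinct seeds make the runs reproducible and keep the generators separate, but they do not by themselves prove statistical independence of the streams, which is the concern the abstract raises.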
Abstract:
Many computationally intensive scientific applications involve repetitive floating point operations other than addition and multiplication which may present a significant performance bottleneck due to the relatively large latency or low throughput involved in executing such arithmetic primitives on commodity processors. A promising alternative is to execute such primitives on Field Programmable Gate Array (FPGA) hardware acting as an application-specific custom co-processor in a high performance reconfigurable computing platform. The use of FPGAs can provide advantages such as fine-grain parallelism but issues relating to code development in a hardware description language and efficient data transfer to and from the FPGA chip can present significant application development challenges. In this paper, we discuss our practical experiences in developing a selection of floating point hardware designs to be implemented using FPGAs. Our designs include some basic mathematical library functions which can be implemented for user-defined precisions suitable for novel applications requiring non-standard floating point representation. We discuss the details of our designs along with results from performance and accuracy analysis tests.
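To make "user-defined precision" concrete, the following sketch rounds a value to a chosen number of significand bits in software. It is an illustration only and says nothing about the authors' hardware designs. It assumes round-to-nearest with ties to even, it ignores limits on the exponent range, and the name `quantize` is hypothetical.

```python
import math


def quantize(x, significand_bits):
    """Round x to the nearest value with `significand_bits` bits of significand.

    Assumption: the exponent range is unlimited, so only the precision
    of the significand is modelled.
    """
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return x
    m, e = math.frexp(x)               # x == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << significand_bits
    # round() on a float rounds halfway cases to the nearest even integer
    return math.ldexp(round(m * scale) / scale, e)


coarse = quantize(0.1, 4)    # 0.1 kept to 4 significand bits
error = abs(coarse - 0.1)    # rounding error introduced by the reduction
```

Comparing a function evaluated on quantized inputs with a double-precision reference is one simple way to run the kind of accuracy analysis the abstract mentions.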
Abstract:
MapReduce frameworks such as Hadoop are well suited to handling large sets of data which can be processed separately and independently, with canonical applications in information retrieval and sales record analysis. Rapid advances in sequencing technology have ensured an explosion in the availability of genomic data, with a consequent rise in the importance of large-scale comparative genomics, often involving operations and data relationships which deviate from the classical MapReduce structure. This work examines the application of Hadoop to patterns of this nature, using as our focus a well-established workflow for identifying promoters - binding sites for regulatory proteins - across multiple gene regions and organisms, coupled with the unifying step of assembling these results into a consensus sequence. Our approach demonstrates the utility of Hadoop for problems of this nature, showing how the tyranny of the "dominant decomposition" can be at least partially overcome. It also demonstrates how load balance and the granularity of parallelism can be optimized by pre-processing that splits and reorganizes input files, allowing a wide range of related problems to be brought under the same computational umbrella.
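The unifying step of assembling results into a consensus sequence can be pictured as a small map, shuffle and reduce pipeline. The sketch below is plain Python, not Hadoop code and not the authors' workflow. It assumes the binding sites have already been found and aligned to equal length, and that the consensus is the most frequent base at each position. The names `map_site`, `shuffle`, `reduce_position`, and `consensus` are hypothetical.

```python
from collections import Counter, defaultdict


def map_site(site):
    """Map: emit one (position, base) pair per character of a site."""
    return [(pos, base) for pos, base in enumerate(site)]


def shuffle(pairs):
    """Shuffle: group the emitted values by key (here, the position)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_position(bases):
    """Reduce: choose the most frequent base seen at one position."""
    return Counter(bases).most_common(1)[0][0]


def consensus(sites):
    """Run map, shuffle and reduce in sequence over aligned sites."""
    pairs = [pair for site in sites for pair in map_site(site)]
    groups = shuffle(pairs)
    return "".join(reduce_position(groups[pos]) for pos in sorted(groups))


# Hypothetical pre-aligned binding sites from three gene regions.
result = consensus(["TATAAT", "TATGAT", "TACAAT"])
```

Each call to `map_site` depends on a single site and each call to `reduce_position` on a single position, so both phases could be distributed. The granularity of that distribution, one task per site or per batch of sites, is the kind of choice the abstract says can be tuned by pre-processing the input files.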