802 resultados para parallel programming paradigms
Resumo:
The problem of preparation of a program to perform it on multiprocessor system of a cluster type is considered. When developing programs for a cluster computer the technology based on use of the remote terminal is applied. The situation when such remote terminal is the computer with operational system Windows is considered. The set of the tool means, allowing carrying out of editing program texts, compiling and starting programs on a cluster computer, is suggested. Advantage of an offered way of preparation of programs to execution is that it allows as much as possible to use practical experience of programmers used to working in OS Windows environment.
Resumo:
We present an Integrated Environment suitable for learning and teaching computer programming which is designed for both students of specialised Computer Science courses, and also non-specialist students such as those following Liberal Arts. The environment is rich enough to allow exploration of concepts from robotics, artificial intelligence, social science, and philosophy as well as the specialist areas of operating systems and the various computer programming paradigms.
Resumo:
Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks, their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks, rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, only by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations. We use OpenMP, as the most popular model for shared-memory parallel programming as the main GPRM competitor for solving three well-known problems on both platforms: LU factorisation of Sparse Matrices, Image Convolution, and Linked List Processing. We focus on proposing solutions that best fit into the GPRM’s model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for the LU Factorisation results in notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM’s task creation and distribution for very short computations using the Image Convolution benchmark. We show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for Linked List processing and performs better than OpenMP implementations on the Xeon Phi. The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
Resumo:
Solving linear systems is an important problem for scientific computing. Exploiting parallelism is essential for solving complex systems, and this traditionally involves writing parallel algorithms on top of a library such as MPI. The SPIKE family of algorithms is one well-known example of a parallel solver for linear systems. The Hierarchically Tiled Array data type extends traditional data-parallel array operations with explicit tiling and allows programmers to directly manipulate tiles. The tiles of the HTA data type map naturally to the block nature of many numeric computations, including the SPIKE family of algorithms. The higher level of abstraction of the HTA enables the same program to be portable across different platforms. Current implementations target both shared-memory and distributed-memory models. In this thesis we present a proof-of-concept for portable linear solvers. We implement two algorithms from the SPIKE family using the HTA library. We show that our implementations of SPIKE exploit the abstractions provided by the HTA to produce a compact, clean code that can run on both shared-memory and distributed-memory models without modification. We discuss how we map the algorithms to HTA programs as well as examine their performance. We compare the performance of our HTA codes to comparable codes written in MPI as well as current state-of-the-art linear algebra routines.
Resumo:
In this paper we describe a distributed object oriented logic programming language in which an object is a collection of threads deductively accessing and updating a shared logic program. The key features of the language, such as static and dynamic object methods and multiple inheritance, are illustrated through a series of small examples. We show how we can implement object servers, allowing remote spawning of objects, which we can use as staging posts for mobile agents. We give as an example an information gathering mobile agent that can be queried about the information it has so far gathered whilst it is gathering new information. Finally we define a class of co-operative reasoning agents that can do resource bounded inference for full first order predicate logic, handling multiple queries and information updates concurrently. We believe that the combination of the concurrent OO and the LP programming paradigms produces a powerful tool for quickly implementing rational multi-agent applications on the internet.
Resumo:
A novel high throughput and scalable unified architecture for the computation of the transform operations in video codecs for advanced standards is presented in this paper. This structure can be used as a hardware accelerator in modern embedded systems to efficiently compute all the two-dimensional 4 x 4 and 2 x 2 transforms of the H.264/AVC standard. Moreover, its highly flexible design and hardware efficiency allows it to be easily scaled in terms of performance and hardware cost to meet the specific requirements of any given video coding application. Experimental results obtained using a Xilinx Virtex-5 FPGA demonstrated the superior performance and hardware efficiency levels provided by the proposed structure, which presents a throughput per unit of area relatively higher than other similar recently published designs targeting the H.264/AVC standard. Such results also showed that, when integrated in a multi-core embedded system, this architecture provides speedup factors of about 120x concerning pure software implementations of the transform algorithms, therefore allowing the computation, in real-time, of all the above mentioned transforms for Ultra High Definition Video (UHDV) sequences (4,320 x 7,680 @ 30 fps).
Resumo:
In the past years, Software Architecture has attracted increased attention by academia and industry as the unifying concept to structure the design of complex systems. One particular research area deals with the possibility of reconfiguring architectures to adapt the systems they describe to new requirements. Reconfiguration amounts to adding and removing components and connections, and may have to occur without stopping the execution of the system being reconfigured. This work contributes to the formal description of such a process. Taking as a premise that a single formalism hardly ever satisfies all requirements in every situation, we present three approaches, each one with its own assumptions about the systems it can be applied to and with different advantages and disadvantages. Each approach is based on work of other researchers and has the aesthetic concern of changing as little as possible the original formalism, keeping its spirit. The first approach shows how a given reconfiguration can be specified in the same manner as the system it is applied to and in a way to be efficiently executed. The second approach explores the Chemical Abstract Machine, a formalism for rewriting multisets of terms, to describe architectures, computations, and reconfigurations in a uniform way. The last approach uses a UNITY-like parallel programming design language to describe computations, represents architectures by diagrams in the sense of Category Theory, and specifies reconfigurations by graph transformation rules.
Resumo:
The definition and programming of distributed applications has become a major research issue due to the increasing availability of (large scale) distributed platforms and the requirements posed by the economical globalization. However, such a task requires a huge effort due to the complexity of the distributed environments: large amount of users may communicate and share information across different authority domains; moreover, the “execution environment” or “computations” are dynamic since the number of users and the computational infrastructure change in time. Grid environments, in particular, promise to be an answer to deal with such complexity, by providing high performance execution support to large amount of users, and resource sharing across different organizations. Nevertheless, programming in Grid environments is still a difficult task. There is a lack of high level programming paradigms and support tools that may guide the application developer and allow reusability of state-of-the-art solutions. Specifically, the main goal of the work presented in this thesis is to contribute to the simplification of the development cycle of applications for Grid environments by bringing structure and flexibility to three stages of that cycle through a commonmodel. The stages are: the design phase, the execution phase, and the reconfiguration phase. The common model is based on the manipulation of patterns through pattern operators, and the division of both patterns and operators into two categories, namely structural and behavioural. Moreover, both structural and behavioural patterns are first class entities at each of the aforesaid stages. At the design phase, patterns can be manipulated like other first class entities such as components. This allows a more structured way to build applications by reusing and composing state-of-the-art patterns. At the execution phase, patterns are units of execution control: it is possible, for example, to start or stop and to resume the execution of a pattern as a single entity. At the reconfiguration phase, patterns can also be manipulated as single entities with the additional advantage that it is possible to perform a structural reconfiguration while keeping some of the behavioural constraints, and vice-versa. For example, it is possible to replace a behavioural pattern, which was applied to some structural pattern, with another behavioural pattern. In this thesis, besides the proposal of the methodology for distributed application development, as sketched above, a definition of a relevant set of pattern operators was made. The methodology and the expressivity of the pattern operators were assessed through the development of several representative distributed applications. To support this validation, a prototype was designed and implemented, encompassing some relevant patterns and a significant part of the patterns operators defined. This prototype was based in the Triana environment; Triana supports the development and deployment of distributed applications in the Grid through a dataflow-based programming model. Additionally, this thesis also presents the analysis of a mapping of some operators for execution control onto the Distributed Resource Management Application API (DRMAA). This assessment confirmed the suitability of the proposed model, as well as the generality and flexibility of the defined pattern operators
Resumo:
Nos últimos anos começaram a ser vulgares os computadores dotados de multiprocessadores e multi-cores. De modo a aproveitar eficientemente as novas características desse hardware começaram a surgir ferramentas para facilitar o desenvolvimento de software paralelo, através de linguagens e frameworks, adaptadas a diferentes linguagens. Com a grande difusão de redes de alta velocidade, tal como Gigabit Ethernet e a última geração de redes Wi-Fi, abre-se a oportunidade de, além de paralelizar o processamento entre processadores e cores, poder em simultâneo paralelizá-lo entre máquinas diferentes. Ao modelo que permite paralelizar processamento localmente e em simultâneo distribuí-lo para máquinas que também têm capacidade de o paralelizar, chamou-se “modelo paralelo distribuído”. Nesta dissertação foram analisadas técnicas e ferramentas utilizadas para fazer programação paralela e o trabalho que está feito dentro da área de programação paralela e distribuída. Tendo estes dois factores em consideração foi proposta uma framework que tenta aplicar a simplicidade da programação paralela ao conceito paralelo distribuído. A proposta baseia-se na disponibilização de uma framework em Java com uma interface de programação simples, de fácil aprendizagem e legibilidade que, de forma transparente, é capaz de paralelizar e distribuir o processamento. Apesar de simples, existiu um esforço para a tornar configurável de forma a adaptar-se ao máximo de situações possível. Nesta dissertação serão exploradas especialmente as questões relativas à execução e distribuição de trabalho, e a forma como o código é enviado de forma automática pela rede, para outros nós cooperantes, evitando assim a instalação manual das aplicações em todos os nós da rede. Para confirmar a validade deste conceito e das ideias defendidas nesta dissertação foi implementada esta framework à qual se chamou DPF4j (Distributed Parallel Framework for JAVA) e foram feitos testes e retiradas métricas para verificar a existência de ganhos de performance em relação às soluções já existentes.
Resumo:
Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática
Resumo:
Presented at Work in Progress Session, IEEE Real-Time Systems Symposium (RTSS 2015). 1 to 3, Dec, 2015. San Antonio, U.S.A..
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Biomédica