888 resultados para Parallel processing (Electronic computers) - Research
Resumo:
Postprint
Resumo:
Postprint
Resumo:
A new parallel approach for solving a pentadiagonal linear system is presented. The parallel partition method for this system and the TW parallel partition method on a chain of P processors are introduced and discussed. The result of this algorithm is a reduced pentadiagonal linear system of order P \Gamma 2 compared with a system of order 2P \Gamma 2 for the parallel partition method. More importantly the new method involves only half the number of communications startups than the parallel partition method (and other standard parallel methods) and hence is a far more efficient parallel algorithm.
Resumo:
Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks, their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks, rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, only by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations. We use OpenMP, as the most popular model for shared-memory parallel programming as the main GPRM competitor for solving three well-known problems on both platforms: LU factorisation of Sparse Matrices, Image Convolution, and Linked List Processing. We focus on proposing solutions that best fit into the GPRM’s model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for the LU Factorisation results in notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM’s task creation and distribution for very short computations using the Image Convolution benchmark. We show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for Linked List processing and performs better than OpenMP implementations on the Xeon Phi. The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
Resumo:
Embedded software systems in vehicles are of rapidly increasing commercial importance for the automotive industry. Current systems employ a static run-time environment; due to the difficulty and cost involved in the development of dynamic systems in a high-integrity embedded control context. A dynamic system, referring to the system configuration, would greatly increase the flexibility of the offered functionality and enable customised software configuration for individual vehicles, adding customer value through plug-and-play capability, and increased quality due to its inherent ability to adjust to changes in hardware and software. We envisage an automotive system containing a variety of components, from a multitude of organizations, not necessarily known at development time. The system dynamically adapts its configuration to suit the run-time system constraints. This paper presents our vision for future automotive control systems that will be regarded in an EU research project, referred to as DySCAS (Dynamically Self-Configuring Automotive Systems). We propose a self-configuring vehicular control system architecture, with capabilities that include automatic discovery and inclusion of new devices, self-optimisation to best-use the processing, storage and communication resources available, self-diagnostics and ultimately self-healing. Such an architecture has benefits extending to reduced development and maintenance costs, improved passenger safety and comfort, and flexible owner customisation. Specifically, this paper addresses the following issues: The state of the art of embedded software systems in vehicles, emphasising the current limitations arising from fixed run-time configurations; and the benefits and challenges of dynamic configuration, giving rise to opportunities for self-healing, self-optimisation, and the automatic inclusion of users’ Consumer Electronic (CE) devices. Our proposal for a dynamically reconfigurable automotive software system platform is outlined and a typical use-case is presented as an example to exemplify the benefits of the envisioned dynamic capabilities.
Resumo:
This thesis proposes a generic visual perception architecture for robotic clothes perception and manipulation. This proposed architecture is fully integrated with a stereo vision system and a dual-arm robot and is able to perform a number of autonomous laundering tasks. Clothes perception and manipulation is a novel research topic in robotics and has experienced rapid development in recent years. Compared to the task of perceiving and manipulating rigid objects, clothes perception and manipulation poses a greater challenge. This can be attributed to two reasons: firstly, deformable clothing requires precise (high-acuity) visual perception and dexterous manipulation; secondly, as clothing approximates a non-rigid 2-manifold in 3-space, that can adopt a quasi-infinite configuration space, the potential variability in the appearance of clothing items makes them difficult to understand, identify uniquely, and interact with by machine. From an applications perspective, and as part of EU CloPeMa project, the integrated visual perception architecture refines a pre-existing clothing manipulation pipeline by completing pre-wash clothes (category) sorting (using single-shot or interactive perception for garment categorisation and manipulation) and post-wash dual-arm flattening. To the best of the author’s knowledge, as investigated in this thesis, the autonomous clothing perception and manipulation solutions presented here were first proposed and reported by the author. All of the reported robot demonstrations in this work follow a perception-manipulation method- ology where visual and tactile feedback (in the form of surface wrinkledness captured by the high accuracy depth sensor i.e. CloPeMa stereo head or the predictive confidence modelled by Gaussian Processing) serve as the halting criteria in the flattening and sorting tasks, respectively. From scientific perspective, the proposed visual perception architecture addresses the above challenges by parsing and grouping 3D clothing configurations hierarchically from low-level curvatures, through mid-level surface shape representations (providing topological descriptions and 3D texture representations), to high-level semantic structures and statistical descriptions. A range of visual features such as Shape Index, Surface Topologies Analysis and Local Binary Patterns have been adapted within this work to parse clothing surfaces and textures and several novel features have been devised, including B-Spline Patches with Locality-Constrained Linear coding, and Topology Spatial Distance to describe and quantify generic landmarks (wrinkles and folds). The essence of this proposed architecture comprises 3D generic surface parsing and interpretation, which is critical to underpinning a number of laundering tasks and has the potential to be extended to other rigid and non-rigid object perception and manipulation tasks. The experimental results presented in this thesis demonstrate that: firstly, the proposed grasp- ing approach achieves on-average 84.7% accuracy; secondly, the proposed flattening approach is able to flatten towels, t-shirts and pants (shorts) within 9 iterations on-average; thirdly, the proposed clothes recognition pipeline can recognise clothes categories from highly wrinkled configurations and advances the state-of-the-art by 36% in terms of classification accuracy, achieving an 83.2% true-positive classification rate when discriminating between five categories of clothes; finally the Gaussian Process based interactive perception approach exhibits a substantial improvement over single-shot perception. Accordingly, this thesis has advanced the state-of-the-art of robot clothes perception and manipulation.
Resumo:
In this work, an algorithm to compute the envelope of non-destructive testing (NDT) signals is proposed. This method allows increasing the speed and reducing the memory in extensive data processing. Also, this procedure presents advantage of preserving the data information for physical modeling applications of time-dependent measurements. The algorithm is conceived to be applied for analyze data from non-destructive testing. The comparison between different envelope methods and the proposed method, applied to Magnetic Bark Signal (MBN), is studied. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
Recently, the development of industrial processes brought on the outbreak of technologically complex systems. This development generated the necessity of research relative to the mathematical techniques that have the capacity to deal with project complexities and validation. Fuzzy models have been receiving particular attention in the area of nonlinear systems identification and analysis due to it is capacity to approximate nonlinear behavior and deal with uncertainty. A fuzzy rule-based model suitable for the approximation of many systems and functions is the Takagi-Sugeno (TS) fuzzy model. IS fuzzy models are nonlinear systems described by a set of if then rules which gives local linear representations of an underlying system. Such models can approximate a wide class of nonlinear systems. In this paper a performance analysis of a system based on IS fuzzy inference system for the calibration of electronic compass devices is considered. The contribution of the evaluated IS fuzzy inference system is to reduce the error obtained in data acquisition from a digital electronic compass. For the reliable operation of the TS fuzzy inference system, adequate error measurements must be taken. The error noise must be filtered before the application of the IS fuzzy inference system. The proposed method demonstrated an effectiveness of 57% at reducing the total error based on considered tests. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
Embedded real-time applications increasingly present high computation requirements, which need to be completed within specific deadlines, but that present highly variable patterns, depending on the set of data available in a determined instant. The current trend to provide parallel processing in the embedded domain allows providing higher processing power; however, it does not address the variability in the processing pattern. Dimensioning each device for its worst-case scenario implies lower average utilization, and increased available, but unusable, processing in the overall system. A solution for this problem is to extend the parallel execution of the applications, allowing networked nodes to distribute the workload, on peak situations, to neighbour nodes. In this context, this report proposes a framework to develop parallel and distributed real-time embedded applications, transparently using OpenMP and Message Passing Interface (MPI), within a programming model based on OpenMP. The technical report also devises an integrated timing model, which enables the structured reasoning on the timing behaviour of these hybrid architectures.
Resumo:
Nos últimos anos começaram a ser vulgares os computadores dotados de multiprocessadores e multi-cores. De modo a aproveitar eficientemente as novas características desse hardware começaram a surgir ferramentas para facilitar o desenvolvimento de software paralelo, através de linguagens e frameworks, adaptadas a diferentes linguagens. Com a grande difusão de redes de alta velocidade, tal como Gigabit Ethernet e a última geração de redes Wi-Fi, abre-se a oportunidade de, além de paralelizar o processamento entre processadores e cores, poder em simultâneo paralelizá-lo entre máquinas diferentes. Ao modelo que permite paralelizar processamento localmente e em simultâneo distribuí-lo para máquinas que também têm capacidade de o paralelizar, chamou-se “modelo paralelo distribuído”. Nesta dissertação foram analisadas técnicas e ferramentas utilizadas para fazer programação paralela e o trabalho que está feito dentro da área de programação paralela e distribuída. Tendo estes dois factores em consideração foi proposta uma framework que tenta aplicar a simplicidade da programação paralela ao conceito paralelo distribuído. A proposta baseia-se na disponibilização de uma framework em Java com uma interface de programação simples, de fácil aprendizagem e legibilidade que, de forma transparente, é capaz de paralelizar e distribuir o processamento. Apesar de simples, existiu um esforço para a tornar configurável de forma a adaptar-se ao máximo de situações possível. Nesta dissertação serão exploradas especialmente as questões relativas à execução e distribuição de trabalho, e a forma como o código é enviado de forma automática pela rede, para outros nós cooperantes, evitando assim a instalação manual das aplicações em todos os nós da rede. Para confirmar a validade deste conceito e das ideias defendidas nesta dissertação foi implementada esta framework à qual se chamou DPF4j (Distributed Parallel Framework for JAVA) e foram feitos testes e retiradas métricas para verificar a existência de ganhos de performance em relação às soluções já existentes.
Resumo:
This paper proposes and reports the development of an open source solution for the integrated management of Infrastructure as a Service (IaaS) cloud computing resources, through the use of a common API taxonomy, to incorporate open source and proprietary platforms. This research included two surveys on open source IaaS platforms (OpenNebula, OpenStack and CloudStack) and a proprietary platform (Parallels Automation for Cloud Infrastructure - PACI) as well as on IaaS abstraction solutions (jClouds, Libcloud and Deltacloud), followed by a thorough comparison to determine the best approach. The adopted implementation reuses the Apache Deltacloud open source abstraction framework, which relies on the development of software driver modules to interface with different IaaS platforms, and involved the development of a new Deltacloud driver for PACI. The resulting interoperable solution successfully incorporates OpenNebula, OpenStack (reuses pre-existing drivers) and PACI (includes the developed Deltacloud PACI driver) nodes and provides a Web dashboard and a Representational State Transfer (REST) interface library. The results of the exchanged data payload and time response tests performed are presented and discussed. The conclusions show that open source abstraction tools like Deltacloud allow the modular and integrated management of IaaS platforms (open source and proprietary), introduce relevant time and negligible data overheads and, as a result, can be adopted by Small and Medium-sized Enterprise (SME) cloud providers to circumvent the vendor lock-in problem whenever service response time is not critical.
Resumo:
Single processor architectures are unable to provide the required performance of high performance embedded systems. Parallel processing based on general-purpose processors can achieve these performances with a considerable increase of required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel generalpurpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.
Resumo:
Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática