962 resultados para ubiquitous multi-core framework
Resumo:
Réalisé en majeure partie sous la tutelle de feu le Professeur Paul Arminjon. Après sa disparition, le Docteur Aziz Madrane a pris la relève de la direction de mes travaux.
Resumo:
clRNG et clProbdist sont deux interfaces de programmation (APIs) que nous avons développées pour la génération de nombres aléatoires uniformes et non uniformes sur des dispositifs de calculs parallèles en utilisant l’environnement OpenCL. La première interface permet de créer au niveau d’un ordinateur central (hôte) des objets de type stream considérés comme des générateurs virtuels parallèles qui peuvent être utilisés aussi bien sur l’hôte que sur les dispositifs parallèles (unités de traitement graphique, CPU multinoyaux, etc.) pour la génération de séquences de nombres aléatoires. La seconde interface permet aussi de générer au niveau de ces unités des variables aléatoires selon différentes lois de probabilité continues et discrètes. Dans ce mémoire, nous allons rappeler des notions de base sur les générateurs de nombres aléatoires, décrire les systèmes hétérogènes ainsi que les techniques de génération parallèle de nombres aléatoires. Nous présenterons aussi les différents modèles composant l’architecture de l’environnement OpenCL et détaillerons les structures des APIs développées. Nous distinguons pour clRNG les fonctions qui permettent la création des streams, les fonctions qui génèrent les variables aléatoires uniformes ainsi que celles qui manipulent les états des streams. clProbDist contient les fonctions de génération de variables aléatoires non uniformes selon la technique d’inversion ainsi que les fonctions qui permettent de retourner différentes statistiques des lois de distribution implémentées. Nous évaluerons ces interfaces de programmation avec deux simulations qui implémentent un exemple simplifié d’un modèle d’inventaire et un exemple d’une option financière. Enfin, nous fournirons les résultats d’expérimentation sur les performances des générateurs implémentés.
Resumo:
The process of developing software that takes advantage of multiple processors is commonly referred to as parallel programming. For various reasons, this process is much harder than the sequential case. For decades, parallel programming has been a problem for a small niche only: engineers working on parallelizing mostly numerical applications in High Performance Computing. This has changed with the advent of multi-core processors in mainstream computer architectures. Parallel programming in our days becomes a problem for a much larger group of developers. The main objective of this thesis was to find ways to make parallel programming easier for them. Different aims were identified in order to reach the objective: research the state of the art of parallel programming today, improve the education of software developers about the topic, and provide programmers with powerful abstractions to make their work easier. To reach these aims, several key steps were taken. To start with, a survey was conducted among parallel programmers to find out about the state of the art. More than 250 people participated, yielding results about the parallel programming systems and languages in use, as well as about common problems with these systems. Furthermore, a study was conducted in university classes on parallel programming. It resulted in a list of frequently made mistakes that were analyzed and used to create a programmers' checklist to avoid them in the future. For programmers' education, an online resource was setup to collect experiences and knowledge in the field of parallel programming - called the Parawiki. Another key step in this direction was the creation of the Thinking Parallel weblog, where more than 50.000 readers to date have read essays on the topic. For the third aim (powerful abstractions), it was decided to concentrate on one parallel programming system: OpenMP. Its ease of use and high level of abstraction were the most important reasons for this decision. Two different research directions were pursued. The first one resulted in a parallel library called AthenaMP. It contains so-called generic components, derived from design patterns for parallel programming. These include functionality to enhance the locks provided by OpenMP, to perform operations on large amounts of data (data-parallel programming), and to enable the implementation of irregular algorithms using task pools. AthenaMP itself serves a triple role: the components are well-documented and can be used directly in programs, it enables developers to study the source code and learn from it, and it is possible for compiler writers to use it as a testing ground for their OpenMP compilers. The second research direction was targeted at changing the OpenMP specification to make the system more powerful. The main contributions here were a proposal to enable thread-cancellation and a proposal to avoid busy waiting. Both were implemented in a research compiler, shown to be useful in example applications, and proposed to the OpenMP Language Committee.
Resumo:
The design space of emerging heterogenous multi-core architectures with re-configurability element makes it feasible to design mixed fine-grained and coarse-grained parallel architectures. This paper presents a hierarchical composite array design which extends the curret design space of regular array design by combining a sequence of transformations. This technique is applied to derive a new design of a pipelined parallel regular array with different dataflow between phases of computation.
Resumo:
Can autonomic computing concepts be applied to traditional multi-core systems found in high performance computing environments? In this paper, we propose a novel synergy between parallel computing and swarm robotics to offer a new computing paradigm, `Swarm-Array Computing' that can harness and apply autonomic computing for parallel computing systems. One approach among three proposed approaches in swarm-array computing based on landscapes of intelligent cores, in which the cores of a parallel computing system are abstracted to swarm agents, is investigated. A task gets executed and transferred seamlessly between cores in the proposed approach thereby achieving self-ware properties that characterize autonomic computing. FPGAs are considered as an experimental platform taking into account its application in space robotics. The feasibility of the proposed approach is validated on the SeSAm multi-agent simulator.
Resumo:
Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.
Resumo:
We describe infinitely scalable pipeline machines with perfect parallelism, in the sense that every instruction of an inline program is executed, on successive data, on every clock tick. Programs with shared data effectively execute in less than a clock tick. We show that pipeline machines are faster than single or multi-core, von Neumann machines for sufficiently many program runs of a sufficiently time consuming program. Our pipeline machines exploit the totality of transreal arithmetic and the known waiting time of statically compiled programs to deliver the interesting property that they need no hardware or software exception handling.
Resumo:
Running hydrodynamic models interactively allows both visual exploration and change of model state during simulation. One of the main characteristics of an interactive model is that it should provide immediate feedback to the user, for example respond to changes in model state or view settings. For this reason, such features are usually only available for models with a relatively small number of computational cells, which are used mainly for demonstration and educational purposes. It would be useful if interactive modeling would also work for models typically used in consultancy projects involving large scale simulations. This results in a number of technical challenges related to the combination of the model itself and the visualisation tools (scalability, implementation of an appropriate API for control and access to the internal state). While model parallelisation is increasingly addressed by the environmental modeling community, little effort has been spent on developing a high-performance interactive environment. What can we learn from other high-end visualisation domains such as 3D animation, gaming, virtual globes (Autodesk 3ds Max, Second Life, Google Earth) that also focus on efficient interaction with 3D environments? In these domains high efficiency is usually achieved by the use of computer graphics algorithms such as surface simplification depending on current view, distance to objects, and efficient caching of the aggregated representation of object meshes. We investigate how these algorithms can be re-used in the context of interactive hydrodynamic modeling without significant changes to the model code and allowing model operation on both multi-core CPU personal computers and high-performance computer clusters.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Transactional memory (TM) is a new synchronization mechanism devised to simplify parallel programming, thereby helping programmers to unleash the power of current multicore processors. Although software implementations of TM (STM) have been extensively analyzed in terms of runtime performance, little attention has been paid to an equally important constraint faced by nearly all computer systems: energy consumption. In this work we conduct a comprehensive study of energy and runtime tradeoff sin software transactional memory systems. We characterize the behavior of three state-of-the-art lock-based STM algorithms, along with three different conflict resolution schemes. As a result of this characterization, we propose a DVFS-based technique that can be integrated into the resolution policies so as to improve the energy-delay product (EDP). Experimental results show that our DVFS-enhanced policies are indeed beneficial for applications with high contention levels. Improvements of up to 59% in EDP can be observed in this scenario, with an average EDP reduction of 16% across the STAMP workloads. © 2012 IEEE.
Resumo:
Breakthrough advances in microprocessor technology and efficient power management have altered the course of development of processors with the emergence of multi-core processor technology, in order to bring higher level of processing. The utilization of many-core technology has boosted computing power provided by cluster of workstations or SMPs, providing large computational power at an affordable cost using solely commodity components. Different implementations of message-passing libraries and system softwares (including Operating Systems) are installed in such cluster and multi-cluster computing systems. In order to guarantee correct execution of message-passing parallel applications in a computing environment other than that originally the parallel application was developed, review of the application code is needed. In this paper, a hybrid communication interfacing strategy is proposed, to execute a parallel application in a group of computing nodes belonging to different clusters or multi-clusters (computing systems may be running different operating systems and MPI implementations), interconnected with public or private IP addresses, and responding interchangeably to user execution requests. Experimental results demonstrate the feasibility of this proposed strategy and its effectiveness, through the execution of benchmarking parallel applications.
Resumo:
[EN] The accuracy and performance of current variational optical ow methods have considerably increased during the last years. The complexity of these techniques is high and enough care has to be taken for the implementation. The aim of this work is to present a comprehensible implementation of recent variational optical flow methods. We start with an energy model that relies on brightness and gradient constancy terms and a ow-based smoothness term. We minimize this energy model and derive an e cient implicit numerical scheme. In the experimental results, we evaluate the accuracy and performance of this implementation with the Middlebury benchmark database. We show that it is a competitive solution with respect to current methods in the literature. In order to increase the performance, we use a simple strategy to parallelize the execution on multi-core processors.
Resumo:
This Doctoral Dissertation is triggered by an emergent trend: firms are increasingly referring to investments in corporate venture capital (CVC) as means to create new competencies and foster the search for competitive advantage through the use of external resources. CVC is generally defined as the practice by non-financial firms of placing equity investments in entrepreneurial companies. Thus, CVC can be interpreted (i) as a key component of corporate entrepreneurship - acts of organizational creation, renewal, or innovation that occur within or outside an existing organization– and (ii) as a particular form of venture capital (VC) investment where the investor is not a traditional and financial institution, but an established corporation. My Dissertation, thus, simultaneously refers to two streams of research: corporate strategy and venture capital. In particular, I directed my attention to three topics of particular relevance for better understanding the role of CVC. In the first study, I moved from the consideration that competitive environments with rapid technological changes increasingly force established corporations to access knowledge from external sources. Firms, thus, extensively engage in external business development activities through different forms of collaboration with partners. While the underlying process common to these mechanisms is one of knowledge access, they are substantially different. The aim of the first study is to figure out how corporations choose among CVC, alliance, joint venture and acquisition. I addressed this issue adopting a multi-theoretical framework where the resource-based view and real options theory are integrated. While the first study mainly looked into the use of external resources for corporate growth, in the second work, I combined an internal and an external perspective to figure out the relationship between CVC investments (exploiting external resources) and a more traditional strategy to create competitive advantage, that is, corporate diversification (based on internal resources). Adopting an explorative lens, I investigated how these different modes to renew corporate current capabilities interact to each other. More precisely, is CVC complementary or substitute to corporate diversification? Finally, the third study focused on the more general field of VC to investigate (i) how VC firms evaluate the patent portfolios of their potential investee companies and (ii) whether the ability to evaluate technology and intellectual property varies depending on the type of investors, in particular for what concern the distinction between specialized versus generalist VCs and independent versus corporate VCs. This topic is motivated by two observations. First, it is not clear yet which determinants of patent value are primarily considered by VCs in their investment decisions. Second, VCs are not all alike in terms of technological experiences and these differences need to be taken into account.
Resumo:
The term "Brain Imaging" identi�es a set of techniques to analyze the structure and/or functional behavior of the brain in normal and/or pathological situations. These techniques are largely used in the study of brain activity. In addition to clinical usage, analysis of brain activity is gaining popularity in others recent �fields, i.e. Brain Computer Interfaces (BCI) and the study of cognitive processes. In this context, usage of classical solutions (e.g. f MRI, PET-CT) could be unfeasible, due to their low temporal resolution, high cost and limited portability. For these reasons alternative low cost techniques are object of research, typically based on simple recording hardware and on intensive data elaboration process. Typical examples are ElectroEncephaloGraphy (EEG) and Electrical Impedance Tomography (EIT), where electric potential at the patient's scalp is recorded by high impedance electrodes. In EEG potentials are directly generated from neuronal activity, while in EIT by the injection of small currents at the scalp. To retrieve meaningful insights on brain activity from measurements, EIT and EEG relies on detailed knowledge of the underlying electrical properties of the body. This is obtained from numerical models of the electric �field distribution therein. The inhomogeneous and anisotropic electric properties of human tissues make accurate modeling and simulation very challenging, leading to a tradeo�ff between physical accuracy and technical feasibility, which currently severely limits the capabilities of these techniques. Moreover elaboration of data recorded requires usage of regularization techniques computationally intensive, which influences the application with heavy temporal constraints (such as BCI). This work focuses on the parallel implementation of a work-flow for EEG and EIT data processing. The resulting software is accelerated using multi-core GPUs, in order to provide solution in reasonable times and address requirements of real-time BCI systems, without over-simplifying the complexity and accuracy of the head models.
Resumo:
This work presents exact algorithms for the Resource Allocation and Cyclic Scheduling Problems (RA&CSPs). Cyclic Scheduling Problems arise in a number of application areas, such as in hoist scheduling, mass production, compiler design (implementing scheduling loops on parallel architectures), software pipelining, and in embedded system design. The RA&CS problem concerns time and resource assignment to a set of activities, to be indefinitely repeated, subject to precedence and resource capacity constraints. In this work we present two constraint programming frameworks facing two different types of cyclic problems. In first instance, we consider the disjunctive RA&CSP, where the allocation problem considers unary resources. Instances are described through the Synchronous Data-flow (SDF) Model of Computation. The key problem of finding a maximum-throughput allocation and scheduling of Synchronous Data-Flow graphs onto a multi-core architecture is NP-hard and has been traditionally solved by means of heuristic (incomplete) algorithms. We propose an exact (complete) algorithm for the computation of a maximum-throughput mapping of applications specified as SDFG onto multi-core architectures. Results show that the approach can handle realistic instances in terms of size and complexity. Next, we tackle the Cyclic Resource-Constrained Scheduling Problem (i.e. CRCSP). We propose a Constraint Programming approach based on modular arithmetic: in particular, we introduce a modular precedence constraint and a global cumulative constraint along with their filtering algorithms. Many traditional approaches to cyclic scheduling operate by fixing the period value and then solving a linear problem in a generate-and-test fashion. Conversely, our technique is based on a non-linear model and tackles the problem as a whole: the period value is inferred from the scheduling decisions. The proposed approaches have been tested on a number of non-trivial synthetic instances and on a set of realistic industrial instances achieving good results on practical size problem.