808 resultados para scalable parallel programming
Resumo:
The fast increase in the size and number of databases demands data mining approaches that are scalable to large amounts of data. This has led to the exploration of parallel computing technologies in order to perform data mining tasks concurrently using several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these data mining technologies is the classification of newly recorded data. This paper surveys advances in parallelization in the field of classification rule induction.
Resumo:
Advances in hardware technologies allow to capture and process data in real-time and the resulting high throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real-time. The creation and real-time adaption of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even so these algorithms are fast, they are challenged by high velocity data streams, where data instances are incoming at a fast rate. This is problematic if the applications desire that there is no or only a very little delay between changes in the patterns of the stream and absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
Resumo:
This work presents a scalable and efficient parallel implementation of the Standard Simplex algorithm in the multicore architecture to solve large scale linear programming problems. We present a general scheme explaining how each step of the standard Simplex algorithm was parallelized, indicating some important points of the parallel implementation. Performance analysis were conducted by comparing the sequential time using the Simplex tableau and the Simplex of the CPLEXR IBM. The experiments were executed on a shared memory machine with 24 cores. The scalability analysis was performed with problems of different dimensions, finding evidence that our parallel standard Simplex algorithm has a better parallel efficiency for problems with more variables than constraints. In comparison with CPLEXR , the proposed parallel algorithm achieved a efficiency of up to 16 times better
Resumo:
This paper presents a new parallel methodology for calculating the determinant of matrices of the order n, with computational complexity O(n), using the Gauss-Jordan Elimination Method and Chio's Rule as references. We intend to present our step-by-step methodology using clear mathematical language, where we will demonstrate how to calculate the determinant of a matrix of the order n in an analytical format. We will also present a computational model with one sequential algorithm and one parallel algorithm using a pseudo-code.
Resumo:
[EN]A new parallel algorithm for simultaneous untangling and smoothing of tetrahedral meshes is proposed in this paper. We provide a detailed analysis of its performance on shared-memory many-core computer architectures. This performance analysis includes the evaluation of execution time, parallel scalability, load balancing, and parallelism bottlenecks. Additionally, we compare the impact of three previously published graph coloring procedures on the performance of our parallel algorithm. We use six benchmark meshes with a wide range of sizes. Using these experimental data sets, we describe the behavior of the parallel algorithm for different data sizes. We demonstrate that this algorithm is highly scalable when it runs on two different high-performance many-core computers with up to 128 processors...
Resumo:
Hybrid technologies, thanks to the convergence of integrated microelectronic devices and new class of microfluidic structures could open new perspectives to the way how nanoscale events are discovered, monitored and controlled. The key point of this thesis is to evaluate the impact of such an approach into applications of ion-channel High Throughput Screening (HTS)platforms. This approach offers promising opportunities for the development of new classes of sensitive, reliable and cheap sensors. There are numerous advantages of embedding microelectronic readout structures strictly coupled to sensing elements. On the one hand the signal-to-noise-ratio is increased as a result of scaling. On the other, the readout miniaturization allows organization of sensors into arrays, increasing the capability of the platform in terms of number of acquired data, as required in the HTS approach, to improve sensing accuracy and reliabiity. However, accurate interface design is required to establish efficient communication between ionic-based and electronic-based signals. The work made in this thesis will show a first example of a complete parallel readout system with single ion channel resolution, using a compact and scalable hybrid architecture suitable to be interfaced to large array of sensors, ensuring simultaneous signal recording and smart control of the signal-to-noise ratio and bandwidth trade off. More specifically, an array of microfluidic polymer structures, hosting artificial lipid bilayers blocks where single ion channel pores are embededed, is coupled with an array of ultra-low noise current amplifiers for signal amplification and data processing. As demonstrating working example, the platform was used to acquire ultra small currents derived by single non-covalent molecular binding between alpha-hemolysin pores and beta-cyclodextrin molecules in artificial lipid membranes.
Resumo:
Mainstream hardware is becoming parallel, heterogeneous, and distributed on every desk, every home and in every pocket. As a consequence, in the last years software is having an epochal turn toward concurrency, distribution, interaction which is pushed by the evolution of hardware architectures and the growing of network availability. This calls for introducing further abstraction layers on top of those provided by classical mainstream programming paradigms, to tackle more effectively the new complexities that developers have to face in everyday programming. A convergence it is recognizable in the mainstream toward the adoption of the actor paradigm as a mean to unite object-oriented programming and concurrency. Nevertheless, we argue that the actor paradigm can only be considered a good starting point to provide a more comprehensive response to such a fundamental and radical change in software development. Accordingly, the main objective of this thesis is to propose Agent-Oriented Programming (AOP) as a high-level general purpose programming paradigm, natural evolution of actors and objects, introducing a further level of human-inspired concepts for programming software systems, meant to simplify the design and programming of concurrent, distributed, reactive/interactive programs. To this end, in the dissertation first we construct the required background by studying the state-of-the-art of both actor-oriented and agent-oriented programming, and then we focus on the engineering of integrated programming technologies for developing agent-based systems in their classical application domains: artificial intelligence and distributed artificial intelligence. Then, we shift the perspective moving from the development of intelligent software systems, toward general purpose software development. Using the expertise maturated during the phase of background construction, we introduce a general-purpose programming language named simpAL, which founds its roots on general principles and practices of software development, and at the same time provides an agent-oriented level of abstraction for the engineering of general purpose software systems.
Resumo:
Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.
Resumo:
An important problem in computational biology is finding the longest common subsequence (LCS) of two nucleotide sequences. This paper examines the correctness and performance of a recently proposed parallel LCS algorithm that uses successor tables and pruning rules to construct a list of sets from which an LCS can be easily reconstructed. Counterexamples are given for two pruning rules that were given with the original algorithm. Because of these errors, performance measurements originally reported cannot be validated. The work presented here shows that speedup can be reliably achieved by an implementation in Unified Parallel C that runs on an Infiniband cluster. This performance is partly facilitated by exploiting the software cache of the MuPC runtime system. In addition, this implementation achieved speedup without bulk memory copy operations and the associated programming complexity of message passing.
DESIGN AND IMPLEMENT DYNAMIC PROGRAMMING BASED DISCRETE POWER LEVEL SMART HOME SCHEDULING USING FPGA
Resumo:
With the development and capabilities of the Smart Home system, people today are entering an era in which household appliances are no longer just controlled by people, but also operated by a Smart System. This results in a more efficient, convenient, comfortable, and environmentally friendly living environment. A critical part of the Smart Home system is Home Automation, which means that there is a Micro-Controller Unit (MCU) to control all the household appliances and schedule their operating times. This reduces electricity bills by shifting amounts of power consumption from the on-peak hour consumption to the off-peak hour consumption, in terms of different “hour price”. In this paper, we propose an algorithm for scheduling multi-user power consumption and implement it on an FPGA board, using it as the MCU. This algorithm for discrete power level tasks scheduling is based on dynamic programming, which could find a scheduling solution close to the optimal one. We chose FPGA as our system’s controller because FPGA has low complexity, parallel processing capability, a large amount of I/O interface for further development and is programmable on both software and hardware. In conclusion, it costs little time running on FPGA board and the solution obtained is good enough for the consumers.
Resumo:
Since the early days of logic programming, researchers in the field realized the potential for exploitation of parallelism present in the execution of logic programs. Their high-level nature, the presence of nondeterminism, and their referential transparency, among other characteristics, make logic programs interesting candidates for obtaining speedups through parallel execution. At the same time, the fact that the typical applications of logic programming frequently involve irregular computations, make heavy use of dynamic data structures with logical variables, and involve search and speculation, makes the techniques used in the corresponding parallelizing compilers and run-time systems potentially interesting even outside the field. The objective of this article is to provide a comprehensive survey of the issues arising in parallel execution of logic programming languages along with the most relevant approaches explored to date in the field. Focus is mostly given to the challenges emerging from the parallel execution of Prolog programs. The article describes the major techniques used for shared memory implementation of Or-parallelism, And-parallelism, and combinations of the two. We also explore some related issues, such as memory management, compile-time analysis, and execution visualization.
Resumo:
A highly parallel and scalable Deblocking Filter (DF) hardware architecture for H.264/AVC and SVC video codecs is presented in this paper. The proposed architecture mainly consists on a coarse grain systolic array obtained by replicating a unique and homogeneous Functional Unit (FU), in which a whole Deblocking-Filter unit is implemented. The proposal is also based on a novel macroblock-level parallelization strategy of the filtering algorithm which improves the final performance by exploiting specific data dependences. This way communication overhead is reduced and a more intensive parallelism in comparison with the existing state-of-the-art solutions is obtained. Furthermore, the architecture is completely flexible, since the level of parallelism can be changed, according to the application requirements. The design has been implemented in a Virtex-5 FPGA, and it allows filtering 4CIF (704 × 576 pixels @30 fps) video sequences in real-time at frequencies lower than 10.16 Mhz.
Resumo:
Irregular computations pose sorne of the most interesting and challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures, which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. Starting in the mid 80s there has been significant progress in the development of parallelizing compilers for logic programming (and more recently, constraint programming) resulting in quite capable parallelizers. The typical applications of these paradigms frequently involve irregular computations, and make heavy use of dynamic data structures with pointers, since logical variables represent in practice a well-behaved form of pointers. This arguably makes the techniques used in these compilers potentially interesting. In this paper, we introduce in a tutoríal way, sorne of the problems faced by parallelizing compilers for logic and constraint programs and provide pointers to sorne of the significant progress made in the area. In particular, this work has resulted in a series of achievements in the areas of inter-procedural pointer aliasing analysis for independence detection, cost models and cost analysis, cactus-stack memory management, techniques for managing speculative and irregular computations through task granularity control and dynamic task allocation such as work-stealing schedulers), etc.
Resumo:
Systems relying on fixed hardware components with a static level of parallelism can suffer from an underuse of logical resources, since they have to be designed for the worst-case scenario. This problem is especially important in video applications due to the emergence of new flexible standards, like Scalable Video Coding (SVC), which offer several levels of scalability. In this paper, Dynamic and Partial Reconfiguration (DPR) of modern FPGAs is used to achieve run-time variable parallelism, by using scalable architectures where the size can be adapted at run-time. Based on this proposal, a scalable Deblocking Filter core (DF), compliant with the H.264/AVC and SVC standards has been designed. This scalable DF allows run-time addition or removal of computational units working in parallel. Scalability is offered together with a scalable parallelization strategy at the macroblock (MB) level, such that when the size of the architecture changes, MB filtering order is modified accordingly
Resumo:
In recent years a lot of research has been invested in parallel processing of numerical applications. However, parallel processing of Symbolic and AI applications has received less attention. This paper presents a system for parallel symbolic computitig, narned ACE, based on the logic programming paradigm. ACE is a computational model for the full Prolog language, capable of exploiting Or-parall< lism and Independent And-parallelism. In this paper vve focus on the implementation of the and-parallel part of the ACE system (ralled &ACE) on a shared memory multiprocessor, d< scribing its organization, some optimizations, and presenting some performance figures, proving the abilhy of &ACE to efficiently exploit parallelism.