17 resultados para Parallelism
em CentAUR: Central Archive University of Reading - UK
Resumo:
Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing High Performance Computing (HPC) applications on clusters and Massively Parallel Processors (MPPs). The recent emergence of multi-core processor systems presents a new challenge for established parallel programming paradigms, including those based on MPI. This paper presents a new Java messaging system called MPJ Express. Using this system, we exploit multiple levels of parallelism - messaging and threading - to improve application performance on multi-core processors. We refer to our approach as nested parallelism. This MPI-like Java library can support nested parallelism by using Java or Java OpenMP (JOMP) threads within an MPJ Express process. Practicality of this approach is assessed by porting to Java a massively parallel structure formation code from Cosmology called Gadget-2. We introduce nested parallelism in the Java version of the simulation code and report good speed-ups. To the best of our knowledge it is the first time this kind of hybrid parallelism is demonstrated in a high performance Java application. (C) 2009 Elsevier Inc. All rights reserved.
Resumo:
A parallel structure is suggested for feedback control systems. Such a technique can be applied to either single or multi-sensor environments and is ideally suited for parallel processor implementation. The control input actually applied is based on a weighted summation of the different parallel controller values, the weightings being either fixed values or chosen by an adaptive decision-making mechanism. The effect of different controller combinations is a field now open to study.
Resumo:
Mainframes, corporate and central servers are becoming information servers. The requirement for more powerful information servers is the best opportunity to exploit the potential of parallelism. ICL recognized the opportunity of the 'knowledge spectrum' namely to convert raw data into information and then into high grade knowledge. Parallel Processing and Data Management Its response to this and to the underlying search problems was to introduce the CAFS retrieval engine. The CAFS product demonstrates that it is possible to move functionality within an established architecture, introduce a different technology mix and exploit parallelism to achieve radically new levels of performance. CAFS also demonstrates the benefit of achieving this transparently behind existing interfaces. ICL is now working with Bull and Siemens to develop the information servers of the future by exploiting new technologies as available. The objective of the joint Esprit II European Declarative System project is to develop a smoothly scalable, highly parallel computer system, EDS. EDS will in the main be an SQL server and an information server. It will support the many data-intensive applications which the companies foresee; it will also support application-intensive and logic-intensive systems.
Resumo:
These notes have been issued on a small scale in 1983 and 1987 and on request at other times. This issue follows two items of news. First, WaIter Colquitt and Luther Welsh found the 'missed' Mersenne prime M110503 and advanced the frontier of complete Mp-testing to 139,267. In so doing, they terminated Slowinski's significant string of four consecutive Mersenne primes. Secondly, a team of five established a non-Mersenne number as the largest known prime. This result terminated the 1952-89 reign of Mersenne primes. All the original Mersenne numbers with p < 258 were factorised some time ago. The Sandia Laboratories team of Davis, Holdridge & Simmons with some little assistance from a CRAY machine cracked M211 in 1983 and M251 in 1984. They contributed their results to the 'Cunningham Project', care of Sam Wagstaff. That project is now moving apace thanks to developments in technology, factorisation and primality testing. New levels of computer power and new computer architectures motivated by the open-ended promise of parallelism are now available. Once again, the suppliers may be offering free buildings with the computer. However, the Sandia '84 CRAY-l implementation of the quadratic-sieve method is now outpowered by the number-field sieve technique. This is deployed on either purpose-built hardware or large syndicates, even distributed world-wide, of collaborating standard processors. New factorisation techniques of both special and general applicability have been defined and deployed. The elliptic-curve method finds large factors with helpful properties while the number-field sieve approach is breaking down composites with over one hundred digits. The material is updated on an occasional basis to follow the latest developments in primality-testing large Mp and factorising smaller Mp; all dates derive from the published literature or referenced private communications. Minor corrections, additions and changes merely advance the issue number after the decimal point. The reader is invited to report any errors and omissions that have escaped the proof-reading, to answer the unresolved questions noted and to suggest additional material associated with this subject.
Resumo:
The authors present a systolic design for a simple GA mechanism which provides high throughput and unidirectional pipelining by exploiting the inherent parallelism in the genetic operators. The design computes in O(N+G) time steps using O(N2) cells where N is the population size and G is the chromosome length. The area of the device is independent of the chromosome length and so can be easily scaled by replicating the arrays or by employing fine-grain migration. The array is generic in the sense that it does not rely on the fitness function and can be used as an accelerator for any GA application using uniform crossover between pairs of chromosomes. The design can also be used in hybrid systems as an add-on to complement existing designs and methods for fitness function acceleration and island-style population management
Resumo:
We have designed a highly parallel design for a simple genetic algorithm using a pipeline of systolic arrays. The systolic design provides high throughput and unidirectional pipelining by exploiting the implicit parallelism in the genetic operators. The design is significant because, unlike other hardware genetic algorithms, it is independent of both the fitness function and the particular chromosome length used in a problem. We have designed and simulated a version of the mutation array using Xilinix FPGA tools to investigate the feasibility of hardware implementation. A simple 5-chromosome mutation array occupies 195 CLBs and is capable of performing more than one million mutations per second. I. Introduction Genetic algorithms (GAs) are established search and optimization techniques which have been applied to a range of engineering and applied problems with considerable success [1]. They operate by maintaining a population of trial solutions encoded, using a suitable encoding scheme.
Resumo:
We propose a bridge between two important parallel programming paradigms: data parallelism and communicating sequential processes (CSP). Data parallel pipelined architectures obtained with the Alpha language can be embedded in a control intensive application expressed in CSP-based Handel formalism. The interface is formally defined from the semantics of the languages Alpha and Handel. This work will ease the design of compute intensive applications on FPGAs.
Resumo:
We describe a FORTRAN-90 program to compute low-energy electron diffraction I(V) curves. Plane-waves and layer doubling are used to compute the inter-layer multiple-scattering, while the intra-layer multiple-scattering is computed in the standard way expanding the wavefield on a basis of spherical waves. The program is kept as general as possible, in order to allow testing different parts of multiple-scattering calculations. In particular, it can handle non-diagonal t-matrices describing the scattering of non-spherical potentials, anisotropic vibrations, anharmonicity, etc. The program does not use old FORTRAN flavours, and has been written keeping in mind the advantage for parallelism brought forward by FORTRAN-90.
Resumo:
Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.
Resumo:
Generally classifiers tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism suffers like any ensemble learner from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest sized datasets may impose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.
Resumo:
The time to process each of W/B processing blocks of a median calculation method on a set of N W-bit integers is improved here by a factor of three compared to the literature. Parallelism uncovered in blocks containing B-bit slices are exploited by independent accumulative parallel counters so that the median is calculated faster than any known previous method for any N, W values. The improvements to the method are discussed in the context of calculating the median for a moving set of N integers for which a pipelined architecture is developed. An extra benefit of smaller area for the architecture is also reported.
Resumo:
The Mobile Network Optimization (MNO) technologies have advanced at a tremendous pace in recent years. And the Dynamic Network Optimization (DNO) concept emerged years ago, aimed to continuously optimize the network in response to variations in network traffic and conditions. Yet, DNO development is still at its infancy, mainly hindered by a significant bottleneck of the lengthy optimization runtime. This paper identifies parallelism in greedy MNO algorithms and presents an advanced distributed parallel solution. The solution is designed, implemented and applied to real-life projects whose results yield a significant, highly scalable and nearly linear speedup up to 6.9 and 14.5 on distributed 8-core and 16-core systems respectively. Meanwhile, optimization outputs exhibit self-consistency and high precision compared to their sequential counterpart. This is a milestone in realizing the DNO. Further, the techniques may be applied to similar greedy optimization algorithm based applications.