31 resultados para Parallelisation
Resumo:
In a world where data is captured on a large scale the major challenge for data mining algorithms is to be able to scale up to large datasets. There are two main approaches to inducing classification rules, one is the divide and conquer approach, also known as the top down induction of decision trees; the other approach is called the separate and conquer approach. A considerable amount of work has been done on scaling up the divide and conquer approach. However, very little work has been conducted on scaling up the separate and conquer approach.In this work we describe a parallel framework that allows the parallelisation of a certain family of separate and conquer algorithms, the Prism family. Parallelisation helps the Prism family of algorithms to harvest additional computer resources in a network of computers in order to make the induction of classification rules scale better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.
Resumo:
Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist to scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation of classification rule induction algorithms most of the work has been concentrated on the Top Down Induction of Decision Trees (TDIDT), also known as the ‘divide and conquer’ approach. However powerful alternative algorithms exist that induce modular rules. Most of these alternative algorithms follow the ‘separate and conquer’ approach of inducing rules, but very little work has been done to make the ‘separate and conquer’ approach scale better on large training data. This paper examines the potential of the recently developed blackboard based J-PMCRI methodology for parallelising modular classification rule induction algorithms that follow the ‘separate and conquer’ approach. A concrete implementation of the methodology is evaluated empirically on very large datasets.
Resumo:
In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale and thus databases are increasing at a fast pace. This leads to the utilisation of parallel computing technologies in order to cope with large amounts of data. In the area of classification rule induction, parallelisation of classification rules has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer which has only recently been in the focus of parallelisation. This work introduces and evaluates empirically a framework for the parallel induction of classification rules, generated by members of the Prism family of algorithms. All members of the Prism family of algorithms follow the separate and conquer approach.
Resumo:
Generally classifiers tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism suffers like any ensemble learner from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest sized datasets may impose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.
Resumo:
Advances in hardware technologies allow to capture and process data in real-time and the resulting high throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real-time. The creation and real-time adaption of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even so these algorithms are fast, they are challenged by high velocity data streams, where data instances are incoming at a fast rate. This is problematic if the applications desire that there is no or only a very little delay between changes in the patterns of the stream and absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
Resumo:
Running hydrodynamic models interactively allows both visual exploration and change of model state during simulation. One of the main characteristics of an interactive model is that it should provide immediate feedback to the user, for example respond to changes in model state or view settings. For this reason, such features are usually only available for models with a relatively small number of computational cells, which are used mainly for demonstration and educational purposes. It would be useful if interactive modeling would also work for models typically used in consultancy projects involving large scale simulations. This results in a number of technical challenges related to the combination of the model itself and the visualisation tools (scalability, implementation of an appropriate API for control and access to the internal state). While model parallelisation is increasingly addressed by the environmental modeling community, little effort has been spent on developing a high-performance interactive environment. What can we learn from other high-end visualisation domains such as 3D animation, gaming, virtual globes (Autodesk 3ds Max, Second Life, Google Earth) that also focus on efficient interaction with 3D environments? In these domains high efficiency is usually achieved by the use of computer graphics algorithms such as surface simplification depending on current view, distance to objects, and efficient caching of the aggregated representation of object meshes. We investigate how these algorithms can be re-used in the context of interactive hydrodynamic modeling without significant changes to the model code and allowing model operation on both multi-core CPU personal computers and high-performance computer clusters.
Resumo:
Die vorliegende Dissertation beinhaltet Anwendungen der Quantenchemie und methodische Entwicklungen im Bereich der "Coupled-Cluster"-Theorie zu den folgenden Themen: 1.) Die Bestimmung von Geometrieparametern in wasserstoffverbrückten Komplexen mit Pikometer-Genauigkeit durch Kopplung von NMR-Experimenten und quantenchemischen Rechnungen wird an zwei Beispielen dargelegt. 2.) Die hierin auftretenden Unterschiede in Theorie und Experiment werden diskutiert. Hierzu wurde die Schwingungsmittelung des Dipolkopplungstensors implementiert, um Nullpunkt-Effekte betrachten zu können. 3.) Ein weiterer Aspekt der Arbeit behandelt die Strukturaufklärung an diskotischen Flüssigkristallen. Die quantenchemische Modellbildung und das Zusammenspiel mit experimentellen Methoden, vor allem der Festkörper-NMR, wird vorgestellt. 4.) Innerhalb dieser Arbeit wurde mit der Parallelisierung des Quantenchemiepaketes ACESII begonnen. Die grundlegende Strategie und erste Ergebnisse werden vorgestellt. 5.) Zur Skalenreduktion des CCCSD(T)-Verfahrens durch Faktorisierung wurden verschiedene Zerlegungen des Energienenners getestet. Ein sich hieraus ergebendes Verfahren zur Berechnung der CCSD(T)-Energie wurde implementiert. 6.) Die Reaktionsaufklärung der Bildung von HSOH aus di-tert-Butyl-Sulfoxid wird vorgestellt. Dazu wurde die Thermodynamik der Reaktionsschritte mit Methoden der Quantenchemie berechnet.
Resumo:
The XSophe computer simulation software suite consisting of a daemon, the XSophe interface and the computational program Sophe is a state of the art package for the simulation of electron paramagnetic resonance spectra. The Sophe program performs the computer simulation and includes a number of new technologies including; the SOPHE partition and interpolation schemes, a field segmentation algorithm, homotopy, parallelisation and spectral optimisation. The SOPHE partition and interpolation scheme along with a field segmentation algorithm greatly increases the speed of simulations for most systems. Multidimensional homotopy provides an efficient method for accurately tracing energy levels and hence tracing transitions in the presence of energy level anticrossings and looping transitions and allowing computer simulations in frequency space. Recent enhancements to Sophe include the generalised treatment of distributions of orientational parameters, termed the mosaic misorientation linewidth model and a faster more efficient algorithm for the calculation of resonant field positions and transition probabilities. For complex systems the parallelisation enables the simulation of these systems on a parallel computer and the optimisation algorithms in the suite provide the experimentalist with the possibility of finding the spin Hamiltonian parameters in a systematic manner rather than a trial-and-error process. The XSophe software suite has been used to simulate multifrequency EPR spectra (200 MHz to 6 00 GHz) from isolated spin systems (S > ~½) and coupled centres (Si, Sj _> I/2). Griffin, M.; Muys, A.; Noble, C.; Wang, D.; Eldershaw, C.; Gates, K.E.; Burrage, K.; Hanson, G.R."XSophe, a Computer Simulation Software Suite for the Analysis of Electron Paramagnetic Resonance Spectra", 1999, Mol. Phys. Rep., 26, 60-84.
The effective use of implicit parallelism through the use of an object-oriented programming language
Resumo:
This thesis explores translating well-written sequential programs in a subset of the Eiffel programming language - without syntactic or semantic extensions - into parallelised programs for execution on a distributed architecture. The main focus is on constructing two object-oriented models: a theoretical self-contained model of concurrency which enables a simplified second model for implementing the compiling process. There is a further presentation of principles that, if followed, maximise the potential levels of parallelism. Model of Concurrency. The concurrency model is designed to be a straightforward target for mapping sequential programs onto, thus making them parallel. It aids the compilation process by providing a high level of abstraction, including a useful model of parallel behaviour which enables easy incorporation of message interchange, locking, and synchronization of objects. Further, the model is sufficient such that a compiler can and has been practically built. Model of Compilation. The compilation-model's structure is based upon an object-oriented view of grammar descriptions and capitalises on both a recursive-descent style of processing and abstract syntax trees to perform the parsing. A composite-object view with an attribute grammar style of processing is used to extract sufficient semantic information for the parallelisation (i.e. code-generation) phase. Programming Principles. The set of principles presented are based upon information hiding, sharing and containment of objects and the dividing up of methods on the basis of a command/query division. When followed, the level of potential parallelism within the presented concurrency model is maximised. Further, these principles naturally arise from good programming practice. Summary. In summary this thesis shows that it is possible to compile well-written programs, written in a subset of Eiffel, into parallel programs without any syntactic additions or semantic alterations to Eiffel: i.e. no parallel primitives are added, and the parallel program is modelled to execute with equivalent semantics to the sequential version. If the programming principles are followed, a parallelised program achieves the maximum level of potential parallelisation within the concurrency model.
Resumo:
The focus of this study is development of parallelised version of severely sequential and iterative numerical algorithms based on multi-threaded parallel platform such as a graphics processing unit. This requires design and development of a platform-specific numerical solution that can benefit from the parallel capabilities of the chosen platform. Graphics processing unit was chosen as a parallel platform for design and development of a numerical solution for a specific physical model in non-linear optics. This problem appears in describing ultra-short pulse propagation in bulk transparent media that has recently been subject to several theoretical and numerical studies. The mathematical model describing this phenomenon is a challenging and complex problem and its numerical modeling limited on current modern workstations. Numerical modeling of this problem requires a parallelisation of an essentially serial algorithms and elimination of numerical bottlenecks. The main challenge to overcome is parallelisation of the globally non-local mathematical model. This thesis presents a numerical solution for elimination of numerical bottleneck associated with the non-local nature of the mathematical model. The accuracy and performance of the parallel code is identified by back-to-back testing with a similar serial version.
Resumo:
Nanoparticles offer an ideal platform for the delivery of small molecule drugs, subunit vaccines and genetic constructs. Besides the necessity of a homogenous size distribution, defined loading efficiencies and reasonable production and development costs, one of the major bottlenecks in translating nanoparticles into clinical application is the need for rapid, robust and reproducible development techniques. Within this thesis, microfluidic methods were investigated for the manufacturing, drug or protein loading and purification of pharmaceutically relevant nanoparticles. Initially, methods to prepare small liposomes were evaluated and compared to a microfluidics-directed nanoprecipitation method. To support the implementation of statistical process control, design of experiment models aided the process robustness and validation for the methods investigated and gave an initial overview of the size ranges obtainable in each method whilst evaluating advantages and disadvantages of each method. The lab-on-a-chip system resulted in a high-throughput vesicle manufacturing, enabling a rapid process and a high degree of process control. To further investigate this method, cationic low transition temperature lipids, cationic bola-amphiphiles with delocalized charge centers, neutral lipids and polymers were used in the microfluidics-directed nanoprecipitation method to formulate vesicles. Whereas the total flow rate (TFR) and the ratio of solvent to aqueous stream (flow rate ratio, FRR) was shown to be influential for controlling the vesicle size in high transition temperature lipids, the factor FRR was found the most influential factor controlling the size of vesicles consisting of low transition temperature lipids and polymer-based nanoparticles. The biological activity of the resulting constructs was confirmed by an invitro transfection of pDNA constructs using cationic nanoprecipitated vesicles. Design of experiments and multivariate data analysis revealed the mathematical relationship and significance of the factors TFR and FRR in the microfluidics process to the liposome size, polydispersity and transfection efficiency. Multivariate tools were used to cluster and predict specific in-vivo immune responses dependent on key liposome adjuvant characteristics upon delivery a tuberculosis antigen in a vaccine candidate. The addition of a low solubility model drug (propofol) in the nanoprecipitation method resulted in a significantly higher solubilisation of the drug within the liposomal bilayer, compared to the control method. The microfluidics method underwent scale-up work by increasing the channel diameter and parallelisation of the mixers in a planar way, resulting in an overall 40-fold increase in throughput. Furthermore, microfluidic tools were developed based on a microfluidics-directed tangential flow filtration, which allowed for a continuous manufacturing, purification and concentration of liposomal drug products.
Resumo:
A large class of computational problems are characterised by frequent synchronisation, and computational requirements which change as a function of time. When such a problem is solved on a message passing multiprocessor machine [5], the combination of these characteristics leads to system performance which deteriorate in time. As the communication performance of parallel hardware steadily improves so load balance becomes a dominant factor in obtaining high parallel efficiency. Performance can be improved with periodic redistribution of computational load; however, redistribution can sometimes be very costly. We study the issue of deciding when to invoke a global load re-balancing mechanism. Such a decision policy must actively weigh the costs of remapping against the performance benefits, and should be general enough to apply automatically to a wide range of computations. This paper discusses a generic strategy for Dynamic Load Balancing (DLB) in unstructured mesh computational mechanics applications. The strategy is intended to handle varying levels of load changes throughout the run. The major issues involved in a generic dynamic load balancing scheme will be investigated together with techniques to automate the implementation of a dynamic load balancing mechanism within the Computer Aided Parallelisation Tools (CAPTools) environment, which is a semi-automatic tool for parallelisation of mesh based FORTRAN codes.
Resumo:
Unstructured mesh codes for modelling continuum physics phenomena have evolved to provide the facility to model complex interacting systems. Parallelisation of such codes using single Program Multi Data (SPMD) domain decomposition techniques implemented with message passing has been demonstrated to provide high parallel efficiency, scalability to large numbers of processors P and portability across a wide range of parallel platforms. High efficiency, especially for large P requires that load balance is achieved in each parallel loop. For a code in which loops span a variety of mesh entity types, for example, elements, faces and vertices, some compromise is required between load balance for each entity type and the quantity of inter-processor communication required to satisfy data dependence between processors.
Resumo:
As the efficiency of parallel software increases it is becoming common to measure near linear speedup for many applications. For a problem size N on P processors then with software running at O(N=P ) the performance restrictions due to file i/o systems and mesh decomposition running at O(N) become increasingly apparent especially for large P . For distributed memory parallel systems an additional limit to scalability results from the finite memory size available for i/o scatter/gather operations. Simple strategies developed to address the scalability of scatter/gather operations for unstructured mesh based applications have been extended to provide scalable mesh decomposition through the development of a parallel graph partitioning code, JOSTLE [8]. The focus of this work is directed towards the development of generic strategies that can be incorporated into the Computer Aided Parallelisation Tools (CAPTools) project.
Resumo:
Due to the growth of design size and complexity, design verification is an important aspect of the Logic Circuit development process. The purpose of verification is to validate that the design meets the system requirements and specification. This is done by either functional or formal verification. The most popular approach to functional verification is the use of simulation based techniques. Using models to replicate the behaviour of an actual system is called simulation. In this thesis, a software/data structure architecture without explicit locks is proposed to accelerate logic gate circuit simulation. We call thus system ZSIM. The ZSIM software architecture simulator targets low cost SIMD multi-core machines. Its performance is evaluated on the Intel Xeon Phi and 2 other machines (Intel Xeon and AMD Opteron). The aim of these experiments is to: • Verify that the data structure used allows SIMD acceleration, particularly on machines with gather instructions ( section 5.3.1). • Verify that, on sufficiently large circuits, substantial gains could be made from multicore parallelism ( section 5.3.2 ). • Show that a simulator using this approach out-performs an existing commercial simulator on a standard workstation ( section 5.3.3 ). • Show that the performance on a cheap Xeon Phi card is competitive with results reported elsewhere on much more expensive super-computers ( section 5.3.5 ). To evaluate the ZSIM, two types of test circuits were used: 1. Circuits from the IWLS benchmark suit [1] which allow direct comparison with other published studies of parallel simulators.2. Circuits generated by a parametrised circuit synthesizer. The synthesizer used an algorithm that has been shown to generate circuits that are statistically representative of real logic circuits. The synthesizer allowed testing of a range of very large circuits, larger than the ones for which it was possible to obtain open source files. The experimental results show that with SIMD acceleration and multicore, ZSIM gained a peak parallelisation factor of 300 on Intel Xeon Phi and 11 on Intel Xeon. With only SIMD enabled, ZSIM achieved a maximum parallelistion gain of 10 on Intel Xeon Phi and 4 on Intel Xeon. Furthermore, it was shown that this software architecture simulator running on a SIMD machine is much faster than, and can handle much bigger circuits than a widely used commercial simulator (Xilinx) running on a workstation. The performance achieved by ZSIM was also compared with similar pre-existing work on logic simulation targeting GPUs and supercomputers. It was shown that ZSIM simulator running on a Xeon Phi machine gives comparable simulation performance to the IBM Blue Gene supercomputer at very much lower cost. The experimental results have shown that the Xeon Phi is competitive with simulation on GPUs and allows the handling of much larger circuits than have been reported for GPU simulation. When targeting Xeon Phi architecture, the automatic cache management of the Xeon Phi, handles and manages the on-chip local store without any explicit mention of the local store being made in the architecture of the simulator itself. However, targeting GPUs, explicit cache management in program increases the complexity of the software architecture. Furthermore, one of the strongest points of the ZSIM simulator is its portability. Note that the same code was tested on both AMD and Xeon Phi machines. The same architecture that efficiently performs on Xeon Phi, was ported into a 64 core NUMA AMD Opteron. To conclude, the two main achievements are restated as following: The primary achievement of this work was proving that the ZSIM architecture was faster than previously published logic simulators on low cost platforms. The secondary achievement was the development of a synthetic testing suite that went beyond the scale range that was previously publicly available, based on prior work that showed the synthesis technique is valid.