19 resultados para Parallel numerical algorithms
em Aston University Research Archive
Resumo:
Femtosecond laser microfabrication has emerged over the last decade as a 3D flexible technology in photonics. Numerical simulations provide an important insight into spatial and temporal beam and pulse shaping during the course of extremely intricate nonlinear propagation (see e.g. [1,2]). Electromagnetics of such propagation is typically described in the form of the generalized Non-Linear Schrdinger Equation (NLSE) coupled with Drude model for plasma [3]. In this paper we consider a multi-threaded parallel numerical solution for a specific model which describes femtosecond laser pulse propagation in transparent media [4, 5]. However our approach can be extended to similar models. The numerical code is implemented in NVIDIA Graphics Processing Unit (GPU) which provides an effitient hardware platform for multi-threded computing. We compare the performance of the described below parallel code implementated for GPU using CUDA programming interface [3] with a serial CPU version used in our previous papers [4,5]. © 2011 IEEE.
Resumo:
The focus of this study is development of parallelised version of severely sequential and iterative numerical algorithms based on multi-threaded parallel platform such as a graphics processing unit. This requires design and development of a platform-specific numerical solution that can benefit from the parallel capabilities of the chosen platform. Graphics processing unit was chosen as a parallel platform for design and development of a numerical solution for a specific physical model in non-linear optics. This problem appears in describing ultra-short pulse propagation in bulk transparent media that has recently been subject to several theoretical and numerical studies. The mathematical model describing this phenomenon is a challenging and complex problem and its numerical modeling limited on current modern workstations. Numerical modeling of this problem requires a parallelisation of an essentially serial algorithms and elimination of numerical bottlenecks. The main challenge to overcome is parallelisation of the globally non-local mathematical model. This thesis presents a numerical solution for elimination of numerical bottleneck associated with the non-local nature of the mathematical model. The accuracy and performance of the parallel code is identified by back-to-back testing with a similar serial version.
Resumo:
Very large spatially-referenced datasets, for example, those derived from satellite-based sensors which sample across the globe or large monitoring networks of individual sensors, are becoming increasingly common and more widely available for use in environmental decision making. In large or dense sensor networks, huge quantities of data can be collected over small time periods. In many applications the generation of maps, or predictions at specific locations, from the data in (near) real-time is crucial. Geostatistical operations such as interpolation are vital in this map-generation process and in emergency situations, the resulting predictions need to be available almost instantly, so that decision makers can make informed decisions and define risk and evacuation zones. It is also helpful when analysing data in less time critical applications, for example when interacting directly with the data for exploratory analysis, that the algorithms are responsive within a reasonable time frame. Performing geostatistical analysis on such large spatial datasets can present a number of problems, particularly in the case where maximum likelihood. Although the storage requirements only scale linearly with the number of observations in the dataset, the computational complexity in terms of memory and speed, scale quadratically and cubically respectively. Most modern commodity hardware has at least 2 processor cores if not more. Other mechanisms for allowing parallel computation such as Grid based systems are also becoming increasingly commonly available. However, currently there seems to be little interest in exploiting this extra processing power within the context of geostatistics. In this paper we review the existing parallel approaches for geostatistics. By recognising that diffeerent natural parallelisms exist and can be exploited depending on whether the dataset is sparsely or densely sampled with respect to the range of variation, we introduce two contrasting novel implementations of parallel algorithms based on approximating the data likelihood extending the methods of Vecchia [1988] and Tresp [2000]. Using parallel maximum likelihood variogram estimation and parallel prediction algorithms we show that computational time can be significantly reduced. We demonstrate this with both sparsely sampled data and densely sampled data on a variety of architectures ranging from the common dual core processor, found in many modern desktop computers, to large multi-node super computers. To highlight the strengths and weaknesses of the diffeerent methods we employ synthetic data sets and go on to show how the methods allow maximum likelihood based inference on the exhaustive Walker Lake data set.
Resumo:
This book constitutes the refereed proceedings of the 14th International Conference on Parallel Problem Solving from Nature, PPSN 2016, held in Edinburgh, UK, in September 2016. The total of 93 revised full papers were carefully reviewed and selected from 224 submissions. The meeting began with four workshops which offered an ideal opportunity to explore specific topics in intelligent transportation Workshop, landscape-aware heuristic search, natural computing in scheduling and timetabling, and advances in multi-modal optimization. PPSN XIV also included sixteen free tutorials to give us all the opportunity to learn about new aspects: gray box optimization in theory; theory of evolutionary computation; graph-based and cartesian genetic programming; theory of parallel evolutionary algorithms; promoting diversity in evolutionary optimization: why and how; evolutionary multi-objective optimization; intelligent systems for smart cities; advances on multi-modal optimization; evolutionary computation in cryptography; evolutionary robotics - a practical guide to experiment with real hardware; evolutionary algorithms and hyper-heuristics; a bridge between optimization over manifolds and evolutionary computation; implementing evolutionary algorithms in the cloud; the attainment function approach to performance evaluation in EMO; runtime analysis of evolutionary algorithms: basic introduction; meta-model assisted (evolutionary) optimization. The papers are organized in topical sections on adaption, self-adaption and parameter tuning; differential evolution and swarm intelligence; dynamic, uncertain and constrained environments; genetic programming; multi-objective, many-objective and multi-level optimization; parallel algorithms and hardware issues; real-word applications and modeling; theory; diversity and landscape analysis.
Resumo:
In Fourier domain optical coherence tomography (FD-OCT), a large amount of interference data needs to be resampled from the wavelength domain to the wavenumber domain prior to Fourier transformation. We present an approach to optimize this data processing, using a graphics processing unit (GPU) and parallel processing algorithms. We demonstrate an increased processing and rendering rate over that previously reported by using GPU paged memory to render data in the GPU rather than copying back to the CPU. This avoids unnecessary and slow data transfer, enabling a processing and display rate of well over 524,000 A-scan/s for a single frame. To the best of our knowledge this is the fastest processing demonstrated to date and the first time that FD-OCT processing and rendering has been demonstrated entirely on a GPU.
Resumo:
Image segmentation is one of the most computationally intensive operations in image processing and computer vision. This is because a large volume of data is involved and many different features have to be extracted from the image data. This thesis is concerned with the investigation of practical issues related to the implementation of several classes of image segmentation algorithms on parallel architectures. The Transputer is used as the basic building block of hardware architectures and Occam is used as the programming language. The segmentation methods chosen for implementation are convolution, for edge-based segmentation; the Split and Merge algorithm for segmenting non-textured regions; and the Granlund method for segmentation of textured images. Three different convolution methods have been implemented. The direct method of convolution, carried out in the spatial domain, uses the array architecture. The other two methods, based on convolution in the frequency domain, require the use of the two-dimensional Fourier transform. Parallel implementations of two different Fast Fourier Transform algorithms have been developed, incorporating original solutions. For the Row-Column method the array architecture has been adopted, and for the Vector-Radix method, the pyramid architecture. The texture segmentation algorithm, for which a system-level design is given, demonstrates a further application of the Vector-Radix Fourier transform. A novel concurrent version of the quad-tree based Split and Merge algorithm has been implemented on the pyramid architecture. The performance of the developed parallel implementations is analysed. Many of the obtained speed-up and efficiency measures show values close to their respective theoretical maxima. Where appropriate comparisons are drawn between different implementations. The thesis concludes with comments on general issues related to the use of the Transputer system as a development tool for image processing applications; and on the issues related to the engineering of concurrent image processing applications.
Resumo:
We describe a novel and potentially important tool for candidate subunit vaccine selection through in silico reverse-vaccinology. A set of Bayesian networks able to make individual predictions for specific subcellular locations is implemented in three pipelines with different architectures: a parallel implementation with a confidence level-based decision engine and two serial implementations with a hierarchical decision structure, one initially rooted by prediction between membrane types and another rooted by soluble versus membrane prediction. The parallel pipeline outperformed the serial pipeline, but took twice as long to execute. The soluble-rooted serial pipeline outperformed the membrane-rooted predictor. Assessment using genomic test sets was more equivocal, as many more predictions are made by the parallel pipeline, yet the serial pipeline identifies 22 more of the 74 proteins of known location.
Resumo:
The performance of seven minimization algorithms are compared on five neural network problems. These include a variable-step-size algorithm, conjugate gradient, and several methods with explicit analytic or numerical approximations to the Hessian.
Resumo:
The optimization of resource allocation in sparse networks with real variables is studied using methods of statistical physics. Efficient distributed algorithms are devised on the basis of insight gained from the analysis and are examined using numerical simulations, showing excellent performance and full agreement with the theoretical results.
Resumo:
The transition of internally heated inclined plane parallel shear flows is examined numerically for the case of finite values of the Prandtl number Pr. We show that as the strength of the homogeneously distributed heat source is increased the basic flow loses stability to two-dimensional perturbations of the transverse roll type in a Hopf bifurcation for the vertical orientation of the fluid layer, whereas perturbations of the longitudinal roll type are most dangerous for a wide range of the value of the angle of inclination. In the case of the horizontal inclination transverse roll and longitudinal roll perturbations share the responsibility for the prime instability. Following the linear stability analysis for the general inclination of the fluid layer our attention is focused on a numerical study of the finite amplitude secondary travelling-wave solutions (TW) that develop from the perturbations of the transverse roll type for the vertical inclination of the fluid layer. The stability of the secondary TW against three-dimensional perturbations is also examined and our study shows that for Pr=0.71 the secondary instability sets in as a quasi-periodic mode, while for Pr=7 it is phase-locked to the secondary TW. The present study complements and extends the recent study by Nagata and Generalis (2002) in the case of vertical inclination for Pr=0.
Resumo:
With the ability to collect and store increasingly large datasets on modern computers comes the need to be able to process the data in a way that can be useful to a Geostatistician or application scientist. Although the storage requirements only scale linearly with the number of observations in the dataset, the computational complexity in terms of memory and speed, scale quadratically and cubically respectively for likelihood-based Geostatistics. Various methods have been proposed and are extensively used in an attempt to overcome these complexity issues. This thesis introduces a number of principled techniques for treating large datasets with an emphasis on three main areas: reduced complexity covariance matrices, sparsity in the covariance matrix and parallel algorithms for distributed computation. These techniques are presented individually, but it is also shown how they can be combined to produce techniques for further improving computational efficiency.
Resumo:
In the processing industries particulate materials are often in the form of powders which themselves are agglomerations of much smaller sized particles. During powder processing operations agglomerate degradation occurs primarily as a result of collisions between agglomerates and between agglomerates and the process equipment. Due to the small size of the agglomerates and the very short duration of the collisions it is currently not possible to obtain sufficiently detailed quantitative information from real experiments to provide a sound theoretically based strategy for designing particles to prevent or guarantee breakage. However, with the aid of computer simulated experiments, the micro-examination of these short duration dynamic events is made possible. This thesis presents the results of computer simulated experiments on a 2D monodisperse agglomerate in which the algorithms used to model the particle-particle interactions have been derived from contact mechanics theories and, necessarily, incorporate contact adhesion. A detailed description of the theoretical background is included in the thesis. The results of the agglomerate impact simulations show three types of behaviour depending on whether the initial impact velocity is high, moderate or low. It is demonstrated that high velocity impacts produce extensive plastic deformation which leads to subsequent shattering of the agglomerate. At moderate impact velocities semi-brittle fracture is observed and there is a threshold velocity below which the agglomerate bounces off the wall with little or no visible damage. The micromechanical processes controlling these different types of behaviour are discussed and illustrated by computer graphics. Further work is reported to demonstrate the effect of impact velocity and bond strength on the damage produced. Empirical relationships between impact velocity, bond strength and damage are presented and their relevance to attrition and comminution is discussed. The particle size distribution curves resulting from the agglomerate impacts are also provided. Computer simulated diametrical compression tests on the same agglomerate have also been carried out. Simulations were performed for different platen velocities and different bond strengths. The results show that high platen velocities produce extensive plastic deformation and crushing. Low platen velocities produce semi-brittle failure in which cracks propagate from the platens inwards towards the centre of the agglomerate. The results are compared with the results of the agglomerate impact tests in terms of work input, applied velocity and damage produced.
Resumo:
The trend in modal extraction algorithms is to use all the available frequency response functions data to obtain a global estimate of the natural frequencies, damping ratio and mode shapes. Improvements in transducer and signal processing technology allow the simultaneous measurement of many hundreds of channels of response data. The quantity of data available and the complexity of the extraction algorithms make considerable demands on the available computer power and require a powerful computer or dedicated workstation to perform satisfactorily. An alternative to waiting for faster sequential processors is to implement the algorithm in parallel, for example on a network of Transputers. Parallel architectures are a cost effective means of increasing computational power, and a larger number of response channels would simply require more processors. This thesis considers how two typical modal extraction algorithms, the Rational Fraction Polynomial method and the Ibrahim Time Domain method, may be implemented on a network of transputers. The Rational Fraction Polynomial Method is a well known and robust frequency domain 'curve fitting' algorithm. The Ibrahim Time Domain method is an efficient algorithm that 'curve fits' in the time domain. This thesis reviews the algorithms, considers the problems involved in a parallel implementation, and shows how they were implemented on a real Transputer network.
Resumo:
Orthogonal frequency division multiplexing (OFDM) is becoming a fundamental technology in future generation wireless communications. Call admission control is an effective mechanism to guarantee resilient, efficient, and quality-of-service (QoS) services in wireless mobile networks. In this paper, we present several call admission control algorithms for OFDM-based wireless multiservice networks. Call connection requests are differentiated into narrow-band calls and wide-band calls. For either class of calls, the traffic process is characterized as batch arrival since each call may request multiple subcarriers to satisfy its QoS requirement. The batch size is a random variable following a probability mass function (PMF) with realistically maximum value. In addition, the service times for wide-band and narrow-band calls are different. Following this, we perform a tele-traffic queueing analysis for OFDM-based wireless multiservice networks. The formulae for the significant performance metrics call blocking probability and bandwidth utilization are developed. Numerical investigations are presented to demonstrate the interaction between key parameters and performance metrics. The performance tradeoff among different call admission control algorithms is discussed. Moreover, the analytical model has been validated by simulation. The methodology as well as the result provides an efficient tool for planning next-generation OFDM-based broadband wireless access systems.
Resumo:
We present a parallel genetic algorithm for nding matrix multiplication algo-rithms. For 3 x 3 matrices our genetic algorithm successfully discovered algo-rithms requiring 23 multiplications, which are equivalent to the currently best known human-developed algorithms. We also studied the cases with less mul-tiplications and evaluated the suitability of the methods discovered. Although our evolutionary method did not reach the theoretical lower bound it led to an approximate solution for 22 multiplications.