947 resultados para parallel processing
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
A specialised reconfigurable architecture is targeted at wireless base-band processing. It is built to cater for multiple wireless standards. It has lower power consumption than the processor-based solution. It can be scaled to run in parallel for processing multiple channels. Test resources are embedded on the architecture and testing strategies are included. This architecture is functionally partitioned according to the common operations found in wireless standards, such as CRC error correction, convolution and interleaving. These modules are linked via Virtual Wire Hardware modules and route-through switch matrices. Data can be processed in any order through this interconnect structure. Virtual Wire ensures the same flexibility as normal interconnects, but the area occupied and the number of switches needed is reduced. The testing algorithm scans all possible paths within the interconnection network exhaustively and searches for faults in the processing modules. The testing algorithm starts by scanning the externally addressable memory space and testing the master controller. The controller then tests every switch in the route-through switch matrix by making loops from the shared memory to each of the switches. The local switch matrix is also tested in the same way. Next the local memory is scanned. Finally, pre-defined test vectors are loaded into local memory to check the processing modules. This paper compares various base-band processing solutions. It describes the proposed platform and its implementation. It outlines the test resources and algorithm. It concludes with the mapping of Bluetooth and GSM base-band onto the platform.
Resumo:
Very large spatially-referenced datasets, for example, those derived from satellite-based sensors which sample across the globe or large monitoring networks of individual sensors, are becoming increasingly common and more widely available for use in environmental decision making. In large or dense sensor networks, huge quantities of data can be collected over small time periods. In many applications the generation of maps, or predictions at specific locations, from the data in (near) real-time is crucial. Geostatistical operations such as interpolation are vital in this map-generation process and in emergency situations, the resulting predictions need to be available almost instantly, so that decision makers can make informed decisions and define risk and evacuation zones. It is also helpful when analysing data in less time critical applications, for example when interacting directly with the data for exploratory analysis, that the algorithms are responsive within a reasonable time frame. Performing geostatistical analysis on such large spatial datasets can present a number of problems, particularly in the case where maximum likelihood. Although the storage requirements only scale linearly with the number of observations in the dataset, the computational complexity in terms of memory and speed, scale quadratically and cubically respectively. Most modern commodity hardware has at least 2 processor cores if not more. Other mechanisms for allowing parallel computation such as Grid based systems are also becoming increasingly commonly available. However, currently there seems to be little interest in exploiting this extra processing power within the context of geostatistics. In this paper we review the existing parallel approaches for geostatistics. By recognising that diffeerent natural parallelisms exist and can be exploited depending on whether the dataset is sparsely or densely sampled with respect to the range of variation, we introduce two contrasting novel implementations of parallel algorithms based on approximating the data likelihood extending the methods of Vecchia [1988] and Tresp [2000]. Using parallel maximum likelihood variogram estimation and parallel prediction algorithms we show that computational time can be significantly reduced. We demonstrate this with both sparsely sampled data and densely sampled data on a variety of architectures ranging from the common dual core processor, found in many modern desktop computers, to large multi-node super computers. To highlight the strengths and weaknesses of the diffeerent methods we employ synthetic data sets and go on to show how the methods allow maximum likelihood based inference on the exhaustive Walker Lake data set.
Resumo:
Image segmentation is one of the most computationally intensive operations in image processing and computer vision. This is because a large volume of data is involved and many different features have to be extracted from the image data. This thesis is concerned with the investigation of practical issues related to the implementation of several classes of image segmentation algorithms on parallel architectures. The Transputer is used as the basic building block of hardware architectures and Occam is used as the programming language. The segmentation methods chosen for implementation are convolution, for edge-based segmentation; the Split and Merge algorithm for segmenting non-textured regions; and the Granlund method for segmentation of textured images. Three different convolution methods have been implemented. The direct method of convolution, carried out in the spatial domain, uses the array architecture. The other two methods, based on convolution in the frequency domain, require the use of the two-dimensional Fourier transform. Parallel implementations of two different Fast Fourier Transform algorithms have been developed, incorporating original solutions. For the Row-Column method the array architecture has been adopted, and for the Vector-Radix method, the pyramid architecture. The texture segmentation algorithm, for which a system-level design is given, demonstrates a further application of the Vector-Radix Fourier transform. A novel concurrent version of the quad-tree based Split and Merge algorithm has been implemented on the pyramid architecture. The performance of the developed parallel implementations is analysed. Many of the obtained speed-up and efficiency measures show values close to their respective theoretical maxima. Where appropriate comparisons are drawn between different implementations. The thesis concludes with comments on general issues related to the use of the Transputer system as a development tool for image processing applications; and on the issues related to the engineering of concurrent image processing applications.
Resumo:
The orientations of lines and edges are important in defining the structure of the visual environment, and observers can detect differences in line orientation within the first few hundred milliseconds of scene viewing. The present work is a psychophysical investigation of the mechanisms of early visual orientation-processing. In experiments with briefly presented displays of line elements, observers indicated whether all the elements were uniformly oriented or whether a uniquely oriented target was present among uniformly oriented nontargets. The minimum difference between nontarget and target orientations that was required for effective target-detection (the orientation increment threshold) varied little with the number of elements and their spatial density, but the percentage of correct responses in detection of a large orientation-difference increased with increasing element density. The differing variations with element density of thresholds and percent-correct scores may indicate the operation of more than one mechanism in early visual orientation-processIng. Reducing element length caused threshold to increase with increasing number of elements, showing that the effectiveness of rapid, spatially parallel orientation-processing depends on element length. Orientational anisotropy in line-target detection has been reported previously: a coarse periodic variation and some finer variations in orientation increment threshold with nontarget orientation have been found. In the present work, the prominence of the coarse variation in relation to finer variations decreased with increasing effective viewing duration, as if the operation of coarse orientation-processing mechanisms precedes the operation of finer ones. Orientational anisotropy was prominent even when observers lay horizontally and viewed displays by looking upwards through a black cylinder that excluded all possible visual references for orientation. So, gravitational and visual cues are not essential to the definition of an orientational reference frame for early vision, and such a reference can be well defined by retinocentric neural coding, awareness of body-axis orientation, or both.
Resumo:
The focus of this study is development of parallelised version of severely sequential and iterative numerical algorithms based on multi-threaded parallel platform such as a graphics processing unit. This requires design and development of a platform-specific numerical solution that can benefit from the parallel capabilities of the chosen platform. Graphics processing unit was chosen as a parallel platform for design and development of a numerical solution for a specific physical model in non-linear optics. This problem appears in describing ultra-short pulse propagation in bulk transparent media that has recently been subject to several theoretical and numerical studies. The mathematical model describing this phenomenon is a challenging and complex problem and its numerical modeling limited on current modern workstations. Numerical modeling of this problem requires a parallelisation of an essentially serial algorithms and elimination of numerical bottlenecks. The main challenge to overcome is parallelisation of the globally non-local mathematical model. This thesis presents a numerical solution for elimination of numerical bottleneck associated with the non-local nature of the mathematical model. The accuracy and performance of the parallel code is identified by back-to-back testing with a similar serial version.
Resumo:
We report the performance of a group of adult dyslexics and matched controls in an array-matching task where two strings of either consonants or symbols are presented side by side and have to be judged to be the same or different. The arrays may differ either in the order or identity of two adjacent characters. This task does not require naming – which has been argued to be the cause of dyslexics’ difficulty in processing visual arrays – but, instead, has a strong serial component as demonstrated by the fact that, in both groups, Reaction times (RTs) increase monotonically with position of a mismatch. The dyslexics are clearly impaired in all conditions and performance in the identity conditions predicts performance across orthographic tasks even after age, performance IQ and phonology are partialled out. Moreover, the shapes of serial position curves are revealing of the underlying impairment. In the dyslexics, RTs increase with position at the same rate as in the controls (lines are parallel) ruling out reduced processing speed or difficulties in shifting attention. Instead, error rates show a catastrophic increase for positions which are either searched later or more subject to interference. These results are consistent with a reduction in the attentional capacity needed in a serial task to bind together identity and positional information. This capacity is best seen as a reduction in the number of spotlights into which attention can be split to process information at different locations rather than as a more generic reduction of resources which would also affect processing the details of single objects.
Resumo:
Data processing services for Meteosat geostationary satellite are presented. Implemented services correspond to the different levels of remote-sensing data processing, including noise reduction at preprocessing level, cloud mask extraction at low-level and fractal dimension estimation at high-level. Cloud mask obtained as a result of Markovian segmentation of infrared data. To overcome high computation complexity of Markovian segmentation parallel algorithm is developed. Fractal dimension of Meteosat data estimated using fractional Brownian motion models.
Resumo:
Femtosecond laser microfabrication has emerged over the last decade as a 3D flexible technology in photonics. Numerical simulations provide an important insight into spatial and temporal beam and pulse shaping during the course of extremely intricate nonlinear propagation (see e.g. [1,2]). Electromagnetics of such propagation is typically described in the form of the generalized Non-Linear Schrdinger Equation (NLSE) coupled with Drude model for plasma [3]. In this paper we consider a multi-threaded parallel numerical solution for a specific model which describes femtosecond laser pulse propagation in transparent media [4, 5]. However our approach can be extended to similar models. The numerical code is implemented in NVIDIA Graphics Processing Unit (GPU) which provides an effitient hardware platform for multi-threded computing. We compare the performance of the described below parallel code implementated for GPU using CUDA programming interface [3] with a serial CPU version used in our previous papers [4,5]. © 2011 IEEE.
Resumo:
As massive data sets become increasingly available, people are facing the problem of how to effectively process and understand these data. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, both due to the large size of the data sets and their high dimensionality. This dissertation, as in the same direction of other researches that are based on MapReduce, tries to develop effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first one deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices of dimensions in the order of millions in MapReduce based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce based on carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
Resumo:
This research aims at a study of the hybrid flow shop problem which has parallel batch-processing machines in one stage and discrete-processing machines in other stages to process jobs of arbitrary sizes. The objective is to minimize the makespan for a set of jobs. The problem is denoted as: FF: batch1,sj:Cmax. The problem is formulated as a mixed-integer linear program. The commercial solver, AMPL/CPLEX, is used to solve problem instances to their optimality. Experimental results show that AMPL/CPLEX requires considerable time to find the optimal solution for even a small size problem, i.e., a 6-job instance requires 2 hours in average. A bottleneck-first-decomposition heuristic (BFD) is proposed in this study to overcome the computational (time) problem encountered while using the commercial solver. The proposed BFD heuristic is inspired by the shifting bottleneck heuristic. It decomposes the entire problem into three sub-problems, and schedules the sub-problems one by one. The proposed BFD heuristic consists of four major steps: formulating sub-problems, prioritizing sub-problems, solving sub-problems and re-scheduling. For solving the sub-problems, two heuristic algorithms are proposed; one for scheduling a hybrid flow shop with discrete processing machines, and the other for scheduling parallel batching machines (single stage). Both consider job arrival and delivery times. An experiment design is conducted to evaluate the effectiveness of the proposed BFD, which is further evaluated against a set of common heuristics including a randomized greedy heuristic and five dispatching rules. The results show that the proposed BFD heuristic outperforms all these algorithms. To evaluate the quality of the heuristic solution, a procedure is developed to calculate a lower bound of makespan for the problem under study. The lower bound obtained is tighter than other bounds developed for related problems in literature. A meta-search approach based on the Genetic Algorithm concept is developed to evaluate the significance of further improving the solution obtained from the proposed BFD heuristic. The experiment indicates that it reduces the makespan by 1.93 % in average within a negligible time when problem size is less than 50 jobs.
Resumo:
Structured parallel programming, and in particular programming models using the algorithmic skeleton or parallel design pattern concepts, are increasingly considered to be the only viable means of supporting effective development of scalable and efficient parallel programs. Structured parallel programming models have been assessed in a number of works in the context of performance. In this paper we consider how the use of structured parallel programming models allows knowledge of the parallel patterns present to be harnessed to address both performance and energy consumption. We consider different features of structured parallel programming that may be leveraged to impact the performance/energy trade-off and we discuss a preliminary set of experiments validating our claims.
Resumo:
Graph analytics is an important and computationally demanding class of data analytics. It is essential to balance scalability, ease-of-use and high performance in large scale graph analytics. As such, it is necessary to hide the complexity of parallelism, data distribution and memory locality behind an abstract interface. The aim of this work is to build a scalable graph analytics framework that does not demand significant parallel programming experience based on NUMA-awareness.
The realization of such a system faces two key problems:
(i)~how to develop a scale-free parallel programming framework that scales efficiently across NUMA domains; (ii)~how to efficiently apply graph partitioning in order to create separate and largely independent work items that can be distributed among threads.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08