999 resultados para parallel sorting


Relevância:

80.00% 80.00%

Publicador:

Resumo:

We found an interesting relation between convex optimization and sorting problem. We present a parallel algorithm to compute multiple order statistics of the data by minimizing a number of related convex functions. The computed order statistics serve as splitters that group the data into buckets suitable for parallel bitonic sorting. This led us to a parallel bucket sort algorithm, which we implemented for many-core architecture of graphics processing units (GPUs). The proposed sorting method is competitive to the state-of-the-art GPU sorting algorithms and is superior to most of them for long sorting keys.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We consider the often-studied problem of sorting, for a parallel computer. Given an input array distributed evenly over p processors, the task is to compute the sorted output array, also distributed over the p processors. Many existing algorithms take the approach of approximately load-balancing the output, leaving each processor with Θ(n/p) elements. However, in many cases, approximate load-balancing leads to inefficiencies in both the sorting itself and in further uses of the data after sorting. We provide a deterministic parallel sorting algorithm that uses parallel selection to produce any output distribution exactly, particularly one that is perfectly load-balanced. Furthermore, when using a comparison sort, this algorithm is 1-optimal in both computation and communication. We provide an empirical study that illustrates the efficiency of exact data splitting, and shows an improvement over two sample sort algorithms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes a fast integer sorting algorithm, herein referred as Bit-index sort, which is a non-comparison sorting algorithm for partial per-mutations, with linear complexity order in execution time. Bit-index sort uses a bit-array to classify input sequences of distinct integers, and exploits built-in bit functions in C compilers supported by machine hardware to retrieve the ordered output sequence. Results show that Bit-index sort outperforms in execution time to quicksort and counting sort algorithms. A parallel approach for Bit-index sort using two simultaneous threads is included, which obtains speedups up to 1.6.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present an approach to computing high-breakdown regression estimators in parallel on graphics processing units (GPU).We show that sorting the residuals is not necessary, and it can be substituted by calculating the median. We present and compare various methods to calculate the median and order statistics on GPUs. We introduce an alternative method based on the optimization of a convex function, and showits numerical superiority when calculating the order statistics of very large arrays on GPUs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a parallel surrogate-based global optimization method for computationally expensive objective functions that is more effective for larger numbers of processors. To reach this goal, we integrated concepts from multi-objective optimization and tabu search into, single objective, surrogate optimization. Our proposed derivative-free algorithm, called SOP, uses non-dominated sorting of points for which the expensive function has been previously evaluated. The two objectives are the expensive function value of the point and the minimum distance of the point to previously evaluated points. Based on the results of non-dominated sorting, P points from the sorted fronts are selected as centers from which many candidate points are generated by random perturbations. Based on surrogate approximation, the best candidate point is subsequently selected for expensive evaluation for each of the P centers, with simultaneous computation on P processors. Centers that previously did not generate good solutions are tabu with a given tenure. We show almost sure convergence of this algorithm under some conditions. The performance of SOP is compared with two RBF based methods. The test results show that SOP is an efficient method that can reduce time required to find a good near optimal solution. In a number of cases the efficiency of SOP is so good that SOP with 8 processors found an accurate answer in less wall-clock time than the other algorithms did with 32 processors.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper outlines the problems found in the parallelization of SPH (Smoothed Particle Hydrodynamics) algorithms using Graphics Processing Units. Different results of some parallel GPU implementations in terms of the speed-up and the scalability compared to the CPU sequential codes are shown. The most problematic stage in the GPU-SPH algorithms is the one responsible for locating neighboring particles and building the vectors where this information is stored, since these specific algorithms raise many dificulties for a data-level parallelization. Because of the fact that the neighbor location using linked lists does not show enough data-level parallelism, two new approaches have been pro- posed to minimize bank conflicts in the writing and subsequent reading of the neighbor lists. The first strategy proposes an efficient coordination between CPU-GPU, using GPU algorithms for those stages that allow a straight forward parallelization, and sequential CPU algorithms for those instructions that involve some kind of vector reduction. This coordination provides a relatively orderly reading of the neighbor lists in the interactions stage, achieving a speed-up factor of x47 in this stage. However, since the construction of the neighbor lists is quite expensive, it is achieved an overall speed-up of x41. The second strategy seeks to maximize the use of the GPU in the neighbor's location process by executing a specific vector sorting algorithm that allows some data-level parallelism. Al- though this strategy has succeeded in improving the speed-up on the stage of neighboring location, the global speed-up on the interactions stage falls, due to inefficient reading of the neighbor vectors. Some changes to these strategies are proposed, aimed at maximizing the computational load of the GPU and using the GPU texture-units, in order to reach the maximum speed-up for such codes. Different practical applications have been added to the mentioned GPU codes. First, the classical dam-break problem is studied. Second, the wave impact of the sloshing fluid contained in LNG vessel tanks is also simulated as a practical example of particle methods

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Includes bibliographical references.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes a fast integer sorting algorithm, herein referred to as Bit-index sort, which does not use comparisons and is intended to sort partial permutations. Experimental results exhibit linear complexity order in execution time. Bit-index sort uses a bit-array to classify input sequences of distinct integers, and exploits built-in bit functions in C compilers, supported by machine hardware, to retrieve the ordered output sequence. Results show that Bit-index sort outperforms quicksort and counting sort algorithms when compared in their execution time. A parallel approach for Bit-index sort using two simultaneous threads is also included, which obtains further speedups of up to 1.6 compared to its sequential case.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Investigated the psychometric properties of the original and alternate sets of the Trail Making Test (TMT) and the Controlled Oral Word Association Test (COWAT; A. L. Benton and D. Hamsher, 1978) in 50 orthopedic and 15 closed head injured (1 yr after trauma) patients (aged 15–59 yrs). Although the alternate forms of both measures proved to be stable and consistent with each other in both groups, only the parallel sets of TMT reliably discriminated the clinical group from controls. Practice effects in the head injured were significant only for Trail B of TMT. Factor analysis of the control group's results identified Verbal Knowledge as a major contributor to performance on COWAT, whereas TMT was more dependent on Rapid Visual Search and Visuomotor Sequencing.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, the train scheduling problem is modelled as a blocking parallel-machine job shop scheduling (BPMJSS) problem. In the model, trains, single-track sections and multiple-track sections, respectively, are synonymous with jobs, single machines and parallel machines, and an operation is regarded as the movement/traversal of a train across a section. Due to the lack of buffer space, the real-life case should consider blocking or hold-while-wait constraints, which means that a track section cannot release and must hold the train until next section on the routing becomes available. Based on literature review and our analysis, it is very hard to find a feasible complete schedule directly for BPMJSS problems. Firstly, a parallel-machine job-shop-scheduling (PMJSS) problem is solved by an improved shifting bottleneck procedure (SBP) algorithm without considering blocking conditions. Inspired by the proposed SBP algorithm, feasibility satisfaction procedure (FSP) algorithm is developed to solve and analyse the BPMJSS problem, by an alternative graph model that is an extension of the classical disjunctive graph models. The proposed algorithms have been implemented and validated using real-world data from Queensland Rail. Sensitivity analysis has been applied by considering train length, upgrading track sections, increasing train speed and changing bottleneck sections. The outcomes show that the proposed methodology would be a very useful tool for the real-life train scheduling problems