13 resultados para Processing Element Array

em CentAUR: Central Archive University of Reading - UK


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Global communicationrequirements andloadimbalanceof someparalleldataminingalgorithms arethe major obstacles to exploitthe computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication costin parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operationwhichhinders thescalabilityoftheapproach.Thisworkstudiesadifferentparallelformulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature ofthe centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means ofmulti-dimensional binary searchtrees. The approachcanalso be extended to accommodate an approximation error which allows a further reduction ofthe communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The paper presents a design for a hardware genetic algorithm which uses a pipeline of systolic arrays. These arrays have been designed using systolic synthesis techniques which involve expressing the algorithm as a set of uniform recurrence relations. The final design divorces the fitness function evaluation from the hardware and can process chromosomes of different lengths, giving the design a generic quality. The paper demonstrates the design methodology by progressively re-writing a simple genetic algorithm, expressed in C code, into a form from which systolic structures can be deduced. This paper extends previous work by introducing a simplification to a previous systolic design for the genetic algorithm. The simplification results in the removal of 2N 2 + 4N cells and reduces the time complexity by 3N + 1 cycles.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Syntactic theory provides a rich array of representational assumptions about linguistic knowledge and processes. Such detailed and independently motivated constraints on grammatical knowledge ought to play a role in sentence comprehension. However most grammar-based explanations of processing difficulty in the literature have attempted to use grammatical representations and processes per se to explain processing difficulty. They did not take into account that the description of higher cognition in mind and brain encompasses two levels: on the one hand, at the macrolevel, symbolic computation is performed, and on the other hand, at the microlevel, computation is achieved through processes within a dynamical system. One critical question is therefore how linguistic theory and dynamical systems can be unified to provide an explanation for processing effects. Here, we present such a unification for a particular account to syntactic theory: namely a parser for Stabler's Minimalist Grammars, in the framework of Smolensky's Integrated Connectionist/Symbolic architectures. In simulations we demonstrate that the connectionist minimalist parser produces predictions which mirror global empirical findings from psycholinguistic research.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

How can a bridge be built between autonomic computing approaches and parallel computing systems? How can autonomic computing approaches be extended towards building reliable systems? How can existing technologies be merged to provide a solution for self-managing systems? The work reported in this paper aims to answer these questions by proposing Swarm-Array Computing, a novel technique inspired from swarm robotics and built on the foundations of autonomic and parallel computing paradigms. Two approaches based on intelligent cores and intelligent agents are proposed to achieve autonomy in parallel computing systems. The feasibility of the proposed approaches is validated on a multi-agent simulator.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The paper is concerned with the uniformization of a system of affine recurrence equations. This transformation is used in the design (or compilation) of highly parallel embedded systems (VLSI systolic arrays, signal processing filters, etc.). We present and implement an automatic system to achieve uniformization of systems of affine recurrence equations. We unify the results from many earlier papers, develop some theoretical extensions, and then propose effective uniformization algorithms. Our results can be used in any high level synthesis tool based on polyhedral representation of nested loop computations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

How can a bridge be built between autonomic computing approaches and parallel computing systems? The work reported in this paper is motivated towards bridging this gap by proposing a swarm-array computing approach based on ‘Intelligent Agents’ to achieve autonomy for distributed parallel computing systems. In the proposed approach, a task to be executed on parallel computing cores is carried onto a computing core by carrier agents that can seamlessly transfer between processing cores in the event of a predicted failure. The cognitive capabilities of the carrier agents on a parallel processing core serves in achieving the self-ware objectives of autonomic computing, hence applying autonomic computing concepts for the benefit of parallel computing systems. The feasibility of the proposed approach is validated by simulation studies using a multi-agent simulator on an FPGA (Field-Programmable Gate Array) and experimental studies using MPI (Message Passing Interface) on a computer cluster. Preliminary results confirm that applying autonomic computing principles to parallel computing systems is beneficial.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A new approach is presented to identify the number of incoming signals in antenna array processing. The new method exploits the inherent properties existing in the noise eigenvalues of the covariance matrix of the array output. A single threshold has been established concerning information about the signal and noise strength, data length, and array size. When the subspace-based algorithms are adopted the computation cost of the signal number detector can almost be neglected. The performance of the threshold is robust against low SNR and short data length.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The real-time parallel computation of histograms using an array of pipelined cells is proposed and prototyped in this paper with application to consumer imaging products. The array operates in two modes: histogram computation and histogram reading. The proposed parallel computation method does not use any memory blocks. The resulting histogram bins can be stored into an external memory block in a pipelined fashion for subsequent reading or streaming of the results. The array of cells can be tuned to accommodate the required data path width in a VLSI image processing engine as present in many imaging consumer devices. Synthesis of the architectures presented in this paper in FPGA are shown to compute the real-time histogram of images streamed at over 36 megapixels at 30 frames/s by processing in parallel 1, 2 or 4 pixels per clock cycle.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrate that intrarray variability is small (only around 2 of the mean log signal), while interarray variability from replicate array measurements has a standard deviation (SD) of around 0.5 log(2) units (6 of mean). The common practise of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflect the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identifies an underlying structure which reflect some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and collected by a variety of operators. Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming normalising and visualising transcriptomic array data from Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non biological) intra- and interarray variability, and for a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further more extensive studies of the systems biology of eukaryotic cells.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose and analyse a hybrid numerical–asymptotic hp boundary element method (BEM) for time-harmonic scattering of an incident plane wave by an arbitrary collinear array of sound-soft two-dimensional screens. Our method uses an approximation space enriched with oscillatory basis functions, chosen to capture the high-frequency asymptotics of the solution. We provide a rigorous frequency-explicit error analysis which proves that the method converges exponentially as the number of degrees of freedom N increases, and that to achieve any desired accuracy it is sufficient to increase N in proportion to the square of the logarithm of the frequency as the frequency increases (standard BEMs require N to increase at least linearly with frequency to retain accuracy). Our numerical results suggest that fixed accuracy can in fact be achieved at arbitrarily high frequencies with a frequency-independent computational cost, when the oscillatory integrals required for implementation are computed using Filon quadrature. We also show how our method can be applied to the complementary ‘breakwater’ problem of propagation through an aperture in an infinite sound-hard screen.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This article reports on a study investigating the relative influence of the first and dominant language on L2 and L3 morpho-lexical processing. A lexical decision task compared the responses to English NV-er compounds (e.g., taxi driver) and non-compounds provided by a group of native speakers and three groups of learners at various levels of English proficiency: L1 Spanish-L2 English sequential bilinguals and two groups of early Spanish-Basque bilinguals with English as their L3. Crucially, the two trilingual groups differed in their first and dominant language (i.e., L1 Spanish-L2 Basque vs. L1 Basque-L2 Spanish). Our materials exploit an (a)symmetry between these languages: while Basque and English pattern together in the basic structure of (productive) NV-er compounds, Spanish presents a construction that differs in directionality as well as inflection of the verbal element (V[3SG] + N). Results show between and within group differences in accuracy and response times that may be ascribable to two factors besides proficiency: the number of languages spoken by a given participant and their dominant language. An examination of response bias reveals an influence of the participants' first and dominant language on the processing of NV-er compounds. Our data suggest that morphological information in the nonnative lexicon may extend beyond morphemic structure and that, similarly to bilingualism, there are costs to sequential multilingualism in lexical retrieval.