30 resultados para parallel processing
Resumo:
Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example medical scientists can use patterns extracted from historic patient data in order to determine if a new patient is likely to respond positively to a particular treatment or not; marketing analysts can use extracted patterns from customer data for future advertisement campaigns; finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
Resumo:
It has been previously demonstrated that extensive activation in the dorsolateral temporal lobes associated with masking a speech target with a speech masker, consistent with the hypothesis that competition for central auditory processes is an important factor in informational masking. Here, masking from speech and two additional maskers derived from the original speech were investigated. One of these is spectrally rotated speech, which is unintelligible and has a similar (inverted) spectrotemporal profile to speech. The authors also controlled for the possibility of “glimpsing” of the target signal during modulated masking sounds by using speech-modulated noise as a masker in a baseline condition. Functional imaging results reveal that masking speech with speech leads to bilateral superior temporal gyrus (STG) activation relative to a speech-in-noise baseline, while masking speech with spectrally rotated speech leads solely to right STG activation relative to the baseline. This result is discussed in terms of hemispheric asymmetries for speech perception, and interpreted as showing that masking effects can arise through two parallel neural systems, in the left and right temporal lobes. This has implications for the competition for resources caused by speech and rotated speech maskers, and may illuminate some of the mechanisms involved in informational masking.
Resumo:
It has been previously demonstrated that extensive activation in the dorsolateral temporal lobes associated with masking a speech target with a speech masker, consistent with the hypothesis that competition for central auditory processes is an important factor in informational masking. Here, masking from speech and two additional maskers derived from the original speech were investigated. One of these is spectrally rotated speech, which is unintelligible and has a similar (inverted) spectrotemporal profile to speech. The authors also controlled for the possibility of "glimpsing" of the target signal during modulated masking sounds by using speech-modulated noise as a masker in a baseline condition. Functional imaging results reveal that masking speech with speech leads to bilateral superior temporal gyrus (STG) activation relative to a speech-in-noise baseline, while masking speech with spectrally rotated speech leads solely to right STG activation relative to the baseline. This result is discussed in terms of hemispheric asymmetries for speech perception, and interpreted as showing that masking effects can arise through two parallel neural systems, in the left and right temporal lobes. This has implications for the competition for resources caused by speech and rotated speech maskers, and may illuminate some of the mechanisms involved in informational masking.
Resumo:
This paper presents a paralleled Two-Pass Hexagonal (TPA) algorithm constituted by Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS) for motion estimation. In the TPA., Motion Vectors (MV) are generated from the first-pass LHMEA and are used as predictors for second-pass HEXBS motion estimation, which only searches a small number of Macroblocks (MBs). We introduced hashtable into video processing and completed parallel implementation. We propose and evaluate parallel implementations of the LHMEA of TPA on clusters of workstations for real time video compression. It discusses how parallel video coding on load balanced multiprocessor systems can help, especially on motion estimation. The effect of load balancing for improved performance is discussed. The performance or the algorithm is evaluated by using standard video sequences and the results are compared to current algorithms.
Resumo:
This paper presents a paralleled Two-Pass Hexagonal (TPA) algorithm constituted by Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS) for motion estimation. In the TPA, Motion Vectors (MV) are generated from the first-pass LHMEA and are used as predictors for second-pass HEXBS motion estimation, which only searches a small number of Macroblocks (MBs). We introduced hashtable into video processing and completed parallel implementation. We propose and evaluate parallel implementations of the LHMEA of TPA on clusters of workstations for real time video compression. It discusses how parallel video coding on load balanced multiprocessor systems can help, especially on motion estimation. The effect of load balancing for improved performance is discussed. The performance of the algorithm is evaluated by using standard video sequences and the results are compared to current algorithms.
Resumo:
This paper presents an improved parallel Two-Pass Hexagonal (TPA) algorithm constituted by Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS) for motion estimation. Motion Vectors (MV) are generated from the first-pass LHMEA and used as predictors for second-pass HEXBS motion estimation, which only searches a small number of Macroblocks (MBs). We used bashtable into video processing and completed parallel implementation. The hashtable structure of LHMEA is improved compared to the original TPA and LHMEA. We propose and evaluate parallel implementations of the LHMEA of TPA on clusters of workstations for real time video compression. The implementation contains spatial and temporal approaches. The performance of the algorithm is evaluated by using standard video sequences and the results are compared to current algorithms.
Resumo:
The real-time parallel computation of histograms using an array of pipelined cells is proposed and prototyped in this paper with application to consumer imaging products. The array operates in two modes: histogram computation and histogram reading. The proposed parallel computation method does not use any memory blocks. The resulting histogram bins can be stored into an external memory block in a pipelined fashion for subsequent reading or streaming of the results. The array of cells can be tuned to accommodate the required data path width in a VLSI image processing engine as present in many imaging consumer devices. Synthesis of the architectures presented in this paper in FPGA are shown to compute the real-time histogram of images streamed at over 36 megapixels at 30 frames/s by processing in parallel 1, 2 or 4 pixels per clock cycle.
Resumo:
Global communicationrequirements andloadimbalanceof someparalleldataminingalgorithms arethe major obstacles to exploitthe computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication costin parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operationwhichhinders thescalabilityoftheapproach.Thisworkstudiesadifferentparallelformulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature ofthe centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means ofmulti-dimensional binary searchtrees. The approachcanalso be extended to accommodate an approximation error which allows a further reduction ofthe communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element
Resumo:
Background Selective serotonin reuptake inhibitors (SSRIs) are popular medications for anxiety and depression, but their effectiveness, particularly in patients with prominent symptoms of loss of motivation and pleasure, has been questioned. There are few studies of the effect of SSRIs on neural reward mechanisms in humans. Methods We studied 45 healthy participants who were randomly allocated to receive the SSRI citalopram, the noradrenaline reuptake inhibitor reboxetine, or placebo for 7 days in a double-blind, parallel group design. We used functional magnetic resonance imaging to measure the neural response to rewarding (sight and/or flavor of chocolate) and aversive stimuli (sight of moldy strawberries and/or an unpleasant strawberry taste) on the final day of drug treatment. Results Citalopram reduced activation to the chocolate stimuli in the ventral striatum and the ventral medial/orbitofrontal cortex. In contrast, reboxetine did not suppress ventral striatal activity and in fact increased neural responses within medial orbitofrontal cortex to reward. Citalopram also decreased neural responses to the aversive stimuli conditions in key “punishment” areas such as the lateral orbitofrontal cortex. Reboxetine produced a similar, although weaker effect. Conclusions Our findings are the first to show that treatment with SSRIs can diminish the neural processing of both rewarding and aversive stimuli. The ability of SSRIs to decrease neural responses to reward might underlie the questioned efficacy of SSRIs in depressive conditions characterized by decreased motivation and anhedonia and could also account for the experience of emotional blunting described by some patients during SSRI treatment.
Resumo:
We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
Resumo:
Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
Resumo:
The time to process each of W/B processing blocks of a median calculation method on a set of N W-bit integers is improved here by a factor of three compared to the literature. Parallelism uncovered in blocks containing B-bit slices are exploited by independent accumulative parallel counters so that the median is calculated faster than any known previous method for any N, W values. The improvements to the method are discussed in the context of calculating the median for a moving set of N integers for which a pipelined architecture is developed. An extra benefit of smaller area for the architecture is also reported.
Resumo:
Rationale: Animal studies indicate that dopamine pathways in the ventral striatum code for the motivational salience of both rewarding and aversive stimuli, but evidence for this mechanism in humans is less established. We have developed a functional magnetic resonance imaging (fMRI) model which permits examination of the neural processing of both rewarding and aversive stimuli. Objectives: The aim of the study was to determine the effect of the dopamine receptor antagonist, sulpiride, on the neural processing of rewarding and aversive stimuli in healthy volunteers. Methods: We studied 30 healthy participants who were randomly allocated to receive a single dose of sulpiride (400 mg) or placebo, in a double-blind, parallel-group design. We used fMRI to measure the neural response to rewarding (taste or sight of chocolate) and aversive stimuli (sight of mouldy strawberries or unpleasant strawberry taste) 4 h after drug treatment. Results: Relative to placebo, sulpiride reduced blood oxygenation level-dependent responses to chocolate stimuli in the striatum (ventral striatum) and anterior cingulate cortex. Sulpiride also reduced lateral orbitofrontal cortex and insula activations to the taste and sight of the aversive condition. Conclusions: These results suggest that acute dopamine receptor blockade modulates mesolimbic and mesocortical neural activations in response to both rewarding and aversive stimuli in healthy volunteers. This effect may be relevant to the effects of dopamine receptor antagonists in the treatment of psychosis and may also have implications for the possible antidepressant properties of sulpiride.
Resumo:
A parallel formulation for the simulation of a branch prediction algorithm is presented. This parallel formulation identifies independent tasks in the algorithm which can be executed concurrently. The parallel implementation is based on the multithreading model and two parallel programming platforms: pthreads and Cilk++. Improvement in execution performance by up to 7 times is observed for a generic 2-bit predictor in a 12-core multiprocessor system.