148 results for Dual gratings parallel matched
Abstract:
The fast increase in the size and number of databases demands data mining approaches that are scalable to large amounts of data. This has led to the exploration of parallel computing technologies in order to perform data mining tasks concurrently using several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these data mining technologies is the classification of newly recorded data. This paper surveys advances in parallelization in the field of classification rule induction.
Abstract:
Generally classifiers tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism suffers like any ensemble learner from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest sized datasets may impose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.
Abstract:
The All-Weather Volcano Topography Imaging Sensor remote sensing instrument is a custom-built millimeter-wave (MMW) sensor that has been developed as a practical field tool for remote sensing of volcanic terrain at active lava domes. The portable instrument combines active and passive MMW measurements to record topographic and thermal data in almost all weather conditions from ground-based survey points. We describe how the instrument is deployed in the field, the quality of the primary ranging and radiometric measurements, and the postprocessing techniques used to derive the geophysical products of the target terrain, surface temperature, and reflectivity. By comparison of changing topography, we estimate the volume change and the lava extrusion rate. Validation of the MMW radiometry is also presented by quantitative comparison with coincident infrared thermal imagery.
Abstract:
Java is becoming an increasingly popular language for developing distributed and parallel scientific and engineering applications. Jini is a Java-based infrastructure developed by Sun that can allegedly provide all the services necessary to support distributed applications. It is the aim of this paper to explore and investigate the services and properties that Jini actually provides and match these against the needs of high performance distributed and parallel applications written in Java. The motivation for this work is the need to develop a distributed infrastructure to support an MPI-like interface to Java known as MPJ. In the first part of the paper we discuss the needs of MPJ, the parallel environment that we wish to support. In particular we look at aspects such as reliability and ease of use. We then move on to sketch out the Jini architecture and review the components and services that Jini provides. In the third part of the paper we critically explore a Jini infrastructure that could be used to support MPJ. Here we are particularly concerned with Jini's ability to reliably support a cocoon of MPJ processes executing in a heterogeneous environment. In the final part of the paper we summarise our findings and report on future work being undertaken on Jini and MPJ.
Abstract:
Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example medical scientists can use patterns extracted from historic patient data in order to determine if a new patient is likely to respond positively to a particular treatment or not; marketing analysts can use extracted patterns from customer data for future advertisement campaigns; finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
Abstract:
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
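As an illustration only, the straightforward parallel formulation mentioned in this abstract (uniform data partitions followed by a global reduction at every iteration) might be sketched as below. This is not the communication-free method the paper proposes, and it is not taken from the paper: the function names `local_stats`, `global_reduce` and `kmeans_step` are hypothetical, and the "nodes" are simulated by a plain loop over partitions.

```python
def local_stats(points, centroids):
    """Per-node step: assign each local point to its nearest centroid and
    return partial coordinate sums and point counts per cluster."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the centroid with the smallest squared Euclidean distance.
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def global_reduce(partials, centroids):
    """Global reduction: combine the partial results of every node and
    recompute the centroids. An empty cluster keeps its old centroid."""
    k, dim = len(centroids), len(centroids[0])
    updated = []
    for j in range(k):
        n = sum(counts[j] for _, counts in partials)
        if n == 0:
            updated.append(list(centroids[j]))
        else:
            updated.append([sum(sums[j][d] for sums, _ in partials) / n
                            for d in range(dim)])
    return updated


def kmeans_step(partitions, centroids):
    """One iteration: local work per partition (simulated sequentially here),
    then the global reduction that every node has to wait for."""
    partials = [local_stats(part, centroids) for part in partitions]
    return global_reduce(partials, centroids)


if __name__ == "__main__":
    partitions = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 0.0), (11.0, 0.0)]]
    print(kmeans_step(partitions, [[0.0, 0.0], [10.0, 0.0]]))
```

In a real distributed setting the call to `global_reduce` would correspond to a collective operation across all nodes, which is the scalability bottleneck the abstract refers to.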
Abstract:
A parallel pipelined array of cells suitable for real-time computation of histograms is proposed. The cell architecture builds on previous work and now allows operating on a stream of data at 1 pixel per clock cycle. This new cell is more suitable for interfacing to camera sensors or to microprocessors with 8-bit data buses, which are common in consumer digital cameras. Arrays using the new proposed cells are obtained via C-slow retiming techniques and can be clocked at a 65% faster frequency than previous arrays. This achieves over 80% of the performance of two-pixel per clock cycle parallel pipelined arrays.
Abstract:
A parallel formulation of an algorithm for the histogram computation of n data items using an on-the-fly data decomposition and a novel quantum-like representation (QR) is developed. The QR transformation separates multiple data read operations from multiple bin update operations, thereby making it easier to bind data items into their corresponding histogram bins. Under this model the number of steps required to compute the histogram is n/s + t, where s is a speedup factor and t is associated with pipeline latency. Here, we show that an overall speedup factor, s, is available for up to an eightfold acceleration. Our evaluation also shows that each one of these cells requires less area/time complexity compared to similar proposals found in the literature.
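To make the step-count expression n/s + t from this abstract concrete, a small worked calculation could look as follows. The function name `histogram_steps` and the sample values for n and t are assumptions chosen for illustration; only the formula and the upper value s = 8 come from the abstract.

```python
def histogram_steps(n, s, t):
    """Step count n/s + t from the abstract's model.

    n: number of data items, s: speedup factor, t: pipeline latency term.
    """
    if s <= 0:
        raise ValueError("speedup factor s must be positive")
    return n / s + t


if __name__ == "__main__":
    n, t = 1024, 4  # illustrative values, not taken from the paper
    for s in (1, 2, 4, 8):
        print(s, histogram_steps(n, s, t))
```

With these assumed values, s = 8 gives 132 steps against 1028 for s = 1, so the latency term t matters more as s grows.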
Abstract:
We propose a Nyström/product integration method for a class of second kind integral equations on the real line which arise in problems of two-dimensional scalar and elastic wave scattering by unbounded surfaces. Stability and convergence of the method are established with convergence rates dependent on the smoothness of components of the kernel. The method is applied to the problem of acoustic scattering by a sound soft one-dimensional surface which is the graph of a function f, and superalgebraic convergence is established in the case when f is infinitely smooth. Numerical results are presented illustrating this behavior for the case when f is periodic (the diffraction grating case). The Nyström method for this problem is stable and convergent uniformly with respect to the period of the grating, in contrast to standard integral equation methods for diffraction gratings which fail at a countable set of grating periods.
Abstract:
Major Depressive Disorder (MDD) has been associated with biased processing and abnormal regulation of negative and positive information, which may result from compromised coordinated activity of prefrontal and subcortical brain regions involved in evaluating emotional information. We tested whether patients with MDD show distributed changes in functional connectivity with a set of independently derived brain networks that have shown high correspondence with different task demands, including stimulus salience and emotional processing. We further explored if connectivity during emotional word processing related to the tendency to engage in positive or negative emotional states. In this study, 25 medication-free MDD patients without current or past comorbidity and matched controls (n=25) performed an emotional word-evaluation task during functional MRI. Using a dual regression approach, individual spatial connectivity maps representing each subject’s connectivity with each standard network were used to evaluate between-group differences and effects of positive and negative emotionality (extraversion and neuroticism, respectively, as measured with the NEO-FFI). Results showed decreased functional connectivity of the medial prefrontal cortex, ventrolateral prefrontal cortex, and ventral striatum with the fronto-opercular salience network in MDD patients compared to controls. In patients, abnormal connectivity was related to extraversion, but not neuroticism. These results confirm the hypothesis of a relative (para)limbic-cortical decoupling that may explain dysregulated affect in MDD. As connectivity of these regions with the salience network was related to extraversion, but not to general depression severity or negative emotionality, dysfunction of this network may be responsible for the failure to sustain engagement in rewarding behavior.
Abstract:
A novel method is presented for obtaining rigorous upper bounds on the finite-amplitude growth of instabilities to parallel shear flows on the beta-plane. The method relies on the existence of finite-amplitude Liapunov (normed) stability theorems, due to Arnol'd, which are nonlinear generalizations of the classical stability theorems of Rayleigh and Fjørtoft. Briefly, the idea is to use the finite-amplitude stability theorems to constrain the evolution of unstable flows in terms of their proximity to a stable flow. Two classes of general bounds are derived, and various examples are considered. It is also shown that, for a certain kind of forced-dissipative problem with dissipation proportional to vorticity, the finite-amplitude stability theorems (which were originally derived for inviscid, unforced flow) remain valid (though they are no longer strictly Liapunov); the saturation bounds therefore continue to hold under these conditions.
Abstract:
Disturbances of arbitrary amplitude are superposed on a basic flow which is assumed to be steady and either (a) two-dimensional, homogeneous, and incompressible (rotating or non-rotating) or (b) stably stratified and quasi-geostrophic. Flow over shallow topography is allowed in either case. The basic flow, as well as the disturbance, is assumed to be subject neither to external forcing nor to dissipative processes like viscosity. An exact, local ‘wave-activity conservation theorem’ is derived in which the density A and flux F are second-order ‘wave properties’ or ‘disturbance properties’, meaning that they are $O(a^2)$ in magnitude as disturbance amplitude $a \to 0$, and that they are evaluable correct to $O(a^2)$ from linear theory, to $O(a^3)$ from second-order theory, and so on to higher orders in a. For a disturbance in the form of a single, slowly varying, non-stationary Rossby wavetrain, $\overline{F}/\overline{A}$ reduces approximately to the Rossby-wave group velocity, where $\overline{(\;)}$ is an appropriate averaging operator. F and A have the formal appearance of Eulerian quantities, but generally involve a multivalued function the correct branch of which requires a certain amount of Lagrangian information for its determination. It is shown that, in a certain sense, the construction of conservable, quasi-Eulerian wave properties like A is unique and that the multivaluedness is inescapable in general. The connection with the concepts of pseudoenergy (quasi-energy), pseudomomentum (quasi-momentum), and ‘Eliassen-Palm wave activity’ is noted. The relationship of this and similar conservation theorems to dynamical fundamentals and to Arnol'd's nonlinear stability theorems is discussed in the light of recent advances in Hamiltonian dynamics. These show where such conservation theorems come from and how to construct them in other cases.
An elementary proof of the Hamiltonian structure of two-dimensional Eulerian vortex dynamics is put on record, with explicit attention to the boundary conditions. The connection between Arnol'd's second stability theorem and the suppression of shear and self-tuning resonant instabilities by boundary constraints is discussed, and a finite-amplitude counterpart to Rayleigh's inflection-point theorem noted.
Abstract:
We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
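The relation between the kernel speed-ups and the whole-model speed-up quoted in this abstract can be checked with Amdahl's law. The sketch below is an illustration rather than the authors' analysis: the function name `overall_speedup` is hypothetical, and the fraction 0.6 is only the lower bound given in the abstract ("more than 60%"), so the results are rough estimates.

```python
def overall_speedup(fraction, kernel_speedup):
    """Amdahl's law: whole-program speed-up when a part taking `fraction`
    of the original run time is accelerated by `kernel_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


if __name__ == "__main__":
    # 0.6 is the abstract's lower bound; 4 and 50 are speed-ups it quotes.
    for s in (4, 50):
        print(s, round(overall_speedup(0.6, s), 2))
```

Under these assumptions a 4-times faster radiation code yields roughly 1.8 times overall, while a 50-times faster one yields roughly 2.4 times. With the radiation share at exactly 60%, the overall gain could never exceed 2.5 times.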