20 results for Data flow
from QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast
Abstract:
We propose a data flow based run time system as an efficient tool for supporting execution of parallel code on heterogeneous architectures hosting both multicore CPUs and GPUs. We discuss how the proposed run time system may serve as the target both of structured parallel applications developed using algorithmic skeletons/parallel design patterns and of more "domain-specific" programming models. Experimental results demonstrating the feasibility of the approach are presented. © 2012 World Scientific Publishing Company.
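To make the scheduling idea concrete, here is a minimal sketch of a macro data flow interpreter in Python (hypothetical names; the paper's actual runtime and its CPU/GPU dispatch are not shown): an instruction fires once all of its input tokens have arrived, and its result is routed as a token to successor instructions.

from concurrent.futures import ThreadPoolExecutor

class Task:
    """One macro data flow instruction: fires when all input tokens arrive."""
    def __init__(self, fn, n_inputs, successors=()):
        self.fn = fn
        self.n_inputs = n_inputs
        self.successors = list(successors)  # (task, input_slot) pairs
        self.inputs = {}                    # slot -> token value

    def fireable(self):
        return len(self.inputs) == self.n_inputs

def run(entry_tasks, pool):
    """Greedy interpreter: dispatch every fireable task, route result tokens."""
    results, ready = {}, [t for t in entry_tasks if t.fireable()]
    while ready:
        fired = [(t, pool.submit(t.fn, *(t.inputs[i] for i in range(t.n_inputs))))
                 for t in ready]
        ready = []
        for t, fut in fired:
            results[t] = fut.result()
            for succ, slot in t.successors:
                succ.inputs[slot] = results[t]
                if succ.fireable():
                    ready.append(succ)
    return results

# (a + b) * (a - b) as a three-instruction graph
mul = Task(lambda x, y: x * y, 2)
add = Task(lambda x, y: x + y, 2, successors=[(mul, 0)])
sub = Task(lambda x, y: x - y, 2, successors=[(mul, 1)])
add.inputs = {0: 7, 1: 3}
sub.inputs = {0: 7, 1: 3}
with ThreadPoolExecutor() as pool:
    print(run([add, sub], pool)[mul])   # 40

In a heterogeneous setting the single pool would be replaced by separate CPU and GPU worker pools, with the scheduler choosing a pool per instruction.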
Abstract:
We show how the architecture of two recently reported bit-level systolic array circuits - a single-bit coefficient correlator and a multibit convolver - may be modified to incorporate unidirectional data flow. This feature has advantages in terms of chip cascadability, fault tolerance and possible wafer-scale integration.
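The unidirectional idea can be illustrated with a word-level, cycle-accurate toy chain (illustrative only, not the paper's bit-level circuit): data and partial sums both move left to right, with data passing through two registers per cell and sums through one, so no signal ever travels against the flow, which is what makes cascading additional chips straightforward.

class Cell:
    """One systolic cell: coefficient c, two data registers, one sum register."""
    def __init__(self, coeff):
        self.c = coeff
        self.a = 0  # first data register
        self.b = 0  # second data register (the extra delay => unidirectional flow)
        self.s = 0  # partial-sum register

    def clock(self, x_in, y_in):
        """Synchronous update; returns the previous-cycle outputs for the next cell."""
        x_out, y_out = self.b, self.s
        self.s = y_in + self.c * self.a     # MAC taps the slower-moving data
        self.b, self.a = self.a, x_in
        return x_out, y_out

coeffs = [1, -1, 1]                         # single-bit (+/-1) coefficients
cells = [Cell(c) for c in coeffs]           # reverse coeffs for correlation
stream = [3, 1, 4, 1, 5, 9, 2, 6]
out = []
for x in stream + [0] * (2 * len(coeffs)):  # zero-pad to flush the pipeline
    y = 0
    for cell in cells:
        x, y = cell.clock(x, y)
    out.append(y)
print(out)  # [0, 0, 0, 0, 3, -2, 6, -2, 8, 5, -2, 13, -4, 6]: convolution, latency 4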
Abstract:
Data flow techniques have been around since the early '70s, when they were used in compilers for sequential languages. Shortly after their introduction they were also considered as a possible model for parallel computing, although the impact here was limited. Recently, however, data flow has been identified as a candidate for efficient implementation of various programming models on multi-core architectures. In most cases, however, the burden of determining data flow "macro" instructions is left to the programmer, while the compiler/run time system manages only the efficient scheduling of these instructions. We discuss a structured parallel programming approach supporting automatic compilation of programs to macro data flow and we show experimental results demonstrating the feasibility of the approach and the efficiency of the resulting "object" code on different classes of state-of-the-art multi-core architectures. The experimental results use different base mechanisms to implement the macro data flow run time support, from plain pthreads with condition variables to more modern and effective lock- and fence-free parallel frameworks. Experimental results comparing the efficiency of the proposed approach with that achieved using other, more classical, parallel frameworks are also presented. © 2012 IEEE.
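As an illustration of the division of labour the abstract argues for, here is a sketch in which a pipeline skeleton is "compiled" into per-item macro data flow instructions behind a plain function interface (hypothetical API; the paper's compiler, skeleton set and run-time mechanisms are not shown). The programmer writes only the skeleton composition; instruction creation and scheduling happen underneath.

from concurrent.futures import ThreadPoolExecutor

def pipe(*stages):
    """Compile a pipeline skeleton into a macro data flow executor."""
    def run(stream, workers=4):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Stage-1 instructions fire immediately, one per input token.
            tokens = [pool.submit(stages[0], item) for item in stream]
            # Each later stage fires as soon as its predecessor's token is ready.
            for stage in stages[1:]:
                tokens = [pool.submit(lambda t=t, s=stage: s(t.result()))
                          for t in tokens]
            return [t.result() for t in tokens]
    return run

square_then_inc = pipe(lambda x: x * x, lambda x: x + 1)
print(square_then_inc(range(5)))   # [1, 2, 5, 10, 17]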
Abstract:
A new universal flow map has been developed for two-phase co-current flow. The map has been successfully tested against a wide variety of data. Flow regime transition predictors suggested by other authors have been shown to be useful. New transition models are proposed for the stratified-to-annular, blow-through slug and intermittent regimes.
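A toy classifier shows the shape such a map takes in code. The boundaries below are illustrative placeholders, not the transition models proposed in the paper.

def flow_regime(u_gs, u_ls):
    """Classify co-current two-phase flow from gas/liquid superficial velocities (m/s).
    Thresholds are hypothetical stand-ins for real transition models."""
    if u_gs > 15.0:
        return "annular"
    if u_ls > 1.0:
        return "intermittent (slug/churn)"
    if u_gs < 1.0:
        return "stratified"
    return "wavy/transition"

for u_gs, u_ls in [(0.5, 0.05), (5.0, 2.0), (25.0, 0.1)]:
    print((u_gs, u_ls), "->", flow_regime(u_gs, u_ls))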
Abstract:
Recent years have witnessed rapidly increasing interest in the topic of incremental learning. Unlike conventional machine learning situations, the data flow targeted by incremental learning becomes available continuously over time. Accordingly, it is desirable to be able to abandon the traditional assumption that representative training data are available during the training period to develop decision boundaries. Under scenarios of continuous data flow, the challenge is how to transform the vast amount of raw stream data into information and knowledge representations, and to accumulate experience over time to support the future decision-making process. In this paper, we propose a general adaptive incremental learning framework named ADAIN that is capable of learning from continuous raw data, accumulating experience over time, and using such knowledge to improve future learning and prediction performance. A detailed system-level architecture and design strategies are presented in this paper. Simulation results over several real-world data sets are used to validate the effectiveness of this method.
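The setting (not the ADAIN algorithm itself, whose details the abstract does not give) can be sketched as a chunk-by-chunk loop in which each new data chunk trains a new base learner and existing learners are re-weighted by their accuracy on the newest chunk; a sketch assuming NumPy and scikit-learn, with synthetic drifting data.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

class IncrementalEnsemble:
    """Illustrative weighted ensemble for learning over a continuous data flow."""
    def __init__(self):
        self.models, self.weights = [], []

    def partial_fit(self, X, y):
        # Re-weight existing models by performance on the newest chunk ...
        self.weights = [max(m.score(X, y), 1e-3) for m in self.models]
        # ... then add a learner trained only on this chunk.
        m = DecisionTreeClassifier(max_depth=3).fit(X, y)
        self.models.append(m)
        self.weights.append(1.0)

    def predict(self, X):
        votes = sum(w * m.predict_proba(X) for m, w in zip(self.models, self.weights))
        return votes.argmax(axis=1)

rng = np.random.default_rng(0)
ens = IncrementalEnsemble()
for chunk in range(5):                              # simulated continuous data flow
    X = rng.normal(size=(200, 4)) + chunk * 0.1     # mild concept drift
    y = (X[:, 0] + X[:, 1] > chunk * 0.2).astype(int)
    ens.partial_fit(X, y)
print("accuracy on last chunk:", (ens.predict(X) == y).mean())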
Abstract:
Traditional static analysis fails to auto-parallelize programs with complex control and data flow. Furthermore, thread-level parallelism in such programs is often restricted to pipeline parallelism, which can be hard for a programmer to discover. In this paper we propose a tool that, based on profiling information, helps the programmer to discover parallelism. The programmer hand-picks the code transformations from among the proposed candidates, which are then applied by automatic code transformation techniques.
This paper contributes to the literature by presenting a profiling tool for discovering thread-level parallelism. We track dependencies at the whole-data-structure level rather than at the element or byte level in order to limit the profiling overhead. We perform a thorough analysis of the needs and costs of this technique. Furthermore, we present and validate the belief that programs with complex control and data flow contain significant amounts of exploitable coarse-grain pipeline parallelism in the program's outer loops. This observation validates our approach to whole-data-structure dependencies. As state-of-the-art compilers focus on loops iterating over data structure members, this observation also explains why our approach finds coarse-grain pipeline parallelism in cases that have remained out of reach for state-of-the-art compilers. In cases where traditional compilation techniques do find parallelism, our approach enables the discovery of higher degrees of parallelism, allowing a 40% speedup over traditional compilation techniques. Moreover, we demonstrate real speedups on multiple hardware platforms.
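A minimal sketch of dependence profiling at whole-data-structure granularity (hypothetical tracer, not the paper's tool): record which objects each outer-loop iteration reads and writes, then flag cross-iteration read-after-write pairs, which indicate pipeline parallelism rather than fully independent iterations. Tracking whole objects instead of individual elements keeps the log small.

class DepProfiler:
    """Illustrative whole-data-structure dependence tracker."""
    def __init__(self):
        self.last_writer = {}   # object id -> iteration that last wrote it
        self.deps = set()       # (producer_iter, consumer_iter) pairs
        self.it = -1

    def next_iteration(self):
        self.it += 1

    def read(self, obj):
        w = self.last_writer.get(id(obj))
        if w is not None and w != self.it:
            self.deps.add((w, self.it))    # cross-iteration RAW dependence

    def write(self, obj):
        self.last_writer[id(obj)] = self.it

# Example: each iteration consumes what the previous one produced.
prof, buf = DepProfiler(), []
for i in range(4):
    prof.next_iteration()
    prof.read(buf)              # consume data produced by the previous iteration
    total = sum(buf)            # stands in for real work on the structure
    prof.write(buf)
    buf.append(i)
print(sorted(prof.deps))        # [(0, 1), (1, 2), (2, 3)] -> pipeline, not DOALL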
Abstract:
In this paper, we present a methodology for automatically implementing a complete Digital Signal Processing (DSP) system onto a heterogeneous network including Field Programmable Gate Arrays (FPGAs). The methodology aims to allow design refinement and real-time verification at the system level. The DSP application is constructed in the form of a Data Flow Graph (DFG), which provides the entry point to the methodology. The netlists for the parts that are mapped onto the FPGA(s), together with the corresponding software and hardware Application Programming Interface (API), are also generated. Using a set of case studies, we demonstrate that design and development time can be significantly reduced using the methodology developed.
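A small sketch of what such a DFG entry point could look like (hypothetical schema; the tool's actual input format, netlist generation and API synthesis are not shown): nodes are DSP actors, edges carry sample streams, and a target tag marks which actors would map to FPGA fabric versus host software.

dfg = {
    "nodes": {
        "src":  {"op": "adc",     "target": "host"},
        "fir":  {"op": "fir",     "target": "fpga", "taps": 64},
        "fft":  {"op": "fft",     "target": "fpga", "size": 1024},
        "sink": {"op": "display", "target": "host"},
    },
    "edges": [("src", "fir"), ("fir", "fft"), ("fft", "sink")],
}

def partition(dfg):
    """Split the DFG into host/FPGA subgraphs; crossing edges become API calls."""
    fpga = {n for n, a in dfg["nodes"].items() if a["target"] == "fpga"}
    crossing = [e for e in dfg["edges"] if (e[0] in fpga) != (e[1] in fpga)]
    return fpga, crossing

fpga_nodes, api_edges = partition(dfg)
print("FPGA subgraph:", fpga_nodes)             # these would get netlists
print("host<->FPGA API boundaries:", api_edges)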
Abstract:
Bit-level systolic array structures for computing sums of products are studied in detail. It is shown that these can be subdivided into two classes and that, within each class, architectures can be described in terms of a set of constraint equations. It is further demonstrated that high-performance system-level functions with attractive VLSI properties can be constructed by matching data flow geometries in bit-level and word-level architectures.
Abstract:
The mapping of matrix-matrix multiplication onto both word- and bit-level systolic arrays has been investigated. It has been found that well-defined word- and bit-level data flow constraints must be satisfied within such circuits. An efficient and highly regular bit-level array has been generated by exploiting the basic compatibilities in data flow symmetries at each level of the problem. A description of the circuit which emerges is given and some details relating to its practical implementation are discussed.
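The skewed data flow that makes matching operands meet at the right cell on the right cycle can be checked with a word-level simulation (illustrative only; the paper's array is bit-level). Rows of A flow east, columns of B flow south, offset so that cell (i, j) sees the pair (A[i, k], B[k, j]) at cycle t = i + j + k.

import numpy as np

def systolic_matmul(A, B):
    """Cycle-by-cycle word-level model of a 2-D systolic array computing A @ B."""
    n = A.shape[0]
    C = np.zeros((n, n), dtype=A.dtype)
    for t in range(3 * n - 2):                     # cycles needed for an n x n array
        for i in range(n):
            for j in range(n):
                k = t - i - j                      # which operand pair reaches (i, j)
                if 0 <= k < n:
                    C[i, j] += A[i, k] * B[k, j]   # one MAC per cell per cycle
    return C

A = np.arange(9).reshape(3, 3)
B = np.arange(9, 18).reshape(3, 3)
assert (systolic_matmul(A, B) == A @ B).all()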
Abstract:
A method for measuring the phase of oscillations from noisy time series is proposed. To obtain the phase, the signal is filtered in such a way that the filter output has minimal relative variation in the amplitude over all filters with complex-valued impulse response. The argument of the filter output yields the phase. Implementation of the algorithm and interpretation of the result are discussed. We argue that the phase obtained by the proposed method has a low susceptibility to measurement noise and a low rate of artificial phase slips. The method is applied for the detection and classification of mode locking in vortex flow meters. A measure for the strength of mode locking is proposed.
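The general recipe (filter the signal with a complex-valued impulse response, then take the argument of the output) can be illustrated with a fixed Gabor kernel standing in for the paper's optimised minimal-amplitude-variation filter; a sketch assuming NumPy, with the frequency estimate serving only as a sanity check.

import numpy as np

def phase_from_signal(x, f0, fs, cycles=5):
    """Phase of an oscillation near frequency f0 via a complex Gabor filter."""
    t = np.arange(-cycles / f0, cycles / f0, 1 / fs)
    kernel = np.exp(2j * np.pi * f0 * t) * np.exp(-(t * f0 / cycles) ** 2)
    y = np.convolve(x, kernel, mode="same")   # complex-valued filter output
    return np.angle(y)                        # argument = instantaneous phase

fs, f0 = 1000.0, 7.0
t = np.arange(0, 4, 1 / fs)
x = np.sin(2 * np.pi * f0 * t) + 0.3 * np.random.default_rng(1).normal(size=t.size)
phi = phase_from_signal(x, f0, fs)
core = np.unwrap(phi[500:-500])               # drop filter edge effects
f_est = np.mean(np.diff(core)) * fs / (2 * np.pi)
print(f"estimated frequency: {f_est:.2f} Hz")  # ~7 Hz despite the noise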
Abstract:
Flow maldistribution of the exhaust gas entering a Diesel Particulate Filter (DPF) can cause uneven soot distribution during loading and excessive temperature gradients during the regeneration phase. Minimising the magnitude of this maldistribution is therefore an important consideration in the design of the inlet pipe and diffuser, particularly in situations where packaging constraints dictate bends in the inlet pipe close to the filter, or a sharp diffuser angle. This paper describes the use of Particle Image Velocimetry (PIV) to validate a Computational Fluid Dynamics (CFD) model of the flow within the inlet diffuser of a DPF so that CFD can be used with confidence as a tool to minimise this flow maldistribution. PIV is used to study the flow of gas into a DPF over a range of steady-state flow conditions. The distribution of flow approaching the front face of the substrate was of particular interest to this study. Optically clear diffusing cones were designed and placed between pipe and substrate to allow PIV analysis to take place. Stereoscopic PIV was used to eliminate any error produced by the optical aberrations caused by looking through the curved wall of the inlet cone. In parallel to the experiments, numerical analysis was carried out using a CFD program with an incorporated DPF model. Boundary conditions for the CFD simulations were taken from the experimental data, allowing an experimental validation of the numerical results. The CFD model incorporated a DPF model, the cement layers seen in segmented filters and the intumescent matting that is commonly used to pack the filter into a metal casing. The mesh contained approximately 580,000 cells and used the realizable k-ε turbulence model. The CFD simulation predicted both the pressure drop across the DPF and the velocity field within the cone and at the DPF face with reasonable accuracy, providing confidence in the use of CFD in future work to design new, more efficient cones.
Abstract:
We propose an exchange rate model that is a hybrid of the conventional specification with monetary fundamentals and the Evans–Lyons microstructure approach. We estimate a model augmented with order flow variables, using a unique data set: almost 100 monthly observations on interdealer order flow on dollar/euro and dollar/yen. The augmented macroeconomic, or "hybrid," model exhibits greater in-sample stability and out-of-sample forecasting improvement vis-à-vis the basic macroeconomic and random walk specifications.
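The hybrid specification can be sketched as an ordinary least squares regression of exchange rate changes on fundamentals plus order flow (simulated data stand in for the paper's interdealer series; variable names and coefficients are illustrative, not the paper's estimates).

import numpy as np

rng = np.random.default_rng(42)
T = 100                                  # ~100 monthly observations, as in the paper
d_fundamentals = rng.normal(size=T)      # e.g. relative money/income changes
order_flow = rng.normal(size=T)          # signed interdealer order flow (synthetic)
d_exrate = 0.1 * d_fundamentals + 0.5 * order_flow + rng.normal(scale=0.5, size=T)

# Hybrid regression: delta s_t = b0 + b1 * fundamentals_t + b2 * order_flow_t + e_t
X = np.column_stack([np.ones(T), d_fundamentals, order_flow])
beta, *_ = np.linalg.lstsq(X, d_exrate, rcond=None)
print("const, fundamentals, order flow:", beta.round(3))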