59 resultados para dataflow
Resumo:
Processor architectures has taken a turn towards many-core processors, which integrate multiple processing cores on a single chip to increase overall performance, and there are no signs that this trend will stop in the near future. Many-core processors are harder to program than multi-core and single-core processors due to the need of writing parallel or concurrent programs with high degrees of parallelism. Moreover, many-cores have to operate in a mode of strong scaling because of memory bandwidth constraints. In strong scaling increasingly finer-grain parallelism must be extracted in order to keep all processing cores busy.
Task dataflow programming models have a high potential to simplify parallel program- ming because they alleviate the programmer from identifying precisely all inter-task de- pendences when writing programs. Instead, the task dataflow runtime system detects and enforces inter-task dependences during execution based on the description of memory each task accesses. The runtime constructs a task dataflow graph that captures all tasks and their dependences. Tasks are scheduled to execute in parallel taking into account dependences specified in the task graph.
Several papers report important overheads for task dataflow systems, which severely limits the scalability and usability of such systems. In this paper we study efficient schemes to manage task graphs and analyze their scalability. We assume a programming model that supports input, output and in/out annotations on task arguments, as well as commutative in/out and reductions. We analyze the structure of task graphs and identify versions and generations as key concepts for efficient management of task graphs. Then, we present three schemes to manage task graphs building on graph representations, hypergraphs and lists. We also consider a fourth edge-less scheme that synchronizes tasks using integers. Analysis using micro-benchmarks shows that the graph representation is not always scalable and that the edge-less scheme introduces least overhead in nearly all situations.
Resumo:
With security and surveillance, there is an increasing need to be able to process image data efficiently and effectively either at source or in a large data networks. Whilst Field Programmable Gate Arrays have been seen as a key technology for enabling this, they typically use high level and/or hardware description language synthesis approaches; this provides a major disadvantage in terms of the time needed to design or program them and to verify correct operation; it considerably reduces the programmability capability of any technique based on this technology. The work here proposes a different approach of using optimised soft-core processors which can be programmed in software. In particular, the paper proposes a design tool chain for programming such processors that uses the CAL Actor Language as a starting point for describing an image processing algorithm and targets its implementation to these custom designed, soft-core processors on FPGA. The main purpose is to exploit the task and data parallelism in order to achieve the same parallelism as a previous HDL implementation but avoiding the design time, verification and debugging steps associated with such approaches.
Resumo:
Increased system variability and irregularity of parallelism in applications put increasing demands on the ef- ficiency of dynamic task schedulers. This paper presents a new design for a work-stealing scheduler supporting both Cilk- style recursively parallel code and parallelism deduced from dataflow dependences. Initial evaluation on a set of linear algebra kernels demonstrates that our scheduler outperforms PLASMA’s QUARK scheduler by up to 12% on a 16-thread Intel Xeon and by up to 50% on a 32-thread AMD Bulldozer.
Resumo:
The dataflow model of computation exposes and exploits parallelism in programs without requiring programmer annotation; however, instruction- level dataflow is too fine-grained to be efficient on general-purpose processors. A popular solution is to develop a "hybrid'' model of computation where regions of dataflow graphs are combined into sequential blocks of code. I have implemented such a system to allow the J-Machine to run Id programs, leaving exposed a high amount of parallelism --- such as among loop iterations. I describe this system and provide an analysis of its strengths and weaknesses and those of the J-Machine, along with ideas for improvement.
Resumo:
This work shows the design, simulation, and analysis of two optical interconnection networks for a Dataflow parallel computer architecture. To verify the optical interconnection network performance on the Dataflow architecture, we have analyzed the load balancing among the processors during the parallel programs executions. The load balancing is a very important parameter because it is directly associated to the dataflow parallelism degree. This article proves that optical interconnection networks designed with simple optical devices can provide efficiently the dataflow requirements of a high performance communication system.
Resumo:
The aim of this work is to propose a simple and efficient mechanism to deal with the problem of executing sequential code in a pure dataflow machine. Our results is obtained with a simulator of Wolf [4] architecture. The implemented mechanism improved the architecture performance when executing sequential code and we expect that this improvement could be better if we use some heuristics to deal with some special groups of instructions such as branch operations. Further research will show us if this is true.
Resumo:
Spreadsheets are widely used but often contain faults. Thus, in prior work we presented a data-flow testing methodology for use with spreadsheets, which studies have shown can be used cost-effectively by end-user programmers. To date, however, the methodology has been investigated across a limited set of spreadsheet language features. Commercial spreadsheet environments are multiparadigm languages, utilizing features not accommodated by our prior approaches. In addition, most spreadsheets contain large numbers of replicated formulas that severely limit the efficiency of data-flow testing approaches. We show how to handle these two issues with a new data-flow adequacy criterion and automated detection of areas of replicated formulas, and report results of a controlled experiment investigating the feasibility of our approach.
Resumo:
We develop and study the concept of dataflow process networks as used for exampleby Kahn to suit exact computation over data types related to real numbers, such as continuous functions and geometrical solids. Furthermore, we consider communicating these exact objectsamong processes using protocols of a query-answer nature as introduced in our earlier work. This enables processes to provide valid approximations with certain accuracy and focusing on certainlocality as demanded by the receiving processes through queries. We define domain-theoretical denotational semantics of our networks in two ways: (1) directly, i. e. by viewing the whole network as a composite process and applying the process semantics introduced in our earlier work; and (2) compositionally, i. e. by a fixed-point construction similarto that used by Kahn from the denotational semantics of individual processes in the network. The direct semantics closely corresponds to the operational semantics of the network (i. e. it iscorrect) but very difficult to study for concrete networks. The compositional semantics enablescompositional analysis of concrete networks, assuming it is correct. We prove that the compositional semantics is a safe approximation of the direct semantics. Wealso provide a method that can be used in many cases to establish that the two semantics fully coincide, i. e. safety is not achieved through inactivity or meaningless answers. The results are extended to cover recursively-defined infinite networks as well as nested finitenetworks. A robust prototype implementation of our model is available.
Resumo:
The methods of designing of information systems for large organizations are considered in the paper. The structural and object-oriented approaches are compared. For the practical realization of the automated dataflow systems the combined method for the system development and analysis is proposed.
Resumo:
There are many applications in aeronautics where there exist strong couplings between disciplines. One practical example is within the context of Unmanned Aerial Vehicle(UAV) automation where there exists strong coupling between operation constraints, aerodynamics, vehicle dynamics, mission and path planning. UAV path planning can be done either online or offline. The current state of path planning optimisation online UAVs with high performance computation is not at the same level as its ground-based offline optimizer's counterpart, this is mainly due to the volume, power and weight limitations on the UAV; some small UAVs do not have the computational power needed for some optimisation and path planning task. In this paper, we describe an optimisation method which can be applied to Multi-disciplinary Design Optimisation problems and UAV path planning problems. Hardware-based design optimisation techniques are used. The power and physical limitations of UAV, which may not be a problem in PC-based solutions, can be approached by utilizing a Field Programmable Gate Array (FPGA) as an algorithm accelerator. The inevitable latency produced by the iterative process of an Evolutionary Algorithm (EA) is concealed by exploiting the parallelism component within the dataflow paradigm of the EA on an FPGA architecture. Results compare software PC-based solutions and the hardware-based solutions for benchmark mathematical problems as well as a simple real world engineering problem. Results also indicate the practicality of the method which can be used for more complex single and multi objective coupled problems in aeronautical applications.
Resumo:
A distributed fuzzy system is a real-time fuzzy system in which the input, output and computation may be located on different networked computing nodes. The ability for a distributed software application, such as a distributed fuzzy system, to adapt to changes in the computing network at runtime can provide real-time performance improvement and fault-tolerance. This paper introduces an Adaptable Mobile Component Framework (AMCF) that provides a distributed dataflow-based platform with a fine-grained level of runtime reconfigurability. The execution location of small fragments (possibly as little as few machine-code instructions) of an AMCF application can be moved between different computing nodes at runtime. A case study is included that demonstrates the applicability of the AMCF to a distributed fuzzy system scenario involving multiple physical agents (such as autonomous robots). Using the AMCF, fuzzy systems can now be developed such that they can be distributed automatically across multiple computing nodes and are adaptable to runtime changes in the networked computing environment. This provides the opportunity to improve the performance of fuzzy systems deployed in scenarios where the computing environment is resource-constrained and volatile, such as multiple autonomous robots, smart environments and sensor networks.
Resumo:
Flexible information exchange is critical to successful design-analysis integration, but current top-down, standards-based and model-oriented strategies impose restrictions that contradicts this flexibility. In this article we present a bottom-up, user-controlled and process-oriented approach to linking design and analysis applications that is more responsive to the varied needs of designers and design teams. Drawing on research into scientific workflows, we present a framework for integration that capitalises on advances in cloud computing to connect discrete tools via flexible and distributed process networks. We then discuss how a shared mapping process that is flexible and user friendly supports non-programmers in creating these custom connections. Adopting a services-oriented system architecture, we propose a web-based platform that enables data, semantics and models to be shared on the fly. We then discuss potential challenges and opportunities for its development as a flexible, visual, collaborative, scalable and open system.
Resumo:
Flexible information exchange is critical to successful design integration, but current top-down, standards-based and model-oriented strategies impose restrictions that are contradictory to this flexibility. In this paper we present a bottom-up, user-controlled and process-oriented approach to linking design and analysis applications that is more responsive to the varied needs of designers and design teams. Drawing on research into scientific workflows, we present a framework for integration that capitalises on advances in cloud computing to connect discrete tools via flexible and distributed process networks. Adopting a services-oriented system architecture, we propose a web-based platform that enables data, semantics and models to be shared on the fly. We discuss potential challenges and opportunities for the development thereof as a flexible, visual, collaborative, scalable and open system.