35 results for dataflow
Abstract:
23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2015). 4 to 6, Mar, 2015. Turku, Finland.
Abstract:
Dataflow programs are widely used. Each program is a directed graph where nodes are computations and edges indicate the flow of data. In prior work, we reverse-engineered legacy dataflow programs by deriving their optimized implementations from a simple specification graph using graph transformations called refinements and optimizations. In MDE-speak, our derivations were PIM-to-PSM mappings. In this paper, we show how extensions complement refinements, optimizations, and PIM-to-PSM derivations to make the process of reverse engineering complex legacy dataflow programs tractable. We explain how optional functionality in transformations can be encoded, thereby enabling us to encode product lines of transformations as well as product lines of dataflow programs. We describe the implementation of extensions in the ReFlO tool and present two non-trivial case studies as evidence of our work’s generality.
Abstract:
With the shift towards many-core computer architectures, dataflow programming has been proposed as one potential solution for producing software that scales to a varying number of processor cores. Programming for parallel architectures is considered difficult because the currently popular programming languages are inherently sequential, and introducing parallelism is typically left to the programmer. Dataflow, however, is inherently parallel: it describes an application as a directed graph, where nodes represent calculations and edges represent data dependencies in the form of queues. These queues are the only communication allowed between the nodes, which makes the dependencies between the nodes explicit, and thereby also the parallelism. Once a node has sufficient inputs available, it can, independently of any other node, perform calculations, consume inputs, and produce outputs. Dataflow models have existed for several decades and have become popular for describing signal processing applications, as the graph representation is very natural within this field. Digital filters are typically described with boxes and arrows, even in textbooks. Dataflow is also becoming more interesting in other domains, and in principle any application working on an information stream fits the dataflow paradigm. Such applications include network protocols, cryptography, and multimedia applications. As an example, the MPEG group standardized a dataflow language called RVC-CAL to be used within reconfigurable video coding. Describing a video coder as a dataflow network, instead of with conventional programming languages, makes the coder more readable, as it describes how the video data flows through the different coding tools. While dataflow provides an intuitive representation for many applications, it also introduces some new problems that need to be solved in order for dataflow to be more widely used.
The explicit parallelism of a dataflow program is descriptive and enables improved utilization of the available processing units; however, the independence of the nodes also implies that some kind of scheduling is required. The need for efficient scheduling becomes even more evident when the number of nodes is larger than the number of processing units and several nodes are running concurrently on one processor core. There exist several dataflow models of computation, with different trade-offs between expressiveness and analyzability. These vary from rather restricted but statically schedulable models, with minimal scheduling overhead, to dynamic models where each firing requires a firing rule to be evaluated. The model used in this work, namely RVC-CAL, is a very expressive language, and in the general case it requires dynamic scheduling; however, the strong encapsulation of dataflow nodes enables analysis, and the scheduling overhead can be reduced by using quasi-static, or piecewise static, scheduling techniques. The scheduling problem is concerned with finding the few scheduling decisions that must be made at run-time, while most decisions are pre-calculated. The result is then a set of static schedules, as small as possible, that are dynamically scheduled. To identify these dynamic decisions and to find the concrete schedules, this thesis shows how quasi-static scheduling can be represented as a model checking problem. This involves identifying the relevant information to generate a minimal but complete model to be used for model checking. The model must describe everything that may affect scheduling of the application while omitting everything else in order to avoid state space explosion. This kind of simplification is necessary to make the state space analysis feasible. For the model checker to find the actual schedules, a set of scheduling strategies is defined that can produce quasi-static schedulers for a wide range of applications.
The results of this work show that actor composition with quasi-static scheduling can be used to transform dataflow programs to fit many different computer architectures with different types and numbers of cores. This, in turn, enables dataflow to provide a more platform-independent representation, as one application can be fitted to a specific processor architecture without changing the actual program representation. Instead, the program representation is, in the context of design space exploration, optimized by the development tools to fit the target platform. This work focuses on representing the dataflow scheduling problem as a model checking problem and is implemented as part of a compiler infrastructure. The thesis also presents experimental results as evidence of the usefulness of the approach.
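The firing behaviour described in this abstract (a node may fire, independently of the others, once its input queues hold enough tokens) can be illustrated with a minimal Python sketch. It is not taken from the thesis and is not RVC-CAL: the `Actor` class, the `run` scheduler, the two-actor example network, and the fixed rate of one token per input are all assumptions made for illustration. The scheduler is a naive fully dynamic one that re-checks every firing rule on each pass, which is the kind of run-time overhead that quasi-static scheduling aims to reduce.

```python
from collections import deque


class Actor:
    """Hypothetical dataflow node: consumes one token per input queue per firing."""

    def __init__(self, name, func, inputs, outputs):
        self.name = name
        self.func = func          # computation applied to the consumed tokens
        self.inputs = inputs      # list of deque objects (incoming edges)
        self.outputs = outputs    # list of deque objects (outgoing edges)

    def can_fire(self):
        # Firing rule (assumed): every input queue holds at least one token.
        return all(len(q) >= 1 for q in self.inputs)

    def fire(self):
        # Consume one token from each input, compute, and emit to every output.
        args = [q.popleft() for q in self.inputs]
        result = self.func(*args)
        for q in self.outputs:
            q.append(result)


def run(actors):
    """Naive dynamic scheduler: fire enabled actors until no actor can fire.

    Every actor must have at least one input queue, otherwise its firing
    rule is always satisfied and this loop never terminates.
    """
    fired = True
    while fired:
        fired = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                fired = True


# Example network: (a, b) -> add -> c -> double -> out
a = deque([1, 2, 3])
b = deque([10, 20, 30])
c = deque()
out = deque()

add = Actor("add", lambda x, y: x + y, [a, b], [c])
double = Actor("double", lambda x: 2 * x, [c], [out])

run([add, double])
print(list(out))  # [22, 44, 66]
```

Because the queues are the only shared state, the two actors could equally run on separate cores; the single-threaded loop here only shows the firing rule.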
Abstract:
The dataflow model of computation exposes and exploits parallelism in programs without requiring programmer annotation; however, instruction-level dataflow is too fine-grained to be efficient on general-purpose processors. A popular solution is to develop a "hybrid" model of computation where regions of dataflow graphs are combined into sequential blocks of code. I have implemented such a system to allow the J-Machine to run Id programs, leaving a high amount of parallelism exposed, such as among loop iterations. I describe this system and provide an analysis of its strengths and weaknesses and those of the J-Machine, along with ideas for improvement.
Abstract:
This work presents the design, simulation, and analysis of two optical interconnection networks for a dataflow parallel computer architecture. To verify the performance of the optical interconnection networks on the dataflow architecture, we analyzed the load balancing among the processors during the execution of parallel programs. Load balancing is a very important parameter because it is directly associated with the degree of dataflow parallelism. This article proves that optical interconnection networks designed with simple optical devices can efficiently meet the dataflow requirements of a high-performance communication system.
Abstract:
The aim of this work is to propose a simple and efficient mechanism to deal with the problem of executing sequential code in a pure dataflow machine. Our results were obtained with a simulator of the Wolf [4] architecture. The implemented mechanism improved the performance of the architecture when executing sequential code, and we expect that this improvement could be greater if we use heuristics to deal with special groups of instructions such as branch operations. Further research will show whether this is true.
Abstract:
Spreadsheets are widely used but often contain faults. Thus, in prior work we presented a data-flow testing methodology for use with spreadsheets, which studies have shown can be used cost-effectively by end-user programmers. To date, however, the methodology has been investigated across a limited set of spreadsheet language features. Commercial spreadsheet environments are multiparadigm languages, utilizing features not accommodated by our prior approaches. In addition, most spreadsheets contain large numbers of replicated formulas that severely limit the efficiency of data-flow testing approaches. We show how to handle these two issues with a new data-flow adequacy criterion and automated detection of areas of replicated formulas, and report results of a controlled experiment investigating the feasibility of our approach.
Abstract:
We develop and study the concept of dataflow process networks, as used for example by Kahn, to suit exact computation over data types related to real numbers, such as continuous functions and geometrical solids. Furthermore, we consider communicating these exact objects among processes using protocols of a query-answer nature, as introduced in our earlier work. This enables processes to provide valid approximations with a certain accuracy, focusing on a certain locality, as demanded by the receiving processes through queries. We define domain-theoretical denotational semantics of our networks in two ways: (1) directly, i.e. by viewing the whole network as a composite process and applying the process semantics introduced in our earlier work; and (2) compositionally, i.e. by a fixed-point construction similar to that used by Kahn, from the denotational semantics of individual processes in the network. The direct semantics closely corresponds to the operational semantics of the network (i.e. it is correct) but is very difficult to study for concrete networks. The compositional semantics enables compositional analysis of concrete networks, assuming it is correct. We prove that the compositional semantics is a safe approximation of the direct semantics. We also provide a method that can be used in many cases to establish that the two semantics fully coincide, i.e. safety is not achieved through inactivity or meaningless answers. The results are extended to cover recursively-defined infinite networks as well as nested finite networks. A robust prototype implementation of our model is available.
Abstract:
Methods for designing information systems for large organizations are considered in this paper. The structural and object-oriented approaches are compared. For the practical realization of automated dataflow systems, a combined method for system development and analysis is proposed.
Abstract:
Around 98% of all transcriptional output in humans is noncoding RNA. RNA-mediated gene regulation is widespread in higher eukaryotes and complex genetic phenomena like RNA interference, co-suppression, transgene silencing, imprinting, methylation, and possibly position-effect variegation and transvection, all involve intersecting pathways based on or connected to RNA signaling. I suggest that the central dogma is incomplete, and that intronic and other non-coding RNAs have evolved to comprise a second tier of gene expression in eukaryotes, which enables the integration and networking of complex suites of gene activity. Although proteins are the fundamental effectors of cellular function, the basis of eukaryotic complexity and phenotypic variation may lie primarily in a control architecture composed of a highly parallel system of trans-acting RNAs that relay state information required for the coordination and modulation of gene expression, via chromatin remodeling, RNA-DNA, RNA-RNA and RNA-protein interactions. This system has interesting and perhaps informative analogies with small world networks and dataflow computing.
Abstract:
The definition and programming of distributed applications has become a major research issue due to the increasing availability of (large scale) distributed platforms and the requirements posed by economic globalization. However, such a task requires a huge effort due to the complexity of distributed environments: large numbers of users may communicate and share information across different authority domains; moreover, the “execution environment” or “computations” are dynamic, since the number of users and the computational infrastructure change over time. Grid environments, in particular, promise to be an answer to such complexity, by providing high performance execution support to large numbers of users, and resource sharing across different organizations. Nevertheless, programming in Grid environments is still a difficult task. There is a lack of high level programming paradigms and support tools that may guide the application developer and allow reusability of state-of-the-art solutions. Specifically, the main goal of the work presented in this thesis is to contribute to the simplification of the development cycle of applications for Grid environments by bringing structure and flexibility to three stages of that cycle through a common model. The stages are: the design phase, the execution phase, and the reconfiguration phase. The common model is based on the manipulation of patterns through pattern operators, and the division of both patterns and operators into two categories, namely structural and behavioural. Moreover, both structural and behavioural patterns are first class entities at each of the aforesaid stages. At the design phase, patterns can be manipulated like other first class entities such as components. This allows a more structured way to build applications by reusing and composing state-of-the-art patterns.
At the execution phase, patterns are units of execution control: it is possible, for example, to start, stop, and resume the execution of a pattern as a single entity. At the reconfiguration phase, patterns can also be manipulated as single entities, with the additional advantage that it is possible to perform a structural reconfiguration while keeping some of the behavioural constraints, and vice versa. For example, it is possible to replace a behavioural pattern, which was applied to some structural pattern, with another behavioural pattern. Besides proposing the methodology for distributed application development sketched above, this thesis defines a relevant set of pattern operators. The methodology and the expressivity of the pattern operators were assessed through the development of several representative distributed applications. To support this validation, a prototype was designed and implemented, encompassing some relevant patterns and a significant part of the pattern operators defined. This prototype was based on the Triana environment; Triana supports the development and deployment of distributed applications in the Grid through a dataflow-based programming model. Additionally, this thesis presents the analysis of a mapping of some operators for execution control onto the Distributed Resource Management Application API (DRMAA). This assessment confirmed the suitability of the proposed model, as well as the generality and flexibility of the defined pattern operators.
Abstract:
This paper describes a navigation system for autonomous underwater vehicles (AUVs) in partially structured environments, such as dams, harbors, marinas or marine platforms. A mechanical scanning imaging sonar is used to obtain information about the location of planar structures present in such environments. A modified version of the Hough transform has been developed to extract line features, together with their uncertainty, from the continuous sonar dataflow. The information obtained is incorporated into a feature-based SLAM algorithm running an Extended Kalman Filter (EKF). Simultaneously, the AUV's position estimate is provided to the feature extraction algorithm to correct the distortions that the vehicle motion produces in the acoustic images. Experiments carried out in a marina located in the Costa Brava (Spain) with the Ictineu AUV show the viability of the proposed approach.
Abstract:
Market segmentation first emerged as early as the 1950s and has been one of the basic concepts of marketing ever since. However, much of the research on segmentation has focused on consumer markets, while the segmentation of business and industrial markets has received less attention. The aim of this study is to create a segmentation model for industrial markets from the perspective of a provider of information technology products and services. The purpose is to determine whether the case company's current customer databases enable effective segmentation, to identify suitable segmentation criteria, and to assess whether and how the databases should be developed to enable more effective segmentation. The intention is to create a single model shared by the different business units. Consequently, the goals of the different units must be taken into account to avoid conflicts of interest. The research methodology is a case study. The sources used were secondary sources as well as primary sources, such as the case company's own databases and interviews. The starting point of the study was the research problem: can database-based segmentation be used for profitable customer relationship management in the SME sector? The goal is to create a segmentation model that makes use of the data in the databases without compromising the conditions for effective and profitable segmentation. The theoretical part examines segmentation in general, with an emphasis on industrial market segmentation. The purpose is to give a clear picture of the different approaches to the topic and to deepen the understanding of the most important theories. Analysis of the databases revealed clear deficiencies in the customer data. Basic contact information is available, but data for segmentation is very limited. Access to data from resellers and wholesalers should be improved in order to obtain end-customer information. Segmentation based on the current data relies mainly on secondary information such as industry and company size.
Even this information is not available for all the companies in the database.
Abstract:
A foundational model of concurrency is developed in this thesis. We examine issues in the design of parallel systems and show why the actor model is suitable for exploiting large-scale parallelism. Concurrency in actors is constrained only by the availability of hardware resources and by the logical dependence inherent in the computation. Unlike dataflow and functional programming, however, actors are dynamically reconfigurable and can model shared resources with changing local state. Concurrency is spawned in actors using asynchronous message-passing, pipelining, and the dynamic creation of actors. This thesis deals with some central issues in distributed computing. Specifically, problems of divergence and deadlock are addressed. For example, actors permit dynamic deadlock detection and removal. The problem of divergence is contained because independent transactions can execute concurrently and potentially infinite processes are nevertheless available for interaction.