943 results for Scientific workflow
Abstract:
Concurrent software executes multiple threads or processes to achieve high performance. However, concurrency results in a huge number of different system behaviors that are difficult to test and verify. The aim of this dissertation is to develop new methods and tools for modeling and analyzing concurrent software systems at design and code levels. This dissertation consists of several related results. First, a formal model of Mondex, an electronic purse system, is built from user requirements using Petri nets and formally verified using model checking. Second, Petri net models are automatically mined from the event traces generated from scientific workflows. Third, partial order models are automatically extracted from instrumented concurrent program executions, and potential atomicity violation bugs are automatically verified based on the partial order models using model checking. Our formal specification and verification of Mondex have contributed to the worldwide effort in developing a verified software repository. Our method to mine Petri net models automatically from provenance offers a new approach to build scientific workflows. Our dynamic prediction tool, named McPatom, can predict several known bugs in real-world systems, including one that evades several other existing tools. McPatom is efficient and scalable as it takes advantage of the nature of atomicity violations and considers only a pair of threads and accesses to a single shared variable at one time. However, predictive tools need to consider the tradeoffs between precision and coverage. Based on McPatom, this dissertation presents two methods for improving the coverage and precision of atomicity violation predictions: 1) a post-prediction analysis method to increase coverage while ensuring precision; 2) a follow-up replaying method to further increase coverage. Both methods are implemented in a completely automatic tool.
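The restriction to one shared variable and a pair of accesses interleaved by a remote access can be illustrated with a small trace checker. This is only a sketch of the commonly cited unserializable access patterns applied to a single observed trace; it is not McPatom's method, which predicts violations from partial order models using model checking. The names `Event`, `UNSERIALIZABLE` and `find_violations` are hypothetical.

```python
from typing import List, Tuple

# Assumption: an event is (thread id, "R" or "W") on ONE shared variable.
Event = Tuple[str, str]

# (first local access, interleaved remote access, second local access)
# These four triples cannot be reordered into an equivalent serial execution.
UNSERIALIZABLE = {
    ("R", "W", "R"),
    ("W", "W", "R"),
    ("W", "R", "W"),
    ("R", "W", "W"),
}


def find_violations(trace: List[Event]) -> List[Tuple[int, int, int]]:
    """Return index triples (i, j, k): i and k are consecutive accesses by
    one thread, j is an access by another thread between them, and the
    three operations form an unserializable pattern."""
    violations = []
    last_seen = {}  # thread id -> index of that thread's most recent access
    for k, (thread, op) in enumerate(trace):
        i = last_seen.get(thread)
        if i is not None:
            # Every event strictly between i and k belongs to another thread,
            # because i is this thread's most recent earlier access.
            for j in range(i + 1, k):
                if (trace[i][1], trace[j][1], op) in UNSERIALIZABLE:
                    violations.append((i, j, k))
        last_seen[thread] = k
    return violations
```

For example, `find_violations([("T1", "R"), ("T2", "W"), ("T1", "R")])` reports the triple `(0, 1, 2)`, because the remote write lands between two local reads.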
Abstract:
Workflows have been successfully applied to express the decomposition of complex scientific applications. However, existing tools still lack adequate support for important aspects, namely decoupling the enactment engine from task specification, decentralizing the control of workflow activities so that their tasks can run in distributed infrastructures, and supporting dynamic workflow reconfigurations. We present the AWARD (Autonomic Workflow Activities Reconfigurable and Dynamic) model of computation, based on Process Networks, where the workflow activities (AWA) are autonomic processes with independent control that can run in parallel on distributed infrastructures. Each AWA executes a task developed as a Java class with a generic interface, allowing end-users to code their applications without low-level details. The data-driven coordination of AWA interactions is based on a shared tuple space that also enables dynamic workflow reconfiguration. For evaluation, we describe experimental results of AWARD workflow executions in several application scenarios, mapped to the Amazon Elastic Compute Cloud (EC2).
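Data-driven coordination through a shared tuple space can be sketched as follows. This is a minimal in-process illustration written in Python rather than Java, under the assumption that tuples are `(port, value)` pairs; `TupleSpace`, `activity` and `run_workflow` are hypothetical names and do not reflect the actual AWARD interfaces.

```python
import threading


class TupleSpace:
    """A tiny in-memory tuple space with blocking, pattern-based take."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def put(self, tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()  # wake activities waiting for input

    def take(self, pattern):
        """Block until a tuple matches the pattern (None is a wildcard),
        then remove it from the space and return it."""
        with self._cond:
            while True:
                for tup in self._tuples:
                    if len(tup) == len(pattern) and all(
                        p is None or p == v for p, v in zip(pattern, tup)
                    ):
                        self._tuples.remove(tup)
                        return tup
                self._cond.wait()


def activity(space, in_port, out_port, task):
    # Each activity has its own control: it waits for a tuple on its input
    # port, runs its task, and publishes the result on its output port.
    _, value = space.take((in_port, None))
    space.put((out_port, task(value)))


def run_workflow(x):
    space = TupleSpace()
    threads = [
        threading.Thread(target=activity, args=(space, "a", "b", lambda v: v * 2)),
        threading.Thread(target=activity, args=(space, "b", "c", lambda v: v + 1)),
    ]
    for t in threads:
        t.start()
    space.put(("a", x))  # inject the workflow input
    _, result = space.take(("c", None))  # wait for the final output
    for t in threads:
        t.join()
    return result
```

Because activities only agree on port names in the shared space, one activity could in principle be replaced by another that reads and writes the same ports without touching the rest, which is the property that makes reconfiguration plausible in this style.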
Abstract:
Thesis submitted to the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, for the degree of Doctor of Philosophy in Biochemistry
Abstract:
Aki Lassila's presentation at the Europeana workshop in Helsinki, 20 November 2012.
Abstract:
Wednesday 23rd April 2014. Speaker(s): Willi Hasselbring. Organiser: Leslie Carr. Time: 23/04/2014 11:00-11:50. Location: B32/3077. File size: 669 Mb. Abstract: For good scientific practice, it is important that research results can be properly checked by reviewers and possibly repeated and extended by other researchers. This is of particular interest for "digital science", i.e. for in-silico experiments. In this talk, I'll discuss some issues of how software systems and services may contribute to good scientific practice. In particular, I'll present our PubFlow approach to automate publication workflows for scientific data. The PubFlow workflow management system is based on established technology. We integrate institutional repository systems (based on EPrints) and world data centers (in marine science). PubFlow collects provenance data automatically via our monitoring framework Kieker. Provenance information describes the origins and the history of scientific data in its life cycle, and the process by which it arrived. Thus, provenance information is highly relevant to the repeatability and trustworthiness of scientific results. In our evaluation in marine science, we collaborate with the GEOMAR Helmholtz Centre for Ocean Research Kiel.
Abstract:
In the Biodiversity World (BDW) project we have created a flexible and extensible Web Services-based Grid environment for biodiversity researchers to solve problems in biodiversity and analyse biodiversity patterns. In this environment, heterogeneous and globally distributed biodiversity-related resources such as data sets and analytical tools are made available to be accessed and assembled by users into workflows to perform complex scientific experiments. One such experiment is bioclimatic modelling of the geographical distribution of individual species using climate variables in order to predict past and future climate-related changes in species distribution. Data sources and analytical tools required for such analysis of species distribution are widely dispersed, available on heterogeneous platforms, present data in different formats and lack interoperability. The BDW system brings all these disparate units together so that the user can combine tools with little thought as to their availability, data formats and interoperability. The current Web Services-based Grid environment enables execution of the BDW workflow tasks in remote nodes but with a limited scope. The next step in the evolution of the BDW architecture is to enable workflow tasks to utilise computational resources available within and outside the BDW domain. We describe the present BDW architecture and its transition to a new framework which provides a distributed computational environment for mapping and executing workflows in addition to bringing together heterogeneous resources and analytical tools.
Abstract:
The service-oriented approach to performing distributed scientific research is potentially very powerful but is not yet widely used in many scientific fields. This is partly due to the technical difficulties involved in creating services and workflows and the inefficiency of many workflow systems with regard to handling large datasets. We present the Styx Grid Service, a simple system that wraps command-line programs and allows them to be run over the Internet exactly as if they were local programs. Styx Grid Services are very easy to create and use and can be composed into powerful workflows with simple shell scripts or more sophisticated graphical tools. An important feature of the system is that data can be streamed directly from service to service, significantly increasing the efficiency of workflows that use large data volumes. The status and progress of Styx Grid Services can be monitored asynchronously using a mechanism that places very few demands on firewalls. We show how Styx Grid Services can interoperate with Web Services and WS-Resources using suitable adapters.
Abstract:
Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support data-driven scientific discovery in complex and explorative bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology-preserving projection for the visualization of the input data and their similarities. The core algorithm in the BioDICE plugin is the Fast Learning Self Organizing Map (FLSOM), which is an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows the visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study related to chemical compounds. Conclusions: The number and variety of available tools and its extensibility have made Taverna a popular choice for the development of scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool, which can be adopted for the explorative analysis of biological datasets.
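For readers unfamiliar with the underlying technique, the following is a minimal sketch of a plain Self Organizing Map, the baseline that FLSOM is described as improving on. It is not FLSOM and not the BioDICE implementation; the linear decay schedules, the Gaussian neighbourhood, and the names `train_som` and `best_matching_unit` are assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0  # initial neighbourhood radius
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3    # radius shrinks, never zero
        for x in data:
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Units near the winner on the GRID are pulled toward x too,
                # which is what preserves topology in the 2D projection.
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights
```

After training, calling `best_matching_unit(weights, sample)` projects a high-dimensional sample to a cell of the 2D grid; samples that land in the same or neighbouring cells are candidates for the same group.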
Abstract:
As scientific workflows and the data they operate on grow in size and complexity, the task of defining how those workflows should execute (which resources to use, where the resources must be in readiness for processing, etc.) becomes proportionally more difficult. While "workflow compilers", such as Pegasus, reduce this burden, a further problem arises: since specifying details of execution is now automatic, a workflow's results are harder to interpret, as they are partly due to specifics of execution. By automating steps between the experiment design and its results, we lose the connection between them, hindering interpretation of results. To reconnect the scientific data with the original experiment, we argue that scientists should have access to the full provenance of their data, including not only parameters, inputs and intermediary data, but also the abstract experiment, refined into a concrete execution by the "workflow compiler". In this paper, we describe preliminary work on adapting Pegasus to capture the process of workflow refinement in the PASOA provenance system.
Abstract:
This paper describes the experience of implementing and developing the journal portal of the Facultad de Humanidades y Ciencias de Educación of the Universidad Nacional de La Plata, so that it can be of use to anyone undertaking similar initiatives. It first reviews the Faculty's history of publishing scientific journals and the library's work to increase their visibility. It then presents the tasks carried out by the Faculty's Prosecretaría de Gestión Editorial y Difusión (PGEyD) to launch the portal, with particular attention to the customization of the software, the methodology used for bulk loading of information into the system (users and back issues), and the procedures that allow all of the portal's content to be included semi-automatically in the institutional repository and the web catalogue. Next, it refers to the ongoing work on support and training for editors. It then reports the results achieved so far in one year of work: creation of 10 journals, migration of 4 complete titles, and inclusion of 25% of the contributions published in the journals edited by FaHCE. In closing, it sets out a series of challenges that the Prosecretaría has set itself to improve the portal and optimize intra- and inter-institutional workflows.
Abstract:
Virtualized infrastructures are a promising way of providing flexible and dynamic computing solutions for resource-consuming tasks. Scientific workflows are one such kind of task, as they need a large amount of computational resources during certain periods of time. To provide the best infrastructure configuration for a workflow, it is necessary to explore as many providers as possible, taking into account different criteria such as Quality of Service, pricing, response time, network latency, etc. Moreover, each of these new resources must be tuned to provide the tools and dependencies required by each of the steps of the workflow. Working with different infrastructure providers, either public or private, each using its own concepts and terms, and with a set of heterogeneous applications requires a framework for integrating all the information about these elements. This work proposes semantic technologies for describing and integrating all the information about the different components of the overall system and a set of policies created by the user. Based on this information, a scheduling process will be performed to generate an infrastructure configuration defining the set of virtual machines that must be run and the tools that must be deployed on them.
Abstract:
We describe a corpus of provenance traces that we have collected by executing 120 real-world scientific workflows. The workflows come from two different workflow systems, Taverna [5] and Wings [3], and from 12 different application domains (see Figure 1). Table 1 provides a summary of this PROV-corpus.
Abstract:
Workflows are increasingly used to manage and share scientific computations and methods. Workflow tools can be used to design, validate, execute and visualize scientific workflows and their execution results. Other tools manage workflow libraries or mine their contents. There has been a lot of recent work on workflow system integration as well as common workflow interlinguas, but the interoperability among workflow systems remains a challenge. Ideally, these tools would form a workflow ecosystem such that it should be possible to create a workflow with a tool, execute it with another, visualize it with another, and use yet another tool to mine a repository of such workflows or their executions. In this paper, we describe our approach to create a workflow ecosystem through the use of standard models for provenance (OPM and W3C PROV) and extensions (P-PLAN and OPMW) to represent workflows. The ecosystem integrates different workflow tools with diverse functions (workflow generation, execution, browsing, mining, and visualization) created by a variety of research groups. This is, to our knowledge, the first time that such a variety of workflow systems and functions are integrated.