998 results for Scientific workflow


Relevance:

100.00%

Publisher:

Abstract:

In recent years, several scientific Workflow Management Systems (WfMSs) have been developed with the aim of automating large-scale scientific experiments. Many offerings now exist, but none has been adopted as an accepted standard. In this paper we propose a pattern-based evaluation of three of the most widely used scientific WfMSs: Kepler, Taverna and Triana. The aim is to compare them with traditional business WfMSs, highlighting the strengths and deficiencies of both classes of systems. Moreover, a set of new patterns is defined from the analysis of the three systems considered.

Relevance:

100.00%

Publisher:

Abstract:

Increasingly, scientists use collections of software tools in their research. These tools are typically used in concert, often necessitating laborious and error-prone manual data reformatting and transfer. We present an intuitive workflow environment, GPFlow, to support scientists in their research. GPFlow wraps legacy tools and presents a high-level, interactive, web-based front end to scientists. The workflow backend is realized by a commercial-grade workflow engine (Windows Workflow Foundation). The workflow model is inspired by spreadsheets and is novel in supporting an intuitive, interactive style of experimentation required by many scientists, e.g. bioinformaticians. We apply GPFlow to two bioinformatics experiments and demonstrate its flexibility and simplicity.
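
As a rough illustration of the spreadsheet-inspired model described above, the following Python sketch shows cells that wrap processing steps and recompute downstream values when re-evaluated; the cells and functions are assumptions for illustration, not GPFlow's implementation or API.

    # Illustrative only: hypothetical cells wrapping simple functions in place of legacy tools.
    class Cell:
        def __init__(self, name, func, *inputs):
            self.name, self.func, self.inputs = name, func, inputs

        def evaluate(self):
            # Evaluate upstream cells first, then apply this cell's wrapped step.
            args = [c.evaluate() if isinstance(c, Cell) else c for c in self.inputs]
            return self.func(*args)

    raw = Cell("raw", lambda: "ACGTNACGT")                      # data source
    clean = Cell("clean", lambda s: s.replace("N", ""), raw)    # reformatting step
    stats = Cell("stats", lambda s: {"length": len(s)}, clean)  # analysis step
    print(stats.evaluate())   # editing 'raw' and re-evaluating propagates downstream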

Relevance:

100.00%

Publisher:

Abstract:

E-scientists want to run their scientific experiments on Distributed Computing Infrastructures (DCIs) to be able to access large pools of resources and services. Running experiments on these infrastructures requires specific expertise that e-scientists may not have. Workflows can hide resources and services behind a virtualization layer that provides a user interface e-scientists can use. Many workflow systems are used by research communities, but they are not interoperable. Learning a workflow system and creating workflows in it may require significant effort from e-scientists, so it is not reasonable to expect research communities to learn new workflow systems simply to run workflows developed in other systems. The solution is to create workflow interoperability solutions that allow workflow sharing. The FP7 Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs (SHIWA) project developed two interoperability solutions to support workflow sharing: Coarse-Grained Interoperability (CGI) and Fine-Grained Interoperability (FGI). The project created the SHIWA Simulation Platform (SSP) to implement the Coarse-Grained Interoperability approach as a production-level service for research communities. The paper describes the CGI approach and how it enables sharing and combining existing workflows into complex applications and running them on Distributed Computing Infrastructures. The paper also outlines the architecture, components and usage scenarios of the simulation platform.
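
The coarse-grained idea of combining complete workflows from different systems can be pictured with the following Python sketch; the adapters, file names and dispatch logic are hypothetical placeholders for illustration, not SHIWA components or real engine invocations.

    # Hypothetical adapters standing in for native engine submission; illustration only.
    def submit_taverna(workflow_file, inputs):
        print(f"[taverna adapter] would submit {workflow_file} with {inputs} to a DCI")
        return f"{workflow_file}.results"

    def submit_kepler(workflow_file, inputs):
        print(f"[kepler adapter] would submit {workflow_file} with {inputs} to a DCI")
        return f"{workflow_file}.results"

    ADAPTERS = {"taverna": submit_taverna, "kepler": submit_kepler}

    def run_meta_workflow(stages, initial_inputs):
        """stages: ordered (engine, workflow_file) pairs; each stage consumes the previous output."""
        data = initial_inputs
        for engine, workflow_file in stages:
            data = ADAPTERS[engine](workflow_file, data)
        return data

    run_meta_workflow([("taverna", "preprocess.t2flow"), ("kepler", "simulate.kar")],
                      "raw_measurements.dat")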

Relevance:

100.00%

Publisher:

Abstract:

Provenance plays a major role when understanding and reusing the methods applied in a scientific experiment, as it provides a record of inputs, the processes carried out, and the use and generation of intermediate and final results. In the specific case of in-silico scientific experiments, a large variety of scientific workflow systems (e.g., Wings, Taverna, Galaxy, Vistrails) have been created to support scientists. All of these systems produce some sort of provenance about the executions of the workflows that encode scientific experiments. However, provenance is normally recorded at a very low level of detail, which complicates the understanding of what happened during execution. In this paper we propose an approach to automatically obtain abstractions from low-level provenance data by finding common workflow fragments in workflow execution provenance and relating them to templates. We have tested our approach with a dataset of workflows published by the Wings workflow system. Our results show that by using these kinds of abstractions we can highlight the most common abstract methods used in the executions of a repository, relating different runs and workflow templates with each other.
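
To give a flavour of the fragment-mining idea described above, here is a deliberately simplified Python sketch; it treats each execution trace as a flat sequence of step types, whereas the paper works on provenance graphs and templates, so this is an assumption for illustration only.

    from collections import Counter

    def frequent_fragments(runs, length=2, min_support=2):
        """Count contiguous step fragments of a given length across execution traces."""
        counts = Counter()
        for steps in runs:
            for i in range(len(steps) - length + 1):
                counts[tuple(steps[i:i + length])] += 1
        return [(frag, n) for frag, n in counts.most_common() if n >= min_support]

    runs = [
        ["download", "clean", "align", "plot"],
        ["download", "clean", "align", "summarize"],
        ["clean", "align", "plot"],
    ]
    # ('clean', 'align') appears in all three runs and would be a candidate abstraction.
    print(frequent_fragments(runs))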

Relevance:

100.00%

Publisher:

Abstract:

Scientific workflow offers a framework for cooperation between remote and shared resources in a grid computing environment (GCE) for scientific discovery. One major function of scientific workflow is to schedule a collection of computational subtasks in well-defined orders for efficient outputs by estimating task durations at runtime. In this paper, we propose a novel time computation model based on algorithm complexity (termed the TCMAC model) for high-level data-intensive scientific workflow design. The proposed model schedules the subtasks based on their durations and the complexities of the participant algorithms. Characterized by its use of a task duration computation function for time efficiency, the TCMAC model has three features for a full-aspect scientific workflow including both dataflow and control flow: (1) it provides flexible and reusable task duration functions in the GCE; (2) it facilitates better parallelism in iteration structures by providing more precise task durations; and (3) it accommodates dynamic task durations for rescheduling in selective structures of the control flow. We also present theories and examples in scientific workflows to show the efficiency of the TCMAC model, especially for control flow. Copyright © 2009 John Wiley & Sons, Ltd.
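
A complexity-based duration function of the kind such a model relies on might look like the following Python sketch; the function, unit cost and speed factor are assumptions for illustration, not the paper's formulation.

    import math

    def duration_estimate(complexity, input_size, unit_cost, host_speed):
        """Estimate a task's duration in seconds for a given input size.

        complexity  -- callable mapping input size n to an abstract operation count
        unit_cost   -- calibrated seconds per abstract operation on a reference host
        host_speed  -- relative speed of the target grid node (1.0 = reference host)
        """
        return complexity(input_size) * unit_cost / host_speed

    # Example: a sort-like subtask, O(n log n), over one million records on a node
    # twice as fast as the reference machine (all numbers are made up).
    t = duration_estimate(lambda n: n * math.log2(n), 1_000_000, 1e-7, 2.0)
    print(f"estimated duration: {t:.2f} s")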

Relevance:

100.00%

Publisher:

Abstract:

Scientific workflows are complicated, data-intensive applications. How to achieve an effective data placement schema in a hybrid cloud environment has become a crucial issue, especially given the new challenges brought by security requirements. Traditional data placement strategies usually adopt a load-balancing-based partition model to allocate datasets. Although these data placement schemas can perform well in load balancing, their data transfer time may not be optimal. In contrast to traditional strategies, this paper focuses on the hybrid cloud environment and proposes a data dependency destruction-based partition model to achieve the partition with minimal data dependency destruction. In addition, it presents a novel datacenter-oriented data placement strategy. This strategy allocates high-dependency datasets to one datacenter according to the new partition model and thus significantly reduces data transfer time between datacenters. Experimental results show that the proposed strategy can effectively reduce data transfer time during workflow execution.
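
The intuition behind dependency-aware placement can be sketched in Python as follows; this greedy version with a simple co-use count is an assumption for illustration and is much simpler than the paper's partition model.

    from itertools import combinations
    from collections import defaultdict

    def dependency_matrix(tasks):
        """tasks: list of sets of dataset names consumed together by a single task."""
        dep = defaultdict(int)
        for used in tasks:
            for a, b in combinations(sorted(used), 2):
                dep[(a, b)] += 1          # each co-use strengthens the dependency
        return dep

    def greedy_placement(datasets, dep, capacities):
        """Assign each dataset to the datacenter where its dependency with already-placed
        datasets is highest, subject to a simple per-datacenter capacity limit."""
        placement, load = {}, defaultdict(int)
        for d in datasets:
            best, best_score = None, -1
            for dc, cap in capacities.items():
                if load[dc] >= cap:
                    continue
                score = sum(v for (a, b), v in dep.items()
                            if (a == d and placement.get(b) == dc)
                            or (b == d and placement.get(a) == dc))
                if score > best_score:
                    best, best_score = dc, score
            placement[d] = best
            load[best] += 1
        return placement

    tasks = [{"d1", "d2"}, {"d1", "d2", "d3"}, {"d4", "d5"}]
    dep = dependency_matrix(tasks)
    print(greedy_placement(["d1", "d2", "d3", "d4", "d5"], dep, {"dc_a": 3, "dc_b": 3}))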

Relevance:

100.00%

Publisher:

Abstract:

Scientific processes are usually time-constrained, with overall deadlines and local milestones. In scientific workflow systems, due to the dynamic nature of the underlying computing infrastructures such as grids and clouds, execution delays often take place and result in a large number of temporal violations. Temporal violation handling executes violation handling strategies that can compensate for the incurred time deficit, but it imposes some additional cost. Generally speaking, the two fundamental requirements for delivering satisfactory temporal QoS in scientific workflow systems are temporal conformance and cost effectiveness. Every task for workflow temporal management incurs some cost. Taking a single temporal violation handling as an example, its cost consists primarily of the monetary costs and time overheads of the violation handling strategies, which are normally nontrivial in scientific workflow systems.
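
One simple way to picture the trade-off between time deficit, handling overhead and monetary cost is the following Python sketch; the strategy names, numbers and selection rule are assumptions for illustration, not taken from the paper.

    from dataclasses import dataclass

    @dataclass
    class HandlingStrategy:
        name: str
        compensation: float   # seconds of time deficit the strategy can recover
        overhead: float       # seconds the strategy itself takes to execute
        monetary_cost: float  # cost units, e.g. extra cloud resources

    def choose_strategy(deficit, strategies):
        """Pick the cheapest strategy whose net time gain covers the deficit."""
        feasible = [s for s in strategies if s.compensation - s.overhead >= deficit]
        return min(feasible, key=lambda s: s.monetary_cost, default=None)

    strategies = [
        HandlingStrategy("local rescheduling", 30.0, 5.0, 0.0),
        HandlingStrategy("extra resource recruitment", 120.0, 10.0, 8.0),
    ]
    print(choose_strategy(40.0, strategies))   # -> extra resource recruitment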

Relevance:

80.00%

Publisher:

Abstract:

Grid workflow authoring tools are typically specific to particular workflow engines built into Grid middleware, or are application-specific and designed to interact with specific software implementations. g-Eclipse is a middleware-independent Grid workbench that aims to provide a unified abstraction of the Grid and includes a Grid workflow builder that allows users to author and deploy workflows to the Grid. This paper describes the g-Eclipse Workflow Builder and its implementations for two Grid middlewares, gLite and GRIA, and a case study utilizing the Workflow Builder in a Grid user's scientific workflow deployment.

Relevance:

80.00%

Publisher:

Abstract:

Reproducibility of scientific studies and results is a goal that every scientist must pursue when announcing research outcomes. The rise of computational science, as a way of conducting empirical studies using mathematical models and simulations, has opened a new range of challenges in this context. The adoption of workflows as a way of detailing the scientific procedure of these experiments, along with the experimental data conservation initiatives undertaken during recent decades, has partially eased this problem. However, in order to fully address it, the conservation and reproducibility of the computational equipment related to these workflows must also be considered. The wide range of software and hardware resources required to execute a scientific workflow implies that a comprehensive description detailing what those resources are and how they must be arranged is necessary. In this thesis we address the reproducibility of execution environments for scientific workflows by documenting them in a formalized way, which can later be used to obtain an equivalent environment. To that end, we propose a set of semantic models for representing and relating the relevant information of those environments, as well as a set of tools that use these models to generate a description of the infrastructure, and an algorithmic process that consumes these descriptions to derive a new execution environment specification, which can be enacted into a new equivalent environment using virtualization solutions. We apply these three contributions to a set of representative scientific experiments, belonging to different scientific domains and exposing different software and hardware requirements. The obtained results show the feasibility of the proposed approach, successfully reproducing the target experiments under different virtualization environments.
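
As a rough illustration of turning a declared execution environment into something a virtualization tool can consume, the following Python sketch assumes a plain dictionary description and a Dockerfile-style output; it is not the thesis's semantic models or tooling.

    def to_container_spec(env):
        """Render an environment description as Dockerfile-style build instructions."""
        lines = [f"FROM {env['base_image']}"]
        for pkg in env.get("system_packages", []):
            lines.append(f"RUN apt-get update && apt-get install -y {pkg}")
        for name, version in env.get("python_packages", {}).items():
            lines.append(f"RUN pip install {name}=={version}")
        for var, value in env.get("variables", {}).items():
            lines.append(f"ENV {var}={value}")
        return "\n".join(lines)

    environment = {                          # hypothetical description, illustration only
        "base_image": "ubuntu:20.04",
        "system_packages": ["openjdk-11-jre"],
        "python_packages": {"numpy": "1.24.0"},
        "variables": {"WORKFLOW_HOME": "/opt/workflow"},
    }
    print(to_container_spec(environment))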

Relevance:

80.00%

Publisher:

Abstract:

Many scientific workflows are data-intensive: large volumes of intermediate data are generated during their execution. Some valuable intermediate data need to be stored for sharing or reuse. Traditionally, they are selectively stored according to the system storage capacity, determined manually. As doing science in the cloud has become popular, more intermediate data can be stored in scientific cloud workflows under a pay-for-use model. In this paper, we build an intermediate data dependency graph (IDG) from the data provenance in scientific workflows. With the IDG, deleted intermediate data can be regenerated, and as such we develop a novel intermediate data storage strategy that can reduce the cost of scientific cloud workflow systems by automatically storing appropriate intermediate data sets with one cloud service provider. The strategy has significant research merits: it achieves a cost-effective trade-off between computation cost and storage cost and is not strongly impacted by the forecasting inaccuracy of data set usage. Meanwhile, the strategy also takes the users' tolerance of data access delay into consideration. We utilize Amazon's cost model and apply the strategy to general random workflows as well as a specific astrophysics pulsar searching workflow for evaluation. The results show that our strategy can significantly reduce the overall cost of scientific cloud workflow execution.
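
The core store-versus-regenerate trade-off can be sketched as follows in Python; the prices and usage figures are assumptions for illustration (on the order of public cloud pricing), not the paper's cost model or results.

    STORAGE_PRICE = 0.023   # $/GB/month, assumed, roughly the order of public cloud object storage
    COMPUTE_PRICE = 0.10    # $/CPU-hour, assumed

    def should_store(size_gb, regen_cpu_hours, expected_uses_per_month):
        """Store an intermediate data set only if that is cheaper than regenerating it on demand."""
        monthly_storage_cost = size_gb * STORAGE_PRICE
        monthly_regen_cost = regen_cpu_hours * COMPUTE_PRICE * expected_uses_per_month
        return monthly_storage_cost < monthly_regen_cost

    # Illustrative numbers: a 90 GB intermediate set, 20 CPU-hours to regenerate,
    # expected to be reused about twice a month.
    print(should_store(90, 20, 2))   # True: ~$2.07/month to store vs ~$4.00/month to regenerate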

Relevance:

70.00%

Publisher:

Abstract:

Thesis (Master's)--University of Washington, 2015

Relevance:

70.00%

Publisher:

Abstract:

While workflow technology has gained momentum in the last decade as a means for specifying and enacting computational experiments in modern science, reusing and repurposing existing workflows to build new scientific experiments is still a daunting task. This is partly due to the difficulty that scientists experience when attempting to understand existing workflows, which contain several data preparation and adaptation steps in addition to the scientifically significant analysis steps. One way to tackle the understandability problem is to provide abstractions that give a high-level view of the activities undertaken within workflows. As a first step towards such abstractions, we report in this paper on the results of a manual analysis performed over a set of real-world scientific workflows from the Taverna and Wings systems. Our analysis has resulted in a set of scientific workflow motifs that outline (i) the kinds of data-intensive activities that are observed in workflows (data-oriented motifs), and (ii) the different manners in which activities are implemented within workflows (workflow-oriented motifs). These motifs can be useful for informing workflow designers about good and bad practices for workflow development, for informing the design of automated tools for the generation of workflow abstractions, etc.

Relevance:

70.00%

Publisher:

Abstract:

Workflow technology continues to play an important role as a means for specifying and enacting computational experiments in modern science. Reusing and re-purposing workflows allow scientists to do new experiments faster, since the workflows capture useful expertise from others. As workflow libraries grow, scientists face the challenge of finding workflows appropriate for their task, understanding what each workflow does, and reusing relevant portions of a given workflow. We believe that workflows would be easier to understand and reuse if high-level views (abstractions) of their activities were available in workflow libraries. As a first step towards obtaining these abstractions, we report in this paper on the results of a manual analysis performed over a set of real-world scientific workflows from Taverna, Wings, Galaxy and Vistrails. Our analysis has resulted in a set of scientific workflow motifs that outline (i) the kinds of data-intensive activities that are observed in workflows (Data-Operation motifs), and (ii) the different manners in which activities are implemented within workflows (Workflow-Oriented motifs). These motifs are helpful to identify the functionality of the steps in a given workflow, to develop best practices for workflow design, and to develop approaches for automated generation of workflow abstractions.

Relevance:

70.00%

Publisher:

Abstract:

Part 14: Interoperability and Integration