949 resultados para Scientific Workflows


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Los flujos de trabajo científicos han sido adoptados durante la última década para representar los métodos computacionales utilizados en experimentos in silico, así como para dar soporte a sus publicaciones asociadas. Dichos flujos de trabajo han demostrado ser útiles para compartir y reproducir experimentos científicos, permitiendo a investigadores visualizar, depurar y ahorrar tiempo a la hora de re-ejecutar un trabajo realizado con anterioridad. Sin embargo, los flujos de trabajo científicos pueden ser en ocasiones difíciles de entender y reutilizar. Esto es debido a impedimentos como el gran número de flujos de trabajo existentes en repositorios, su heterogeneidad o la falta generalizada de documentación y ejemplos de uso. Además, dado que normalmente es posible implementar un mismo método utilizando algoritmos o técnicas distintas, flujos de trabajo aparentemente distintos pueden estar relacionados a un determinado nivel de abstracción, basándose, por ejemplo, en su funcionalidad común. Esta tesis se centra en la reutilización de flujos de trabajo y su abstracción mediante la exploración de relaciones entre los flujos de trabajo de un repositorio y la extracción de abstracciones que podrían ayudar a la hora de reutilizar otros flujos de trabajo existentes. Para ello, se propone un modelo simple de representación de flujos de trabajo y sus ejecuciones, se analizan las abstracciones típicas que se pueden encontrar en los repositorios de flujos de trabajo, se exploran las prácticas habituales de los usuarios a la hora de reutilizar flujos de trabajo existentes y se describe un método para descubrir abstracciones útiles para usuarios, basadas en técnicas existentes de teoría de grafos. Los resultados obtenidos exponen las abstracciones y prácticas comunes de usuarios en términos de reutilización de flujos de trabajo, y muestran cómo las abstracciones que se extraen automáticamente tienen potencial para ser reutilizadas por usuarios que buscan diseñar nuevos flujos de trabajo. Abstract Scientific workflows have been adopted in the last decade to represent the computational methods used in in silico scientific experiments and their associated research products. Scientific workflows have demonstrated to be useful for sharing and reproducing scientific experiments, allowing scientists to visualize, debug and save time when re-executing previous work. However, scientific workflows may be difficult to understand and reuse. The large amount of available workflows in repositories, together with their heterogeneity and lack of documentation and usage examples may become an obstacle for a scientist aiming to reuse the work from other scientists. Furthermore, given that it is often possible to implement a method using different algorithms or techniques, seemingly disparate workflows may be related at a higher level of abstraction, based on their common functionality. In this thesis we address the issue of reusability and abstraction by exploring how workflows relate to one another in a workflow repository, mining abstractions that may be helpful for workflow reuse. In order to do so, we propose a simple model for representing and relating workflows and their executions, we analyze the typical common abstractions that can be found in workflow repositories, we explore the current practices of users regarding workflow reuse and we describe a method for discovering useful abstractions for workflows based on existing graph mining techniques. Our results expose the common abstractions and practices of users in terms of workflow reuse, and show how our proposed abstractions have potential to become useful for users designing new workflows.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

La reproducibilidad de estudios y resultados científicos es una meta a tener en cuenta por cualquier científico a la hora de publicar el producto de una investigación. El auge de la ciencia computacional, como una forma de llevar a cabo estudios empíricos haciendo uso de modelos matemáticos y simulaciones, ha derivado en una serie de nuevos retos con respecto a la reproducibilidad de dichos experimentos. La adopción de los flujos de trabajo como método para especificar el procedimiento científico de estos experimentos, así como las iniciativas orientadas a la conservación de los datos experimentales desarrolladas en las últimas décadas, han solucionado parcialmente este problema. Sin embargo, para afrontarlo de forma completa, la conservación y reproducibilidad del equipamiento computacional asociado a los flujos de trabajo científicos deben ser tenidas en cuenta. La amplia gama de recursos hardware y software necesarios para ejecutar un flujo de trabajo científico hace que sea necesario aportar una descripción completa detallando que recursos son necesarios y como estos deben de ser configurados. En esta tesis abordamos la reproducibilidad de los entornos de ejecución para flujos de trabajo científicos, mediante su documentación usando un modelo formal que puede ser usado para obtener un entorno equivalente. Para ello, se ha propuesto un conjunto de modelos para representar y relacionar los conceptos relevantes de dichos entornos, así como un conjunto de herramientas que hacen uso de dichos módulos para generar una descripción de la infraestructura, y un algoritmo capaz de generar una nueva especificación de entorno de ejecución a partir de dicha descripción, la cual puede ser usada para recrearlo usando técnicas de virtualización. Estas contribuciones han sido aplicadas a un conjunto representativo de experimentos científicos pertenecientes a diferentes dominios de la ciencia, exponiendo cada uno de ellos diferentes requisitos hardware y software. Los resultados obtenidos muestran la viabilidad de propuesta desarrollada, reproduciendo de forma satisfactoria los experimentos estudiados en diferentes entornos de virtualización. ABSTRACT Reproducibility of scientific studies and results is a goal that every scientist must pursuit when announcing research outcomes. The rise of computational science, as a way of conducting empirical studies by using mathematical models and simulations, have opened a new range of challenges in this context. The adoption of workflows as a way of detailing the scientific procedure of these experiments, along with the experimental data conservation initiatives that have been undertaken during last decades, have partially eased this problem. However, in order to fully address it, the conservation and reproducibility of the computational equipment related to them must be also considered. The wide range of software and hardware resources required to execute a scientific workflow implies that a comprehensive description detailing what those resources are and how they are arranged is necessary. In this thesis we address the issue of reproducibility of execution environments for scientific workflows, by documenting them in a formalized way, which can be later used to obtain and equivalent one. In order to do so, we propose a set of semantic models for representing and relating the relevant information of those environments, as well as a set of tools that uses these models for generating a description of the infrastructure, and an algorithmic process that consumes these descriptions for deriving a new execution environment specification, which can be enacted into a new equivalent one using virtualization solutions. We apply these three contributions to a set of representative scientific experiments, belonging to different scientific domains, and exposing different software and hardware requirements. The obtained results prove the feasibility of the proposed approach, by successfully reproducing the target experiments under different virtualization environments.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A ciência tem feito uso frequente de recursos computacionais para execução de experimentos e processos científicos, que podem ser modelados como workflows que manipulam grandes volumes de dados e executam ações como seleção, análise e visualização desses dados segundo um procedimento determinado. Workflows científicos têm sido usados por cientistas de várias áreas, como astronomia e bioinformática, e tendem a ser computacionalmente intensivos e fortemente voltados à manipulação de grandes volumes de dados, o que requer o uso de plataformas de execução de alto desempenho como grades ou nuvens de computadores. Para execução dos workflows nesse tipo de plataforma é necessário o mapeamento dos recursos computacionais disponíveis para as atividades do workflow, processo conhecido como escalonamento. Plataformas de computação em nuvem têm se mostrado um alternativa viável para a execução de workflows científicos, mas o escalonamento nesse tipo de plataforma geralmente deve considerar restrições específicas como orçamento limitado ou o tipo de recurso computacional a ser utilizado na execução. Nesse contexto, informações como a duração estimada da execução ou limites de tempo e de custo (chamadas aqui de informações de suporte ao escalonamento) são importantes para garantir que o escalonamento seja eficiente e a execução ocorra de forma a atingir os resultados esperados. Este trabalho identifica as informações de suporte que podem ser adicionadas aos modelos de workflows científicos para amparar o escalonamento e a execução eficiente em plataformas de computação em nuvem. É proposta uma classificação dessas informações, e seu uso nos principais Sistemas Gerenciadores de Workflows Científicos (SGWC) é analisado. Para avaliar o impacto do uso das informações no escalonamento foram realizados experimentos utilizando modelos de workflows científicos com diferentes informações de suporte, escalonados com algoritmos que foram adaptados para considerar as informações inseridas. Nos experimentos realizados, observou-se uma redução no custo financeiro de execução do workflow em nuvem de até 59% e redução no makespan chegando a 8,6% se comparados à execução dos mesmos workflows sendo escalonados sem nenhuma informação de suporte disponível.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper describes the use of the Business Process Execution Language for Web Services (BPEL4WS/BPEL) for managing scientific workflows. This work is result of our attempt to adopt Service Oriented Architecture in order to perform Web services – based simulation of metal vapor lasers. Scientific workflows can be more demanding in their requirements than business processes. In the context of addressing these requirements, the features of the BPEL4WS specification are discussed, which is widely regarded as the de-facto standard for orchestrating Web services for business workflows. A typical use case of calculation the electric field potential and intensity distributions is discussed as an example of building a BPEL process to perform distributed simulation constructed by loosely-coupled services.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Variations that exist in the treatment of patients (with similar symptoms) across different hospitals do substantially impact the quality and costs of healthcare. Consequently, it is important to understand the similarities and differences between the practices across different hospitals. This paper presents a case study on the application of process mining techniques to measure and quantify the differences in the treatment of patients presenting with chest pain symptoms across four South Australian hospitals. Our case study focuses on cross-organisational benchmarking of processes and their performance. Techniques such as clustering, process discovery, performance analysis, and scientific workflows were applied to facilitate such comparative analyses. Lessons learned in overcoming unique challenges in cross-organisational process mining, such as ensuring population comparability, data granularity comparability, and experimental repeatability are also presented.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support a data-driven scientific discovery in complex and explorative bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology preserving projection for the visualization of the input data and their similarities. The core algorithm in the BioDICE plugin is Fast Learning Self Organizing Map (FLSOM), which is an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows the visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study related to chemical compounds. Conclusions: The number and variety of available tools and its extensibility have made Taverna a popular choice for the development of scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool, which can be adopted for the explorative analysis of biological datasets.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A description of a data item's provenance can be provided in dierent forms, and which form is best depends on the intended use of that description. Because of this, dierent communities have made quite distinct underlying assumptions in their models for electronically representing provenance. Approaches deriving from the library and archiving communities emphasise agreed vocabulary by which resources can be described and, in particular, assert their attribution (who created the resource, who modied it, where it was stored etc.) The primary purpose here is to provide intuitive metadata by which users can search for and index resources. In comparison, models for representing the results of scientific workflows have been developed with the assumption that each event or piece of intermediary data in a process' execution can and should be documented, to give a full account of the experiment undertaken. These occurrences are connected together by stating where one derived from, triggered, or otherwise caused another, and so form a causal graph. Mapping between the two approaches would be benecial in integrating systems and exploiting the strengths of each. In this paper, we specify such a mapping between Dublin Core and the Open Provenance Model. We further explain the technical issues to overcome and the rationale behind the approach, to allow the same method to apply in mapping similar schemes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The rule creation to clone selection in different projects is a hard task to perform by using traditional implementations to control all the processes of the system. The use of an algebraic language is an alternative approach to manage all of system flow in a flexible way. In order to increase the power of versatility and consistency in defining the rules for optimal clone selection, this paper presents the software OCI 2 in which uses process algebra in the flow behavior of the system. OCI 2, controlled by an algebraic approach was applied in the rules elaboration for clone selection containing unique genes in the partial genome of the bacterium Bradyrhizobium elkanii Semia 587 and in the whole genome of the bacterium Xanthomonas axonopodis pv. citri. Copyright© (2009) by the International Society for Research in Science and Technology.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Due to the wide diversity of unknown organisms in the environment, 99% of them cannot be grown in traditional culture medium in laboratories. Therefore, metagenomics projects are proposed to study microbial communities present in the environment, from molecular techniques, especially the sequencing. Thereby, for the coming years it is expected an accumulation of sequences produced by these projects. Thus, the sequences produced by genomics and metagenomics projects present several challenges for the treatment, storing and analysis such as: the search for clones containing genes of interest. This work presents the OCI Metagenomics, which allows defines and manages dynamically the rules of clone selection in metagenomic libraries, thought an algebraic approach based on process algebra. Furthermore, a web interface was developed to allow researchers to easily create and execute their own rules to select clones in genomic sequence database. This software has been tested in metagenomic cosmid library and it was able to select clones containing genes of interest. Copyright 2010 ACM.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We describe a corpus of provenance traces that we have collected by executing 120 real world scientific workflows. The workflows are from two different workflow systems: Taverna [5] and Wings [3], and 12 different application domains (see Figure 1). Table 1 provides a summary of this PROV-corpus.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

New digital artifacts are emerging in data-intensive science. For example, scientific workflows are executable descriptions of scientific procedures that define the sequence of computational steps in an automated data analysis, supporting reproducible research and the sharing and replication of best-practice and know-how through reuse. Workflows are specified at design time and interpreted through their execution in a variety of situations, environments, and domains. Hence it is essential to preserve both their static and dynamic aspects, along with the research context in which they are used. To achieve this, we propose the use of multidimensional digital objects (Research Objects) that aggregate the resources used and/or produced in scientific investigations, including workflow models, provenance of their executions, and links to the relevant associated resources, along with the provision of technological support for their preservation and efficient retrieval and reuse. In this direction, we specified a software architecture for the design and implementation of a Research Object preservation system, and realized this architecture with a set of services and clients, drawing together practices in digital libraries, preservation systems, workflow management, social networking and Semantic Web technologies. In this paper, we describe the backbone system of this realization, a digital library system built on top of dLibra.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Workflows are increasingly used to manage and share scientific computations and methods. Workflow tools can be used to design, validate, execute and visualize scientific workflows and their execution results. Other tools manage workflow libraries or mine their contents. There has been a lot of recent work on workflow system integration as well as common workflow interlinguas, but the interoperability among workflow systems remains a challenge. Ideally, these tools would form a workflow ecosystem such that it should be possible to create a workflow with a tool, execute it with another, visualize it with another, and use yet another tool to mine a repository of such workflows or their executions. In this paper, we describe our approach to create a workflow ecosystem through the use of standard models for provenance (OPM and W3C PROV) and extensions (P-PLAN and OPMW) to represent workflows. The ecosystem integrates different workflow tools with diverse functions (workflow generation, execution, browsing, mining, and visualization) created by a variety of research groups. This is, to our knowledge, the first time that such a variety of workflow systems and functions are integrated.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Two important characteristics of science are the ?reproducibility? and ?clarity?. By rigorous practices, scientists explore aspects of the world that they can reproduce under carefully controlled experimental conditions. The clarity, complementing reproducibility, provides unambiguous descriptions of results in a mechanical or mathematical form. Both pillars depend on well-structured and accurate descriptions of scientific practices, which are normally recorded in experimental protocols, scientific workflows, etc. Here we present SMART Protocols (SP), our ontology-based approach for representing experimental protocols and our contribution to clarity and reproducibility. SP delivers an unambiguous description of processes by means of which data is produced; by doing so, we argue, it facilitates reproducibility. Moreover, SP is thought to be part of e-science infrastructures. SP results from the analysis of 175 protocols; from this dataset, we extracted common elements. From our analysis, we identified document, workflow and domain-specific aspects in the representation of experimental protocols. The ontology is available at http://purl.org/net/SMARTprotocol

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Los modelos ecológicos se han convertido en una pieza clave de esta ciencia. La generación de conocimiento se consigue en buena medida mediante procesos analíticos más o menos complejos aplicados sobre conjuntos de datos diversos. Pero buena parte del conocimiento necesario para diseñar e implementar esos modelos no está accesible a la comunidad científica. Proponemos la creación de herramientas informáticas para documentar, almacenar y ejecutar modelos ecológicos y flujos de trabajo. Estas herramientas (repositorios de modelos) están siendo desarrolladas por otras disciplinas como la biología molecular o las ciencias de la Tierra. Presentamos un repositorio de modelos (ModeleR) desarrollado en el contexto del Observatorio de seguimiento del cambio global de Sierra Nevada (Granada-Almería). Creemos que los repositorios de modelos fomentarán la cooperación entre científicos, mejorando la creación de conocimiento relevante que podría ser transferido a los tomadores de decisiones.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper concerns the application of recent information technologies for creating a software system for numerical simulations in the domain of plasma physics and in particular metal vapor lasers. The presented work is connected with performing modernization of legacy physics software for reuse on the web and inside a Service-Oriented Architecture environment. Applied and described is the creation of Java front-ends of legacy C++ and FORTRAN codes. Then the transformation of some of the scientific components into web services, as well as the creation of a web interface to the legacy application, is presented. The use of the BPEL language for managing scientific workflows is also considered.