562 results for Workflow
Abstract:
It is estimated that the quantity of digital data being transferred, processed or stored at any one time currently stands at 4.4 zettabytes (4.4 × 2^70 bytes) and this figure is expected to have grown by a factor of 10, to 44 zettabytes, by 2020. Exploiting this data is, and will remain, a significant challenge. At present there is the capacity to store 33% of digital data in existence at any one time; by 2020 this capacity is expected to fall to 15%. These statistics suggest that, in the era of Big Data, the identification of important, exploitable data will need to be done in a timely manner. Systems for the monitoring and analysis of data, e.g. stock markets, smart grids and sensor networks, can be made up of massive numbers of individual components. These components can be geographically distributed yet may interact with one another via continuous data streams, which in turn may affect the state of the sender or receiver. This introduces a dynamic causality, which further complicates the overall system by introducing a temporal constraint that is difficult to accommodate. Practical approaches to realising the system described above have led to a multiplicity of analysis techniques, each of which concentrates on specific characteristics of the system being analysed and treats these characteristics as the dominant component affecting the results being sought. The multiplicity of analysis techniques introduces another layer of heterogeneity, that is, heterogeneity of approach, partitioning the field to the extent that results from one domain are difficult to exploit in another. The question asked is whether a generic solution can be identified for the monitoring and analysis of data that: accommodates temporal constraints; bridges the gap between expert knowledge and raw data; and enables data to be effectively interpreted and exploited in a transparent manner. The approach proposed in this dissertation acquires, analyses and processes data in a manner that is free of the constraints of any particular analysis technique, while at the same time facilitating these techniques where appropriate. Constraints are applied by defining a workflow based on the production, interpretation and consumption of data. This supports the application of different analysis techniques to the same raw data without the danger of incorporating any hidden bias that may exist. To illustrate and to realise this approach, a software platform has been created that allows for the transparent analysis of data, combining analysis techniques with a maintainable record of provenance so that independent third-party analysis can be applied to verify any derived conclusions. In order to demonstrate these concepts, a complex real-world example involving the near real-time capture and analysis of neurophysiological data from a neonatal intensive care unit (NICU) was chosen. A system was engineered to gather raw data, analyse that data using different analysis techniques, uncover information, incorporate that information into the system and curate the evolution of the discovered knowledge. The application domain was chosen for three reasons: firstly, because it is complex and no comprehensive solution exists; secondly, because it requires tight interaction with domain experts, thus requiring the handling of subjective knowledge and inference; and thirdly, because, given the dearth of neurophysiologists, there is a real-world need to provide a solution for this domain.
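As a rough illustration of the workflow idea described above (production, interpretation and consumption of data with a maintainable provenance record), the following sketch shows one way such a pipeline could be expressed. It assumes a simple in-memory representation; the Workflow and ProvenanceRecord classes are illustrative only and are not the platform built in the dissertation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, List

@dataclass
class ProvenanceRecord:
    """One entry in the audit trail: which step ran, on what input, producing what."""
    step: str
    timestamp: str
    input_summary: str
    output_summary: str

@dataclass
class Workflow:
    """Chains analysis steps over the same raw data while recording provenance."""
    steps: List[Callable[[Any], Any]] = field(default_factory=list)
    provenance: List[ProvenanceRecord] = field(default_factory=list)

    def run(self, raw_data: Any) -> Any:
        data = raw_data
        for step in self.steps:
            result = step(data)
            self.provenance.append(ProvenanceRecord(
                step=step.__name__,
                timestamp=datetime.now(timezone.utc).isoformat(),
                input_summary=repr(data)[:80],
                output_summary=repr(result)[:80],
            ))
            data = result
        return data

# Example: two independent interpretations of the same raw stream,
# each with its own verifiable provenance trail.
def moving_average(samples):
    return sum(samples) / len(samples)

def peak_detector(samples):
    return max(samples)

raw = [0.1, 0.4, 0.35, 0.9, 0.2]
for technique in (moving_average, peak_detector):
    wf = Workflow(steps=[technique])
    print(technique.__name__, wf.run(raw), wf.provenance)
```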
Abstract:
Nolan and Temple Lang argue that "the ability to express statistical computations is an essential skill." A key related capacity is the ability to conduct and present data analysis in a way that another person can understand and replicate. The copy-and-paste workflow that is an artifact of antiquated user-interface design makes reproducibility of statistical analysis more difficult, especially as data become increasingly complex and statistical methods become increasingly sophisticated. R Markdown is a new technology that makes creating fully reproducible statistical analysis simple and painless. It provides a solution suitable not only for cutting edge research, but also for use in an introductory statistics course. We present experiential and statistical evidence that R Markdown can be used effectively in introductory statistics courses, and discuss its role in the rapidly-changing world of statistical computation.
Abstract:
An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. For many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises are faced with the need for optimizing operation scheduling, improving resource utilization, discovering useful knowledge, and making data-driven decisions.
This thesis research is focused on real-time optimization and knowledge discovery that addresses workflow optimization, resource allocation, as well as data-driven predictions of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable and real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.
On-demand digital-print service is a representative enterprise requiring a powerful EIS. We use real-life data from Reischling Press, Inc. (RPI), a digital-print service provider (PSP), to evaluate our optimization algorithms.
In order to handle the increase in volume and diversity of demands, we first present a high-performance, scalable, and real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order dispatching sequence and balance resource utilization. Compared to prior work, this solution is scalable for a high volume of orders and it provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential benefit in reducing production inefficiencies and enhancing the productivity of an enterprise.
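The abstract above does not spell out the incremental genetic algorithm's encoding or operators, but the core idea of re-optimising a dispatching sequence as new orders arrive can be sketched roughly as follows. This is a simplified illustration under assumed details (swap mutation, a makespan-style cost, earliest-available-resource dispatching), not RPI's or the thesis's actual IGA.

```python
import random

def schedule_cost(order_sequence, durations, num_machines):
    """Makespan of dispatching orders in sequence to the earliest-free machine."""
    machines = [0.0] * num_machines
    for order in order_sequence:
        m = machines.index(min(machines))      # earliest-available resource
        machines[m] += durations[order]
    return max(machines)

def incremental_ga(existing_seq, new_orders, durations, num_machines,
                   pop_size=30, generations=50):
    """Re-optimise when new orders arrive by seeding the population with the
    current schedule extended by the new orders, instead of restarting from scratch."""
    base = existing_seq + new_orders
    population = [base[:]] + [random.sample(base, len(base)) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda s: schedule_cost(s, durations, num_machines))
        survivors = population[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = random.sample(range(len(child)), 2)   # swap mutation
            child[i], child[j] = child[j], child[i]
            children.append(child)
        population = survivors + children
    return min(population, key=lambda s: schedule_cost(s, durations, num_machines))

durations = {f"order{i}": random.uniform(1, 5) for i in range(12)}
current = list(durations)[:8]      # orders already scheduled
arriving = list(durations)[8:]     # newly arriving orders
best = incremental_ga(current, arriving, durations, num_machines=3)
print(best, schedule_cost(best, durations, 3))
```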
We next discuss analysis and prediction of different attributes involved in hierarchical components of an enterprise. We start from a study of the fundamental processes related to real-time prediction. Our process-execution time and process-status prediction models integrate statistical methods with machine-learning algorithms. In addition to improved prediction accuracy compared to stand-alone machine-learning algorithms, this combination also provides a probabilistic estimation of the predicted status. An order generally consists of multiple series and parallel processes. We next introduce an order-fulfillment prediction model that combines the advantages of multiple classification models by incorporating flexible decision-integration mechanisms. Experimental results show that adopting due dates recommended by the model can significantly reduce the enterprise late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze and decompose time-series data into different components according to their hierarchical periodic nature, perform correlation analysis,
and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to a significant increase in mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.
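A highly simplified sketch of this decompose-predict-aggregate strategy might look like the following, assuming a single weekly periodic component, naive per-component forecasts and synthetic data; the thesis's actual decomposition, correlation analysis and multivariate models are not reproduced here.

```python
import numpy as np

def decompose(series, period):
    """Split a series into a periodic (seasonal) component and a remainder.
    The seasonal profile is the mean value at each position within the period."""
    n = len(series)
    seasonal_profile = np.array([series[i::period].mean() for i in range(period)])
    seasonal = np.resize(seasonal_profile, n)
    remainder = series - seasonal
    return seasonal, remainder

def forecast_component(component, horizon, period=None):
    """Naive per-component forecasts: repeat the last full cycle for seasonal parts,
    extend a fitted linear trend for the remainder."""
    if period is not None:
        last_cycle = component[-period:]
        return np.resize(last_cycle, horizon)
    t = np.arange(len(component))
    slope, intercept = np.polyfit(t, component, 1)
    future_t = np.arange(len(component), len(component) + horizon)
    return slope * future_t + intercept

# Hypothetical daily service-level series with a weekly cycle.
rng = np.random.default_rng(0)
t = np.arange(8 * 7)
series = 100 + 0.3 * t + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 2, t.size)

seasonal, remainder = decompose(series, period=7)
horizon = 14
forecast = forecast_component(seasonal, horizon, period=7) + forecast_component(remainder, horizon)
print(forecast.round(1))   # aggregated prediction for the original series
```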
In summary, this thesis research has led to a set of characterization, optimization, and prediction tools for an EIS to derive insightful knowledge from data and use it as guidance for production management. This research is expected to provide solutions for enterprises to increase reconfigurability, automate more procedures, and obtain data-driven recommendations or make effective decisions.
Abstract:
BACKGROUND: The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies. METHODOLOGY/PRINCIPAL FINDINGS: We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators. CONCLUSIONS/SIGNIFICANCE: The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes using ontologies. This is because an ontological representation of the detailed variations in phenotype, whether between mutant and wild type, among individual humans, or across the diversity of species, requires a process by which a precise combination of terms from domain ontologies is selected and organized according to logical relations. The efficiencies that we have developed in this process will be useful for any attempt to annotate complex phenotypic descriptions using ontologies. We also discuss some ramifications of EQ representation for the domain of systematics.
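To make the EQ formalism concrete, an annotation can be thought of as a small structured record linking ontology terms; a possible minimal representation is sketched below. The field names and the ontology identifiers in the example are placeholders for illustration, not verified curated data from the Phenoscape project.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EQStatement:
    """An Entity-Quality annotation: an anatomical entity (from an anatomy ontology)
    qualified by a quality term (from PATO), optionally relative to a second entity."""
    entity_id: str          # e.g. a Teleost Anatomy Ontology term (placeholder ID below)
    entity_label: str
    quality_id: str         # e.g. a Phenotype and Trait Ontology (PATO) term
    quality_label: str
    related_entity_id: Optional[str] = None   # for relational qualities
    taxon_id: str = ""      # Teleost Taxonomy Ontology term for the taxon exhibiting it
    character: str = ""     # free-text character/state the annotation was curated from

    def as_tuple(self):
        return (self.taxon_id, self.entity_id, self.quality_id, self.related_entity_id)

# Hypothetical annotation: "anal fin absent" in some catfish taxon.
# The identifiers below are illustrative placeholders, not verified ontology IDs.
stmt = EQStatement(
    entity_id="TAO:0000103", entity_label="anal fin",
    quality_id="PATO:0000462", quality_label="absent",
    taxon_id="TTO:0000000",
    character="Anal fin: present (0) / absent (1); state 1",
)
print(stmt.as_tuple())
```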
Abstract:
A cross-domain workflow application may be constructed using a standard reference model such as the one by the Workflow Management Coalition (WfMC) [7], but the requirements for this type of application are inherently different from one organization to another. The existing models and the systems built around them meet some but not all the requirements of all the organizations involved in a collaborative process. Furthermore, the requirements change over time. This makes the applications difficult to develop and distribute. Service Oriented Architecture (SOA) based approaches such as BPEL (Business Process Execution Language) intend to provide a solution but fail to address the problems sufficiently, especially in situations where the expectations and level of skills of the users (e.g. the participants of the processes) in different organisations are likely to be different. In this paper, we discuss a design pattern that provides a novel approach towards a solution. In this solution, business users can design the applications at a high level of abstraction: the use cases and user interactions; the designs are documented and used, together with the data and events captured later that represent the user interactions with the systems, to feed an intermediate component local to the users, the IFM (InterFace Mapper), which bridges the gaps between the users and the systems. We discuss the main issues faced in the design and prototyping. The approach alleviates the need for re-programming with the APIs to any back-end service, thus easing the development and distribution of the applications.
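The paper describes the IFM only at the level of its role, so the following is no more than a sketch of the idea: a local component that binds high-level use-case events to whichever back-end service an organisation actually runs, so the user-facing design never calls service APIs directly. All names and bindings below are hypothetical.

```python
from typing import Any, Callable, Dict

class InterfaceMapper:
    """Local intermediary between captured user interactions and back-end services."""

    def __init__(self):
        self._bindings: Dict[str, Callable[[dict], Any]] = {}

    def bind(self, use_case_event: str, service_call: Callable[[dict], Any]) -> None:
        """Associate a high-level use-case event with a concrete service call."""
        self._bindings[use_case_event] = service_call

    def handle(self, use_case_event: str, payload: dict) -> Any:
        if use_case_event not in self._bindings:
            raise KeyError(f"no service bound for event {use_case_event!r}")
        return self._bindings[use_case_event](payload)

# Hypothetical bindings: the same "submit_claim" interaction is routed to two
# different organisations' services without changing the user-facing design.
def org_a_claims_service(payload: dict) -> str:
    return f"A-{payload['claim_id']}"

def org_b_claims_service(payload: dict) -> str:
    return f"B/{payload['claim_id']}/queued"

ifm_a, ifm_b = InterfaceMapper(), InterfaceMapper()
ifm_a.bind("submit_claim", org_a_claims_service)
ifm_b.bind("submit_claim", org_b_claims_service)
print(ifm_a.handle("submit_claim", {"claim_id": 42}))
print(ifm_b.handle("submit_claim", {"claim_id": 42}))
```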
Abstract:
eScience is an umbrella concept which covers internet technologies, such as web service orchestration that involves manipulation and processing of high volumes of data, using simple and efficient methodologies. This concept is normally associated with bioinformatics, but nothing prevents the use of an identical approach for geoinformatics and OGC (Open Geospatial Consortium) web services like WPS (Web Processing Service). In this paper we present an extended WPS implementation based on the PyWPS framework using an automatically generated WSDL (Web Service Description Language) XML document that replicates the WPS input/output document structure used during an Execute request to a server. Services are accessed using a modified SOAP (Simple Object Access Protocol) interface provided by PyWPS, that uses service and input/output identifiers as element names. The WSDL XML document is dynamically generated by applying XSLT (Extensible Stylesheet Language Transformation) to the getCapabilities XML document that is generated by PyWPS. The availability of the SOAP interface and WSDL description allows WPS instances to be accessible to workflow development software like Taverna, enabling users to build complex workflows using web services represented by interconnecting graphics. Taverna will transform the visual representation of the workflow into a SCUFL (Simple Conceptual Unified Flow Language) based XML document that can be run internally or sent to a Taverna orchestration server. SCUFL uses a dataflow-centric orchestration model as opposed to the more commonly used orchestration language BPEL (Business Process Execution Language), which is process-centric.
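The WSDL-generation step described above can be sketched as follows, assuming lxml for the XSLT transformation; the server URL and stylesheet path are placeholders, and in PyWPS this transformation is performed server-side rather than by a client script like this.

```python
# Sketch: fetch a WPS GetCapabilities document and transform it with an XSLT
# stylesheet into a WSDL description. The stylesheet itself (not shown) would
# have to replicate the WPS input/output structure of each offered process.
from urllib.request import urlopen
from lxml import etree

def capabilities_to_wsdl(wps_url: str, xslt_path: str) -> bytes:
    capabilities_url = f"{wps_url}?service=WPS&request=GetCapabilities"
    with urlopen(capabilities_url) as response:
        capabilities_doc = etree.parse(response)          # parse the server response
    transform = etree.XSLT(etree.parse(xslt_path))        # load the XSLT stylesheet
    wsdl_doc = transform(capabilities_doc)                # apply it to getCapabilities
    return etree.tostring(wsdl_doc, pretty_print=True,
                          xml_declaration=True, encoding="UTF-8")

if __name__ == "__main__":
    # Placeholder endpoint and stylesheet name, for illustration only.
    wsdl = capabilities_to_wsdl("http://example.org/wps", "wps_to_wsdl.xsl")
    print(wsdl.decode("utf-8"))
```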
Abstract:
Geoscience methods are increasingly being utilised in criminal, environmental and humanitarian forensic investigations, and the use of such methods is supported by a growing body of experimental and theoretical research. Geoscience search techniques can complement traditional methodologies in the search for buried objects, including clandestine graves, weapons, explosives, drugs, illegal weapons, hazardous waste and vehicles. This paper details recent advances in search and detection methods, with case studies and reviews. Relevant examples are given, together with a generalised search workflow and a table of suggested detection techniques. Forensic geoscience techniques are continuing to rapidly evolve to assist search investigators to detect hitherto difficult-to-locate forensic targets.
Abstract:
Just as conventional institutions are organisational structures for coordinating the activities of multiple interacting individuals, electronic institutions provide a computational analogue for coordinating the activities of multiple interacting software agents. In this paper, we argue that open multi-agent systems can be effectively designed and implemented as electronic institutions, for which we provide a comprehensive computational model. More specifically, the paper provides an operational semantics for electronic institutions, specifying the essential data structures, the state representation and the key operations necessary to implement them. We specify the agent workflow structure that is the core component of such electronic institutions and particular instantiations of knowledge representation languages that support the institutional model. In so doing, we provide the first formal account of the electronic institution concept in a rigorous and unambiguous way.
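The paper's operational semantics is not reproduced here, but the flavour of the data structures involved (scenes with admissible roles, an institutional state, and transitions between scenes in the agent workflow) can be hinted at with a toy sketch such as the one below; the Scene and InstitutionState classes and the auction example are purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple

@dataclass(frozen=True)
class Scene:
    """A scene in the institution's workflow: a protocol with admissible roles."""
    name: str
    roles: FrozenSet[str]

@dataclass
class InstitutionState:
    """Which agents (by role) are currently present in which scene."""
    participants: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)

    def enter(self, scene: Scene, agent: str, role: str) -> None:
        if role not in scene.roles:
            raise ValueError(f"role {role!r} not admitted in scene {scene.name!r}")
        self.participants.setdefault(scene.name, set()).add((agent, role))

    def move(self, src: Scene, dst: Scene, agent: str, role: str) -> None:
        """A transition in the workflow: leave one scene and join the next."""
        self.participants.get(src.name, set()).discard((agent, role))
        self.enter(dst, agent, role)

admission = Scene("admission", frozenset({"buyer", "auctioneer"}))
auction = Scene("auction", frozenset({"buyer", "seller", "auctioneer"}))

state = InstitutionState()
state.enter(admission, "agent1", "buyer")
state.move(admission, auction, "agent1", "buyer")
print(state.participants)
```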
Abstract:
The scheduling problem in distributed data-intensive computing environments has become an active research topic due to the tremendous growth in grid and cloud computing environments. As an innovative distributed intelligent paradigm, swarm intelligence provides a novel approach to solving these potentially intractable problems. In this paper, we formulate the scheduling problem for workflow applications with security constraints in distributed data-intensive computing environments and present a novel security constraint model. Several meta-heuristic adaptations to the particle swarm optimization algorithm are introduced to deal with the formulation of efficient schedules. A variable neighborhood particle swarm optimization algorithm is compared with a multi-start particle swarm optimization and a multi-start genetic algorithm. Experimental results illustrate that population-based meta-heuristic approaches usually provide a good balance between global exploration and local exploitation, and demonstrate their feasibility and effectiveness for scheduling workflow applications.
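As a rough illustration of the kind of formulation involved, the sketch below encodes task-to-node assignments as particle positions and folds a simplified security constraint into the fitness as a penalty. It is a basic PSO, not the paper's variable neighborhood variant, and the security model shown is an assumption made for illustration.

```python
import random

def makespan(assignment, task_times, num_nodes):
    loads = [0.0] * num_nodes
    for task, node in enumerate(assignment):
        loads[node] += task_times[task]
    return max(loads)

def security_penalty(assignment, task_demand, node_trust):
    """Simplified security-constraint model: penalise tasks placed on nodes whose
    trust level is below the task's security demand."""
    return sum(10.0 for task, node in enumerate(assignment)
               if node_trust[node] < task_demand[task])

def decode(position, num_nodes):
    """Map continuous particle coordinates to discrete node indices."""
    return [min(num_nodes - 1, max(0, int(x))) for x in position]

def pso_schedule(task_times, task_demand, node_trust, particles=20, iters=100):
    num_tasks, num_nodes = len(task_times), len(node_trust)

    def fitness(pos):
        a = decode(pos, num_nodes)
        return makespan(a, task_times, num_nodes) + security_penalty(a, task_demand, node_trust)

    swarm = [[random.uniform(0, num_nodes) for _ in range(num_tasks)] for _ in range(particles)]
    velocity = [[0.0] * num_tasks for _ in range(particles)]
    pbest = [p[:] for p in swarm]
    gbest = min(swarm, key=fitness)[:]

    for _ in range(iters):
        for i, pos in enumerate(swarm):
            for d in range(num_tasks):
                r1, r2 = random.random(), random.random()
                velocity[i][d] = (0.7 * velocity[i][d]
                                  + 1.5 * r1 * (pbest[i][d] - pos[d])
                                  + 1.5 * r2 * (gbest[d] - pos[d]))
                pos[d] += velocity[i][d]
            if fitness(pos) < fitness(pbest[i]):
                pbest[i] = pos[:]
                if fitness(pos) < fitness(gbest):
                    gbest = pos[:]
    return decode(gbest, num_nodes)

task_times = [random.uniform(1, 4) for _ in range(10)]
task_demand = [random.choice([0, 1, 2]) for _ in range(10)]
node_trust = [0, 1, 2]
print(pso_schedule(task_times, task_demand, node_trust))
```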
Abstract:
The burial of objects (human remains, explosives, weapons) below or behind concrete, brick, plaster or tiling may be associated with serious crime, and such locations are difficult to search. These are quite common forensic search scenarios, but little has been published on them to date. Most documented discoveries are accidental or result from suspect/witness testimony. The difficulty of locating such hidden objects means a random or chance-based approach is not advisable. A preliminary strategy is presented here, based on previous studies and augmented by primary research where new technology or applications are required. This blend allows a rudimentary search workflow, from remote desktop study, to non-destructive investigation, through to recommendations as to how the above may inform excavation, demonstrated here with a case study from a homicide investigation. Published case studies on the search for human remains demonstrate the problems encountered when trying to find and recover sealed-in and sealed-over locations. Established methods include desktop study, photography, geophysics and search dogs; these are integrated with new technology (LiDAR and laser scanning; photographic rectification; close-quarter aerial imagery; ground-penetrating radar on walls and gamma-ray/neutron activation radiography) to propose this possible search strategy.
Abstract:
Next-generation sequencing (NGS) technologies have begun to revolutionize the field of haematological malignancies through the assessment of a patient's genetic makeup at minimal cost. Significant discoveries have already provided a unique insight into disease initiation, risk stratification and therapeutic intervention. Sequencing analysis will likely form part of routine diagnostic testing in the future. However, a number of important issues need to be addressed for that to become a reality with regard to result interpretation, laboratory workflow, data storage and ethical issues. In this review we summarize the contribution that NGS has already made to the field of haematological malignancies. Finally, we discuss the challenges that NGS technologies will bring in relation to data storage, ethical and legal issues and laboratory validation. Despite these challenges, we predict that high-throughput DNA sequencing will redefine haematological malignancies based on individualized genomic analysis.
Abstract:
Immunohistochemistry (IHC) is a widely available and highly utilised tool in diagnostic histopathology and is used to guide treatment options as well as provide prognostic information. IHC is subject to qualitative and subjective assessment, which has been criticised for a lack of stringency, while PCR-based molecular diagnostic validations are, by comparison, regarded as very rigorous. It is essential that IHC tests are validated through evidence-based procedures. With the move to ISO 15189 (2012), not just the accuracy, specificity and reproducibility of each test need to be determined and managed, but also the degree of uncertainty and the delivery of such tests. The recent update to ISO 15189 (2012) states that it is appropriate to consider the potential uncertainty of measurement of the value obtained in the laboratory and how that may impact on prognostic or predictive thresholds. In order to highlight the problems surrounding IHC validity, we reviewed the measurement of Ki67 and p53 in the literature. Both of these biomarkers have been incorporated into clinical care by pathology laboratories worldwide. The variation seen appears excessive even when measuring centrally stained slides from the same cases. We therefore propose in this paper to establish the basis on which IHC laboratories can bring the level of robust validation seen in molecular pathology laboratories, and the principles applied there, to all routine IHC tests.
Abstract:
Objective: To understand the knowledge and attitudes of rural Chinese physicians, patients, and village health workers (VHWs) toward diabetic eye disease and glaucoma. Methods: Focus groups for each of the 3 stakeholders were conducted in 3 counties (9 groups). The focus groups were recorded, transcribed, and coded using specialized software. Responses to questions about barriers to compliance and interventions to remove these barriers were also ranked and scored. Results: Among 22 physicians, 23 patients, and 25 VHWs, knowledge about diabetic eye disease was generally good, but physicians and patients understood glaucoma only as an acutely symptomatic disease of relatively low prevalence. Physicians did not favor routine pupillary dilation to detect asymptomatic disease, expressing concerns about workflow and danger and inconvenience to patients. Providers believed that cost was the main barrier to patient compliance, whereas patients ranked poorly trained physicians as more important. All 3 stakeholder groups ranked financial interventions to improve compliance (eg, direct payment, lotteries, and contracts) low and preferred patient education and telephone contact by nurses. All the groups somewhat doubted the ability of VHWs to screen for eye disease accurately, but patients were generally willing to pay for VHW screening. The VHWs were uncertain about the value of eye care training but might accept it if accompanied by equipment. They did not rank payment for screening services as important. Conclusions: Misconceptions about glaucoma's asymptomatic nature and an unwillingness to routinely examine asymptomatic patients must be addressed in training programs. Home contact by nurses and patient education may be the most appropriate interventions to improve compliance.
Abstract:
The development of telemedicine-based solutions has enabled the creation and development of new ways of delivering healthcare, bringing providers closer to patients, reducing the associated waiting times and improving the quality of the service provided. However, the introduction of technology into healthcare processes does not always translate into a reduction of the existing national asymmetries in care delivery. The main objective of this work was to develop a system that enables the creation of an electronic teleradiology marketplace, taking advantage of the technologically advanced solutions already in place and of the good nationwide distribution of equipment. In this marketplace it is possible to maximise the satisfaction of patients and of the entities requesting radiological examinations with the service provided, without compromising its quality, while at the same time optimising the utilisation of the available equipment. To this end, the current state of the art in telemedicine and teleradiology systems was surveyed, and the existing perception of national asymmetries in the distribution of human resources in the health sector was confirmed. After examining the current workflow for requesting, performing and interpreting imaging examinations, it was optimised and adapted to a teleradiology marketplace by designing a set of requirements associated with the main stages of exam execution, identifying the main obstacles to its operation and proposing original resolution mechanisms based on information systems. At the information-systems level, a prototype is presented that demonstrates the implementation and operation of some of the most important requirements presented above, showing the practical operation of the teleradiology-based image marketplace. Finally, the practical feasibility of this solution is demonstrated through a business model that outlines the benefits arising from the implementation of this system and the costs associated with deploying an infrastructure of this nature.
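The thesis does not prescribe a specific matching algorithm, but the marketplace idea of routing exam requests to reading providers while balancing requester satisfaction and equipment utilisation could be sketched, for illustration only, as a simple greedy assignment such as the following; all classes, weights and figures are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ExamRequest:
    exam_id: str
    modality: str
    urgency: int            # higher = more urgent

@dataclass
class Provider:
    name: str
    modalities: set
    queue_hours: float      # current backlog, a proxy for equipment/reader utilisation
    rating: float           # requester satisfaction score (0-1)

def assign(requests: List[ExamRequest], providers: List[Provider]):
    """Greedy matching: urgent exams first, each to the compatible provider with the
    best trade-off between satisfaction rating and current backlog."""
    assignments = {}
    for req in sorted(requests, key=lambda r: -r.urgency):
        candidates = [p for p in providers if req.modality in p.modalities]
        if not candidates:
            continue
        best = max(candidates, key=lambda p: p.rating - 0.1 * p.queue_hours)
        best.queue_hours += 0.5        # assume roughly 30 minutes of reading per exam
        assignments[req.exam_id] = best.name
    return assignments

requests = [ExamRequest("e1", "CT", 2), ExamRequest("e2", "MR", 1), ExamRequest("e3", "CT", 3)]
providers = [Provider("centreA", {"CT", "MR"}, 4.0, 0.9), Provider("centreB", {"CT"}, 1.0, 0.8)]
print(assign(requests, providers))
```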
Abstract:
The demanding pace of innovation in biomedical applications has guided the evolution of information technologies over the last decades. The challenges associated with the efficient management, integration, analysis and interpretation of data coming from the most modern hardware and software technologies require a concerted effort. From gene-sequencing hardware to electronic patient records and drug research, the ability to precisely explore the data from these environments is vital to the understanding of human health. This thesis encompasses the discussion and development of better informatics strategies to overcome these challenges, mainly in the context of service composition, including flexible data-integration techniques, such as warehousing or federation, and advanced interoperability techniques, such as web services or Linked Data. Service composition is presented as a generic ideal, oriented towards data integration and software interoperability. Regarding the latter, this research focused on the field of pharmacovigilance, in the context of the European EU-ADR project. The contributions to this project, a new interoperability standard and a workflow execution engine, underpin the success of the EU-ADR Web Platform, a platform for conducting advanced pharmacovigilance studies. In the context of the European GEN2PHEN project, this research aimed to overcome the challenges associated with the integration of distributed and heterogeneous data in the field of the human variome. A new solution was created, WAVe (Web Analyses of the Variome), which provides a rich collection of genetic-variation data through an innovative web interface and an advanced API. The development of these strategies highlighted two clear opportunities in the biomedical software area: improving the software implementation process through rapid development techniques, and improving data quality and availability through the adoption of the semantic web paradigm. The COEUS platform crosses the boundaries of integration and interoperability, providing methodologies for flexible data acquisition and translation, as well as a layer of interoperable services to semantically explore the aggregated data. Combining rapid development techniques with the richness of the "Semantic Web in a box" perspective, the COEUS platform is a pioneering approach, enabling the development of the next generation of biomedical applications.
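As a loose illustration of the "interoperable services layer for semantically exploring aggregated data" idea, the sketch below loads a few RDF triples and queries them with SPARQL using rdflib; the vocabulary and facts are invented for the example and are not the actual COEUS or WAVe data model.

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/biomed#")   # placeholder vocabulary

g = Graph()
g.bind("ex", EX)

# Hypothetical aggregated facts from two sources: a variant record and a gene record.
g.add((EX.var001, RDF.type, EX.Variant))
g.add((EX.var001, EX.locatedIn, EX.BRCA2))
g.add((EX.var001, EX.clinicalNote, Literal("observed in cohort A")))
g.add((EX.BRCA2, RDF.type, EX.Gene))

# Semantic exploration: find variants located in a given gene, with their notes.
query = """
PREFIX ex: <http://example.org/biomed#>
SELECT ?variant ?note WHERE {
    ?variant a ex:Variant ;
             ex:locatedIn ex:BRCA2 ;
             ex:clinicalNote ?note .
}
"""
for variant, note in g.query(query):
    print(variant, note)
```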