534 resultados para data centric storage
Resumo:
As a result of the more distributed nature of organisations and the inherently increasing complexity of their business processes, a significant effort is required for the specification and verification of those processes. The composition of the activities into a business process that accomplishes a specific organisational goal has primarily been a manual task. Automated planning is a branch of artificial intelligence (AI) in which activities are selected and organised by anticipating their expected outcomes with the aim of achieving some goal. As such, automated planning would seem to be a natural fit to the BPM domain to automate the specification of control flow. A number of attempts have been made to apply automated planning to the business process and service composition domain in different stages of the BPM lifecycle. However, a unified adoption of these techniques throughout the BPM lifecycle is missing. As such, we propose a new intention-centric BPM paradigm, which aims on minimising the specification effort by exploiting automated planning techniques to achieve a pre-stated goal. This paper provides a vision on the future possibilities of enhancing BPM using automated planning. A research agenda is presented, which provides an overview of the opportunities and challenges for the exploitation of automated planning in BPM.
Resumo:
QUT Library Research Support has simplified and streamlined the process of research data management planning, storage, discovery and reuse through collaboration and the use of integrated and tailored online tools, and a simplification of the metadata schema. This poster presents the integrated data management services a QUT, including QUT’s Data Management Planning Tool, Research Data Finder, Spatial Data Finder and Software Finder, and information on the simplified Registry Interchange Format – Collections and Services (RIF-CS) Schema. The QUT Data Management Planning (DMP) Tool was built using the Digital Curation Centre’s DMP Online Tool and modified to QUT’s needs and policies. The tool allows researchers and Higher Degree Research students to plan how to handle research data throughout the active phase of their research. The plan is promoted as a ‘live’ document’ and researchers are encouraged to update it as required. The information entered into the plan can be made private or shared with supervisors, project members and external examiners. A plan is mandatory when requesting storage space on the QUT Research Data Storage Service. QUT’s Research Data Finder is integrated with QUT’s Academic Profiles and the Data Management Planning Tool to create a seamless data management process. This process aims to encourage the creation of high quality rich records which facilitate discovery and reuse of quality data. The Registry Interchange Format – Collections and Services (RIF-CS) Schema that is used in the QUT Research Data Finder was simplified to “RIF-CS lite” to reflect mandatory and optional metadata requirements. RIF-CS lite removed schema fields that were underused or extra to the needs of the users and system. This has reduced the amount of metadata fields required from users and made integration of systems a far more simple process where field content is easily shared across services making the process of collecting metadata as transparent as possible.
Resumo:
Distributed systems are widely used for solving large-scale and data-intensive computing problems, including all-to-all comparison (ATAC) problems. However, when used for ATAC problems, existing computational frameworks such as Hadoop focus on load balancing for allocating comparison tasks, without careful consideration of data distribution and storage usage. While Hadoop-based solutions provide users with simplicity of implementation, their inherent MapReduce computing pattern does not match the ATAC pattern. This leads to load imbalances and poor data locality when Hadoop's data distribution strategy is used for ATAC problems. Here we present a data distribution strategy which considers data locality, load balancing and storage savings for ATAC computing problems in homogeneous distributed systems. A simulated annealing algorithm is developed for data distribution and task scheduling. Experimental results show a significant performance improvement for our approach over Hadoop-based solutions.
Resumo:
The concept of big data has already outperformed traditional data management efforts in almost all industries. Other instances it has succeeded in obtaining promising results that provide value from large-scale integration and analysis of heterogeneous data sources for example Genomic and proteomic information. Big data analytics have become increasingly important in describing the data sets and analytical techniques in software applications that are so large and complex due to its significant advantages including better business decisions, cost reduction and delivery of new product and services [1]. In a similar context, the health community has experienced not only more complex and large data content, but also information systems that contain a large number of data sources with interrelated and interconnected data attributes. That have resulted in challenging, and highly dynamic environments leading to creation of big data with its enumerate complexities, for instant sharing of information with the expected security requirements of stakeholders. When comparing big data analysis with other sectors, the health sector is still in its early stages. Key challenges include accommodating the volume, velocity and variety of healthcare data with the current deluge of exponential growth. Given the complexity of big data, it is understood that while data storage and accessibility are technically manageable, the implementation of Information Accountability measures to healthcare big data might be a practical solution in support of information security, privacy and traceability measures. Transparency is one important measure that can demonstrate integrity which is a vital factor in the healthcare service. Clarity about performance expectations is considered to be another Information Accountability measure which is necessary to avoid data ambiguity and controversy about interpretation and finally, liability [2]. According to current studies [3] Electronic Health Records (EHR) are key information resources for big data analysis and is also composed of varied co-created values [3]. Common healthcare information originates from and is used by different actors and groups that facilitate understanding of the relationship for other data sources. Consequently, healthcare services often serve as an integrated service bundle. Although a critical requirement in healthcare services and analytics, it is difficult to find a comprehensive set of guidelines to adopt EHR to fulfil the big data analysis requirements. Therefore as a remedy, this research work focus on a systematic approach containing comprehensive guidelines with the accurate data that must be provided to apply and evaluate big data analysis until the necessary decision making requirements are fulfilled to improve quality of healthcare services. Hence, we believe that this approach would subsequently improve quality of life.
Resumo:
The world has experienced a large increase in the amount of available data. Therefore, it requires better and more specialized tools for data storage and retrieval and information privacy. Recently Electronic Health Record (EHR) Systems have emerged to fulfill this need in health systems. They play an important role in medicine by granting access to information that can be used in medical diagnosis. Traditional systems have a focus on the storage and retrieval of this information, usually leaving issues related to privacy in the background. Doctors and patients may have different objectives when using an EHR system: patients try to restrict sensible information in their medical records to avoid misuse information while doctors want to see as much information as possible to ensure a correct diagnosis. One solution to this dilemma is the Accountable e-Health model, an access protocol model based in the Information Accountability Protocol. In this model patients are warned when doctors access their restricted data. They also enable a non-restrictive access for authenticated doctors. In this work we use FluxMED, an EHR system, and augment it with aspects of the Information Accountability Protocol to address these issues. The Implementation of the Information Accountability Framework (IAF) in FluxMED provides ways for both patients and physicians to have their privacy and access needs achieved. Issues related to storage and data security are secured by FluxMED, which contains mechanisms to ensure security and data integrity. The effort required to develop a platform for the management of medical information is mitigated by the FluxMED's workflow-based architecture: the system is flexible enough to allow the type and amount of information being altered without the need to change in your source code.
Resumo:
Increasingly larger scale applications are generating an unprecedented amount of data. However, the increasing gap between computation and I/O capacity on High End Computing machines makes a severe bottleneck for data analysis. Instead of moving data from its source to the output storage, in-situ analytics processes output data while simulations are running. However, in-situ data analysis incurs much more computing resource contentions with simulations. Such contentions severely damage the performance of simulation on HPE. Since different data processing strategies have different impact on performance and cost, there is a consequent need for flexibility in the location of data analytics. In this paper, we explore and analyze several potential data-analytics placement strategies along the I/O path. To find out the best strategy to reduce data movement in given situation, we propose a flexible data analytics (FlexAnalytics) framework in this paper. Based on this framework, a FlexAnalytics prototype system is developed for analytics placement. FlexAnalytics system enhances the scalability and flexibility of current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and visualization, as well as for large-scale data transfer. Two use cases – scientific data compression and remote visualization – have been applied in the study to verify the performance of FlexAnalytics. Experimental results demonstrate that FlexAnalytics framework increases data transition bandwidth and improves the application end-to-end transfer performance.
Resumo:
Big Data and predictive analytics have received significant attention from the media and academic literature throughout the past few years, and it is likely that these emerging technologies will materially impact the mining sector. This short communication argues, however, that these technological forces will probably unfold differently in the mining industry than they have in many other sectors because of significant differences in the marginal cost of data capture and storage. To this end, we offer a brief overview of what Big Data and predictive analytics are, and explain how they are bringing about changes in a broad range of sectors. We discuss the “N=all” approach to data collection being promoted by many consultants and technology vendors in the marketplace but, by considering the economic and technical realities of data acquisition and storage, we then explain why a “n « all” data collection strategy probably makes more sense for the mining sector. Finally, towards shaping the industry’s policies with regards to technology-related investments in this area, we conclude by putting forward a conceptual model for leveraging Big Data tools and analytical techniques that is a more appropriate fit for the mining sector.
Resumo:
This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem, and developed a high-performance and scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve the problem. The study considered storage usage, data locality and load balancing for performance improvement in solving the problem. The research outcomes can be applied in bioinformatics, biometrics and data mining and other domains in which all-to-all comparisons are a typical computing pattern.
Resumo:
Distributed renewable energy has become a significant contender in the supply of power in the distribution network in Queensland and throughout the world. As the cost of battery storage falls, distribution utilities turn their attention to the impacts of battery storage and other storage technologies on the low voltage (LV) network. With access to detailed residential energy usage data, Energex's available residential tariffs are investigated for their effectiveness in providing customers with financial incentives to move to Time-of Use based tariffs and to reward use of battery storage.
Resumo:
Dispersing a data object into a set of data shares is an elemental stage in distributed communication and storage systems. In comparison to data replication, data dispersal with redundancy saves space and bandwidth. Moreover, dispersing a data object to distinct communication links or storage sites limits adversarial access to whole data and tolerates loss of a part of data shares. Existing data dispersal schemes have been proposed mostly based on various mathematical transformations on the data which induce high computation overhead. This paper presents a novel data dispersal scheme where each part of a data object is replicated, without encoding, into a subset of data shares according to combinatorial design theory. Particularly, data parts are mapped to points and data shares are mapped to lines of a projective plane. Data parts are then distributed to data shares using the point and line incidence relations in the plane so that certain subsets of data shares collectively possess all data parts. The presented scheme incorporates combinatorial design theory with inseparability transformation to achieve secure data dispersal at reduced computation, communication and storage costs. Rigorous formal analysis and experimental study demonstrate significant cost-benefits of the presented scheme in comparison to existing methods.
Resumo:
Solving large-scale all-to-all comparison problems using distributed computing is increasingly significant for various applications. Previous efforts to implement distributed all-to-all comparison frameworks have treated the two phases of data distribution and comparison task scheduling separately. This leads to high storage demands as well as poor data locality for the comparison tasks, thus creating a need to redistribute the data at runtime. Furthermore, most previous methods have been developed for homogeneous computing environments, so their overall performance is degraded even further when they are used in heterogeneous distributed systems. To tackle these challenges, this paper presents a data-aware task scheduling approach for solving all-to-all comparison problems in heterogeneous distributed systems. The approach formulates the requirements for data distribution and comparison task scheduling simultaneously as a constrained optimization problem. Then, metaheuristic data pre-scheduling and dynamic task scheduling strategies are developed along with an algorithmic implementation to solve the problem. The approach provides perfect data locality for all comparison tasks, avoiding rearrangement of data at runtime. It achieves load balancing among heterogeneous computing nodes, thus enhancing the overall computation time. It also reduces data storage requirements across the network. The effectiveness of the approach is demonstrated through experimental studies.
Resumo:
This workshop aims at discussing alternative approaches to resolving the problem of health information fragmentation, partially resulting from difficulties of health complex systems to semantically interact at the information level. In principle, we challenge the current paradigm of keeping medical records where they were created and discuss an alternative approach in which an individual's health data can be maintained by new entities whose sole responsibility is the sustainability of individual-centric health records. In particular, we will discuss the unique characteristics of the European health information landscape. This workshop is also a business meeting of the IMIA Working Group on Health Record Banking.