976 results for Data Management


Relevance: 100.00%

Abstract:

The iRODS system, created by the San Diego Supercomputer Center, is a rule-oriented data management system that allows the user to create sets of rules defining how data are to be managed. Each rule corresponds to a particular action or operation (such as checksumming a file), and the system is flexible enough to allow the user to create new rules for new types of operations. iRODS can interface to any storage system (provided an iRODS driver is built for it) and relies on its metadata catalogue to provide a virtual file system that can handle files of any size and type. However, some storage systems (such as tape systems) do not handle small files efficiently and prefer small files to be packaged (or "bundled") into larger units. We have developed a system that can bundle small data files of any type into larger units: mounted collections. The system can create collection families and carries its own extensible metadata, including metadata on which family a collection belongs to. The mounted collection system can work standalone and is being incorporated into iRODS to enhance the system's flexibility in handling small files. In this paper we describe the motivation for creating a mounted collection system, its architecture, and how it has been incorporated into iRODS. We describe the different technologies used to create the mounted collection system and provide some performance numbers.
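
The abstract describes the bundling only at a high level. As a rough illustration of the idea (a minimal sketch, not the authors' implementation), the Python code below bundles small files into a single archive that carries its own JSON metadata index, including the collection family; all file, function and field names here are hypothetical.

```python
import json
import tarfile
from pathlib import Path

def bundle_collection(files, bundle_path, family, extra_metadata=None):
    """Bundle small files into one tar archive (a "mounted collection")
    together with a JSON index recording membership and the collection family."""
    index = {
        "family": family,                  # which collection family this bundle belongs to
        "metadata": extra_metadata or {},  # extensible, user-defined metadata
        "members": [],
    }
    with tarfile.open(bundle_path, "w") as bundle:
        for f in map(Path, files):
            bundle.add(f, arcname=f.name)
            index["members"].append({"name": f.name, "size": f.stat().st_size})
        # Store the index inside the bundle so the collection is self-describing.
        index_file = Path("collection_index.json")
        index_file.write_text(json.dumps(index, indent=2))
        bundle.add(index_file, arcname="collection_index.json")
        index_file.unlink()

if __name__ == "__main__":
    # Assumes a.dat and b.dat exist in the working directory.
    bundle_collection(["a.dat", "b.dat"], "family1_0001.tar", family="family1")
```

A real mounted collection system would additionally register the bundle and its metadata in the storage system's catalogue, which is the part the iRODS integration described in the paper provides.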

Relevance: 100.00%

Abstract:

There is remarkable agreement in expectations today for vastly improved ocean data management a decade from now: capabilities that will help to bring significant benefits to ocean research and to society. Advancing data management to such a degree, however, will require cultural and policy changes that are slow to effect. The technological foundations upon which data management systems are built are certain to continue advancing rapidly in parallel. These considerations argue for adopting attitudes of pragmatism and realism when planning data management strategies. In this paper we adopt those attitudes as we outline opportunities for progress in ocean data management. We begin with a synopsis of expectations for integrated ocean data management a decade from now. We discuss factors that should be considered by those evaluating candidate "standards". We highlight challenges and opportunities in a number of technical areas, including "Web 2.0" applications, data modeling, data discovery and metadata, real-time operational data, archival of data, biological data management and satellite data management. We discuss the importance of investments in the development of software toolkits to accelerate progress. We conclude by recommending a few specific, short-term implementation targets that we believe to be both significant and achievable, and by calling for action by community leadership to effect these advances.

Relevance: 100.00%

Abstract:

Climate-G is a large-scale distributed testbed devoted to climate change research. It is an unfunded effort, started in 2008, involving a wide community in both Europe and the US. The testbed is an interdisciplinary effort involving partners from several institutions and joining expertise in the fields of climate change and computational science. Its main goal is to allow scientists to carry out geographical and cross-institutional data discovery, access, analysis, visualization and sharing of climate data. It represents an attempt to address challenging data and metadata management issues in a real environment. This paper presents a complete overview of the Climate-G testbed, highlighting the most important results achieved since the beginning of the project.

Relevance: 100.00%

Abstract:

Purpose: To investigate the relationship between research data management (RDM) and data sharing in the formulation of RDM policies and the development of practices in higher education institutions (HEIs). Design/methodology/approach: Two strands of work were undertaken sequentially: first, a content analysis of 37 RDM policies from UK HEIs; second, two detailed case studies of institutions with different approaches to RDM, based on semi-structured interviews with staff involved in the development of RDM policy and services. The data are interpreted using insights from Actor-Network Theory. Findings: RDM policy formation and service development have created a complex set of networks within and beyond institutions, involving different professional groups whose widely varying priorities shape activities. Data sharing is considered an important activity in the policies and services of the HEIs studied, but its prominence can in most cases be attributed to the positions adopted by large research funders. Research limitations/implications: The case studies, as research based on qualitative data, cannot be assumed to be universally applicable, but they do illustrate a variety of issues and challenges experienced more generally, particularly in the UK. Practical implications: The research may help to inform the development of policy and practice in RDM in HEIs and funder organisations. Originality/value: This paper makes an early contribution to the RDM literature on the specific topic of the relationship between RDM policy and services on the one hand and openness on the other, a topic which to date has received limited attention.

Relevance: 100.00%

Abstract:

Instrumentation and automation play a vital role in managing the water industry. These systems generate vast amounts of data that must be effectively managed in order to enable intelligent decision making. Time-series data management software, commonly known as data historians, is used for collecting and managing real-time (time series) information. More advanced software solutions provide a data infrastructure, or utility-wide Operations Data Management System (ODMS), that stores, manages, calculates, displays, shares and integrates data from the multiple disparate automation and business systems used daily in water utilities. These ODMS solutions are proven and can manage data ranging from smart water meters to data shared across third-party corporations. This paper focuses on practical utility successes in the water industry, where utility managers are leveraging instantaneous access to data from proven, commercial off-the-shelf ODMS solutions to enable better real-time decision making. Successes include saving $650,000 per year in water loss control, safeguarding water quality, and saving millions of dollars in energy management and asset management. Immediate opportunities exist to integrate the research being done in academia with these ODMS solutions in the field and to extend these successes to utilities around the world.
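
To make concrete what a data historian does at its core, here is a minimal sketch (not any specific commercial ODMS) that stores time-stamped samples per tag and answers an aggregate query of the kind used in loss-control reporting. The table schema and tag names are invented for the example.

```python
import sqlite3
from datetime import datetime, timezone

# Minimal historian: one table of (tag, timestamp, value) samples.
conn = sqlite3.connect("historian.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS samples (
           tag   TEXT NOT NULL,   -- sensor/point name, e.g. 'pump1.flow'
           ts    TEXT NOT NULL,   -- ISO-8601 timestamp (UTC)
           value REAL NOT NULL,
           PRIMARY KEY (tag, ts)
       )"""
)

def record(tag, value, ts=None):
    """Store one real-time sample for a tag."""
    ts = ts or datetime.now(timezone.utc).isoformat()
    conn.execute("INSERT OR REPLACE INTO samples VALUES (?, ?, ?)", (tag, ts, value))
    conn.commit()

def hourly_average(tag, day):
    """Hourly mean values for one tag on one day (day as 'YYYY-MM-DD'):
    the kind of rollup a water-loss report would be built from."""
    cur = conn.execute(
        """SELECT substr(ts, 12, 2) AS hour, AVG(value)
           FROM samples
           WHERE tag = ? AND ts LIKE ?
           GROUP BY hour ORDER BY hour""",
        (tag, day + "%"),
    )
    return cur.fetchall()

record("pump1.flow", 42.7)
print(hourly_average("pump1.flow", datetime.now(timezone.utc).date().isoformat()))
```

Production historians add high-throughput ingestion, compression and interpolation on top of this basic model, but the tag/timestamp/value core is the same.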

Relevance: 100.00%

Abstract:

The Short-term Water Information and Forecasting Tools (SWIFT) is a suite of tools for flood and short-term streamflow forecasting, consisting of a collection of hydrologic model components and utilities. Catchments are modelled using conceptual subareas and a node-link structure for channel routing. The tools comprise modules for calibration, model state updating, output error correction, ensemble runs and data assimilation. Given the combinatorial nature of the modelling experiments and the sub-daily time steps typically used for simulations, the volume of model configurations and time series data is substantial, and its management is not trivial. SWIFT is currently used mostly for research purposes but has also been used operationally, with intersecting but significantly different requirements. Early versions of SWIFT used mostly ad hoc text files handled via Fortran code, with limited use of netCDF for time series data. The configuration and data handling modules have since been redesigned. The model configuration now follows a design in which the data model is decoupled from the on-disk persistence mechanism. For research purposes the preferred on-disk format is JSON, to leverage the numerous software libraries available in a variety of languages, while retaining the legacy option of custom tab-separated text formats where that is the researcher's preferred access arrangement. By decoupling the data model from data persistence, it becomes much easier to swap in, for instance, relational databases to provide stricter provenance and audit-trail capabilities in an operational flood forecasting context. For the time series data, given the volume and required throughput, text-based formats are usually inadequate. A schema derived from the CF conventions has been designed to handle time series for SWIFT efficiently.
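
The decoupling described above is essentially the repository pattern. The following Python sketch illustrates the idea under invented class and field names (SWIFT itself is not a Python codebase): the configuration is a plain data object, and JSON or database backends implement the same persistence interface interchangeably.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, asdict

@dataclass
class ModelConfiguration:
    """Data model for a model configuration, independent of any file format."""
    catchment: str
    subareas: list
    time_step_seconds: int

class ConfigStore(ABC):
    """Persistence interface: backends (JSON, relational DB, ...) are swappable."""
    @abstractmethod
    def save(self, key: str, config: ModelConfiguration) -> None: ...
    @abstractmethod
    def load(self, key: str) -> ModelConfiguration: ...

class JsonConfigStore(ConfigStore):
    """Research-oriented backend: one human-readable JSON file per configuration."""
    def save(self, key, config):
        with open(f"{key}.json", "w") as f:
            json.dump(asdict(config), f, indent=2)

    def load(self, key):
        with open(f"{key}.json") as f:
            return ModelConfiguration(**json.load(f))

# An operational deployment could substitute a relational backend implementing
# the same ConfigStore interface to gain provenance and audit-trail capabilities.
store: ConfigStore = JsonConfigStore()
store.save("upper_murray", ModelConfiguration("Upper Murray", ["sa1", "sa2"], 3600))
print(store.load("upper_murray"))
```

Because calling code depends only on the interface, switching the backend requires no change to the modelling modules, which is what makes the research and operational requirements coexist.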

Relevance: 100.00%

Abstract:

In geophysics and seismology, raw data need to be processed to generate useful information that researchers can turn into knowledge. The number of sensors acquiring raw data is increasing rapidly. Without good data management systems, more time can be spent querying and preparing datasets for analysis than acquiring the raw data, and a lot of good-quality data acquired at great effort can be lost forever if it is not correctly stored. Local and international cooperation suffer as a result, and much data never becomes scientific knowledge. For this reason, the Seismological Laboratory of the Institute of Astronomy, Geophysics and Atmospheric Sciences at the University of São Paulo (IAG-USP) has concentrated fully on its data management system. This report describes the efforts of IAG-USP to set up a seismology data management system that facilitates local and international cooperation. © 2011 by the Istituto Nazionale di Geofisica e Vulcanologia. All rights reserved.

Relevance: 100.00%

Abstract:

The aging process is characterized by a progressive fitness decline experienced at all levels of physiological organization, from single molecules up to the whole organism. Studies have confirmed inflammaging, a chronic low-level inflammation, as a deeply intertwined partner of the aging process, one that may provide the "common soil" upon which age-related diseases develop and flourish. Thus, although inflammation per se is a physiological process, it can rapidly become detrimental if it goes out of control, causing an excess of local and systemic inflammatory response: a striking risk factor for the elderly population. Developing interventions to counteract the establishment of this state is thus a top priority. Diet, among other factors, is a good candidate for regulating inflammation. Building on this consideration, the EU project NU-AGE is now trying to assess whether a Mediterranean diet, fortified for the needs of the elderly population, may help in modulating inflammaging. To do so, NU-AGE enrolled a total of 1250 subjects, half of whom followed a one-year diet, and characterized them by means of the most advanced omics and non-omics analyses. The aim of this thesis was the development of a solid data management pipeline able to cope efficiently with the results of these assays, which are now flowing into a centralized database, ready to be used to test the most disparate scientific hypotheses. At the same time, the work described here encompasses the data analysis of the GEHA project, which focused on identifying the genetic determinants of longevity, with a particular emphasis on developing and applying a method for detecting epistatic interactions in human mtDNA. Finally, in an effort to propel the adoption of NGS technologies into everyday pipelines, we developed an NGS variant-calling pipeline devoted to solving the sequencing-related issues specific to mtDNA.
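
The abstract does not specify the epistasis-detection method. As a hedged sketch of the general idea only, the Python code below screens pairs of (haploid) mtDNA variant sites for joint association with longevity using Fisher's exact test; the site names and data are invented, and a real analysis would also correct for multiple testing.

```python
from itertools import combinations
from scipy.stats import fisher_exact

def pairwise_epistasis(genotypes, phenotypes):
    """genotypes: dict site -> list of 0/1 alleles (one per subject; mtDNA is haploid).
    phenotypes: list of 0/1 per subject (1 = long-lived).
    Returns (site_a, site_b, p-value) tuples sorted by p-value."""
    results = []
    for a, b in combinations(genotypes, 2):
        # Does carrying BOTH variant alleles associate with longevity?
        both = [ga & gb for ga, gb in zip(genotypes[a], genotypes[b])]
        table = [[0, 0], [0, 0]]  # rows: carrier of both (0/1); cols: long-lived (0/1)
        for carrier, long_lived in zip(both, phenotypes):
            table[carrier][long_lived] += 1
        _, p = fisher_exact(table)
        results.append((a, b, p))
    return sorted(results, key=lambda r: r[2])

# Toy usage with hypothetical sites and six subjects:
geno = {"m.73A>G":  [1, 1, 0, 0, 1, 0],
        "m.263A>G": [1, 0, 0, 1, 1, 0],
        "m.750A>G": [0, 1, 1, 0, 1, 1]}
pheno = [1, 1, 0, 0, 1, 0]
for a, b, p in pairwise_epistasis(geno, pheno):
    print(a, b, f"p={p:.3f}")
```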
