999 resultados para data archiving
Resumo:
This report describes web archiving in the National Library of Finland. The National Library of Finland has been archiving Finnish web on a regular basis since 2006. Web archiving is an important part of the Library'ʹs endeavours to collect and preserve Finnish published cultural heritage. In 2010, the amount of harvested data was 200 million files, or 25 Terabytes. The report takes the reader through the relevant legislation; internal plans and policies; funding and their allocation; the practices of web archiving; arrangements for the use of the archive; and issues rising from data security, sensitive materials, &c.
Resumo:
Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Resumo:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Resumo:
This paper describes a prototype grid infrastructure, called the eMinerals minigrid, for molecular simulation scientists. which is based on an integration of shared compute and data resources. We describe the key components, namely the use of Condor pools, Linux/Unix clusters with PBS and IBM's LoadLeveller job handling tools, the use of Globus for security handling, the use of Condor-G tools for wrapping globus job submit commands, Condor's DAGman tool for handling workflow, the Storage Resource Broker for handling data, and the CCLRC dataportal and associated tools for both archiving data with metadata and making data available to other workers.
Resumo:
Smart healthcare is a complex domain for systems integration due to human and technical factors and heterogeneous data sources involved. As a part of smart city, it is such a complex area where clinical functions require smartness of multi-systems collaborations for effective communications among departments, and radiology is one of the areas highly relies on intelligent information integration and communication. Therefore, it faces many challenges regarding integration and its interoperability such as information collision, heterogeneous data sources, policy obstacles, and procedure mismanagement. The purpose of this study is to conduct an analysis of data, semantic, and pragmatic interoperability of systems integration in radiology department, and to develop a pragmatic interoperability framework for guiding the integration. We select an on-going project at a local hospital for undertaking our case study. The project is to achieve data sharing and interoperability among Radiology Information Systems (RIS), Electronic Patient Record (EPR), and Picture Archiving and Communication Systems (PACS). Qualitative data collection and analysis methods are used. The data sources consisted of documentation including publications and internal working papers, one year of non-participant observations and 37 interviews with radiologists, clinicians, directors of IT services, referring clinicians, radiographers, receptionists and secretary. We identified four primary phases of data analysis process for the case study: requirements and barriers identification, integration approach, interoperability measurements, and knowledge foundations. Each phase is discussed and supported by qualitative data. Through the analysis we also develop a pragmatic interoperability framework that summaries the empirical findings and proposes recommendations for guiding the integration in the radiology context.
Resumo:
In the instrumental records of daily precipitation, we often encounter one or more periods in which values below some threshold were not registered. Such periods, besides lacking small values, also have a large number of dry days. Their cumulative distribution function is shifted to the right in relation to that for other portions of the record having more reliable observations. Such problems are examined in this work, based mostly on the two-sample Kolmogorov–Smirnov (KS) test, where the portion of the series with more number of dry days is compared with the portion with less number of dry days. Another relatively common problem in daily rainfall data is the prevalence of integers either throughout the period of record or in some part of it, likely resulting from truncation during data compilation prior to archiving or by coarse rounding of daily readings by observers. This problem is identified by simple calculation of the proportion of integers in the series, taking the expected proportion as 10%. The above two procedures were applied to the daily rainfall data sets from the European Climate Assessment (ECA), Southeast Asian Climate Assessment (SACA), and Brazilian Water Resources Agency (BRA). Taking the statistic D of the KS test >0.15 and the corresponding p-value <0.001 as the condition to classify a given series as suspicious, the proportions of the ECA, SACA, and BRA series falling into this category are, respectively, 34.5%, 54.3%, and 62.5%. With relation to coarse rounding problem, the proportions of series exceeding twice the 10% reference level are 3%, 60%, and 43% for the ECA, SACA, and BRA data sets, respectively. A simple way to visualize the two problems addressed here is by plotting the time series of daily rainfall for a limited range, for instance, 0–10 mm day−1.
Resumo:
The oceans play a critical role in the Earth's climate, but unfortunately, the extent of this role is only partially understood. One major obstacle is the difficulty associated with making high-quality, globally distributed observations, a feat that is nearly impossible using only ships and other ocean-based platforms. The data collected by satellite-borne ocean color instruments, however, provide environmental scientists a synoptic look at the productivity and variability of the Earth's oceans and atmosphere, respectively, on high-resolution temporal and spatial scales. Three such instruments, the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) onboard ORBIMAGE's OrbView-2 satellite, and two Moderate Resolution Imaging Spectroradiometers (MODIS) onboard the National Aeronautic and Space Administration's (NASA) Terra and Aqua satellites, have been in continuous operation since September 1997, February 2000, and June 2002, respectively. To facilitate the assembly of a suitably accurate data set for climate research, members of the NASA Sensor Intercomparison and Merger for Biological and Interdisciplinary Oceanic Studies (SIMBIOS) Project and SeaWiFS Project Offices devote significant attention to the calibration and validation of these and other ocean color instruments. This article briefly presents results from the SIMBIOS and SeaWiFS Project Office's (SSPO) satellite ocean color validation activities and describes the SeaWiFS Bio-optical Archive and Storage System (SeaBASS), a state-of-the-art system for archiving, cataloging, and distributing the in situ data used in these activities.
Resumo:
In 2008, the 50th anniversary of the IGY (International Geophysical Year), WDCMARE presents with this CD publication 3632 data sets in Open Access as part of the most important results from 73 cruises of the research vessel METEOR between 1964 and 1985. The archive is a coherent organized collection of published and unpublished data sets produced by scientists of all marine research disciplines who participated in Meteor expeditions, measured environmental parameters during cruises and investigated sample material post cruise in the labs of the participating institutions. In most cases, the data was gathered from the Meteor Forschungsergebnisse, published by the Deutsche Forschungsgemeinschaft (DFG). A second important data source are time series and radiosonde ascensions of more than 20 years of ships weather observations, which were provided by the Deutscher Wetterdienst, Hamburg. The final inclusion of all data into the PANGAEA information system ensures secure archiving, future updates, widespread distribution in electronic, machine-readable form with longterm access via the Internet. To produce this publication, all data sets with metadata were extracted from PANGAEA and organized in a directory structure on a CD together with a search capability.
Resumo:
Funded by Chief Scientist Office, Scotland. Grant Number: CZH/4/394 Economic and Social Research Council grant as part of the National Centre for Research Methods. Grant Number: RES-576-25-0032
Resumo:
The authors would like to thank the College of Life Sciences of Aberdeen University and Marine Scotland Science which funded CP's PhD project. Skate tagging experiments were undertaken as part of Scottish Government project SP004. We thank Ian Burrett for help in catching the fish and the other fishermen and anglers who returned tags. We thank José Manuel Gonzalez-Irusta for extracting and making available the environmental layers used as environmental covariates in the environmental suitability modelling procedure. We also thank Jason Matthiopoulos for insightful suggestions on habitat utilization metrics as well as Stephen C.F. Palmer, and three anonymous reviewers for useful suggestions to improve the clarity and quality of the manuscript.
Resumo:
We thank Orkney Islands Council for access to Eynhallow and Talisman Energy (UK) Ltd and Marine Scotland for fieldwork and equipment support. Handling and tagging of fulmars was conducted under licences from the British Trust for Ornithology and the UK Home Office. EE was funded by a Marine Alliance for Science and Technology for Scotland/University of Aberdeen College of Life Sciences and Medicine studentship and LQ was supported by a NERC Studentship. Thanks also to the many colleagues who assisted with fieldwork during the project, and to Helen Bailey and Arliss Winship for advice on implementing the state-space model.
Resumo:
Aim: To evaluate the reported use of Data Monitoring Committees (DMCs), the frequency of interim analysis, pre-specified stopping rules and early trial termination in neonatal randomised controlled trials (RCTs). Methods: We reviewed neonatal RCTs published in four high impact general medical journals, specifically looking at safety issues including documented involvement of a DMC, stated interim analysis, stopping rules and early trial termination. We searched all journal issues over an 11-year period (2003-2013) and recorded predefined parameters on each item for RCTs meeting inclusion criteria. Results: Seventy neonatal trials were identified in four general medical journals: Lancet, New England Journal of Medicine (NEJM), British Medical Journal and Journal of American Medical Association (JAMA). 43 (61.4%) studies reported the presence of a DMC, 36 (51.4%) explicitly mentioned interim analysis; stopping rules were reported in 15 (21.4%) RCTs and 7 (10%) trials were terminated early. The NEJM most frequently reported these parameters compared to the other three journals reviewed. Conclusion: While the majority of neonatal RCTs report on DMC involvement and interim analysis there is still scope for improvement. Clear documentation of safety related issues should be a central component of reporting in neonatal trials involving newborn infants.
Resumo:
During the SINOPS project, an optimal state of the art simulation of the marine silicon cycle is attempted employing a biogeochemical ocean general circulation model (BOGCM) through three particular time steps relevant for global (paleo-) climate. In order to tune the model optimally, results of the simulations are compared to a comprehensive data set of 'real' observations. SINOPS' scientific data management ensures that data structure becomes homogeneous throughout the project. Practical work routine comprises systematic progress from data acquisition, through preparation, processing, quality check and archiving, up to the presentation of data to the scientific community. Meta-information and analytical data are mapped by an n-dimensional catalogue in order to itemize the analytical value and to serve as an unambiguous identifier. In practice, data management is carried out by means of the online-accessible information system PANGAEA, which offers a tool set comprising a data warehouse, Graphical Information System (GIS), 2-D plot, cross-section plot, etc. and whose multidimensional data model promotes scientific data mining. Besides scientific and technical aspects, this alliance between scientific project team and data management crew serves to integrate the participants and allows them to gain mutual respect and appreciation.
Resumo:
With the advent of new technologies it is increasingly easier to find data of different nature from even more accurate sensors that measure the most disparate physical quantities and with different methodologies. The collection of data thus becomes progressively important and takes the form of archiving, cataloging and online and offline consultation of information. Over time, the amount of data collected can become so relevant that it contains information that cannot be easily explored manually or with basic statistical techniques. The use of Big Data therefore becomes the object of more advanced investigation techniques, such as Machine Learning and Deep Learning. In this work some applications in the world of precision zootechnics and heat stress accused by dairy cows are described. Experimental Italian and German stables were involved for the training and testing of the Random Forest algorithm, obtaining a prediction of milk production depending on the microclimatic conditions of the previous days with satisfactory accuracy. Furthermore, in order to identify an objective method for identifying production drops, compared to the Wood model, typically used as an analytical model of the lactation curve, a Robust Statistics technique was used. Its application on some sample lactations and the results obtained allow us to be confident about the use of this method in the future.
Resumo:
High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates a XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.