923 results for data publishing
Abstract:
Personal information is increasingly gathered and used to provide services tailored to user preferences, but the datasets used to provide such functionality can represent serious privacy threats if not appropriately protected. Work in privacy-preserving data publishing has targeted privacy guarantees that protect against record re-identification, by making records indistinguishable, or against sensitive attribute value disclosure, by introducing diversity or noise into the sensitive values. However, most approaches fail in the high-dimensional case, and those that do not fail incur a utility cost incompatible with tailored recommendation scenarios. This paper aims at a sensible trade-off between privacy and the benefits of tailored recommendations, in the context of privacy-preserving data publishing. We empirically demonstrate that significant privacy improvements can be achieved at a utility cost compatible with tailored recommendation scenarios, using a simple partition-based sanitization method.
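The abstract does not specify the partition-based method beyond calling it simple; as a minimal sketch of one common partition-based sanitization scheme (Mondrian-style recursive splitting into groups of at least k records, with invented data and an invented k, not the paper's actual algorithm):

    # Minimal Mondrian-style partitioner: recursively split records on the
    # widest attribute until no split keeps both halves at size >= k, then
    # generalize each attribute to its value range within the partition.
    # The data and k below are illustrative assumptions, not from the paper.

    def partition(records, k):
        """Recursively split `records` (equal-length numeric tuples)."""
        n_attrs = len(records[0])
        # Try attributes from widest to narrowest value range.
        spans = [max(r[a] for r in records) - min(r[a] for r in records)
                 for a in range(n_attrs)]
        for attr in sorted(range(n_attrs), key=lambda a: -spans[a]):
            values = sorted(r[attr] for r in records)
            median = values[len(values) // 2]
            left = [r for r in records if r[attr] < median]
            right = [r for r in records if r[attr] >= median]
            if len(left) >= k and len(right) >= k:  # both halves stay size >= k
                return partition(left, k) + partition(right, k)
        return [records]  # no allowable split: this group is a leaf

    def generalize(partitions):
        """Replace each value by its (min, max) range inside its group."""
        out = []
        for group in partitions:
            ranges = [(min(r[a] for r in group), max(r[a] for r in group))
                      for a in range(len(group[0]))]
            out.extend([ranges] * len(group))
        return out

    if __name__ == "__main__":
        data = [(25, 50000), (27, 52000), (33, 61000), (35, 90000),
                (41, 40000), (44, 45000), (52, 80000), (58, 82000)]
        for row in generalize(partition(data, k=2)):
            print(row)

Each published record then carries only the value ranges of its group, which is precisely where utility is traded for indistinguishability.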
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Publishing Linked Data; SPARQL Graph Store Protocol; Linked Data Platform; Reflection on Data Publishing
Abstract:
From 1983 through mid-1991, more than 200,000 MEDLINE entries were AIDS-related. Close to 60% of the journals indexed in MEDLINE published at least one article on AIDS during the past ten years. As reflected by a subset of 29,077 MEDLINE records, the literature of AIDS has grown to encompass 29 languages and 65 countries. A bibliometric study of the medical literature helps to demonstrate the progression of AIDS as a world health problem and the concomitant expansion of the research effort underway to control it.
Abstract:
We present some recent trends in the field of digital cultural heritage management and applications, including digital cultural data curation, interoperability, open linked data publishing, crowdsourcing, visualization, platforms for digital cultural heritage, and applications. We also present examples from research and development projects of MUSIC/TUC in these areas.
Abstract:
The World Wide Web Consortium, W3C, is known for standards like HTML and CSS, but there's a lot more to it than that: mobile, automotive, publishing, graphics, TV and more. Then there are horizontal issues like privacy, security, accessibility and internationalisation. Many of these assume that there is an underlying data infrastructure to power applications. In this session, W3C's Data Activity Lead, Phil Archer, will describe the overall vision for better use of the Web as a platform for sharing data and how that translates into recent, current and possible future work. What's the difference between using the Web as a data platform and as a glorified USB stick? Why does it matter? And what makes a standard a standard anyway?

Speaker Biography: Phil Archer is Data Activity Lead at W3C, the industry standards body for the World Wide Web, coordinating W3C's work in the Semantic Web and related technologies. He is most closely involved in the Data on the Web Best Practices, Permissions and Obligations Expression and Spatial Data on the Web Working Groups. His key themes are interoperability through common terminology and URI persistence. As well as his work at the W3C, his career has encompassed broadcasting, teaching, linked data publishing, copy writing, and, perhaps incongruously, countryside conservation. The common thread throughout has been a knack for communication, particularly communicating complex technical ideas to a more general audience.
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Big Decisions and Sparse Data: Adapting Scientific Publishing to the Needs of Practical Conservation
Abstract:
The biggest challenge in conservation biology is bridging the gap between research and practical management. A major obstacle is the fact that many researchers are unwilling to tackle projects likely to produce sparse or messy data because the results would be difficult to publish in refereed journals. The obvious solution to sparse data is to build up results from multiple studies. Consequently, we suggest that there needs to be greater emphasis in conservation biology on publishing papers that can be built on by subsequent research rather than on papers that produce clear results individually. This building approach requires: (1) a stronger theoretical framework, in which researchers attempt to anticipate models that will be relevant in future studies and incorporate expected differences among studies into those models; (2) use of modern methods for model selection and multi-model inference, and publication of parameter estimates under a range of plausible models; (3) explicit incorporation of prior information into each case study; and (4) planning management treatments in an adaptive framework that considers treatments applied in other studies. We encourage journals to publish papers that promote this building approach rather than expecting papers to conform to traditional standards of rigor as stand-alone papers, and believe that this shift in publishing philosophy would better encourage researchers to tackle the most urgent conservation problems.
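The "modern methods for model selection and multi-model inference" invoked in point (2) usually mean information-theoretic weighting; a minimal sketch of Akaike weights (the model names and AIC values are invented placeholders, not results from any study):

    import math

    # Akaike weights: given AIC scores for a set of candidate models, compute
    # the relative weight of evidence for each. Such weights support
    # model-averaged parameter estimates across sparse individual studies.

    def akaike_weights(aics):
        best = min(aics)
        deltas = [a - best for a in aics]          # delta-AIC per model
        rel = [math.exp(-d / 2.0) for d in deltas]  # relative likelihoods
        total = sum(rel)
        return [r / total for r in rel]

    aics = {"constant survival": 212.4,
            "habitat effect": 208.1,
            "habitat + year": 209.3}
    for (name, aic), w in zip(aics.items(), akaike_weights(list(aics.values()))):
        print(f"{name}: AIC={aic}, weight={w:.3f}")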
Abstract:
Traditionally, the formal scientific output in most fields of natural science has been limited to peer-reviewed academic journal publications, with less attention paid to the chain of intermediate data results and their associated metadata, including provenance. In effect, this has constrained the representation and verification of data provenance to the confines of the related publications. Detailed knowledge of a dataset's provenance is essential to establish the pedigree of the data for its effective re-use, and to avoid redundant re-enactment of the experiment or computation involved. Determining the authenticity and quality of open-access data is increasingly important, especially considering the growing volumes of datasets appearing in the public domain. To address these issues, we present an approach that combines the Digital Object Identifier (DOI) – a widely adopted citation technique – with existing, widely adopted climate science data standards to formally publish detailed provenance of a climate research dataset as an associated scientific workflow. This is integrated with linked-data compliant data re-use standards (e.g. OAI-ORE) to enable a seamless link between a publication and the complete trail of lineage of the corresponding dataset, including the dataset itself.
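As an illustration of the kind of linking the authors describe, a minimal rdflib sketch of an OAI-ORE resource map tying a publication DOI to its dataset and workflow might look as follows; every URI here is a placeholder, not an identifier from the paper:

    from rdflib import Graph, Namespace, URIRef, RDF

    # Sketch of an OAI-ORE aggregation linking a publication DOI to the
    # dataset it describes and the workflow that produced the dataset.
    # All URIs below are hypothetical placeholders.
    ORE = Namespace("http://www.openarchives.org/ore/terms/")
    DCT = Namespace("http://purl.org/dc/terms/")

    g = Graph()
    g.bind("ore", ORE)
    g.bind("dcterms", DCT)

    aggregation = URIRef("http://example.org/aggregation/climate-run-42")
    paper = URIRef("https://doi.org/10.0000/example.paper")      # placeholder DOI
    dataset = URIRef("https://doi.org/10.0000/example.dataset")  # placeholder DOI
    workflow = URIRef("http://example.org/workflow/climate-run-42")

    g.add((aggregation, RDF.type, ORE.Aggregation))
    g.add((aggregation, ORE.aggregates, paper))
    g.add((aggregation, ORE.aggregates, dataset))
    g.add((aggregation, ORE.aggregates, workflow))
    g.add((dataset, DCT.provenance, workflow))  # the dataset's lineage

    print(g.serialize(format="turtle"))

A consumer dereferencing the publication's DOI can then follow the aggregation to the dataset and its full provenance trail.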
Abstract:
In this work we introduce a new index for evaluating research products: the multidisciplinarity index. This new metric can be an interesting evaluation parameter: the landscape of multidisciplinary studies is vast and heterogeneous, and working within it demands genuinely transversal skills. The metrics currently adopted for evaluating an academic, a journal, or a conference do not account for these intermediate situations, limiting their assessment of impact to a simple count of citations received. The result of such an evaluation is a research impact value with no indication of its direction or of its relevance in the context of other disciplines. The proposed multidisciplinarity index would therefore complement the current landscape of research evaluation metrics, offering, alongside a quantification of impact, a quantification of the variety of disciplinary contexts in which the research is situated.
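The abstract does not define the index mathematically; purely as an illustrative assumption, one natural way to quantify "variety of disciplinary contexts" is normalized Shannon entropy over the disciplines of citing works:

    import math
    from collections import Counter

    # Illustrative stand-in for the multidisciplinarity index, whose exact
    # definition is not given in the abstract: Shannon entropy over the
    # disciplines of citing papers, normalized to [0, 1].

    def multidisciplinarity(citing_disciplines):
        counts = Counter(citing_disciplines)
        if len(counts) < 2:
            return 0.0  # all citations come from a single discipline
        total = sum(counts.values())
        probs = [c / total for c in counts.values()]
        entropy = -sum(p * math.log(p) for p in probs)
        return entropy / math.log(len(counts))  # 1.0 = evenly spread

    print(multidisciplinarity(
        ["biology", "cs", "cs", "physics", "biology", "cs"]))  # ~0.92

Unlike a raw citation count, such a measure says nothing about volume of impact and everything about its disciplinary spread, which is why the two would complement each other.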
Introduction to the data library PANGAEA - Publishing Network for Geoscientific & Environmental Data
Abstract:
In recent years, a variety of systems have been developed that export the workflows used to analyze data and make them part of published articles. We argue that the workflows published in current approaches are dependent on the specific codes used for execution, the specific workflow system used, and the specific workflow catalogs where they are published. In this paper, we describe a new approach that addresses these shortcomings and makes workflows more reusable through: 1) the use of abstract workflows to complement executable workflows, making them reusable when the execution environment is different; 2) the publication of both abstract and executable workflows using standards such as the Open Provenance Model, so that they can be imported by other workflow systems; 3) the publication of workflows as Linked Data, resulting in open, web-accessible workflow repositories. We illustrate this approach using a complex workflow that we re-created from an influential publication that describes the generation of 'drugomes'.
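The abstract/executable split in point 1) can be pictured as a mapping from environment-independent steps to concrete tool invocations; the sketch below uses invented step and command names and is not the authors' system:

    from dataclasses import dataclass

    # Sketch of the abstract/executable workflow split: an abstract step
    # records what is done, a binding records how one environment does it.
    # Step names, file names, and commands are invented for illustration.

    @dataclass
    class AbstractStep:
        name: str      # environment-independent description of the step
        inputs: tuple
        outputs: tuple

    @dataclass
    class Binding:
        step: AbstractStep
        command: str   # concrete, environment-specific invocation

    abstract_wf = [
        AbstractStep("align_sequences", ("proteins.fasta",), ("alignment.aln",)),
        AbstractStep("build_tree", ("alignment.aln",), ("tree.nwk",)),
    ]

    # Re-binding the same abstract workflow for a different environment
    # means swapping only the commands below, not the workflow itself.
    executable_wf = [
        Binding(abstract_wf[0], "clustalo -i proteins.fasta -o alignment.aln"),
        Binding(abstract_wf[1], "fasttree alignment.aln > tree.nwk"),
    ]

    for b in executable_wf:
        print(f"{b.step.name}: {b.command}")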
Abstract:
Publishing Linked Data is a process that involves several design decisions and technologies. Although some initial guidelines have already been provided by Linked Data publishers, these are still far from covering all the steps that are necessary (from data source selection to publication) or giving enough details about all these steps, technologies, intermediate products, etc. Furthermore, given the variety of data sources from which Linked Data can be generated, we believe that it is possible to have a single and unified method for publishing Linked Data, but we should rely on different techniques, technologies and tools for particular datasets of a given domain. In this paper we present a general method for publishing Linked Data and the application of the method to cover different sources from different domains.
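One concrete step any such method must cover is mapping a tabular source to RDF; a minimal rdflib sketch follows, with column names and vocabulary invented for illustration:

    import csv, io
    from rdflib import Graph, Literal, Namespace, URIRef, RDF

    # One step of a Linked Data publishing pipeline: convert a tabular
    # source into RDF triples under a chosen vocabulary. The columns,
    # namespace, and example rows are invented placeholders.
    EX = Namespace("http://example.org/schema/")

    source = io.StringIO(
        "id,name,city\n1,Museo del Prado,Madrid\n2,Reina Sofia,Madrid\n")
    g = Graph()
    g.bind("ex", EX)

    for row in csv.DictReader(source):
        subject = URIRef(f"http://example.org/museum/{row['id']}")
        g.add((subject, RDF.type, EX.Museum))
        g.add((subject, EX.name, Literal(row["name"])))
        g.add((subject, EX.city, Literal(row["city"])))

    print(g.serialize(format="turtle"))

Which vocabulary to map onto, and how to mint the subject URIs, are exactly the domain-specific decisions the paper's general method would have to parameterize.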
Abstract:
The Linked Data initiative offers a straightforward method to publish structured data on the World Wide Web and link it to other data, resulting in a worldwide network of semantically codified data known as the Linked Open Data cloud. The size of the Linked Open Data cloud, i.e. the amount of data published using Linked Data principles, is growing exponentially, including life sciences data. However, key information for biological research is still missing from the Linked Open Data cloud. For example, the relation between orthologous genes and genetic diseases is absent, even though such information can be used for hypothesis generation regarding human diseases. The OGOLOD system, an extension of the OGO Knowledge Base, publishes ortholog/disease information as Linked Data. This gives scientists the ability to query the structured information in connection with other Linked Data and to discover new information related to orthologs and human diseases in the cloud.
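Querying such Linked Data typically goes through a SPARQL endpoint; the following SPARQLWrapper sketch uses a placeholder endpoint URL and invented predicates, since the abstract does not give OGOLOD's actual schema:

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Sketch of querying an ortholog/disease Linked Data endpoint. The
    # endpoint URL and predicate names are placeholders, not OGOLOD's
    # actual vocabulary.
    sparql = SPARQLWrapper("http://example.org/ogolod/sparql")
    sparql.setReturnFormat(JSON)
    sparql.setQuery("""
        PREFIX ex: <http://example.org/ogolod/>
        SELECT ?gene ?disease WHERE {
            ?gene ex:orthologousTo ?human_gene .
            ?human_gene ex:associatedWith ?disease .
        } LIMIT 10
    """)
    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["gene"]["value"], "->", row["disease"]["value"])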