OpenDataHub: an open dataset management system


Author(s): Gonçalves, Orêncio Rodolfo Abreu
Contributor(s)

Nunes, Duarte Nuno Jardim

Quintal, Filipe

Pereira, Lucas

Date(s)

04/01/2017

03/02/2017

01/06/2016

Abstract

This thesis presents a cloud-based software platform for sharing publicly available scientific datasets. The proposed platform leverages NoSQL databases and asynchronous I/O technologies, such as Node.js, to achieve high performance and flexibility. The platform serves two main groups of users: dataset providers, the researchers responsible for sharing and maintaining datasets, and dataset users, who want to access the public data. The former are given tools to easily publish and maintain large volumes of data, whereas the latter are given tools to preview the original data and create subsets of it by applying filter and aggregation operations. The choice of NoSQL over a more traditional RDBMS emerged from an extended benchmark between a relational database (MySQL) and a NoSQL database (MongoDB), which is also presented in this thesis. The results confirm the theoretical expectation that NoSQL databases are better suited to the kind of data our system's users will handle, i.e., non-homogeneous data structures that can grow very quickly. It is envisioned that a platform like this can lead the way to a new era of scientific data sharing, in which researchers can easily share and access all kinds of datasets and, in more advanced scenarios, be presented with recommended datasets and existing research results built on those recommendations.
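The abstract describes users creating subsets of a dataset through filter and aggregation operations over non-homogeneous records. The following is a minimal in-memory sketch of that idea, not the thesis's implementation: it filters schema-less records and then averages one field per group, conceptually similar to a MongoDB `$match` stage followed by a `$group` stage. The function name `make_subset` and the sample sensor fields are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable

Record = Dict[str, Any]


def make_subset(
    records: Iterable[Record],
    predicate: Callable[[Record], bool],
    group_key: str,
    value_field: str,
) -> Dict[Any, float]:
    """Hypothetical helper: filter records, then average value_field per group_key."""
    sums: Dict[Any, float] = defaultdict(float)
    counts: Dict[Any, int] = defaultdict(int)
    for rec in records:
        # Records are schema-less, so skip any that lack the fields we need.
        if group_key not in rec or value_field not in rec:
            continue
        # Filter step (analogous to a $match stage).
        if not predicate(rec):
            continue
        # Aggregation step (analogous to a $group stage computing an average).
        sums[rec[group_key]] += rec[value_field]
        counts[rec[group_key]] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical, non-homogeneous sample records: fields vary from record to record.
readings = [
    {"sensor": "A", "power": 120.0, "room": "lab"},
    {"sensor": "A", "power": 80.0},
    {"sensor": "B", "power": 50.0, "voltage": 230},
    {"sensor": "B", "voltage": 231},  # no "power" field, so it is skipped
    {"sensor": "C", "power": 10.0},   # filtered out by the predicate below
]

subset = make_subset(readings, lambda r: r["power"] > 20, "sensor", "power")
print(subset)  # {'A': 100.0, 'B': 50.0}
```

Skipping records that lack a required field, rather than failing, reflects the heterogeneous data the abstract mentions; a real system would more likely push both steps down to the database instead of iterating in application code.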

Identifier

http://hdl.handle.net/10400.13/1322

201329344

Language(s)

eng

Rights

embargoedAccess

Keywords #Computer science and technology #Engineering #Platforms #Computing #Data #Dataset #Management system #OpenDataHub #Computer languages #LCAB #MySQL #MongoDB #Software #Benchmark #Informatics Engineering #Computer Science #Faculdade de Ciências Exatas e da Engenharia #Scientific Domain/Area::Engineering and Technology::Electrical, Electronic and Computer Engineering

Type

masterThesis