Save Consumers Time and Money: Thou Shall Not Forget Digital Native Big Data Consumers


Author(s): Arguillas, Florio Orocio
Date(s)

03/07/2014

03/07/2014

10/06/2014

Abstract

Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Posters, Demos and Developer "How-To's"

To accommodate all types of data consumers, the Census Bureau (CB) distributed its 9,060-variable Census 2010 Summary File 1 (SF1) tables across 49 segment files, each holding no more than 256 variables so that no file would exceed the column limit of older-generation spreadsheets. Because the data were provided only in segments and not as full datasets, digital native big data consumers in the U.S. and around the world, who have the technical and logistical capacity to process big data, must handle the summary files in the same way as every other data consumer. Cumulatively, this amounts to thousands of person-hours spent on the multi-step process of preparing and merging the segments to extract the needed information. These costs could have been avoided had the CB, or the repositories distributing the SF1, also made full datasets available as single files with big data consumers in mind. This is precisely what the CISER Data Archive implemented when it found a repository niche: making full SF1 datasets available to big data consumers for free, as easy, one-click downloads.
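The merging step that every consumer of the segmented release must repeat can be sketched as follows. This is a minimal illustration, not the CISER Data Archive's actual procedure. It assumes, based on our reading of the SF1 layout (verify against the Census Bureau's technical documentation), that each segment row begins with five identifier fields (FILEID, STUSAB, CHARITER, CIFSN, LOGRECNO) and that LOGRECNO links the rows for one geography across segments. The function name `merge_segments` and the sample rows are hypothetical, and joining the geographic header file is omitted.

```python
import csv
from typing import Dict, Iterable, List

# ASSUMPTION: assumed SF1 segment layout; check the official documentation.
# Each row starts with FILEID, STUSAB, CHARITER, CIFSN, LOGRECNO, followed by
# the table variables carried in that segment.
ID_FIELDS = 5
LOGRECNO = 4  # position of the logical record number used as the join key


def merge_segments(segments: Iterable[Iterable[str]]) -> Dict[str, List[str]]:
    """Join segment files on LOGRECNO into one wide row per logical record.

    `segments` yields the CSV lines of each segment in segment order, so the
    variables of segment 1 come first, then those of segment 2, and so on.
    (Hypothetical helper for illustration only.)
    """
    merged: Dict[str, List[str]] = {}
    width = ID_FIELDS  # expected row width after each segment is appended
    for seg_no, lines in enumerate(segments):
        seg_width = None
        for row in csv.reader(lines):
            key = row[LOGRECNO]
            values = row[ID_FIELDS:]
            if seg_width is None:
                seg_width = len(values)
            if seg_no == 0:
                # Keep the identifier fields once, taken from the first segment.
                merged[key] = row
            elif key in merged:
                merged[key].extend(values)
            else:
                raise KeyError(f"LOGRECNO {key} not in first segment")
        width += seg_width or 0
        # Every record must appear in every segment, or columns would misalign.
        short = [k for k, r in merged.items() if len(r) != width]
        if short:
            raise ValueError(f"records missing from segment {seg_no + 1}: {short}")
    return merged


if __name__ == "__main__":
    # Made-up toy rows; real segments carry up to 256 variables each.
    seg1 = ["SF1ST,NY,000,01,0000001,100", "SF1ST,NY,000,01,0000002,250"]
    seg2 = ["SF1ST,NY,000,02,0000001,48,52", "SF1ST,NY,000,02,0000002,120,130"]
    rows = merge_segments([seg1, seg2])
    print(rows["0000001"])
    # ['SF1ST', 'NY', '000', '01', '0000001', '100', '48', '52']
```

Repeating this join for all 49 segments, for each state and geographic level needed, is the kind of per-user effort that a single full-dataset download avoids.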

Identifier

http://www.doria.fi/handle/10024/97565

URN:NBN:fi-fe2014070432201

Language(s)

en

Relation

Poster Reception

Open Repositories 2014

Cornell University, United States of America

Keywords

#Repository #Niche #Big Data #SF1 #CISER

Type

Poster