A principled experimental design approach to Big Data analysis


Author(s): Drovandi, Christopher C.; Holmes, Christopher; McGree, James; Mengersen, Kerrie; Richardson, Sylvia; Ryan, Elizabeth
Date(s)

2015

Abstract

Big Datasets are endemic, but they are often notoriously difficult to analyse because of their size, heterogeneity, history and quality. The purpose of this paper is to open a discourse on the use of modern experimental design methods to analyse Big Data in order to answer particular questions of interest. By appealing to a range of examples, it is suggested that this perspective on Big Data modelling and analysis has wide generality and advantageous inferential and computational properties. In particular, the principled experimental design approach is shown to provide a flexible framework for analysis that, for certain classes of objectives and utility functions, delivers near equivalent answers compared with analyses of the full dataset under a controlled error rate. It can also provide a formalised method for iterative parameter estimation, model checking, identification of data gaps and evaluation of data quality. Finally, it has the potential to add value to other Big Data sampling algorithms, in particular divide-and-conquer strategies, by determining efficient sub-samples.
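The abstract's sub-sampling idea can be illustrated with a small sketch. This is not the authors' algorithm, just a minimal hedged example of one classical design criterion (greedy D-optimal selection for a linear model) applied to a synthetic dataset: a designed subsample of 50 points yields nearly the same least-squares fit as analysing all 10,000 points. All data and names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "big" dataset: 10,000 candidate rows for a 3-coefficient
# linear model (illustrative data, not from the paper).
X_full = rng.normal(size=(10_000, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y_full = X_full @ beta_true + rng.normal(scale=0.1, size=10_000)

def greedy_d_optimal(X, n_sub):
    """Greedily choose rows maximising det(X_s^T X_s), a D-optimality criterion."""
    d = X.shape[1]
    M = 1e-8 * np.eye(d)                  # tiny ridge keeps early determinants non-degenerate
    mask = np.ones(len(X), dtype=bool)
    chosen = []
    for _ in range(n_sub):
        cand = np.flatnonzero(mask)
        # Batched determinant of M + x x^T over every remaining candidate row x.
        gains = np.linalg.det(M[None, :, :] + X[cand, :, None] * X[cand, None, :])
        best = cand[np.argmax(gains)]
        chosen.append(best)
        M += np.outer(X[best], X[best])
        mask[best] = False
    return np.array(chosen)

# Fit on the designed subsample versus the full dataset.
idx = greedy_d_optimal(X_full, n_sub=50)
beta_sub = np.linalg.lstsq(X_full[idx], y_full[idx], rcond=None)[0]
beta_full = np.linalg.lstsq(X_full, y_full, rcond=None)[0]
```

With only 0.5% of the rows, `beta_sub` closely matches `beta_full`, mirroring the paper's claim that, for certain objectives and utility functions, design-based sub-sampling delivers near-equivalent answers to a full-data analysis.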

Format

application/pdf

Identifier

http://eprints.qut.edu.au/87946/

Relation

http://eprints.qut.edu.au/87946/8/87946.pdf

Drovandi, Christopher C., Holmes, Christopher, McGree, James, Mengersen, Kerrie, Richardson, Sylvia, & Ryan, Elizabeth (2015) A principled experimental design approach to Big Data analysis. [Working Paper] (Unpublished)

Rights

Copyright 2015 The Author(s)

Source

Science & Engineering Faculty

Keywords #010400 STATISTICS #Big Data #Sub-sampling #Experimental design #Active learning #Dimension reduction #Subset
Type

Working Paper