Imputation in data fusion of heterogeneous data sets: a model-based numerical experiment


Author(s): Berchtold Andre; Jeannin Andre
Date(s)

2008

Abstract

Given the very large amount of data obtained every day through population surveys, much new research could reuse this information instead of collecting new samples. Unfortunately, relevant data are often scattered across different files obtained through different sampling designs. Data fusion is a set of methods used to combine information from different sources into a single dataset. In this article, we are interested in a specific problem: the fusion of two data files, one of which is quite small. We propose a model-based procedure combining a logistic regression with an Expectation-Maximization algorithm. Results show that despite the lack of data, this procedure can perform better than standard matching procedures.

Identifier

http://serval.unil.ch/?id=serval:BIB_42AD4B036A5C

isbn:0361-0918

doi:10.1080/03610910802203295

isiid:000258267900005

Language(s)

en

Source

Communications In Statistics-Simulation and Computation, vol. 37, no. 7, pp. 1316-1328

Keywords

binary variable; data fusion; data structure; Expectation-Maximization algorithm; logistic regression; matching; multiple imputation

Type

info:eu-repo/semantics/article

article