Missing data mechanisms and their implications on the analysis of categorical data


Author(s): POLETO, Frederico Z.; SINGER, Julio M.; PAULINO, Carlos Daniel
Contributor(s)

UNIVERSIDADE DE SÃO PAULO

Date(s)

20/10/2012

2011

Abstract

We review some issues related to the implications of different missing data mechanisms for statistical inference on contingency tables, and we use simulation studies to compare the results obtained under such models with those obtained when the units with missing data are disregarded. We confirm that, although analyses under the correct missing at random (MAR) and missing completely at random (MCAR) models are in general more efficient, even for small sample sizes, there are exceptions in which they may not improve on the results obtained by ignoring the partially classified data. We show that under the missing not at random (MNAR) model, estimates on the boundary of the parameter space, as well as lack of identifiability of the parameters of saturated models, may be associated with undesirable asymptotic properties of maximum likelihood estimators and likelihood ratio tests; even in standard cases, the bias of the estimators may be low only for very large samples. We also show that the probability of a boundary solution under the correct MNAR model may be large even for large samples and that, consequently, we cannot always conclude that an MNAR model is misspecified because the estimate lies on the boundary of the parameter space.
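The abstract contrasts analyses under a MAR model with analyses that disregard partially classified units. As a minimal illustrative sketch (not the authors' code or simulation design), assume two categorical variables X and Y under multinomial sampling, with X always observed, Y missing for some units, and missingness allowed to depend on X only (MAR). Under these assumptions the likelihood factors into a marginal for X and a conditional for Y given X, so the MAR maximum likelihood estimate has the closed form pi_ij = (N_i / N) * (n_ij / n_i+), where N_i counts all units with X = i and n_ij counts the fully classified ones. The function names `mar_mle` and `complete_case` and the counts in the example are hypothetical.

```python
def mar_mle(complete, x_only):
    """Closed-form MLE of the joint cell probabilities under MAR.

    Assumes a monotone pattern: X is always observed, Y may be missing,
    and missingness depends on X only.
    complete[i][j]: units fully classified with X = i, Y = j.
    x_only[i]:      units with X = i whose Y is missing.
    """
    rows = len(complete)
    row_complete = [sum(row) for row in complete]            # n_{i+}
    row_total = [row_complete[i] + x_only[i] for i in range(rows)]  # N_i
    total = sum(row_total)                                    # N
    # Marginal of X uses every unit; conditional of Y given X uses complete cases.
    return [
        [(row_total[i] / total) * (n_ij / row_complete[i]) for n_ij in complete[i]]
        for i in range(rows)
    ]


def complete_case(complete):
    """Cell proportions after discarding every partially classified unit."""
    n = sum(sum(row) for row in complete)
    return [[n_ij / n for n_ij in row] for row in complete]


if __name__ == "__main__":
    complete = [[40, 10], [20, 30]]  # hypothetical fully classified counts
    x_only = [0, 50]                 # hypothetical: Y missing only when X = 1
    print("MAR MLE:      ", mar_mle(complete, x_only))
    print("Complete-case:", complete_case(complete))
```

With these hypothetical counts, all units with missing Y have X = 1, so the complete-case estimate of P(X = 0) is 0.5, whereas the MAR estimate, which uses the 50 partially classified units, is 1/3.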

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil

Fundação para a Ciência e a Tecnologia (FCT), Portugal, through CEAUL-FCUL

Identifier(s)

STATISTICS AND COMPUTING, v.21, n.1, p.31-43, 2011

ISSN 0960-3174

http://producao.usp.br/handle/BDPI/30457

10.1007/s11222-009-9143-x

http://dx.doi.org/10.1007/s11222-009-9143-x

Language(s)

English

Publisher

SPRINGER

Relation

Statistics and Computing

Rights

restrictedAccess

Copyright SPRINGER

Keywords #Categorical data #Missing or incomplete data #MAR, MCAR and MNAR #Ignorable and non-ignorable mechanism #Selection models #NON-IGNORABLE NONRESPONSE #LOG-LINEAR MODELS #NONIGNORABLE NONRESPONSE #CONTINGENCY-TABLES #SENSITIVITY-ANALYSIS #INFERENCE #SUBJECT #IDENTIFIABILITY #INFORMATION #REGRESSION #Computer Science, Theory & Methods #Statistics & Probability

Type

article

original article

publishedVersion