Complete imputation of missing repeated categorical data

One-sample applications

Colin Patrick West, Jeffrey D. Dawson

Research output: Contribution to journal, Article

3 Citations (Scopus)

Abstract

Longitudinal studies with repeated measures are often subject to non-response. Methods currently employed to alleviate the difficulties caused by missing data are typically unsatisfactory, especially when the cause of the missingness is related to the outcomes. We present an approach for incomplete categorical data in the repeated measures setting that allows missing data to depend on other observed outcomes for a study subject. The proposed methodology also allows a broader examination of study findings through interpretation of results in the framework of the set of all possible test statistics that might have been observed had no data been missing. The proposed approach consists of the following general steps. First, we generate all possible sets of missing values and form a set of possible complete data sets. We then weight each data set according to clearly defined assumptions and apply an appropriate statistical test procedure to each data set, combining the results to give an overall indication of significance. We make use of the EM algorithm and a Bayesian prior in this approach. While not restricted to the one-sample case, the proposed methodology is illustrated for one-sample data and compared to the common complete-case and available-case analysis methods.
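The general steps in the abstract (enumerate every possible set of missing values, weight each completed data set, test each one, combine the results) can be sketched for a small binary example. This is only an illustration under stated assumptions, not the authors' procedure: the weights here come from a single hypothetical fill probability `p_fill` rather than from the EM algorithm and Bayesian prior used in the paper, the exact McNemar test is one possible choice of "appropriate statistical test", and the weighted mean p-value is an assumed way of combining results. The function names and the toy data are likewise hypothetical.

```python
from itertools import product
from math import comb


def mcnemar_exact_p(pairs):
    """Two-sided exact McNemar p-value for a list of (t1, t2) binary pairs."""
    b = sum(1 for x, y in pairs if x == 1 and y == 0)  # changed 1 -> 0
    c = sum(1 for x, y in pairs if x == 0 and y == 1)  # changed 0 -> 1
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of change
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def complete_imputation(data, p_fill):
    """Enumerate every completion of the missing (None) cells.

    data   : list of (t1, t2) pairs with values 0, 1, or None
    p_fill : ASSUMED probability that a missing cell equals 1
             (a stand-in for the paper's EM/Bayesian weighting)
    Returns a list of (weight, p_value), one per completed data set.
    """
    missing = [(i, j) for i, row in enumerate(data)
               for j, v in enumerate(row) if v is None]
    results = []
    for fill in product([0, 1], repeat=len(missing)):
        completed = [list(row) for row in data]
        weight = 1.0
        for (i, j), v in zip(missing, fill):
            completed[i][j] = v
            weight *= p_fill if v == 1 else 1 - p_fill
        p_value = mcnemar_exact_p([tuple(r) for r in completed])
        results.append((weight, p_value))
    return results


# Toy data: seven subjects, two visits, two missing cells (None).
data = [(1, 0), (1, 0), (1, 0), (0, 1), (1, None), (None, 0), (1, 1)]
results = complete_imputation(data, p_fill=0.5)

# Assumed combination rule: weighted mean of the p-values.
overall = sum(w * p for w, p in results)
p_values = [p for _, p in results]
print("completed data sets:", len(results))
print("range of p-values:", min(p_values), "to", max(p_values))
print("weighted overall p-value:", overall)
```

Reporting the range of p-values alongside the weighted summary reflects the abstract's point that results can be read against the set of all test statistics that might have been observed had no data been missing. The number of completions grows exponentially with the number of missing cells, so full enumeration of this kind is practical only for small amounts of missing data.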

Original language: English (US)
Pages (from-to): 203-217
Number of pages: 15
Journal: Statistics in Medicine
Volume: 21
Issue number: 2
DOI: 10.1002/sim.982
State: Published - Jan 30 2002

Keywords

  • EM algorithm
  • Incomplete categorical data
  • Missing data in longitudinal studies
  • Pattern of missingness

ASJC Scopus subject areas

  • Epidemiology

Cite this

Complete imputation of missing repeated categorical data: One-sample applications. / West, Colin Patrick; Dawson, Jeffrey D.

In: Statistics in Medicine, Vol. 21, No. 2, 30.01.2002, p. 203-217.

West, Colin Patrick; Dawson, Jeffrey D. / Complete imputation of missing repeated categorical data: One-sample applications. In: Statistics in Medicine. 2002; Vol. 21, No. 2. pp. 203-217.
@article{ef0856f9687344939663ea9167369f7b,
title = "Complete imputation of missing repeated categorical data: One-sample applications",
keywords = "EM algorithm, Incomplete categorical data, Missing data in longitudinal studies, Pattern of missingness",
author = "West, {Colin Patrick} and Dawson, {Jeffrey D.}",
year = "2002",
month = "1",
day = "30",
doi = "10.1002/sim.982",
language = "English (US)",
volume = "21",
pages = "203--217",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "2",

}

TY - JOUR

T1 - Complete imputation of missing repeated categorical data

T2 - One-sample applications

AU - West, Colin Patrick

AU - Dawson, Jeffrey D.

PY - 2002/1/30

Y1 - 2002/1/30

KW - EM algorithm

KW - Incomplete categorical data

KW - Missing data in longitudinal studies

KW - Pattern of missingness

UR - http://www.scopus.com/inward/record.url?scp=0037196216&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0037196216&partnerID=8YFLogxK

U2 - 10.1002/sim.982

DO - 10.1002/sim.982

M3 - Article

VL - 21

SP - 203

EP - 217

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

IS - 2

ER -