The absence of longitudinal data limits the accuracy of high-throughput clinical phenotyping for identifying type 2 diabetes mellitus subjects

Wei Qi Wei, Cynthia L. Leibson, Jeanine E. Ransom, Abel N. Kho, Christopher G. Chute

Research output: Contribution to journal › Article

21 Citations (Scopus)

Abstract

Purpose: To evaluate the impact of insufficient longitudinal data on the accuracy of a high-throughput clinical phenotyping (HTCP) algorithm for identifying (1) patients with type 2 diabetes mellitus (T2DM) and (2) patients with no diabetes.

Methods: Retrospective study conducted at Mayo Clinic in Rochester, Minnesota. Eligible subjects were Olmsted County residents with ≥1 Mayo Clinic encounter in each of three time periods: (1) 2007, (2) 1997 through 2006, and (3) before 1997 (N = 54,283). Diabetes-relevant electronic medical record (EMR) data on diagnoses, laboratory results, and medications were used. We applied the HTCP algorithm to categorize individuals as T2DM cases or non-diabetes controls. Treating the categorization based on the full 11 years of data (1997–2007) as the gold standard, we compared it with categorizations based on 10 successively shorter intervals, ranging from 1998–2007 (10-year data) to 2007 alone (1-year data). Positive predictive values (PPVs) and false-negative rates (FNRs) were calculated. McNemar tests were used to determine whether categorizations using shorter time periods differed from the gold standard. Statistical significance was defined as P < 0.05.

Results: We identified 2770 T2DM cases and 21,005 controls when the algorithm was applied to the 11-year data. Using 2007 data alone, PPVs and FNRs were, respectively, 70% and 25% for case identification and 59% and 67% for control identification. All time frames differed significantly from the gold standard except the 10-year period.

Conclusions: The accuracy of the algorithm decreased markedly as data were limited to shorter observation periods. This effect should be considered carefully when designing and executing HTCP algorithms.
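The evaluation design in the Methods (score each shortened-window categorization against the 11-year one with PPV, FNR and a McNemar test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each categorization is assumed to be a dict mapping a subject ID to a label ("case", "control", or any other value for uncategorized), and McNemar's test is assumed to be applied to the paired binary indicator "labeled as this category" under the two windows. The abstract does not say whether a continuity correction was used, so it is a parameter here. The `restrict_to_window` helper and its (patient_id, year, item) record shape are hypothetical stand-ins for the EMR extract; the HTCP algorithm itself is not reproduced.

```python
import math
from collections.abc import Iterable, Mapping

# Hypothetical label values; any other label means "not categorized".
CASE, CONTROL = "case", "control"

# The 10 shortened windows from the abstract: 1998-2007 (10 years) ... 2007 (1 year).
WINDOWS = [(start, 2007) for start in range(1998, 2008)]


def restrict_to_window(events: Iterable[tuple[str, int, str]], start: int, end: int):
    """Keep (patient_id, year, item) records dated within [start, end].

    Hypothetical record shape standing in for EMR diagnoses, labs and medications.
    """
    return [e for e in events if start <= e[1] <= end]


def ppv_fnr(gold: Mapping[str, str], test: Mapping[str, str], label: str):
    """PPV and FNR of the shortened-window labels for one category vs. the gold standard."""
    tp = sum(1 for pid, g in gold.items() if g == label and test.get(pid) == label)
    fp = sum(1 for pid, g in gold.items() if g != label and test.get(pid) == label)
    fn = sum(1 for pid, g in gold.items() if g == label and test.get(pid) != label)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    fnr = fn / (tp + fn) if tp + fn else float("nan")
    return ppv, fnr


def mcnemar(gold: Mapping[str, str], test: Mapping[str, str], label: str,
            continuity: bool = True):
    """McNemar chi-square (1 df) on paired 'labeled as `label`' indicators.

    Only discordant pairs enter: b = gold yes / test no, c = gold no / test yes.
    """
    b = sum(1 for pid, g in gold.items() if g == label and test.get(pid) != label)
    c = sum(1 for pid, g in gold.items() if g != label and test.get(pid) == label)
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, so no evidence of a difference
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction, clamped at 0
    stat = diff ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy usage with made-up labels (not study data).
gold = {"p1": CASE, "p2": CASE, "p3": CONTROL, "p4": "other", "p5": CONTROL}
short = {"p1": CASE, "p2": "other", "p3": CASE, "p4": CONTROL, "p5": CONTROL}
print(ppv_fnr(gold, short, CASE))  # (0.5, 0.5)
print(mcnemar(gold, short, CASE))  # (0.0, 1.0)
```

Subjects labeled identically under both windows contribute nothing to the McNemar statistic; only the discordant counts b and c do.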

Original language: English (US)
Pages (from–to): 239–247
Number of pages: 9
Journal: International Journal of Medical Informatics
Volume: 82
Issue number: 4
DOI: 10.1016/j.ijmedinf.2012.05.015
State: Published - Apr 2013

Keywords

  • Data aggregation
  • Diabetes mellitus
  • Electronic medical record
  • Medical informatics
  • Phenotype
  • Research subject selection

ASJC Scopus subject areas

  • Health Informatics

Cite this

The absence of longitudinal data limits the accuracy of high-throughput clinical phenotyping for identifying type 2 diabetes mellitus subjects. / Wei, Wei Qi; Leibson, Cynthia L.; Ransom, Jeanine E.; Kho, Abel N.; Chute, Christopher G.

In: International Journal of Medical Informatics, Vol. 82, No. 4, 04.2013, p. 239-247.

Wei, Wei Qi ; Leibson, Cynthia L. ; Ransom, Jeanine E. ; Kho, Abel N. ; Chute, Christopher G. / The absence of longitudinal data limits the accuracy of high-throughput clinical phenotyping for identifying type 2 diabetes mellitus subjects. In: International Journal of Medical Informatics. 2013 ; Vol. 82, No. 4. pp. 239-247.

@article{73ba20dbdab8411d9b43fc2d27cc3bf1,
title = "The absence of longitudinal data limits the accuracy of high-throughput clinical phenotyping for identifying type 2 diabetes mellitus subjects",
keywords = "Data aggregation, Diabetes mellitus, Electronic medical record, Medical informatics, Phenotype, Research subject selection",
author = "Wei, {Wei Qi} and Leibson, {Cynthia L.} and Ransom, {Jeanine E.} and Kho, {Abel N.} and Chute, {Christopher G.}",
year = "2013",
month = "4",
doi = "10.1016/j.ijmedinf.2012.05.015",
language = "English (US)",
volume = "82",
pages = "239--247",
journal = "International Journal of Medical Informatics",
issn = "1386-5056",
publisher = "Elsevier Ireland Ltd",
number = "4",

}

TY - JOUR

T1 - The absence of longitudinal data limits the accuracy of high-throughput clinical phenotyping for identifying type 2 diabetes mellitus subjects

AU - Wei, Wei Qi

AU - Leibson, Cynthia L.

AU - Ransom, Jeanine E.

AU - Kho, Abel N.

AU - Chute, Christopher G.

PY - 2013/4

Y1 - 2013/4

KW - Data aggregation

KW - Diabetes mellitus

KW - Electronic medical record

KW - Medical informatics

KW - Phenotype

KW - Research subject selection

UR - http://www.scopus.com/inward/record.url?scp=84875375245&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84875375245&partnerID=8YFLogxK

U2 - 10.1016/j.ijmedinf.2012.05.015

DO - 10.1016/j.ijmedinf.2012.05.015

M3 - Article

VL - 82

SP - 239

EP - 247

JO - International Journal of Medical Informatics

JF - International Journal of Medical Informatics

SN - 1386-5056

IS - 4

ER -