Impact of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping algorithm for specifying subjects with type 2 diabetes mellitus

Wei Qi Wei; Cynthia L. Leibson; Jeanine E. Ransom; Abel N. Kho; Pedro J. Caraballo; High Seng Chai; Barbara P. Yawn; Jennifer A. Pacheco; Christopher G. Chute

doi:10.1136/amiajnl-2011-000597

Research output: Contribution to journal › Article › peer-review

52 Scopus citations

Abstract

Objective: To evaluate the effect of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping (HTCP) algorithm developed to differentiate between (1) patients with type 2 diabetes mellitus (T2DM) and (2) patients with no diabetes.

Materials and methods: This population-based study identified all residents of Olmsted County, Minnesota, in 2007. We used provider-linked electronic medical record data from the two healthcare centers that provide >95% of all care to County residents (ie, Olmsted Medical Center and Mayo Clinic in Rochester, Minnesota, USA). Subjects were limited to residents with one or more encounters at each of the two healthcare centers from January 1, 2006, through December 31, 2007. DM-relevant data on diagnoses, laboratory results, and medications were obtained from both centers for this period. The algorithm was executed first using data from both centers (ie, the gold standard) and then using data from Mayo Clinic alone. Positive predictive values and false-negative rates were calculated, and the McNemar test was used to compare the categorization obtained from Mayo Clinic data alone with the gold standard. Age and sex were compared between true-positive and false-negative subjects with T2DM. Statistical significance was accepted as p<0.05.

Results: With data from both medical centers, 765 subjects with T2DM and 4256 non-DM subjects were identified. When single-center data were used, 252 T2DM subjects and 1573 non-DM subjects were missed; in addition, 27 subjects were falsely identified as having T2DM and 215 were falsely identified as non-DM. The positive predictive values and false-negative rates were 95.0% (513/540) and 32.9% (252/765), respectively, for T2DM subjects and 92.6% (2683/2898) and 37.0% (1573/4256), respectively, for non-DM subjects. Age and sex distributions differed between true-positive (mean age 62.1 years; 45% female) and false-negative (mean age 65.0 years; 56.0% female) T2DM subjects.

Conclusion: The findings show that applying an HTCP algorithm to data from a single medical center contributes to misclassification. Researchers should consider these findings carefully when developing and executing HTCP algorithms.
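The positive predictive values and false-negative rates follow directly from the counts in the Results. A minimal Python sketch is shown below, assuming the two-center run is the reference and that a "missed" subject is any gold-standard member not placed in the same category by the Mayo Clinic-only run. `CategoryCounts` and `mcnemar` are hypothetical helpers written for this illustration, not the authors' code. The McNemar call is only an example on a binary "T2DM vs. not T2DM" table; the abstract does not report the exact contingency tables or test statistics used in the study.

```python
from dataclasses import dataclass
from math import erfc, sqrt


@dataclass(frozen=True)
class CategoryCounts:
    """Counts for one algorithm category (e.g. T2DM), using the two-center run as gold standard."""

    gold_total: int       # subjects placed in the category when data from both centers are used
    missed: int           # gold-standard members the single-center run did not place there (FN)
    false_positive: int   # subjects placed there by the single-center run only (FP)

    @property
    def true_positive(self) -> int:
        # TP = gold-standard members that the single-center run also found
        return self.gold_total - self.missed

    @property
    def ppv(self) -> float:
        # PPV = TP / (TP + FP)
        return self.true_positive / (self.true_positive + self.false_positive)

    @property
    def false_negative_rate(self) -> float:
        # FNR = FN / (TP + FN) = missed / gold-standard total
        return self.missed / self.gold_total


def mcnemar(b: int, c: int, continuity: bool = False) -> tuple[float, float]:
    """McNemar chi-square statistic (1 df) and p-value from the two discordant counts.

    b: positive under the gold standard only; c: positive under the single-center run only.
    """
    if b + c == 0:
        raise ValueError("McNemar test is undefined with no discordant pairs")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    stat = diff**2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    return stat, erfc(sqrt(stat / 2))


# Counts taken from the Results paragraph of the abstract.
t2dm = CategoryCounts(gold_total=765, missed=252, false_positive=27)
non_dm = CategoryCounts(gold_total=4256, missed=1573, false_positive=215)

for name, counts in (("T2DM", t2dm), ("non-DM", non_dm)):
    flagged = counts.true_positive + counts.false_positive
    print(
        f"{name}: PPV {counts.ppv:.1%} ({counts.true_positive}/{flagged}), "
        f"FNR {counts.false_negative_rate:.1%} ({counts.missed}/{counts.gold_total})"
    )

# Illustrative only: binary "T2DM vs. not T2DM" McNemar on the T2DM discordant counts.
stat, p = mcnemar(b=t2dm.missed, c=t2dm.false_positive)
print(f"McNemar chi-square = {stat:.1f}, p = {p:.2g}")
```

Run as-is, the loop reproduces the fractions reported in the Results (513/540 and 252/765 for T2DM; 2683/2898 and 1573/4256 for non-DM). Defining the false-negative rate as missed subjects over the gold-standard total is what makes these denominators match the abstract.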

Original language	English (US)
Pages (from-to)	219-224
Number of pages	6
Journal	Journal of the American Medical Informatics Association
Volume	19
Issue number	2
DOIs	https://doi.org/10.1136/amiajnl-2011-000597
State	Published - Mar 2012

ASJC Scopus subject areas

Health Informatics

Cite this

Wei, W. Q., Leibson, C. L., Ransom, J. E., Kho, A. N., Caraballo, P. J., Chai, H. S., Yawn, B. P., Pacheco, J. A., & Chute, C. G. (2012). Impact of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping algorithm for specifying subjects with type 2 diabetes mellitus. Journal of the American Medical Informatics Association, 19(2), 219-224. https://doi.org/10.1136/amiajnl-2011-000597

@article{6050e351010f4e11b63e5e47dec96ab5,

title = "Impact of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping algorithm for specifying subjects with type 2 diabetes mellitus",

author = "Wei, {Wei Qi} and Leibson, {Cynthia L.} and Ransom, {Jeanine E.} and Kho, {Abel N.} and Caraballo, {Pedro J.} and Chai, {High Seng} and Yawn, {Barbara P.} and Pacheco, {Jennifer A.} and Chute, {Christopher G.}",

year = "2012",

month = mar,

doi = "10.1136/amiajnl-2011-000597",

language = "English (US)",

volume = "19",

pages = "219--224",

journal = "Journal of the American Medical Informatics Association",

issn = "1067-5027",

publisher = "Oxford University Press",

number = "2",

}

TY - JOUR

T1 - Impact of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping algorithm for specifying subjects with type 2 diabetes mellitus

AU - Wei, Wei Qi

AU - Leibson, Cynthia L.

AU - Ransom, Jeanine E.

AU - Kho, Abel N.

AU - Caraballo, Pedro J.

AU - Chai, High Seng

AU - Yawn, Barbara P.

AU - Pacheco, Jennifer A.

AU - Chute, Christopher G.

PY - 2012/3

Y1 - 2012/3

UR - http://www.scopus.com/inward/record.url?scp=84857165460&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84857165460&partnerID=8YFLogxK

U2 - 10.1136/amiajnl-2011-000597

DO - 10.1136/amiajnl-2011-000597

M3 - Article

C2 - 22249968

AN - SCOPUS:84857165460

SN - 1067-5027

VL - 19

SP - 219

EP - 224

JO - Journal of the American Medical Informatics Association

JF - Journal of the American Medical Informatics Association

IS - 2

ER -
