Using Unsupervised Machine Learning Methods to Cluster Comorbidities in a Population-Based Cohort of Patients With Rheumatoid Arthritis

Cynthia S. Crowson; Tina M. Gunderson; John M. Davis; Elena Myasoedova; Vanessa L. Kronzer; Caitrin M. Coffey; Elizabeth J. Atkinson

doi:10.1002/acr.24973

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: To identify clusters of comorbidities in patients with rheumatoid arthritis (RA) using 4 methods and to compare to patients without RA.

Methods: In this retrospective, population-based study, residents of 8 Minnesota counties with prevalent RA as of January 1, 2015 were identified. Age-, sex-, and county-matched non-RA comparators were selected from the same underlying population. Diagnostic codes were retrieved for 5 years before January 1, 2015. Using 2 codes ≥30 days apart, 44 previously defined morbidities and 11 nonoverlapping chronic disease categories based on Clinical Classifications Software were defined. Unsupervised machine learning methods of interest included hierarchical clustering, factor analysis, K-means clustering, and network analysis.

Results: Two groups of 1,643 patients with and without RA (72% female; mean age 63.1 years in both groups) were studied. Clustering of comorbidities revealed strong associations among mental/behavioral comorbidities and among cardiovascular risk factors and diseases. The clusters were associated with age and sex. Differences between the 4 clustering methods were driven by comorbidities that are rare and those that were weakly associated with other comorbidities. Common comorbidities tended to group together consistently across approaches. The instability of clusters when using different random seeds or bootstrap sampling impugns the usefulness and reliability of these methods. Clusters of common comorbidities between RA and non-RA cohorts were similar.

Conclusion: Despite the higher comorbidity burden in patients with RA compared to the general population, clustering comorbidities did not identify substantial differences in comorbidity patterns between the RA and non-RA cohorts. The instability of clustering methods suggests caution when interpreting clustering using 1 method.
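
The Methods define a morbidity as present when 2 diagnostic codes occur ≥30 days apart within the 5 years before January 1, 2015. The sketch below is one possible reading of that rule, not the authors' implementation. The function name `meets_two_code_rule`, the constants, and the choice to treat the window as starting on January 1, 2010 and excluding the index date itself are all assumptions made for illustration.

```python
from datetime import date

# Assumed window: 5 years before the index date of January 1, 2015.
# Whether the boundaries are inclusive is not stated in the abstract.
INDEX_DATE = date(2015, 1, 1)
LOOKBACK_START = date(2010, 1, 1)


def meets_two_code_rule(code_dates, min_gap_days=30):
    """Return True if at least 2 codes in the window are >= min_gap_days apart.

    code_dates: dates on which a diagnostic code for one morbidity was recorded.
    """
    # Keep only codes recorded inside the lookback window.
    in_window = [d for d in code_dates if LOOKBACK_START <= d < INDEX_DATE]
    if len(in_window) < 2:
        return False
    # Some pair is >= min_gap_days apart exactly when the earliest and
    # latest codes are, so comparing the extremes is sufficient.
    return (max(in_window) - min(in_window)).days >= min_gap_days


if __name__ == "__main__":
    # Two codes 19 days apart: does not qualify.
    print(meets_two_code_rule([date(2012, 3, 1), date(2012, 3, 20)]))
    # A third code months later makes the morbidity qualify.
    print(meets_two_code_rule([date(2012, 3, 1), date(2012, 3, 20), date(2013, 1, 5)]))
```

Requiring two separated codes is a common way to reduce false positives from rule-out or one-off diagnoses; the 30-day gap and 5-year window here come from the abstract, while everything else is illustrative.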

Original language	English (US)
Pages (from-to)	210-219
Number of pages	10
Journal	Arthritis Care and Research
Volume	75
Issue number	2
DOIs	https://doi.org/10.1002/acr.24973
State	Published - Feb 2023

ASJC Scopus subject areas

Rheumatology

Cite this

@article{8dabe4afbeb44a49b68135e884550dd0,

title = "Using Unsupervised Machine Learning Methods to Cluster Comorbidities in a Population-Based Cohort of Patients With Rheumatoid Arthritis",

abstract = "Objective: To identify clusters of comorbidities in patients with rheumatoid arthritis (RA) using 4 methods and to compare to patients without RA. Methods: In this retrospective, population-based study, residents of 8 Minnesota counties with prevalent RA as of January 1, 2015 were identified. Age-, sex-, and county-matched non-RA comparators were selected from the same underlying population. Diagnostic codes were retrieved for 5 years before January 1, 2015. Using 2 codes ≥30 days apart, 44 previously defined morbidities and 11 nonoverlapping chronic disease categories based on Clinical Classifications Software were defined. Unsupervised machine learning methods of interest included hierarchical clustering, factor analysis, K-means clustering, and network analysis. Results: Two groups of 1,643 patients with and without RA (72% female; mean age 63.1 years in both groups) were studied. Clustering of comorbidities revealed strong associations among mental/behavioral comorbidities and among cardiovascular risk factors and diseases. The clusters were associated with age and sex. Differences between the 4 clustering methods were driven by comorbidities that are rare and those that were weakly associated with other comorbidities. Common comorbidities tended to group together consistently across approaches. The instability of clusters when using different random seeds or bootstrap sampling impugns the usefulness and reliability of these methods. Clusters of common comorbidities between RA and non-RA cohorts were similar. Conclusion: Despite the higher comorbidity burden in patients with RA compared to the general population, clustering comorbidities did not identify substantial differences in comorbidity patterns between the RA and non-RA cohorts. The instability of clustering methods suggests caution when interpreting clustering using 1 method.",

author = "Crowson, {Cynthia S.} and Gunderson, {Tina M.} and Davis, {John M.} and Myasoedova, Elena and Kronzer, {Vanessa L.} and Coffey, {Caitrin M.} and Atkinson, {Elizabeth J.}",

note = "Publisher Copyright: {\textcopyright} 2022 American College of Rheumatology.",

year = "2023",

month = feb,

doi = "10.1002/acr.24973",

language = "English (US)",

volume = "75",

pages = "210--219",

journal = "Arthritis Care and Research",

issn = "2151-464X",

publisher = "John Wiley \& Sons Inc.",

number = "2",

}

TY - JOUR

T1 - Using Unsupervised Machine Learning Methods to Cluster Comorbidities in a Population-Based Cohort of Patients With Rheumatoid Arthritis

AU - Crowson, Cynthia S.

AU - Gunderson, Tina M.

AU - Davis, John M.

AU - Myasoedova, Elena

AU - Kronzer, Vanessa L.

AU - Coffey, Caitrin M.

AU - Atkinson, Elizabeth J.

PY - 2023/2

Y1 - 2023/2

AB - Objective: To identify clusters of comorbidities in patients with rheumatoid arthritis (RA) using 4 methods and to compare to patients without RA. Methods: In this retrospective, population-based study, residents of 8 Minnesota counties with prevalent RA as of January 1, 2015 were identified. Age-, sex-, and county-matched non-RA comparators were selected from the same underlying population. Diagnostic codes were retrieved for 5 years before January 1, 2015. Using 2 codes ≥30 days apart, 44 previously defined morbidities and 11 nonoverlapping chronic disease categories based on Clinical Classifications Software were defined. Unsupervised machine learning methods of interest included hierarchical clustering, factor analysis, K-means clustering, and network analysis. Results: Two groups of 1,643 patients with and without RA (72% female; mean age 63.1 years in both groups) were studied. Clustering of comorbidities revealed strong associations among mental/behavioral comorbidities and among cardiovascular risk factors and diseases. The clusters were associated with age and sex. Differences between the 4 clustering methods were driven by comorbidities that are rare and those that were weakly associated with other comorbidities. Common comorbidities tended to group together consistently across approaches. The instability of clusters when using different random seeds or bootstrap sampling impugns the usefulness and reliability of these methods. Clusters of common comorbidities between RA and non-RA cohorts were similar. Conclusion: Despite the higher comorbidity burden in patients with RA compared to the general population, clustering comorbidities did not identify substantial differences in comorbidity patterns between the RA and non-RA cohorts. The instability of clustering methods suggests caution when interpreting clustering using 1 method.

UR - http://www.scopus.com/inward/record.url?scp=85138096770&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85138096770&partnerID=8YFLogxK

U2 - 10.1002/acr.24973

DO - 10.1002/acr.24973

M3 - Article

C2 - 35724274

AN - SCOPUS:85138096770

SN - 2151-464X

VL - 75

SP - 210

EP - 219

JO - Arthritis Care and Research

JF - Arthritis Care and Research

IS - 2

ER -
