Objective: To identify clusters of comorbidities in patients with rheumatoid arthritis (RA) using 4 methods and to compare them with clusters in patients without RA. Methods: In this retrospective, population-based study, residents of 8 Minnesota counties with prevalent RA as of January 1, 2015 were identified. Age-, sex-, and county-matched non-RA comparators were selected from the same underlying population. Diagnostic codes were retrieved for the 5 years before January 1, 2015. Based on the presence of 2 codes ≥30 days apart, 44 previously defined morbidities and 11 nonoverlapping chronic disease categories based on Clinical Classifications Software were defined. The unsupervised machine learning methods applied were hierarchical clustering, factor analysis, K-means clustering, and network analysis. Results: Two groups of 1,643 patients with and without RA (72% female; mean age 63.1 years in both groups) were studied. Clustering of comorbidities revealed strong associations among mental/behavioral comorbidities and among cardiovascular risk factors and diseases. The clusters were associated with age and sex. Differences between the 4 clustering methods were driven by comorbidities that were rare and those that were weakly associated with other comorbidities. Common comorbidities tended to group together consistently across approaches. The instability of clusters across different random seeds and bootstrap samples calls into question the usefulness and reliability of these methods. Clusters of common comorbidities were similar in the RA and non-RA cohorts. Conclusion: Despite the higher comorbidity burden in patients with RA compared to the general population, clustering comorbidities did not identify substantial differences in comorbidity patterns between the RA and non-RA cohorts. The instability of the clustering methods suggests caution when interpreting clusters obtained from a single method.
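The morbidity definition in the Methods (2 diagnostic codes ≥30 days apart within the 5 years before January 1, 2015) can be illustrated with a minimal Python sketch. This is not the study's code: the function names, the input layout (a mapping from morbidity name to code dates), the example data, and the treatment of the window as starting on January 1, 2010 and excluding the index date are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: the look-back window is the 5 years ending just before the index date.
INDEX_DATE = date(2015, 1, 1)
WINDOW_START = date(2010, 1, 1)
MIN_GAP = timedelta(days=30)


def has_morbidity(code_dates, window_start=WINDOW_START,
                  index_date=INDEX_DATE, min_gap=MIN_GAP):
    """Return True if at least two codes in the window are >= min_gap apart."""
    in_window = [d for d in code_dates if window_start <= d < index_date]
    if len(in_window) < 2:
        return False
    # The earliest and latest dates form the farthest-apart pair,
    # so checking that single pair is sufficient.
    return max(in_window) - min(in_window) >= min_gap


def morbidity_flags(patient_codes):
    """Map {morbidity: [code dates]} to {morbidity: bool} for one patient."""
    return {name: has_morbidity(dates) for name, dates in patient_codes.items()}


# Hypothetical patient record (illustrative dates only).
example = {
    "hypertension": [date(2012, 3, 1), date(2012, 5, 15)],  # 75 days apart
    "depression": [date(2013, 6, 1), date(2013, 6, 10)],    # only 9 days apart
    "diabetes": [date(2009, 1, 1), date(2014, 2, 1)],       # first code precedes the window
}
flags = morbidity_flags(example)
print(flags)
```

Stacking such flags across patients would give a binary patient-by-morbidity matrix, which is the kind of input the four clustering methods named above typically operate on.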