Improved interpretability of machine learning model using unsupervised clustering: Predicting time to first treatment in chronic lymphocytic leukemia

David Chen, Gaurav Goyal, Ronald S. Go, Sameer A. Parikh, Che G. Ngufor

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

PURPOSE Time to event is an important aspect of clinical decision making. This is particularly true when diseases have highly heterogeneous presentations and prognoses, as in chronic lymphocytic leukemia (CLL). Although machine learning methods can readily learn complex nonlinear relationships, many methods are criticized as inadequate because of limited interpretability. We propose using unsupervised clustering of the continuous output of machine learning models to provide discrete risk stratification for predicting time to first treatment in a cohort of patients with CLL.

PATIENTS AND METHODS A total of 737 treatment-naïve patients with CLL diagnosed at Mayo Clinic were included in this study. We compared predictive abilities for two survival models (Cox proportional hazards and random survival forest) and four classification methods (logistic regression, support vector machines, random forest, and gradient boosting machine). Probability of treatment was then stratified.

RESULTS Machine learning methods did not yield significantly more accurate predictions of time to first treatment. However, automated risk stratification provided by clustering was able to better differentiate patients who were at risk for treatment within 1 year than models developed using standard survival analysis techniques.

CONCLUSION Clustering the posterior probabilities of machine learning models provides a way to better interpret machine learning models.
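The stratification step described in the abstract, turning a model's continuous predicted probability into discrete risk groups, can be illustrated with a small sketch. The abstract does not name the clustering algorithm or the number of groups, so the one-dimensional k-means, the choice of three groups, the function name `kmeans_1d`, and the probability values below are all assumptions made for illustration, not the authors' method or data.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd's k-means on scalar values.

    Returns (centers, labels) with centers sorted ascending, so that
    label 0 is the lowest-probability group and label k-1 the highest.
    """
    if not 1 <= k <= len(set(values)):
        raise ValueError("k must be between 1 and the number of distinct values")

    def assign(centers):
        # Each value goes to the nearest center (ties go to the lower index).
        return [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]

    # Deterministic initialisation: centers at evenly spaced quantiles.
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers

    # Relabel so that cluster indices follow ascending center value.
    labels = assign(centers)
    order = sorted(range(k), key=lambda j: centers[j])
    rank = {old: new for new, old in enumerate(order)}
    return [centers[j] for j in order], [rank[lab] for lab in labels]


# Hypothetical predicted probabilities of treatment within 1 year
# (invented for illustration; not data from the study).
probabilities = [0.05, 0.08, 0.10, 0.12, 0.45, 0.50, 0.55, 0.85, 0.90, 0.95]

centers, labels = kmeans_1d(probabilities, k=3)
risk_names = ["low", "intermediate", "high"]  # assumed group names
risk_groups = [risk_names[lab] for lab in labels]

for p, group in zip(probabilities, risk_groups):
    print(f"p = {p:.2f} -> {group} risk")
```

In this sketch the cut points between groups come from the data rather than from thresholds fixed in advance, which is the sense in which the stratification is automated. In practice the probabilities would come from a fitted classifier, and the number of groups would need its own justification.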

Original language: English (US)
Pages (from-to): 1-11
Number of pages: 11
Journal: JCO Clinical Cancer Informatics
Volume: 3
State: Published - 2019

ASJC Scopus subject areas

  • Oncology
  • Health Informatics
  • Cancer Research
