TY - JOUR
T1 - Improved interpretability of machine learning model using unsupervised clustering
T2 - Predicting time to first treatment in chronic lymphocytic leukemia
AU - Chen, David
AU - Goyal, Gaurav
AU - Go, Ronald S.
AU - Parikh, Sameer A.
AU - Ngufor, Che G.
N1 - Publisher Copyright: © 2019 by American Society of Clinical Oncology
PY - 2019
Y1 - 2019
N2 - PURPOSE Time to event is an important aspect of clinical decision making. This is particularly true when diseases have highly heterogeneous presentations and prognoses, as in chronic lymphocytic leukemia (CLL). Although machine learning methods can readily learn complex nonlinear relationships, many methods are criticized as inadequate because of limited interpretability. We propose using unsupervised clustering of the continuous output of machine learning models to provide discrete risk stratification for predicting time to first treatment in a cohort of patients with CLL. PATIENTS AND METHODS A total of 737 treatment-naïve patients with CLL diagnosed at Mayo Clinic were included in this study. We compared predictive abilities for two survival models (Cox proportional hazards and random survival forest) and four classification methods (logistic regression, support vector machines, random forest, and gradient boosting machine). Probability of treatment was then stratified. RESULTS Machine learning methods did not yield significantly more accurate predictions of time to first treatment. However, automated risk stratification provided by clustering was able to better differentiate patients who were at risk for treatment within 1 year than models developed using standard survival analysis techniques. CONCLUSION Clustering the posterior probabilities of machine learning models provides a way to better interpret machine learning models.
AB - PURPOSE Time to event is an important aspect of clinical decision making. This is particularly true when diseases have highly heterogeneous presentations and prognoses, as in chronic lymphocytic leukemia (CLL). Although machine learning methods can readily learn complex nonlinear relationships, many methods are criticized as inadequate because of limited interpretability. We propose using unsupervised clustering of the continuous output of machine learning models to provide discrete risk stratification for predicting time to first treatment in a cohort of patients with CLL. PATIENTS AND METHODS A total of 737 treatment-naïve patients with CLL diagnosed at Mayo Clinic were included in this study. We compared predictive abilities for two survival models (Cox proportional hazards and random survival forest) and four classification methods (logistic regression, support vector machines, random forest, and gradient boosting machine). Probability of treatment was then stratified. RESULTS Machine learning methods did not yield significantly more accurate predictions of time to first treatment. However, automated risk stratification provided by clustering was able to better differentiate patients who were at risk for treatment within 1 year than models developed using standard survival analysis techniques. CONCLUSION Clustering the posterior probabilities of machine learning models provides a way to better interpret machine learning models.
UR - http://www.scopus.com/inward/record.url?scp=85075391411&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075391411&partnerID=8YFLogxK
U2 - 10.1200/CCI.18.00137
DO - 10.1200/CCI.18.00137
M3 - Article
C2 - 31112417
AN - SCOPUS:85075391411
SN - 2473-4276
VL - 3
SP - 1
EP - 11
JO - JCO Clinical Cancer Informatics
JF - JCO Clinical Cancer Informatics
ER -