Explanatory Analysis of a Machine Learning Model to Identify Hypertrophic Cardiomyopathy Patients from EHR Using Diagnostic Codes

Nasibeh Zanjirani Farahani, Divaakar Siva Baala Sundaram, Moein Enayati, Shivaram Poigai Arunachalam, Kalyan Pasupathy, Adelaide M. Arruda-Olson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Hypertrophic cardiomyopathy (HCM) is a genetic heart disease and the leading cause of sudden cardiac death (SCD) in young adults. Despite well-known risk factors and existing clinical practice guidelines, HCM patients are underdiagnosed and sub-optimally managed. Machine learning models developed on electronic health record (EHR) data can help improve the diagnosis of HCM and thus improve the lives of many patients. Automated phenotyping using HCM billing codes has received limited attention in the literature, with only a small number of prior publications. In this paper, we propose a novel predictive model that helps physicians make diagnostic decisions using information learned from the historical data of similar patients.
We assembled a cohort of 11,562 patients with known or suspected HCM who visited Mayo Clinic between 1995 and 2019. All existing billing codes for these patients were extracted from the EHR data warehouse. Ground-truth labels for training the machine learning model were based on HCM diagnoses confirmed by the gold-standard imaging tests for HCM: echocardiography (echo) or cardiac magnetic resonance (CMR) imaging. After a manual review of medical records and imaging tests, patients were labeled into three categories: 'yes' (definite HCM), 'no' (no HCM phenotype), and 'possible' HCM.
In this study, a random forest was adopted to investigate the predictive performance of billing codes for identifying HCM patients, owing to its practicality and expected accuracy across a wide range of use cases. Our model performed well in identifying patients with 'yes' (definite), 'possible', and 'no' HCM, with an accuracy of 71%, weighted recall of 70%, precision of 75%, and weighted F1 score of 72%. Furthermore, we provided visualizations based on multidimensional scaling and principal component analysis to support clinicians' interpretation.
This model can be used to identify HCM patients from their EHR data and to help clinicians in diagnostic decision making.
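The reported scores summarize performance over the three review categories. As a rough illustration, the pure-Python sketch below computes accuracy and support-weighted precision, recall and F1 from true and predicted labels, weighting each class's score by its share of patients. The helper name `weighted_metrics`, the label strings and the toy predictions are hypothetical and are not the study's data or code; the abstract does not state whether precision was also support-weighted, so that is assumed here.

```python
from collections import Counter

# Three review categories from the abstract; the exact label strings are assumed.
LABELS = ("yes", "possible", "no")


def weighted_metrics(y_true, y_pred, labels=LABELS):
    """Hypothetical helper: accuracy plus support-weighted precision, recall, F1.

    Each per-class score is weighted by that class's support, i.e. the share
    of patients whose true label is that class.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    support = Counter(y_true)    # number of patients per true class
    predicted = Counter(y_pred)  # number of patients per predicted class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label in labels:
        tp = true_pos[label]
        # Per-class scores; 0/0 is treated as 0, a common convention.
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / support[label] if support[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support[label] / n  # share of patients in this true class
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(true_pos.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy labels for ten patients; not data from the study.
    y_true = ["yes", "yes", "yes", "possible", "possible", "no", "no", "no", "no", "no"]
    y_pred = ["yes", "yes", "yes", "possible", "yes", "no", "no", "no", "possible", "no"]
    print(weighted_metrics(y_true, y_pred))
```

With these toy labels, accuracy is 0.8, weighted precision is 0.825, weighted recall is 0.8 and weighted F1 is about 0.802. In practice, scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="weighted")` computes the same support-weighted averages.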

Original language: English (US)
Title of host publication: Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
Editors: Taesung Park, Young-Rae Cho, Xiaohua Tony Hu, Illhoi Yoo, Hyun Goo Woo, Jianxin Wang, Julio Facelli, Seungyoon Nam, Mingon Kang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1932-1937
Number of pages: 6
ISBN (Electronic): 9781728162157
State: Published - Dec 16 2020
Event: 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020 - Virtual, Seoul, Korea, Republic of
Duration: Dec 16 2020 to Dec 19 2020

Publication series

Name: Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020

Conference

Conference: 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
Country/Territory: Korea, Republic of
City: Virtual, Seoul
Period: 12/16/20 to 12/19/20

Keywords

  • billing code
  • classification
  • decision making
  • diagnostic codes
  • electronic health records (EHR)
  • hypertrophic cardiomyopathy (HCM)
  • machine learning
  • random forest

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems and Management
  • Medicine (miscellaneous)
  • Health Informatics
