Background: Hypertrophic cardiomyopathy (HCM) is one of the common causes of sudden cardiac death (SCD) in young people, and the primary prevention of SCD is an implantable cardioverter defibrillator (ICD). Because of concerns about the incidence of appropriate ICD therapy and the complications associated with ICD implantation and discharge, patients with implanted ICDs are closely monitored, and interrogation reports are generated from clinical consultations. Methods: In this study, we compared the performance of structured device data and unstructured interrogation reports for extracting information on ICD therapy and heart rhythm. We sampled 687 reports with a gold standard generated through manual chart review. A rule-based natural language processing (NLP) system was developed using 480 reports, and the information in the corresponding device data was aggregated for the task. We compared the performance of the NLP system with that of the information aggregated from structured device data using the remaining 207 reports. Results: The rule-based NLP system achieved F-measures of 0.92 and 0.98 for ICD therapy and heart rhythm, respectively, while the performance of aggregating device data was significantly lower, with F-measures of 0.78 and 0.45, respectively. Limitations of using only structured device data include the inability to differentiate real events from management events, limited data availability, and differing vendor perspectives and data granularity, whereas using interrogation reports requires overcoming non-representative keywords/patterns and contextual errors. Conclusions: Extracting phenotyping information from data generated in real-world settings requires the incorporation of medical knowledge. It is essential to analyze, compare, and harmonize multiple data sources for real-world evidence generation.
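To make the comparison metric concrete, the following is a minimal sketch of how an F-measure could be computed for two extraction sources against a gold standard. It assumes the balanced F1 score over binary per-report labels, which the abstract does not specify. The function name `f_measure` and the toy label lists are hypothetical and are not taken from the study's data.

```python
def f_measure(gold, predicted):
    """Return (precision, recall, F1) for two parallel lists of booleans.

    Assumption: balanced F1 on binary labels; the study may have used
    a different averaging scheme.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy labels: does each report document an ICD therapy event?
gold = [True, True, False, True, False]
nlp_pred = [True, True, False, True, True]         # illustrative NLP output
device_pred = [True, False, True, False, False]    # illustrative device-data output

for name, pred in [("NLP", nlp_pred), ("device data", device_pred)]:
    p, r, f1 = f_measure(gold, pred)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Scoring both sources against the same gold-standard labels, as the study does with its 207 held-out reports, keeps the two F-measures directly comparable.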
Original language: English (US)
State: Published, Jan 22 2020