Developing customizable cancer information extraction modules for pathology reports using CLAMP

Ergin Soysal, Jeremy L. Warner, Jingqi Wang, Min Jiang, Krysten Harvey, Sandeep Kumar Jain, Xiao Dong, Hsing Yi Song, Harish Siddhanamatha, Liwei Wang, Qi Dai, Qingxia Chen, Xianglin Du, Cui Tao, Ping Yang, Joshua Charles Denny, Hongfang D Liu, Hua Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Natural language processing (NLP) technologies have been successfully applied to cancer research by enabling automated phenotypic information extraction from narratives in electronic health records (EHRs) such as pathology reports; however, developing customized NLP solutions requires substantial effort. To facilitate the adoption of NLP in cancer research, we have developed a set of customizable modules for extracting a comprehensive range of cancer-related information from pathology reports (e.g., tumor size, tumor stage, and biomarkers) by leveraging the existing CLAMP system, which provides user-friendly interfaces for building customized NLP solutions for individual needs. Evaluation using annotated data at Vanderbilt University Medical Center showed that CLAMP-Cancer could extract diverse types of cancer information with good F-measures (0.80-0.98). We then applied CLAMP-Cancer to an information extraction task at Mayo Clinic and showed that we can quickly build a customized NLP system whose performance is comparable to that of an existing system at Mayo Clinic. CLAMP-Cancer is freely available for academic use.

Original language: English (US)
Title of host publication: MEDINFO 2019
Subtitle of host publication: Health and Wellbeing e-Networks for All - Proceedings of the 17th World Congress on Medical and Health Informatics
Editors: Brigitte Seroussi, Lucila Ohno-Machado
Publisher: IOS Press
Pages: 1041-1045
Number of pages: 5
ISBN (Electronic): 9781643680026
DOIs: https://doi.org/10.3233/SHTI190383
State: Published - Aug 21 2019
Event: 17th World Congress on Medical and Health Informatics, MEDINFO 2019 - Lyon, France
Duration: Aug 25 2019 to Aug 30 2019

Publication series

Name: Studies in Health Technology and Informatics
Volume: 264
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365

Conference

Conference: 17th World Congress on Medical and Health Informatics, MEDINFO 2019
Country: France
City: Lyon
Period: 8/25/19 to 8/30/19

Keywords

  • Electronic Health Records
  • Information Storage and Retrieval
  • Natural Language Processing

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

Cite this

Soysal, E., Warner, J. L., Wang, J., Jiang, M., Harvey, K., Jain, S. K., Dong, X., Song, H. Y., Siddhanamatha, H., Wang, L., Dai, Q., Chen, Q., Du, X., Tao, C., Yang, P., Denny, J. C., Liu, H. D., & Xu, H. (2019). Developing customizable cancer information extraction modules for pathology reports using CLAMP. In B. Seroussi & L. Ohno-Machado (Eds.), MEDINFO 2019: Health and Wellbeing e-Networks for All - Proceedings of the 17th World Congress on Medical and Health Informatics (pp. 1041-1045). (Studies in Health Technology and Informatics; Vol. 264). IOS Press. https://doi.org/10.3233/SHTI190383