TY - GEN
T1 - Developing customizable cancer information extraction modules for pathology reports using CLAMP
AU - Soysal, Ergin
AU - Warner, Jeremy L.
AU - Wang, Jingqi
AU - Jiang, Min
AU - Harvey, Krysten
AU - Jain, Sandeep Kumar
AU - Dong, Xiao
AU - Song, Hsing Yi
AU - Siddhanamatha, Harish
AU - Wang, Liwei
AU - Dai, Qi
AU - Chen, Qingxia
AU - Du, Xianglin
AU - Tao, Cui
AU - Yang, Ping
AU - Denny, Joshua Charles
AU - Liu, Hongfang
AU - Xu, Hua
N1 - Funding Information:
This study is partially supported by grants from NCI U24 CA194215, NIGMS R01 GM103859, NIGMS R01 GM102282, R01 LM011829, CPRIT R1307, and UTHealth Innovation for Cancer Prevention Research Training Program Pre-doctoral Fellowship (CPRIT # RP160015). Disclaimer: The content is solely the responsibility of the authors and does not necessarily represent the official views of the Cancer Prevention and Research Institute of Texas.
Publisher Copyright:
© 2019 International Medical Informatics Association (IMIA) and IOS Press.
PY - 2019/8/21
Y1 - 2019/8/21
AB - Natural language processing (NLP) technologies have been successfully applied to cancer research by enabling automated phenotypic information extraction from narratives in electronic health records (EHRs) such as pathology reports; however, developing customized NLP solutions requires substantial effort. To facilitate the adoption of NLP in cancer research, we have developed a set of customizable modules for extracting comprehensive types of cancer-related information in pathology reports (e.g., tumor size, tumor stage, and biomarkers), by leveraging the existing CLAMP system, which provides user-friendly interfaces for building customized NLP solutions for individual needs. Evaluation using annotated data at Vanderbilt University Medical Center showed that CLAMP-Cancer could extract diverse types of cancer information with good F-measures (0.80-0.98). We then applied CLAMP-Cancer to an information extraction task at Mayo Clinic and showed that we can quickly build a customized NLP system with comparable performance with an existing system at Mayo Clinic. CLAMP-Cancer is freely available for academic use.
KW - Electronic Health Records
KW - Information Storage and Retrieval
KW - Natural Language Processing
UR - http://www.scopus.com/inward/record.url?scp=85071496267&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071496267&partnerID=8YFLogxK
U2 - 10.3233/SHTI190383
DO - 10.3233/SHTI190383
M3 - Conference contribution
C2 - 31438083
AN - SCOPUS:85071496267
T3 - Studies in Health Technology and Informatics
SP - 1041
EP - 1045
BT - MEDINFO 2019
A2 - Seroussi, Brigitte
A2 - Ohno-Machado, Lucila
PB - IOS Press
T2 - 17th World Congress on Medical and Health Informatics, MEDINFO 2019
Y2 - 25 August 2019 through 30 August 2019
ER -