TY - JOUR
T1 - Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing
AU - Wang, Liwei
AU - Fu, Sunyang
AU - Wen, Andrew
AU - Ruan, Xiaoyang
AU - He, Huan
AU - Liu, Sijia
AU - Moon, Sungrim
AU - Mai, Michelle
AU - Riaz, Irbaz B.
AU - Wang, Nan
AU - Yang, Ping
AU - Xu, Hua
AU - Warner, Jeremy L.
AU - Liu, Hongfang
PY - 2022/7/1
Y1 - 2022/7/1
N2 - PURPOSE: The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and to facilitate patient care. In this review, we aim to assess EHR for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), which is a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and review existing NLP methodologies for extracting said data elements. METHODS: Published literature studies were searched to retrieve cancer-related NLP articles that were written in English and published between January 2010 and September 2020 from main literature databases. After the retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for relevant study analysis and used to categorize data including four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS: A total of 123 publications were selected finally and included in our analysis. We found that cancer research and patient care require some data elements beyond mCODE as expected. Transparency and reproductivity are not sufficient in NLP methods, and inconsistency in NLP evaluation exists. CONCLUSION: We conducted a comprehensive review of cancer NLP for research and patient care using EHRs data. Issues and barriers for wide adoption of cancer NLP were identified and discussed.
AB - PURPOSE: The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and to facilitate patient care. In this review, we aim to assess EHR for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), which is a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and review existing NLP methodologies for extracting said data elements. METHODS: Published literature studies were searched to retrieve cancer-related NLP articles that were written in English and published between January 2010 and September 2020 from main literature databases. After the retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for relevant study analysis and used to categorize data including four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS: A total of 123 publications were selected finally and included in our analysis. We found that cancer research and patient care require some data elements beyond mCODE as expected. Transparency and reproductivity are not sufficient in NLP methods, and inconsistency in NLP evaluation exists. CONCLUSION: We conducted a comprehensive review of cancer NLP for research and patient care using EHRs data. Issues and barriers for wide adoption of cancer NLP were identified and discussed.
UR - http://www.scopus.com/inward/record.url?scp=85135428760&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85135428760&partnerID=8YFLogxK
U2 - 10.1200/CCI.22.00006
DO - 10.1200/CCI.22.00006
M3 - Review article
C2 - 35917480
AN - SCOPUS:85135428760
SN - 2473-4276
VL - 6
SP - e2200006
JO - JCO Clinical Cancer Informatics
JF - JCO Clinical Cancer Informatics
ER -