Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing

Liwei Wang; Sunyang Fu; Andrew Wen; Xiaoyang Ruan; Huan He; Sijia Liu; Sungrim Moon; Michelle Mai; Irbaz B. Riaz; Nan Wang; Ping Yang; Hua Xu; Jeremy L. Warner; Hongfang Liu

doi:10.1200/CCI.22.00006

Research output: Contribution to journal › Review article › peer-review

Abstract

PURPOSE: Advances in natural language processing (NLP) have promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and to review existing NLP methodologies for extracting those data elements. METHODS: Main literature databases were searched for cancer-related NLP articles written in English and published between January 2010 and September 2020. After retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for analyzing relevant studies and used to categorize data under four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS: A total of 123 publications were ultimately selected and included in our analysis. As expected, we found that cancer research and patient care require some data elements beyond mCODE. Transparency and reproducibility of NLP methods are insufficient, and NLP evaluation is inconsistent. CONCLUSION: We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers to the wide adoption of cancer NLP were identified and discussed.

Original language	English (US)
Pages (from-to)	e2200006
Journal	JCO Clinical Cancer Informatics
Volume	6
DOIs	https://doi.org/10.1200/CCI.22.00006
State	Published - Jul 1 2022

ASJC Scopus subject areas

General Medicine

Access to Document

10.1200/CCI.22.00006

Cite this

@article{52150fa10be44139877fa462583cdcf1,

title = "Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing",

abstract = "PURPOSE: Advances in natural language processing (NLP) have promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and to review existing NLP methodologies for extracting those data elements. METHODS: Main literature databases were searched for cancer-related NLP articles written in English and published between January 2010 and September 2020. After retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for analyzing relevant studies and used to categorize data under four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS: A total of 123 publications were ultimately selected and included in our analysis. As expected, we found that cancer research and patient care require some data elements beyond mCODE. Transparency and reproducibility of NLP methods are insufficient, and NLP evaluation is inconsistent. CONCLUSION: We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers to the wide adoption of cancer NLP were identified and discussed.",

author = "Liwei Wang and Sunyang Fu and Andrew Wen and Xiaoyang Ruan and Huan He and Sijia Liu and Sungrim Moon and Michelle Mai and Riaz, {Irbaz B.} and Nan Wang and Ping Yang and Hua Xu and Warner, {Jeremy L.} and Hongfang Liu",

year = "2022",

month = jul,

day = "1",

doi = "10.1200/CCI.22.00006",

language = "English (US)",

volume = "6",

pages = "e2200006",

journal = "JCO Clinical Cancer Informatics",

issn = "2473-4276",

publisher = "American Society of Clinical Oncology",

}

TY - JOUR

T1 - Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing

AU - Wang, Liwei

AU - Fu, Sunyang

AU - Wen, Andrew

AU - Ruan, Xiaoyang

AU - He, Huan

AU - Liu, Sijia

AU - Moon, Sungrim

AU - Mai, Michelle

AU - Riaz, Irbaz B.

AU - Wang, Nan

AU - Yang, Ping

AU - Xu, Hua

AU - Warner, Jeremy L.

AU - Liu, Hongfang

PY - 2022/7/1

Y1 - 2022/7/1

N2 - PURPOSE: Advances in natural language processing (NLP) have promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and to review existing NLP methodologies for extracting those data elements. METHODS: Main literature databases were searched for cancer-related NLP articles written in English and published between January 2010 and September 2020. After retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for analyzing relevant studies and used to categorize data under four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS: A total of 123 publications were ultimately selected and included in our analysis. As expected, we found that cancer research and patient care require some data elements beyond mCODE. Transparency and reproducibility of NLP methods are insufficient, and NLP evaluation is inconsistent. CONCLUSION: We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers to the wide adoption of cancer NLP were identified and discussed.

UR - http://www.scopus.com/inward/record.url?scp=85135428760&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85135428760&partnerID=8YFLogxK

U2 - 10.1200/CCI.22.00006

DO - 10.1200/CCI.22.00006

M3 - Review article

C2 - 35917480

AN - SCOPUS:85135428760

SN - 2473-4276

VL - 6

SP - e2200006

JO - JCO Clinical Cancer Informatics

JF - JCO Clinical Cancer Informatics

ER -
