Discerning tumor status from unstructured MRI reports-completeness of information in existing reports and utility of automated natural language processing

Lionel T.E. Cheng; Jiaping Zheng; Guergana K. Savova; Bradley J. Erickson

doi:10.1007/s10278-009-9215-7

Radiology

Research output: Contribution to journal › Article › peer-review

55 Scopus citations

Abstract

Information in electronic medical records is often in an unstructured free-text format. This format presents challenges for expedient data retrieval and may fail to convey important findings. Natural language processing (NLP) is an emerging technique for rapid and efficient clinical data retrieval. While proven in disease detection, the utility of NLP in discerning disease progression from free-text reports is untested. We aimed to (1) assess whether unstructured radiology reports contained sufficient information for tumor status classification; (2) develop an NLP-based data extraction tool to determine tumor status from unstructured reports; and (3) compare NLP and human tumor status classification outcomes. Consecutive follow-up brain tumor magnetic resonance imaging reports (2000 - 2007) from a tertiary center were manually annotated using consensus guidelines on tumor status. Reports were randomized to NLP training (70%) or testing (30%) groups. The NLP tool utilized a support vector machines model with statistical and rule-based outcomes. Most reports had sufficient information for tumor status classification, although 0.8% did not describe status despite reference to prior examinations. Tumor size was unreported in 68.7% of documents, while 50.3% lacked data on change magnitude when there was detectable progression or regression. Using retrospective human classification as the gold standard, NLP achieved 80.6% sensitivity and 91.6% specificity for tumor status determination (mean positive predictive value, 82.4%; negative predictive value, 92.0%). In conclusion, most reports contained sufficient information for tumor status determination, though variable features were used to describe status. NLP demonstrated good accuracy for tumor status classification and may have novel application for automated disease status classification from electronic databases.
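The four figures reported above (sensitivity, specificity, positive and negative predictive value) all derive from a confusion matrix comparing the NLP output with the human gold standard. The following is a minimal sketch of that arithmetic for a single status class. The `ConfusionCounts` type, the `diagnostic_metrics` function, and the counts are hypothetical and chosen only for illustration; they are not taken from the study, which reports mean values across its status categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts for one tumor-status class (hypothetical)."""
    tp: int  # NLP and human annotator both assign the class
    fp: int  # NLP assigns the class, human annotator does not
    tn: int  # neither NLP nor human annotator assigns the class
    fn: int  # human annotator assigns the class, NLP does not


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV as fractions in [0, 1]."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positives among gold positives
        "specificity": c.tn / (c.tn + c.fp),  # true negatives among gold negatives
        "ppv": c.tp / (c.tp + c.fp),          # true positives among NLP positives
        "npv": c.tn / (c.tn + c.fn),          # true negatives among NLP negatives
    }


# Illustrative counts only; these are NOT the study's data.
example = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The sketch assumes every denominator is non-zero; a class with no gold-standard positives or no NLP positives would need explicit handling before averaging across classes.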

Original language	English (US)
Pages (from-to)	119-132
Number of pages	14
Journal	Journal of Digital Imaging
Volume	23
Issue number	2
DOIs	https://doi.org/10.1007/s10278-009-9215-7
State	Published - Apr 2010

Keywords

Natural language processing
Radiology reports
Structured
Tumor status
Unstructured

ASJC Scopus subject areas

Radiological and Ultrasound Technology
Radiology, Nuclear Medicine and Imaging
Computer Science Applications

Cite this

@article{75999106a495486282b9b690b90991b3,

title = "Discerning tumor status from unstructured MRI reports-completeness of information in existing reports and utility of automated natural language processing",

abstract = "Information in electronic medical records is often in an unstructured free-text format. This format presents challenges for expedient data retrieval and may fail to convey important findings. Natural language processing (NLP) is an emerging technique for rapid and efficient clinical data retrieval. While proven in disease detection, the utility of NLP in discerning disease progression from free-text reports is untested. We aimed to (1) assess whether unstructured radiology reports contained sufficient information for tumor status classification; (2) develop an NLP-based data extraction tool to determine tumor status from unstructured reports; and (3) compare NLP and human tumor status classification outcomes. Consecutive follow-up brain tumor magnetic resonance imaging reports (2000 - 2007) from a tertiary center were manually annotated using consensus guidelines on tumor status. Reports were randomized to NLP training (70%) or testing (30%) groups. The NLP tool utilized a support vector machines model with statistical and rule-based outcomes. Most reports had sufficient information for tumor status classification, although 0.8% did not describe status despite reference to prior examinations. Tumor size was unreported in 68.7% of documents, while 50.3% lacked data on change magnitude when there was detectable progression or regression. Using retrospective human classification as the gold standard, NLP achieved 80.6% sensitivity and 91.6% specificity for tumor status determination (mean positive predictive value, 82.4%; negative predictive value, 92.0%). In conclusion, most reports contained sufficient information for tumor status determination, though variable features were used to describe status. NLP demonstrated good accuracy for tumor status classification and may have novel application for automated disease status classification from electronic databases.",

keywords = "Natural language processing, Radiology reports, Structured, Tumor status, Unstructured",

author = "Cheng, {Lionel T.E.} and Zheng, Jiaping and Savova, {Guergana K.} and Erickson, {Bradley J.}",

year = "2010",

month = apr,

doi = "10.1007/s10278-009-9215-7",

language = "English (US)",

volume = "23",

pages = "119--132",

journal = "Journal of Digital Imaging",

issn = "0897-1889",

publisher = "Springer New York",

number = "2",

}

TY - JOUR

T1 - Discerning tumor status from unstructured MRI reports-completeness of information in existing reports and utility of automated natural language processing

AU - Cheng, Lionel T.E.

AU - Zheng, Jiaping

AU - Savova, Guergana K.

AU - Erickson, Bradley J.

PY - 2010/4

Y1 - 2010/4

AB - Information in electronic medical records is often in an unstructured free-text format. This format presents challenges for expedient data retrieval and may fail to convey important findings. Natural language processing (NLP) is an emerging technique for rapid and efficient clinical data retrieval. While proven in disease detection, the utility of NLP in discerning disease progression from free-text reports is untested. We aimed to (1) assess whether unstructured radiology reports contained sufficient information for tumor status classification; (2) develop an NLP-based data extraction tool to determine tumor status from unstructured reports; and (3) compare NLP and human tumor status classification outcomes. Consecutive follow-up brain tumor magnetic resonance imaging reports (2000 - 2007) from a tertiary center were manually annotated using consensus guidelines on tumor status. Reports were randomized to NLP training (70%) or testing (30%) groups. The NLP tool utilized a support vector machines model with statistical and rule-based outcomes. Most reports had sufficient information for tumor status classification, although 0.8% did not describe status despite reference to prior examinations. Tumor size was unreported in 68.7% of documents, while 50.3% lacked data on change magnitude when there was detectable progression or regression. Using retrospective human classification as the gold standard, NLP achieved 80.6% sensitivity and 91.6% specificity for tumor status determination (mean positive predictive value, 82.4%; negative predictive value, 92.0%). In conclusion, most reports contained sufficient information for tumor status determination, though variable features were used to describe status. NLP demonstrated good accuracy for tumor status classification and may have novel application for automated disease status classification from electronic databases.

KW - Natural language processing

KW - Radiology reports

KW - Structured

KW - Tumor status

KW - Unstructured

UR - http://www.scopus.com/inward/record.url?scp=77952241648&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77952241648&partnerID=8YFLogxK

U2 - 10.1007/s10278-009-9215-7

DO - 10.1007/s10278-009-9215-7

M3 - Article

C2 - 19484309

AN - SCOPUS:77952241648

SN - 0897-1889

VL - 23

SP - 119

EP - 132

JO - Journal of Digital Imaging

JF - Journal of Digital Imaging

IS - 2

ER -
