The 2019 national natural language processing (NLP) clinical challenges (n2c2)/Open health NLP (OHNLP) shared task on clinical concept normalization for clinical records

Sam Henry; Yanshan Wang; Feichen Shen; Ozlem Uzuner

doi:10.1093/jamia/ocaa106

Digital Health Sciences

Research output: Contribution to journal › Review article › peer-review

2 Scopus citations

Abstract

Objective: The 2019 National Natural Language Processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task track 3 focused on medical concept normalization (MCN) in clinical records. This track aimed to assess the state of the art in identifying and matching salient medical concepts to a controlled vocabulary. In this paper, we describe the task and the data set used, compare the participating systems, present results, identify the strengths and limitations of the current state of the art, and identify directions for future research.

Materials and Methods: Participating teams were provided with narrative discharge summaries in which text spans corresponding to medical concepts were identified. This paper refers to these text spans as mentions. Teams were tasked with normalizing these mentions to concepts, represented by concept unique identifiers, within the Unified Medical Language System. Submitted systems represented 4 broad categories of approaches: cascading dictionary matching, cosine distance, deep learning, and retrieve-and-rank systems. Disambiguation modules were common across all approaches.

Results: A total of 33 teams participated in the MCN task. The best-performing team achieved an accuracy of 0.8526. The median and mean performances among all teams were 0.7733 and 0.7426, respectively.

Conclusions: Overall performance among the top 10 teams was high. However, several mention types were challenging for all teams. These included mentions requiring disambiguation of misspelled words, acronyms, abbreviations, and mentions with more than 1 possible semantic type. Also challenging were complex mentions of long, multi-word terms that may require new ways of extracting and representing mention meaning, the use of domain knowledge, parse trees, or hand-crafted rules.

Original language	English (US)
Pages (from-to)	1529-1537
Number of pages	9
Journal	Journal of the American Medical Informatics Association
Volume	27
Issue number	10
DOIs	https://doi.org/10.1093/jamia/ocaa106
State	Published - Oct 1 2020

Keywords

Clinical narratives
Concept normalization
Machine learning
Natural language processing

ASJC Scopus subject areas

Health Informatics

Cite this

The 2019 national natural language processing (NLP) clinical challenges (n2c2)/Open health NLP (OHNLP) shared task on clinical concept normalization for clinical records. / Henry, Sam; Wang, Yanshan; Shen, Feichen et al.
In: Journal of the American Medical Informatics Association, Vol. 27, No. 10, 01.10.2020, p. 1529-1537.

@article{613a4bf3262541429c622f86dffbf282,

title = "The 2019 national natural language processing (NLP) clinical challenges (n2c2)/Open health NLP (OHNLP) shared task on clinical concept normalization for clinical records",

abstract = "Objective: The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task track 3, focused on medical concept normalization (MCN) in clinical records. This track aimed to assess the state of the art in identifying and matching salient medical concepts to a controlled vocabulary. In this paper, we describe the task, describe the data set used, compare the participating systems, present results, identify the strengths and limitations of the current state of the art, and identify directions for future research. Materials and Methods: Participating teams were provided with narrative discharge summaries in which text spans corresponding to medical concepts were identified. This paper refers to these text spans as mentions. Teams were tasked with normalizing these mentions to concepts, represented by concept unique identifiers, within the Unified Medical Language System. Submitted systems represented 4 broad categories of approaches: cascading dictionary matching, cosine distance, deep learning, and retrieve-and-rank systems. Disambiguation modules were common across all approaches. Results: A total of 33 teams participated in the MCN task. The best-performing team achieved an accuracy of 0.8526. The median and mean performances among all teams were 0.7733 and 0.7426, respectively. Conclusions: Overall performance among the top 10 teams was high. However, several mention types were challenging for all teams. These included mentions requiring disambiguation of misspelled words, acronyms, abbreviations, and mentions with more than 1 possible semantic type. Also challenging were complex mentions of long, multi-word terms that may require new ways of extracting and representing mention meaning, the use of domain knowledge, parse trees, or hand-crafted rules.",

keywords = "Clinical narratives, Concept normalization, Machine learning, Natural language processing",

author = "Sam Henry and Yanshan Wang and Feichen Shen and Ozlem Uzuner",

note = "Publisher Copyright: {\textcopyright} The Author(s) 2020.",

year = "2020",

month = oct,

day = "1",

doi = "10.1093/jamia/ocaa106",

language = "English (US)",

volume = "27",

pages = "1529--1537",

journal = "Journal of the American Medical Informatics Association",

issn = "1067-5027",

publisher = "Oxford University Press",

number = "10",

}

TY - JOUR

T1 - The 2019 national natural language processing (NLP) clinical challenges (n2c2)/Open health NLP (OHNLP) shared task on clinical concept normalization for clinical records

AU - Henry, Sam

AU - Wang, Yanshan

AU - Shen, Feichen

AU - Uzuner, Ozlem

N1 - Publisher Copyright: © The Author(s) 2020.

PY - 2020/10/1

Y1 - 2020/10/1

AB - Objective: The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task track 3, focused on medical concept normalization (MCN) in clinical records. This track aimed to assess the state of the art in identifying and matching salient medical concepts to a controlled vocabulary. In this paper, we describe the task, describe the data set used, compare the participating systems, present results, identify the strengths and limitations of the current state of the art, and identify directions for future research. Materials and Methods: Participating teams were provided with narrative discharge summaries in which text spans corresponding to medical concepts were identified. This paper refers to these text spans as mentions. Teams were tasked with normalizing these mentions to concepts, represented by concept unique identifiers, within the Unified Medical Language System. Submitted systems represented 4 broad categories of approaches: cascading dictionary matching, cosine distance, deep learning, and retrieve-and-rank systems. Disambiguation modules were common across all approaches. Results: A total of 33 teams participated in the MCN task. The best-performing team achieved an accuracy of 0.8526. The median and mean performances among all teams were 0.7733 and 0.7426, respectively. Conclusions: Overall performance among the top 10 teams was high. However, several mention types were challenging for all teams. These included mentions requiring disambiguation of misspelled words, acronyms, abbreviations, and mentions with more than 1 possible semantic type. Also challenging were complex mentions of long, multi-word terms that may require new ways of extracting and representing mention meaning, the use of domain knowledge, parse trees, or hand-crafted rules.

KW - Clinical narratives

KW - Concept normalization

KW - Machine learning

KW - Natural language processing

UR - http://www.scopus.com/inward/record.url?scp=85093538858&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85093538858&partnerID=8YFLogxK

U2 - 10.1093/jamia/ocaa106

DO - 10.1093/jamia/ocaa106

M3 - Review article

C2 - 32968800

AN - SCOPUS:85093538858

SN - 1067-5027

VL - 27

SP - 1529

EP - 1537

JO - Journal of the American Medical Informatics Association

JF - Journal of the American Medical Informatics Association

IS - 10

ER -
