Word sense disambiguation across two domains: Biomedical literature and clinical notes

Guergana K. Savova, Anni R. Coden, Igor L. Sominsky, Rie Johnson, Philip V. Ogren, Piet C de Groen, Christopher G. Chute

Research output: Contribution to journal (Article)

36 Citations (Scopus)

Abstract

The aim of this study is to explore the word sense disambiguation (WSD) problem across two biomedical domains: biomedical literature and clinical notes. A supervised machine learning technique was used for the WSD task. One of the challenges addressed is the creation of a suitable clinical corpus with manual sense annotations. This corpus, in conjunction with the WSD set from the National Library of Medicine, provided the basis for evaluating our method across multiple domains and for comparing our results to published ones. Notably, only 20% of the most relevant ambiguous terms within a domain overlap between the two domains, and these terms have more senses associated with them in the clinical space than in the biomedical literature space. Experimentation with 28 different feature sets yielded a system achieving an average F-score of 0.82 on the clinical data and 0.86 on the biomedical literature.
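As a rough illustration of the reported metric, the sketch below computes a balanced F-score per ambiguous term and then averages across terms. The abstract does not state how the average was taken, so the per-term (macro) averaging here is an assumption; the term names and counts are hypothetical toy values, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Prediction counts for one ambiguous term (hypothetical container)."""
    tp: int  # instances assigned the correct sense
    fp: int  # instances wrongly assigned a sense
    fn: int  # instances whose correct sense was missed


def f_score(c: Counts) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def average_f_score(per_term: dict[str, Counts]) -> float:
    """Unweighted mean of per-term F-scores (assumed averaging scheme)."""
    if not per_term:
        raise ValueError("no terms to average over")
    return sum(f_score(c) for c in per_term.values()) / len(per_term)


# Toy counts for two made-up ambiguous terms; illustrative only.
toy = {
    "cold": Counts(tp=8, fp=2, fn=2),
    "discharge": Counts(tp=9, fp=1, fn=1),
}

if __name__ == "__main__":
    print(f"average F-score: {average_f_score(toy):.2f}")
```

An unweighted mean treats every ambiguous term equally regardless of how often it occurs; weighting by instance count would be an equally plausible reading of "average".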

Original language: English (US)
Pages (from-to): 1088-1100
Number of pages: 13
Journal: Journal of Biomedical Informatics
Volume: 41
Issue number: 6
DOI: https://doi.org/10.1016/j.jbi.2008.02.003
State: Published - Dec 2008

Fingerprint

National Library of Medicine (U.S.)
Medicine
Learning systems
Supervised Machine Learning

Keywords

  • Artificial intelligence
  • Biomedical natural language processing
  • Information extraction
  • Machine learning
  • Natural language processing
  • Word sense disambiguation

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

Savova, G. K., Coden, A. R., Sominsky, I. L., Johnson, R., Ogren, P. V., Groen, P. C. D., & Chute, C. G. (2008). Word sense disambiguation across two domains: Biomedical literature and clinical notes. Journal of Biomedical Informatics, 41(6), 1088-1100. https://doi.org/10.1016/j.jbi.2008.02.003

@article{a77ccc7203c14a2e9fb4581ca286a2b4,
title = "Word sense disambiguation across two domains: Biomedical literature and clinical notes",
abstract = "The aim of this study is to explore the word sense disambiguation (WSD) problem across two biomedical domains-biomedical literature and clinical notes. A supervised machine learning technique was used for the WSD task. One of the challenges addressed is the creation of a suitable clinical corpus with manual sense annotations. This corpus in conjunction with the WSD set from the National Library of Medicine provided the basis for the evaluation of our method across multiple domains and for the comparison of our results to published ones. Noteworthy is that only 20{\%} of the most relevant ambiguous terms within a domain overlap between the two domains, having more senses associated with them in the clinical space than in the biomedical literature space. Experimentation with 28 different feature sets rendered a system achieving an average F-score of 0.82 on the clinical data and 0.86 on the biomedical literature.",
keywords = "Artificial intelligence, Biomedical natural language processing, Information extraction, Machine learning, Natural language processing, Word sense disambiguation",
author = "Savova, {Guergana K.} and Coden, {Anni R.} and Sominsky, {Igor L.} and Rie Johnson and Ogren, {Philip V.} and Groen, {Piet C de} and Chute, {Christopher G.}",
year = "2008",
month = "12",
doi = "10.1016/j.jbi.2008.02.003",
language = "English (US)",
volume = "41",
pages = "1088--1100",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",
number = "6",

}

TY - JOUR

T1 - Word sense disambiguation across two domains

T2 - Biomedical literature and clinical notes

AU - Savova, Guergana K.

AU - Coden, Anni R.

AU - Sominsky, Igor L.

AU - Johnson, Rie

AU - Ogren, Philip V.

AU - Groen, Piet C de

AU - Chute, Christopher G.

PY - 2008/12

Y1 - 2008/12

N2 - The aim of this study is to explore the word sense disambiguation (WSD) problem across two biomedical domains-biomedical literature and clinical notes. A supervised machine learning technique was used for the WSD task. One of the challenges addressed is the creation of a suitable clinical corpus with manual sense annotations. This corpus in conjunction with the WSD set from the National Library of Medicine provided the basis for the evaluation of our method across multiple domains and for the comparison of our results to published ones. Noteworthy is that only 20% of the most relevant ambiguous terms within a domain overlap between the two domains, having more senses associated with them in the clinical space than in the biomedical literature space. Experimentation with 28 different feature sets rendered a system achieving an average F-score of 0.82 on the clinical data and 0.86 on the biomedical literature.

KW - Artificial intelligence

KW - Biomedical natural language processing

KW - Information extraction

KW - Machine learning

KW - Natural language processing

KW - Word sense disambiguation

UR - http://www.scopus.com/inward/record.url?scp=55549101625&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=55549101625&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2008.02.003

DO - 10.1016/j.jbi.2008.02.003

M3 - Article

C2 - 18375190

AN - SCOPUS:55549101625

VL - 41

SP - 1088

EP - 1100

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 1532-0464

IS - 6

ER -