Word sense disambiguation across two domains: Biomedical literature and clinical notes

Guergana K. Savova; Anni R. Coden; Igor L. Sominsky; Rie Johnson; Philip V. Ogren; Piet C. de Groen; Christopher G. Chute

doi:10.1016/j.jbi.2008.02.003

Research output: Contribution to journal › Article › peer-review

42 Scopus citations

Abstract

The aim of this study is to explore the word sense disambiguation (WSD) problem across two biomedical domains: biomedical literature and clinical notes. A supervised machine learning technique was used for the WSD task. One of the challenges addressed is the creation of a suitable clinical corpus with manual sense annotations. This corpus in conjunction with the WSD set from the National Library of Medicine provided the basis for the evaluation of our method across multiple domains and for the comparison of our results to published ones. Noteworthy is that only 20% of the most relevant ambiguous terms within a domain overlap between the two domains, having more senses associated with them in the clinical space than in the biomedical literature space. Experimentation with 28 different feature sets rendered a system achieving an average F-score of 0.82 on the clinical data and 0.86 on the biomedical literature.

Original language	English (US)
Pages (from-to)	1088-1100
Number of pages	13
Journal	Journal of Biomedical Informatics
Volume	41
Issue number	6
DOIs	https://doi.org/10.1016/j.jbi.2008.02.003
State	Published - Dec 2008

Keywords

Artificial intelligence
Biomedical natural language processing
Information extraction
Machine learning
Natural language processing
Word sense disambiguation

ASJC Scopus subject areas

Health Informatics
Computer Science Applications

Cite this

@article{a77ccc7203c14a2e9fb4581ca286a2b4,

title = "Word sense disambiguation across two domains: Biomedical literature and clinical notes",

abstract = "The aim of this study is to explore the word sense disambiguation (WSD) problem across two biomedical domains: biomedical literature and clinical notes. A supervised machine learning technique was used for the WSD task. One of the challenges addressed is the creation of a suitable clinical corpus with manual sense annotations. This corpus in conjunction with the WSD set from the National Library of Medicine provided the basis for the evaluation of our method across multiple domains and for the comparison of our results to published ones. Noteworthy is that only 20% of the most relevant ambiguous terms within a domain overlap between the two domains, having more senses associated with them in the clinical space than in the biomedical literature space. Experimentation with 28 different feature sets rendered a system achieving an average F-score of 0.82 on the clinical data and 0.86 on the biomedical literature.",

keywords = "Artificial intelligence, Biomedical natural language processing, Information extraction, Machine learning, Natural language processing, Word sense disambiguation",

author = "Savova, {Guergana K.} and Coden, {Anni R.} and Sominsky, {Igor L.} and Johnson, Rie and Ogren, {Philip V.} and de Groen, {Piet C.} and Chute, {Christopher G.}",

year = "2008",

month = dec,

doi = "10.1016/j.jbi.2008.02.003",

language = "English (US)",

volume = "41",

pages = "1088--1100",

journal = "Journal of Biomedical Informatics",

issn = "1532-0464",

publisher = "Academic Press Inc.",

number = "6",

}

TY - JOUR

T1 - Word sense disambiguation across two domains

T2 - Biomedical literature and clinical notes

AU - Savova, Guergana K.

AU - Coden, Anni R.

AU - Sominsky, Igor L.

AU - Johnson, Rie

AU - Ogren, Philip V.

AU - de Groen, Piet C.

AU - Chute, Christopher G.

PY - 2008/12

Y1 - 2008/12

AB - The aim of this study is to explore the word sense disambiguation (WSD) problem across two biomedical domains: biomedical literature and clinical notes. A supervised machine learning technique was used for the WSD task. One of the challenges addressed is the creation of a suitable clinical corpus with manual sense annotations. This corpus in conjunction with the WSD set from the National Library of Medicine provided the basis for the evaluation of our method across multiple domains and for the comparison of our results to published ones. Noteworthy is that only 20% of the most relevant ambiguous terms within a domain overlap between the two domains, having more senses associated with them in the clinical space than in the biomedical literature space. Experimentation with 28 different feature sets rendered a system achieving an average F-score of 0.82 on the clinical data and 0.86 on the biomedical literature.

KW - Artificial intelligence

KW - Biomedical natural language processing

KW - Information extraction

KW - Machine learning

KW - Natural language processing

KW - Word sense disambiguation

UR - http://www.scopus.com/inward/record.url?scp=55549101625&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=55549101625&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2008.02.003

DO - 10.1016/j.jbi.2008.02.003

M3 - Article

C2 - 18375190

AN - SCOPUS:55549101625

SN - 1532-0464

VL - 41

SP - 1088

EP - 1100

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

IS - 6

ER -
