Using distributional analysis to semantically classify UMLS concepts

Jung Wei Fan; Hua Xu; Carol Friedman

Digital Health Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

The UMLS is a widely used and comprehensive knowledge source in the biomedical domain. It specifies biomedical concepts and their semantic categories and is therefore valuable for Natural Language Processing (NLP) and other knowledge-based systems. However, the UMLS semantic classification is not always accurate, which adversely affects the performance of these systems. It is therefore desirable to automatically validate UMLS concepts and, when necessary, to semantically reclassify them. We applied a distributional similarity method based on syntactic dependencies and α-skew divergence to classify concepts in the T033 Finding class in order to determine which ones were biologic functions or disorders. A gold standard of 100 randomly sampled concepts was created based on the majority annotation of three experts. The top prediction achieved a precision of 0.54 and a recall of 0.654; the top two predictions achieved a precision of 0.64 and a recall of 0.769. Error analysis revealed problems in the current method and provided insight into future improvements.
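The classification step described above can be illustrated with a minimal Python sketch. It assumes that each concept is represented by counts of the syntactic-dependency features it occurs with in a corpus, and that each candidate class is represented by the pooled counts of concepts already known to belong to it. The feature strings, counts, class profiles, the helper names (`normalize`, `skew_divergence`, `rank_classes`), and the default α = 0.99 are hypothetical choices for illustration, not the authors' data, code, or settings. Classes are ranked by α-skew divergence, s_α(q, r) = D(r ‖ αq + (1 − α)r), where r is the concept's feature distribution, q is a class profile, and D is the Kullback-Leibler divergence; a smaller value means the concept's contexts look more like those of the class.

```python
import math
from collections import Counter


def normalize(counts):
    """Convert raw feature counts into a probability distribution (drops zero counts)."""
    total = sum(counts.values())
    return {feature: c / total for feature, c in counts.items() if c > 0}


def skew_divergence(r, q, alpha=0.99):
    """alpha-skew divergence s_alpha(q, r) = KL(r || alpha*q + (1 - alpha)*r).

    For alpha < 1, mixing (1 - alpha) of r into q keeps every denominator
    positive wherever r > 0, so the result stays finite even if q lacks
    some of r's features.
    """
    total = 0.0
    for feature, r_f in r.items():
        mix = alpha * q.get(feature, 0.0) + (1 - alpha) * r_f
        total += r_f * math.log(r_f / mix)
    return total


def rank_classes(concept_counts, class_counts, alpha=0.99):
    """Return class labels ordered from most to least similar (lowest divergence first)."""
    r = normalize(concept_counts)
    scores = {
        label: skew_divergence(r, normalize(counts), alpha)
        for label, counts in class_counts.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical dependency features ("relation:head word") and counts.
classes = {
    "Disorder": Counter({"obj_of:treat": 40, "mod:chronic": 25, "obj_of:diagnose": 30}),
    "Biologic Function": Counter({"subj_of:increase": 30, "mod:normal": 35, "obj_of:measure": 20}),
}
concept = Counter({"obj_of:treat": 3, "mod:chronic": 2, "mod:normal": 1})

ranking = rank_classes(concept, classes)
print(ranking[:1])  # top prediction -> ['Disorder']
print(ranking[:2])  # top two predictions -> ['Disorder', 'Biologic Function']
```

Taking the first one or two labels of the ranking would correspond to the "top prediction" and "top 2 predictions" settings reported in the abstract. Smoothing q toward r is the usual reason to prefer the skew divergence over plain KL divergence with sparse dependency counts: plain KL becomes infinite as soon as a class profile never observed one of the concept's features.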

Original language	English (US)
Title of host publication	MEDINFO 2007 - Proceedings of the 12th World Congress on Health (Medical) Informatics
Subtitle of host publication	Building Sustainable Health Systems
Publisher	IOS Press
Pages	519-523
Number of pages	5
ISBN (Print)	9781586037741
State	Published - 2007
Event	12th World Congress on Medical Informatics, MEDINFO 2007 - Brisbane, QLD, Australia Duration: Aug 20 2007 → Aug 24 2007

Publication series

Name	Studies in Health Technology and Informatics
Volume	129
ISSN (Print)	0926-9630
ISSN (Electronic)	1879-8365

Other

Other	12th World Congress on Medical Informatics, MEDINFO 2007
Country/Territory	Australia
City	Brisbane, QLD
Period	8/20/07 → 8/24/07

Keywords

UMLS
distributional similarity
natural language processing
semantic classification

ASJC Scopus subject areas

Biomedical Engineering
Health Informatics
Health Information Management

Cite this

Using distributional analysis to semantically classify UMLS concepts. / Fan, Jung Wei; Xu, Hua; Friedman, Carol.
MEDINFO 2007 - Proceedings of the 12th World Congress on Health (Medical) Informatics: Building Sustainable Health Systems. IOS Press, 2007. p. 519-523 (Studies in Health Technology and Informatics; Vol. 129).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Fan, JW, Xu, H & Friedman, C 2007, Using distributional analysis to semantically classify UMLS concepts. in MEDINFO 2007 - Proceedings of the 12th World Congress on Health (Medical) Informatics: Building Sustainable Health Systems. Studies in Health Technology and Informatics, vol. 129, IOS Press, pp. 519-523, 12th World Congress on Medical Informatics, MEDINFO 2007, Brisbane, QLD, Australia, 8/20/07.

@inproceedings{766bb26deae546ed8586db91fb8ec7d7,

title = "Using distributional analysis to semantically classify UMLS concepts",

keywords = "UMLS, distributional similarity, natural language processing, semantic classification",

author = "Fan, {Jung Wei} and Hua Xu and Carol Friedman",

year = "2007",

language = "English (US)",

isbn = "9781586037741",

series = "Studies in Health Technology and Informatics",

publisher = "IOS Press",

pages = "519--523",

booktitle = "MEDINFO 2007 - Proceedings of the 12th World Congress on Health (Medical) Informatics",

note = "12th World Congress on Medical Informatics, MEDINFO 2007 ; Conference date: 20-08-2007 Through 24-08-2007",

}

TY - GEN

T1 - Using distributional analysis to semantically classify UMLS concepts

AU - Fan, Jung Wei

AU - Xu, Hua

AU - Friedman, Carol

PY - 2007

Y1 - 2007

KW - UMLS

KW - distributional similarity

KW - natural language processing

KW - semantic classification

UR - http://www.scopus.com/inward/record.url?scp=84887979250&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84887979250&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84887979250

SN - 9781586037741

T3 - Studies in Health Technology and Informatics

SP - 519

EP - 523

BT - MEDINFO 2007 - Proceedings of the 12th World Congress on Health (Medical) Informatics

PB - IOS Press

T2 - 12th World Congress on Medical Informatics, MEDINFO 2007

Y2 - 20 August 2007 through 24 August 2007

ER -