Quality evaluation of value sets from cancer study common data elements using the UMLS semantic groups

Guoqian D Jiang, Harold R. Solbrig, Christopher G. Chute

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Objective: The objective of this study is to develop an approach to evaluate the quality of terminological annotations on the value set (ie, enumerated value domain) components of the common data elements (CDEs) in the context of clinical research using both unified medical language system (UMLS) semantic types and groups. Materials and methods: The CDEs of the National Cancer Institute (NCI) Cancer Data Standards Repository, the NCI Thesaurus (NCIt) concepts and the UMLS semantic network were integrated using a semantic web-based framework for a SPARQL-enabled evaluation. First, the set of CDE-permissible values with corresponding meanings in external controlled terminologies were isolated. The corresponding value meanings were then evaluated against their NCI- or UMLS-generated semantic network mapping to determine whether all of the meanings fell within the same semantic group. Results: Of the enumerated CDEs in the Cancer Data Standards Repository, 3093 (26.2%) had elements drawn from more than one UMLS semantic group. A random sample (n=100) of this set of elements indicated that 17% of them were likely to have been misclassified. Discussion: The use of existing semantic web tools can support a high-throughput mechanism for evaluating the quality of large CDE collections. This study demonstrates that the involvement of multiple semantic groups in an enumerated value domain of a CDE is an effective anchor to trigger an auditing point for quality evaluation activities. Conclusion: This approach produces a useful quality assurance mechanism for a clinical study CDE repository.
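
The core check described in the abstract (does every value meaning of a CDE fall within a single UMLS semantic group?) can be illustrated with a minimal Python sketch. This is not the authors' SPARQL-based implementation: the CDE identifiers, value meanings, and group assignments below are hypothetical toy data, and `flag_mixed_group_cdes` is an illustrative name chosen here.

```python
from collections import defaultdict

# Hypothetical toy data: (CDE id, value meaning, UMLS semantic group).
# These rows are invented for illustration and are not taken from the study.
ANNOTATIONS = [
    ("tumor-site", "Lung", "Anatomy"),
    ("tumor-site", "Liver", "Anatomy"),
    ("treatment-status", "Chemotherapy", "Procedures"),
    ("treatment-status", "Carcinoma", "Disorders"),
]


def flag_mixed_group_cdes(annotations):
    """Return CDEs whose value meanings span more than one semantic group."""
    groups_by_cde = defaultdict(set)
    for cde_id, _value_meaning, group in annotations:
        groups_by_cde[cde_id].add(group)
    # A CDE with more than one group is only a candidate for manual audit,
    # not a confirmed misclassification.
    return {
        cde_id: sorted(groups)
        for cde_id, groups in groups_by_cde.items()
        if len(groups) > 1
    }


if __name__ == "__main__":
    for cde_id, groups in flag_mixed_group_cdes(ANNOTATIONS).items():
        print(cde_id, groups)
```

In this toy data only `treatment-status` is flagged, since its two value meanings map to different groups. As in the study, a flag marks a point for review rather than a definite error.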

Original language: English (US)
Journal: Journal of the American Medical Informatics Association
Volume: 19
Issue number: E1
DOIs: 10.1136/amiajnl-2011-000739
State: Published - Jun 2012

Fingerprint

Unified Medical Language System
Semantics
National Cancer Institute (U.S.)
Neoplasms
Controlled Vocabulary
Trigger Points
Common Data Elements
Terminology

ASJC Scopus subject areas

  • Health Informatics

Cite this

Quality evaluation of value sets from cancer study common data elements using the UMLS semantic groups. / Jiang, Guoqian D; Solbrig, Harold R.; Chute, Christopher G.

In: Journal of the American Medical Informatics Association, Vol. 19, No. E1, 06.2012.

@article{68cc7cc1d465485980e98b0f345b837e,
title = "Quality evaluation of value sets from cancer study common data elements using the UMLS semantic groups",
abstract = "Objective: The objective of this study is to develop an approach to evaluate the quality of terminological annotations on the value set (ie, enumerated value domain) components of the common data elements (CDEs) in the context of clinical research using both unified medical language system (UMLS) semantic types and groups. Materials and methods: The CDEs of the National Cancer Institute (NCI) Cancer Data Standards Repository, the NCI Thesaurus (NCIt) concepts and the UMLS semantic network were integrated using a semantic web-based framework for a SPARQL-enabled evaluation. First, the set of CDE-permissible values with corresponding meanings in external controlled terminologies were isolated. The corresponding value meanings were then evaluated against their NCI- or UMLS-generated semantic network mapping to determine whether all of the meanings fell within the same semantic group. Results: Of the enumerated CDEs in the Cancer Data Standards Repository, 3093 (26.2{\%}) had elements drawn from more than one UMLS semantic group. A random sample (n=100) of this set of elements indicated that 17{\%} of them were likely to have been misclassified. Discussion: The use of existing semantic web tools can support a high-throughput mechanism for evaluating the quality of large CDE collections. This study demonstrates that the involvement of multiple semantic groups in an enumerated value domain of a CDE is an effective anchor to trigger an auditing point for quality evaluation activities. Conclusion: This approach produces a useful quality assurance mechanism for a clinical study CDE repository.",
author = "Jiang, {Guoqian D} and Solbrig, {Harold R.} and Chute, {Christopher G.}",
year = "2012",
month = "6",
doi = "10.1136/amiajnl-2011-000739",
language = "English (US)",
volume = "19",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "E1",

}

TY - JOUR

T1 - Quality evaluation of value sets from cancer study common data elements using the UMLS semantic groups

AU - Jiang, Guoqian D

AU - Solbrig, Harold R.

AU - Chute, Christopher G.

PY - 2012/6

Y1 - 2012/6

N2 - Objective: The objective of this study is to develop an approach to evaluate the quality of terminological annotations on the value set (ie, enumerated value domain) components of the common data elements (CDEs) in the context of clinical research using both unified medical language system (UMLS) semantic types and groups. Materials and methods: The CDEs of the National Cancer Institute (NCI) Cancer Data Standards Repository, the NCI Thesaurus (NCIt) concepts and the UMLS semantic network were integrated using a semantic web-based framework for a SPARQL-enabled evaluation. First, the set of CDE-permissible values with corresponding meanings in external controlled terminologies were isolated. The corresponding value meanings were then evaluated against their NCI- or UMLS-generated semantic network mapping to determine whether all of the meanings fell within the same semantic group. Results: Of the enumerated CDEs in the Cancer Data Standards Repository, 3093 (26.2%) had elements drawn from more than one UMLS semantic group. A random sample (n=100) of this set of elements indicated that 17% of them were likely to have been misclassified. Discussion: The use of existing semantic web tools can support a high-throughput mechanism for evaluating the quality of large CDE collections. This study demonstrates that the involvement of multiple semantic groups in an enumerated value domain of a CDE is an effective anchor to trigger an auditing point for quality evaluation activities. Conclusion: This approach produces a useful quality assurance mechanism for a clinical study CDE repository.

AB - Objective: The objective of this study is to develop an approach to evaluate the quality of terminological annotations on the value set (ie, enumerated value domain) components of the common data elements (CDEs) in the context of clinical research using both unified medical language system (UMLS) semantic types and groups. Materials and methods: The CDEs of the National Cancer Institute (NCI) Cancer Data Standards Repository, the NCI Thesaurus (NCIt) concepts and the UMLS semantic network were integrated using a semantic web-based framework for a SPARQL-enabled evaluation. First, the set of CDE-permissible values with corresponding meanings in external controlled terminologies were isolated. The corresponding value meanings were then evaluated against their NCI- or UMLS-generated semantic network mapping to determine whether all of the meanings fell within the same semantic group. Results: Of the enumerated CDEs in the Cancer Data Standards Repository, 3093 (26.2%) had elements drawn from more than one UMLS semantic group. A random sample (n=100) of this set of elements indicated that 17% of them were likely to have been misclassified. Discussion: The use of existing semantic web tools can support a high-throughput mechanism for evaluating the quality of large CDE collections. This study demonstrates that the involvement of multiple semantic groups in an enumerated value domain of a CDE is an effective anchor to trigger an auditing point for quality evaluation activities. Conclusion: This approach produces a useful quality assurance mechanism for a clinical study CDE repository.

UR - http://www.scopus.com/inward/record.url?scp=84863552403&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863552403&partnerID=8YFLogxK

U2 - 10.1136/amiajnl-2011-000739

DO - 10.1136/amiajnl-2011-000739

M3 - Article

C2 - 22511016

AN - SCOPUS:84863552403

VL - 19

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - E1

ER -