Quality evaluation of cancer study Common Data Elements using the UMLS Semantic Network

Guoqian D Jiang, Harold R. Solbrig, Christopher G. Chute

Research output: Contribution to journal, Article

5 Citations (Scopus)

Abstract

The binding of controlled terminology has been regarded as important for the standardization of Common Data Elements (CDEs) in cancer research. However, the potential of such binding has not yet been fully explored, especially with respect to quality assurance. The objective of this study is to explore whether there is a relationship between terminological annotations and the UMLS Semantic Network (SN) that can be exploited to improve those annotations. We profiled the terminological concepts associated with the standard structure of the CDEs of the NCI Cancer Data Standards Repository (caDSR) using the UMLS SN. We processed 17798 data elements and extracted 17526 primary object class/property concept pairs. We identified dominant semantic types for the categories "object class" and "property" and determined that the preponderance of the instances were disjoint (i.e., the intersection of semantic types between the two categories is empty). We then performed a preliminary evaluation of the data elements whose asserted primary object class/property concept pairs conflict with this observation, that is, where the semantic type of the object class fell into an SN category typically used by property, or vice versa. In conclusion, the UMLS SN-based profiling approach is feasible for the quality assurance and accessibility of the cancer study CDEs. This approach could provide useful insight into how to build quality assurance mechanisms in a metadata repository.
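
The profiling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy pairs, the function names (`profile`, `dominant_role`, `flag_conflicts`), and the simple majority rule for deciding which role a semantic type "typically" plays are all assumptions made for illustration. The semantic type names are real UMLS SN types, but their assignment here is invented and does not reproduce caDSR data or the study's results.

```python
from collections import Counter

# Hypothetical toy data (NOT from caDSR): one tuple per data element, giving
# the UMLS semantic type of its primary object class concept and of its
# primary property concept.
pairs = [
    ("Disease or Syndrome", "Qualitative Concept"),
    ("Disease or Syndrome", "Quantitative Concept"),
    ("Pharmacologic Substance", "Quantitative Concept"),
    ("Pharmacologic Substance", "Qualitative Concept"),
    ("Qualitative Concept", "Disease or Syndrome"),  # roles look swapped
]


def profile(pairs):
    """Count how often each semantic type occurs in each role."""
    object_class_counts = Counter(oc for oc, _ in pairs)
    property_counts = Counter(prop for _, prop in pairs)
    return object_class_counts, property_counts


def dominant_role(object_class_counts, property_counts):
    """Map each semantic type to the role it is used in most often.

    Assumption: a plain majority decides the typical role; ties are left
    unassigned rather than guessed.
    """
    roles = {}
    for sem_type in set(object_class_counts) | set(property_counts):
        n_oc = object_class_counts[sem_type]
        n_prop = property_counts[sem_type]
        if n_oc != n_prop:
            roles[sem_type] = "object class" if n_oc > n_prop else "property"
    return roles


def flag_conflicts(pairs, roles):
    """Return indexes of pairs whose semantic types contradict the profile."""
    flagged = []
    for index, (oc_type, prop_type) in enumerate(pairs):
        oc_conflict = roles.get(oc_type) == "property"
        prop_conflict = roles.get(prop_type) == "object class"
        if oc_conflict or prop_conflict:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    oc_counts, prop_counts = profile(pairs)
    roles = dominant_role(oc_counts, prop_counts)
    print(flag_conflicts(pairs, roles))
```

In this sketch the flagged pairs are only candidates for manual review; deciding whether an annotation is actually wrong would still require the kind of evaluation the abstract describes.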

Original language: English (US)
Journal: Journal of Biomedical Informatics
Volume: 44
Issue number: SUPPL. 1
DOI: 10.1016/j.jbi.2011.08.001
State: Published - Dec 2011

Fingerprint

Unified Medical Language System
Semantics
Quality assurance
Neoplasms
Terminology
Common Data Elements
Metadata
Standardization

Keywords

  • Cancer study
  • Common Data Elements
  • Quality assurance
  • Semantic Network

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

Quality evaluation of cancer study Common Data Elements using the UMLS Semantic Network. / Jiang, Guoqian D; Solbrig, Harold R.; Chute, Christopher G.

In: Journal of Biomedical Informatics, Vol. 44, No. SUPPL. 1, 12.2011.

@article{1bfd38abef624ab2bf6c269c3f900cb6,
title = "Quality evaluation of cancer study Common Data Elements using the UMLS Semantic Network",
abstract = "The binding of controlled terminology has been regarded as important for the standardization of Common Data Elements (CDEs) in cancer research. However, the potential of such binding has not yet been fully explored, especially with respect to quality assurance. The objective of this study is to explore whether there is a relationship between terminological annotations and the UMLS Semantic Network (SN) that can be exploited to improve those annotations. We profiled the terminological concepts associated with the standard structure of the CDEs of the NCI Cancer Data Standards Repository (caDSR) using the UMLS SN. We processed 17798 data elements and extracted 17526 primary object class/property concept pairs. We identified dominant semantic types for the categories {"}object class{"} and {"}property{"} and determined that the preponderance of the instances were disjoint (i.e., the intersection of semantic types between the two categories is empty). We then performed a preliminary evaluation of the data elements whose asserted primary object class/property concept pairs conflict with this observation, that is, where the semantic type of the object class fell into an SN category typically used by property, or vice versa. In conclusion, the UMLS SN-based profiling approach is feasible for the quality assurance and accessibility of the cancer study CDEs. This approach could provide useful insight into how to build quality assurance mechanisms in a metadata repository.",
keywords = "Cancer study, Common Data Elements, Quality assurance, Semantic Network",
author = "Jiang, {Guoqian D} and Solbrig, {Harold R.} and Chute, {Christopher G.}",
year = "2011",
month = "12",
doi = "10.1016/j.jbi.2011.08.001",
language = "English (US)",
volume = "44",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",
number = "SUPPL. 1"
}

TY - JOUR

T1 - Quality evaluation of cancer study Common Data Elements using the UMLS Semantic Network

AU - Jiang, Guoqian D

AU - Solbrig, Harold R.

AU - Chute, Christopher G.

PY - 2011/12

Y1 - 2011/12

N2 - The binding of controlled terminology has been regarded as important for the standardization of Common Data Elements (CDEs) in cancer research. However, the potential of such binding has not yet been fully explored, especially with respect to quality assurance. The objective of this study is to explore whether there is a relationship between terminological annotations and the UMLS Semantic Network (SN) that can be exploited to improve those annotations. We profiled the terminological concepts associated with the standard structure of the CDEs of the NCI Cancer Data Standards Repository (caDSR) using the UMLS SN. We processed 17798 data elements and extracted 17526 primary object class/property concept pairs. We identified dominant semantic types for the categories "object class" and "property" and determined that the preponderance of the instances were disjoint (i.e., the intersection of semantic types between the two categories is empty). We then performed a preliminary evaluation of the data elements whose asserted primary object class/property concept pairs conflict with this observation, that is, where the semantic type of the object class fell into an SN category typically used by property, or vice versa. In conclusion, the UMLS SN-based profiling approach is feasible for the quality assurance and accessibility of the cancer study CDEs. This approach could provide useful insight into how to build quality assurance mechanisms in a metadata repository.

KW - Cancer study

KW - Common Data Elements

KW - Quality assurance

KW - Semantic Network

UR - http://www.scopus.com/inward/record.url?scp=83955162918&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=83955162918&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2011.08.001

DO - 10.1016/j.jbi.2011.08.001

M3 - Article

C2 - 21840422

AN - SCOPUS:83955162918

VL - 44

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 1532-0464

IS - SUPPL. 1

ER -