Representation of time-relevant common data elements in the cancer data standards repository: Statistical evaluation of an ontological approach

Henry W. Chen, Jingcheng Du, Hsing Yi Song, Xiangyu Liu, Guoqian Jiang, Cui Tao

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Background: There is a growing need to centralize and standardize electronic health data in clinical research as data volumes continue to grow. Domain-specific common data elements (CDEs) are emerging as a standard approach to capturing and reporting clinical research data. Recent efforts to standardize clinical study CDEs have greatly facilitated data integration and data sharing. The importance of the temporal dimension of clinical research studies is well recognized; however, very few studies in the biomedical research community have focused on the formal representation of temporal constraints and temporal relationships within clinical research data. Temporal information is particularly valuable for enabling high-quality cancer research.

Objective: The objective of the study was to develop and evaluate an ontological approach to represent the temporal aspects of cancer study CDEs.

Methods: We used CDEs recorded in the National Cancer Institute (NCI) Cancer Data Standards Repository (caDSR) and created a CDE parser to extract time-relevant CDEs from the caDSR. Using the Web Ontology Language (OWL)-based Time Event Ontology (TEO), we manually derived a set of TEO ontological representation patterns from an observing set of randomly selected time-related CDEs (n=600) to semantically model the temporal components of the CDEs. To evaluate TEO's ability to represent the temporal components of the CDEs, this set of representation patterns was tested against two test sets of randomly selected time-related CDEs (n=425 each).

Results: Of the CDEs in the test sets, 94.2% (801/850) could be represented by the TEO representation patterns.

Conclusions: TEO is a good ontological model for representing the temporal components of the CDEs recorded in caDSR. Our representative model can harness Semantic Web reasoning and inferencing functionalities and provides a means for temporal CDEs to be machine-readable, streamlining meaningful searches.
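As a rough illustration of the workflow summarized in the abstract, the sketch below filters CDE names for time-related terms and recomputes the reported coverage figure. It is not the authors' parser: the keyword list, the function names, and the sample CDE names are all assumptions made for illustration, and the actual caDSR extraction criteria are not given on this page.

```python
import re

# Hypothetical keyword list (an assumption, not the study's criteria).
TIME_PATTERN = re.compile(
    r"\b(date|time|duration|year|month|week|day)s?\b", re.IGNORECASE
)


def is_time_relevant(cde_name: str) -> bool:
    """Return True if the CDE name contains one of the assumed time keywords."""
    return TIME_PATTERN.search(cde_name) is not None


def coverage_percent(represented: int, total: int) -> float:
    """Percentage of test CDEs that a set of patterns could represent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * represented / total


# Made-up CDE names for illustration only; not real caDSR entries.
sample_cdes = [
    "Surgery Date",
    "Treatment Duration in Weeks",
    "Patient Gender Code",
]
time_relevant = [name for name in sample_cdes if is_time_relevant(name)]

# Figures from the abstract: 801 of 850 test CDEs were representable.
reported = round(coverage_percent(801, 850), 1)
```

Here `time_relevant` keeps the first two sample names, and `reported` reproduces the 94.2% stated in the Results.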

Original language: English (US)
Article number: e7
Journal: JMIR Medical Informatics
Volume: 6
Issue number: 1
State: Published - Jan 2018

Keywords

  • Biomedical ontology
  • Common data elements
  • Database
  • Database management systems
  • Time

ASJC Scopus subject areas

  • Health Informatics
  • Health Information Management
