Preliminary exploration of survival analysis using the OHDSI common data model

A case study of intrahepatic cholangiocarcinoma

Na Hong, Ning Zhang, Huawei Wu, Shanshan Lu, Yue Yu, Li Hou, Yinying Lu, Hongfang D Liu, Guoqian D Jiang

Research output: Contribution to journal (Article)

Abstract

Background: Data heterogeneity is a common phenomenon in the secondary use of electronic health record (EHR) data from different sources. The Observational Health Data Sciences and Informatics (OHDSI) Common Data Model (CDM) organizes healthcare data into standard data structures using concepts that are explicitly and formally specified through standard vocabularies, thereby facilitating large-scale analysis. The objective of this study is to design, develop, and evaluate generic survival analysis routines built using the OHDSI CDM.

Methods: We used intrahepatic cholangiocarcinoma (ICC) patient data to implement CDM-based survival analysis methods. Our methods comprise the following modules: 1) mapping local terms to standard OHDSI concepts, in which the variables and values describing demographic characteristics, medical history, smoking status, laboratory results, and tumor features were mapped to standard OHDSI concepts through manual analysis; 2) loading patient data into the CDM using the concept mappings; 3) developing an R interface that supports portable survival analysis on top of the OHDSI CDM, and comparing the CDM-based analysis results with those obtained using traditional statistical analysis methods.

Results: Our dataset contained 346 patients diagnosed with ICC. The collected clinical data contain 115 variables, of which 75 were mapped to OHDSI concepts. These concepts mainly belong to four domains: condition, observation, measurement, and procedure. The corresponding standard concepts are spread across six vocabularies: ICD10CM, ICD10PCS, SNOMED, LOINC, NDFRT, and READ. We loaded a total of 25,950 patient data records into the OHDSI CDM database. However, 40 variables could not be mapped to the OHDSI CDM, as they mostly belong to imaging data and pathological data.

Conclusions: Our study demonstrates that conducting survival analysis using the OHDSI CDM is feasible and can produce reusable analysis routines. However, two challenges remain: 1) semantic loss caused by inaccurate mapping and value normalization; and 2) incomplete OHDSI vocabularies for describing imaging data, pathological data, and modular data representation.
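To make the survival analysis step in the abstract more concrete, the following is a minimal sketch of a Kaplan-Meier estimate in plain Python. It is not the authors' R interface. The `Followup` record, its field names, and the four-patient cohort are hypothetical stand-ins for follow-up data that would be read from a CDM database.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Followup:
    """Hypothetical per-patient follow-up record (not a CDM table)."""
    person_id: int
    days: int    # time from diagnosis to death or last contact
    died: bool   # True = death observed, False = censored


def kaplan_meier(rows):
    """Return [(time, survival probability)] at each distinct death time."""
    deaths = Counter(r.days for r in rows if r.died)
    exits = Counter(r.days for r in rows)  # deaths and censorings leave the risk set
    at_risk = len(rows)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Illustrative data only; these are not values from the study.
cohort = [
    Followup(1, 100, True),
    Followup(2, 200, False),
    Followup(3, 300, True),
    Followup(4, 400, False),
]
curve = kaplan_meier(cohort)
print(curve)
```

Censored patients reduce the number at risk without lowering the survival estimate, which is the property that distinguishes this estimator from a simple proportion of deaths.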

Original language: English (US)
Article number: 116
Journal: BMC Medical Informatics and Decision Making
Volume: 18
DOI: 10.1186/s12911-018-0686-7
State: Published - Dec 7 2018

Fingerprint

Informatics
Cholangiocarcinoma
Survival Analysis
Health
Vocabulary
Logical Observation Identifiers Names and Codes
Systematized Nomenclature of Medicine
Electronic Health Records
Information Storage and Retrieval
Semantics
Smoking
Observation
Demography
Databases
Delivery of Health Care

Keywords

  • Intrahepatic cholangiocarcinoma
  • Multi-center analysis
  • OHDSI CDM
  • R
  • Survival analysis

ASJC Scopus subject areas

  • Health Policy
  • Health Informatics

Cite this

Preliminary exploration of survival analysis using the OHDSI common data model: A case study of intrahepatic cholangiocarcinoma. / Hong, Na; Zhang, Ning; Wu, Huawei; Lu, Shanshan; Yu, Yue; Hou, Li; Lu, Yinying; Liu, Hongfang D; Jiang, Guoqian D.

In: BMC Medical Informatics and Decision Making, Vol. 18, 116, 07.12.2018.

@article{97205a38375e46ef9f90cccd3b1ee2c7,
title = "Preliminary exploration of survival analysis using the OHDSI common data model: A case study of intrahepatic cholangiocarcinoma",
abstract = "Background: Data heterogeneity is a common phenomenon in the secondary use of electronic health record (EHR) data from different sources. The Observational Health Data Sciences and Informatics (OHDSI) Common Data Model (CDM) organizes healthcare data into standard data structures using concepts that are explicitly and formally specified through standard vocabularies, thereby facilitating large-scale analysis. The objective of this study is to design, develop, and evaluate generic survival analysis routines built using the OHDSI CDM. Methods: We used intrahepatic cholangiocarcinoma (ICC) patient data to implement CDM-based survival analysis methods. Our methods comprise the following modules: 1) mapping local terms to standard OHDSI concepts, in which the variables and values describing demographic characteristics, medical history, smoking status, laboratory results, and tumor features were mapped to standard OHDSI concepts through manual analysis; 2) loading patient data into the CDM using the concept mappings; 3) developing an R interface that supports portable survival analysis on top of the OHDSI CDM, and comparing the CDM-based analysis results with those obtained using traditional statistical analysis methods. Results: Our dataset contained 346 patients diagnosed with ICC. The collected clinical data contain 115 variables, of which 75 were mapped to OHDSI concepts. These concepts mainly belong to four domains: condition, observation, measurement, and procedure. The corresponding standard concepts are spread across six vocabularies: ICD10CM, ICD10PCS, SNOMED, LOINC, NDFRT, and READ. We loaded a total of 25,950 patient data records into the OHDSI CDM database. However, 40 variables could not be mapped to the OHDSI CDM, as they mostly belong to imaging data and pathological data. Conclusions: Our study demonstrates that conducting survival analysis using the OHDSI CDM is feasible and can produce reusable analysis routines. However, two challenges remain: 1) semantic loss caused by inaccurate mapping and value normalization; and 2) incomplete OHDSI vocabularies for describing imaging data, pathological data, and modular data representation.",
keywords = "Intrahepatic cholangiocarcinoma, Multi-center analysis, OHDSI CDM, R, Survival analysis",
author = "Na Hong and Ning Zhang and Huawei Wu and Shanshan Lu and Yue Yu and Li Hou and Yinying Lu and Liu, {Hongfang D} and Jiang, {Guoqian D}",
year = "2018",
month = "12",
day = "7",
doi = "10.1186/s12911-018-0686-7",
language = "English (US)",
volume = "18",
journal = "BMC Medical Informatics and Decision Making",
issn = "1472-6947",
publisher = "BioMed Central",
}

TY - JOUR

T1 - Preliminary exploration of survival analysis using the OHDSI common data model

T2 - A case study of intrahepatic cholangiocarcinoma

AU - Hong, Na

AU - Zhang, Ning

AU - Wu, Huawei

AU - Lu, Shanshan

AU - Yu, Yue

AU - Hou, Li

AU - Lu, Yinying

AU - Liu, Hongfang D

AU - Jiang, Guoqian D

PY - 2018/12/7

Y1 - 2018/12/7

N2 - Background: Data heterogeneity is a common phenomenon in the secondary use of electronic health record (EHR) data from different sources. The Observational Health Data Sciences and Informatics (OHDSI) Common Data Model (CDM) organizes healthcare data into standard data structures using concepts that are explicitly and formally specified through standard vocabularies, thereby facilitating large-scale analysis. The objective of this study is to design, develop, and evaluate generic survival analysis routines built using the OHDSI CDM. Methods: We used intrahepatic cholangiocarcinoma (ICC) patient data to implement CDM-based survival analysis methods. Our methods comprise the following modules: 1) mapping local terms to standard OHDSI concepts, in which the variables and values describing demographic characteristics, medical history, smoking status, laboratory results, and tumor features were mapped to standard OHDSI concepts through manual analysis; 2) loading patient data into the CDM using the concept mappings; 3) developing an R interface that supports portable survival analysis on top of the OHDSI CDM, and comparing the CDM-based analysis results with those obtained using traditional statistical analysis methods. Results: Our dataset contained 346 patients diagnosed with ICC. The collected clinical data contain 115 variables, of which 75 were mapped to OHDSI concepts. These concepts mainly belong to four domains: condition, observation, measurement, and procedure. The corresponding standard concepts are spread across six vocabularies: ICD10CM, ICD10PCS, SNOMED, LOINC, NDFRT, and READ. We loaded a total of 25,950 patient data records into the OHDSI CDM database. However, 40 variables could not be mapped to the OHDSI CDM, as they mostly belong to imaging data and pathological data. Conclusions: Our study demonstrates that conducting survival analysis using the OHDSI CDM is feasible and can produce reusable analysis routines. However, two challenges remain: 1) semantic loss caused by inaccurate mapping and value normalization; and 2) incomplete OHDSI vocabularies for describing imaging data, pathological data, and modular data representation.

KW - Intrahepatic cholangiocarcinoma

KW - Multi-center analysis

KW - OHDSI CDM

KW - R

KW - Survival analysis

UR - http://www.scopus.com/inward/record.url?scp=85058055869&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058055869&partnerID=8YFLogxK

U2 - 10.1186/s12911-018-0686-7

DO - 10.1186/s12911-018-0686-7

M3 - Article

VL - 18

JO - BMC Medical Informatics and Decision Making

JF - BMC Medical Informatics and Decision Making

SN - 1472-6947

M1 - 116

ER -