TY - GEN
T1 - Building an i2b2-based integrated data repository for cancer research
T2 - 2nd International Workshop on Data Management and Analytics for Medicine and Healthcare, DMAH 2016 held in conjunction with 42nd International Conference on Very Large Data Bases, VLDB 2016
AU - Hong, Na
AU - Li, Zheng
AU - Kiefer, Richard C.
AU - Robertson, Melissa S.
AU - Goode, Ellen L.
AU - Wang, Chen
AU - Jiang, Guoqian
N1 - Funding Information:
The study is supported in part by an NCI U01 Project – caCDE-QA (1U01CA180940-01A1), R01-CA122443, and an award from Mayo Clinic Ovarian Cancer SPORE (P50 CA136393).
Publisher Copyright:
© Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
N2 - In this study, we describe our preliminary efforts in building an i2b2-based integrated data repository that supports centralized data management for ovarian cancer clinical research, and discuss important lessons learnt that would inform the evaluation and enhancement of future generic cancer-specific data repositories. We collected multiple types of heterogeneous clinical data, including demographic, outcome, chemo-treatment and lab-test information for ovarian cancer. To better integrate different data types, we conducted data normalization procedures by reusing standard codes and creating mappings between local codes and standard vocabularies. We also developed extract, transform and load (ETL) scripts to load the data into an i2b2 instance. Through further analytic practices, we evaluated the major expectations of the system against common clinical research needs, including cohort query and identification, clinical data-based hypothesis testing, and exploratory data mining. We also identified and discussed outstanding issues that we will address through additional enhancement of the existing i2b2 system.
AB - In this study, we describe our preliminary efforts in building an i2b2-based integrated data repository that supports centralized data management for ovarian cancer clinical research, and discuss important lessons learnt that would inform the evaluation and enhancement of future generic cancer-specific data repositories. We collected multiple types of heterogeneous clinical data, including demographic, outcome, chemo-treatment and lab-test information for ovarian cancer. To better integrate different data types, we conducted data normalization procedures by reusing standard codes and creating mappings between local codes and standard vocabularies. We also developed extract, transform and load (ETL) scripts to load the data into an i2b2 instance. Through further analytic practices, we evaluated the major expectations of the system against common clinical research needs, including cohort query and identification, clinical data-based hypothesis testing, and exploratory data mining. We also identified and discussed outstanding issues that we will address through additional enhancement of the existing i2b2 system.
KW - Cancer registry
KW - Extract, transform and load (ETL)
KW - Informatics for integrating biology and the bedside (i2b2)
KW - Integrated data repository
KW - Ovarian cancer research
UR - http://www.scopus.com/inward/record.url?scp=85018700805&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85018700805&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-57741-8_8
DO - 10.1007/978-3-319-57741-8_8
M3 - Conference contribution
AN - SCOPUS:85018700805
SN - 9783319577401
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 121
EP - 135
BT - Data Management and Analytics for Medicine and Healthcare - 2nd International Workshop, DMAH 2016 Held at VLDB 2016, Revised Selected Papers
A2 - Yao, Lixia
A2 - Wang, Fusheng
A2 - Luo, Gang
PB - Springer Verlag
Y2 - 5 September 2016 through 9 September 2016
ER -