Building an i2b2-based integrated data repository for cancer research: A case study of ovarian cancer registry

Na Hong, Zheng Li, Richard C. Kiefer, Melissa S. Robertson, Ellen L Goode, Chen Wang, Guoqian D Jiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

In this study, we describe our preliminary efforts in building an i2b2-based integrated data repository that supports centralized data management for ovarian cancer clinical research, and discuss important lessons learnt that would inspire the evaluation and enhancement of future generic cancer-specific data repositories. We collected multiple types of heterogeneous clinical data, including demographic, outcome, chemo-treatment and lab-test information for ovarian cancer. To better integrate different data types, we conducted data normalization procedures through reusing standard codes and creating mappings between local codes and standard vocabularies. We also developed extract, transform and load (ETL) scripts to load the data into an i2b2 instance. Through further analytic practices, we evaluated major expectations of the system according to common clinical research needs, including cohort query and identification, clinical data-based hypothesis testing, and exploratory data mining. We also identified and discussed outstanding issues we will address through additional enhancement of the existing i2b2 system.

Original language: English (US)
Title of host publication: Data Management and Analytics for Medicine and Healthcare - 2nd International Workshop, DMAH 2016 Held at VLDB 2016, Revised Selected Papers
Publisher: Springer Verlag
Pages: 121-135
Number of pages: 15
Volume: 10186 LNCS
ISBN (Print): 9783319577401
DOIs: https://doi.org/10.1007/978-3-319-57741-8_8
State: Published - 2017
Event: 2nd International Workshop on Data Management and Analytics for Medicine and Healthcare, DMAH 2016 held in conjunction with 42nd International Conference on Very Large Data Bases, VLDB 2016 - New Delhi, India
Duration: Sep 5 2016 → Sep 9 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10186 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 2nd International Workshop on Data Management and Analytics for Medicine and Healthcare, DMAH 2016 held in conjunction with 42nd International Conference on Very Large Data Bases, VLDB 2016
Country: India
City: New Delhi
Period: 9/5/16 → 9/9/16

Fingerprint

Codes (standards)
Ovarian Cancer
Repository
Cancer
Information management
Data mining
Testing
Enhancement
Hypothesis Testing
Data Management
Normalization
Data Mining
Integrate
Query
Transform
Evaluation

Keywords

  • Cancer registry
  • Extract, transform and load (ETL)
  • Informatics for integrating biology and the bedside (i2b2)
  • Integrated data repository
  • Ovarian cancer research

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Hong, N., Li, Z., Kiefer, R. C., Robertson, M. S., Goode, E. L., Wang, C., & Jiang, G. D. (2017). Building an i2b2-based integrated data repository for cancer research: A case study of ovarian cancer registry. In Data Management and Analytics for Medicine and Healthcare - 2nd International Workshop, DMAH 2016 Held at VLDB 2016, Revised Selected Papers (Vol. 10186 LNCS, pp. 121-135). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10186 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-57741-8_8

@inproceedings{3d6b261106ac41dc9704f6e1a19b71aa,
title = "Building an i2b2-based integrated data repository for cancer research: A case study of ovarian cancer registry",
abstract = "In this study, we describe our preliminary efforts in building an i2b2-based integrated data repository that supports centralized data management for ovarian cancer clinical research, and discuss important lessons learnt that would inspire the evaluation and enhancement of future generic cancer-specific data repositories. We collected multiple types of heterogeneous clinical data, including demographic, outcome, chemo-treatment and lab-test information for ovarian cancer. To better integrate different data types, we conducted data normalization procedures through reusing standard codes and creating mappings between local codes and standard vocabularies. We also developed extract, transform and load (ETL) scripts to load the data into an i2b2 instance. Through further analytic practices, we evaluated major expectations of the system according to common clinical research needs, including cohort query and identification, clinical data-based hypothesis testing, and exploratory data mining. We also identified and discussed outstanding issues we will address through additional enhancement of the existing i2b2 system.",
keywords = "Cancer registry, Extract, transform and load (ETL), Informatics for integrating biology and the bedside (i2b2), Integrated data repository, Ovarian cancer research",
author = "Na Hong and Zheng Li and Kiefer, {Richard C.} and Robertson, {Melissa S.} and Goode, {Ellen L} and Chen Wang and Jiang, {Guoqian D}",
year = "2017",
doi = "10.1007/978-3-319-57741-8_8",
language = "English (US)",
isbn = "9783319577401",
volume = "10186 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "121--135",
booktitle = "Data Management and Analytics for Medicine and Healthcare - 2nd International Workshop, DMAH 2016 Held at VLDB 2016, Revised Selected Papers",
}

TY - GEN
T1 - Building an i2b2-based integrated data repository for cancer research
T2 - A case study of ovarian cancer registry
AU - Hong, Na
AU - Li, Zheng
AU - Kiefer, Richard C.
AU - Robertson, Melissa S.
AU - Goode, Ellen L
AU - Wang, Chen
AU - Jiang, Guoqian D
PY - 2017
Y1 - 2017
AB - In this study, we describe our preliminary efforts in building an i2b2-based integrated data repository that supports centralized data management for ovarian cancer clinical research, and discuss important lessons learnt that would inspire the evaluation and enhancement of future generic cancer-specific data repositories. We collected multiple types of heterogeneous clinical data, including demographic, outcome, chemo-treatment and lab-test information for ovarian cancer. To better integrate different data types, we conducted data normalization procedures through reusing standard codes and creating mappings between local codes and standard vocabularies. We also developed extract, transform and load (ETL) scripts to load the data into an i2b2 instance. Through further analytic practices, we evaluated major expectations of the system according to common clinical research needs, including cohort query and identification, clinical data-based hypothesis testing, and exploratory data mining. We also identified and discussed outstanding issues we will address through additional enhancement of the existing i2b2 system.
KW - Cancer registry
KW - Extract, transform and load (ETL)
KW - Informatics for integrating biology and the bedside (i2b2)
KW - Integrated data repository
KW - Ovarian cancer research
UR - http://www.scopus.com/inward/record.url?scp=85018700805&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85018700805&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-57741-8_8
DO - 10.1007/978-3-319-57741-8_8
M3 - Conference contribution
AN - SCOPUS:85018700805
SN - 9783319577401
VL - 10186 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 121
EP - 135
BT - Data Management and Analytics for Medicine and Healthcare - 2nd International Workshop, DMAH 2016 Held at VLDB 2016, Revised Selected Papers
PB - Springer Verlag
ER -