Building a Research-Quality Copy Number Variation Data Repository for Translational Research

Chen Wang, Raymond M. Moore, Jared M. Evans, Xiaonan Hou, S. John Weroha, Guoqian D Jiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Copy number variation (CNV) is known to be associated with population diversity and with disease conditions. However, research communities face great challenges in reusing CNV data because existing CNV data sources are heterogeneous. The objective of this study is to design, develop, and evaluate a scalable CNV data repository, based on a proposed common data schema, that facilitates research-quality CNV data integration and reuse. We proposed a CNV common data schema by analyzing multiple existing CNV data sources. We designed a collection of CNV quality metrics and demonstrated their usefulness using CNV data from a study of ovarian cancer xenograft models. We implemented a CNV data repository with a MongoDB database backend and established CNV genomic data services that enable reuse of the curated CNV data and help answer CNV-relevant research questions. Critical issues and future plans for system enhancement and community engagement are discussed.
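The abstract describes harmonizing heterogeneous CNV sources into a common schema stored in MongoDB and queried through data services. The paper's actual schema, quality metrics, and services are not reproduced in this record, so the following Python sketch is purely illustrative: the record fields (`sample_id`, `chrom`, `start`, `end`, `copy_number`, `source`), the two input formats, the sample IDs, and every helper function are hypothetical. It shows one plausible shape for the idea: per-source adapters map calls onto a shared record, and a region query is written as a MongoDB-style filter document using the standard `$lte`/`$gte` operators, evaluated in memory so the example runs without a database.

```python
from dataclasses import dataclass, asdict


@dataclass
class CNVRecord:
    """Hypothetical common CNV record; field names are illustrative, not the paper's schema."""
    sample_id: str
    chrom: str        # normalized to the "chr" prefix convention
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    copy_number: int  # absolute integer copy number
    source: str       # upstream format the call came from


def normalize_chrom(chrom: str) -> str:
    # Harmonize "17" and "chr17" into one naming convention.
    return chrom if chrom.startswith("chr") else "chr" + chrom


def from_segment_row(sample_id, chrom, start, end, log2_ratio, ploidy=2):
    # Hypothetical segment-style source reporting a log2 ratio against a diploid
    # baseline; copy number ~= ploidy * 2**log2_ratio (tumor purity is ignored).
    cn = round(ploidy * 2 ** log2_ratio)
    return CNVRecord(sample_id, normalize_chrom(chrom), int(start), int(end), cn, "segment")


def from_vcf_like(sample_id, chrom, pos, info):
    # Hypothetical VCF-style source carrying END and CN as INFO key=value pairs.
    fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    return CNVRecord(sample_id, normalize_chrom(chrom), int(pos),
                     int(fields["END"]), int(fields["CN"]), "vcf")


def qc_issues(rec):
    # Minimal record-level sanity checks (illustrative only, not the paper's metrics).
    issues = []
    if rec.end < rec.start:
        issues.append("end before start")
    if rec.copy_number < 0:
        issues.append("negative copy number")
    return issues


def overlap_filter(chrom, start, end):
    # MongoDB-style query document: records overlapping [start, end] on chrom.
    return {"chrom": normalize_chrom(chrom),
            "start": {"$lte": end},
            "end": {"$gte": start}}


def matches(doc, flt):
    # Tiny in-memory evaluator for overlap_filter, so the sketch runs without a database.
    return (doc["chrom"] == flt["chrom"]
            and doc["start"] <= flt["start"]["$lte"]
            and doc["end"] >= flt["end"]["$gte"])


# Usage with made-up sample IDs and coordinates.
records = [
    from_segment_row("PDX-01", "17", 7_500_000, 7_700_000, -1.0),
    from_vcf_like("PDX-02", "chr17", 7_600_000, "SVTYPE=DUP;END=7650000;CN=4"),
    from_segment_row("PDX-01", "8", 127_000_000, 128_000_000, 1.0),
]
docs = [asdict(r) for r in records]
flt = overlap_filter("17", 7_660_000, 7_690_000)
hits = [d["sample_id"] for d in docs if matches(d, flt)]
print(hits)  # ['PDX-01']
```

Because every source is mapped onto one record shape first, a single overlap filter serves all upstream formats. In an actual MongoDB deployment the same filter document could presumably be passed to a collection's `find` method, with an index on the coordinate fields; that is an assumption about deployment, not a description of the authors' system.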

Original language: English (US)
Title of host publication: Heterogeneous Data Management, Polystores, and Analytics for Healthcare - VLDB 2018 Workshops, Poly and DMAH, 2018, Revised Selected Papers
Editors: Michael Stonebraker, Fusheng Wang, Vijay Gadepally, George Teodoro, Timothy Mattson, Gang Luo
Publisher: Springer Verlag
Pages: 148-161
Number of pages: 14
ISBN (Print): 9783030141769
DOI: 10.1007/978-3-030-14177-6_12
State: Published - Jan 1 2019
Event: International Workshop on Polystores and other Systems for Heterogeneous Data, Poly 2018, and 4th International Workshop on Data Management and Analytics for Medicine and Health Care, DMAH 2018, held in conjunction with the 44th International Conference on Very Large Data Bases, VLDB 2018, Rio de Janeiro, Brazil
Duration: Aug 27 2018 - Aug 31 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11470 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • Copy number variation
  • Integrated data repository
  • Quality assurance
  • Standardization

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Wang, C., Moore, R. M., Evans, J. M., Hou, X., Weroha, S. J., & Jiang, G. D. (2019). Building a Research-Quality Copy Number Variation Data Repository for Translational Research. In M. Stonebraker, F. Wang, V. Gadepally, G. Teodoro, T. Mattson, & G. Luo (Eds.), Heterogeneous Data Management, Polystores, and Analytics for Healthcare - VLDB 2018 Workshops, Poly and DMAH, 2018, Revised Selected Papers (pp. 148-161). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11470 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-14177-6_12

@inproceedings{b30305b19a7145ccadbe80f3bb66efdb,
title = "Building a Research-Quality Copy Number Variation Data Repository for Translational Research",
keywords = "Copy number variation, Integrated data repository, Quality assurance, Standardization",
author = "Chen Wang and Moore, {Raymond M.} and Evans, {Jared M.} and Xiaonan Hou and Weroha, {S. John} and Jiang, {Guoqian D}",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-3-030-14177-6_12",
language = "English (US)",
isbn = "9783030141769",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
volume = "11470",
publisher = "Springer Verlag",
pages = "148--161",
editor = "Michael Stonebraker and Fusheng Wang and Vijay Gadepally and George Teodoro and Timothy Mattson and Gang Luo",
booktitle = "Heterogeneous Data Management, Polystores, and Analytics for Healthcare - VLDB 2018 Workshops, Poly and DMAH, 2018, Revised Selected Papers",
address = "Germany",
}