Building a Research-Quality Copy Number Variation Data Repository for Translational Research

Chen Wang, Raymond M. Moore, Jared M. Evans, Xiaonan Hou, S. John Weroha, Guoqian Jiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Copy number variation (CNV) has known associations with population diversity and disease conditions. However, research communities face great challenges in reusing CNV data because existing CNV data sources are heterogeneous. The objective of this study is to design, develop, and evaluate a scalable CNV data repository, based on a proposed common data schema, that facilitates research-quality CNV data integration and reuse. We proposed a CNV common data schema by analyzing multiple existing CNV data sources. We designed a collection of CNV quality metrics and demonstrated their usefulness using CNV data from a study of ovarian cancer xenograft models. We implemented the CNV data repository with a MongoDB database backend and established CNV genomic data services that enable reuse of the curated CNV data and help answer CNV-relevant research questions. We also discuss critical issues and future plans for system enhancement and community engagement.

Original language: English (US)
Title of host publication: Heterogeneous Data Management, Polystores, and Analytics for Healthcare - VLDB 2018 Workshops, Poly and DMAH, 2018, Revised Selected Papers
Editors: Timothy Mattson, Fusheng Wang, George Teodoro, Michael Stonebraker, Vijay Gadepally, Gang Luo
Publisher: Springer Verlag
Pages: 148-161
Number of pages: 14
ISBN (Print): 9783030141769
DOIs
State: Published - 2019
Event: International Workshop on Polystores and other Systems for Heterogeneous Data, Poly 2018 and 4th International Workshop on Data Management and Analytics for Medicine and Health Care, DMAH 2018 held in conjunction with 44th International Conference on Very Large Data Bases, VLDB 2018 - Rio de Janeiro, Brazil
Duration: Aug 27 2018 → Aug 31 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11470 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Workshop on Polystores and other Systems for Heterogeneous Data, Poly 2018 and 4th International Workshop on Data Management and Analytics for Medicine and Health Care, DMAH 2018 held in conjunction with 44th International Conference on Very Large Data Bases, VLDB 2018
Country/Territory: Brazil
City: Rio de Janeiro
Period: 8/27/18 → 8/31/18

Keywords

  • Copy number variation
  • Integrated data repository
  • Quality assurance
  • Standardization

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
