RefCNV: Identification of gene-based copy number variants using whole exome sequencing

Lun Ching Chang, Biswajit Das, Chih Jian Lih, Han Si, Corinne E. Camalier, Paul M. McGregor, Eric Polley

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

With rapid advances in DNA sequencing technologies, whole exome sequencing (WES) has become a popular approach for detecting somatic mutations in oncology studies. WES was initially intended to characterize single nucleotide variants. However, it was observed that the number of sequencing reads mapped to a genomic region correlated with DNA copy number, so WES data can also be used to detect copy number variants (CNVs). We propose a method, RefCNV, that uses a reference set to estimate the distribution of coverage for each exon. Constructing the reference set includes an evaluation of the sources of variability in the coverage distribution, and we observed that the processing steps affected that distribution. For each exon, we compared the observed coverage with the expected normal coverage. Thresholds for calling CNVs were selected to control the false-positive error rate. RefCNV predictions correlated significantly (r = 0.86-0.96) with CNVs measured by digital polymerase chain reaction for MET (7q31), EGFR (7p12), and ERBB2 (17q12) in 13 tumor cell lines. In the genome-wide CNV analysis, RefCNV estimates showed good overall agreement (Spearman’s coefficient = 0.82) with publicly available CNV data from the Cancer Cell Line Encyclopedia. RefCNV also outperformed three other CNV estimation methods in the genome-wide analysis.
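The abstract describes the approach only at a high level: per-exon coverage in a test sample is compared with the coverage distribution estimated from a reference set, and a threshold is chosen to control the false-positive rate. The following is a minimal sketch of that general idea, not the authors' RefCNV implementation. The median scaling, the pseudocount, the assumption that log2 coverage is roughly normal across reference samples, the diploid copy-number conversion, and the function names `log_coverage`, `reference_model`, and `call_exons` are all hypothetical choices made for illustration.

```python
import math
import statistics
from statistics import NormalDist
from typing import List, Tuple


def log_coverage(counts: List[float], pseudo: float = 0.5) -> List[float]:
    """log2 of per-exon read counts scaled by the sample's median exon count.

    Median scaling (an assumption, not necessarily RefCNV's normalization) is
    robust to a few amplified exons; the pseudocount keeps log2 finite when an
    exon has zero reads."""
    scale = statistics.median(counts)
    if scale <= 0:
        raise ValueError("median exon count must be positive")
    return [math.log2((c + pseudo) / scale) for c in counts]


def reference_model(reference: List[List[float]]) -> List[Tuple[float, float]]:
    """Per-exon (mean, standard deviation) of log2 coverage across the normal
    reference samples; at least two reference samples are required."""
    logged = [log_coverage(r) for r in reference]
    model = []
    for i in range(len(logged[0])):
        values = [sample[i] for sample in logged]
        model.append((statistics.mean(values), statistics.stdev(values)))
    return model


def call_exons(sample: List[float], reference: List[List[float]],
               alpha: float = 0.01) -> List[Tuple[float, float, str]]:
    """Compare each exon's observed coverage with the reference distribution.

    Returns (log2 ratio, estimated copy number, call) per exon. The two-sided
    z cutoff targets a per-exon false-positive rate of about alpha, assuming
    log2 coverage is approximately normal across the reference set."""
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    results = []
    for x, (mu, sd) in zip(log_coverage(sample), reference_model(reference)):
        ratio = x - mu               # log2(observed / expected normal coverage)
        copies = 2 * 2 ** ratio      # assumes a diploid reference
        if sd == 0:
            call = "no-call"         # no reference variability to calibrate against
        elif ratio / sd > cutoff:
            call = "gain"
        elif ratio / sd < -cutoff:
            call = "loss"
        else:
            call = "neutral"
        results.append((ratio, copies, call))
    return results


if __name__ == "__main__":
    # Toy read counts for four exons in five normal reference samples.
    reference = [
        [100, 200, 150, 50],
        [110, 190, 160, 55],
        [95, 210, 140, 48],
        [105, 205, 155, 52],
        [102, 198, 148, 51],
    ]
    tumor = [100, 800, 150, 50]  # second exon carries roughly 4x the reads
    for i, (ratio, copies, call) in enumerate(call_exons(tumor, reference)):
        print(f"exon {i}: log2 ratio {ratio:+.2f}, ~{copies:.1f} copies, {call}")
```

On this toy data, the second exon should be reported as a gain with a log2 ratio of about +2 (roughly 8 copies), and the other exons should be reported as neutral. Real use would need many normal samples processed with the same pipeline, because the abstract notes that processing steps affect the coverage distribution. Also, a per-exon rate of alpha still implies roughly alpha times the number of exons false calls per sample.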

Original language: English (US)
Pages (from-to): 65-71
Number of pages: 7
Journal: Cancer Informatics
Volume: 15
DOIs: 10.4137/CIN.S36612
State: Published - Apr 27 2016
Externally published: Yes

Fingerprint

Exome
Gene Dosage
Exons
DNA Copy Number Variations
Genome
Encyclopedias
Tumor Cell Line
DNA Sequence Analysis
Nucleotides
Technology
Cell Line
Polymerase Chain Reaction
Mutation
Neoplasms

Keywords

  • Copy number variation
  • Methodology
  • Next-generation sequencing
  • Whole exome sequencing

ASJC Scopus subject areas

  • Oncology
  • Cancer Research

Cite this

RefCNV: Identification of gene-based copy number variants using whole exome sequencing. / Chang, Lun Ching; Das, Biswajit; Lih, Chih Jian; Si, Han; Camalier, Corinne E.; McGregor, Paul M.; Polley, Eric.

In: Cancer Informatics, Vol. 15, 27.04.2016, p. 65-71.


Chang, Lun Ching; Das, Biswajit; Lih, Chih Jian; Si, Han; Camalier, Corinne E.; McGregor, Paul M.; Polley, Eric. / RefCNV: Identification of gene-based copy number variants using whole exome sequencing. In: Cancer Informatics. 2016; Vol. 15. pp. 65-71.
@article{92a80b753020478282945a1980bd41cd,
title = "RefCNV: Identification of gene-based copy number variants using whole exome sequencing",
keywords = "Copy number variation, Methodology, Next-generation sequencing, Whole exome sequencing",
author = "Chang, {Lun Ching} and Biswajit Das and Lih, {Chih Jian} and Han Si and Camalier, {Corinne E.} and McGregor, {Paul M.} and Eric Polley",
year = "2016",
month = "4",
day = "27",
doi = "10.4137/CIN.S36612",
language = "English (US)",
volume = "15",
pages = "65--71",
journal = "Cancer Informatics",
issn = "1176-9351",
publisher = "Libertas Academica Ltd.",

}

TY  - JOUR
T1  - RefCNV: Identification of gene-based copy number variants using whole exome sequencing
AU  - Chang, Lun Ching
AU  - Das, Biswajit
AU  - Lih, Chih Jian
AU  - Si, Han
AU  - Camalier, Corinne E.
AU  - McGregor, Paul M.
AU  - Polley, Eric
PY  - 2016/4/27
Y1  - 2016/4/27
KW  - Copy number variation
KW  - Methodology
KW  - Next-generation sequencing
KW  - Whole exome sequencing
UR  - http://www.scopus.com/inward/record.url?scp=84964812718&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84964812718&partnerID=8YFLogxK
U2  - 10.4137/CIN.S36612
DO  - 10.4137/CIN.S36612
M3  - Article
AN  - SCOPUS:84964812718
VL  - 15
SP  - 65
EP  - 71
JO  - Cancer Informatics
JF  - Cancer Informatics
SN  - 1176-9351
ER  -