CNV-RF Is a Random Forest–Based Copy Number Variation Detection Method Using Next-Generation Sequencing

Getiria Onsongo, Linda Baughn, Matthew Bower, Christine Henzler, Matthew Schomaker, Kevin A.T. Silverstein, Bharat Thyagarajan

Research output: Contribution to journal (Article)

8 Citations (Scopus)

Abstract

Simultaneous detection of small copy number variations (CNVs) (<0.5 kb) and single-nucleotide variants in clinically significant genes is of great interest for clinical laboratories. Analytical variability in next-generation sequencing (NGS), coverage artifacts caused by mappability issues, and a lack of robust bioinformatics tools for CNV detection have limited the utility of targeted NGS data for identifying CNVs. We describe the development and implementation of a bioinformatics algorithm, copy number variation–random forest (CNV-RF), that incorporates a machine learning component to identify CNVs from targeted NGS data. Using CNV-RF, we identified 12 of 13 deletions in samples with known CNVs, two cases with duplications, and novel deletions in 22 additional cases. Furthermore, no CNVs were identified among 60 genes in 14 cases with normal copy number, and no CNVs were identified in another 104 patients with clinical suspicion of CNVs. All positive deletions and duplications were confirmed using a quantitative PCR method. CNV-RF also detected heterozygous deletions and duplications with a specificity of 50% across 4813 genes. The ability of CNV-RF to detect clinically relevant CNVs with a high degree of sensitivity, together with confirmation by a low-cost quantitative PCR method, provides a framework for comprehensive NGS-based CNV/single-nucleotide variant detection in a clinical molecular diagnostics laboratory.
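As a worked check on the counts reported in the abstract, the following minimal sketch computes the detection rate for the known deletions (12 of 13). The function name `detection_rate` and the choice to treat the 13 known deletions as the full set of positives are assumptions made for illustration; the abstract does not state how the authors computed sensitivity, and the reported 50% specificity cannot be reproduced from the abstract alone.

```python
def detection_rate(detected: int, total: int) -> float:
    """Return the percentage of known events that were detected.

    Hypothetical helper for illustration; not part of CNV-RF.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= detected <= total:
        raise ValueError("detected must be between 0 and total")
    return 100.0 * detected / total


# Counts taken from the abstract: 12 of 13 known deletions were identified.
known_deletions = 13
detected_deletions = 12

rate = detection_rate(detected_deletions, known_deletions)
print(f"Known deletions detected: {rate:.1f}%")  # prints "Known deletions detected: 92.3%"
```

Under these assumptions, the single missed deletion corresponds to a detection rate of roughly 92% on the known-positive samples.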

Original language: English (US)
Pages (from-to): 872-881
Number of pages: 10
Journal: Journal of Molecular Diagnostics
Volume: 18
Issue number: 6
DOIs: 10.1016/j.jmoldx.2016.07.001
State: Published - Nov 1 2016
Externally published: Yes

Fingerprint

Computational Biology
Nucleotides
Genes
Polymerase Chain Reaction
Molecular Pathology
Artifacts
Costs and Cost Analysis
Machine Learning
Forests

ASJC Scopus subject areas

  • Pathology and Forensic Medicine
  • Molecular Medicine

Cite this

CNV-RF Is a Random Forest–Based Copy Number Variation Detection Method Using Next-Generation Sequencing. / Onsongo, Getiria; Baughn, Linda; Bower, Matthew; Henzler, Christine; Schomaker, Matthew; Silverstein, Kevin A.T.; Thyagarajan, Bharat.

In: Journal of Molecular Diagnostics, Vol. 18, No. 6, 01.11.2016, p. 872-881.

Onsongo, Getiria ; Baughn, Linda ; Bower, Matthew ; Henzler, Christine ; Schomaker, Matthew ; Silverstein, Kevin A.T. ; Thyagarajan, Bharat. / CNV-RF Is a Random Forest–Based Copy Number Variation Detection Method Using Next-Generation Sequencing. In: Journal of Molecular Diagnostics. 2016 ; Vol. 18, No. 6. pp. 872-881.
@article{560b6057e37c4482b191950c1419df45,
title = "CNV-RF Is a Random Forest–Based Copy Number Variation Detection Method Using Next-Generation Sequencing",
abstract = "Simultaneous detection of small copy number variations (CNVs) (<0.5 kb) and single-nucleotide variants in clinically significant genes is of great interest for clinical laboratories. The analytical variability in next-generation sequencing (NGS) and artifacts in coverage data because of issues with mappability along with lack of robust bioinformatics tools for CNV detection have limited the utility of targeted NGS data to identify CNVs. We describe the development and implementation of a bioinformatics algorithm, copy number variation–random forest (CNV-RF), that incorporates a machine learning component to identify CNVs from targeted NGS data. Using CNV-RF, we identified 12 of 13 deletions in samples with known CNVs, two cases with duplications, and identified novel deletions in 22 additional cases. Furthermore, no CNVs were identified among 60 genes in 14 cases with normal copy number and no CNVs were identified in another 104 patients with clinical suspicion of CNVs. All positive deletions and duplications were confirmed using a quantitative PCR method. CNV-RF also detected heterozygous deletions and duplications with a specificity of 50{\%} across 4813 genes. The ability of CNV-RF to detect clinically relevant CNVs with a high degree of sensitivity along with confirmation using a low-cost quantitative PCR method provides a framework for providing comprehensive NGS-based CNV/single-nucleotide variant detection in a clinical molecular diagnostics laboratory.",
author = "Getiria Onsongo and Linda Baughn and Matthew Bower and Christine Henzler and Matthew Schomaker and Silverstein, {Kevin A.T.} and Bharat Thyagarajan",
year = "2016",
month = "11",
day = "1",
doi = "10.1016/j.jmoldx.2016.07.001",
language = "English (US)",
volume = "18",
pages = "872--881",
journal = "Journal of Molecular Diagnostics",
issn = "1525-1578",
publisher = "Association for Molecular Pathology",
number = "6",

}

TY - JOUR

T1 - CNV-RF Is a Random Forest–Based Copy Number Variation Detection Method Using Next-Generation Sequencing

AU - Onsongo, Getiria

AU - Baughn, Linda

AU - Bower, Matthew

AU - Henzler, Christine

AU - Schomaker, Matthew

AU - Silverstein, Kevin A.T.

AU - Thyagarajan, Bharat

PY - 2016/11/1

Y1 - 2016/11/1

N2 - Simultaneous detection of small copy number variations (CNVs) (<0.5 kb) and single-nucleotide variants in clinically significant genes is of great interest for clinical laboratories. The analytical variability in next-generation sequencing (NGS) and artifacts in coverage data because of issues with mappability along with lack of robust bioinformatics tools for CNV detection have limited the utility of targeted NGS data to identify CNVs. We describe the development and implementation of a bioinformatics algorithm, copy number variation–random forest (CNV-RF), that incorporates a machine learning component to identify CNVs from targeted NGS data. Using CNV-RF, we identified 12 of 13 deletions in samples with known CNVs, two cases with duplications, and identified novel deletions in 22 additional cases. Furthermore, no CNVs were identified among 60 genes in 14 cases with normal copy number and no CNVs were identified in another 104 patients with clinical suspicion of CNVs. All positive deletions and duplications were confirmed using a quantitative PCR method. CNV-RF also detected heterozygous deletions and duplications with a specificity of 50% across 4813 genes. The ability of CNV-RF to detect clinically relevant CNVs with a high degree of sensitivity along with confirmation using a low-cost quantitative PCR method provides a framework for providing comprehensive NGS-based CNV/single-nucleotide variant detection in a clinical molecular diagnostics laboratory.

UR - http://www.scopus.com/inward/record.url?scp=84992517622&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84992517622&partnerID=8YFLogxK

U2 - 10.1016/j.jmoldx.2016.07.001

DO - 10.1016/j.jmoldx.2016.07.001

M3 - Article

C2 - 27597741

AN - SCOPUS:84992517622

VL - 18

SP - 872

EP - 881

JO - Journal of Molecular Diagnostics

JF - Journal of Molecular Diagnostics

SN - 1525-1578

IS - 6

ER -