Coverage profile correction of shallow-depth circulating cell-free DNA sequencing via multidistance learning

Nicholas B. Larson, Melissa C. Larson, Jie Na, Carlos P. Sosa, Chen Wang, Jean Pierre Kocher, Ross Rowsey

Research output: Contribution to journal › Article

Abstract

Shallow-depth whole-genome sequencing (WGS) of circulating cell-free DNA (ccfDNA) is a popular approach for non-invasive genomic screening assays, including liquid biopsy for early detection of invasive tumors as well as non-invasive prenatal screening (NIPS) for common fetal trisomies. In contrast to nuclear DNA WGS, ccfDNA WGS exhibits extensive inter- and intra-sample coverage variability that is not fully explained by typical sources of variation in WGS, such as GC content. This variability may inflate false-positive and false-negative screening rates for copy-number alterations and aneuploidy, particularly when these features are present at a relatively low proportion of total sequenced content. Herein, we propose an empirically driven coverage correction strategy that leverages prior annotation information in a multi-distance learning context to improve within-sample coverage profile correction. Specifically, we train a weighted k-nearest neighbors-style method on ccfDNA WGS samples from non-pregnant female donors and apply it to NIPS samples to evaluate coverage profile variability reduction. We additionally characterize improvement in the discrimination of positive fetal trisomy cases relative to normal controls, and compare our results against a more traditional regression-based approach to coverage profile correction based on GC content and mappability. Under cross-validation, performance measures indicated a benefit to combining the two feature sets relative to using either in isolation. We also observed substantial improvement in coverage profile variability reduction in held-out clinical NIPS samples, with variability reduced by 26.5-53.5% relative to the standard regression-based method, as quantified by median absolute deviation. Finally, we observed improved discrimination of screening-positive trisomy cases, with the method reducing ccfDNA WGS coverage variability while also improving NIPS trisomy screening assay performance.
Overall, our results indicate that machine learning approaches can substantially improve ccfDNA WGS coverage profile correction and downstream analyses.
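The abstract does not spell out the algorithm, so the following is only a rough illustration, not the authors' method. It is a minimal Python sketch, assuming a within-sample reference-bin design: each bin's expected coverage is estimated as an inverse-distance-weighted average of its k nearest bins in the same sample, where "nearest" uses a hypothetical two-part distance that mixes annotation features (e.g. GC content, mappability) with similarity of coverage across reference samples. The names `knn_correct`, `combined_distance`, and `mad`, the mixing weight `alpha`, the pre-standardized features, and the rule that neighbors come only from other chromosomes are all assumptions introduced here for illustration.

```python
import math
import statistics


def mad(values):
    """Unscaled median absolute deviation: median(|x - median(x)|)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_distance(i, j, annot, ref_profiles, alpha):
    """Hypothetical two-part distance between bins i and j.

    annot[b] holds annotation features for bin b (assumed already
    standardized); ref_profiles is a list of normalized coverage profiles
    from reference samples, one value per bin.
    """
    d_annot = euclidean(annot[i], annot[j])
    d_ref = euclidean([p[i] for p in ref_profiles], [p[j] for p in ref_profiles])
    return alpha * d_annot + (1.0 - alpha) * d_ref


def knn_correct(sample, chroms, annot, ref_profiles, k=3, alpha=0.5, eps=1e-9):
    """Divide each bin's coverage by a within-sample expected coverage.

    The expected coverage is an inverse-distance-weighted mean of the k
    nearest bins on *other* chromosomes, so a whole-chromosome gain cannot
    normalize itself away.
    """
    n = len(sample)
    corrected = []
    for i in range(n):
        candidates = [
            (combined_distance(i, j, annot, ref_profiles, alpha), j)
            for j in range(n)
            if chroms[j] != chroms[i]
        ]
        nearest = sorted(candidates)[:k]
        if not nearest:
            raise ValueError("need bins on at least two chromosomes")
        weights = [1.0 / (d + eps) for d, _ in nearest]
        expected = sum(w * sample[j] for w, (_, j) in zip(weights, nearest)) / sum(weights)
        corrected.append(sample[i] / expected)
    return corrected
```

On a toy four-bin input with chromosomes `["1", "1", "2", "2"]` where the sample's chromosome "2" bins sit at 1.5x, `knn_correct(..., k=1)` returns approximately `[0.667, 0.667, 1.5, 1.5]`: bin-level bias shared with the matched bins cancels, while the relative chromosome-level gain is preserved. A relative variability reduction in the spirit of the abstract's comparison could then be expressed as `1 - mad(new_profile) / mad(baseline_profile)`, with the baseline coming from a GC/mappability regression.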

Original language: English (US)
Pages (from-to): 599-610
Number of pages: 12
Journal: Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
Volume: 25
State: Published - Jan 1 2020

ASJC Scopus subject areas

  • Biomedical Engineering
  • Computational Theory and Mathematics

Cite this

Coverage profile correction of shallow-depth circulating cell-free DNA sequencing via multidistance learning. / Larson, Nicholas B.; Larson, Melissa C.; Na, Jie; Sosa, Carlos P.; Wang, Chen; Kocher, Jean Pierre; Rowsey, Ross.

In: Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, Vol. 25, 01.01.2020, p. 599-610.

Larson, Nicholas B.; Larson, Melissa C.; Na, Jie; Sosa, Carlos P.; Wang, Chen; Kocher, Jean Pierre; Rowsey, Ross. / Coverage profile correction of shallow-depth circulating cell-free DNA sequencing via multidistance learning. In: Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing. 2020; Vol. 25. pp. 599-610.

@article{d1970a841b7f49cb87d2dda96094de95,
title = "Coverage profile correction of shallow-depth circulating cell-free DNA sequencing via multidistance learning",
abstract = "Shallow-depth whole-genome sequencing (WGS) of circulating cell-free DNA (ccfDNA) is a popular approach for non-invasive genomic screening assays, including liquid biopsy for early detection of invasive tumors as well as non-invasive prenatal screening (NIPS) for common fetal trisomies. In contrast to nuclear DNA WGS, ccfDNA WGS exhibits extensive inter- and intra-sample coverage variability that is not fully explained by typical sources of variation in WGS, such as GC content. This variability may inflate false-positive and false-negative screening rates for copy-number alterations and aneuploidy, particularly when these features are present at a relatively low proportion of total sequenced content. Herein, we propose an empirically driven coverage correction strategy that leverages prior annotation information in a multi-distance learning context to improve within-sample coverage profile correction. Specifically, we train a weighted k-nearest neighbors-style method on ccfDNA WGS samples from non-pregnant female donors and apply it to NIPS samples to evaluate coverage profile variability reduction. We additionally characterize improvement in the discrimination of positive fetal trisomy cases relative to normal controls, and compare our results against a more traditional regression-based approach to coverage profile correction based on GC content and mappability. Under cross-validation, performance measures indicated a benefit to combining the two feature sets relative to using either in isolation. We also observed substantial improvement in coverage profile variability reduction in held-out clinical NIPS samples, with variability reduced by 26.5-53.5{\%} relative to the standard regression-based method, as quantified by median absolute deviation. Finally, we observed improved discrimination of screening-positive trisomy cases, with the method reducing ccfDNA WGS coverage variability while also improving NIPS trisomy screening assay performance.
Overall, our results indicate that machine learning approaches can substantially improve ccfDNA WGS coverage profile correction and downstream analyses.",
author = "Larson, {Nicholas B.} and Larson, {Melissa C.} and Jie Na and Sosa, {Carlos P.} and Chen Wang and Kocher, {Jean Pierre} and Ross Rowsey",
year = "2020",
month = "1",
day = "1",
language = "English (US)",
volume = "25",
pages = "599--610",
journal = "Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing",
issn = "2335-6936",

}

TY  - JOUR

T1  - Coverage profile correction of shallow-depth circulating cell-free DNA sequencing via multidistance learning

AU  - Larson, Nicholas B.

AU  - Larson, Melissa C.

AU  - Na, Jie

AU  - Sosa, Carlos P.

AU  - Wang, Chen

AU  - Kocher, Jean Pierre

AU  - Rowsey, Ross

PY  - 2020/1/1

Y1  - 2020/1/1

N2  - Shallow-depth whole-genome sequencing (WGS) of circulating cell-free DNA (ccfDNA) is a popular approach for non-invasive genomic screening assays, including liquid biopsy for early detection of invasive tumors as well as non-invasive prenatal screening (NIPS) for common fetal trisomies. In contrast to nuclear DNA WGS, ccfDNA WGS exhibits extensive inter- and intra-sample coverage variability that is not fully explained by typical sources of variation in WGS, such as GC content. This variability may inflate false-positive and false-negative screening rates for copy-number alterations and aneuploidy, particularly when these features are present at a relatively low proportion of total sequenced content. Herein, we propose an empirically driven coverage correction strategy that leverages prior annotation information in a multi-distance learning context to improve within-sample coverage profile correction. Specifically, we train a weighted k-nearest neighbors-style method on ccfDNA WGS samples from non-pregnant female donors and apply it to NIPS samples to evaluate coverage profile variability reduction. We additionally characterize improvement in the discrimination of positive fetal trisomy cases relative to normal controls, and compare our results against a more traditional regression-based approach to coverage profile correction based on GC content and mappability. Under cross-validation, performance measures indicated a benefit to combining the two feature sets relative to using either in isolation. We also observed substantial improvement in coverage profile variability reduction in held-out clinical NIPS samples, with variability reduced by 26.5-53.5% relative to the standard regression-based method, as quantified by median absolute deviation. Finally, we observed improved discrimination of screening-positive trisomy cases, with the method reducing ccfDNA WGS coverage variability while also improving NIPS trisomy screening assay performance.
Overall, our results indicate that machine learning approaches can substantially improve ccfDNA WGS coverage profile correction and downstream analyses.

AB  - Shallow-depth whole-genome sequencing (WGS) of circulating cell-free DNA (ccfDNA) is a popular approach for non-invasive genomic screening assays, including liquid biopsy for early detection of invasive tumors as well as non-invasive prenatal screening (NIPS) for common fetal trisomies. In contrast to nuclear DNA WGS, ccfDNA WGS exhibits extensive inter- and intra-sample coverage variability that is not fully explained by typical sources of variation in WGS, such as GC content. This variability may inflate false-positive and false-negative screening rates for copy-number alterations and aneuploidy, particularly when these features are present at a relatively low proportion of total sequenced content. Herein, we propose an empirically driven coverage correction strategy that leverages prior annotation information in a multi-distance learning context to improve within-sample coverage profile correction. Specifically, we train a weighted k-nearest neighbors-style method on ccfDNA WGS samples from non-pregnant female donors and apply it to NIPS samples to evaluate coverage profile variability reduction. We additionally characterize improvement in the discrimination of positive fetal trisomy cases relative to normal controls, and compare our results against a more traditional regression-based approach to coverage profile correction based on GC content and mappability. Under cross-validation, performance measures indicated a benefit to combining the two feature sets relative to using either in isolation. We also observed substantial improvement in coverage profile variability reduction in held-out clinical NIPS samples, with variability reduced by 26.5-53.5% relative to the standard regression-based method, as quantified by median absolute deviation. Finally, we observed improved discrimination of screening-positive trisomy cases, with the method reducing ccfDNA WGS coverage variability while also improving NIPS trisomy screening assay performance.
Overall, our results indicate that machine learning approaches can substantially improve ccfDNA WGS coverage profile correction and downstream analyses.

UR  - http://www.scopus.com/inward/record.url?scp=85076052181&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=85076052181&partnerID=8YFLogxK

M3  - Article

C2  - 31797631

AN  - SCOPUS:85076052181

VL  - 25

SP  - 599

EP  - 610

JO  - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

JF  - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

SN  - 2335-6936

ER  - 