Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines

MC3 Working Group, The Cancer Genome Atlas Research Network

Research output: Contribution to journal (article)

Abstract

The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, totaling more than 400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers (MC3) project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for the PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration among a number of institutes and demonstrates how team science drives extremely large genomics projects.

MC3 is a variant-calling project covering over 10,000 cancer exome samples from 33 cancer types. Over three million somatic variants were detected using seven different methods developed at institutions across the United States. These variants formed the basis for the PanCan Atlas papers.

Original language: English (US)
Pages (from-to): 271-281.e7
Journal: Cell Systems
Volume: 6
Issue number: 3
DOI: https://doi.org/10.1016/j.cels.2018.03.002
State: Published, Mar 28 2018

Fingerprint

Exome
Mutation
Atlases
Neoplasms
Genomics
Genome
Encyclopedias
Information Storage and Retrieval
Practice Guidelines
Artifacts

Keywords

  • large-scale
  • open science
  • pan-cancer
  • PanCanAtlas project
  • reproducible computing
  • somatic mutation calling
  • TCGA

ASJC Scopus subject areas

  • Pathology and Forensic Medicine
  • Histology
  • Cell Biology

Cite this


MC3 Working Group & The Cancer Genome Atlas Research Network 2018, 'Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines', Cell Systems, vol. 6, no. 3, pp. 271-281.e7. https://doi.org/10.1016/j.cels.2018.03.002
@article{fee32375d2134133bf6827b49168f108,
title = "Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines",
keywords = "large-scale, open science, pan-cancer, PanCanAtlas project, reproducible computing, somatic mutation calling, TCGA",
author = "{MC3 Working Group} and {The Cancer Genome Atlas Research Network} and Kyle Ellrott and Bailey, {Matthew H.} and Gordon Saksena and Covington, {Kyle R.} and Cyriac Kandoth and Chip Stewart and Julian Hess and Singer Ma and Chiotti, {Kami E.} and Michael McLellan and Sofia, {Heidi J.} and Carolyn Hutter and Gad Getz and David Wheeler and Li Ding and Caesar-Johnson, {Samantha J.} and Demchok, {John A.} and Ina Felau and Melpomeni Kasapi and Ferguson, {Martin L.} and Hutter, {Carolyn M.} and Sofia, {Heidi J.} and Roy Tarnuzzer and Zhining Wang and Liming Yang and Zenklusen, {Jean C.} and Zhang, {Jiashan (Julia)} and Sudha Chudamani and Jia Liu and Laxmi Lolla and Rashi Naresh and Todd Pihl and Qiang Sun and Yunhu Wan and Ye Wu and Juok Cho and Borad, {Mitesh J} and Vishal Chandan and John Cheville and Copland, {John A III} and Flotte, {Thomas J} and Michael Kendrick and Jean-Pierre Kocher and O'Neill, {Brian Patrick} and Patel, {Tushar C} and Petersen, {Gloria M} and Roberts, {Lewis Rowland} and Smallridge, {Robert Christian} and Melissa Stanton and Lizhi Zhang",
year = "2018",
month = "3",
day = "28",
doi = "10.1016/j.cels.2018.03.002",
language = "English (US)",
volume = "6",
pages = "271--281.e7",
journal = "Cell Systems",
issn = "2405-4712",
publisher = "Cell Press",
number = "3",

}
