Performance benchmarking of GATK3.8 and GATK4

Jacob R. Heldenbrand, Saurabh Baheti, Matthew A. Bockol, Travis M. Drucker, Steven N. Hart, Matthew E. Hudson, Ravishankar K. Iyer, Michael T. Kalmbach, Eric W. Klee, Eric D Wieben, Mathieu Wiepert, Derek E. Wildman, Liudmila S. Mainzer

Research output: Contribution to journalArticlepeer-review

Abstract

Use of the Genome Analysis Toolkit (GATK) continues to be the standard practice in genomic variant calling in both research and the clinic. Recently the toolkit has been rapidly evolving. Significant computational performance improvements have been introduced in GATK3.8 through collaboration with Intel in 2017. The first release of GATK4 in early 2018 revealed significant rewrites in the code base, as the stepping stone toward a Spark implementation. As the software continues to be a moving target for optimal deployment in highly productive environments, we present a detailed analysis of these improvements, to help the community stay abreast with changes in performance. We re-evaluated the options previously identified as advantageous, such as threading, parallel garbage collection, I/O options and data-level parallelization. Based on our results, we consider the performance and cost trade-offs of using GATK3.8 and GATK4 for different types of analyses.

Original languageEnglish (US)
JournalUnknown Journal
DOIs
StatePublished - Jun 18 2018

Keywords

  • Benchmarking
  • GATK
  • Variant calling

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)
  • Immunology and Microbiology(all)
  • Neuroscience(all)
  • Pharmacology, Toxicology and Pharmaceutics(all)

Fingerprint Dive into the research topics of 'Performance benchmarking of GATK3.8 and GATK4'. Together they form a unique fingerprint.

Cite this