Comprehensive, integrated, and phased whole-genome analysis of the primary ENCODE cell line K562

Bo Zhou, Steve S. Ho, Stephanie U. Greer, Xiaowei Zhu, John M. Bell, Joseph G. Arthur, Noah Spies, Xianglong Zhang, Seunggyu Byeon, Reenal Pattni, Noa Ben-Efraim, Michael S. Haney, Rajini R. Haraksingh, Giltae Song, Hanlee P. Ji, Dimitri Perrin, Wing H. Wong, Alexej Abyzov, Alexander E. Urban

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and also most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high-resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, structural variants (SVs), including small and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.
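The phrase "taking aneuploidy into account" can be illustrated with a small sketch. It is not the authors' pipeline: it only assumes that, at a heterozygous site, the expected fraction of reads carrying the alternate allele equals the fraction of chromosome copies carrying it, and it tests the observed read counts against that fraction with an exact binomial test. The function names and the example counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


def expected_alt_fraction(alt_copies: int, total_copies: int) -> float:
    """Null expectation for the alternate-allele read fraction at a
    heterozygous site (assumption: equal expression per DNA copy)."""
    if not 0 < alt_copies < total_copies:
        raise ValueError("site must be heterozygous: 0 < alt_copies < total_copies")
    return alt_copies / total_copies


def ase_p_value(alt_reads: int, ref_reads: int,
                alt_copies: int, total_copies: int) -> float:
    """P-value for allele-specific expression given the local copy number."""
    p0 = expected_alt_fraction(alt_copies, total_copies)
    return binomial_two_sided_p(alt_reads, alt_reads + ref_reads, p0)


if __name__ == "__main__":
    # Hypothetical counts: 20 alternate and 40 reference reads.
    # Naive diploid null (1 of 2 copies) versus a CN = 3 segment (1 of 3 copies).
    print("diploid null:", ase_p_value(20, 40, 1, 2))
    print("CN-aware null:", ase_p_value(20, 40, 1, 3))
```

With these hypothetical counts, the 1:2 read ratio looks like allelic imbalance under a diploid null but matches the expectation exactly when the segment has three copies and one carries the alternate allele, which is the kind of false call a copy-number-aware analysis is meant to avoid.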

Original language: English (US)
Pages (from-to): 472-484
Number of pages: 13
Journal: Genome Research
Volume: 29
Issue number: 3
DOI: https://doi.org/10.1101/gr.234948.118
State: Published - Mar 1 2019

Fingerprint

Genome
Cell Line
Aneuploidy
Clustered Regularly Interspaced Short Palindromic Repeats
Alleles
Epigenomics
Genomics
Haplotypes
Genomic Structural Variation
Chromosomes
Karyotyping
Retroelements
Loss of Heterozygosity
DNA Methylation
Tumor Suppressor Genes
Biomedical Research
RNA
Neoplasms

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Zhou, B., Ho, S. S., Greer, S. U., Zhu, X., Bell, J. M., Arthur, J. G., ... Urban, A. E. (2019). Comprehensive, integrated, and phased whole-genome analysis of the primary ENCODE cell line K562. Genome Research, 29(3), 472-484. https://doi.org/10.1101/gr.234948.118

@article{ba6c4a96e31247d5a6ee86dd5cdc9b2b,
title = "Comprehensive, integrated, and phased whole-genome analysis of the primary ENCODE cell line K562",
author = "Bo Zhou and Ho, {Steve S.} and Greer, {Stephanie U.} and Xiaowei Zhu and Bell, {John M.} and Arthur, {Joseph G.} and Noah Spies and Xianglong Zhang and Seunggyu Byeon and Reenal Pattni and Noa Ben-Efraim and Haney, {Michael S.} and Haraksingh, {Rajini R.} and Giltae Song and Ji, {Hanlee P.} and Dimitri Perrin and Wong, {Wing H.} and Alexej Abyzov and Urban, {Alexander E.}",
year = "2019",
month = "3",
day = "1",
doi = "10.1101/gr.234948.118",
language = "English (US)",
volume = "29",
pages = "472--484",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "3",

}
