Long-read sequencing across the C9orf72 'GGGGCC' repeat expansion: Implications for clinical use and genetic discovery efforts in human disease

Mark T.W. Ebbert, Stefan L. Farrugia, Jonathon P. Sens, Karen Jansen-West, Tania D Gendron, Mercedes Prudencio, Ian J. McLaughlin, Brett Bowman, Matthew Seetin, Mariely Dejesus-Hernandez, Jazmyne Jackson, Patricia H. Brown, Dennis W Dickson, Marka Van Blitterswijk, Rosa V Rademakers, Leonard Petrucelli, John D. Fryer

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

Background: Many neurodegenerative diseases are caused by nucleotide repeat expansions, but most expansions, like the C9orf72 'GGGGCC' (G4C2) repeat that causes approximately 5-7% of all amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) cases, are too long to sequence using short-read sequencing technologies. It is unclear whether long-read sequencing technologies can traverse these long, challenging repeat expansions. Here, we demonstrate that two long-read sequencing technologies, Pacific Biosciences' (PacBio) and Oxford Nanopore Technologies' (ONT), can sequence through disease-causing repeats cloned into plasmids, including the FTD/ALS-causing G4C2 repeat expansion. We also report the first long-read sequencing data characterizing the C9orf72 G4C2 repeat expansion at the nucleotide level in two symptomatic expansion carriers using PacBio whole-genome sequencing and a no-amplification (No-Amp) targeted approach based on CRISPR/Cas9.

Results: Both the PacBio and ONT platforms successfully sequenced through the repeat expansions in plasmids. Throughput on the MinION was a challenge for whole-genome sequencing; we were unable to attain reads covering the human C9orf72 repeat expansion using 15 flow cells. We obtained 8× coverage across the C9orf72 locus using the PacBio Sequel, accurately reporting the unexpanded allele at eight repeats, and reading through the entire expansion with 1324 repeats (7941 nucleotides). Using the No-Amp targeted approach, we attained >800× coverage and were able to identify the unexpanded allele, closely estimate expansion size, and assess nucleotide content in a single experiment. We estimate the individual's repeat region was >99% G4C2 content, though we cannot rule out small interruptions.

Conclusions: Our findings indicate that long-read sequencing is well suited to characterizing known repeat expansions, and for discovering new disease-causing, disease-modifying, or risk-modifying repeat expansions that have gone undetected with conventional short-read sequencing. The PacBio No-Amp targeted approach may have future potential in clinical and genetic counseling environments. Larger and deeper long-read sequencing studies in C9orf72 expansion carriers will be important to determine heterogeneity and whether the repeats are interrupted by non-G4C2 content, potentially mitigating or modifying disease course or age of onset, as interruptions are known to do in other repeat-expansion disorders. These results have broad implications across all diseases where the genetic etiology remains unclear.
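
The abstract describes sizing the expansion in repeat units and estimating what fraction of the repeat region is G4C2. As a rough illustration of that kind of summary, and not the authors' pipeline, the following minimal sketch counts exact, back-to-back copies of the unit in a sequence. The function name `summarize_repeat` and the toy sequences are hypothetical; real long reads contain sequencing errors, so exact matching like this would likely understate the true G4C2 content.

```python
import re

UNIT = "GGGGCC"  # the G4C2 repeat unit named in the abstract


def summarize_repeat(seq: str, unit: str = UNIT) -> dict:
    """Summarize how much of `seq` consists of exact tandem copies of `unit`.

    Hypothetical helper for illustration only; it does no alignment and
    tolerates no sequencing errors.
    """
    seq = seq.upper()
    # Each match is a maximal run of one or more back-to-back copies of the unit.
    runs = [m.group(0) for m in re.finditer(f"(?:{re.escape(unit)})+", seq)]
    units_per_run = [len(run) // len(unit) for run in runs]
    unit_count = sum(units_per_run)
    return {
        "length_nt": len(seq),
        "unit_count": unit_count,
        "longest_run_units": max(units_per_run, default=0),
        "fraction_unit": (unit_count * len(unit)) / len(seq) if seq else 0.0,
    }


# Toy inputs (made up): an uninterrupted 8-unit allele, and a 10-unit
# stretch in which one unit carries a single G>T substitution.
unexpanded = UNIT * 8
interrupted = UNIT * 5 + "GGTGCC" + UNIT * 4

print(summarize_repeat(unexpanded))
print(summarize_repeat(interrupted))
```

On the interrupted toy sequence, the single substitution splits the repeat into two runs and lowers the exact-match fraction, which is the kind of signal one would look for when asking whether an expansion is interrupted by non-G4C2 content.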

Original language: English (US)
Article number: 46
Journal: Molecular Neurodegeneration
Volume: 13
Issue number: 1
DOI: 10.1186/s13024-018-0274-4
State: Published - Aug 21 2018

Keywords

  • Amyotrophic lateral sclerosis (ALS)
  • C9orf72
  • Frontotemporal dementia (FTD)
  • Genetics
  • GGGGCC
  • Long-read sequencing
  • Oxford Nanopore Technologies MinION
  • PacBio RS II and Sequel
  • Repeat expansion disorders
  • Structural mutations

ASJC Scopus subject areas

  • Molecular Biology
  • Clinical Neurology
  • Cellular and Molecular Neuroscience

Cite this

Long-read sequencing across the C9orf72 'GGGGCC' repeat expansion: Implications for clinical use and genetic discovery efforts in human disease. / Ebbert, Mark T.W.; Farrugia, Stefan L.; Sens, Jonathon P.; Jansen-West, Karen; Gendron, Tania D; Prudencio, Mercedes; McLaughlin, Ian J.; Bowman, Brett; Seetin, Matthew; Dejesus-Hernandez, Mariely; Jackson, Jazmyne; Brown, Patricia H.; Dickson, Dennis W; Van Blitterswijk, Marka; Rademakers, Rosa V; Petrucelli, Leonard; Fryer, John D.

In: Molecular Neurodegeneration, Vol. 13, No. 1, 46, 21.08.2018.

@article{85f704bbfe64434cbd6d0211e6001552,
title = "Long-read sequencing across the C9orf72 'GGGGCC' repeat expansion: Implications for clinical use and genetic discovery efforts in human disease",
keywords = "Amyotrophic lateral sclerosis (ALS), C9orf72, Frontotemporal dementia (FTD), Genetics, GGGGCC, Long-read sequencing, Oxford Nanopore Technologies MinION, PacBio RS II and Sequel, Repeat expansion disorders, Structural mutations",
author = "Ebbert, {Mark T.W.} and Farrugia, {Stefan L.} and Sens, {Jonathon P.} and Karen Jansen-West and Gendron, {Tania D} and Mercedes Prudencio and McLaughlin, {Ian J.} and Brett Bowman and Matthew Seetin and Mariely Dejesus-Hernandez and Jazmyne Jackson and Brown, {Patricia H.} and Dickson, {Dennis W} and {Van Blitterswijk}, Marka and Rademakers, {Rosa V} and Leonard Petrucelli and Fryer, {John D.}",
year = "2018",
month = "8",
day = "21",
doi = "10.1186/s13024-018-0274-4",
language = "English (US)",
volume = "13",
journal = "Molecular Neurodegeneration",
issn = "1750-1326",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR
T1 - Long-read sequencing across the C9orf72 'GGGGCC' repeat expansion
T2 - Implications for clinical use and genetic discovery efforts in human disease
AU - Ebbert, Mark T.W.
AU - Farrugia, Stefan L.
AU - Sens, Jonathon P.
AU - Jansen-West, Karen
AU - Gendron, Tania D
AU - Prudencio, Mercedes
AU - McLaughlin, Ian J.
AU - Bowman, Brett
AU - Seetin, Matthew
AU - Dejesus-Hernandez, Mariely
AU - Jackson, Jazmyne
AU - Brown, Patricia H.
AU - Dickson, Dennis W
AU - Van Blitterswijk, Marka
AU - Rademakers, Rosa V
AU - Petrucelli, Leonard
AU - Fryer, John D.
PY - 2018/8/21
Y1 - 2018/8/21
KW - Amyotrophic lateral sclerosis (ALS)
KW - C9orf72
KW - Frontotemporal dementia (FTD)
KW - Genetics
KW - GGGGCC
KW - Long-read sequencing
KW - Oxford Nanopore Technologies MinION
KW - PacBio RS II and Sequel
KW - Repeat expansion disorders
KW - Structural mutations
UR - http://www.scopus.com/inward/record.url?scp=85052156507&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052156507&partnerID=8YFLogxK
U2 - 10.1186/s13024-018-0274-4
DO - 10.1186/s13024-018-0274-4
M3 - Article
C2 - 30126445
AN - SCOPUS:85052156507
VL - 13
JO - Molecular Neurodegeneration
JF - Molecular Neurodegeneration
SN - 1750-1326
IS - 1
M1 - 46
ER -