Hybrid-denovo

A de novo OTU-picking pipeline integrating single-end and paired-end 16S sequence tags

Xianfeng Chen, Stephen Johnson, Patricio Jeraldo, Junwen Wang, Nicholas D Chia, Jean-Pierre Kocher, Jun Chen

Research output: Contribution to journal (Comment/debate)

7 Citations (Scopus)

Abstract

Background: Illumina paired-end sequencing has become increasingly popular for 16S rRNA gene-based microbiota profiling. It provides higher phylogenetic resolution than single-end sequencing because of the longer read length. However, the reverse read (R2) often has significantly lower base quality, and a large proportion of R2s are discarded during quality control, leaving a mixture of paired-end and single-end reads. A typical 16S analysis pipeline processes either paired-end or single-end reads, but not a mixture. Consequently, a paired-end approach discards a large number of reads, which reduces quantification accuracy and statistical power and may leave rare taxa undetected, whereas a single-end approach yields low taxonomic resolution.

Results: To combine the higher phylogenetic resolution of paired-end reads with the higher sequence coverage of single-end reads, we propose a novel OTU-picking pipeline, hybrid-denovo, that can process a mixture of single-end and paired-end reads. Using high-quality paired-end reads as a gold standard, we show that hybrid-denovo achieved the highest correlation with the gold standard and outperformed approaches based on paired-end or single-end reads alone in quantifying microbial diversity and taxonomic abundances. Applying our method to a rheumatoid arthritis (RA) data set, we demonstrated that hybrid-denovo captured more microbial diversity and identified more RA-associated taxa than a paired-end or single-end approach.

Conclusions: Hybrid-denovo utilizes both paired-end and single-end 16S sequencing reads and is recommended for paired-end sequencing data targeting the 16S rRNA gene.
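As a rough illustration of how quality control produces the mixture of read types described in the Background, the sketch below partitions read pairs by mean Phred quality. The mean-quality rule, the threshold of 20, the function names, and the in-memory tuple representation are all assumptions made for illustration; they are not taken from hybrid-denovo itself.

```python
from statistics import mean


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def passes_qc(quality_string, min_mean_quality=20):
    """Hypothetical QC rule: keep a read if its mean Phred score meets the threshold."""
    return mean(phred_scores(quality_string)) >= min_mean_quality


def partition_read_pairs(pairs, min_mean_quality=20):
    """Split (read_id, r1_quality, r2_quality) tuples into three groups.

    paired  : both R1 and R2 pass QC
    single  : R1 passes but R2 fails, so only the forward read is usable
    dropped : R1 fails, so the pair is discarded entirely
    """
    paired, single, dropped = [], [], []
    for read_id, r1_qual, r2_qual in pairs:
        if not passes_qc(r1_qual, min_mean_quality):
            dropped.append(read_id)
        elif passes_qc(r2_qual, min_mean_quality):
            paired.append(read_id)
        else:
            single.append(read_id)
    return paired, single, dropped


# Toy input: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
pairs = [
    ("read1", "IIIIIIII", "IIIIIIII"),  # both reads high quality
    ("read2", "IIIIIIII", "########"),  # reverse read low quality
    ("read3", "########", "IIIIIIII"),  # forward read low quality
]
paired, single, dropped = partition_read_pairs(pairs)
print(paired, single, dropped)
```

Under these assumptions, a pipeline restricted to paired-end reads would see only the `paired` group, while one that also accepts the `single` group retains the forward reads whose R2 failed quality control.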

Original language: English (US)
Pages (from-to): 1-7
Number of pages: 7
Journal: GigaScience
Volume: 7
Issue number: 3
DOI: 10.1093/gigascience/gix129
State: Published - Mar 1 2018

Fingerprint

Pipelines
Genes
Quality control
rRNA Genes
Rheumatoid Arthritis
Microbiota

Keywords

  • 16S rRNA
  • Microbiome
  • OTU picking

ASJC Scopus subject areas

  • Health Informatics
  • Computer Science Applications

Cite this

Hybrid-denovo: A de novo OTU-picking pipeline integrating single-end and paired-end 16S sequence tags. / Chen, Xianfeng; Johnson, Stephen; Jeraldo, Patricio; Wang, Junwen; Chia, Nicholas D; Kocher, Jean-Pierre; Chen, Jun.

In: GigaScience, Vol. 7, No. 3, 01.03.2018, p. 1-7.

Chen, Xianfeng; Johnson, Stephen; Jeraldo, Patricio; Wang, Junwen; Chia, Nicholas D; Kocher, Jean-Pierre; Chen, Jun. / Hybrid-denovo: A de novo OTU-picking pipeline integrating single-end and paired-end 16S sequence tags. In: GigaScience. 2018; Vol. 7, No. 3. pp. 1-7.
@article{0a999ed1d68c4e61abbf33fdb0279c9f,
title = "Hybrid-denovo: A de novo OTU-picking pipeline integrating single-end and paired-end 16S sequence tags",
keywords = "16S rRNA, Microbiome, OTU picking",
author = "Xianfeng Chen and Stephen Johnson and Patricio Jeraldo and Junwen Wang and Chia, {Nicholas D} and Jean-Pierre Kocher and Jun Chen",
year = "2018",
month = "3",
day = "1",
doi = "10.1093/gigascience/gix129",
language = "English (US)",
volume = "7",
pages = "1--7",
journal = "GigaScience",
issn = "2047-217X",
publisher = "BioMed Central",
number = "3",

}

TY - JOUR

T1 - Hybrid-denovo

T2 - A de novo OTU-picking pipeline integrating single-end and paired-end 16S sequence tags

AU - Chen, Xianfeng

AU - Johnson, Stephen

AU - Jeraldo, Patricio

AU - Wang, Junwen

AU - Chia, Nicholas D

AU - Kocher, Jean-Pierre

AU - Chen, Jun

PY - 2018/3/1

Y1 - 2018/3/1

KW - 16S rRNA

KW - Microbiome

KW - OTU picking

UR - http://www.scopus.com/inward/record.url?scp=85045207122&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045207122&partnerID=8YFLogxK

U2 - 10.1093/gigascience/gix129

DO - 10.1093/gigascience/gix129

M3 - Comment/debate

VL - 7

SP - 1

EP - 7

JO - GigaScience

JF - GigaScience

SN - 2047-217X

IS - 3

ER -