GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data

Li Chen, James Reeve, Lujun Zhang, Shengbing Huang, Xuefeng Wang, Jun Chen

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Normalization, which accounts for variable library sizes, is the first critical step in microbiome sequencing data analysis. Normalization methods developed for RNA-Seq and adapted to microbiome data fail to consider the unique characteristics of microbiome data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address zero-inflation remain largely undeveloped. Here we propose the geometric mean of pairwise ratios (GMPR), a simple but effective normalization method for zero-inflated sequencing data such as microbiome data. Simulation studies and real data analyses demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.
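The abstract names the method but does not state its formula. The sketch below is a minimal Python illustration that assumes the commonly cited GMPR formulation. For each pair of samples i and j, take the median of the count ratios c_ik / c_jk over the taxa k that are nonzero in both samples. The size factor of sample i is then the geometric mean of those medians over all other samples j. Because each ratio only uses taxa shared by one pair of samples, zeros elsewhere do not collapse the estimate. The function names and toy counts are hypothetical and are not the authors' code. The authors' implementation may also apply extra filtering that this sketch omits, such as a minimum number of shared taxa.

```python
import math
from statistics import median
from typing import List, Optional, Sequence


def pairwise_ratio(ci: Sequence[float], cj: Sequence[float]) -> Optional[float]:
    """Median of ci/cj over taxa that are nonzero in BOTH samples (None if none shared)."""
    ratios = [a / b for a, b in zip(ci, cj) if a > 0 and b > 0]
    return median(ratios) if ratios else None


def gmpr_size_factors(counts: List[List[float]]) -> List[Optional[float]]:
    """Sketch of GMPR size factors; counts[i][k] is the count of taxon k in sample i.

    For each sample i, combine the pairwise ratios r_ij (j != i) by their
    geometric mean, computed in log space. Pairs sharing no nonzero taxa are
    skipped; a sample with no usable pair gets None.
    """
    n = len(counts)
    factors: List[Optional[float]] = []
    for i in range(n):
        logs = []
        for j in range(n):
            if j == i:
                continue
            r = pairwise_ratio(counts[i], counts[j])
            if r is not None:
                logs.append(math.log(r))
        factors.append(math.exp(sum(logs) / len(logs)) if logs else None)
    return factors


# Hypothetical toy data: 3 samples (rows) x 4 taxa (columns), zero-inflated.
counts = [
    [10, 0, 20, 5],
    [20, 4, 40, 0],
    [5, 2, 0, 3],
]
factors = gmpr_size_factors(counts)

# Normalized counts: divide each sample's counts by its size factor (skip None).
normalized = [
    [c / s for c in row] if s is not None else None
    for row, s in zip(counts, factors)
]
```

In this toy matrix only the first taxon is nonzero in every sample. A reference that needs each taxon observed in all samples, as a per-taxon geometric mean across all samples does, would therefore rest on a single taxon. By contrast, every pair of samples here shares two nonzero taxa, so each pairwise median uses more of the data.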

Original language: English (US)
Article number: e4600
Journal: PeerJ
Volume: 2018
Issue number: 4
DOI: 10.7717/peerj.4600
State: Published, Jan 1 2018

Keywords

  • Metagenomics
  • Microbiome
  • Normalization
  • RNA-seq
  • Statistics
  • Zero-inflation

ASJC Scopus subject areas

  • Neuroscience (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data. / Chen, Li; Reeve, James; Zhang, Lujun; Huang, Shengbing; Wang, Xuefeng; Chen, Jun.

In: PeerJ, Vol. 2018, No. 4, e4600, 01.01.2018.


Chen, Li; Reeve, James; Zhang, Lujun; Huang, Shengbing; Wang, Xuefeng; Chen, Jun. / GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data. In: PeerJ. 2018; Vol. 2018, No. 4.
@article{84f984842b764da598abfb6c67be86b0,
title = "GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data",
keywords = "Metagenomics, Microbiome, Normalization, RNA-seq, Statistics, Zero-inflation",
author = "Li Chen and James Reeve and Lujun Zhang and Shengbing Huang and Xuefeng Wang and Jun Chen",
year = "2018",
month = "1",
day = "1",
doi = "10.7717/peerj.4600",
language = "English (US)",
volume = "2018",
journal = "PeerJ",
issn = "2167-8359",
publisher = "PeerJ",
number = "4",

}

TY  - JOUR
T1  - GMPR
T2  - A robust normalization method for zero-inflated count data with application to microbiome sequencing data
AU  - Chen, Li
AU  - Reeve, James
AU  - Zhang, Lujun
AU  - Huang, Shengbing
AU  - Wang, Xuefeng
AU  - Chen, Jun
PY  - 2018/1/1
Y1  - 2018/1/1
KW  - Metagenomics
KW  - Microbiome
KW  - Normalization
KW  - RNA-seq
KW  - Statistics
KW  - Zero-inflation
UR  - http://www.scopus.com/inward/record.url?scp=85044741450&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85044741450&partnerID=8YFLogxK
U2  - 10.7717/peerj.4600
DO  - 10.7717/peerj.4600
M3  - Article
AN  - SCOPUS:85044741450
VL  - 2018
JO  - PeerJ
JF  - PeerJ
SN  - 2167-8359
IS  - 4
M1  - e4600
ER  - 