A logistic normal multinomial regression model for microbiome compositional data analysis

Fan Xia, Jun Chen, Wing Kam Fung, Hongzhe Li

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

Summary: Changes in the human microbiome are associated with many human diseases. Next-generation sequencing technologies make it possible to quantify microbial composition without laboratory cultivation. One important problem in microbiome data analysis is to identify the environmental or biological covariates that are associated with different bacterial taxa. Taxa count data in microbiome studies are often over-dispersed and include many zeros. To account for this over-dispersion, we propose an additive logistic normal multinomial regression model that links the covariates to bacterial composition. The model naturally accounts for sampling variability and zero observations, and it allows a flexible covariance structure among the bacterial taxa. To select the relevant covariates and estimate the corresponding regression coefficients, we propose a group ℓ1 penalized likelihood estimation method, implemented with a Monte Carlo expectation-maximization algorithm. Our simulation results show that the proposed method outperforms the group ℓ1 penalized multinomial logistic regression and the Dirichlet multinomial regression models in variable selection. We demonstrate the method on a data set that links the human gut microbiome to micronutrients in order to identify the nutrients associated with the human gut microbiome enterotype.
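
The generative structure named in the summary can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of the last taxon as the reference category, and the fixed sequencing depth are all hypothetical. Latent log-ratios are drawn from a multivariate normal whose mean is linear in the covariates, mapped to a composition by the inverse additive log-ratio transform, and then used as multinomial probabilities. The last function is the standard group soft-thresholding step associated with a group ℓ1 penalty; the abstract does not say how the penalized M-step is solved, so its use here is an assumption.

```python
import numpy as np

def inverse_alr(z):
    """Map (n, K) log-ratios to (n, K+1) compositions; the last taxon is the reference."""
    z_full = np.hstack([z, np.zeros((z.shape[0], 1))])
    z_full = z_full - z_full.max(axis=1, keepdims=True)  # numerical stability
    expz = np.exp(z_full)
    return expz / expz.sum(axis=1, keepdims=True)

def simulate_lnm(X, B, Sigma, depth, rng):
    """Draw taxa counts from a logistic normal multinomial model (hypothetical helper).

    X: (n, p) covariates, B: (p, K) coefficients, Sigma: (K, K) covariance,
    depth: total count per sample (assumed equal across samples for simplicity).
    """
    n, K = X.shape[0], B.shape[1]
    noise = rng.multivariate_normal(np.zeros(K), Sigma, size=n)
    P = inverse_alr(X @ B + noise)  # sample-specific compositions
    counts = np.vstack([rng.multinomial(depth, p) for p in P])
    return counts, P

def group_soft_threshold(b, lam):
    """Shrink one covariate's coefficient vector as a group; zero it if its norm <= lam."""
    norm = np.linalg.norm(b)
    if norm <= lam:
        return np.zeros_like(b)
    return (1.0 - lam / norm) * b

# Usage: 50 samples, 3 covariates, 5 taxa (4 log-ratios), only covariate 0 active.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
B = np.zeros((3, 4))
B[0] = [1.0, -1.0, 0.5, 0.0]
counts, P = simulate_lnm(X, B, 0.5 * np.eye(4), depth=1000, rng=rng)
```

Because each row of `B` holds all coefficients for one covariate, shrinking rows as groups removes a covariate from every taxon at once, which is the sense in which a group penalty performs covariate selection.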

Original language: English (US)
Pages (from-to): 1053-1063
Number of pages: 11
Journal: Biometrics
Volume: 69
Issue number: 4
DOI: 10.1111/biom.12079
State: Published - Dec 2013
Externally published: Yes

Fingerprint

Compositional Data
Multinomial Model
Microbiota
Logistics
data analysis
Regression Model
Covariates
Penalized Likelihood
Nutrients
Variable Selection
Chemical analysis
Overdispersion
digestive system
Flexible Structure
Count Data
Monte Carlo Algorithm
Covariance Structure
Expectation-maximization Algorithm
Zero

Keywords

  • Hierarchical model
  • Markov chain Monte Carlo
  • Over-dispersion
  • Regularization
  • Variable selection

ASJC Scopus subject areas

  • Statistics and Probability
  • Medicine (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)
  • Agricultural and Biological Sciences (all)
  • Applied Mathematics

Cite this

A logistic normal multinomial regression model for microbiome compositional data analysis. / Xia, Fan; Chen, Jun; Fung, Wing Kam; Li, Hongzhe.

In: Biometrics, Vol. 69, No. 4, 12.2013, p. 1053-1063.

@article{d51b576436814e509d2fbc31c75aef62,
title = "A logistic normal multinomial regression model for microbiome compositional data analysis",
keywords = "Hierarchical model, Markov chain Monte Carlo, Over-dispersion, Regularization, Variable selection",
author = "Fan Xia and Jun Chen and Fung, {Wing Kam} and Hongzhe Li",
year = "2013",
month = "12",
doi = "10.1111/biom.12079",
language = "English (US)",
volume = "69",
pages = "1053--1063",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "4",

}

TY - JOUR

T1 - A logistic normal multinomial regression model for microbiome compositional data analysis

AU - Xia, Fan

AU - Chen, Jun

AU - Fung, Wing Kam

AU - Li, Hongzhe

PY - 2013/12

Y1 - 2013/12

KW - Hierarchical model

KW - Markov chain Monte Carlo

KW - Over-dispersion

KW - Regularization

KW - Variable selection

UR - http://www.scopus.com/inward/record.url?scp=84890309776&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890309776&partnerID=8YFLogxK

U2 - 10.1111/biom.12079

DO - 10.1111/biom.12079

M3 - Article

C2 - 24128059

AN - SCOPUS:84890309776

VL - 69

SP - 1053

EP - 1063

JO - Biometrics

JF - Biometrics

SN - 0006-341X

IS - 4

ER -