Beta-Poisson model for single-cell RNA-seq data analyses

Trung Nghia Vu, Quin F. Wills, Krishna R Kalari, Nifang Niu, Liewei M Wang, Mattias Rantalainen, Yudi Pawitan

Research output: Contribution to journal › Article

34 Citations (Scopus)

Abstract

Motivation: Single-cell RNA-sequencing technology allows detection of gene expression at the single-cell level. One typical feature of the data is bimodality in the cellular distribution, even for highly expressed genes, primarily caused by a proportion of non-expressing cells. The standard and the over-dispersed gamma-Poisson models that are commonly used in bulk-cell RNA-sequencing are not able to capture this property.

Results: We introduce a beta-Poisson mixture model that can capture the bimodality of the single-cell gene expression distribution. We further integrate the model into the generalized linear model framework to perform differential expression analyses. The whole analytical procedure is called BPSC. Results from several real single-cell RNA-seq datasets indicate that ∼90% of the transcripts are well characterized by the beta-Poisson model, and that the model fit from BPSC is better than the fit of the standard gamma-Poisson model in >80% of the transcripts. Moreover, in differential expression analyses of simulated and real datasets, BPSC performs well against edgeR, a conventional method widely used for bulk-cell RNA-sequencing data, and against scde and MAST, two recent methods specifically designed for single-cell RNA-seq data.

Availability and Implementation: An R package BPSC for model fitting and differential expression analyses of single-cell RNA-seq data is available under the GPL-3 license at https://github.com/nghiavtr/BPSC.

Contact: or mattias.rantalainen@ki.se

Supplementary information: Supplementary data are available at Bioinformatics online.
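To make the bimodality argument concrete, the Python sketch below compares the probability mass function of a beta-Poisson mixture with that of a gamma-Poisson (negative binomial) model having the same mean and variance. It assumes the basic three-parameter form Y | p ~ Poisson(λp), p ~ Beta(α, β); the exact parameterization used inside the BPSC package may differ. The parameter values (λ = 50, α = β = 0.3) and all function names are hypothetical and chosen only for illustration, not estimated from any dataset. With α, β < 1 the beta density is U-shaped, so the mixture places mass both at zero and near λ, whereas the negative binomial stays unimodal.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_poisson_pmf(y: int, lam: float, alpha: float, beta: float) -> float:
    """P(Y = y) for the assumed model Y | p ~ Poisson(lam * p), p ~ Beta(alpha, beta).

    Closed form:
        lam^y / y! * B(alpha + y, beta) / B(alpha, beta)
        * 1F1(alpha + y; alpha + beta + y; -lam)
    Kummer's transformation 1F1(a; b; -lam) = exp(-lam) * 1F1(b - a; b; lam)
    is applied so every term of the hypergeometric series is positive.
    """
    b = alpha + beta + y
    series, term, k = 1.0, 1.0, 0
    while True:
        # Ratio of consecutive terms of 1F1(beta; b; lam).
        term *= (beta + k) / (b + k) * lam / (k + 1)
        series += term
        k += 1
        # Past k > lam the terms shrink monotonically, so stop once negligible.
        if k > lam and term < 1e-17 * series:
            break
    log_p = (y * math.log(lam) - math.lgamma(y + 1)
             + log_beta(alpha + y, beta) - log_beta(alpha, beta)
             - lam + math.log(series))
    return math.exp(log_p)


def neg_binomial_pmf(y: int, mean: float, size: float) -> float:
    """Gamma-Poisson (negative binomial) pmf with the given mean and size r."""
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(size / (size + mean))
             + y * math.log(mean / (size + mean)))
    return math.exp(log_p)


def count_modes(pmf: list) -> int:
    """Count local maxima of a pmf tabulated over y = 0, 1, 2, ..."""
    modes = 0
    for i, p in enumerate(pmf):
        left = pmf[i - 1] if i > 0 else -1.0
        right = pmf[i + 1] if i + 1 < len(pmf) else -1.0
        if p > left and p >= right:
            modes += 1
    return modes


# Hypothetical illustrative parameters (not fitted to any dataset).
LAM, ALPHA, BETA = 50.0, 0.3, 0.3
YS = range(401)

bp = [beta_poisson_pmf(y, LAM, ALPHA, BETA) for y in YS]

# Match the negative binomial to the beta-Poisson mean and variance:
# E[Y] = lam * E[p], Var[Y] = lam * E[p] + lam^2 * Var[p] (law of total variance).
mean = LAM * ALPHA / (ALPHA + BETA)
var_p = ALPHA * BETA / ((ALPHA + BETA) ** 2 * (ALPHA + BETA + 1))
var = mean + LAM ** 2 * var_p
size = mean ** 2 / (var - mean)
nb = [neg_binomial_pmf(y, mean, size) for y in YS]

print(f"beta-Poisson : P(Y=0) = {bp[0]:.3f}, local maxima = {count_modes(bp)}")
print(f"gamma-Poisson: P(Y=0) = {nb[0]:.3f}, local maxima = {count_modes(nb)}")
```

With these toy values the beta-Poisson pmf should show one local maximum at zero (the "non-expressing" cells) and a second at high counts, while the moment-matched negative binomial has a single mode. This only illustrates the shape argument in the abstract; it is not the BPSC fitting or differential expression procedure.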

Original language: English (US)
Pages (from-to): 2128-2135
Number of pages: 8
Journal: Bioinformatics
Volume: 32
Issue number: 14
DOI: 10.1093/bioinformatics/btw202
State: Published - Jul 15 2016

Fingerprint

Poisson Model
RNA
Cell
RNA Sequence Analysis
Differential Expression
Bimodality
Sequencing
Gene expression
Gene Expression
Single-Cell Analysis
Poisson Mixture
Licensure
Bioinformatics
Computational Biology
Model Fitting
Generalized Linear Model
Mixture Model
Linear Models
Genes
Availability

ASJC Scopus subject areas

  • Statistics and Probability
  • Medicine(all)
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Vu, T. N., Wills, Q. F., Kalari, K. R., Niu, N., Wang, L. M., Rantalainen, M., & Pawitan, Y. (2016). Beta-Poisson model for single-cell RNA-seq data analyses. Bioinformatics, 32(14), 2128-2135. https://doi.org/10.1093/bioinformatics/btw202

Beta-Poisson model for single-cell RNA-seq data analyses. / Vu, Trung Nghia; Wills, Quin F.; Kalari, Krishna R; Niu, Nifang; Wang, Liewei M; Rantalainen, Mattias; Pawitan, Yudi.

In: Bioinformatics, Vol. 32, No. 14, 15.07.2016, p. 2128-2135.

Vu, TN, Wills, QF, Kalari, KR, Niu, N, Wang, LM, Rantalainen, M & Pawitan, Y 2016, 'Beta-Poisson model for single-cell RNA-seq data analyses', Bioinformatics, vol. 32, no. 14, pp. 2128-2135. https://doi.org/10.1093/bioinformatics/btw202

Vu, Trung Nghia ; Wills, Quin F. ; Kalari, Krishna R ; Niu, Nifang ; Wang, Liewei M ; Rantalainen, Mattias ; Pawitan, Yudi. / Beta-Poisson model for single-cell RNA-seq data analyses. In: Bioinformatics. 2016 ; Vol. 32, No. 14. pp. 2128-2135.
@article{e6f6cb7a676843e0b60b300f626d21cf,
title = "Beta-Poisson model for single-cell RNA-seq data analyses",
author = "Vu, {Trung Nghia} and Wills, {Quin F.} and Kalari, {Krishna R} and Nifang Niu and Wang, {Liewei M} and Mattias Rantalainen and Yudi Pawitan",
year = "2016",
month = "7",
day = "15",
doi = "10.1093/bioinformatics/btw202",
language = "English (US)",
volume = "32",
pages = "2128--2135",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "14",

}

TY - JOUR

T1 - Beta-Poisson model for single-cell RNA-seq data analyses

AU - Vu, Trung Nghia

AU - Wills, Quin F.

AU - Kalari, Krishna R

AU - Niu, Nifang

AU - Wang, Liewei M

AU - Rantalainen, Mattias

AU - Pawitan, Yudi

PY - 2016/7/15

Y1 - 2016/7/15

UR - http://www.scopus.com/inward/record.url?scp=84984845042&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84984845042&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btw202

DO - 10.1093/bioinformatics/btw202

M3 - Article

C2 - 27153638

AN - SCOPUS:84984845042

VL - 32

SP - 2128

EP - 2135

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 14

ER -