Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data

Wenjiang Deng; Tian Mou; Krishna R. Kalari; Nifang Niu; Liewei Wang; Yudi Pawitan; Trung Nghia Vu

doi:10.1093/bioinformatics/btz640

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Motivation: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide bias-correction steps that are based on biological considerations, such as GC content, and are applied to each sample separately. The main problem is that not all biases are known.

Results: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast, XAEM treats Xβ as a bilinear model with both X and β unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and β. For speed, XAEM uses quasi-mapping for read alignment. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets.
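To make the idea of alternating between X and β concrete, the following is a minimal toy sketch and not the XAEM implementation. It assumes, purely for illustration, that counts for a single gene follow Y[e, s] ~ Poisson((Xβ)[e, s]) over read classes e and samples s, that columns of X sum to 1, and that zeros in the starting X mark read classes an isoform cannot produce. The function name `alternating_em` and all parameter names are hypothetical.

```python
import numpy as np


def alternating_em(Y, X0, n_iter=200, eps=1e-12):
    """Toy alternating EM for Y[e, s] ~ Poisson((X @ beta)[e, s]).

    Y  : (E, S) read counts per read class e and sample s.
    X0 : (E, K) non-negative starting design; zeros mark read classes
         that isoform k cannot produce, and they stay zero throughout.
    Illustrative assumption only; not the published XAEM code.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X0, dtype=float)
    X = X / X.sum(axis=0, keepdims=True)  # each column sums to 1
    K = X.shape[1]
    # Flat start: split each sample's total count evenly over isoforms.
    beta = np.ones((K, Y.shape[1])) * Y.sum(axis=0) / K
    for _ in range(n_iter):
        # beta-step: EM update of abundances with X held fixed
        # (the usual denominator, the column sums of X, equals 1).
        beta = beta * (X.T @ (Y / (X @ beta + eps)))
        # X-step: EM update of the design matrix with beta held fixed.
        ratio = Y / (X @ beta + eps)
        X = X * (ratio @ beta.T) / (beta.sum(axis=1) + eps)
        # Renormalise columns of X and move the scale into beta,
        # which leaves the fitted means X @ beta unchanged.
        scale = X.sum(axis=0) + eps
        X = X / scale
        beta = beta * scale[:, None]
    return X, beta


# Simulated example: 4 read classes, 2 isoforms, 8 samples.
rng = np.random.default_rng(0)
X_true = np.array([[0.5, 0.0], [0.3, 0.2], [0.2, 0.5], [0.0, 0.3]])
beta_true = rng.uniform(50, 500, size=(2, 8))
Y = rng.poisson(X_true @ beta_true)
X_hat, beta_hat = alternating_em(Y, (X_true > 0).astype(float))
print(np.round(X_hat, 2))
```

Only the zero pattern of X is supplied here; the nonzero entries are estimated jointly from all samples, which is why several samples are needed. Whether the true X is recovered in this toy setting depends on that zero pattern and on the number of samples, so the sketch makes no claim about accuracy.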

Original language	English (US)
Pages (from-to)	805-812
Number of pages	8
Journal	Bioinformatics
Volume	36
Issue number	3
DOIs	https://doi.org/10.1093/bioinformatics/btz640
State	Published - Feb 1 2020

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{fb1f73a3038c49c18d487f2e2ea2f802,
  title = "Alternating {EM} algorithm for a bilinear model in isoform quantification from {RNA}-seq data",
  abstract = "Motivation: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide bias-correction steps that are based on biological considerations, such as GC content, and are applied to each sample separately. The main problem is that not all biases are known. Results: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast, XAEM treats Xβ as a bilinear model with both X and β unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and β. For speed, XAEM uses quasi-mapping for read alignment. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets.",
  author = "Wenjiang Deng and Tian Mou and Kalari, {Krishna R.} and Nifang Niu and Liewei Wang and Yudi Pawitan and Vu, {Trung Nghia}",
  note = "Publisher Copyright: {\textcopyright} 2019 The Author(s). Published by Oxford University Press.",
  year = "2020",
  month = feb,
  day = "1",
  doi = "10.1093/bioinformatics/btz640",
  language = "English (US)",
  volume = "36",
  pages = "805--812",
  journal = "Bioinformatics",
  issn = "1367-4803",
  publisher = "Oxford University Press",
  number = "3",
}

TY  - JOUR
T1  - Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data
AU  - Deng, Wenjiang
AU  - Mou, Tian
AU  - Kalari, Krishna R.
AU  - Niu, Nifang
AU  - Wang, Liewei
AU  - Pawitan, Yudi
AU  - Vu, Trung Nghia
PY  - 2020/2/1
Y1  - 2020/2/1
AB  - Motivation: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide bias-correction steps that are based on biological considerations, such as GC content, and are applied to each sample separately. The main problem is that not all biases are known. Results: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast, XAEM treats Xβ as a bilinear model with both X and β unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and β. For speed, XAEM uses quasi-mapping for read alignment. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets.
UR  - http://www.scopus.com/inward/record.url?scp=85079077045&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85079077045&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btz640
DO  - 10.1093/bioinformatics/btz640
M3  - Article
C2  - 31400221
AN  - SCOPUS:85079077045
SN  - 1367-4803
VL  - 36
SP  - 805
EP  - 812
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 3
ER  -
