SERE: Single-parameter quality control and sample comparison for RNA-Seq

Stefan K. Schulze; Rahul Kanwar; Meike Gölzenleuchter; Terry M. Therneau; Andreas S. Beutler

doi:10.1186/1471-2164-13-524

Research output: Contribution to journal › Article › peer-review

78 Scopus citations

Abstract

Background: Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson's correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task.

Results: Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson's r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen's simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute.

Conclusions: SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.
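
The abstract describes how a SERE score is read (0, 1, >1) but does not state the formula. The following is a minimal sketch under an assumption: that the score behaves like the square root of a Pearson chi-square-type statistic divided by its degrees of freedom, with each bin's expected count proportional to the sample's total read count. The function name `sere` and the toy count tables are illustrative and are not the authors' reference implementation.

```python
import math


def sere(counts):
    """Sketch of a SERE-like score (assumed form, not the published code).

    counts: list of bins (genes or exons); each bin is a list holding one
    read count per sample. Every sample is assumed to have at least one read.
    """
    n_samples = len(counts[0])
    # Total reads per sample and over the whole table.
    col_totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    grand_total = sum(col_totals)

    chi_sq = 0.0
    n_bins = 0
    for row in counts:
        row_total = sum(row)
        if row_total == 0:
            continue  # a bin with no reads carries no information
        n_bins += 1
        for j, observed in enumerate(row):
            # Count expected if the bin's reads were split across samples
            # purely in proportion to sequencing depth.
            expected = row_total * col_totals[j] / grand_total
            chi_sq += (observed - expected) ** 2 / expected

    if n_bins == 0:
        raise ValueError("no bins with reads")
    # Ratio of observed to Poisson-expected variation, on the SD scale.
    return math.sqrt(chi_sq / (n_bins * (n_samples - 1)))


duplicated = [[10, 10], [200, 200], [35, 35]]   # same data entered twice
different = [[100, 10], [10, 100], [50, 50]]    # strongly divergent bins

print(sere(duplicated))  # 0.0: no variation at all between the samples
print(sere(different))   # well above 1
```

Under this assumed form, two identical columns give exactly 0 (data duplication), counts that differ only by Poisson noise give values near 1, and systematic differences between libraries push the score above 1, which matches the interpretation given in the abstract.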

Original language	English (US)
Article number	524
Journal	BMC Genomics
Volume	13
Issue number	1
DOIs	https://doi.org/10.1186/1471-2164-13-524
State	Published - Oct 3 2012

Keywords

Count data
Kappa
Pearson's correlation coefficient
Poisson variation
RNA-Seq
Replicates
SERE
Simple Error Ratio Estimate

ASJC Scopus subject areas

Biotechnology
Genetics

Access to Document

10.1186/1471-2164-13-524

Cite this

@article{97284c4f960345df86b3e8192aabd763,

title = "SERE: Single-parameter quality control and sample comparison for RNA-Seq",

abstract = "Background: Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson's correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Results: Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson's r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen's simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. Conclusions: SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.",

keywords = "Count data, Kappa, Pearson's correlation coefficient, Poisson variation, RNA-Seq, Replicates, SERE, Simple Error Ratio Estimate",

author = "Schulze, {Stefan K.} and Kanwar, Rahul and G{\"o}lzenleuchter, Meike and Therneau, {Terry M.} and Beutler, {Andreas S.}",

note = "Funding Information: This research was supported by NINDS.",

year = "2012",

month = oct,

day = "3",

doi = "10.1186/1471-2164-13-524",

language = "English (US)",

volume = "13",

journal = "BMC Genomics",

issn = "1471-2164",

publisher = "BioMed Central",

number = "1",

}

TY - JOUR

T1 - SERE

T2 - Single-parameter quality control and sample comparison for RNA-Seq

AU - Schulze, Stefan K.

AU - Kanwar, Rahul

AU - Gölzenleuchter, Meike

AU - Therneau, Terry M.

AU - Beutler, Andreas S.

N1 - Funding Information: This research was supported by NINDS.

PY - 2012/10/3

Y1 - 2012/10/3

AB - Background: Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson's correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Results: Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson's r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen's simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. Conclusions: SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.

KW - Count data

KW - Kappa

KW - Pearson's correlation coefficient

KW - Poisson variation

KW - RNA-Seq

KW - Replicates

KW - SERE

KW - Simple Error Ratio Estimate

UR - http://www.scopus.com/inward/record.url?scp=84866870523&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866870523&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-13-524

DO - 10.1186/1471-2164-13-524

M3 - Article

C2 - 23033915

AN - SCOPUS:84866870523

SN - 1471-2164

VL - 13

JO - BMC Genomics

JF - BMC Genomics

IS - 1

M1 - 524

ER -
