Self-Contained gene-set analysis of expression data: An evaluation of existing and novel methods

Brooke L. Fridley, Gregory D. Jenkins, Joanna M. Biernacka

Research output: Contribution to journal, Article

45 Citations (Scopus)

Abstract

Gene set methods aim to assess the overall evidence of association of a set of genes with a phenotype, such as disease or a quantitative trait. Multiple approaches for gene set analysis of expression data have been proposed. They can be divided into two types: competitive and self-contained. Benefits of self-contained methods include that they can be used for genome-wide, candidate gene, or pathway studies, and have been reported to be more powerful than competitive methods. We therefore investigated ten self-contained methods that can be used for continuous, discrete and time-to-event phenotypes. To assess the power and type I error rate for the various previously proposed and novel approaches, an extensive simulation study was completed in which the scenarios varied according to: number of genes in a gene set, number of genes associated with the phenotype, effect sizes, correlation between expression of genes within a gene set, and the sample size. In addition to the simulated data, the various methods were applied to a pharmacogenomic study of the drug gemcitabine. Simulation results demonstrated that overall Fisher's method and the global model with random effects have the highest power for a wide range of scenarios, while the analysis based on the first principal component and Kolmogorov-Smirnov test tended to have lowest power. The methods investigated here are likely to play an important role in identifying pathways that contribute to complex traits.
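
For readers unfamiliar with Fisher's method, which the abstract names as one of the most powerful approaches, the following is a minimal sketch of the textbook version: per-gene p-values are combined into a single statistic, X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom when the k p-values are independent. This is an illustration only. The function name `fisher_combined_pvalue` is hypothetical, and the abstract does not say how the authors obtained significance; because expression within a gene set is typically correlated, the independence assumption used here may not hold in practice.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine p-values with Fisher's method (assumes independent tests).

    Returns a (statistic, combined_p_value) tuple.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    # Fisher's statistic: X = -2 * sum(ln p_i); chi-square with 2k df under H0.
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For even degrees of freedom (2k) the chi-square upper tail is closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical per-gene p-values for one gene set.
    stat, p = fisher_combined_pvalue([0.04, 0.20, 0.01, 0.35])
    print(f"X = {stat:.3f}, combined p = {p:.4g}")
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form tail.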

Original language: English (US)
Article number: e12693
Pages (from-to): 1-9
Number of pages: 9
Journal: PLoS One
Volume: 5
Issue number: 9
DOI: 10.1371/journal.pone.0012693
State: Published - 2010

Fingerprint

Genes
gemcitabine
methodology
Phenotype
pharmacogenomics
Nonparametric Statistics
quantitative traits
Sample Size
Genome
Gene Expression
drugs
Pharmaceutical Preparations
testing
sampling

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Medicine(all)

Cite this

Self-Contained gene-set analysis of expression data: An evaluation of existing and novel methods. / Fridley, Brooke L.; Jenkins, Gregory D.; Biernacka, Joanna M.

In: PLoS One, Vol. 5, No. 9, e12693, 2010, p. 1-9.

@article{5fcdfc23bcf74fdfa9ec1504bb7846f4,
title = "Self-Contained gene-set analysis of expression data: An evaluation of existing and novel methods",
abstract = "Gene set methods aim to assess the overall evidence of association of a set of genes with a phenotype, such as disease or a quantitative trait. Multiple approaches for gene set analysis of expression data have been proposed. They can be divided into two types: competitive and self-contained. Benefits of self-contained methods include that they can be used for genome-wide, candidate gene, or pathway studies, and have been reported to be more powerful than competitive methods. We therefore investigated ten self-contained methods that can be used for continuous, discrete and time-to-event phenotypes. To assess the power and type I error rate for the various previously proposed and novel approaches, an extensive simulation study was completed in which the scenarios varied according to: number of genes in a gene set, number of genes associated with the phenotype, effect sizes, correlation between expression of genes within a gene set, and the sample size. In addition to the simulated data, the various methods were applied to a pharmacogenomic study of the drug gemcitabine. Simulation results demonstrated that overall Fisher's method and the global model with random effects have the highest power for a wide range of scenarios, while the analysis based on the first principal component and Kolmogorov-Smirnov test tended to have lowest power. The methods investigated here are likely to play an important role in identifying pathways that contribute to complex traits.",
author = "Fridley, {Brooke L.} and Jenkins, {Gregory D.} and Biernacka, {Joanna M.}",
year = "2010",
doi = "10.1371/journal.pone.0012693",
language = "English (US)",
volume = "5",
pages = "1--9",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "9",

}

TY - JOUR

T1 - Self-Contained gene-set analysis of expression data

T2 - An evaluation of existing and novel methods

AU - Fridley, Brooke L.

AU - Jenkins, Gregory D.

AU - Biernacka, Joanna M.

PY - 2010

Y1 - 2010

N2 - Gene set methods aim to assess the overall evidence of association of a set of genes with a phenotype, such as disease or a quantitative trait. Multiple approaches for gene set analysis of expression data have been proposed. They can be divided into two types: competitive and self-contained. Benefits of self-contained methods include that they can be used for genome-wide, candidate gene, or pathway studies, and have been reported to be more powerful than competitive methods. We therefore investigated ten self-contained methods that can be used for continuous, discrete and time-to-event phenotypes. To assess the power and type I error rate for the various previously proposed and novel approaches, an extensive simulation study was completed in which the scenarios varied according to: number of genes in a gene set, number of genes associated with the phenotype, effect sizes, correlation between expression of genes within a gene set, and the sample size. In addition to the simulated data, the various methods were applied to a pharmacogenomic study of the drug gemcitabine. Simulation results demonstrated that overall Fisher's method and the global model with random effects have the highest power for a wide range of scenarios, while the analysis based on the first principal component and Kolmogorov-Smirnov test tended to have lowest power. The methods investigated here are likely to play an important role in identifying pathways that contribute to complex traits.

AB - Gene set methods aim to assess the overall evidence of association of a set of genes with a phenotype, such as disease or a quantitative trait. Multiple approaches for gene set analysis of expression data have been proposed. They can be divided into two types: competitive and self-contained. Benefits of self-contained methods include that they can be used for genome-wide, candidate gene, or pathway studies, and have been reported to be more powerful than competitive methods. We therefore investigated ten self-contained methods that can be used for continuous, discrete and time-to-event phenotypes. To assess the power and type I error rate for the various previously proposed and novel approaches, an extensive simulation study was completed in which the scenarios varied according to: number of genes in a gene set, number of genes associated with the phenotype, effect sizes, correlation between expression of genes within a gene set, and the sample size. In addition to the simulated data, the various methods were applied to a pharmacogenomic study of the drug gemcitabine. Simulation results demonstrated that overall Fisher's method and the global model with random effects have the highest power for a wide range of scenarios, while the analysis based on the first principal component and Kolmogorov-Smirnov test tended to have lowest power. The methods investigated here are likely to play an important role in identifying pathways that contribute to complex traits.

UR - http://www.scopus.com/inward/record.url?scp=77958614226&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77958614226&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0012693

DO - 10.1371/journal.pone.0012693

M3 - Article

C2 - 20862301

AN - SCOPUS:77958614226

VL - 5

SP - 1

EP - 9

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 9

M1 - e12693

ER -