A latent model for prioritization of SNPs for functional studies

Brooke L. Fridley, Ed Iversen, Ya Yu Tsai, Gregory D. Jenkins, Ellen L Goode, Thomas A. Sellers

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

One difficult question facing researchers is how to prioritize SNPs detected in genetic association studies for functional studies. Often a list of the top M SNPs is determined based solely on the p-value from an association analysis, where M is set by financial and time constraints. For many studies of complex diseases, multiple analyses have been completed, and integrating these multiple sets of results may be difficult. One may also wish to incorporate biological knowledge, such as whether the SNP is in an exon of a gene or in a regulatory region, into the selection of markers to follow up. In this manuscript, we propose a Bayesian latent variable model (BLVM) that incorporates "features" of a SNP to estimate a latent "quality score", with SNPs prioritized based on the posterior probability distribution of the rankings of these quality scores. We illustrate the method using data from an ovarian cancer genome-wide association study (GWAS). In addition to applying the BLVM to the ovarian GWAS, we applied it to simulated data that mimic the prioritization of markers across multiple GWAS of related diseases/traits. The SNP ranked first by the BLVM for the ovarian GWAS was ranked 2nd and 7th based on p-values from the analyses of all invasive cases and of invasive serous cases, respectively. The top SNP based on the serous case analysis p-value (which ranked 197th in the invasive case analysis) was ranked 8th based on the posterior probability of being in the top 5 markers (0.13). In summary, the BLVM allows for the systematic integration of multiple SNP "features" to prioritize loci for fine-mapping or functional studies, taking into account the uncertainty in ranking.
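The abstract ranks SNPs by the posterior distribution of the rankings of their latent quality scores, for example by the posterior probability that a SNP is among the top 5 markers. The abstract does not give the BLVM's likelihood or priors, so the sketch below does not fit the model. It only illustrates that final ranking step, assuming S posterior draws of every SNP's quality score are already available (e.g. from an MCMC sampler). In that case P(rank_i <= K | data) can be estimated as the fraction of draws in which SNP i ranks in the top K. The function names `ranks_desc` and `prob_top_k` and the simulated means in the usage block are hypothetical illustrations, not the authors' code or data.

```python
import random
from typing import List, Sequence


def ranks_desc(scores: Sequence[float]) -> List[int]:
    """Return 1-based ranks where rank 1 is the highest score.

    Ties are broken by SNP index, because Python's sort is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, snp in enumerate(order, start=1):
        ranks[snp] = rank
    return ranks


def prob_top_k(draws: Sequence[Sequence[float]], k: int) -> List[float]:
    """Monte Carlo estimate of P(rank_i <= k | data) for each SNP i.

    draws[s][i] is the latent quality score of SNP i in posterior draw s
    (an assumed input; obtaining these draws is outside this sketch).
    """
    if not draws:
        raise ValueError("need at least one posterior draw")
    if k < 1:
        raise ValueError("k must be >= 1")
    n_snps = len(draws[0])
    hits = [0] * n_snps
    for scores in draws:
        # Rank SNPs within this draw, then count top-k memberships.
        for snp, rank in enumerate(ranks_desc(scores)):
            if rank <= k:
                hits[snp] += 1
    return [h / len(draws) for h in hits]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical posterior means for 6 SNPs with a shared spread of 0.5;
    # these numbers are made up for illustration only.
    means = [2.0, 1.8, 1.0, 0.9, 0.2, 0.1]
    draws = [[rng.gauss(m, 0.5) for m in means] for _ in range(4000)]
    p_top2 = prob_top_k(draws, k=2)
    # Prioritize SNPs by their posterior probability of being in the top 2.
    for snp in sorted(range(len(p_top2)), key=lambda i: -p_top2[i]):
        print(f"SNP {snp}: P(top 2) = {p_top2[snp]:.2f}")
```

Ranking by P(top K), rather than by a single point estimate, is what lets the ordering reflect ranking uncertainty. A SNP whose scores vary widely across draws can have a lower top-K probability than its mean score alone would suggest.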

Original language: English (US)
Article number: e20764
Journal: PLoS One
Volume: 6
Issue number: 6
DOI: 10.1371/journal.pone.0020764
State: Published - 2011

Fingerprint

prioritization
Single Nucleotide Polymorphism
Genes
Genome-Wide Association Study
Value engineering
ovarian neoplasms
Nucleic Acid Regulatory Sequences
probability distribution
Probability distributions
exons
Exons
uncertainty
researchers
Genetic Association Studies
loci
Ovarian Neoplasms
genome-wide association study
Uncertainty
Research Personnel
genes

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Medicine (all)

Cite this

Fridley, B. L., Iversen, E., Tsai, Y. Y., Jenkins, G. D., Goode, E. L., & Sellers, T. A. (2011). A latent model for prioritization of SNPs for functional studies. PLoS One, 6(6), [e20764]. https://doi.org/10.1371/journal.pone.0020764

A latent model for prioritization of SNPs for functional studies. / Fridley, Brooke L.; Iversen, Ed; Tsai, Ya Yu; Jenkins, Gregory D.; Goode, Ellen L; Sellers, Thomas A.

In: PLoS One, Vol. 6, No. 6, e20764, 2011.

Fridley, BL, Iversen, E, Tsai, YY, Jenkins, GD, Goode, EL & Sellers, TA 2011, 'A latent model for prioritization of SNPs for functional studies', PLoS One, vol. 6, no. 6, e20764. https://doi.org/10.1371/journal.pone.0020764
Fridley, Brooke L. ; Iversen, Ed ; Tsai, Ya Yu ; Jenkins, Gregory D. ; Goode, Ellen L ; Sellers, Thomas A. / A latent model for prioritization of SNPs for functional studies. In: PLoS One. 2011 ; Vol. 6, No. 6.

@article{de882dd559cb43c9b1e8c178339f6ccb,
title = "A latent model for prioritization of SNPs for functional studies",
abstract = "One difficult question facing researchers is how to prioritize SNPs detected from genetic association studies for functional studies. Often a list of the top M SNPs is determined based on solely the p-value from an association analysis, where M is determined by financial/time constraints. For many studies of complex diseases, multiple analyses have been completed and integrating these multiple sets of results may be difficult. One may also wish to incorporate biological knowledge, such as whether the SNP is in the exon of a gene or a regulatory region, into the selection of markers to follow-up. In this manuscript, we propose a Bayesian latent variable model (BLVM) for incorporating {"}features{"} about a SNP to estimate a latent {"}quality score{"}, with SNPs prioritized based on the posterior probability distribution of the rankings of these quality scores. We illustrate the method using data from an ovarian cancer genome-wide association study (GWAS). In addition to the application of the BLVM to the ovarian GWAS, we applied the BLVM to simulated data which mimics the setting involving the prioritization of markers across multiple GWAS for related diseases/traits. The top ranked SNP by BLVM for the ovarian GWAS, ranked 2nd and 7th based on p-values from analyses of all invasive and invasive serous cases. The top SNP based on serous case analysis p-value (which ranked 197th for invasive case analysis), was ranked 8th based on the posterior probability of being in the top 5 markers (0.13). In summary, the application of the BLVM allows for the systematic integration of multiple SNP {"}features{"} for the prioritization of loci for fine-mapping or functional studies, taking into account the uncertainty in ranking.",
author = "Fridley, {Brooke L.} and Ed Iversen and Tsai, {Ya Yu} and Jenkins, {Gregory D.} and Goode, {Ellen L} and Sellers, {Thomas A.}",
year = "2011",
doi = "10.1371/journal.pone.0020764",
language = "English (US)",
volume = "6",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "6",

}

TY  - JOUR
T1  - A latent model for prioritization of SNPs for functional studies
AU  - Fridley, Brooke L.
AU  - Iversen, Ed
AU  - Tsai, Ya Yu
AU  - Jenkins, Gregory D.
AU  - Goode, Ellen L
AU  - Sellers, Thomas A.
PY  - 2011
Y1  - 2011
N2  - One difficult question facing researchers is how to prioritize SNPs detected from genetic association studies for functional studies. Often a list of the top M SNPs is determined based on solely the p-value from an association analysis, where M is determined by financial/time constraints. For many studies of complex diseases, multiple analyses have been completed and integrating these multiple sets of results may be difficult. One may also wish to incorporate biological knowledge, such as whether the SNP is in the exon of a gene or a regulatory region, into the selection of markers to follow-up. In this manuscript, we propose a Bayesian latent variable model (BLVM) for incorporating "features" about a SNP to estimate a latent "quality score", with SNPs prioritized based on the posterior probability distribution of the rankings of these quality scores. We illustrate the method using data from an ovarian cancer genome-wide association study (GWAS). In addition to the application of the BLVM to the ovarian GWAS, we applied the BLVM to simulated data which mimics the setting involving the prioritization of markers across multiple GWAS for related diseases/traits. The top ranked SNP by BLVM for the ovarian GWAS, ranked 2nd and 7th based on p-values from analyses of all invasive and invasive serous cases. The top SNP based on serous case analysis p-value (which ranked 197th for invasive case analysis), was ranked 8th based on the posterior probability of being in the top 5 markers (0.13). In summary, the application of the BLVM allows for the systematic integration of multiple SNP "features" for the prioritization of loci for fine-mapping or functional studies, taking into account the uncertainty in ranking.
AB  - One difficult question facing researchers is how to prioritize SNPs detected from genetic association studies for functional studies. Often a list of the top M SNPs is determined based on solely the p-value from an association analysis, where M is determined by financial/time constraints. For many studies of complex diseases, multiple analyses have been completed and integrating these multiple sets of results may be difficult. One may also wish to incorporate biological knowledge, such as whether the SNP is in the exon of a gene or a regulatory region, into the selection of markers to follow-up. In this manuscript, we propose a Bayesian latent variable model (BLVM) for incorporating "features" about a SNP to estimate a latent "quality score", with SNPs prioritized based on the posterior probability distribution of the rankings of these quality scores. We illustrate the method using data from an ovarian cancer genome-wide association study (GWAS). In addition to the application of the BLVM to the ovarian GWAS, we applied the BLVM to simulated data which mimics the setting involving the prioritization of markers across multiple GWAS for related diseases/traits. The top ranked SNP by BLVM for the ovarian GWAS, ranked 2nd and 7th based on p-values from analyses of all invasive and invasive serous cases. The top SNP based on serous case analysis p-value (which ranked 197th for invasive case analysis), was ranked 8th based on the posterior probability of being in the top 5 markers (0.13). In summary, the application of the BLVM allows for the systematic integration of multiple SNP "features" for the prioritization of loci for fine-mapping or functional studies, taking into account the uncertainty in ranking.
UR  - http://www.scopus.com/inward/record.url?scp=79958093993&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79958093993&partnerID=8YFLogxK
U2  - 10.1371/journal.pone.0020764
DO  - 10.1371/journal.pone.0020764
M3  - Article
VL  - 6
JO  - PLoS One
JF  - PLoS One
SN  - 1932-6203
IS  - 6
M1  - e20764
ER  - 