TY - JOUR
T1 - Two-phase designs to follow-up genome-wide association signals with DNA resequencing studies
AU - Schaid, Daniel J.
AU - Jenkins, Gregory D.
AU - Ingle, James N.
AU - Weinshilboum, Richard M.
PY - 2013/4
Y1 - 2013/4
AB - Genome-wide association studies (GWAS) of complex traits have generated many association signals for single nucleotide polymorphisms (SNPs). To understand the underlying causal genetic variant(s), focused DNA resequencing of targeted genomic regions is commonly used, yet the current cost of resequencing limits sample sizes for resequencing studies. Information from the large GWAS can be used to guide choice of samples for resequencing, such as the SNP genotypes in the targeted genomic region. Viewing the GWAS tag-SNPs as imperfect surrogates for the underlying causal variants, yet expecting that the tag-SNPs are correlated with the causal variants, a reasonable approach is a two-phase case-control design, with the GWAS serving as the first-phase and the resequencing study serving as the second-phase. Using stratified sampling based on both tag-SNP genotypes and case-control status, we explore the gains in power of a two-phase design relative to randomly sampling cases and controls for resequencing (i.e., ignoring tag-SNP genotypes). Simulation results show that stratified sampling based on both tag-SNP genotypes and case-control status is not likely to have lower power than stratified sampling based only on case-control status, and can sometimes have substantially greater power. The gain in power depends on the amount of linkage disequilibrium between the tag-SNP and causal variant alleles, as well as the effect size of the causal variant. Hence, the two-phase design provides an efficient approach to follow-up GWAS signals with DNA resequencing.
KW - DNA resequencing
KW - Horvitz-Thompson estimate
KW - Inverse sampling fraction weights
KW - Two-phase sampling
UR - http://www.scopus.com/inward/record.url?scp=84875644185&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84875644185&partnerID=8YFLogxK
U2 - 10.1002/gepi.21708
DO - 10.1002/gepi.21708
M3 - Article
C2 - 23348637
AN - SCOPUS:84875644185
SN - 0741-0395
VL - 37
SP - 229
EP - 238
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 3
ER -