Streamlined analysis of pooled genotype data in SNP-based association studies

Valentina Moskvina; Nadine Norton; Nigel Williams; Peter Holmans; Michael Owen; Michael O'Donovan

doi:10.1002/gepi.20062

Cancer Biology

Research output: Contribution to journal › Article › peer-review

24 Scopus citations

Abstract

Several groups have developed methods for estimating allele frequencies in DNA pools as a fast and cheap way for detecting allelic association between genetic markers and disease. To obtain accurate estimates of allele frequencies, a correction factor k for the degree to which measurement of allele-specific products is biased is generally applied. Factor k is usually obtained as the ratio of the two allele-specific signals in samples from heterozygous individuals, a step that can significantly impair throughput and increase cost. We have systematically investigated the properties of k through the use of empirical and simulated data. We show that for the dye terminator primer extension genotyping method we have applied, the correction factor k is substantially influenced by the dye terminators incorporated, but also by the terminal 3′ base of the extension primer. We also show that the variation in k is large enough to result in unacceptable error rates if association studies are conducted without regard to k. We show that the impact of ignoring k can be neutralized by applying a correction factor k_max that can be easily derived, but this at the potential cost of an increase in type I error. Finally, based upon observed distributions for k we derive a method allowing the estimation of the probability pooled data reflects significant differences in the allele frequencies between the subjects comprising the pools. By controlling the error rates in the absence of knowledge of the appropriate SNP-specific correction factors, each approach enhances the performance of DNA pooling, while considerably streamlining the method by reducing time and cost.
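The role of the correction factor k can be illustrated with a minimal sketch. It assumes the form commonly used in the DNA pooling literature, in which the pooled frequency of allele A is estimated as A / (A + kB), with k taken as the mean A/B signal ratio in heterozygotes. The abstract itself does not state this formula, so treat it as an assumption. The function names and signal values below are hypothetical, and the sketch does not implement the k_max correction or the probability method the paper derives.

```python
from statistics import mean


def estimate_k(het_signals):
    """Estimate k as the mean A/B signal ratio over heterozygous samples.

    het_signals: iterable of (signal_a, signal_b) pairs, one per heterozygote.
    (Assumed definition; the abstract only says k is a ratio of the two signals.)
    """
    return mean(a / b for a, b in het_signals)


def corrected_freq(signal_a, signal_b, k=1.0):
    """Estimate the frequency of allele A in a pool as A / (A + k * B).

    With k = 1.0 this is the uncorrected estimate, i.e. ignoring k.
    """
    return signal_a / (signal_a + k * signal_b)


# Hypothetical heterozygote signals (illustrative values, not data from the paper).
het = [(1.2, 1.0), (1.8, 1.5), (0.6, 0.5)]
k = estimate_k(het)

# Hypothetical pooled signals for one SNP.
raw = corrected_freq(60.0, 40.0)     # uncorrected estimate
adj = corrected_freq(60.0, 40.0, k)  # k-corrected estimate

print(f"k = {k:.3f}, uncorrected = {raw:.3f}, corrected = {adj:.3f}")
```

In this made-up case allele A gives a stronger signal than allele B in heterozygotes (k > 1), so the uncorrected estimate overstates the frequency of A and the corrected estimate is lower.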

Original language	English (US)
Pages (from-to)	273-282
Number of pages	10
Journal	Genetic Epidemiology
Volume	28
Issue number	3
DOIs	https://doi.org/10.1002/gepi.20062
State	Published - Apr 2005

Keywords

Association
Case-control differential amplification
DNA pool
SNP

ASJC Scopus subject areas

Epidemiology
Genetics (clinical)

Cite this

@article{985b04e1885245ac8a98ad4db2623b03,
title = "Streamlined analysis of pooled genotype data in SNP-based association studies",

abstract = "Several groups have developed methods for estimating allele frequencies in DNA pools as a fast and cheap way for detecting allelic association between genetic markers and disease. To obtain accurate estimates of allele frequencies, a correction factor k for the degree to which measurement of allele-specific products is biased is generally applied. Factor k is usually obtained as the ratio of the two allele-specific signals in samples from heterozygous individuals, a step that can significantly impair throughput and increase cost. We have systematically investigated the properties of k through the use of empirical and simulated data. We show that for the dye terminator primer extension genotyping method we have applied, the correction factor k is substantially influenced by the dye terminators incorporated, but also by the terminal 3′ base of the extension primer. We also show that the variation in k is large enough to result in unacceptable error rates if association studies are conducted without regard to k. We show that the impact of ignoring k can be neutralized by applying a correction factor kmax that can be easily derived, but this at the potential cost of an increase in type I error. Finally, based upon observed distributions for k we derive a method allowing the estimation of the probability pooled data reflects significant differences in the allele frequencies between the subjects comprising the pools. By controlling the error rates in the absence of knowledge of the appropriate SNP-specific correction factors, each approach enhances the performance of DNA pooling, while considerably streamlining the method by reducing time and cost.",

keywords = "Association, Case-control differential amplification, DNA pool, SNP",
author = "Valentina Moskvina and Nadine Norton and Nigel Williams and Peter Holmans and Michael Owen and Michael O'Donovan",
year = "2005",
month = apr,
doi = "10.1002/gepi.20062",
language = "English (US)",
volume = "28",
pages = "273--282",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "3",
}

TY - JOUR
T1 - Streamlined analysis of pooled genotype data in SNP-based association studies
AU - Moskvina, Valentina
AU - Norton, Nadine
AU - Williams, Nigel
AU - Holmans, Peter
AU - Owen, Michael
AU - O'Donovan, Michael
PY - 2005/4
Y1 - 2005/4

AB - Several groups have developed methods for estimating allele frequencies in DNA pools as a fast and cheap way for detecting allelic association between genetic markers and disease. To obtain accurate estimates of allele frequencies, a correction factor k for the degree to which measurement of allele-specific products is biased is generally applied. Factor k is usually obtained as the ratio of the two allele-specific signals in samples from heterozygous individuals, a step that can significantly impair throughput and increase cost. We have systematically investigated the properties of k through the use of empirical and simulated data. We show that for the dye terminator primer extension genotyping method we have applied, the correction factor k is substantially influenced by the dye terminators incorporated, but also by the terminal 3′ base of the extension primer. We also show that the variation in k is large enough to result in unacceptable error rates if association studies are conducted without regard to k. We show that the impact of ignoring k can be neutralized by applying a correction factor kmax that can be easily derived, but this at the potential cost of an increase in type I error. Finally, based upon observed distributions for k we derive a method allowing the estimation of the probability pooled data reflects significant differences in the allele frequencies between the subjects comprising the pools. By controlling the error rates in the absence of knowledge of the appropriate SNP-specific correction factors, each approach enhances the performance of DNA pooling, while considerably streamlining the method by reducing time and cost.

KW - Association
KW - Case-control differential amplification
KW - DNA pool
KW - SNP
UR - http://www.scopus.com/inward/record.url?scp=15544371188&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=15544371188&partnerID=8YFLogxK
U2 - 10.1002/gepi.20062
DO - 10.1002/gepi.20062
M3 - Article
C2 - 15700279
AN - SCOPUS:15544371188
SN - 0741-0395
VL - 28
SP - 273
EP - 282
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 3
ER -
