Streamlined analysis of pooled genotype data in SNP-based association studies

Valentina Moskvina, Nadine Norton, Nigel Williams, Peter Holmans, Michael Owen, Michael O'Donovan

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

Several groups have developed methods for estimating allele frequencies in DNA pools as a fast and cheap way of detecting allelic association between genetic markers and disease. To obtain accurate estimates of allele frequencies, a correction factor k is generally applied to account for the degree to which measurement of allele-specific products is biased. Factor k is usually obtained as the ratio of the two allele-specific signals in samples from heterozygous individuals, a step that can significantly impair throughput and increase cost. We have systematically investigated the properties of k through the use of empirical and simulated data. We show that for the dye terminator primer extension genotyping method we have applied, the correction factor k is substantially influenced not only by the dye terminators incorporated but also by the terminal 3′ base of the extension primer. We also show that the variation in k is large enough to result in unacceptable error rates if association studies are conducted without regard to k. We show that the impact of ignoring k can be neutralized by applying a correction factor kmax that can be easily derived, but at the potential cost of an increase in type I error. Finally, based upon observed distributions for k, we derive a method allowing the estimation of the probability that pooled data reflect significant differences in allele frequencies between the subjects comprising the pools. By controlling the error rates in the absence of knowledge of the appropriate SNP-specific correction factors, each approach enhances the performance of DNA pooling, while considerably streamlining the method by reducing time and cost.
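
The role of k described above can be illustrated with a minimal sketch. The abstract does not state the correction formula, so the code assumes the form often used in the DNA pooling literature, f_A = A / (A + k·B), where A and B are the two allele-specific signals from a pool and k is the mean A/B signal ratio in heterozygous individuals. The function names and all signal values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def estimate_k(het_signals):
    """Mean ratio of allele-A to allele-B signal across heterozygotes.

    A heterozygote carries both alleles in a 1:1 ratio, so any departure
    of the signal ratio from 1 reflects measurement bias.
    """
    return mean(a / b for a, b in het_signals)


def pooled_allele_freq(signal_a, signal_b, k=1.0):
    """Estimated frequency of allele A in a pool (assumed formula).

    With k=1.0 no correction is applied, i.e. k is ignored.
    """
    return signal_a / (signal_a + k * signal_b)


# Hypothetical (allele A, allele B) peak heights from three heterozygotes.
het_signals = [(1500.0, 1000.0), (1800.0, 1200.0), (1200.0, 800.0)]
k = estimate_k(het_signals)

# Hypothetical signals measured in one DNA pool.
pool_a, pool_b = 600.0, 600.0
uncorrected = pooled_allele_freq(pool_a, pool_b)
corrected = pooled_allele_freq(pool_a, pool_b, k)

print(f"k = {k:.2f}")
print(f"uncorrected frequency of A = {uncorrected:.2f}")
print(f"corrected frequency of A   = {corrected:.2f}")
```

With these invented values, k is 1.5, so equal pool signals correspond to an uncorrected estimate of 0.5 but a corrected estimate of 0.4. A gap of this kind is why the abstract treats analysis without regard to k as a source of error.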

Original language: English (US)
Pages (from-to): 273-282
Number of pages: 10
Journal: Genetic Epidemiology
Volume: 28
Issue number: 3
DOI: https://doi.org/10.1002/gepi.20062
State: Published - Apr 2005
Externally published: Yes

Fingerprint

Single Nucleotide Polymorphism
Genotype
Gene Frequency
Costs and Cost Analysis
Coloring Agents
Alleles
Inborn Genetic Diseases
DNA
Genetic Markers

Keywords

  • Association
  • Case-control differential amplification
  • DNA pool
  • SNP

ASJC Scopus subject areas

  • Genetics (clinical)
  • Epidemiology

Cite this

Streamlined analysis of pooled genotype data in SNP-based association studies. / Moskvina, Valentina; Norton, Nadine; Williams, Nigel; Holmans, Peter; Owen, Michael; O'Donovan, Michael.

In: Genetic Epidemiology, Vol. 28, No. 3, 04.2005, p. 273-282.

Moskvina, V, Norton, N, Williams, N, Holmans, P, Owen, M & O'Donovan, M 2005, 'Streamlined analysis of pooled genotype data in SNP-based association studies', Genetic Epidemiology, vol. 28, no. 3, pp. 273-282. https://doi.org/10.1002/gepi.20062
Moskvina, Valentina ; Norton, Nadine ; Williams, Nigel ; Holmans, Peter ; Owen, Michael ; O'Donovan, Michael. / Streamlined analysis of pooled genotype data in SNP-based association studies. In: Genetic Epidemiology. 2005 ; Vol. 28, No. 3. pp. 273-282.
@article{985b04e1885245ac8a98ad4db2623b03,
title = "Streamlined analysis of pooled genotype data in SNP-based association studies",
keywords = "Association, Case-control differential amplification, DNA pool, SNP",
author = "Valentina Moskvina and Nadine Norton and Nigel Williams and Peter Holmans and Michael Owen and Michael O'Donovan",
year = "2005",
month = "4",
doi = "10.1002/gepi.20062",
language = "English (US)",
volume = "28",
pages = "273--282",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "3",

}

TY - JOUR
T1 - Streamlined analysis of pooled genotype data in SNP-based association studies
AU - Moskvina, Valentina
AU - Norton, Nadine
AU - Williams, Nigel
AU - Holmans, Peter
AU - Owen, Michael
AU - O'Donovan, Michael
PY - 2005/4
Y1 - 2005/4
KW - Association
KW - Case-control differential amplification
KW - DNA pool
KW - SNP
UR - http://www.scopus.com/inward/record.url?scp=15544371188&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=15544371188&partnerID=8YFLogxK
U2 - 10.1002/gepi.20062
DO - 10.1002/gepi.20062
M3 - Article
VL - 28
SP - 273
EP - 282
JO - Genetic Epidemiology
JF - Genetic Epidemiology
SN - 0741-0395
IS - 3
ER -