Utilizing genotype imputation for the augmentation of sequence data

Brooke L. Fridley; Gregory Jenkins; Matthew E. Deyo-Svendsen; Scott Hebbring; Robert Freimuth

doi:10.1371/journal.pone.0011018

Utilizing genotype imputation for the augmentation of sequence data

Brooke L. Fridley, Gregory Jenkins, Matthew E. Deyo-Svendsen, Scott Hebbring, Robert Freimuth

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

17 Scopus citations

Abstract

Background: In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) has increased considerably with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have resulted in the implication of over 1500 SNPs associated with disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve the refining of these putative loci. Methodology: A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by the association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost effective approach would be to sequence a portion of the individuals, followed by the application of genotype imputation methods for imputing markers in the remaining individuals. A potentially attractive alternative option would be to impute based on the 1000 Genomes Project; however, this has the drawbacks of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation using a reference panel consisting of sequence data for a fraction of the study participants using data from both a candidate gene sequencing study and the 1000 Genomes Project. Conclusions: Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines which take advantage of the recent advances in genotype imputation methodology: Select the largest and most diverse reference panel for sequencing and genotype as many "anchor" markers as possible.

Original language	English (US)
Article number	e11018
Journal	PloS one
Volume	5
Issue number	6
DOIs	https://doi.org/10.1371/journal.pone.0011018
State	Published - 2010

ASJC Scopus subject areas

General

Access to Document

10.1371/journal.pone.0011018

Cite this

@article{3dde190185774bf09f4e310f5cc8078b,

title = "Utilizing genotype imputation for the augmentation of sequence data",

abstract = "Background: In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) has increased considerably with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have resulted in the implication of over 1500 SNPs associated with disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve the refining of these putative loci. Methodology: A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by the association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost effective approach would be to sequence a portion of the individuals, followed by the application of genotype imputation methods for imputing markers in the remaining individuals. A potentially attractive alternative option would be to impute based on the 1000 Genomes Project; however, this has the drawbacks of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation using a reference panel consisting of sequence data for a fraction of the study participants using data from both a candidate gene sequencing study and the 1000 Genomes Project. Conclusions: Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines which take advantage of the recent advances in genotype imputation methodology: Select the largest and most diverse reference panel for sequencing and genotype as many {"}anchor{"} markers as possible.",

author = "Fridley, {Brooke L.} and Gregory Jenkins and Deyo-Svendsen, {Matthew E.} and Scott Hebbring and Robert Freimuth",

year = "2010",

doi = "10.1371/journal.pone.0011018",

language = "English (US)",

volume = "5",

journal = "PloS one",

issn = "1932-6203",

publisher = "Public Library of Science",

number = "6",

}

TY - JOUR

T1 - Utilizing genotype imputation for the augmentation of sequence data

AU - Fridley, Brooke L.

AU - Jenkins, Gregory

AU - Deyo-Svendsen, Matthew E.

AU - Hebbring, Scott

AU - Freimuth, Robert

PY - 2010

Y1 - 2010

N2 - Background: In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) has increased considerably with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have resulted in the implication of over 1500 SNPs associated with disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve the refining of these putative loci. Methodology: A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by the association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost effective approach would be to sequence a portion of the individuals, followed by the application of genotype imputation methods for imputing markers in the remaining individuals. A potentially attractive alternative option would be to impute based on the 1000 Genomes Project; however, this has the drawbacks of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation using a reference panel consisting of sequence data for a fraction of the study participants using data from both a candidate gene sequencing study and the 1000 Genomes Project. Conclusions: Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines which take advantage of the recent advances in genotype imputation methodology: Select the largest and most diverse reference panel for sequencing and genotype as many "anchor" markers as possible.

AB - Background: In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) has increased considerably with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have resulted in the implication of over 1500 SNPs associated with disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve the refining of these putative loci. Methodology: A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by the association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost effective approach would be to sequence a portion of the individuals, followed by the application of genotype imputation methods for imputing markers in the remaining individuals. A potentially attractive alternative option would be to impute based on the 1000 Genomes Project; however, this has the drawbacks of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation using a reference panel consisting of sequence data for a fraction of the study participants using data from both a candidate gene sequencing study and the 1000 Genomes Project. Conclusions: Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines which take advantage of the recent advances in genotype imputation methodology: Select the largest and most diverse reference panel for sequencing and genotype as many "anchor" markers as possible.

UR - http://www.scopus.com/inward/record.url?scp=77956197018&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77956197018&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0011018

DO - 10.1371/journal.pone.0011018

M3 - Article

C2 - 20543988

AN - SCOPUS:77956197018

SN - 1932-6203

VL - 5

JO - PloS one

JF - PloS one

IS - 6

M1 - e11018

ER -

Utilizing genotype imputation for the augmentation of sequence data

Abstract

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this