Background: Genome-wide association studies with single nucleotide polymorphisms (SNPs) show great promise for identifying genetic determinants of complex human traits. In current analyses, genotype calling and imputation of missing genotypes are usually treated as two separate tasks. The genotypes of SNPs are first determined one at a time from allele signal intensities. The missing genotypes, i.e., no-calls caused by imperfectly separated signal clouds, are then imputed based on the linkage disequilibrium (LD) between multiple SNPs. Although many statistical methods have been developed to improve either genotype calling or imputation of missing genotypes, treating the two steps independently can lead to loss of genetic information. Results: We propose a novel genotype calling framework that considers the signal intensities and underlying LD structure of SNPs simultaneously by estimating both cluster parameters and haplotype frequencies. As a result, our new method outperforms some existing algorithms in terms of both call rates and genotyping accuracy. Our studies also suggest that jointly analyzing multiple SNPs in LD provides more accurate estimation of haplotypes than haplotype reconstruction methods that use only called genotypes. Conclusion: Our study demonstrates that jointly analyzing signal intensities and LD structure of multiple SNPs is a better way to determine genotypes and estimate LD parameters.
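The idea of estimating cluster parameters and haplotype frequencies together can be illustrated with a small expectation-maximization sketch. This is not the authors' implementation; it is a toy model under assumptions that the abstract does not state: two biallelic SNPs, one scalar intensity contrast per SNP, Gaussian genotype clusters with a shared fixed standard deviation, and Hardy-Weinberg pairing of haplotypes. All function and parameter names (`diplotype_posterior`, `genotype_posterior`, `em_step`, `hap_freq`, `means`, `sigma`) are hypothetical.

```python
import math
from itertools import product

# Assumed toy setting: haplotypes over two biallelic SNPs, coded as
# (allele at SNP 1, allele at SNP 2).
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def diplotype_posterior(x, hap_freq, means, sigma):
    """Posterior over ordered haplotype pairs for one sample.

    x        : (intensity at SNP 1, intensity at SNP 2)
    hap_freq : frequencies of the four haplotypes (sum to 1)
    means    : means[j][g] is the cluster mean at SNP j for genotype g
               (g = number of copies of allele 1, so 0, 1 or 2)
    sigma    : shared cluster standard deviation (held fixed here)
    """
    weights = {}
    for a, b in product(range(4), repeat=2):
        lik = 1.0
        for j in range(2):
            g = HAPLOTYPES[a][j] + HAPLOTYPES[b][j]
            lik *= normal_pdf(x[j], means[j][g], sigma)
        # Haplotype-pair prior (LD information) times intensity likelihood.
        weights[(a, b)] = hap_freq[a] * hap_freq[b] * lik
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def genotype_posterior(post):
    """Collapse a diplotype posterior into a joint genotype posterior."""
    out = {}
    for (a, b), p in post.items():
        g = (HAPLOTYPES[a][0] + HAPLOTYPES[b][0],
             HAPLOTYPES[a][1] + HAPLOTYPES[b][1])
        out[g] = out.get(g, 0.0) + p
    return out


def em_step(samples, hap_freq, means, sigma):
    """One EM iteration updating haplotype frequencies and cluster means."""
    hap_counts = [0.0] * 4
    num = [[0.0] * 3 for _ in range(2)]
    den = [[0.0] * 3 for _ in range(2)]
    for x in samples:
        post = diplotype_posterior(x, hap_freq, means, sigma)
        for (a, b), p in post.items():
            hap_counts[a] += p
            hap_counts[b] += p
            for j in range(2):
                g = HAPLOTYPES[a][j] + HAPLOTYPES[b][j]
                num[j][g] += p * x[j]
                den[j][g] += p
    new_freq = [c / (2.0 * len(samples)) for c in hap_counts]
    # Keep the old mean when a cluster receives essentially no weight.
    new_means = [
        [num[j][g] / den[j][g] if den[j][g] > 1e-12 else means[j][g]
         for g in range(3)]
        for j in range(2)
    ]
    return new_freq, new_means
```

In this sketch, a sample whose intensity at the second SNP falls midway between two clusters would be a no-call if that SNP were analyzed alone. When the two SNPs are in strong LD and the first SNP is clearly resolved, the haplotype-pair prior shifts the joint posterior toward the genotype combination that common haplotypes can produce, which is the intuition behind calling genotypes and estimating haplotype frequencies in a single step.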
ASJC Scopus subject areas
- Molecular Biology
- Computer Science Applications
- Structural Biology
- Applied Mathematics