TY - JOUR
T1 - Applying semantic web technologies for phenome-wide scan using an electronic health record linked Biobank
AU - Pathak, Jyotishman
AU - Kiefer, Richard C.
AU - Bielinski, Suzette J.
AU - Chute, Christopher G.
N1 - Funding Information:
This research is supported in part by the Mayo Clinic Early Career Development Award (FP00058504), the eMERGE consortia (U01-HG-006379), the SHARPn project (90TR002), Mayo Clinic Genome-wide Association Study of Venous Thromboembolism (HG04735), Mayo Clinic SPORE in Pancreatic Cancer (P50CA102701), and Mayo Clinic Cancer Center (GERA Program). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The authors would like to thank Paul Decker, Rachel Gullerud, and Robert Freimuth for their help with access and preliminary analysis of MayoGC data.
Publisher Copyright:
© 2012 Pathak et al.; licensee BioMed Central Ltd.
PY - 2012/12/17
Y1 - 2012/12/17
AB - Background: The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically, GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form "biobanks" where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. Results: In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped for Type 2 Diabetes and Hypothyroidism to discover gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries. Conclusions: This study demonstrates how Semantic Web technologies can be applied in conjunction with clinical data stored in EHRs to accurately identify subjects with specific diseases and phenotypes, and identify genotype-phenotype associations.
UR - http://www.scopus.com/inward/record.url?scp=84889676637&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84889676637&partnerID=8YFLogxK
U2 - 10.1186/2041-1480-3-10
DO - 10.1186/2041-1480-3-10
M3 - Article
AN - SCOPUS:84889676637
SN - 2041-1480
VL - 3
JO - Journal of Biomedical Semantics
JF - Journal of Biomedical Semantics
IS - 1
M1 - 10
ER -