Harmonization and semantic annotation of data dictionaries from the Pharmacogenomics Research Network: A case study

Qian Zhu; Robert R. Freimuth; Zonghui Lian; Scott Bauer; Jyotishman Pathak; Cui Tao; Matthew J. Durski; Christopher G. Chute

doi:10.1016/j.jbi.2012.11.004

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

The Pharmacogenomics Research Network (PGRN) is a collaborative partnership of research groups funded by NIH to discover and understand how the genome contributes to an individual's response to medication. Since traditional biomedical research studies and clinical trials are often conducted independently, common and standardized representations for data are seldom used. This leads to heterogeneity in data representation, which hinders data reuse, data integration and meta-analyses. This study demonstrates harmonization and semantic annotation work for pharmacogenomics data dictionaries collected from PGRN research groups. A semi-automated system was developed to support the harmonization/annotation process, which includes four individual steps: (1) pre-processing PGRN variables; (2) decomposing and normalizing variable descriptions; (3) semantically annotating words and phrases using controlled terminologies; and (4) grouping PGRN variables into categories based on the annotation results and semantic types, for a total of 1514 PGRN variables. Our results demonstrate that there is a significant amount of variability in how pharmacogenomics data is represented and that additional standardization efforts are needed. This represents a critical first step toward identifying and creating data standards for pharmacogenomics studies.
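
The four steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the small TERMINOLOGY lookup, its codes and semantic-type labels, the stop-word list, and every function name are hypothetical stand-ins chosen for illustration. A real pipeline would query controlled terminologies rather than a hard-coded dictionary.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a controlled terminology: phrase -> (code, semantic type).
# Codes and type labels are invented for illustration only.
TERMINOLOGY = {
    "blood pressure": ("X001", "Clinical Attribute"),
    "systolic": ("X002", "Qualitative Concept"),
    "dose": ("X003", "Quantitative Concept"),
    "warfarin": ("X004", "Pharmacologic Substance"),
    "age": ("X005", "Organism Attribute"),
}

# Assumed stop-word list; a real system would use a fuller one.
STOP_WORDS = {"of", "the", "at", "in", "a", "an", "for"}


def preprocess(variables):
    """Step 1: drop variables with empty descriptions and trim whitespace."""
    return {name.strip(): desc.strip() for name, desc in variables.items() if desc.strip()}


def normalize(description):
    """Step 2: lowercase, split into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def annotate(tokens):
    """Step 3: greedy longest-match lookup of phrases in the terminology."""
    annotations = []
    i = 0
    while i < len(tokens):
        for length in (2, 1):  # try two-word phrases before single words
            window = tokens[i:i + length]
            phrase = " ".join(window)
            if len(window) == length and phrase in TERMINOLOGY:
                annotations.append((phrase, *TERMINOLOGY[phrase]))
                i += length
                break
        else:
            i += 1  # no match: leave the token unannotated
    return annotations


def group_by_semantic_type(annotated):
    """Step 4: group variable names by the semantic types of their annotations."""
    groups = defaultdict(set)
    for name, annotations in annotated.items():
        for _phrase, _code, semantic_type in annotations:
            groups[semantic_type].add(name)
    return {semantic_type: sorted(names) for semantic_type, names in groups.items()}


def harmonize(variables):
    """Run the four steps over a {variable name: description} mapping."""
    cleaned = preprocess(variables)
    annotated = {name: annotate(normalize(desc)) for name, desc in cleaned.items()}
    return annotated, group_by_semantic_type(annotated)


if __name__ == "__main__":
    example = {
        "SBP": "Systolic blood pressure at enrollment",
        "WARF_DOSE": "Dose of warfarin",
        "EMPTY": "   ",
    }
    annotated, groups = harmonize(example)
    for name, anns in annotated.items():
        print(name, anns)
    print(groups)
```

Words with no terminology match, such as "enrollment" in the example, are simply left unannotated here. In a semi-automated workflow those would be the cases routed to manual review.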

Original language	English (US)
Pages (from-to)	286-293
Number of pages	8
Journal	Journal of Biomedical Informatics
Volume	46
Issue number	2
DOIs	https://doi.org/10.1016/j.jbi.2012.11.004
State	Published - Apr 2013

Keywords

Data harmonization
Pharmacogenomics
Semantic annotation

ASJC Scopus subject areas

Computer Science Applications
Health Informatics

Cite this

@article{fd55cde606a34e41bca980bde0f73361,
title = "Harmonization and semantic annotation of data dictionaries from the Pharmacogenomics Research Network: A case study",
abstract = "The Pharmacogenomics Research Network (PGRN) is a collaborative partnership of research groups funded by NIH to discover and understand how the genome contributes to an individual's response to medication. Since traditional biomedical research studies and clinical trials are often conducted independently, common and standardized representations for data are seldom used. This leads to heterogeneity in data representation, which hinders data reuse, data integration and meta-analyses. This study demonstrates harmonization and semantic annotation work for pharmacogenomics data dictionaries collected from PGRN research groups. A semi-automated system was developed to support the harmonization/annotation process, which includes four individual steps: (1) pre-processing PGRN variables; (2) decomposing and normalizing variable descriptions; (3) semantically annotating words and phrases using controlled terminologies; and (4) grouping PGRN variables into categories based on the annotation results and semantic types, for a total of 1514 PGRN variables. Our results demonstrate that there is a significant amount of variability in how pharmacogenomics data is represented and that additional standardization efforts are needed. This represents a critical first step toward identifying and creating data standards for pharmacogenomics studies.",
keywords = "Data harmonization, Pharmacogenomics, Semantic annotation",
author = "Qian Zhu and Freimuth, {Robert R.} and Zonghui Lian and Scott Bauer and Jyotishman Pathak and Cui Tao and Durski, {Matthew J.} and Chute, {Christopher G.}",
note = "Funding Information: This work was supported by the NIH/NIGMS (U19 GM61388; the Pharmacogenomic Research Network).",
year = "2013",
month = apr,
doi = "10.1016/j.jbi.2012.11.004",
language = "English (US)",
volume = "46",
pages = "286--293",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",
number = "2",
}

TY - JOUR
T1 - Harmonization and semantic annotation of data dictionaries from the Pharmacogenomics Research Network
T2 - A case study
AU - Zhu, Qian
AU - Freimuth, Robert R.
AU - Lian, Zonghui
AU - Bauer, Scott
AU - Pathak, Jyotishman
AU - Tao, Cui
AU - Durski, Matthew J.
AU - Chute, Christopher G.
N1 - Funding Information: This work was supported by the NIH/NIGMS (U19 GM61388; the Pharmacogenomic Research Network).
PY - 2013/4
Y1 - 2013/4
N2 - The Pharmacogenomics Research Network (PGRN) is a collaborative partnership of research groups funded by NIH to discover and understand how the genome contributes to an individual's response to medication. Since traditional biomedical research studies and clinical trials are often conducted independently, common and standardized representations for data are seldom used. This leads to heterogeneity in data representation, which hinders data reuse, data integration and meta-analyses. This study demonstrates harmonization and semantic annotation work for pharmacogenomics data dictionaries collected from PGRN research groups. A semi-automated system was developed to support the harmonization/annotation process, which includes four individual steps: (1) pre-processing PGRN variables; (2) decomposing and normalizing variable descriptions; (3) semantically annotating words and phrases using controlled terminologies; and (4) grouping PGRN variables into categories based on the annotation results and semantic types, for a total of 1514 PGRN variables. Our results demonstrate that there is a significant amount of variability in how pharmacogenomics data is represented and that additional standardization efforts are needed. This represents a critical first step toward identifying and creating data standards for pharmacogenomics studies.
AB - The Pharmacogenomics Research Network (PGRN) is a collaborative partnership of research groups funded by NIH to discover and understand how the genome contributes to an individual's response to medication. Since traditional biomedical research studies and clinical trials are often conducted independently, common and standardized representations for data are seldom used. This leads to heterogeneity in data representation, which hinders data reuse, data integration and meta-analyses. This study demonstrates harmonization and semantic annotation work for pharmacogenomics data dictionaries collected from PGRN research groups. A semi-automated system was developed to support the harmonization/annotation process, which includes four individual steps: (1) pre-processing PGRN variables; (2) decomposing and normalizing variable descriptions; (3) semantically annotating words and phrases using controlled terminologies; and (4) grouping PGRN variables into categories based on the annotation results and semantic types, for a total of 1514 PGRN variables. Our results demonstrate that there is a significant amount of variability in how pharmacogenomics data is represented and that additional standardization efforts are needed. This represents a critical first step toward identifying and creating data standards for pharmacogenomics studies.
KW - Data harmonization
KW - Pharmacogenomics
KW - Semantic annotation
UR - http://www.scopus.com/inward/record.url?scp=84875588566&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84875588566&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2012.11.004
DO - 10.1016/j.jbi.2012.11.004
M3 - Article
C2 - 23201637
AN - SCOPUS:84875588566
SN - 1532-0464
VL - 46
SP - 286
EP - 293
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
IS - 2
ER -
