A system for phenotype harmonization in the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) program

Adrienne M. Stilp; Leslie S. Emery; Jai G. Broome; Erin J. Buth; Alyna T. Khan; Cecelia A. Laurie; Fei Fei Wang; Quenna Wong; Dongquan Chen; Catherine M. D’Augustine; Nancy L. Heard-Costa; Chancellor R. Hohensee; William Craig Johnson; Lucia D. Juarez; Jingmin Liu; Karen M. Mutalik; Laura M. Raffield; Kerri L. Wiggins; Paul S. de Vries; Tanika N. Kelly; Charles Kooperberg; Pradeep Natarajan; Gina M. Peloso; Patricia A. Peyser; Alex P. Reiner; Donna K. Arnett; Stella Aslibekyan; Kathleen C. Barnes; Lawrence F. Bielak; Joshua C. Bis; Brian E. Cade; Ming Huei Chen; Adolfo Correa; L. Adrienne Cupples; Mariza de Andrade; Patrick T. Ellinor; Myriam Fornage; Nora Franceschini; Weiniu Gan; Santhi K. Ganesh; Jan Graffelman; Megan L. Grove; Xiuqing Guo; Nicola L. Hawley; Wan Ling Hsu; Rebecca D. Jackson; Cashell E. Jaquish; Andrew D. Johnson; Sharon L.R. Kardia; Shannon Kelly; Jiwon Lee; Rasika A. Mathias; Stephen T. McGarvey; Braxton D. Mitchell; May E. Montasser; Alanna C. Morrison; Kari E. North; Seyed Mehdi Nouraie; Elizabeth C. Oelsner; Nathan Pankratz; Stephen S. Rich; Jerome I. Rotter; Jennifer A. Smith; Kent D. Taylor; Ramachandran S. Vasan; Daniel E. Weeks; Scott T. Weiss; Carla G. Wilson; Lisa R. Yanek; Bruce M. Psaty; Susan R. Heckbert; Cathy C. Laurie

doi:10.1093/aje/kwab115

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

Genotype-phenotype association studies often combine phenotype data from multiple studies to increase statistical power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data-set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data-sharing mechanisms. This system was developed for the National Heart, Lung, and Blood Institute’s Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other -omics data for more than 80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants (recruited in 1948–2012) from up to 17 studies per phenotype. Here we discuss challenges in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include 1) the software code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies, and 2) the results of labeling thousands of phenotype variables with controlled vocabulary terms.

Original language	English (US)
Pages (from-to)	1977-1992
Number of pages	16
Journal	American journal of epidemiology
Volume	190
Issue number	10
DOIs	https://doi.org/10.1093/aje/kwab115
State	Published - 2021

Keywords

Cardiovascular disease
Common data elements
Hematologic disease
Information dissemination
Lung diseases
Phenotypes
Sleep-wake disorders

ASJC Scopus subject areas

General Medicine

Access to Document

10.1093/aje/kwab115

Cite this

Stilp, A. M., Emery, L. S., Broome, J. G., Buth, E. J., Khan, A. T., Laurie, C. A., Wang, F. F., Wong, Q., Chen, D., D’Augustine, C. M., Heard-Costa, N. L., Hohensee, C. R., Johnson, W. C., Juarez, L. D., Liu, J., Mutalik, K. M., Raffield, L. M., Wiggins, K. L., de Vries, P. S., ... Laurie, C. C. (2021). A system for phenotype harmonization in the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) program. American journal of epidemiology, 190(10), 1977-1992. https://doi.org/10.1093/aje/kwab115

Stilp, AM, Emery, LS, Broome, JG, Buth, EJ, Khan, AT, Laurie, CA, Wang, FF, Wong, Q, Chen, D, D’Augustine, CM, Heard-Costa, NL, Hohensee, CR, Johnson, WC, Juarez, LD, Liu, J, Mutalik, KM, Raffield, LM, Wiggins, KL, de Vries, PS, Kelly, TN, Kooperberg, C, Natarajan, P, Peloso, GM, Peyser, PA, Reiner, AP, Arnett, DK, Aslibekyan, S, Barnes, KC, Bielak, LF, Bis, JC, Cade, BE, Chen, MH, Correa, A, Cupples, LA, de Andrade, M, Ellinor, PT, Fornage, M, Franceschini, N, Gan, W, Ganesh, SK, Graffelman, J, Grove, ML, Guo, X, Hawley, NL, Hsu, WL, Jackson, RD, Jaquish, CE, Johnson, AD, Kardia, SLR, Kelly, S, Lee, J, Mathias, RA, McGarvey, ST, Mitchell, BD, Montasser, ME, Morrison, AC, North, KE, Nouraie, SM, Oelsner, EC, Pankratz, N, Rich, SS, Rotter, JI, Smith, JA, Taylor, KD, Vasan, RS, Weeks, DE, Weiss, ST, Wilson, CG, Yanek, LR, Psaty, BM, Heckbert, SR & Laurie, CC 2021, 'A system for phenotype harmonization in the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) program', American journal of epidemiology, vol. 190, no. 10, pp. 1977-1992. https://doi.org/10.1093/aje/kwab115

@article{d2334a6f71654264a6184734339d8069,

title = "A system for phenotype harmonization in the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) program",

abstract = "Genotype-phenotype association studies often combine phenotype data from multiple studies to increase statistical power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data-set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data-sharing mechanisms. This system was developed for the National Heart, Lung, and Blood Institute{\textquoteright}s Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other -omics data for more than 80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants (recruited in 1948–2012) from up to 17 studies per phenotype. Here we discuss challenges in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include 1) the software code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies, and 2) the results of labeling thousands of phenotype variables with controlled vocabulary terms.",

keywords = "Cardiovascular disease, Common data elements, Hematologic disease, Information dissemination, Lung diseases, Phenotypes, Sleep-wake disorders",

author = "Stilp, {Adrienne M.} and Emery, {Leslie S.} and Broome, {Jai G.} and Buth, {Erin J.} and Khan, {Alyna T.} and Laurie, {Cecelia A.} and Wang, {Fei Fei} and Quenna Wong and Dongquan Chen and D{\textquoteright}Augustine, {Catherine M.} and Heard-Costa, {Nancy L.} and Hohensee, {Chancellor R.} and Johnson, {William Craig} and Juarez, {Lucia D.} and Jingmin Liu and Mutalik, {Karen M.} and Raffield, {Laura M.} and Wiggins, {Kerri L.} and {de Vries}, {Paul S.} and Kelly, {Tanika N.} and Charles Kooperberg and Pradeep Natarajan and Peloso, {Gina M.} and Peyser, {Patricia A.} and Reiner, {Alex P.} and Arnett, {Donna K.} and Stella Aslibekyan and Barnes, {Kathleen C.} and Bielak, {Lawrence F.} and Bis, {Joshua C.} and Cade, {Brian E.} and Chen, {Ming Huei} and Adolfo Correa and Cupples, {L. Adrienne} and {de Andrade}, Mariza and Ellinor, {Patrick T.} and Myriam Fornage and Nora Franceschini and Weiniu Gan and Ganesh, {Santhi K.} and Jan Graffelman and Grove, {Megan L.} and Xiuqing Guo and Hawley, {Nicola L.} and Hsu, {Wan Ling} and Jackson, {Rebecca D.} and Jaquish, {Cashell E.} and Johnson, {Andrew D.} and Kardia, {Sharon L.R.} and Shannon Kelly and Jiwon Lee and Mathias, {Rasika A.} and McGarvey, {Stephen T.} and Mitchell, {Braxton D.} and Montasser, {May E.} and Morrison, {Alanna C.} and North, {Kari E.} and Nouraie, {Seyed Mehdi} and Oelsner, {Elizabeth C.} and Nathan Pankratz and Rich, {Stephen S.} and Rotter, {Jerome I.} and Smith, {Jennifer A.} and Taylor, {Kent D.} and Vasan, {Ramachandran S.} and Weeks, {Daniel E.} and Weiss, {Scott T.} and Wilson, {Carla G.} and Yanek, {Lisa R.} and Psaty, {Bruce M.} and Heckbert, {Susan R.} and Laurie, {Cathy C.}",

note = "Publisher Copyright: {\textcopyright} The Author(s) 2021.",

year = "2021",

doi = "10.1093/aje/kwab115",

language = "English (US)",

volume = "190",

pages = "1977--1992",

journal = "American journal of epidemiology",

issn = "0002-9262",

publisher = "Oxford University Press",

number = "10",

}

TY - JOUR

T1 - A system for phenotype harmonization in the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) program

AU - Stilp, Adrienne M.

AU - Emery, Leslie S.

AU - Broome, Jai G.

AU - Buth, Erin J.

AU - Khan, Alyna T.

AU - Laurie, Cecelia A.

AU - Wang, Fei Fei

AU - Wong, Quenna

AU - Chen, Dongquan

AU - D’Augustine, Catherine M.

AU - Heard-Costa, Nancy L.

AU - Hohensee, Chancellor R.

AU - Johnson, William Craig

AU - Juarez, Lucia D.

AU - Liu, Jingmin

AU - Mutalik, Karen M.

AU - Raffield, Laura M.

AU - Wiggins, Kerri L.

AU - de Vries, Paul S.

AU - Kelly, Tanika N.

AU - Kooperberg, Charles

AU - Natarajan, Pradeep

AU - Peloso, Gina M.

AU - Peyser, Patricia A.

AU - Reiner, Alex P.

AU - Arnett, Donna K.

AU - Aslibekyan, Stella

AU - Barnes, Kathleen C.

AU - Bielak, Lawrence F.

AU - Bis, Joshua C.

AU - Cade, Brian E.

AU - Chen, Ming Huei

AU - Correa, Adolfo

AU - Cupples, L. Adrienne

AU - de Andrade, Mariza

AU - Ellinor, Patrick T.

AU - Fornage, Myriam

AU - Franceschini, Nora

AU - Gan, Weiniu

AU - Ganesh, Santhi K.

AU - Graffelman, Jan

AU - Grove, Megan L.

AU - Guo, Xiuqing

AU - Hawley, Nicola L.

AU - Hsu, Wan Ling

AU - Jackson, Rebecca D.

AU - Jaquish, Cashell E.

AU - Johnson, Andrew D.

AU - Kardia, Sharon L.R.

AU - Kelly, Shannon

AU - Lee, Jiwon

AU - Mathias, Rasika A.

AU - McGarvey, Stephen T.

AU - Mitchell, Braxton D.

AU - Montasser, May E.

AU - Morrison, Alanna C.

AU - North, Kari E.

AU - Nouraie, Seyed Mehdi

AU - Oelsner, Elizabeth C.

AU - Pankratz, Nathan

AU - Rich, Stephen S.

AU - Rotter, Jerome I.

AU - Smith, Jennifer A.

AU - Taylor, Kent D.

AU - Vasan, Ramachandran S.

AU - Weeks, Daniel E.

AU - Weiss, Scott T.

AU - Wilson, Carla G.

AU - Yanek, Lisa R.

AU - Psaty, Bruce M.

AU - Heckbert, Susan R.

AU - Laurie, Cathy C.

PY - 2021

Y1 - 2021

N2 - Genotype-phenotype association studies often combine phenotype data from multiple studies to increase statistical power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data-set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data-sharing mechanisms. This system was developed for the National Heart, Lung, and Blood Institute’s Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other -omics data for more than 80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants (recruited in 1948–2012) from up to 17 studies per phenotype. Here we discuss challenges in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include 1) the software code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies, and 2) the results of labeling thousands of phenotype variables with controlled vocabulary terms.

KW - Cardiovascular disease

KW - Common data elements

KW - Hematologic disease

KW - Information dissemination

KW - Lung diseases

KW - Phenotypes

KW - Sleep-wake disorders

UR - http://www.scopus.com/inward/record.url?scp=85117878335&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85117878335&partnerID=8YFLogxK

U2 - 10.1093/aje/kwab115

DO - 10.1093/aje/kwab115

M3 - Article

C2 - 33861317

AN - SCOPUS:85117878335

SN - 0002-9262

VL - 190

SP - 1977

EP - 1992

JO - American journal of epidemiology

JF - American journal of epidemiology

IS - 10

ER -
