Evaluating the impact of data representation on EHR-based analytic tasks

Wonsuk Oh, Michael S. Steinbach, M. Regina Castro, Kevin A. Peterson, Vipin Kumar, Pedro Caraballo, Gyorgy J. Simona

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Different analytic techniques operate optimally with different types of data. As the use of EHR-based analytics expands to newer tasks, data will have to be transformed into different representations so that the tasks can be solved optimally. We classified representations into broad categories based on their characteristics, and proposed a new knowledge-driven representation for clinical data mining as well as trajectory mining, called Severity Encoding Variables (SEVs). Additionally, we studied which characteristics make representations most suitable for particular clinical analytics tasks, including trajectory mining. Our evaluation shows that, for regression, most data representations performed similarly, with SEV achieving a slight (albeit statistically significant) advantage. For patients at high risk of diabetes, SEV outperformed the competing representation by 20% (relative). For association mining, SEV achieved the highest performance. Its ability to constrain the search space of patterns through clinical knowledge was key to its success.

Original language: English (US)
Title of host publication: MEDINFO 2019
Subtitle of host publication: Health and Wellbeing e-Networks for All - Proceedings of the 17th World Congress on Medical and Health Informatics
Editors: Brigitte Seroussi, Lucila Ohno-Machado
Publisher: IOS Press
Pages: 288-292
Number of pages: 5
ISBN (Electronic): 9781643680026
DOIs: https://doi.org/10.3233/SHTI190229
State: Published - Aug 21 2019
Event: 17th World Congress on Medical and Health Informatics, MEDINFO 2019 - Lyon, France
Duration: Aug 25 2019 to Aug 30 2019

Publication series

Name: Studies in Health Technology and Informatics
Volume: 264
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365

Conference

Conference: 17th World Congress on Medical and Health Informatics, MEDINFO 2019
Country: France
City: Lyon
Period: 8/25/19 to 8/30/19

Fingerprint

Trajectories
Data Mining
Knowledge representation
Medical problems

Keywords

  • Data Mining
  • Data Science
  • Electronic Health Records

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

Cite this

Oh, W., Steinbach, M. S., Castro, M. R., Peterson, K. A., Kumar, V., Caraballo, P., & Simona, G. J. (2019). Evaluating the impact of data representation on EHR-based analytic tasks. In B. Seroussi & L. Ohno-Machado (Eds.), MEDINFO 2019: Health and Wellbeing e-Networks for All - Proceedings of the 17th World Congress on Medical and Health Informatics (pp. 288-292). (Studies in Health Technology and Informatics; Vol. 264). IOS Press. https://doi.org/10.3233/SHTI190229

@inproceedings{63388cde5b1c49b5ab985081634a736d,
title = "Evaluating the impact of data representation on EHR-based analytic tasks",
abstract = "Different analytic techniques operate optimally with different types of data. As the use of EHR-based analytics expands to newer tasks, data will have to be transformed into different representations, so the tasks can be optimally solved. We classified representations into broad categories based on their characteristics, and proposed a new knowledge-driven representation for clinical data mining as well as trajectory mining, called Severity Encoding Variables (SEVs). Additionally, we studied which characteristics make representations most suitable for particular clinical analytics tasks including trajectory mining. Our evaluation shows that, for regression, most data representations performed similarly, with SEV achieving a slight (albeit statistically significant) advantage. For patients at high risk of diabetes, it outperformed the competing representation by (relative) 20{\%}. For association mining, SEV achieved the highest performance. Its ability to constrain the search space of patterns through clinical knowledge was key to its success.",
keywords = "Data Mining, Data Science, Electronic Health Records",
author = "Wonsuk Oh and Steinbach, {Michael S.} and Castro, {M. Regina} and Peterson, {Kevin A.} and Vipin Kumar and Pedro Caraballo and Simona, {Gyorgy J.}",
year = "2019",
month = "8",
day = "21",
doi = "10.3233/SHTI190229",
language = "English (US)",
series = "Studies in Health Technology and Informatics",
publisher = "IOS Press",
pages = "288--292",
editor = "Brigitte Seroussi and Lucila Ohno-Machado",
booktitle = "MEDINFO 2019",

}

TY - GEN

T1 - Evaluating the impact of data representation on EHR-based analytic tasks

AU - Oh, Wonsuk

AU - Steinbach, Michael S.

AU - Castro, M. Regina

AU - Peterson, Kevin A.

AU - Kumar, Vipin

AU - Caraballo, Pedro

AU - Simona, Gyorgy J.

PY - 2019/8/21

Y1 - 2019/8/21

N2 - Different analytic techniques operate optimally with different types of data. As the use of EHR-based analytics expands to newer tasks, data will have to be transformed into different representations, so the tasks can be optimally solved. We classified representations into broad categories based on their characteristics, and proposed a new knowledge-driven representation for clinical data mining as well as trajectory mining, called Severity Encoding Variables (SEVs). Additionally, we studied which characteristics make representations most suitable for particular clinical analytics tasks including trajectory mining. Our evaluation shows that, for regression, most data representations performed similarly, with SEV achieving a slight (albeit statistically significant) advantage. For patients at high risk of diabetes, it outperformed the competing representation by (relative) 20%. For association mining, SEV achieved the highest performance. Its ability to constrain the search space of patterns through clinical knowledge was key to its success.

KW - Data Mining

KW - Data Science

KW - Electronic Health Records

UR - http://www.scopus.com/inward/record.url?scp=85071512591&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071512591&partnerID=8YFLogxK

U2 - 10.3233/SHTI190229

DO - 10.3233/SHTI190229

M3 - Conference contribution

C2 - 31437931

AN - SCOPUS:85071512591

T3 - Studies in Health Technology and Informatics

SP - 288

EP - 292

BT - MEDINFO 2019

A2 - Seroussi, Brigitte

A2 - Ohno-Machado, Lucila

PB - IOS Press

ER -