Patient-level temporal aggregation for text-based asthma status ascertainment

Research output: Contribution to journalArticle

11 Citations (Scopus)

Abstract

Objective: To specify the problem of patient-level temporal aggregation from clinical text and introduce several probabilistic methods for addressing that problem. The patient-level perspective differs from the prevailing natural language processing (NLP) practice of evaluating at the term, event, sentence, document, or visit level. Methods: We utilized an existing pediatric asthma cohort with manual annotations. After generating a basic feature set via standard clinical NLP methods, we introduce six methods of aggregating time-distributed features from the document level to the patient level. These aggregation methods are used to classify patients according to their asthma status in two hypothetical settings: retrospective epidemiology and clinical decision support. Results: In both settings, solid patient classification performance was obtained with machine learning algorithms on a number of evidence aggregation methods, with Sum aggregation obtaining the highest F1 score of 85.71% on the retrospective epidemiological setting, and a probability density function-based method obtaining the highest F1 score of 74.63% on the clinical decision support setting. Multiple techniques also estimated the diagnosis date (index date) of asthma with promising accuracy. Discussion: The clinical decision support setting is a more difficult problem. We rule out some aggregation methods rather than determining the best overall aggregation method, since our preliminary data set represented a practical setting in which manually annotated data were limited. Conclusion: Results contrasted the strengths of several aggregation algorithms in different settings. Multiple approaches exhibited good patient classification performance, and also predicted the timing of estimates with reasonable accuracy.

Original languageEnglish (US)
Pages (from-to)876-884
Number of pages9
JournalJournal of the American Medical Informatics Association
Volume21
Issue number5
DOIs
StatePublished - 2014

Fingerprint

Asthma
Clinical Decision Support Systems
Natural Language Processing
Epidemiology
Pediatrics

ASJC Scopus subject areas

  • Health Informatics

Cite this

@article{6783070e0b434bc1a7de8f848096ea36,
title = "Patient-level temporal aggregation for text-based asthma status ascertainment",
abstract = "Objective: To specify the problem of patient-level temporal aggregation from clinical text and introduce several probabilistic methods for addressing that problem. The patient-level perspective differs from the prevailing natural language processing (NLP) practice of evaluating at the term, event, sentence, document, or visit level. Methods: We utilized an existing pediatric asthma cohort with manual annotations. After generating a basic feature set via standard clinical NLP methods, we introduce six methods of aggregating time-distributed features from the document level to the patient level. These aggregation methods are used to classify patients according to their asthma status in two hypothetical settings: retrospective epidemiology and clinical decision support. Results: In both settings, solid patient classification performance was obtained with machine learning algorithms on a number of evidence aggregation methods, with Sum aggregation obtaining the highest F1 score of 85.71{\%} on the retrospective epidemiological setting, and a probability density function-based method obtaining the highest F1 score of 74.63{\%} on the clinical decision support setting. Multiple techniques also estimated the diagnosis date (index date) of asthma with promising accuracy. Discussion: The clinical decision support setting is a more difficult problem. We rule out some aggregation methods rather than determining the best overall aggregation method, since our preliminary data set represented a practical setting in which manually annotated data were limited. Conclusion: Results contrasted the strengths of several aggregation algorithms in different settings. Multiple approaches exhibited good patient classification performance, and also predicted the timing of estimates with reasonable accuracy.",
author = "Wu, {Stephen T.} and Juhn, {Young J} and Sunghwan Sohn and Liu, {Hongfang D}",
year = "2014",
doi = "10.1136/amiajnl-2013-002463",
language = "English (US)",
volume = "21",
pages = "876--884",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "5",

}

TY - JOUR

T1 - Patient-level temporal aggregation for text-based asthma status ascertainment

AU - Wu, Stephen T.

AU - Juhn, Young J

AU - Sohn, Sunghwan

AU - Liu, Hongfang D

PY - 2014

Y1 - 2014

N2 - Objective: To specify the problem of patient-level temporal aggregation from clinical text and introduce several probabilistic methods for addressing that problem. The patient-level perspective differs from the prevailing natural language processing (NLP) practice of evaluating at the term, event, sentence, document, or visit level. Methods: We utilized an existing pediatric asthma cohort with manual annotations. After generating a basic feature set via standard clinical NLP methods, we introduce six methods of aggregating time-distributed features from the document level to the patient level. These aggregation methods are used to classify patients according to their asthma status in two hypothetical settings: retrospective epidemiology and clinical decision support. Results: In both settings, solid patient classification performance was obtained with machine learning algorithms on a number of evidence aggregation methods, with Sum aggregation obtaining the highest F1 score of 85.71% on the retrospective epidemiological setting, and a probability density function-based method obtaining the highest F1 score of 74.63% on the clinical decision support setting. Multiple techniques also estimated the diagnosis date (index date) of asthma with promising accuracy. Discussion: The clinical decision support setting is a more difficult problem. We rule out some aggregation methods rather than determining the best overall aggregation method, since our preliminary data set represented a practical setting in which manually annotated data were limited. Conclusion: Results contrasted the strengths of several aggregation algorithms in different settings. Multiple approaches exhibited good patient classification performance, and also predicted the timing of estimates with reasonable accuracy.

AB - Objective: To specify the problem of patient-level temporal aggregation from clinical text and introduce several probabilistic methods for addressing that problem. The patient-level perspective differs from the prevailing natural language processing (NLP) practice of evaluating at the term, event, sentence, document, or visit level. Methods: We utilized an existing pediatric asthma cohort with manual annotations. After generating a basic feature set via standard clinical NLP methods, we introduce six methods of aggregating time-distributed features from the document level to the patient level. These aggregation methods are used to classify patients according to their asthma status in two hypothetical settings: retrospective epidemiology and clinical decision support. Results: In both settings, solid patient classification performance was obtained with machine learning algorithms on a number of evidence aggregation methods, with Sum aggregation obtaining the highest F1 score of 85.71% on the retrospective epidemiological setting, and a probability density function-based method obtaining the highest F1 score of 74.63% on the clinical decision support setting. Multiple techniques also estimated the diagnosis date (index date) of asthma with promising accuracy. Discussion: The clinical decision support setting is a more difficult problem. We rule out some aggregation methods rather than determining the best overall aggregation method, since our preliminary data set represented a practical setting in which manually annotated data were limited. Conclusion: Results contrasted the strengths of several aggregation algorithms in different settings. Multiple approaches exhibited good patient classification performance, and also predicted the timing of estimates with reasonable accuracy.

UR - http://www.scopus.com/inward/record.url?scp=84906315639&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84906315639&partnerID=8YFLogxK

U2 - 10.1136/amiajnl-2013-002463

DO - 10.1136/amiajnl-2013-002463

M3 - Article

VL - 21

SP - 876

EP - 884

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 5

ER -