Objective: To specify the problem of patient-level temporal aggregation from clinical text and introduce several probabilistic methods for addressing that problem. The patient-level perspective differs from the prevailing natural language processing (NLP) practice of evaluating at the term, event, sentence, document, or visit level. Methods: We utilized an existing pediatric asthma cohort with manual annotations. After generating a basic feature set via standard clinical NLP methods, we introduce six methods of aggregating time-distributed features from the document level to the patient level. These aggregation methods are used to classify patients according to their asthma status in two hypothetical settings: retrospective epidemiology and clinical decision support. Results: In both settings, solid patient classification performance was obtained with machine learning algorithms on a number of evidence aggregation methods, with Sum aggregation obtaining the highest F1 score of 85.71% on the retrospective epidemiological setting, and a probability density function-based method obtaining the highest F1 score of 74.63% on the clinical decision support setting. Multiple techniques also estimated the diagnosis date (index date) of asthma with promising accuracy. Discussion: The clinical decision support setting is a more difficult problem. We rule out some aggregation methods rather than determining the best overall aggregation method, since our preliminary data set represented a practical setting in which manually annotated data were limited. Conclusion: Results contrasted the strengths of several aggregation algorithms in different settings. Multiple approaches exhibited good patient classification performance, and also predicted the timing of estimates with reasonable accuracy.
|Original language||English (US)|
|Number of pages||9|
|Journal||Journal of the American Medical Informatics Association|
|State||Published - Jan 1 2014|
ASJC Scopus subject areas
- Health Informatics