Ascertainment of Delirium Status Using Natural Language Processing from Electronic Health Records

Sunyang Fu, Guilherme S. Lopes, Sandeep R. Pagali, Bjoerg Thorsteinsdottir, Nathan K. Lebrasseur, Andrew Wen, Hongfang Liu, Walter A. Rocca, Janet E. Olson, Jennifer St. Sauver, Sunghwan Sohn

Research output: Contribution to journal › Article › peer-review


Background: Delirium is underdiagnosed in clinical practice and is not routinely coded for billing. Manual chart review can identify the occurrence of delirium, but it is labor-intensive and impractical for large-scale studies. Natural language processing (NLP) can process raw text in electronic health records (EHRs) and determine the meaning of the information it contains. We developed and validated NLP algorithms to automatically identify the occurrence of delirium from EHRs.

Methods: This study used a randomly selected cohort from the population-based Mayo Clinic Biobank (N = 300, age ≥65). We adopted the confusion assessment method (CAM), a standardized evidence-based framework, to develop and evaluate NLP algorithms that identify the occurrence of delirium from clinical notes in EHRs. Two NLP algorithms were developed based on CAM criteria: one based on the original CAM (NLP-CAM; delirium vs no delirium) and another based on our modified CAM (NLP-mCAM; definite, possible, and no delirium). Sensitivity, specificity, and accuracy were used to measure concordance in delirium status between the NLP algorithms and manual chart review, which served as the gold standard. The prevalence of delirium was examined using International Classification of Diseases, 9th Revision (ICD-9) codes, NLP-CAM, and NLP-mCAM.

Results: NLP-CAM demonstrated a sensitivity, specificity, and accuracy of 0.919, 1.000, and 0.967, respectively. NLP-mCAM demonstrated a sensitivity, specificity, and accuracy of 0.827, 0.913, and 0.827, respectively. In the prevalence analysis, the NLP-CAM algorithm identified 12 651 (9.4%) delirium patients, while the NLP-mCAM algorithm identified 20 611 (15.3%) definite delirium cases and 10 762 (8.0%) possible cases.

Conclusions: NLP algorithms based on the standardized evidence-based CAM framework demonstrated high performance in delineating delirium status in an expeditious and cost-effective manner.
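The evaluation described above can be illustrated with a minimal Python sketch. It assumes the four CAM features have already been extracted from notes as booleans (the NLP extraction itself is not shown), and it applies the standard published CAM rule (features 1 and 2, plus either 3 or 4); whether NLP-CAM combines features exactly this way is an assumption. The names `CamFeatures`, `cam_positive`, and `concordance` are hypothetical, and the four patients are toy data, not study results. Only the binary NLP-CAM output is covered; the three-level NLP-mCAM scheme is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CamFeatures:
    """Hypothetical container for CAM features extracted from clinical notes."""
    acute_onset_or_fluctuating: bool  # CAM feature 1
    inattention: bool                 # CAM feature 2
    disorganized_thinking: bool       # CAM feature 3
    altered_consciousness: bool       # CAM feature 4


def cam_positive(f: CamFeatures) -> bool:
    """Standard CAM rule: features 1 AND 2, plus feature 3 OR feature 4."""
    return (f.acute_onset_or_fluctuating and f.inattention
            and (f.disorganized_thinking or f.altered_consciousness))


def concordance(predicted: list[bool], gold: list[bool]) -> dict[str, float]:
    """Sensitivity, specificity, and accuracy against a gold standard (e.g. chart review)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(p and g for p, g in pairs)          # delirium found by both
    tn = sum(not p and not g for p, g in pairs)  # no delirium per both
    fp = sum(p and not g for p, g in pairs)      # algorithm-only positives
    fn = sum(not p and g for p, g in pairs)      # missed by the algorithm
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
    }


# Toy example (hypothetical patients, not data from the study).
patients = [
    CamFeatures(True, True, True, False),   # meets CAM criteria
    CamFeatures(True, True, False, False),  # neither feature 3 nor 4
    CamFeatures(False, True, True, True),   # no acute onset/fluctuation
    CamFeatures(True, True, False, True),   # meets CAM criteria
]
predicted = [cam_positive(p) for p in patients]
gold = [True, True, False, True]  # toy chart-review labels
print(concordance(predicted, gold))
# → {'sensitivity': 0.6666666666666666, 'specificity': 1.0, 'accuracy': 0.75}
```

Counting the four confusion-matrix cells explicitly keeps each metric's definition visible; a library such as scikit-learn could compute the same quantities.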

Original language: English (US)
Pages (from-to): 524-530
Number of pages: 7
Journal: Journals of Gerontology - Series A Biological Sciences and Medical Sciences
Issue number: 3
State: Published - Mar 1 2022

Keywords

  • Confusion assessment method
  • Delirium
  • Electronic health records
  • Natural language processing

ASJC Scopus subject areas

  • Aging
  • Geriatrics and Gerontology
