Automatic extraction and assessment of lifestyle exposures for Alzheimer's disease using natural language processing

Research output: Contribution to journal (Article)

Abstract

Introduction: Previous biomedical studies identified many lifestyle exposures that could possibly represent risk factors for dementia in general or dementia due to Alzheimer's disease (AD). These lifestyle exposures are mainly mentioned in free-text electronic health records (EHRs). However, automatic extraction and assessment of these exposures using EHRs remain understudied.

Methods: A natural language processing (NLP) approach was adopted to extract lifestyle exposures and intervention strategies from the clinical notes of 260 patients with clinical diagnoses of AD dementia and 260 age-matched cognitively unimpaired persons. Statistics of lifestyle exposures were compared between these two groups. The mapping results of the NLP extraction were evaluated by comparing the results with data captured independently by clinicians.

Results: Thirty of the fifty-five potentially relevant lifestyle exposures were mentioned in our clinical note dataset. Twenty-two dietary factors and three types of substance abuse that were potentially relevant were not found in the clinical notes. Patients with AD dementia were exposed to significantly more of the potential risk factors than the cognitively unimpaired subjects (χ² = 120.31, p-value < 0.001). The average accuracy of the automated extraction was 74.0% in comparison with the manual review of 50 randomly selected sample documents.

Discussion and conclusion: We illustrated the feasibility of NLP techniques for the automated evaluation of a large number of lifestyle habits using free-text EHR data. We found that AD dementia patients were exposed to more of the potential risk factors than the comparison group. Our results also demonstrated the feasibility and accuracy of investigating putative risk factors using NLP techniques.

Original language: English (US)
Article number: 103943
Journal: International Journal of Medical Informatics
Volume: 130
DOI: 10.1016/j.ijmedinf.2019.08.003
State: Published - Oct 1 2019

Fingerprint

Natural Language Processing
Life Style
Alzheimer Disease
Electronic Health Records
Dementia
Habits
Substance-Related Disorders

Keywords

  • Alzheimer's disease
  • Electronic health records
  • Lifestyle exposure
  • Natural language processing

ASJC Scopus subject areas

  • Health Informatics

Cite this

@article{25ac55a6892545eda662bad4096522a2,
title = "Automatic extraction and assessment of lifestyle exposures for Alzheimer's disease using natural language processing",
abstract = "Introduction: Previous biomedical studies identified many lifestyle exposures that could possibly represent risk factors for dementia in general or dementia due to Alzheimer's disease (AD). These lifestyle exposures are mainly mentioned in free-text electronic health records (EHRs). However, automatic extraction and assessment of these exposures using EHRs remain understudied. Methods: A natural language processing (NLP) approach was adopted to extract lifestyle exposures and intervention strategies from the clinical notes of 260 patients with clinical diagnoses of AD dementia and 260 age-matched cognitively unimpaired persons. Statistics of lifestyle exposures were compared between these two groups. The mapping results of the NLP extraction were evaluated by comparing the results with data captured independently by clinicians. Results: Thirty of the fifty-five potentially relevant lifestyle exposures were mentioned in our clinical note dataset. Twenty-two dietary factors and three types of substance abuse that were potentially relevant were not found in the clinical notes. Patients with AD dementia were exposed to significantly more of the potential risk factors than the cognitively unimpaired subjects ($\chi^2$ = 120.31, p-value $<$ 0.001). The average accuracy of the automated extraction was 74.0{\%} in comparison with the manual review of 50 randomly selected sample documents. Discussion and conclusion: We illustrated the feasibility of NLP techniques for the automated evaluation of a large number of lifestyle habits using free-text EHR data. We found that AD dementia patients were exposed to more of the potential risk factors than the comparison group. Our results also demonstrated the feasibility and accuracy of investigating putative risk factors using NLP techniques.",
keywords = "Alzheimer's disease, Electronic health records, Lifestyle exposure, Natural language processing",
author = "Xin Zhou and Yanshan Wang and Sunghwan Sohn and Therneau, {Terry M} and Liu, {Hongfang D} and Knopman, {David S}",
year = "2019",
month = "10",
day = "1",
doi = "10.1016/j.ijmedinf.2019.08.003",
language = "English (US)",
volume = "130",
journal = "International Journal of Medical Informatics",
issn = "1386-5056",
publisher = "Elsevier Ireland Ltd",

}

TY  - JOUR
T1  - Automatic extraction and assessment of lifestyle exposures for Alzheimer's disease using natural language processing
AU  - Zhou, Xin
AU  - Wang, Yanshan
AU  - Sohn, Sunghwan
AU  - Therneau, Terry M
AU  - Liu, Hongfang D
AU  - Knopman, David S
PY  - 2019/10/1
Y1  - 2019/10/1
AB  - Introduction: Previous biomedical studies identified many lifestyle exposures that could possibly represent risk factors for dementia in general or dementia due to Alzheimer's disease (AD). These lifestyle exposures are mainly mentioned in free-text electronic health records (EHRs). However, automatic extraction and assessment of these exposures using EHRs remain understudied. Methods: A natural language processing (NLP) approach was adopted to extract lifestyle exposures and intervention strategies from the clinical notes of 260 patients with clinical diagnoses of AD dementia and 260 age-matched cognitively unimpaired persons. Statistics of lifestyle exposures were compared between these two groups. The mapping results of the NLP extraction were evaluated by comparing the results with data captured independently by clinicians. Results: Thirty of the fifty-five potentially relevant lifestyle exposures were mentioned in our clinical note dataset. Twenty-two dietary factors and three types of substance abuse that were potentially relevant were not found in the clinical notes. Patients with AD dementia were exposed to significantly more of the potential risk factors than the cognitively unimpaired subjects (χ² = 120.31, p-value < 0.001). The average accuracy of the automated extraction was 74.0% in comparison with the manual review of 50 randomly selected sample documents. Discussion and conclusion: We illustrated the feasibility of NLP techniques for the automated evaluation of a large number of lifestyle habits using free-text EHR data. We found that AD dementia patients were exposed to more of the potential risk factors than the comparison group. Our results also demonstrated the feasibility and accuracy of investigating putative risk factors using NLP techniques.
KW  - Alzheimer's disease
KW  - Electronic health records
KW  - Lifestyle exposure
KW  - Natural language processing
UR  - http://www.scopus.com/inward/record.url?scp=85071516008&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85071516008&partnerID=8YFLogxK
U2  - 10.1016/j.ijmedinf.2019.08.003
DO  - 10.1016/j.ijmedinf.2019.08.003
M3  - Article
C2  - 31476655
AN  - SCOPUS:85071516008
VL  - 130
JO  - International Journal of Medical Informatics
JF  - International Journal of Medical Informatics
SN  - 1386-5056
M1  - 103943
ER  -