Risk factor detection for heart disease by applying text analytics in electronic medical records

Manabu Torii, Jung Wei Fan, Wei li Yang, Theodore Lee, Matthew T. Wiley, Daniel S. Zisook, Yang Huang

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

In the United States, about 600,000 people die of heart disease every year. The annual cost of care services, medications, and lost productivity reportedly exceeds 108.9 billion dollars. Effective disease risk assessment is critical to prevention, care, and treatment planning. Recent advancements in text analytics have opened up new possibilities of using the rich information in electronic medical records (EMRs) to identify relevant risk factors. The 2014 i2b2/UTHealth Challenge brought together researchers and practitioners of clinical natural language processing (NLP) to tackle the identification of heart disease risk factors reported in EMRs. We participated in this track and developed an NLP system by leveraging existing tools and resources, both public and proprietary. Our system was a hybrid of several machine-learning and rule-based components. The system achieved an overall F1 score of 0.9185, with a recall of 0.9409 and a precision of 0.8972.
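The three scores reported in the abstract can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. The snippet below is a minimal sketch of that check; the function name `f1_score` is a hypothetical helper introduced here for illustration and is not part of the authors' system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.8972
reported_recall = 0.9409

f1 = f1_score(reported_precision, reported_recall)
print(f"{f1:.4f}")  # prints 0.9185, matching the reported overall F1
```

The recomputed value agrees with the reported F1 of 0.9185 to four decimal places, so the three figures are internally consistent.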

Original language: English (US)
Pages (from-to): S164-S170
Journal: Journal of Biomedical Informatics
Volume: 58
DOI: 10.1016/j.jbi.2015.08.011
State: Published - Dec 1 2015
Externally published: Yes

Fingerprint

Natural Language Processing
Electronic medical equipment
Electronic Health Records
Heart Diseases
Natural language processing systems
Research Personnel
Costs and Cost Analysis
Risk assessment
Learning systems
Productivity
Planning
Processing
Costs
Machine Learning

Keywords

  • Medical records
  • Natural language processing
  • Risk assessment
  • Text classification

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

Risk factor detection for heart disease by applying text analytics in electronic medical records. / Torii, Manabu; Fan, Jung Wei; Yang, Wei li; Lee, Theodore; Wiley, Matthew T.; Zisook, Daniel S.; Huang, Yang.

In: Journal of Biomedical Informatics, Vol. 58, 01.12.2015, p. S164-S170.

Torii, Manabu ; Fan, Jung Wei ; Yang, Wei li ; Lee, Theodore ; Wiley, Matthew T. ; Zisook, Daniel S. ; Huang, Yang. / Risk factor detection for heart disease by applying text analytics in electronic medical records. In: Journal of Biomedical Informatics. 2015 ; Vol. 58. pp. S164-S170.
@article{d085b7c744624c19b3d40fa5a427d1ef,
title = "Risk factor detection for heart disease by applying text analytics in electronic medical records",
abstract = "In the United States, about 600,000 people die of heart disease every year. The annual cost of care services, medications, and lost productivity reportedly exceeds 108.9 billion dollars. Effective disease risk assessment is critical to prevention, care, and treatment planning. Recent advancements in text analytics have opened up new possibilities of using the rich information in electronic medical records (EMRs) to identify relevant risk factors. The 2014 i2b2/UTHealth Challenge brought together researchers and practitioners of clinical natural language processing (NLP) to tackle the identification of heart disease risk factors reported in EMRs. We participated in this track and developed an NLP system by leveraging existing tools and resources, both public and proprietary. Our system was a hybrid of several machine-learning and rule-based components. The system achieved an overall F1 score of 0.9185, with a recall of 0.9409 and a precision of 0.8972.",
keywords = "Medical records, Natural language processing, Risk assessment, Text classification",
author = "Torii, Manabu and Fan, {Jung Wei} and Yang, {Wei li} and Lee, Theodore and Wiley, {Matthew T.} and Zisook, {Daniel S.} and Huang, Yang",
year = "2015",
month = "12",
day = "1",
doi = "10.1016/j.jbi.2015.08.011",
language = "English (US)",
volume = "58",
pages = "S164--S170",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",
}

TY - JOUR

T1 - Risk factor detection for heart disease by applying text analytics in electronic medical records

AU - Torii, Manabu

AU - Fan, Jung Wei

AU - Yang, Wei li

AU - Lee, Theodore

AU - Wiley, Matthew T.

AU - Zisook, Daniel S.

AU - Huang, Yang

PY - 2015/12/1

Y1 - 2015/12/1

AB - In the United States, about 600,000 people die of heart disease every year. The annual cost of care services, medications, and lost productivity reportedly exceeds 108.9 billion dollars. Effective disease risk assessment is critical to prevention, care, and treatment planning. Recent advancements in text analytics have opened up new possibilities of using the rich information in electronic medical records (EMRs) to identify relevant risk factors. The 2014 i2b2/UTHealth Challenge brought together researchers and practitioners of clinical natural language processing (NLP) to tackle the identification of heart disease risk factors reported in EMRs. We participated in this track and developed an NLP system by leveraging existing tools and resources, both public and proprietary. Our system was a hybrid of several machine-learning and rule-based components. The system achieved an overall F1 score of 0.9185, with a recall of 0.9409 and a precision of 0.8972.

KW - Medical records

KW - Natural language processing

KW - Risk assessment

KW - Text classification

UR - http://www.scopus.com/inward/record.url?scp=84940099418&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84940099418&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2015.08.011

DO - 10.1016/j.jbi.2015.08.011

M3 - Article

C2 - 26279500

AN - SCOPUS:84940099418

VL - 58

SP - S164

EP - S170

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 1532-0464

ER -