A study of transportability of an existing smoking status detection module across institutions.

Mei Liu, Anushi Shah, Min Jiang, Neeraja B. Peterson, Q. Dai, Melinda C. Aldrich, Qingxia Chen, Erica A. Bowton, Hongfang D Liu, Joshua C. Denny, Hua Xu

Research output: Contribution to journal › Article

23 Citations (Scopus)

Abstract

Electronic Medical Records (EMRs) are valuable resources for clinical observational studies. A patient's smoking status is a key factor for many diseases, but it is often embedded in narrative text. Natural language processing (NLP) systems, such as the smoking status detection module in the clinical Text Analysis and Knowledge Extraction System (cTAKES), have been developed for this specific task. This study examined the transportability of the cTAKES smoking module to EMR data from Vanderbilt University Hospital. Our evaluation showed that modest modifications were necessary to achieve the desired performance. We modified the system by filtering notes, annotating new data for training the machine learning classifier, and adding rules to the rule-based classifiers. Our results showed that the customized module achieved significantly higher F-measures at all levels of classification (sentence, document, and patient) than the cTAKES module applied directly to the Vanderbilt data.
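
The abstract reports F-measures at the sentence, document, and patient levels. The following is a minimal Python sketch, not the authors' evaluation code, of how per-label precision, recall, and F1 (the harmonic mean 2PR/(P+R)) can be computed from parallel lists of gold-standard and predicted labels; the same function applies unchanged at any of the three levels. The function name `f_measures`, the label strings, and the unweighted (macro) average are illustrative assumptions; the abstract does not state how the study averaged across classes.

```python
from collections import Counter


def f_measures(gold, predicted):
    """Return per-label (precision, recall, F1) and the macro-averaged F1.

    `gold` and `predicted` are parallel lists with one label per classified
    unit (a sentence, a document, or a patient). Illustrative sketch only.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted label was assigned incorrectly
            fn[g] += 1  # the gold label was missed

    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)

    # Assumption: unweighted (macro) average over all labels seen in either list.
    macro_f1 = sum(s[2] for s in scores.values()) / len(scores) if scores else 0.0
    return scores, macro_f1


if __name__ == "__main__":
    # Hypothetical patient-level labels; not data from the study.
    gold = ["CURRENT_SMOKER", "NON_SMOKER", "PAST_SMOKER", "NON_SMOKER"]
    predicted = ["CURRENT_SMOKER", "NON_SMOKER", "NON_SMOKER", "NON_SMOKER"]
    per_label, macro = f_measures(gold, predicted)
    for label, (p, r, f) in per_label.items():
        print(f"{label}: P={p:.2f} R={r:.2f} F1={f:.2f}")
    print(f"macro F1 = {macro:.2f}")
```

In this toy example, NON_SMOKER gets precision 2/3 and recall 1 (F1 = 0.8), while the single missed PAST_SMOKER case drives that label's F1 to 0, pulling the macro average down to about 0.6.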

Original language: English (US)
Pages (from-to): 577-586
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
Volume: 2012
State: Published - 2012

Fingerprint

Electronic Health Records
Smoking
Natural Language Processing
Observational Studies
Clinical Studies
Machine Learning

ASJC Scopus subject areas

  • Medicine(all)

Cite this

A study of transportability of an existing smoking status detection module across institutions. / Liu, Mei; Shah, Anushi; Jiang, Min; Peterson, Neeraja B.; Dai, Q.; Aldrich, Melinda C.; Chen, Qingxia; Bowton, Erica A.; Liu, Hongfang D; Denny, Joshua C.; Xu, Hua.

In: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, Vol. 2012, 2012, p. 577-586.

Liu, M, Shah, A, Jiang, M, Peterson, NB, Dai, Q, Aldrich, MC, Chen, Q, Bowton, EA, Liu, HD, Denny, JC & Xu, H 2012, 'A study of transportability of an existing smoking status detection module across institutions.', AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, vol. 2012, pp. 577-586.

Liu, Mei ; Shah, Anushi ; Jiang, Min ; Peterson, Neeraja B. ; Dai, Q. ; Aldrich, Melinda C. ; Chen, Qingxia ; Bowton, Erica A. ; Liu, Hongfang D ; Denny, Joshua C. ; Xu, Hua. / A study of transportability of an existing smoking status detection module across institutions. In: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium. 2012 ; Vol. 2012. pp. 577-586.

@article{07f7160a92e24da59d8369870bdb275c,
title = "A study of transportability of an existing smoking status detection module across institutions.",
author = "Mei Liu and Anushi Shah and Min Jiang and Peterson, {Neeraja B.} and Q. Dai and Aldrich, {Melinda C.} and Qingxia Chen and Bowton, {Erica A.} and Liu, {Hongfang D} and Denny, {Joshua C.} and Hua Xu",
year = "2012",
language = "English (US)",
volume = "2012",
pages = "577--586",
journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",
issn = "1559-4076",
publisher = "American Medical Informatics Association",

}

TY - JOUR

T1 - A study of transportability of an existing smoking status detection module across institutions.

AU - Liu, Mei

AU - Shah, Anushi

AU - Jiang, Min

AU - Peterson, Neeraja B.

AU - Dai, Q.

AU - Aldrich, Melinda C.

AU - Chen, Qingxia

AU - Bowton, Erica A.

AU - Liu, Hongfang D

AU - Denny, Joshua C.

AU - Xu, Hua

PY - 2012

Y1 - 2012

UR - http://www.scopus.com/inward/record.url?scp=84880843518&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84880843518&partnerID=8YFLogxK

M3 - Article

VL - 2012

SP - 577

EP - 586

JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

SN - 1559-4076

ER -