Comprehensive temporal information detection from clinical text

Medical events, time, and TLINK identification

Sunghwan Sohn, Kavishwar B. Wagholikar, Dingcheng Li, Siddhartha R. Jonnalagadda, Cui Tao, Ravikumar Komandur Elayavilli, Hongfang D Liu

Research output: Contribution to journal (Article)

24 Citations (Scopus)

Abstract

Background: Temporal information detection systems were developed by the Mayo Clinic for the 2012 i2b2 Natural Language Processing Challenge.

Objective: To construct automated systems for EVENT/TIMEX3 extraction and temporal link (TLINK) identification from clinical text.

Materials and methods: The i2b2 organizers provided 190 annotated discharge summaries as the training set and 120 discharge summaries as the test set. Our Event system used a conditional random field classifier with a variety of features, including lexical information, natural language elements, and medical ontology. The TIMEX3 system employed a rule-based method using regular expression pattern matching and systematic reasoning to determine normalized values. The TLINK system employed both rule-based reasoning and machine learning. All three systems were built in an Apache Unstructured Information Management Architecture (UIMA) framework.

Results: Our TIMEX3 system performed the best (F-measure of 0.900, value accuracy 0.731) among the challenge teams. The Event system produced an F-measure of 0.870, and the TLINK system an F-measure of 0.537.

Conclusions: Our TIMEX3 system demonstrated that regular expression rules can extract and normalize time information well. Event and TLINK machine learning systems required well-defined feature sets to perform well. We could also leverage expert knowledge as part of the machine learning features to further improve TLINK identification performance.
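As a rough illustration of the rule-based TIMEX3 approach described in the abstract (regular expression matching followed by reasoning to a normalized value), the sketch below extracts a few date expressions and normalizes them to ISO 8601 strings. It is not the authors' rule set: the three patterns, the function name extract_timex3, and the use of an admission date as the anchor for relative expressions are all assumptions made for this example.

```python
import re
from datetime import date, timedelta

# Hypothetical patterns for illustration only; not the rules used in the paper.
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}
MDY = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
MONTH_DAY_YEAR = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),?\s+(\d{4})\b",
    re.IGNORECASE,
)
RELATIVE = re.compile(r"\b(\d+)\s+days?\s+(?:ago|prior to admission)\b", re.IGNORECASE)


def _iso(year, month, day):
    """Return an ISO 8601 date string, or None if the date is not valid."""
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None


def extract_timex3(text, admission):
    """Return (matched text, normalized value) pairs in order of appearance.

    Relative expressions are resolved against the assumed anchor `admission`.
    """
    found = []
    for m in MDY.finditer(text):
        month, day, year = map(int, m.groups())
        found.append((m.start(), m.group(0), _iso(year, month, day)))
    for m in MONTH_DAY_YEAR.finditer(text):
        month = MONTHS[m.group(1).lower()]
        found.append((m.start(), m.group(0), _iso(int(m.group(3)), month, int(m.group(2)))))
    for m in RELATIVE.finditer(text):
        value = (admission - timedelta(days=int(m.group(1)))).isoformat()
        found.append((m.start(), m.group(0), value))
    found.sort(key=lambda item: item[0])
    # Drop matches that look like dates but are not valid calendar dates.
    return [(span, value) for _, span, value in found if value is not None]


note = "Admitted on 03/14/2012. Symptoms began 3 days ago. Discharged March 18, 2012."
for span, value in extract_timex3(note, date(2012, 3, 14)):
    print(span, "->", value)
```

A real system would need many more patterns (durations, frequencies, partial dates) and a more careful choice of anchor date; the sketch only shows how matching and normalization can be kept as two separate steps.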

Original language: English (US)
Pages (from-to): 836-842
Number of pages: 7
Journal: Journal of the American Medical Informatics Association
Volume: 20
Issue number: 5
DOI: 10.1136/amiajnl-2013-001622
State: Published - 2013

Fingerprint

Natural Language Processing
Information Management
Information Systems
Language
Machine Learning

ASJC Scopus subject areas

  • Health Informatics

Cite this

Comprehensive temporal information detection from clinical text: Medical events, time, and TLINK identification. / Sohn, Sunghwan; Wagholikar, Kavishwar B.; Li, Dingcheng; Jonnalagadda, Siddhartha R.; Tao, Cui; Elayavilli, Ravikumar Komandur; Liu, Hongfang D.

In: Journal of the American Medical Informatics Association, Vol. 20, No. 5, 2013, p. 836-842.

Sohn, Sunghwan; Wagholikar, Kavishwar B.; Li, Dingcheng; Jonnalagadda, Siddhartha R.; Tao, Cui; Elayavilli, Ravikumar Komandur; Liu, Hongfang D. / Comprehensive temporal information detection from clinical text: Medical events, time, and TLINK identification. In: Journal of the American Medical Informatics Association. 2013; Vol. 20, No. 5. pp. 836-842.
@article{66f113c8b7f34cd786a51c9d4e78095a,
title = "Comprehensive temporal information detection from clinical text: Medical events, time, and TLINK identification",
author = "Sunghwan Sohn and Wagholikar, {Kavishwar B.} and Dingcheng Li and Jonnalagadda, {Siddhartha R.} and Cui Tao and Elayavilli, {Ravikumar Komandur} and Liu, {Hongfang D}",
year = "2013",
doi = "10.1136/amiajnl-2013-001622",
language = "English (US)",
volume = "20",
pages = "836--842",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "5",

}

TY - JOUR

T1 - Comprehensive temporal information detection from clinical text

T2 - Medical events, time, and TLINK identification

AU - Sohn, Sunghwan

AU - Wagholikar, Kavishwar B.

AU - Li, Dingcheng

AU - Jonnalagadda, Siddhartha R.

AU - Tao, Cui

AU - Elayavilli, Ravikumar Komandur

AU - Liu, Hongfang D

PY - 2013

Y1 - 2013

UR - http://www.scopus.com/inward/record.url?scp=84881183249&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84881183249&partnerID=8YFLogxK

U2 - 10.1136/amiajnl-2013-001622

DO - 10.1136/amiajnl-2013-001622

M3 - Article

VL - 20

SP - 836

EP - 842

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 5

ER -