Integrated cTAKES for concept mention detection and normalization

Hongfang Liu; Kavishwar Wagholikar; Siddhartha Jonnalagadda; Sunghwan Sohn

Research output: Contribution to journal › Conference article › peer-review

Abstract

We participated in Task 1 using an existing system, MedTagger, implemented in integrated cTAKES (icTAKES). Concept mention detection is based on Conditional Random Fields (CRF), and concept mention normalization is based on a greedy dictionary lookup algorithm. A distinctive feature of MedTagger compared with other concept mention detection systems is the incorporation of dictionary lookup results into a machine learning framework for sequential labeling. Dictionary lookup results from MedLex and semantic vectors representing distributed semantics were used as features. Overall, the precision, recall, and F-measure of our best run for concept mention detection are 0.8, 0.573, and 0.668, respectively, for strict evaluation and 0.939, 0.766, and 0.844 for relaxed evaluation. The accuracy of our best run for concept mention normalization is 54.6% and 87.0% for strict and relaxed mapping, respectively.
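As a rough illustration of two ideas in the abstract, the following Python sketch shows (a) how the reported F-measures follow from the reported precision and recall, and (b) one common reading of a greedy dictionary lookup, namely left-to-right longest match, with the matches turned into B/I/O tags of the kind that could be fed to a sequence labeler as features. This is not the MedTagger implementation: the function names, the toy dictionary, the placeholder identifiers (`ID-001` and so on), and the `max_len` parameter are all assumptions made for the example, and the abstract does not say which matching strategy or feature encoding the system actually uses.

```python
from typing import Dict, List, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def greedy_lookup(
    tokens: List[str],
    dictionary: Dict[Tuple[str, ...], str],
    max_len: int = 5,
) -> List[Tuple[int, int, str]]:
    """Left-to-right longest-match lookup (an assumed form of 'greedy').

    Returns (start, end, concept_id) spans, with end exclusive.
    """
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in dictionary:
                match = (i, i + n, dictionary[key])
                break
        if match:
            spans.append(match)
            i = match[1]  # skip past the matched phrase
        else:
            i += 1
    return spans


def lookup_tags(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Encode lookup spans as per-token B/I/O tags (hypothetical feature encoding)."""
    tags = ["O"] * n_tokens
    for start, end, _ in spans:
        tags[start] = "B"
        for j in range(start + 1, end):
            tags[j] = "I"
    return tags


# Hypothetical toy dictionary; the identifiers are placeholders, not real concept IDs.
TOY_DICTIONARY = {
    ("chest", "pain"): "ID-001",
    ("pain",): "ID-002",
    ("shortness", "of", "breath"): "ID-003",
}

tokens = "Patient denies chest pain and shortness of breath".split()
spans = greedy_lookup(tokens, TOY_DICTIONARY)
tags = lookup_tags(len(tokens), spans)

print(spans)
print(tags)
print(round(f_measure(0.8, 0.573), 3))    # 0.668 (strict, as reported)
print(round(f_measure(0.939, 0.766), 3))  # 0.844 (relaxed, as reported)
```

In this sketch "chest pain" is matched as a whole phrase, so the shorter entry "pain" is never considered at that position; that preference for the longest match is the design choice that makes the lookup greedy.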

Original language	English (US)
Journal	CEUR Workshop Proceedings
Volume	1179
State	Published - 2013
Event	2013 Cross Language Evaluation Forum Conference, CLEF 2013 - Valencia, Spain
Duration	Sep 23 2013 → Sep 26 2013

Keywords

Conditional random fields
Dictionary lookup
Distributed semantics
Named entity recognition
Normalization

ASJC Scopus subject areas

General Computer Science

Cite this

@article{acebdcb129d940efb0c90bfb6c098829,

title = "Integrated cTAKES for concept mention detection and normalization",

abstract = "We participated in Task 1 using an existing system, MedTagger, implemented in integrated cTAKES (icTAKES). Concept mention detection is based on Conditional Random Fields (CRF), and concept mention normalization is based on a greedy dictionary lookup algorithm. A distinctive feature of MedTagger compared with other concept mention detection systems is the incorporation of dictionary lookup results into a machine learning framework for sequential labeling. Dictionary lookup results from MedLex and semantic vectors representing distributed semantics were used as features. Overall, the precision, recall, and F-measure of our best run for concept mention detection are 0.8, 0.573, and 0.668, respectively, for strict evaluation and 0.939, 0.766, and 0.844 for relaxed evaluation. The accuracy of our best run for concept mention normalization is 54.6% and 87.0% for strict and relaxed mapping, respectively.",

keywords = "Conditional random fields, Dictionary lookup, Distributed semantics, Named entity recognition, Normalization",

author = "Hongfang Liu and Kavishwar Wagholikar and Siddhartha Jonnalagadda and Sunghwan Sohn",

year = "2013",

language = "English (US)",

volume = "1179",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

note = "2013 Cross Language Evaluation Forum Conference, CLEF 2013 ; Conference date: 23-09-2013 Through 26-09-2013",

}

TY - JOUR

T1 - Integrated cTAKES for concept mention detection and normalization

AU - Liu, Hongfang

AU - Wagholikar, Kavishwar

AU - Jonnalagadda, Siddhartha

AU - Sohn, Sunghwan

PY - 2013

Y1 - 2013

AB - We participated in Task 1 using an existing system, MedTagger, implemented in integrated cTAKES (icTAKES). Concept mention detection is based on Conditional Random Fields (CRF), and concept mention normalization is based on a greedy dictionary lookup algorithm. A distinctive feature of MedTagger compared with other concept mention detection systems is the incorporation of dictionary lookup results into a machine learning framework for sequential labeling. Dictionary lookup results from MedLex and semantic vectors representing distributed semantics were used as features. Overall, the precision, recall, and F-measure of our best run for concept mention detection are 0.8, 0.573, and 0.668, respectively, for strict evaluation and 0.939, 0.766, and 0.844 for relaxed evaluation. The accuracy of our best run for concept mention normalization is 54.6% and 87.0% for strict and relaxed mapping, respectively.

KW - Conditional random fields

KW - Dictionary lookup

KW - Distributed semantics

KW - Named entity recognition

KW - Normalization

UR - http://www.scopus.com/inward/record.url?scp=84922041543&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84922041543&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84922041543

SN - 1613-0073

VL - 1179

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - 2013 Cross Language Evaluation Forum Conference, CLEF 2013

Y2 - 23 September 2013 through 26 September 2013

ER -
