Abstract
We participated in both tasks 1a and 1b of the ShARe/CLEF 2013 NLP Challenge, where 1a was on detecting disorder concept boundaries and 1b was on assigning concept IDs to the entities from 1a. An existing NLP system developed at Kaiser Permanente was modified to output concepts close to the Challenge's definition of disorder. The core pipeline involved deterministic section detection, tokenization, sentence chunking, probabilistic POS tagging, rule-based phrase chunking, terminology look-up (using UMLS 2012AB), rule-based concept disambiguation, and post-coordination. The system natively identifies findings (both normal and abnormal), procedures, anatomies, etc., so a post-filter was created to restrict output to concepts with the source (SNOMED) and semantic types expected by the Challenge. A list of frequency-ranked CUIs was extracted from the training corpus to help break ties when multiple concepts were proposed on a single span. However, no retraining or customization was performed to meet the boundary annotation preferences specified in the challenge guidelines. Our best settings achieved an F-score of 0.503 (0.684 with relaxed boundary penalty) in task 1a, and a best accuracy of 0.443 (0.865 on relaxed boundaries) in task 1b.
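The two post-processing steps the abstract describes (the SNOMED/semantic-type post-filter and the frequency-ranked CUI tie-breaker) can be summarized in a minimal Python sketch. The `Candidate` record, the `SNOMEDCT` source label, and the helper names are assumptions for illustration, and the TUI list is an illustrative subset of the UMLS disorder semantic group rather than the Challenge's exact definition.

```python
from dataclasses import dataclass

# Illustrative subset of UMLS semantic types (TUIs) in the disorder group;
# the actual set required by the Challenge definition may differ.
DISORDER_TUIS = {"T019", "T020", "T037", "T046", "T047",
                 "T048", "T049", "T184", "T190", "T191"}

@dataclass
class Candidate:
    cui: str                 # UMLS concept unique identifier
    source: str              # vocabulary the match came from, e.g. "SNOMEDCT"
    tui: str                 # semantic type of the concept
    span: tuple[int, int]    # (start, end) character offsets in the note

def post_filter(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only SNOMED concepts whose semantic type is in the disorder group."""
    return [c for c in candidates
            if c.source == "SNOMEDCT" and c.tui in DISORDER_TUIS]

def break_ties(candidates: list[Candidate],
               training_cui_counts: dict[str, int]) -> list[Candidate]:
    """Where several concepts share one span, keep the CUI that occurred
    most often in the training corpus (frequency-ranked tie-breaking)."""
    by_span: dict[tuple[int, int], list[Candidate]] = {}
    for c in candidates:
        by_span.setdefault(c.span, []).append(c)
    return [max(group, key=lambda c: training_cui_counts.get(c.cui, 0))
            for group in by_span.values()]
```

As a design note, keeping both steps after the main pipeline (rather than retraining it) matches the abstract's statement that the underlying system was reused without customization to the Challenge's boundary annotation preferences.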
Original language | English (US)
---|---
Journal | CEUR Workshop Proceedings
Volume | 1179
State | Published - 2013
Event | 2013 Cross Language Evaluation Forum Conference, CLEF 2013 - Valencia, Spain (Duration: Sep 23 2013 → Sep 26 2013)
Keywords
- Concept boundary detection
- Concept normalization
- Medical language processing
ASJC Scopus subject areas
- Computer Science (all)