Disorder concept identification from clinical notes: an experience with the ShARe/CLEF 2013 challenge

Jung Wei Fan, Navdeep Sood, Yang Huang

Research output: Contribution to journal › Conference article

5 Citations (Scopus)

Abstract

We participated in both tasks 1a and 1b of the ShARe/CLEF 2013 NLP Challenge, where task 1a was on detecting disorder concept boundaries and task 1b was on assigning concept IDs to the entities found in 1a. An existing NLP system developed at Kaiser Permanente was modified to output concepts close to the Challenge's definition of a disorder. The core pipeline involved deterministic section detection, tokenization, sentence chunking, probabilistic POS tagging, rule-based phrase chunking, terminology look-up (using UMLS 2012AB), rule-based concept disambiguation, and post-coordination. Because the system by default identifies findings (both normal and abnormal), procedures, anatomies, etc., a post-filter was created to retain only the concepts with the source (SNOMED) and semantic types expected by the Challenge. A list of frequency-ranked CUIs was extracted from the training corpus to help break ties when multiple concepts were proposed over a single span. However, no retraining or customization was done to meet the boundary annotation preferences specified in the Challenge guidelines. Our best settings achieved an F-score of 0.503 (0.684 under the relaxed boundary penalty) in task 1a, and a best accuracy of 0.443 (0.865 on relaxed boundaries) in task 1b.
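The two post-processing steps named in the abstract lend themselves to a short illustration. Below is a minimal Python sketch, not the authors' implementation: the Concept class, DISORDER_TUIS set, and cui_freq mapping are hypothetical names, and the authoritative semantic-type list is the one defined in the Challenge guidelines, not the assumed subset shown here.

# A minimal sketch (assumptions labeled), illustrating: (1) filtering
# candidate concepts down to SNOMED-sourced disorders, and (2) breaking
# ties among concepts proposed over the same span by training-corpus
# CUI frequency.

from dataclasses import dataclass

# Assumed subset of the UMLS "Disorders" semantic types (TUIs), e.g.
# T047 Disease or Syndrome, T191 Neoplastic Process, T046 Pathologic
# Function; the exact set is specified by the Challenge guidelines.
DISORDER_TUIS = {"T019", "T020", "T037", "T046", "T047",
                 "T048", "T049", "T050", "T190", "T191"}

@dataclass
class Concept:
    cui: str          # UMLS concept unique identifier, e.g. "C0027051"
    sources: set      # source vocabularies for this CUI, e.g. {"SNOMEDCT"}
    tuis: set         # semantic types assigned to this CUI
    span: tuple       # (start, end) character offsets in the note

def post_filter(candidates):
    """Keep concepts whose sources include SNOMED and whose semantic
    types intersect the Challenge's disorder definition."""
    return [c for c in candidates
            if "SNOMEDCT" in c.sources and c.tuis & DISORDER_TUIS]

def break_ties(candidates, cui_freq):
    """Among surviving concepts that share a span, keep the CUI that was
    most frequent in the training corpus (cui_freq maps CUI -> count)."""
    best_by_span = {}
    for c in candidates:
        incumbent = best_by_span.get(c.span)
        if incumbent is None or (cui_freq.get(c.cui, 0)
                                 > cui_freq.get(incumbent.cui, 0)):
            best_by_span[c.span] = c
    return list(best_by_span.values())

A corpus-frequency prior is the simplest possible disambiguation signal: it favors the globally most common reading of a span over any context-sensitive choice, which matches the abstract's description of the tie-breaker as a ranked CUI list rather than a trained model.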

Original language: English (US)
Journal: CEUR Workshop Proceedings
Volume: 1179
State: Published - Jan 1 2013
Externally published: Yes
Event: 2013 Cross Language Evaluation Forum Conference, CLEF 2013 - Valencia, Spain
Duration: Sep 23 2013 - Sep 26 2013


Keywords

  • Concept boundary detection
  • Concept normalization
  • Medical language processing

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Fan, J. W., Sood, N., & Huang, Y. (2013). Disorder concept identification from clinical notes: an experience with the ShARe/CLEF 2013 challenge. CEUR Workshop Proceedings, 1179. CEUR-WS. ISSN 1613-0073.

Scopus record: http://www.scopus.com/inward/record.url?scp=84922051496&partnerID=8YFLogxK
