Abstract
We participated in both tasks 1a and 1b of the ShARe/CLEF 2013 NLP Challenge, where 1a was on detecting disorder concept boundaries and 1b was on assigning concept IDs to the entities from 1a. An existing NLP system developed at Kaiser Permanente was modified to output concepts close to the Challenge's definition of disorder. The core pipeline involved deterministic section detection, tokenization, sentence chunking, probabilistic POS tagging, rule-based phrase chunking, terminology look-up (using UMLS 2012AB), rule-based concept disambiguation, and post-coordination. The system natively identifies findings (both normal and abnormal), procedures, anatomies, etc., so a post-filter was created to restrict output to concepts with the source (SNOMED) and semantic types expected by the Challenge. A list of frequency-ranked CUIs was extracted from the training corpus to help break ties when multiple concepts were proposed on a single span. However, no retraining or customization was performed to meet the boundary annotation preferences specified in the challenge guidelines. Our best settings achieved an F-score of 0.503 (0.684 with relaxed boundary penalty) in task 1a, and a best accuracy of 0.443 (0.865 on relaxed boundaries) in task 1b.
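The two post-processing steps the abstract describes (the SNOMED/semantic-type post-filter and the frequency-ranked CUI tie-breaker) can be summarized in a minimal Python sketch. The `Candidate` record, the `SNOMEDCT` source label, and the helper names are assumptions for illustration, and the TUI list is an illustrative subset of the UMLS disorder semantic group rather than the Challenge's exact definition.

```python
from dataclasses import dataclass

# Illustrative subset of UMLS semantic types (TUIs) in the disorder group;
# the actual set required by the Challenge definition may differ.
DISORDER_TUIS = {"T019", "T020", "T037", "T046", "T047",
                 "T048", "T049", "T184", "T190", "T191"}

@dataclass
class Candidate:
    cui: str                 # UMLS concept unique identifier
    source: str              # vocabulary the match came from, e.g. "SNOMEDCT"
    tui: str                 # semantic type of the concept
    span: tuple[int, int]    # (start, end) character offsets in the note

def post_filter(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only SNOMED concepts whose semantic type is in the disorder group."""
    return [c for c in candidates
            if c.source == "SNOMEDCT" and c.tui in DISORDER_TUIS]

def break_ties(candidates: list[Candidate],
               training_cui_counts: dict[str, int]) -> list[Candidate]:
    """Where several concepts share one span, keep the CUI that occurred
    most often in the training corpus (frequency-ranked tie-breaking)."""
    by_span: dict[tuple[int, int], list[Candidate]] = {}
    for c in candidates:
        by_span.setdefault(c.span, []).append(c)
    return [max(group, key=lambda c: training_cui_counts.get(c.cui, 0))
            for group in by_span.values()]
```

As a design note, keeping both steps after the main pipeline (rather than retraining it) matches the abstract's statement that the underlying system was reused without customization to the Challenge's boundary annotation preferences.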
Original language | English (US)
---|---
Journal | CEUR Workshop Proceedings
Volume | 1179
State | Published - 2013
Event | 2013 Cross Language Evaluation Forum Conference, CLEF 2013 - Valencia, Spain (Duration: Sep 23 2013 → Sep 26 2013)
Keywords
- Concept boundary detection
- Concept normalization
- Medical language processing
ASJC Scopus subject areas
- Computer Science (all)