An initial study of full parsing of clinical text using the Stanford Parser

Hua Xu, Samir AbdelRahman, Min Jiang, Jung Wei Fan, Yang Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Full parsing analyzes a sentence and generates its syntactic structure (a parse tree), which is useful for many natural language processing (NLP) applications. The Stanford Parser is one of the state-of-the-art parsers in the general English domain. However, there has been no formal evaluation of its performance on clinical text, which often contains ungrammatical structures. In this study, we randomly selected 50 sentences from the clinical corpus of the 2010 i2b2 NLP challenge and manually annotated them to create a gold standard of parse trees. Our evaluation showed that the original Stanford Parser achieved a bracketing F-measure (BF) of 77% on the gold standard. Moreover, we assessed the effect of part-of-speech (POS) tags on parsing, and our results showed that supplying manually corrected POS tags raised the BF to a maximum of 81%. Furthermore, we analyzed the errors of the Stanford Parser and provide insights for large-scale parse tree annotation of clinical text.
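For readers unfamiliar with the metric, the sketch below shows one common way a bracketing F-measure is computed: labeled constituent spans from a predicted tree are compared with those of a gold tree, PARSEVAL-style. It is an illustration only and not the authors' evaluation tooling. The abstract does not say which scorer was used or whether brackets were labeled, so the function names (`labeled_spans`, `bracketing_f`), the choice to ignore POS-level brackets, and the two example trees are all assumptions made for this example.

```python
from collections import Counter


def labeled_spans(tree_str):
    """Collect (label, start, end) for each phrase-level constituent.

    Assumes Penn Treebank style bracketing in which every bracket has a
    label, e.g. "(S (NP (NN pain)))". POS-level brackets (a tag directly
    over a word) are skipped, as is usual for bracket scoring.
    """
    tokens = tree_str.replace("(", " ( ").replace(")", " ) ").split()
    spans = Counter()
    stack = []      # entries: [label, start_word_index, has_child_bracket]
    word_idx = 0    # number of words consumed so far
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append([tokens[i + 1], word_idx, False])
            i += 2  # skip the label that follows the open bracket
        elif tok == ")":
            label, start, is_phrase = stack.pop()
            if is_phrase:
                spans[(label, start, word_idx)] += 1
            if stack:
                stack[-1][2] = True  # parent now has a child bracket
            i += 1
        else:
            word_idx += 1  # a word (leaf)
            i += 1
    return spans


def bracketing_f(gold_str, pred_str):
    """Return (precision, recall, F-measure) over labeled brackets."""
    gold, pred = labeled_spans(gold_str), labeled_spans(pred_str)
    matched = sum((gold & pred).values())
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example sentence; not taken from the i2b2 corpus.
gold = "(S (NP (DT the) (NN patient)) (VP (VBD denied) (NP (NN chest) (NN pain))))"
pred = "(S (NP (DT the) (NN patient)) (VP (VBD denied) (NP (NN chest))) (NP (NN pain)))"
p, r, f = bracketing_f(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

In this invented pair the predicted tree splits "chest pain" into two noun phrases, so only the subject NP and the S bracket match the gold tree. Published scores such as the 77% above are normally computed over a whole test set with an established scorer rather than per sentence.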

Original language: English (US)
Title of host publication: 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011
Pages: 607-614
Number of pages: 8
State: Published - 2011
Event: 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011 - Atlanta, GA, United States
Duration: Nov 12 2011 to Nov 15 2011

Publication series

Name: 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011

Other

Other: 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011
Country/Territory: United States
City: Atlanta, GA
Period: 11/12/11 to 11/15/11

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management
