An initial study of full parsing of clinical text using the Stanford Parser

Hua Xu; Samir AbdelRahman; Min Jiang; Jung Wei Fan; Yang Huang

doi:10.1109/BIBMW.2011.6112438

Digital Health Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Full parsing recognizes a sentence and generates its syntactic structure (a parse tree), which is useful for many natural language processing (NLP) applications. The Stanford Parser is one of the state-of-the-art parsers in the general English domain. However, there has been no formal evaluation of its performance on clinical text, which often contains ungrammatical structures. In this study, we randomly selected 50 sentences from the clinical corpus of the 2010 i2b2 NLP challenge and manually annotated them to create a gold standard of parse trees. Our evaluation showed that the original Stanford Parser achieved a bracketing F-measure (BF) of 77% on the gold standard. Moreover, we assessed the effect of part-of-speech (POS) tags on parsing, and our results showed that using manually corrected POS tags achieved a maximum BF of 81%. Furthermore, we analyzed the errors of the Stanford Parser and provided insights for large-scale parse tree annotation of clinical text.
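The bracketing F-measure reported in the abstract compares the constituents of a predicted parse tree with those of a gold-standard tree. The abstract does not say which scoring tool or conventions the authors used, so the following is only a minimal Python sketch of a PARSEVAL-style labeled bracketing score. The function names (`parse_brackets`, `constituents`, `bracketing_f`) are hypothetical, and the choice to skip POS-tag nodes and keep the root node is an assumption made for illustration. Standard tools such as EVALB apply further conventions (for example punctuation handling) that are omitted here.

```python
from collections import Counter


def parse_brackets(text):
    """Parse a Penn-style bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def constituents(tree):
    """Count (label, start, end) spans of phrasal nodes.

    Assumption: nodes whose children are all words (POS-tag nodes)
    are not counted as constituents.
    """
    spans = Counter()

    def walk(node, start):
        label, children = node
        if all(isinstance(child, str) for child in children):
            return start + len(children)  # POS tag over its word(s)
        end = start
        for child in children:
            end = end + 1 if isinstance(child, str) else walk(child, end)
        spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def bracketing_f(gold, predicted):
    """Return (precision, recall, F-measure) over labeled brackets."""
    gold_spans = constituents(parse_brackets(gold))
    pred_spans = constituents(parse_brackets(predicted))
    matched = sum((gold_spans & pred_spans).values())
    precision = matched / sum(pred_spans.values()) if pred_spans else 0.0
    recall = matched / sum(gold_spans.values()) if gold_spans else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Invented example sentence, not taken from the study's corpus.
gold = "(S (NP (NN patient)) (VP (VBD denied) (NP (NN chest) (NN pain))))"
pred = "(S (NP (NN patient)) (VP (VBD denied) (NP (NN chest))) (NP (NN pain)))"
print(bracketing_f(gold, pred))
```

In this invented example the predicted tree splits "chest pain" into two noun phrases, so only two of its five brackets match the four gold brackets, which lowers both precision and recall.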

Original language	English (US)
Title of host publication	2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011
Pages	607-614
Number of pages	8
DOIs	https://doi.org/10.1109/BIBMW.2011.6112438
State	Published - 2011
Event	2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011 - Atlanta, GA, United States Duration: Nov 12 2011 → Nov 15 2011


ASJC Scopus subject areas

Biomedical Engineering
Health Informatics
Health Information Management

Cite this

Xu, H., AbdelRahman, S., Jiang, M., Fan, J. W., & Huang, Y. (2011). An initial study of full parsing of clinical text using the Stanford Parser. In 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011 (pp. 607-614). Article 6112438 (2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011). https://doi.org/10.1109/BIBMW.2011.6112438

@inproceedings{e5cdae4da1eb44bea44e45eb73f46aa1,

title = "An initial study of full parsing of clinical text using the Stanford Parser",

author = "Hua Xu and Samir AbdelRahman and Min Jiang and Fan, {Jung Wei} and Yang Huang",

year = "2011",

doi = "10.1109/BIBMW.2011.6112438",

language = "English (US)",

isbn = "9781457716133",

series = "2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011",

pages = "607--614",

booktitle = "2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011",

note = "2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011 ; Conference date: 12-11-2011 Through 15-11-2011",

}

TY - GEN

T1 - An initial study of full parsing of clinical text using the Stanford Parser

AU - Xu, Hua

AU - AbdelRahman, Samir

AU - Jiang, Min

AU - Fan, Jung Wei

AU - Huang, Yang

PY - 2011

Y1 - 2011

UR - http://www.scopus.com/inward/record.url?scp=84862912300&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84862912300&partnerID=8YFLogxK

U2 - 10.1109/BIBMW.2011.6112438

DO - 10.1109/BIBMW.2011.6112438

M3 - Conference contribution

AN - SCOPUS:84862912300

SN - 9781457716133

T3 - 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011

SP - 607

EP - 614

BT - 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011

T2 - 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2011

Y2 - 12 November 2011 through 15 November 2011

ER -
