Automatic information extraction from unstructured mammography reports using distributed semantics

Anupama Gupta, Imon Banerjee, Daniel L. Rubin

Research output: Contribution to journal › Article › peer-review

Abstract

To date, methods for automated extraction of information from radiology reports have been mainly rule-based or dictionary-based and therefore require substantial manual effort to build. Recent efforts have produced automated systems for entity detection, but little work has been done on automatically extracting relations and their associated named entities from narrative radiology reports with accuracy comparable to rule-based methods. Our goal is to extract relations from radiology reports in an unsupervised way, without specifying prior domain knowledge. We propose a hybrid approach for information extraction (IE) that combines dependency-based parse trees with distributed semantics to generate structured information frames about particular findings/abnormalities from free-text mammography reports. The proposed IE system obtains an F1-score of 0.94 in terms of completeness of the content in the information frames, which outperforms a state-of-the-art rule-based system in this domain by a significant margin. The proposed system can be leveraged in a variety of applications, such as decision support and information retrieval, and may also scale easily to other radiology domains, since there is no need to tune the system with hand-crafted information extraction rules.

Original language: English (US)
Pages (from-to): 78-86
Number of pages: 9
Journal: Journal of Biomedical Informatics
Volume: 78
State: Published - Feb 2018

Keywords

  • Information extraction
  • Information frames
  • Report annotation
  • Word embedding

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics
