A frequency-filtering strategy of obtaining PHI-free sentences from clinical data repository

Dingcheng Li, Majid Rastegar-Mojarad, Ravikumar Komandur Elayavilli, Yanshan Wang, Saeed Mehrabi, Yue Yu, Sunghwan Sohn, Yanpeng Li, Naveed Afzal, Hongfang Liu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Scopus citations

Abstract

Clinical natural language processing (NLP) has become indispensable in the secondary use of electronic medical records (EMRs). However, current clinical NLP tools port poorly across institutions. An ideal solution to this problem is cross-institutional data sharing, but legal prohibitions on disclosing protected health information (PHI) obstruct this practice even when state-of-the-art de-identification tools are available. In this paper, we investigate a frequency-filtering approach to extracting PHI-free sentences from the Enterprise Data Trust (EDT), a large collection of EMRs at Mayo Clinic. Our approach rests on the assumption that sentences appearing frequently tend to contain no PHI. This assumption originates from the observation that the EDT contains a large number of redundant descriptions of similar patient conditions. Both manual and automatic evaluations of the set of sentences with frequencies higher than one found no PHI. These promising results demonstrate the potential of sharing highly frequent sentences among institutions.
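The core idea in the abstract, keeping only sentences that occur more than once, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the normalization step (lowercasing and whitespace collapsing), and the toy corpus are all assumptions made for illustration, and the sample sentences are invented rather than drawn from any clinical record.

```python
from collections import Counter
import re


def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match.

    This normalization is an assumption; the abstract does not specify one.
    """
    return re.sub(r"\s+", " ", sentence.strip().lower())


def frequent_sentences(sentences, min_count=2):
    """Return a dict of normalized sentences seen at least min_count times.

    min_count=2 mirrors the abstract's "frequencies higher than one".
    """
    counts = Counter(normalize(s) for s in sentences if s.strip())
    return {s: n for s, n in counts.items() if n >= min_count}


# Invented toy corpus (hypothetical data, not real patient text).
corpus = [
    "No acute distress.",
    "no acute  distress.",
    "Patient denies chest pain.",
    "Mr. Smith was seen on 3/4/2014 in Rochester.",
    "Patient denies chest pain.",
]

shareable = frequent_sentences(corpus)
for sentence, count in shareable.items():
    print(count, sentence)
```

In this toy corpus the one sentence carrying a name, date, and place occurs only once and is filtered out, while the two generic descriptions survive. A frequency threshold by itself does not guarantee that a sentence is PHI-free, which is why the abstract reports both manual and automatic evaluation of the retained set.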

Original language: English (US)
Title of host publication: BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
Publisher: Association for Computing Machinery, Inc
Pages: 315-324
Number of pages: 10
ISBN (Print): 9781450338530
State: Published - Sep 9 2015
Event: 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2015 - Atlanta, United States
Duration: Sep 9 2015 to Sep 12 2015

Other

Other: 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2015
Country: United States
City: Atlanta
Period: 9/9/15 to 9/12/15

Keywords

  • Cross-institutional data-sharing
  • EMR
  • Frequency-filtering strategy
  • PHI-free
  • Protected health information
  • Sentence frequency
  • Word bigram

ASJC Scopus subject areas

  • Software
  • Health Informatics
  • Computer Science Applications
  • Biomedical Engineering
