Ontology-based temporal relation modeling with map-reduce latent dirichlet allocations for big EHR data

Dingcheng Li, Cui Tao, Hongfang Liu, Christopher Chute

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

In this paper, we propose a model called Temporal & Coreference Topic Modeling (TCTM) for automatic annotation of large-scale Electronic Health Record (EHR) data with respect to the Time Event Ontology (TEO). TCTM, based on Latent Dirichlet Allocation (LDA) and integrated into the MapReduce framework, inherently addresses the twin problems of data sparseness and high dimensionality. As a non-parametric Bayesian model, it can flexibly accommodate new attributes or features. Side information associated with the corpora, such as section header, timestamp, sentence distance, event distance, or disease category in clinical notes, makes latent topics more interpretable and more biased toward coreferring events. Furthermore, TCTM integrates Hidden Markov Model LDA (HMM-LDA) to combine the power of sequential modeling with exchangeability. A MapReduce-based variational method is employed for parameter estimation and inference, enabling TCTM to overcome the bottleneck posed by big data.
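The MapReduce-based variational estimation mentioned in the abstract can be illustrated with a minimal sketch for plain LDA. This is not the TCTM model itself: it omits the side information, the HMM-LDA component, and the coreference modeling, and it runs the map and reduce steps as ordinary in-memory functions rather than on a cluster. All names here (`map_document`, `reduce_counts`, `m_step`, `fit`) and the hyperparameter defaults are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def digamma(x):
    """Digamma function via recurrence plus an asymptotic series (x > 0)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1 / 12 - f * (1 / 120 - f / 252))


def map_document(doc, beta, alpha, n_iter=20):
    """Mapper (assumed form): variational E-step for one document.

    doc is a list of word ids; beta[k][w] is the current topic-word
    probability. Yields ((topic, word), expected_count) pairs.
    """
    K = len(beta)
    counts = Counter(doc)
    gamma = [alpha + len(doc) / K] * K  # variational Dirichlet parameters
    phi = {}
    for _ in range(n_iter):
        new_gamma = [alpha] * K
        for w, c in counts.items():
            weights = [beta[k][w] * math.exp(digamma(gamma[k])) for k in range(K)]
            total = sum(weights)
            phi[w] = [x / total for x in weights]
            for k in range(K):
                new_gamma[k] += c * phi[w][k]
        gamma = new_gamma
    for w, c in counts.items():
        for k in range(K):
            yield (k, w), c * phi[w][k]


def reduce_counts(pairs):
    """Reducer: sum the expected counts emitted for each (topic, word) key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return totals


def m_step(totals, K, V, eta):
    """Driver step: turn summed counts into smoothed, normalized topics."""
    beta = []
    for k in range(K):
        row = [eta + totals.get((k, v), 0.0) for v in range(V)]
        norm = sum(row)
        beta.append([x / norm for x in row])
    return beta


def fit(docs, K, V, alpha=0.1, eta=0.01, n_rounds=10, seed=0):
    """Alternate map/reduce E-steps with an M-step for a fixed number of rounds."""
    rng = random.Random(seed)
    beta = []
    for _ in range(K):
        row = [rng.random() + 0.1 for _ in range(V)]  # strictly positive start
        s = sum(row)
        beta.append([x / s for x in row])
    for _ in range(n_rounds):
        pairs = [p for doc in docs for p in map_document(doc, beta, alpha)]
        beta = m_step(reduce_counts(pairs), K, V, eta)
    return beta


# Toy corpus: three documents over a vocabulary of four word ids.
docs = [[0, 0, 1], [2, 3, 3], [0, 1, 1]]
beta = fit(docs, K=2, V=4)
```

Because each mapper only reads the shared topic matrix and emits additive statistics, documents can be processed independently and the reducer merely sums. That independence is what makes the E-step a natural fit for a MapReduce job in a sketch like this one.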

Original language: English (US)
Title of host publication: Proceedings - 2nd International Conference on Cloud and Green Computing and 2nd International Conference on Social Computing and Its Applications, CGC/SCA 2012
Pages: 708-715
Number of pages: 8
State: Published - 2012
Event: 2nd International Conference on Cloud and Green Computing, CGC 2012, Held Jointly with the 2nd International Conference on Social Computing and Its Applications, SCA 2012 - Xiangtan, Hunan, China
Duration: Nov 1 2012 to Nov 3 2012

Publication series

Name: Proceedings - 2nd International Conference on Cloud and Green Computing and 2nd International Conference on Social Computing and Its Applications, CGC/SCA 2012

Other

Other: 2nd International Conference on Cloud and Green Computing, CGC 2012, Held Jointly with the 2nd International Conference on Social Computing and Its Applications, SCA 2012
Country/Territory: China
City: Xiangtan, Hunan
Period: 11/1/12 to 11/3/12

Keywords

  • MapReduce
  • event coreference resolution
  • latent Dirichlet allocations
  • temporal relation annotation

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
