Ontology-based temporal relation modeling with map-reduce latent dirichlet allocations for big EHR data

Dingcheng Li, Cui Tao, Hongfang D Liu, Christopher Chute

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

In this paper, we propose a model called Temporal & Coreference Topic Modeling (TCTM) that automatically annotates large-scale Electronic Health Record (EHR) data with respect to the Time Event Ontology (TEO). TCTM, which is based on Latent Dirichlet Allocations (LDA) and integrated into the MapReduce framework, inherently addresses the twin problems of data sparseness and high dimensionality. As a non-parametric Bayesian model, it can flexibly incorporate new attributes or features. Side information associated with the corpora, such as section header, timestamp, sentence distance, event distance, or disease category in clinical notes, makes the latent topics more interpretable and biases them toward coreferring events. Furthermore, TCTM integrates Hidden Markov Model LDA (HMM-LDA) to combine the strengths of sequential modeling and exchangeability. A MapReduce-based variational method is used for parameter estimation and inference, enabling TCTM to overcome the bottleneck imposed by big data.
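The abstract does not spell out TCTM's update equations, so the following is only a minimal sketch of how variational inference for plain LDA can be arranged in a map/reduce pattern. It is not the authors' TCTM: it omits the side information, the ontology, and the HMM-LDA component, and it simulates the map and reduce phases in a single Python process. All names (`map_document`, `reduce_counts`, `fit_lda`) and the hyperparameter values are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:  # shift x upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 * inv - series


def map_document(doc, lam, lam_sums, alpha, n_iter=20):
    """Mapper: local variational E-step for one document.

    doc maps word_id -> count. Yields ((topic, word_id), expected_count),
    the sufficient statistics needed to update the topic-word parameters.
    """
    K = len(lam)
    gamma = [alpha + sum(doc.values()) / K] * K
    phi = {}
    for _ in range(n_iter):
        for w in doc:
            weights = [
                math.exp(digamma(gamma[k]) + digamma(lam[k][w])
                         - digamma(lam_sums[k]))
                for k in range(K)
            ]
            total = sum(weights)
            phi[w] = [x / total for x in weights]
        gamma = [alpha + sum(n * phi[w][k] for w, n in doc.items())
                 for k in range(K)]
    for w, n in doc.items():
        for k in range(K):
            yield (k, w), n * phi[w][k]


def reduce_counts(pairs):
    """Reducer: sum the expected counts that share a (topic, word) key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return totals


def fit_lda(docs, K, V, alpha=0.1, eta=0.01, n_rounds=10, seed=0):
    """Driver: alternate map (E-step) and reduce (M-step) rounds."""
    rng = random.Random(seed)
    lam = [[eta + rng.random() for _ in range(V)] for _ in range(K)]
    for _ in range(n_rounds):
        lam_sums = [sum(row) for row in lam]
        emitted = (pair for doc in docs
                   for pair in map_document(doc, lam, lam_sums, alpha))
        totals = reduce_counts(emitted)
        lam = [[eta + totals.get((k, w), 0.0) for w in range(V)]
               for k in range(K)]
    return lam


# Toy corpus: four "documents" over a vocabulary of four word ids.
docs = [{0: 3, 1: 2}, {2: 4, 3: 1}, {0: 1, 1: 4}, {2: 2, 3: 3}]
lam = fit_lda(docs, K=2, V=4)
```

In this arrangement each mapper needs only the current topic-word parameters and its own document, which is what lets the E-step be spread across workers; the reducer's sums are the only global state exchanged per round. In a real cluster job the mapper and reducer would run as separate tasks rather than as generator calls.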

Original language: English (US)
Title of host publication: Proceedings - 2nd International Conference on Cloud and Green Computing and 2nd International Conference on Social Computing and Its Applications, CGC/SCA 2012
Pages: 708-715
Number of pages: 8
DOI: https://doi.org/10.1109/CGC.2012.112
State: Published - 2012
Event: 2nd International Conference on Cloud and Green Computing, CGC 2012, Held Jointly with the 2nd International Conference on Social Computing and Its Applications, SCA 2012 - Xiangtan, Hunan, China
Duration: Nov 1 2012 to Nov 3 2012

Fingerprint

Ontology
Health
Hidden Markov models
Parameter estimation
Big data

Keywords

  • event coreference resolution
  • latent Dirichlet allocations
  • MapReduce
  • temporal relation annotation

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Cite this

Li, D., Tao, C., Liu, H. D., & Chute, C. (2012). Ontology-based temporal relation modeling with map-reduce latent dirichlet allocations for big EHR data. In Proceedings - 2nd International Conference on Cloud and Green Computing and 2nd International Conference on Social Computing and Its Applications, CGC/SCA 2012 (pp. 708-715). [6382894] https://doi.org/10.1109/CGC.2012.112

@inproceedings{04076f08217848caa982930cb3b69f1c,
title = "Ontology-based temporal relation modeling with map-reduce latent dirichlet allocations for big EHR data",
abstract = "In this paper, we propose a model called Temporal \& Coreference Topic Modeling (TCTM) that automatically annotates large-scale Electronic Health Record (EHR) data with respect to the Time Event Ontology (TEO). TCTM, which is based on Latent Dirichlet Allocations (LDA) and integrated into the MapReduce framework, inherently addresses the twin problems of data sparseness and high dimensionality. As a non-parametric Bayesian model, it can flexibly incorporate new attributes or features. Side information associated with the corpora, such as section header, timestamp, sentence distance, event distance, or disease category in clinical notes, makes the latent topics more interpretable and biases them toward coreferring events. Furthermore, TCTM integrates Hidden Markov Model LDA (HMM-LDA) to combine the strengths of sequential modeling and exchangeability. A MapReduce-based variational method is used for parameter estimation and inference, enabling TCTM to overcome the bottleneck imposed by big data.",
keywords = "event coreference resolution, latent Dirichlet allocations, MapReduce, temporal relation annotation",
author = "Dingcheng Li and Cui Tao and Liu, {Hongfang D} and Christopher Chute",
year = "2012",
doi = "10.1109/CGC.2012.112",
language = "English (US)",
isbn = "9780769548647",
pages = "708--715",
booktitle = "Proceedings - 2nd International Conference on Cloud and Green Computing and 2nd International Conference on Social Computing and Its Applications, CGC/SCA 2012",

}
