TY - GEN
T1 - Ontology-based temporal relation modeling with map-reduce latent dirichlet allocations for big EHR data
AU - Li, Dingcheng
AU - Tao, Cui
AU - Liu, Hongfang
AU - Chute, Christopher
PY - 2012
Y1 - 2012
N2 - In this paper, we propose a model called Temporal & Coreference Topic Modeling (TCTM) to perform automatic annotation with respect to the Time Event Ontology (TEO) for big Electronic Health Record (EHR) data. TCTM, based on Latent Dirichlet Allocation (LDA) and integrated into the MapReduce framework, inherently addresses the twin problems of data sparseness and high dimensionality. As a non-parametric Bayesian model, it can flexibly add new attributes or features. Side information associated with corpora, such as section header, timestamp, sentence distance, event distance or disease category in clinical notes, makes latent topics more interpretable and more biased toward coreferring events. Furthermore, TCTM integrates Hidden Markov Model LDA (HMM-LDA) to obtain the power of both sequential modeling and exchangeability. A MapReduce-based variational method is employed for parameter estimation and inference, thus enabling TCTM to overcome the bottleneck brought by big data.
AB - In this paper, we propose a model called Temporal & Coreference Topic Modeling (TCTM) to perform automatic annotation with respect to the Time Event Ontology (TEO) for big Electronic Health Record (EHR) data. TCTM, based on Latent Dirichlet Allocation (LDA) and integrated into the MapReduce framework, inherently addresses the twin problems of data sparseness and high dimensionality. As a non-parametric Bayesian model, it can flexibly add new attributes or features. Side information associated with corpora, such as section header, timestamp, sentence distance, event distance or disease category in clinical notes, makes latent topics more interpretable and more biased toward coreferring events. Furthermore, TCTM integrates Hidden Markov Model LDA (HMM-LDA) to obtain the power of both sequential modeling and exchangeability. A MapReduce-based variational method is employed for parameter estimation and inference, thus enabling TCTM to overcome the bottleneck brought by big data.
KW - MapReduce
KW - event coreference resolution
KW - latent Dirichlet allocations
KW - temporal relation annotation
UR - http://www.scopus.com/inward/record.url?scp=84874602478&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84874602478&partnerID=8YFLogxK
U2 - 10.1109/CGC.2012.112
DO - 10.1109/CGC.2012.112
M3 - Conference contribution
AN - SCOPUS:84874602478
SN - 9780769548647
T3 - Proceedings - 2nd International Conference on Cloud and Green Computing and 2nd International Conference on Social Computing and Its Applications, CGC/SCA 2012
SP - 708
EP - 715
BT - Proceedings - 2nd International Conference on Cloud and Green Computing and 2nd International Conference on Social Computing and Its Applications, CGC/SCA 2012
T2 - 2nd International Conference on Cloud and Green Computing, CGC 2012, Held Jointly with the 2nd International Conference on Social Computing and Its Applications, SCA 2012
Y2 - 1 November 2012 through 3 November 2012
ER -