Extending association rule summarization techniques to assess risk of diabetes mellitus

György J. Simon, Pedro J. Caraballo, Terry M. Therneau, Steven S. Cha, M. Regina Castro, Peter W. Li

Research output: Contribution to journal › Article › peer-review

35 Scopus citations

Abstract

Early detection of patients at elevated risk of developing diabetes mellitus is critical to improving the prevention and overall clinical management of these patients. We aim to apply association rule mining to electronic medical records (EMR) to discover sets of risk factors, and the subpopulations they define, that represent patients at particularly high risk of developing diabetes. Because EMRs are high-dimensional, association rule mining generates a very large set of rules, which must be summarized before it is practical for clinical use. We reviewed four association rule set summarization techniques and conducted a comparative evaluation to provide guidance on their applicability, strengths and weaknesses. We proposed extensions that incorporate diabetes risk into the search for an optimal summary, and we evaluated these modified techniques on a real-world cohort of prediabetic patients. All four methods produced summaries that described subpopulations at high risk of diabetes, and each method had clear strengths. For our purpose, our extension to the Bottom-Up Summarization (BUS) algorithm produced the most suitable summary: the subpopulations it identified covered most high-risk patients, had low overlap, and were at very high risk of diabetes.
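The abstract does not spell out the algorithms, so the following is only a rough illustration of the general idea, not the authors' extended BUS method. It enumerates small itemsets of risk factors (rule antecedents), attaches to each the subpopulation it covers and a simple risk value (the fraction of covered patients who developed diabetes), and then summarizes the rule set with a generic greedy, set-cover-style selection that favors high-risk rules covering many not-yet-covered diabetic patients, which keeps overlap low. The item names, the toy data, the incidence-based risk measure and the thresholds `min_support`, `min_risk` and `max_rules` are all hypothetical; the paper's keywords mention survival analysis, which this sketch does not model.

```python
from itertools import combinations

# Hypothetical toy cohort: (set of binary risk-factor items, developed diabetes?)
TOY_PATIENTS = [
    (frozenset({"obese", "htn"}), True),
    (frozenset({"obese", "htn"}), True),
    (frozenset({"obese"}), False),
    (frozenset({"high_tg", "low_hdl"}), True),
    (frozenset({"high_tg", "low_hdl"}), True),
    (frozenset({"high_tg"}), False),
    (frozenset({"htn"}), False),
    (frozenset(), False),
]


def mine_rules(patients, max_len=2, min_support=2):
    """Enumerate antecedents of up to max_len items with at least min_support
    patients; return (antecedent, covered patient indices, incidence) triples."""
    items = sorted(set().union(*(feats for feats, _ in patients)))
    rules = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            antecedent = frozenset(combo)
            covered = {i for i, (feats, _) in enumerate(patients) if antecedent <= feats}
            if len(covered) < min_support:
                continue  # too few patients to describe a subpopulation
            cases = sum(1 for i in covered if patients[i][1])
            rules.append((antecedent, covered, cases / len(covered)))
    return rules


def greedy_summary(rules, patients, min_risk=0.9, max_rules=3):
    """Generic greedy selection (not the paper's BUS extension): keep rules whose
    incidence is at least min_risk, then repeatedly take the rule covering the
    most diabetic patients not yet covered by earlier picks."""
    cases = {i for i, (_, diabetic) in enumerate(patients) if diabetic}
    candidates = [r for r in rules if r[2] >= min_risk]
    summary, covered = [], set()
    while candidates and len(summary) < max_rules:
        best = max(candidates, key=lambda r: (len((r[1] & cases) - covered), r[2]))
        if not (best[1] & cases) - covered:
            break  # no remaining rule adds new high-risk patients
        summary.append(best)
        covered |= best[1] & cases
        candidates.remove(best)
    return summary, covered


if __name__ == "__main__":
    rules = mine_rules(TOY_PATIENTS)
    summary, covered = greedy_summary(rules, TOY_PATIENTS)
    for antecedent, members, risk in summary:
        print(sorted(antecedent), len(members), risk)
    print("diabetic patients covered:", sorted(covered))
```

On this toy cohort the redundant rule `{high_tg, low_hdl}` is skipped because `{low_hdl}` already covers the same diabetic patients, which is the low-overlap behavior the abstract describes, achieved here by a much simpler mechanism than the paper's.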

Original language: English (US)
Article number: 6955774
Pages (from-to): 130-141
Number of pages: 12
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 27
Issue number: 1
State: Published - Jan 1 2015

Keywords

  • Association rule summarization
  • Association rules
  • Data mining
  • Survival analysis

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
