Frequent Causal Pattern Mining: A Computationally Efficient Framework for Estimating Bias-Corrected Effects

Pranjul Yadav; Michael Steinbach; M. Regina Castro; Pedro J. Caraballo; Vipin Kumar; Gyorgy Simon

doi:10.1109/BigData47090.2019.9005977

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Our aging population increasingly suffers from multiple chronic diseases simultaneously, necessitating the comprehensive treatment of these conditions. Finding the optimal set of drugs for a given combination of diseases is a combinatorial pattern exploration problem. Association rule mining is a popular tool for such problems, but health care requires causal, rather than merely associative, patterns, which renders association rule mining unsuitable. To address this issue, we propose a novel framework based on the Rubin-Neyman causal model for extracting causal rules from observational data, correcting for a number of common biases. Specifically, given a set of interventions and a set of items that define subpopulations (e.g., diseases), we wish to find all subpopulations in which effective intervention combinations exist and, in each such subpopulation, all intervention combinations such that dropping any intervention from the combination reduces the efficacy of the treatment. A key aspect of our framework is the concept of closed intervention sets, which extends the quantification of the effect of a single intervention to a set of concurrent interventions. Closed intervention sets also allow for a pruning strategy that is strictly more efficient than the traditional pruning strategy used by the Apriori algorithm. To implement our ideas, we introduce and compare five methods of estimating causal effect from observational data, rigorously evaluate them on synthetic data, and, where possible, mathematically prove why they work. We also evaluated our causal rule mining framework on the Electronic Health Records (EHR) data of a large cohort of 152,000 patients from Mayo Clinic and showed that the patterns we extracted are sufficiently rich to explain the controversial findings in the medical literature regarding the effect of a class of cholesterol drugs on Type-II Diabetes Mellitus (T2DM).
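
The gap between associative and causal patterns noted in the abstract can be illustrated with a minimal sketch. It is not taken from the paper and is not necessarily one of its five estimation methods: it uses textbook stratification on a single observed confounder, under the assumption that this confounder is the only source of bias. The field names (`severe`, `treated`, `outcome`) and the toy cohort are hypothetical.

```python
from collections import defaultdict


def rate(values):
    """Mean of a non-empty list of 0/1 outcomes."""
    return sum(values) / len(values)


def naive_effect(records):
    """Associative estimate: outcome rate of treated minus untreated patients."""
    treated = [r["outcome"] for r in records if r["treated"]]
    control = [r["outcome"] for r in records if not r["treated"]]
    return rate(treated) - rate(control)


def stratified_effect(records, confounder):
    """Size-weighted average of within-stratum treated-minus-control differences."""
    strata = defaultdict(list)
    for r in records:
        strata[r[confounder]].append(r)

    weighted_sum, n_used = 0.0, 0
    for group in strata.values():
        treated = [r["outcome"] for r in group if r["treated"]]
        control = [r["outcome"] for r in group if not r["treated"]]
        if not treated or not control:
            continue  # no treated/control overlap in this stratum, so skip it
        weighted_sum += (rate(treated) - rate(control)) * len(group)
        n_used += len(group)
    if n_used == 0:
        raise ValueError("no stratum contains both treated and control patients")
    return weighted_sum / n_used


def make(severe, treated, outcomes):
    """Build hypothetical patient records sharing severity and treatment status."""
    return [{"severe": severe, "treated": treated, "outcome": o} for o in outcomes]


# Toy cohort (hypothetical): severe patients are treated more often and have
# worse outcomes, but within each severity level treatment changes nothing.
records = (
    make(True, True, [1, 1, 1, 1, 0, 0, 0, 0])
    + make(True, False, [1, 0])
    + make(False, True, [0, 0])
    + make(False, False, [0] * 8)
)

print(naive_effect(records))                 # approximately 0.3
print(stratified_effect(records, "severe"))  # 0.0
```

In this toy cohort the naive difference suggests that treatment worsens the outcome, while the stratified estimate shows no effect once severity is held fixed. This is the kind of bias an associative rule would inherit.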

Original language	English (US)
Title of host publication	Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
Editors	Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1981-1990
Number of pages	10
ISBN (Electronic)	9781728108582
DOIs	https://doi.org/10.1109/BigData47090.2019.9005977
State	Published - Dec 2019
Event	2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
Duration	Dec 9 2019 → Dec 12 2019

Publication series

Name	Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

Conference

Conference	2019 IEEE International Conference on Big Data, Big Data 2019
Country/Territory	United States
City	Los Angeles
Period	12/9/19 → 12/12/19

ASJC Scopus subject areas

Artificial Intelligence
Computer Networks and Communications
Information Systems
Information Systems and Management

Cite this

Yadav, P., Steinbach, M., Castro, M. R., Caraballo, P. J., Kumar, V., & Simon, G. (2019). Frequent Causal Pattern Mining: A Computationally Efficient Framework for Estimating Bias-Corrected Effects. In C. Baru, J. Huan, L. Khan, X. T. Hu, R. Ak, Y. Tian, R. Barga, C. Zaniolo, K. Lee, & Y. F. Ye (Eds.), Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 (pp. 1981-1990). Article 9005977 (Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData47090.2019.9005977


@inproceedings{96fc8d58cf5c4a28b6c95febe8ea2205,

title = "Frequent Causal Pattern Mining: A Computationally Efficient Framework for Estimating Bias-Corrected Effects",

author = "Pranjul Yadav and Michael Steinbach and Castro, {M. Regina} and Caraballo, {Pedro J.} and Vipin Kumar and Gyorgy Simon",

note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 2019 IEEE International Conference on Big Data, Big Data 2019 ; Conference date: 09-12-2019 Through 12-12-2019",

year = "2019",

month = dec,

doi = "10.1109/BigData47090.2019.9005977",

language = "English (US)",

series = "Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1981--1990",

editor = "Chaitanya Baru and Jun Huan and Latifur Khan and Hu, {Xiaohua Tony} and Ronay Ak and Yuanyuan Tian and Roger Barga and Carlo Zaniolo and Kisung Lee and Ye, {Yanfang Fanny}",

booktitle = "Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019",

}

TY - GEN

T1 - Frequent Causal Pattern Mining

T2 - 2019 IEEE International Conference on Big Data, Big Data 2019

AU - Yadav, Pranjul

AU - Steinbach, Michael

AU - Castro, M. Regina

AU - Caraballo, Pedro J.

AU - Kumar, Vipin

AU - Simon, Gyorgy

PY - 2019/12

Y1 - 2019/12

UR - http://www.scopus.com/inward/record.url?scp=85081302747&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85081302747&partnerID=8YFLogxK

U2 - 10.1109/BigData47090.2019.9005977

DO - 10.1109/BigData47090.2019.9005977

M3 - Conference contribution

AN - SCOPUS:85081302747

T3 - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

SP - 1981

EP - 1990

BT - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

A2 - Baru, Chaitanya

A2 - Huan, Jun

A2 - Khan, Latifur

A2 - Hu, Xiaohua Tony

A2 - Ak, Ronay

A2 - Tian, Yuanyuan

A2 - Barga, Roger

A2 - Zaniolo, Carlo

A2 - Lee, Kisung

A2 - Ye, Yanfang Fanny

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 9 December 2019 through 12 December 2019

ER -
