Identifying serendipitous drug usages in patient forum data: a feasibility study

Boshu Ru; Charles Warner-Hillard; Yong Ge; Lixia Yao

doi:10.5220/0006145201060118

Quantitative Health Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Drug repositioning reduces safety risk and development cost compared with developing new drugs. Computational approaches have examined biological, chemical, literature, and electronic health record data for systematic drug repositioning. In this work, we built an entire computational pipeline to investigate the feasibility of mining a new data source, the fast-growing body of online patient forum data, for identifying and verifying drug-repositioning hypotheses. We curated a gold-standard dataset based on filtered drug reviews from WebMD. Among 15,714 sentences, 447 mentioned novel desirable drug usages that were not listed as known drug indications by WebMD and thus were defined as serendipitous drug usages. We then constructed 347 features using text-mining methods and drug knowledge. Finally, we built SVM, random forest, and AdaBoost.M1 classifiers and evaluated their classification performance. Our best model achieved an AUC score of 0.937 on the independent test dataset, with a precision of 0.811 and a recall of 0.476. It successfully predicted serendipitous drug usages, including metformin and bupropion for obesity, tramadol for depression, and ondansetron for irritable bowel syndrome with diarrhea. Machine learning methods make this new data source feasible for studying drug repositioning. Our future efforts include constructing more informative features, developing more effective methods to handle imbalanced data, and verifying prediction results using other existing methods.
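
As a rough sanity check on the figures quoted above, the sketch below derives two numbers the abstract does not state: the share of sentences labeled as serendipitous usages (447 of 15,714) and the F1 score implied by the reported precision and recall. The function names are hypothetical, and the derived F1 is simple arithmetic on the quoted values, not a result reported by the authors.

```python
def positive_rate(positives: int, total: int) -> float:
    """Fraction of examples that belong to the positive class."""
    return positives / total


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract.
TOTAL_SENTENCES = 15_714
SERENDIPITOUS_SENTENCES = 447
PRECISION = 0.811
RECALL = 0.476

# Derived values (assumption: computed here for illustration only,
# they are not reported in the abstract).
prevalence = positive_rate(SERENDIPITOUS_SENTENCES, TOTAL_SENTENCES)
f1 = f1_score(PRECISION, RECALL)

print(f"positive class share: {prevalence:.2%}")
print(f"derived F1: {f1:.3f}")
```

The positive class makes up under 3% of the sentences, which is consistent with the abstract listing better handling of imbalanced data as future work.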

Original language	English (US)
Title of host publication	HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017
Editors	Egon L. van den Broek, Ana Fred, Hugo Gamboa, Mario Vaz
Publisher	SciTePress
Pages	106-118
Number of pages	13
ISBN (Electronic)	9789897582134
DOIs	https://doi.org/10.5220/0006145201060118
State	Published - 2017
Event	10th International Conference on Health Informatics, HEALTHINF 2017 - Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017 - Porto, Portugal Duration: Feb 21 2017 → Feb 23 2017

Publication series

Name	HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017
Volume	5

Other

Other	10th International Conference on Health Informatics, HEALTHINF 2017 - Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017
Country/Territory	Portugal
City	Porto
Period	2/21/17 → 2/23/17

Keywords

Drug Repositioning
Machine Learning
Patient-Reported Outcomes
Social Media

ASJC Scopus subject areas

Biomedical Engineering
Electrical and Electronic Engineering
Health Informatics
Health Information Management

Access to Document

10.5220/0006145201060118

Cite this

Ru, B., Warner-Hillard, C., Ge, Y., & Yao, L. (2017). Identifying serendipitous drug usages in patient forum data: a feasibility study. In E. L. van den Broek, A. Fred, H. Gamboa, & M. Vaz (Eds.), HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017 (pp. 106-118). (HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017; Vol. 5). SciTePress. https://doi.org/10.5220/0006145201060118

@inproceedings{e557ecb17fb94b26b2c4584f0c2855ea,

title = "Identifying serendipitous drug usages in patient forum data: a feasibility study",

keywords = "Drug Repositioning, Machine Learning, Patient-Reported Outcomes, Social Media",

author = "Boshu Ru and Charles Warner-Hillard and Yong Ge and Lixia Yao",

note = "Publisher Copyright: Copyright {\textcopyright} 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.; 10th International Conference on Health Informatics, HEALTHINF 2017 - Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017 ; Conference date: 21-02-2017 Through 23-02-2017",

year = "2017",

doi = "10.5220/0006145201060118",

language = "English (US)",

series = "HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017",

publisher = "SciTePress",

pages = "106--118",

editor = "{van den Broek}, {Egon L.} and Ana Fred and Hugo Gamboa and Mario Vaz",

booktitle = "HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017",

}

TY - GEN

T1 - Identifying serendipitous drug usages in patient forum data: a feasibility study

AU - Ru, Boshu

AU - Warner-Hillard, Charles

AU - Ge, Yong

AU - Yao, Lixia

PY - 2017

Y1 - 2017

KW - Drug Repositioning

KW - Machine Learning

KW - Patient-Reported Outcomes

KW - Social Media

UR - http://www.scopus.com/inward/record.url?scp=85051707348&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051707348&partnerID=8YFLogxK

U2 - 10.5220/0006145201060118

DO - 10.5220/0006145201060118

M3 - Conference contribution

AN - SCOPUS:85051707348

T3 - HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017

SP - 106

EP - 118

BT - HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017

A2 - van den Broek, Egon L.

A2 - Fred, Ana

A2 - Gamboa, Hugo

A2 - Vaz, Mario

PB - SciTePress

T2 - 10th International Conference on Health Informatics, HEALTHINF 2017 - Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017

Y2 - 21 February 2017 through 23 February 2017

ER -
