TY - GEN
T1 - Identifying serendipitous drug usages in patient forum data: A feasibility study
AU - Ru, Boshu
AU - Warner-Hillard, Charles
AU - Ge, Yong
AU - Yao, Lixia
N1 - Publisher Copyright:
Copyright © 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
PY - 2017
Y1 - 2017
AB - Drug repositioning reduces safety risk and development cost compared to developing new drugs. Computational approaches have examined biological, chemical, literature, and electronic health record data for systematic drug repositioning. In this work, we built an entire computational pipeline to investigate the feasibility of mining a new data source, the fast-growing online patient forum data, for identifying and verifying drug-repositioning hypotheses. We curated a gold-standard dataset based on filtered drug reviews from WebMD. Among 15,714 sentences, 447 mentioned novel desirable drug usages that were not listed as known drug indications by WebMD and thus were defined as serendipitous drug usages. We then constructed 347 features using text-mining methods and drug knowledge. Finally, we built SVM, random forest, and AdaBoost.M1 classifiers and evaluated their classification performance. Our best model achieved an AUC score of 0.937 on the independent test dataset, with a precision of 0.811 and a recall of 0.476. It successfully predicted serendipitous drug usages, including metformin and bupropion for obesity, tramadol for depression, and ondansetron for irritable bowel syndrome with diarrhea. Machine learning methods make this new data source feasible for studying drug repositioning. Our future efforts include constructing more informative features, developing more effective methods to handle imbalanced data, and verifying prediction results using other existing methods.
KW - Drug Repositioning
KW - Machine Learning
KW - Patient-Reported Outcomes
KW - Social Media
UR - http://www.scopus.com/inward/record.url?scp=85051707348&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051707348&partnerID=8YFLogxK
U2 - 10.5220/0006145201060118
DO - 10.5220/0006145201060118
M3 - Conference contribution
AN - SCOPUS:85051707348
T3 - HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017
SP - 106
EP - 118
BT - HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017
A2 - van den Broek, Egon L.
A2 - Fred, Ana
A2 - Gamboa, Hugo
A2 - Vaz, Mario
PB - SciTePress
T2 - 10th International Conference on Health Informatics, HEALTHINF 2017 - Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017
Y2 - 21 February 2017 through 23 February 2017
ER -