Identifying serendipitous drug usages in patient forum data: a feasibility study

Boshu Ru, Charles Warner-Hillard, Yong Ge, Lixia Yao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Scopus citations

Abstract

Compared with developing new drugs, drug repositioning reduces safety risk and development cost. Computational approaches have examined biological, chemical, literature, and electronic health record data for systematic drug repositioning. In this work, we built a complete computational pipeline to investigate the feasibility of mining a new data source, fast-growing online patient forum data, for identifying and verifying drug-repositioning hypotheses. We curated a gold-standard dataset based on filtered drug reviews from WebMD. Among 15,714 sentences, 447 mentioned novel desirable drug usages that were not listed as known drug indications by WebMD and thus were defined as serendipitous drug usages. We then constructed 347 features using text-mining methods and drug knowledge. Finally, we built SVM, random forest, and AdaBoost.M1 classifiers and evaluated their classification performance. Our best model achieved an AUC score of 0.937 on the independent test dataset, with precision equal to 0.811 and recall equal to 0.476. It successfully predicted serendipitous drug usages, including metformin and bupropion for obesity, tramadol for depression, and ondansetron for irritable bowel syndrome with diarrhea. Machine learning methods make this new data source feasible for studying drug repositioning. Our future efforts include constructing more informative features, developing more effective methods to handle imbalanced data, and verifying prediction results using other existing methods.
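The figures quoted in the abstract can be put in context with a little arithmetic. The sketch below is illustrative only: it uses the sentence counts, precision, and recall stated above, while the helper names (`positive_rate`, `negatives_per_positive`, `f1_score`) are hypothetical and not taken from the paper. The F1 score it derives is not reported in the abstract; it is simply the harmonic mean of the stated precision and recall, and it assumes both were measured on the same test set.

```python
# Figures quoted in the abstract.
TOTAL_SENTENCES = 15_714          # sentences in the gold-standard dataset
SERENDIPITOUS_SENTENCES = 447     # sentences labelled as serendipitous drug usage
PRECISION = 0.811                 # best model, independent test set
RECALL = 0.476                    # best model, independent test set


def positive_rate(positives: int, total: int) -> float:
    """Fraction of sentences that belong to the positive class."""
    return positives / total


def negatives_per_positive(positives: int, total: int) -> float:
    """Number of negative sentences for each positive one."""
    return (total - positives) / positives


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (a derived value, not from the paper)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rate = positive_rate(SERENDIPITOUS_SENTENCES, TOTAL_SENTENCES)
    ratio = negatives_per_positive(SERENDIPITOUS_SENTENCES, TOTAL_SENTENCES)
    f1 = f1_score(PRECISION, RECALL)
    print(f"Positive class share: {rate:.2%}")
    print(f"Negatives per positive: {ratio:.1f}")
    print(f"Derived F1: {f1:.3f}")
```

With roughly 2.8% of sentences in the positive class, the dataset is heavily imbalanced, which is consistent with the authors naming imbalanced-data handling as future work. The derived F1 of about 0.60 reflects a model that is fairly precise but misses around half of the positive sentences.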

Original language: English (US)
Title of host publication: HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017
Editors: Hugo Gamboa, Ana Fred, Egon L. van den Broek, Mario Vaz
Publisher: SciTePress
Pages: 106-118
Number of pages: 13
Volume: 5
ISBN (Electronic): 9789897582134
State: Published - Jan 1 2017
Event: 10th International Conference on Health Informatics, HEALTHINF 2017 - Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017 - Porto, Portugal
Duration: Feb 21 2017 to Feb 23 2017

Keywords

  • Drug Repositioning
  • Machine Learning
  • Patient-Reported Outcomes
  • Social Media

ASJC Scopus subject areas

  • Biomedical Engineering
  • Electrical and Electronic Engineering
  • Health Informatics
  • Health Information Management

Cite this

Ru, B., Warner-Hillard, C., Ge, Y., & Yao, L. (2017). Identifying serendipitous drug usages in patient forum data: a feasibility study. In H. Gamboa, A. Fred, E. L. van den Broek, & M. Vaz (Eds.), HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017 (Vol. 5, pp. 106-118). SciTePress.