Identifying serendipitous drug usages in patient forum data: a feasibility study

Boshu Ru, Charles Warner-Hillard, Yong Ge, Lixia Yao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

Compared with developing new drugs, drug repositioning reduces safety risk and development cost. Computational approaches have examined biological, chemical, literature, and electronic health record data for systematic drug repositioning. In this work, we built a complete computational pipeline to investigate the feasibility of mining a new data source, the fast-growing body of online patient forum data, for identifying and verifying drug-repositioning hypotheses. We curated a gold-standard dataset based on filtered drug reviews from WebMD. Among 15,714 sentences, 447 mentioned novel desirable drug usages that were not listed as known drug indications by WebMD and thus were defined as serendipitous drug usages. We then constructed 347 features using text-mining methods and drug knowledge. Finally, we built SVM, random forest, and AdaBoost.M1 classifiers and evaluated their classification performance. Our best model achieved an AUC score of 0.937 on the independent test dataset, with precision equal to 0.811 and recall equal to 0.476. It successfully predicted serendipitous drug usages, including metformin and bupropion for obesity, tramadol for depression, and ondansetron for irritable bowel syndrome with diarrhea. Machine learning methods make this new data source feasible for studying drug repositioning. Our future efforts include constructing more informative features, developing more effective methods to handle imbalanced data, and verifying prediction results using other existing methods.
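
To put the reported figures in context, the short sketch below computes the share of positive sentences in the gold-standard dataset and the F1 score implied by the reported precision and recall. The input numbers are taken from the abstract; the F1 value is derived here for illustration only and is not a figure reported by the authors, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
total_sentences = 15_714
serendipitous_sentences = 447
precision = 0.811
recall = 0.476

# Share of positive (serendipitous-usage) sentences in the gold standard.
positive_rate = serendipitous_sentences / total_sentences

# F1 is the harmonic mean of precision and recall. It is derived here for
# illustration and is not a figure reported in the abstract.
f1 = 2 * precision * recall / (precision + recall)

print(f"positive rate: {positive_rate:.2%}")
print(f"derived F1:    {f1:.3f}")
```

Under these inputs, positives make up about 2.8% of the sentences, which illustrates the class imbalance the abstract names as a target for future work, and the derived F1 is roughly 0.60.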

Original language: English (US)
Title of host publication: HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017
Editors: Hugo Gamboa, Ana Fred, Egon L. van den Broek, Mario Vaz
Publisher: SciTePress
Pages: 106-118
Number of pages: 13
Volume: 5
ISBN (Electronic): 9789897582134
State: Published - Jan 1 2017
Event: 10th International Conference on Health Informatics, HEALTHINF 2017 - Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017 - Porto, Portugal
Duration: Feb 21 2017 to Feb 23 2017

Other

Other: 10th International Conference on Health Informatics, HEALTHINF 2017 - Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017
Country: Portugal
City: Porto
Period: 2/21/17 to 2/23/17

Fingerprint

Adaptive boosting
Feasibility Studies
Drug Repositioning
Learning systems
Classifiers
Pipelines
Gold
Health
Pharmaceutical Preparations
Costs
Information Storage and Retrieval
Bupropion
Tramadol
Ondansetron
Data Mining
Irritable Bowel Syndrome
Electronic Health Records
Metformin
Area Under Curve
Diarrhea

Keywords

  • Drug Repositioning
  • Machine Learning
  • Patient-Reported Outcomes
  • Social Media

ASJC Scopus subject areas

  • Biomedical Engineering
  • Electrical and Electronic Engineering
  • Health Informatics
  • Health Information Management

Cite this

Ru, B., Warner-Hillard, C., Ge, Y., & Yao, L. (2017). Identifying serendipitous drug usages in patient forum data: a feasibility study. In H. Gamboa, A. Fred, E. L. van den Broek, & M. Vaz (Eds.), HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017 (Vol. 5, pp. 106-118). SciTePress.

@inproceedings{e557ecb17fb94b26b2c4584f0c2855ea,
title = "Identifying serendipitous drug usages in patient forum data: a feasibility study",
keywords = "Drug Repositioning, Machine Learning, Patient-Reported Outcomes, Social Media",
author = "Boshu Ru and Charles Warner-Hillard and Yong Ge and Lixia Yao",
year = "2017",
month = "1",
day = "1",
language = "English (US)",
volume = "5",
pages = "106--118",
editor = "Hugo Gamboa and Ana Fred and {van den Broek}, {Egon L.} and Mario Vaz",
booktitle = "HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017",
publisher = "SciTePress",

}

TY - GEN

T1 - Identifying serendipitous drug usages in patient forum data: a feasibility study

AU - Ru, Boshu

AU - Warner-Hillard, Charles

AU - Ge, Yong

AU - Yao, Lixia

PY - 2017/1/1

Y1 - 2017/1/1

KW - Drug Repositioning

KW - Machine Learning

KW - Patient-Reported Outcomes

KW - Social Media

UR - http://www.scopus.com/inward/record.url?scp=85051707348&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051707348&partnerID=8YFLogxK

M3 - Conference contribution

VL - 5

SP - 106

EP - 118

BT - HEALTHINF 2017 - 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017

A2 - Gamboa, Hugo

A2 - Fred, Ana

A2 - van den Broek, Egon L.

A2 - Vaz, Mario

PB - SciTePress

ER -