Serendipity - A Machine-Learning Application for Mining Serendipitous Drug Usage from Social Media

Boshu Ru; Dingcheng Li; Yueqi Hu; Lixia Yao

doi:10.1109/TNB.2019.2909094

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Serendipitous drug usage refers to the unexpected relief of comorbid diseases or symptoms when taking medication for a different known indication. Historically, serendipity has contributed significantly to identifying many new drug indications. If patient-reported serendipitous drug usage in social media could be computationally identified, it could help generate and validate drug-repositioning hypotheses. We investigated deep neural network models for mining serendipitous drug usage from social media. We used the word2vec algorithm to construct word-embedding features from drug reviews posted in a WebMD patient forum. We adapted and redesigned the convolutional neural network, long short-term memory network, and convolutional long short-term memory network by adding contextual information extracted from drug-review posts, information-filtering tools, medical ontology, and medical knowledge. We trained, tuned, and evaluated our models with a gold-standard dataset of 15714 sentences (447 [2.8%] describing serendipitous drug usage). Additionally, we compared our deep neural networks to support vector machine, random forest, and AdaBoost.M1 algorithms. Context information helped to reduce the false-positive rate of deep neural network models. If we used an extremely imbalanced dataset with limited instances of serendipitous drug usage, deep neural network models did not outperform other machine-learning models with n-gram and context features. However, deep neural network models could more effectively use word embedding in feature construction, an advantage that makes them worthy of further investigation. Finally, we implemented natural-language processing and machine-learning methods in a web-based application to help scientists and software developers mine social media for serendipitous drug usage.
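
The class imbalance reported above can be made concrete with a small sketch. This is not the authors' evaluation code. The only values taken from the abstract are the two sentence counts; the `Confusion` type and the helper names are hypothetical and introduced only for illustration. The sketch shows why plain accuracy says little on this dataset: a classifier that never flags a sentence as serendipitous usage is right most of the time yet finds nothing.

```python
from dataclasses import dataclass

# Counts reported in the abstract.
TOTAL_SENTENCES = 15714
POSITIVE_SENTENCES = 447


@dataclass
class Confusion:
    """Hypothetical container for binary confusion-matrix counts."""
    tp: int
    fp: int
    fn: int
    tn: int


def all_negative_baseline(total: int, positives: int) -> Confusion:
    """Counts for a classifier that never predicts the positive class."""
    return Confusion(tp=0, fp=0, fn=positives, tn=total - positives)


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def recall(c: Confusion) -> float:
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def false_positive_rate(c: Confusion) -> float:
    denom = c.fp + c.tn
    return c.fp / denom if denom else 0.0


positive_rate = POSITIVE_SENTENCES / TOTAL_SENTENCES
baseline = all_negative_baseline(TOTAL_SENTENCES, POSITIVE_SENTENCES)

print(f"positive rate:     {positive_rate:.1%}")       # 2.8%
print(f"baseline accuracy: {accuracy(baseline):.1%}")  # 97.2%
print(f"baseline recall:   {recall(baseline):.1%}")    # 0.0%
```

Because the trivial baseline already reaches about 97% accuracy, measures such as recall and the false-positive rate are more informative for comparing models on data this skewed.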

Original language	English (US)
Article number	8681431
Pages (from-to)	324-334
Number of pages	11
Journal	IEEE Transactions on Nanobioscience
Volume	18
Issue number	3
DOIs	https://doi.org/10.1109/TNB.2019.2909094
State	Published - Jul 2019

Keywords

Social media
data mining
drug discovery
drug repurposing
health informatics

ASJC Scopus subject areas

Bioengineering
Electrical and Electronic Engineering
Biotechnology
Biomedical Engineering
Medicine (miscellaneous)
Computer Science Applications
Pharmaceutical Science

Cite this

@article{05490083ae274031a39a8ba7bb14ffe7,
title = "Serendipity - A Machine-Learning Application for Mining Serendipitous Drug Usage from Social Media",
abstract = "Serendipitous drug usage refers to the unexpected relief of comorbid diseases or symptoms when taking medication for a different known indication. Historically, serendipity has contributed significantly to identifying many new drug indications. If patient-reported serendipitous drug usage in social media could be computationally identified, it could help generate and validate drug-repositioning hypotheses. We investigated deep neural network models for mining serendipitous drug usage from social media. We used the word2vec algorithm to construct word-embedding features from drug reviews posted in a WebMD patient forum. We adapted and redesigned the convolutional neural network, long short-term memory network, and convolutional long short-term memory network by adding contextual information extracted from drug-review posts, information-filtering tools, medical ontology, and medical knowledge. We trained, tuned, and evaluated our models with a gold-standard dataset of 15714 sentences (447 [2.8%] describing serendipitous drug usage). Additionally, we compared our deep neural networks to support vector machine, random forest, and AdaBoost.M1 algorithms. Context information helped to reduce the false-positive rate of deep neural network models. If we used an extremely imbalanced dataset with limited instances of serendipitous drug usage, deep neural network models did not outperform other machine-learning models with n-gram and context features. However, deep neural network models could more effectively use word embedding in feature construction, an advantage that makes them worthy of further investigation. Finally, we implemented natural-language processing and machine-learning methods in a web-based application to help scientists and software developers mine social media for serendipitous drug usage.",

keywords = "Social media, data mining, drug discovery, drug repurposing, health informatics",
author = "Boshu Ru and Dingcheng Li and Yueqi Hu and Lixia Yao",
note = "Publisher Copyright: {\textcopyright} 2002-2011 IEEE.",
year = "2019",
month = jul,
doi = "10.1109/TNB.2019.2909094",
language = "English (US)",
volume = "18",
pages = "324--334",
journal = "IEEE Transactions on Nanobioscience",
issn = "1536-1241",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "3",
}

TY - JOUR
T1 - Serendipity - A Machine-Learning Application for Mining Serendipitous Drug Usage from Social Media
AU - Ru, Boshu
AU - Li, Dingcheng
AU - Hu, Yueqi
AU - Yao, Lixia
PY - 2019/7
Y1 - 2019/7
AB - Serendipitous drug usage refers to the unexpected relief of comorbid diseases or symptoms when taking medication for a different known indication. Historically, serendipity has contributed significantly to identifying many new drug indications. If patient-reported serendipitous drug usage in social media could be computationally identified, it could help generate and validate drug-repositioning hypotheses. We investigated deep neural network models for mining serendipitous drug usage from social media. We used the word2vec algorithm to construct word-embedding features from drug reviews posted in a WebMD patient forum. We adapted and redesigned the convolutional neural network, long short-term memory network, and convolutional long short-term memory network by adding contextual information extracted from drug-review posts, information-filtering tools, medical ontology, and medical knowledge. We trained, tuned, and evaluated our models with a gold-standard dataset of 15714 sentences (447 [2.8%] describing serendipitous drug usage). Additionally, we compared our deep neural networks to support vector machine, random forest, and AdaBoost.M1 algorithms. Context information helped to reduce the false-positive rate of deep neural network models. If we used an extremely imbalanced dataset with limited instances of serendipitous drug usage, deep neural network models did not outperform other machine-learning models with n-gram and context features. However, deep neural network models could more effectively use word embedding in feature construction, an advantage that makes them worthy of further investigation. Finally, we implemented natural-language processing and machine-learning methods in a web-based application to help scientists and software developers mine social media for serendipitous drug usage.

KW - Social media
KW - data mining
KW - drug discovery
KW - drug repurposing
KW - health informatics
UR - http://www.scopus.com/inward/record.url?scp=85064390104&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064390104&partnerID=8YFLogxK
U2 - 10.1109/TNB.2019.2909094
DO - 10.1109/TNB.2019.2909094
M3 - Article
C2 - 30951476
AN - SCOPUS:85064390104
SN - 1536-1241
VL - 18
SP - 324
EP - 334
JO - IEEE Transactions on Nanobioscience
JF - IEEE Transactions on Nanobioscience
IS - 3
M1 - 8681431
ER -
