TY - JOUR
T1 - Serendipity - A Machine-Learning Application for Mining Serendipitous Drug Usage from Social Media
AU - Ru, Boshu
AU - Li, Dingcheng
AU - Hu, Yueqi
AU - Yao, Lixia
N1 - Funding Information:
Manuscript received March 30, 2019; accepted March 30, 2019. Date of publication April 4, 2019; date of current version June 28, 2019. This work was supported by the National Library of Medicine under Grant 5K01LM012102. (Corresponding author: Lixia Yao.) B. Ru is with the Department of Software and Information Systems, University of North Carolina at Charlotte, Charlotte, NC 28223 USA. D. Li is with the Big Data Laboratory, Baidu USA, Inc., Bellevue, WA 98004 USA. Y. Hu is with the Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC 28223 USA. L. Yao is with the Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905 USA (e-mail: yao.lixia@mayo.edu). Digital Object Identifier 10.1109/TNB.2019.2909094
Publisher Copyright:
© 2002-2011 IEEE.
PY - 2019/7
Y1 - 2019/7
AB - Serendipitous drug usage refers to the unexpected relief of comorbid diseases or symptoms when taking medication for a different known indication. Historically, serendipity has contributed significantly to identifying many new drug indications. If patient-reported serendipitous drug usage in social media could be computationally identified, it could help generate and validate drug-repositioning hypotheses. We investigated deep neural network models for mining serendipitous drug usage from social media. We used the word2vec algorithm to construct word-embedding features from drug reviews posted in a WebMD patient forum. We adapted and redesigned the convolutional neural network, long short-term memory network, and convolutional long short-term memory network by adding contextual information extracted from drug-review posts, information-filtering tools, medical ontology, and medical knowledge. We trained, tuned, and evaluated our models with a gold-standard dataset of 15714 sentences (447 [2.8%] describing serendipitous drug usage). Additionally, we compared our deep neural networks to support vector machine, random forest, and AdaBoost.M1 algorithms. Context information helped to reduce the false-positive rate of deep neural network models. If we used an extremely imbalanced dataset with limited instances of serendipitous drug usage, deep neural network models did not outperform other machine-learning models with n-gram and context features. However, deep neural network models could more effectively use word embedding in feature construction, an advantage that makes them worthy of further investigation. Finally, we implemented natural-language processing and machine-learning methods in a web-based application to help scientists and software developers mine social media for serendipitous drug usage.
KW - Social media
KW - data mining
KW - drug discovery
KW - drug repurposing
KW - health informatics
UR - http://www.scopus.com/inward/record.url?scp=85064390104&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064390104&partnerID=8YFLogxK
U2 - 10.1109/TNB.2019.2909094
DO - 10.1109/TNB.2019.2909094
M3 - Article
C2 - 30951476
AN - SCOPUS:85064390104
SN - 1536-1241
VL - 18
SP - 324
EP - 334
JO - IEEE Transactions on Nanobioscience
JF - IEEE Transactions on Nanobioscience
IS - 3
M1 - 8681431
ER -