TY - GEN
T1 - Detecting Serendipitous Drug Usage in Social Media with Deep Neural Network Models
AU - Ru, Boshu
AU - Li, Dingcheng
AU - Yao, Lixia
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2019/1/21
Y1 - 2019/1/21
AB - Serendipitous drug usage refers to unexpected relief of comorbid diseases or symptoms when patients take a drug for another common or known indication. In the history of drug discovery, serendipity has contributed significantly to new and successful indications for many drugs. Our previous research has identified patient-reported serendipitous drug usage in social media. If such information could be computationally identified in social media, it could be helpful for generating and validating drug-repositioning hypotheses. In this study, we framed detection of serendipitous drug usage in social media as a binary classification problem and investigated deep neural network models as a solution. We constructed word-embedding features from drug-review posts in the patient forum of WebMD, using the word2vec algorithm. We adopted the convolutional neural network (CNN), long short-term memory network (LSTM), and convolutional long short-term memory network (CLSTM) and redesigned them by adding contextual information that we extracted from drug-review posts, information filtering tools, medical ontology, and medical knowledge. We trained, tuned, and evaluated our deep neural network models on a gold standard dataset containing 15,714 sentences, of which 447 contained serendipitous drug usages. Additionally, we compared our deep neural networks to support vector machine, random forest, and AdaBoost.M1 algorithms. The results showed that adding contextual information helped to reduce the false-positive rate of deep neural network models. In the presence of an extremely imbalanced dataset and limited instances of serendipitous drug usage, deep neural network models did not outperform other machine learning models with n-gram and context features. However, deep neural network models could more effectively utilize word embeddings in feature construction. This advantage made deep neural networks worthy of further investigation and improvement.
KW - Data mining
KW - drug discovery
KW - drug repurposing
KW - health informatics
KW - social media
UR - http://www.scopus.com/inward/record.url?scp=85062512189&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062512189&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2018.8621252
DO - 10.1109/BIBM.2018.8621252
M3 - Conference contribution
AN - SCOPUS:85062512189
T3 - Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
SP - 1083
EP - 1090
BT - Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
A2 - Schmidt, Harald
A2 - Griol, David
A2 - Wang, Haiying
A2 - Baumbach, Jan
A2 - Zheng, Huiru
A2 - Callejas, Zoraida
A2 - Hu, Xiaohua
A2 - Dickerson, Julie
A2 - Zhang, Le
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
Y2 - 3 December 2018 through 6 December 2018
ER -