Serendipity—A Machine-Learning Application for Mining Serendipitous Drug Usage from Social Media

Boshu Ru, Dingcheng Li, Yueqi Hu, Lixia Yao

Research output: Contribution to journal > Article

Abstract

Serendipitous drug usage refers to the unexpected relief of comorbid diseases or symptoms when taking a medication for a different known indication. Historically, serendipity has contributed significantly to identifying many new drug indications. If patient-reported serendipitous drug usage in social media could be computationally identified, it could help generate and validate drug-repositioning hypotheses. We investigated deep neural network models for mining serendipitous drug usage from social media. We used the word2vec algorithm to construct word-embedding features from drug reviews posted in a WebMD patient forum. We adapted and redesigned the convolutional neural network, long short-term memory network, and convolutional long short-term memory network by adding contextual information extracted from drug-review posts, information-filtering tools, medical ontology, and medical knowledge. We trained, tuned, and evaluated our models with a gold-standard dataset of 15,714 sentences (447 [2.8%] describing serendipitous drug usage). Additionally, we compared our deep neural networks to support vector machine, random forest, and AdaBoost.M1 algorithms. Context information helped reduce the false-positive rate of deep neural network models. If we used an extremely imbalanced dataset with limited instances of serendipitous drug usage, deep neural network models did not outperform other machine-learning models with n-gram and context features. However, deep neural network models could more effectively use word embedding in feature construction, an advantage that makes them worthy of further investigation. Finally, we implemented natural-language processing and machine-learning methods in a web-based application to help scientists and software developers mine social media for serendipitous drug usage.
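
The class imbalance reported in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not the authors' evaluation code: the two sentence counts come from the abstract, while the `rates` helper and the "label everything negative" baseline are assumptions introduced here. It suggests why plain accuracy is a poor yardstick on such data and why measures such as recall and false-positive rate are more informative.

```python
# Counts taken from the abstract; everything else here is illustrative.
TOTAL_SENTENCES = 15_714
POSITIVE_SENTENCES = 447  # sentences describing serendipitous drug usage


def rates(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and false-positive rate."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


negatives = TOTAL_SENTENCES - POSITIVE_SENTENCES
prevalence = POSITIVE_SENTENCES / TOTAL_SENTENCES

# Hypothetical baseline that labels every sentence as negative.
all_negative = rates(tp=0, fp=0, fn=POSITIVE_SENTENCES, tn=negatives)

print(f"positive prevalence: {prevalence:.1%}")
print(f"all-negative accuracy: {all_negative['accuracy']:.1%}")
print(f"all-negative recall: {all_negative['recall']:.1%}")
```

Under these assumptions, the trivial baseline reaches roughly 97% accuracy while finding none of the 447 positive sentences.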

Original language: English (US)
Journal: IEEE Transactions on Nanobioscience
DOI: 10.1109/TNB.2019.2909094
State: Accepted/In press - Jan 1 2019

Fingerprint

Social Media
Learning systems
Neural Networks (Computer)
Pharmaceutical Preparations
Long-Term Memory
Information filtering
Short-Term Memory
Adaptive boosting
Drug Repositioning
Natural Language Processing
Support vector machines
Ontology
Machine Learning
Gold
Deep neural networks
Neural networks
Software
Processing

Keywords

  • Convolution
  • data mining
  • drug discovery
  • drug repurposing
  • Drugs
  • health informatics
  • Information filters
  • Machine learning
  • Neural networks
  • social media
  • Social networking (online)

ASJC Scopus subject areas

  • Biotechnology
  • Bioengineering
  • Medicine (miscellaneous)
  • Biomedical Engineering
  • Pharmaceutical Science
  • Computer Science Applications
  • Electrical and Electronic Engineering

Cite this

Serendipity—A Machine-Learning Application for Mining Serendipitous Drug Usage from Social Media. / Ru, Boshu; Li, Dingcheng; Hu, Yueqi; Yao, Lixia.

In: IEEE Transactions on Nanobioscience, 01.01.2019.

@article{05490083ae274031a39a8ba7bb14ffe7,
title = "Serendipity—A Machine-Learning Application for Mining Serendipitous Drug Usage from Social Media",
abstract = "Serendipitous drug usage refers to the unexpected relief of comorbid diseases or symptoms when taking a medication for a different known indication. Historically, serendipity has contributed significantly to identifying many new drug indications. If patient-reported serendipitous drug usage in social media could be computationally identified, it could help generate and validate drug-repositioning hypotheses. We investigated deep neural network models for mining serendipitous drug usage from social media. We used the word2vec algorithm to construct word-embedding features from drug reviews posted in a WebMD patient forum. We adapted and redesigned the convolutional neural network, long short-term memory network, and convolutional long short-term memory network by adding contextual information extracted from drug-review posts, information-filtering tools, medical ontology, and medical knowledge. We trained, tuned, and evaluated our models with a gold-standard dataset of 15,714 sentences (447 [2.8{\%}] describing serendipitous drug usage). Additionally, we compared our deep neural networks to support vector machine, random forest, and AdaBoost.M1 algorithms. Context information helped reduce the false-positive rate of deep neural network models. If we used an extremely imbalanced dataset with limited instances of serendipitous drug usage, deep neural network models did not outperform other machine-learning models with n-gram and context features. However, deep neural network models could more effectively use word embedding in feature construction, an advantage that makes them worthy of further investigation. Finally, we implemented natural-language processing and machine-learning methods in a web-based application to help scientists and software developers mine social media for serendipitous drug usage.",
keywords = "Convolution, data mining, drug discovery, drug repurposing, Drugs, health informatics, Information filters, Machine learning, Neural networks, social media, Social networking (online)",
author = "Boshu Ru and Dingcheng Li and Yueqi Hu and Lixia Yao",
year = "2019",
month = "1",
day = "1",
doi = "10.1109/TNB.2019.2909094",
language = "English (US)",
journal = "IEEE Transactions on Nanobioscience",
issn = "1536-1241",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR
T1 - Serendipity—A Machine-Learning Application for Mining Serendipitous Drug Usage from Social Media
AU - Ru, Boshu
AU - Li, Dingcheng
AU - Hu, Yueqi
AU - Yao, Lixia
PY - 2019/1/1
Y1 - 2019/1/1
AB - Serendipitous drug usage refers to the unexpected relief of comorbid diseases or symptoms when taking a medication for a different known indication. Historically, serendipity has contributed significantly to identifying many new drug indications. If patient-reported serendipitous drug usage in social media could be computationally identified, it could help generate and validate drug-repositioning hypotheses. We investigated deep neural network models for mining serendipitous drug usage from social media. We used the word2vec algorithm to construct word-embedding features from drug reviews posted in a WebMD patient forum. We adapted and redesigned the convolutional neural network, long short-term memory network, and convolutional long short-term memory network by adding contextual information extracted from drug-review posts, information-filtering tools, medical ontology, and medical knowledge. We trained, tuned, and evaluated our models with a gold-standard dataset of 15,714 sentences (447 [2.8%] describing serendipitous drug usage). Additionally, we compared our deep neural networks to support vector machine, random forest, and AdaBoost.M1 algorithms. Context information helped reduce the false-positive rate of deep neural network models. If we used an extremely imbalanced dataset with limited instances of serendipitous drug usage, deep neural network models did not outperform other machine-learning models with n-gram and context features. However, deep neural network models could more effectively use word embedding in feature construction, an advantage that makes them worthy of further investigation. Finally, we implemented natural-language processing and machine-learning methods in a web-based application to help scientists and software developers mine social media for serendipitous drug usage.
KW - Convolution
KW - data mining
KW - drug discovery
KW - drug repurposing
KW - Drugs
KW - health informatics
KW - Information filters
KW - Machine learning
KW - Neural networks
KW - social media
KW - Social networking (online)
UR - http://www.scopus.com/inward/record.url?scp=85064390104&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064390104&partnerID=8YFLogxK
U2 - 10.1109/TNB.2019.2909094
DO - 10.1109/TNB.2019.2909094
M3 - Article
C2 - 30951476
AN - SCOPUS:85064390104
JO - IEEE Transactions on Nanobioscience
JF - IEEE Transactions on Nanobioscience
SN - 1536-1241
ER -