Machine Learning helps Identify New Drug Mechanisms in Triple-Negative Breast Cancer

Arjun P. Athreya, Alan J. Gaglio, Junmei Cairns, Krishna R Kalari, Richard M Weinshilboum, Liewei M Wang, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

This paper demonstrates the ability of machine learning approaches to identify a few genes among the 23,398 genes of the human genome to experiment on in the laboratory to establish new drug mechanisms. As a case study, this work uses MDA-MB-231 breast cancer single cells treated with the antidiabetic drug metformin. We show that mixture-model-based unsupervised methods with validation from hierarchical clustering can identify single-cell subpopulations (clusters). These clusters are characterized by a small set of genes (1% of the genome) that have significant differential expression across the clusters and are also highly correlated with pathways with anticancer effects driven by metformin. Among the identified small set of genes associated with reduced breast cancer incidence, laboratory experiments on one of the genes, CDC42, showed that its downregulation by metformin inhibited cancer cell migration and proliferation, thus validating the ability of machine learning approaches to identify biologically relevant candidates for laboratory experiments. Given the large size of the human genome and limitations in cost and skilled resources, the broader impact of this work in identifying a small set of differentially expressed genes after drug treatment lies in augmenting the drug-disease knowledge of pharmacogenomics experts in laboratory investigations, which could help establish novel biological mechanisms associated with drug response in diseases beyond breast cancer.
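
The clustering idea in the abstract (mixture-model clustering of single cells, cross-checked against hierarchical clustering) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy one-dimensional expression values, the function names `fit_gmm_1d`, `agglomerative_1d`, and `rand_index`, the choice of average linkage, and the use of the Rand index as the agreement measure. The paper's actual models, data, and validation procedure may differ.

```python
import math


def fit_gmm_1d(xs, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture with EM; return hard labels."""
    n = len(xs)
    srt = sorted(xs)
    # Initialise the means at evenly spaced quantiles of the data.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    var0 = max(1e-6, sum((x - mu) ** 2 for x in xs) / n)
    variances = [var0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each cell.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                1e-6, sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            )
            weights[j] = nj / n
    return [max(range(k), key=lambda j: r[j]) for r in resp]


def agglomerative_1d(xs, k=2):
    """Naive average-linkage agglomerative clustering, cut at k clusters."""
    clusters = [[i] for i in range(len(xs))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = sum(
                    abs(xs[i] - xs[j]) for i in clusters[a] for j in clusters[b]
                ) / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    labels = [0] * len(xs)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def rand_index(a, b):
    """Fraction of cell pairs on which two labelings agree (label-order invariant)."""
    n = len(a)
    agree = sum(
        (a[i] == a[j]) == (b[i] == b[j]) for i in range(n) for j in range(i + 1, n)
    )
    return agree / (n * (n - 1) / 2)


# Hypothetical toy data: expression of one gene in 40 cells, two subpopulations.
expr = [2.0 + 0.05 * i for i in range(20)] + [8.0 + 0.05 * i for i in range(20)]
gmm_labels = fit_gmm_1d(expr)
hc_labels = agglomerative_1d(expr)
agreement = rand_index(gmm_labels, hc_labels)
print("Agreement between mixture model and hierarchical clustering:", agreement)
```

On real single-cell RNA-seq data the same comparison would run over many genes per cell, and a library implementation of both methods would normally replace these hand-written loops.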

Original language: English (US)
Journal: IEEE Transactions on Nanobioscience
DOI: 10.1109/TNB.2018.2851997
State: Accepted/In press - Jun 29 2018

Fingerprint

Triple Negative Breast Neoplasms
Learning systems
Genes
Metformin
Pharmaceutical Preparations
Human Genome
Breast Neoplasms
Pharmacogenetics
Hypoglycemic Agents
Cell Movement
Cluster Analysis
Drug therapy
Machine Learning
Down-Regulation
Experiments
Cell Proliferation
Genome
Costs and Cost Analysis
Incidence
Cells

Keywords

  • Bioinformatics
  • Breast cancer
  • Drugs
  • Gene expression
  • Genomics
  • metformin
  • mixture-models
  • model-based learning
  • Single-cell RNASeq
  • Unsupervised learning

ASJC Scopus subject areas

  • Biotechnology
  • Bioengineering
  • Medicine (miscellaneous)
  • Biomedical Engineering
  • Pharmaceutical Science
  • Computer Science Applications
  • Electrical and Electronic Engineering

Cite this

Machine Learning helps Identify New Drug Mechanisms in Triple-Negative Breast Cancer. / Athreya, Arjun P.; Gaglio, Alan J.; Cairns, Junmei; Kalari, Krishna R; Weinshilboum, Richard M; Wang, Liewei M; Kalbarczyk, Zbigniew T.; Iyer, Ravishankar K.

In: IEEE Transactions on Nanobioscience, 29.06.2018.

@article{89599fca99794760acf7c79ec2c539ba,
title = "Machine Learning helps Identify New Drug Mechanisms in Triple-Negative Breast Cancer",
abstract = "This paper demonstrates the ability of machine learning approaches to identify a few genes among the 23,398 genes of the human genome to experiment on in the laboratory to establish new drug mechanisms. As a case study, this work uses MDA-MB-231 breast cancer single cells treated with the antidiabetic drug metformin. We show that mixture-model-based unsupervised methods with validation from hierarchical clustering can identify single-cell subpopulations (clusters). These clusters are characterized by a small set of genes (1{\%} of the genome) that have significant differential expression across the clusters and are also highly correlated with pathways with anticancer effects driven by metformin. Among the identified small set of genes associated with reduced breast cancer incidence, laboratory experiments on one of the genes, CDC42, showed that its downregulation by metformin inhibited cancer cell migration and proliferation, thus validating the ability of machine learning approaches to identify biologically relevant candidates for laboratory experiments. Given the large size of the human genome and limitations in cost and skilled resources, the broader impact of this work in identifying a small set of differentially expressed genes after drug treatment lies in augmenting the drug-disease knowledge of pharmacogenomics experts in laboratory investigations, which could help establish novel biological mechanisms associated with drug response in diseases beyond breast cancer.",
keywords = "Bioinformatics, Breast cancer, Drugs, Gene expression, Genomics, metformin, mixture-models, model-based learning, Single-cell RNASeq, Unsupervised learning",
author = "Athreya, {Arjun P.} and Gaglio, {Alan J.} and Cairns, Junmei and Kalari, {Krishna R} and Weinshilboum, {Richard M} and Wang, {Liewei M} and Kalbarczyk, {Zbigniew T.} and Iyer, {Ravishankar K.}",
year = "2018",
month = "6",
day = "29",
doi = "10.1109/TNB.2018.2851997",
language = "English (US)",
journal = "IEEE Transactions on Nanobioscience",
issn = "1536-1241",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR

T1 - Machine Learning helps Identify New Drug Mechanisms in Triple-Negative Breast Cancer

AU - Athreya, Arjun P.

AU - Gaglio, Alan J.

AU - Cairns, Junmei

AU - Kalari, Krishna R

AU - Weinshilboum, Richard M

AU - Wang, Liewei M

AU - Kalbarczyk, Zbigniew T.

AU - Iyer, Ravishankar K.

PY - 2018/6/29

Y1 - 2018/6/29

N2 - This paper demonstrates the ability of machine learning approaches to identify a few genes among the 23,398 genes of the human genome to experiment on in the laboratory to establish new drug mechanisms. As a case study, this work uses MDA-MB-231 breast cancer single cells treated with the antidiabetic drug metformin. We show that mixture-model-based unsupervised methods with validation from hierarchical clustering can identify single-cell subpopulations (clusters). These clusters are characterized by a small set of genes (1% of the genome) that have significant differential expression across the clusters and are also highly correlated with pathways with anticancer effects driven by metformin. Among the identified small set of genes associated with reduced breast cancer incidence, laboratory experiments on one of the genes, CDC42, showed that its downregulation by metformin inhibited cancer cell migration and proliferation, thus validating the ability of machine learning approaches to identify biologically relevant candidates for laboratory experiments. Given the large size of the human genome and limitations in cost and skilled resources, the broader impact of this work in identifying a small set of differentially expressed genes after drug treatment lies in augmenting the drug-disease knowledge of pharmacogenomics experts in laboratory investigations, which could help establish novel biological mechanisms associated with drug response in diseases beyond breast cancer.

AB - This paper demonstrates the ability of machine learning approaches to identify a few genes among the 23,398 genes of the human genome to experiment on in the laboratory to establish new drug mechanisms. As a case study, this work uses MDA-MB-231 breast cancer single cells treated with the antidiabetic drug metformin. We show that mixture-model-based unsupervised methods with validation from hierarchical clustering can identify single-cell subpopulations (clusters). These clusters are characterized by a small set of genes (1% of the genome) that have significant differential expression across the clusters and are also highly correlated with pathways with anticancer effects driven by metformin. Among the identified small set of genes associated with reduced breast cancer incidence, laboratory experiments on one of the genes, CDC42, showed that its downregulation by metformin inhibited cancer cell migration and proliferation, thus validating the ability of machine learning approaches to identify biologically relevant candidates for laboratory experiments. Given the large size of the human genome and limitations in cost and skilled resources, the broader impact of this work in identifying a small set of differentially expressed genes after drug treatment lies in augmenting the drug-disease knowledge of pharmacogenomics experts in laboratory investigations, which could help establish novel biological mechanisms associated with drug response in diseases beyond breast cancer.

KW - Bioinformatics

KW - Breast cancer

KW - Drugs

KW - Gene expression

KW - Genomics

KW - metformin

KW - mixture-models

KW - model-based learning

KW - Single-cell RNASeq

KW - Unsupervised learning

UR - http://www.scopus.com/inward/record.url?scp=85049338133&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049338133&partnerID=8YFLogxK

U2 - 10.1109/TNB.2018.2851997

DO - 10.1109/TNB.2018.2851997

M3 - Article

C2 - 29994716

AN - SCOPUS:85049338133

JO - IEEE Transactions on Nanobioscience

JF - IEEE Transactions on Nanobioscience

SN - 1536-1241

ER -