A rough set theory approach to the analysis of gene expression profiles

Joachim Petit, Nathalie Meurice, José Luis Medina-Franco, Gerald M. Maggiora

Research output: Chapter in Book/Report/Conference proceeding › Chapter

1 Citation (Scopus)

Abstract

Rough set theory (RST) is a set-based method that is well suited to a wide variety of discrete data. The goal of this preliminary study is to evaluate the suitability of RST for predicting biological endpoints in cells from their associated gene expression profiles. Such studies are the basis for identifying potential new targets that will ultimately be integrated with chemical information in drug-discovery research. In the present work, a small literature dataset was used to assess whether the gene expression profiles induced by 30 well-known drugs can predict whether human hepatoma HepG2 cells exhibit signs of phospholipidosis after treatment with those drugs. The data are cast in the form of a decision table (DT) whose rows correspond to the 30 drugs and whose columns correspond to the drug-induced expression levels of 17 genes, called condition attributes in RST, plus one column for the single decision attribute, which records whether or not the cells exhibit drug-induced phospholipidosis. The gene expression levels partition the drugs into equivalence classes, called indiscernibility classes in RST, such that no drug in a given class can be distinguished from any other drug in that class on the basis of its drug-induced gene expression levels. One strength of RST is that it provides a systematic, mathematically rigorous method for removing superfluous information. The remaining relevant information can then be expressed as simple linguistic rules that significantly enhance communication among scientists, especially those not conversant with RST. In this work, the RST approach allowed easy identification of the strongest relationships between drug-induced gene expression profiles and the occurrence or nonoccurrence of phospholipidosis in HepG2 cells. This study suggests that RST may be an efficient and effective tool for analyzing gene expression levels in small datasets. Future studies will examine the suitability of the RST approach for larger and more complex datasets.
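
The RST notions named in the abstract (decision table, indiscernibility classes, removal of superfluous attributes, linguistic rules) can be illustrated with a minimal Python sketch. This is not the authors' implementation or dataset: the drug names, gene names, discretized levels ("up", "flat"), and all function names below are hypothetical and chosen only to make the ideas concrete on a toy table.

```python
from collections import defaultdict

# Hypothetical toy decision table: drug -> (condition attributes, decision).
# All names and values are invented; they are NOT the chapter's data.
TABLE = {
    "drug_A": ({"gene1": "up",   "gene2": "up",   "gene3": "flat"}, "yes"),
    "drug_B": ({"gene1": "up",   "gene2": "up",   "gene3": "flat"}, "yes"),
    "drug_C": ({"gene1": "flat", "gene2": "up",   "gene3": "flat"}, "no"),
    "drug_D": ({"gene1": "flat", "gene2": "flat", "gene3": "up"},   "no"),
    "drug_E": ({"gene1": "up",   "gene2": "flat", "gene3": "up"},   "yes"),
    "drug_F": ({"gene1": "up",   "gene2": "flat", "gene3": "up"},   "no"),
}


def indiscernibility_classes(table, attrs):
    """Group drugs whose values agree on every attribute in attrs."""
    groups = defaultdict(set)
    for name, (conditions, _decision) in table.items():
        groups[tuple(conditions[a] for a in attrs)].add(name)
    return list(groups.values())


def approximations(table, attrs, decision_value):
    """Return (lower, upper) approximations of the drugs with this decision."""
    target = {n for n, (_c, d) in table.items() if d == decision_value}
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attrs):
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    return lower, upper


def positive_region(table, attrs):
    """Drugs whose decision is determined unambiguously by attrs."""
    decisions = {d for _c, d in table.values()}
    region = set()
    for d in decisions:
        region |= approximations(table, attrs, d)[0]
    return region


def superfluous_attributes(table, attrs):
    """Attributes whose individual removal leaves the positive region unchanged."""
    full = positive_region(table, attrs)
    return [a for a in attrs
            if positive_region(table, [b for b in attrs if b != a]) == full]


def certain_rules(table, attrs):
    """Emit one IF/THEN rule per indiscernibility class with a single decision."""
    rules = []
    for cls in indiscernibility_classes(table, attrs):
        decisions = {table[n][1] for n in cls}
        if len(decisions) == 1:
            member = next(iter(cls))
            conds = " AND ".join(f"{a}={table[member][0][a]}" for a in attrs)
            rules.append(f"IF {conds} THEN phospholipidosis={decisions.pop()}")
    return sorted(rules)


if __name__ == "__main__":
    genes = ["gene1", "gene2", "gene3"]
    print(indiscernibility_classes(TABLE, genes))
    print(approximations(TABLE, genes, "yes"))
    print(superfluous_attributes(TABLE, genes))
    for rule in certain_rules(TABLE, ["gene1", "gene2"]):
        print(rule)
```

In this toy table, drug_E and drug_F share the same expression profile but differ in outcome, so they fall in the upper but not the lower approximation of the "yes" set. The `superfluous_attributes` helper tests each attribute individually; an attribute it returns can be dropped on its own, but dropping several at once needs a separate check.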

Original language: English (US)
Title of host publication: Chemoinformatics for Drug Discovery
Publisher: Wiley
Pages: 51-83
Number of pages: 33
ISBN (Electronic): 9781118742785
ISBN (Print): 9781118139103
DOI: https://doi.org/10.1002/9781118742785.ch3
State: Published - Nov 15 2013

Fingerprint

Rough set theory
Gene expression
Pharmaceutical Preparations
Decision tables
Equivalence classes
Linguistics
Genes

Keywords

  • Association rules
  • Drug discovery
  • Drug-induced phospholipidosis
  • Gene expression profiles
  • Rough set theory

ASJC Scopus subject areas

  • Chemistry (all)

Cite this

Petit, J., Meurice, N., Medina-Franco, J. L., & Maggiora, G. M. (2013). A rough set theory approach to the analysis of gene expression profiles. In Chemoinformatics for Drug Discovery (pp. 51-83). Wiley. https://doi.org/10.1002/9781118742785.ch3
