Grammatical evolution decision trees for detecting gene-gene interactions

Alison A. Motsinger-Reif, Sushamna Deodhar, Stacey J Winham, Nicholas E. Hardison

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

Background: A fundamental goal of human genetics is the discovery of polymorphisms that predict common, complex diseases. It is hypothesized that complex diseases are due to a myriad of factors including environmental exposures and complex genetic risk models, including gene-gene interactions. Such epistatic models present an important analytical challenge, requiring that methods perform not only statistical modeling, but also variable selection to generate testable genetic model hypotheses. This challenge is amplified by recent advances in genotyping technology, as the number of potential predictor variables is rapidly increasing.

Methods: Decision trees are a highly successful, easily interpretable data-mining method that is typically optimized with a hierarchical model building approach, which limits its potential to identify interacting effects. To overcome this limitation, we utilize evolutionary computation, specifically grammatical evolution, to build decision trees to detect and model gene-gene interactions. In the current study, we introduce the Grammatical Evolution Decision Trees (GEDT) method and software and evaluate this approach on simulated data representing gene-gene interaction models of a range of effect sizes. We compare the performance of the method to a traditional decision tree algorithm and a random search approach and demonstrate the improved performance of the method to detect purely epistatic interactions.

Results: The results of our simulations demonstrate that GEDT has high power to detect even very moderate genetic risk models. GEDT has high power to detect interactions with and without main effects.

Conclusions: GEDT, while still in its initial stages of development, is a promising new approach for identifying gene-gene interactions in genetic association studies.
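To make the Methods description more concrete, the following is a minimal sketch of how grammatical evolution can map integer genomes onto decision trees over SNP genotypes. The abstract does not specify the GEDT grammar, fitness function, or evolutionary parameters, so everything here is an illustrative assumption: the three-way-split grammar, the plain classification-accuracy fitness, the tournament selection, and the names `decode`, `predict`, `accuracy`, and `evolve` are all hypothetical and are not taken from the GEDT software.

```python
import random

# Hypothetical grammar (an assumption, not the published GEDT grammar):
#   <tree>  ::= <snp>(<tree>, <tree>, <tree>) | <class>
#   <snp>   ::= index of a SNP, 0 .. n_snps-1
#   <class> ::= 0 (control) | 1 (case)
# Genotypes are coded 0, 1, 2, so each internal node has three branches.

def decode(genome, n_snps, max_depth=3):
    """Map a list of integer codons to a tree (genotype-to-phenotype mapping)."""
    pos = 0

    def next_codon():
        nonlocal pos
        codon = genome[pos % len(genome)]  # wrap around when codons run out
        pos += 1
        return codon

    def build(depth):
        # Pick a production for <tree>; force a leaf once max_depth is reached.
        choice = 1 if depth >= max_depth else next_codon() % 2
        if choice == 1:
            return next_codon() % 2  # leaf: predicted class
        snp = next_codon() % n_snps
        return (snp, [build(depth + 1) for _ in range(3)])

    return build(0)

def predict(tree, genotypes):
    """Follow the branch matching each tested SNP's genotype until a leaf."""
    while not isinstance(tree, int):
        snp, children = tree
        tree = children[genotypes[snp]]
    return tree

def accuracy(tree, X, y):
    """Fraction of samples classified correctly (assumed fitness measure)."""
    return sum(predict(tree, x) == label for x, label in zip(X, y)) / len(y)

def evolve(X, y, n_snps, pop_size=50, genome_len=30, generations=30, seed=0):
    """Toy evolutionary loop: elitism, tournament selection, crossover, mutation."""
    rng = random.Random(seed)

    def fitness(genome):
        return accuracy(decode(genome, n_snps), X, y)

    pop = [[rng.randrange(256) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = [max(pop, key=fitness)]  # keep the best genome unchanged
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, genome_len)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(256) if rng.random() < 0.05 else c
                     for c in child]  # per-codon mutation
            new_pop.append(child)
        pop = new_pop
    best = max(pop, key=fitness)
    return decode(best, n_snps), fitness(best)

if __name__ == "__main__":
    # Toy data: every genotype combination of 3 SNPs; the label depends
    # jointly on SNP 0 and SNP 1, and SNP 2 is noise.
    X = [[a, b, c] for a in range(3) for b in range(3) for c in range(3)]
    y = [int((x[0] == 1) != (x[1] == 1)) for x in X]
    tree, acc = evolve(X, y, n_snps=3)
    print(tree, acc)
```

Because whole trees are generated and scored at once, a split on a SNP does not have to look useful on its own to be retained, which is the contrast with hierarchical tree building that the abstract draws. The depth limit is one simple way to guarantee that the mapping terminates when the genome wraps around.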

Original language: English (US)
Article number: 8
Journal: BioData Mining
Volume: 3
Issue number: 1
DOI: 10.1186/1756-0381-3-8
State: Published - 2010
Externally published: Yes

Fingerprint

Grammatical Evolution
Decision Trees
Genes
Interaction
Genetic Models
High Power
Model
Genetic Association
Effect Size
Random Search
Data Mining
Statistical Modeling
Main Effect
Environmental Factors
Environmental Exposure
Tree Algorithms
Medical Genetics

ASJC Scopus subject areas

  • Genetics
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Grammatical evolution decision trees for detecting gene-gene interactions. / Motsinger-Reif, Alison A.; Deodhar, Sushamna; Winham, Stacey J; Hardison, Nicholas E.

In: BioData Mining, Vol. 3, No. 1, 8, 2010.

@article{43162f71b56f471eb28418169cabaa34,
title = "Grammatical evolution decision trees for detecting gene-gene interactions",
author = "Motsinger-Reif, {Alison A.} and Sushamna Deodhar and Winham, {Stacey J} and Hardison, {Nicholas E.}",
year = "2010",
doi = "10.1186/1756-0381-3-8",
language = "English (US)",
volume = "3",
journal = "BioData Mining",
issn = "1756-0381",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR
T1 - Grammatical evolution decision trees for detecting gene-gene interactions
AU - Motsinger-Reif, Alison A.
AU - Deodhar, Sushamna
AU - Winham, Stacey J
AU - Hardison, Nicholas E.
PY - 2010
Y1 - 2010
UR - http://www.scopus.com/inward/record.url?scp=78349264270&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78349264270&partnerID=8YFLogxK
U2 - 10.1186/1756-0381-3-8
DO - 10.1186/1756-0381-3-8
M3 - Article
C2 - 21087514
AN - SCOPUS:78349264270
VL - 3
JO - BioData Mining
JF - BioData Mining
SN - 1756-0381
IS - 1
M1 - 8
ER -