TY - JOUR
T1 - Grammatical evolution decision trees for detecting gene-gene interactions
AU - Motsinger-Reif, Alison A.
AU - Deodhar, Sushamna
AU - Winham, Stacey J.
AU - Hardison, Nicholas E.
N1 - Funding Information:
SJW and NEH are supported by training grants NIGMS T32GM081057 and NIEHS 2 T32 ES007329, respectively. An earlier version of this study and manuscript was presented at the 8th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics.
PY - 2010
Y1 - 2010
AB - Background: A fundamental goal of human genetics is the discovery of polymorphisms that predict common, complex diseases. Complex diseases are hypothesized to arise from a myriad of factors, including environmental exposures and complex genetic risk models such as gene-gene interactions. Such epistatic models present an important analytical challenge, requiring that methods perform not only statistical modeling but also variable selection to generate testable genetic model hypotheses. This challenge is amplified by recent advances in genotyping technology, as the number of potential predictor variables is rapidly increasing. Methods: Decision trees are a highly successful, easily interpretable data-mining method that is typically optimized with a hierarchical model-building approach, which limits its potential to identify interacting effects. To overcome this limitation, we use evolutionary computation, specifically grammatical evolution, to build decision trees that detect and model gene-gene interactions. In the current study, we introduce the Grammatical Evolution Decision Trees (GEDT) method and software and evaluate this approach on simulated data representing gene-gene interaction models across a range of effect sizes. We compare the performance of the method to a traditional decision tree algorithm and a random search approach and demonstrate its improved performance in detecting purely epistatic interactions. Results: Our simulations demonstrate that GEDT has high power to detect even very moderate genetic risk models, and to detect interactions both with and without main effects. Conclusions: GEDT, while still in its initial stages of development, is a promising new approach for identifying gene-gene interactions in genetic association studies.
UR - http://www.scopus.com/inward/record.url?scp=78349264270&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78349264270&partnerID=8YFLogxK
U2 - 10.1186/1756-0381-3-8
DO - 10.1186/1756-0381-3-8
M3 - Article
C2 - 21087514
AN - SCOPUS:78349264270
SN - 1756-0381
VL - 3
JO - BioData Mining
JF - BioData Mining
IS - 1
M1 - 8
ER -