A partially linear tree-based regression model for assessing complex joint gene-gene and gene-environment effects

Jinbo Chen, Kai Yu, Ann Hsing, Terry M Therneau

Research output: Contribution to journal › Article

22 Citations (Scopus)

Abstract

The success of genetic dissection of complex diseases may greatly benefit from judicious exploration of joint gene effects, which, in turn, critically depends on the power of statistical tools. Standard regression models are convenient for assessing main effects and low-order gene-gene interactions but not for exploring complex higher-order interactions. Tree-based methodology is an attractive alternative for disentangling possible interactions, but it has difficulty in modeling additive main effects. This work proposes a new class of semiparametric regression models, termed partially linear tree-based regression (PLTR) models, which exhibit the advantages of both generalized linear regression and tree models. A PLTR model quantifies joint effects of genes and other risk factors by a combination of linear main effects and a non-parametric tree structure. We propose an iterative algorithm to fit the PLTR model, and a unified resampling approach for identifying and testing the significance of the optimal "pruned" tree nested within the tree resultant from the fitting algorithm. Simulation studies showed that the resampling procedure maintained the correct type I error rate. We applied the PLTR model to assess the association between biliary stone risk and 53 single nucleotide polymorphisms (SNPs) in the inflammation pathway in a population-based case-control study. The analysis yielded an interesting parsimonious summary of the joint effect of all SNPs. The proposed model is also useful for exploring gene-environment interactions and has broad implications for applying the tree methodology to genetic epidemiology research.
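
The alternation between a linear part and a tree part described in the abstract can be illustrated with a small backfitting loop. The sketch below is not the authors' algorithm: it assumes a Gaussian response with an identity link, a single continuous covariate, binary genotype indicators, and a hand-rolled greedy tree grown to a fixed depth. Pruning and the resampling test are omitted, and the names `fit_pltr`, `grow_tree`, and `predict_tree` are hypothetical.

```python
# Illustrative sketch only: identity link, one covariate, tiny greedy tree.
# Not the published PLTR implementation; no pruning, no resampling test.

def mean(values):
    return sum(values) / len(values)

def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def grow_tree(rows, resid, depth):
    """Greedy regression tree on binary features.
    A leaf is a float (mean residual); a node is (feature, left, right)."""
    if depth == 0:
        return mean(resid)
    best = None
    for j in range(len(rows[0])):
        left = [i for i, g in enumerate(rows) if g[j] == 0]
        right = [i for i, g in enumerate(rows) if g[j] == 1]
        if not left or not right:
            continue  # feature is constant within this node
        cost = sse([resid[i] for i in left]) + sse([resid[i] for i in right])
        if best is None or cost < best[0]:
            best = (cost, j, left, right)
    if best is None:
        return mean(resid)
    _, j, left, right = best
    return (
        j,
        grow_tree([rows[i] for i in left], [resid[i] for i in left], depth - 1),
        grow_tree([rows[i] for i in right], [resid[i] for i in right], depth - 1),
    )

def predict_tree(tree, g):
    while isinstance(tree, tuple):
        j, left, right = tree
        tree = right if g[j] == 1 else left
    return tree

def fit_pltr(x, genes, y, depth=2, max_iter=500, tol=1e-12):
    """Alternate a least-squares slope update with a tree fit on the residuals."""
    beta, tree = 0.0, 0.0
    tree_fit = [0.0] * len(y)
    sxx = sum(xi * xi for xi in x)
    for _ in range(max_iter):
        # Linear step: regress (y - tree part) on x, no intercept
        # (the tree leaves absorb the intercept).
        new_beta = sum(xi * (yi - fi) for xi, yi, fi in zip(x, y, tree_fit)) / sxx
        # Tree step: fit the tree to what the linear part leaves unexplained.
        resid = [yi - new_beta * xi for xi, yi in zip(x, y)]
        tree = grow_tree(genes, resid, depth)
        tree_fit = [predict_tree(tree, g) for g in genes]
        converged = abs(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    return beta, tree

# Noise-free toy data: y = 2*x + 1 + 3*(g1 AND g2), with x correlated with genotype.
genes, x, y = [], [], []
for g1 in (0, 1):
    for g2 in (0, 1):
        for d in (-1.0, 0.0, 1.0):
            xi = g1 + g2 + d
            genes.append((g1, g2))
            x.append(xi)
            y.append(2.0 * xi + 1.0 + 3.0 * (g1 and g2))

beta, tree = fit_pltr(x, genes, y)
print(round(beta, 6), [round(predict_tree(tree, g), 6) for g in [(0, 0), (0, 1), (1, 0), (1, 1)]])
```

On this toy data the loop should recover a slope close to 2 and leaf values close to 1 for every genotype combination except the joint carrier cell, which sits near 4. A case-control analysis like the one in the abstract would need a generalized linear (for example logistic) linear step in place of the least-squares update used here.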

Original language: English (US)
Pages (from-to): 238-251
Number of pages: 14
Journal: Genetic Epidemiology
Volume: 31
Issue number: 3
DOI: 10.1002/gepi.20205
State: Published - Apr 2007
Externally published: Yes

Keywords

  • Gene-environment interaction
  • Gene-gene interaction
  • Generalized linear model
  • Partially linear
  • Tree model

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)

Cite this

A partially linear tree-based regression model for assessing complex joint gene-gene and gene-environment effects. / Chen, Jinbo; Yu, Kai; Hsing, Ann; Therneau, Terry M.

In: Genetic Epidemiology, Vol. 31, No. 3, 04.2007, p. 238-251.


@article{386cc6297b91442ea678f44e1de269f3,
title = "A partially linear tree-based regression model for assessing complex joint gene-gene and gene-environment effects",
abstract = "The success of genetic dissection of complex diseases may greatly benefit from judicious exploration of joint gene effects, which, in turn, critically depends on the power of statistical tools. Standard regression models are convenient for assessing main effects and low-order gene-gene interactions but not for exploring complex higher-order interactions. Tree-based methodology is an attractive alternative for disentangling possible interactions, but it has difficulty in modeling additive main effects. This work proposes a new class of semiparametric regression models, termed partially linear tree-based regression (PLTR) models, which exhibit the advantages of both generalized linear regression and tree models. A PLTR model quantifies joint effects of genes and other risk factors by a combination of linear main effects and a non-parametric tree structure. We propose an iterative algorithm to fit the PLTR model, and a unified resampling approach for identifying and testing the significance of the optimal {"}pruned{"} tree nested within the tree resultant from the fitting algorithm. Simulation studies showed that the resampling procedure maintained the correct type I error rate. We applied the PLTR model to assess the association between biliary stone risk and 53 single nucleotide polymorphisms (SNPs) in the inflammation pathway in a population-based case-control study. The analysis yielded an interesting parsimonious summary of the joint effect of all SNPs. The proposed model is also useful for exploring gene-environment interactions and has broad implications for applying the tree methodology to genetic epidemiology research.",
keywords = "Gene-environment interaction, Gene-gene interaction, Generalized linear model, Partially linear, Tree model",
author = "Chen, Jinbo and Yu, Kai and Hsing, Ann and Therneau, {Terry M}",
year = "2007",
month = "4",
doi = "10.1002/gepi.20205",
language = "English (US)",
volume = "31",
pages = "238--251",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "3",

}

TY - JOUR

T1 - A partially linear tree-based regression model for assessing complex joint gene-gene and gene-environment effects

AU - Chen, Jinbo

AU - Yu, Kai

AU - Hsing, Ann

AU - Therneau, Terry M

PY - 2007/4

Y1 - 2007/4

AB - The success of genetic dissection of complex diseases may greatly benefit from judicious exploration of joint gene effects, which, in turn, critically depends on the power of statistical tools. Standard regression models are convenient for assessing main effects and low-order gene-gene interactions but not for exploring complex higher-order interactions. Tree-based methodology is an attractive alternative for disentangling possible interactions, but it has difficulty in modeling additive main effects. This work proposes a new class of semiparametric regression models, termed partially linear tree-based regression (PLTR) models, which exhibit the advantages of both generalized linear regression and tree models. A PLTR model quantifies joint effects of genes and other risk factors by a combination of linear main effects and a non-parametric tree structure. We propose an iterative algorithm to fit the PLTR model, and a unified resampling approach for identifying and testing the significance of the optimal "pruned" tree nested within the tree resultant from the fitting algorithm. Simulation studies showed that the resampling procedure maintained the correct type I error rate. We applied the PLTR model to assess the association between biliary stone risk and 53 single nucleotide polymorphisms (SNPs) in the inflammation pathway in a population-based case-control study. The analysis yielded an interesting parsimonious summary of the joint effect of all SNPs. The proposed model is also useful for exploring gene-environment interactions and has broad implications for applying the tree methodology to genetic epidemiology research.

KW - Gene-environment interaction

KW - Gene-gene interaction

KW - Generalized linear model

KW - Partially linear

KW - Tree model

UR - http://www.scopus.com/inward/record.url?scp=33947672866&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33947672866&partnerID=8YFLogxK

U2 - 10.1002/gepi.20205

DO - 10.1002/gepi.20205

M3 - Article

VL - 31

SP - 238

EP - 251

JO - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

IS - 3

ER -