Differential co-expression network centrality and machine learning feature selection for identifying susceptibility hubs in networks with scale-free structure

Caleb A. Lareau; Bill C. White; Ann L. Oberg; Brett A. McKinney

doi:10.1186/s13040-015-0040-x

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

21 Scopus citations

Abstract

Background: Differential co-expression analysis of microarray data has yielded biological insights into group differences such as disease status. Additional understanding of group differences may be achieved by integrating the connectivity structure of the differential co-expression network with per-gene differential expression between phenotypic groups. Such a global differential co-expression network strategy may increase sensitivity to detect gene-gene interactions (or expression epistasis) that may act as candidates for rewiring susceptibility co-expression networks.

Methods: We test two methods for inferring Genetic Association Interaction Networks (GAIN) that incorporate both differential co-expression effects and differential expression effects: a generalized linear model (GLM) regression method with interaction effects (reGAIN) and a Fisher test method for correlation differences (dcGAIN). We rank the importance of each gene with complete interaction network centrality (CINC), which integrates each gene's differential co-expression effects in the GAIN model with that gene's individual differential expression measure. We compare these methods with the statistical learning methods Relief-F, Random Forests and Lasso. We also develop a mixture model and permutation approach for determining significant importance score thresholds for network centralities, Relief-F and Random Forests. We introduce a novel simulation strategy that generates microarray case-control data with embedded differential co-expression networks and an underlying correlation structure based on scale-free or Erdős-Rényi (ER) random networks.

Results: Using the network simulation strategy, we find that Relief-F and reGAIN provide the best balance between detecting interactions and main effects; reGAIN can additionally adjust for covariates and model quantitative traits. The dcGAIN approach, by design, performs best at finding differential co-expression effects but worst at finding main effects; it does not adjust for covariates and is limited to dichotomous outcomes. When the underlying network is scale-free rather than ER, all interaction network methods have greater power to find differential co-expression effects. We apply these methods to a public microarray study of the differential immune response to influenza vaccine, and we identify effects that suggest a role in influenza vaccine immune response for genes from the PI3K family, which includes genes with known immunodeficiency function, and for KLRG1, which is a known marker of senescence.
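
The Fisher test for correlation differences that the abstract names for dcGAIN can be illustrated with a minimal sketch. It assumes the standard Fisher z-transformation test for two independent Pearson correlations. The function names (`pearson`, `fisher_diff_z`, `two_sided_p`, `diff_coexpression_matrix`) and the dict-of-lists input layout are hypothetical and are not taken from the authors' software, which may differ in detail (for example in how main effects fill the diagonal of the GAIN matrix).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_diff_z(r_case, n_case, r_ctrl, n_ctrl):
    """Z statistic for H0: the case and control correlations are equal.

    Each correlation is Fisher-transformed with atanh; the difference is
    scaled by its standard error sqrt(1/(n_case-3) + 1/(n_ctrl-3)).
    """
    # Clip to keep atanh finite when a sample correlation is exactly +/-1.
    clip = lambda r: max(min(r, 0.999999), -0.999999)
    z_case = math.atanh(clip(r_case))
    z_ctrl = math.atanh(clip(r_ctrl))
    se = math.sqrt(1.0 / (n_case - 3) + 1.0 / (n_ctrl - 3))
    return (z_case - z_ctrl) / se


def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def diff_coexpression_matrix(cases, controls):
    """Pairwise differential-correlation Z statistics.

    cases, controls: dict mapping gene name -> list of expression values
    (hypothetical layout; each group needs more than 3 samples).
    Returns a nested dict z[g][h], symmetric in g and h, off-diagonal only.
    """
    genes = sorted(cases)
    n_case = len(cases[genes[0]])
    n_ctrl = len(controls[genes[0]])
    z = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            val = fisher_diff_z(
                pearson(cases[g], cases[h]), n_case,
                pearson(controls[g], controls[h]), n_ctrl,
            )
            z[g][h] = val
            z[h][g] = val
    return z


if __name__ == "__main__":
    # Toy data: genes A and B are tightly correlated in cases only.
    cases = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 11]}
    controls = {"A": [1, 2, 3, 4, 5], "B": [3, 1, 4, 1, 5]}
    z = diff_coexpression_matrix(cases, controls)
    print("Z(A,B) =", z["A"]["B"], "p =", two_sided_p(z["A"]["B"]))
```

A large absolute Z for a gene pair indicates that the pair's correlation differs between the two groups. Because the test compares two groups directly, a statistic of this kind applies only to dichotomous outcomes and has no place for covariates, which matches the limitation the abstract reports for dcGAIN.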

Original language	English (US)
Article number	5
Journal	BioData Mining
Volume	8
Issue number	1
DOIs	https://doi.org/10.1186/s13040-015-0040-x
State	Published - Jan 8 2015

ASJC Scopus subject areas

Biochemistry
Molecular Biology
Genetics
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{f9fe706348a14ec5a97697716068a874,
title = "Differential co-expression network centrality and machine learning feature selection for identifying susceptibility hubs in networks with scale-free structure",
author = "Lareau, {Caleb A.} and White, {Bill C.} and Oberg, {Ann L.} and McKinney, {Brett A.}",
note = "Publisher Copyright: {\textcopyright} 2015 Lareau et al.; licensee BioMed Central.",
year = "2015",
month = jan,
day = "8",
doi = "10.1186/s13040-015-0040-x",
language = "English (US)",
volume = "8",
journal = "BioData Mining",
issn = "1756-0381",
publisher = "BioMed Central",
number = "1",
}

TY - JOUR
T1 - Differential co-expression network centrality and machine learning feature selection for identifying susceptibility hubs in networks with scale-free structure
AU - Lareau, Caleb A.
AU - White, Bill C.
AU - Oberg, Ann L.
AU - McKinney, Brett A.
PY - 2015/1/8
Y1 - 2015/1/8
UR - http://www.scopus.com/inward/record.url?scp=84924144554&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84924144554&partnerID=8YFLogxK
U2 - 10.1186/s13040-015-0040-x
DO - 10.1186/s13040-015-0040-x
M3 - Article
AN - SCOPUS:84924144554
SN - 1756-0381
VL - 8
JO - BioData Mining
JF - BioData Mining
IS - 1
M1 - 5
ER -
