Knowledge-guided multi-scale independent component analysis for biomarker identification

Li Chen, Jianhua Xuan, Chen Wang, Le Ming Shih, Yue Wang, Zhen Zhang, Eric Hoffman, Robert Clarke

Research output: Contribution to journalArticle

18 Citations (Scopus)

Abstract

Background: Many statistical methods have been proposed to identify disease biomarkers from gene expression profiles. However, from gene expression profile data alone, statistical methods often fail to identify biologically meaningful biomarkers related to a specific disease under study. In this paper, we develop a novel strategy, namely knowledge-guided multi-scale independent component analysis (ICA), to first infer regulatory signals and then identify biologically relevant biomarkers from microarray data. Results: Since gene expression levels reflect the joint effect of several underlying biological functions, disease-specific biomarkers may be involved in several distinct biological functions. To identify disease-specific biomarkers that provide unique mechanistic insights, a meta-data "knowledge gene pool" (KGP) is first constructed from multiple data sources to provide important information on the likely functions (such as gene ontology information) and regulatory events (such as promoter responsive elements) associated with potential genes of interest. The gene expression and biological meta data associated with the members of the KGP can then be used to guide subsequent analysis. ICA is then applied to multi-scale gene clusters to reveal regulatory modes reflecting the underlying biological mechanisms. Finally disease-specific biomarkers are extracted by their weighted connectivity scores associated with the extracted regulatory modes. A statistical significance test is used to evaluate the significance of transcription factor enrichment for the extracted gene set based on motif information. We applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification. Conclusion: We have proposed a novel method, namely knowledge-guided multi-scale ICA, to identify disease-specific biomarkers. The goal is to infer knowledge-relevant regulatory signals and then identify corresponding biomarkers through a multi-scale strategy. The approach has been successfully applied to two expression profiling experiments to demonstrate its improved performance in extracting biologically meaningful and disease-related biomarkers. More importantly, the proposed approach shows promising results to infer novel biomarkers for ovarian cancer and extend current knowledge.

Original languageEnglish (US)
Article number416
JournalBMC Bioinformatics
Volume9
DOIs
StatePublished - Oct 6 2008
Externally publishedYes

Fingerprint

Multiscale Analysis
Independent component analysis
Biomarkers
Independent Component Analysis
Genes
Gene expression
Gene
Microarrays
Microarray Data
Ovarian Cancer
Gene Pool
Gene Expression Profile
Metadata
Transcriptome
Ovarian Neoplasms
Statistical method
Gene Expression
Statistical methods
Knowledge
Significance Test

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Medicine(all)
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Knowledge-guided multi-scale independent component analysis for biomarker identification. / Chen, Li; Xuan, Jianhua; Wang, Chen; Shih, Le Ming; Wang, Yue; Zhang, Zhen; Hoffman, Eric; Clarke, Robert.

In: BMC Bioinformatics, Vol. 9, 416, 06.10.2008.

Research output: Contribution to journalArticle

Chen, Li ; Xuan, Jianhua ; Wang, Chen ; Shih, Le Ming ; Wang, Yue ; Zhang, Zhen ; Hoffman, Eric ; Clarke, Robert. / Knowledge-guided multi-scale independent component analysis for biomarker identification. In: BMC Bioinformatics. 2008 ; Vol. 9.
@article{cce851be0b78407e8035aac5638a3957,
title = "Knowledge-guided multi-scale independent component analysis for biomarker identification",
abstract = "Background: Many statistical methods have been proposed to identify disease biomarkers from gene expression profiles. However, from gene expression profile data alone, statistical methods often fail to identify biologically meaningful biomarkers related to a specific disease under study. In this paper, we develop a novel strategy, namely knowledge-guided multi-scale independent component analysis (ICA), to first infer regulatory signals and then identify biologically relevant biomarkers from microarray data. Results: Since gene expression levels reflect the joint effect of several underlying biological functions, disease-specific biomarkers may be involved in several distinct biological functions. To identify disease-specific biomarkers that provide unique mechanistic insights, a meta-data {"}knowledge gene pool{"} (KGP) is first constructed from multiple data sources to provide important information on the likely functions (such as gene ontology information) and regulatory events (such as promoter responsive elements) associated with potential genes of interest. The gene expression and biological meta data associated with the members of the KGP can then be used to guide subsequent analysis. ICA is then applied to multi-scale gene clusters to reveal regulatory modes reflecting the underlying biological mechanisms. Finally disease-specific biomarkers are extracted by their weighted connectivity scores associated with the extracted regulatory modes. A statistical significance test is used to evaluate the significance of transcription factor enrichment for the extracted gene set based on motif information. We applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification. Conclusion: We have proposed a novel method, namely knowledge-guided multi-scale ICA, to identify disease-specific biomarkers. The goal is to infer knowledge-relevant regulatory signals and then identify corresponding biomarkers through a multi-scale strategy. The approach has been successfully applied to two expression profiling experiments to demonstrate its improved performance in extracting biologically meaningful and disease-related biomarkers. More importantly, the proposed approach shows promising results to infer novel biomarkers for ovarian cancer and extend current knowledge.",
author = "Li Chen and Jianhua Xuan and Chen Wang and Shih, {Le Ming} and Yue Wang and Zhen Zhang and Eric Hoffman and Robert Clarke",
year = "2008",
month = "10",
day = "6",
doi = "10.1186/1471-2105-9-416",
language = "English (US)",
volume = "9",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Knowledge-guided multi-scale independent component analysis for biomarker identification

AU - Chen, Li

AU - Xuan, Jianhua

AU - Wang, Chen

AU - Shih, Le Ming

AU - Wang, Yue

AU - Zhang, Zhen

AU - Hoffman, Eric

AU - Clarke, Robert

PY - 2008/10/6

Y1 - 2008/10/6

N2 - Background: Many statistical methods have been proposed to identify disease biomarkers from gene expression profiles. However, from gene expression profile data alone, statistical methods often fail to identify biologically meaningful biomarkers related to a specific disease under study. In this paper, we develop a novel strategy, namely knowledge-guided multi-scale independent component analysis (ICA), to first infer regulatory signals and then identify biologically relevant biomarkers from microarray data. Results: Since gene expression levels reflect the joint effect of several underlying biological functions, disease-specific biomarkers may be involved in several distinct biological functions. To identify disease-specific biomarkers that provide unique mechanistic insights, a meta-data "knowledge gene pool" (KGP) is first constructed from multiple data sources to provide important information on the likely functions (such as gene ontology information) and regulatory events (such as promoter responsive elements) associated with potential genes of interest. The gene expression and biological meta data associated with the members of the KGP can then be used to guide subsequent analysis. ICA is then applied to multi-scale gene clusters to reveal regulatory modes reflecting the underlying biological mechanisms. Finally disease-specific biomarkers are extracted by their weighted connectivity scores associated with the extracted regulatory modes. A statistical significance test is used to evaluate the significance of transcription factor enrichment for the extracted gene set based on motif information. We applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification. Conclusion: We have proposed a novel method, namely knowledge-guided multi-scale ICA, to identify disease-specific biomarkers. The goal is to infer knowledge-relevant regulatory signals and then identify corresponding biomarkers through a multi-scale strategy. The approach has been successfully applied to two expression profiling experiments to demonstrate its improved performance in extracting biologically meaningful and disease-related biomarkers. More importantly, the proposed approach shows promising results to infer novel biomarkers for ovarian cancer and extend current knowledge.

AB - Background: Many statistical methods have been proposed to identify disease biomarkers from gene expression profiles. However, from gene expression profile data alone, statistical methods often fail to identify biologically meaningful biomarkers related to a specific disease under study. In this paper, we develop a novel strategy, namely knowledge-guided multi-scale independent component analysis (ICA), to first infer regulatory signals and then identify biologically relevant biomarkers from microarray data. Results: Since gene expression levels reflect the joint effect of several underlying biological functions, disease-specific biomarkers may be involved in several distinct biological functions. To identify disease-specific biomarkers that provide unique mechanistic insights, a meta-data "knowledge gene pool" (KGP) is first constructed from multiple data sources to provide important information on the likely functions (such as gene ontology information) and regulatory events (such as promoter responsive elements) associated with potential genes of interest. The gene expression and biological meta data associated with the members of the KGP can then be used to guide subsequent analysis. ICA is then applied to multi-scale gene clusters to reveal regulatory modes reflecting the underlying biological mechanisms. Finally disease-specific biomarkers are extracted by their weighted connectivity scores associated with the extracted regulatory modes. A statistical significance test is used to evaluate the significance of transcription factor enrichment for the extracted gene set based on motif information. We applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification. Conclusion: We have proposed a novel method, namely knowledge-guided multi-scale ICA, to identify disease-specific biomarkers. The goal is to infer knowledge-relevant regulatory signals and then identify corresponding biomarkers through a multi-scale strategy. The approach has been successfully applied to two expression profiling experiments to demonstrate its improved performance in extracting biologically meaningful and disease-related biomarkers. More importantly, the proposed approach shows promising results to infer novel biomarkers for ovarian cancer and extend current knowledge.

UR - http://www.scopus.com/inward/record.url?scp=55249097170&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=55249097170&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-9-416

DO - 10.1186/1471-2105-9-416

M3 - Article

C2 - 18837990

AN - SCOPUS:55249097170

VL - 9

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 416

ER -