RedundancyMiner

De-replication of redundant GO categories in microarray and proteomics analysis

Barry R. Zeeberg, Hongfang D Liu, Ari B. Kahn, Martin Ehler, Vinodh N. Rajapakse, Robert F. Bonner, Jacob D. Brown, Brian P. Brooks, Vladimir L. Larionov, William Reinhold, John N. Weinstein, Yves G. Pommier

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

Background: The Gene Ontology (GO) Consortium organizes genes into hierarchical categories based on biological process, molecular function and subcellular localization. Tools such as GoMiner can leverage GO to perform ontological analysis of microarray and proteomics studies, typically generating a list of significant functional categories. Two or more of the categories are often redundant, in the sense that identical or nearly-identical sets of genes map to the categories. The redundancy might typically inflate the report of significant categories by a factor of three-fold, create an illusion of an overly long list of significant categories, and obscure the relevant biological interpretation.

Results: We now introduce a new resource, RedundancyMiner, that de-replicates the redundant and nearly-redundant GO categories that had been determined by first running GoMiner. The main algorithm of RedundancyMiner, MultiClust, performs a novel form of cluster analysis in which a GO category might belong to several category clusters. Each category cluster follows a "complete linkage" paradigm. The metric is a similarity measure that captures the overlap in gene mapping between pairs of categories.

Conclusions: RedundancyMiner effectively eliminated redundancies from a set of GO categories. For illustration, we have applied it to the clarification of the results arising from two current studies: (1) assessment of the gene expression profiles obtained by laser capture microdissection (LCM) of serial cryosections of the retina at the site of final optic fissure closure in the mouse embryos at specific embryonic stages, and (2) analysis of a conceptual data set obtained by examining a list of genes deemed to be "kinetochore" genes.
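The abstract does not spell out the similarity measure or the details of MultiClust, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a Jaccard index as the overlap metric and reads "complete linkage" as "every pair of categories in a cluster must meet the similarity threshold", which amounts to enumerating maximal cliques in a thresholded similarity graph. The function names, the threshold, and the toy category and gene names are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two gene sets as |A & B| / |A | B| (an assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_clusters(categories, threshold):
    """Group categories so that EVERY pair inside a group has
    similarity >= threshold (a complete-linkage style criterion).

    Groups are maximal, and one category may appear in several groups.
    Singleton groups are omitted because they carry no redundancy.
    """
    names = sorted(categories)
    # Build the thresholded similarity graph.
    adj = {name: set() for name in names}
    for x, y in combinations(names, 2):
        if jaccard(categories[x], categories[y]) >= threshold:
            adj[x].add(y)
            adj[y].add(x)

    clusters = []

    def expand(clique, candidates, excluded):
        # Bron-Kerbosch enumeration of maximal cliques (no pivoting).
        if not candidates and not excluded:
            if len(clique) > 1:
                clusters.append(sorted(clique))
            return
        for node in sorted(candidates):
            expand(clique | {node}, candidates & adj[node], excluded & adj[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(names), set())
    return sorted(clusters)


# Toy input: hypothetical categories mapped to hypothetical gene names.
toy_categories = {
    "cat_A": {"g1", "g2", "g3", "g4"},
    "cat_B": {"g1", "g2", "g3", "g4", "g5"},
    "cat_C": {"g2", "g3", "g4", "g5"},
    "cat_D": {"g9"},
}

toy_clusters = overlapping_clusters(toy_categories, threshold=0.75)
print(toy_clusters)
```

In this toy input, `cat_B` overlaps strongly with both `cat_A` and `cat_C`, while `cat_A` and `cat_C` overlap less with each other, so `cat_B` ends up in two clusters. That is the kind of multiple membership the abstract attributes to MultiClust; a conventional hierarchical clustering would force each category into exactly one cluster.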

Original language: English (US)
Article number: 52
Journal: BMC Bioinformatics
Volume: 12
DOI: https://doi.org/10.1186/1471-2105-12-52
State: Published - Feb 10 2011
Externally published: Yes

Fingerprint

Gene Ontology
Proteomics
Microarray Analysis
Microarrays
Microarray
Replication
Ontology
Genes
Gene
Cluster Category
Laser Capture Microdissection
Biological Phenomena
Kinetochores
Chromosome Mapping
Transcriptome
Running
Redundancy
Cluster Analysis
Retina
Microdissection

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology

Cite this

Zeeberg, B. R., Liu, H. D., Kahn, A. B., Ehler, M., Rajapakse, V. N., Bonner, R. F., ... Pommier, Y. G. (2011). RedundancyMiner: De-replication of redundant GO categories in microarray and proteomics analysis. BMC Bioinformatics, 12, [52]. https://doi.org/10.1186/1471-2105-12-52

@article{6e4af3ee24d142fa9afd11dcb4295b07,
title = "RedundancyMiner: De-replication of redundant GO categories in microarray and proteomics analysis",
author = "Zeeberg, {Barry R.} and Liu, {Hongfang D} and Kahn, {Ari B.} and Martin Ehler and Rajapakse, {Vinodh N.} and Bonner, {Robert F.} and Brown, {Jacob D.} and Brooks, {Brian P.} and Larionov, {Vladimir L.} and William Reinhold and Weinstein, {John N.} and Pommier, {Yves G.}",
year = "2011",
month = "2",
day = "10",
doi = "10.1186/1471-2105-12-52",
language = "English (US)",
volume = "12",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - RedundancyMiner

T2 - De-replication of redundant GO categories in microarray and proteomics analysis

AU - Zeeberg, Barry R.

AU - Liu, Hongfang D

AU - Kahn, Ari B.

AU - Ehler, Martin

AU - Rajapakse, Vinodh N.

AU - Bonner, Robert F.

AU - Brown, Jacob D.

AU - Brooks, Brian P.

AU - Larionov, Vladimir L.

AU - Reinhold, William

AU - Weinstein, John N.

AU - Pommier, Yves G.

PY - 2011/2/10

Y1 - 2011/2/10

UR - http://www.scopus.com/inward/record.url?scp=79751536704&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79751536704&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-12-52

DO - 10.1186/1471-2105-12-52

M3 - Article

VL - 12

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 52

ER -