An integrative computational approach to identify disease-specific networks from PubMed literature information

Yuji Zhang, Dingchen Li, Cui Tao, Feichen Shen, Hongfang D Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

A large number of associations among biological entities (e.g., diseases, drugs, and genes) are scattered across the biomedical literature. Extracting and analyzing such heterogeneous data remains a challenging task for most researchers in the biomedical field. Natural language processing (NLP) has the potential to extract associations among biological entities from the literature. However, association information extracted through NLP can be large, noisy, and redundant, which makes it difficult for biomedical researchers to use. To address this challenge, we propose a computational framework to facilitate the use of NLP results. We apply Latent Dirichlet Allocation (LDA) to discover topics based on associations. The network extracted from each topic provides a disease-specific network for downstream bioinformatics analysis of the associations in that topic. We illustrate the framework by constructing disease-specific networks from Semantic MEDLINE, an NLP-generated association database, and then analyzing network properties such as hub nodes and degree distribution. The results demonstrate that (1) the LDA-based approach can group related diseases into the same disease topic; and (2) the disease-specific association network follows the scale-free network property, with hub nodes enriched in related diseases, genes, and drugs.
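The network-analysis step mentioned in the abstract (degree distribution and hub nodes) can be illustrated with a minimal sketch. This is not the authors' implementation: the triples below are hypothetical toy data rather than Semantic MEDLINE output, the function names are invented for illustration, and the LDA topic-assignment step is assumed to have already selected the associations for one topic.

```python
from collections import Counter, defaultdict

# Hypothetical toy associations (subject, predicate, object) for one topic.
# These are illustrative only and are NOT taken from Semantic MEDLINE.
TRIPLES = [
    ("metformin", "TREATS", "type 2 diabetes"),
    ("insulin", "TREATS", "type 2 diabetes"),
    ("TCF7L2", "ASSOCIATED_WITH", "type 2 diabetes"),
    ("type 2 diabetes", "COEXISTS_WITH", "obesity"),
    ("LEP", "ASSOCIATED_WITH", "obesity"),
    ("metformin", "TREATS", "obesity"),
]


def build_network(triples):
    """Build an undirected adjacency map; predicates and duplicate edges are ignored."""
    adjacency = defaultdict(set)
    for subject, _predicate, obj in triples:
        if subject != obj:  # skip self-loops
            adjacency[subject].add(obj)
            adjacency[obj].add(subject)
    return adjacency


def node_degrees(adjacency):
    """Return the number of distinct neighbors of each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def degree_distribution(degrees):
    """Map each degree k to the number of nodes that have exactly k neighbors."""
    return dict(sorted(Counter(degrees.values()).items()))


def hub_nodes(degrees, top_n=2):
    """Return the top_n nodes by degree, breaking ties alphabetically."""
    ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    degrees = node_degrees(build_network(TRIPLES))
    print("degree distribution:", degree_distribution(degrees))
    print("hub nodes:", hub_nodes(degrees))
```

On a network this small the distribution says nothing about scale-free behavior; checking that property would require a much larger network and a fit of the degree distribution against a power law.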

Original language: English (US)
Title of host publication: Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
Pages: 72-75
Number of pages: 4
DOIs: https://doi.org/10.1109/BIBM.2013.6732738
State: Published - 2013
Event: 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013 - Shanghai, China
Duration: Dec 18 2013 to Dec 21 2013

Other

Other: 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
Country: China
City: Shanghai
Period: 12/18/13 to 12/21/13

Fingerprint

Processing
Genes
Information use
Complex networks
Bioinformatics
Semantics

Keywords

  • Disease-specific network
  • Latent Dirichlet Allocation
  • Network Analysis
  • Semantic MEDLINE

ASJC Scopus subject areas

  • Biomedical Engineering

Cite this

Zhang, Y., Li, D., Tao, C., Shen, F., & Liu, H. D. (2013). An integrative computational approach to identify disease-specific networks from PubMed literature information. In Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013 (pp. 72-75). [6732738] https://doi.org/10.1109/BIBM.2013.6732738

An integrative computational approach to identify disease-specific networks from PubMed literature information. / Zhang, Yuji; Li, Dingchen; Tao, Cui; Shen, Feichen; Liu, Hongfang D.

Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013. 2013. p. 72-75 6732738.

Zhang, Y, Li, D, Tao, C, Shen, F & Liu, HD 2013, An integrative computational approach to identify disease-specific networks from PubMed literature information. in Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013., 6732738, pp. 72-75, 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013, Shanghai, China, 12/18/13. https://doi.org/10.1109/BIBM.2013.6732738
Zhang Y, Li D, Tao C, Shen F, Liu HD. An integrative computational approach to identify disease-specific networks from PubMed literature information. In Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013. 2013. p. 72-75. 6732738 https://doi.org/10.1109/BIBM.2013.6732738
Zhang, Yuji ; Li, Dingchen ; Tao, Cui ; Shen, Feichen ; Liu, Hongfang D. / An integrative computational approach to identify disease-specific networks from PubMed literature information. Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013. 2013. pp. 72-75
@inproceedings{916807a76f824965a3708914c7200500,
title = "An integrative computational approach to identify disease-specific networks from PubMed literature information",
abstract = "A huge amount of association relationships among biological entities (e.g., diseases, drugs, and genes) are scattered in biomedical literature. How to extract and analyze such heterogeneous data still remains a challenging task for most researchers in the biomedical field. Natural language processing (NLP) has the potential in extracting associations among biological entities from literature. However, association information extracted through NLP can be large, noisy, and redundant which poses significant challenges to biomedical researchers to use such information. To address this challenge, we propose a computational framework to facilitate the use of NLP results. We apply Latent Dirichlet Allocation (LDA) to discover topics based on associations. The networks extracted from each topic provide a disease-specific network for downstream bioinformatics analysis of associations for each topic. We illustrated the framework through the construction of disease-specific networks from Semantic MEDLINE, an NLP-generated association database, followed by the analysis of network properties, such as hub nodes and degree distribution. The results demonstrate that (1) LDA-based approach can group related diseases into the same disease topic; (2) the disease-specific association network follows the scale-free network property, in which hub nodes are enriched in related diseases, genes and drugs.",
keywords = "Disease-specific network, Latent Dirichlet Allocation, Network Analysis, Semantic MEDLINE",
author = "Yuji Zhang and Dingchen Li and Cui Tao and Feichen Shen and Liu, {Hongfang D}",
year = "2013",
doi = "10.1109/BIBM.2013.6732738",
language = "English (US)",
isbn = "9781479913091",
pages = "72--75",
booktitle = "Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013",

}

TY - GEN

T1 - An integrative computational approach to identify disease-specific networks from PubMed literature information

AU - Zhang, Yuji

AU - Li, Dingchen

AU - Tao, Cui

AU - Shen, Feichen

AU - Liu, Hongfang D

PY - 2013

Y1 - 2013

N2 - A huge amount of association relationships among biological entities (e.g., diseases, drugs, and genes) are scattered in biomedical literature. How to extract and analyze such heterogeneous data still remains a challenging task for most researchers in the biomedical field. Natural language processing (NLP) has the potential in extracting associations among biological entities from literature. However, association information extracted through NLP can be large, noisy, and redundant which poses significant challenges to biomedical researchers to use such information. To address this challenge, we propose a computational framework to facilitate the use of NLP results. We apply Latent Dirichlet Allocation (LDA) to discover topics based on associations. The networks extracted from each topic provide a disease-specific network for downstream bioinformatics analysis of associations for each topic. We illustrated the framework through the construction of disease-specific networks from Semantic MEDLINE, an NLP-generated association database, followed by the analysis of network properties, such as hub nodes and degree distribution. The results demonstrate that (1) LDA-based approach can group related diseases into the same disease topic; (2) the disease-specific association network follows the scale-free network property, in which hub nodes are enriched in related diseases, genes and drugs.

KW - Disease-specific network

KW - Latent Dirichlet Allocation

KW - Network Analysis

KW - Semantic MEDLINE

UR - http://www.scopus.com/inward/record.url?scp=84894516031&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84894516031&partnerID=8YFLogxK

U2 - 10.1109/BIBM.2013.6732738

DO - 10.1109/BIBM.2013.6732738

M3 - Conference contribution

SN - 9781479913091

SP - 72

EP - 75

BT - Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013

ER -