Systematic identification of latent disease-gene associations from PubMed articles

Yuji Zhang, Feichen Shen, Majid Rastegar Mojarad, Dingcheng Li, Sijia Liu, Cui Tao, Yue Yu, Hongfang D Liu

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

Recent scientific advances have accumulated a tremendous amount of biomedical knowledge, providing novel insights into the relationship between molecular and cellular processes and diseases. Literature mining is one of the commonly used methods to retrieve and extract information from scientific publications for understanding these associations. However, because the data volume is large and the associations are complicated and noisy, interpreting such association data for semantic knowledge discovery is challenging. In this study, we describe an integrative computational framework aiming to expedite the discovery of latent disease mechanisms by dissecting 146,245 disease-gene associations from over 25 million PubMed-indexed articles. We take advantage of both Latent Dirichlet Allocation (LDA) modeling and network-based analysis for their capabilities of detecting latent associations and reducing noise in large-volume data, respectively. Our results demonstrate that (1) the LDA-based modeling is able to group similar diseases into disease topics; (2) the disease-specific association networks follow the scale-free network property; (3) certain subnetwork patterns were enriched in the disease-specific association networks; and (4) genes were enriched in topic-specific biological processes. Our approach offers promising opportunities for latent disease-gene knowledge discovery in biomedical research.
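As a rough illustration of the scale-free property mentioned in result (2), the sketch below computes the degree distribution of a small disease-gene association network and estimates the slope of that distribution on a log-log scale. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the toy association pairs, and the use of an ordinary least-squares fit are all hypothetical choices made for illustration. A log-log least-squares slope is only a crude heuristic; a negative slope is consistent with, but does not establish, a power-law degree distribution.

```python
import math
from collections import Counter


def degree_distribution(associations):
    """Map each degree to the number of nodes having that degree.

    `associations` is an iterable of (disease, gene) pairs; repeated
    pairs are counted once. Diseases and genes are kept in separate
    namespaces so identical labels cannot collide.
    """
    degree = Counter()
    for disease, gene in set(associations):
        degree[("disease", disease)] += 1
        degree[("gene", gene)] += 1
    return dict(Counter(degree.values()))


def loglog_slope(distribution):
    """Least-squares slope of log(node count) against log(degree)."""
    if len(distribution) < 2:
        raise ValueError("need at least two distinct degrees to fit a slope")
    xs = [math.log(k) for k in distribution]
    ys = [math.log(n) for n in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy associations, not data from the study.
toy_associations = [
    ("D1", "G1"), ("D1", "G2"), ("D1", "G3"), ("D1", "G4"),
    ("D2", "G1"), ("D3", "G1"), ("D4", "G5"),
]

toy_distribution = degree_distribution(toy_associations)
toy_slope = loglog_slope(toy_distribution)
```

In this toy network most nodes have degree 1 and a few hubs have higher degree, so the fitted slope is negative. On real data, a maximum-likelihood fit with a goodness-of-fit test would be a more defensible way to assess a power law than this regression.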

Original language: English (US)
Article number: e0191568
Journal: PLoS One
Volume: 13
Issue number: 1
DOI: https://doi.org/10.1371/journal.pone.0191568
State: Published - Jan 1 2018

Fingerprint

PubMed
Genes
genes
biomedical research
Data mining
Noise
Biological Phenomena
Complex networks
Genetic Association Studies
Semantics
Publications
Biomedical Research
methodology

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

Systematic identification of latent disease-gene associations from PubMed articles. / Zhang, Yuji; Shen, Feichen; Mojarad, Majid Rastegar; Li, Dingcheng; Liu, Sijia; Tao, Cui; Yu, Yue; Liu, Hongfang D.

In: PLoS One, Vol. 13, No. 1, e0191568, 01.01.2018.

Zhang, Y, Shen, F, Mojarad, MR, Li, D, Liu, S, Tao, C, Yu, Y & Liu, HD 2018, 'Systematic identification of latent disease-gene associations from PubMed articles', PLoS One, vol. 13, no. 1, e0191568. https://doi.org/10.1371/journal.pone.0191568
Zhang, Yuji ; Shen, Feichen ; Mojarad, Majid Rastegar ; Li, Dingcheng ; Liu, Sijia ; Tao, Cui ; Yu, Yue ; Liu, Hongfang D. / Systematic identification of latent disease-gene associations from PubMed articles. In: PLoS One. 2018 ; Vol. 13, No. 1.
@article{a4a0018377154e349905fd781eb2e55e,
title = "Systematic identification of latent disease-gene associations from PubMed articles",
author = "Yuji Zhang and Feichen Shen and Mojarad, {Majid Rastegar} and Dingcheng Li and Sijia Liu and Cui Tao and Yue Yu and Liu, {Hongfang D}",
year = "2018",
month = "1",
day = "1",
doi = "10.1371/journal.pone.0191568",
language = "English (US)",
volume = "13",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "1",

}

TY - JOUR

T1 - Systematic identification of latent disease-gene associations from PubMed articles

AU - Zhang, Yuji

AU - Shen, Feichen

AU - Mojarad, Majid Rastegar

AU - Li, Dingcheng

AU - Liu, Sijia

AU - Tao, Cui

AU - Yu, Yue

AU - Liu, Hongfang D

PY - 2018/1/1

Y1 - 2018/1/1

UR - http://www.scopus.com/inward/record.url?scp=85041044044&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85041044044&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0191568

DO - 10.1371/journal.pone.0191568

M3 - Article

C2 - 29373609

AN - SCOPUS:85041044044

VL - 13

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 1

M1 - e0191568

ER -