TY - GEN
T1 - An integrative computational approach to identify disease-specific networks from PubMed literature information
AU - Zhang, Yuji
AU - Li, Dingchen
AU - Tao, Cui
AU - Shen, Feichen
AU - Liu, Hongfang
PY - 2013
Y1 - 2013
N2 - A large number of associations among biological entities (e.g., diseases, drugs, and genes) are scattered across the biomedical literature. Extracting and analyzing such heterogeneous data remains a challenging task for most researchers in the biomedical field. Natural language processing (NLP) has the potential to extract associations among biological entities from the literature. However, association information extracted through NLP can be large, noisy, and redundant, which makes it difficult for biomedical researchers to use. To address this challenge, we propose a computational framework to facilitate the use of NLP results. We apply Latent Dirichlet Allocation (LDA) to discover topics based on associations. The networks extracted from each topic provide a disease-specific network for downstream bioinformatics analysis of the associations in that topic. We illustrated the framework through the construction of disease-specific networks from Semantic MEDLINE, an NLP-generated association database, followed by the analysis of network properties, such as hub nodes and degree distribution. The results demonstrate that (1) the LDA-based approach can group related diseases into the same disease topic; (2) the disease-specific association network follows the scale-free network property, in which hub nodes are enriched in related diseases, genes, and drugs.
AB - A large number of associations among biological entities (e.g., diseases, drugs, and genes) are scattered across the biomedical literature. Extracting and analyzing such heterogeneous data remains a challenging task for most researchers in the biomedical field. Natural language processing (NLP) has the potential to extract associations among biological entities from the literature. However, association information extracted through NLP can be large, noisy, and redundant, which makes it difficult for biomedical researchers to use. To address this challenge, we propose a computational framework to facilitate the use of NLP results. We apply Latent Dirichlet Allocation (LDA) to discover topics based on associations. The networks extracted from each topic provide a disease-specific network for downstream bioinformatics analysis of the associations in that topic. We illustrated the framework through the construction of disease-specific networks from Semantic MEDLINE, an NLP-generated association database, followed by the analysis of network properties, such as hub nodes and degree distribution. The results demonstrate that (1) the LDA-based approach can group related diseases into the same disease topic; (2) the disease-specific association network follows the scale-free network property, in which hub nodes are enriched in related diseases, genes, and drugs.
KW - Disease-specific network
KW - Latent Dirichlet Allocation
KW - Network Analysis
KW - Semantic MEDLINE
UR - http://www.scopus.com/inward/record.url?scp=84894516031&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84894516031&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2013.6732738
DO - 10.1109/BIBM.2013.6732738
M3 - Conference contribution
AN - SCOPUS:84894516031
SN - 9781479913091
T3 - Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
SP - 72
EP - 75
BT - Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
T2 - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
Y2 - 18 December 2013 through 21 December 2013
ER -