TY - GEN
T1 - Assessing the Need of Discourse-Level Analysis in Identifying Evidence of Drug-Disease Relations in Scientific Literature
AU - Rastegar-Mojarad, Majid
AU - Komandur Elayavilli, Ravikumar
AU - Li, Dingcheng
AU - Liu, Hongfang
N1 - Publisher Copyright:
© 2015 IMIA and IOS Press.
PY - 2015
Y1 - 2015
N2 - Relation extraction typically involves extracting relations between two or more entities that occur within a single sentence or across multiple sentences. In this study, we investigated the significance of extracting information from multiple sentences, specifically in the context of drug-disease relation discovery. We used multiple resources, such as Semantic Medline (a literature-based resource) and Medline search (for filtering spurious results), and inferred 8,772 potential drug-disease pairs. Our analysis revealed that 6,450 (73.5%) of the 8,772 potential drug-disease relations did not occur within a single sentence. Moreover, only 537 of the drug-disease pairs matched the curated gold standard in the Comparative Toxicogenomics Database (CTD), a trusted resource for drug-disease relations. Among these 537, about 76% (407) of the drug-disease pairs occur across multiple sentences. Our analysis revealed that the drug-disease pairs inferred from Semantic Medline or retrieved from CTD could be extracted from multiple sentences in the literature. This highlights the need for discourse-level analysis in extracting relations from biomedical literature.
AB - Relation extraction typically involves extracting relations between two or more entities that occur within a single sentence or across multiple sentences. In this study, we investigated the significance of extracting information from multiple sentences, specifically in the context of drug-disease relation discovery. We used multiple resources, such as Semantic Medline (a literature-based resource) and Medline search (for filtering spurious results), and inferred 8,772 potential drug-disease pairs. Our analysis revealed that 6,450 (73.5%) of the 8,772 potential drug-disease relations did not occur within a single sentence. Moreover, only 537 of the drug-disease pairs matched the curated gold standard in the Comparative Toxicogenomics Database (CTD), a trusted resource for drug-disease relations. Among these 537, about 76% (407) of the drug-disease pairs occur across multiple sentences. Our analysis revealed that the drug-disease pairs inferred from Semantic Medline or retrieved from CTD could be extracted from multiple sentences in the literature. This highlights the need for discourse-level analysis in extracting relations from biomedical literature.
KW - Discourse-level analysis
KW - Literature-based discovery
KW - Relation extraction
KW - Semantic Medline
UR - http://www.scopus.com/inward/record.url?scp=84951950549&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84951950549&partnerID=8YFLogxK
U2 - 10.3233/978-1-61499-564-7-539
DO - 10.3233/978-1-61499-564-7-539
M3 - Conference contribution
C2 - 26262109
AN - SCOPUS:84951950549
T3 - Studies in Health Technology and Informatics
SP - 539
EP - 543
BT - MEDINFO 2015
A2 - Georgiou, Andrew
A2 - Sarkar, Indra Neil
A2 - de Azevedo Marques, Paulo Mazzoncini
PB - IOS Press
T2 - 15th World Congress on Health and Biomedical Informatics, MEDINFO 2015
Y2 - 19 August 2015 through 23 August 2015
ER -