TY - GEN
T1 - Human gene/protein synonym dictionary from WikiLinks
AU - Wagholikar, Kavishwar
AU - Torii, Manabu
AU - Liu, Hongfang
PY - 2011
Y1 - 2011
N2 - Many genes and proteins have alternate names (synonyms) in the scientific literature, which makes it difficult to organize and exchange information effectively. To address this issue, there have been several initiatives to collate the synonyms into dictionaries. Biothesaurus is an extensive dictionary derived from multiple authoritative sources. Despite its extensive coverage, some synonyms are still not covered by Biothesaurus. Wikipedia could be a useful source of the missing synonyms, as it has a more diverse set of contributors than the authoritative resources that constitute Biothesaurus. This paper reports a feasibility study of using WikiLinks to find synonyms that are not currently covered by Biothesaurus. Wikipedia pages containing the word "gene" or "protein" were included in this study. 121 candidate synonyms were extracted from WikiLinks referencing 7,339 (16%) human genes. This number is significant, given that Biothesaurus has been earlier evaluated to have a coverage of 87%. Hence, WikiLinks were found to be a useful source for collating gene synonyms that are not recorded in authoritative databases. Biothesaurus was evaluated to cover 52% of the extracted candidate synonyms not documented in NCBI. The current study will be extended in scope to cover all genes and to extract synonyms from free text in Wikipedia pages.
KW - Encyclopedias
KW - Gene/protein synonym
KW - Information storage and retrieval
KW - Names
KW - Terminology
KW - WikiLinks
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=84858963749&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84858963749&partnerID=8YFLogxK
U2 - 10.1145/2147805.2147870
DO - 10.1145/2147805.2147870
M3 - Conference contribution
AN - SCOPUS:84858963749
SN - 9781450307963
T3 - 2011 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2011
SP - 462
EP - 464
BT - 2011 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2011
T2 - 2011 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, ACM-BCB 2011
Y2 - 1 August 2011 through 3 August 2011
ER -