Headwords and suffixes in biomedical names

Manabu Torii, Hongfang D Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Natural Language Processing (NLP) techniques have been used to extract and mine knowledge from the biomedical literature. One of the critical steps in such a task is biomedical named entity tagging (BNER), which usually consists of two subtasks: the first identifies biomedical names in text, and the second assigns predefined semantic classes to the names identified in the first. Headwords and suffixes have frequently been used by BNER systems as features for assigning semantic classes to names in text. However, few studies have evaluated how well headwords and suffixes predict the semantic classes of biomedical terms when knowledge sources are used in an unsupervised way. We conducted a study to evaluate the performance of headwords and suffixes using names in the Unified Medical Language System (UMLS), where the semantic classes associated with these names were obtained by modifying an existing UMLS semantic group system and incorporating the GENIA ontology. We define headwords and suffixes that are significantly associated with a specific semantic class as semantic suffixes. Semantic assignment using semantic suffixes achieved an F-measure of 86.4%, with a precision of 91.6% and a recall of 81.7%. When the semantic suffixes obtained from the UMLS were applied to names extracted from the GENIA corpus, the system achieved an F-measure of 73.4%, with a precision of 84.2% and a recall of 65.1%; these measures improved substantially when evaluation was limited to names associated with classes that have corresponding GENIA types.
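The abstract describes a pipeline: mine headwords and suffixes that are strongly associated with one semantic class, use them to assign classes to unseen names, and score the result by precision, recall, and F-measure. The Python sketch below illustrates that pipeline under several assumptions that are not taken from the paper: the headword is the last whitespace-delimited token; suffixes are the trailing 2 to 5 characters of the headword; and "significantly associated" is replaced by a simple frequency and class-purity threshold (the hypothetical parameters `min_count` and `min_purity`) rather than whatever statistical test the authors used. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def features(name, max_suffix_len=5):
    """Return the headword (last token) and its trailing suffixes, most specific first.

    Assumption: the headword is the rightmost whitespace-delimited token,
    a common heuristic for English noun phrases.
    """
    head = name.lower().split()[-1]
    feats = ["HEAD:" + head]
    for n in range(min(max_suffix_len, len(head) - 1), 1, -1):
        feats.append("SUF:" + head[-n:])
    return feats


def mine_semantic_suffixes(labeled_names, min_count=2, min_purity=0.9):
    """Keep features that occur often enough and point mostly to one class.

    Stand-in for the paper's significance criterion: a frequency threshold
    plus a class-purity threshold, both hypothetical parameters.
    """
    counts = defaultdict(Counter)  # feature -> class frequencies
    for name, cls in labeled_names:
        for f in features(name):
            counts[f][cls] += 1
    table = {}
    for f, by_class in counts.items():
        total = sum(by_class.values())
        cls, top = by_class.most_common(1)[0]
        if total >= min_count and top / total >= min_purity:
            table[f] = cls
    return table


def assign(name, table):
    """Return the class of the first matching feature (headword first), or None."""
    for f in features(name):
        if f in table:
            return table[f]
    return None


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def evaluate(test_names, table):
    """Precision over names that received a class, recall over all names."""
    assigned = correct = 0
    for name, gold in test_names:
        pred = assign(name, table)
        if pred is not None:
            assigned += 1
            correct += pred == gold
    p = correct / assigned if assigned else 0.0
    r = correct / len(test_names) if test_names else 0.0
    return p, r, f_measure(p, r)


# Toy data invented for illustration; not taken from UMLS or GENIA.
train = [
    ("protein kinase", "PROTEIN"), ("tyrosine kinase", "PROTEIN"),
    ("lipase", "PROTEIN"), ("protease", "PROTEIN"),
    ("t cell", "CELL"), ("b cell", "CELL"), ("stem cell", "CELL"),
]
test = [
    ("adenylate kinase", "PROTEIN"), ("amylase", "PROTEIN"),
    ("mast cell", "CELL"), ("interleukin", "PROTEIN"),
]

table = mine_semantic_suffixes(train)
print(evaluate(test, table))  # "interleukin" matches no feature, so recall drops to 0.75
```

Leaving a name unassigned when no feature matches trades recall for precision, which is one plausible reading of why the reported precision is higher than the recall. As a sanity check on the abstract's figures, `f_measure(0.916, 0.817)` evaluates to about 0.864 and `f_measure(0.842, 0.651)` to about 0.734, matching the reported F-measures.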

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 29-41
Number of pages: 13
Volume: 3886 LNBI
DOIs: https://doi.org/10.1007/11683568_3
State: Published - 2006
Externally published: Yes
Event: PAKDD 2006 International Workshop on Knowledge Discovery in Life Science Literature, KDLL 2006 - Singapore, Singapore
Duration: Apr 9 2006 to Apr 9 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3886 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: PAKDD 2006 International Workshop on Knowledge Discovery in Life Science Literature, KDLL 2006
Country: Singapore
City: Singapore
Period: 4/9/06 to 4/9/06

Fingerprint

Suffix
Semantics
Names
Unified Medical Language System
Assignment
benzoylprop-ethyl
Natural Language Processing
Evaluate
Tagging
Performance Measures
Natural Language
Ontology
Class
Mining

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Torii, M., & Liu, H. D. (2006). Headwords and suffixes in biomedical names. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3886 LNBI, pp. 29-41). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3886 LNBI). https://doi.org/10.1007/11683568_3

Headwords and suffixes in biomedical names. / Torii, Manabu; Liu, Hongfang D.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3886 LNBI 2006. p. 29-41 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3886 LNBI).

Torii, M & Liu, HD 2006, Headwords and suffixes in biomedical names. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 3886 LNBI, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3886 LNBI, pp. 29-41, PAKDD 2006 International Workshop on Knowledge Discovery in Life Science Literature, KDLL 2006, Singapore, Singapore, 4/9/06. https://doi.org/10.1007/11683568_3
Torii M, Liu HD. Headwords and suffixes in biomedical names. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3886 LNBI. 2006. p. 29-41. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/11683568_3
Torii, Manabu ; Liu, Hongfang D. / Headwords and suffixes in biomedical names. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3886 LNBI 2006. pp. 29-41 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{34825ff1b6c745fbba70110e92591c94,
title = "Headwords and suffixes in biomedical names",
abstract = "Natural Language Processing (NLP) techniques have been used to extract and mine knowledge from the biomedical literature. One of the critical steps in such a task is biomedical named entity tagging (BNER), which usually consists of two subtasks: the first identifies biomedical names in text, and the second assigns predefined semantic classes to the names identified in the first. Headwords and suffixes have frequently been used by BNER systems as features for assigning semantic classes to names in text. However, few studies have evaluated how well headwords and suffixes predict the semantic classes of biomedical terms when knowledge sources are used in an unsupervised way. We conducted a study to evaluate the performance of headwords and suffixes using names in the Unified Medical Language System (UMLS), where the semantic classes associated with these names were obtained by modifying an existing UMLS semantic group system and incorporating the GENIA ontology. We define headwords and suffixes that are significantly associated with a specific semantic class as semantic suffixes. Semantic assignment using semantic suffixes achieved an F-measure of 86.4{\%}, with a precision of 91.6{\%} and a recall of 81.7{\%}. When the semantic suffixes obtained from the UMLS were applied to names extracted from the GENIA corpus, the system achieved an F-measure of 73.4{\%}, with a precision of 84.2{\%} and a recall of 65.1{\%}; these measures improved substantially when evaluation was limited to names associated with classes that have corresponding GENIA types.",
author = "Manabu Torii and Liu, {Hongfang D}",
year = "2006",
doi = "10.1007/11683568_3",
language = "English (US)",
isbn = "3540328092",
volume = "3886 LNBI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "29--41",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Headwords and suffixes in biomedical names

AU - Torii, Manabu

AU - Liu, Hongfang D

PY - 2006

Y1 - 2006

AB - Natural Language Processing (NLP) techniques have been used to extract and mine knowledge from the biomedical literature. One of the critical steps in such a task is biomedical named entity tagging (BNER), which usually consists of two subtasks: the first identifies biomedical names in text, and the second assigns predefined semantic classes to the names identified in the first. Headwords and suffixes have frequently been used by BNER systems as features for assigning semantic classes to names in text. However, few studies have evaluated how well headwords and suffixes predict the semantic classes of biomedical terms when knowledge sources are used in an unsupervised way. We conducted a study to evaluate the performance of headwords and suffixes using names in the Unified Medical Language System (UMLS), where the semantic classes associated with these names were obtained by modifying an existing UMLS semantic group system and incorporating the GENIA ontology. We define headwords and suffixes that are significantly associated with a specific semantic class as semantic suffixes. Semantic assignment using semantic suffixes achieved an F-measure of 86.4%, with a precision of 91.6% and a recall of 81.7%. When the semantic suffixes obtained from the UMLS were applied to names extracted from the GENIA corpus, the system achieved an F-measure of 73.4%, with a precision of 84.2% and a recall of 65.1%; these measures improved substantially when evaluation was limited to names associated with classes that have corresponding GENIA types.

UR - http://www.scopus.com/inward/record.url?scp=33745607586&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745607586&partnerID=8YFLogxK

U2 - 10.1007/11683568_3

DO - 10.1007/11683568_3

M3 - Conference contribution

AN - SCOPUS:33745607586

SN - 3540328092

SN - 9783540328094

VL - 3886 LNBI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 29

EP - 41

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -