Automated learning of domain taxonomies from text using background knowledge

Julia Hoxha, Guoqian D Jiang, Chunhua Weng

Research output: Contribution to journal (Article)

9 Citations (Scopus)

Abstract

In this paper, we present an automated method for taxonomy learning, focusing on concept formation and hierarchical relation learning. To infer such relations, we partition the extracted concepts and group them into closely-related clusters using Hierarchical Agglomerative Clustering, informed by syntactic matching and semantic relatedness functions. We introduce a novel, unsupervised method for cluster detection based on automated dendrogram pruning, which is dynamic to each partition. We evaluate our approach with two different types of textual corpora, clinical trials descriptions and MEDLINE publication abstracts. The results of several experiments indicate that our method is superior to existing dynamic pruning and the state-of-art taxonomy learning methods. It yields higher concept coverage (95.75%) and higher accuracy of learned taxonomic relations (up to 0.71 average precision and 0.96 average recall).
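The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a purely string-based similarity (Python's `difflib.SequenceMatcher`) in place of the paper's syntactic matching and semantic relatedness functions, and a fixed similarity threshold in place of the paper's dynamic dendrogram pruning. The names `similarity`, `average_linkage`, `agglomerate`, and the example terms are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1]; an assumption, not the paper's function."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def average_linkage(c1, c2):
    """Mean pairwise similarity between two clusters of terms."""
    return sum(similarity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def agglomerate(terms, threshold=0.5):
    """Repeatedly merge the two most similar clusters until no pair reaches the threshold."""
    clusters = [[t] for t in terms]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), average_linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:  # fixed cut; the paper prunes dynamically per partition
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


# Hypothetical example terms, chosen only for illustration.
terms = ["heart failure", "chronic heart failure", "renal failure",
         "type 2 diabetes", "diabetes mellitus"]
clusters = agglomerate(terms)
```

Raising `threshold` yields more, smaller clusters; lowering it merges more aggressively. A single fixed threshold is the simplest possible cut and is exactly what a dynamic, per-partition pruning method aims to improve on.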

Original language: English (US)
Pages (from-to): 295-306
Number of pages: 12
Journal: Journal of Biomedical Informatics
Volume: 63
DOI: 10.1016/j.jbi.2016.09.002
State: Published - Oct 1 2016

Fingerprint

Taxonomies
Learning
Syntactics
Semantics
Concept Formation
MEDLINE
Cluster Analysis
Publications
Experiments
Clinical Trials

Keywords

  • Concept discovery
  • Ontology learning
  • Semantic relation acquisition
  • Taxonomy extraction from text
  • Term recognition

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

Automated learning of domain taxonomies from text using background knowledge. / Hoxha, Julia; Jiang, Guoqian D; Weng, Chunhua.

In: Journal of Biomedical Informatics, Vol. 63, 01.10.2016, p. 295-306.

@article{25df4f0809cb471d8b1be6ca1ac69189,
title = "Automated learning of domain taxonomies from text using background knowledge",
abstract = "In this paper, we present an automated method for taxonomy learning, focusing on concept formation and hierarchical relation learning. To infer such relations, we partition the extracted concepts and group them into closely-related clusters using Hierarchical Agglomerative Clustering, informed by syntactic matching and semantic relatedness functions. We introduce a novel, unsupervised method for cluster detection based on automated dendrogram pruning, which is dynamic to each partition. We evaluate our approach with two different types of textual corpora, clinical trials descriptions and MEDLINE publication abstracts. The results of several experiments indicate that our method is superior to existing dynamic pruning and the state-of-art taxonomy learning methods. It yields higher concept coverage (95.75{\%}) and higher accuracy of learned taxonomic relations (up to 0.71 average precision and 0.96 average recall).",
keywords = "Concept discovery, Ontology learning, Semantic relation acquisition, Taxonomy extraction from text, Term recognition",
author = "Hoxha, Julia and Jiang, {Guoqian D} and Weng, Chunhua",
year = "2016",
month = "10",
day = "1",
doi = "10.1016/j.jbi.2016.09.002",
language = "English (US)",
volume = "63",
pages = "295--306",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc."
}

TY  - JOUR
T1  - Automated learning of domain taxonomies from text using background knowledge
AU  - Hoxha, Julia
AU  - Jiang, Guoqian D
AU  - Weng, Chunhua
PY  - 2016/10/1
Y1  - 2016/10/1
AB  - In this paper, we present an automated method for taxonomy learning, focusing on concept formation and hierarchical relation learning. To infer such relations, we partition the extracted concepts and group them into closely-related clusters using Hierarchical Agglomerative Clustering, informed by syntactic matching and semantic relatedness functions. We introduce a novel, unsupervised method for cluster detection based on automated dendrogram pruning, which is dynamic to each partition. We evaluate our approach with two different types of textual corpora, clinical trials descriptions and MEDLINE publication abstracts. The results of several experiments indicate that our method is superior to existing dynamic pruning and the state-of-art taxonomy learning methods. It yields higher concept coverage (95.75%) and higher accuracy of learned taxonomic relations (up to 0.71 average precision and 0.96 average recall).
KW  - Concept discovery
KW  - Ontology learning
KW  - Semantic relation acquisition
KW  - Taxonomy extraction from text
KW  - Term recognition
UR  - http://www.scopus.com/inward/record.url?scp=84987932313&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84987932313&partnerID=8YFLogxK
U2  - 10.1016/j.jbi.2016.09.002
DO  - 10.1016/j.jbi.2016.09.002
M3  - Article
C2  - 27597572
AN  - SCOPUS:84987932313
VL  - 63
SP  - 295
EP  - 306
JO  - Journal of Biomedical Informatics
JF  - Journal of Biomedical Informatics
SN  - 1532-0464
ER  -