DIR - A semantic information resource for healthcare datasets

Jingyi Shi, Mingna Zheng, Lixia Yao, Yaorong Ge

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Scopus citations


It is important for data scientists to have a good understanding of the availability of relevant datasets as well as the content, structure, and existing analyses of these datasets. While a number of efforts are underway to integrate the large amount and variety of datasets, there is a lack of information resources that focus on the specific learning needs of targeted audiences. To address this gap, we have been developing a semantic Dataset Information Resource (DIR) framework that specifically addresses the challenges entry-level data scientists face in learning to identify, understand, and analyze major datasets, with an initial focus on healthcare. The DIR does not contain actual data from the datasets but aims to provide comprehensive knowledge about the datasets and their analyses. The framework leverages Semantic Web technologies and the W3C Dataset Description Standard for knowledge integration and representation, and includes natural language processing (NLP)-based methods to enable knowledge extraction and question answering. The prototype DIR implementation includes four major components: dataset metadata and related knowledge, search modules, question answering for frequently asked questions, and blogs. The DIR currently includes information on three commonly used, large, and complex healthcare datasets: HCUP, MarketScan, and MIMIC. An initial usage evaluation with health informatics students is encouraging. Further development is underway.

Original language: English (US)
Title of host publication: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Editors: Illhoi Yoo, Jane Huiru Zheng, Yang Gong, Xiaohua Tony Hu, Chi-Ren Shyu, Yana Bromberg, Jean Gao, Dmitry Korkin
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9781509030491
State: Published - Dec 15 2017
Event: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 - Kansas City, United States
Duration: Nov 13 2017 to Nov 16 2017


Other: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Country: United States
City: Kansas City


  • Health Informatics
  • Knowledge Representation
  • Semantic Information Resource

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
