DIR-A semantic information resource for healthcare datasets

Jingyi Shi, Mingna Zheng, Lixia Yao, Yaorong Ge

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Data scientists need a good understanding of which relevant datasets are available, as well as the content, structure, and existing analyses of those datasets. While a number of efforts are underway to integrate the large amount and variety of datasets, few information resources focus on the specific learning needs of targeted audiences. To fill this gap, we have been developing a semantic Dataset Information Resource (DIR) framework that addresses the challenges entry-level data scientists face in learning to identify, understand, and analyze major datasets, with an initial focus on healthcare. The DIR does not contain actual data from the datasets but aims to provide comprehensive knowledge about the datasets and their analyses. The framework leverages Semantic Web technologies and the W3C Dataset Description Standard for knowledge integration and representation, and includes natural language processing (NLP)-based methods to enable knowledge extraction and question answering. The prototype DIR implementation includes four major components: dataset metadata and related knowledge, search modules, question answering for frequently asked questions, and blogs. The DIR currently includes information on three commonly used large and complex healthcare datasets: HCUP, MarketScan, and MIMIC. An initial usage evaluation with health informatics students is encouraging. Further development is underway.
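
As a rough illustration of the metadata-only idea described in the abstract, the sketch below stores descriptions of the three named datasets as subject-predicate-object triples and runs a simple keyword search over them. It is not the authors' implementation. The predicate names follow the W3C DCAT and Dublin Core Terms vocabularies, while the `ex:` identifiers, the `Triple` type, and the helper functions are hypothetical names introduced only for this example.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One subject-predicate-object statement (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# Dataset names are taken from the abstract; everything else is illustrative.
DATASET_NAMES = ["HCUP", "MarketScan", "MIMIC"]


def describe(name: str) -> List[Triple]:
    """Build metadata-only triples for one dataset (no actual records)."""
    node = f"ex:{name}"  # "ex:" is a made-up example namespace
    return [
        Triple(node, "rdf:type", "dcat:Dataset"),
        Triple(node, "dct:title", name),
        Triple(node, "dcat:theme", "ex:Healthcare"),
    ]


def build_graph(names: List[str]) -> List[Triple]:
    """Collect the triples for every dataset into one flat list."""
    graph: List[Triple] = []
    for name in names:
        graph.extend(describe(name))
    return graph


def match(graph: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def search_titles(graph: List[Triple], keyword: str) -> List[str]:
    """Case-insensitive keyword search over dct:title values."""
    needle = keyword.lower()
    return [
        t.subject
        for t in match(graph, predicate="dct:title")
        if needle in t.obj.lower()
    ]


GRAPH = build_graph(DATASET_NAMES)
```

Calling `search_titles(GRAPH, "mimic")` returns the identifier of the matching dataset description. A production system would more likely use an RDF store and SPARQL; the flat list here only shows how a description can be queried without holding any of the underlying data.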

Original language: English (US)
Title of host publication: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Editors: Illhoi Yoo, Jane Huiru Zheng, Yang Gong, Xiaohua Tony Hu, Chi-Ren Shyu, Yana Bromberg, Jean Gao, Dmitry Korkin
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 805-810
Number of pages: 6
Volume: 2017-January
ISBN (Electronic): 9781509030491
DOI: https://doi.org/10.1109/BIBM.2017.8217758
State: Published - Dec 15 2017
Event: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 - Kansas City, United States
Duration: Nov 13 2017 - Nov 16 2017

Fingerprint

Blogs
Semantic Web
Metadata
Semantics
Health
Availability
Students
Delivery of Health Care
Processing
Datasets
Blogging
Natural Language Processing
Learning
Informatics

Keywords

  • Health Informatics
  • Knowledge Representation
  • Semantic Information Resource

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics

Cite this

Shi, J., Zheng, M., Yao, L., & Ge, Y. (2017). DIR-A semantic information resource for healthcare datasets. In I. Yoo, J. H. Zheng, Y. Gong, X. T. Hu, C-R. Shyu, Y. Bromberg, J. Gao, ... D. Korkin (Eds.), Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 (Vol. 2017-January, pp. 805-810). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BIBM.2017.8217758
