TY - GEN
T1 - DIR: A semantic information resource for healthcare datasets
AU - Shi, Jingyi
AU - Zheng, Mingna
AU - Yao, Lixia
AU - Ge, Yaorong
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/12/15
Y1 - 2017/12/15
N2 - It is important for data scientists to have a good understanding of the availability of relevant datasets as well as the content, structure, and existing analyses of these datasets. While a number of efforts are underway to integrate the large amount and variety of datasets, there is a lack of information resources that focus on the specific learning needs of targeted audiences. To address this gap, we have been developing a semantic Dataset Information Resource (DIR) framework that specifically addresses the challenges entry-level data scientists face in learning to identify, understand, and analyze major datasets, with an initial focus on healthcare. The DIR does not contain actual data from the datasets but aims to provide comprehensive knowledge about the datasets and their analyses. The framework leverages Semantic Web technologies and the W3C Dataset Description Standard for knowledge integration and representation, and includes natural language processing (NLP)-based methods to enable knowledge extraction and question answering. The prototype DIR implementation includes four major components: dataset metadata and related knowledge, search modules, question answering for frequently asked questions, and blogs. The DIR currently includes information on three commonly used large and complex healthcare datasets: HCUP, MarketScan, and MIMIC. An initial usage evaluation with health informatics students is encouraging. Further development is underway.
AB - It is important for data scientists to have a good understanding of the availability of relevant datasets as well as the content, structure, and existing analyses of these datasets. While a number of efforts are underway to integrate the large amount and variety of datasets, there is a lack of information resources that focus on the specific learning needs of targeted audiences. To address this gap, we have been developing a semantic Dataset Information Resource (DIR) framework that specifically addresses the challenges entry-level data scientists face in learning to identify, understand, and analyze major datasets, with an initial focus on healthcare. The DIR does not contain actual data from the datasets but aims to provide comprehensive knowledge about the datasets and their analyses. The framework leverages Semantic Web technologies and the W3C Dataset Description Standard for knowledge integration and representation, and includes natural language processing (NLP)-based methods to enable knowledge extraction and question answering. The prototype DIR implementation includes four major components: dataset metadata and related knowledge, search modules, question answering for frequently asked questions, and blogs. The DIR currently includes information on three commonly used large and complex healthcare datasets: HCUP, MarketScan, and MIMIC. An initial usage evaluation with health informatics students is encouraging. Further development is underway.
KW - Health Informatics
KW - Knowledge Representation
KW - Semantic Information Resource
UR - http://www.scopus.com/inward/record.url?scp=85045978924&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045978924&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2017.8217758
DO - 10.1109/BIBM.2017.8217758
M3 - Conference contribution
AN - SCOPUS:85045978924
T3 - Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
SP - 805
EP - 810
BT - Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
A2 - Yoo, Illhoi
A2 - Zheng, Jane Huiru
A2 - Gong, Yang
A2 - Hu, Xiaohua Tony
A2 - Shyu, Chi-Ren
A2 - Bromberg, Yana
A2 - Gao, Jean
A2 - Korkin, Dmitry
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Y2 - 13 November 2017 through 16 November 2017
ER -