A publication-based popularity index (PPI) for healthcare dataset ranking

Jingyi Shi, Mingna Zheng, Lixia Yao, Yaorong Ge

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Data are critical in this age of big data and machine learning. Because of their inherent complexity, health-related datasets are unusual in that they are typically acquired for specific purposes and with special designs. As more healthcare datasets become available, many of them public, choosing a quality dataset suited to a specific research inquiry is becoming a challenge for health informatics researchers, especially those new to the field. From the data provider's perspective, it is important to identify the features that make some datasets more valuable than others, so as to improve the design and acquisition of future datasets. Addressing these questions requires formal mechanisms for measuring the quality of datasets against defined criteria. In this study, we propose one way of measuring the value of healthcare datasets, based on how often the datasets are used and reported by researchers, which we call the Publication-based Popularity Index (PPI). We describe the design of the PPI and discuss its properties, and we demonstrate its utility by ranking 14 representative healthcare datasets. We believe that the PPI can enable an overall ranking of all healthcare datasets. Such a ranking would provide an important dimension for sorting search results in dataset integration systems, as well as a starting point for identifying and examining the design of the most valuable healthcare datasets so that their features can inform future designs.

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 247-254
Number of pages: 8
ISBN (Electronic): 9781538653777
DOI: https://doi.org/10.1109/ICHI.2018.00035
State: Published - Jul 24 2018
Event: 6th IEEE International Conference on Healthcare Informatics, ICHI 2018 - New York, United States
Duration: Jun 4 2018 - Jun 7 2018

Other

Other: 6th IEEE International Conference on Healthcare Informatics, ICHI 2018
Country: United States
City: New York
Period: 6/4/18 - 6/7/18

Fingerprint

Publications
Delivery of Health Care
Health
Learning systems
Datasets
Research Personnel
Systems Integration
Informatics

Keywords

  • Data quality
  • Healthcare dataset
  • Popularity index
  • Quantified measurement
  • Regression

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Health Informatics

Cite this

Shi, J., Zheng, M., Yao, L., & Ge, Y. (2018). A publication-based popularity index (PPI) for healthcare dataset ranking. In Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018 (pp. 247-254). [8419368] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICHI.2018.00035

@inproceedings{6b4f881cc6eb42f384f54f0d49a79a0a,
title = "A publication-based popularity index (PPI) for healthcare dataset ranking",
keywords = "Data quality, Healthcare dataset, Popularity index, Quantified measurement, Regression",
author = "Jingyi Shi and Mingna Zheng and Lixia Yao and Yaorong Ge",
year = "2018",
month = "7",
day = "24",
doi = "10.1109/ICHI.2018.00035",
language = "English (US)",
pages = "247--254",
booktitle = "Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - A publication-based popularity index (PPI) for healthcare dataset ranking

AU - Shi, Jingyi

AU - Zheng, Mingna

AU - Yao, Lixia

AU - Ge, Yaorong

PY - 2018/7/24

Y1 - 2018/7/24

KW - Data quality

KW - Healthcare dataset

KW - Popularity index

KW - Quantified measurement

KW - Regression

UR - http://www.scopus.com/inward/record.url?scp=85051124016&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051124016&partnerID=8YFLogxK

U2 - 10.1109/ICHI.2018.00035

DO - 10.1109/ICHI.2018.00035

M3 - Conference contribution

AN - SCOPUS:85051124016

SP - 247

EP - 254

BT - Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -