A publication-based popularity index (PPI) for healthcare dataset ranking

Jingyi Shi; Mingna Zheng; Lixia Yao; Yaorong Ge

doi:10.1109/ICHI.2018.00035

Quantitative Health Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Data are critical in this age of big data and machine learning. Due to their inherent complexity, health-related data are unique in that the datasets are usually acquired for specific purposes and with special designs. As more and more healthcare datasets become available, of which many are public, choosing a quality dataset that is suitable for specific research inquiries is becoming a challenging question for health informatics researchers, especially the learners of this field. On the other hand, from the data provider's perspective, it is important to identify features of datasets that make some datasets more valuable than others so as to improve the design and acquisition of future datasets. To address these questions, we need to develop formal mechanisms to measure the goodness of datasets according to certain criteria. In this study, we propose one way of measuring the value of healthcare datasets that is based on how often the datasets are used and reported by researchers, which we call the Publication-based Popularity Index (PPI). In this article, we describe the design of the PPI and discuss its properties. We demonstrate the utility of the PPI by ranking 14 representative healthcare datasets. We believe that the PPI can enable an overall ranking of all healthcare datasets and thus provide an important dimension to sort search results for dataset integration systems as well as a starting point for identifying and examining the design of the most valuable healthcare datasets so that features of these datasets can inform future designs.

Original language	English (US)
Title of host publication	Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	247-254
Number of pages	8
ISBN (Electronic)	9781538653777
DOIs	https://doi.org/10.1109/ICHI.2018.00035
State	Published - Jul 24 2018
Event	6th IEEE International Conference on Healthcare Informatics, ICHI 2018 - New York, United States Duration: Jun 4 2018 → Jun 7 2018

Publication series

Name	Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018

Other

Other	6th IEEE International Conference on Healthcare Informatics, ICHI 2018
Country/Territory	United States
City	New York
Period	6/4/18 → 6/7/18

Keywords

Data quality
Healthcare dataset
Popularity index
Quantified measurement
Regression

ASJC Scopus subject areas

Artificial Intelligence
Computer Networks and Communications
Health Informatics

Cite this

Shi, J., Zheng, M., Yao, L., & Ge, Y. (2018). A publication-based popularity index (PPI) for healthcare dataset ranking. In Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018 (pp. 247-254). Article 8419368 (Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICHI.2018.00035

@inproceedings{6b4f881cc6eb42f384f54f0d49a79a0a,

title = "A publication-based popularity index (PPI) for healthcare dataset ranking",

keywords = "Data quality, Healthcare dataset, Popularity index, Quantified measurement, Regression",

author = "Jingyi Shi and Mingna Zheng and Lixia Yao and Yaorong Ge",

note = "Publisher Copyright: {\textcopyright} 2018 IEEE.; 6th IEEE International Conference on Healthcare Informatics, ICHI 2018 ; Conference date: 04-06-2018 Through 07-06-2018",

year = "2018",

month = jul,

day = "24",

doi = "10.1109/ICHI.2018.00035",

language = "English (US)",

series = "Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "247--254",

booktitle = "Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018",

}

TY - GEN

T1 - A publication-based popularity index (PPI) for healthcare dataset ranking

AU - Shi, Jingyi

AU - Zheng, Mingna

AU - Yao, Lixia

AU - Ge, Yaorong

PY - 2018/7/24

Y1 - 2018/7/24

KW - Data quality

KW - Healthcare dataset

KW - Popularity index

KW - Quantified measurement

KW - Regression

UR - http://www.scopus.com/inward/record.url?scp=85051124016&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051124016&partnerID=8YFLogxK

U2 - 10.1109/ICHI.2018.00035

DO - 10.1109/ICHI.2018.00035

M3 - Conference contribution

AN - SCOPUS:85051124016

T3 - Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018

SP - 247

EP - 254

BT - Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 6th IEEE International Conference on Healthcare Informatics, ICHI 2018

Y2 - 4 June 2018 through 7 June 2018

ER -
