TY - GEN
T1 - A publication-based popularity index (PPI) for healthcare dataset ranking
AU - Shi, Jingyi
AU - Zheng, Mingna
AU - Yao, Lixia
AU - Ge, Yaorong
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/24
Y1 - 2018/7/24
N2 - Data are critical in this age of big data and machine learning. Due to their inherent complexity, health-related data are unique in that the datasets are usually acquired for specific purposes and with special designs. As more and more healthcare datasets become available, of which many are public, choosing a quality dataset that is suitable for specific research inquiries is becoming a challenging question for health informatics researchers, especially the learners of this field. On the other hand, from the data provider's perspective, it is important to identify features of datasets that make some datasets more valuable than others so as to improve the design and acquisition of future datasets. To address these questions, we need to develop formal mechanisms to measure the goodness of datasets according to certain criteria. In this study, we propose one way of measuring the value of healthcare datasets that is based on how often the datasets are used and reported by researchers, which we call the Publication-based Popularity Index (PPI). In this article, we describe the design of the PPI and discuss its properties. We demonstrate the utility of the PPI by ranking 14 representative healthcare datasets. We believe that the PPI can enable an overall ranking of all healthcare datasets and thus provide an important dimension to sort search results for dataset integration systems as well as a starting point for identifying and examining the design of the most valuable healthcare datasets so that features of these datasets can inform future designs.
KW - Data quality
KW - Healthcare dataset
KW - Popularity index
KW - Quantified measurement
KW - Regression
UR - http://www.scopus.com/inward/record.url?scp=85051124016&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051124016&partnerID=8YFLogxK
U2 - 10.1109/ICHI.2018.00035
DO - 10.1109/ICHI.2018.00035
M3 - Conference contribution
AN - SCOPUS:85051124016
T3 - Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018
SP - 247
EP - 254
BT - Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th IEEE International Conference on Healthcare Informatics, ICHI 2018
Y2 - 4 June 2018 through 7 June 2018
ER -