A publication-based popularity index (PPI) for healthcare dataset ranking

Jingyi Shi, Mingna Zheng, Lixia Yao, Yaorong Ge

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Data are critical in this age of big data and machine learning. Because of their inherent complexity, health-related data are unique in that the datasets are usually acquired for specific purposes and with special designs. As more and more healthcare datasets become available, many of them public, choosing a quality dataset that suits a specific research inquiry is becoming a challenge for health informatics researchers, especially learners in this field. On the other hand, from the data provider's perspective, it is important to identify the features that make some datasets more valuable than others, so that the design and acquisition of future datasets can be improved. To address these questions, we need formal mechanisms for measuring the goodness of datasets according to certain criteria. In this study, we propose one way of measuring the value of healthcare datasets, based on how often researchers use the datasets and report them in publications, which we call the Publication-based Popularity Index (PPI). In this article, we describe the design of the PPI and discuss its properties. We demonstrate the utility of the PPI by ranking 14 representative healthcare datasets. We believe that the PPI can enable an overall ranking of all healthcare datasets. Such a ranking would provide an important dimension for sorting search results in dataset integration systems, as well as a starting point for identifying and examining the design of the most valuable healthcare datasets so that their features can inform future designs.
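The core idea of ranking datasets by how often they appear in publications can be illustrated with a minimal sketch. This is not the PPI itself: the abstract does not give the PPI formula, so the sketch simply counts distinct publications per dataset. The function name `rank_by_publication_count`, the (publication, dataset) record format, and the sample records are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def rank_by_publication_count(usage_records):
    """Hypothetical helper: rank datasets by the number of distinct
    publications that report using them.

    This is a plain usage count, NOT the paper's PPI formula.

    usage_records: iterable of (publication_id, dataset_name) pairs.
    Returns (dataset_name, count) tuples sorted by count descending,
    with ties broken alphabetically by dataset name.
    """
    # Deduplicate so a publication that mentions a dataset twice counts once.
    unique_pairs = set(usage_records)
    counts = Counter(dataset for _, dataset in unique_pairs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical records, not data from the study.
records = [
    ("pub-1", "dataset-A"),
    ("pub-1", "dataset-A"),  # duplicate mention, counted once
    ("pub-2", "dataset-A"),
    ("pub-2", "dataset-B"),
    ("pub-3", "dataset-C"),
    ("pub-4", "dataset-B"),
]
print(rank_by_publication_count(records))
```

A raw count like this ignores factors that a popularity index would plausibly need to account for, such as how long a dataset has been available. The paper's own design of the PPI should be consulted for how it handles such factors.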

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 247-254
Number of pages: 8
ISBN (Electronic): 9781538653777
DOIs
State: Published - Jul 24 2018
Event: 6th IEEE International Conference on Healthcare Informatics, ICHI 2018 - New York, United States
Duration: Jun 4 2018 to Jun 7 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018

Other

Other: 6th IEEE International Conference on Healthcare Informatics, ICHI 2018
Country/Territory: United States
City: New York
Period: 6/4/18 to 6/7/18

Keywords

  • Data quality
  • Healthcare dataset
  • Popularity index
  • Quantified measurement
  • Regression

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Health Informatics
