Estimating disease burden using Internet data

Riyi Qiu; Mirsad Hadzikadic; Sha Yu; Lixia Yao

doi:10.1177/1460458218810743

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Data on disease burden are often used for assessing population health, evaluating the effectiveness of interventions, formulating health policies, and planning future resource allocation. We investigated whether Internet usage and social media data, specifically the search volume on Google, page view count on Wikipedia, and disease mention frequency on Twitter, correlated with disease burden, measured by prevalence and treatment cost, for 1633 diseases over an 11-year period. We also applied the least absolute shrinkage and selection operator (LASSO) to predict the burden of diseases. We found that Google search volume was relatively strongly correlated with the burden of 39 of the 1633 diseases, including viral hepatitis, diabetes mellitus, multiple sclerosis, and hemorrhoids. Wikipedia and Twitter data strongly correlated with the burdens of 15 and 7 diseases, respectively. However, an accurate analysis must consider each condition’s characteristics, including acute/chronic nature, severity, familiarity to the public, and the presence of stigma.
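
The two analysis steps named in the abstract, correlating an Internet signal with a burden measure and fitting a LASSO model, can be illustrated with a minimal sketch. This is not the authors' pipeline or data: the yearly values below are invented for illustration, the function names (pearson, soft_threshold, lasso) are hypothetical, and the coordinate-descent solver is a textbook LASSO written in plain Python under the assumption that all columns are centred.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1.

    X is a list of rows; its columns and y are assumed to be centred.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                others = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n)
    return w


def centre(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]


# Invented yearly values for one hypothetical disease over 11 years.
google = [1.0, 1.2, 1.5, 1.7, 2.0, 2.4, 2.6, 3.0, 3.3, 3.5, 4.0]
wikipedia = [0.5, 0.9, 0.4, 1.1, 0.7, 0.6, 1.0, 0.8, 0.5, 0.9, 0.7]
twitter = [0.2, 0.1, 0.4, 0.3, 0.2, 0.5, 0.1, 0.3, 0.4, 0.2, 0.3]
prevalence = [2.1, 2.3, 3.1, 3.3, 4.1, 4.7, 5.3, 5.9, 6.7, 7.1, 7.9]

# Step 1: correlation between one Internet signal and the burden measure.
r_google = pearson(google, prevalence)

# Step 2: LASSO on the three centred signals to predict centred prevalence.
columns = [centre(google), centre(wikipedia), centre(twitter)]
X = [list(row) for row in zip(*columns)]
weights = lasso(X, centre(prevalence), alpha=0.1)

print("Pearson r (Google vs prevalence):", round(r_google, 3))
print("LASSO weights [google, wikipedia, twitter]:", weights)
```

The L1 penalty (alpha) shrinks the weights of signals that add little predictive value toward zero, which is why LASSO is a natural fit when several correlated Internet signals compete as predictors. A real analysis would use an established library implementation and choose alpha by cross-validation.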

Original language	English (US)
Pages (from-to)	1863-1877
Number of pages	15
Journal	Health Informatics Journal
Volume	25
Issue number	4
DOIs	https://doi.org/10.1177/1460458218810743
State	Published - Dec 1 2019

Keywords

Google search
Twitter
Wikipedia
data mining
disease burden
least absolute shrinkage and selection operator
prevalence
treatment cost

ASJC Scopus subject areas

Health Informatics

Cite this

@article{8db0427c1beb46f7b8ef31973082d870,
title = "Estimating disease burden using Internet data",
abstract = "Data on disease burden are often used for assessing population health, evaluating the effectiveness of interventions, formulating health policies, and planning future resource allocation. We investigated whether Internet usage and social media data, specifically the search volume on Google, page view count on Wikipedia, and disease mentioning frequency on Twitter, correlated with the disease burden, measured by prevalence and treatment cost, for 1633 diseases over an 11-year period. We also applied least absolute shrinkage and selection operator to predict the burden of diseases. We found that Google search volume is relatively strongly correlated with the burdens for 39 of 1633 diseases, including viral hepatitis, diabetes mellitus, multiple sclerosis, and hemorrhoids. Wikipedia and Twitter data strongly correlated with the burdens of 15 and 7 diseases, respectively. However, an accurate analysis must consider each condition{\textquoteright}s characteristics, including acute/chronic nature, severity, familiarity to the public, and the presence of stigma.",
keywords = "Google search, Twitter, Wikipedia, data mining, disease burden, least absolute shrinkage and selection operator, prevalence, treatment cost",
author = "Riyi Qiu and Mirsad Hadzikadic and Sha Yu and Lixia Yao",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2018.",
year = "2019",
month = dec,
day = "1",
doi = "10.1177/1460458218810743",
language = "English (US)",
volume = "25",
pages = "1863--1877",
journal = "Health Informatics Journal",
issn = "1460-4582",
publisher = "SAGE Publications Ltd",
number = "4",
}

TY - JOUR
T1 - Estimating disease burden using Internet data
AU - Qiu, Riyi
AU - Hadzikadic, Mirsad
AU - Yu, Sha
AU - Yao, Lixia
N1 - Publisher Copyright: © The Author(s) 2018.
PY - 2019/12/1
Y1 - 2019/12/1
N2 - Data on disease burden are often used for assessing population health, evaluating the effectiveness of interventions, formulating health policies, and planning future resource allocation. We investigated whether Internet usage and social media data, specifically the search volume on Google, page view count on Wikipedia, and disease mentioning frequency on Twitter, correlated with the disease burden, measured by prevalence and treatment cost, for 1633 diseases over an 11-year period. We also applied least absolute shrinkage and selection operator to predict the burden of diseases. We found that Google search volume is relatively strongly correlated with the burdens for 39 of 1633 diseases, including viral hepatitis, diabetes mellitus, multiple sclerosis, and hemorrhoids. Wikipedia and Twitter data strongly correlated with the burdens of 15 and 7 diseases, respectively. However, an accurate analysis must consider each condition’s characteristics, including acute/chronic nature, severity, familiarity to the public, and the presence of stigma.
AB - Data on disease burden are often used for assessing population health, evaluating the effectiveness of interventions, formulating health policies, and planning future resource allocation. We investigated whether Internet usage and social media data, specifically the search volume on Google, page view count on Wikipedia, and disease mentioning frequency on Twitter, correlated with the disease burden, measured by prevalence and treatment cost, for 1633 diseases over an 11-year period. We also applied least absolute shrinkage and selection operator to predict the burden of diseases. We found that Google search volume is relatively strongly correlated with the burdens for 39 of 1633 diseases, including viral hepatitis, diabetes mellitus, multiple sclerosis, and hemorrhoids. Wikipedia and Twitter data strongly correlated with the burdens of 15 and 7 diseases, respectively. However, an accurate analysis must consider each condition’s characteristics, including acute/chronic nature, severity, familiarity to the public, and the presence of stigma.
KW - Google search
KW - Twitter
KW - Wikipedia
KW - data mining
KW - disease burden
KW - least absolute shrinkage and selection operator
KW - prevalence
KW - treatment cost
UR - http://www.scopus.com/inward/record.url?scp=85058943364&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058943364&partnerID=8YFLogxK
U2 - 10.1177/1460458218810743
DO - 10.1177/1460458218810743
M3 - Article
C2 - 30488754
AN - SCOPUS:85058943364
SN - 1460-4582
VL - 25
SP - 1863
EP - 1877
JO - Health Informatics Journal
JF - Health Informatics Journal
IS - 4
ER -
