Estimating disease burden using Google Trends and Wikipedia data

Riyi Qiu, Mirsad Hadzikadic, Lixia Yao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Data on disease burden is often used for assessing population health, evaluating the effectiveness of interventions, formulating health policies, and planning future resource allocation. We investigated whether Internet usage data, particularly the search volume on Google and page view counts on Wikipedia, are correlated with the disease burden, measured by prevalence and treatment cost, for 1,633 diseases over an 11-year period. We also applied the method of least absolute shrinkage and selection operator (LASSO) to predict the burden of diseases, using those Internet data together with three other variables we quantified previously. We found a relatively strong correlation for 39 of 1,633 diseases, including viral hepatitis, diabetes mellitus, other headache syndromes, multiple sclerosis, sleep apnea, hemorrhoids, and disaccharidase deficiency. However, an accurate analysis must consider each condition’s characteristics, including acute/chronic nature, severity, familiarity to the public, and presence of stigma.
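The two analysis steps named in the abstract, correlation and LASSO regression, can be illustrated with a minimal sketch. This record does not describe the authors' implementation, so everything below is an assumption made for illustration: the data are synthetic (11 made-up yearly values for one hypothetical disease), the names `pearson`, `soft_threshold`, and `lasso` are hypothetical helpers, and the LASSO is fitted with a plain coordinate-descent loop in pure Python rather than whatever software the study used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1.

    X is a list of rows; returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    # Center features and target so the intercept can be recovered afterwards.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - col_means[j] for row in X] for j in range(p)]
    resid = [v - y_mean for v in y]  # residual for beta = 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue  # constant feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new_b = soft_threshold(rho, lam) / scale
            delta = new_b - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new_b
    intercept = y_mean - sum(b * m for b, m in zip(beta, col_means))
    return intercept, beta

# Synthetic data (NOT from the paper): 11 yearly observations.
search_volume = [10, 12, 15, 14, 18, 21, 20, 24, 27, 26, 30]
page_views = [5, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14]
noise = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
# Prevalence is constructed to depend on search volume only.
prevalence = [2 * s + 1 for s in search_volume]

r_search = pearson(search_volume, prevalence)
X = [list(row) for row in zip(search_volume, page_views, noise)]
intercept, beta = lasso(X, prevalence, lam=1.0)
print("correlation:", r_search)
print("intercept:", intercept, "coefficients:", beta)
```

In this toy setup the L1 penalty sets the coefficients of the two uninformative features to exactly zero while shrinking the informative one slightly, which is the variable-selection behavior that motivates using LASSO alongside per-disease correlations.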

Original language: English (US)
Title of host publication: Advances in Artificial Intelligence
Subtitle of host publication: From Theory to Practice - 30th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2017, Proceedings
Editors: Moonis Ali, Salem Benferhat, Karim Tabia
Publisher: Springer Verlag
Pages: 374-385
Number of pages: 12
ISBN (Print): 9783319600444
DOIs: https://doi.org/10.1007/978-3-319-60045-1_39
State: Published - Jan 1 2017
Event: 30th International Conference on Industrial, Engineering, and Other Applications of Applied Intelligent Systems, IEA/AIE 2017 - Arras, France
Duration: Jun 27 2017 to Jun 30 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10351 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 30th International Conference on Industrial, Engineering, and Other Applications of Applied Intelligent Systems, IEA/AIE 2017
Country: France
City: Arras
Period: 6/27/17 to 6/30/17

Fingerprint

Wikipedia
Health
Internet
Multiple Sclerosis
Diabetes Mellitus
Sleep
Medical problems
Shrinkage
Acute
Resource Allocation
Count
Planning
Trends
Predict
Costs
Operator

Keywords

  • Disease burden
  • Least absolute shrinkage and selection operator (LASSO)
  • Page review
  • Prevalence
  • Search query volume
  • Treatment cost

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Qiu, R., Hadzikadic, M., & Yao, L. (2017). Estimating disease burden using Google Trends and Wikipedia data. In M. Ali, S. Benferhat, & K. Tabia (Eds.), Advances in Artificial Intelligence: From Theory to Practice - 30th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2017, Proceedings (pp. 374-385). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10351 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-60045-1_39

@inproceedings{5781d18dcc834ec0a94d271c762fd59e,
title = "Estimating disease burden using Google Trends and Wikipedia data",
abstract = "Data on disease burden is often used for assessing population health, evaluating the effectiveness of interventions, formulating health policies, and planning future resource allocation. We investigated whether Internet usage data, particularly the search volume on Google and page view counts on Wikipedia, are correlated with the disease burden, measured by prevalence and treatment cost, for 1,633 diseases over an 11-year period. We also applied the method of least absolute shrinkage and selection operator (LASSO) to predict the burden of diseases, using those Internet data together with three other variables we quantified previously. We found a relatively strong correlation for 39 of 1,633 diseases, including viral hepatitis, diabetes mellitus, other headache syndromes, multiple sclerosis, sleep apnea, hemorrhoids, and disaccharidase deficiency. However, an accurate analysis must consider each condition’s characteristics, including acute/chronic nature, severity, familiarity to the public, and presence of stigma.",
keywords = "Disease burden, Least absolute shrinkage and selection operator (LASSO), Page review, Prevalence, Search query volume, Treatment cost",
author = "Riyi Qiu and Mirsad Hadzikadic and Lixia Yao",
year = "2017",
month = "1",
day = "1",
doi = "10.1007/978-3-319-60045-1_39",
language = "English (US)",
isbn = "9783319600444",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "374--385",
editor = "Moonis Ali and Salem Benferhat and Karim Tabia",
booktitle = "Advances in Artificial Intelligence",
address = "Germany",

}

TY - GEN

T1 - Estimating disease burden using Google Trends and Wikipedia data

AU - Qiu, Riyi

AU - Hadzikadic, Mirsad

AU - Yao, Lixia

PY - 2017/1/1

Y1 - 2017/1/1

AB - Data on disease burden is often used for assessing population health, evaluating the effectiveness of interventions, formulating health policies, and planning future resource allocation. We investigated whether Internet usage data, particularly the search volume on Google and page view counts on Wikipedia, are correlated with the disease burden, measured by prevalence and treatment cost, for 1,633 diseases over an 11-year period. We also applied the method of least absolute shrinkage and selection operator (LASSO) to predict the burden of diseases, using those Internet data together with three other variables we quantified previously. We found a relatively strong correlation for 39 of 1,633 diseases, including viral hepatitis, diabetes mellitus, other headache syndromes, multiple sclerosis, sleep apnea, hemorrhoids, and disaccharidase deficiency. However, an accurate analysis must consider each condition’s characteristics, including acute/chronic nature, severity, familiarity to the public, and presence of stigma.

KW - Disease burden

KW - Least absolute shrinkage and selection operator (LASSO)

KW - Page review

KW - Prevalence

KW - Search query volume

KW - Treatment cost

UR - http://www.scopus.com/inward/record.url?scp=85026287521&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85026287521&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-60045-1_39

DO - 10.1007/978-3-319-60045-1_39

M3 - Conference contribution

AN - SCOPUS:85026287521

SN - 9783319600444

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 374

EP - 385

BT - Advances in Artificial Intelligence

A2 - Ali, Moonis

A2 - Benferhat, Salem

A2 - Tabia, Karim

PB - Springer Verlag

ER -