An exploratory study of news article clustering for web-based bio-surveillance

Manabu Torii, Burt Ujin Bayarsaikhan, Hongfang D Liu, Thang Nguyen, Kevin Jones, Noele P G Nelson, David M. Hartley

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Online news articles provide rich and timely information for disease outbreak surveillance. However, it is not trivial to search for articles relevant to disease outbreaks among the large volume of online publications. In this study, we examined the use of text clustering techniques to organize online articles. To take into account surveillance analysts' expertise in clustering articles, we considered selecting informative word features in a supervised manner. Our experiments suggest that the supervised selection of features can significantly reduce the feature-set size without affecting the utility of the resulting clusters. In addition, we observed that the clustering algorithm could yield consistent results when a small number of selected features were used.
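The abstract does not state which supervised selection criterion or which clustering algorithm the study used. As a rough illustration only, the sketch below assumes one common pairing: each word is scored by a chi-square statistic against analyst-assigned article labels, the top-k words are kept, and k-means then clusters articles represented as unit-length binary vectors over those words. All function names, headlines, and labels here are hypothetical and made up for the example; this is not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and return its set of alphabetic word types (crude by design)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def chi_square_scores(docs, labels):
    """Score every word by a chi-square statistic against analyst-assigned labels.

    For each word, documents are counted in a (label x word present/absent)
    contingency table and (observed - expected)^2 / expected is summed over cells.
    """
    doc_words = [tokenize(d) for d in docs]
    n = len(docs)
    label_totals = Counter(labels)
    scores = {}
    for word in set().union(*doc_words):
        present = Counter(lab for words, lab in zip(doc_words, labels) if word in words)
        n_word = sum(present.values())
        chi2 = 0.0
        for lab, n_lab in label_totals.items():
            cells = ((present[lab], n_word), (n_lab - present[lab], n - n_word))
            for observed, column_total in cells:
                expected = n_lab * column_total / n
                if expected > 0:
                    chi2 += (observed - expected) ** 2 / expected
        scores[word] = chi2
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring words; ties are broken alphabetically."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def vectorize(doc, features):
    """Binary vector over the selected features, scaled to unit length."""
    words = tokenize(doc)
    vec = [1.0 if f in words else 0.0 for f in features]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=50):
    """Standard k-means with deterministic farthest-point initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector farthest from its nearest existing seed.
        centroids.append(max(
            vectors, key=lambda v: min(squared_distance(v, c) for c in centroids)))
    assignment = []
    for _ in range(max_iter):
        assignment = [min(range(k), key=lambda j: squared_distance(v, centroids[j]))
                      for v in vectors]
        updated = []
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            # Mean of the members; an empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated
    return assignment


# Made-up headlines and analyst labels, for illustration only.
articles = [
    "Avian influenza outbreak reported on poultry farms",
    "Cholera outbreak spreads after flooding",
    "Officials confirm new influenza outbreak cases",
    "Stock market rallies on tech earnings",
    "Local tech startup wins football championship",
    "Tech company unveils new smartphone",
]
analyst_labels = ["outbreak", "outbreak", "outbreak", "other", "other", "other"]

features = select_features(articles, analyst_labels, k=3)
clusters = kmeans([vectorize(a, features) for a in articles], k=2)
print(features)  # ['outbreak', 'tech', 'influenza']
print(clusters)  # [0, 0, 0, 1, 1, 1]
```

Three selected words out of roughly thirty in the toy vocabulary are enough to separate the two groups here. Farthest-point seeding makes this k-means deterministic, so repeated runs return identical clusters; with random seeding, comparing assignments across runs is one way to probe the kind of consistency the abstract mentions.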

Original language: English (US)
Title of host publication: IHI'10 - Proceedings of the 1st ACM International Health Informatics Symposium
Pages: 435-439
Number of pages: 5
DOIs: https://doi.org/10.1145/1882992.1883058
State: Published - 2010
Externally published: Yes
Event: 1st ACM International Health Informatics Symposium, IHI'10 - Arlington, VA, United States
Duration: Nov 11 2010 → Nov 12 2010

Other

Other: 1st ACM International Health Informatics Symposium, IHI'10
Country: United States
City: Arlington, VA
Period: 11/11/10 → 11/12/10

Fingerprint

Cluster Analysis
Disease Outbreaks
Publications

Keywords

  • biosurveillance
  • clustering
  • feature selection
  • text mining

ASJC Scopus subject areas

  • Health Informatics
  • Health Information Management

Cite this

Torii, M., Bayarsaikhan, B. U., Liu, H. D., Nguyen, T., Jones, K., Nelson, N. P. G., & Hartley, D. M. (2010). An exploratory study of news article clustering for web-based bio-surveillance. In IHI'10 - Proceedings of the 1st ACM International Health Informatics Symposium (pp. 435-439). https://doi.org/10.1145/1882992.1883058

@inproceedings{57bc9054d67040ff9fc379ebab8dab50,
title = "An exploratory study of news article clustering for web-based bio-surveillance",
abstract = "Online news articles provide rich and timely information for disease outbreak surveillance. Meanwhile, it is not trivial to search articles relevant to disease outbreaks among the large volume of online publications. In this study, we examined the use of text clustering techniques to organize online articles. To take into account surveillance analysts' expertise in clustering articles, we considered selection of informative word features in a supervised manner. Our experiments suggest that the supervised selection of features can significantly reduce the features size without affecting the utility of resulting clusters. In addition, we observed that the clustering algorithm could yield consistent results when a small number of selected features were used.",
keywords = "biosurveillance, clustering, feature selection, text mining",
author = "Manabu Torii and Bayarsaikhan, {Burt Ujin} and Liu, {Hongfang D} and Thang Nguyen and Kevin Jones and Nelson, {Noele P G} and Hartley, {David M.}",
year = "2010",
doi = "10.1145/1882992.1883058",
language = "English (US)",
isbn = "9781450300308",
pages = "435--439",
booktitle = "IHI'10 - Proceedings of the 1st ACM International Health Informatics Symposium",

}

TY - GEN

T1 - An exploratory study of news article clustering for web-based bio-surveillance

AU - Torii, Manabu

AU - Bayarsaikhan, Burt Ujin

AU - Liu, Hongfang D

AU - Nguyen, Thang

AU - Jones, Kevin

AU - Nelson, Noele P G

AU - Hartley, David M.

PY - 2010

Y1 - 2010

AB - Online news articles provide rich and timely information for disease outbreak surveillance. Meanwhile, it is not trivial to search articles relevant to disease outbreaks among the large volume of online publications. In this study, we examined the use of text clustering techniques to organize online articles. To take into account surveillance analysts' expertise in clustering articles, we considered selection of informative word features in a supervised manner. Our experiments suggest that the supervised selection of features can significantly reduce the features size without affecting the utility of resulting clusters. In addition, we observed that the clustering algorithm could yield consistent results when a small number of selected features were used.

KW - biosurveillance

KW - clustering

KW - feature selection

KW - text mining

UR - http://www.scopus.com/inward/record.url?scp=78650943933&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650943933&partnerID=8YFLogxK

U2 - 10.1145/1882992.1883058

DO - 10.1145/1882992.1883058

M3 - Conference contribution

SN - 9781450300308

SP - 435

EP - 439

BT - IHI'10 - Proceedings of the 1st ACM International Health Informatics Symposium

ER -