An exploratory study of news article clustering for web-based bio-surveillance

Manabu Torii, Burt Ujin Bayarsaikhan, Hongfang Liu, Thang Nguyen, Kevin Jones, Noele P.G. Nelson, David M. Hartley

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Online news articles provide rich and timely information for disease outbreak surveillance. However, finding the articles relevant to disease outbreaks among the large volume of online publications is not trivial. In this study, we examined the use of text clustering techniques to organize online articles. To incorporate surveillance analysts' expertise into the clustering, we considered selecting informative word features in a supervised manner. Our experiments suggest that supervised feature selection can significantly reduce the size of the feature set without affecting the utility of the resulting clusters. In addition, we observed that the clustering algorithm could yield consistent results when a small number of selected features was used.
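The pipeline the abstract describes (supervised selection of word features, then clustering on the reduced feature set) might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the feature-scoring method or the clustering algorithm, so the chi-square statistic and k-means used here are stand-ins, and the toy articles, labels, and function names are all hypothetical.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def chi_square_scores(docs, labels):
    """Chi-square score per word from its 2x2 table (word present/absent
    vs. analyst label 1/0). ASSUMPTION: the paper may use another score."""
    n, n_pos = len(docs), sum(labels)
    n_neg = n - n_pos
    in_pos, in_neg = Counter(), Counter()
    for text, label in zip(docs, labels):
        for word in set(tokenize(text)):
            (in_pos if label else in_neg)[word] += 1
    scores = {}
    for word in set(in_pos) | set(in_neg):
        a, b = in_pos[word], in_neg[word]   # docs containing the word
        c, d = n_pos - a, n_neg - b         # docs lacking the word
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[word] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_features(docs, labels, k):
    """Keep the k highest-scoring words (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

def vectorize(text, features):
    counts = Counter(tokenize(text))
    return [float(counts[f]) for f in features]

def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans(vectors, k, iterations=10):
    """Plain k-means with deterministic farthest-point initialisation.
    ASSUMPTION: k-means stands in for the unnamed clustering algorithm."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        for i, v in enumerate(vectors):
            assignment[i] = min(range(k), key=lambda j: sq_dist(v, centroids[j]))
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical analyst-labelled articles: 1 = outbreak-relevant, 0 = not.
labeled_docs = [
    "cholera outbreak reported in coastal town",
    "flu outbreak closes schools",
    "cholera cases rise after flood",
    "football team wins final",
    "stock market closes higher",
    "new phone released today",
]
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical unlabelled articles to organise.
articles = [
    "cholera outbreak spreads in the region",
    "cholera outbreak confirmed by officials",
    "local team wins football match",
    "market closes lower",
]

selected = select_features(labeled_docs, labels, k=2)
clusters = kmeans([vectorize(a, selected) for a in articles], k=2)
print(selected, clusters)
```

On this toy input the vocabulary of roughly thirty words shrinks to two selected features, yet the outbreak articles still separate from the rest, which is the kind of effect the abstract reports at a much larger scale.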

Original language: English (US)
Title of host publication: IHI'10 - Proceedings of the 1st ACM International Health Informatics Symposium
Pages: 435-439
Number of pages: 5
DOIs: https://doi.org/10.1145/1882992.1883058
State: Published - Dec 1 2010
Event: 1st ACM International Health Informatics Symposium, IHI'10 - Arlington, VA, United States
Duration: Nov 11 2010 → Nov 12 2010

Publication series

Name: IHI'10 - Proceedings of the 1st ACM International Health Informatics Symposium

Other

Other: 1st ACM International Health Informatics Symposium, IHI'10
Country: United States
City: Arlington, VA
Period: 11/11/10 → 11/12/10

Keywords

  • biosurveillance
  • clustering
  • feature selection
  • text mining

ASJC Scopus subject areas

  • Health Informatics
  • Health Information Management

Cite this

Torii, M., Bayarsaikhan, B. U., Liu, H., Nguyen, T., Jones, K., Nelson, N. P. G., & Hartley, D. M. (2010). An exploratory study of news article clustering for web-based bio-surveillance. In IHI'10 - Proceedings of the 1st ACM International Health Informatics Symposium (pp. 435-439). (IHI'10 - Proceedings of the 1st ACM International Health Informatics Symposium). https://doi.org/10.1145/1882992.1883058