A text classification method based on latent topics

Yanshan Wang, In Chan Choi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Latent Dirichlet Allocation (LDA) is a generative model that outperforms other topic modelling algorithms at capturing the latent topics of text data. Indexing by LDA is a recent LDA-based method that gives a new definition of document probability vectors, which can then serve as feature vectors. In this paper, we propose a joint text classification process that combines DBSCAN, Indexing by LDA, and a Support Vector Machine (SVM). The DBSCAN algorithm is applied as a pre-processing step for LDA to determine the number of topics, and the resulting LDA document-indexing features are then used as input to an SVM text classifier.
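
The abstract names the three stages of the pipeline but not their settings. The following is a minimal sketch of a DBSCAN, then LDA, then SVM chain written with scikit-learn; it is not the authors' implementation. Several choices are assumptions: DBSCAN is run on TF-IDF document vectors under cosine distance, and its cluster count (noise excluded) is taken as the number of topics; `eps`, `min_samples`, the toy corpus, and the helper names `estimate_num_topics`, `fit_pipeline`, `predict` and `TopicSVMModel` are hypothetical. Standard LDA document-topic proportions stand in for the "Indexing by LDA" vectors, whose exact definition is not given in this abstract.

```python
import numpy as np
from dataclasses import dataclass
from sklearn.cluster import DBSCAN
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.svm import SVC


@dataclass
class TopicSVMModel:
    """Hypothetical container for the three fitted stages."""
    vectorizer: CountVectorizer
    lda: LatentDirichletAllocation
    svm: SVC
    n_topics: int


def estimate_num_topics(counts, eps=0.7, min_samples=2):
    """ASSUMPTION: cluster TF-IDF document vectors with DBSCAN (cosine distance)
    and use the number of clusters found, ignoring noise (label -1), as K.
    Falls back to K = 1 if DBSCAN labels every document as noise."""
    tfidf = TfidfTransformer().fit_transform(counts)
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine").fit_predict(tfidf)
    return max(len(set(labels) - {-1}), 1)


def fit_pipeline(docs, y, seed=0):
    # Raw term counts: LDA models word counts, not TF-IDF weights.
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(docs)
    # Step 1: DBSCAN chooses the number of topics K.
    k = estimate_num_topics(counts)
    # Step 2: LDA with K topics; each row of theta is one document's topic
    # distribution (ASSUMPTION: stand-in for the "Indexing by LDA" vectors).
    lda = LatentDirichletAllocation(n_components=k, random_state=seed)
    theta = lda.fit_transform(counts)
    # Step 3: linear SVM trained on the K-dimensional topic features.
    svm = SVC(kernel="linear").fit(theta, y)
    return TopicSVMModel(vectorizer, lda, svm, k)


def predict(model, docs):
    # Map new documents into the same topic space, then classify.
    theta = model.lda.transform(model.vectorizer.transform(docs))
    return model.svm.predict(theta)


# Hypothetical toy corpus with two classes.
train_docs = [
    "football match goal team score",
    "team wins the football league match",
    "goal scored late in the match by the team",
    "bake the bread with flour and butter",
    "recipe for bread needs flour yeast butter",
    "butter and flour make a good bread recipe",
]
train_labels = ["sports", "sports", "sports", "cooking", "cooking", "cooking"]

model = fit_pipeline(train_docs, train_labels)
print("topics chosen by DBSCAN:", model.n_topics)
print(predict(model, ["the team scored a late goal"]))
```

DBSCAN suits this role because it does not need the number of clusters in advance, which is exactly the quantity LDA requires as input. On a corpus this small the estimate is sensitive to `eps`, so the printed topic count should be read as illustrative only.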

Original language: English (US)
Title of host publication: ICORES 2012 - Proceedings of the 1st International Conference on Operations Research and Enterprise Systems
Pages: 212-214
Number of pages: 3
State: Published - 2012
Event: 1st International Conference on Operations Research and Enterprise Systems, ICORES 2012 - Vilamoura, Algarve, Portugal
Duration: Feb 4 2012 to Feb 6 2012

Publication series

Name: ICORES 2012 - Proceedings of the 1st International Conference on Operations Research and Enterprise Systems

Conference

Conference: 1st International Conference on Operations Research and Enterprise Systems, ICORES 2012
Country: Portugal
City: Vilamoura, Algarve
Period: 2/4/12 to 2/6/12

Keywords

  • Indexing by LDA
  • Latent topic
  • Text classification

ASJC Scopus subject areas

  • Management Science and Operations Research
