A feature selection algorithm capable of handling extremely large data dimensionality

Yijun Sun, Sinisa Todorovic, Steven Goodison

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Scopus citations

Abstract

With the advent of high-throughput technologies, feature selection has become increasingly important in a wide range of scientific disciplines. We propose a new feature selection algorithm that performs extremely well in the presence of a huge number of irrelevant features. The key idea is to decompose an arbitrarily complex nonlinear model into a set of locally linear ones through local learning, and then estimate feature relevance globally within a large margin framework. The algorithm is capable of processing many thousands of features within a few minutes on a personal computer, yet maintains a close-to-optimum accuracy that is nearly insensitive to a growing number of irrelevant features. Experiments on eight synthetic and real-world datasets demonstrate the effectiveness of the algorithm.
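The key idea in the abstract can be illustrated with a minimal sketch. It assumes a much simpler Relief-style nearest-hit/nearest-miss margin in place of the paper's actual large margin formulation, which this record does not describe. The function names (`relief_style_weights`, `make_toy_data`), the L1 distance, and the toy data are all hypothetical choices made for illustration only.

```python
import random


def relief_style_weights(X, y):
    """Relief-style relevance score per feature.

    Assumption: this is a simplified stand-in, not the paper's solver.
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def l1(a, b):
        # L1 distance between two samples (an illustrative choice).
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        # Local learning step: closest same-class and other-class samples.
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        nearest_hit = min(hits, key=lambda j: l1(X[i], X[j]))
        nearest_miss = min(misses, key=lambda j: l1(X[i], X[j]))
        # Given the neighbours, the local margin is a sum of per-feature
        # terms, so each feature's contribution can be accumulated globally.
        for k in range(d):
            to_miss = abs(X[i][k] - X[nearest_miss][k])
            to_hit = abs(X[i][k] - X[nearest_hit][k])
            weights[k] += to_miss - to_hit
    return [w / n for w in weights]


def make_toy_data(n=60, n_noise=20, seed=0):
    """Hypothetical data: feature 0 separates the classes, the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = 3.0 * label + rng.gauss(0.0, 0.3)
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append([informative] + noise)
        y.append(label)
    return X, y


X, y = make_toy_data()
weights = relief_style_weights(X, y)
best_feature = max(range(len(weights)), key=lambda k: weights[k])
print("highest-scoring feature index:", best_feature)
```

In this toy setting the single informative feature should receive a clearly larger score than the noise features. The sketch says nothing about the accuracy or running time reported in the paper.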

Original language: English (US)
Title of host publication: Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
Pages: 530-540
Number of pages: 11
Volume: 2
State: Published - Oct 1 2008
Externally published: Yes
Event: 8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130 - Atlanta, GA, United States
Duration: Apr 24 2008 → Apr 26 2008

Other

Other: 8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130
Country: United States
City: Atlanta, GA
Period: 4/24/08 → 4/26/08

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Signal Processing
  • Theoretical Computer Science

Cite this

Sun, Y., Todorovic, S., & Goodison, S. (2008). A feature selection algorithm capable of handling extremely large data dimensionality. In Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130 (Vol. 2, pp. 530-540)