A feature selection algorithm capable of handling extremely large data dimensionality

Yijun Sun, Sinisa Todorovic, Steve Goodison

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Scopus citations

Abstract

With the advent of high-throughput technologies, feature selection has become increasingly important in a wide range of scientific disciplines. We propose a new feature selection algorithm that performs extremely well in the presence of a huge number of irrelevant features. The key idea is to decompose an arbitrarily complex nonlinear model into a set of locally linear ones through local learning, and then estimate feature relevance globally within a large margin framework. The algorithm is capable of processing many thousands of features within a few minutes on a personal computer, yet maintains a close-to-optimum accuracy that is nearly insensitive to a growing number of irrelevant features. Experiments on eight synthetic and real-world datasets are presented that demonstrate the effectiveness of the algorithm.
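The key idea stated in the abstract (local neighborhoods used to score features by margin) can be illustrated with a small sketch. The code below is not the authors' algorithm: it is a simplified Relief-style procedure that assumes hard nearest neighbors, a weighted L1 distance, and a plain averaging update. The function name `relief_style_weights`, the synthetic data, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def relief_style_weights(X, y, n_iter=3):
    """Hypothetical helper: Relief-style feature weights from local margins.

    Assumes every class has at least two samples. This is an illustrative
    simplification, not the method published in the paper.
    """
    n, d = X.shape
    w = np.ones(d) / d  # start from uniform feature weights
    for _ in range(n_iter):
        z_sum = np.zeros(d)
        for i in range(n):
            diff = np.abs(X - X[i])   # per-feature distances to sample i, shape (n, d)
            dist = diff @ w           # weighted L1 distance under the current weights
            dist[i] = np.inf          # never pick the sample itself
            same = y == y[i]
            hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
            miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
            # Local margin vector: a feature is useful if it separates the
            # sample from its nearest miss more than from its nearest hit.
            z_sum += diff[miss] - diff[hit]
        w_new = np.maximum(z_sum / n, 0.0)  # keep only features with positive average margin
        if w_new.sum() == 0:
            break
        w = w_new / w_new.sum()             # normalize so the weights sum to 1
    return w


# Synthetic check: feature 0 carries the class signal, the other 50 are noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 51))
X[:, 0] += 3.0 * (2 * y - 1)

w = relief_style_weights(X, y)
print("highest-weighted feature:", int(np.argmax(w)))
```

Recomputing the neighbors under the updated weights on each pass is what lets the irrelevant features lose influence over which samples count as "local"; the published method may differ from this sketch in how neighbors are weighted and how the weights are optimized.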

Original language: English (US)
Title of host publication: Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 530-540
Number of pages: 11
ISBN (Print): 9781605603179
State: Published - 2008
Event: 8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130 - Atlanta, GA, United States
Duration: Apr 24 2008 to Apr 26 2008

Publication series

Name: Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
Volume: 2

Other

Other: 8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130
Country: United States
City: Atlanta, GA
Period: 4/24/08 to 4/26/08

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Signal Processing
  • Theoretical Computer Science
