A feature selection algorithm capable of handling extremely large data dimensionality

Yijun Sun, Sinisa Todorovic, Steve Goodison

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Scopus citations

Abstract

With the advent of high-throughput technologies, feature selection has become increasingly important in a wide range of scientific disciplines. We propose a new feature selection algorithm that performs extremely well in the presence of a huge number of irrelevant features. The key idea is to decompose an arbitrarily complex nonlinear model into a set of locally linear ones through local learning, and then estimate feature relevance globally within a large margin framework. The algorithm is capable of processing many thousands of features within a few minutes on a personal computer, yet maintains a close-to-optimum accuracy that is nearly insensitive to a growing number of irrelevant features. Experiments on eight synthetic and real-world datasets demonstrate the effectiveness of the algorithm.
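The idea of scoring features by local margins can be illustrated with a small sketch. The code below is not the authors' algorithm: it is a simplified Relief-style weighting, swapped in for illustration, that uses each sample's nearest same-class neighbor (hit) and nearest other-class neighbor (miss) as the "local" model and accumulates a per-feature margin across all samples. The function names `relief_weights` and `make_toy_data`, the L1 distance, and the toy dataset are all assumptions made for this example.

```python
import random


def relief_weights(X, y):
    """Relief-style feature weights (illustrative stand-in, not the paper's method).

    For every sample, find its nearest hit and nearest miss under L1 distance,
    then add the per-feature margin |x - miss| - |x - hit| to the weight vector.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for i in range(n):
        best_hit = best_miss = None
        hit_dist = miss_dist = float("inf")
        for j in range(n):
            if j == i:
                continue
            dist = sum(abs(a - b) for a, b in zip(X[i], X[j]))
            if y[j] == y[i] and dist < hit_dist:
                best_hit, hit_dist = j, dist
            elif y[j] != y[i] and dist < miss_dist:
                best_miss, miss_dist = j, dist
        if best_hit is None or best_miss is None:
            continue  # sample has no hit or no miss; skip it
        for k in range(d):
            w[k] += abs(X[i][k] - X[best_miss][k]) - abs(X[i][k] - X[best_hit][k])
    # Keep only positive margins and normalize so the weights sum to 1.
    w = [max(v, 0.0) for v in w]
    total = sum(w)
    return [v / total for v in w] if total > 0 else w


def make_toy_data(n=60, n_noise=20, seed=0):
    """Hypothetical toy data: feature 0 separates the classes, the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = (2.0 if label else -2.0) + rng.gauss(0, 0.5)
        noise = [rng.gauss(0, 1) for _ in range(n_noise)]
        X.append([informative] + noise)
        y.append(label)
    return X, y


X, y = make_toy_data()
weights = relief_weights(X, y)
top_feature = max(range(len(weights)), key=weights.__getitem__)
print("highest-weighted feature index:", top_feature)
```

On this toy data the informative feature should receive the largest weight even though it is outnumbered by noise features. The abstract describes a more elaborate scheme (local learning combined with a global large margin estimate), so this sketch should be read only as intuition for margin-based feature relevance.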

Original language: English (US)
Title of host publication: Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 530-540
Number of pages: 11
ISBN (Print): 9781605603179
State: Published - 2008
Event: 8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130 - Atlanta, GA, United States
Duration: Apr 24 2008 to Apr 26 2008

Publication series

Name: Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
Volume: 2

Other

Other: 8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130
Country/Territory: United States
City: Atlanta, GA
Period: 4/24/08 to 4/26/08

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Signal Processing
  • Theoretical Computer Science
