A feature selection algorithm capable of handling extremely large data dimensionality

Yijun Sun, Sinisa Todorovic, Steven Goodison

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Citations (Scopus)

Abstract

With the advent of high-throughput technologies, feature selection has become increasingly important in a wide range of scientific disciplines. We propose a new feature selection algorithm that performs extremely well in the presence of a huge number of irrelevant features. The key idea is to decompose an arbitrarily complex nonlinear model into a set of locally linear ones through local learning, and then to estimate feature relevance globally within a large-margin framework. The algorithm is capable of processing many thousands of features within a few minutes on a personal computer, yet maintains a close-to-optimum accuracy that is nearly insensitive to a growing number of irrelevant features. Experiments on eight synthetic and real-world datasets demonstrate the effectiveness of the algorithm.
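
As an illustration of the local-learning, large-margin idea sketched in the abstract, the Python snippet below estimates feature weights from RELIEF-style nearest-hit/nearest-miss margins computed under the current weighted metric, with an L1 penalty that drives the weights of irrelevant features toward zero. The function name local_margin_feature_weights, the hyperparameters, and the logistic-loss update are illustrative assumptions, not the authors' implementation.

import numpy as np

def local_margin_feature_weights(X, y, n_iter=10, lr=0.1, lam=0.01):
    # Hypothetical sketch: RELIEF-style nearest-hit/nearest-miss margins under the
    # current weighted metric, logistic loss, and L1 shrinkage on the feature weights.
    n, d = X.shape
    w = np.ones(d)
    for _ in range(n_iter):
        Z = np.zeros((n, d))
        for i in range(n):
            diff = np.abs(X - X[i])                      # elementwise |x_i - x_j|, shape (n, d)
            dist = diff @ w                              # weighted L1 distance to every sample
            dist[i] = np.inf                             # exclude the sample itself
            same = (y == y[i])
            hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
            miss = np.argmin(np.where(same, np.inf, dist))   # nearest other-class sample
            Z[i] = diff[miss] - diff[hit]                # per-feature local margin for sample i
        m = np.clip(Z @ w, -50.0, 50.0)                  # margins under the current weights
        grad = -(Z * (1.0 / (1.0 + np.exp(m)))[:, None]).mean(axis=0)   # logistic-loss gradient
        w = np.maximum(w - lr * (grad + lam * np.sign(w)), 0.0)         # L1 step, keep w >= 0
    return w

# Toy usage: 2 relevant features plus 500 irrelevant ones; the relevant pair should rank first.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = np.hstack([y[:, None] + 0.5 * rng.standard_normal((200, 2)),
               rng.standard_normal((200, 500))])
w = local_margin_feature_weights(X, y)
print("top-ranked features:", np.argsort(w)[::-1][:5])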

Original language: English (US)
Title of host publication: Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
Pages: 530-540
Number of pages: 11
Volume: 2
State: Published - Oct 1 2008
Externally published: Yes
Event: 8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130 - Atlanta, GA, United States
Duration: Apr 24 2008 - Apr 26 2008



ASJC Scopus subject areas

  • Information Systems
  • Software
  • Signal Processing
  • Theoretical Computer Science

Cite this

Sun, Y., Todorovic, S., & Goodison, S. (2008). A feature selection algorithm capable of handling extremely large data dimensionality. In Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130 (Vol. 2, pp. 530-540).

@inproceedings{1a04267452d047109432efd9d2befc32,
title = "A feature selection algorithm capable of handling extremely large data dimensionality",
abstract = "With the advent of high-throughput technologies, feature selection has become increasingly important in a wide range of scientific disciplines. We propose a new feature selection algorithm that performs extremely well in the presence of a huge number of irrelevant features. The key idea is to decompose an arbitrarily complex nonlinear model into a set of locally linear ones through local learning, and then to estimate feature relevance globally within a large-margin framework. The algorithm is capable of processing many thousands of features within a few minutes on a personal computer, yet maintains a close-to-optimum accuracy that is nearly insensitive to a growing number of irrelevant features. Experiments on eight synthetic and real-world datasets demonstrate the effectiveness of the algorithm.",
author = "Yijun Sun and Sinisa Todorovic and Steven Goodison",
year = "2008",
month = "10",
day = "1",
language = "English (US)",
isbn = "9781605603179",
volume = "2",
pages = "530--540",
booktitle = "Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130",

}

TY - GEN

T1 - A feature selection algorithm capable of handling extremely large data dimensionality

AU - Sun, Yijun

AU - Todorovic, Sinisa

AU - Goodison, Steven

PY - 2008/10/1

Y1 - 2008/10/1

N2 - With the advent of high-throughput technologies, feature selection has become increasingly important in a wide range of scientific disciplines. We propose a new feature selection algorithm that performs extremely well in the presence of a huge number of irrelevant features. The key idea is to decompose an arbitrarily complex nonlinear model into a set of locally linear ones through local learning, and then to estimate feature relevance globally within a large-margin framework. The algorithm is capable of processing many thousands of features within a few minutes on a personal computer, yet maintains a close-to-optimum accuracy that is nearly insensitive to a growing number of irrelevant features. Experiments on eight synthetic and real-world datasets demonstrate the effectiveness of the algorithm.

AB - With the advent of high-throughput technologies, feature selection has become increasingly important in a wide range of scientific disciplines. We propose a new feature selection algorithm that performs extremely well in the presence of a huge number of irrelevant features. The key idea is to decompose an arbitrarily complex nonlinear model into a set of locally linear ones through local learning, and then to estimate feature relevance globally within a large-margin framework. The algorithm is capable of processing many thousands of features within a few minutes on a personal computer, yet maintains a close-to-optimum accuracy that is nearly insensitive to a growing number of irrelevant features. Experiments on eight synthetic and real-world datasets demonstrate the effectiveness of the algorithm.

UR - http://www.scopus.com/inward/record.url?scp=52649105963&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=52649105963&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:52649105963

SN - 9781605603179

VL - 2

SP - 530

EP - 540

BT - Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130

ER -