A feature selection algorithm capable of handling extremely large data dimensionality

Yijun Sun; Sinisa Todorovic; Steve Goodison

doi:10.1137/1.9781611972788.48


Quantitative Health Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Scopus citations

Abstract

With the advent of high-throughput technologies, feature selection has become increasingly important in a wide range of scientific disciplines. We propose a new feature selection algorithm that performs extremely well in the presence of a huge number of irrelevant features. The key idea is to decompose an arbitrarily complex nonlinear model into a set of locally linear ones through local learning, and then estimate feature relevance globally within a large-margin framework. The algorithm is capable of processing many thousands of features within a few minutes on a personal computer, yet maintains a close-to-optimum accuracy that is nearly insensitive to a growing number of irrelevant features. Experiments on eight synthetic and real-world datasets demonstrate the effectiveness of the algorithm.
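The abstract names two ingredients: local learning, which yields a margin for each sample, and a global large-margin estimate of feature relevance. The sketch below is a hypothetical illustration of how those two ingredients can fit together. It is not the authors' published algorithm. It rests on the following assumptions:

* Each sample's local margin is computed Relief-style, as the kernel-weighted expected distance to a nearest miss minus that to a nearest hit.
* The kernel is exponential in the weighted L1 distance, with an assumed width `sigma`.
* Weights are fitted by minimizing an l1-penalized logistic loss on those margins, with `w = v**2` to keep them nonnegative.
* The two steps are alternated for a fixed number of rounds.

The function names `local_margins` and `learn_weights`, and all parameter defaults, are illustrative only.

```python
import numpy as np


def local_margins(X, y, w, sigma=1.0):
    """Return Z (n x d) with z_n = E|x_n - miss| - E|x_n - hit| per feature.

    Hypothetical local-learning step: neighbors are weighted by an exponential
    kernel on the w-weighted L1 distance, so each sample's neighborhood
    depends on the current feature weights.
    """
    n, d = X.shape
    Z = np.empty((n, d))
    for i in range(n):
        diff = np.abs(X - X[i])          # per-feature distances to every sample
        dist = diff @ w                  # weighted L1 distance
        hit = y == y[i]
        hit[i] = False                   # a sample is not its own neighbor
        miss = y != y[i]
        expected = []
        for mask in (miss, hit):
            dm = dist[mask]
            k = np.exp(-(dm - dm.min()) / sigma)  # shift by min for stability
            expected.append((k / k.sum()) @ diff[mask])
        Z[i] = expected[0] - expected[1]
    return Z


def learn_weights(X, y, lam=1.0, sigma=1.0, outer=10, inner=200, lr=0.01):
    """Alternate (a) local margins under the current weights with
    (b) gradient descent on sum_n log(1 + exp(-w.z_n)) + lam * ||w||_1.

    Nonnegativity is enforced by writing w = v**2, so ||w||_1 = sum(v**2).
    """
    v = np.ones(X.shape[1])
    for _ in range(outer):
        Z = local_margins(X, y, v ** 2, sigma)
        for _ in range(inner):
            m = Z @ (v ** 2)                    # margin of each sample
            s = 0.5 * (1.0 - np.tanh(m / 2.0))  # sigmoid(-m), overflow-safe
            grad = 2.0 * v * (lam - Z.T @ s)    # gradient of the penalized loss w.r.t. v
            v -= lr * grad
    return v ** 2


# Usage on synthetic data: only feature 0 carries class information.
rng = np.random.default_rng(0)
n = 60
y = np.arange(n) % 2
X = rng.normal(size=(n, 5))
X[:, 0] += 3.0 * (2 * y - 1)
w = learn_weights(X, y)
print(np.round(w, 4))
```

In this sketch, `lam` trades margin size against sparsity. Features whose aggregated margin contribution `Z.T @ s` stays below `lam` are driven toward zero weight. The `w = v**2` substitution keeps the weights nonnegative without an explicit projection step.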

Original language	English (US)
Title of host publication	Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
Publisher	Society for Industrial and Applied Mathematics Publications
Pages	530-540
Number of pages	11
ISBN (Print)	9781605603179
DOIs	https://doi.org/10.1137/1.9781611972788.48
State	Published - 2008
Event	8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130 - Atlanta, GA, United States. Duration: Apr 24 2008 → Apr 26 2008

Publication series

Name	Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
Volume	2

Other

Other	8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130
Country/Territory	United States
City	Atlanta, GA
Period	4/24/08 → 4/26/08

ASJC Scopus subject areas

Information Systems
Software
Signal Processing
Theoretical Computer Science


Cite this

Sun, Y., Todorovic, S., & Goodison, S. (2008). A feature selection algorithm capable of handling extremely large data dimensionality. In Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130 (pp. 530-540). (Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130; Vol. 2). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611972788.48

A feature selection algorithm capable of handling extremely large data dimensionality. / Sun, Yijun; Todorovic, Sinisa; Goodison, Steve.
Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130. Society for Industrial and Applied Mathematics Publications, 2008. p. 530-540 (Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130; Vol. 2).


Sun, Y, Todorovic, S & Goodison, S 2008, A feature selection algorithm capable of handling extremely large data dimensionality. in Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130. Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130, vol. 2, Society for Industrial and Applied Mathematics Publications, pp. 530-540, 8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130, Atlanta, GA, United States, 4/24/08. https://doi.org/10.1137/1.9781611972788.48

Sun Y, Todorovic S, Goodison S. A feature selection algorithm capable of handling extremely large data dimensionality. In Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130. Society for Industrial and Applied Mathematics Publications. 2008. p. 530-540. (Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130). doi: 10.1137/1.9781611972788.48

Sun, Yijun ; Todorovic, Sinisa ; Goodison, Steve. / A feature selection algorithm capable of handling extremely large data dimensionality. Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130. Society for Industrial and Applied Mathematics Publications, 2008. pp. 530-540 (Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130).

@inproceedings{1a04267452d047109432efd9d2befc32,
  title = "A feature selection algorithm capable of handling extremely large data dimensionality",
  abstract = "With the advent of high-throughput technologies, feature selection has become increasingly important in a wide range of scientific disciplines. We propose a new feature selection algorithm that performs extremely well in the presence of a huge number of irrelevant features. The key idea is to decompose an arbitrarily complex nonlinear model into a set of locally linear ones through local learning, and then estimate feature relevance globally within a large-margin framework. The algorithm is capable of processing many thousands of features within a few minutes on a personal computer, yet maintains a close-to-optimum accuracy that is nearly insensitive to a growing number of irrelevant features. Experiments on eight synthetic and real-world datasets demonstrate the effectiveness of the algorithm.",
  author = "Yijun Sun and Sinisa Todorovic and Steve Goodison",
  year = "2008",
  doi = "10.1137/1.9781611972788.48",
  language = "English (US)",
  isbn = "9781605603179",
  series = "Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130",
  publisher = "Society for Industrial and Applied Mathematics Publications",
  pages = "530--540",
  booktitle = "Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130",
  address = "United States",
  note = "8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130 ; Conference date: 24-04-2008 Through 26-04-2008",
}

TY  - GEN
T1  - A feature selection algorithm capable of handling extremely large data dimensionality
AU  - Sun, Yijun
AU  - Todorovic, Sinisa
AU  - Goodison, Steve
PY  - 2008
Y1  - 2008
N2  - With the advent of high-throughput technologies, feature selection has become increasingly important in a wide range of scientific disciplines. We propose a new feature selection algorithm that performs extremely well in the presence of a huge number of irrelevant features. The key idea is to decompose an arbitrarily complex nonlinear model into a set of locally linear ones through local learning, and then estimate feature relevance globally within a large-margin framework. The algorithm is capable of processing many thousands of features within a few minutes on a personal computer, yet maintains a close-to-optimum accuracy that is nearly insensitive to a growing number of irrelevant features. Experiments on eight synthetic and real-world datasets demonstrate the effectiveness of the algorithm.
AB  - With the advent of high-throughput technologies, feature selection has become increasingly important in a wide range of scientific disciplines. We propose a new feature selection algorithm that performs extremely well in the presence of a huge number of irrelevant features. The key idea is to decompose an arbitrarily complex nonlinear model into a set of locally linear ones through local learning, and then estimate feature relevance globally within a large-margin framework. The algorithm is capable of processing many thousands of features within a few minutes on a personal computer, yet maintains a close-to-optimum accuracy that is nearly insensitive to a growing number of irrelevant features. Experiments on eight synthetic and real-world datasets demonstrate the effectiveness of the algorithm.
UR  - http://www.scopus.com/inward/record.url?scp=52649105963&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=52649105963&partnerID=8YFLogxK
U2  - 10.1137/1.9781611972788.48
DO  - 10.1137/1.9781611972788.48
M3  - Conference contribution
AN  - SCOPUS:52649105963
SN  - 9781605603179
T3  - Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
SP  - 530
EP  - 540
BT  - Society for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
PB  - Society for Industrial and Applied Mathematics Publications
T2  - 8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130
Y2  - 24 April 2008 through 26 April 2008
ER  - 
