Local-learning-based feature selection for high-dimensional data analysis

Yijun Sun; Sinisa Todorovic; Steve Goodison

doi:10.1109/TPAMI.2009.190

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

269 Scopus citations

Abstract

This paper considers feature selection for data classification in the presence of a huge number of irrelevant features. We propose a new feature-selection algorithm that addresses several major issues with prior work, including problems with algorithm implementation, computational complexity, and solution accuracy. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones through local learning, and then learn feature relevance globally within the large margin framework. The proposed algorithm is based on well-established machine learning and numerical analysis techniques, without making any assumptions about the underlying data distribution. It is capable of processing many thousands of features within minutes on a personal computer while maintaining a very high accuracy that is nearly insensitive to a growing number of irrelevant features. Theoretical analyses of the algorithm's sample complexity suggest that the algorithm has a logarithmical sample complexity with respect to the number of features. Experiments on 11 synthetic and real-world data sets demonstrate the viability of our formulation of the feature-selection problem for supervised learning and the effectiveness of our algorithm.
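The idea in the abstract (local learning to linearize the problem, then a global large-margin fit of feature relevance) can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the exponential kernel with width `sigma`, the penalty weight `lam`, the `w = v**2` reparameterization, and the plain gradient-descent solver are all assumptions chosen for illustration. It also assumes every class has at least two samples and that features are on comparable scales.

```python
import numpy as np


def _masked_softmax(logits, mask):
    """Row-wise softmax restricted to entries where mask is True."""
    logits = np.where(mask, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def local_margin_weights(X, y, sigma=1.0, lam=0.1, n_outer=5, n_inner=200, lr=0.1):
    """Hypothetical sketch: nonnegative feature weights from local margins."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Per-feature absolute differences |x_n - x_i|, shape (n, n, d).
    diff = np.abs(X[:, None, :] - X[None, :, :])
    same = y[:, None] == y[None, :]
    np.fill_diagonal(same, False)          # a sample is not its own "hit"
    other = y[:, None] != y[None, :]

    w = np.ones(d)                         # start with all features equally weighted
    for _ in range(n_outer):
        # Local step: soft nearest-hit / nearest-miss probabilities under
        # the current weighted L1 distance.
        logits = -(diff @ w) / sigma
        p_hit = _masked_softmax(logits, same)
        p_miss = _masked_softmax(logits, other)

        # Expected margin vector per sample: far from misses, close to hits.
        z = np.einsum("ni,nid->nd", p_miss - p_hit, diff)

        # Global step: logistic loss on the margins w.z with an l1 penalty
        # on w; writing w = v**2 keeps the weights nonnegative.
        v = np.sqrt(w)
        for _ in range(n_inner):
            margin = z @ (v ** 2)
            s = np.exp(-np.logaddexp(0.0, margin))   # sigmoid(-margin)
            grad = 2.0 * v * (lam - (s[:, None] * z).mean(axis=0))
            v -= lr * grad
        w = v ** 2
    return w
```

Calling `local_margin_weights(X, y)` on an `(n_samples, n_features)` array returns one nonnegative weight per feature; larger weights mark features that help separate each sample from its nearby opposite-class neighbors. The dense `(n, n, d)` difference array limits this sketch to small data sets.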

Original language	English (US)
Article number	5342431
Pages (from-to)	1610-1626
Number of pages	17
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	32
Issue number	9
DOIs	https://doi.org/10.1109/TPAMI.2009.190
State	Published - 2010

Keywords

ℓ1 regularization
Feature selection
local learning
logistical regression
sample complexity

ASJC Scopus subject areas

Software
Computer Vision and Pattern Recognition
Computational Theory and Mathematics
Artificial Intelligence
Applied Mathematics

Cite this

@article{eaa9bc06d28e4b5c8680f939a0f7cf10,

title = "Local-learning-based feature selection for high-dimensional data analysis",

abstract = "This paper considers feature selection for data classification in the presence of a huge number of irrelevant features. We propose a new feature-selection algorithm that addresses several major issues with prior work, including problems with algorithm implementation, computational complexity, and solution accuracy. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones through local learning, and then learn feature relevance globally within the large margin framework. The proposed algorithm is based on well-established machine learning and numerical analysis techniques, without making any assumptions about the underlying data distribution. It is capable of processing many thousands of features within minutes on a personal computer while maintaining a very high accuracy that is nearly insensitive to a growing number of irrelevant features. Theoretical analyses of the algorithm's sample complexity suggest that the algorithm has a logarithmical sample complexity with respect to the number of features. Experiments on 11 synthetic and real-world data sets demonstrate the viability of our formulation of the feature-selection problem for supervised learning and the effectiveness of our algorithm.",

keywords = "$\ell_1$ regularization, Feature selection, local learning, logistical regression, sample complexity",

author = "Yijun Sun and Sinisa Todorovic and Steve Goodison",

note = "Funding Information: The authors thank the associate editor Dr. Olivier Chapelle and three anonymous reviewers for numerous suggestions that significantly improved the quality of the paper. This work is in part supported by the Susan Komen Breast Cancer Foundation under grant No. BCTR0707587.",

year = "2010",

doi = "10.1109/TPAMI.2009.190",

language = "English (US)",

volume = "32",

pages = "1610--1626",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "9",

}

TY - JOUR

T1 - Local-learning-based feature selection for high-dimensional data analysis

AU - Sun, Yijun

AU - Todorovic, Sinisa

AU - Goodison, Steve

N1 - Funding Information: The authors thank the associate editor Dr. Olivier Chapelle and three anonymous reviewers for numerous suggestions that significantly improved the quality of the paper. This work is in part supported by the Susan Komen Breast Cancer Foundation under grant No. BCTR0707587.

PY - 2010

Y1 - 2010

N2 - This paper considers feature selection for data classification in the presence of a huge number of irrelevant features. We propose a new feature-selection algorithm that addresses several major issues with prior work, including problems with algorithm implementation, computational complexity, and solution accuracy. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones through local learning, and then learn feature relevance globally within the large margin framework. The proposed algorithm is based on well-established machine learning and numerical analysis techniques, without making any assumptions about the underlying data distribution. It is capable of processing many thousands of features within minutes on a personal computer while maintaining a very high accuracy that is nearly insensitive to a growing number of irrelevant features. Theoretical analyses of the algorithm's sample complexity suggest that the algorithm has a logarithmical sample complexity with respect to the number of features. Experiments on 11 synthetic and real-world data sets demonstrate the viability of our formulation of the feature-selection problem for supervised learning and the effectiveness of our algorithm.

AB - This paper considers feature selection for data classification in the presence of a huge number of irrelevant features. We propose a new feature-selection algorithm that addresses several major issues with prior work, including problems with algorithm implementation, computational complexity, and solution accuracy. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones through local learning, and then learn feature relevance globally within the large margin framework. The proposed algorithm is based on well-established machine learning and numerical analysis techniques, without making any assumptions about the underlying data distribution. It is capable of processing many thousands of features within minutes on a personal computer while maintaining a very high accuracy that is nearly insensitive to a growing number of irrelevant features. Theoretical analyses of the algorithm's sample complexity suggest that the algorithm has a logarithmical sample complexity with respect to the number of features. Experiments on 11 synthetic and real-world data sets demonstrate the viability of our formulation of the feature-selection problem for supervised learning and the effectiveness of our algorithm.

KW - ℓ1 regularization

KW - Feature selection

KW - local learning

KW - logistical regression

KW - sample complexity

UR - http://www.scopus.com/inward/record.url?scp=77955397866&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955397866&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2009.190

DO - 10.1109/TPAMI.2009.190

M3 - Article

C2 - 20634556

AN - SCOPUS:77955397866

SN - 0162-8828

VL - 32

SP - 1610

EP - 1626

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 9

M1 - 5342431

ER -
