Feature selection for nonlinear regression and its application to cancer research

Yijun Sun; Jin Yao; Steve Goodison

doi:10.1137/1.9781611974010.9

Quantitative Health Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Feature selection is a fundamental problem in machine learning. With the advent of high-throughput technologies, it has become increasingly important in a wide range of scientific disciplines. In this paper, we consider the problem of feature selection for high-dimensional nonlinear regression. This problem has not yet been well addressed in the community, and existing methods suffer from issues such as local minima, simplified model assumptions, high computational complexity, and selected features that are not directly related to learning accuracy. We propose a new wrapper method that addresses some of these issues. We start by developing a new approach to estimating sample responses and prediction errors, and then deploy a feature weighting strategy to find a feature subspace where a prediction error function is minimized. We formulate it as an optimization problem within the SVM framework and solve it using an iterative approach. In each iteration, a gradient-descent-based approach is derived to efficiently find a solution. A large-scale simulation study on four synthetic and nine cancer microarray datasets demonstrates the effectiveness of the proposed method.
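
The abstract does not spell out the response estimator or the objective, so the following is only a generic illustration of feature weighting by gradient descent, not the authors' SVM-based formulation. It assumes a leave-one-out kernel (Nadaraya-Watson) estimate of each response under a weighted L1 distance, a mean squared prediction error with an L1 penalty on the weights, and projected gradient descent to keep the weights non-negative. All names (`loo_loss_and_grad`, `fit_feature_weights`, `sigma`, `lam`, `lr`) are hypothetical.

```python
import numpy as np


def loo_loss_and_grad(w, X, y, sigma=1.0, lam=0.01):
    """Leave-one-out prediction error and its gradient w.r.t. feature weights.

    Assumed model (illustrative only): y_hat_i is a kernel-weighted average of
    the other samples' responses, with kernel exp(-d_w(x_i, x_j) / sigma) and
    d_w(x_i, x_j) = sum_k w_k * |x_ik - x_jk|.
    """
    n = X.shape[0]
    D = np.abs(X[:, None, :] - X[None, :, :])      # (n, n, d) per-feature distances
    dist = D @ w                                   # (n, n) weighted L1 distances
    np.fill_diagonal(dist, np.inf)                 # exclude each sample from its own estimate
    dist = dist - dist.min(axis=1, keepdims=True)  # row shift for numerical stability
    K = np.exp(-dist / sigma)                      # diagonal becomes exactly 0
    P = K / K.sum(axis=1, keepdims=True)           # row-normalised kernel weights
    y_hat = P @ y                                  # leave-one-out response estimates
    resid = y_hat - y
    loss = np.mean(resid ** 2) + lam * w.sum()

    # d y_hat_i / d w_k = -(1/sigma) * sum_j P_ij * D_ijk * (y_j - y_hat_i)
    diff = y[None, :] - y_hat[:, None]             # (n, n): y_j - y_hat_i
    dyhat = -np.einsum("ij,ijk->ik", P * diff, D) / sigma
    grad = 2.0 * (resid @ dyhat) / n + lam         # penalty gradient is lam for w >= 0
    return loss, grad


def fit_feature_weights(X, y, sigma=1.0, lam=0.01, lr=0.5, n_iter=200):
    """Projected gradient descent on the weights, starting from all ones."""
    w = np.ones(X.shape[1])
    for _ in range(n_iter):
        _, grad = loo_loss_and_grad(w, X, y, sigma, lam)
        w = np.maximum(w - lr * grad, 0.0)         # project back onto w >= 0
    return w
```

In a sketch like this, features whose weights are driven to zero drop out of the distance and can be treated as discarded; the step size and penalty strength would need tuning per dataset.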

Original language	English (US)
Title of host publication	SIAM International Conference on Data Mining 2015, SDM 2015
Editors	Suresh Venkatasubramanian, Jieping Ye
Publisher	Society for Industrial and Applied Mathematics Publications
Pages	73-81
Number of pages	9
ISBN (Electronic)	9781510811522
DOIs	https://doi.org/10.1137/1.9781611974010.9
State	Published - 2015
Event	SIAM International Conference on Data Mining 2015, SDM 2015 - Vancouver, Canada
Duration	Apr 30 2015 → May 2 2015

Keywords

Bioinformatics
Feature selection
Nonlinear regression

ASJC Scopus subject areas

Computational Theory and Mathematics
Computer Vision and Pattern Recognition
Software

Cite this

Sun, Y., Yao, J., & Goodison, S. (2015). Feature selection for nonlinear regression and its application to cancer research. In S. Venkatasubramanian, & J. Ye (Eds.), SIAM International Conference on Data Mining 2015, SDM 2015 (pp. 73-81). (SIAM International Conference on Data Mining 2015, SDM 2015). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974010.9

@inproceedings{800429f0dc5f4e2283242eb6b6761524,
title = "Feature selection for nonlinear regression and its application to cancer research",
abstract = "Feature selection is a fundamental problem in machine learning. With the advent of high-throughput technologies, it has become increasingly important in a wide range of scientific disciplines. In this paper, we consider the problem of feature selection for high-dimensional nonlinear regression. This problem has not yet been well addressed in the community, and existing methods suffer from issues such as local minima, simplified model assumptions, high computational complexity, and selected features that are not directly related to learning accuracy. We propose a new wrapper method that addresses some of these issues. We start by developing a new approach to estimating sample responses and prediction errors, and then deploy a feature weighting strategy to find a feature subspace where a prediction error function is minimized. We formulate it as an optimization problem within the SVM framework and solve it using an iterative approach. In each iteration, a gradient-descent-based approach is derived to efficiently find a solution. A large-scale simulation study on four synthetic and nine cancer microarray datasets demonstrates the effectiveness of the proposed method.",
keywords = "Bioinformatics, Feature selection, Nonlinear regression",
author = "Yijun Sun and Jin Yao and Steve Goodison",
note = "Publisher Copyright: Copyright {\textcopyright} SIAM.; SIAM International Conference on Data Mining 2015, SDM 2015 ; Conference date: 30-04-2015 Through 02-05-2015",
year = "2015",
doi = "10.1137/1.9781611974010.9",
language = "English (US)",
series = "SIAM International Conference on Data Mining 2015, SDM 2015",
publisher = "Society for Industrial and Applied Mathematics Publications",
pages = "73--81",
editor = "Suresh Venkatasubramanian and Jieping Ye",
booktitle = "SIAM International Conference on Data Mining 2015, SDM 2015",
address = "United States",
}

TY - GEN
T1 - Feature selection for nonlinear regression and its application to cancer research
AU - Sun, Yijun
AU - Yao, Jin
AU - Goodison, Steve
PY - 2015
Y1 - 2015
AB - Feature selection is a fundamental problem in machine learning. With the advent of high-throughput technologies, it has become increasingly important in a wide range of scientific disciplines. In this paper, we consider the problem of feature selection for high-dimensional nonlinear regression. This problem has not yet been well addressed in the community, and existing methods suffer from issues such as local minima, simplified model assumptions, high computational complexity, and selected features that are not directly related to learning accuracy. We propose a new wrapper method that addresses some of these issues. We start by developing a new approach to estimating sample responses and prediction errors, and then deploy a feature weighting strategy to find a feature subspace where a prediction error function is minimized. We formulate it as an optimization problem within the SVM framework and solve it using an iterative approach. In each iteration, a gradient-descent-based approach is derived to efficiently find a solution. A large-scale simulation study on four synthetic and nine cancer microarray datasets demonstrates the effectiveness of the proposed method.
KW - Bioinformatics
KW - Feature selection
KW - Nonlinear regression
UR - http://www.scopus.com/inward/record.url?scp=84961956260&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84961956260&partnerID=8YFLogxK
U2 - 10.1137/1.9781611974010.9
DO - 10.1137/1.9781611974010.9
M3 - Conference contribution
AN - SCOPUS:84961956260
T3 - SIAM International Conference on Data Mining 2015, SDM 2015
SP - 73
EP - 81
BT - SIAM International Conference on Data Mining 2015, SDM 2015
A2 - Venkatasubramanian, Suresh
A2 - Ye, Jieping
PB - Society for Industrial and Applied Mathematics Publications
T2 - SIAM International Conference on Data Mining 2015, SDM 2015
Y2 - 30 April 2015 through 2 May 2015
ER -