TY - GEN
T1 - Feature selection for nonlinear regression and its application to cancer research
AU - Sun, Yijun
AU - Yao, Jin
AU - Goodison, Steve
N1 - Publisher Copyright:
Copyright © SIAM.
PY - 2015
Y1 - 2015
N2 - Feature selection is a fundamental problem in machine learning. With the advent of high-throughput technologies, it becomes increasingly important in a wide range of scientific disciplines. In this paper, we consider the problem of feature selection for high-dimensional nonlinear regression. This problem has not yet been well addressed in the community, and existing methods suffer from issues such as local minima, simplified model assumptions, high computational complexity and selected features not directly related to learning accuracy. We propose a new wrapper method that addresses some of these issues. We start by developing a new approach to estimating sample responses and prediction errors, and then deploy a feature weighting strategy to find a feature subspace where a prediction error function is minimized. We formulate it as an optimization problem within the SVM framework and solve it using an iterative approach. In each iteration, a gradient descent based approach is derived to efficiently find a solution. A large-scale simulation study is performed on four synthetic and nine cancer microarray datasets that demonstrates the effectiveness of the proposed method.
AB - Feature selection is a fundamental problem in machine learning. With the advent of high-throughput technologies, it becomes increasingly important in a wide range of scientific disciplines. In this paper, we consider the problem of feature selection for high-dimensional nonlinear regression. This problem has not yet been well addressed in the community, and existing methods suffer from issues such as local minima, simplified model assumptions, high computational complexity and selected features not directly related to learning accuracy. We propose a new wrapper method that addresses some of these issues. We start by developing a new approach to estimating sample responses and prediction errors, and then deploy a feature weighting strategy to find a feature subspace where a prediction error function is minimized. We formulate it as an optimization problem within the SVM framework and solve it using an iterative approach. In each iteration, a gradient descent based approach is derived to efficiently find a solution. A large-scale simulation study is performed on four synthetic and nine cancer microarray datasets that demonstrates the effectiveness of the proposed method.
KW - Bioinformatics
KW - Feature selection
KW - Nonlinear regression
UR - http://www.scopus.com/inward/record.url?scp=84961956260&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84961956260&partnerID=8YFLogxK
U2 - 10.1137/1.9781611974010.9
DO - 10.1137/1.9781611974010.9
M3 - Conference contribution
AN - SCOPUS:84961956260
T3 - SIAM International Conference on Data Mining 2015, SDM 2015
SP - 73
EP - 81
BT - SIAM International Conference on Data Mining 2015, SDM 2015
A2 - Venkatasubramanian, Suresh
A2 - Ye, Jieping
PB - Society for Industrial and Applied Mathematics Publications
T2 - SIAM International Conference on Data Mining 2015, SDM 2015
Y2 - 30 April 2015 through 2 May 2015
ER -