We propose a novel online-learning-based feature selection algorithm for supervised learning in the presence of a huge number of irrelevant features. The key idea of the algorithm is to decompose a nonlinear problem into a set of locally linear ones through local learning, and then to estimate the relevance of features globally in a large-margin framework with ℓ1 regularization. Unlike in batch learning, the regularization parameter in online learning has to be tuned on the fly as the amount of training data increases. We address this issue within the Bayesian learning paradigm and provide an analytic solution for automatic estimation of the regularization parameter via variational methods. Numerical experiments on a variety of benchmark data sets demonstrate the effectiveness of the proposed feature selection algorithm.
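To make the key idea concrete, the following is a minimal sketch of one way an online, locally linear, ℓ1-regularized large-margin update could look. It is an illustration under stated assumptions, not the algorithm of this paper: the function name `online_l1_feature_weights`, the logistic margin loss, the nearest-hit/nearest-miss construction of the local margin vector, the step size `lr`, and the fixed regularization parameter `lam` are all hypothetical choices. In particular, the Bayesian, variational estimation of the regularization parameter is not implemented here; `lam` is simply held constant.

```python
import numpy as np

def online_l1_feature_weights(X, y, lam=0.05, lr=0.1):
    """Sketch of online feature weighting (illustrative assumptions only).

    X: (n, d) array of samples arriving one at a time; y: (n,) class labels.
    lam: fixed l1 penalty (assumption; the paper estimates it automatically).
    lr: step size (assumption).
    Returns a nonnegative weight per feature; larger means more relevant.
    """
    n, d = X.shape
    w = np.ones(d)  # start from uniform weights so distances are well defined
    for t in range(1, n):
        seen_X, seen_y = X[:t], y[:t]
        hits = seen_X[seen_y == y[t]]      # earlier samples of the same class
        misses = seen_X[seen_y != y[t]]    # earlier samples of other classes
        if len(hits) == 0 or len(misses) == 0:
            continue  # a local margin needs both a hit and a miss
        diff_hit = np.abs(hits - X[t])
        diff_miss = np.abs(misses - X[t])
        # Local learning step: nearest neighbors under the current weighted
        # l1 distance turn the problem into a linear one in w around X[t].
        nearest_hit = diff_hit[np.argmin(diff_hit @ w)]
        nearest_miss = diff_miss[np.argmin(diff_miss @ w)]
        z = nearest_miss - nearest_hit     # local margin vector
        margin = np.clip(w @ z, -50.0, 50.0)
        # Gradient step on the logistic loss log(1 + exp(-w.z)).
        w = w + lr * z / (1.0 + np.exp(margin))
        # Proximal step for the l1 penalty with a nonnegativity constraint.
        w = np.maximum(w - lr * lam, 0.0)
    return w
```

In this sketch, features that consistently separate a sample from its nearest miss more than from its nearest hit accumulate weight, while the ℓ1 shrinkage drives the weights of irrelevant features toward zero. Keeping all past samples for the neighbor search is a simplification; a bounded buffer would be needed for a genuinely streaming setting.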