Transactions on Machine Learning and Data Mining (ISSN: 1865-6781)


Volume 9 - Number 1 - July 2016 - Pages 27-45


Combining Multiple Feature Selection Methods and Deep Learning for High-dimensional Data

Mihaela A. Mares, Shicai Wang, and Yike Guo

Data Science Institute, Imperial College London, UK; School of Computer Science, Shanghai University, Shanghai, China


Abstract

Feature or variable selection when the number of features is large relative to the number of samples (n << p) is a challenge in many machine learning applications. A large number of statistical methods have been developed to address this challenge, each relying on different statistical assumptions about the shape of the regression function relating the predicted variable to the predictors. In this paper we propose an alternative: combining the results of different feature selection methods that rely on disjoint assumptions about the regression function. We show, on synthetic datasets and on datasets from the UCI machine learning repository, that our method achieves better sensitivity than any of the individual methods used alone. Our empirical studies on data with n << p show that the accuracy obtained when training deep neural networks on variables selected with our method is at least as good as the accuracy obtained without selecting variables in advance. Our first conclusion is that feature selection results are improved by enlarging the body of limiting assumptions about the function relating the predicted variable to the predictors. Our second conclusion is that feature selection can improve accuracy in deep learning, at least on data with n << p.
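To make the overall idea concrete, the following is a minimal sketch, not the authors' implementation: it combines features chosen by two methods with disjoint assumptions about the regression function, Lasso (sparse linear) and random-forest importances (nonparametric), via a set union, then trains a small neural network on the reduced feature set. All method choices, parameter values, and the use of scikit-learn here are illustrative assumptions.

```python
# Illustrative sketch only: combine two feature selectors with disjoint
# assumptions about the regression function, then train a neural network
# on the union of their selections. Method choices are assumptions, not
# the paper's exact procedure.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Synthetic n << p data: 100 samples, 1000 features, 10 truly informative.
X, y = make_regression(n_samples=100, n_features=1000, n_informative=10,
                       noise=0.1, random_state=0)

# Method 1: Lasso assumes a sparse linear regression function.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
linear_picks = set(np.flatnonzero(lasso.coef_))

# Method 2: random-forest importances make no linearity assumption.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
k = max(len(linear_picks), 10)
nonlinear_picks = set(np.argsort(forest.feature_importances_)[-k:])

# Combining by union keeps any feature supported under either assumption,
# trading some specificity for the higher sensitivity the paper targets.
selected = sorted(linear_picks | nonlinear_picks)

# Train a small neural network on the selected variables only.
X_tr, X_te, y_tr, y_te = train_test_split(X[:, selected], y, random_state=0)
net = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000,
                   random_state=0).fit(X_tr, y_tr)
print(f"{len(selected)} features selected, test R^2 = {net.score(X_te, y_te):.3f}")
```

Under the paper's framing, each selector contributes candidates that are plausible under its own limiting assumptions, so enlarging the set of assumptions enlarges the set of true predictors that can be recovered.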


Keywords: combining feature selection, high-dimensional data, deep learning, nonlinear regression, variable selection


