Assessing feature selection methods and their performance in high dimensional classification problems
Mathara Arachchige Dona, Surani Lakshima
High-dimensional classification problems have gained increasing attention in machine learning, and feature selection has become an essential step in applying machine learning algorithms to them. The central objective of feature selection is to identify the smallest subset of features that retains the most information. First, we propose an extension of wrapper feature selection methods that selects an even smaller feature subset while maintaining similar performance. Second, we examine four existing feature ordering techniques to find the most informative ordering mechanism. Based on the results, we suggest an improved method that combines a sequential feature selection technique with the sum of absolute values of principal component loadings to obtain the most informative subset of features. We then merge the two proposed approaches and compare their performance with the existing Recursive Feature Elimination (RFE) method on data simulated for several practical scenarios with different numbers of informative features, sample sizes, and imbalance rates. We also use the Synthetic Minority Oversampling Technique (SMOTE) to analyze the behavior of the proposed approach. Our simulation and application results show that the proposed methods outperform the original RFE by giving either a reasonable increase or only an insignificant reduction in F1-score across various data sets.
Feature selection, Recursive Feature Elimination, Class Imbalance, Wrapper Methods, SMOTE, Principal Component Loadings
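
As an illustration of the feature ordering idea mentioned in the abstract, the following minimal sketch ranks features by the sum of the absolute values of their principal component loadings. It is not the procedure used in this work. The function name `rank_features_by_pc_loadings`, the `n_components` parameter, the column standardization, and the use of unscaled eigenvectors as "loadings" are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def rank_features_by_pc_loadings(X, n_components=None):
    """Order features by the sum of absolute PCA loadings, largest first.

    Hypothetical helper: "loadings" here are the unit-norm principal axes
    (unscaled eigenvectors), which is an assumption, not the source's definition.
    """
    X = np.asarray(X, dtype=float)
    # Standardize columns so loadings are comparable across feature scales.
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    if n_components is not None:
        Vt = Vt[:n_components]
    # One score per feature: sum of |loading| over the retained components.
    scores = np.abs(Vt).sum(axis=0)
    order = np.argsort(scores)[::-1]
    return order, scores


# Toy data (assumed for the example): 50 samples, 6 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 6))
order, scores = rank_features_by_pc_loadings(X_demo, n_components=3)
print("feature order:", order)
```

An ordering of this kind could then be passed to a sequential or recursive selection step, which would add or remove features in that order and keep the smallest subset whose F1-score stays acceptable.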