Modified K-means clustering algorithms for feature selection

Date
2023-06-14
Authors
Akhter, Ayeasha
Abstract

High-dimensional data with hundreds or thousands of features is computationally expensive to process. Features that contribute little to class prediction increase the computational load of classification without improving it. Feature selection, a dimensionality reduction strategy, removes irrelevant, redundant, or noisy features and keeps a small subset of the important original features. This study proposes two new feature selection methods based on k-means clustering. Both methods group the D features into k (k < D) clusters and then keep one representative feature per cluster. The first method keeps the feature closest to the cluster center, and the second keeps the highest-weighted feature among the cluster members. After 41.4% of the features are removed from the VIRUS-MNIST dataset, both proposed methods match the accuracy obtained on the full dataset in less time. On the Wine dataset, evaluated with an ANN after feature reduction, our proposed methods outperform the sparse k-means, PCA, LLE, and wk-means-based feature selection methods for clustering. On the CNAE dataset, our second method is more accurate than the modified k-means feature selection method while using fewer features.
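
The abstract names the two representative-selection rules but not their details, so the following is only a minimal sketch under stated assumptions. It runs plain k-means with squared Euclidean distance over the feature columns, seeds the centers deterministically with farthest-first initialization, and assumes features are already on comparable scales. For the second rule it uses per-feature variance as a stand-in weight. That weighting is a hypothetical choice, not the thesis's scheme. All function names (`select_features`, `kmeans`, and so on) are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def feature_vectors(X: Sequence[Sequence[float]]) -> List[Vector]:
    """Turn an n-by-D sample matrix into D feature vectors (its columns)."""
    return [list(map(float, col)) for col in zip(*X)]


def variance(v: Sequence[float]) -> float:
    """Population variance; a HYPOTHETICAL stand-in for the feature weight."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)


def farthest_first_init(points: List[Vector], k: int) -> List[Vector]:
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from its nearest chosen center."""
    centers = [points[0][:]]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(far[:])
    return centers


def kmeans(points: List[Vector], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Plain Lloyd-style k-means; returns a cluster label per point and centers."""
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= number of features")
    centers = farthest_first_init(points, k)
    labels: Optional[List[int]] = None
    for _ in range(max(1, max_iter)):
        # Assignment step: each feature joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centers


def select_features(X: Sequence[Sequence[float]], k: int, rule: str = "closest",
                    weights: Optional[Sequence[float]] = None) -> List[int]:
    """Cluster the D features into k groups and keep one feature per non-empty cluster.

    rule="closest":    keep the member nearest its cluster center (first method).
    rule="max_weight": keep the member with the largest weight (second method);
                       weights default to per-feature variance (hypothetical).
    """
    feats = feature_vectors(X)
    labels, centers = kmeans(feats, k)
    if weights is None:
        weights = [variance(f) for f in feats]
    selected = []
    for c in range(k):
        members = [j for j, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        if rule == "closest":
            best = min(members, key=lambda j: sq_dist(feats[j], centers[c]))
        elif rule == "max_weight":
            best = max(members, key=lambda j: weights[j])
        else:
            raise ValueError(f"unknown rule: {rule!r}")
        selected.append(best)
    return sorted(selected)
```

For example, with `X = [[0, 1, 0, 10, 11, 10], [0, 0, 3, 10, 10, 14]]` (two samples, six features forming two obvious groups), `select_features(X, 2, "closest")` keeps features `[0, 3]` and `select_features(X, 2, "max_weight")` keeps `[2, 5]`. Farthest-first seeding is used only to make the sketch reproducible. Random or k-means++ seeding would be equally valid choices.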

Keywords
Modified K-means, K-means, Feature selection