Assessing feature selection methods and their performance in high dimensional classification problems
dc.contributor.author | Mathara Arachchige Dona, Surani Lakshima | |
dc.contributor.examiningcommittee | Turgeon, Max (Statistics) | en_US |
dc.contributor.examiningcommittee | Leung, Carson (Computer Science) | en_US |
dc.contributor.supervisor | Muthukumarana, Saman (Statistics) | en_US |
dc.contributor.supervisor | Domaratzki, Mike (Computer Science) | en_US |
dc.date.accessioned | 2021-09-07T16:53:22Z | |
dc.date.available | 2021-09-07T16:53:22Z | |
dc.date.copyright | 2021-07-14 | |
dc.date.issued | 2021 | en_US |
dc.date.submitted | 2021-07-14T19:26:57Z | en_US |
dc.degree.discipline | Statistics | en_US |
dc.degree.level | Master of Science (M.Sc.) | en_US |
dc.description.abstract | High dimensional classification problems have gained increasing attention in machine learning, and feature selection has become an essential preprocessing step for machine learning algorithms. The central objective of feature selection is to identify the smallest subset containing the most informative features. First, we propose an extended version of wrapper feature selection methods that selects an even smaller feature subset while maintaining similar performance. Second, we examine four existing feature ordering techniques to identify the most informative ordering mechanism. Building on these results, we propose an improved method that combines a sequential feature selection technique with the sum of the absolute values of principal component loadings to obtain the most informative feature subset. We further merge the two proposed approaches and compare their performance with the existing Recursive Feature Elimination (RFE) by simulating data for several practical scenarios with varying numbers of informative features, sample sizes, and class imbalance rates. We also use the Synthetic Minority Oversampling Technique (SMOTE) to analyze the behavior of the proposed approach on imbalanced data. Our simulation and application results show that the proposed methods outperform the original RFE, yielding a reasonable increase, or at worst a negligible reduction, in F1-score across various data sets. | en_US |
dc.description.note | October 2021 | en_US |
dc.identifier.uri | http://hdl.handle.net/1993/35912 | |
dc.language.iso | eng | en_US |
dc.rights | open access | en_US |
dc.subject | Feature selection | en_US |
dc.subject | Recursive Feature Elimination | en_US |
dc.subject | Class Imbalance | en_US |
dc.subject | Wrapper Methods | en_US |
dc.subject | SMOTE | en_US |
dc.subject | Principal Component Loadings | en_US |
dc.title | Assessing feature selection methods and their performance in high dimensional classification problems | en_US |
dc.type | master thesis | en_US |
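The abstract describes a pipeline of ordering features by the sum of absolute principal component loadings, applying a sequential selection step, comparing against RFE by F1-score, and using SMOTE for class imbalance. Below is a minimal sketch of that pipeline, assuming scikit-learn and imbalanced-learn; the greedy forward pass stands in for the thesis's sequential selection technique, and all data parameters, variable names, and subset sizes here are illustrative rather than taken from the thesis.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Simulated high-dimensional, imbalanced data (parameters are illustrative).
X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Rebalance the training split with SMOTE, as mentioned in the abstract.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

# Order features by the sum of absolute principal component loadings.
pca = PCA(n_components=10).fit(X_bal)
scores = np.abs(pca.components_).sum(axis=0)  # one score per feature
order = np.argsort(scores)[::-1]              # highest-scoring first

# Greedy forward pass over that ordering: keep a feature only if it
# improves held-out F1 (a simple stand-in for sequential selection).
clf = LogisticRegression(max_iter=1000)
selected, best_f1 = [], 0.0
for j in order[:30]:
    trial = selected + [int(j)]
    clf.fit(X_bal[:, trial], y_bal)
    f1 = f1_score(y_te, clf.predict(X_te[:, trial]))
    if f1 > best_f1:
        selected, best_f1 = trial, f1

# Baseline: classic RFE with the same estimator and subset size.
rfe = RFE(LogisticRegression(max_iter=1000),
          n_features_to_select=max(len(selected), 1)).fit(X_bal, y_bal)
clf.fit(X_bal[:, rfe.support_], y_bal)
rfe_f1 = f1_score(y_te, clf.predict(X_te[:, rfe.support_]))

print(f"PCA-loading ordering: {len(selected)} features, F1 = {best_f1:.3f}")
print(f"RFE baseline:         {rfe.support_.sum()} features, F1 = {rfe_f1:.3f}")
```

A single train/test split is used here only to keep the sketch short; the thesis evaluates across repeated simulations with varying imbalance rates, and in practice the comparison would be cross-validated.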