Assessing feature selection methods and their performance in high dimensional classification problems

dc.contributor.author: Mathara Arachchige Dona, Surani Lakshima
dc.contributor.examiningcommittee: Turgeon, Max (Statistics)
dc.contributor.examiningcommittee: Leung, Carson (Computer Science)
dc.contributor.supervisor: Muthukumarana, Saman (Statistics); Domaratzki, Mike (Computer Science)
dc.date.accessioned: 2021-09-07T16:53:22Z
dc.date.available: 2021-09-07T16:53:22Z
dc.date.copyright: 2021-07-14
dc.date.issued: 2021
dc.date.submitted: 2021-07-14T19:26:57Z
dc.degree.discipline: Statistics
dc.degree.level: Master of Science (M.Sc.)
dc.description.abstract: High dimensional classification problems have gained increasing attention in machine learning, and feature selection has become an essential step in executing machine learning algorithms. Identifying the smallest feature subset containing the most informative features is the central objective of feature selection. First, we propose an extended version of wrapper feature selection methods, which selects an even smaller feature subset with similar performance. Second, we examine four existing feature ordering techniques to find the most informative ordering mechanism. Using the results, we suggest an improved method that combines a sequential feature selection technique with the sum of absolute values of principal component loadings to obtain the most informative subset of features. We further merge the two proposed approaches and compare their performance with the existing Recursive Feature Elimination (RFE) by simulating data for several practical scenarios with different numbers of informative features, sample sizes, and imbalance rates. We also use the Synthetic Minority Oversampling Technique (SMOTE) to analyze the behavior of the proposed approach. Our simulation and application results show that the proposed methods outperform the original RFE, yielding a reasonable increase, or at worst an insignificant reduction, in F1-score across various data sets.
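The combination the abstract describes, ordering features by the sum of absolute principal component loadings and then growing a subset sequentially while scoring with F1, can be sketched roughly as follows. This is an illustrative reconstruction, not the thesis code: the synthetic data, the logistic-regression classifier, and the greedy "keep the feature only if F1 improves" rule are all assumptions made for the sketch.

```python
# Hypothetical sketch of PCA-loading-ordered sequential feature selection.
# Assumptions: synthetic data, logistic regression, greedy F1 acceptance rule.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Rank features by the sum of absolute values of the PC loadings.
pca = PCA(n_components=5).fit(X_tr)
order = np.argsort(-np.abs(pca.components_).sum(axis=0))

# Sequential forward selection over that ordering, scored by F1.
best_subset, best_f1 = [], 0.0
for j in order:
    trial = best_subset + [j]
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:, trial], y_tr)
    f1 = f1_score(y_te, clf.predict(X_te[:, trial]))
    if f1 > best_f1:  # keep the feature only if it improves F1
        best_subset, best_f1 = trial, f1

print(len(best_subset), round(best_f1, 3))
```

In the thesis this ordering-plus-sequential-selection idea is compared against RFE, and SMOTE is applied to study behavior under class imbalance; neither comparison is reproduced in this minimal sketch.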
dc.description.note: October 2021
dc.identifier.uri: http://hdl.handle.net/1993/35912
dc.language.iso: eng
dc.rights: open access
dc.subject: Feature selection
dc.subject: Recursive Feature Elimination
dc.subject: Class Imbalance
dc.subject: Wrapper Methods
dc.subject: SMOTE
dc.subject: Principal Component Loadings
dc.title: Assessing feature selection methods and their performance in high dimensional classification problems
dc.type: master thesis
Files
Original bundle
Name: MatharaArachchigeDona_Surani.pdf
Size: 24.03 MB
Format: Adobe Portable Document Format
Description: Thesis file
License bundle
Name: license.txt
Size: 2.2 KB
Description: Item-specific license agreed to upon submission