A multilevel search algorithm for feature selection in biomedical data

Oduntan, Idowu Olayinka
The automated analysis of patients’ biomedical data can be used to derive diagnostic and prognostic inferences about the observed patients. Many noninvasive techniques for acquiring biomedical samples generate data characterized by a large number of distinct attributes (i.e., features) and a small number of observed patients (i.e., samples). Deriving reliable inferences from these data, such as classifying a given patient as either cancerous or non-cancerous, requires that the ratio r of the number of samples to the number of features be within the range 5 < r < 10. To satisfy this requirement, the original feature set of a biomedical dataset can be reduced to an ‘optimal’ subset of features that best discriminates among the observed patients. Feature selection techniques search strategically for this ‘optimal’ subset. In this thesis, I present a new feature selection technique, multilevel feature selection, which seeks the ‘optimal’ feature subset in biomedical datasets using a multilevel search algorithm. This algorithm combines a hierarchical search framework with a search method. The framework, which makes it easy to adapt the technique to different forms of biomedical datasets, consists of increasingly coarse representations of the original feature set that the search method explores strategically and progressively. The search method used in the multilevel feature selection technique is tabu search, a search metaheuristic. I evaluate the solution quality of the new technique through experiments that compare the classification inferences derived from its results with those derived from the results of other feature selection techniques, such as basic tabu-search-based feature selection, sequential forward selection, and random feature selection.
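The abstract does not specify how the feature set is coarsened or how tabu search is configured, so the following Python sketch is only one plausible reading of a multilevel search, under stated assumptions: features are ordered (as in spectra), each coarser level merges adjacent pairs of groups, tabu search flips one group's inclusion at a time with a fixed tabu tenure and an aspiration criterion, and the best mask found at each level is projected onto the next finer level. The names `build_levels`, `tabu_search`, `multilevel_feature_selection`, and `toy_score` are hypothetical; in practice the score would be something like a cross-validated classification accuracy rather than the toy weights used here.

```python
from typing import Callable, List, Sequence

# A subset-scoring function where higher is better (assumed; e.g. classifier accuracy).
Score = Callable[[Sequence[int]], float]


def build_levels(n_features: int, n_levels: int) -> List[List[List[int]]]:
    """Level 0 holds single features; each coarser level merges adjacent pairs of groups."""
    levels = [[[i] for i in range(n_features)]]
    while len(levels) < n_levels and len(levels[-1]) > 1:
        prev = levels[-1]
        merged = [prev[i] + prev[i + 1] if i + 1 < len(prev) else list(prev[i])
                  for i in range(0, len(prev), 2)]
        levels.append(merged)
    return levels


def features_of(groups: List[List[int]], mask: List[bool]) -> List[int]:
    """Expand a group-inclusion mask into a sorted list of feature indices."""
    return sorted(f for g, on in zip(groups, mask) if on for f in g)


def tabu_search(groups, start, score: Score, iterations: int, tenure: int) -> List[bool]:
    """Single-flip tabu search over group-inclusion masks; returns the best mask seen."""
    current = list(start)
    best, best_val = list(current), score(features_of(groups, current))
    tabu_until = [0] * len(groups)  # last iteration at which flipping group j is tabu
    for it in range(1, iterations + 1):
        move, move_val = None, float("-inf")
        for j in range(len(groups)):
            cand = list(current)
            cand[j] = not cand[j]
            val = score(features_of(groups, cand))
            # Aspiration: a tabu move is allowed only if it beats the best so far.
            if tabu_until[j] >= it and val <= best_val:
                continue
            if val > move_val:
                move, move_val = j, val
        if move is None:  # every move is tabu and non-improving
            break
        # Take the best admissible move even if it worsens the current score.
        current[move] = not current[move]
        tabu_until[move] = it + tenure
        if move_val > best_val:
            best, best_val = list(current), move_val
    return best


def multilevel_feature_selection(n_features: int, score: Score, n_levels: int = 4,
                                 iterations: int = 20, tenure: int = 3) -> List[int]:
    levels = build_levels(n_features, n_levels)
    mask = [False] * len(levels[-1])  # start from the empty subset at the coarsest level
    for depth in range(len(levels) - 1, -1, -1):
        mask = tabu_search(levels[depth], mask, score, iterations, tenure)
        if depth > 0:
            # Uncoarsen: each finer group inherits its parent's inclusion flag.
            mask = [mask[i // 2] for i in range(len(levels[depth - 1]))]
    return features_of(levels[0], mask)


# Hypothetical toy score: features 4..7 are informative (+1.0), the rest are noise (-0.5).
def toy_score(subset: Sequence[int]) -> float:
    return sum(1.0 if 4 <= f <= 7 else -0.5 for f in subset)


print(multilevel_feature_selection(16, toy_score))  # [4, 5, 6, 7]
```

At the coarsest levels each tabu move adds or removes many features at once, which is the intuition behind searching coarse-to-fine; pairwise merging of adjacent features is just one possible coarsening, chosen here for simplicity.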
In the experiments, the same biomedical dataset is used, and an equivalent amount of computational resources is allocated to each evaluated technique to provide a common basis for comparison. The empirical results show that the multilevel feature selection technique finds ‘optimal’ subsets that enable more accurate and more stable classification than the subsets selected by the other feature selection techniques. A similar comparison of the new technique with a genetic-algorithm feature selection technique that selects highly discriminatory regions of consecutive features shows that the multilevel technique finds subsets that enable more stable classification.
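The abstract says only that each technique receives an equivalent amount of computational resource. One way to make that concrete, assumed here and not taken from the thesis, is to count calls to the subset-scoring function and grant every technique the same number of calls. The Python sketch below runs two of the baselines named in the abstract, random feature selection and sequential forward selection, under such a budget; `BudgetedScore`, `BUDGET`, and `toy_score` are hypothetical, and the forward-selection variant shown stops as soon as no single addition improves the score.

```python
import random
from typing import Callable, List, Sequence


class BudgetedScore:
    """Wraps a subset-scoring function and enforces a fixed number of evaluations."""

    def __init__(self, score: Callable[[Sequence[int]], float], budget: int) -> None:
        self.score, self.budget, self.used = score, budget, 0

    def exhausted(self) -> bool:
        return self.used >= self.budget

    def __call__(self, subset: Sequence[int]) -> float:
        if self.exhausted():
            raise RuntimeError("evaluation budget exhausted")
        self.used += 1
        return self.score(subset)


def random_selection(n_features: int, score: BudgetedScore, rng: random.Random) -> List[int]:
    """Random feature selection: score random subsets until the budget runs out."""
    best: List[int] = []
    best_val = float("-inf")
    while not score.exhausted():
        subset = [f for f in range(n_features) if rng.random() < 0.5]
        val = score(subset)
        if val > best_val:
            best, best_val = subset, val
    return best


def sequential_forward_selection(n_features: int, score: BudgetedScore) -> List[int]:
    """Greedily add the single feature that most improves the score; stop when none does."""
    selected: List[int] = []
    selected_val = score(selected)
    while len(selected) < n_features and not score.exhausted():
        best_f, best_val = None, selected_val
        for f in range(n_features):
            if f in selected:
                continue
            if score.exhausted():
                break
            val = score(sorted(selected + [f]))
            if val > best_val:
                best_f, best_val = f, val
        if best_f is None:
            break
        selected, selected_val = sorted(selected + [best_f]), best_val
    return selected


# Hypothetical toy score standing in for classification accuracy.
def toy_score(subset: Sequence[int]) -> float:
    return sum(1.0 if 4 <= f <= 7 else -0.5 for f in subset)


BUDGET = 200  # evaluations granted to each technique (assumed unit of "resource")
rand_eval = BudgetedScore(toy_score, BUDGET)
sfs_eval = BudgetedScore(toy_score, BUDGET)
print(random_selection(16, rand_eval, random.Random(0)), rand_eval.used)
print(sequential_forward_selection(16, sfs_eval), sfs_eval.used)  # [4, 5, 6, 7] 71
```

Counting score evaluations is only one proxy for computational resource; wall-clock time or classifier training runs would be equally plausible choices.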
Keywords: multilevel search, multilevel feature selection, multilevel paradigm, feature selection problem