A data augmentation approach using style-based generative adversarial networks for date fruit classification
Abstract
Hardness is an important indicator of quality in dried fruits such as dates. In date sorting facilities, hardness is judged by human visual inspection, which yields high sorting accuracy but is slow, tedious, subjective, and unhygienic. Machine learning techniques have been adopted for automatic sorting in agricultural applications, but in some scenarios too little data is available to train deep learning classification models. Such is the case presented here for sorting dates by hardness. The original dataset in this work consists of 1800 images, 600 per class for the hard, semi-hard, and soft classes, collected from different growing regions in Oman. This number of examples is too small to train a deep learning classifier to high accuracy. This thesis therefore proposes data augmentation based on a style-based generative adversarial network (GAN), specifically StyleGAN2-ADA, to generate additional date images, combined with a residual neural network (ResNet18) for classification. The goal is to use the original dataset to generate enough images that ResNet18 can successfully classify dates into their respective classes; since the original dataset can hardly be considered large, the GAN-generated images support the training process. Using the GAN-augmented dataset, accuracies of 93%, 99%, and 53% were achieved for the hard, soft, and semi-hard classes, respectively, by an expert system combining two ResNet18 networks: one trained to separate the hard and soft cases, and a second that classifies all three classes. In addition, texture features were extracted from the original and generated images and compared using a Kolmogorov-Smirnov test, which measures the similarity of the probability density distributions of the extracted texture features.
The similarities for the hard, semi-hard, and soft classes were 82.86%, 100%, and 94.29%, respectively. This step further validated the GAN-generated images, indicating that they are of merit.
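The two-sample Kolmogorov-Smirnov comparison described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the KS statistic (the maximum absolute difference between the two empirical CDFs) is computed for a single texture feature sampled from original versus generated images, and a small statistic indicates similar distributions. The feature values shown are made up for illustration.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # advance each pointer past all values <= x, then compare ECDFs
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

# Hypothetical texture-feature values (e.g. GLCM contrast) from
# original and GAN-generated images of one class:
orig_feature = [0.41, 0.35, 0.52, 0.47, 0.39, 0.44]
gen_feature = [0.40, 0.37, 0.50, 0.45, 0.42, 0.48]
print(ks_statistic(orig_feature, gen_feature))
```

In practice a library routine such as `scipy.stats.ks_2samp` would also return a p-value; distributions whose KS test fails to reject equality at the chosen significance level would count toward the similarity percentages reported above.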
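The expert system mentioned above combines a binary (hard vs. soft) ResNet18 with a three-class ResNet18. The abstract does not specify the arbitration rule, so the sketch below uses one plausible, purely illustrative rule: the three-class network decides first, and whenever it predicts a non-semi-hard class, the specialist binary network makes the final hard/soft call. The class orderings and probability vectors are assumptions.

```python
# Hypothetical combination rule for the two-network expert system.
# NOT the author's actual rule; illustrative only.

CLASSES3 = ("hard", "semi-hard", "soft")  # three-class network outputs
CLASSES2 = ("hard", "soft")               # binary network outputs

def argmax(probs):
    """Index of the largest probability."""
    return max(range(len(probs)), key=probs.__getitem__)

def combine(probs3, probs2):
    """probs3: softmax over (hard, semi-hard, soft) from the 3-class net;
    probs2: softmax over (hard, soft) from the binary net."""
    label3 = CLASSES3[argmax(probs3)]
    if label3 == "semi-hard":
        return label3  # the binary network cannot distinguish this class
    # otherwise defer to the specialist hard/soft network
    return CLASSES2[argmax(probs2)]

print(combine([0.2, 0.7, 0.1], [0.6, 0.4]))  # three-class net wins
print(combine([0.5, 0.1, 0.4], [0.3, 0.7]))  # binary net arbitrates
```

Such a division of labor is consistent with the reported accuracies: the binary specialist sharpens the hard/soft boundary (93% and 99%), while the harder semi-hard class (53%) falls to the three-class network alone.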