Cognitive unsupervised clustering for detecting cyber attacks

Nahiyan, Kaiser
It has always been a challenge to extract meaning from unstructured data. In the field of network intrusion detection, structured, labelled datasets are scarce. Most existing approaches rely on techniques that demand high-end computing resources and still do not yield satisfactory results; as a result, human analysts must examine every network event in order to isolate intrusion attempts. This study proposes an intelligent approach for extracting information from large, unstructured and unlabelled datasets and for separating attack traffic from normal network traffic without supervision, using the concepts of cognitive learning, complexity analysis, and statistical higher-order feature learning. The thesis aims to develop a methodology that lets human analysts disregard the major portion of a network dataset that contains regular traffic and focus on the limited time windows that have potentially been subjected to attacks. Statistical higher-order feature extraction from network flows was used to derive significant features from the large, unlabelled network intrusion detection dataset; these features were then classified using unsupervised k-means clustering and variance fractal dimension trajectory (VFDT)-based complexity analysis. The proposed methodology has been validated on the UNSW-NB15 network intrusion dataset using the following performance measures: detection accuracy, false positive and false negative rates, the Receiver Operating Characteristic (ROC) curve, the Area Under the Curve (AUC) value, and the F1 score. A comparative analysis against a scheme based on a prominent traditional unsupervised machine learning technique (standard k-means clustering) was then performed to evaluate and benchmark the efficacy of the proposed methodology.
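The record does not say how the variance fractal dimension trajectory (VFDT) is computed. As a minimal sketch, assuming the standard variance fractal dimension D_sigma = E + 1 - H with embedding dimension E = 1 for a one-dimensional signal, where the Hurst exponent H is half the slope of log Var(x(t + dt) - x(t)) against log dt, the Python below estimates D_sigma over sliding windows of a feature time series. The function names `variance_fractal_dimension` and `vfd_trajectory` and the parameters `lags`, `window` and `step` are hypothetical illustrations, not code or settings taken from the thesis.

```python
import math
from statistics import pvariance


def variance_fractal_dimension(signal, lags=(1, 2, 4, 8)):
    """Estimate D_sigma = 2 - H for a 1-D signal (assumes embedding dimension E = 1).

    Hypothetical helper for illustration; not the thesis's implementation.
    """
    log_lags, log_vars = [], []
    for k in lags:
        # Increments of the signal at lag k.
        increments = [signal[i + k] - signal[i] for i in range(len(signal) - k)]
        if len(increments) < 2:
            continue
        var = pvariance(increments)
        if var <= 0:
            continue  # log(0) is undefined; skip degenerate lags
        log_lags.append(math.log(k))
        log_vars.append(math.log(var))
    if len(log_lags) < 2:
        raise ValueError("not enough non-degenerate lags to fit a slope")
    # Ordinary least-squares slope of log(var) vs log(lag); the slope equals 2H.
    n = len(log_lags)
    mean_x = sum(log_lags) / n
    mean_y = sum(log_vars) / n
    sxx = sum((x - mean_x) ** 2 for x in log_lags)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_lags, log_vars))
    hurst = (sxy / sxx) / 2.0
    return 2.0 - hurst


def vfd_trajectory(signal, window=64, step=32, lags=(1, 2, 4, 8)):
    """Compute D_sigma on sliding windows, yielding one value per window."""
    return [
        variance_fractal_dimension(signal[start:start + window], lags)
        for start in range(0, len(signal) - window + 1, step)
    ]
```

Under this definition a D_sigma near 1 indicates a smooth signal and a value near 2 a maximally rough one; in a sketch like this, windows whose D_sigma departs markedly from the traffic's usual range would be the candidates handed to an analyst.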
The empirically validated results show that the proposed cognitive unsupervised clustering model outperforms the general unsupervised detection scheme in terms of detection accuracy, false positive and false negative rates, Area Under the Curve value, and F1 score.
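The evaluation measures named in the abstract can all be derived from binary ground truth, binary predictions and, for AUC, a continuous anomaly score. The following is a minimal, self-contained Python sketch, assuming labels encoded as 0 (normal) and 1 (attack) per flow or time window; the helper names `detection_metrics` and `roc_auc` are hypothetical, not the thesis's evaluation code. AUC is computed through its pairwise (Mann-Whitney) interpretation, which gives the same value as the area under the ROC curve when ties count one half.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionMetrics:
    accuracy: float
    false_positive_rate: float
    false_negative_rate: float
    f1: float


def detection_metrics(y_true, y_pred):
    """Hypothetical helper: confusion-matrix metrics for 0 = normal, 1 = attack."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return DetectionMetrics(
        accuracy=(tp + tn) / total,
        # FPR: share of normal traffic wrongly flagged as attack.
        false_positive_rate=fp / (fp + tn) if fp + tn else 0.0,
        # FNR: share of attacks the detector missed.
        false_negative_rate=fn / (fn + tp) if fn + tp else 0.0,
        f1=2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    )


def roc_auc(y_true, scores):
    """Hypothetical helper: probability that a random attack outscores a random normal sample."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one attack and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count one half
    return wins / (len(pos) * len(neg))


m = detection_metrics([1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1])
print(m)  # accuracy, FPR, FNR and F1 for the toy labels above
print(roc_auc([1, 1, 0, 0, 0, 1], [0.9, 0.4, 0.2, 0.6, 0.1, 0.8]))
```

The pairwise AUC loop is O(P x N) and is only meant to make the definition explicit; on a dataset the size of UNSW-NB15, a rank-based method or an established library routine would be the practical choice.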
Cognitive Intelligence, Machine Intelligence, Fractals, Classification, Cognitive Computing, Packet Captures, Complexity Analysis, Network Threats, Cyber Security, Cognitive Informatics