Exploring representation-level augmentation and RAG-based vulnerability augmentation with LLMs for vulnerability detection
dc.contributor.author | Daneshvar, Seyed Shayan | |
dc.contributor.examiningcommittee | Leung, Carson (Computer Science) | |
dc.contributor.examiningcommittee | Chowdhury, Shaiful (Computer Science) | |
dc.contributor.supervisor | Wang, Shaowei | |
dc.date.accessioned | 2025-03-18T21:06:07Z | |
dc.date.available | 2025-03-18T21:06:07Z | |
dc.date.issued | 2025-01-23 | |
dc.date.submitted | 2025-03-04T22:57:32Z | en_US |
dc.degree.discipline | Computer Science | |
dc.degree.level | Master of Science (M.Sc.) | |
dc.description.abstract | Using deep learning (DL) to detect software vulnerabilities has become commonplace. However, data shortage remains a significant challenge because vulnerable code samples are inherently scarce. A few papers have attempted to address this data scarcity through oversampling, by creating specific types of vulnerabilities, or by generating code with single-statement vulnerabilities. In this thesis, we aim to find a general-purpose methodology that covers diverse vulnerability types, including multi-statement vulnerabilities, and outperforms previous methods. Specifically, we first explore traditional mixup-inspired augmentation methods that operate at the representation level and show that these methods can be useful, although they do not outperform random oversampling. One possible reason is that mixing samples heavily degrades the integrity of the code. Hence, we introduce VulScribeR, a retrieval-augmented generation (RAG) based vulnerability augmentation pipeline that leverages large language models (LLMs) and, unlike mixup-based methods, preserves code integrity. We show that VulScribeR outperforms the state-of-the-art (SOTA) method, random oversampling, and representation-level augmentation methods. | |
dc.description.note | May 2025 | |
dc.description.sponsorship | Dr. Lorenzo Livi's support (initial supervisor fund); FGS Research Completion Scholarship; Award Number: 47255; International Graduate Student Entrance Scholarship; Mitacs Accelerate Internship | |
dc.identifier.uri | http://hdl.handle.net/1993/38932 | |
dc.language.iso | eng | |
dc.subject | Vulnerability Augmentation | |
dc.subject | Vulnerability Generation | |
dc.subject | Vulnerability Injection | |
dc.subject | Deep Learning | |
dc.subject | Program Generation | |
dc.subject | Vulnerability Detection | |
dc.subject | Software Vulnerability | |
dc.title | Exploring representation-level augmentation and RAG-based vulnerability augmentation with LLMs for vulnerability detection | |
local.subject.manitoba | yes | |
oaire.awardNumber | 44872 | |
oaire.awardTitle | University of Manitoba Graduate Fellowship (UMGF) | |
oaire.awardURI | https://umanitoba.ca/graduate-studies/funding-awards-and-financial-aid/university-manitoba-graduate-fellowship-umgf | |
project.funder.identifier | 100010318 | |
project.funder.name | University of Manitoba |