Exploring representation-level augmentation and RAG-based vulnerability augmentation with LLMs for vulnerability detection

Daneshvar, Seyed Shayan

dc.contributor.author	Daneshvar, Seyed Shayan
dc.contributor.examiningcommittee	Leung, Carson (Computer Science)
dc.contributor.examiningcommittee	Chowdhury, Shaiful (Computer Science)
dc.contributor.supervisor	Wang, Shaowei
dc.date.accessioned	2025-03-18T21:06:07Z
dc.date.available	2025-03-18T21:06:07Z
dc.date.issued	2025-01-23
dc.date.submitted	2025-03-04T22:57:32Z	en_US
dc.degree.discipline	Computer Science
dc.degree.level	Master of Science (M.Sc.)
dc.description.abstract	Using deep learning (DL) to detect software vulnerabilities has become commonplace. However, data shortage remains a significant challenge because vulnerabilities are inherently scarce. A few papers have attempted to address this data scarcity through oversampling, by creating specific types of vulnerabilities, or by generating code with single-statement vulnerabilities. In this thesis, we aim to find a general-purpose methodology that covers various types of vulnerabilities, including multi-statement ones, while outperforming previous methods. Specifically, we first explore traditional mixup-inspired augmentation methods that work at the representation level and show that these methods can be useful, although they do not outperform random oversampling. One possible reason is that mixing samples heavily degrades the integrity of the code. Hence, we introduce VulScribeR, a retrieval-augmented generation (RAG)-based vulnerability augmentation pipeline that leverages large language models (LLMs) and, unlike mixup-based methods, maintains code integrity. We show that VulScribeR outperforms state-of-the-art (SOTA) methods, oversampling, and representation-level augmentation methods.
dc.description.note	May 2025
dc.description.sponsorship	Dr. Lorenzo Livi's support (initial supervisor fund); FGS Research Completion Scholarship (Award Number: 47255); International Graduate Student Entrance Scholarship; Mitacs Accelerate Internship
dc.identifier.uri	http://hdl.handle.net/1993/38932
dc.language.iso	eng
dc.subject	Vulnerability Augmentation
dc.subject	Vulnerability Generation
dc.subject	Vulnerability Injection
dc.subject	Deep Learning
dc.subject	Program Generation
dc.subject	Vulnerability Detection
dc.subject	Software Vulnerability
dc.title	Exploring representation-level augmentation and RAG-based vulnerability augmentation with LLMs for vulnerability detection
local.subject.manitoba	yes
oaire.awardNumber	44872
oaire.awardTitle	University of Manitoba Graduate Fellowship (UMGF)
oaire.awardURI	https://umanitoba.ca/graduate-studies/funding-awards-and-financial-aid/university-manitoba-graduate-fellowship-umgf
project.funder.identifier	100010318
project.funder.name	University of Manitoba

Files

Original bundle

Name: daneshvar_seyedshayan.pdf
Size: 1.22 MB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 770 B
Format: Item-specific license agreed to upon submission
Collections

FGS - Electronic Theses and Practica
Manitoba Heritage Theses