Exploring representation-level augmentation and RAG-based vulnerability augmentation with LLMs for vulnerability detection

dc.contributor.author: Daneshvar, Seyed Shayan
dc.contributor.examiningcommittee: Carson, Leung (Computer Science)
dc.contributor.examiningcommittee: Chowdhury, Shaiful (Computer Science)
dc.contributor.supervisor: Wang, Shaowei
dc.date.accessioned: 2025-03-18T21:06:07Z
dc.date.available: 2025-03-18T21:06:07Z
dc.date.issued: 2025-01-23
dc.date.submitted: 2025-03-04T22:57:32Z en_US
dc.degree.discipline: Computer Science
dc.degree.level: Master of Science (M.Sc.)
dc.description.abstract: Using deep learning (DL) to detect software vulnerabilities has become commonplace. However, data shortage remains a significant challenge because vulnerabilities are scarce. A few papers have attempted to address this data scarcity through oversampling, creating specific types of vulnerabilities, or generating code with single-statement vulnerabilities. In this thesis, we aim to find a general-purpose methodology that covers various types of vulnerabilities, including multi-statement ones, while outperforming previous methods. Specifically, we first explore traditional mixup-inspired augmentation methods that work at the representation level and show that these methods can be useful, although they cannot beat random oversampling. One possible reason is that mixing samples heavily degrades the integrity of the code. Hence, we introduce VulScribeR, a RAG-based vulnerability augmentation pipeline that leverages LLMs and, unlike mixup-based methods, maintains code integrity. We show that VulScribeR outperforms the state-of-the-art (SOTA), oversampling, and representation-level augmentation methods.
dc.description.note: May 2025
dc.description.sponsorship:
- Dr. Lorenzo Livi's Support (Initial Supervisor fund)
- FGS Research Completion Scholarship
- Award Number: 47255
- International Graduate Student Entrance Scholarship
- Mitacs Accelerate Internship
dc.identifier.uri: http://hdl.handle.net/1993/38932
dc.language.iso: eng
dc.subject: Vulnerability Augmentation
dc.subject: Vulnerability Generation
dc.subject: Vulnerability Injection
dc.subject: Deep Learning
dc.subject: Program Generation
dc.subject: Vulnerability Detection
dc.subject: Software Vulnerability
dc.title: Exploring representation-level augmentation and RAG-based vulnerability augmentation with LLMs for vulnerability detection
local.subject.manitoba: yes
oaire.awardNumber: 44872
oaire.awardTitle: University of Manitoba Graduate Fellowship (UMGF)
oaire.awardURI: https://umanitoba.ca/graduate-studies/funding-awards-and-financial-aid/university-manitoba-graduate-fellowship-umgf
project.funder.identifier: 100010318
project.funder.name: University of Manitoba
Files
Original bundle
Name: daneshvar_seyedshayan.pdf
Size: 1.22 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 770 B
Format: Item-specific license agreed to upon submission