Privacy-preserving techniques on genomic data

DC Field | Value | Language
dc.contributor.author | Aziz, Md Momin Al |
dc.contributor.examiningcommittee | Yang Zhang (Mathematics) | en_US
dc.contributor.examiningcommittee | Parimala Thulasiraman (Computer Science) | en_US
dc.contributor.supervisor | Mohammed, Noman |
dc.date.accessioned | 2022-06-08T13:54:48Z |
dc.date.available | 2022-06-08T13:54:48Z |
dc.date.copyright | 2022-06-08 |
dc.date.issued | 2022-06-03 |
dc.date.submitted | 2022-06-08T12:36:53Z | en_US
dc.degree.discipline | Computer Science | en_US
dc.degree.level | Doctor of Philosophy (Ph.D.) | en_US
dc.description.abstract | Genomic data hold salient information about the characteristics of a living organism. Over the last decade, major advances have given us more accurate and less expensive methods to sequence our genomes. However, as genomic research advances, there are growing security and privacy concerns regarding the collection, storage, and analysis of such sensitive data. Recent results show that, given some background information, an adversary can re-identify an individual from a specific genomic dataset. This can reveal the individual's current association with, or future susceptibility to, certain diseases (and sometimes the kinship between individuals), resulting in a privacy violation. This thesis has two parts and proposes several techniques to mitigate the privacy issues relating to genomic data. The first part targets the data privacy issues that arise when using an external computational environment. We propose privacy-preserving frameworks to store genomic data in an untrusted computational environment (i.e., the cloud). In particular, we employ prefix and suffix tree structures to represent genomic data while keeping the data encrypted throughout its computational life-cycle. As a result, the underlying methods perform different string search queries and arbitrary computations under encryption without requiring access to the raw sensitive data. We also propose a GPU-parallel Fully Homomorphic Encryption framework that optimizes existing algorithms and can compute string distance metrics such as Hamming distance, edit distance, and Set Maximal Matching. Compared to existing techniques, the GPU-parallel framework is 14.4 and 46.81 times faster for standard and matrix multiplications, respectively. The second part of the thesis targets another privacy setting, in which the outputs of different genomic data analyses are deemed sensitive. Here, we propose several differentially private mechanisms that share partial genome datasets and intermediate statistics with a strict privacy guarantee. Experimental results demonstrate that the proposed methods effectively protect data privacy during the computation and analysis of genomic data. Overall, the proposed techniques are not limited to genomic data and can be generalized to protect other types of sensitive data. | en_US
dc.description.note | October 2022 | en_US
dc.identifier.uri | http://hdl.handle.net/1993/36541 |
dc.language.iso | eng | en_US
dc.rights | open access | en_US
dc.subject | Privacy-preserving techniques | en_US
dc.subject | genomic data privacy | en_US
dc.subject | healthcare data privacy | en_US
dc.subject | Homomorphic Encryption | en_US
dc.subject | Differential Privacy | en_US
dc.title | Privacy-preserving techniques on genomic data | en_US
dc.title.alternative | Privacy-Preserving Techniques on Healthcare Data | en_US
dc.type | doctoral thesis | en_US
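The abstract mentions differentially private mechanisms for sharing intermediate genomic statistics but does not spell out how they work. As a generic illustration only (not the thesis's own mechanism), the sketch below applies the standard Laplace mechanism to per-SNP minor-allele counts. It assumes genotypes coded as 0, 1 or 2 minor alleles per individual and an add-or-remove-one-individual notion of neighbouring datasets; the function names `laplace_sample`, `allele_counts` and `dp_allele_counts` are hypothetical.

```python
import random
from typing import List, Sequence


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    is Laplace-distributed with location 0 and scale `scale`.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def allele_counts(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Sum the minor-allele counts (0, 1 or 2 per individual) for each SNP."""
    if not genotypes:
        raise ValueError("need at least one individual")
    num_snps = len(genotypes[0])
    return [sum(person[j] for person in genotypes) for j in range(num_snps)]


def dp_allele_counts(
    genotypes: Sequence[Sequence[int]], epsilon: float, rng: random.Random
) -> List[float]:
    """Release per-SNP allele counts with epsilon-differential privacy.

    Assumption (hypothetical setup): neighbouring datasets differ by adding
    or removing one individual, whose genotype changes each SNP count by at
    most 2. The L1 sensitivity of the whole count vector is therefore
    2 * (number of SNPs), and each count gets Laplace noise of scale
    sensitivity / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = allele_counts(genotypes)
    sensitivity = 2.0 * len(counts)
    scale = sensitivity / epsilon
    return [c + laplace_sample(scale, rng) for c in counts]


if __name__ == "__main__":
    # Toy data: three individuals, three SNPs (values are made up).
    genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
    print(allele_counts(genotypes))  # exact counts: [3, 2, 3]
    print(dp_allele_counts(genotypes, epsilon=1.0, rng=random.Random(42)))
```

With `epsilon=1.0` and three SNPs, each count receives Laplace noise of scale 6, which illustrates why releasing many statistics at once under a fixed privacy budget quickly degrades utility; a smaller `epsilon` gives a stronger guarantee and noisier counts.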
Files
Original bundle
Name: PhD_Thesis_Momin_final.pdf
Size: 3.01 MB
Format: Adobe Portable Document Format
Description: PhD thesis
License bundle
Name: license.txt
Size: 2.2 KB
Format: Item-specific license agreed to upon submission