Privacy-preserving techniques on genomic data

Aziz, Md Momin Al

dc.contributor.author	Aziz, Md Momin Al
dc.contributor.examiningcommittee	Yang Zhang (Mathematics)	en_US
dc.contributor.examiningcommittee	Parimala Thulasiraman (Computer Science)	en_US
dc.contributor.supervisor	Mohammed, Noman
dc.date.accessioned	2022-06-08T13:54:48Z
dc.date.available	2022-06-08T13:54:48Z
dc.date.copyright	2022-06-08
dc.date.issued	2022-06-03
dc.date.submitted	2022-06-08T12:36:53Z	en_US
dc.degree.discipline	Computer Science	en_US
dc.degree.level	Doctor of Philosophy (Ph.D.)	en_US
dc.description.abstract	Genomic data hold salient information about the characteristics of a living organism. Over the last decade, major advances have given us more accurate and less expensive methods to sequence our genomes. However, as genomic research advances, there are growing security and privacy concerns about collecting, storing, and analyzing such sensitive data. Recent results show that, given some background information, an adversary can re-identify an individual in a specific genomic dataset. This can reveal the individual's current association with, or future susceptibility to, certain diseases (and sometimes the kinship between individuals), resulting in a privacy violation. This thesis has two parts and proposes several techniques to mitigate the privacy issues relating to genomic data. The first part targets data privacy issues that arise when using an external computational environment. We propose privacy-preserving frameworks to store genomic data in an untrusted computational environment (i.e., the cloud). In particular, we employ prefix and suffix tree structures to represent genomic data while keeping the data encrypted throughout their computational life cycle. As a result, the proposed methods perform various string search queries and arbitrary computations under encryption without requiring access to the raw sensitive data. We also propose a GPU-parallel Fully Homomorphic Encryption framework that optimizes existing algorithms and can compute string distance metrics such as Hamming distance, edit distance, and Set Maximal Matching. The GPU-parallel framework is 14.4 and 46.81 times faster than existing techniques for standard and matrix multiplication, respectively. The second part of the thesis targets another privacy setting, in which the outputs of different genomic data analyses are deemed sensitive. Here, we propose several differentially private mechanisms to share partial genome datasets and intermediate statistics with a strict privacy guarantee. Experimental results demonstrate that the proposed methods are effective at protecting data privacy while computing on and analyzing genomic data. Overall, although the techniques proposed in this thesis target genomic data, they are not specific to it and can be generalized to protect other types of sensitive data.	en_US
dc.description.note	October 2022	en_US
dc.identifier.uri	http://hdl.handle.net/1993/36541
dc.language.iso	eng	en_US
dc.rights	open access	en_US
dc.subject	Privacy-preserving techniques	en_US
dc.subject	genomic data privacy	en_US
dc.subject	healthcare data privacy	en_US
dc.subject	Homomorphic Encryption	en_US
dc.subject	Differential Privacy	en_US
dc.title	Privacy-preserving techniques on genomic data	en_US
dc.title.alternative	Privacy-Preserving Techniques on Healthcare Data	en_US
dc.type	doctoral thesis	en_US
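The abstract states that the GPU-parallel Fully Homomorphic Encryption framework computes string distance metrics such as Hamming distance, but the record does not say how. As a plaintext illustration only (not the thesis's encrypted implementation), the sketch below shows why Hamming distance suits homomorphic evaluation: with a one-hot encoding of nucleotides, it reduces to a sum of squared differences, which needs only the additions and multiplications that HE schemes support. The encoding table and the names `encode`, `hamming_arithmetic`, and `hamming_reference` are hypothetical.

```python
from typing import List

# Hypothetical one-hot encoding of nucleotides (an assumption of this sketch).
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def encode(seq: str) -> List[int]:
    """Flatten a DNA string into a 0/1 vector with four slots per base.

    Raises KeyError for characters other than A, C, G, T.
    """
    return [bit for base in seq.upper() for bit in ONE_HOT[base]]


def hamming_arithmetic(x: List[int], y: List[int]) -> int:
    """Hamming distance computed with additions and multiplications only.

    A mismatched base contributes (1 - 0)^2 + (0 - 1)^2 = 2 to the sum of
    squared differences and a matching base contributes 0, so halving the
    sum gives the number of mismatched positions.
    """
    if len(x) != len(y):
        raise ValueError("encoded sequences must have equal length")
    total = sum((a - b) * (a - b) for a, b in zip(x, y))
    return total // 2  # plaintext step in this sketch


def hamming_reference(s: str, t: str) -> int:
    """Direct character comparison, used to check the arithmetic form."""
    return sum(c1 != c2 for c1, c2 in zip(s, t))


print(hamming_arithmetic(encode("ACGT"), encode("ACCA")))
```

For example, ACGT and ACCA differ at their last two positions, so the call above prints 2. The final halving is done in plaintext here; how an encrypted protocol handles that step is outside this sketch.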
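The abstract mentions differentially private mechanisms for sharing intermediate statistics but does not name them. As a generic illustration only (not the thesis's own mechanism), the standard Laplace mechanism can release a noisy minor-allele count at a single SNP. The helper name `dp_allele_count`, the 0/1/2 genotype encoding, and the resulting sensitivity of 2 are assumptions of this sketch.

```python
import random
from typing import List


def dp_allele_count(genotypes: List[int], epsilon: float, rng: random.Random) -> float:
    """Release a noisy minor-allele count via the Laplace mechanism.

    Hypothetical helper. Assumes each individual contributes a genotype in
    {0, 1, 2} (number of minor alleles), so adding or removing one person
    changes the count by at most 2: the L1 sensitivity is 2.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1, or 2")
    true_count = sum(genotypes)
    scale = 2.0 / epsilon  # Laplace scale b = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b);
    # expovariate takes the rate 1 / mean.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(42)
# The true count here is 4; the released value is 4 plus Laplace noise.
print(dp_allele_count([0, 1, 2, 1, 0], epsilon=1.0, rng=rng))
```

Smaller values of epsilon give larger noise and a stronger guarantee. Releasing counts for many SNPs from the same cohort consumes privacy budget additively under basic sequential composition, so the total epsilon has to be split across those releases.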

Files

Original bundle

Name:: PhD_Thesis_Momin_final.pdf
Size:: 3.01 MB
Format:: Adobe Portable Document Format
Description:: PhD Thesis

License bundle

Name:: license.txt
Size:: 2.2 KB
Format:: Item-specific license agreed to upon submission
Description:

Collections

FGS - Electronic Theses and Practica