Privacy-preserving techniques on genomic data
dc.contributor.author | Aziz, Md Momin Al | |
dc.contributor.examiningcommittee | Yang Zhang (Mathematics) | en_US |
dc.contributor.examiningcommittee | Parimala Thulasiraman (Computer Science) | en_US |
dc.contributor.supervisor | Mohammed, Noman | |
dc.date.accessioned | 2022-06-08T13:54:48Z | |
dc.date.available | 2022-06-08T13:54:48Z | |
dc.date.copyright | 2022-06-08 | |
dc.date.issued | 2022-06-03 | |
dc.date.submitted | 2022-06-08T12:36:53Z | en_US |
dc.degree.discipline | Computer Science | en_US |
dc.degree.level | Doctor of Philosophy (Ph.D.) | en_US |
dc.description.abstract | Genomic data hold salient information about the characteristics of a living organism. Over the last decade, major developments have given us more accurate and less expensive methods to obtain our genome sequences. However, with the advancement of genomic research, there are growing security and privacy concerns regarding the collection, storage, and analysis of such sensitive data. Recent results show that, given some background information, an adversary can re-identify an individual from a specific genomic dataset. This can reveal that individual's current association with, or future susceptibility to, certain diseases (and sometimes the kinship between individuals), resulting in a privacy violation. This thesis has two parts and proposes several techniques to mitigate the privacy issues relating to genomic data. The first part targets data privacy when using an external computational environment. We propose privacy-preserving frameworks to store genomic data in an untrusted computational environment (i.e., the cloud). In particular, we employ prefix and suffix tree structures to represent genomic data while keeping the data encrypted throughout their computational life cycle. The underlying methods can therefore perform different string search queries and arbitrary computations under encryption without requiring access to the raw sensitive data. We also propose a GPU-parallel Fully Homomorphic Encryption framework that optimizes existing algorithms and can compute string distance metrics such as Hamming distance, edit distance, and set-maximal matching. The GPU-parallel framework is 14.4 and 46.81 times faster than existing techniques for standard and matrix multiplications, respectively. The second part of the thesis targets another privacy setting, in which the outputs of different genomic data analyses are deemed sensitive. Here, we propose several differentially private mechanisms to share partial genome datasets and intermediate statistics while providing a strict privacy guarantee. Experimental results demonstrate that the proposed methods are effective in protecting data privacy during the computation and analysis of genomic data. Overall, the proposed techniques in this thesis are not specialized for genomic data but can be generalized to protect other types of sensitive data. | en_US |
dc.description.note | October 2022 | en_US |
dc.identifier.uri | http://hdl.handle.net/1993/36541 | |
dc.language.iso | eng | en_US |
dc.rights | open access | en_US |
dc.subject | Privacy-preserving techniques | en_US |
dc.subject | genomic data privacy | en_US |
dc.subject | healthcare data privacy | en_US |
dc.subject | Homomorphic Encryption | en_US |
dc.subject | Differential Privacy | en_US |
dc.title | Privacy-preserving techniques on genomic data | en_US |
dc.title.alternative | Privacy-Preserving Techniques on Healthcare Data | en_US |
dc.type | doctoral thesis | en_US |