Privacy-preserving data analysis techniques for biomedical data

Anjum, Md. Monowar

dc.contributor.author	Anjum, Md. Monowar
dc.contributor.examiningcommittee	Rouhani, Sara (Computer Science)	en_US
dc.contributor.examiningcommittee	Irani, Pourang (University of British Columbia)	en_US
dc.contributor.supervisor	Mohammed, Noman
dc.date.accessioned	2022-06-24T14:25:51Z
dc.date.available	2022-06-24T14:25:51Z
dc.date.copyright	2022-06-23
dc.date.issued	2022-06-23
dc.date.submitted	2022-06-23T22:06:55Z	en_US
dc.degree.discipline	Computer Science	en_US
dc.degree.level	Master of Science (M.Sc.)	en_US
dc.description.abstract	Privacy is a fundamental aspect of modern distributed systems. The data collection mechanisms and subsequent analysis in such systems often reveal private information about individuals. This is especially true of contact tracing systems designed to combat a pandemic. Contact tracing systems collect vital information about individuals, such as their social interaction graph, their frequently visited places, and other sensitive information. The majority of the proposed systems use a centralized architecture and population-wide deployment. Such a macro-level design perspective is prone to privacy and scalability issues. In the first part of the thesis, we address these problems in recently proposed contact tracing systems. We propose a micro-level system design instead of a macro-level one: a system that can be implemented at the organizational level and scaled without steep infrastructure costs. Privacy considerations are built into the system design. The system stores only strictly necessary information from the user, and the data never leaves the organization's premises. Our proposed system can be scaled up rapidly without requiring population-wide adoption. Subsequent data analysis on aggregate statistics of the raw data collected by the proposed system is performed in a privacy-preserving manner. In epidemiology and clinical modeling, a summary of raw biomedical data is used to fit or train specialized, disease-specific models. The generalized linear mixed model is one such widely used model. Training such models on sensitive data in a collaborative setting often entails privacy risks. Standard privacy-preserving mechanisms such as differential privacy can be used to mitigate the privacy risk during model training. However, experimental evidence suggests that adding differential privacy to model training can cause significant utility loss, making the model impractical for real-world use.
Therefore, generalized linear mixed models, which lose their usability under differential privacy, require a different approach to privacy-preserving model training. In the second part of the thesis, we propose a value-blind training method for generalized linear mixed models in a federated setting. In our proposed method, the central server optimizes the model parameters without ever accessing the raw training data or intermediate computation values. The intermediate computation values that the collaborating parties share with the central server are encrypted using homomorphic encryption. We formally prove the security of our proposed method. Experiments on multiple datasets suggest that the model trained with our proposed method achieves a very low error rate while preserving privacy. To the best of our knowledge, this is the first work that performs a systematic privacy analysis of generalized linear mixed model training in a federated setting.	en_US
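The value-blind step described in the abstract (parties send encrypted intermediate values, and the server combines them without seeing them) relies on an additively homomorphic scheme. The abstract does not name the scheme, so the sketch below assumes Paillier encryption, a common choice for this pattern, with toy-sized keys built from two Mersenne primes, a hypothetical fixed-point `SCALE` for real-valued statistics, and a hypothetical key holder that is separate from the server. It is an illustration of encrypted aggregation only, not the thesis's protocol, which also covers the generalized linear mixed model optimization itself.

```python
import math
import secrets

# Toy Paillier parameters (assumption): two Mersenne primes, far too small for real use.
P = 2**31 - 1
Q = 2**61 - 1
SCALE = 10**6  # hypothetical fixed-point scale for encoding real-valued statistics


def keygen(p=P, q=Q):
    """Return the public modulus n and the private key (lambda, mu), using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, mu is simply lambda^-1 mod n
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n as (1 + m*n) * r^n mod n^2 with random r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, sk, c):
    """Recover m = L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


def encode(x, n):
    """Fixed-point encode a real number; negatives wrap around modulo n."""
    return round(x * SCALE) % n


def decode(m, n):
    """Invert encode(), treating values above n // 2 as negative."""
    if m > n // 2:
        m -= n
    return m / SCALE


# Usage: three hypothetical sites each encrypt one local statistic.
n, sk = keygen()
local_stats = [0.125, -0.5, 2.75]
ciphertexts = [encrypt(n, encode(v, n)) for v in local_stats]

# The server only multiplies ciphertexts; it never sees an individual value.
aggregate = ciphertexts[0]
for c in ciphertexts[1:]:
    aggregate = add_encrypted(n, aggregate, c)

# Only the (hypothetical) key holder can decrypt, and only the sum.
total = decode(decrypt(n, sk, aggregate), n)
print(total)  # 2.375
```

Because the server handles only ciphertexts, it learns nothing about individual site values from the aggregation step; whoever holds the private key still learns the sum, so deciding who holds that key is a design choice any real protocol has to make explicitly.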
dc.description.note	October 2022	en_US
dc.identifier.uri	http://hdl.handle.net/1993/36567
dc.language.iso	eng	en_US
dc.rights	open access	en_US
dc.subject	data security	en_US
dc.subject	machine learning	en_US
dc.subject	federated learning	en_US
dc.subject	biomedical data privacy	en_US
dc.subject	homomorphic encryption	en_US
dc.title	Privacy-preserving data analysis techniques for biomedical data	en_US
dc.type	master thesis	en_US
local.subject.manitoba	no	en_US

Files

Original bundle

Name:: Monowar_Anjum.pdf
Size:: 2.31 MB
Format:: Adobe Portable Document Format
Description:: Thesis

License bundle

Name:: license.txt
Size:: 2.2 KB
Format:: Item-specific license agreed to upon submission
Description:

Collections

FGS - Electronic Theses and Practica