Privacy-preserving data analysis techniques for biomedical data

dc.contributor.author: Anjum, Md. Monowar
dc.contributor.examiningcommittee: Rouhani, Sara (Computer Science) [en_US]
dc.contributor.examiningcommittee: Irani, Pourang (University of British Columbia) [en_US]
dc.contributor.supervisor: Mohammed, Noman
dc.date.accessioned: 2022-06-24T14:25:51Z
dc.date.available: 2022-06-24T14:25:51Z
dc.date.copyright: 2022-06-23
dc.date.issued: 2022-06-23
dc.date.submitted: 2022-06-23T22:06:55Z [en_US]
dc.degree.discipline: Computer Science [en_US]
dc.degree.level: Master of Science (M.Sc.) [en_US]
dc.description.abstract: Privacy is a fundamental aspect of modern distributed systems. Data collection mechanisms and subsequent analysis often reveal private information about individuals. This is especially true when designing contact tracing systems to combat a pandemic. Contact tracing systems collect vital information about individuals, such as their social interaction graph, their frequently visited places, and other sensitive details. The majority of the proposed systems use a centralized architecture and population-wide deployment. Such a macro-level design perspective is prone to privacy and scalability issues. In the first part of the thesis, we address the problems in recently proposed contact tracing systems. We propose a micro-level system design instead of a macro-level one: a system that can be implemented at the organizational level and scaled without steep infrastructure costs. Privacy considerations are built into the system design. The system stores only strictly necessary information from the user, and the data never leaves the organization's premises. Our proposed system can be scaled up rapidly without requiring population-wide adoption. Subsequent analysis of aggregate statistics derived from the raw data collected by our proposed system is performed in a privacy-preserving manner. In the field of epidemiology and clinical modeling, a summary of raw biomedical data is used to fit or train disease-specific specialized models. The generalized linear mixed model is one such widely used model. Training such models on sensitive data in a collaborative setting often entails privacy risks. Standard privacy-preserving mechanisms such as differential privacy can be used to mitigate the privacy risk during model training. However, experimental evidence suggests that adding differential privacy to model training can cause significant utility loss, which makes the model impractical for real-world usage.
Therefore, generalized linear mixed models, which lose their usability under differential privacy, require a different approach to privacy-preserving model training. In the second part of the thesis, we propose a value-blind training method in a federated setting for generalized linear mixed models. In our proposed training method, the central server optimizes model parameters without ever getting access to the raw training data or intermediate computation values. Intermediate computation values that the collaborating parties share with the central server are encrypted using homomorphic encryption. We formally prove the security of our proposed model. Experimentation on multiple datasets suggests that the model trained by our proposed method achieves a very low error rate while preserving privacy. To the best of our knowledge, this is the first work that performs a systematic privacy analysis of generalized linear mixed model training in a federated setting. [en_US]
dc.description.note: October 2022 [en_US]
dc.identifier.uri: http://hdl.handle.net/1993/36567
dc.language.iso: eng [en_US]
dc.rights: open access [en_US]
dc.subject: data security [en_US]
dc.subject: machine learning [en_US]
dc.subject: federated learning [en_US]
dc.subject: biomedical data privacy [en_US]
dc.subject: homomorphic encryption [en_US]
dc.title: Privacy-preserving data analysis techniques for biomedical data [en_US]
dc.type: master thesis [en_US]
local.subject.manitoba: no [en_US]
Files
Original bundle
Name: Monowar_Anjum.pdf
Size: 2.31 MB
Format: Adobe Portable Document Format
Description: Thesis
License bundle
Name: license.txt
Size: 2.2 KB
Format: Item-specific license agreed to upon submission