Privacy-preserving data publishing using deep learning techniques

Date
2021-05-10
Authors
Ahmed, Tanbir
Abstract
According to a recent study, around 99 percent of hospitals across the United States now use electronic health record (EHR) systems. One of the most common types of EHR data is unstructured text, and unlocking the details hidden in this data is critical for improving current medical practice and research. However, this text contains sensitive information whose release could compromise patient privacy, so medical text cannot be released publicly without privacy protection. De-identification is the process of detecting and removing all sensitive information present in EHRs, and it is a necessary step towards privacy-preserving EHR data sharing. Since 2016, several deep learning-based approaches to de-identification have achieved over 98% accuracy. However, these models are trained on sensitive information and can unwittingly memorize some of their training data, so a careful analysis of the models can reveal patients' data. This thesis presents two contributions. First, we introduce new methods for de-identifying textual data based on a self-attention mechanism and a stacked recurrent neural network. Experimental results on three different datasets show that our model outperforms all state-of-the-art methods regardless of the dataset. Additionally, our proposed method is significantly faster than existing techniques. We also introduce three utility metrics to judge the quality of the de-identified data. Second, we propose a differentially private ensemble framework for de-identification, which allows medical researchers to collaborate by publicly releasing their de-identification models. Experiments on three different datasets show competitive results compared to the state-of-the-art methods while guaranteeing differential privacy.
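To make the "detecting and removing" step concrete, the following is a minimal sketch of the removal half only. It is not the thesis's implementation: the function name `redact`, the BIO tag scheme, and the `NAME`/`DATE` categories are illustrative assumptions, and the tags are assumed to come from some upstream tagger (for example, a neural sequence labeller).

```python
from typing import List


def redact(tokens: List[str], tags: List[str]) -> str:
    """Replace each tagged span with one [CATEGORY] placeholder.

    Assumes BIO-style tags ("B-NAME", "I-NAME", "O"); this scheme is an
    assumption for illustration, not taken from the thesis.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    out: List[str] = []
    prev = None  # category of the span currently being redacted, if any
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Non-sensitive token: keep it unchanged.
            out.append(token)
            prev = None
            continue
        prefix, _, category = tag.partition("-")
        # Emit a placeholder at the start of a span, or when the category
        # changes without an explicit "B-" tag.
        if prefix == "B" or category != prev:
            out.append(f"[{category}]")
        prev = category
    return " ".join(out)


# Hypothetical example sentence and tags.
tokens = ["Mr.", "John", "Smith", "was", "admitted", "on", "May", "10"]
tags = ["O", "B-NAME", "I-NAME", "O", "O", "O", "B-DATE", "I-DATE"]
print(redact(tokens, tags))  # Mr. [NAME] was admitted on [DATE]
```

Replacing a span with its category placeholder, rather than deleting it outright, keeps the sentence readable for downstream use; which replacement strategy the thesis adopts is not stated in the abstract.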
Keywords
Data Privacy, Natural Language Processing, Electronic Health Record, Deep Learning