Privacy-preserving synthetic image data generation and classification

Authors

Faisal, Fahim

Abstract

Computer vision, generative models (e.g., ChatGPT), and deep learning are now widely used across sectors, from large corporations to end devices, simplifying people's lives and improving the reliability of medical findings. However, sensitive image data combined with deep learning's high memorization capacity poses privacy risks, particularly for medical images containing private information. Anonymization alone is insufficient because of re-identification risk and reduced utility. We therefore developed a differentially private approach with selective noise for generating high-dimensional synthetic medical image data with guaranteed differential privacy. Beyond protecting the data itself, protecting the classification model is crucial because of its vulnerability to membership inference attacks. State-of-the-art defenses (e.g., differential privacy) compromise task accuracy to preserve privacy, and some methods reuse private data or require additional public data, which is impractical in some domains. To address these privacy concerns while maintaining utility, we propose a collaborative distillation approach that transfers knowledge using minimal synthetic data, yielding a compact private classifier model.
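The two standard building blocks the abstract refers to, Gaussian noise calibrated for differential privacy and temperature-softened knowledge distillation, can be sketched as follows. This is a minimal illustration of the general techniques, not the thesis's actual implementation; all function names and parameter values are hypothetical:

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=None):
    """Add Gaussian noise calibrated for (epsilon, delta)-differential privacy.

    Uses the classical calibration sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon,
    valid for epsilon < 1 (Dwork & Roth, 2014).
    """
    rng = rng or np.random.default_rng(0)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return np.asarray(value, dtype=float) + rng.normal(0.0, sigma, size=np.shape(value))

def softmax(logits, temperature=1.0):
    # Numerically stable temperature-scaled softmax.
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student distributions.

    Softening (T > 1) exposes the teacher's class-similarity information;
    the T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)   # teacher (target)
    q = softmax(student_logits, temperature)   # student
    return float(np.sum(p * (np.log(p) - np.log(q))) * temperature ** 2)
```

A student model trained against a teacher's softened outputs on synthetic transfer data never touches the raw private records, which is the intuition behind distillation-based defenses to membership inference.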

Keywords

Membership inference defense, Knowledge distillation, Data distillation, Privacy, Computer vision, Synthetic data, Generative adversarial network