Privacy-preserving synthetic image data generation and classification

Faisal, Fahim
Computer vision, generative models (e.g., ChatGPT), and deep learning are now widely used across sectors, from large corporations to end devices, simplifying people’s lives and improving the reliability of medical findings. However, sensitive image data combined with deep learning’s high memorization capacity poses privacy risks, particularly for medical images that contain private information. Anonymization alone is insufficient because of re-identification risk and reduced utility. We therefore developed a differentially private approach that injects selective noise while generating high-dimensional synthetic medical image data with guaranteed differential privacy. Beyond protecting the data itself, protecting the classification model is crucial because of its vulnerability to membership inference attacks. State-of-the-art defenses (e.g., differential privacy) sacrifice task accuracy to preserve privacy, and some methods reuse private data or require additional public data, which is impractical in some domains. To address these privacy concerns while maintaining utility, we propose a collaborative distillation approach that transfers knowledge using minimal synthetic data, yielding a compact private classifier.
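The differential-privacy guarantee mentioned above is typically obtained by adding noise calibrated to a query's sensitivity. As an illustration only (the standard Gaussian mechanism, not this thesis's specific selective-noise scheme), a minimal NumPy sketch:

```python
import numpy as np

def gaussian_mechanism(values, sensitivity, epsilon, delta, rng=None):
    """Release `values` with (epsilon, delta)-differential privacy by adding
    Gaussian noise calibrated to the L2 sensitivity.

    Uses the classic analytic bound
        sigma = sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon.
    Names and parameters here are illustrative, not taken from the thesis.
    """
    rng = rng if rng is not None else np.random.default_rng()
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon
    return values + rng.normal(0.0, sigma, size=np.shape(values))
```

A "selective" variant, as the abstract hints, would apply such noise only to privacy-critical components (e.g., particular gradients or image regions) rather than uniformly, trading a tighter privacy budget for higher utility.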
Membership inference defense, Knowledge distillation, Data distillation, Privacy, Computer vision, Synthetic data, Generative adversarial network
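The knowledge-distillation component referenced in the abstract and keywords conventionally transfers a teacher's soft predictions to a student via a temperature-softened KL-divergence loss (Hinton-style distillation). A minimal NumPy sketch of that standard loss, not the thesis's exact collaborative scheme:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) over temperature-softened distributions,
    scaled by T^2 so gradients stay comparable across temperatures.
    Illustrative sketch; the thesis's collaborative variant may differ."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

In the setting the abstract describes, this loss would be minimized on a small synthetic dataset, so the compact student never touches the private training images directly.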