Inproceedings

Differentially Private Multi-Label Learning Is Harder Than You'd Think

, , and .
2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), pages 40-47. (July 2024)
DOI: 10.1109/EuroSPW61312.2024.00012

Abstract

Machine Learning is key in modern data analysis. However, privacy concerns related to sharing sensitive data often hinder the full potential of Machine Learning. One popular technique to ensure privacy is the Private Aggregation of Teacher Ensembles (PATE) framework. PATE trains an ensemble of teacher models on private data and transfers their knowledge to a student model, with rigorous privacy guarantees derived using Differential Privacy. So far, PATE has mainly been applied to binary and multi-class classification. However, such problem types often diverge from real-world scenarios. In medicine, patients often have numerous diseases, and in healthcare multiple treatments may be relevant, not just one. In multi-label problems, the objective is to identify not just a single label, but rather a set of relevant labels that characterizes the data point accurately. This paper explores the intricacies of applying PATE to multi-label learning and identifies the challenges involved. Since the original aggregation algorithms can only handle binary and multi-class problems, we generalize them and call the variants GNThreshold, Confident GNThreshold, and Interactive GNThreshold. We provide a formalized description and privacy analysis. We conduct comparative experiments on image and numeric data sets and evaluate model quality and privacy bounds. Finally, we summarize our findings, analyze challenges, and discuss possible optimizations for multi-label PATE.

Data and Code Availability: We experiment with the publicly available NUS-WIDE data set [6] and multiple synthetically generated data sets [26]. Data description and processing details are in Section 5. Our code can be found at https://github.com/BonJamon/MultiLabeIPate
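The abstract describes GNThreshold as a multi-label generalization of PATE's noisy aggregation: each teacher votes independently on every label, Gaussian noise is added to the per-label vote counts, and a label is emitted when its noisy count clears a threshold. The following is a minimal sketch of that idea, not the paper's implementation; the function name, signature, and parameter choices are illustrative assumptions.

```python
import numpy as np

def gn_threshold(teacher_votes, sigma, threshold, rng=None):
    """Sketch of a GNThreshold-style multi-label aggregation.

    teacher_votes: (n_teachers, n_labels) binary array, where entry
        [i, j] is 1 iff teacher i predicts label j for the query point.
    sigma: standard deviation of the Gaussian noise (privacy parameter).
    threshold: minimum noisy vote count for a label to be released.
    Returns a binary vector of length n_labels (the aggregated label set).
    """
    rng = np.random.default_rng(rng)
    counts = teacher_votes.sum(axis=0)                  # per-label vote counts
    noisy = counts + rng.normal(0.0, sigma, size=counts.shape)
    return (noisy >= threshold).astype(int)

# Illustrative usage: 10 teachers agree on label 0, disagree on label 1.
votes = np.array([[1, 1]] * 5 + [[1, 0]] * 5)
aggregated = gn_threshold(votes, sigma=1.0, threshold=5.0)
```

Unlike the argmax used in multi-class PATE, each label is thresholded independently here, so the output can contain zero, one, or several labels, which is what distinguishes the multi-label setting.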
