Inproceedings

Differentially Private Multi-Label Learning Is Harder Than You'd Think

, , and .
2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), pages 40-47. (July 2024)
DOI: 10.1109/EuroSPW61312.2024.00012

Abstract

Machine Learning is key in modern data analysis. However, privacy concerns related to sharing sensitive data often hinder the full potential of Machine Learning. One popular technique to ensure privacy is the Private Aggregation of Teacher Ensembles (PATE) framework. PATE trains an ensemble of teacher models on private data and transfers their knowledge to a student model, with rigorous privacy guarantees derived using Differential Privacy. So far, PATE has mainly been applied to binary and multi-class classification. However, such problem types often diverge from real-world scenarios. In medicine, patients often have numerous diseases, and in healthcare multiple treatments may be relevant, not just one. In multi-label problems, the objective is to identify not just a single label, but rather a set of relevant labels that characterizes the data point accurately. This paper explores the intricacies of applying PATE to multi-label learning and identifies the challenges involved. Since the original aggregation algorithms can only handle binary and multi-class problems, we generalize them and call the variants GNThreshold, Confident GNThreshold, and Interactive GNThreshold. We provide a formalized description and privacy analysis. We conduct comparative experiments on image and numeric data sets and evaluate model quality and privacy bounds. Finally, we summarize our findings, analyze challenges, and discuss possible optimizations for multi-label PATE.

Data and Code Availability: We experiment with the publicly available NUS-WIDE data set [6] and multiple synthetically generated data sets [26]. Data description and processing details are in Section 5. Our code can be found at https://github.com/BonJamon/MultiLabeIPate
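The abstract describes GNThreshold as a multi-label generalization of PATE's noisy aggregation: each teacher votes independently on every label, Gaussian noise is added to the per-label vote counts, and a label is emitted when its noisy count clears a threshold. The following is a minimal sketch of that idea, not the paper's implementation; the function name, signature, and parameter choices are illustrative assumptions.

```python
import numpy as np

def gn_threshold(teacher_votes, sigma, threshold, rng=None):
    """Sketch of a GNThreshold-style multi-label aggregation.

    teacher_votes: (n_teachers, n_labels) binary array, where entry
        [i, j] is 1 iff teacher i predicts label j for the query point.
    sigma: standard deviation of the Gaussian noise (privacy parameter).
    threshold: minimum noisy vote count for a label to be released.
    Returns a binary vector of length n_labels (the aggregated label set).
    """
    rng = np.random.default_rng(rng)
    counts = teacher_votes.sum(axis=0)                  # per-label vote counts
    noisy = counts + rng.normal(0.0, sigma, size=counts.shape)
    return (noisy >= threshold).astype(int)

# Illustrative usage: 10 teachers agree on label 0, disagree on label 1.
votes = np.array([[1, 1]] * 5 + [[1, 0]] * 5)
aggregated = gn_threshold(votes, sigma=1.0, threshold=5.0)
```

Unlike the argmax used in multi-class PATE, each label is thresholded independently here, so the output can contain zero, one, or several labels, which is what distinguishes the multi-label setting.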
