Deep Learning with Label Differential Privacy


Over the past several years, there has been an increased focus on developing differentially private (DP) machine learning (ML) algorithms. DP has been the basis of several practical deployments in industry, and has even been employed by the U.S. Census, because it enables the understanding of system and algorithm privacy guarantees. The underlying assumption of DP is that changing a single user's contribution to an algorithm should not significantly change its output distribution.

In the standard supervised learning setting, a model is trained to make a prediction of the label for each input given a training set of example pairs {[input1, label1], …, [inputn, labeln]}. In the case of deep learning, previous work introduced a DP training framework, DP-SGD, that was integrated into TensorFlow and PyTorch. DP-SGD protects the privacy of each example pair [input, label] by adding noise to the stochastic gradient descent (SGD) training algorithm. Yet despite extensive efforts, in most cases the accuracy of models trained with DP-SGD remains significantly lower than that of non-private models.
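As background, a single DP-SGD update can be sketched in a few lines. This is a minimal NumPy illustration (our own, not the TensorFlow or PyTorch integrations, which compute per-example gradients inside the framework): clip each per-example gradient, average, and add Gaussian noise.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_mult, rng):
    """One DP-SGD step: clip each per-example gradient to `clip_norm`,
    average the clipped gradients, add Gaussian noise, and apply the update."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clipping threshold.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    # Noise standard deviation follows the usual noise_multiplier * clip_norm
    # scaling, divided by the batch size.
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                       size=avg.shape)
    return params - lr * (avg + noise)
```

The need to materialize and clip every per-example gradient is the source of the memory and computation overhead mentioned above.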

DP algorithms include a privacy budget, ε, which quantifies the worst-case privacy loss for each user. Specifically, ε reflects how much the probability of any particular output of a DP algorithm can change if one replaces any example of the training set with an arbitrarily different one. So, a smaller ε corresponds to better privacy, as the algorithm is more indifferent to changes of a single example. However, since smaller ε tends to hurt model utility more, it is not uncommon to consider ε up to 8 in deep learning applications. Notably, for the widely used multiclass image classification dataset CIFAR-10, the highest reported accuracy (without pre-training) for DP models with ε = 3 is 69.3%, a result that relies on handcrafted visual features. In contrast, non-private scenarios (ε = ∞) with learned features have been shown to achieve >95% accuracy while using modern neural network architectures. This performance gap remains a roadblock for many real-world applications to adopt DP. Moreover, despite recent advances, DP-SGD often comes with increased computation and memory overhead due to slower convergence and the need to compute the norm of the per-example gradient.

In “Deep Learning with Label Differential Privacy”, presented at NeurIPS 2021, we consider a more relaxed, but important, special case called label differential privacy (LabelDP), where we assume the inputs (input1, …, inputn) are public, and only the privacy of the training labels (label1, …, labeln) needs to be protected. With this relaxed guarantee, we can design novel algorithms that utilize a prior understanding of the labels to improve the model utility. We demonstrate that LabelDP achieves 20% higher accuracy than DP-SGD on the CIFAR-10 dataset. Our results across multiple tasks confirm that LabelDP could significantly narrow the performance gap between private models and their non-private counterparts, mitigating the challenges in real-world applications. We also present a multi-stage algorithm for training deep neural networks with LabelDP. Finally, we are excited to release the code for this multi-stage training algorithm.


The notion of LabelDP has been studied in the Probably Approximately Correct (PAC) learning setting, and captures several practical scenarios. Examples include: (i) computational advertising, where impressions are known to the advertiser and thus considered non-sensitive, but conversions reveal user interest and are thus private; (ii) recommendation systems, where the choices are known to a streaming service provider, but the user ratings are considered sensitive; and (iii) user surveys and analytics, where demographic information (e.g., age, gender) is non-sensitive, but income is sensitive.

We make several key observations in this scenario. (i) When only the labels need to be protected, much simpler algorithms can be applied for data preprocessing to achieve LabelDP without any modifications to the existing deep learning training pipeline. For example, the classical Randomized Response (RR) algorithm, designed to eliminate evasive answer biases in survey aggregation, achieves LabelDP by simply flipping the label to a random one with a probability that depends on ε. (ii) Conditioned on the (public) input, we can compute a prior probability distribution, which provides a prior belief of the likelihood of the class labels for the given input. With a novel variant of RR, RR-with-prior, we can incorporate prior information to reduce the label noise while maintaining the same privacy guarantee as classical RR.
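As a concrete sketch (our own illustration, not the released code), K-ary randomized response keeps the true label with probability e^ε / (e^ε + K − 1) and otherwise reports a uniformly random other label:

```python
import numpy as np

def randomized_response(label, num_classes, epsilon, rng=None):
    """K-ary randomized response: return the true label with probability
    e^eps / (e^eps + K - 1), and each other label with probability
    1 / (e^eps + K - 1). This satisfies eps-LabelDP."""
    rng = rng or np.random.default_rng()
    p_true = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    if rng.random() < p_true:
        return label
    # Sample uniformly from the remaining K - 1 labels.
    other = rng.integers(num_classes - 1)
    return int(other if other < label else other + 1)
```

The noisy labels can then be fed to any standard training pipeline unchanged, which is exactly the appeal of observation (i).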

The figure below illustrates how RR-with-prior works. Assume a model is built to classify an input image into 10 categories. Consider a training example with the label “airplane”. To guarantee LabelDP, classical RR returns a random label sampled according to a given distribution (see the top-right panel of the figure below). The smaller the targeted privacy budget ε is, the larger the probability of sampling an incorrect label has to be. Now assume we have a prior probability showing that the given input is “likely an object that flies” (lower-left panel). With the prior, RR-with-prior will discard all labels with small prior and only sample from the remaining labels. By dropping these unlikely labels, the probability of returning the correct label is significantly increased, while maintaining the same privacy budget ε (lower-right panel).

Randomized response: If no prior information is given (top-left), all classes are sampled with equal probability. The probability of sampling the true class (P[airplane] ≈ 0.5) is higher if the privacy budget is higher (top-right). RR-with-prior: Assuming a prior distribution (bottom-left), unlikely classes are “suppressed” from the sampling distribution (bottom-right). So the probability of sampling the true class (P[airplane] ≈ 0.9) is increased under the same privacy budget.
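A minimal sketch of this idea (our paraphrase; function and argument names are ours): restrict to the k labels with the highest prior, then apply k-ary randomized response within that set. Because the top-k set depends only on the public input's prior, not on the true label, the ε guarantee is preserved.

```python
import numpy as np

def rr_with_prior(label, prior, k, epsilon, rng=None):
    """RR restricted to the top-k classes under the prior. If the true
    label falls outside the top-k set, a uniformly random label from the
    set is returned; this keeps the mechanism eps-LabelDP."""
    rng = rng or np.random.default_rng()
    top_k = np.argsort(prior)[-k:]   # k most likely labels under the prior
    p_true = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if label in top_k and rng.random() < p_true:
        return int(label)
    # Uniform over the top-k candidates, excluding the true label when present.
    candidates = [c for c in top_k if c != label]
    return int(rng.choice(candidates))
```

With fewer candidates to spread the noise over, the same ε buys a much higher probability of returning the correct label, as in the figure.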

A Multi-stage Training Algorithm

Based on the RR-with-prior observations, we present a multi-stage algorithm for training deep neural networks with LabelDP. First, the training set is randomly partitioned into multiple subsets. An initial model is then trained on the first subset using classical RR. Then, at each later stage, a single subset is used to train the model; its labels are produced using RR-with-prior, with the priors based on the predictions of the model trained so far.

An illustration of the multi-stage training algorithm. The training set is partitioned into t disjoint subsets. An initial model is trained on the first subset using classical RR. Then the trained model is used to provide prior predictions in the RR-with-prior step and in the training of the later stages.
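To make the flow concrete, here is a toy end-to-end sketch. It is entirely our own construction: the nearest-centroid "model" and all names are illustrative stand-ins for a real deep learning pipeline, and the in-line randomized-response helper mirrors the mechanisms described above.

```python
import numpy as np

def multi_stage_train(inputs, labels, num_classes, epsilon,
                      num_stages=3, k=4, seed=0):
    """Toy sketch of multi-stage LabelDP training. Stage 1 trains on
    classical-RR labels; each later stage relabels its partition with RR
    restricted to the top-k classes under the current model's prior."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(labels)), num_stages)

    def k_rr(y, cands):
        # k-ary randomized response restricted to the candidate set `cands`.
        p_true = np.exp(epsilon) / (np.exp(epsilon) + len(cands) - 1)
        if y in cands and rng.random() < p_true:
            return int(y)
        return int(rng.choice([c for c in cands if c != y]))

    model = None                      # toy "model": one centroid per class
    seen_x, seen_y = [], []
    for idx in parts:
        for i in idx:
            if model is None:
                noisy = k_rr(labels[i], list(range(num_classes)))
            else:
                # Prior: the k classes whose centroids are closest to the input.
                dists = np.linalg.norm(model - inputs[i], axis=1)
                noisy = k_rr(labels[i], list(np.argsort(dists)[:k]))
            seen_x.append(inputs[i])
            seen_y.append(noisy)
        # "Train" on all noisy data seen so far: recompute class centroids.
        X, Y = np.asarray(seen_x), np.asarray(seen_y)
        model = np.array([X[Y == c].mean(axis=0) if (Y == c).any()
                          else np.zeros(X.shape[1]) for c in range(num_classes)])
    return model
```

Each example's true label is queried exactly once, so the overall guarantee remains ε-LabelDP across all stages.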


We benchmark the multi-stage training algorithm's empirical performance on multiple datasets, domains, and architectures. On the CIFAR-10 multi-class classification task, for the same privacy budget ε, the multi-stage training algorithm (blue in the figure below) guaranteeing LabelDP achieves 20% higher accuracy than DP-SGD. We emphasize that LabelDP protects only the labels while DP-SGD protects both the inputs and labels, so this is not a strictly fair comparison. Nonetheless, this result demonstrates that for specific application scenarios where only the labels need to be protected, LabelDP could lead to significant improvements in the model utility while narrowing the performance gap between private models and public baselines.

Comparison of the model utility (test accuracy) of different algorithms under different privacy budgets.

In some domains, prior knowledge is naturally available or can be built using publicly available data only. For example, many machine learning systems have historical models that could be evaluated on new data to provide label priors. In domains where unsupervised or self-supervised learning algorithms work well, priors could also be built from models pre-trained on unlabeled (therefore public with respect to LabelDP) data. Specifically, we demonstrate two self-supervised learning algorithms in our CIFAR-10 evaluation (orange and green lines in the figure above). We use self-supervised learning models to compute representations for the training examples and run k-means clustering on the representations. Then, we spend a small amount of privacy budget (ε ≤ 0.05) to query a histogram of the label distribution of each cluster and use that as the label prior for the points in each cluster. This prior significantly boosts the model utility in the low privacy budget regime (ε < 1).
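This clustering-based prior can be sketched as follows. Everything here is a simplified stand-in: we use a toy Lloyd's-iteration k-means and answer the per-cluster histogram query with the Laplace mechanism, which is one standard way to spend a small ε on a histogram; the exact mechanism and hyperparameters used in the paper may differ.

```python
import numpy as np

def cluster_label_priors(features, labels, num_classes, num_clusters,
                         epsilon, seed=0):
    """Cluster the (public) representations, then release a Laplace-noised
    label histogram per cluster to use as a label prior."""
    rng = np.random.default_rng(seed)
    # Toy k-means (in practice a library implementation would be used).
    centers = features[rng.choice(len(features), num_clusters, replace=False)]
    for _ in range(10):
        assign = np.argmin(
            np.linalg.norm(features[:, None] - centers[None], axis=2), axis=1)
        for c in range(num_clusters):
            if (assign == c).any():
                centers[c] = features[assign == c].mean(axis=0)
    # Changing one label moves two histogram counts by 1 each, so the L1
    # sensitivity is 2 and Laplace scale 2/epsilon gives epsilon-DP.
    priors = np.zeros((num_clusters, num_classes))
    for c in range(num_clusters):
        hist = np.bincount(labels[assign == c], minlength=num_classes)
        noisy = np.clip(hist + rng.laplace(0, 2.0 / epsilon, num_classes),
                        0, None)
        s = noisy.sum()
        priors[c] = noisy / s if s > 0 else np.full(num_classes,
                                                    1.0 / num_classes)
    return assign, priors
```

Each point then uses its cluster's noisy histogram as the prior in the RR-with-prior step, and the small ε spent here is accounted for in the total budget.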

Similar observations hold across multiple datasets such as MNIST and Fashion-MNIST, as well as non-vision domains, such as the MovieLens-1M movie rating task. Please see our paper for the full report on the empirical results.

The empirical results suggest that protecting the privacy of the labels can be significantly easier than protecting the privacy of both the inputs and labels. This can also be proven mathematically under specific settings. In particular, we can show that for convex stochastic optimization, the sample complexity of algorithms privatizing the labels is much smaller than that of algorithms privatizing both labels and inputs. In other words, to achieve the same level of model utility under the same privacy budget, LabelDP requires fewer training examples.


Both our empirical and theoretical results suggest that LabelDP is a promising relaxation of the full DP guarantee. In applications where the privacy of the inputs does not need to be protected, LabelDP could reduce the performance gap between a private model and the non-private baseline. For future work, we plan to design better LabelDP algorithms for other tasks beyond multi-class classification. We hope that the release of the multi-stage training algorithm code provides researchers with a useful resource for DP research.


This work was carried out in collaboration with Badih Ghazi, Noah Golowich, and Ravi Kumar. We also thank Sami Torbey for valuable feedback on our work.

