From dad4ff0a589b74c378320f2134c6efde65818ab9 Mon Sep 17 00:00:00 2001
From: "A. Unique TensorFlower"
Date: Tue, 10 Aug 2021 13:18:03 -0700
Subject: [PATCH] Add the measure privacy page to the external Tensorflow
 Responsible AI Guide.

PiperOrigin-RevId: 389961385
---
 g3doc/guide/measure_privacy.md | 47 ++++++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/g3doc/guide/measure_privacy.md b/g3doc/guide/measure_privacy.md
index 0cda83e..d6c2a73 100644
--- a/g3doc/guide/measure_privacy.md
+++ b/g3doc/guide/measure_privacy.md
@@ -1,5 +1,48 @@
 # Measure Privacy
 
-[TOC]
+Differential privacy is a framework for measuring the privacy guarantees
+provided by an algorithm; these guarantees can be expressed using the values ε
+(epsilon) and δ (delta). Of the two, ε is the more important and more sensitive
+to the choice of hyperparameters. Roughly speaking, they mean the following:
 
-## Tips
+* ε gives a ceiling on how much the probability of a particular output can
+  increase by including (or removing) a single training example. You usually
+  want it to be a small constant (less than 10, or, for more stringent privacy
+  guarantees, less than 1). However, this is only an upper bound, and a large
+  value of ε may still mean good practical privacy.
+* δ bounds the probability of an arbitrary change in model behavior. You can
+  usually set this to a very small number (1e-7 or so) without compromising
+  utility. A rule of thumb is to set it to be less than the inverse of the
+  training data size.
+
+The relationship between training hyperparameters and the resulting privacy in
+terms of (ε, δ) is complicated and tricky to state explicitly. Our currently
+recommended approach, described at the bottom of the
+[Get Started page](get_started.md), is to find the maximum noise multiplier you
+can use while still having reasonable utility, and then to scale the noise
+multiplier and the number of microbatches. TensorFlow Privacy provides a tool,
+`compute_dp_sgd_privacy`, to compute (ε, δ) based on the noise multiplier σ, the
+number of training steps taken, and the fraction of the input data consumed at
+each step. The privacy guarantee becomes stronger (ε decreases) as the noise
+multiplier σ increases, and weaker the more times the data is used during
+training. Generally, to achieve an ε of at most 10.0, you need to set the noise
+multiplier to around 0.3 to 0.5, depending on the dataset size and the number
+of epochs. See the
+[classification privacy tutorial](../tutorials/classification_privacy.ipynb)
+for an example of this approach.
+
+For more detail, see [the original DP-SGD paper](https://arxiv.org/pdf/1607.00133.pdf).
+
+You can use `compute_dp_sgd_privacy` to compute ε for your model given a fixed
+value of δ, as shown in the
+[classification privacy tutorial](../tutorials/classification_privacy.ipynb):
+
+* `q`: the sampling ratio, i.e. the probability of an individual training
+  point being included in a mini batch (`batch_size/number_of_examples`).
+* `noise_multiplier`: a float that governs the amount of noise added during
+  training. Generally, more noise results in better privacy and lower utility.
+* `steps`: the number of global steps taken.
+
+A detailed writeup of the theory behind the computation of ε and δ is
+available in
+[Differential Privacy of the Sampled Gaussian Mechanism](https://arxiv.org/abs/1908.10530).
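As a supplement to the page added above, here is a minimal sketch of how the ε
computation it describes might be run, assuming the `compute_dp_sgd_privacy`
helper exposed by the `tensorflow_privacy` package (the entry point used in the
classification privacy tutorial linked in the page). The dataset size, batch
size, noise multiplier, epoch count, and δ below are illustrative placeholders,
not recommended values.

```python
# Minimal sketch: estimating epsilon for a DP-SGD training run with
# TensorFlow Privacy. All hyperparameter values below are illustrative.
from tensorflow_privacy.privacy.analysis import compute_dp_sgd_privacy

# The helper derives the quantities described in the guide internally:
#   q     = batch_size / n          (sampling ratio)
#   steps = epochs * n / batch_size (number of global steps)
compute_dp_sgd_privacy.compute_dp_sgd_privacy(
    n=60000,               # number of training examples
    batch_size=256,        # examples consumed at each step
    noise_multiplier=1.1,  # sigma: the amount of noise added during training
    epochs=60,             # passes over the training data
    delta=1e-5)            # fixed delta, chosen to be less than 1 / n
# Prints the resulting epsilon for the given delta.
```

Passing `n`, `batch_size`, and `epochs` rather than `q` and `steps` directly is
a convenience; the underlying analysis runs in terms of the sampling ratio and
the step count, as the parameter list in the page describes.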