Add the measure privacy page to the external Tensorflow Responsible AI Guide.

PiperOrigin-RevId: 389961385
This commit is contained in:
A. Unique TensorFlower 2021-08-10 13:18:03 -07:00
parent 26f3d8368f
commit dad4ff0a58


@@ -1,5 +1,48 @@
# Measure Privacy

[TOC]

Differential privacy is a framework for measuring the privacy guarantees
provided by an algorithm and can be expressed using the values ε (epsilon) and δ
(delta). Of the two, ε is the more important and more sensitive to the choice of
hyperparameters. Roughly speaking, they mean the following:
* ε gives a ceiling on how much the probability of a particular output can
increase by including (or removing) a single training example. You usually
want it to be a small constant (less than 10, or, for more stringent privacy
guarantees, less than 1). However, this is only an upper bound, and a large
value of epsilon may still mean good practical privacy.
* δ bounds the probability of an arbitrary change in model behavior. You can
usually set this to a very small number (1e-7 or so) without compromising
utility. A rule of thumb is to set it to be less than the inverse of the
training data size.
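
To make these two quantities concrete, here is a minimal illustrative sketch;
the training set size and privacy parameters below are hypothetical, not
recommendations. It checks the rule of thumb for δ and shows the e^ε factor
that bounds how much the probability of any particular output can change.

```python
import math

# Hypothetical values for illustration only; substitute your own.
n_train = 60_000   # number of training examples
epsilon = 2.0      # target privacy budget
delta = 1e-5       # target delta

# Rule of thumb: delta should be smaller than the inverse of the training data size.
assert delta < 1.0 / n_train  # 1e-5 < ~1.67e-5, so this delta is acceptable.

# epsilon bounds the multiplicative change in the probability of any output when
# a single training example is added or removed (up to the additive slack delta).
print(f"Any output probability can grow by at most a factor of {math.exp(epsilon):.1f}"
      " (plus delta) when one example changes.")
```
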
The relationship between training hyperparameters and the resulting privacy in
terms of (ε, δ) is complicated and tricky to state explicitly. Our current
recommended approach, described at the bottom of the
[Get Started page](get_started.md), involves finding the maximum noise
multiplier one can use while still having reasonable utility, and then scaling
the noise multiplier and number of microbatches. TensorFlow Privacy provides a
tool, `compute_dp_sgd_privacy`, to compute (ε, δ) based on the noise multiplier
σ, the number of training steps taken, and the fraction of input data consumed
at each step. The amount of privacy increases with the noise multiplier σ and
decreases the more times the data is used during training. Generally, in order
to achieve an epsilon of at most
10.0, we need to set the noise multiplier to around 0.3 to 0.5, depending on the
dataset size and number of epochs. See the
[classification privacy tutorial](../tutorials/classification_privacy.ipynb) for
a walkthrough of the approach.
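
As a rough illustration of this procedure, the sketch below scans a few
candidate noise multipliers and reports the ε each one achieves, following the
call pattern used in the classification privacy tutorial; the training set
size, batch size, number of epochs, and δ below are placeholder values, not
recommendations, so substitute those of your own setup.

```python
from tensorflow_privacy.privacy.analysis import compute_dp_sgd_privacy

# Placeholder training setup; replace with your own values.
n_train = 60_000   # number of training examples
batch_size = 256
epochs = 20
delta = 1e-5       # kept below 1 / n_train, per the rule of thumb above

# Scan candidate noise multipliers and report the epsilon each one achieves.
for noise_multiplier in (0.3, 0.5, 0.7, 1.0):
  eps, _ = compute_dp_sgd_privacy.compute_dp_sgd_privacy(
      n=n_train,
      batch_size=batch_size,
      noise_multiplier=noise_multiplier,
      epochs=epochs,
      delta=delta)
  print(f'noise_multiplier={noise_multiplier}: epsilon={eps:.2f}')
```

In practice you would keep the smallest noise multiplier that still trains to
acceptable utility, and report the corresponding ε and δ as the model's privacy
guarantee.
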
For more detail, see
[the original DP-SGD paper](https://arxiv.org/pdf/1607.00133.pdf).

You can use `compute_dp_sgd_privacy` to find out the epsilon given a fixed
delta value for your model, as in the
[classification privacy tutorial](../tutorials/classification_privacy.ipynb). It
takes the following inputs; a sketch relating them to the usual training
hyperparameters follows the list:
* `q` : The sampling ratio - the probability of an individual training point
being included in a mini-batch (`batch_size/number_of_examples`).
* `noise_multiplier` : A float that governs the amount of noise added during
training. Generally, more noise results in better privacy and lower utility.
* `steps` : The number of global steps taken.
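
For reference, here is a hedged sketch of how these three quantities can be
derived from ordinary training hyperparameters and passed to the Rényi DP
accountant that underlies the tool; it assumes the `compute_rdp` and
`get_privacy_spent` helpers from TensorFlow Privacy's `rdp_accountant` module,
and all concrete numbers are placeholders.

```python
from tensorflow_privacy.privacy.analysis.rdp_accountant import (
    compute_rdp, get_privacy_spent)

# Placeholder training setup; replace with your own values.
n_train = 60_000
batch_size = 256
epochs = 20
noise_multiplier = 1.1
delta = 1e-5

# Derive the accountant's inputs from the usual training hyperparameters.
q = batch_size / n_train                # sampling ratio
steps = epochs * n_train // batch_size  # number of global steps

# Accumulate Renyi DP over a range of orders, then convert to (epsilon, delta).
orders = [1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))
rdp = compute_rdp(q=q, noise_multiplier=noise_multiplier, steps=steps, orders=orders)
eps, _, _ = get_privacy_spent(orders, rdp, target_delta=delta)
print(f'DP-SGD with these settings gives epsilon = {eps:.2f} at delta = {delta}.')
```

The `compute_dp_sgd_privacy` tool performs essentially this bookkeeping for
you, so in practice you only need to supply the raw training hyperparameters.
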
A detailed writeup of the theory behind the computation of epsilon and delta is
available at
[Differential Privacy of the Sampled Gaussian Mechanism](https://arxiv.org/abs/1908.10530).