forked from 626_privacy/tensorflow_privacy
Add the getting started page to the external Tensorflow Responsible AI Guide.
PiperOrigin-RevId: 389961144
parent c447a1a3c2
commit 26f3d8368f
2 changed files with 89 additions and 1 deletions
@@ -1,3 +1,91 @@
# Get Started

## Tips
Using TF Privacy

This document assumes you are already familiar with differential privacy and
have determined that you would like to use TF Privacy to achieve differential
privacy guarantees in your model(s). If you’re not familiar with differential
privacy, please review
[the overview page](https://tensorflow.org/responsible_ai/privacy/guide). After
installing TF Privacy, get started by following these steps:

## 1. Choose a differentially private version of an existing Optimizer

If you’re currently using a TensorFlow
[optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers), you
will most likely want to select an Optimizer with the name `DPKeras*Optimizer`,
such as [`DPKerasAdamOptimizer`] in [`TF Privacy`].

Optionally, you may try vectorized optimizers like
[`tf_privacy.VectorizedDPKerasAdamOptimizer`] for a possible speed improvement
(in terms of global steps per second). In experiments, vectorized optimizers
have provided inconsistent speedups, and the reasons are not yet well
understood. As before, you will most likely want to use an optimizer analogous
to the one you're using now. These vectorized optimizers use TensorFlow's
`vectorized_map` operator, which may not work with some other TensorFlow
operators. If this is the case for you, please
[open an issue on the TF Privacy GitHub repository](https://github.com/tensorflow/privacy/issues).

## 2. Compute loss for your input minibatch

When computing the loss for your input minibatch, make sure it is a vector with
one entry per example, instead of aggregating it into a scalar. This is
necessary since DP-SGD must be able to compute the loss for individual
microbatches.
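To make the difference between a vector loss and a scalar loss concrete, here is a minimal framework-free sketch. The helper names `per_example_squared_error` and `microbatch_losses` are hypothetical and are not part of TF Privacy; the sketch also assumes the minibatch splits evenly into microbatches.

```python
from statistics import fmean


def per_example_squared_error(predictions, targets):
    """Return one loss value per example (a vector), not a single scalar."""
    return [(p - t) ** 2 for p, t in zip(predictions, targets)]


def microbatch_losses(example_losses, num_microbatches):
    """Average the per-example losses within each equally sized microbatch."""
    if len(example_losses) % num_microbatches != 0:
        raise ValueError("minibatch size must be divisible by num_microbatches")
    size = len(example_losses) // num_microbatches
    return [fmean(example_losses[i:i + size])
            for i in range(0, len(example_losses), size)]


predictions = [0.5, 2.0, 3.0, 5.0]
targets = [1.0, 2.0, 1.0, 4.0]

# Vector loss: one entry per example, which is what a DP optimizer needs.
vector_loss = per_example_squared_error(predictions, targets)

# Scalar loss: what a default "mean" reduction would produce. Once the loss
# is collapsed like this, per-microbatch values can no longer be recovered.
scalar_loss = fmean(vector_loss)

# Only the vector form can be regrouped into microbatches.
two_microbatches = microbatch_losses(vector_loss, num_microbatches=2)
```

Only the vector form can be regrouped into microbatches after the fact, which is why the scalar reduction has to be switched off.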

## 3. Train your model

Train your model using the DP Optimizer (step 1) and vectorized loss (step 2).
There are two options for doing this:

- Pass the optimizer and loss as arguments to `Model.compile` before calling
  `Model.fit`.
- When writing a custom training loop, use `Optimizer.minimize()` on the
  vectorized loss.

Once this is done, it’s recommended that you tune your hyperparameters. For a
complete walkthrough, see the
[classification privacy tutorial](../tutorials/classification_privacy.ipynb).
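As a rough sketch of the `Model.compile` route, steps 1 to 3 might be combined as below. Several things here are assumptions: that `tensorflow` and `tensorflow_privacy` are installed in mutually compatible versions, the toy model architecture, and the function name `build_dp_model`, which is hypothetical. The default hyperparameter values are only the starting points suggested in this guide, not recommendations for a private model.

```python
def build_dp_model(l2_norm_clip=1.0, noise_multiplier=0.01,
                   num_microbatches=1, learning_rate=0.001):
    """Build and compile a small Keras model with a DP optimizer (sketch)."""
    # Imports live inside the function so that defining it does not require
    # TensorFlow; calling it does.
    import tensorflow as tf
    import tensorflow_privacy

    # Toy architecture, chosen only for illustration.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(2),
    ])

    # Step 1: the DP counterpart of the Adam optimizer.
    optimizer = tensorflow_privacy.DPKerasAdamOptimizer(
        l2_norm_clip=l2_norm_clip,
        noise_multiplier=noise_multiplier,
        num_microbatches=num_microbatches,
        learning_rate=learning_rate)

    # Step 2: keep the loss as a vector with one entry per example.
    loss = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.losses.Reduction.NONE)

    # Step 3: pass both to Model.compile before calling Model.fit.
    model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
    return model
```

When calling `Model.fit` on the returned model, the batch size should be divisible by `num_microbatches`.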

## 4. Tune the DP-SGD hyperparameters

All `tf_privacy` optimizers take three additional hyperparameters:

* `l2_norm_clip` or $C$ - Clipping norm (the maximum Euclidean (L2) norm of
  each individual microbatch gradient within a minibatch).
* `noise_multiplier` or $σ$ - Ratio of the standard deviation of the added
  noise to the clipping norm.
* `num_microbatches` or $B$ - Number of microbatches into which each minibatch
  is split.

Generally, the lower the effective standard deviation $σC / B$, the better the
performance of the trained model on its evaluation metrics.
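The following framework-free sketch shows where the effective standard deviation $σC / B$ comes from. It is a simplified illustration of one DP-SGD aggregation step, not TF Privacy's actual implementation. The function names are hypothetical, and gradients are plain Python lists.

```python
import math
import random


def clip_by_l2_norm(gradient, l2_norm_clip):
    """Scale a gradient down so that its L2 norm is at most l2_norm_clip."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
    return [g * scale for g in gradient]


def noisy_average_gradient(microbatch_gradients, l2_norm_clip,
                           noise_multiplier, rng):
    """Clip each microbatch gradient, sum, add Gaussian noise, then average."""
    num_microbatches = len(microbatch_gradients)
    clipped = [clip_by_l2_norm(g, l2_norm_clip) for g in microbatch_gradients]
    summed = [sum(column) for column in zip(*clipped)]
    # Noise with standard deviation sigma * C is added to the *sum*.
    noise_stddev = noise_multiplier * l2_norm_clip
    noised = [s + rng.gauss(0.0, noise_stddev) for s in summed]
    # Dividing by B shrinks the noise on the average to sigma * C / B.
    return [value / num_microbatches for value in noised]


def effective_stddev(l2_norm_clip, noise_multiplier, num_microbatches):
    """Standard deviation of the noise on the averaged gradient."""
    return noise_multiplier * l2_norm_clip / num_microbatches


gradients = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
noiseless = noisy_average_gradient(
    gradients, l2_norm_clip=1.0, noise_multiplier=0.0, rng=random.Random(0))
```

With `noise_multiplier=0.0`, the result is just the average of the clipped gradients, which makes the clipping step easy to inspect in isolation.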

The three new DP-SGD hyperparameters have the following effects and tradeoffs:

1. The number of microbatches $B$: Generally, increasing this will improve
   utility because it lowers the standard deviation of the noise. However, it
   will increase wall-clock training time.
2. The clipping norm $C$: Since the standard deviation of the noise scales with
   $C$, it is probably best to set $C$ to be some quantile (e.g. median, 75th
   percentile, 90th percentile) of the gradient norms. Having too large a value
   of $C$ adds unnecessarily large amounts of noise.
3. The noise multiplier $σ$: Of the three hyperparameters, the amount of
   privacy depends only on the noise multiplier. The larger the noise
   multiplier, the more privacy is obtained; however, this also comes with a
   loss of utility.
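As an illustration of picking $C$ from a quantile of gradient norms, here is a small sketch using a nearest-rank quantile. The gradient values are made up, and the helper names `l2_norm` and `norm_quantile` are hypothetical. How the gradient norms would be collected in a real training run is left open.

```python
import math


def l2_norm(gradient):
    """Euclidean (L2) norm of a gradient given as a list of floats."""
    return math.sqrt(sum(g * g for g in gradient))


def norm_quantile(gradients, q):
    """Nearest-rank q-quantile (0 < q <= 1) of the gradients' L2 norms."""
    norms = sorted(l2_norm(g) for g in gradients)
    rank = max(1, math.ceil(q * len(norms)))
    return norms[rank - 1]


# Made-up gradients with L2 norms 5, 1, 2, 3 and 4.
sample_gradients = [[3.0, 4.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0], [4.0, 0.0]]

median_clip = norm_quantile(sample_gradients, 0.5)
p75_clip = norm_quantile(sample_gradients, 0.75)
p90_clip = norm_quantile(sample_gradients, 0.9)
```

A lower quantile clips more gradients but adds less noise. A higher quantile clips fewer gradients at the cost of more noise.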

These tradeoffs between utility, privacy, and speed in terms of steps/second are
summarized here:

![tradeoffs](./images/getting-started-img.png)

Follow these suggestions to find the optimal hyperparameters:

* Set $C$ to a quantile as recommended above. A value of 1.00 often works
  well.
* Set $B$ = 1, for maximum training speed.
* Experiment to find the largest value of $σ$ that still gives acceptable
  utility. Generally, values of 0.01 or lower have been observed to work well.
* Once a suitable value of $σ$ is found, scale both $B$ and $σ$ by a constant
  to achieve a reasonable level of privacy.
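The last suggestion works because scaling $B$ and $σ$ by the same constant leaves the effective standard deviation $σC / B$ unchanged while raising the noise multiplier that determines privacy. A minimal sketch, with hypothetical function names and an arbitrarily chosen scaling factor:

```python
def effective_stddev(l2_norm_clip, noise_multiplier, num_microbatches):
    """Effective noise standard deviation sigma * C / B."""
    return noise_multiplier * l2_norm_clip / num_microbatches


def scale_noise_and_microbatches(noise_multiplier, num_microbatches, factor):
    """Scale sigma and B by the same positive integer factor."""
    return noise_multiplier * factor, num_microbatches * factor


clip = 1.0
base_sigma, base_b = 0.01, 1  # starting point found with B = 1

# The factor 128 is arbitrary and only for illustration.
sigma, b = scale_noise_and_microbatches(base_sigma, base_b, factor=128)

before = effective_stddev(clip, base_sigma, base_b)
after = effective_stddev(clip, sigma, b)
```

Here `sigma` grows from 0.01 to 1.28 while `before` and `after` stay equal. In practice the minibatch size must remain divisible by the new $B$, and a larger $B$ slows training.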
BIN
g3doc/guide/images/getting-started-img.png
Normal file
Binary file not shown.
Size: 109 KiB