forked from 626_privacy/tensorflow_privacy

Add the getting started page to the external Tensorflow Responsible AI Guide.

PiperOrigin-RevId: 389961144

This commit is contained in:
parent c447a1a3c2
commit 26f3d8368f

2 changed files with 89 additions and 1 deletions

@@ -1,3 +1,91 @@
# Get Started

## Tips for using TF Privacy

This document assumes you are already familiar with differential privacy, and
have determined that you would like to implement TF Privacy to achieve
differential privacy guarantees in your model(s). If you’re not familiar with
differential privacy, please review
[the overview page](https://tensorflow.org/responsible_ai/privacy/guide). After
installing TF Privacy, get started by following these steps:

## 1. Choose a differentially private version of an existing Optimizer

If you’re currently using a TensorFlow
[optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers), you
will most likely want to select an Optimizer with the name `DPKeras*Optimizer`,
such as [`DPKerasAdamOptimizer`] in [`TF Privacy`].
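
For example, here is a minimal sketch of instantiating a DP optimizer, assuming
`tensorflow_privacy` is imported as `tf_privacy` (the hyperparameter values are
placeholders; see step 4 for how to choose them):

```python
import tensorflow_privacy as tf_privacy

# Placeholder hyperparameter values; see step 4 for tuning guidance.
optimizer = tf_privacy.DPKerasAdamOptimizer(
    l2_norm_clip=1.0,       # C: clipping norm for per-example gradients
    noise_multiplier=0.5,   # sigma: noise stddev as a ratio of l2_norm_clip
    num_microbatches=1,     # B: number of microbatches per minibatch
    learning_rate=0.001)
```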

Optionally, you may try vectorized optimizers like
[`tf_privacy.VectorizedDPKerasAdamOptimizer`] for a possible speed improvement
(in terms of global steps per second). The use of vectorized optimizers has been
found to provide inconsistent speedups in experiments, but is not yet well
understood. As before, you will most likely want to use an optimizer analogous
to the one you're using now. These vectorized optimizers use TensorFlow's
`vectorized_map` operator, which may not work with some other TensorFlow
operators. If this is the case for you, please
[open an issue on the TF Privacy GitHub repository](https://github.com/tensorflow/privacy/issues).
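
The vectorized variant is a drop-in replacement in the sketch above (same
constructor arguments, same placeholder values):

```python
# Same placeholder hyperparameters as above; only the class name changes.
optimizer = tf_privacy.VectorizedDPKerasAdamOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=0.5,
    num_microbatches=1,
    learning_rate=0.001)
```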
## 2. Compute loss for your input minibatch

When computing the loss for your input minibatch, make sure it is a vector with
one entry per example, instead of aggregating it into a scalar. This is
necessary since DP-SGD must be able to compute the loss for individual
microbatches.
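
For example, with a Keras loss you can keep per-example values by disabling
reduction (a sketch, assuming a classification model that outputs logits):

```python
import tensorflow as tf

# reduction=NONE returns one loss value per example rather than a scalar mean,
# which lets the DP optimizer split the minibatch into microbatches.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.losses.Reduction.NONE)
```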
## 3. Train your model

Train your model using the DP Optimizer (step 1) and vectorized loss (step 2).
There are two options for doing this:

- Pass the optimizer and loss as arguments to `Model.compile` before calling
  `Model.fit`, as sketched below.
- When writing a custom training loop, use `Optimizer.minimize()` on the
  vectorized loss.
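
Here is a minimal sketch of the first option. The model architecture,
`train_data`, and `train_labels` are hypothetical placeholders; `optimizer` and
`loss_fn` are the objects from steps 1 and 2:

```python
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10),  # logits output, matching from_logits=True
])

model.compile(optimizer=optimizer, loss=loss_fn, metrics=['accuracy'])

# Note: the batch size must be divisible by num_microbatches.
model.fit(train_data, train_labels, epochs=5, batch_size=250)
```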

Once this is done, it’s recommended that you tune your hyperparameters. For a
complete walkthrough, see the
[classification privacy tutorial](../tutorials/classification_privacy.ipynb).

## 4. Tune the DP-SGD hyperparameters

All `tf_privacy` optimizers take three additional hyperparameters:

* `l2_norm_clip` or $C$ - Clipping norm (the maximum Euclidean (L2) norm of
  each individual gradient computed per minibatch).
* `noise_multiplier` or $\sigma$ - Ratio of the standard deviation to the
  clipping norm.
* `num_microbatches` or $B$ - Number of microbatches into which each minibatch
  is split.

Generally, the lower the effective standard deviation $\sigma C / B$, the better
the performance of the trained model on its evaluation metrics.
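
For example, with the placeholder values used above ($C = 1.0$, $\sigma = 0.5$,
$B = 1$), the effective standard deviation is $0.5 \cdot 1.0 / 1 = 0.5$;
splitting the same minibatch into $B = 32$ microbatches would lower it to
$0.5 / 32 \approx 0.016$.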

The three new DP-SGD hyperparameters have the following effects and tradeoffs:

1. The number of microbatches $B$: Generally, increasing this will improve
   utility because it lowers the standard deviation of the noise. However, it
   will slow down training in terms of time.
2. The clipping norm $C$: Since the standard deviation of the noise scales with
   $C$, it is probably best to set $C$ to be some quantile (e.g. median, 75th
   percentile, 90th percentile) of the gradient norms. Having too large a value
   of $C$ adds unnecessarily large amounts of noise.
3. The noise multiplier $\sigma$: Of the three hyperparameters, the amount of
   privacy depends only on the noise multiplier. The larger the noise
   multiplier, the more privacy is obtained; however, this also comes with a
   loss of utility.

These tradeoffs between utility, privacy, and speed in terms of steps/second are
summarized here:

![tradeoffs](./images/getting-started-img.png)

Follow these suggestions to find the optimal hyperparameters:

* Set $C$ to a quantile as recommended above. A value of 1.00 often works
  well.
* Set $B = 1$ for maximum training speed.
* Experiment to find the largest value of $\sigma$ that still gives acceptable
  utility. Generally, values of 0.01 or lower have been observed to work well.
* Once a suitable value of $\sigma$ is found, scale both $B$ and $\sigma$ by a
  constant to achieve a reasonable level of privacy.
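
To check the privacy guarantee that a given configuration yields, you can use
TF Privacy's analysis tools. A sketch, assuming the `compute_dp_sgd_privacy`
module and hypothetical dataset and training parameters:

```python
from tensorflow_privacy.privacy.analysis import compute_dp_sgd_privacy

# Hypothetical values: 60000 training examples, batch size 250,
# noise multiplier 1.3, 15 epochs, delta fixed at 1e-5.
eps, _ = compute_dp_sgd_privacy.compute_dp_sgd_privacy(
    n=60000, batch_size=250, noise_multiplier=1.3, epochs=15, delta=1e-5)
print(f'epsilon = {eps:.2f}')
```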

BIN g3doc/guide/images/getting-started-img.png (new binary file, 109 KiB; not shown)