# Get Started

This document assumes you are already familiar with differential privacy, and
have determined that you would like to use TF Privacy to implement differential
privacy guarantees in your model(s). If you’re not familiar with differential
privacy, please review
[the overview page](https://tensorflow.org/responsible_ai/privacy/guide). After
installing TF Privacy, get started by following these steps:

## 1. Choose a differentially private version of an existing Optimizer

If you’re currently using a TensorFlow
[optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers), you
will most likely want to select an Optimizer with the name `DPKeras*Optimizer`,
such as `tf_privacy.DPKerasAdamOptimizer` in TF Privacy.
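
As a minimal sketch, importing TF Privacy as `tf_privacy` to match the names
used in this guide; the hyperparameter values shown are placeholders, not
recommendations:

```python
import tensorflow_privacy as tf_privacy

# Drop-in replacement for tf.keras.optimizers.Adam. The three
# DP-specific arguments are explained in step 4 of this guide.
optimizer = tf_privacy.DPKerasAdamOptimizer(
    l2_norm_clip=1.0,       # clipping norm C (placeholder)
    noise_multiplier=1.1,   # noise multiplier sigma (placeholder)
    num_microbatches=1,     # number of microbatches B (placeholder)
    learning_rate=0.001)
```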

Optionally, you may try vectorized optimizers like
`tf_privacy.VectorizedDPKerasAdamOptimizer` for a possible speed improvement
(in terms of global steps per second). The use of vectorized optimizers has
been found to provide inconsistent speedups in experiments, and this behavior
is not yet well understood. As before, you will most likely want to use an
optimizer analogous to the one you're using now. These vectorized optimizers
use TensorFlow's `vectorized_map` operator, which may not work with some other
TensorFlow operators. If this is the case for you, please
[open an issue on the TF Privacy GitHub repository](https://github.com/tensorflow/privacy/issues).
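
Under the same assumptions as the sketch above, the vectorized variant is a
drop-in replacement:

```python
# Same constructor arguments as the non-vectorized optimizer; only the
# class name changes.
optimizer = tf_privacy.VectorizedDPKerasAdamOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=1.1,
    num_microbatches=1,
    learning_rate=0.001)
```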

## 2. Compute loss for your input minibatch

When computing the loss for your input minibatch, make sure it is a vector with
one entry per example, instead of aggregating it into a scalar. This is
necessary since DP-SGD must be able to compute the loss for individual
microbatches.
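
With built-in Keras losses, this means disabling the default reduction to a
scalar. A sketch, assuming a classification model that outputs logits:

```python
import tensorflow as tf

# Reduction.NONE keeps one loss value per example instead of averaging
# over the minibatch, which DP-SGD needs for per-microbatch gradients.
loss = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True, reduction=tf.losses.Reduction.NONE)
```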

## 3. Train your model

Train your model using the DP Optimizer (step 1) and the vectorized loss
(step 2). There are two options for doing this:

* Pass the optimizer and loss as arguments to `Model.compile` before calling
  `Model.fit`, as in the sketch after this list.
* When writing a custom training loop, use `Optimizer.minimize()` on the
  vectorized loss.
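
A sketch of the first option, combining the pieces from the previous steps; the
model architecture and the randomly generated data are hypothetical
placeholders:

```python
import numpy as np
import tensorflow as tf
import tensorflow_privacy as tf_privacy

# Placeholder data: 1,000 random examples with 20 features, 10 classes.
train_data = np.random.normal(size=(1000, 20)).astype(np.float32)
train_labels = tf.keras.utils.to_categorical(
    np.random.randint(10, size=(1000,)), num_classes=10)

# Hypothetical toy model; substitute your own architecture.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(10)])

# DP optimizer from step 1 (placeholder hyperparameters).
optimizer = tf_privacy.DPKerasAdamOptimizer(
    l2_norm_clip=1.0, noise_multiplier=1.1, num_microbatches=1,
    learning_rate=0.001)

# Per-example (vectorized) loss from step 2.
loss = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True, reduction=tf.losses.Reduction.NONE)

model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
model.fit(train_data, train_labels, epochs=1, batch_size=100)
```

Note that the batch size must be divisible by `num_microbatches`, so that every
microbatch contains the same number of examples.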

Once this is done, it’s recommended that you tune your hyperparameters. For a
complete walkthrough, see the
[classification privacy tutorial](../tutorials/classification_privacy.ipynb).

## 4. Tune the DP-SGD hyperparameters

All `tf_privacy` optimizers take three additional hyperparameters:

* `l2_norm_clip` or $C$ - Clipping norm (the maximum Euclidean (L2) norm of
  each individual gradient computed per minibatch).
* `noise_multiplier` or $σ$ - Ratio of the standard deviation of the noise to
  the clipping norm.
* `num_microbatches` or $B$ - Number of microbatches into which each minibatch
  is split.

Generally, the lower the effective standard deviation $σC / B$, the better the
performance of the trained model on its evaluation metrics.
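
For example, with $σ = 1.0$, $C = 1.0$, and $B = 32$, the effective standard
deviation is $1.0 \times 1.0 / 32 ≈ 0.031$.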

The three new DP-SGD hyperparameters have the following effects and tradeoffs:

1.  The number of microbatches $B$: Generally, increasing this will improve
    utility because it lowers the standard deviation of the noise. However, it
    will slow down training in terms of time.
2.  The clipping norm $C$: Since the standard deviation of the noise scales
    with $C$, it is probably best to set $C$ to be some quantile (e.g. median,
    75th percentile, 90th percentile) of the gradient norms. Having too large a
    value of $C$ adds unnecessarily large amounts of noise.
3.  The noise multiplier $σ$: Of the three hyperparameters, the amount of
    privacy depends only on the noise multiplier. The larger the noise
    multiplier, the more privacy is obtained; however, this also comes with a
    loss of utility.

These tradeoffs between utility, privacy, and speed in terms of steps/second
are summarized here:

![tradeoffs](./images/getting-started-img.png)

Follow these suggestions to find the optimal hyperparameters:

* Set $C$ to a quantile as recommended above. A value of 1.00 often works
  well.
* Set $B = 1$, for maximum training speed.
* Experiment to find the largest value of $σ$ that still gives acceptable
  utility. Generally, values of 0.01 or lower have been observed to work well.
* Once a suitable value of $σ$ is found, scale both $B$ and $σ$ by the same
  constant to achieve a reasonable level of privacy. This leaves the effective
  standard deviation $σC / B$ (and hence utility) unchanged while increasing
  the noise multiplier, which is what determines the privacy guarantee.
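
Once you have settled on values, you can estimate the privacy guarantee they
yield. A sketch using TF Privacy's `compute_dp_sgd_privacy` analysis helper;
the dataset size, batch size, epoch count, and delta below are hypothetical
placeholders:

```python
import tensorflow_privacy as tf_privacy

# Hypothetical training setup: 60,000 examples, minibatches of 256,
# 15 epochs, and the noise multiplier found by the search above.
eps, _ = tf_privacy.compute_dp_sgd_privacy(
    n=60000,               # number of training examples
    batch_size=256,        # minibatch size
    noise_multiplier=1.1,  # the value of sigma you settled on
    epochs=15,
    delta=1e-5)            # target delta, typically <= 1/n
print(f'Achieved epsilon: {eps:.2f}')
```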