Fix lint errors in tensorflow_privacy/tutorials/walkthrough/README.md.

PiperOrigin-RevId: 427030504
Michael Reneer 2022-02-07 15:16:41 -08:00 committed by A. Unique TensorFlower
parent ceced43d0b
commit 5dc3475e17


@@ -8,26 +8,28 @@ design machine learning algorithms that responsibly train models on private
data. Learning with differential privacy provides provable guarantees of
privacy, mitigating the risk of exposing sensitive training data in machine
learning. Intuitively, a model trained with differential privacy should not be
affected by any single training example, or small set of training examples, in
its data set.

You may recall our
[previous blog post on PATE](http://www.cleverhans.io/privacy/2018/04/29/privacy-and-machine-learning.html),
an approach that achieves private learning by carefully coordinating the
activity of several different ML models
[[Papernot et al.]](https://arxiv.org/abs/1610.05755). In this post, you will
learn how to train a differentially private model with another approach that
relies on Differentially Private Stochastic Gradient Descent (DP-SGD)
[[Abadi et al.]](https://arxiv.org/abs/1607.00133). DP-SGD and PATE are two
different ways to achieve the same goal of privacy-preserving machine learning.
DP-SGD makes fewer assumptions about the ML task than PATE, but this comes at
the expense of making modifications to the training algorithm.

Indeed, DP-SGD is a modification of the stochastic gradient descent algorithm,
which is the basis for many optimizers that are popular in machine learning.
Models trained with DP-SGD have provable privacy guarantees expressed in terms
of differential privacy (we will explain what this means at the end of this
post). We will be using the
[TensorFlow Privacy](https://github.com/tensorflow/privacy) library, which
provides an implementation of DP-SGD, to illustrate our presentation of DP-SGD
and provide a hands-on tutorial.

The only prerequisite for following this tutorial is to be able to train a
@@ -36,13 +38,12 @@ convolutional neural networks or how to train them, we recommend reading
[this tutorial first](https://www.tensorflow.org/tutorials/keras/basic_classification)
to get started with TensorFlow and machine learning.

Upon completing the tutorial presented in this post, you will be able to wrap
existing optimizers (e.g., SGD, Adam, ...) into their differentially private
counterparts using TensorFlow (TF) Privacy. You will also learn how to tune the
parameters introduced by differentially private optimization. Finally, we will
learn how to measure the privacy guarantees provided using analysis tools
included in TF Privacy.

## Getting started
@@ -50,12 +51,14 @@ Before we get started with DP-SGD and TF Privacy, we need to put together a
script that trains a simple neural network with TensorFlow.

In the interest of keeping this tutorial focused on the privacy aspects of
training, we've included such a script as companion code for this blog post in
the `walkthrough`
[subdirectory](https://github.com/tensorflow/privacy/tree/master/tutorials/walkthrough)
of the `tutorials` directory found in the
[TensorFlow Privacy](https://github.com/tensorflow/privacy) repository. The code
found in the file `mnist_scratch.py` trains a small convolutional neural network
on the MNIST dataset for handwriting recognition. This script will be used as
the basis for our exercise below.

Next, we highlight some important code snippets from the `mnist_scratch.py`
script.
@@ -116,10 +119,10 @@ python mnist_scratch.py

### Stochastic Gradient Descent

Before we dive into how DP-SGD and TF Privacy can be used to provide
differential privacy during machine learning, we first provide a brief overview
of the stochastic gradient descent algorithm, which is one of the most popular
optimizers for neural networks.

Stochastic gradient descent is an iterative procedure. At each iteration, a
batch of data is randomly sampled from the training set (this is where the
@@ -208,16 +211,16 @@ gradient manipulation later at step 4.
We are now ready to create an optimizer. In TensorFlow, an optimizer object can
be instantiated by passing it a learning rate value, which is used in step 6
outlined above. This is what the code would look like *without* differential
privacy:

```python
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
train_op = optimizer.minimize(loss=scalar_loss)
```

Note that our code snippet assumes that a TensorFlow flag was defined for the
learning rate value.
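
Such a flag could be defined near the top of the script, for example with
`absl.flags`; the flag name matches the snippet above, while the default value
below is purely illustrative:

```python
from absl import flags

# Hypothetical definition of the learning-rate flag referenced above; the
# default value is illustrative, not taken from the tutorial code.
flags.DEFINE_float('learning_rate', 0.15, 'Learning rate for training')

FLAGS = flags.FLAGS
```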

Now, we use the `optimizers.dp_optimizer` module of TF Privacy to implement the
optimizer with differential privacy. Under the hood, this code implements steps
@@ -233,17 +236,18 @@ optimizer = optimizers.dp_optimizer.DPGradientDescentGaussianOptimizer(
train_op = optimizer.minimize(loss=vector_loss)
```

In these two code snippets, we used the stochastic gradient descent optimizer,
but it could be replaced by another optimizer implemented in TensorFlow. For
instance, the `AdamOptimizer` can be replaced by `DPAdamGaussianOptimizer`. In
addition to the standard optimizers already included in TF Privacy, most
optimizers that are instances of a child class of `tf.train.Optimizer` can be
made differentially private by calling
`optimizers.dp_optimizer.make_gaussian_optimizer_class()`, as sketched below.

Comparing the snippet *without* differential privacy to its differentially
private counterpart, you can see that only one line needs to change, but there
are a few things going on that are best to unpack before we continue. In
addition to the learning rate, we passed the size of the training set as the
`population_size` parameter. This is used to measure the strength of privacy
achieved; we will come back to this accounting aspect later.

More importantly, TF Privacy introduces three new hyperparameters to the
@@ -252,11 +256,11 @@ You may have deduced what `l2_norm_clip` and `noise_multiplier` are from the two
changes outlined above.

Parameter `l2_norm_clip` is the maximum Euclidean norm of each individual
gradient that is computed on an individual training example from a minibatch.
This parameter is used to bound the optimizer's sensitivity to individual
training points. Note that in order for the optimizer to be able to compute
these per-example gradients, we must pass it a *vector* loss as defined
previously, rather than the loss averaged over the entire minibatch.

Next, the `noise_multiplier` parameter is used to control how much noise is
sampled and added to gradients before they are applied by the optimizer.
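
To build some intuition for how these two hyperparameters work together, here
is a small self-contained sketch in plain NumPy (it is not the actual TF
Privacy implementation) of clipping per-example gradients and adding Gaussian
noise to their average:

```python
import numpy as np


def clip_and_noise(per_example_grads, l2_norm_clip, noise_multiplier):
  """Illustrative sketch of DP-SGD gradient processing, not TF Privacy code."""
  batch_size = len(per_example_grads)
  clipped = []
  for grad in per_example_grads:
    # Scale each per-example gradient so its Euclidean norm is at most
    # l2_norm_clip; this bounds the contribution of any single example.
    norm = np.linalg.norm(grad)
    clipped.append(grad * min(1.0, l2_norm_clip / max(norm, 1e-12)))
  mean_grad = np.mean(clipped, axis=0)
  # Add Gaussian noise whose standard deviation is proportional to both
  # noise_multiplier and l2_norm_clip, scaled here for the averaged gradient.
  stddev = noise_multiplier * l2_norm_clip / batch_size
  return mean_grad + np.random.normal(scale=stddev, size=mean_grad.shape)
```

The DP optimizer then applies this clipped, noised average gradient in place of
the ordinary minibatch gradient.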
@@ -320,9 +324,9 @@ single training example in the training set. This could mean adding a training
example, removing a training example, or changing the values within one
training example. The intuition is that if a single training point does not
affect the outcome of learning, the information contained in that training
point cannot be memorized and the privacy of the individual who contributed
this data point to our dataset is respected. We often refer to this probability
as the privacy budget: smaller privacy budgets correspond to stronger privacy
guarantees.

Accounting required to compute the privacy budget spent to train our machine
learning model is another feature provided by TF Privacy. Knowing what level of
@@ -348,15 +352,16 @@ steps = FLAGS.epochs * 60000 // FLAGS.batch_size
At a high level, the privacy analysis measures how including or excluding any
particular point in the training data is likely to change the probability that
we learn any particular set of parameters. In other words, the analysis measures
the difference between the distributions of model parameters on neighboring
training sets (pairs of any training sets with a Hamming distance of 1). In TF
Privacy, we use the Rényi divergence to measure this distance between
distributions. Indeed, our analysis is performed in the framework of Rényi
Differential Privacy (RDP), which is a generalization of pure differential
privacy [[Mironov]](https://arxiv.org/abs/1702.07476). RDP is a useful tool here
because it is particularly well suited to analyze the differential privacy
guarantees provided by sampling followed by Gaussian noise addition, which is
how gradients are randomized in the TF Privacy implementation of the DP-SGD
optimizer.

We will express our differential privacy guarantee using two parameters:
`epsilon` and `delta`.
@@ -374,21 +379,22 @@ We will express our differential privacy guarantee using two parameters:
still mean good practical privacy.

The TF Privacy library provides two methods for deriving the privacy guarantees
achieved from the three parameters outlined in the last code snippet:
`compute_rdp` and `get_privacy_spent`. These methods are found in its
`analysis.rdp_accountant` module. Here is how to use them.

To begin, we need to define a list of orders at which the Rényi divergence will
be computed. While some finer points of how to use the RDP accountant are
outside the scope of this document, it is useful to keep in mind the following.
First, there is very little downside in expanding the list of orders for which
RDP is computed. Second, the computed privacy budget is typically not very
sensitive to the exact value of the order (being close enough will land you in
the right neighborhood). Finally, if you are targeting a particular range of
epsilons (say, 1 to 10) and your delta is fixed (say, `10^-5`), then your orders
must cover the range between `1+ln(1/delta)/10≈2.15` and `1+ln(1/delta)/1≈12.5`.
This last rule may appear circular (how do you know what privacy parameters you
get without running the privacy accountant?!), but one or two adjustments of
the range of the orders will usually suffice.

```python
orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
@@ -409,12 +415,10 @@ Running the code snippets above with the hyperparameter values used during
training will estimate the `epsilon` value that was achieved by the
differentially private optimizer, and thus the strength of the privacy
guarantee that comes with the model we trained. Once we have computed the value
of `epsilon`, interpreting it is at times difficult. One possibility is to
purposely insert secrets into the model's training set and measure how likely
they are to be leaked by a differentially private model (compared to a
non-private model) at inference time
[[Carlini et al.]](https://arxiv.org/abs/1802.08232).

### Putting all the pieces together
@@ -425,7 +429,7 @@ achieved.
However, in case you ran into an issue or you'd like to see what a complete
implementation looks like, the "solution" to the tutorial presented in this
blog post can be
[found](https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial.py)
in the tutorials directory of TF Privacy. It is the script called
`mnist_dpsgd_tutorial.py`.