Fix lint errors in tensorflow_privacy/tutorials/walkthrough/README.md
.
PiperOrigin-RevId: 427030504

@@ -8,26 +8,28 @@ design machine learning algorithms that responsibly train models on private
data. Learning with differential privacy provides provable guarantees of privacy, mitigating the risk of exposing sensitive training data in machine learning. Intuitively, a model trained with differential privacy should not be affected by any single training example, or small set of training examples, in its data set.

You may recall our [previous blog post on PATE](http://www.cleverhans.io/privacy/2018/04/29/privacy-and-machine-learning.html), an approach that achieves private learning by carefully coordinating the activity of several different ML models [[Papernot et al.]](https://arxiv.org/abs/1610.05755). In this post, you will learn how to train a differentially private model with another approach that relies on Differentially Private Stochastic Gradient Descent (DP-SGD) [[Abadi et al.]](https://arxiv.org/abs/1607.00133). DP-SGD and PATE are two different ways to achieve the same goal of privacy-preserving machine learning. DP-SGD makes fewer assumptions about the ML task than PATE, but this comes at the expense of making modifications to the training algorithm.

Indeed, DP-SGD is a modification of the stochastic gradient descent algorithm, which is the basis for many optimizers that are popular in machine learning. Models trained with DP-SGD have provable privacy guarantees expressed in terms of differential privacy (we will explain what this means at the end of this post). We will be using the [TensorFlow Privacy](https://github.com/tensorflow/privacy) library, which provides an implementation of DP-SGD, to illustrate our presentation of DP-SGD and provide a hands-on tutorial.

The only prerequisite for following this tutorial is to be able to train a

@@ -36,13 +38,12 @@ convolutional neural networks or how to train them, we recommend reading
[this tutorial first](https://www.tensorflow.org/tutorials/keras/basic_classification) to get started with TensorFlow and machine learning.

Upon completing the tutorial presented in this post, you will be able to wrap existing optimizers (e.g., SGD, Adam, ...) into their differentially private counterparts using TensorFlow (TF) Privacy. You will also learn how to tune the parameters introduced by differentially private optimization. Finally, you will learn how to measure the privacy guarantees provided using analysis tools included in TF Privacy.

## Getting started

@@ -50,12 +51,14 @@ Before we get started with DP-SGD and TF Privacy, we need to put together a
script that trains a simple neural network with TensorFlow.

In the interest of keeping this tutorial focused on the privacy aspects of training, we've included such a script as companion code for this blog post in the `walkthrough` [subdirectory](https://github.com/tensorflow/privacy/tree/master/tutorials/walkthrough) of the `tutorials` directory in the [TensorFlow Privacy](https://github.com/tensorflow/privacy) repository. The code found in the file `mnist_scratch.py` trains a small convolutional neural network on the MNIST dataset for handwriting recognition. This script will be used as the basis for our exercise below.

Next, we highlight some important code snippets from the `mnist_scratch.py` script.
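
Those snippets sit in a part of the file outside this hunk. As a rough sketch of the definitions they cover, written against the TF 1.x API via `tensorflow.compat.v1` (the placeholder shapes here are assumptions for illustration), the loss is defined both per example and as a minibatch average; the names `vector_loss` and `scalar_loss` are the ones the optimizer snippets below refer to:

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Hypothetical stand-ins for the model outputs (logits) and integer labels.
logits = tf.placeholder(tf.float32, shape=[None, 10])
labels = tf.placeholder(tf.int32, shape=[None])

# Per-example ("vector") loss and its minibatch average ("scalar") loss.
vector_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
scalar_loss = tf.reduce_mean(vector_loss)
```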

@@ -116,10 +119,10 @@ python mnist_scratch.py

### Stochastic Gradient Descent

Before we dive into how DP-SGD and TF Privacy can be used to provide differential privacy during machine learning, we first provide a brief overview of the stochastic gradient descent algorithm, which is one of the most popular optimizers for neural networks.

Stochastic gradient descent is an iterative procedure. At each iteration, a batch of data is randomly sampled from the training set (this is where the

@@ -208,16 +211,16 @@ gradient manipulation later at step 4.

We are now ready to create an optimizer. In TensorFlow, an optimizer object can be instantiated by passing it a learning rate value, which is used in step 6 outlined above. This is what the code would look like *without* differential privacy:

```python
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
train_op = optimizer.minimize(loss=scalar_loss)
```

Note that our code snippet assumes that a TensorFlow flag was defined for the learning rate value.
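
For instance, such a flag could be defined with the `absl` flags module; the exact module and default value used by the tutorial are assumptions here:

```python
from absl import flags

# Hypothetical default; the flag name matches FLAGS.learning_rate used above.
flags.DEFINE_float('learning_rate', 0.15, 'Learning rate for training')

FLAGS = flags.FLAGS
```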

Now, we use the `optimizers.dp_optimizer` module of TF Privacy to implement the optimizer with differential privacy. Under the hood, this code implements steps

@@ -233,17 +236,18 @@ optimizer = optimizers.dp_optimizer.DPGradientDescentGaussianOptimizer(

```python
train_op = optimizer.minimize(loss=vector_loss)
```

In these two code snippets, we used the stochastic gradient descent optimizer, but it could be replaced by another optimizer implemented in TensorFlow. For instance, the `AdamOptimizer` can be replaced by `DPAdamGaussianOptimizer`. In addition to the standard optimizers already included in TF Privacy, most optimizers that are instances of a child class of `tf.train.Optimizer` can be made differentially private by calling `optimizers.dp_optimizer.make_gaussian_optimizer_class()`.
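
As a hedged sketch of that wrapping (the choice of `RMSPropOptimizer`, the import path, and the constructor values are illustrative assumptions, not code from the tutorial):

```python
import tensorflow.compat.v1 as tf
from tensorflow_privacy.privacy.optimizers import dp_optimizer

# Build a DP variant of RMSProp by wrapping the standard TF optimizer class.
DPRMSPropGaussianOptimizer = dp_optimizer.make_gaussian_optimizer_class(
    tf.train.RMSPropOptimizer)

optimizer = DPRMSPropGaussianOptimizer(
    l2_norm_clip=1.0,        # illustrative values, not tuned
    noise_multiplier=1.1,
    num_microbatches=256,
    learning_rate=0.001)
```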

As you can see, only one line needs to change, but there are a few things going on that are best to unwrap before we continue. In addition to the learning rate, we passed the size of the training set as the `population_size` parameter. This is used to measure the strength of privacy achieved; we will come back to this accounting aspect later.
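
As an aside on why the training set size matters for accounting (a hedged illustration, not code from the tutorial): the analysis is driven by the probability that any one example is included in a given minibatch, which is the ratio of the batch size to `population_size`:

```python
# Hypothetical MNIST-style values for illustration.
population_size = 60000                               # training set size
batch_size = 250
sampling_probability = batch_size / population_size   # ~0.0042 per step
```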

More importantly, TF Privacy introduces three new hyperparameters to the

@@ -252,11 +256,11 @@ You may have deduced what `l2_norm_clip` and `noise_multiplier` are from the two
changes outlined above.

The `l2_norm_clip` parameter is the maximum Euclidean norm of each individual gradient that is computed on an individual training example from a minibatch. This parameter is used to bound the optimizer's sensitivity to individual training points. Note how, in order for the optimizer to be able to compute these per-example gradients, we must pass it a *vector* loss as defined previously, rather than the loss averaged over the entire minibatch.
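
To make the clipping concrete, here is a toy NumPy illustration (not TF Privacy internals) of what bounding a per-example gradient to `l2_norm_clip` means:

```python
import numpy as np

l2_norm_clip = 1.0
per_example_grad = np.array([3.0, 4.0])   # L2 norm is 5.0

# Rescale the gradient only if its norm exceeds the clipping threshold.
norm = np.linalg.norm(per_example_grad)
clipped_grad = per_example_grad * min(1.0, l2_norm_clip / norm)

print(clipped_grad)   # [0.6 0.8], L2 norm is now 1.0
```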

Next, the `noise_multiplier` parameter is used to control how much noise is sampled and added to gradients before they are applied by the optimizer.
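
Continuing the toy illustration above (again a sketch of the idea, not the library's implementation), the noise is Gaussian with a standard deviation scaled by both knobs, roughly `noise_multiplier * l2_norm_clip`, and is added to the clipped gradients before the parameter update:

```python
noise_multiplier = 1.1

# Gaussian noise whose scale depends on both hyperparameters.
noise = np.random.normal(
    loc=0.0,
    scale=noise_multiplier * l2_norm_clip,
    size=clipped_grad.shape)
noisy_grad = clipped_grad + noise
```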

@@ -320,9 +324,9 @@ single training example in the training set. This could mean to add a training
example, remove a training example, or change the values within one training example. The intuition is that if a single training point does not affect the outcome of learning, the information contained in that training point cannot be memorized and the privacy of the individual who contributed this data point to our dataset is respected. We often refer to this probability as the privacy budget: smaller privacy budgets correspond to stronger privacy guarantees.
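
For reference, here is the standard `(epsilon, delta)` statement that this paragraph paraphrases (a textbook definition, not notation specific to TF Privacy). For a randomized training algorithm `M`, any two datasets `d` and `d'` differing in a single training example, and any set `S` of possible trained models:

```latex
\Pr[\,M(d) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(d') \in S\,] + \delta
```

The parameter `epsilon` bounds the multiplicative change in probabilities, and `delta` allows a small additive slack; both reappear below when we derive concrete guarantees with the accountant.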

The accounting required to compute the privacy budget spent to train our machine learning model is another feature provided by TF Privacy. Knowing what level of

@@ -348,15 +352,16 @@ steps = FLAGS.epochs * 60000 // FLAGS.batch_size

At a high level, the privacy analysis measures how including or excluding any particular point in the training data is likely to change the probability that we learn any particular set of parameters. In other words, the analysis measures the difference between the distributions of model parameters on neighboring training sets (pairs of any training sets with a Hamming distance of 1). In TF Privacy, we use the Rényi divergence to measure this distance between distributions. Indeed, our analysis is performed in the framework of Rényi Differential Privacy (RDP), which is a generalization of pure differential privacy [[Mironov]](https://arxiv.org/abs/1702.07476). RDP is a useful tool here because it is particularly well suited to analyze the differential privacy guarantees provided by sampling followed by Gaussian noise addition, which is how gradients are randomized in the TF Privacy implementation of the DP-SGD optimizer.
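
As background (a standard definition, not TF Privacy-specific notation), the Rényi divergence of order `alpha > 1` between two distributions `P` and `Q` is

```latex
D_{\alpha}(P \,\|\, Q) \;=\; \frac{1}{\alpha - 1} \log\, \mathbb{E}_{x \sim Q}\!\left[\left(\frac{P(x)}{Q(x)}\right)^{\alpha}\right]
```

and a mechanism satisfies `(alpha, epsilon)`-RDP when this divergence between its output distributions on any pair of neighboring training sets is at most `epsilon` [[Mironov]](https://arxiv.org/abs/1702.07476).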

We will express our differential privacy guarantee using two parameters: `epsilon` and `delta`.

@@ -374,21 +379,22 @@ We will express our differential privacy guarantee using two parameters:
still mean good practical privacy.

The TF Privacy library provides two methods for deriving the privacy guarantees achieved from the three parameters outlined in the last code snippet: `compute_rdp` and `get_privacy_spent`. These methods are found in its `analysis.rdp_accountant` module. Here is how to use them.

First, we need to define a list of orders at which the Rényi divergence will be computed. While some finer points of how to use the RDP accountant are outside the scope of this document, it is useful to keep in mind the following. First, there is very little downside in expanding the list of orders for which RDP is computed. Second, the computed privacy budget is typically not very sensitive to the exact value of the order (being close enough will land you in the right neighborhood). Finally, if you are targeting a particular range of epsilons (say, 1 to 10) and your delta is fixed (say, `10^-5`), then your orders must cover the range between `1+ln(1/delta)/10≈2.15` and `1+ln(1/delta)/1≈12.5`. This last rule may appear circular (how do you know what privacy parameters you get without running the privacy accountant?!), but one or two adjustments of the range of the orders would usually suffice.

```python
orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
```
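
The remainder of this snippet falls outside the hunk. As a hedged end-to-end sketch of how the two accountant methods fit together (the import path and hyperparameter values are assumptions for illustration, not the tutorial's exact flags):

```python
# Sketch only: import path and values are assumptions, not the tutorial's code.
from tensorflow_privacy.privacy.analysis.rdp_accountant import (
    compute_rdp, get_privacy_spent)

batch_size = 250                    # illustrative hyperparameters
noise_multiplier = 1.3
epochs = 15
n = 60000                           # MNIST training set size

steps = epochs * n // batch_size
sampling_probability = batch_size / n

orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))

# RDP of the subsampled Gaussian mechanism, composed over all training steps.
rdp = compute_rdp(q=sampling_probability,
                  noise_multiplier=noise_multiplier,
                  steps=steps,
                  orders=orders)

# Convert the RDP curve into an (epsilon, delta) guarantee at a fixed delta.
epsilon = get_privacy_spent(orders, rdp, target_delta=1e-5)[0]
print('epsilon =', epsilon)
```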

@@ -408,13 +414,11 @@ epsilon = get_privacy_spent(orders, rdp, target_delta=1e-5)[0]

Running the code snippets above with the hyperparameter values used during training will estimate the `epsilon` value that was achieved by the differentially private optimizer, and thus the strength of the privacy guarantee which comes with the model we trained. Once we have computed the value of `epsilon`, interpreting this value is at times difficult. One possibility is to purposely insert secrets in the model's training set and measure how likely they are to be leaked by a differentially private model (compared to a non-private model) at inference time [[Carlini et al.]](https://arxiv.org/abs/1802.08232).

### Putting all the pieces together

@@ -425,7 +429,7 @@ achieved.

However, in case you ran into an issue or you'd like to see what a complete implementation looks like, the "solution" to the tutorial presented in this blog post can be [found](https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial.py) in the tutorials directory of TF Privacy. It is the script called `mnist_dpsgd_tutorial.py`.