Fix lint errors in tensorflow_privacy/tutorials/walkthrough/README.md
.
PiperOrigin-RevId: 427030504
parent ceced43d0b
commit 5dc3475e17
1 changed file with 90 additions and 86 deletions

@@ -8,26 +8,28 @@ design machine learning algorithms that responsibly train models on private
data. Learning with differential privacy provides provable guarantees of
privacy, mitigating the risk of exposing sensitive training data in machine
learning. Intuitively, a model trained with differential privacy should not be
affected by any single training example, or small set of training examples, in
its data set.

You may recall our
[previous blog post on PATE](http://www.cleverhans.io/privacy/2018/04/29/privacy-and-machine-learning.html),
an approach that achieves private learning by carefully coordinating the
activity of several different ML models
[[Papernot et al.]](https://arxiv.org/abs/1610.05755). In this post, you will
learn how to train a differentially private model with another approach that
relies on Differentially Private Stochastic Gradient Descent (DP-SGD)
[[Abadi et al.]](https://arxiv.org/abs/1607.00133). DP-SGD and PATE are two
different ways to achieve the same goal of privacy-preserving machine learning.
DP-SGD makes fewer assumptions about the ML task than PATE, but this comes at
the expense of making modifications to the training algorithm.

Indeed, DP-SGD is a modification of the stochastic gradient descent algorithm,
which is the basis for many optimizers that are popular in machine learning.
Models trained with DP-SGD have provable privacy guarantees expressed in terms
of differential privacy (we will explain what this means at the end of this
post). We will be using the
[TensorFlow Privacy](https://github.com/tensorflow/privacy) library, which
provides an implementation of DP-SGD, to illustrate our presentation of DP-SGD
and provide a hands-on tutorial.

The only prerequisite for following this tutorial is to be able to train a

@@ -36,13 +38,12 @@ convolutional neural networks or how to train them, we recommend reading
[this tutorial first](https://www.tensorflow.org/tutorials/keras/basic_classification)
to get started with TensorFlow and machine learning.

Upon completing the tutorial presented in this post, you will be able to wrap
existing optimizers (e.g., SGD, Adam, ...) into their differentially private
counterparts using TensorFlow (TF) Privacy. You will also learn how to tune the
parameters introduced by differentially private optimization. Finally, we will
learn how to measure the privacy guarantees provided, using analysis tools
included in TF Privacy.

## Getting started

@@ -50,12 +51,14 @@ Before we get started with DP-SGD and TF Privacy, we need to put together a
script that trains a simple neural network with TensorFlow.

In the interest of keeping this tutorial focused on the privacy aspects of
training, we've included such a script as companion code for this blog post in
the `walkthrough`
[subdirectory](https://github.com/tensorflow/privacy/tree/master/tutorials/walkthrough)
of the `tutorials` directory found in the
[TensorFlow Privacy](https://github.com/tensorflow/privacy) repository. The code
found in the file `mnist_scratch.py` trains a small convolutional neural network
on the MNIST dataset for handwriting recognition. This script will be used as
the basis for our exercise below.

Next, we highlight some important code snippets from the `mnist_scratch.py`
script.
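
Those snippets fall in the elided portions of this diff. For orientation only, a
model definition in that style might look roughly like the following sketch
(TF 1.x `tf.layers` API; the layer shapes are illustrative and not necessarily
those of `mnist_scratch.py`):

```python
import tensorflow as tf


def cnn_model(features):
  """Illustrative small CNN for 28x28 MNIST images (not the actual tutorial code)."""
  images = tf.reshape(features, [-1, 28, 28, 1])
  y = tf.layers.conv2d(images, 16, 8, strides=2, padding='same',
                       activation=tf.nn.relu)
  y = tf.layers.max_pooling2d(y, 2, 1)
  y = tf.layers.conv2d(y, 32, 4, strides=2, padding='valid',
                       activation=tf.nn.relu)
  y = tf.layers.max_pooling2d(y, 2, 1)
  y = tf.layers.flatten(y)
  y = tf.layers.dense(y, 32, activation=tf.nn.relu)
  logits = tf.layers.dense(y, 10)  # one logit per digit class
  return logits
```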

@@ -116,10 +119,10 @@ python mnist_scratch.py

### Stochastic Gradient Descent

Before we dive into how DP-SGD and TF Privacy can be used to provide
differential privacy during machine learning, we first provide a brief overview
of the stochastic gradient descent algorithm, which is one of the most popular
optimizers for neural networks.

Stochastic gradient descent is an iterative procedure. At each iteration, a
batch of data is randomly sampled from the training set (this is where the
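
(The remainder of this paragraph, and the numbered steps it introduces, fall
outside this diff hunk.) As a rough reference while reading, the overall loop
can be sketched on a toy problem in plain NumPy; none of the names or values
below come from the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                      # toy inputs
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=1000)        # toy targets

w = np.zeros(5)                                      # model parameters
learning_rate, batch_size = 0.1, 32
for step in range(500):
    idx = rng.integers(0, len(X), size=batch_size)   # sample a random minibatch
    xb, yb = X[idx], y[idx]
    grad = 2 * xb.T @ (xb @ w - yb) / batch_size     # gradient of the mean squared error
    w -= learning_rate * grad                        # descend along the gradient
```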

@@ -208,16 +211,16 @@ gradient manipulation later at step 4.

We are now ready to create an optimizer. In TensorFlow, an optimizer object can
be instantiated by passing it a learning rate value, which is used in step 6
outlined above. This is what the code would look like *without* differential
privacy:

```python
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
train_op = optimizer.minimize(loss=scalar_loss)
```

Note that our code snippet assumes that a TensorFlow flag was defined for the
learning rate value.
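
For completeness, such a flag is commonly defined with `absl.flags`; the default
value below is just a placeholder, not necessarily the one used in the tutorial:

```python
from absl import flags

# Placeholder default value; define one flag per tunable hyperparameter.
flags.DEFINE_float('learning_rate', 0.15, 'Learning rate for training')
FLAGS = flags.FLAGS
```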

Now, we use the `optimizers.dp_optimizer` module of TF Privacy to implement the
optimizer with differential privacy. Under the hood, this code implements steps

@@ -233,17 +236,18 @@ optimizer = optimizers.dp_optimizer.DPGradientDescentGaussianOptimizer(
```python
train_op = optimizer.minimize(loss=vector_loss)
```
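
The constructor arguments sit in the elided part of this hunk. Pieced together
from the parameters discussed in this post, the full call would look roughly
like the sketch below; the flag names and the `num_microbatches` argument are
assumptions, and the exact signature may differ across TF Privacy versions.

```python
# Sketch only: flag names and num_microbatches are assumptions; population_size
# is the size of the MNIST training set, as discussed below.
optimizer = optimizers.dp_optimizer.DPGradientDescentGaussianOptimizer(
    l2_norm_clip=FLAGS.l2_norm_clip,
    noise_multiplier=FLAGS.noise_multiplier,
    num_microbatches=FLAGS.microbatches,
    learning_rate=FLAGS.learning_rate,
    population_size=60000)
train_op = optimizer.minimize(loss=vector_loss)
```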

In these two code snippets, we used the stochastic gradient descent optimizer,
but it could be replaced by another optimizer implemented in TensorFlow. For
instance, the `AdamOptimizer` can be replaced by `DPAdamGaussianOptimizer`. In
addition to the standard optimizers already included in TF Privacy, most
optimizers that are instances of a child class of `tf.train.Optimizer` can be
made differentially private by calling
`optimizers.dp_optimizer.make_gaussian_optimizer_class()`.
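
As a rough illustration of that wrapper (the hyperparameter values are
placeholders, and keyword arguments may differ across TF Privacy versions):

```python
# Hypothetical sketch: wrap a standard TensorFlow optimizer class into a
# differentially private one. The values below are placeholders.
DPAdamGaussianOptimizer = optimizers.dp_optimizer.make_gaussian_optimizer_class(
    tf.train.AdamOptimizer)
optimizer = DPAdamGaussianOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=1.1,
    num_microbatches=256,
    learning_rate=0.001)
```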

As you can see, only one line needs to change, but there are a few things going
on that are best to unwrap before we continue. In addition to the learning rate,
we passed the size of the training set as the `population_size` parameter. This
is used to measure the strength of privacy achieved; we will come back to this
accounting aspect later.

More importantly, TF Privacy introduces three new hyperparameters to the

@@ -252,11 +256,11 @@ You may have deduced what `l2_norm_clip` and `noise_multiplier` are from the two
changes outlined above.

Parameter `l2_norm_clip` is the maximum Euclidean norm of each individual
gradient that is computed on an individual training example from a minibatch.
This parameter is used to bound the optimizer's sensitivity to individual
training points. Note how, in order for the optimizer to be able to compute
these per-example gradients, we must pass it a *vector* loss as defined
previously, rather than the loss averaged over the entire minibatch.
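
For reference, the vector and scalar losses referred to here would be defined
along these lines (a sketch consistent with the names used in this post, not
necessarily the exact tutorial code):

```python
# Per-example (vector) cross-entropy loss, and its minibatch average (scalar).
vector_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
scalar_loss = tf.reduce_mean(vector_loss)
```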

Next, the `noise_multiplier` parameter is used to control how much noise is
sampled and added to gradients before they are applied by the optimizer.

@@ -320,9 +324,9 @@ single training example in the training set. This could mean to add a training
example, remove a training example, or change the values within one training
example. The intuition is that if a single training point does not affect the
outcome of learning, the information contained in that training point cannot be
memorized and the privacy of the individual who contributed this data point to
our dataset is respected. We often refer to this probability as the privacy
budget: smaller privacy budgets correspond to stronger privacy guarantees.

The accounting required to compute the privacy budget spent to train our machine
learning model is another feature provided by TF Privacy. Knowing what level of

@@ -348,15 +352,16 @@ steps = FLAGS.epochs * 60000 // FLAGS.batch_size

At a high level, the privacy analysis measures how including or excluding any
particular point in the training data is likely to change the probability that
we learn any particular set of parameters. In other words, the analysis measures
the difference between the distributions of model parameters on neighboring
training sets (pairs of any training sets with a Hamming distance of 1). In TF
Privacy, we use the Rényi divergence to measure this distance between
distributions. Indeed, our analysis is performed in the framework of Rényi
Differential Privacy (RDP), which is a generalization of pure differential
privacy [[Mironov]](https://arxiv.org/abs/1702.07476). RDP is a useful tool here
because it is particularly well suited to analyze the differential privacy
guarantees provided by sampling followed by Gaussian noise addition, which is
how gradients are randomized in the TF Privacy implementation of the DP-SGD
optimizer.

We will express our differential privacy guarantee using two parameters:
`epsilon` and `delta`.

@@ -374,21 +379,22 @@ We will express our differential privacy guarantee using two parameters:
still mean good practical privacy.

The TF Privacy library provides two methods for deriving the privacy guarantees
achieved from the three parameters outlined in the last code snippet:
`compute_rdp` and `get_privacy_spent`. These methods are found in its
`analysis.rdp_accountant` module. Here is how to use them.

First, we need to define a list of orders at which the Rényi divergence will be
computed. While some finer points of how to use the RDP accountant are outside
the scope of this document, it is useful to keep in mind the following. First,
there is very little downside in expanding the list of orders for which RDP is
computed. Second, the computed privacy budget is typically not very sensitive to
the exact value of the order (being close enough will land you in the right
neighborhood). Finally, if you are targeting a particular range of epsilons
(say, 1 to 10) and your delta is fixed (say, `10^-5`), then your orders must
cover the range between `1+ln(1/delta)/10≈2.15` and `1+ln(1/delta)/1≈12.5`. This
last rule may appear circular (how do you know what privacy parameters you get
without running the privacy accountant?!), but one or two adjustments of the
range of the orders would usually suffice.
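
One way to see where that range comes from: an RDP guarantee at order `alpha`
converts to an `(epsilon, delta)`-DP guarantee of roughly
`epsilon = epsilon_rdp + ln(1/delta)/(alpha - 1)`
[[Mironov]](https://arxiv.org/abs/1702.07476), so for a target `epsilon` the
most informative orders lie near `1 + ln(1/delta)/epsilon`.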

```python
orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
```
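
The calls to these two methods fall in the elided part of this diff. Below is a
sketch of how they are typically used, consistent with the flags and the `steps`
variable shown above; the import path, flag names, and `target_delta` value are
assumptions and may differ across TF Privacy versions.

```python
# Sketch only: module path, flag names, and target_delta are assumptions.
from tensorflow_privacy.privacy.analysis import rdp_accountant

sampling_probability = FLAGS.batch_size / 60000     # MNIST has 60000 training examples
rdp = rdp_accountant.compute_rdp(q=sampling_probability,
                                 noise_multiplier=FLAGS.noise_multiplier,
                                 steps=steps,
                                 orders=orders)
epsilon, _, opt_order = rdp_accountant.get_privacy_spent(orders, rdp,
                                                         target_delta=1e-5)
```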

@@ -409,12 +415,10 @@ Running the code snippets above with the hyperparameter values used during
training will estimate the `epsilon` value that was achieved by the
differentially private optimizer, and thus the strength of the privacy guarantee
which comes with the model we trained. Once we have computed the value of
`epsilon`, interpreting it can at times be difficult. One possibility is to
purposely insert secrets in the model's training set and measure how likely they
are to be leaked by a differentially private model (compared to a non-private
model) at inference time [[Carlini et al.]](https://arxiv.org/abs/1802.08232).

### Putting all the pieces together

@@ -425,7 +429,7 @@ achieved.

However, in case you ran into an issue or you'd like to see what a complete
implementation looks like, the "solution" to the tutorial presented in this blog
post can be
[found](https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial.py)
in the tutorials directory of TF Privacy. It is the script called
`mnist_dpsgd_tutorial.py`.