Fix lint errors in tensorflow_privacy/tutorials/walkthrough/README.md.

PiperOrigin-RevId: 427030504
Michael Reneer 2022-02-07 15:16:41 -08:00 committed by A. Unique TensorFlower
parent ceced43d0b
commit 5dc3475e17


@@ -8,26 +8,28 @@ design machine learning algorithms that responsibly train models on private
data. Learning with differential privacy provides provable guarantees of
privacy, mitigating the risk of exposing sensitive training data in machine
learning. Intuitively, a model trained with differential privacy should not be
affected by any single training example, or small set of training examples, in
its data set.

You may recall our
[previous blog post on PATE](http://www.cleverhans.io/privacy/2018/04/29/privacy-and-machine-learning.html),
an approach that achieves private learning by carefully coordinating the
activity of several different ML models
[[Papernot et al.]](https://arxiv.org/abs/1610.05755). In this post, you will
learn how to train a differentially private model with another approach that
relies on Differentially Private Stochastic Gradient Descent (DP-SGD)
[[Abadi et al.]](https://arxiv.org/abs/1607.00133). DP-SGD and PATE are two
different ways to achieve the same goal of privacy-preserving machine learning.
DP-SGD makes fewer assumptions about the ML task than PATE, but this comes at
the expense of making modifications to the training algorithm.

Indeed, DP-SGD is a modification of the stochastic gradient descent algorithm,
which is the basis for many optimizers that are popular in machine learning.
Models trained with DP-SGD have provable privacy guarantees expressed in terms
of differential privacy (we will explain what this means at the end of this
post). We will be using the
[TensorFlow Privacy](https://github.com/tensorflow/privacy) library, which
provides an implementation of DP-SGD, to illustrate our presentation of DP-SGD
and provide a hands-on tutorial.

The only prerequisite for following this tutorial is to be able to train a
@@ -36,13 +38,12 @@ convolutional neural networks or how to train them, we recommend reading
[this tutorial first](https://www.tensorflow.org/tutorials/keras/basic_classification)
to get started with TensorFlow and machine learning.

Upon completing the tutorial presented in this post, you will be able to wrap
existing optimizers (e.g., SGD, Adam, ...) into their differentially private
counterparts using TensorFlow (TF) Privacy. You will also learn how to tune the
parameters introduced by differentially private optimization. Finally, we will
learn how to measure the privacy guarantees provided using analysis tools
included in TF Privacy.

## Getting started
@@ -50,12 +51,14 @@ Before we get started with DP-SGD and TF Privacy, we need to put together a
script that trains a simple neural network with TensorFlow.

In the interest of keeping this tutorial focused on the privacy aspects of
training, we've included such a script as companion code for this blog post in
the `walkthrough`
[subdirectory](https://github.com/tensorflow/privacy/tree/master/tutorials/walkthrough)
of the `tutorials` directory found in the
[TensorFlow Privacy](https://github.com/tensorflow/privacy) repository. The code
found in the file `mnist_scratch.py` trains a small convolutional neural network
on the MNIST dataset for handwriting recognition. This script will be used as
the basis for our exercise below.

Next, we highlight some important code snippets from the `mnist_scratch.py`
script.
@@ -116,10 +119,10 @@ python mnist_scratch.py

### Stochastic Gradient Descent

Before we dive into how DP-SGD and TF Privacy can be used to provide
differential privacy during machine learning, we first provide a brief overview
of the stochastic gradient descent algorithm, which is one of the most popular
optimizers for neural networks.

Stochastic gradient descent is an iterative procedure. At each iteration, a
batch of data is randomly sampled from the training set (this is where the
@@ -208,16 +211,16 @@ gradient manipulation later at step 4.
We are now ready to create an optimizer. In TensorFlow, an optimizer object can
be instantiated by passing it a learning rate value, which is used in step 6
outlined above. This is what the code would look like *without* differential
privacy:

```python
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
train_op = optimizer.minimize(loss=scalar_loss)
```

Note that our code snippet assumes that a TensorFlow flag was defined for the
learning rate value.
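
Such a flag could be defined near the top of the script, for example with
`absl.flags`; the flag name matches the snippet above, while the default value
below is purely illustrative:

```python
from absl import flags

# Hypothetical definition of the learning-rate flag referenced above; the
# default value is illustrative, not taken from the tutorial code.
flags.DEFINE_float('learning_rate', 0.15, 'Learning rate for training')

FLAGS = flags.FLAGS
```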

Now, we use the `optimizers.dp_optimizer` module of TF Privacy to implement the
optimizer with differential privacy. Under the hood, this code implements steps
@@ -233,17 +236,18 @@ optimizer = optimizers.dp_optimizer.DPGradientDescentGaussianOptimizer(
train_op = optimizer.minimize(loss=vector_loss)
```

In these two code snippets, we used the stochastic gradient descent optimizer,
but it could be replaced by another optimizer implemented in TensorFlow. For
instance, the `AdamOptimizer` can be replaced by `DPAdamGaussianOptimizer`. In
addition to the standard optimizers already included in TF Privacy, most
optimizers that are instances of a child class of `tf.train.Optimizer` can be
made differentially private by calling
`optimizers.dp_optimizer.make_gaussian_optimizer_class()`, as sketched below.

Comparing the snippet *without* differential privacy to its differentially
private counterpart, you can see that only one line needs to change, but there
are a few things going on that are best to unpack before we continue. In
addition to the learning rate, we passed the size of the training set as the
`population_size` parameter. This is used to measure the strength of privacy
achieved; we will come back to this accounting aspect later.

More importantly, TF Privacy introduces three new hyperparameters to the
@@ -252,11 +256,11 @@ You may have deduced what `l2_norm_clip` and `noise_multiplier` are from the two
changes outlined above.

Parameter `l2_norm_clip` is the maximum Euclidean norm of each individual
gradient that is computed on an individual training example from a minibatch.
This parameter is used to bound the optimizer's sensitivity to individual
training points. Note that in order for the optimizer to be able to compute
these per-example gradients, we must pass it a *vector* loss as defined
previously, rather than the loss averaged over the entire minibatch.

Next, the `noise_multiplier` parameter is used to control how much noise is
sampled and added to gradients before they are applied by the optimizer.
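
To build some intuition for how these two hyperparameters work together, here
is a small self-contained sketch in plain NumPy (it is not the actual TF
Privacy implementation) of clipping per-example gradients and adding Gaussian
noise to their average:

```python
import numpy as np


def clip_and_noise(per_example_grads, l2_norm_clip, noise_multiplier):
  """Illustrative sketch of DP-SGD gradient processing, not TF Privacy code."""
  batch_size = len(per_example_grads)
  clipped = []
  for grad in per_example_grads:
    # Scale each per-example gradient so its Euclidean norm is at most
    # l2_norm_clip; this bounds the contribution of any single example.
    norm = np.linalg.norm(grad)
    clipped.append(grad * min(1.0, l2_norm_clip / max(norm, 1e-12)))
  mean_grad = np.mean(clipped, axis=0)
  # Add Gaussian noise whose standard deviation is proportional to both
  # noise_multiplier and l2_norm_clip, scaled here for the averaged gradient.
  stddev = noise_multiplier * l2_norm_clip / batch_size
  return mean_grad + np.random.normal(scale=stddev, size=mean_grad.shape)
```

The DP optimizer then applies this clipped, noised average gradient in place of
the ordinary minibatch gradient.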
@@ -320,9 +324,9 @@ single training example in the training set. This could mean adding a training
example, removing a training example, or changing the values within one
training example. The intuition is that if a single training point does not
affect the outcome of learning, the information contained in that training
point cannot be memorized and the privacy of the individual who contributed
this data point to our dataset is respected. We often refer to this probability
as the privacy budget: smaller privacy budgets correspond to stronger privacy
guarantees.

Accounting required to compute the privacy budget spent to train our machine
learning model is another feature provided by TF Privacy. Knowing what level of
@@ -348,15 +352,16 @@ steps = FLAGS.epochs * 60000 // FLAGS.batch_size
At a high level, the privacy analysis measures how including or excluding any
particular point in the training data is likely to change the probability that
we learn any particular set of parameters. In other words, the analysis measures
the difference between the distributions of model parameters on neighboring
training sets (pairs of any training sets with a Hamming distance of 1). In TF
Privacy, we use the Rényi divergence to measure this distance between
distributions. Indeed, our analysis is performed in the framework of Rényi
Differential Privacy (RDP), which is a generalization of pure differential
privacy [[Mironov]](https://arxiv.org/abs/1702.07476). RDP is a useful tool here
because it is particularly well suited to analyze the differential privacy
guarantees provided by sampling followed by Gaussian noise addition, which is
how gradients are randomized in the TF Privacy implementation of the DP-SGD
optimizer.

We will express our differential privacy guarantee using two parameters:
`epsilon` and `delta`.
@@ -374,21 +379,22 @@ We will express our differential privacy guarantee using two parameters:
still mean good practical privacy.

The TF Privacy library provides two methods for deriving the privacy guarantees
achieved from the three parameters outlined in the last code snippet:
`compute_rdp` and `get_privacy_spent`. These methods are found in its
`analysis.rdp_accountant` module. Here is how to use them.

To begin, we need to define a list of orders at which the Rényi divergence will
be computed. While some finer points of how to use the RDP accountant are
outside the scope of this document, it is useful to keep in mind the following.
First, there is very little downside in expanding the list of orders for which
RDP is computed. Second, the computed privacy budget is typically not very
sensitive to the exact value of the order (being close enough will land you in
the right neighborhood). Finally, if you are targeting a particular range of
epsilons (say, 1 to 10) and your delta is fixed (say, `10^-5`), then your orders
must cover the range between `1+ln(1/delta)/10≈2.15` and `1+ln(1/delta)/1≈12.5`.
This last rule may appear circular (how do you know what privacy parameters you
get without running the privacy accountant?!), but one or two adjustments of
the range of the orders will usually suffice.

```python
orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
@@ -409,12 +415,10 @@ Running the code snippets above with the hyperparameter values used during
training will estimate the `epsilon` value that was achieved by the
differentially private optimizer, and thus the strength of the privacy
guarantee that comes with the model we trained. Once we have computed the value
of `epsilon`, interpreting it is at times difficult. One possibility is to
purposely insert secrets into the model's training set and measure how likely
they are to be leaked by a differentially private model (compared to a
non-private model) at inference time
[[Carlini et al.]](https://arxiv.org/abs/1802.08232).

### Putting all the pieces together
@@ -425,7 +429,7 @@ achieved.
However, in case you ran into an issue or you'd like to see what a complete
implementation looks like, the "solution" to the tutorial presented in this
blog post can be
[found](https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial.py)
in the tutorials directory of TF Privacy. It is the script called
`mnist_dpsgd_tutorial.py`.