Updating README.md for the tutorial. Included discussion of learning_rate and target accuracy/privacy for several settings of training parameters.
PiperOrigin-RevId: 230016922
This commit is contained in:
parent 6c5c39c4f2
commit 047e1eef0e
1 changed file with 44 additions and 11 deletions

@@ -8,18 +8,23 @@ counterpart implemented in the library.

## Parameters

All of the optimizers share some privacy-specific parameters that need to
be tuned in addition to any existing hyperparameter. There are currently four:

* `learning_rate` (float): The learning rate of the SGD training algorithm. The
  higher the learning rate, the more each update matters. If the updates are noisy
  (such as when the additive noise is large compared to the clipping
  threshold), the learning rate must be kept low for the training procedure to converge.
* `num_microbatches` (int): The input data for each step (i.e., batch) of your
  original training algorithm is split into this many microbatches. Generally,
  increasing this will improve your utility but slow down your training in terms
  of wall-clock time. The total number of examples consumed in one global step
  remains the same. This number should evenly divide your input batch size.
* `l2_norm_clip` (float): The cumulative gradient across all network parameters
  from each microbatch will be clipped so that its L2 norm is at most this
  value. You should set this to something close to some percentile of what
  you expect the gradient from each microbatch to be. In previous experiments,
  we've found numbers from 0.5 to 1.0 to work reasonably well.
* `noise_multiplier` (float): This governs the amount of noise added during
  training. Generally, more noise results in better privacy and lower utility.
  This generally has to be at least 0.3 to obtain rigorous privacy guarantees,
  but smaller values may still be acceptable for practical purposes.
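
As an illustration of how these four parameters fit together, here is a minimal
sketch of constructing a differentially private optimizer. It assumes the
`DPGradientDescentGaussianOptimizer` class shipped with this library; the exact
import path may differ in your checkout.

```python
# Minimal sketch: wiring the four parameters above into a DP optimizer.
# Assumes the DPGradientDescentGaussianOptimizer class from this library;
# adjust the import path to match your installation.
from privacy.optimizers import dp_optimizer

optimizer = dp_optimizer.DPGradientDescentGaussianOptimizer(
    l2_norm_clip=1.0,       # each microbatch gradient is clipped to this L2 norm
    noise_multiplier=1.12,  # noise stddev = noise_multiplier * l2_norm_clip
    num_microbatches=256,   # must evenly divide the batch size
    learning_rate=0.08)     # keep low when the added noise is large
```

The values shown correspond to the first (default) row of the table in the
Select Parameters section below.
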
@@ -44,7 +49,35 @@ approach demonstrated in the `compute_epsilon` of the `mnist_dpsgd_tutorial.py`

where the arguments used to call the RDP accountant (i.e., the tool used to
compute the privacy guarantee) are:

* `q` : The sampling ratio, defined as (number of examples consumed in one
  step) / (total training examples).
* `noise_multiplier` : The noise_multiplier from your parameters above.
* `steps` : The number of global steps taken.
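
For concreteness, the snippet below sketches how these three arguments might be
passed to the RDP accountant, in the spirit of the `compute_epsilon` function in
`mnist_dpsgd_tutorial.py`. The import path, default argument values, and the
`orders` grid are assumptions and may need to be adapted to your version of the
library.

```python
# Sketch of an epsilon computation at a fixed delta, assuming the
# rdp_accountant module (compute_rdp / get_privacy_spent) from this library;
# adapt the import path and the orders grid to your installation.
from privacy.analysis.rdp_accountant import compute_rdp, get_privacy_spent

def compute_epsilon(steps, batch_size=256, num_examples=60000,
                    noise_multiplier=1.12, delta=1e-5):
  """Returns the epsilon spent after `steps` global steps."""
  orders = [1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))
  q = batch_size / float(num_examples)  # sampling ratio
  rdp = compute_rdp(q=q, noise_multiplier=noise_multiplier,
                    steps=steps, orders=orders)
  eps, _, _ = get_privacy_spent(orders, rdp, target_delta=delta)
  return eps
```

Assuming a batch size of 256 and the 60,000 MNIST training examples, this is the
kind of computation that produces the epsilon values shown in the expected
output below.
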
## Expected Output

When the script is run with the default parameters, the output will
contain the following lines (leaving out a lot of diagnostic info):

```
...
Test accuracy after 1 epochs is: 0.743
For delta=1e-5, the current epsilon is: 1.00
...
Test accuracy after 2 epochs is: 0.839
For delta=1e-5, the current epsilon is: 1.04
...
Test accuracy after 60 epochs is: 0.966
For delta=1e-5, the current epsilon is: 2.92
```
## Select Parameters

The table below has a few sample parameters illustrating various accuracy/privacy
tradeoffs (the first line is the default setting; privacy epsilon is reported
at delta=1e-5; accuracy is averaged over 10 runs).

| Learning rate | Noise multiplier | Clipping threshold | Number of microbatches | Number of epochs | Privacy eps | Accuracy |
| ------------- | ---------------- | ------------------ | ---------------------- | ---------------- | ----------- | -------- |
| 0.08          | 1.12             | 1.0                | 256                    | 60               | 2.92        | 96.6%    |
| 0.4           | 0.6              | 1.0                | 256                    | 30               | 9.74        | 97.3%    |
| 0.32          | 1.2              | 1.0                | 256                    | 10               | 1.20        | 95.0%    |