Updating README.md for the tutorial. Included discussion of learning_rate and target accuracy/privacy for several settings of training parameters.
PiperOrigin-RevId: 230016922
This commit is contained in:
parent 6c5c39c4f2
commit 047e1eef0e
1 changed file with 44 additions and 11 deletions

@@ -8,18 +8,23 @@ counterpart implemented in the library.

## Parameters

All of the optimizers share some privacy-specific parameters that need to
be tuned in addition to any existing hyperparameter. There are currently four:

* `learning_rate` (float): The learning rate of the SGD training algorithm. The
  higher the learning rate, the more each update matters. If the updates are noisy
  (such as when the additive noise is large compared to the clipping
  threshold), the learning rate must be kept low for the training procedure to converge.
* `num_microbatches` (int): The input data for each step (i.e., batch) of your
  original training algorithm is split into this many microbatches. Generally,
  increasing this will improve your utility but slow down your training in terms
  of wall-clock time. The total number of examples consumed in one global step
  remains the same. This number should evenly divide your input batch size.
* `l2_norm_clip` (float): The cumulative gradient across all network parameters
  from each microbatch will be clipped so that its L2 norm is at most this
  value. You should set this to something close to some percentile of what
  you expect the gradient from each microbatch to be. In previous experiments,
  we've found numbers from 0.5 to 1.0 to work reasonably well.
* `noise_multiplier` (float): This governs the amount of noise added during
  training. Generally, more noise results in better privacy and lower utility.
  This generally has to be at least 0.3 to obtain rigorous privacy guarantees,
  but smaller values may still be acceptable for practical purposes.
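
As an illustration of how these four parameters fit together, here is a minimal
sketch of constructing a differentially private optimizer. It assumes the
`DPGradientDescentGaussianOptimizer` class shipped with this library; the exact
import path may differ in your checkout.

```python
# Minimal sketch: wiring the four parameters above into a DP optimizer.
# Assumes the DPGradientDescentGaussianOptimizer class from this library;
# adjust the import path to match your installation.
from privacy.optimizers import dp_optimizer

optimizer = dp_optimizer.DPGradientDescentGaussianOptimizer(
    l2_norm_clip=1.0,       # each microbatch gradient is clipped to this L2 norm
    noise_multiplier=1.12,  # noise stddev = noise_multiplier * l2_norm_clip
    num_microbatches=256,   # must evenly divide the batch size
    learning_rate=0.08)     # keep low when the added noise is large
```

The values shown correspond to the first (default) row of the table in the
Select Parameters section below.
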
@@ -44,7 +49,35 @@ approach demonstrated in the `compute_epsilon` of the `mnist_dpsgd_tutorial.py`

where the arguments used to call the RDP accountant (i.e., the tool used to
compute the privacy guarantee) are:

* `q` : The sampling ratio, defined as (number of examples consumed in one
  step) / (total training examples).
* `noise_multiplier` : The noise_multiplier from your parameters above.
* `steps` : The number of global steps taken.
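
For concreteness, the snippet below sketches how these three arguments might be
passed to the RDP accountant, in the spirit of the `compute_epsilon` function in
`mnist_dpsgd_tutorial.py`. The import path, default argument values, and the
`orders` grid are assumptions and may need to be adapted to your version of the
library.

```python
# Sketch of an epsilon computation at a fixed delta, assuming the
# rdp_accountant module (compute_rdp / get_privacy_spent) from this library;
# adapt the import path and the orders grid to your installation.
from privacy.analysis.rdp_accountant import compute_rdp, get_privacy_spent

def compute_epsilon(steps, batch_size=256, num_examples=60000,
                    noise_multiplier=1.12, delta=1e-5):
  """Returns the epsilon spent after `steps` global steps."""
  orders = [1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))
  q = batch_size / float(num_examples)  # sampling ratio
  rdp = compute_rdp(q=q, noise_multiplier=noise_multiplier,
                    steps=steps, orders=orders)
  eps, _, _ = get_privacy_spent(orders, rdp, target_delta=delta)
  return eps
```

Assuming a batch size of 256 and the 60,000 MNIST training examples, this is the
kind of computation that produces the epsilon values shown in the expected
output below.
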
## Expected Output

When the script is run with the default parameters, the output will
contain the following lines (leaving out a lot of diagnostic info):

```
...
Test accuracy after 1 epochs is: 0.743
For delta=1e-5, the current epsilon is: 1.00
...
Test accuracy after 2 epochs is: 0.839
For delta=1e-5, the current epsilon is: 1.04
...
Test accuracy after 60 epochs is: 0.966
For delta=1e-5, the current epsilon is: 2.92
```
## Select Parameters

The table below has a few sample parameters illustrating various accuracy/privacy
tradeoffs (the first line is the default setting; privacy epsilon is reported
at delta=1e-5; accuracy is averaged over 10 runs).

| Learning rate | Noise multiplier | Clipping threshold | Number of microbatches | Number of epochs | Privacy eps | Accuracy |
| ------------- | ---------------- | ------------------ | ---------------------- | ---------------- | ----------- | -------- |
| 0.08          | 1.12             | 1.0                | 256                    | 60               | 2.92        | 96.6%    |
| 0.4           | 0.6              | 1.0                | 256                    | 30               | 9.74        | 97.3%    |
| 0.32          | 1.2              | 1.0                | 256                    | 10               | 1.20        | 95.0%    |