Updating README.md for the tutorial. Included discussion of learning_rate and target accuracy/privacy for several settings of training parameters.
PiperOrigin-RevId: 230016922
parent 6c5c39c4f2
commit 047e1eef0e
1 changed file with 44 additions and 11 deletions
@@ -8,18 +8,23 @@ counterpart implemented in the library.
 ## Parameters
 
 All of the optimizers share some privacy-specific parameters that need to
-be tuned in addition to any existing hyperparameter. There are currently three:
-* num_microbatches (int): The input data for each step (i.e., batch) of your
+be tuned in addition to any existing hyperparameter. There are currently four:
+
+* `learning_rate` (float): The learning rate of the SGD training algorithm. The
+higher the learning rate, the more each update matters. If the updates are noisy
+(such as when the additive noise is large compared to the clipping
+threshold), the learning rate must be kept low for the training procedure to converge.
+* `num_microbatches` (int): The input data for each step (i.e., batch) of your
 original training algorithm is split into this many microbatches. Generally,
 increasing this will improve your utility but slow down your training in terms
 of wall-clock time. The total number of examples consumed in one global step
 remains the same. This number should evenly divide your input batch size.
-* l2_norm_clip (float): The cumulative gradient across all network parameters
+* `l2_norm_clip` (float): The cumulative gradient across all network parameters
 from each microbatch will be clipped so that its L2 norm is at most this
 value. You should set this to something close to some percentile of what
 you expect the gradient from each microbatch to be. In previous experiments,
 we've found numbers from 0.5 to 1.0 to work reasonably well.
-* noise_multiplier (float): This governs the amount of noise added during
+* `noise_multiplier` (float): This governs the amount of noise added during
 training. Generally, more noise results in better privacy and lower utility.
 This generally has to be at least 0.3 to obtain rigorous privacy guarantees,
 but smaller values may still be acceptable for practical purposes.
@@ -44,7 +49,35 @@ approach demonstrated in the `compute_epsilon` of the `mnist_dpsgd_tutorial.py`
 where the arguments used to call the RDP accountant (i.e., the tool used to
 compute the privacy guarantee) are:
 
-* q : The sampling ratio, defined as (number of examples consumed in one
+* `q` : The sampling ratio, defined as (number of examples consumed in one
 step) / (total training examples).
-* noise_multiplier : The noise_multiplier from your parameters above.
-* steps : The number of global steps taken.
+* `noise_multiplier` : The noise_multiplier from your parameters above.
+* `steps` : The number of global steps taken.
+
+## Expected Output
+
+When the script is run with the default parameters, the output will
+contain the following lines (leaving out a lot of diagnostic info):
+```
+...
+Test accuracy after 1 epochs is: 0.743
+For delta=1e-5, the current epsilon is: 1.00
+...
+Test accuracy after 2 epochs is: 0.839
+For delta=1e-5, the current epsilon is: 1.04
+...
+Test accuracy after 60 epochs is: 0.966
+For delta=1e-5, the current epsilon is: 2.92
+```
+
+## Select Parameters
+
+The table below has a few sample parameters illustrating various accuracy/privacy
+tradeoffs (the first line is the default setting; privacy epsilon is reported
+at delta=1e-5; accuracy is averaged over 10 runs).
+
+| Learning rate | Noise multiplier | Clipping threshold | Number of microbatches | Number of epochs | Privacy eps | Accuracy |
+| ------------- | ---------------- | ----------------- | --------------------- | ---------------- | ----------- | -------- |
+| 0.08 | 1.12 | 1.0 | 256 | 60 | 2.92 | 96.6% |
+| 0.4 | 0.6 | 1.0 | 256 | 30 | 9.74 | 97.3% |
+| 0.32 | 1.2 | 1.0 | 256 | 10 | 1.20 | 95.0% |
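To make the interaction of the four parameters documented in the first hunk more concrete, here is a minimal pure-Python sketch of a single noisy SGD update. It is not the library's implementation: the helper names `clip_by_l2_norm` and `dp_sgd_step`, the flat-list representation of parameters, and the choice to pass per-example gradients in directly are all assumptions made for illustration. It follows the textbook DP-SGD recipe: average the gradients within each microbatch, clip each average to `l2_norm_clip`, sum, add Gaussian noise with standard deviation `l2_norm_clip * noise_multiplier`, and average over the microbatches.

```python
import math
import random


def clip_by_l2_norm(vector, l2_norm_clip):
    """Scale `vector` down so that its L2 norm is at most `l2_norm_clip`."""
    norm = math.sqrt(sum(v * v for v in vector))
    scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
    return [v * scale for v in vector]


def dp_sgd_step(params, per_example_grads, learning_rate, num_microbatches,
                l2_norm_clip, noise_multiplier, rng):
    """One illustrative noisy SGD update on a flat list of parameters.

    Hypothetical helper for this note; not part of any library API.
    """
    batch_size = len(per_example_grads)
    if batch_size % num_microbatches != 0:
        raise ValueError("num_microbatches must evenly divide the batch size")
    size = batch_size // num_microbatches
    dim = len(params)

    summed = [0.0] * dim
    for m in range(num_microbatches):
        chunk = per_example_grads[m * size:(m + 1) * size]
        # Average the gradients inside the microbatch, then clip the average.
        mean_grad = [sum(g[i] for g in chunk) / size for i in range(dim)]
        clipped = clip_by_l2_norm(mean_grad, l2_norm_clip)
        summed = [s + c for s, c in zip(summed, clipped)]

    # The noise scale is tied to the clipping threshold, so a larger
    # noise_multiplier means noisier updates relative to the clipped signal.
    stddev = l2_norm_clip * noise_multiplier
    noisy = [s + rng.gauss(0.0, stddev) for s in summed]
    avg = [n / num_microbatches for n in noisy]

    # A smaller learning_rate damps the effect of each noisy update.
    return [p - learning_rate * g for p, g in zip(params, avg)]


# Toy usage with the default learning rate, clip and noise multiplier
# from the table in the second hunk (the gradients are made up).
grads = [[3.0, 4.0], [0.3, 0.4], [-3.0, 4.0], [0.0, 0.1]]
new_params = dp_sgd_step([0.0, 0.0], grads, learning_rate=0.08,
                         num_microbatches=4, l2_norm_clip=1.0,
                         noise_multiplier=1.12, rng=random.Random(0))
```

With `num_microbatches` equal to the batch size, every example is clipped individually. With fewer microbatches, several examples are averaged before clipping, which reduces per-step work but changes the signal-to-noise ratio of the update.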
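The accountant arguments `q` and `steps` listed in the second hunk follow from the batch size, the dataset size, and the number of epochs. The sketch below works through that arithmetic for the default row of the table (60 epochs). The batch size of 256 is an assumption (the diff states only the number of microbatches), the helper name `accountant_inputs` is hypothetical, and dropping the final partial batch of each epoch via floor division is likewise an assumption. The 60000 figure is the size of the MNIST training set.

```python
def accountant_inputs(batch_size, num_train_examples, epochs):
    """Return (q, steps) for `epochs` passes over the training data.

    Hypothetical helper for this note; not part of any library API.
    """
    if not 0 < batch_size <= num_train_examples:
        raise ValueError("batch_size must be in (0, num_train_examples]")
    # q: (examples consumed in one step) / (total training examples).
    q = batch_size / num_train_examples
    # Assumption: the partial batch at the end of each epoch is dropped.
    steps_per_epoch = num_train_examples // batch_size
    steps = epochs * steps_per_epoch
    return q, steps


q, steps = accountant_inputs(batch_size=256, num_train_examples=60000, epochs=60)
print(f"q = {q:.6f}, steps = {steps}")  # q = 0.004267, steps = 14040
```

These two values, together with `noise_multiplier`, are what the RDP accountant would consume. The epsilon it reports depends on the accountant itself, which this sketch does not reproduce.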