Change README example to use Google DP for accounting instead of deprecated privacy/analysis/rdp_accountant functions.

PiperOrigin-RevId: 449820802
Galen Andrew 2022-05-19 13:29:15 -07:00 committed by A. Unique TensorFlower
parent f739f45299
commit 5509adb296


@@ -328,12 +328,12 @@ memorized and the privacy of the individual who contributed this data point to
our dataset is respected. We often refer to this probability as the privacy
budget: smaller privacy budgets correspond to stronger privacy guarantees.
Accounting required to compute the privacy budget spent to train our machine
learning model is another feature provided by TF Privacy. Knowing what level of
differential privacy was achieved allows us to put into perspective the drop in
utility that is often observed when switching to differentially private
optimization. It also allows us to compare two models objectively to determine
which of the two is more privacy-preserving than the other.
Google's DP library can be used to compute the privacy budget spent to train our
machine learning model. Knowing what level of differential privacy was achieved
allows us to put into perspective the drop in utility that is often observed
when switching to differentially private optimization. It also allows us to
compare two models objectively to determine which of the two is more
privacy-preserving than the other.
Before we derive a bound on the privacy guarantee achieved by our optimizer, we
first need to identify all the parameters that are relevant to measuring the
@@ -378,37 +378,51 @@ We will express our differential privacy guarantee using two parameters:
However, this is only an upper bound, and a large value of epsilon could
still mean good practical privacy.
The TF Privacy library provides two methods for deriving the privacy guarantee
achieved from the three parameters outlined in the last code snippet:
`compute_rdp` and `get_privacy_spent`. These methods are found in its
`analysis.rdp_accountant` module. Here is how to use them.
To compute the privacy spent using the Google DP library, we need to define a
`PrivacyAccountant` and a `DpEvent`. The `PrivacyAccountant` specifies what
method of privacy accounting will be used. In our case that will be RDP, so we
use the `RdpAccountant`. The `DpEvent` is a representation of the log of
privacy-impacting actions that have occurred: in our case, the repeated sampling
of records and estimation of their mean with Gaussian noise added.
First, we need to define a list of orders, at which the Rényi divergence will be
computed. While some finer points of how to use the RDP accountant are outside
the scope of this document, it is useful to keep in mind the following. First,
there is very little downside in expanding the list of orders for which RDP is
computed. Second, the computed privacy budget is typically not very sensitive to
the exact value of the order (being close enough will land you in the right
neighborhood). Finally, if you are targeting a particular range of epsilons
(say, 1–10) and your delta is fixed (say, `10^-5`), then your orders must cover
the range between `1+ln(1/delta)/10≈2.15` and `1+ln(1/delta)/1≈12.5`. This last
rule may appear circular (how do you know what privacy parameters you get
without running the privacy accountant?!), but one or two adjustments of the
range of the orders usually suffice.
To initialize the `PrivacyAccountant`, we need to define a list of orders, at
which the Rényi divergence will be computed. While some finer points of how to
use the RDP accountant are outside the scope of this document, it is useful to
keep in mind the following. First, there is very little downside in expanding
the list of orders for which RDP is computed. Second, the computed privacy
budget is typically not very sensitive to the exact value of the order (being
close enough will land you in the right neighborhood). Finally, if you are
targeting a particular range of epsilons (say, 1–10) and your delta is fixed
(say, `10^-5`), then your orders must cover the range between
`1+ln(1/delta)/10≈2.15` and `1+ln(1/delta)/1≈12.5`. This last rule may appear
circular (how do you know what privacy parameters you get without running the
privacy accountant?!), but one or two adjustments of the range of the orders
usually suffice.
```python
orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
rdp = compute_rdp(q=sampling_probability,
                  noise_multiplier=FLAGS.noise_multiplier,
                  steps=steps,
                  orders=orders)
accountant = privacy_accountant.RdpAccountant(orders)
```
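The rule of thumb for the range of orders can be checked numerically. The following is a minimal sketch using only the standard library; the helper `order_range` is a hypothetical name introduced here for illustration and is not part of TF Privacy or the Google DP library.

```python
import math


def order_range(delta, eps_low, eps_high):
    """Hypothetical helper: range of orders suggested by 1 + ln(1/delta)/eps."""
    log_inv_delta = math.log(1.0 / delta)
    # The largest target epsilon gives the smallest order, and vice versa.
    return 1.0 + log_inv_delta / eps_high, 1.0 + log_inv_delta / eps_low


# The same list of orders as in the README snippet.
orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))

# Target epsilons between 1 and 10 at delta = 1e-5, as in the text.
low, high = order_range(delta=1e-5, eps_low=1.0, eps_high=10.0)

# True when the list of orders spans the suggested range.
covers = min(orders) <= low and max(orders) >= high
print(f"suggested orders: {low:.2f} to {high:.2f}; covered: {covers}")
```

If `covers` comes out false for your own targets, extending the list of orders at the missing end is the adjustment the text refers to.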
Then, the method `get_privacy_spent` computes the best `epsilon` for a given
`target_delta` by taking the minimum over all orders.
Next we create a `DpEvent` and feed it to the accountant for processing using
its `compose` method:
```python
epsilon = get_privacy_spent(orders, rdp, target_delta=1e-5)[0]
event = dp_event.SelfComposedDpEvent(
    event=dp_event.PoissonSampledDpEvent(
        sampling_probability=q,
        event=dp_event.GaussianDpEvent(noise_multiplier)
    ),
    count=steps)
accountant.compose(event)
```
Finally, we can query the accountant for the best `epsilon` at the given
`target_delta` by calling the `get_epsilon` method, which takes the minimum over
all orders.
```python
epsilon = accountant.get_epsilon(target_delta)
```
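To make "taking the minimum over all orders" concrete, here is a simplified, library-free sketch. It rests on two assumptions that differ from the README's setting: it models a plain Gaussian mechanism with no Poisson sampling, whose RDP at order `a` is `a / (2 * sigma^2)` per step, and it uses the classic conversion `epsilon = rdp + ln(1/delta) / (a - 1)`, which may be looser than the conversion the accountant applies. The function names `gaussian_rdp` and `epsilon_from_rdp` are hypothetical.

```python
import math


def gaussian_rdp(noise_multiplier, steps, orders):
    """RDP of `steps` composed Gaussian mechanisms (no subsampling; an assumption)."""
    # RDP composes additively, so the per-step cost is multiplied by `steps`.
    return [steps * a / (2.0 * noise_multiplier ** 2) for a in orders]


def epsilon_from_rdp(orders, rdp, target_delta):
    """Classic RDP-to-(epsilon, delta) conversion, minimized over the orders."""
    log_inv_delta = math.log(1.0 / target_delta)
    candidates = [r + log_inv_delta / (a - 1.0) for a, r in zip(orders, rdp)]
    best = min(range(len(orders)), key=candidates.__getitem__)
    return candidates[best], orders[best]


orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
rdp = gaussian_rdp(noise_multiplier=4.0, steps=1, orders=orders)
epsilon, optimal_order = epsilon_from_rdp(orders, rdp, target_delta=1e-5)
print(f"epsilon = {epsilon:.3f} at order {optimal_order}")
```

Each order yields a valid bound on epsilon, so the smallest one is reported. If the optimal order lands at either end of the list, that is a hint to widen the range of orders.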
Running the code snippets above with the hyperparameter values used during