Change README example to use Google DP for accounting instead of deprecated privacy/analysis/rdp_accountant functions.

PiperOrigin-RevId: 449820802
Galen Andrew 2022-05-19 13:29:15 -07:00 committed by A. Unique TensorFlower
parent f739f45299
commit 5509adb296

@@ -328,12 +328,12 @@ memorized and the privacy of the individual who contributed this data point to
 our dataset is respected. We often refer to this probability as the privacy
 budget: smaller privacy budgets correspond to stronger privacy guarantees.

-Accounting required to compute the privacy budget spent to train our machine
-learning model is another feature provided by TF Privacy. Knowing what level of
-differential privacy was achieved allows us to put into perspective the drop in
-utility that is often observed when switching to differentially private
-optimization. It also allows us to compare two models objectively to determine
-which of the two is more privacy-preserving than the other.
+Google's DP library can be used to compute the privacy budget spent to train our
+machine learning model. Knowing what level of differential privacy was achieved
+allows us to put into perspective the drop in utility that is often observed
+when switching to differentially private optimization. It also allows us to
+compare two models objectively to determine which of the two is more
+privacy-preserving than the other.

 Before we derive a bound on the privacy guarantee achieved by our optimizer, we
 first need to identify all the parameters that are relevant to measuring the
@@ -378,37 +378,51 @@ We will express our differential privacy guarantee using two parameters:
 However, this is only an upper bound, and a large value of epsilon could
 still mean good practical privacy.

-The TF Privacy library provides two methods relevant to derive privacy
-guarantees achieved from the three parameters outlined in the last code snippet:
-`compute_rdp` and `get_privacy_spent`. These methods are found in its
-`analysis.rdp_accountant` module. Here is how to use them.
+To compute the privacy spent using the Google DP library, we need to define a
+`PrivacyAccountant` and a `DpEvent`. The `PrivacyAccountant` specifies what
+method of privacy accounting will be used. In our case that will be RDP, so we
+use the `RdpAccountant`. The `DpEvent` is a representation of the log of
+privacy-impacting actions that have occurred, in our case, the repeated sampling
+of records and estimation of their mean with Gaussian noise added.

-First, we need to define a list of orders, at which the Rényi divergence will be
-computed. While some finer points of how to use the RDP accountant are outside
-the scope of this document, it is useful to keep in mind the following. First,
-there is very little downside in expanding the list of orders for which RDP is
-computed. Second, the computed privacy budget is typically not very sensitive to
-the exact value of the order (being close enough will land you in the right
-neighborhood). Finally, if you are targeting a particular range of epsilons
-(say, 1—10) and your delta is fixed (say, `10^-5`), then your orders must cover
-the range between `1+ln(1/delta)/10≈2.15` and `1+ln(1/delta)/1≈12.5`. This last
-rule may appear circular (how do you know what privacy parameters you get
-without running the privacy accountant?!), one or two adjustments of the range
-of the orders would usually suffice.
+To initialize the `PrivacyAccountant`, we need to define a list of orders, at
+which the Rényi divergence will be computed. While some finer points of how to
+use the RDP accountant are outside the scope of this document, it is useful to
+keep in mind the following. First, there is very little downside in expanding
+the list of orders for which RDP is computed. Second, the computed privacy
+budget is typically not very sensitive to the exact value of the order (being
+close enough will land you in the right neighborhood). Finally, if you are
+targeting a particular range of epsilons (say, 1—10) and your delta is fixed
+(say, `10^-5`), then your orders must cover the range between
+`1+ln(1/delta)/10≈2.15` and `1+ln(1/delta)/1≈12.5`. This last rule may appear
+circular (how do you know what privacy parameters you get without running the
+privacy accountant?!), one or two adjustments of the range of the orders would
+usually suffice.

 ```python
 orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
-rdp = compute_rdp(q=sampling_probability,
-                  noise_multiplier=FLAGS.noise_multiplier,
-                  steps=steps,
-                  orders=orders)
+accountant = privacy_accountant.RdpAccountant(orders)
 ```

-Then, the method `get_privacy_spent` computes the best `epsilon` for a given
-`target_delta` value of delta by taking the minimum over all orders.
+Next we create a `DpEvent` and feed it to the accountant for processing using
+its `compose` method:

 ```python
-epsilon = get_privacy_spent(orders, rdp, target_delta=1e-5)[0]
+event = dp_event.SelfComposedDpEvent(
+    event=dp_event.PoissonSampledDpEvent(
+        sampling_probability=q,
+        event=dp_event.GaussianDpEvent(noise_multiplier)
+    ),
+    count=steps)
+accountant.compose(event)
+```
+
+Finally, we can query the accountant for the best `epsilon` at the given
+`target_delta` by calling the `get_epsilon` method which takes the minimum over
+all orders.
+
+```python
+epsilon = accountant.get_epsilon(target_delta)
 ```

 Running the code snippets above with the hyperparameter values used during
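
The two ideas the README text relies on, the order-range rule and "take the minimum over all orders", can be illustrated with a standard-library-only sketch. This is not the Google DP library's implementation. It assumes plain Gaussian mechanisms with no Poisson sampling (so it ignores the privacy amplification the real `RdpAccountant` accounts for) and uses the classic RDP-to-DP conversion `epsilon = rdp + ln(1/delta) / (order - 1)`, which is looser than what the library computes. The helper names `order_range` and `gaussian_epsilon` and the values `noise_multiplier=4.0` and `steps=10` are made up for illustration.

```python
import math


def order_range(eps_low, eps_high, delta):
    """Range of orders to cover for target epsilons in [eps_low, eps_high].

    Applies the rule of thumb from the README: order = 1 + ln(1/delta) / epsilon.
    """
    log_inv_delta = math.log(1.0 / delta)
    return 1 + log_inv_delta / eps_high, 1 + log_inv_delta / eps_low


def gaussian_epsilon(noise_multiplier, steps, orders, delta):
    """Best (epsilon, order) for `steps` composed Gaussian mechanisms.

    Assumption: no subsampling, so each step costs order / (2 * sigma^2) in RDP.
    """
    best_eps, best_order = math.inf, None
    for order in orders:
        # RDP composes additively across steps.
        rdp = steps * order / (2.0 * noise_multiplier ** 2)
        # Classic conversion from RDP at this order to (epsilon, delta)-DP.
        eps = rdp + math.log(1.0 / delta) / (order - 1)
        if eps < best_eps:
            best_eps, best_order = eps, order
    return best_eps, best_order


# Same list of orders as in the README snippet.
orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))

low, high = order_range(eps_low=1, eps_high=10, delta=1e-5)
best_eps, best_order = gaussian_epsilon(
    noise_multiplier=4.0, steps=10, orders=orders, delta=1e-5)
print(f"orders should cover [{low:.2f}, {high:.2f}]")
print(f"best epsilon {best_eps:.3f} at order {best_order:.1f}")
```

Because the sketch skips subsampling, its epsilon is larger than what the library would report for a sampled mechanism with the same noise multiplier and step count. It is only meant to show how the minimum over orders is taken.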