Commit graph

889 commits

Author SHA1 Message Date
A. Unique TensorFlower
781483d1f2 Make compute_dp_sgd_privacy_statement visible.
PiperOrigin-RevId: 520105385
2023-03-28 12:43:21 -07:00
Shuang Song
e125951c9b Sets training set as positive class for sklearn.metrics.roc_curve.
sklearn.metrics.roc_curve uses classification rules in the form "score >= threshold ==> predict positive".
When calling roc_curve, we used to label test data as positive class. This way, TPR = % test examples classified as test, FPR = % training examples classified as test. The classification rule is "loss >= threshold ==> predict test".

For membership inference, TPR is usually defined as % training examples classified as training, and FPR is % test examples classified as training.
As training samples usually have lower loss, we usually use rules in the form of "loss <= threshold ==> predict training".

Therefore, TPR in the 2nd case is actually (1 - FPR) in the 1st case, FPR in the 2nd case is (1 - TPR) in the 1st case.
This mismatch does not affect attacker advantage or AUC, but it can cause problems for PPV.

Now, we:
- set training set as positive class.
- for threshold and entropy attacks, set score to be -loss, so that higher score corresponds to training data.
- negate the thresholds (computed based on -loss) so that they correspond to loss.
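
The relationship between the two conventions can be checked with a small sketch. It uses plain Python rather than sklearn.metrics.roc_curve, and the loss values, the threshold, and the helper name `rates` are made up for illustration. It assumes no loss is exactly equal to the threshold, since a tie would satisfy both rules.

```python
def rates(scores, is_positive, threshold):
    """TPR, FPR and PPV for the rule: score >= threshold ==> predict positive."""
    tp = sum(1 for s, p in zip(scores, is_positive) if p and s >= threshold)
    fp = sum(1 for s, p in zip(scores, is_positive) if not p and s >= threshold)
    n_pos = sum(1 for p in is_positive if p)
    n_neg = len(is_positive) - n_pos
    return tp / n_pos, fp / n_neg, tp / (tp + fp)


# Hypothetical per-example losses: training examples tend to have lower loss.
train_loss = [0.1, 0.2, 0.3, 0.9]
test_loss = [0.4, 0.45, 1.0, 1.2]
losses = train_loss + test_loss
is_train = [True] * len(train_loss) + [False] * len(test_loss)
loss_threshold = 0.5

# 1st case: test set is the positive class, score = loss,
# so the rule reads "loss >= threshold ==> predict test".
old_tpr, old_fpr, old_ppv = rates(
    losses, [not m for m in is_train], loss_threshold)

# 2nd case: training set is the positive class, score = -loss,
# so "-loss >= -threshold" is the rule "loss <= threshold ==> predict training".
new_tpr, new_fpr, new_ppv = rates(
    [-l for l in losses], is_train, -loss_threshold)

print(old_tpr, old_fpr, old_ppv)
print(new_tpr, new_fpr, new_ppv)
```

On this data the new TPR equals 1 minus the old FPR and the new FPR equals 1 minus the old TPR, while the two PPV values differ, which is the mismatch the message describes.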

PiperOrigin-RevId: 519880043
2023-03-27 18:00:25 -07:00
A. Unique TensorFlower
7796369d8b Support gradient norm computation with respect to a subset of variables.
PiperOrigin-RevId: 519245638
2023-03-24 14:57:54 -07:00
Galen Andrew
d5d60e2eac Adds compute_dp_sgd_privacy_statement for accurate privacy accounting report.
PiperOrigin-RevId: 518934979
2023-03-23 12:37:12 -07:00
Walid Krichene
52806ba952 In dp_optimizer_keras_sparse, update iterations to reflect the number of logical batches, rather than physical batches.
In the current behavior, when using gradient accumulation, the `iterations` variable is incremented at every physical batch, while variables are only updated at every logical batch (where logical batch = accumulation_steps many physical batches). This causes certain optimizers that explicitly depend on `iterations` (such as Adam) to behave very differently under gradient accumulation.

With this change, `iterations` is only incremented after each logical batch.
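
A toy sketch of the counting behavior described above. The class and function names are hypothetical and are not taken from dp_optimizer_keras_sparse. The bias-correction term `1 - beta ** step` is the standard Adam factor, shown only to illustrate why the step count matters.

```python
class LogicalStepCounter:
    """Toy counter; names are hypothetical, not the library's API."""

    def __init__(self, accumulation_steps):
        self.accumulation_steps = accumulation_steps
        self.physical_batches = 0
        self.iterations = 0  # incremented once per logical batch

    def on_physical_batch(self):
        """Returns True when this physical batch completes a logical batch."""
        self.physical_batches += 1
        if self.physical_batches % self.accumulation_steps == 0:
            # Variables would be updated here, so this is where the
            # step counter advances.
            self.iterations += 1
            return True
        return False


def adam_bias_correction(step, beta=0.9):
    """Standard Adam bias-correction factor, which depends on the step count."""
    return 1.0 - beta ** step


counter = LogicalStepCounter(accumulation_steps=4)
updates = [counter.on_physical_batch() for _ in range(8)]

# Counting physical batches gives step 8; counting logical batches gives
# step 2, so the correction factors differ noticeably.
per_physical = adam_bias_correction(counter.physical_batches)
per_logical = adam_bias_correction(counter.iterations)
print(counter.iterations, per_physical, per_logical)
```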

PiperOrigin-RevId: 517197044
2023-03-16 12:35:57 -07:00
A. Unique TensorFlower
7ae50c5ca5 Generalize model_forward_pass() to allow input models with multiple outputs.
PiperOrigin-RevId: 517145254
2023-03-16 09:36:16 -07:00
A. Unique TensorFlower
043e8b5272 Report the true loss in DPModel instead of the norm-adjusted loss.
PiperOrigin-RevId: 517112812
2023-03-16 07:15:13 -07:00
A. Unique TensorFlower
8f4ab1a8bb Allow custom per example loss functions for computing per microbatch gradient norm.
PiperOrigin-RevId: 516897864
2023-03-15 12:28:39 -07:00
Zheng Xu
d7d497bb69 Update script for pip package.
PiperOrigin-RevId: 515696284
2023-03-10 11:44:03 -08:00
Galen Andrew
c2bd4c3c6f Bump version number.
PiperOrigin-RevId: 515456888
2023-03-09 15:22:34 -08:00
Galen Andrew
701a585e1a Revert to dp-accounting 0.3.0 API.
PiperOrigin-RevId: 515432485
2023-03-09 13:56:34 -08:00
Galen Andrew
61dfbcc1f5 Adds functions for more accurate privacy accounting.
Adds function for computation of example-level DP epsilon taking into account microbatching and not assuming Poisson subsampling. Adds function for computation of user-level DP in terms of group privacy.
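
For orientation, the textbook group privacy bound for (epsilon, delta)-DP can be sketched as follows. This is only the standard conversion, not necessarily the accounting the added functions perform, and the function name `group_privacy` is hypothetical. It assumes each user contributes at most `k` examples.

```python
import math


def group_privacy(epsilon, delta, k):
    """Textbook bound: an (epsilon, delta)-DP mechanism is
    (k * epsilon, delta * sum_{i<k} exp(i * epsilon))-DP for groups of size k.

    Hypothetical helper for illustration only.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    group_epsilon = k * epsilon
    group_delta = delta * sum(math.exp(i * epsilon) for i in range(k))
    return group_epsilon, group_delta


# Example-level guarantee (1.0, 1e-6), with users holding up to 3 examples.
user_epsilon, user_delta = group_privacy(1.0, 1e-6, 3)
print(user_epsilon, user_delta)
```

The bound grows quickly with `k`, which is one reason tighter accounting than this generic conversion is often preferred.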

PiperOrigin-RevId: 515114010
2023-03-08 12:44:39 -08:00
A. Unique TensorFlower
4e1fc252e4 Add a kwargs argument to the registry API + small changes to docstrings.
This is a forward-looking change that is needed to support more complicated
layers, such as `tf.keras.layers.MultiHeadAttention`, which can take `kwargs`
as part of their `.call()` method and can generate arbitrary outputs.

PiperOrigin-RevId: 514775503
2023-03-07 10:35:04 -08:00
Steve Chien
21ee1a607a Fix unneeded dependency.
PiperOrigin-RevId: 514523996
2023-03-06 14:20:33 -08:00
Zheng Xu
0a0f377f3f Adaptive clipping in DP-FTRL with restart.
PiperOrigin-RevId: 513934548
2023-03-06 07:16:57 -08:00
A. Unique TensorFlower
8bfafdd74d Efficient DPSGD with support for microbatched losses.
PiperOrigin-RevId: 513886957
2023-03-06 07:01:03 -08:00
Walid Krichene
cbf34f2b04 Update type annotations of gradient clipping library.
PiperOrigin-RevId: 513640655
2023-03-02 14:29:17 -08:00
A. Unique TensorFlower
7436930c64 Improve documentation and logging of fast gradient clipping modules and callers.
PiperOrigin-RevId: 513283486
2023-03-01 10:56:01 -08:00
Andres Munoz MEdina
d7cd3f8af1 Add an announcement on the public README about the new fast implementation of DP-SGD.
PiperOrigin-RevId: 512930920
2023-02-28 07:45:47 -08:00
Shuang Song
a3e8a45559 Passes number of microbatches to DP model.
PiperOrigin-RevId: 512678620
2023-02-27 11:11:59 -08:00
Shuang Song
4a418e8862 Adds __init__.py for fast_gradient_clipping.
PiperOrigin-RevId: 512236191
2023-02-24 21:32:07 -08:00
A. Unique TensorFlower
dda7fa8b39 Add a tf.GradientTape argument to the layer registry functions
PiperOrigin-RevId: 512160655
2023-02-24 14:14:36 -08:00
Shuang Song
4dd8d0ffde Catches when data is not sufficient for StratifiedKFold split.
PiperOrigin-RevId: 510197242
2023-02-16 11:24:12 -08:00
Shuang Song
0c691d0b4d Returns None for getting max results when results are empty.
PiperOrigin-RevId: 510054673
2023-02-15 23:37:43 -08:00
A. Unique TensorFlower
13534e5159 Add better tests for clip_grads.py
PiperOrigin-RevId: 509529435
2023-02-14 08:01:56 -08:00
A. Unique TensorFlower
430f103354 Generalize the registry function for the embedding layer for other models.
PiperOrigin-RevId: 509528743
2023-02-14 07:59:10 -08:00
A. Unique TensorFlower
410814ec39 Generalize the internal API to allow for more general models + layers.
PiperOrigin-RevId: 509518753
2023-02-14 07:10:40 -08:00
Shuang Song
6ee988885a Fix a bug in get_flattened_attack_metrics where types, slices, and metrics do not correspond to values because of PPV.

PiperOrigin-RevId: 509274994
2023-02-13 10:53:29 -08:00
A. Unique TensorFlower
9ed34da715 Integrate the fast gradient clipping algorithm with the DP Keras Model class.
PiperOrigin-RevId: 504931452
2023-01-26 13:45:56 -08:00
A. Unique TensorFlower
bc84ed7bfb Add fast gradient clipping tests.
PiperOrigin-RevId: 504923799
2023-01-26 13:16:19 -08:00
A. Unique TensorFlower
a3b14ae20a First implementation of the fast gradient clipping algorithm.
PiperOrigin-RevId: 504668189
2023-01-25 14:51:09 -08:00
Steve Chien
ee3d349a8d Fix copybara removal of tkinter library.
PiperOrigin-RevId: 504656239
2023-01-25 14:06:27 -08:00
Yilei Yang
622282e034 Update dependency on tkinter.
PiperOrigin-RevId: 503401013
2023-01-20 03:24:46 -08:00
Thomas Steinke
10c086c46a Implementation of differentially private second order methods ("Newton's method") for research project.
PiperOrigin-RevId: 500821630
2023-01-09 15:22:37 -08:00
Peter Hawkins
3d038a490a [NumPy] Remove references to deprecated NumPy type aliases.
This change replaces references to a number of deprecated NumPy type aliases (np.bool, np.int, np.float, np.complex, np.object, np.str) with their recommended replacement (bool, int, float, complex, object, str).

NumPy 1.24 drops the deprecated aliases, so we must remove uses before updating NumPy.
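
A short illustration of the replacement pattern, assuming NumPy is installed. The array names are made up, and the comments show the deprecated spelling each line replaces.

```python
import numpy as np

# Builtin Python types are accepted wherever a dtype is expected.
mask = np.zeros(3, dtype=bool)                # was: dtype=np.bool
counts = np.arange(3).astype(int)             # was: .astype(np.int)
weights = np.ones(3, dtype=float)             # was: dtype=np.float
labels = np.array(["a", "b"], dtype=object)   # was: dtype=np.object

print(mask.dtype, counts.dtype, weights.dtype, labels.dtype)
```

The resulting dtypes are unchanged, because the deprecated aliases were plain references to the builtin types.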

PiperOrigin-RevId: 497194550
2022-12-22 10:32:59 -08:00
Steve Chien
f99a74c7a4 Fix dependencies required by privacy_tests.
Update `distutils` to `packaging`.

PiperOrigin-RevId: 496713867
2022-12-20 11:49:28 -08:00
Shuang Song
2040f08f0d Allows slicing by custom indices.
PiperOrigin-RevId: 486998645
2022-11-08 11:05:26 -08:00
A. Unique TensorFlower
ec747a8d75 Correct imports of keras loss utils
PiperOrigin-RevId: 486795765
2022-11-07 16:34:00 -08:00
A. Unique TensorFlower
e334633466 Bugfix.
PiperOrigin-RevId: 486344068
2022-11-05 05:18:58 -07:00
Shuang Song
f7e1e61823 Adds a utility function for formatting a list into a string.
PiperOrigin-RevId: 484026229
2022-10-26 11:33:30 -07:00
Shuang Song
7d7b670f5d Add functions to derive epsilon lower bounds.
PiperOrigin-RevId: 484021227
2022-10-26 11:15:47 -07:00
A. Unique TensorFlower
3f16540bfc Efficient DP optimizers for sparse models.
PiperOrigin-RevId: 482871514
2022-10-21 13:15:52 -07:00
Galen Andrew
a7d929a21c Bump version for release.
PiperOrigin-RevId: 482286678
2022-10-19 13:21:35 -07:00
Steve Chien
0fcfd0bf69 Remove pfor dependency in BUILD file, and strengthen unit tests for clip_and_aggregate_gradients.py.
PiperOrigin-RevId: 482050282
2022-10-18 16:21:37 -07:00
Steve Chien
4aa531faa4 Remove dependence on six in clip_and_aggregate_gradients.py.
PiperOrigin-RevId: 481750014
2022-10-17 15:07:27 -07:00
A. Unique TensorFlower
d5538fccbb Ensures DPOptimizer objects can be serialized by TensorFlow.
Handles this by converting tensors to numpy. Adds tests to cover this.

PiperOrigin-RevId: 481656298
2022-10-17 09:12:10 -07:00
A. Unique TensorFlower
c25cb4a41b Clip (per-example) and aggregate gradients.
PiperOrigin-RevId: 480761907
2022-10-12 17:43:21 -07:00
A. Unique TensorFlower
71837fbeec Adds DP-FTRL via tree aggregation optimizer DPFTRLTreeAggregationOptimizer.
Includes renaming of the `frequency` parameter in restart_query.py to `period` to more accurately reflect its purpose.

PiperOrigin-RevId: 480736961
2022-10-12 15:47:07 -07:00
A. Unique TensorFlower
5e37c1bc70 Implement initial_sample_state for TreeRangeSumQuery.
PiperOrigin-RevId: 480685277
2022-10-12 12:11:21 -07:00
A. Unique TensorFlower
79fe32a60b Changes DPOptimizerClass to generically accept and use any dp_sum_query.
This enables creation of generic DPOptimizers by users passing queries. The most common Gaussian query is automatically performed for convenience and backwards compatibility.

Byproducts of this update:
-ensures consistent implementations between the internal (and legacy) `get_gradients` and newer `_compute_gradients` for all queries.
-refactors for python readability.
-includes new tests ensuring that `_num_microbatches=None` is tested.
-changes the `_global_state` to be initialized in the init function for `_compute_gradients`.

PiperOrigin-RevId: 480668376
2022-10-12 11:03:55 -07:00