Commit graph

539 commits

Author SHA1 Message Date
Galen Andrew
67c9c2424c Internal change.
PiperOrigin-RevId: 551648705
2023-07-27 14:55:51 -07:00
Steve Chien
134c898ded Add DP-SGD version of v1 LinearClassifier.
PiperOrigin-RevId: 551350685
2023-07-26 16:38:41 -07:00
Shuang Song
225355258c Calls epsilon computation in MIA.
PiperOrigin-RevId: 551003589
2023-07-25 14:50:13 -07:00
Steve Chien
8e60864559 Minor code cleanup to compute_dp_sgd_privacy_lib and update dp_accounting dependency.
PiperOrigin-RevId: 550695787
2023-07-24 15:48:45 -07:00
A. Unique TensorFlower
c1c97f1c1c Modify fast clipping logic to support computation on TPUs.
PiperOrigin-RevId: 550673798
2023-07-24 14:28:45 -07:00
A. Unique TensorFlower
6b8007ddde Re-organize files and simplify test names.
These changes are intended to support a more modular system for when we
add more layer registry functions (and their corresponding tests). They are
also made so that we do not have an enormous number of lengthy tests inside
`clip_grads_test.py`.

PiperOrigin-RevId: 545779495
2023-07-05 14:01:09 -07:00
A. Unique TensorFlower
9536fb26e7 Update LICENSE rules for TF Privacy to the new API.
PiperOrigin-RevId: 545774183
2023-07-05 13:42:20 -07:00
Vadym Doroshenko
a147a480a5 Finish implementation of custom indices names.
PiperOrigin-RevId: 545440374
2023-07-04 07:19:18 -07:00
Vadym Doroshenko
93f5a5249c Add slice names for custom slices
PiperOrigin-RevId: 544599507
2023-06-30 02:33:03 -07:00
Galen Andrew
f953e834df New version updates dependendices.
PiperOrigin-RevId: 544133468
2023-06-28 12:46:28 -07:00
Vadym Doroshenko
45da453410 Implement possibility to return slice indices.
PiperOrigin-RevId: 540885025
2023-06-16 08:22:43 -07:00
Zheng Xu
a4bdb05b62 zCDP to epsilon for tree aggregation accounting.
PiperOrigin-RevId: 539706770
2023-06-12 11:09:14 -07:00
Walid Krichene
18c43b351b Support weighted losses in DPModel.
PiperOrigin-RevId: 538011437
2023-06-05 16:27:19 -07:00
A. Unique TensorFlower
0f5acf868e Add additional tests and checks on the passed loss function.
PiperOrigin-RevId: 532225904
2023-05-15 14:27:46 -07:00
Shuang Song
8fdac5f833 Test DPModel in distributed training.
PiperOrigin-RevId: 528039164
2023-04-28 18:57:29 -07:00
Walid Krichene
e65e14b2d6 Fix bug in DPModel that shows up in distributed training.
PiperOrigin-RevId: 528026372
2023-04-28 17:31:18 -07:00
Galen Andrew
9710a4acc7 Bump version and update dependenciesfor pypi release.
PiperOrigin-RevId: 527377853
2023-04-26 14:35:24 -07:00
A. Unique TensorFlower
33bbc87ff2 Use better group privacy bound in computing user level privacy [TF Privacy]
PiperOrigin-RevId: 526852999
2023-04-24 22:17:24 -07:00
Michael Reneer
60cb0dd2fb Update tensorflow privacy to use NamedTuple instead of attrs.
This allows these objects to be traversed when nested in tree-like structures more easily.

PiperOrigin-RevId: 525532511
2023-04-19 13:18:25 -07:00
Shuang Song
e362f51773 Supports slicing for multi-label data.
PiperOrigin-RevId: 523846333
2023-04-12 17:14:11 -07:00
Galen Andrew
d5e41e20ad More detailed description of arguments in compute_dp_sgd_privacy.
PiperOrigin-RevId: 522693217
2023-04-07 15:07:35 -07:00
Shuang Song
c4628d5dbc Skips adding noise when noise_multiplier is 0 for fast clipping.
PiperOrigin-RevId: 522396275
2023-04-06 11:54:55 -07:00
Shuang Song
de9836883d Skips noise addition when noise_multiplier is 0. Fix a typo.
PiperOrigin-RevId: 521912964
2023-04-04 17:48:24 -07:00
A. Unique TensorFlower
ee1abe6930 Generalize generate_model_outputs_using_core_keras_layers().
This change adds the following two new features to the above function:
(i) it supports nested custom layers of depth >2;
(ii) it allows the caller to exclude certain layers from the expansion.

Feature (ii) will be needed for the development of DP models that use
Trasformer or BERT-type layers.

PiperOrigin-RevId: 520919934
2023-03-31 07:41:16 -07:00
Galen Andrew
abb0c3f9f6 Migrates compute_dp_sgd_privacy to print new privacy statement from compute_dp_sgd_privacy_lib.
PiperOrigin-RevId: 520147633
2023-03-28 15:14:12 -07:00
A. Unique TensorFlower
781483d1f2 Make compute_dp_sgd_privacy_statement visible.
PiperOrigin-RevId: 520105385
2023-03-28 12:43:21 -07:00
Shuang Song
e125951c9b Sets training set as positive class for sklearn.metrics.roc_curve.
sklearn.metrics.roc_curve uses classification rules in the form "score >= threshold ==> predict positive".
When calling roc_curve, we used to label test data as positive class. This way, TPR = % test examples classified as test, FPR = % training examples classified as test. The classification rule is "loss >= threshold ==> predict test".

For membership inference, TPR is usually defined as % training examples classified as training, and FPR is % test examples classified as training.
As training samples usually have lower loss, we usually use rules in the form of "loss <= threshold ==> predict training".

Therefore, TPR in the 2nd case is actually (1 - FPR) in the 1st case, FPR in the 2nd case is (1 - TPR) in the 1st case.
This mismatch does not affect attacker advantage or AUC, but this can cause problem to PPV.

Now, we:
- set training set as positive class.
- for threshold and entropy attacks, set score to be -loss, so that higher score corresponds to training data.
- negate the thresholds (computed based on -loss) so that it corresponds to loss.

PiperOrigin-RevId: 519880043
2023-03-27 18:00:25 -07:00
A. Unique TensorFlower
7796369d8b Support gradient norm computation with respect to a subset of variables.
PiperOrigin-RevId: 519245638
2023-03-24 14:57:54 -07:00
Galen Andrew
d5d60e2eac Adds compute_dp_sgd_privacy_statement for accurate privacy accounting report.
PiperOrigin-RevId: 518934979
2023-03-23 12:37:12 -07:00
Walid Krichene
52806ba952 In dp_optimizer_keras_sparse, update iterations to reflect the number of logical batches, rather than physical batches.
In the current behavior, when using gradient accumulation, the `iterations` variable is incremented at every physical batch, while variables are only updated at every logical batch (where logical batch = accumulation_steps many physical batches). This causes certain optimizers that explicitly depend on `iterations` (such as Adam) to behave very differently under gradient accumulation.

With this change, `iterations` is only incremented after each logical batch.

PiperOrigin-RevId: 517197044
2023-03-16 12:35:57 -07:00
A. Unique TensorFlower
7ae50c5ca5 Generalize model_forward_pass() to allow input models with multiple outputs.
PiperOrigin-RevId: 517145254
2023-03-16 09:36:16 -07:00
A. Unique TensorFlower
043e8b5272 Report the true loss in DPModel instead of the norm-adjusted loss.
PiperOrigin-RevId: 517112812
2023-03-16 07:15:13 -07:00
A. Unique TensorFlower
8f4ab1a8bb Allow custom per example loss functions for computing per microbatch gradient norm.
PiperOrigin-RevId: 516897864
2023-03-15 12:28:39 -07:00
Galen Andrew
c2bd4c3c6f Bump version number.
PiperOrigin-RevId: 515456888
2023-03-09 15:22:34 -08:00
Galen Andrew
701a585e1a Revert to dp-accounting 0.3.0 API.
PiperOrigin-RevId: 515432485
2023-03-09 13:56:34 -08:00
Galen Andrew
61dfbcc1f5 Adds functions for more accurate privacy accounting.
Adds function for computation of example-level DP epsilon taking into account microbatching and not assuming Poisson subsampling. Adds function for computation of user-level DP in terms of group privacy.

PiperOrigin-RevId: 515114010
2023-03-08 12:44:39 -08:00
A. Unique TensorFlower
4e1fc252e4 Add a kwargs argument to the registry API + small changes to docstrings.
This is a forward-looking change that is needed to support more complicated
layers, such as `tf.keras.layers.MultiHeadAttention`, which can take `kwargs`
as part of their `.call()` method and can generate arbitrary outputs.

PiperOrigin-RevId: 514775503
2023-03-07 10:35:04 -08:00
Steve Chien
21ee1a607a Fix unneeded dependency.
PiperOrigin-RevId: 514523996
2023-03-06 14:20:33 -08:00
Zheng Xu
0a0f377f3f Adaptive clipping in DP-FTRL with restart.
PiperOrigin-RevId: 513934548
2023-03-06 07:16:57 -08:00
A. Unique TensorFlower
8bfafdd74d Efficient DPSGD with support to microbatched losses.
PiperOrigin-RevId: 513886957
2023-03-06 07:01:03 -08:00
Walid Krichene
cbf34f2b04 Update type annotations of gradient clipping library.
PiperOrigin-RevId: 513640655
2023-03-02 14:29:17 -08:00
A. Unique TensorFlower
7436930c64 Improve documentation and logging of fast gradient clipping modules and callers.
PiperOrigin-RevId: 513283486
2023-03-01 10:56:01 -08:00
Shuang Song
4a418e8862 Adds __init__.py for fast_gradient_clipping.
PiperOrigin-RevId: 512236191
2023-02-24 21:32:07 -08:00
A. Unique TensorFlower
dda7fa8b39 Add a tf.GradientTape argument to the layer registry functions
PiperOrigin-RevId: 512160655
2023-02-24 14:14:36 -08:00
Shuang Song
4dd8d0ffde Catches when data is not sufficient for StratifiedKFold split.
PiperOrigin-RevId: 510197242
2023-02-16 11:24:12 -08:00
Shuang Song
0c691d0b4d Returns None for getting max results when results are empty.
PiperOrigin-RevId: 510054673
2023-02-15 23:37:43 -08:00
A. Unique TensorFlower
13534e5159 Add better tests for clip_grads.py
PiperOrigin-RevId: 509529435
2023-02-14 08:01:56 -08:00
A. Unique TensorFlower
430f103354 Generalize the registry function for the embedding layer for other models.
PiperOrigin-RevId: 509528743
2023-02-14 07:59:10 -08:00
A. Unique TensorFlower
410814ec39 Generalize the internal API to allow for more general models + layers.
PiperOrigin-RevId: 509518753
2023-02-14 07:10:40 -08:00
Shuang Song
6ee988885a Fix a bug in get_flattened_attack_metrics that types, slices, metrics do not
correspond to values because of PPV.

PiperOrigin-RevId: 509274994
2023-02-13 10:53:29 -08:00
A. Unique TensorFlower
9ed34da715 Integrate the fast gradient clipping algorithm with the DP Keras Model class.
PiperOrigin-RevId: 504931452
2023-01-26 13:45:56 -08:00
A. Unique TensorFlower
bc84ed7bfb Add fast gradient clipping tests.
PiperOrigin-RevId: 504923799
2023-01-26 13:16:19 -08:00
A. Unique TensorFlower
a3b14ae20a First implementation of the fast gradient clipping algorithm.
PiperOrigin-RevId: 504668189
2023-01-25 14:51:09 -08:00
Steve Chien
ee3d349a8d Fix copybara removal of tkinter library.
PiperOrigin-RevId: 504656239
2023-01-25 14:06:27 -08:00
Yilei Yang
622282e034 Update dependency on tkinter.
PiperOrigin-RevId: 503401013
2023-01-20 03:24:46 -08:00
Peter Hawkins
3d038a490a [NumPy] Remove references to deprecated NumPy type aliases.
This change replaces references to a number of deprecated NumPy type aliases (np.bool, np.int, np.float, np.complex, np.object, np.str) with their recommended replacement (bool, int, float, complex, object, str).

NumPy 1.24 drops the deprecated aliases, so we must remove uses before updating NumPy.

PiperOrigin-RevId: 497194550
2022-12-22 10:32:59 -08:00
Steve Chien
f99a74c7a4 Fix dependencies required by privacy_tests.
Update `distutils` to `packaging`.

PiperOrigin-RevId: 496713867
2022-12-20 11:49:28 -08:00
Shuang Song
2040f08f0d Allows slicing by custom indices.
PiperOrigin-RevId: 486998645
2022-11-08 11:05:26 -08:00
A. Unique TensorFlower
ec747a8d75 Correct imports of keras loss utils
PiperOrigin-RevId: 486795765
2022-11-07 16:34:00 -08:00
A. Unique TensorFlower
e334633466 Bugfix.
PiperOrigin-RevId: 486344068
2022-11-05 05:18:58 -07:00
Shuang Song
f7e1e61823 Adds a utility function for formating list into string.
PiperOrigin-RevId: 484026229
2022-10-26 11:33:30 -07:00
Shuang Song
7d7b670f5d Add functions to derive epsilon lower bounds.
PiperOrigin-RevId: 484021227
2022-10-26 11:15:47 -07:00
A. Unique TensorFlower
3f16540bfc Efficient DP optimizers for sparse models.
PiperOrigin-RevId: 482871514
2022-10-21 13:15:52 -07:00
Galen Andrew
a7d929a21c Bump version for release.
PiperOrigin-RevId: 482286678
2022-10-19 13:21:35 -07:00
Steve Chien
0fcfd0bf69 Remove pfor dependency in BUILD file, and strengthen unit tests for clip_and_aggregate_gradients.py.
PiperOrigin-RevId: 482050282
2022-10-18 16:21:37 -07:00
Steve Chien
4aa531faa4 Remove dependence on six in clip_and_aggregate_gradients.py.
PiperOrigin-RevId: 481750014
2022-10-17 15:07:27 -07:00
A. Unique TensorFlower
d5538fccbb Ensures DPOptimizer objects can be serialized by TensorFlow.
Handles by processing tensors to numpy. Adds tests to now capture this.

PiperOrigin-RevId: 481656298
2022-10-17 09:12:10 -07:00
A. Unique TensorFlower
c25cb4a41b Clip (per-example) and aggregate gradients.
PiperOrigin-RevId: 480761907
2022-10-12 17:43:21 -07:00
A. Unique TensorFlower
71837fbeec Adds DP-FTRL via tree aggregation optimizer DPFTRLTreeAggregationOptimizer.
Includes renaming of `frequency` parameter in restart_query.py to `period` to more more accurately reflect its purpose.

PiperOrigin-RevId: 480736961
2022-10-12 15:47:07 -07:00
A. Unique TensorFlower
5e37c1bc70 Implement initial_sample_state for TreeRangeSumQuery.
PiperOrigin-RevId: 480685277
2022-10-12 12:11:21 -07:00
A. Unique TensorFlower
79fe32a60b Changes DPOptimizerClass to generically accept and use any dp_sum_query.
This enables creation of generic DPOptimizers by user's passing queries. The most common Gaussian query is automatically performed for convenience and backwards compatibility.

Byproducts of this update:
-ensures consistent implementations between the internal (and legacy) `get_gradients` and newer `_compute_gradients` for all queries.
-refactors for python readability.
-includes new tests ensuring that `_num_microbatches=None` is tested.
-changes the `_global_state` to to be initialized in the init function for `_compute_gradients`.

PiperOrigin-RevId: 480668376
2022-10-12 11:03:55 -07:00
A. Unique TensorFlower
f8ed0fcd9c Fix SumAggregationDPQuery's initial_sample_state raising a ValueError when called on TensorSpec.
PiperOrigin-RevId: 480474975
2022-10-11 16:02:00 -07:00
A. Unique TensorFlower
0738d6f555 Bugfix.
PiperOrigin-RevId: 478591776
2022-10-03 13:33:33 -07:00
A. Unique TensorFlower
3f6d0acdef Add ability to use sample weights to the membership attack models, where they are supported by the underlying Scikit-Learn estimators. Only the Logistic Regression and Random Forest estimators support sample weights.
PiperOrigin-RevId: 478542133
2022-10-03 10:32:31 -07:00
Chen Qian
c6c3334b57 Code changes to get ready for an incoming Keras optimizer migration.
DP optimizer only supports legacy optimizer.

PiperOrigin-RevId: 474137890
2022-09-13 15:20:26 -07:00
Shuang Song
08364adcb7 Allow squared loss to take in labels and predictions of the same number of elements but different shapes.
PiperOrigin-RevId: 474059427
2022-09-13 10:32:58 -07:00
Yilei Yang
ebae6c086e Make this code compatible with Python 3.10.
PiperOrigin-RevId: 473313795
2022-09-09 12:20:05 -07:00
Chen Qian
715fd1a670 Code changes to get ready for an incoming Keras optimizer migration.
Because the code subclasses the legacy Keras optimizer, we should explicitly use the legacy optimizer.

PiperOrigin-RevId: 473092233
2022-09-08 14:56:56 -07:00
Steve Chien
407e5c8e11 Clarify logic in Keras version of DP-SGD optimizer, and add a unit test involving clipping on multiple variables.
PiperOrigin-RevId: 472559697
2022-09-06 14:36:43 -07:00
Steve Chien
ed73077b60 Change version to 0.8.5. (Previously incorrectly skipped ahead to 0.8.6)
PiperOrigin-RevId: 471118348
2022-08-30 16:28:14 -07:00
Steve Chien
875b7f46bd Automated rollback of commit cff47686f6
PiperOrigin-RevId: 471104040
2022-08-30 15:23:08 -07:00
A. Unique TensorFlower
cff47686f6 Changes DPOptimizerClass to generically accept and use any dp_sum_query.
This enables creation of generic DPOptimizers by user's passing queries. The most common Gaussian query is automatically performed for convenience and backwards compatibility.

Byproducts of this update:
-ensures consistent implementations between the internal (and legacy) `get_gradients` and newer `_compute_gradients` for all queries.
-refactors for python readability.

PiperOrigin-RevId: 470883774
2022-08-29 20:22:40 -07:00
Steve Chien
ed16033a92 Update pinned commit of dp-accounting library, update dependency versions, and increase version to 0.8.6.
PiperOrigin-RevId: 470334560
2022-08-26 14:30:16 -07:00
Shuang Song
9f4feade7d Add more documentation for gradient_accumulation_steps in keras optimizer.
PiperOrigin-RevId: 469310667
2022-08-22 16:16:46 -07:00
Steve Chien
9e25eee68b Update remaining DPQuery tests to TF2.
PiperOrigin-RevId: 468793518
2022-08-19 15:08:15 -07:00
Steve Chien
fd64be5b5b Update several DPQuery tests to TF v2.
PiperOrigin-RevId: 468763153
2022-08-19 12:40:49 -07:00
Steve Chien
d6ad59226d Update tests for optimizer classes to TF 2.
PiperOrigin-RevId: 468587323
2022-08-18 17:38:01 -07:00
Steve Chien
5dd11fcdd6 Add import of log_loss in keras_evaluation.py.
PiperOrigin-RevId: 468294581
2022-08-17 14:31:20 -07:00
Michael Reneer
052f9a3128 Update the version of numpy to 1.23.2.
* Updated the numpy version.
* Synced the pandas version.

In Python 3.10, if you invoke `pip install pandas~=1.1.4 numpy~=1.21.4` and then `import pandas` you get the following error:

```
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/venv/lib/python3.10/site-packages/pandas/__init__.py", line 30, in <module>
    from pandas._libs import hashtable as _hashtable, lib as _lib, tslib as _tslib
  File "/tmp/venv/lib/python3.10/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
  File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
```

I believe that this is the cause of the issue https://github.com/scikit-learn-contrib/hdbscan/issues/457#issuecomment-773671043

PiperOrigin-RevId: 467952859
2022-08-16 10:02:07 -07:00
Shuang Song
40d73ed240 Add logging for secret sharer exposure computation.
PiperOrigin-RevId: 467771239
2022-08-15 15:06:42 -07:00
Galen Andrew
8a449aaa27 Correct discrepancy between tensorflow-probability versions in requirements.txt vs setup.py.
PiperOrigin-RevId: 467326193
2022-08-12 17:04:55 -07:00
Galen Andrew
5a9866726d Change requirements for tensorflow-probability and pandas.
PiperOrigin-RevId: 467220343
2022-08-12 08:58:14 -07:00
Galen Andrew
ca077a5b12 Use calibrate_dp_mechanism from differential_privacy library instead of custom binary search.
PiperOrigin-RevId: 466798182
2022-08-10 15:19:44 -07:00
Shuang Song
a9abfbc244 Allow specifying loss function with string.
PiperOrigin-RevId: 465333272
2022-08-04 09:31:28 -07:00
Steve Chien
a8a5206841 Update TFP to version 0.8.2.
PiperOrigin-RevId: 463664333
2022-07-27 13:29:15 -07:00
Steve Chien
848cfc74c1 Add logistic regression functions to API.
PiperOrigin-RevId: 463645193
2022-07-27 12:05:06 -07:00
Shuang Song
17cd0c52bc Refactor: move loss computation utilities under privacy_tests.
PiperOrigin-RevId: 463391913
2022-07-26 11:49:40 -07:00
Michael Reneer
d16f020329 Fix usage of logging API.
PiperOrigin-RevId: 463123944
2022-07-25 10:48:31 -07:00
Galen Andrew
4cb0a11c4b Automated rollback of commit db292fc5d8
PiperOrigin-RevId: 462171425
2022-07-20 10:16:48 -07:00
Steve Chien
38fe4aa984 Changes to prepare for release of v0.8.1.
Update WORKSPACE, setup.py, and requirements.txt to latest dp-accounting library release.

Update scipy version in setup.py.

Update version to 0.8.1.

PiperOrigin-RevId: 461944491
2022-07-19 12:22:07 -07:00