These changes are intended to support a more modular system for when we
add more layer registry functions (and their corresponding tests). They are
also intended to keep `clip_grads_test.py` from accumulating an enormous number of
lengthy tests.
PiperOrigin-RevId: 545779495
This change adds the following two new features to the above function:
(i) it supports nested custom layers of depth >2;
(ii) it allows the caller to exclude certain layers from the expansion.
Feature (ii) will be needed for the development of DP models that use
Transformer or BERT-type layers. A sketch of the kind of nesting targeted by
feature (i) follows below.
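To make feature (i) concrete, here is a minimal, hypothetical sketch (class
names are illustrative, not part of the library) of a custom layer nested more
than two levels deep; the expansion described above has to recurse through both
custom levels before it reaches a built-in layer with a registry entry:
```
import tensorflow as tf


class MiddleLayer(tf.keras.layers.Layer):
  """Custom layer wrapping a built-in Dense layer (illustrative only)."""

  def __init__(self):
    super().__init__()
    self.dense = tf.keras.layers.Dense(4)

  def call(self, inputs):
    return self.dense(inputs)


class OuterLayer(tf.keras.layers.Layer):
  """Custom layer wrapping another custom layer, giving depth > 2."""

  def __init__(self):
    super().__init__()
    self.middle = MiddleLayer()

  def call(self, inputs):
    return self.middle(inputs)


# Model -> OuterLayer -> MiddleLayer -> Dense: the expansion must descend
# through two custom layers before finding a registered built-in layer.
model = tf.keras.Sequential([tf.keras.Input(shape=(8,)), OuterLayer()])
```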
PiperOrigin-RevId: 520919934
sklearn.metrics.roc_curve uses classification rules in the form "score >= threshold ==> predict positive".
When calling roc_curve, we used to label the test data as the positive class. Under that convention, TPR = % of test examples classified as test and FPR = % of training examples classified as test, with the classification rule "loss >= threshold ==> predict test".
For membership inference, TPR is usually defined as % training examples classified as training, and FPR is % test examples classified as training.
As training samples usually have lower loss, we usually use rules of the form "loss <= threshold ==> predict training".
Therefore, TPR in the 2nd case is actually (1 - FPR) in the 1st case, and FPR in the 2nd case is (1 - TPR) in the 1st case.
This mismatch does not affect attacker advantage or AUC, but it can cause problems for PPV (positive predictive value).
Now, we (see the sketch after this list):
- set the training set as the positive class.
- for threshold and entropy attacks, set the score to be -loss, so that a higher score corresponds to training data.
- negate the thresholds (computed based on -loss) so that they correspond to loss.
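A minimal sketch of that convention (the variable names are illustrative, not
taken from the attack code):
```
import numpy as np
from sklearn.metrics import roc_curve

# Training examples get label 1 (positive class); scores are -loss, so a
# higher score means "more likely to be training data".
loss_train = np.array([0.1, 0.2, 0.3])
loss_test = np.array([0.8, 0.9, 1.0])

labels = np.concatenate([np.ones_like(loss_train), np.zeros_like(loss_test)])
scores = -np.concatenate([loss_train, loss_test])

fpr, tpr, thresholds = roc_curve(labels, scores)

# roc_curve's thresholds are on the -loss scale; negating them gives loss
# thresholds matching rules of the form "loss <= threshold ==> predict training".
loss_thresholds = -thresholds
```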
PiperOrigin-RevId: 519880043
In the current behavior, when using gradient accumulation, the `iterations` variable is incremented at every physical batch, while variables are only updated at every logical batch (where logical batch = accumulation_steps many physical batches). This causes certain optimizers that explicitly depend on `iterations` (such as Adam) to behave very differently under gradient accumulation.
With this change, `iterations` is only incremented after each logical batch.
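A small sketch of the intended invariant (variable names are illustrative; the
exact optimizer arguments in the library may differ):
```
# With k physical batches accumulated into one logical batch, the optimizer's
# `iterations` counter should advance once per actual variable update.
accumulation_steps = 4    # physical batches per logical batch
physical_batches = 20     # physical batches processed during training

# Old behavior: iterations == 20, so optimizers such as Adam that read
# `iterations` (e.g. for bias correction) act as if 20 updates had occurred.
# New behavior: iterations == 20 // 4 == 5, one per logical batch.
expected_iterations = physical_batches // accumulation_steps
assert expected_iterations == 5
```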
PiperOrigin-RevId: 517197044
Adds a function for computing example-level DP epsilon that takes microbatching into account and does not assume Poisson subsampling. Also adds a function for computing user-level DP in terms of group privacy (a sketch of the standard group-privacy bound follows).
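For reference, a minimal sketch of the textbook group-privacy bound (an upper
bound, not necessarily the exact function added here): an (eps, delta) example-level
DP guarantee implies (k*eps, k*exp((k-1)*eps)*delta) user-level DP when each user
contributes at most k examples.
```
import math


def user_level_dp_via_group_privacy(eps, delta, max_examples_per_user):
  """Standard group-privacy conversion from example-level to user-level DP."""
  k = max_examples_per_user
  user_eps = k * eps
  user_delta = k * math.exp((k - 1) * eps) * delta
  return user_eps, user_delta


print(user_level_dp_via_group_privacy(eps=1.0, delta=1e-6, max_examples_per_user=3))
```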
PiperOrigin-RevId: 515114010
This is a forward-looking change that is needed to support more complicated
layers, such as `tf.keras.layers.MultiHeadAttention`, which can take `kwargs`
as part of their `.call()` method and can generate arbitrary outputs.
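As a small illustration (not the registry code itself) of why this is needed,
`tf.keras.layers.MultiHeadAttention` takes keyword arguments in `call()` and can
return a tuple of outputs:
```
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)
x = tf.random.normal([1, 5, 8])
mask = tf.ones([1, 5, 5], dtype=tf.bool)

# `attention_mask` is passed as a kwarg, and requesting the attention scores
# makes the layer return two outputs instead of a single tensor.
outputs, scores = mha(x, x, attention_mask=mask, return_attention_scores=True)
```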
PiperOrigin-RevId: 514775503
This change replaces references to a number of deprecated NumPy type aliases (np.bool, np.int, np.float, np.complex, np.object, np.str) with their recommended replacement (bool, int, float, complex, object, str).
NumPy 1.24 drops the deprecated aliases, so we must remove uses before updating NumPy.
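The replacement is mechanical; for example:
```
import numpy as np

# Before (removed in NumPy 1.24):  np.zeros(10, dtype=np.float)
mask = np.zeros(10, dtype=float)      # same dtype as before (float64)

# Before:  np.array([True, False], dtype=np.bool)
flags = np.array([True, False], dtype=bool)
```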
PiperOrigin-RevId: 497194550
This enables creation of generic DPOptimizers by users passing in their own queries. The most common Gaussian query is still applied automatically by default for convenience and backwards compatibility (a sketch follows the list below).
Byproducts of this update:
-ensures consistent implementations between the internal (and legacy) `get_gradients` and newer `_compute_gradients` for all queries.
-refactors for Python readability.
-includes new tests ensuring that `_num_microbatches=None` is tested.
-changes the `_global_state` to be initialized in the init function for `_compute_gradients`.
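A minimal sketch of the query-passing idea (the optimizer factory name and its
keyword arguments below are assumptions for illustration, not the confirmed
tf_privacy API; only `GaussianSumQuery` is a real class):
```
from tensorflow_privacy.privacy.dp_query import gaussian_query

# Any DPQuery can be supplied; a GaussianSumQuery is shown here, but an
# adaptive-clipping query, for instance, would be passed the same way.
query = gaussian_query.GaussianSumQuery(l2_norm_clip=1.0, stddev=1.1)

# Hypothetical entry point, shown only for shape:
# opt = make_dp_keras_optimizer(tf.keras.optimizers.SGD,
#                               dp_sum_query=query,
#                               num_microbatches=1,
#                               learning_rate=0.1)
```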
PiperOrigin-RevId: 480668376
This enables creation of generic DPOptimizers by users passing in their own queries. The most common Gaussian query is still applied automatically by default for convenience and backwards compatibility.
Byproducts of this update:
-ensures consistent implementations between the internal (and legacy) `get_gradients` and newer `_compute_gradients` for all queries.
-refactors for Python readability.
PiperOrigin-RevId: 470883774
* Updated the numpy version.
* Synced the pandas version.
In Python 3.10, if you invoke `pip install pandas~=1.1.4 numpy~=1.21.4` and then `import pandas`, you get the following error:
```
>>> import pandas
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/tmp/venv/lib/python3.10/site-packages/pandas/__init__.py", line 30, in <module>
from pandas._libs import hashtable as _hashtable, lib as _lib, tslib as _tslib
File "/tmp/venv/lib/python3.10/site-packages/pandas/_libs/__init__.py", line 13, in <module>
from pandas._libs.interval import Interval
File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
```
I believe this is the cause of the issue reported in https://github.com/scikit-learn-contrib/hdbscan/issues/457#issuecomment-773671043
PiperOrigin-RevId: 467952859
Update WORKSPACE, setup.py, and requirements.txt to latest dp-accounting library release.
Update scipy version in setup.py.
Update version to 0.8.1.
PiperOrigin-RevId: 461944491