This CL also moves the common embedding `sqr_norm_fn` logic between
`tf.keras.layers.Embedding` and `tfm.nlp.layers.OnDeviceEmbedding` into a
new registry function utility file.
PiperOrigin-RevId: 564481407
excluding the batch dimension.
This is a forward-looking change for testing more general layers such as
`tf.keras.layers.LayerNormalization` and `tf.keras.layers.EinsumDense`.
PiperOrigin-RevId: 560709678
These changes are intended to support a more modular system for when we
add more layer registry functions (and their corresponding tests). They are
also made so that we do not have an enormous number of lengthy tests inside
`clip_grads_test.py`.
PiperOrigin-RevId: 545779495
This change adds the following two new features to the above function:
(i) it supports nested custom layers of depth >2;
(ii) it allows the caller to exclude certain layers from the expansion.
Feature (ii) will be needed for the development of DP models that use
Transformer or BERT-type layers.
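
The two features can be illustrated with a minimal, self-contained sketch. The `Layer` class, the `AttentionBlock` subclass, and the `expand_layers` helper below are hypothetical stand-ins invented for this example. They are not the function changed in this CL, and they assume only that a layer exposes a list of sublayers.

```python
class Layer:
    """Hypothetical stand-in for a layer that may contain sublayers."""

    def __init__(self, name, sublayers=()):
        self.name = name
        self.sublayers = list(sublayers)


class AttentionBlock(Layer):
    """Hypothetical composite layer that a caller may want to keep intact."""


def expand_layers(layer, excluded_types=()):
    """Recursively flattens `layer` into leaf layers.

    Layers whose type is in `excluded_types` are returned as-is instead of
    being expanded, regardless of how deeply they are nested.
    """
    if isinstance(layer, tuple(excluded_types)) or not layer.sublayers:
        return [layer]
    flat = []
    for sub in layer.sublayers:
        flat.extend(expand_layers(sub, excluded_types))
    return flat


# A model nested three levels deep: model -> block -> inner -> dense_*.
model = Layer("model", [
    Layer("block", [
        Layer("inner", [Layer("dense_1"), Layer("dense_2")]),
        AttentionBlock("attn", [Layer("q"), Layer("k")]),
    ]),
])

all_names = [l.name for l in expand_layers(model)]
kept_names = [l.name for l in expand_layers(model, [AttentionBlock])]
```

In this sketch, `all_names` lists every leaf layer, while `kept_names` treats the `AttentionBlock` as a single unit.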
PiperOrigin-RevId: 520919934
sklearn.metrics.roc_curve uses classification rules in the form "score >= threshold ==> predict positive".
When calling roc_curve, we used to label test data as the positive class. With that labeling, TPR = % of test examples classified as test, and FPR = % of training examples classified as test. The classification rule is "loss >= threshold ==> predict test".
For membership inference, TPR is usually defined as the % of training examples classified as training, and FPR as the % of test examples classified as training.
As training samples usually have lower loss, we usually use rules of the form "loss <= threshold ==> predict training".
Therefore, TPR in the 2nd case is actually (1 - FPR) in the 1st case, and FPR in the 2nd case is (1 - TPR) in the 1st case.
This mismatch does not affect attacker advantage or AUC, but it can cause problems for PPV.
Now, we:
- set the training set as the positive class.
- for threshold and entropy attacks, set the score to -loss, so that a higher score corresponds to training data.
- negate the thresholds (computed based on -loss) so that they correspond to loss.
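
The relationship between the two conventions can be checked with a small pure-Python sketch. The losses below are made up for illustration, and the `rates` helper is a hypothetical function written for this example. It applies the same rule form as sklearn.metrics.roc_curve ("score >= threshold ==> predict positive") at a single threshold and is not the library's implementation.

```python
def rates(scores, labels, threshold):
    """Returns (TPR, FPR, PPV) for the rule `score >= threshold => positive`."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    pos = sum(labels)
    neg = len(labels) - pos
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return tp / pos, fp / neg, ppv


# Toy per-example losses (illustrative values only).
train_loss = [0.1, 0.2, 0.3, 0.9]
test_loss = [0.4, 0.8, 1.0, 1.2, 1.5, 0.25]
losses = train_loss + test_loss
is_train = [True] * len(train_loss) + [False] * len(test_loss)
is_test = [not y for y in is_train]

# Chosen so that no loss equals the threshold; ties would break the
# exact complement relationship between the two conventions.
loss_threshold = 0.5

# Old convention: test is the positive class, score = loss.
old_tpr, old_fpr, old_ppv = rates(losses, is_test, loss_threshold)

# New convention: training is the positive class, score = -loss.
scores = [-l for l in losses]
score_threshold = -loss_threshold
new_tpr, new_fpr, new_ppv = rates(scores, is_train, score_threshold)

# Negating the score threshold recovers the threshold in terms of loss.
recovered_loss_threshold = -score_threshold
```

On this toy data the attacker advantage (TPR - FPR) is the same under both conventions, while the two PPV values differ, which is the mismatch described above.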
PiperOrigin-RevId: 519880043
In the current behavior, when using gradient accumulation, the `iterations` variable is incremented at every physical batch, while variables are only updated at every logical batch (where logical batch = accumulation_steps many physical batches). This causes certain optimizers that explicitly depend on `iterations` (such as Adam) to behave very differently under gradient accumulation.
With this change, `iterations` is only incremented after each logical batch.
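
The difference can be seen in a toy counter model. This is a sketch under stated assumptions: `count_steps` is a hypothetical function that only simulates the counting, not the optimizer code, and the bias-correction term `1 - beta_1**t` is the standard one from Adam, used here to show why the value of `iterations` matters.

```python
def count_steps(num_physical_batches, accumulation_steps, per_physical_batch):
    """Simulates how `iterations` and variable updates advance.

    If `per_physical_batch` is True, `iterations` is incremented at every
    physical batch (old behavior). Otherwise it is incremented only when a
    logical batch completes (new behavior).
    """
    iterations = 0
    updates = 0
    for batch_index in range(1, num_physical_batches + 1):
        logical_batch_done = batch_index % accumulation_steps == 0
        if logical_batch_done:
            updates += 1  # variables change only once per logical batch
        if per_physical_batch or logical_batch_done:
            iterations += 1
    return iterations, updates


old_iterations, old_updates = count_steps(8, 4, per_physical_batch=True)
new_iterations, new_updates = count_steps(8, 4, per_physical_batch=False)

# Adam's first-moment bias correction depends on the step count t.
beta_1 = 0.9
old_correction = 1 - beta_1 ** old_iterations
new_correction = 1 - beta_1 ** new_iterations
```

With 8 physical batches and 4 accumulation steps, both runs apply 2 variable updates, but the old counting reports 8 iterations and the new counting reports 2, so the bias correction seen by Adam differs between them.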
PiperOrigin-RevId: 517197044