forked from 626_privacy/tensorflow_privacy
Minor documentation improvements.
PiperOrigin-RevId: 318063707
parent 3658ef5dbc
commit 06765f69f0
2 changed files with 15 additions and 15 deletions
@@ -1,20 +1,20 @@
-# Membership inference attack functionality
+# Membership inference attack
 
-The goal is to provide empirical tests of "how much information a machine
-learning model has remembered about its training data". To this end, only the
-outputs of the model are used (e.g., losses, logits, predictions). From those
-alone, the attacks try to infer whether the corresponding inputs were part of
-the training set.
+A good privacy-preserving model learns from the training data, but
+doesn't memorize it. This library provides empirical tests for measuring
+potential memorization.
 
-> NOTE: Only the loss values are needed for some examples used during training
-> and some examples that have not been used during training (e.g., some examples
-> from the test set). No access to actual input data is needed. In case of
-> classification models, one can additionally (or instead of losses) provide
-> logits or output probabilities for stronger attacks.
+Technically, the tests build classifiers that infer whether a particular sample
+was present in the training set. The more accurate such classifier is, the more
+memorization is present and thus the less privacy-preserving the model is.
 
-The vulnerability of a model is measured via the area under the ROC-curve
-(`auc`) or via max{|fpr - tpr|} (`advantage`) of the attack classifier. These
-measures are very closely related.
+The privacy vulnerability (or memorization potential) is measured
+via the area under the ROC-curve (`auc`) or via max{|fpr - tpr|} (`advantage`)
+of the attack classifier. These measures are very closely related.
 
+The tests provided by the library are "black box". That is, only the outputs of
+the model are used (e.g., losses, logits, predictions). Neither model internals
+(weights) nor input samples are required.
+
 ## Highest level -- get attack summary
 
@@ -242,7 +242,7 @@
 },
 "outputs": [],
 "source": [
-"#@title Calculate logits, probabilities and lossess for training and test sets.\n",
+"#@title Calculate logits, probabilities and loss values for training and test sets.\n",
 "#@markdown We will use these values later in the membership inference attack to\n",
 "#@markdown separate training and test samples.\n",
 "print('Predict on train...')\n",
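
The README text touched by this commit says the attack classifier is scored by the area under the ROC curve (`auc`) or by max{|fpr - tpr|} (`advantage`). As a rough illustration only, the sketch below computes both numbers for a simple loss-threshold attack using plain NumPy. It is not the library's implementation: the function name `threshold_attack_metrics` and the choice of a loss-threshold attack are assumptions made for this example.

```python
import numpy as np


def threshold_attack_metrics(loss_train, loss_test):
    """Return (auc, advantage) of a simple loss-threshold membership attack.

    Hypothetical helper for illustration; not part of the library's API.
    Samples with lower loss are scored as more likely to be training members.
    """
    loss_train = np.asarray(loss_train, dtype=float)
    loss_test = np.asarray(loss_test, dtype=float)

    # Attack score: negative loss, so a higher score means "more likely member".
    scores = np.concatenate([-loss_train, -loss_test])
    is_member = np.concatenate(
        [np.ones(loss_train.size, dtype=bool), np.zeros(loss_test.size, dtype=bool)]
    )

    # Sweep every distinct score as a threshold, from strictest to loosest.
    thresholds = np.unique(scores)[::-1]
    tpr, fpr = [0.0], [0.0]
    for t in thresholds:
        predicted_member = scores >= t
        tpr.append(np.mean(predicted_member[is_member]))
        fpr.append(np.mean(predicted_member[~is_member]))
    tpr = np.array(tpr)
    fpr = np.array(fpr)

    # Area under the ROC curve via the trapezoidal rule.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    # Advantage: largest gap between true and false positive rate.
    advantage = float(np.max(np.abs(tpr - fpr)))
    return auc, advantage


# Perfectly separable losses: the attack is always right.
print(threshold_attack_metrics([0.1, 0.2], [1.0, 2.0]))  # (1.0, 1.0)
# Identical loss distributions: the attack is no better than chance.
print(threshold_attack_metrics([1.0, 2.0], [1.0, 2.0]))  # (0.5, 0.0)
```

Only per-sample loss values are passed in, which matches the "black box" framing in the README: neither model weights nor input samples are needed.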