forked from 626_privacy/tensorflow_privacy

Moving membership_inference_attack to privacy_tests/membership_inference_attack

PiperOrigin-RevId: 377860420

parent eaf9fbf969
commit c12a7acd9d

46 changed files with 2899 additions and 2681 deletions
LICENSE (26 additions)
@@ -200,3 +200,29 @@
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

------------------

Files: privacy/membership_inference_attack/codelabs/third_party/seq2seq_membership_inference/*

MIT License

Copyright (c) 2019 Congzheng Song

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@ -1,269 +1,2 @@
|
||||||
# Membership inference attack
|
The sources from this folder were moved to
|
||||||
|
privacy/privacy_tests/membership_inference_attack.
|
||||||
A good privacy-preserving model learns from the training data, but
|
|
||||||
doesn't memorize it. This library provides empirical tests for measuring
|
|
||||||
potential memorization.
|
|
||||||
|
|
||||||
Technically, the tests build classifiers that infer whether a particular sample
|
|
||||||
was present in the training set. The more accurate such classifier is, the more
|
|
||||||
memorization is present and thus the less privacy-preserving the model is.
|
|
||||||
|
|
||||||
The privacy vulnerability (or memorization potential) is measured
|
|
||||||
via the area under the ROC-curve (`auc`) or via max{|fpr - tpr|} (`advantage`)
|
|
||||||
of the attack classifier. These measures are very closely related.
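
For concreteness, a minimal sketch of how the two measures are computed from
the attack classifier's ROC curve (not part of the library API; the `fpr` and
`tpr` arrays here are hypothetical, e.g. from `sklearn.metrics.roc_curve`):

```python
import numpy as np
from sklearn import metrics

# Hypothetical false/true positive rates of the attack classifier at a set
# of thresholds, e.g. as returned by sklearn.metrics.roc_curve.
fpr = np.array([0.0, 0.1, 0.5, 1.0])
tpr = np.array([0.0, 0.3, 0.8, 1.0])

auc = metrics.auc(fpr, tpr)            # area under the ROC curve
advantage = np.max(np.abs(tpr - fpr))  # max |fpr - tpr| over all thresholds
```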

The tests provided by the library are "black box". That is, only the outputs of
the model are used (e.g., losses, logits, predictions). Neither model internals
(weights) nor input samples are required.

## How to use

### Installation notes

To use the latest version of the MIA library, please install TF Privacy with
"pip install -U git+https://github.com/tensorflow/privacy". See
https://github.com/tensorflow/privacy/issues/151 for more details.

### Basic usage

The simplest possible usage is

```python
from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData

# Suppose we have the labels as integers starting from 0
# labels_train shape: (n_train, )
# labels_test shape: (n_test, )

# Evaluate your model on training and test examples to get
# loss_train shape: (n_train, )
# loss_test shape: (n_test, )

attacks_result = mia.run_attacks(
    AttackInputData(
        loss_train = loss_train,
        loss_test = loss_test,
        labels_train = labels_train,
        labels_test = labels_test))
```

This example calls `run_attacks` with the default options to run a host of
(fairly simple) attacks behind the scenes (depending on which data is fed in),
and computes the most important measures.

> NOTE: The train and test sets are balanced internally, i.e., an equal number
> of in-training and out-of-training examples is chosen for the attacks
> (whichever has fewer examples). These are subsampled uniformly at random
> without replacement from the larger of the two.

Then, we can view the attack results by:

```python
print(attacks_result.summary())
# Example output:
# -> Best-performing attacks over all slices
# THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved an AUC of 0.59 on slice Entire dataset
# THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved an advantage of 0.20 on slice Entire dataset
```

### Other codelabs

Please head over to the [codelabs](https://github.com/tensorflow/privacy/tree/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs)
section for an overview of the library in action.

### Advanced usage

#### Specifying attacks to run

Sometimes we have more information about the data, such as the logits and the
labels, and we may want finer-grained control of the attack, such as using more
complicated classifiers instead of the simple threshold attack, or looking at
the attack results by examples' class.
In those cases, we can provide more information to `run_attacks`.

```python
from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
```

First, similarly to before, we specify the input for the attack as an
`AttackInputData` object:

```python
# Evaluate your model on training and test examples to get
# logits_train shape: (n_train, n_classes)
# logits_test shape: (n_test, n_classes)
# loss_train shape: (n_train, )
# loss_test shape: (n_test, )

attack_input = AttackInputData(
    logits_train = logits_train,
    logits_test = logits_test,
    loss_train = loss_train,
    loss_test = loss_test,
    labels_train = labels_train,
    labels_test = labels_test)
```

Instead of `logits`, you can also specify
`probs_train` and `probs_test` as the predicted probability vectors of each
example.
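
For example, a minimal sketch of passing probabilities instead (deriving them
from the `logits_train` / `logits_test` above is illustrative, not required by
the API; note that logits and probs cannot both be set on the same
`AttackInputData`):

```python
from scipy import special

# Derive per-class probabilities from logits via softmax.
probs_train = special.softmax(logits_train, axis=1)
probs_test = special.softmax(logits_test, axis=1)

attack_input = AttackInputData(
    probs_train = probs_train,
    probs_test = probs_test,
    labels_train = labels_train,
    labels_test = labels_test)
```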

Then, we specify some details of the attack.
The first part includes the specifications of the slicing of the data. For
example, we may want to evaluate the result on the whole dataset, or by class,
percentiles, or the correctness of the model's classification.
These can be specified by a `SlicingSpec` object.

```python
slicing_spec = SlicingSpec(
    entire_dataset = True,
    by_class = True,
    by_percentiles = False,
    by_classification_correctness = True)
```

The second part specifies the classifiers for the attacker to use.
Currently, our API supports five classifiers, including
`AttackType.THRESHOLD_ATTACK` for the simple threshold attack,
`AttackType.LOGISTIC_REGRESSION`,
`AttackType.MULTI_LAYERED_PERCEPTRON`,
`AttackType.RANDOM_FOREST`, and
`AttackType.K_NEAREST_NEIGHBORS`,
which use the corresponding machine learning models.
For some models, different classifiers can yield pretty different results.
We can put multiple classifiers in a list:

```python
attack_types = [
    AttackType.THRESHOLD_ATTACK,
    AttackType.LOGISTIC_REGRESSION
]
```

Now, we can call the `run_attacks` method with all specifications:

```python
attacks_result = mia.run_attacks(attack_input=attack_input,
                                 slicing_spec=slicing_spec,
                                 attack_types=attack_types)
```

This returns an object of type `AttackResults`. We can, for example, use the
following code to see the attack results specified per-slice, as we have
requested attacks by class and by the model's classification correctness.

```python
print(attacks_result.summary(by_slices = True))
# Example output:
# -> Best-performing attacks over all slices
# THRESHOLD_ATTACK achieved an AUC of 0.75 on slice CORRECTLY_CLASSIFIED=False
# THRESHOLD_ATTACK achieved an advantage of 0.38 on slice CORRECTLY_CLASSIFIED=False
#
# Best-performing attacks over slice: "Entire dataset"
# LOGISTIC_REGRESSION achieved an AUC of 0.61
# THRESHOLD_ATTACK achieved an advantage of 0.22
#
# Best-performing attacks over slice: "CLASS=0"
# LOGISTIC_REGRESSION achieved an AUC of 0.62
# LOGISTIC_REGRESSION achieved an advantage of 0.24
#
# Best-performing attacks over slice: "CLASS=1"
# LOGISTIC_REGRESSION achieved an AUC of 0.61
# LOGISTIC_REGRESSION achieved an advantage of 0.19
#
# ...
#
# Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=True"
# LOGISTIC_REGRESSION achieved an AUC of 0.53
# THRESHOLD_ATTACK achieved an advantage of 0.05
#
# Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=False"
# THRESHOLD_ATTACK achieved an AUC of 0.75
# THRESHOLD_ATTACK achieved an advantage of 0.38
```

#### Viewing and plotting the attack results

We have seen an example of using `summary()` to view the attack results as text.
We also provide some other ways for inspecting the attack results.

To get the attack that achieves the maximum attacker advantage or AUC, we can do

```python
max_auc_attacker = attacks_result.get_result_with_max_auc()
max_advantage_attacker = attacks_result.get_result_with_max_attacker_advantage()
```

Then, for an individual attack, such as `max_auc_attacker`, we can check its
type, attacker advantage and AUC by

```python
print("Attack type with max AUC: %s, AUC of %.2f, Attacker advantage of %.2f" %
      (max_auc_attacker.attack_type,
       max_auc_attacker.roc_curve.get_auc(),
       max_auc_attacker.roc_curve.get_attacker_advantage()))
# Example output:
# -> Attack type with max AUC: THRESHOLD_ATTACK, AUC of 0.75, Attacker advantage of 0.38
```

We can also plot its ROC curve by

```python
import tensorflow_privacy.privacy.membership_inference_attack.plotting as plotting

figure = plotting.plot_roc_curve(max_auc_attacker.roc_curve)
```

which would give a figure like the one below:

![roc_fig](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelab_roc_fig.png?raw=true)

Additionally, we provide functionality to convert the attack results into a
Pandas data frame:

```python
import pandas as pd

pd.set_option("display.max_rows", 8, "display.max_columns", None)
print(attacks_result.calculate_pd_dataframe())
# Example output:
# slice feature slice value attack type Attacker advantage AUC
# 0 entire_dataset threshold 0.216440 0.600630
# 1 entire_dataset lr 0.212073 0.612989
# 2 class 0 threshold 0.226000 0.611669
# 3 class 0 lr 0.239452 0.624076
# .. ... ... ... ... ...
# 22 correctly_classified True threshold 0.054907 0.471290
# 23 correctly_classified True lr 0.046986 0.525194
# 24 correctly_classified False threshold 0.379465 0.748138
# 25 correctly_classified False lr 0.370713 0.737148
```

### External guides / press mentions

* [Introductory blog post](https://franziska-boenisch.de/posts/2021/01/membership-inference/)
  to the theory and the library by Franziska Boenisch from the Fraunhofer AISEC
  institute.
* [Google AI Blog Post](https://ai.googleblog.com/2021/01/google-research-looking-back-at-2020.html#ResponsibleAI)
* [TensorFlow Blog Post](https://blog.tensorflow.org/2020/06/introducing-new-privacy-testing-library.html)
* [VentureBeat article](https://venturebeat.com/2020/06/24/google-releases-experimental-tensorflow-module-that-tests-the-privacy-of-ai-models/)
* [Tech Xplore article](https://techxplore.com/news/2020-06-google-tensorflow-privacy-module.html)

## Contact / Feedback

Fill out this
[Google form](https://docs.google.com/forms/d/1DPwr3_OfMcqAOA6sdelTVjIZhKxMZkXvs94z16UCDa4/edit)
or reach out to us at tf-privacy@google.com and let us know how you're using
this module. We're keen on hearing your stories, feedback, and suggestions!

## Contributing

If you wish to add novel attacks to the attack library, please check our
[guidelines](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/CONTRIBUTING.md).

## Copyright

Copyright 2021 - Google LLC

@@ -11,3 +11,13 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""The old location of Membership Inference Attack sources."""

import warnings

warnings.warn(
    "\nMembership inference attack sources were moved. Please replace"
    "\nimport tensorflow_privacy.privacy.membership_inference_attack\n"
    "\nwith"
    "\nimport tensorflow_privacy.privacy.privacy_tests.membership_inference_attack"
)
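
In practice, the fix the warning asks for is a one-line import change; a
minimal sketch (the `mia` alias is illustrative):

```python
# Old, deprecated import path:
# from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia

# New import path after this move:
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
```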

data_structures.py

@@ -13,807 +13,6 @@
# limitations under the License.

# Lint as: python3
"""Data structures representing attack inputs, configuration, outputs."""
"""Moved to privacy_tests/membership_inference_attack."""
import collections
import enum
import glob
import os
import pickle
from typing import Any, Iterable, Union
from dataclasses import dataclass
import numpy as np
import pandas as pd
from scipy import special
from sklearn import metrics
import tensorflow_privacy.privacy.membership_inference_attack.utils as utils

ENTIRE_DATASET_SLICE_STR = 'Entire dataset'
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import *  # pylint: disable=wildcard-import


class SlicingFeature(enum.Enum):
  """Enum with features by which slicing is available."""
  CLASS = 'class'
  PERCENTILE = 'percentile'
  CORRECTLY_CLASSIFIED = 'correctly_classified'


@dataclass
class SingleSliceSpec:
  """Specifies a slice.

  The slice is defined by values in one feature - it might be a single value
  (e.g. a slice of examples of a specific classification class) or some set of
  values (e.g. a range of percentiles of the attacked model loss).

  When feature is None, the slice is the entire dataset.
  """
  feature: SlicingFeature = None
  value: Any = None

  @property
  def entire_dataset(self):
    return self.feature is None

  def __str__(self):
    if self.entire_dataset:
      return ENTIRE_DATASET_SLICE_STR

    if self.feature == SlicingFeature.PERCENTILE:
      return 'Loss percentiles: %d-%d' % self.value

    return '%s=%s' % (self.feature.name, self.value)


@dataclass
class SlicingSpec:
  """Specification of a slicing procedure.

  Each variable which is set specifies a slicing by a different dimension.
  """

  # When set to True, one of the slices is the whole dataset.
  entire_dataset: bool = True

  # Used in classification tasks for slicing by classes. It is assumed that
  # classes are integers 0, 1, ..., num_classes - 1. When True, one slice per
  # class is generated.
  by_class: Union[bool, Iterable[int], int] = False

  # If True, generates 10 slices for percentiles of the loss: 0-10%, 10-20%,
  # ..., 90-100%.
  by_percentiles: bool = False

  # When True, a slice for correctly classified and a slice for misclassified
  # examples will be generated.
  by_classification_correctness: bool = False

  def __str__(self):
    """Only keeps the True values."""
    result = ['SlicingSpec(']
    if self.entire_dataset:
      result.append(' Entire dataset,')
    if self.by_class:
      if isinstance(self.by_class, Iterable):
        result.append(' Into classes %s,' % self.by_class)
      elif isinstance(self.by_class, int):
        result.append(' Up to class %d,' % self.by_class)
      else:
        result.append(' By classes,')
    if self.by_percentiles:
      result.append(' By percentiles,')
    if self.by_classification_correctness:
      result.append(' By classification correctness,')
    result.append(')')
    return '\n'.join(result)


class AttackType(enum.Enum):
  """An enum defining attack types."""
  LOGISTIC_REGRESSION = 'lr'
  MULTI_LAYERED_PERCEPTRON = 'mlp'
  RANDOM_FOREST = 'rf'
  K_NEAREST_NEIGHBORS = 'knn'
  THRESHOLD_ATTACK = 'threshold'
  THRESHOLD_ENTROPY_ATTACK = 'threshold-entropy'

  @property
  def is_trained_attack(self):
    """Returns whether this type of attack requires training a model."""
    return (self != AttackType.THRESHOLD_ATTACK) and (
        self != AttackType.THRESHOLD_ENTROPY_ATTACK)

  def __str__(self):
    """Returns LOGISTIC_REGRESSION instead of AttackType.LOGISTIC_REGRESSION."""
    return '%s' % self.name


class PrivacyMetric(enum.Enum):
  """An enum for the supported privacy risk metrics."""
  AUC = 'AUC'
  ATTACKER_ADVANTAGE = 'Attacker advantage'

  def __str__(self):
    """Returns 'AUC' instead of PrivacyMetric.AUC."""
    return '%s' % self.value


def _is_integer_type_array(a):
  return np.issubdtype(a.dtype, np.integer)


def _is_last_dim_equal(arr1, arr1_name, arr2, arr2_name):
  """Checks whether the last dimension of the arrays is the same."""
  if arr1 is not None and arr2 is not None and arr1.shape[-1] != arr2.shape[-1]:
    raise ValueError('%s and %s should have the same number of features.' %
                     (arr1_name, arr2_name))


def _is_array_one_dimensional(arr, arr_name):
  """Checks whether the array is one dimensional."""
  if arr is not None and len(arr.shape) != 1:
    raise ValueError('%s should be a one dimensional numpy array.' % arr_name)


def _is_np_array(arr, arr_name):
  """Checks whether array is a numpy array."""
  if arr is not None and not isinstance(arr, np.ndarray):
    raise ValueError('%s should be a numpy array.' % arr_name)


def _log_value(probs, small_value=1e-30):
  """Computes the log value on the probability. Clips probabilities close to 0."""
  return -np.log(np.maximum(probs, small_value))


@dataclass
class AttackInputData:
  """Input data for running an attack.

  This includes only the data, and not configuration.
  """

  logits_train: np.ndarray = None
  logits_test: np.ndarray = None

  # Predicted probabilities for each class. They can be derived from logits,
  # so they can be set only if logits are not explicitly provided.
  probs_train: np.ndarray = None
  probs_test: np.ndarray = None

  # Contains ground-truth classes. Classes are assumed to be integers starting
  # from 0.
  labels_train: np.ndarray = None
  labels_test: np.ndarray = None

  # Explicitly specified loss. If provided, this is used instead of deriving
  # loss from logits and labels.
  loss_train: np.ndarray = None
  loss_test: np.ndarray = None

  # Explicitly specified prediction entropy. If provided, this is used instead
  # of deriving entropy from logits and labels
  # (https://arxiv.org/pdf/2003.10595.pdf by Song and Mittal).
  entropy_train: np.ndarray = None
  entropy_test: np.ndarray = None

  @property
  def num_classes(self):
    if self.labels_train is None or self.labels_test is None:
      raise ValueError(
          'Can\'t identify the number of classes as no labels were provided. '
          'Please set labels_train and labels_test')
    return int(max(np.max(self.labels_train), np.max(self.labels_test))) + 1

  @property
  def logits_or_probs_train(self):
    """Returns train logits or probs, whichever is not None."""
    if self.logits_train is not None:
      return self.logits_train
    return self.probs_train

  @property
  def logits_or_probs_test(self):
    """Returns test logits or probs, whichever is not None."""
    if self.logits_test is not None:
      return self.logits_test
    return self.probs_test

  @staticmethod
  def _get_entropy(logits: np.ndarray, true_labels: np.ndarray):
    """Computes the prediction entropy (by Song and Mittal)."""
    if (np.absolute(np.sum(logits, axis=1) - 1) <= 1e-3).all():
      probs = logits
    else:
      # Using softmax to compute probability from logits.
      probs = special.softmax(logits, axis=1)
    if true_labels is None:
      # When not given the ground truth label, we compute the
      # normal prediction entropy.
      # See Equation (7) in https://arxiv.org/pdf/2003.10595.pdf
      return np.sum(np.multiply(probs, _log_value(probs)), axis=1)
    else:
      # When given the ground truth label, we compute the
      # modified prediction entropy.
      # See Equation (8) in https://arxiv.org/pdf/2003.10595.pdf
      log_probs = _log_value(probs)
      reverse_probs = 1 - probs
      log_reverse_probs = _log_value(reverse_probs)
      modified_probs = np.copy(probs)
      modified_probs[range(true_labels.size),
                     true_labels] = reverse_probs[range(true_labels.size),
                                                  true_labels]
      modified_log_probs = np.copy(log_reverse_probs)
      modified_log_probs[range(true_labels.size),
                         true_labels] = log_probs[range(true_labels.size),
                                                  true_labels]
      return np.sum(np.multiply(modified_probs, modified_log_probs), axis=1)
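
  # Written out, the two quantities computed by _get_entropy above are, for a
  # probability vector p and true label y (Equations (7) and (8) of Song and
  # Mittal, https://arxiv.org/pdf/2003.10595.pdf):
  #   H(p)        = -sum_j p_j * log(p_j)
  #   Mentr(p, y) = -(1 - p_y) * log(p_y) - sum_{j != y} p_j * log(1 - p_j)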

  def get_loss_train(self):
    """Calculates (if needed) cross-entropy losses for the training set.

    Returns:
      Loss (or None if neither the loss nor the labels are present).
    """
    if self.loss_train is None:
      if self.labels_train is None:
        return None
      if self.logits_train is not None:
        self.loss_train = utils.log_loss_from_logits(self.labels_train,
                                                     self.logits_train)
      else:
        self.loss_train = utils.log_loss(self.labels_train, self.probs_train)
    return self.loss_train

  def get_loss_test(self):
    """Calculates (if needed) cross-entropy losses for the test set.

    Returns:
      Loss (or None if neither the loss nor the labels are present).
    """
    if self.loss_test is None:
      if self.labels_test is None:
        return None
      if self.logits_test is not None:
        self.loss_test = utils.log_loss_from_logits(self.labels_test,
                                                    self.logits_test)
      else:
        self.loss_test = utils.log_loss(self.labels_test, self.probs_test)
    return self.loss_test

  def get_entropy_train(self):
    """Calculates prediction entropy for the training set."""
    if self.entropy_train is not None:
      return self.entropy_train
    return self._get_entropy(self.logits_train, self.labels_train)

  def get_entropy_test(self):
    """Calculates prediction entropy for the test set."""
    if self.entropy_test is not None:
      return self.entropy_test
    return self._get_entropy(self.logits_test, self.labels_test)

  def get_train_size(self):
    """Returns the size of the training set."""
    if self.loss_train is not None:
      return self.loss_train.size
    if self.entropy_train is not None:
      return self.entropy_train.size
    return self.logits_or_probs_train.shape[0]

  def get_test_size(self):
    """Returns the size of the test set."""
    if self.loss_test is not None:
      return self.loss_test.size
    if self.entropy_test is not None:
      return self.entropy_test.size
    return self.logits_or_probs_test.shape[0]

  def validate(self):
    """Validates the inputs."""
    if (self.loss_train is None) != (self.loss_test is None):
      raise ValueError(
          'loss_test and loss_train should both be either set or unset')

    if (self.entropy_train is None) != (self.entropy_test is None):
      raise ValueError(
          'entropy_test and entropy_train should both be either set or unset')

    if (self.logits_train is None) != (self.logits_test is None):
      raise ValueError(
          'logits_train and logits_test should both be either set or unset')

    if (self.probs_train is None) != (self.probs_test is None):
      raise ValueError(
          'probs_train and probs_test should both be either set or unset')

    if (self.logits_train is not None) and (self.probs_train is not None):
      raise ValueError('Logits and probs can not be both set')

    if (self.labels_train is None) != (self.labels_test is None):
      raise ValueError(
          'labels_train and labels_test should both be either set or unset')

    if (self.labels_train is None and self.loss_train is None and
        self.logits_train is None and self.entropy_train is None):
      raise ValueError(
          'At least one of labels, logits, losses or entropy should be set')

    if self.labels_train is not None and not _is_integer_type_array(
        self.labels_train):
      raise ValueError('labels_train elements should have integer type')

    if self.labels_test is not None and not _is_integer_type_array(
        self.labels_test):
      raise ValueError('labels_test elements should have integer type')

    _is_np_array(self.logits_train, 'logits_train')
    _is_np_array(self.logits_test, 'logits_test')
    _is_np_array(self.probs_train, 'probs_train')
    _is_np_array(self.probs_test, 'probs_test')
    _is_np_array(self.labels_train, 'labels_train')
    _is_np_array(self.labels_test, 'labels_test')
    _is_np_array(self.loss_train, 'loss_train')
    _is_np_array(self.loss_test, 'loss_test')
    _is_np_array(self.entropy_train, 'entropy_train')
    _is_np_array(self.entropy_test, 'entropy_test')

    _is_last_dim_equal(self.logits_train, 'logits_train', self.logits_test,
                       'logits_test')
    _is_last_dim_equal(self.probs_train, 'probs_train', self.probs_test,
                       'probs_test')
    _is_array_one_dimensional(self.loss_train, 'loss_train')
    _is_array_one_dimensional(self.loss_test, 'loss_test')
    _is_array_one_dimensional(self.entropy_train, 'entropy_train')
    _is_array_one_dimensional(self.entropy_test, 'entropy_test')
    _is_array_one_dimensional(self.labels_train, 'labels_train')
    _is_array_one_dimensional(self.labels_test, 'labels_test')

  def __str__(self):
    """Returns the shapes of variables that are not None."""
    result = ['AttackInputData(']
    _append_array_shape(self.loss_train, 'loss_train', result)
    _append_array_shape(self.loss_test, 'loss_test', result)
    _append_array_shape(self.entropy_train, 'entropy_train', result)
    _append_array_shape(self.entropy_test, 'entropy_test', result)
    _append_array_shape(self.logits_train, 'logits_train', result)
    _append_array_shape(self.logits_test, 'logits_test', result)
    _append_array_shape(self.probs_train, 'probs_train', result)
    _append_array_shape(self.probs_test, 'probs_test', result)
    _append_array_shape(self.labels_train, 'labels_train', result)
    _append_array_shape(self.labels_test, 'labels_test', result)
    result.append(')')
    return '\n'.join(result)


def _append_array_shape(arr: np.array, arr_name: str, result):
  if arr is not None:
    result.append(' %s with shape: %s,' % (arr_name, arr.shape))


@dataclass
class RocCurve:
  """Represents the ROC curve of a membership inference classifier."""
  # Thresholds used to define points on the ROC curve.
  # Thresholds are not explicitly part of the curve, and are stored for
  # debugging purposes.
  thresholds: np.ndarray

  # True positive rates based on thresholds
  tpr: np.ndarray

  # False positive rates based on thresholds
  fpr: np.ndarray

  def get_auc(self):
    """Calculates area under curve (aka AUC)."""
    return metrics.auc(self.fpr, self.tpr)

  def get_attacker_advantage(self):
    """Calculates membership attacker's (or adversary's) advantage.

    This metric is inspired by https://arxiv.org/abs/1709.01604, specifically
    by Definition 4. The difference here is that we calculate the maximum
    advantage over all available classifier thresholds.

    Returns:
      a single float number with membership attacker's advantage.
    """
    return max(np.abs(self.tpr - self.fpr))

  def __str__(self):
    """Returns AUC and advantage metrics."""
    return '\n'.join([
        'RocCurve(',
        ' AUC: %.2f' % self.get_auc(),
        ' Attacker advantage: %.2f' % self.get_attacker_advantage(), ')'
    ])


# (no. of training examples, no. of test examples) for the test.
DataSize = collections.namedtuple('DataSize', 'ntrain ntest')


@dataclass
class SingleAttackResult:
  """Results from running a single attack."""

  # Data slice this result was calculated for.
  slice_spec: SingleSliceSpec

  # (no. of training examples, no. of test examples) for the test.
  data_size: DataSize
  attack_type: AttackType

  # NOTE: roc_curve could theoretically be derived from membership scores.
  # Currently, we store it explicitly since not all attack types support
  # membership scores.
  # TODO(b/175870479): Consider deriving the ROC curve from the membership scores.

  # ROC curve representing the accuracy of the attacker
  roc_curve: RocCurve

  # Membership score is some measure of confidence of this attacker that
  # a particular sample is a member of the training set.
  #
  # This is NOT necessarily a probability. The nature of this score depends on
  # the type of attacker. Scores from different attacker types are not directly
  # comparable, but can be compared in relative terms (e.g. considering the
  # order imposed by this measure).

  # Membership scores for the training set samples. For a perfect attacker,
  # all training samples will have higher scores than test samples.
  membership_scores_train: np.ndarray = None

  # Membership scores for the test set samples. For a perfect attacker, all
  # test set samples will have lower scores than the training set samples.
  membership_scores_test: np.ndarray = None

  def get_attacker_advantage(self):
    return self.roc_curve.get_attacker_advantage()

  def get_auc(self):
    return self.roc_curve.get_auc()

  def __str__(self):
    """Returns SliceSpec, AttackType, AUC and advantage metrics."""
    return '\n'.join([
        'SingleAttackResult(',
        ' SliceSpec: %s' % str(self.slice_spec),
        ' DataSize: (ntrain=%d, ntest=%d)' % (self.data_size.ntrain,
                                              self.data_size.ntest),
        ' AttackType: %s' % str(self.attack_type),
        ' AUC: %.2f' % self.get_auc(),
        ' Attacker advantage: %.2f' % self.get_attacker_advantage(), ')'
    ])


@dataclass
class SingleMembershipProbabilityResult:
  """Results from computing membership probabilities (denoted as privacy risk score in https://arxiv.org/abs/2003.10595).

  This part shows how to leverage membership probabilities to perform attacks
  with thresholding on them.
  """

  # Data slice this result was calculated for.
  slice_spec: SingleSliceSpec

  train_membership_probs: np.ndarray

  test_membership_probs: np.ndarray

  def attack_with_varied_thresholds(self, threshold_list):
    """Performs an attack with the specified thresholds.

    For each threshold value, we count how many training and test samples have
    membership probabilities larger than the threshold and further compute
    precision and recall values. We skip a threshold value if it is larger
    than every sample's membership probability.

    Args:
      threshold_list: List of provided thresholds

    Returns:
      Arrays of the meaningful thresholds and the corresponding precision and
      recall values.
    """
    fpr, tpr, thresholds = metrics.roc_curve(
        np.concatenate((np.ones(len(self.train_membership_probs)),
                        np.zeros(len(self.test_membership_probs)))),
        np.concatenate(
            (self.train_membership_probs, self.test_membership_probs)),
        drop_intermediate=False)

    precision_list = []
    recall_list = []
    meaningful_threshold_list = []
    max_prob = max(self.train_membership_probs.max(),
                   self.test_membership_probs.max())
    for threshold in threshold_list:
      if threshold <= max_prob:
        idx = np.argwhere(thresholds >= threshold)[-1][0]
        meaningful_threshold_list.append(threshold)
        precision_list.append(tpr[idx] / (tpr[idx] + fpr[idx]))
        recall_list.append(tpr[idx])

    return np.array(meaningful_threshold_list), np.array(
        precision_list), np.array(recall_list)

  def collect_results(self, threshold_list, return_roc_results=True):
    """The membership probability (from 0 to 1) represents each sample's probability of being in the training set.

    Usually, we choose a list of threshold values from 0.5 (uncertain of
    training or test) to 1 (100% certain of training)
    to compute the corresponding attack precision and recall.

    Args:
      threshold_list: List of provided thresholds
      return_roc_results: Whether to return ROC results

    Returns:
      Summary string.
    """
    meaningful_threshold_list, precision_list, recall_list = self.attack_with_varied_thresholds(
        threshold_list)
    summary = []
    summary.append('\nMembership probability analysis over slice: \"%s\"' %
                   str(self.slice_spec))
    for i in range(len(meaningful_threshold_list)):
      summary.append(
          ' with %.4f as the threshold on membership probability, the precision-recall pair is (%.4f, %.4f)'
          % (meaningful_threshold_list[i], precision_list[i], recall_list[i]))
    if return_roc_results:
      fpr, tpr, thresholds = metrics.roc_curve(
          np.concatenate((np.ones(len(self.train_membership_probs)),
                          np.zeros(len(self.test_membership_probs)))),
          np.concatenate(
              (self.train_membership_probs, self.test_membership_probs)))
      roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
      summary.append(
          ' thresholding on membership probability achieved an AUC of %.2f' %
          (roc_curve.get_auc()))
      summary.append(
          ' thresholding on membership probability achieved an advantage of %.2f'
          % (roc_curve.get_attacker_advantage()))
    return summary


@dataclass
class MembershipProbabilityResults:
  """Membership probability results from multiple data slices."""

  membership_prob_results: Iterable[SingleMembershipProbabilityResult]

  def summary(self, threshold_list):
    """Returns the summary of membership probability analyses on all slices."""
    summary = []
    for single_result in self.membership_prob_results:
      single_summary = single_result.collect_results(threshold_list)
      summary.extend(single_summary)
    return '\n'.join(summary)


@dataclass
class PrivacyReportMetadata:
  """Metadata about the evaluated model.

  Used to create a privacy report based on AttackResults.
  """
  accuracy_train: float = None
  accuracy_test: float = None

  loss_train: float = None
  loss_test: float = None

  model_variant_label: str = 'Default model variant'
  epoch_num: int = None


class AttackResultsDFColumns(enum.Enum):
  """Columns for the Pandas DataFrame that stores AttackResults metrics."""
  SLICE_FEATURE = 'slice feature'
  SLICE_VALUE = 'slice value'
  DATA_SIZE_TRAIN = 'train size'
  DATA_SIZE_TEST = 'test size'
  ATTACK_TYPE = 'attack type'

  def __str__(self):
    """Returns 'slice value' instead of AttackResultsDFColumns.SLICE_VALUE."""
    return '%s' % self.value


@dataclass
class AttackResults:
  """Results from running multiple attacks."""
  single_attack_results: Iterable[SingleAttackResult]

  privacy_report_metadata: PrivacyReportMetadata = None

  def calculate_pd_dataframe(self):
    """Returns all metrics as a Pandas DataFrame."""
    slice_features = []
    slice_values = []
    data_size_train = []
    data_size_test = []
    attack_types = []
    advantages = []
    aucs = []

    for attack_result in self.single_attack_results:
      slice_spec = attack_result.slice_spec
      if slice_spec.entire_dataset:
        slice_feature, slice_value = str(slice_spec), ''
      else:
        slice_feature, slice_value = slice_spec.feature.value, slice_spec.value
      slice_features.append(str(slice_feature))
      slice_values.append(str(slice_value))
      data_size_train.append(attack_result.data_size.ntrain)
      data_size_test.append(attack_result.data_size.ntest)
      attack_types.append(str(attack_result.attack_type))
      advantages.append(float(attack_result.get_attacker_advantage()))
      aucs.append(float(attack_result.get_auc()))

    df = pd.DataFrame({
        str(AttackResultsDFColumns.SLICE_FEATURE): slice_features,
        str(AttackResultsDFColumns.SLICE_VALUE): slice_values,
        str(AttackResultsDFColumns.DATA_SIZE_TRAIN): data_size_train,
        str(AttackResultsDFColumns.DATA_SIZE_TEST): data_size_test,
        str(AttackResultsDFColumns.ATTACK_TYPE): attack_types,
        str(PrivacyMetric.ATTACKER_ADVANTAGE): advantages,
        str(PrivacyMetric.AUC): aucs
    })
    return df

  def summary(self, by_slices=False) -> str:
    """Provides a summary of the metrics.

    The summary provides the best-performing attacks for each requested data
    slice.

    Args:
      by_slices: whether to prepare a per-slice summary.

    Returns:
      A string with a summary of all the metrics.
    """
    summary = []

    # Summary over all slices
    max_auc_result_all = self.get_result_with_max_auc()
    summary.append('Best-performing attacks over all slices')
    summary.append(
        ' %s (with %d training and %d test examples) achieved an AUC of %.2f on slice %s'
        % (max_auc_result_all.attack_type,
           max_auc_result_all.data_size.ntrain,
           max_auc_result_all.data_size.ntest,
           max_auc_result_all.get_auc(),
           max_auc_result_all.slice_spec))

    max_advantage_result_all = self.get_result_with_max_attacker_advantage()
    summary.append(
        ' %s (with %d training and %d test examples) achieved an advantage of %.2f on slice %s'
        % (max_advantage_result_all.attack_type,
           max_advantage_result_all.data_size.ntrain,
           max_advantage_result_all.data_size.ntest,
           max_advantage_result_all.get_attacker_advantage(),
           max_advantage_result_all.slice_spec))

    slice_dict = self._group_results_by_slice()

    if by_slices and len(slice_dict.keys()) > 1:
      for slice_str in slice_dict:
        results = slice_dict[slice_str]
        summary.append('\nBest-performing attacks over slice: \"%s\"' %
                       slice_str)
        max_auc_result = results.get_result_with_max_auc()
        summary.append(
            ' %s (with %d training and %d test examples) achieved an AUC of %.2f'
            % (max_auc_result.attack_type,
               max_auc_result.data_size.ntrain,
               max_auc_result.data_size.ntest,
               max_auc_result.get_auc()))
        max_advantage_result = results.get_result_with_max_attacker_advantage()
        summary.append(
            ' %s (with %d training and %d test examples) achieved an advantage of %.2f'
            % (max_advantage_result.attack_type,
               max_advantage_result.data_size.ntrain,
               max_advantage_result.data_size.ntest,
               max_advantage_result.get_attacker_advantage()))

    return '\n'.join(summary)

  def _group_results_by_slice(self):
    """Groups AttackResults into a dictionary keyed by the slice."""
    slice_dict = {}
    for attack_result in self.single_attack_results:
      slice_str = str(attack_result.slice_spec)
      if slice_str not in slice_dict:
        slice_dict[slice_str] = AttackResults([])
      slice_dict[slice_str].single_attack_results.append(attack_result)
    return slice_dict

  def get_result_with_max_auc(self) -> SingleAttackResult:
    """Gets the result with maximum AUC over all attacks and slices."""
    aucs = [result.get_auc() for result in self.single_attack_results]

    if min(aucs) < 0.4:
      print('Suspiciously low AUC detected: %.2f. '
            'There might be a bug in the classifier' % min(aucs))

    return self.single_attack_results[np.argmax(aucs)]

  def get_result_with_max_attacker_advantage(self) -> SingleAttackResult:
    """Gets the result with maximum advantage over all attacks and slices."""
    return self.single_attack_results[np.argmax([
        result.get_attacker_advantage() for result in self.single_attack_results
    ])]

  def save(self, filepath):
    """Saves self to a pickle file."""
    with open(filepath, 'wb') as out:
      pickle.dump(self, out)

  @classmethod
  def load(cls, filepath):
    """Loads AttackResults from a pickle file."""
    with open(filepath, 'rb') as inp:
      return pickle.load(inp)
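
# A minimal usage sketch (the path is hypothetical): AttackResults can be
# persisted across runs with
#   attacks_result.save('/tmp/attack_results.pickle')
#   loaded_results = AttackResults.load('/tmp/attack_results.pickle')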


@dataclass
class AttackResultsCollection:
  """A collection of AttackResults."""
  attack_results_list: Iterable[AttackResults]

  def append(self, attack_results: AttackResults):
    self.attack_results_list.append(attack_results)

  def save(self, dirname):
    """Saves each AttackResults object to a separate pickle file in dirname."""
    for i, attack_results in enumerate(self.attack_results_list):
      filepath = os.path.join(dirname,
                              _get_attack_results_filename(attack_results, i))

      attack_results.save(filepath)

  @classmethod
  def load(cls, dirname):
    """Loads AttackResultsCollection from all files in a directory."""
    loaded_collection = AttackResultsCollection([])
    for filepath in sorted(glob.glob('%s/*' % dirname)):
      with open(filepath, 'rb') as inp:
        loaded_collection.attack_results_list.append(pickle.load(inp))
    return loaded_collection


def _get_attack_results_filename(attack_results: AttackResults, index: int):
  """Creates a filename for a specific set of AttackResults."""
  metadata = attack_results.privacy_report_metadata
  if metadata is not None:
    return '%s_%s_epoch_%s.pickle' % (metadata.model_variant_label, index,
                                      metadata.epoch_num)
  return '%s.pickle' % index


def get_flattened_attack_metrics(results: AttackResults):
  """Gets flattened attack metrics.

  Args:
    results: membership inference attack results.

  Returns:
    types: a list of attack types
    slices: a list of slices
    attack_metrics: a list of metric names
    values: a list of metric values; the i-th element corresponds to the i-th
      entries of types, slices and attack_metrics.
  """
  types = []
  slices = []
  attack_metrics = []
  values = []
  for attack_result in results.single_attack_results:
    types += [str(attack_result.attack_type)] * 2
    slices += [str(attack_result.slice_spec)] * 2
    attack_metrics += ['adv', 'auc']
    values += [float(attack_result.get_attacker_advantage()),
               float(attack_result.get_auc())]
  return types, slices, attack_metrics, values
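
# A minimal usage sketch: flatten the results into parallel lists, e.g. for
# logging (attacks_result is hypothetical here):
#   types, slices, attack_metrics, values = get_flattened_attack_metrics(attacks_result)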


dataset_slicing.py

@@ -13,136 +13,6 @@
# limitations under the License.

# Lint as: python3
"""Specifying and creating AttackInputData slices."""
"""Moved to privacy_tests/membership_inference_attack."""

import collections
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing import *  # pylint: disable=wildcard-import
import copy
from typing import List

import numpy as np
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingFeature
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec


def _slice_if_not_none(a, idx):
  return None if a is None else a[idx]


def _slice_data_by_indices(data: AttackInputData, idx_train,
                           idx_test) -> AttackInputData:
  """Slices train fields with idx_train and test fields with idx_test."""

  result = AttackInputData()

  # Slice train data.
  result.logits_train = _slice_if_not_none(data.logits_train, idx_train)
  result.probs_train = _slice_if_not_none(data.probs_train, idx_train)
  result.labels_train = _slice_if_not_none(data.labels_train, idx_train)
  result.loss_train = _slice_if_not_none(data.loss_train, idx_train)
  result.entropy_train = _slice_if_not_none(data.entropy_train, idx_train)

  # Slice test data.
  result.logits_test = _slice_if_not_none(data.logits_test, idx_test)
  result.probs_test = _slice_if_not_none(data.probs_test, idx_test)
  result.labels_test = _slice_if_not_none(data.labels_test, idx_test)
  result.loss_test = _slice_if_not_none(data.loss_test, idx_test)
  result.entropy_test = _slice_if_not_none(data.entropy_test, idx_test)

  return result


def _slice_by_class(data: AttackInputData, class_value: int) -> AttackInputData:
  idx_train = data.labels_train == class_value
  idx_test = data.labels_test == class_value
  return _slice_data_by_indices(data, idx_train, idx_test)


def _slice_by_percentiles(data: AttackInputData, from_percentile: float,
                          to_percentile: float):
  """Slices samples by loss percentiles."""

  # Find the from_percentile and to_percentile percentiles in losses.
  loss_train = data.get_loss_train()
  loss_test = data.get_loss_test()
  losses = np.concatenate((loss_train, loss_test))
  from_loss = np.percentile(losses, from_percentile)
  to_loss = np.percentile(losses, to_percentile)

  idx_train = (from_loss <= loss_train) & (loss_train <= to_loss)
  idx_test = (from_loss <= loss_test) & (loss_test <= to_loss)

  return _slice_data_by_indices(data, idx_train, idx_test)


def _indices_by_classification(logits_or_probs, labels, correctly_classified):
  idx_correct = labels == np.argmax(logits_or_probs, axis=1)
  return idx_correct if correctly_classified else np.invert(idx_correct)


def _slice_by_classification_correctness(data: AttackInputData,
                                         correctly_classified: bool):
  idx_train = _indices_by_classification(data.logits_or_probs_train,
                                         data.labels_train,
                                         correctly_classified)
  idx_test = _indices_by_classification(data.logits_or_probs_test,
                                        data.labels_test, correctly_classified)
  return _slice_data_by_indices(data, idx_train, idx_test)


def get_single_slice_specs(slicing_spec: SlicingSpec,
                           num_classes: int = None) -> List[SingleSliceSpec]:
  """Returns slices of data according to slicing_spec."""
  result = []

  if slicing_spec.entire_dataset:
    result.append(SingleSliceSpec())

  # Create slices by class.
  by_class = slicing_spec.by_class
  if isinstance(by_class, bool):
    if by_class:
      assert num_classes, "When by_class == True, num_classes should be given."
      assert 0 <= num_classes <= 1000, (
          f"Too many classes for slicing by classes. "
          f"Found {num_classes}.")
      for c in range(num_classes):
        result.append(SingleSliceSpec(SlicingFeature.CLASS, c))
  elif isinstance(by_class, int):
    result.append(SingleSliceSpec(SlicingFeature.CLASS, by_class))
  elif isinstance(by_class, collections.Iterable):
    for c in by_class:
      result.append(SingleSliceSpec(SlicingFeature.CLASS, c))

  # Create slices by percentiles.
  if slicing_spec.by_percentiles:
    for percent in range(0, 100, 10):
      result.append(
          SingleSliceSpec(SlicingFeature.PERCENTILE, (percent, percent + 10)))

  # Create slices by correctness of the classifications.
  if slicing_spec.by_classification_correctness:
    result.append(SingleSliceSpec(SlicingFeature.CORRECTLY_CLASSIFIED, True))
    result.append(SingleSliceSpec(SlicingFeature.CORRECTLY_CLASSIFIED, False))

  return result


def get_slice(data: AttackInputData,
              slice_spec: SingleSliceSpec) -> AttackInputData:
  """Returns a single slice of data according to slice_spec."""
  if slice_spec.entire_dataset:
    data_slice = copy.copy(data)
  elif slice_spec.feature == SlicingFeature.CLASS:
    data_slice = _slice_by_class(data, slice_spec.value)
  elif slice_spec.feature == SlicingFeature.PERCENTILE:
    from_percentile, to_percentile = slice_spec.value
    data_slice = _slice_by_percentiles(data, from_percentile, to_percentile)
  elif slice_spec.feature == SlicingFeature.CORRECTLY_CLASSIFIED:
    data_slice = _slice_by_classification_correctness(data, slice_spec.value)
  else:
    raise ValueError('Unknown slice spec feature "%s"' % slice_spec.feature)

  data_slice.slice_spec = slice_spec
  return data_slice
|
|
||||||
|
|
|
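
For orientation, here is a minimal sketch of how the slicing helpers above fit together. It is a sketch only: the import paths assume the destination package this commit moves the code into, and the arrays are toys.

```python
# A minimal sketch, assuming the post-move import paths from this commit.
import numpy as np
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import (
    AttackInputData, SlicingSpec)
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing import (
    get_single_slice_specs, get_slice)

# Toy model outputs for 4 train (member) and 4 test (non-member) examples
# over 3 classes.
attack_input = AttackInputData(
    logits_train=np.random.randn(4, 3), logits_test=np.random.randn(4, 3),
    labels_train=np.array([0, 1, 2, 0]), labels_test=np.array([2, 1, 0, 0]))

# Request the entire dataset plus one slice per class.
spec = SlicingSpec(entire_dataset=True, by_class=True)
for single_spec in get_single_slice_specs(spec, num_classes=3):
  # Each slice is again an AttackInputData, restricted to the slice.
  data_slice = get_slice(attack_input, single_spec)
```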
@@ -13,129 +13,6 @@
 # limitations under the License.
 
 # Lint as: python3
-"""A callback and a function in keras for membership inference attack."""
+"""Moved to privacy_attack/membership_inference_attack."""
 
-import os
-from typing import Iterable
-
-from absl import logging
-
-import tensorflow as tf
-
-from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
-from tensorflow_privacy.privacy.membership_inference_attack.utils import log_loss
-from tensorflow_privacy.privacy.membership_inference_attack.utils_tensorboard import write_results_to_tensorboard_tf2 as write_results_to_tensorboard
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.keras_evaluation import *  # pylint: disable=wildcard-import
-
-
-def calculate_losses(model, data, labels):
-  """Calculate losses of model prediction on data, provided true labels.
-
-  Args:
-    model: model to make prediction
-    data: samples
-    labels: true labels of samples (integer valued)
-
-  Returns:
-    preds: probability vector of each sample
-    loss: cross entropy loss of each sample
-  """
-  pred = model.predict(data)
-  loss = log_loss(labels, pred)
-  return pred, loss
-
-
-class MembershipInferenceCallback(tf.keras.callbacks.Callback):
-  """Callback to perform membership inference attack on epoch end."""
-
-  def __init__(
-      self,
-      in_train, out_train,
-      slicing_spec: SlicingSpec = None,
-      attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,),
-      tensorboard_dir=None,
-      tensorboard_merge_classifiers=False):
-    """Initializes the callback.
-
-    Args:
-      in_train: (in_training samples, in_training labels)
-      out_train: (out_training samples, out_training labels)
-      slicing_spec: slicing specification of the attack
-      attack_types: a list of attacks, each of type AttackType
-      tensorboard_dir: directory for tensorboard summary
-      tensorboard_merge_classifiers: if true, plot different classifiers with
-        the same slicing_spec and metric in the same figure
-    """
-    self._in_train_data, self._in_train_labels = in_train
-    self._out_train_data, self._out_train_labels = out_train
-    self._slicing_spec = slicing_spec
-    self._attack_types = attack_types
-    self._tensorboard_merge_classifiers = tensorboard_merge_classifiers
-    if tensorboard_dir:
-      if tensorboard_merge_classifiers:
-        self._writers = {}
-        for attack_type in attack_types:
-          self._writers[attack_type.name] = tf.summary.create_file_writer(
-              os.path.join(tensorboard_dir, 'MI', attack_type.name))
-      else:
-        self._writers = tf.summary.create_file_writer(
-            os.path.join(tensorboard_dir, 'MI'))
-      logging.info('Will write to tensorboard.')
-    else:
-      self._writers = None
-
-  def on_epoch_end(self, epoch, logs=None):
-    results = run_attack_on_keras_model(
-        self.model,
-        (self._in_train_data, self._in_train_labels),
-        (self._out_train_data, self._out_train_labels),
-        self._slicing_spec,
-        self._attack_types)
-    logging.info(results)
-
-    att_types, att_slices, att_metrics, att_values = get_flattened_attack_metrics(
-        results)
-    print('Attack result:')
-    print('\n'.join(['  %s: %.4f' % (', '.join([s, t, m]), v) for t, s, m, v in
-                     zip(att_types, att_slices, att_metrics, att_values)]))
-
-    # Write to tensorboard if tensorboard_dir is specified
-    if self._writers is not None:
-      write_results_to_tensorboard(results, self._writers, epoch,
-                                   self._tensorboard_merge_classifiers)
-
-
-def run_attack_on_keras_model(
-    model, in_train, out_train,
-    slicing_spec: SlicingSpec = None,
-    attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,)):
-  """Performs the attack on a trained model.
-
-  Args:
-    model: model to be tested
-    in_train: a (in_training samples, in_training labels) tuple
-    out_train: a (out_training samples, out_training labels) tuple
-    slicing_spec: slicing specification of the attack
-    attack_types: a list of attacks, each of type AttackType
-
-  Returns:
-    Results of the attack
-  """
-  in_train_data, in_train_labels = in_train
-  out_train_data, out_train_labels = out_train
-
-  # Compute predictions and losses
-  in_train_pred, in_train_loss = calculate_losses(model, in_train_data,
-                                                  in_train_labels)
-  out_train_pred, out_train_loss = calculate_losses(model, out_train_data,
-                                                    out_train_labels)
-  attack_input = AttackInputData(
-      logits_train=in_train_pred, logits_test=out_train_pred,
-      labels_train=in_train_labels, labels_test=out_train_labels,
-      loss_train=in_train_loss, loss_test=out_train_loss
-  )
-  results = mia.run_attacks(attack_input,
-                            slicing_spec=slicing_spec,
-                            attack_types=attack_types)
-  return results
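
A hedged usage sketch for the Keras evaluation utilities above, on a toy model with random data; the import path assumes this commit's destination package, and the data and model are placeholders:

```python
# A minimal sketch, assuming the post-move import paths from this commit.
import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.keras_evaluation import (
    MembershipInferenceCallback, run_attack_on_keras_model)

# Placeholder data: members (train) and non-members (test).
x_train = np.random.randn(100, 5).astype(np.float32)
y_train = np.random.randint(10, size=100)
x_test = np.random.randn(100, 5).astype(np.float32)
y_test = np.random.randint(10, size=100)

model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation='softmax')])
model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy')

# Run the attack at the end of every epoch...
callback = MembershipInferenceCallback(
    (x_train, y_train), (x_test, y_test),
    attack_types=[AttackType.THRESHOLD_ATTACK])
model.fit(x_train, y_train, epochs=3, callbacks=[callback])

# ...or once, after training has finished.
results = run_attack_on_keras_model(
    model, (x_train, y_train), (x_test, y_test))
```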
@@ -13,320 +13,6 @@
 # limitations under the License.
 
 # Lint as: python3
-"""Code that runs membership inference attacks based on the model outputs.
-
-This file belongs to the new API for membership inference attacks. This file
-will be renamed to membership_inference_attack.py after the old API is removed.
-"""
+"""Moved to privacy_attack/membership_inference_attack."""
 
-from typing import Iterable
-
-import numpy as np
-from sklearn import metrics
-
-from tensorflow_privacy.privacy.membership_inference_attack import models
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import DataSize
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import MembershipProbabilityResults
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import PrivacyReportMetadata
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import RocCurve
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleAttackResult
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleMembershipProbabilityResult
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
-from tensorflow_privacy.privacy.membership_inference_attack.dataset_slicing import get_single_slice_specs
-from tensorflow_privacy.privacy.membership_inference_attack.dataset_slicing import get_slice
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.membership_inference_attack import *  # pylint: disable=wildcard-import
-
-
-def _get_slice_spec(data: AttackInputData) -> SingleSliceSpec:
-  if hasattr(data, 'slice_spec'):
-    return data.slice_spec
-  return SingleSliceSpec()
-
-
-def _run_trained_attack(attack_input: AttackInputData,
-                        attack_type: AttackType,
-                        balance_attacker_training: bool = True):
-  """Classification attack done by ML models."""
-  attacker = None
-
-  if attack_type == AttackType.LOGISTIC_REGRESSION:
-    attacker = models.LogisticRegressionAttacker()
-  elif attack_type == AttackType.MULTI_LAYERED_PERCEPTRON:
-    attacker = models.MultilayerPerceptronAttacker()
-  elif attack_type == AttackType.RANDOM_FOREST:
-    attacker = models.RandomForestAttacker()
-  elif attack_type == AttackType.K_NEAREST_NEIGHBORS:
-    attacker = models.KNearestNeighborsAttacker()
-  else:
-    raise NotImplementedError('Attack type %s not implemented yet.' %
-                              attack_type)
-
-  prepared_attacker_data = models.create_attacker_data(
-      attack_input, balance=balance_attacker_training)
-
-  attacker.train_model(prepared_attacker_data.features_train,
-                       prepared_attacker_data.is_training_labels_train)
-
-  # Run the attacker on (permuted) test examples.
-  predictions_test = attacker.predict(prepared_attacker_data.features_test)
-
-  # Generate ROC curves with predictions.
-  fpr, tpr, thresholds = metrics.roc_curve(
-      prepared_attacker_data.is_training_labels_test, predictions_test)
-
-  roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
-
-  # NOTE: In the current setup we can't obtain membership scores for all
-  # samples, since some of them were used to train the attacker. This can be
-  # fixed by training several attackers to ensure each sample was left out
-  # in exactly one attacker (basically, this means performing cross-validation).
-  # TODO(b/175870479): Implement membership scores for predicted attackers.
-
-  return SingleAttackResult(
-      slice_spec=_get_slice_spec(attack_input),
-      data_size=prepared_attacker_data.data_size,
-      attack_type=attack_type,
-      roc_curve=roc_curve)
-
-
-def _run_threshold_attack(attack_input: AttackInputData):
-  """Runs a threshold attack on loss."""
-  ntrain, ntest = attack_input.get_train_size(), attack_input.get_test_size()
-  loss_train = attack_input.get_loss_train()
-  loss_test = attack_input.get_loss_test()
-  if loss_train is None or loss_test is None:
-    raise ValueError('Not possible to run threshold attack without losses.')
-  fpr, tpr, thresholds = metrics.roc_curve(
-      np.concatenate((np.zeros(ntrain), np.ones(ntest))),
-      np.concatenate((loss_train, loss_test)))
-
-  roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
-
-  return SingleAttackResult(
-      slice_spec=_get_slice_spec(attack_input),
-      data_size=DataSize(ntrain=ntrain, ntest=ntest),
-      attack_type=AttackType.THRESHOLD_ATTACK,
-      membership_scores_train=-attack_input.get_loss_train(),
-      membership_scores_test=-attack_input.get_loss_test(),
-      roc_curve=roc_curve)
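
The loss-threshold attack above is just an ROC sweep over per-example losses. A self-contained sketch of the same computation on synthetic losses, using plain numpy/sklearn:

```python
import numpy as np
from sklearn import metrics

rng = np.random.default_rng(0)
# In a memorizing model, members tend to have lower loss than non-members.
loss_train = rng.exponential(scale=0.5, size=1000)  # members
loss_test = rng.exponential(scale=1.0, size=1000)   # non-members

# Label members 0 and non-members 1, then sweep a threshold over the loss.
labels = np.concatenate((np.zeros(1000), np.ones(1000)))
scores = np.concatenate((loss_train, loss_test))
fpr, tpr, _ = metrics.roc_curve(labels, scores)

auc = metrics.auc(fpr, tpr)            # vulnerability measured as AUC
advantage = np.max(np.abs(tpr - fpr))  # attacker advantage
print(f'AUC={auc:.3f}, advantage={advantage:.3f}')
```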
-def _run_threshold_entropy_attack(attack_input: AttackInputData):
-  ntrain, ntest = attack_input.get_train_size(), attack_input.get_test_size()
-  fpr, tpr, thresholds = metrics.roc_curve(
-      np.concatenate((np.zeros(ntrain), np.ones(ntest))),
-      np.concatenate(
-          (attack_input.get_entropy_train(), attack_input.get_entropy_test())))
-
-  roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
-
-  return SingleAttackResult(
-      slice_spec=_get_slice_spec(attack_input),
-      data_size=DataSize(ntrain=ntrain, ntest=ntest),
-      attack_type=AttackType.THRESHOLD_ENTROPY_ATTACK,
-      membership_scores_train=-attack_input.get_entropy_train(),
-      membership_scores_test=-attack_input.get_entropy_test(),
-      roc_curve=roc_curve)
-
-
-def _run_attack(attack_input: AttackInputData,
-                attack_type: AttackType,
-                balance_attacker_training: bool = True,
-                min_num_samples: int = 1):
-  """Runs membership inference attacks for specified input and type.
-
-  Args:
-    attack_input: input data for running an attack
-    attack_type: the attack to run
-    balance_attacker_training: Whether the training and test sets for the
-      membership inference attacker should have a balanced (roughly equal)
-      number of samples from the training and test sets used to develop
-      the model under attack.
-    min_num_samples: minimum number of examples in either training or test data.
-
-  Returns:
-    the attack result.
-  """
-  attack_input.validate()
-  if min(attack_input.get_train_size(),
-         attack_input.get_test_size()) < min_num_samples:
-    return None
-
-  if attack_type.is_trained_attack:
-    return _run_trained_attack(attack_input, attack_type,
-                               balance_attacker_training)
-  if attack_type == AttackType.THRESHOLD_ENTROPY_ATTACK:
-    return _run_threshold_entropy_attack(attack_input)
-  return _run_threshold_attack(attack_input)
-
-
-def run_attacks(attack_input: AttackInputData,
-                slicing_spec: SlicingSpec = None,
-                attack_types: Iterable[AttackType] = (
-                    AttackType.THRESHOLD_ATTACK,),
-                privacy_report_metadata: PrivacyReportMetadata = None,
-                balance_attacker_training: bool = True,
-                min_num_samples: int = 1) -> AttackResults:
-  """Runs membership inference attacks on a classification model.
-
-  It runs attacks specified by attack_types on each attack_input slice which is
-  specified by slicing_spec.
-
-  Args:
-    attack_input: input data for running an attack
-    slicing_spec: specifies attack_input slices to run attack on
-    attack_types: attacks to run
-    privacy_report_metadata: the metadata of the model under attack.
-    balance_attacker_training: Whether the training and test sets for the
-      membership inference attacker should have a balanced (roughly equal)
-      number of samples from the training and test sets used to develop
-      the model under attack.
-    min_num_samples: minimum number of examples in either training or test data.
-
-  Returns:
-    the attack result.
-  """
-  attack_input.validate()
-  attack_results = []
-
-  if slicing_spec is None:
-    slicing_spec = SlicingSpec(entire_dataset=True)
-  num_classes = None
-  if slicing_spec.by_class:
-    num_classes = attack_input.num_classes
-  input_slice_specs = get_single_slice_specs(slicing_spec, num_classes)
-  for single_slice_spec in input_slice_specs:
-    attack_input_slice = get_slice(attack_input, single_slice_spec)
-    for attack_type in attack_types:
-      attack_result = _run_attack(attack_input_slice, attack_type,
-                                  balance_attacker_training,
-                                  min_num_samples)
-      if attack_result is not None:
-        attack_results.append(attack_result)
-
-  privacy_report_metadata = _compute_missing_privacy_report_metadata(
-      privacy_report_metadata, attack_input)
-
-  return AttackResults(
-      single_attack_results=attack_results,
-      privacy_report_metadata=privacy_report_metadata)
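
A minimal run_attacks sketch; per the threshold-attack code above, supplying only per-example losses is enough. The import path is assumed from this commit's destination package:

```python
# A minimal sketch, assuming the post-move import paths from this commit.
import numpy as np
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import (
    AttackInputData, AttackType, SlicingSpec)

attack_input = AttackInputData(
    loss_train=np.random.exponential(0.5, 1000),  # per-example member losses
    loss_test=np.random.exponential(1.0, 1000))   # per-example non-member losses

results = mia.run_attacks(
    attack_input,
    slicing_spec=SlicingSpec(entire_dataset=True),
    attack_types=(AttackType.THRESHOLD_ATTACK,))
print(results)
```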
-def _compute_membership_probability(
-    attack_input: AttackInputData,
-    num_bins: int = 15) -> SingleMembershipProbabilityResult:
-  """Computes each individual point's likelihood of being a member (denoted as privacy risk score in https://arxiv.org/abs/2003.10595).
-
-  For an individual sample, its privacy risk score is computed as the posterior
-  probability of being in the training set
-  after observing its prediction output by the target machine learning model.
-
-  Args:
-    attack_input: input data for computing membership probability
-    num_bins: the number of bins used to compute the training/test histogram
-
-  Returns:
-    membership probability results
-  """
-
-  # Uses the provided loss or entropy. Otherwise computes the loss.
-  if attack_input.loss_train is not None and attack_input.loss_test is not None:
-    train_values = attack_input.loss_train
-    test_values = attack_input.loss_test
-  elif attack_input.entropy_train is not None and attack_input.entropy_test is not None:
-    train_values = attack_input.entropy_train
-    test_values = attack_input.entropy_test
-  else:
-    train_values = attack_input.get_loss_train()
-    test_values = attack_input.get_loss_test()
-
-  # Compute the histogram in the log scale
-  small_value = 1e-10
-  train_values = np.maximum(train_values, small_value)
-  test_values = np.maximum(test_values, small_value)
-
-  min_value = min(train_values.min(), test_values.min())
-  max_value = max(train_values.max(), test_values.max())
-  bins_hist = np.logspace(
-      np.log10(min_value), np.log10(max_value), num_bins + 1)
-
-  train_hist, _ = np.histogram(train_values, bins=bins_hist)
-  train_hist = train_hist / (len(train_values) + 0.0)
-  train_hist_indices = np.fmin(
-      np.digitize(train_values, bins=bins_hist), num_bins) - 1
-
-  test_hist, _ = np.histogram(test_values, bins=bins_hist)
-  test_hist = test_hist / (len(test_values) + 0.0)
-  test_hist_indices = np.fmin(
-      np.digitize(test_values, bins=bins_hist), num_bins) - 1
-
-  combined_hist = train_hist + test_hist
-  combined_hist[combined_hist == 0] = small_value
-  membership_prob_list = train_hist / (combined_hist + 0.0)
-  train_membership_probs = membership_prob_list[train_hist_indices]
-  test_membership_probs = membership_prob_list[test_hist_indices]
-
-  return SingleMembershipProbabilityResult(
-      slice_spec=_get_slice_spec(attack_input),
-      train_membership_probs=train_membership_probs,
-      test_membership_probs=test_membership_probs)
-
-
-def run_membership_probability_analysis(
-    attack_input: AttackInputData,
-    slicing_spec: SlicingSpec = None) -> MembershipProbabilityResults:
-  """Performs membership probability analysis on all given slice types.
-
-  Args:
-    attack_input: input data for computing membership probabilities
-    slicing_spec: specifies attack_input slices
-
-  Returns:
-    the membership probability results.
-  """
-  attack_input.validate()
-  membership_prob_results = []
-
-  if slicing_spec is None:
-    slicing_spec = SlicingSpec(entire_dataset=True)
-  num_classes = None
-  if slicing_spec.by_class:
-    num_classes = attack_input.num_classes
-  input_slice_specs = get_single_slice_specs(slicing_spec, num_classes)
-  for single_slice_spec in input_slice_specs:
-    attack_input_slice = get_slice(attack_input, single_slice_spec)
-    membership_prob_results.append(
-        _compute_membership_probability(attack_input_slice))
-
-  return MembershipProbabilityResults(
-      membership_prob_results=membership_prob_results)
-
-
-def _compute_missing_privacy_report_metadata(
-    metadata: PrivacyReportMetadata,
-    attack_input: AttackInputData) -> PrivacyReportMetadata:
-  """Populates metadata fields if they are missing."""
-  if metadata is None:
-    metadata = PrivacyReportMetadata()
-  if metadata.accuracy_train is None:
-    metadata.accuracy_train = _get_accuracy(attack_input.logits_train,
-                                            attack_input.labels_train)
-  if metadata.accuracy_test is None:
-    metadata.accuracy_test = _get_accuracy(attack_input.logits_test,
-                                           attack_input.labels_test)
-  loss_train = attack_input.get_loss_train()
-  loss_test = attack_input.get_loss_test()
-  if metadata.loss_train is None and loss_train is not None:
-    metadata.loss_train = np.average(loss_train)
-  if metadata.loss_test is None and loss_test is not None:
-    metadata.loss_test = np.average(loss_test)
-  return metadata
-
-
-def _get_accuracy(logits, labels):
-  """Computes the accuracy if it is missing."""
-  if logits is None or labels is None:
-    return None
-  return metrics.accuracy_score(labels, np.argmax(logits, axis=1))
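
The privacy risk score above is a binned posterior estimate over log-spaced loss bins. A compact standalone sketch of the same computation on synthetic losses:

```python
import numpy as np

rng = np.random.default_rng(1)
train_values = np.maximum(rng.exponential(0.5, 1000), 1e-10)  # member losses
test_values = np.maximum(rng.exponential(1.0, 1000), 1e-10)   # non-member losses

num_bins = 15
lo = min(train_values.min(), test_values.min())
hi = max(train_values.max(), test_values.max())
bins = np.logspace(np.log10(lo), np.log10(hi), num_bins + 1)

# Per-bin member fraction approximates P(member | loss falls in this bin).
train_hist, _ = np.histogram(train_values, bins=bins)
train_hist = train_hist / len(train_values)
test_hist, _ = np.histogram(test_values, bins=bins)
test_hist = test_hist / len(test_values)
posterior = train_hist / np.maximum(train_hist + test_hist, 1e-10)

# Look up each training sample's bin to get its privacy risk score.
idx = np.fmin(np.digitize(train_values, bins=bins), num_bins) - 1
train_risk_scores = posterior[idx]
```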
@@ -13,198 +13,6 @@
 # limitations under the License.
 
 # Lint as: python3
-"""Trained models for membership inference attacks."""
+"""Moved to privacy_attack/membership_inference_attack."""
 
-from dataclasses import dataclass
-import numpy as np
-from sklearn import ensemble
-from sklearn import linear_model
-from sklearn import model_selection
-from sklearn import neighbors
-from sklearn import neural_network
-
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import DataSize
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.models import *  # pylint: disable=wildcard-import
-
-
-@dataclass
-class AttackerData:
-  """Input data for an ML classifier attack.
-
-  This includes only the data, and not configuration.
-  """
-
-  features_train: np.ndarray = None
-  # element-wise boolean array denoting if the example was part of training.
-  is_training_labels_train: np.ndarray = None
-
-  features_test: np.ndarray = None
-  # element-wise boolean array denoting if the example was part of training.
-  is_training_labels_test: np.ndarray = None
-
-  data_size: DataSize = None
-
-
-def create_attacker_data(attack_input_data: AttackInputData,
-                         test_fraction: float = 0.25,
-                         balance: bool = True) -> AttackerData:
-  """Prepare AttackInputData to train ML attackers.
-
-  Combines logits and losses and performs a random train-test split.
-
-  Args:
-    attack_input_data: Original AttackInputData
-    test_fraction: Fraction of the dataset to include in the test split.
-    balance: Whether the training and test sets for the membership inference
-      attacker should have a balanced (roughly equal) number of samples
-      from the training and test sets used to develop the model
-      under attack.
-
-  Returns:
-    AttackerData.
-  """
-  attack_input_train = _column_stack(attack_input_data.logits_or_probs_train,
-                                     attack_input_data.get_loss_train())
-  attack_input_test = _column_stack(attack_input_data.logits_or_probs_test,
-                                    attack_input_data.get_loss_test())
-
-  if balance:
-    min_size = min(attack_input_data.get_train_size(),
-                   attack_input_data.get_test_size())
-    attack_input_train = _sample_multidimensional_array(attack_input_train,
-                                                        min_size)
-    attack_input_test = _sample_multidimensional_array(attack_input_test,
-                                                       min_size)
-  ntrain, ntest = attack_input_train.shape[0], attack_input_test.shape[0]
-
-  features_all = np.concatenate((attack_input_train, attack_input_test))
-
-  labels_all = np.concatenate(((np.zeros(ntrain)), (np.ones(ntest))))
-
-  # Perform a train-test split
-  features_train, features_test, is_training_labels_train, is_training_labels_test = model_selection.train_test_split(
-      features_all, labels_all, test_size=test_fraction, stratify=labels_all)
-  return AttackerData(features_train, is_training_labels_train, features_test,
-                      is_training_labels_test,
-                      DataSize(ntrain=ntrain, ntest=ntest))
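
create_attacker_data above boils down to stacking per-example model outputs with losses and attaching membership labels. A small numpy sketch of that feature construction:

```python
import numpy as np

# Features for the attacker: per-example model outputs plus the loss.
logits = np.random.randn(6, 10)  # (n_samples, n_classes)
loss = np.random.rand(6)         # (n_samples,)
features = np.column_stack((logits, loss))  # (n_samples, n_classes + 1)

# Membership labels for the attacker: 0 = member, 1 = non-member.
labels = np.concatenate((np.zeros(3), np.ones(3)))
```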
-def _sample_multidimensional_array(array, size):
-  indices = np.random.choice(len(array), size, replace=False)
-  return array[indices]
-
-
-def _column_stack(logits, loss):
-  """Stacks logits and losses.
-
-  In case that only one exists, returns that one.
-  Args:
-    logits: logits array
-    loss: loss array
-
-  Returns:
-    stacked logits and losses (or only one if both do not exist).
-  """
-  if logits is None:
-    return np.expand_dims(loss, axis=-1)
-  if loss is None:
-    return logits
-  return np.column_stack((logits, loss))
-
-
-class TrainedAttacker:
-  """Base class for training attack models."""
-  model = None
-
-  def train_model(self, input_features, is_training_labels):
-    """Train an attacker model.
-
-    This is trained on examples from train and test datasets.
-    Args:
-      input_features: array-like of shape (n_samples, n_features) Training
-        vector, where n_samples is the number of samples and n_features is the
-        number of features.
-      is_training_labels: a vector of booleans of shape (n_samples, )
-        representing whether the sample is in the training set or not.
-    """
-    raise NotImplementedError()
-
-  def predict(self, input_features):
-    """Predicts whether input_features belongs to train or test.
-
-    Args:
-      input_features: A vector of features with the same semantics as x_train
-        passed to train_model.
-    Returns:
-      An array of probabilities denoting whether the example belongs to test.
-    """
-    if self.model is None:
-      raise AssertionError(
-          'Model not trained yet. Please call train_model first.')
-    return self.model.predict_proba(input_features)[:, 1]
-
-
-class LogisticRegressionAttacker(TrainedAttacker):
-  """Logistic regression attacker."""
-
-  def train_model(self, input_features, is_training_labels):
-    lr = linear_model.LogisticRegression(solver='lbfgs')
-    param_grid = {
-        'C': np.logspace(-4, 2, 10),
-    }
-    model = model_selection.GridSearchCV(
-        lr, param_grid=param_grid, cv=3, n_jobs=1, verbose=0)
-    model.fit(input_features, is_training_labels)
-    self.model = model
-
-
-class MultilayerPerceptronAttacker(TrainedAttacker):
-  """Multilayer perceptron attacker."""
-
-  def train_model(self, input_features, is_training_labels):
-    mlp_model = neural_network.MLPClassifier()
-    param_grid = {
-        'hidden_layer_sizes': [(64,), (32, 32)],
-        'solver': ['adam'],
-        'alpha': [0.0001, 0.001, 0.01],
-    }
-    n_jobs = -1
-    model = model_selection.GridSearchCV(
-        mlp_model, param_grid=param_grid, cv=3, n_jobs=n_jobs, verbose=0)
-    model.fit(input_features, is_training_labels)
-    self.model = model
-
-
-class RandomForestAttacker(TrainedAttacker):
-  """Random forest attacker."""
-
-  def train_model(self, input_features, is_training_labels):
-    """Setup a random forest pipeline with cross-validation."""
-    rf_model = ensemble.RandomForestClassifier()
-
-    param_grid = {
-        'n_estimators': [100],
-        'max_features': ['auto', 'sqrt'],
-        'max_depth': [5, 10, 20, None],
-        'min_samples_split': [2, 5, 10],
-        'min_samples_leaf': [1, 2, 4]
-    }
-    n_jobs = -1
-    model = model_selection.GridSearchCV(
-        rf_model, param_grid=param_grid, cv=3, n_jobs=n_jobs, verbose=0)
-    model.fit(input_features, is_training_labels)
-    self.model = model
-
-
-class KNearestNeighborsAttacker(TrainedAttacker):
-  """K nearest neighbor attacker."""
-
-  def train_model(self, input_features, is_training_labels):
-    knn_model = neighbors.KNeighborsClassifier()
-    param_grid = {
-        'n_neighbors': [3, 5, 7],
-    }
-    model = model_selection.GridSearchCV(
-        knn_model, param_grid=param_grid, cv=3, n_jobs=1, verbose=0)
-    model.fit(input_features, is_training_labels)
-    self.model = model
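
Each trained attacker above follows one pattern: a scikit-learn classifier wrapped in a small grid search over membership labels. A standalone sketch of that pattern with random placeholder features:

```python
import numpy as np
from sklearn import linear_model, model_selection

# Placeholder attacker features (e.g., stacked logits + loss) and labels.
features = np.random.randn(200, 11)
is_member = np.random.randint(0, 2, size=200)

attacker = model_selection.GridSearchCV(
    linear_model.LogisticRegression(solver='lbfgs'),
    param_grid={'C': np.logspace(-4, 2, 10)}, cv=3)
attacker.fit(features, is_member)

# Probability of class 1 (non-member), as in TrainedAttacker.predict above.
scores = attacker.predict_proba(features)[:, 1]
```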
@@ -13,74 +13,6 @@
 # limitations under the License.
 
 # Lint as: python3
-"""Plotting functionality for membership inference attack analysis.
-
-Functions to plot ROC curves and histograms as well as functionality to store
-figures to colossus.
-"""
+"""Moved to privacy_attack/membership_inference_attack."""
 
-from typing import Text, Iterable
-
-import matplotlib.pyplot as plt
-import numpy as np
-from sklearn import metrics
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting import *  # pylint: disable=wildcard-import
-
-
-def save_plot(figure: plt.Figure, path: Text, outformat='png'):
-  """Store a figure to disk."""
-  if path is not None:
-    with open(path, 'wb') as f:
-      figure.savefig(f, bbox_inches='tight', format=outformat)
-    plt.close(figure)
-
-
-def plot_curve_with_area(x: Iterable[float],
-                         y: Iterable[float],
-                         xlabel: Text = 'x',
-                         ylabel: Text = 'y') -> plt.Figure:
-  """Plot the curve defined by inputs and the area under the curve.
-
-  All entries of x and y are required to lie between 0 and 1.
-  For example, x could be recall and y precision, or x is fpr and y is tpr.
-
-  Args:
-    x: Values on x-axis (1d)
-    y: Values on y-axis (must be same length as x)
-    xlabel: Label for x axis
-    ylabel: Label for y axis
-
-  Returns:
-    The matplotlib figure handle
-  """
-  fig = plt.figure()
-  plt.plot([0, 1], [0, 1], 'k', lw=1.0)
-  plt.plot(x, y, lw=2, label=f'AUC: {metrics.auc(x, y):.3f}')
-  plt.xlabel(xlabel)
-  plt.ylabel(ylabel)
-  plt.legend()
-  return fig
-
-
-def plot_histograms(train: Iterable[float],
-                    test: Iterable[float],
-                    xlabel: Text = 'x',
-                    thresh: float = None) -> plt.Figure:
-  """Plot histograms of training versus test metrics."""
-  xmin = min(np.min(train), np.min(test))
-  xmax = max(np.max(train), np.max(test))
-  bins = np.linspace(xmin, xmax, 100)
-  fig = plt.figure()
-  plt.hist(test, bins=bins, density=True, alpha=0.5, label='test', log='y')
-  plt.hist(train, bins=bins, density=True, alpha=0.5, label='train', log='y')
-  if thresh is not None:
-    plt.axvline(thresh, c='r', label=f'threshold = {thresh:.3f}')
-  plt.xlabel(xlabel)
-  plt.ylabel('normalized counts (density)')
-  plt.legend()
-  return fig
-
-
-def plot_roc_curve(roc_curve) -> plt.Figure:
-  """Plot the ROC curve and the area under the curve."""
-  return plot_curve_with_area(
-      roc_curve.fpr, roc_curve.tpr, xlabel='FPR', ylabel='TPR')
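
A short sketch of using the plotting helpers above on a synthetic ROC curve; the import path is assumed from this commit's destination package:

```python
# A minimal sketch, assuming the post-move import path from this commit.
import numpy as np
from sklearn import metrics
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import plotting

labels = np.concatenate((np.zeros(100), np.ones(100)))
scores = np.concatenate((np.random.rand(100) * 0.6, np.random.rand(100)))
fpr, tpr, _ = metrics.roc_curve(labels, scores)

fig = plotting.plot_curve_with_area(fpr, tpr, xlabel='FPR', ylabel='TPR')
plotting.save_plot(fig, '/tmp/roc.png')
```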
@@ -13,126 +13,6 @@
 # limitations under the License.
 
 # Lint as: python3
-"""Plotting code for ML Privacy Reports."""
+"""Moved to privacy_attack/membership_inference_attack."""
 
-from typing import Iterable
-
-import matplotlib.pyplot as plt
-import pandas as pd
-
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResultsCollection
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResultsDFColumns
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import ENTIRE_DATASET_SLICE_STR
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import PrivacyMetric
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting import *  # pylint: disable=wildcard-import
-
-
-# Helper constants for DataFrame keys.
-LEGEND_LABEL_STR = 'legend label'
-EPOCH_STR = 'Epoch'
-TRAIN_ACCURACY_STR = 'Train accuracy'
-
-
-def plot_by_epochs(results: AttackResultsCollection,
-                   privacy_metrics: Iterable[PrivacyMetric]) -> plt.Figure:
-  """Plots privacy vulnerabilities vs epoch numbers.
-
-  In case multiple privacy metrics are specified, the plot will feature
-  multiple subplots (one subplot per metric). Multiple model variants
-  are supported.
-  Args:
-    results: AttackResults for the plot
-    privacy_metrics: List of enumerated privacy metrics that should be plotted.
-
-  Returns:
-    A pyplot figure with privacy vs accuracy plots.
-  """
-
-  _validate_results(results.attack_results_list)
-  all_results_df = _calculate_combined_df_with_metadata(
-      results.attack_results_list)
-  return _generate_subplots(
-      all_results_df=all_results_df,
-      x_axis_metric='Epoch',
-      figure_title='Vulnerability per Epoch',
-      privacy_metrics=privacy_metrics)
-
-
-def plot_privacy_vs_accuracy(results: AttackResultsCollection,
-                             privacy_metrics: Iterable[PrivacyMetric]):
-  """Plots privacy vulnerabilities vs model accuracy.
-
-  In case multiple privacy metrics are specified, the plot will feature
-  multiple subplots (one subplot per metric). Multiple model variants
-  are supported.
-  Args:
-    results: AttackResults for the plot
-    privacy_metrics: List of enumerated privacy metrics that should be plotted.
-
-  Returns:
-    A pyplot figure with privacy vs accuracy plots.
-  """
-  _validate_results(results.attack_results_list)
-  all_results_df = _calculate_combined_df_with_metadata(
-      results.attack_results_list)
-  return _generate_subplots(
-      all_results_df=all_results_df,
-      x_axis_metric='Train accuracy',
-      figure_title='Privacy vs Utility Analysis',
-      privacy_metrics=privacy_metrics)
-
-
-def _calculate_combined_df_with_metadata(results: Iterable[AttackResults]):
-  """Adds metadata to the dataframe and concats them together."""
-  all_results_df = None
-  for attack_results in results:
-    attack_results_df = attack_results.calculate_pd_dataframe()
-    attack_results_df = attack_results_df.loc[attack_results_df[str(
-        AttackResultsDFColumns.SLICE_FEATURE)] == ENTIRE_DATASET_SLICE_STR]
-    attack_results_df.insert(0, EPOCH_STR,
-                             attack_results.privacy_report_metadata.epoch_num)
-    attack_results_df.insert(
-        0, TRAIN_ACCURACY_STR,
-        attack_results.privacy_report_metadata.accuracy_train)
-    attack_results_df.insert(
-        0, LEGEND_LABEL_STR,
-        attack_results.privacy_report_metadata.model_variant_label + ' - ' +
-        attack_results_df[str(AttackResultsDFColumns.ATTACK_TYPE)])
-    if all_results_df is None:
-      all_results_df = attack_results_df
-    else:
-      all_results_df = pd.concat([all_results_df, attack_results_df],
-                                 ignore_index=True)
-  return all_results_df
-
-
-def _generate_subplots(all_results_df: pd.DataFrame, x_axis_metric: str,
-                       figure_title: str,
-                       privacy_metrics: Iterable[PrivacyMetric]):
-  """Create one subplot per privacy metric for a specified x_axis_metric."""
-  fig, axes = plt.subplots(
-      1, len(privacy_metrics), figsize=(5 * len(privacy_metrics) + 3, 5))
-  # Set a title for the entire group of subplots.
-  fig.suptitle(figure_title)
-  if len(privacy_metrics) == 1:
-    axes = (axes,)
-  for i, privacy_metric in enumerate(privacy_metrics):
-    legend_labels = all_results_df[LEGEND_LABEL_STR].unique()
-    for legend_label in legend_labels:
-      single_label_results = all_results_df.loc[all_results_df[LEGEND_LABEL_STR]
-                                                == legend_label]
-      sorted_label_results = single_label_results.sort_values(x_axis_metric)
-      axes[i].plot(sorted_label_results[x_axis_metric],
-                   sorted_label_results[str(privacy_metric)])
-    axes[i].set_xlabel(x_axis_metric)
-    axes[i].set_title('%s for %s' % (privacy_metric, ENTIRE_DATASET_SLICE_STR))
-  plt.legend(legend_labels, loc='upper left', bbox_to_anchor=(1.02, 1))
-  fig.tight_layout(rect=[0, 0, 1, 0.93])  # Leave space for suptitle.
-
-  return fig
-
-
-def _validate_results(results: Iterable[AttackResults]):
-  for attack_results in results:
-    if not attack_results or not attack_results.privacy_report_metadata:
-      raise ValueError('Privacy metadata is not defined.')
-    if attack_results.privacy_report_metadata.epoch_num is None:
-      raise ValueError('epoch_num in metadata is not defined.')
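
A hedged end-to-end sketch for the report plotting above. The PrivacyMetric member names and the PrivacyReportMetadata constructor arguments are assumptions inferred from the fields this diff references, not confirmed by it:

```python
# A hedged sketch, assuming the post-move import paths from this commit.
import numpy as np
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import privacy_report
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import (
    AttackInputData, AttackResultsCollection, PrivacyMetric, PrivacyReportMetadata)

results_per_epoch = []
for epoch in (1, 2, 3):
  # Pretend the train/test loss gap grows as training proceeds.
  attack_input = AttackInputData(
      loss_train=np.random.exponential(1.0 / epoch, 500),
      loss_test=np.random.exponential(1.0, 500))
  # Field names per the metadata accesses in _calculate_combined_df_with_metadata.
  metadata = PrivacyReportMetadata(
      epoch_num=epoch, accuracy_train=0.8, accuracy_test=0.7,
      model_variant_label='baseline')
  results_per_epoch.append(
      mia.run_attacks(attack_input, privacy_report_metadata=metadata))

fig = privacy_report.plot_by_epochs(
    AttackResultsCollection(results_per_epoch),
    privacy_metrics=[PrivacyMetric.AUC])  # enum member name is an assumption
```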
@@ -13,361 +13,6 @@
 # limitations under the License.
 
 # Lint as: python3
-"""Code for membership inference attacks on seq2seq models.
-
-Contains seq2seq specific logic for attack data structures, attack data
-generation, and the logistic regression membership inference attack.
-"""
+"""Moved to privacy_attack/membership_inference_attack."""
 
-from typing import Iterator, List
-
-from dataclasses import dataclass
-import numpy as np
-from scipy.stats import rankdata
-from sklearn import metrics
-from sklearn import model_selection
-import tensorflow as tf
-
-from tensorflow_privacy.privacy.membership_inference_attack import models
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import DataSize
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import PrivacyReportMetadata
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import RocCurve
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleAttackResult
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
-from tensorflow_privacy.privacy.membership_inference_attack.models import _sample_multidimensional_array
-from tensorflow_privacy.privacy.membership_inference_attack.models import AttackerData
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia import *  # pylint: disable=wildcard-import
-
-
-def _is_iterator(obj, obj_name):
-  """Checks whether obj is a generator."""
-  if obj is not None and not isinstance(obj, Iterator):
-    raise ValueError('%s should be a generator.' % obj_name)
-
-
-@dataclass
-class Seq2SeqAttackInputData:
-  """Input data for running an attack on seq2seq models.
-
-  This includes only the data, and not configuration.
-  """
-  logits_train: Iterator[np.ndarray] = None
-  logits_test: Iterator[np.ndarray] = None
-
-  # Contains ground-truth token indices for the target sequences.
-  labels_train: Iterator[np.ndarray] = None
-  labels_test: Iterator[np.ndarray] = None
-
-  # Size of the target sequence vocabulary.
-  vocab_size: int = None
-
-  # Train, test size = number of batches in training, test set.
-  # These values need to be supplied by the user as logits, labels
-  # are lazily loaded for seq2seq models.
-  train_size: int = 0
-  test_size: int = 0
-
-  def validate(self):
-    """Validates the inputs."""
-
-    if (self.logits_train is None) != (self.logits_test is None):
-      raise ValueError(
-          'logits_train and logits_test should both be either set or unset')
-
-    if (self.labels_train is None) != (self.labels_test is None):
-      raise ValueError(
-          'labels_train and labels_test should both be either set or unset')
-
-    if self.logits_train is None or self.labels_train is None:
-      raise ValueError(
-          'Labels, logits of training, test sets should all be set')
-
-    if (self.vocab_size is None or self.train_size is None or
-        self.test_size is None):
-      raise ValueError('vocab_size, train_size, test_size should all be set')
-
-    if self.vocab_size is not None and not isinstance(self.vocab_size, int):
-      raise ValueError('vocab_size should be of integer type')
-
-    if self.train_size is not None and not isinstance(self.train_size, int):
-      raise ValueError('train_size should be of integer type')
-
-    if self.test_size is not None and not isinstance(self.test_size, int):
-      raise ValueError('test_size should be of integer type')
-
-    _is_iterator(self.logits_train, 'logits_train')
-    _is_iterator(self.logits_test, 'logits_test')
-    _is_iterator(self.labels_train, 'labels_train')
-    _is_iterator(self.labels_test, 'labels_test')
-
-  def __str__(self):
-    """Returns the shapes of variables that are not None."""
-    result = ['AttackInputData(']
-
-    if self.vocab_size is not None and self.train_size is not None:
-      result.append(
-          'logits_train with shape (%d, num_sequences, num_tokens, %d)' %
-          (self.train_size, self.vocab_size))
-      result.append(
-          'labels_train with shape (%d, num_sequences, num_tokens, 1)' %
-          self.train_size)
-
-    if self.vocab_size is not None and self.test_size is not None:
-      result.append(
-          'logits_test with shape (%d, num_sequences, num_tokens, %d)' %
-          (self.test_size, self.vocab_size))
-      result.append(
-          'labels_test with shape (%d, num_sequences, num_tokens, 1)' %
-          self.test_size)
-
-    result.append(')')
-    return '\n'.join(result)
-def _get_attack_features_and_metadata(
-    logits: Iterator[np.ndarray],
-    labels: Iterator[np.ndarray]) -> (np.ndarray, float, float):
-  """Returns the average rank of tokens per batch of sequences and the loss.
-
-  Args:
-    logits: Logits returned by a seq2seq model, dim = (num_batches,
-      num_sequences, num_tokens, vocab_size).
-    labels: Target labels for the seq2seq model, dim = (num_batches,
-      num_sequences, num_tokens, 1).
-
-  Returns:
-    1. An array of average ranks, dim = (num_batches, 1).
-       Each average rank is calculated over ranks of tokens in sequences of a
-       particular batch.
-    2. Loss computed over all logits and labels.
-    3. Accuracy computed over all logits and labels.
-  """
-  ranks = []
-  loss = 0.0
-  dataset_length = 0.0
-  correct_preds = 0
-  total_preds = 0
-  for batch_logits, batch_labels in zip(logits, labels):
-    # Compute average rank for the current batch.
-    batch_ranks = _get_batch_ranks(batch_logits, batch_labels)
-    ranks.append(np.mean(batch_ranks))
-
-    # Update overall loss metrics with metrics of the current batch.
-    batch_loss, batch_length = _get_batch_loss_metrics(batch_logits,
-                                                       batch_labels)
-    loss += batch_loss
-    dataset_length += batch_length
-
-    # Update overall accuracy metrics with metrics of the current batch.
-    batch_correct_preds, batch_total_preds = _get_batch_accuracy_metrics(
-        batch_logits, batch_labels)
-    correct_preds += batch_correct_preds
-    total_preds += batch_total_preds
-
-  # Compute loss and accuracy for the dataset.
-  loss = loss / dataset_length
-  accuracy = correct_preds / total_preds
-
-  return np.array(ranks), loss, accuracy
-
-
-def _get_batch_ranks(batch_logits: np.ndarray,
-                     batch_labels: np.ndarray) -> np.ndarray:
-  """Returns the ranks of tokens in a batch of sequences.
-
-  Args:
-    batch_logits: Logits returned by a seq2seq model, dim = (num_sequences,
-      num_tokens, vocab_size).
-    batch_labels: Target labels for the seq2seq model, dim = (num_sequences,
-      num_tokens, 1).
-
-  Returns:
-    An array of ranks of tokens in a batch of sequences, dim = (num_sequences,
-    num_tokens, 1)
-  """
-  batch_ranks = []
-  for sequence_logits, sequence_labels in zip(batch_logits, batch_labels):
-    batch_ranks += _get_ranks_for_sequence(sequence_logits, sequence_labels)
-
-  return np.array(batch_ranks)
-
-
-def _get_ranks_for_sequence(logits: np.ndarray,
-                            labels: np.ndarray) -> List[float]:
-  """Returns ranks for a sequence.
-
-  Args:
-    logits: Logits of a single sequence, dim = (num_tokens, vocab_size).
-    labels: Target labels of a single sequence, dim = (num_tokens, 1).
-
-  Returns:
-    An array of ranks for tokens in the sequence, dim = (num_tokens, 1).
-  """
-  sequence_ranks = []
-  for logit, label in zip(logits, labels.astype(int)):
-    rank = rankdata(-logit, method='min')[label] - 1.0
-    sequence_ranks.append(rank)
-
-  return sequence_ranks
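
The per-token rank computed above is zero when the model's top-scoring token is the ground truth. A tiny worked example:

```python
import numpy as np
from scipy.stats import rankdata

# Rank of the true token under the model: 0 means the model's top choice.
logits = np.array([2.0, 0.5, 1.0, -1.0])  # scores over a 4-token vocabulary
label = 2                                 # ground-truth token index
rank = rankdata(-logits, method='min')[label] - 1.0
print(rank)  # 1.0: the true token is the model's second choice
```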
def _get_batch_loss_metrics(batch_logits: np.ndarray,
|
|
||||||
batch_labels: np.ndarray) -> (float, int):
|
|
||||||
"""Returns the loss, number of sequences for a batch.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
batch_logits: Logits returned by a seq2seq model, dim = (num_sequences,
|
|
||||||
num_tokens, vocab_size).
|
|
||||||
batch_labels: Target labels for the seq2seq model, dim = (num_sequences,
|
|
||||||
num_tokens, 1).
|
|
||||||
"""
|
|
||||||
batch_loss = 0.0
|
|
||||||
batch_length = len(batch_logits)
|
|
||||||
for sequence_logits, sequence_labels in zip(batch_logits, batch_labels):
|
|
||||||
sequence_loss = tf.losses.sparse_categorical_crossentropy(
|
|
||||||
tf.keras.backend.constant(sequence_labels),
|
|
||||||
tf.keras.backend.constant(sequence_logits),
|
|
||||||
from_logits=True)
|
|
||||||
batch_loss += sequence_loss.numpy().sum()
|
|
||||||
|
|
||||||
return batch_loss / batch_length, batch_length
|
|
||||||
|
|
||||||
|
|
||||||
def _get_batch_accuracy_metrics(batch_logits: np.ndarray,
|
|
||||||
batch_labels: np.ndarray) -> (float, float):
|
|
||||||
"""Returns the number of correct predictions, total number of predictions for a batch.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
batch_logits: Logits returned by a seq2seq model, dim = (num_sequences,
|
|
||||||
num_tokens, vocab_size).
|
|
||||||
batch_labels: Target labels for the seq2seq model, dim = (num_sequences,
|
|
||||||
num_tokens, 1).
|
|
||||||
"""
|
|
||||||
batch_correct_preds = 0.0
|
|
||||||
batch_total_preds = 0.0
|
|
||||||
for sequence_logits, sequence_labels in zip(batch_logits, batch_labels):
|
|
||||||
preds = tf.metrics.sparse_categorical_accuracy(
|
|
||||||
tf.keras.backend.constant(sequence_labels),
|
|
||||||
tf.keras.backend.constant(sequence_logits))
|
|
||||||
batch_correct_preds += preds.numpy().sum()
|
|
||||||
batch_total_preds += len(sequence_labels)
|
|
||||||
|
|
||||||
return batch_correct_preds, batch_total_preds
|
|
||||||
|
|
||||||
|
|
||||||
def create_seq2seq_attacker_data(
|
|
||||||
attack_input_data: Seq2SeqAttackInputData,
|
|
||||||
test_fraction: float = 0.25,
|
|
||||||
balance: bool = True,
|
|
||||||
privacy_report_metadata: PrivacyReportMetadata = PrivacyReportMetadata()
|
|
||||||
) -> AttackerData:
|
|
||||||
"""Prepares Seq2SeqAttackInputData to train ML attackers.
|
|
||||||
|
|
||||||
Uses logits and losses to generate ranks and performs a random train-test
|
|
||||||
split.
|
|
||||||
|
|
||||||
Also computes metadata (loss, accuracy) for the model under attack
|
|
||||||
and populates respective fields of PrivacyReportMetadata.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
attack_input_data: Original Seq2SeqAttackInputData
|
|
||||||
test_fraction: Fraction of the dataset to include in the test split.
|
|
||||||
balance: Whether the training and test sets for the membership inference
|
|
||||||
attacker should have a balanced (roughly equal) number of samples from the
      training and test sets used to develop the model under attack.
    privacy_report_metadata: the metadata of the model under attack.

  Returns:
    AttackerData.
  """
  attack_input_train, loss_train, accuracy_train = _get_attack_features_and_metadata(
      attack_input_data.logits_train, attack_input_data.labels_train)
  attack_input_test, loss_test, accuracy_test = _get_attack_features_and_metadata(
      attack_input_data.logits_test, attack_input_data.labels_test)

  if balance:
    min_size = min(len(attack_input_train), len(attack_input_test))
    attack_input_train = _sample_multidimensional_array(attack_input_train,
                                                        min_size)
    attack_input_test = _sample_multidimensional_array(attack_input_test,
                                                       min_size)

  features_all = np.concatenate((attack_input_train, attack_input_test))
  ntrain, ntest = attack_input_train.shape[0], attack_input_test.shape[0]

  # Reshape for classifying one-dimensional features
  features_all = features_all.reshape(-1, 1)

  labels_all = np.concatenate(((np.zeros(ntrain)), (np.ones(ntest))))

  # Perform a train-test split
  features_train, features_test, \
  is_training_labels_train, is_training_labels_test = \
    model_selection.train_test_split(
        features_all, labels_all, test_size=test_fraction, stratify=labels_all)

  # Populate accuracy, loss fields in privacy report metadata
  privacy_report_metadata.loss_train = loss_train
  privacy_report_metadata.loss_test = loss_test
  privacy_report_metadata.accuracy_train = accuracy_train
  privacy_report_metadata.accuracy_test = accuracy_test

  return AttackerData(features_train, is_training_labels_train, features_test,
                      is_training_labels_test,
                      DataSize(ntrain=ntrain, ntest=ntest))


def run_seq2seq_attack(attack_input: Seq2SeqAttackInputData,
                       privacy_report_metadata: PrivacyReportMetadata = None,
                       balance_attacker_training: bool = True) -> AttackResults:
  """Runs membership inference attacks on a seq2seq model.

  Args:
    attack_input: input data for running an attack
    privacy_report_metadata: the metadata of the model under attack.
    balance_attacker_training: Whether the training and test sets for the
      membership inference attacker should have a balanced (roughly equal)
      number of samples from the training and test sets used to develop the
      model under attack.

  Returns:
    the attack result.
  """
  attack_input.validate()

  # The attacker uses the average rank (a single number) of a seq2seq dataset
  # record to determine membership. So only Logistic Regression is supported,
  # as it makes the most sense for single-number features.
  attacker = models.LogisticRegressionAttacker()

  # Create attacker data and populate fields of privacy_report_metadata
  privacy_report_metadata = privacy_report_metadata or PrivacyReportMetadata()
  prepared_attacker_data = create_seq2seq_attacker_data(
      attack_input_data=attack_input,
      balance=balance_attacker_training,
      privacy_report_metadata=privacy_report_metadata)

  attacker.train_model(prepared_attacker_data.features_train,
                       prepared_attacker_data.is_training_labels_train)

  # Run the attacker on (permuted) test examples.
  predictions_test = attacker.predict(prepared_attacker_data.features_test)

  # Generate ROC curves with predictions.
  fpr, tpr, thresholds = metrics.roc_curve(
      prepared_attacker_data.is_training_labels_test, predictions_test)

  roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)

  attack_results = [
      SingleAttackResult(
          slice_spec=SingleSliceSpec(),
          attack_type=AttackType.LOGISTIC_REGRESSION,
          roc_curve=roc_curve,
          data_size=prepared_attacker_data.data_size)
  ]

  return AttackResults(
      single_attack_results=attack_results,
      privacy_report_metadata=privacy_report_metadata)
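(For orientation, here is a minimal sketch of how this entry point is typically driven end to end. It is illustrative only: the `logits_*_gen` / `labels_*_gen` iterators, `vocab_size`, and the set sizes are assumed to come from your own seq2seq model and data pipeline.)

```python
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia import Seq2SeqAttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia import run_seq2seq_attack

attack_input = Seq2SeqAttackInputData(
    logits_train=logits_train_gen,  # iterator of per-batch logits (train set)
    logits_test=logits_test_gen,    # iterator of per-batch logits (test set)
    labels_train=labels_train_gen,  # iterator of per-batch label sequences
    labels_test=labels_test_gen,
    vocab_size=vocab_size,
    train_size=n_train,
    test_size=n_test)

# Balancing is on by default via the `balance_attacker_training` flag above.
results = run_seq2seq_attack(attack_input)
print(results.summary())
```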
@@ -13,187 +13,6 @@
 # limitations under the License.

 # Lint as: python3
-"""A hook and a function in tf estimator for membership inference attack."""
+"""Moved to privacy_attack/membership_inference_attack."""
-
-import os
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.tf_estimator_evaluation import *  # pylint: disable=wildcard-import
-from typing import Iterable
-
-from absl import logging
-import numpy as np
-import tensorflow.compat.v1 as tf
-
-from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
-from tensorflow_privacy.privacy.membership_inference_attack.utils import log_loss
-from tensorflow_privacy.privacy.membership_inference_attack.utils_tensorboard import write_results_to_tensorboard
-
-
-def calculate_losses(estimator, input_fn, labels):
-  """Get predictions and losses for samples.
-
-  The assumptions are 1) the loss is cross-entropy loss, and 2) user have
-  specified prediction mode to return predictions, e.g.,
-  when mode == tf.estimator.ModeKeys.PREDICT, the model function returns
-  tf.estimator.EstimatorSpec(mode=mode, predictions=tf.nn.softmax(logits)).
-
-  Args:
-    estimator: model to make prediction
-    input_fn: input function to be used in estimator.predict
-    labels: array of size (n_samples, ), true labels of samples (integer valued)
-
-  Returns:
-    preds: probability vector of each sample
-    loss: cross entropy loss of each sample
-  """
-  pred = np.array(list(estimator.predict(input_fn=input_fn)))
-  loss = log_loss(labels, pred)
-  return pred, loss
-
-
-class MembershipInferenceTrainingHook(tf.estimator.SessionRunHook):
-  """Training hook to perform membership inference attack on epoch end."""
-
-  def __init__(
-      self,
-      estimator,
-      in_train, out_train,
-      input_fn_constructor,
-      slicing_spec: SlicingSpec = None,
-      attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,),
-      tensorboard_dir=None,
-      tensorboard_merge_classifiers=False):
-    """Initialize the hook.
-
-    Args:
-      estimator: model to be tested
-      in_train: (in_training samples, in_training labels)
-      out_train: (out_training samples, out_training labels)
-      input_fn_constructor: a function that receives sample, label and construct
-        the input_fn for model prediction
-      slicing_spec: slicing specification of the attack
-      attack_types: a list of attacks, each of type AttackType
-      tensorboard_dir: directory for tensorboard summary
-      tensorboard_merge_classifiers: if true, plot different classifiers with
-        the same slicing_spec and metric in the same figure
-    """
-    in_train_data, self._in_train_labels = in_train
-    out_train_data, self._out_train_labels = out_train
-
-    # Define the input functions for both in and out-training samples.
-    self._in_train_input_fn = input_fn_constructor(in_train_data,
-                                                   self._in_train_labels)
-    self._out_train_input_fn = input_fn_constructor(out_train_data,
-                                                    self._out_train_labels)
-    self._estimator = estimator
-    self._slicing_spec = slicing_spec
-    self._attack_types = attack_types
-    self._tensorboard_merge_classifiers = tensorboard_merge_classifiers
-    if tensorboard_dir:
-      if tensorboard_merge_classifiers:
-        self._writers = {}
-        with tf.Graph().as_default():
-          for attack_type in attack_types:
-            self._writers[attack_type.name] = tf.summary.FileWriter(
-                os.path.join(tensorboard_dir, 'MI', attack_type.name))
-      else:
-        with tf.Graph().as_default():
-          self._writers = tf.summary.FileWriter(
-              os.path.join(tensorboard_dir, 'MI'))
-      logging.info('Will write to tensorboard.')
-    else:
-      self._writers = None
-
-  def end(self, session):
-    results = run_attack_helper(self._estimator,
-                                self._in_train_input_fn,
-                                self._out_train_input_fn,
-                                self._in_train_labels, self._out_train_labels,
-                                self._slicing_spec,
-                                self._attack_types)
-    logging.info(results)
-
-    att_types, att_slices, att_metrics, att_values = get_flattened_attack_metrics(
-        results)
-    print('Attack result:')
-    print('\n'.join([' %s: %.4f' % (', '.join([s, t, m]), v) for t, s, m, v in
-                     zip(att_types, att_slices, att_metrics, att_values)]))
-
-    # Write to tensorboard if tensorboard_dir is specified
-    global_step = self._estimator.get_variable_value('global_step')
-    if self._writers is not None:
-      write_results_to_tensorboard(results, self._writers, global_step,
-                                   self._tensorboard_merge_classifiers)
-
-
-def run_attack_on_tf_estimator_model(
-    estimator, in_train, out_train,
-    input_fn_constructor,
-    slicing_spec: SlicingSpec = None,
-    attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,)):
-  """Performs the attack in the end of training.
-
-  Args:
-    estimator: model to be tested
-    in_train: (in_training samples, in_training labels)
-    out_train: (out_training samples, out_training labels)
-    input_fn_constructor: a function that receives sample, label and construct
-      the input_fn for model prediction
-    slicing_spec: slicing specification of the attack
-    attack_types: a list of attacks, each of type AttackType
-
-  Returns:
-    Results of the attack
-  """
-  in_train_data, in_train_labels = in_train
-  out_train_data, out_train_labels = out_train
-
-  # Define the input functions for both in and out-training samples.
-  in_train_input_fn = input_fn_constructor(in_train_data, in_train_labels)
-  out_train_input_fn = input_fn_constructor(out_train_data, out_train_labels)
-
-  # Call the helper to run the attack.
-  results = run_attack_helper(estimator,
-                              in_train_input_fn, out_train_input_fn,
-                              in_train_labels, out_train_labels,
-                              slicing_spec,
-                              attack_types)
-  logging.info('End of training attack:')
-  logging.info(results)
-  return results
-
-
-def run_attack_helper(
-    estimator,
-    in_train_input_fn, out_train_input_fn,
-    in_train_labels, out_train_labels,
-    slicing_spec: SlicingSpec = None,
-    attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,)):
-  """A helper function to perform attack.
-
-  Args:
-    estimator: model to be tested
-    in_train_input_fn: input_fn for in training data
-    out_train_input_fn: input_fn for out of training data
-    in_train_labels: in training labels
-    out_train_labels: out of training labels
-    slicing_spec: slicing specification of the attack
-    attack_types: a list of attacks, each of type AttackType
-
-  Returns:
-    Results of the attack
-  """
-  # Compute predictions and losses
-  in_train_pred, in_train_loss = calculate_losses(estimator,
-                                                  in_train_input_fn,
-                                                  in_train_labels)
-  out_train_pred, out_train_loss = calculate_losses(estimator,
-                                                    out_train_input_fn,
-                                                    out_train_labels)
-  attack_input = AttackInputData(
-      logits_train=in_train_pred, logits_test=out_train_pred,
-      labels_train=in_train_labels, labels_test=out_train_labels,
-      loss_train=in_train_loss, loss_test=out_train_loss
-  )
-  results = mia.run_attacks(attack_input,
-                            slicing_spec=slicing_spec,
-                            attack_types=attack_types)
-  return results
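(For reference, the hook defined in the removed module above was attached to estimator training roughly as follows. This is an illustrative sketch: the model `estimator`, the `(x_train, y_train)` / `(x_test, y_test)` arrays, `train_input_fn`, and `input_fn_constructor` are all assumed to exist on your side.)

```python
mia_hook = MembershipInferenceTrainingHook(
    estimator,
    (x_train, y_train),    # in-training samples and labels
    (x_test, y_test),      # out-of-training samples and labels
    input_fn_constructor,  # builds an input_fn from (samples, labels)
    attack_types=[AttackType.THRESHOLD_ATTACK])

# The hook's end() method runs the attack once the training session finishes.
estimator.train(input_fn=train_input_fn, steps=1000, hooks=[mia_hook])
```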
tensorflow_privacy/privacy/privacy_tests/README.md (new file, 7 lines)

@@ -0,0 +1,7 @@
# Privacy tests

A good privacy-preserving model learns from the training data, but
doesn't memorize individual samples. Excessive memorization is not only harmful
to the model's predictive power, but also presents a privacy risk.

This library provides empirical tests for measuring potential memorization.
@@ -0,0 +1,269 @@
# Membership inference attack

A good privacy-preserving model learns from the training data, but
doesn't memorize it. This library provides empirical tests for measuring
potential memorization.

Technically, the tests build classifiers that infer whether a particular sample
was present in the training set. The more accurate such a classifier is, the
more memorization is present and thus the less privacy-preserving the model is.

The privacy vulnerability (or memorization potential) is measured
via the area under the ROC-curve (`auc`) or via max{|fpr - tpr|} (`advantage`)
of the attack classifier. These measures are very closely related.
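For concreteness, both measures can be computed directly from the ROC curve of
the attack classifier. A minimal sketch with scikit-learn, where `scores` and
`is_member` are made-up attack scores and ground-truth membership labels:

```python
import numpy as np
from sklearn import metrics

scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2])  # higher = "looks like a member"
is_member = np.array([1, 1, 0, 1, 0, 0])           # ground-truth membership

fpr, tpr, _ = metrics.roc_curve(is_member, scores)
auc = metrics.auc(fpr, tpr)            # the `auc` measure
advantage = np.max(np.abs(tpr - fpr))  # the `advantage` measure, max over thresholds
```

A perfectly private model yields an `auc` close to 0.5 and an `advantage` close to 0.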
The tests provided by the library are "black box". That is, only the outputs of
the model are used (e.g., losses, logits, predictions). Neither model internals
(weights) nor input samples are required.

## How to use

### Installation notes

To use the latest version of the MIA library, please install TF Privacy with
"pip install -U git+https://github.com/tensorflow/privacy". See
https://github.com/tensorflow/privacy/issues/151 for more details.

### Basic usage

The simplest possible usage is

```python
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData

# Suppose we have the labels as integers starting from 0
# labels_train shape: (n_train, )
# labels_test shape: (n_test, )

# Evaluate your model on training and test examples to get
# loss_train shape: (n_train, )
# loss_test shape: (n_test, )

attacks_result = mia.run_attacks(
    AttackInputData(
        loss_train = loss_train,
        loss_test = loss_test,
        labels_train = labels_train,
        labels_test = labels_test))
```

This example calls `run_attacks` with the default options to run a host of
(fairly simple) attacks behind the scenes (depending on which data is fed in),
and computes the most important measures.

> NOTE: The train and test sets are balanced internally, i.e., an equal number
> of in-training and out-of-training examples is chosen for the attacks
> (whichever has fewer examples). These are subsampled uniformly at random
> without replacement from the larger of the two.
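If you prefer to attack the full, unbalanced sets, the balancing can be
switched off. A sketch, assuming the `balance_attacker_training` flag (which
mirrors the seq2seq API) is available on `run_attacks` in your version:

```python
attacks_result = mia.run_attacks(
    AttackInputData(
        loss_train = loss_train,
        loss_test = loss_test,
        labels_train = labels_train,
        labels_test = labels_test),
    balance_attacker_training = False)  # assumed flag; keeps all examples
```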
Then, we can view the attack results by:

```python
print(attacks_result.summary())
# Example output:
# -> Best-performing attacks over all slices
# THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved an AUC of 0.59 on slice Entire dataset
# THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved an advantage of 0.20 on slice Entire dataset
```
### Other codelabs

Please head over to the [codelabs](https://github.com/tensorflow/privacy/tree/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs)
section for an overview of the library in action.

### Advanced usage

#### Specifying attacks to run

Sometimes, we have more information about the data, such as the logits and the
labels, and we may want finer-grained control of the attack, such as using more
complicated classifiers instead of the simple threshold attack, or looking at
the attack results by examples' class.
In those cases, we can provide more information to `run_attacks`.

```python
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
```

First, as before, we specify the input for the attack as an
`AttackInputData` object:

```python
# Evaluate your model on training and test examples to get
# logits_train shape: (n_train, n_classes)
# logits_test shape: (n_test, n_classes)
# loss_train shape: (n_train, )
# loss_test shape: (n_test, )

attack_input = AttackInputData(
    logits_train = logits_train,
    logits_test = logits_test,
    loss_train = loss_train,
    loss_test = loss_test,
    labels_train = labels_train,
    labels_test = labels_test)
```

Instead of `logits`, you can also specify
`probs_train` and `probs_test` as the predicted probability vectors of each
example.
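As a sketch, the probability-based variant looks like this (logits and probs
cannot both be set; deriving probs via softmax here is just for illustration):

```python
from scipy import special

attack_input = AttackInputData(
    probs_train = special.softmax(logits_train, axis=1),
    probs_test = special.softmax(logits_test, axis=1),
    loss_train = loss_train,
    loss_test = loss_test,
    labels_train = labels_train,
    labels_test = labels_test)
```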
Then, we specify some details of the attack.
The first part includes the specifications of the slicing of the data. For
example, we may want to evaluate the result on the whole dataset, or by class,
percentiles, or the correctness of the model's classification.
These can be specified by a `SlicingSpec` object.

```python
slicing_spec = SlicingSpec(
    entire_dataset = True,
    by_class = True,
    by_percentiles = False,
    by_classification_correctness = True)
```

The second part specifies the classifiers for the attacker to use.
Currently, our API supports five classifiers, including
`AttackType.THRESHOLD_ATTACK` for a simple threshold attack,
`AttackType.LOGISTIC_REGRESSION`,
`AttackType.MULTI_LAYERED_PERCEPTRON`,
`AttackType.RANDOM_FOREST`, and
`AttackType.K_NEAREST_NEIGHBORS`,
which use the corresponding machine learning models.
For some models, different classifiers can yield pretty different results.
We can put multiple classifiers in a list:

```python
attack_types = [
    AttackType.THRESHOLD_ATTACK,
    AttackType.LOGISTIC_REGRESSION
]
```

Now, we can call the `run_attacks` method with all specifications:

```python
attacks_result = mia.run_attacks(attack_input=attack_input,
                                 slicing_spec=slicing_spec,
                                 attack_types=attack_types)
```

This returns an object of type `AttackResults`. We can, for example, use the
following code to see the attack results specified per-slice, as we have
requested attacks by class and by the model's classification correctness.
```python
print(attacks_result.summary(by_slices = True))
# Example output:
# -> Best-performing attacks over all slices
# THRESHOLD_ATTACK achieved an AUC of 0.75 on slice CORRECTLY_CLASSIFIED=False
# THRESHOLD_ATTACK achieved an advantage of 0.38 on slice CORRECTLY_CLASSIFIED=False
#
# Best-performing attacks over slice: "Entire dataset"
# LOGISTIC_REGRESSION achieved an AUC of 0.61
# THRESHOLD_ATTACK achieved an advantage of 0.22
#
# Best-performing attacks over slice: "CLASS=0"
# LOGISTIC_REGRESSION achieved an AUC of 0.62
# LOGISTIC_REGRESSION achieved an advantage of 0.24
#
# Best-performing attacks over slice: "CLASS=1"
# LOGISTIC_REGRESSION achieved an AUC of 0.61
# LOGISTIC_REGRESSION achieved an advantage of 0.19
#
# ...
#
# Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=True"
# LOGISTIC_REGRESSION achieved an AUC of 0.53
# THRESHOLD_ATTACK achieved an advantage of 0.05
#
# Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=False"
# THRESHOLD_ATTACK achieved an AUC of 0.75
# THRESHOLD_ATTACK achieved an advantage of 0.38
```

#### Viewing and plotting the attack results

We have seen an example of using `summary()` to view the attack results as text.
We also provide some other ways to inspect the attack results.

To get the attack that achieves the maximum attacker advantage or AUC, we can do

```python
max_auc_attacker = attacks_result.get_result_with_max_auc()
max_advantage_attacker = attacks_result.get_result_with_max_attacker_advantage()
```

Then, for an individual attack, such as `max_auc_attacker`, we can check its
type, attacker advantage and AUC by

```python
print("Attack type with max AUC: %s, AUC of %.2f, Attacker advantage of %.2f" %
      (max_auc_attacker.attack_type,
       max_auc_attacker.roc_curve.get_auc(),
       max_auc_attacker.roc_curve.get_attacker_advantage()))
# Example output:
# -> Attack type with max AUC: THRESHOLD_ATTACK, AUC of 0.75, Attacker advantage of 0.38
```
We can also plot its ROC curve by

```python
import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting as plotting

figure = plotting.plot_roc_curve(max_auc_attacker.roc_curve)
```

which would give a figure like the one below.

![roc_fig](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelab_roc_fig.png?raw=true)

Additionally, we provide functionality to convert the attack results into a
Pandas data frame:

```python
import pandas as pd

pd.set_option("display.max_rows", 8, "display.max_columns", None)
print(attacks_result.calculate_pd_dataframe())
# Example output:
#            slice feature  slice value  attack type  Attacker advantage       AUC
# 0        entire_dataset                  threshold            0.216440  0.600630
# 1        entire_dataset                         lr            0.212073  0.612989
# 2                 class            0     threshold            0.226000  0.611669
# 3                 class            0            lr            0.239452  0.624076
# ..                  ...          ...          ...                  ...       ...
# 22  correctly_classified         True    threshold            0.054907  0.471290
# 23  correctly_classified         True           lr            0.046986  0.525194
# 24  correctly_classified        False    threshold            0.379465  0.748138
# 25  correctly_classified        False           lr            0.370713  0.737148
```
### External guides / press mentions

*   [Introductory blog post](https://franziska-boenisch.de/posts/2021/01/membership-inference/)
    to the theory and the library by Franziska Boenisch from the Fraunhofer AISEC
    institute.
*   [Google AI Blog Post](https://ai.googleblog.com/2021/01/google-research-looking-back-at-2020.html#ResponsibleAI)
*   [TensorFlow Blog Post](https://blog.tensorflow.org/2020/06/introducing-new-privacy-testing-library.html)
*   [VentureBeat article](https://venturebeat.com/2020/06/24/google-releases-experimental-tensorflow-module-that-tests-the-privacy-of-ai-models/)
*   [Tech Xplore article](https://techxplore.com/news/2020-06-google-tensorflow-privacy-module.html)

## Contact / Feedback

Fill out this
[Google form](https://docs.google.com/forms/d/1DPwr3_OfMcqAOA6sdelTVjIZhKxMZkXvs94z16UCDa4/edit)
or reach out to us at tf-privacy@google.com and let us know how you’re using
this module. We’re keen on hearing your stories, feedback, and suggestions!

## Contributing

If you wish to add novel attacks to the attack library, please check our
[guidelines](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/CONTRIBUTING.md).

## Copyright

Copyright 2021 - Google LLC
@@ -0,0 +1,13 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

(Image file moved unchanged: 14 KiB before and after.)
@@ -2,7 +2,7 @@

 ## Introductory codelab

-The easiest way to get started is to go through [the introductory codelab](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/codelab.ipynb).
+The easiest way to get started is to go through [the introductory codelab](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/codelab.ipynb).
 This trains a simple image classification model and tests it against a series
 of membership inference attacks.

@@ -10,18 +10,18 @@ For a more detailed overview of the library, please check the sections below.

 ## End to end example
 As an alternative to the introductory codelab, we also have a standalone
-[example.py](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/example.py).
+[example.py](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/example.py).

 ## Sequence to sequence models

 If you're interested in sequence to sequence model attacks, please see the
-[seq2seq colab](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/third_party/seq2seq_membership_inference/seq2seq_membership_inference_codelab.ipynb).
+[seq2seq colab](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/third_party/seq2seq_membership_inference/seq2seq_membership_inference_codelab.ipynb).

 ## Membership probability score

 If you're interested in the membership probability score (also called privacy
 risk score) developed by Song and Mittal, please see their
-[membership probability codelab](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/membership_probability_codelab.ipynb).
+[membership probability codelab](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/membership_probability_codelab.ipynb).

 The accompanying paper is on [arXiv](https://arxiv.org/abs/2003.10595).
@@ -53,10 +53,10 @@
 "source": [
 "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
 " <td>\n",
-" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
+" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
 " </td>\n",
 " <td>\n",
-" <a target=\"_blank\" href=\"https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
+" <a target=\"_blank\" href=\"https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
 " </td>\n",
 "</table>"
 ]

@@ -133,7 +133,7 @@
 "source": [
 "!pip3 install git+https://github.com/tensorflow/privacy\n",
 "\n",
-"from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia"
+"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia"
 ]
 },
 {

@@ -298,11 +298,11 @@
 },
 "outputs": [],
 "source": [
-"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData\n",
+"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData\n",
-"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec\n",
+"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec\n",
-"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType\n",
+"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType\n",
 "\n",
-"import tensorflow_privacy.privacy.membership_inference_attack.plotting as plotting\n",
+"import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting as plotting\n",
 "\n",
 "labels_train = np.argmax(y_train, axis=1)\n",
 "labels_test = np.argmax(y_test, axis=1)\n",

@@ -28,18 +28,17 @@ from sklearn import metrics
 from tensorflow import keras
 from tensorflow.keras import layers
 from tensorflow.keras.utils import to_categorical
-from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResultsCollection
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResultsCollection
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import PrivacyMetric
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyMetric
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import \
-    PrivacyReportMetadata
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyReportMetadata
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
-import tensorflow_privacy.privacy.membership_inference_attack.plotting as plotting
+import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting as plotting
-import tensorflow_privacy.privacy.membership_inference_attack.privacy_report as privacy_report
+import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.privacy_report as privacy_report


 def generate_random_cluster(center, scale, num_points):

@@ -53,10 +53,10 @@
 "source": [
 "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
 " <td>\n",
-" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/membership_probability_codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
+" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/membership_probability_codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
 " </td>\n",
 " <td>\n",
-" <a target=\"_blank\" href=\"https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/membership_probability_codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
+" <a target=\"_blank\" href=\"https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/membership_probability_codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
 " </td>\n",
 "</table>"
 ]

@@ -133,7 +133,7 @@
 "source": [
 "!pip3 install git+https://github.com/tensorflow/privacy\n",
 "\n",
-"from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia"
+"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia"
 ]
 },
 {

@@ -627,11 +627,11 @@
 }
 ],
 "source": [
-"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData\n",
+"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData\n",
-"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec\n",
+"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec\n",
-"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType\n",
+"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType\n",
 "\n",
-"import tensorflow_privacy.privacy.membership_inference_attack.plotting as plotting\n",
+"import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting as plotting\n",
 "\n",
 "labels_train = np.argmax(y_train, axis=1)\n",
 "labels_test = np.argmax(y_test, axis=1)\n",

@@ -1190,9 +1190,9 @@
 }
 ],
 "source": [
-"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec\n",
+"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec\n",
-"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingFeature\n",
+"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingFeature\n",
-"from tensorflow_privacy.privacy.membership_inference_attack.dataset_slicing import get_slice\n",
+"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing import get_slice\n",
 "import matplotlib.pyplot as plt\n",
 "class_list = np.arange(10)\n",
 "num_images = 5\n",

@@ -13,10 +13,10 @@
 "source": [
 "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
 " <td>\n",
-" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/seq2seq_membership_inference_codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
+" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/seq2seq_membership_inference_codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
 " </td>\n",
 " <td>\n",
-" <a target=\"_blank\" href=\"https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/seq2seq_membership_inference_codelab.ipynb.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
+" <a target=\"_blank\" href=\"https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/seq2seq_membership_inference_codelab.ipynb.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
 " </td>\n",
 "</table>"
 ]

@@ -106,7 +106,7 @@
 "source": [
 "!pip3 install git+https://github.com/tensorflow/privacy\n",
 "\n",
-"from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia"
+"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia"
 ]
 },
 {

@@ -1142,9 +1142,9 @@
 }
 ],
 "source": [
-"from tensorflow_privacy.privacy.membership_inference_attack.seq2seq_mia import Seq2SeqAttackInputData, \\\n",
+"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia import Seq2SeqAttackInputData, \\\n",
 " run_seq2seq_attack\n",
-"import tensorflow_privacy.privacy.membership_inference_attack.plotting as plotting\n",
+"import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting as plotting\n",
 "\n",
 "attack_input = Seq2SeqAttackInputData(\n",
 " logits_train = logits_train_gen,\n",
@@ -0,0 +1,819 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Lint as: python3
"""Data structures representing attack inputs, configuration, outputs."""
import collections
import enum
import glob
import os
import pickle

from typing import Any, Iterable, Union
from dataclasses import dataclass
import numpy as np
import pandas as pd
from scipy import special
from sklearn import metrics

import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils as utils

ENTIRE_DATASET_SLICE_STR = 'Entire dataset'


class SlicingFeature(enum.Enum):
  """Enum with features by which slicing is available."""
  CLASS = 'class'
  PERCENTILE = 'percentile'
  CORRECTLY_CLASSIFIED = 'correctly_classified'


@dataclass
class SingleSliceSpec:
  """Specifies a slice.

  The slice is defined by values in one feature - it might be a single value
  (eg. slice of examples of the specific classification class) or some set of
  values (eg. range of percentiles of the attacked model loss).

  When feature is None, it means that the slice is the entire dataset.
  """
  feature: SlicingFeature = None
  value: Any = None

  @property
  def entire_dataset(self):
    return self.feature is None

  def __str__(self):
    if self.entire_dataset:
      return ENTIRE_DATASET_SLICE_STR

    if self.feature == SlicingFeature.PERCENTILE:
      return 'Loss percentiles: %d-%d' % self.value

    return '%s=%s' % (self.feature.name, self.value)
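

# Illustrative note (added for clarity, not part of the original file): slices
# stringify via __str__ above, e.g.
#
#   str(SingleSliceSpec(SlicingFeature.CLASS, 3))             -> 'CLASS=3'
#   str(SingleSliceSpec(SlicingFeature.PERCENTILE, (0, 10)))  -> 'Loss percentiles: 0-10'
#   str(SingleSliceSpec())                                    -> 'Entire dataset'
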
@dataclass
class SlicingSpec:
  """Specification of a slicing procedure.

  Each variable which is set specifies a slicing by different dimension.
  """

  # When is set to true, one of the slices is the whole dataset.
  entire_dataset: bool = True

  # Used in classification tasks for slicing by classes. It is assumed that
  # classes are integers 0, 1, ... number of classes. When true one slice per
  # each class is generated.
  by_class: Union[bool, Iterable[int], int] = False

  # If true, it generates 10 slices for percentiles of the loss - 0-10%,
  # 10-20%, ..., 90-100%.
  by_percentiles: bool = False

  # When true, a slice for correctly classified and a slice for misclassified
  # examples will be generated.
  by_classification_correctness: bool = False

  def __str__(self):
    """Only keeps the True values."""
    result = ['SlicingSpec(']
    if self.entire_dataset:
      result.append(' Entire dataset,')
    if self.by_class:
      if isinstance(self.by_class, Iterable):
        result.append(' Into classes %s,' % self.by_class)
      elif isinstance(self.by_class, int):
        result.append(' Up to class %d,' % self.by_class)
      else:
        result.append(' By classes,')
    if self.by_percentiles:
      result.append(' By percentiles,')
    if self.by_classification_correctness:
      result.append(' By classification correctness,')
    result.append(')')
    return '\n'.join(result)
class AttackType(enum.Enum):
  """An enum defining attack types."""
  LOGISTIC_REGRESSION = 'lr'
  MULTI_LAYERED_PERCEPTRON = 'mlp'
  RANDOM_FOREST = 'rf'
  K_NEAREST_NEIGHBORS = 'knn'
  THRESHOLD_ATTACK = 'threshold'
  THRESHOLD_ENTROPY_ATTACK = 'threshold-entropy'

  @property
  def is_trained_attack(self):
    """Returns whether this type of attack requires training a model."""
    return (self != AttackType.THRESHOLD_ATTACK) and (
        self != AttackType.THRESHOLD_ENTROPY_ATTACK)

  def __str__(self):
    """Returns LOGISTIC_REGRESSION instead of AttackType.LOGISTIC_REGRESSION."""
    return '%s' % self.name


class PrivacyMetric(enum.Enum):
  """An enum for the supported privacy risk metrics."""
  AUC = 'AUC'
  ATTACKER_ADVANTAGE = 'Attacker advantage'

  def __str__(self):
    """Returns 'AUC' instead of PrivacyMetric.AUC."""
    return '%s' % self.value
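

# Illustrative note (added for clarity, not part of the original file):
#
#   str(AttackType.LOGISTIC_REGRESSION)               -> 'LOGISTIC_REGRESSION'
#   AttackType.LOGISTIC_REGRESSION.is_trained_attack  -> True
#   AttackType.THRESHOLD_ATTACK.is_trained_attack     -> False
#   str(PrivacyMetric.ATTACKER_ADVANTAGE)             -> 'Attacker advantage'
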
def _is_integer_type_array(a):
  return np.issubdtype(a.dtype, np.integer)


def _is_last_dim_equal(arr1, arr1_name, arr2, arr2_name):
  """Checks whether the last dimension of the arrays is the same."""
  if arr1 is not None and arr2 is not None and arr1.shape[-1] != arr2.shape[-1]:
    raise ValueError('%s and %s should have the same number of features.' %
                     (arr1_name, arr2_name))


def _is_array_one_dimensional(arr, arr_name):
  """Checks whether the array is one dimensional."""
  if arr is not None and len(arr.shape) != 1:
    raise ValueError('%s should be a one dimensional numpy array.' % arr_name)


def _is_np_array(arr, arr_name):
  """Checks whether array is a numpy array."""
  if arr is not None and not isinstance(arr, np.ndarray):
    raise ValueError('%s should be a numpy array.' % arr_name)


def _log_value(probs, small_value=1e-30):
  """Compute the log value on the probability. Clip probabilities close to 0."""
  return -np.log(np.maximum(probs, small_value))


@dataclass
class AttackInputData:
  """Input data for running an attack.

  This includes only the data, and not configuration.
  """

  logits_train: np.ndarray = None
  logits_test: np.ndarray = None

  # Predicted probabilities for each class. They can be derived from logits,
  # so they can be set only if logits are not explicitly provided.
  probs_train: np.ndarray = None
  probs_test: np.ndarray = None

  # Contains ground-truth classes. Classes are assumed to be integers starting
  # from 0.
  labels_train: np.ndarray = None
  labels_test: np.ndarray = None

  # Explicitly specified loss. If provided, this is used instead of deriving
  # loss from logits and labels
  loss_train: np.ndarray = None
  loss_test: np.ndarray = None

  # Explicitly specified prediction entropy. If provided, this is used instead
  # of deriving entropy from logits and labels
  # (https://arxiv.org/pdf/2003.10595.pdf by Song and Mittal).
  entropy_train: np.ndarray = None
  entropy_test: np.ndarray = None

  @property
  def num_classes(self):
    if self.labels_train is None or self.labels_test is None:
      raise ValueError(
          'Can\'t identify the number of classes as no labels were provided. '
          'Please set labels_train and labels_test')
    return int(max(np.max(self.labels_train), np.max(self.labels_test))) + 1

  @property
  def logits_or_probs_train(self):
    """Returns train logits or probs whatever is not None."""
    if self.logits_train is not None:
      return self.logits_train
    return self.probs_train

  @property
  def logits_or_probs_test(self):
    """Returns test logits or probs whatever is not None."""
    if self.logits_test is not None:
      return self.logits_test
    return self.probs_test

  @staticmethod
  def _get_entropy(logits: np.ndarray, true_labels: np.ndarray):
    """Computes the prediction entropy (by Song and Mittal)."""
    if (np.absolute(np.sum(logits, axis=1) - 1) <= 1e-3).all():
      probs = logits
    else:
      # Using softmax to compute probability from logits.
      probs = special.softmax(logits, axis=1)
    if true_labels is None:
      # When not given ground truth label, we compute the
      # normal prediction entropy.
      # See the Equation (7) in https://arxiv.org/pdf/2003.10595.pdf
      return np.sum(np.multiply(probs, _log_value(probs)), axis=1)
    else:
      # When given the ground truth label, we compute the
      # modified prediction entropy.
      # See the Equation (8) in https://arxiv.org/pdf/2003.10595.pdf
      log_probs = _log_value(probs)
      reverse_probs = 1 - probs
      log_reverse_probs = _log_value(reverse_probs)
      modified_probs = np.copy(probs)
      modified_probs[range(true_labels.size),
                     true_labels] = reverse_probs[range(true_labels.size),
                                                  true_labels]
      modified_log_probs = np.copy(log_reverse_probs)
      modified_log_probs[range(true_labels.size),
                         true_labels] = log_probs[range(true_labels.size),
                                                  true_labels]
      return np.sum(np.multiply(modified_probs, modified_log_probs), axis=1)
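
  # Clarifying note (added; not part of the original file): with p = softmax(
  # logits) and y the true label, the two branches above compute, following
  # Song and Mittal (https://arxiv.org/pdf/2003.10595.pdf):
  #   Eq. (7), entropy:           -sum_i p_i * log(p_i)
  #   Eq. (8), modified entropy:  -(1 - p_y) * log(p_y)
  #                                 - sum_{i != y} p_i * log(1 - p_i)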
  def get_loss_train(self):
    """Calculates (if needed) cross-entropy losses for the training set.

    Returns:
      Loss (or None if neither the loss nor the labels are present).
    """
    if self.loss_train is None:
      if self.labels_train is None:
        return None
      if self.logits_train is not None:
        self.loss_train = utils.log_loss_from_logits(self.labels_train,
                                                     self.logits_train)
      else:
        self.loss_train = utils.log_loss(self.labels_train, self.probs_train)
    return self.loss_train

  def get_loss_test(self):
    """Calculates (if needed) cross-entropy losses for the test set.

    Returns:
      Loss (or None if neither the loss nor the labels are present).
    """
    if self.loss_test is None:
      if self.labels_test is None:
        return None
      if self.logits_test is not None:
        self.loss_test = utils.log_loss_from_logits(self.labels_test,
                                                    self.logits_test)
      else:
        self.loss_test = utils.log_loss(self.labels_test, self.probs_test)
    return self.loss_test

  def get_entropy_train(self):
    """Calculates prediction entropy for the training set."""
    if self.entropy_train is not None:
      return self.entropy_train
    return self._get_entropy(self.logits_train, self.labels_train)

  def get_entropy_test(self):
    """Calculates prediction entropy for the test set."""
    if self.entropy_test is not None:
      return self.entropy_test
    return self._get_entropy(self.logits_test, self.labels_test)

  def get_train_size(self):
    """Returns size of the training set."""
    if self.loss_train is not None:
      return self.loss_train.size
    if self.entropy_train is not None:
      return self.entropy_train.size
    return self.logits_or_probs_train.shape[0]

  def get_test_size(self):
    """Returns size of the test set."""
    if self.loss_test is not None:
      return self.loss_test.size
    if self.entropy_test is not None:
      return self.entropy_test.size
    return self.logits_or_probs_test.shape[0]
  def validate(self):
    """Validates the inputs."""
    if (self.loss_train is None) != (self.loss_test is None):
      raise ValueError(
          'loss_test and loss_train should both be either set or unset')

    if (self.entropy_train is None) != (self.entropy_test is None):
      raise ValueError(
          'entropy_test and entropy_train should both be either set or unset')

    if (self.logits_train is None) != (self.logits_test is None):
      raise ValueError(
          'logits_train and logits_test should both be either set or unset')

    if (self.probs_train is None) != (self.probs_test is None):
      raise ValueError(
          'probs_train and probs_test should both be either set or unset')

    if (self.logits_train is not None) and (self.probs_train is not None):
      raise ValueError('Logits and probs can not be both set')

    if (self.labels_train is None) != (self.labels_test is None):
      raise ValueError(
          'labels_train and labels_test should both be either set or unset')

    if (self.labels_train is None and self.loss_train is None and
        self.logits_train is None and self.entropy_train is None):
      raise ValueError(
          'At least one of labels, logits, losses or entropy should be set')

    if self.labels_train is not None and not _is_integer_type_array(
        self.labels_train):
      raise ValueError('labels_train elements should have integer type')

    if self.labels_test is not None and not _is_integer_type_array(
        self.labels_test):
      raise ValueError('labels_test elements should have integer type')

    _is_np_array(self.logits_train, 'logits_train')
    _is_np_array(self.logits_test, 'logits_test')
    _is_np_array(self.probs_train, 'probs_train')
    _is_np_array(self.probs_test, 'probs_test')
    _is_np_array(self.labels_train, 'labels_train')
    _is_np_array(self.labels_test, 'labels_test')
    _is_np_array(self.loss_train, 'loss_train')
    _is_np_array(self.loss_test, 'loss_test')
    _is_np_array(self.entropy_train, 'entropy_train')
    _is_np_array(self.entropy_test, 'entropy_test')

    _is_last_dim_equal(self.logits_train, 'logits_train', self.logits_test,
                       'logits_test')
    _is_last_dim_equal(self.probs_train, 'probs_train', self.probs_test,
                       'probs_test')
    _is_array_one_dimensional(self.loss_train, 'loss_train')
    _is_array_one_dimensional(self.loss_test, 'loss_test')
    _is_array_one_dimensional(self.entropy_train, 'entropy_train')
    _is_array_one_dimensional(self.entropy_test, 'entropy_test')
    _is_array_one_dimensional(self.labels_train, 'labels_train')
    _is_array_one_dimensional(self.labels_test, 'labels_test')
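
  # Illustrative note (added; not part of the original file): validate() raises
  # on inconsistent inputs, e.g.
  #
  #   AttackInputData(loss_train=np.ones(5)).validate()
  #   -> ValueError: loss_test and loss_train should both be either set or unset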
|
||||||
|
|
||||||
|
  def __str__(self):
    """Returns the shapes of variables that are not None."""
    result = ['AttackInputData(']
    _append_array_shape(self.loss_train, 'loss_train', result)
    _append_array_shape(self.loss_test, 'loss_test', result)
    _append_array_shape(self.entropy_train, 'entropy_train', result)
    _append_array_shape(self.entropy_test, 'entropy_test', result)
    _append_array_shape(self.logits_train, 'logits_train', result)
    _append_array_shape(self.logits_test, 'logits_test', result)
    _append_array_shape(self.probs_train, 'probs_train', result)
    _append_array_shape(self.probs_test, 'probs_test', result)
    _append_array_shape(self.labels_train, 'labels_train', result)
    _append_array_shape(self.labels_test, 'labels_test', result)
    result.append(')')
    return '\n'.join(result)


def _append_array_shape(arr: np.ndarray, arr_name: str, result):
  if arr is not None:
    result.append(' %s with shape: %s,' % (arr_name, arr.shape))
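As an aside for readers of this diff, here is a minimal usage sketch of the validation and sizing helpers above. It is not part of the commit; the import path assumes the new privacy_tests location introduced by this change, and all shapes are synthetic.

import numpy as np

from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData

# Synthetic model outputs: 100 training and 50 test examples, 10 classes.
attack_input = AttackInputData(
    logits_train=np.random.randn(100, 10),
    logits_test=np.random.randn(50, 10),
    labels_train=np.random.randint(0, 10, 100),
    labels_test=np.random.randint(0, 10, 50))

attack_input.validate()  # raises ValueError on inconsistent fields
print(attack_input)      # prints the shapes of all fields that are set
print(attack_input.get_train_size(), attack_input.get_test_size())  # 100 50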
@dataclass
class RocCurve:
  """Represents ROC curve of a membership inference classifier."""

  # Thresholds used to define points on ROC curve.
  # Thresholds are not explicitly part of the curve, and are stored for
  # debugging purposes.
  thresholds: np.ndarray

  # True positive rates based on thresholds
  tpr: np.ndarray

  # False positive rates based on thresholds
  fpr: np.ndarray

  def get_auc(self):
    """Calculates area under curve (aka AUC)."""
    return metrics.auc(self.fpr, self.tpr)

  def get_attacker_advantage(self):
    """Calculates membership attacker's (or adversary's) advantage.

    This metric is inspired by https://arxiv.org/abs/1709.01604, specifically
    by Definition 4. The difference here is that we calculate maximum advantage
    over all available classifier thresholds.

    Returns:
      a single float number with membership attacker's advantage.
    """
    return max(np.abs(self.tpr - self.fpr))

  def __str__(self):
    """Returns AUC and advantage metrics."""
    return '\n'.join([
        'RocCurve(',
        ' AUC: %.2f' % self.get_auc(),
        ' Attacker advantage: %.2f' % self.get_attacker_advantage(), ')'
    ])
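A quick sketch (not part of the diff) of how a RocCurve is typically filled in from attack scores; RocCurve is assumed imported from this module and the scores are synthetic:

import numpy as np
from sklearn import metrics

# Members (label 1) are given higher synthetic scores than non-members.
is_member = np.concatenate((np.ones(100), np.zeros(100)))
scores = np.concatenate((np.random.normal(1., 1., 100),
                         np.random.normal(0., 1., 100)))
fpr, tpr, thresholds = metrics.roc_curve(is_member, scores)

roc = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
print('AUC: %.2f' % roc.get_auc())                       # area under the curve
print('Advantage: %.2f' % roc.get_attacker_advantage())  # max |tpr - fpr|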
# (no. of training examples, no. of test examples) for the test.
DataSize = collections.namedtuple('DataSize', 'ntrain ntest')
@dataclass
class SingleAttackResult:
  """Results from running a single attack."""

  # Data slice this result was calculated for.
  slice_spec: SingleSliceSpec

  # (no. of training examples, no. of test examples) for the test.
  data_size: DataSize
  attack_type: AttackType

  # NOTE: roc_curve could theoretically be derived from membership scores.
  # Currently, we store it explicitly since not all attack types support
  # membership scores.
  # TODO(b/175870479): Consider deriving ROC curve from the membership scores.

  # ROC curve representing the accuracy of the attacker
  roc_curve: RocCurve

  # Membership score is some measure of confidence of this attacker that
  # a particular sample is a member of the training set.
  #
  # This is NOT necessarily probability. The nature of this score depends on
  # the type of attacker. Scores from different attacker types are not directly
  # comparable, but can be compared in relative terms (e.g. considering order
  # imposed by this measure).

  # Membership scores for the training set samples. For a perfect attacker,
  # all training samples will have higher scores than test samples.
  membership_scores_train: np.ndarray = None

  # Membership scores for the test set samples. For a perfect attacker, all
  # test set samples will have lower scores than the training set samples.
  membership_scores_test: np.ndarray = None

  def get_attacker_advantage(self):
    return self.roc_curve.get_attacker_advantage()

  def get_auc(self):
    return self.roc_curve.get_auc()

  def __str__(self):
    """Returns SliceSpec, AttackType, AUC and advantage metrics."""
    return '\n'.join([
        'SingleAttackResult(',
        ' SliceSpec: %s' % str(self.slice_spec),
        ' DataSize: (ntrain=%d, ntest=%d)' % (self.data_size.ntrain,
                                              self.data_size.ntest),
        ' AttackType: %s' % str(self.attack_type),
        ' AUC: %.2f' % self.get_auc(),
        ' Attacker advantage: %.2f' % self.get_attacker_advantage(), ')'
    ])
@dataclass
class SingleMembershipProbabilityResult:
  """Results from computing membership probabilities (denoted as privacy risk score in https://arxiv.org/abs/2003.10595).

  This class shows how to leverage membership probabilities to perform attacks
  by thresholding on them.
  """

  # Data slice this result was calculated for.
  slice_spec: SingleSliceSpec

  train_membership_probs: np.ndarray

  test_membership_probs: np.ndarray

  def attack_with_varied_thresholds(self, threshold_list):
    """Performs an attack with the specified thresholds.

    For each threshold value, we count how many training and test samples have
    membership probabilities larger than the threshold, and further compute
    precision and recall values. We skip a threshold value if it is larger
    than every sample's membership probability.

    Args:
      threshold_list: List of provided thresholds

    Returns:
      Arrays of the meaningful thresholds and the corresponding precision and
      recall values.
    """
    fpr, tpr, thresholds = metrics.roc_curve(
        np.concatenate((np.ones(len(self.train_membership_probs)),
                        np.zeros(len(self.test_membership_probs)))),
        np.concatenate(
            (self.train_membership_probs, self.test_membership_probs)),
        drop_intermediate=False)

    precision_list = []
    recall_list = []
    meaningful_threshold_list = []
    max_prob = max(self.train_membership_probs.max(),
                   self.test_membership_probs.max())
    for threshold in threshold_list:
      if threshold <= max_prob:
        idx = np.argwhere(thresholds >= threshold)[-1][0]
        meaningful_threshold_list.append(threshold)
        precision_list.append(tpr[idx] / (tpr[idx] + fpr[idx]))
        recall_list.append(tpr[idx])

    return np.array(meaningful_threshold_list), np.array(
        precision_list), np.array(recall_list)

  def collect_results(self, threshold_list, return_roc_results=True):
    """Collects attack results for a list of membership probability thresholds.

    The membership probability (from 0 to 1) represents each sample's
    probability of being in the training set. Usually, we choose a list of
    threshold values from 0.5 (uncertain of training or test) to 1 (100%
    certain of training) to compute the corresponding attack precision and
    recall.

    Args:
      threshold_list: List of provided thresholds
      return_roc_results: Whether to return ROC results

    Returns:
      A list of summary strings.
    """
    meaningful_threshold_list, precision_list, recall_list = self.attack_with_varied_thresholds(
        threshold_list)
    summary = []
    summary.append('\nMembership probability analysis over slice: \"%s\"' %
                   str(self.slice_spec))
    for i in range(len(meaningful_threshold_list)):
      summary.append(
          ' with %.4f as the threshold on membership probability, the precision-recall pair is (%.4f, %.4f)'
          % (meaningful_threshold_list[i], precision_list[i], recall_list[i]))
    if return_roc_results:
      fpr, tpr, thresholds = metrics.roc_curve(
          np.concatenate((np.ones(len(self.train_membership_probs)),
                          np.zeros(len(self.test_membership_probs)))),
          np.concatenate(
              (self.train_membership_probs, self.test_membership_probs)))
      roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
      summary.append(
          ' thresholding on membership probability achieved an AUC of %.2f' %
          (roc_curve.get_auc()))
      summary.append(
          ' thresholding on membership probability achieved an advantage of %.2f'
          % (roc_curve.get_attacker_advantage()))
    return summary
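For illustration, a sketch of thresholding on membership probabilities (not part of the commit; SingleMembershipProbabilityResult and SingleSliceSpec are assumed imported from this module, and the probabilities are synthetic):

import numpy as np

result = SingleMembershipProbabilityResult(
    slice_spec=SingleSliceSpec(),
    train_membership_probs=np.random.uniform(0.4, 1.0, 100),
    test_membership_probs=np.random.uniform(0.0, 0.6, 100))

# Precision/recall at each threshold that is not above every probability.
thresholds, precisions, recalls = result.attack_with_varied_thresholds(
    [0.5, 0.7, 0.9])
print('\n'.join(result.collect_results([0.5, 0.7, 0.9])))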
@dataclass
class MembershipProbabilityResults:
  """Membership probability results from multiple data slices."""

  membership_prob_results: Iterable[SingleMembershipProbabilityResult]

  def summary(self, threshold_list):
    """Returns the summary of membership probability analyses on all slices."""
    summary = []
    for single_result in self.membership_prob_results:
      single_summary = single_result.collect_results(threshold_list)
      summary.extend(single_summary)
    return '\n'.join(summary)
@dataclass
class PrivacyReportMetadata:
  """Metadata about the evaluated model.

  Used to create a privacy report based on AttackResults.
  """
  accuracy_train: float = None
  accuracy_test: float = None

  loss_train: float = None
  loss_test: float = None

  model_variant_label: str = 'Default model variant'
  epoch_num: int = None
class AttackResultsDFColumns(enum.Enum):
  """Columns for the Pandas DataFrame that stores AttackResults metrics."""
  SLICE_FEATURE = 'slice feature'
  SLICE_VALUE = 'slice value'
  DATA_SIZE_TRAIN = 'train size'
  DATA_SIZE_TEST = 'test size'
  ATTACK_TYPE = 'attack type'

  def __str__(self):
    """Returns 'slice value' instead of AttackResultsDFColumns.SLICE_VALUE."""
    return '%s' % self.value
@dataclass
class AttackResults:
  """Results from running multiple attacks."""
  single_attack_results: Iterable[SingleAttackResult]

  privacy_report_metadata: PrivacyReportMetadata = None

  def calculate_pd_dataframe(self):
    """Returns all metrics as a Pandas DataFrame."""
    slice_features = []
    slice_values = []
    data_size_train = []
    data_size_test = []
    attack_types = []
    advantages = []
    aucs = []

    for attack_result in self.single_attack_results:
      slice_spec = attack_result.slice_spec
      if slice_spec.entire_dataset:
        slice_feature, slice_value = str(slice_spec), ''
      else:
        slice_feature, slice_value = slice_spec.feature.value, slice_spec.value
      slice_features.append(str(slice_feature))
      slice_values.append(str(slice_value))
      data_size_train.append(attack_result.data_size.ntrain)
      data_size_test.append(attack_result.data_size.ntest)
      attack_types.append(str(attack_result.attack_type))
      advantages.append(float(attack_result.get_attacker_advantage()))
      aucs.append(float(attack_result.get_auc()))

    df = pd.DataFrame({
        str(AttackResultsDFColumns.SLICE_FEATURE): slice_features,
        str(AttackResultsDFColumns.SLICE_VALUE): slice_values,
        str(AttackResultsDFColumns.DATA_SIZE_TRAIN): data_size_train,
        str(AttackResultsDFColumns.DATA_SIZE_TEST): data_size_test,
        str(AttackResultsDFColumns.ATTACK_TYPE): attack_types,
        str(PrivacyMetric.ATTACKER_ADVANTAGE): advantages,
        str(PrivacyMetric.AUC): aucs
    })
    return df

  def summary(self, by_slices=False) -> str:
    """Provides a summary of the metrics.

    The summary provides the best-performing attacks for each requested data
    slice.
    Args:
      by_slices: whether to prepare a per-slice summary.

    Returns:
      A string with a summary of all the metrics.
    """
    summary = []

    # Summary over all slices
    max_auc_result_all = self.get_result_with_max_auc()
    summary.append('Best-performing attacks over all slices')
    summary.append(
        ' %s (with %d training and %d test examples) achieved an AUC of %.2f on slice %s'
        % (max_auc_result_all.attack_type,
           max_auc_result_all.data_size.ntrain,
           max_auc_result_all.data_size.ntest,
           max_auc_result_all.get_auc(),
           max_auc_result_all.slice_spec))

    max_advantage_result_all = self.get_result_with_max_attacker_advantage()
    summary.append(
        ' %s (with %d training and %d test examples) achieved an advantage of %.2f on slice %s'
        % (max_advantage_result_all.attack_type,
           max_advantage_result_all.data_size.ntrain,
           max_advantage_result_all.data_size.ntest,
           max_advantage_result_all.get_attacker_advantage(),
           max_advantage_result_all.slice_spec))

    slice_dict = self._group_results_by_slice()

    if by_slices and len(slice_dict.keys()) > 1:
      for slice_str in slice_dict:
        results = slice_dict[slice_str]
        summary.append('\nBest-performing attacks over slice: \"%s\"' %
                       slice_str)
        max_auc_result = results.get_result_with_max_auc()
        summary.append(
            ' %s (with %d training and %d test examples) achieved an AUC of %.2f'
            % (max_auc_result.attack_type,
               max_auc_result.data_size.ntrain,
               max_auc_result.data_size.ntest,
               max_auc_result.get_auc()))
        max_advantage_result = results.get_result_with_max_attacker_advantage()
        summary.append(
            ' %s (with %d training and %d test examples) achieved an advantage of %.2f'
            % (max_advantage_result.attack_type,
               max_advantage_result.data_size.ntrain,
               max_advantage_result.data_size.ntest,
               max_advantage_result.get_attacker_advantage()))

    return '\n'.join(summary)

  def _group_results_by_slice(self):
    """Groups AttackResults into a dictionary keyed by the slice."""
    slice_dict = {}
    for attack_result in self.single_attack_results:
      slice_str = str(attack_result.slice_spec)
      if slice_str not in slice_dict:
        slice_dict[slice_str] = AttackResults([])
      slice_dict[slice_str].single_attack_results.append(attack_result)
    return slice_dict

  def get_result_with_max_auc(self) -> SingleAttackResult:
    """Gets the result with maximum AUC over all attacks and slices."""
    aucs = [result.get_auc() for result in self.single_attack_results]

    if min(aucs) < 0.4:
      print(('Suspiciously low AUC detected: %.2f. '
             'There might be a bug in the classifier.') % min(aucs))

    return self.single_attack_results[np.argmax(aucs)]

  def get_result_with_max_attacker_advantage(self) -> SingleAttackResult:
    """Gets the result with maximum advantage over all attacks and slices."""
    return self.single_attack_results[np.argmax([
        result.get_attacker_advantage() for result in self.single_attack_results
    ])]

  def save(self, filepath):
    """Saves self to a pickle file."""
    with open(filepath, 'wb') as out:
      pickle.dump(self, out)

  @classmethod
  def load(cls, filepath):
    """Loads AttackResults from a pickle file."""
    with open(filepath, 'rb') as inp:
      return pickle.load(inp)
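A sketch tying the pieces above together (illustrative only; the data-structure names are assumed imported from this module and the ROC curve is synthetic):

import numpy as np
from sklearn import metrics

fpr, tpr, thresholds = metrics.roc_curve(
    np.concatenate((np.zeros(50), np.ones(50))),
    np.concatenate((np.random.normal(0., 1., 50),
                    np.random.normal(1., 1., 50))))

single_result = SingleAttackResult(
    slice_spec=SingleSliceSpec(),
    data_size=DataSize(ntrain=50, ntest=50),
    attack_type=AttackType.THRESHOLD_ATTACK,
    roc_curve=RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds))

results = AttackResults(single_attack_results=[single_result])
print(results.summary(by_slices=False))  # best attacks over all slices
print(results.calculate_pd_dataframe())  # the same metrics as a DataFrame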
@dataclass
class AttackResultsCollection:
  """A collection of AttackResults."""
  attack_results_list: Iterable[AttackResults]

  def append(self, attack_results: AttackResults):
    self.attack_results_list.append(attack_results)

  def save(self, dirname):
    """Saves self to a directory of pickle files, one per AttackResults."""
    for i, attack_results in enumerate(self.attack_results_list):
      filepath = os.path.join(dirname,
                              _get_attack_results_filename(attack_results, i))

      attack_results.save(filepath)

  @classmethod
  def load(cls, dirname):
    """Loads AttackResultsCollection from all files in a directory."""
    loaded_collection = AttackResultsCollection([])
    for filepath in sorted(glob.glob('%s/*' % dirname)):
      with open(filepath, 'rb') as inp:
        loaded_collection.attack_results_list.append(pickle.load(inp))
    return loaded_collection
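Continuing the sketch: a save/load round trip through a temporary directory (`results` is the AttackResults built in the previous sketch):

import tempfile

collection = AttackResultsCollection([])
collection.append(results)

with tempfile.TemporaryDirectory() as tmp_dir:
  collection.save(tmp_dir)                          # one pickle per element
  reloaded = AttackResultsCollection.load(tmp_dir)  # sorted by filename
print(len(reloaded.attack_results_list))            # 1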
def _get_attack_results_filename(attack_results: AttackResults, index: int):
  """Creates a filename for a specific set of AttackResults."""
  metadata = attack_results.privacy_report_metadata
  if metadata is not None:
    return '%s_%s_epoch_%s.pickle' % (metadata.model_variant_label, index,
                                      metadata.epoch_num)
  return '%s.pickle' % index
def get_flattened_attack_metrics(results: AttackResults):
  """Gets flattened attack metrics.

  Args:
    results: membership inference attack results.

  Returns:
    types: a list of attack types
    slices: a list of slices
    attack_metrics: a list of metric names
    values: a list of metric values; the i-th element corresponds to the i-th
      entries of types, slices and attack_metrics
  """
  types = []
  slices = []
  attack_metrics = []
  values = []
  for attack_result in results.single_attack_results:
    types += [str(attack_result.attack_type)] * 2
    slices += [str(attack_result.slice_spec)] * 2
    attack_metrics += ['adv', 'auc']
    values += [float(attack_result.get_attacker_advantage()),
               float(attack_result.get_auc())]
  return types, slices, attack_metrics, values
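A final sketch for this file: flattening the metrics for logging (again assuming the `results` object from the earlier sketch):

types, slices, metric_names, values = get_flattened_attack_metrics(results)
for t, s, m, v in zip(types, slices, metric_names, values):
  print('%s on %s: %s = %.4f' % (t, s, m, v))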
@@ -13,25 +13,25 @@
 # limitations under the License.

 # Lint as: python3
-"""Tests for tensorflow_privacy.privacy.membership_inference_attack.data_structures."""
+"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures."""
 import os
 import tempfile
 from absl.testing import absltest
 from absl.testing import parameterized
 import numpy as np
 import pandas as pd
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import _log_value
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResultsCollection
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import DataSize
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import PrivacyReportMetadata
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import RocCurve
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleAttackResult
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleMembershipProbabilityResult
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingFeature
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import _log_value
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResultsCollection
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import DataSize
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyReportMetadata
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import RocCurve
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleAttackResult
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleMembershipProbabilityResult
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingFeature


 class SingleSliceSpecTest(parameterized.TestCase):
@@ -0,0 +1,148 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Lint as: python3
"""Specifying and creating AttackInputData slices."""

import collections.abc
import copy
from typing import List

import numpy as np
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingFeature
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec


def _slice_if_not_none(a, idx):
  return None if a is None else a[idx]


def _slice_data_by_indices(data: AttackInputData, idx_train,
                           idx_test) -> AttackInputData:
  """Slices train fields with idx_train and test fields with idx_test."""

  result = AttackInputData()

  # Slice train data.
  result.logits_train = _slice_if_not_none(data.logits_train, idx_train)
  result.probs_train = _slice_if_not_none(data.probs_train, idx_train)
  result.labels_train = _slice_if_not_none(data.labels_train, idx_train)
  result.loss_train = _slice_if_not_none(data.loss_train, idx_train)
  result.entropy_train = _slice_if_not_none(data.entropy_train, idx_train)

  # Slice test data.
  result.logits_test = _slice_if_not_none(data.logits_test, idx_test)
  result.probs_test = _slice_if_not_none(data.probs_test, idx_test)
  result.labels_test = _slice_if_not_none(data.labels_test, idx_test)
  result.loss_test = _slice_if_not_none(data.loss_test, idx_test)
  result.entropy_test = _slice_if_not_none(data.entropy_test, idx_test)

  return result


def _slice_by_class(data: AttackInputData, class_value: int) -> AttackInputData:
  idx_train = data.labels_train == class_value
  idx_test = data.labels_test == class_value
  return _slice_data_by_indices(data, idx_train, idx_test)


def _slice_by_percentiles(data: AttackInputData, from_percentile: float,
                          to_percentile: float):
  """Slices samples by loss percentiles."""

  # Find from_percentile and to_percentile percentiles in losses.
  loss_train = data.get_loss_train()
  loss_test = data.get_loss_test()
  losses = np.concatenate((loss_train, loss_test))
  from_loss = np.percentile(losses, from_percentile)
  to_loss = np.percentile(losses, to_percentile)

  idx_train = (from_loss <= loss_train) & (loss_train <= to_loss)
  idx_test = (from_loss <= loss_test) & (loss_test <= to_loss)

  return _slice_data_by_indices(data, idx_train, idx_test)


def _indices_by_classification(logits_or_probs, labels, correctly_classified):
  idx_correct = labels == np.argmax(logits_or_probs, axis=1)
  return idx_correct if correctly_classified else np.invert(idx_correct)


def _slice_by_classification_correctness(data: AttackInputData,
                                         correctly_classified: bool):
  idx_train = _indices_by_classification(data.logits_or_probs_train,
                                         data.labels_train,
                                         correctly_classified)
  idx_test = _indices_by_classification(data.logits_or_probs_test,
                                        data.labels_test, correctly_classified)
  return _slice_data_by_indices(data, idx_train, idx_test)


def get_single_slice_specs(slicing_spec: SlicingSpec,
                           num_classes: int = None) -> List[SingleSliceSpec]:
  """Returns slices of data according to slicing_spec."""
  result = []

  if slicing_spec.entire_dataset:
    result.append(SingleSliceSpec())

  # Create slices by class.
  by_class = slicing_spec.by_class
  if isinstance(by_class, bool):
    if by_class:
      assert num_classes, "When by_class == True, num_classes should be given."
      assert 0 <= num_classes <= 1000, (
          f"Too many classes for slicing by classes. "
          f"Found {num_classes}.")
      for c in range(num_classes):
        result.append(SingleSliceSpec(SlicingFeature.CLASS, c))
  elif isinstance(by_class, int):
    result.append(SingleSliceSpec(SlicingFeature.CLASS, by_class))
  elif isinstance(by_class, collections.abc.Iterable):
    for c in by_class:
      result.append(SingleSliceSpec(SlicingFeature.CLASS, c))

  # Create slices by percentiles.
  if slicing_spec.by_percentiles:
    for percent in range(0, 100, 10):
      result.append(
          SingleSliceSpec(SlicingFeature.PERCENTILE, (percent, percent + 10)))

  # Create slices by correctness of the classifications.
  if slicing_spec.by_classification_correctness:
    result.append(SingleSliceSpec(SlicingFeature.CORRECTLY_CLASSIFIED, True))
    result.append(SingleSliceSpec(SlicingFeature.CORRECTLY_CLASSIFIED, False))

  return result


def get_slice(data: AttackInputData,
              slice_spec: SingleSliceSpec) -> AttackInputData:
  """Returns a single slice of data according to slice_spec."""
  if slice_spec.entire_dataset:
    data_slice = copy.copy(data)
  elif slice_spec.feature == SlicingFeature.CLASS:
    data_slice = _slice_by_class(data, slice_spec.value)
  elif slice_spec.feature == SlicingFeature.PERCENTILE:
    from_percentile, to_percentile = slice_spec.value
    data_slice = _slice_by_percentiles(data, from_percentile, to_percentile)
  elif slice_spec.feature == SlicingFeature.CORRECTLY_CLASSIFIED:
    data_slice = _slice_by_classification_correctness(data, slice_spec.value)
  else:
    raise ValueError('Unknown slice spec feature "%s"' % slice_spec.feature)

  data_slice.slice_spec = slice_spec
  return data_slice
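A usage sketch for the slicing API above (illustrative; `attack_input` is an AttackInputData with logits and integer labels as in the earlier sketch, and SlicingSpec is assumed to expose the entire_dataset/by_class fields used in this file):

specs = get_single_slice_specs(
    SlicingSpec(entire_dataset=True, by_class=True), num_classes=10)
print(len(specs))  # 11: the whole dataset plus one slice per class

# Materialize the first per-class slice; it keeps only matching samples.
class_slice = get_slice(attack_input, specs[1])
print(class_slice.slice_spec)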
@@ -13,17 +13,17 @@
 # limitations under the License.

 # Lint as: python3
-"""Tests for tensorflow_privacy.privacy.membership_inference_attack.dataset_slicing."""
+"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing."""

 from absl.testing import absltest
 import numpy as np

-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingFeature
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
-from tensorflow_privacy.privacy.membership_inference_attack.dataset_slicing import get_single_slice_specs
-from tensorflow_privacy.privacy.membership_inference_attack.dataset_slicing import get_slice
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingFeature
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing import get_single_slice_specs
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing import get_slice


 def _are_all_fields_equal(lhs, rhs) -> bool:
@@ -0,0 +1,141 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Lint as: python3
"""A callback and a function in keras for membership inference attack."""

import os
from typing import Iterable
from absl import logging

import tensorflow as tf

from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils import log_loss
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils_tensorboard import write_results_to_tensorboard_tf2 as write_results_to_tensorboard


def calculate_losses(model, data, labels):
  """Calculates losses of the model's predictions on data, given true labels.

  Args:
    model: model to make predictions
    data: samples
    labels: true labels of samples (integer valued)

  Returns:
    pred: probability vector of each sample
    loss: cross entropy loss of each sample
  """
  pred = model.predict(data)
  loss = log_loss(labels, pred)
  return pred, loss


class MembershipInferenceCallback(tf.keras.callbacks.Callback):
  """Callback to perform membership inference attack on epoch end."""

  def __init__(
      self,
      in_train, out_train,
      slicing_spec: SlicingSpec = None,
      attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,),
      tensorboard_dir=None,
      tensorboard_merge_classifiers=False):
    """Initializes the callback.

    Args:
      in_train: (in_training samples, in_training labels)
      out_train: (out_training samples, out_training labels)
      slicing_spec: slicing specification of the attack
      attack_types: a list of attacks, each of type AttackType
      tensorboard_dir: directory for tensorboard summary
      tensorboard_merge_classifiers: if true, plot different classifiers with
        the same slicing_spec and metric in the same figure
    """
    self._in_train_data, self._in_train_labels = in_train
    self._out_train_data, self._out_train_labels = out_train
    self._slicing_spec = slicing_spec
    self._attack_types = attack_types
    self._tensorboard_merge_classifiers = tensorboard_merge_classifiers
    if tensorboard_dir:
      if tensorboard_merge_classifiers:
        self._writers = {}
        for attack_type in attack_types:
          self._writers[attack_type.name] = tf.summary.create_file_writer(
              os.path.join(tensorboard_dir, 'MI', attack_type.name))
      else:
        self._writers = tf.summary.create_file_writer(
            os.path.join(tensorboard_dir, 'MI'))
      logging.info('Will write to tensorboard.')
    else:
      self._writers = None

  def on_epoch_end(self, epoch, logs=None):
    results = run_attack_on_keras_model(
        self.model,
        (self._in_train_data, self._in_train_labels),
        (self._out_train_data, self._out_train_labels),
        self._slicing_spec,
        self._attack_types)
    logging.info(results)

    att_types, att_slices, att_metrics, att_values = get_flattened_attack_metrics(
        results)
    print('Attack result:')
    print('\n'.join([' %s: %.4f' % (', '.join([s, t, m]), v) for t, s, m, v in
                     zip(att_types, att_slices, att_metrics, att_values)]))

    # Write to tensorboard if tensorboard_dir is specified.
    if self._writers is not None:
      write_results_to_tensorboard(results, self._writers, epoch,
                                   self._tensorboard_merge_classifiers)


def run_attack_on_keras_model(
    model, in_train, out_train,
    slicing_spec: SlicingSpec = None,
    attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,)):
  """Performs the attack on a trained model.

  Args:
    model: model to be tested
    in_train: an (in_training samples, in_training labels) tuple
    out_train: an (out_training samples, out_training labels) tuple
    slicing_spec: slicing specification of the attack
    attack_types: a list of attacks, each of type AttackType
  Returns:
    Results of the attack
  """
  in_train_data, in_train_labels = in_train
  out_train_data, out_train_labels = out_train

  # Compute predictions and losses.
  in_train_pred, in_train_loss = calculate_losses(model, in_train_data,
                                                  in_train_labels)
  out_train_pred, out_train_loss = calculate_losses(model, out_train_data,
                                                    out_train_labels)
  attack_input = AttackInputData(
      logits_train=in_train_pred, logits_test=out_train_pred,
      labels_train=in_train_labels, labels_test=out_train_labels,
      loss_train=in_train_loss, loss_test=out_train_loss
  )
  results = mia.run_attacks(attack_input,
                            slicing_spec=slicing_spec,
                            attack_types=attack_types)
  return results
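A minimal end-to-end sketch of the callback above on a toy model (illustrative only; the data is synthetic and the model intentionally tiny):

import numpy as np
import tensorflow as tf

x_train = np.random.randn(200, 20).astype(np.float32)
y_train = np.random.randint(0, 10, 200)
x_test = np.random.randn(100, 20).astype(np.float32)
y_test = np.random.randint(0, 10, 100)

model = tf.keras.Sequential(
    [tf.keras.layers.Dense(10, activation='softmax', input_shape=(20,))])
model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy')

# Run a threshold attack at the end of every epoch.
callback = MembershipInferenceCallback(
    in_train=(x_train, y_train), out_train=(x_test, y_test),
    attack_types=(AttackType.THRESHOLD_ATTACK,))
model.fit(x_train, y_train, epochs=2, callbacks=[callback])

# Or attack once, after training has finished.
results = run_attack_on_keras_model(
    model, (x_train, y_train), (x_test, y_test))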
@@ -20,11 +20,11 @@ from absl import flags

 import numpy as np
 import tensorflow as tf
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
-from tensorflow_privacy.privacy.membership_inference_attack.keras_evaluation import MembershipInferenceCallback
-from tensorflow_privacy.privacy.membership_inference_attack.keras_evaluation import run_attack_on_keras_model
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.keras_evaluation import MembershipInferenceCallback
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.keras_evaluation import run_attack_on_keras_model


 FLAGS = flags.FLAGS
@@ -13,17 +13,17 @@
 # limitations under the License.

 # Lint as: python3
-"""Tests for tensorflow_privacy.privacy.membership_inference_attack.keras_evaluation."""
+"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.keras_evaluation."""

 from absl.testing import absltest

 import numpy as np
 import tensorflow.compat.v1 as tf

-from tensorflow_privacy.privacy.membership_inference_attack import keras_evaluation
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import keras_evaluation
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics


 class UtilsTest(absltest.TestCase):
@ -0,0 +1,332 @@
|
||||||
|
# Copyright 2020, The TensorFlow Authors.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
# Lint as: python3
|
||||||
|
"""Code that runs membership inference attacks based on the model outputs.
|
||||||
|
|
||||||
|
This file belongs to the new API for membership inference attacks. This file
|
||||||
|
will be renamed to membership_inference_attack.py after the old API is removed.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from typing import Iterable
|
||||||
|
import numpy as np
|
||||||
|
from sklearn import metrics
|
||||||
|
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import models
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import DataSize
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import MembershipProbabilityResults
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyReportMetadata
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import RocCurve
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleAttackResult
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleMembershipProbabilityResult
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing import get_single_slice_specs
|
||||||
|
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing import get_slice
|
||||||
|
|
||||||
|
|
||||||
|
def _get_slice_spec(data: AttackInputData) -> SingleSliceSpec:
|
||||||
|
if hasattr(data, 'slice_spec'):
|
||||||
|
return data.slice_spec
|
||||||
|
return SingleSliceSpec()
|
||||||
|
|
||||||
|
|
||||||
|
def _run_trained_attack(attack_input: AttackInputData,
|
||||||
|
attack_type: AttackType,
|
||||||
|
balance_attacker_training: bool = True):
|
||||||
|
"""Classification attack done by ML models."""
|
||||||
|
attacker = None
|
||||||
|
|
||||||
|
if attack_type == AttackType.LOGISTIC_REGRESSION:
|
||||||
|
attacker = models.LogisticRegressionAttacker()
|
||||||
|
elif attack_type == AttackType.MULTI_LAYERED_PERCEPTRON:
|
||||||
|
attacker = models.MultilayerPerceptronAttacker()
|
||||||
|
elif attack_type == AttackType.RANDOM_FOREST:
|
||||||
|
attacker = models.RandomForestAttacker()
|
||||||
|
elif attack_type == AttackType.K_NEAREST_NEIGHBORS:
|
||||||
|
attacker = models.KNearestNeighborsAttacker()
|
||||||
|
else:
|
||||||
|
raise NotImplementedError('Attack type %s not implemented yet.' %
|
||||||
|
attack_type)
|
||||||
|
|
||||||
|
prepared_attacker_data = models.create_attacker_data(
|
||||||
|
attack_input, balance=balance_attacker_training)
|
||||||
|
|
||||||
|
attacker.train_model(prepared_attacker_data.features_train,
|
||||||
|
prepared_attacker_data.is_training_labels_train)
|
||||||
|
|
||||||
|
# Run the attacker on (permuted) test examples.
|
||||||
|
predictions_test = attacker.predict(prepared_attacker_data.features_test)
|
||||||
|
|
||||||
|
# Generate ROC curves with predictions.
|
||||||
|
fpr, tpr, thresholds = metrics.roc_curve(
|
||||||
|
prepared_attacker_data.is_training_labels_test, predictions_test)
|
||||||
|
|
||||||
|
roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
|
||||||
|
|
||||||
|
# NOTE: In the current setup we can't obtain membership scores for all
|
||||||
|
# samples, since some of them were used to train the attacker. This can be
|
||||||
|
# fixed by training several attackers to ensure each sample was left out
|
||||||
|
# in exactly one attacker (basically, this means performing cross-validation).
|
||||||
|
# TODO(b/175870479): Implement membership scores for predicted attackers.
|
||||||
|
|
||||||
|
return SingleAttackResult(
|
||||||
|
slice_spec=_get_slice_spec(attack_input),
|
||||||
|
data_size=prepared_attacker_data.data_size,
|
||||||
|
attack_type=attack_type,
|
||||||
|
roc_curve=roc_curve)
|
||||||
|
|
||||||
|
|
||||||
|
def _run_threshold_attack(attack_input: AttackInputData):
|
||||||
|
"""Runs a threshold attack on loss."""
|
||||||
|
ntrain, ntest = attack_input.get_train_size(), attack_input.get_test_size()
|
||||||
|
loss_train = attack_input.get_loss_train()
|
||||||
|
loss_test = attack_input.get_loss_test()
|
||||||
|
if loss_train is None or loss_test is None:
|
||||||
|
raise ValueError('Not possible to run threshold attack without losses.')
|
||||||
|
fpr, tpr, thresholds = metrics.roc_curve(
|
||||||
|
np.concatenate((np.zeros(ntrain), np.ones(ntest))),
|
||||||
|
np.concatenate((loss_train, loss_test)))
|
||||||
|
|
||||||
|
roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
|
||||||
|
|
||||||
|
return SingleAttackResult(
|
||||||
|
slice_spec=_get_slice_spec(attack_input),
|
||||||
|
data_size=DataSize(ntrain=ntrain, ntest=ntest),
|
||||||
|
attack_type=AttackType.THRESHOLD_ATTACK,
|
||||||
|
membership_scores_train=-attack_input.get_loss_train(),
|
||||||
|
membership_scores_test=-attack_input.get_loss_test(),
|
||||||
|
roc_curve=roc_curve)
|
||||||
|
|
||||||
|
|
||||||
|
def _run_threshold_entropy_attack(attack_input: AttackInputData):
|
||||||
|
ntrain, ntest = attack_input.get_train_size(), attack_input.get_test_size()
|
||||||
|
fpr, tpr, thresholds = metrics.roc_curve(
|
||||||
|
np.concatenate((np.zeros(ntrain), np.ones(ntest))),
|
||||||
|
np.concatenate(
|
||||||
|
(attack_input.get_entropy_train(), attack_input.get_entropy_test())))
|
||||||
|
|
||||||
|
roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
|
||||||
|
|
||||||
|
return SingleAttackResult(
|
||||||
|
slice_spec=_get_slice_spec(attack_input),
|
||||||
|
data_size=DataSize(ntrain=ntrain, ntest=ntest),
|
||||||
|
attack_type=AttackType.THRESHOLD_ENTROPY_ATTACK,
|
||||||
|
membership_scores_train=-attack_input.get_entropy_train(),
|
||||||
|
membership_scores_test=-attack_input.get_entropy_test(),
|
||||||
|
roc_curve=roc_curve)
|
||||||
|
|
||||||
|
|
||||||
|
def _run_attack(attack_input: AttackInputData,
|
||||||
|
attack_type: AttackType,
|
||||||
|
balance_attacker_training: bool = True,
|
||||||
|
min_num_samples: int = 1):
|
||||||
|
"""Runs membership inference attacks for specified input and type.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
attack_input: input data for running an attack
|
||||||
|
attack_type: the attack to run
|
||||||
|
balance_attacker_training: Whether the training and test sets for the
|
||||||
|
membership inference attacker should have a balanced (roughly equal)
|
||||||
|
number of samples from the training and test sets used to develop
|
||||||
|
the model under attack.
|
||||||
|
min_num_samples: minimum number of examples in either training or test data.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
the attack result.
|
||||||
|
"""
|
||||||
|
attack_input.validate()
|
||||||
|
if min(attack_input.get_train_size(),
|
||||||
|
attack_input.get_test_size()) < min_num_samples:
|
||||||
|
return None
|
||||||
|
|
||||||
|
if attack_type.is_trained_attack:
|
||||||
|
return _run_trained_attack(attack_input, attack_type,
|
||||||
|
balance_attacker_training)
|
||||||
|
if attack_type == AttackType.THRESHOLD_ENTROPY_ATTACK:
|
||||||
|
return _run_threshold_entropy_attack(attack_input)
|
||||||
|
return _run_threshold_attack(attack_input)
|
||||||
|
|
||||||
|
|
||||||
|
def run_attacks(attack_input: AttackInputData,
|
||||||
|
slicing_spec: SlicingSpec = None,
|
||||||
|
attack_types: Iterable[AttackType] = (
|
||||||
|
AttackType.THRESHOLD_ATTACK,),
|
||||||
|
privacy_report_metadata: PrivacyReportMetadata = None,
|
||||||
|
balance_attacker_training: bool = True,
|
||||||
|
min_num_samples: int = 1) -> AttackResults:
|
||||||
|
"""Runs membership inference attacks on a classification model.
|
||||||
|
|
||||||
|
It runs attacks specified by attack_types on each attack_input slice which is
|
||||||
|
specified by slicing_spec.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
attack_input: input data for running an attack
|
||||||
|
slicing_spec: specifies attack_input slices to run attack on
|
||||||
|
attack_types: attacks to run
|
||||||
|
privacy_report_metadata: the metadata of the model under attack.
|
||||||
|
balance_attacker_training: Whether the training and test sets for the
|
||||||
|
membership inference attacker should have a balanced (roughly equal)
|
||||||
|
number of samples from the training and test sets used to develop
|
||||||
|
the model under attack.
|
||||||
|
min_num_samples: minimum number of examples in either training or test data.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
the attack result.
|
||||||
|
"""
|
||||||
|
attack_input.validate()
|
||||||
|
attack_results = []
|
||||||
|
|
||||||
|
if slicing_spec is None:
|
||||||
|
slicing_spec = SlicingSpec(entire_dataset=True)
|
||||||
|
num_classes = None
|
||||||
|
if slicing_spec.by_class:
|
||||||
|
num_classes = attack_input.num_classes
|
||||||
|
input_slice_specs = get_single_slice_specs(slicing_spec, num_classes)
|
||||||
|
for single_slice_spec in input_slice_specs:
|
||||||
|
attack_input_slice = get_slice(attack_input, single_slice_spec)
|
||||||
|
for attack_type in attack_types:
|
||||||
|
attack_result = _run_attack(attack_input_slice, attack_type,
|
||||||
|
balance_attacker_training,
|
||||||
|
min_num_samples)
|
||||||
|
if attack_result is not None:
|
||||||
|
attack_results.append(attack_result)
|
||||||
|
|
||||||
|
privacy_report_metadata = _compute_missing_privacy_report_metadata(
|
||||||
|
privacy_report_metadata, attack_input)
|
||||||
|
|
||||||
|
return AttackResults(
|
||||||
|
single_attack_results=attack_results,
|
||||||
|
privacy_report_metadata=privacy_report_metadata)
|
||||||
|
|
||||||
|
|
||||||
|
def _compute_membership_probability(
|
||||||
|
attack_input: AttackInputData,
|
||||||
|
num_bins: int = 15) -> SingleMembershipProbabilityResult:
|
||||||
|
"""Computes each individual point's likelihood of being a member (denoted as privacy risk score in https://arxiv.org/abs/2003.10595).
|
||||||
|
|
||||||
|
For an individual sample, its privacy risk score is computed as the posterior
|
||||||
|
probability of being in the training set
|
||||||
|
after observing its prediction output by the target machine learning model.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
attack_input: input data for compute membership probability
|
||||||
|
num_bins: the number of bins used to compute the training/test histogram
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
membership probability results
|
||||||
|
"""
|
||||||
|
|
||||||
|
# Uses the provided loss or entropy. Otherwise computes the loss.
|
||||||
|
if attack_input.loss_train is not None and attack_input.loss_test is not None:
|
||||||
|
train_values = attack_input.loss_train
|
||||||
|
test_values = attack_input.loss_test
|
||||||
|
elif attack_input.entropy_train is not None and attack_input.entropy_test is not None:
|
||||||
|
train_values = attack_input.entropy_train
|
||||||
|
test_values = attack_input.entropy_test
|
||||||
|
else:
|
||||||
|
train_values = attack_input.get_loss_train()
|
||||||
|
test_values = attack_input.get_loss_test()
|
||||||
|
|
||||||
|
# Compute the histogram in the log scale
|
||||||
|
small_value = 1e-10
|
||||||
|
train_values = np.maximum(train_values, small_value)
|
||||||
|
test_values = np.maximum(test_values, small_value)
|
||||||
|
|
||||||
|
min_value = min(train_values.min(), test_values.min())
|
||||||
|
max_value = max(train_values.max(), test_values.max())
|
||||||
|
bins_hist = np.logspace(
|
||||||
|
np.log10(min_value), np.log10(max_value), num_bins + 1)
|
||||||
|
|
||||||
|
train_hist, _ = np.histogram(train_values, bins=bins_hist)
|
||||||
|
train_hist = train_hist / (len(train_values) + 0.0)
|
||||||
|
train_hist_indices = np.fmin(
|
||||||
|
np.digitize(train_values, bins=bins_hist), num_bins) - 1
|
||||||
|
|
||||||
|
test_hist, _ = np.histogram(test_values, bins=bins_hist)
|
||||||
|
test_hist = test_hist / (len(test_values) + 0.0)
|
||||||
|
test_hist_indices = np.fmin(
|
||||||
|
np.digitize(test_values, bins=bins_hist), num_bins) - 1
|
||||||
|
|
||||||
|
combined_hist = train_hist + test_hist
|
||||||
|
combined_hist[combined_hist == 0] = small_value
|
||||||
|
membership_prob_list = train_hist / (combined_hist + 0.0)
|
||||||
|
train_membership_probs = membership_prob_list[train_hist_indices]
|
||||||
|
test_membership_probs = membership_prob_list[test_hist_indices]
|
||||||
|
|
||||||
|
return SingleMembershipProbabilityResult(
|
||||||
|
slice_spec=_get_slice_spec(attack_input),
|
||||||
|
train_membership_probs=train_membership_probs,
|
||||||
|
test_membership_probs=test_membership_probs)
|
||||||
|
|
||||||
|
|
||||||
|
def run_membership_probability_analysis(
    attack_input: AttackInputData,
    slicing_spec: SlicingSpec = None) -> MembershipProbabilityResults:
  """Performs membership probability analysis on all given slice types.

  Args:
    attack_input: input data for computing membership probabilities
    slicing_spec: specifies attack_input slices

  Returns:
    the membership probability results.
  """
  attack_input.validate()
  membership_prob_results = []

  if slicing_spec is None:
    slicing_spec = SlicingSpec(entire_dataset=True)
  num_classes = None
  if slicing_spec.by_class:
    num_classes = attack_input.num_classes
  input_slice_specs = get_single_slice_specs(slicing_spec, num_classes)
  for single_slice_spec in input_slice_specs:
    attack_input_slice = get_slice(attack_input, single_slice_spec)
    membership_prob_results.append(
        _compute_membership_probability(attack_input_slice))

  return MembershipProbabilityResults(
      membership_prob_results=membership_prob_results)

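
# Editor's note: a hedged end-to-end sketch (not part of the library). The
# loss arrays are hypothetical stand-ins for a real model's per-example
# train/test losses.
def _run_membership_probability_demo():
  """Builds a loss-only AttackInputData and runs the analysis on it."""
  attack_input = AttackInputData(
      loss_train=np.array([0.1, 0.2, 0.3]),
      loss_test=np.array([0.5, 0.6, 0.7]))
  results = run_membership_probability_analysis(
      attack_input, SlicingSpec(entire_dataset=True))
  return results
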
def _compute_missing_privacy_report_metadata(
    metadata: PrivacyReportMetadata,
    attack_input: AttackInputData) -> PrivacyReportMetadata:
  """Populates metadata fields if they are missing."""
  if metadata is None:
    metadata = PrivacyReportMetadata()
  if metadata.accuracy_train is None:
    metadata.accuracy_train = _get_accuracy(attack_input.logits_train,
                                            attack_input.labels_train)
  if metadata.accuracy_test is None:
    metadata.accuracy_test = _get_accuracy(attack_input.logits_test,
                                           attack_input.labels_test)
  loss_train = attack_input.get_loss_train()
  loss_test = attack_input.get_loss_test()
  if metadata.loss_train is None and loss_train is not None:
    metadata.loss_train = np.average(loss_train)
  if metadata.loss_test is None and loss_test is not None:
    metadata.loss_test = np.average(loss_test)
  return metadata


def _get_accuracy(logits, labels):
  """Computes the accuracy if it is missing."""
  if logits is None or labels is None:
    return None
  return metrics.accuracy_score(labels, np.argmax(logits, axis=1))

@@ -13,17 +13,17 @@
 # limitations under the License.

 # Lint as: python3
-"""Tests for tensorflow_privacy.privacy.membership_inference_attack.utils."""
+"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils."""
 from absl.testing import absltest
 import numpy as np

-from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import DataSize
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import DataSize
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingFeature
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingFeature
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec


 def get_test_input(n_train, n_test):
@@ -0,0 +1,210 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Lint as: python3
"""Trained models for membership inference attacks."""

from dataclasses import dataclass
import numpy as np
from sklearn import ensemble
from sklearn import linear_model
from sklearn import model_selection
from sklearn import neighbors
from sklearn import neural_network

from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import DataSize


@dataclass
class AttackerData:
  """Input data for an ML classifier attack.

  This includes only the data, and not configuration.
  """

  features_train: np.ndarray = None
  # element-wise boolean array denoting if the example was part of training.
  is_training_labels_train: np.ndarray = None

  features_test: np.ndarray = None
  # element-wise boolean array denoting if the example was part of training.
  is_training_labels_test: np.ndarray = None

  data_size: DataSize = None


def create_attacker_data(attack_input_data: AttackInputData,
                         test_fraction: float = 0.25,
                         balance: bool = True) -> AttackerData:
  """Prepares AttackInputData to train ML attackers.

  Combines logits and losses and performs a random train-test split.

  Args:
    attack_input_data: Original AttackInputData
    test_fraction: Fraction of the dataset to include in the test split.
    balance: Whether the training and test sets for the membership inference
      attacker should have a balanced (roughly equal) number of samples from
      the training and test sets used to develop the model under attack.

  Returns:
    AttackerData.
  """
  attack_input_train = _column_stack(attack_input_data.logits_or_probs_train,
                                     attack_input_data.get_loss_train())
  attack_input_test = _column_stack(attack_input_data.logits_or_probs_test,
                                    attack_input_data.get_loss_test())

  if balance:
    min_size = min(attack_input_data.get_train_size(),
                   attack_input_data.get_test_size())
    attack_input_train = _sample_multidimensional_array(attack_input_train,
                                                        min_size)
    attack_input_test = _sample_multidimensional_array(attack_input_test,
                                                       min_size)
  ntrain, ntest = attack_input_train.shape[0], attack_input_test.shape[0]

  features_all = np.concatenate((attack_input_train, attack_input_test))

  labels_all = np.concatenate((np.zeros(ntrain), np.ones(ntest)))

  # Perform a train-test split.
  features_train, features_test, is_training_labels_train, is_training_labels_test = model_selection.train_test_split(
      features_all, labels_all, test_size=test_fraction, stratify=labels_all)
  return AttackerData(features_train, is_training_labels_train, features_test,
                      is_training_labels_test,
                      DataSize(ntrain=ntrain, ntest=ntest))

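
# Editor's note: a hedged sketch (not part of the library) of feeding model
# outputs into create_attacker_data. The logits and labels are hypothetical;
# a real caller substitutes the train/test outputs of the model under attack.
def _create_attacker_data_demo():
  """Builds AttackerData from toy logits and labels."""
  attack_input = AttackInputData(
      logits_train=np.array([[2.0, 0.1], [1.5, 0.2]]),
      logits_test=np.array([[0.4, 0.5], [0.3, 0.6]]),
      labels_train=np.array([0, 0]),
      labels_test=np.array([1, 1]))
  attacker_data = create_attacker_data(attack_input, test_fraction=0.5)
  # features_* stack logits with per-example losses; is_training_labels_* are
  # 0 for examples drawn from the model's training set and 1 for test.
  return attacker_data
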
def _sample_multidimensional_array(array, size):
  indices = np.random.choice(len(array), size, replace=False)
  return array[indices]


def _column_stack(logits, loss):
  """Stacks logits and losses.

  In case that only one exists, returns that one.
  Args:
    logits: logits array
    loss: loss array

  Returns:
    stacked logits and losses (or only one if both do not exist).
  """
  if logits is None:
    return np.expand_dims(loss, axis=-1)
  if loss is None:
    return logits
  return np.column_stack((logits, loss))

class TrainedAttacker:
  """Base class for training attack models."""
  model = None

  def train_model(self, input_features, is_training_labels):
    """Train an attacker model.

    This is trained on examples from train and test datasets.
    Args:
      input_features : array-like of shape (n_samples, n_features) Training
        vector, where n_samples is the number of samples and n_features is the
        number of features.
      is_training_labels : a vector of booleans of shape (n_samples, )
        representing whether the sample is in the training set or not.
    """
    raise NotImplementedError()

  def predict(self, input_features):
    """Predicts whether input_features belongs to train or test.

    Args:
      input_features : A vector of features with the same semantics as x_train
        passed to train_model.
    Returns:
      An array of probabilities denoting whether the example belongs to test.
    """
    if self.model is None:
      raise AssertionError(
          'Model not trained yet. Please call train_model first.')
    return self.model.predict_proba(input_features)[:, 1]


class LogisticRegressionAttacker(TrainedAttacker):
  """Logistic regression attacker."""

  def train_model(self, input_features, is_training_labels):
    lr = linear_model.LogisticRegression(solver='lbfgs')
    param_grid = {
        'C': np.logspace(-4, 2, 10),
    }
    model = model_selection.GridSearchCV(
        lr, param_grid=param_grid, cv=3, n_jobs=1, verbose=0)
    model.fit(input_features, is_training_labels)
    self.model = model

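
# Editor's note: a hedged usage sketch (not part of the library). The random
# features stand in for the logits-plus-loss features that
# create_attacker_data produces.
def _attacker_usage_demo():
  """Trains a logistic regression attacker and queries it."""
  rng = np.random.RandomState(0)
  features = rng.rand(60, 3)  # 60 samples, 3 features (e.g. logits + loss)
  labels = rng.randint(0, 2, 60)  # 0 = train member, 1 = test non-member
  attacker = LogisticRegressionAttacker()
  attacker.train_model(features, labels)
  # Probabilities that each example comes from the test (non-member) set.
  return attacker.predict(features)
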
class MultilayerPerceptronAttacker(TrainedAttacker):
  """Multilayer perceptron attacker."""

  def train_model(self, input_features, is_training_labels):
    mlp_model = neural_network.MLPClassifier()
    param_grid = {
        'hidden_layer_sizes': [(64,), (32, 32)],
        'solver': ['adam'],
        'alpha': [0.0001, 0.001, 0.01],
    }
    n_jobs = -1
    model = model_selection.GridSearchCV(
        mlp_model, param_grid=param_grid, cv=3, n_jobs=n_jobs, verbose=0)
    model.fit(input_features, is_training_labels)
    self.model = model


class RandomForestAttacker(TrainedAttacker):
  """Random forest attacker."""

  def train_model(self, input_features, is_training_labels):
    """Setup a random forest pipeline with cross-validation."""
    rf_model = ensemble.RandomForestClassifier()

    param_grid = {
        'n_estimators': [100],
        'max_features': ['auto', 'sqrt'],
        'max_depth': [5, 10, 20, None],
        'min_samples_split': [2, 5, 10],
        'min_samples_leaf': [1, 2, 4]
    }
    n_jobs = -1
    model = model_selection.GridSearchCV(
        rf_model, param_grid=param_grid, cv=3, n_jobs=n_jobs, verbose=0)
    model.fit(input_features, is_training_labels)
    self.model = model


class KNearestNeighborsAttacker(TrainedAttacker):
  """K nearest neighbor attacker."""

  def train_model(self, input_features, is_training_labels):
    knn_model = neighbors.KNeighborsClassifier()
    param_grid = {
        'n_neighbors': [3, 5, 7],
    }
    model = model_selection.GridSearchCV(
        knn_model, param_grid=param_grid, cv=3, n_jobs=1, verbose=0)
    model.fit(input_features, is_training_labels)
    self.model = model

@@ -13,12 +13,12 @@
 # limitations under the License.

 # Lint as: python3
-"""Tests for tensorflow_privacy.privacy.membership_inference_attack.data_structures."""
+"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures."""
 from absl.testing import absltest
 import numpy as np

-from tensorflow_privacy.privacy.membership_inference_attack import models
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import models
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData


 class TrainedAttackerTest(absltest.TestCase):
@@ -0,0 +1,86 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Lint as: python3
"""Plotting functionality for membership inference attack analysis.

Functions to plot ROC curves and histograms as well as functionality to store
figures to colossus.
"""

from typing import Text, Iterable, Optional

import matplotlib.pyplot as plt
import numpy as np
from sklearn import metrics


def save_plot(figure: plt.Figure, path: Text, outformat='png'):
  """Stores a figure to disk."""
  if path is not None:
    with open(path, 'wb') as f:
      figure.savefig(f, bbox_inches='tight', format=outformat)
    plt.close(figure)


def plot_curve_with_area(x: Iterable[float],
                         y: Iterable[float],
                         xlabel: Text = 'x',
                         ylabel: Text = 'y') -> plt.Figure:
  """Plots the curve defined by inputs and the area under the curve.

  All entries of x and y are required to lie between 0 and 1.
  For example, x could be recall and y precision, or x is fpr and y is tpr.

  Args:
    x: Values on x-axis (1d)
    y: Values on y-axis (must be same length as x)
    xlabel: Label for x axis
    ylabel: Label for y axis

  Returns:
    The matplotlib figure handle
  """
  fig = plt.figure()
  plt.plot([0, 1], [0, 1], 'k', lw=1.0)
  plt.plot(x, y, lw=2, label=f'AUC: {metrics.auc(x, y):.3f}')
  plt.xlabel(xlabel)
  plt.ylabel(ylabel)
  plt.legend()
  return fig


def plot_histograms(train: Iterable[float],
                    test: Iterable[float],
                    xlabel: Text = 'x',
                    thresh: Optional[float] = None) -> plt.Figure:
  """Plots histograms of training versus test metrics."""
  xmin = min(np.min(train), np.min(test))
  xmax = max(np.max(train), np.max(test))
  bins = np.linspace(xmin, xmax, 100)
  fig = plt.figure()
  plt.hist(test, bins=bins, density=True, alpha=0.5, label='test', log=True)
  plt.hist(train, bins=bins, density=True, alpha=0.5, label='train', log=True)
  if thresh is not None:
    plt.axvline(thresh, c='r', label=f'threshold = {thresh:.3f}')
  plt.xlabel(xlabel)
  plt.ylabel('normalized counts (density)')
  plt.legend()
  return fig


def plot_roc_curve(roc_curve) -> plt.Figure:
  """Plots the ROC curve and the area under the curve."""
  return plot_curve_with_area(
      roc_curve.fpr, roc_curve.tpr, xlabel='FPR', ylabel='TPR')

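
# Editor's note: a hedged sketch (not part of the library). The fpr/tpr
# arrays are made up, and /tmp/roc.png is a hypothetical output path; real
# values come from an attack's RocCurve.
def _plotting_demo():
  """Plots a synthetic ROC curve and saves it to disk."""
  fpr = np.array([0.0, 0.2, 0.5, 1.0])
  tpr = np.array([0.0, 0.6, 0.9, 1.0])
  fig = plot_curve_with_area(fpr, tpr, xlabel='FPR', ylabel='TPR')
  save_plot(fig, '/tmp/roc.png')
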
@@ -0,0 +1,138 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Lint as: python3
"""Plotting code for ML Privacy Reports."""
from typing import Iterable
import matplotlib.pyplot as plt
import pandas as pd

from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResultsCollection
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResultsDFColumns
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import ENTIRE_DATASET_SLICE_STR
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyMetric

# Helper constants for DataFrame keys.
LEGEND_LABEL_STR = 'legend label'
EPOCH_STR = 'Epoch'
TRAIN_ACCURACY_STR = 'Train accuracy'


def plot_by_epochs(results: AttackResultsCollection,
                   privacy_metrics: Iterable[PrivacyMetric]) -> plt.Figure:
  """Plots privacy vulnerabilities vs epoch numbers.

  In case multiple privacy metrics are specified, the plot will feature
  multiple subplots (one subplot per metric). Multiple model variants
  are supported.
  Args:
    results: AttackResultsCollection for the plot
    privacy_metrics: List of enumerated privacy metrics that should be plotted.

  Returns:
    A pyplot figure with privacy vs epoch plots.
  """

  _validate_results(results.attack_results_list)
  all_results_df = _calculate_combined_df_with_metadata(
      results.attack_results_list)
  return _generate_subplots(
      all_results_df=all_results_df,
      x_axis_metric='Epoch',
      figure_title='Vulnerability per Epoch',
      privacy_metrics=privacy_metrics)


def plot_privacy_vs_accuracy(results: AttackResultsCollection,
                             privacy_metrics: Iterable[PrivacyMetric]):
  """Plots privacy vulnerabilities vs accuracy plots.

  In case multiple privacy metrics are specified, the plot will feature
  multiple subplots (one subplot per metric). Multiple model variants
  are supported.
  Args:
    results: AttackResultsCollection for the plot
    privacy_metrics: List of enumerated privacy metrics that should be plotted.

  Returns:
    A pyplot figure with privacy vs accuracy plots.
  """
  _validate_results(results.attack_results_list)
  all_results_df = _calculate_combined_df_with_metadata(
      results.attack_results_list)
  return _generate_subplots(
      all_results_df=all_results_df,
      x_axis_metric='Train accuracy',
      figure_title='Privacy vs Utility Analysis',
      privacy_metrics=privacy_metrics)


def _calculate_combined_df_with_metadata(results: Iterable[AttackResults]):
  """Adds metadata to the dataframe and concats them together."""
  all_results_df = None
  for attack_results in results:
    attack_results_df = attack_results.calculate_pd_dataframe()
    attack_results_df = attack_results_df.loc[attack_results_df[str(
        AttackResultsDFColumns.SLICE_FEATURE)] == ENTIRE_DATASET_SLICE_STR]
    attack_results_df.insert(0, EPOCH_STR,
                             attack_results.privacy_report_metadata.epoch_num)
    attack_results_df.insert(
        0, TRAIN_ACCURACY_STR,
        attack_results.privacy_report_metadata.accuracy_train)
    attack_results_df.insert(
        0, LEGEND_LABEL_STR,
        attack_results.privacy_report_metadata.model_variant_label + ' - ' +
        attack_results_df[str(AttackResultsDFColumns.ATTACK_TYPE)])
    if all_results_df is None:
      all_results_df = attack_results_df
    else:
      all_results_df = pd.concat([all_results_df, attack_results_df],
                                 ignore_index=True)
  return all_results_df


def _generate_subplots(all_results_df: pd.DataFrame, x_axis_metric: str,
                       figure_title: str,
                       privacy_metrics: Iterable[PrivacyMetric]):
  """Creates one subplot per privacy metric for a specified x_axis_metric."""
  fig, axes = plt.subplots(
      1, len(privacy_metrics), figsize=(5 * len(privacy_metrics) + 3, 5))
  # Set a title for the entire group of subplots.
  fig.suptitle(figure_title)
  if len(privacy_metrics) == 1:
    axes = (axes,)
  for i, privacy_metric in enumerate(privacy_metrics):
    legend_labels = all_results_df[LEGEND_LABEL_STR].unique()
    for legend_label in legend_labels:
      single_label_results = all_results_df.loc[all_results_df[LEGEND_LABEL_STR]
                                                == legend_label]
      sorted_label_results = single_label_results.sort_values(x_axis_metric)
      axes[i].plot(sorted_label_results[x_axis_metric],
                   sorted_label_results[str(privacy_metric)])
    axes[i].set_xlabel(x_axis_metric)
    axes[i].set_title('%s for %s' % (privacy_metric, ENTIRE_DATASET_SLICE_STR))
  plt.legend(legend_labels, loc='upper left', bbox_to_anchor=(1.02, 1))
  fig.tight_layout(rect=[0, 0, 1, 0.93])  # Leave space for suptitle.

  return fig


def _validate_results(results: Iterable[AttackResults]):
  for attack_results in results:
    if not attack_results or not attack_results.privacy_report_metadata:
      raise ValueError('Privacy metadata is not defined.')
    if attack_results.privacy_report_metadata.epoch_num is None:
      raise ValueError('epoch_num in metadata is not defined.')

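
# Editor's note: a hedged sketch (not part of the library) of rendering a
# report. It assumes the caller already ran attacks at several epochs and
# stored each AttackResults (with epoch_num and accuracy metadata) in the
# collection.
def _privacy_report_demo(results_collection: AttackResultsCollection):
  """Renders the two standard report figures for the AUC metric."""
  epoch_fig = plot_by_epochs(
      results_collection, privacy_metrics=[PrivacyMetric.AUC])
  utility_fig = plot_privacy_vs_accuracy(
      results_collection, privacy_metrics=[PrivacyMetric.AUC])
  return epoch_fig, utility_fig
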
@@ -13,21 +13,20 @@
 # limitations under the License.

 # Lint as: python3
-"""Tests for tensorflow_privacy.privacy.membership_inference_attack.privacy_report."""
+"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.privacy_report."""
 from absl.testing import absltest
 import numpy as np

-from tensorflow_privacy.privacy.membership_inference_attack import privacy_report
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import privacy_report

-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResultsCollection
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResultsCollection
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import DataSize
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import DataSize
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import \
-    PrivacyReportMetadata
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyReportMetadata
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import RocCurve
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import RocCurve
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleAttackResult
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleAttackResult
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec


 class PrivacyReportTest(absltest.TestCase):
@@ -0,0 +1,373 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Lint as: python3
"""Code for membership inference attacks on seq2seq models.

Contains seq2seq specific logic for attack data structures, attack data
generation, and the logistic regression membership inference attack.
"""
from typing import Iterator, List

from dataclasses import dataclass
import numpy as np
from scipy.stats import rankdata
from sklearn import metrics
from sklearn import model_selection
import tensorflow as tf

from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import models
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import DataSize
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyReportMetadata
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import RocCurve
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleAttackResult
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.models import _sample_multidimensional_array
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.models import AttackerData


def _is_iterator(obj, obj_name):
  """Checks whether obj is an iterator."""
  if obj is not None and not isinstance(obj, Iterator):
    raise ValueError('%s should be a generator.' % obj_name)


@dataclass
class Seq2SeqAttackInputData:
  """Input data for running an attack on seq2seq models.

  This includes only the data, and not configuration.
  """
  logits_train: Iterator[np.ndarray] = None
  logits_test: Iterator[np.ndarray] = None

  # Contains ground-truth token indices for the target sequences.
  labels_train: Iterator[np.ndarray] = None
  labels_test: Iterator[np.ndarray] = None

  # Size of the target sequence vocabulary.
  vocab_size: int = None

  # Train, test size = number of batches in training, test set.
  # These values need to be supplied by the user as logits, labels
  # are lazy loaded for seq2seq models.
  train_size: int = 0
  test_size: int = 0

  def validate(self):
    """Validates the inputs."""

    if (self.logits_train is None) != (self.logits_test is None):
      raise ValueError(
          'logits_train and logits_test should both be either set or unset')

    if (self.labels_train is None) != (self.labels_test is None):
      raise ValueError(
          'labels_train and labels_test should both be either set or unset')

    if self.logits_train is None or self.labels_train is None:
      raise ValueError(
          'Labels, logits of training, test sets should all be set')

    if (self.vocab_size is None or self.train_size is None or
        self.test_size is None):
      raise ValueError('vocab_size, train_size, test_size should all be set')

    if self.vocab_size is not None and not isinstance(self.vocab_size, int):
      raise ValueError('vocab_size should be of integer type')

    if self.train_size is not None and not isinstance(self.train_size, int):
      raise ValueError('train_size should be of integer type')

    if self.test_size is not None and not isinstance(self.test_size, int):
      raise ValueError('test_size should be of integer type')

    _is_iterator(self.logits_train, 'logits_train')
    _is_iterator(self.logits_test, 'logits_test')
    _is_iterator(self.labels_train, 'labels_train')
    _is_iterator(self.labels_test, 'labels_test')

  def __str__(self):
    """Returns the shapes of variables that are not None."""
    result = ['Seq2SeqAttackInputData(']

    if self.vocab_size is not None and self.train_size is not None:
      result.append(
          'logits_train with shape (%d, num_sequences, num_tokens, %d)' %
          (self.train_size, self.vocab_size))
      result.append(
          'labels_train with shape (%d, num_sequences, num_tokens, 1)' %
          self.train_size)

    if self.vocab_size is not None and self.test_size is not None:
      result.append(
          'logits_test with shape (%d, num_sequences, num_tokens, %d)' %
          (self.test_size, self.vocab_size))
      result.append(
          'labels_test with shape (%d, num_sequences, num_tokens, 1)' %
          self.test_size)

    result.append(')')
    return '\n'.join(result)

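
# Editor's note: a hedged sketch (not part of the library) of building the
# input data. The shapes are hypothetical: 2 batches of 1 sequence of 3
# tokens each, over a vocabulary of 5 tokens.
def _seq2seq_input_demo():
  """Constructs and validates a toy Seq2SeqAttackInputData."""
  attack_input = Seq2SeqAttackInputData(
      logits_train=iter([np.random.rand(1, 3, 5) for _ in range(2)]),
      logits_test=iter([np.random.rand(1, 3, 5) for _ in range(2)]),
      labels_train=iter([np.random.randint(0, 5, (1, 3, 1)) for _ in range(2)]),
      labels_test=iter([np.random.randint(0, 5, (1, 3, 1)) for _ in range(2)]),
      vocab_size=5,
      train_size=2,
      test_size=2)
  attack_input.validate()
  return attack_input
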
def _get_attack_features_and_metadata(
    logits: Iterator[np.ndarray],
    labels: Iterator[np.ndarray]) -> (np.ndarray, float, float):
  """Returns the average rank of tokens per batch of sequences and the loss.

  Args:
    logits: Logits returned by a seq2seq model, dim = (num_batches,
      num_sequences, num_tokens, vocab_size).
    labels: Target labels for the seq2seq model, dim = (num_batches,
      num_sequences, num_tokens, 1).

  Returns:
    1. An array of average ranks, dim = (num_batches, 1).
       Each average rank is calculated over ranks of tokens in sequences of a
       particular batch.
    2. Loss computed over all logits and labels.
    3. Accuracy computed over all logits and labels.
  """
  ranks = []
  loss = 0.0
  dataset_length = 0.0
  correct_preds = 0
  total_preds = 0
  for batch_logits, batch_labels in zip(logits, labels):
    # Compute average rank for the current batch.
    batch_ranks = _get_batch_ranks(batch_logits, batch_labels)
    ranks.append(np.mean(batch_ranks))

    # Update overall loss metrics with metrics of the current batch.
    batch_loss, batch_length = _get_batch_loss_metrics(batch_logits,
                                                       batch_labels)
    loss += batch_loss
    dataset_length += batch_length

    # Update overall accuracy metrics with metrics of the current batch.
    batch_correct_preds, batch_total_preds = _get_batch_accuracy_metrics(
        batch_logits, batch_labels)
    correct_preds += batch_correct_preds
    total_preds += batch_total_preds

  # Compute loss and accuracy for the dataset.
  loss = loss / dataset_length
  accuracy = correct_preds / total_preds

  return np.array(ranks), loss, accuracy

def _get_batch_ranks(batch_logits: np.ndarray,
                     batch_labels: np.ndarray) -> np.ndarray:
  """Returns the ranks of tokens in a batch of sequences.

  Args:
    batch_logits: Logits returned by a seq2seq model, dim = (num_sequences,
      num_tokens, vocab_size).
    batch_labels: Target labels for the seq2seq model, dim = (num_sequences,
      num_tokens, 1).

  Returns:
    An array of ranks of tokens in a batch of sequences, dim = (num_sequences,
    num_tokens, 1)
  """
  batch_ranks = []
  for sequence_logits, sequence_labels in zip(batch_logits, batch_labels):
    batch_ranks += _get_ranks_for_sequence(sequence_logits, sequence_labels)

  return np.array(batch_ranks)


def _get_ranks_for_sequence(logits: np.ndarray,
                            labels: np.ndarray) -> List[float]:
  """Returns ranks for a sequence.

  Args:
    logits: Logits of a single sequence, dim = (num_tokens, vocab_size).
    labels: Target labels of a single sequence, dim = (num_tokens, 1).

  Returns:
    An array of ranks for tokens in the sequence, dim = (num_tokens, 1).
  """
  sequence_ranks = []
  for logit, label in zip(logits, labels.astype(int)):
    rank = rankdata(-logit, method='min')[label] - 1.0
    sequence_ranks.append(rank)

  return sequence_ranks

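
# Editor's note: a worked example (not part of the library) of how a token
# rank is derived. For logits [0.1, 2.0, 0.5] the descending order is token 1,
# token 2, token 0, so the true token below (index 2) has 0-indexed rank 1.
def _rank_demo():
  """Shows the rank computed for a single token position."""
  logits = np.array([[0.1, 2.0, 0.5]])  # one token position, vocab of 3
  labels = np.array([[2]])  # the true token is index 2
  return _get_ranks_for_sequence(logits, labels)  # -> [array([1.])]
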
def _get_batch_loss_metrics(batch_logits: np.ndarray,
                            batch_labels: np.ndarray) -> (float, int):
  """Returns the loss, number of sequences for a batch.

  Args:
    batch_logits: Logits returned by a seq2seq model, dim = (num_sequences,
      num_tokens, vocab_size).
    batch_labels: Target labels for the seq2seq model, dim = (num_sequences,
      num_tokens, 1).
  """
  batch_loss = 0.0
  batch_length = len(batch_logits)
  for sequence_logits, sequence_labels in zip(batch_logits, batch_labels):
    sequence_loss = tf.losses.sparse_categorical_crossentropy(
        tf.keras.backend.constant(sequence_labels),
        tf.keras.backend.constant(sequence_logits),
        from_logits=True)
    batch_loss += sequence_loss.numpy().sum()

  return batch_loss / batch_length, batch_length


def _get_batch_accuracy_metrics(batch_logits: np.ndarray,
                                batch_labels: np.ndarray) -> (float, float):
  """Returns the number of correct predictions, total number of predictions for a batch.

  Args:
    batch_logits: Logits returned by a seq2seq model, dim = (num_sequences,
      num_tokens, vocab_size).
    batch_labels: Target labels for the seq2seq model, dim = (num_sequences,
      num_tokens, 1).
  """
  batch_correct_preds = 0.0
  batch_total_preds = 0.0
  for sequence_logits, sequence_labels in zip(batch_logits, batch_labels):
    preds = tf.metrics.sparse_categorical_accuracy(
        tf.keras.backend.constant(sequence_labels),
        tf.keras.backend.constant(sequence_logits))
    batch_correct_preds += preds.numpy().sum()
    batch_total_preds += len(sequence_labels)

  return batch_correct_preds, batch_total_preds

def create_seq2seq_attacker_data(
    attack_input_data: Seq2SeqAttackInputData,
    test_fraction: float = 0.25,
    balance: bool = True,
    privacy_report_metadata: PrivacyReportMetadata = PrivacyReportMetadata()
) -> AttackerData:
  """Prepares Seq2SeqAttackInputData to train ML attackers.

  Uses logits and losses to generate ranks and performs a random train-test
  split.

  Also computes metadata (loss, accuracy) for the model under attack
  and populates respective fields of PrivacyReportMetadata.

  Args:
    attack_input_data: Original Seq2SeqAttackInputData
    test_fraction: Fraction of the dataset to include in the test split.
    balance: Whether the training and test sets for the membership inference
      attacker should have a balanced (roughly equal) number of samples from
      the training and test sets used to develop the model under attack.
    privacy_report_metadata: the metadata of the model under attack.

  Returns:
    AttackerData.
  """
  attack_input_train, loss_train, accuracy_train = _get_attack_features_and_metadata(
      attack_input_data.logits_train, attack_input_data.labels_train)
  attack_input_test, loss_test, accuracy_test = _get_attack_features_and_metadata(
      attack_input_data.logits_test, attack_input_data.labels_test)

  if balance:
    min_size = min(len(attack_input_train), len(attack_input_test))
    attack_input_train = _sample_multidimensional_array(attack_input_train,
                                                        min_size)
    attack_input_test = _sample_multidimensional_array(attack_input_test,
                                                       min_size)

  features_all = np.concatenate((attack_input_train, attack_input_test))
  ntrain, ntest = attack_input_train.shape[0], attack_input_test.shape[0]

  # Reshape for classifying one-dimensional features.
  features_all = features_all.reshape(-1, 1)

  labels_all = np.concatenate((np.zeros(ntrain), np.ones(ntest)))

  # Perform a train-test split.
  (features_train, features_test,
   is_training_labels_train, is_training_labels_test) = (
       model_selection.train_test_split(
           features_all, labels_all, test_size=test_fraction,
           stratify=labels_all))

  # Populate accuracy, loss fields in privacy report metadata.
  privacy_report_metadata.loss_train = loss_train
  privacy_report_metadata.loss_test = loss_test
  privacy_report_metadata.accuracy_train = accuracy_train
  privacy_report_metadata.accuracy_test = accuracy_test

  return AttackerData(features_train, is_training_labels_train, features_test,
                      is_training_labels_test,
                      DataSize(ntrain=ntrain, ntest=ntest))

def run_seq2seq_attack(attack_input: Seq2SeqAttackInputData,
                       privacy_report_metadata: PrivacyReportMetadata = None,
                       balance_attacker_training: bool = True) -> AttackResults:
  """Runs membership inference attacks on a seq2seq model.

  Args:
    attack_input: input data for running an attack
    privacy_report_metadata: the metadata of the model under attack.
    balance_attacker_training: Whether the training and test sets for the
      membership inference attacker should have a balanced (roughly equal)
      number of samples from the training and test sets used to develop the
      model under attack.

  Returns:
    the attack result.
  """
  attack_input.validate()

  # The attacker uses the average rank (a single number) of a seq2seq dataset
  # record to determine membership. So only Logistic Regression is supported,
  # as it makes the most sense for single-number features.
  attacker = models.LogisticRegressionAttacker()

  # Create attacker data and populate fields of privacy_report_metadata.
  privacy_report_metadata = privacy_report_metadata or PrivacyReportMetadata()
  prepared_attacker_data = create_seq2seq_attacker_data(
      attack_input_data=attack_input,
      balance=balance_attacker_training,
      privacy_report_metadata=privacy_report_metadata)

  attacker.train_model(prepared_attacker_data.features_train,
                       prepared_attacker_data.is_training_labels_train)

  # Run the attacker on (permuted) test examples.
  predictions_test = attacker.predict(prepared_attacker_data.features_test)

  # Generate ROC curves with predictions.
  fpr, tpr, thresholds = metrics.roc_curve(
      prepared_attacker_data.is_training_labels_test, predictions_test)

  roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)

  attack_results = [
      SingleAttackResult(
          slice_spec=SingleSliceSpec(),
          attack_type=AttackType.LOGISTIC_REGRESSION,
          roc_curve=roc_curve,
          data_size=prepared_attacker_data.data_size)
  ]

  return AttackResults(
      single_attack_results=attack_results,
      privacy_report_metadata=privacy_report_metadata)

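
# Editor's note: a hedged sketch (not part of the library) of running the
# attack end to end. It assumes attack_input was built as in the dataclass
# above, with generators over batched logits and labels.
def _run_seq2seq_attack_demo(attack_input: Seq2SeqAttackInputData):
  """Runs the attack and reads off the headline AUC."""
  results = run_seq2seq_attack(attack_input)
  # The single logistic-regression result carries the ROC curve; its AUC is
  # the headline vulnerability number.
  return results.single_attack_results[0].get_auc()
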
@@ -13,15 +13,15 @@
 # limitations under the License.

 # Lint as: python3
-"""Tests for tensorflow_privacy.privacy.membership_inference_attack.seq2seq_mia."""
+"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia."""
 from absl.testing import absltest
 import numpy as np

-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import PrivacyReportMetadata
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyReportMetadata
-from tensorflow_privacy.privacy.membership_inference_attack.seq2seq_mia import create_seq2seq_attacker_data
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia import create_seq2seq_attacker_data
-from tensorflow_privacy.privacy.membership_inference_attack.seq2seq_mia import run_seq2seq_attack
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia import run_seq2seq_attack
-from tensorflow_privacy.privacy.membership_inference_attack.seq2seq_mia import Seq2SeqAttackInputData
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia import Seq2SeqAttackInputData


 class Seq2SeqAttackInputDataTest(absltest.TestCase):
@@ -0,0 +1,199 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Lint as: python3
"""A hook and a function in tf estimator for membership inference attack."""

import os
from typing import Iterable
from absl import logging
import numpy as np
import tensorflow.compat.v1 as tf
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils import log_loss
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils_tensorboard import write_results_to_tensorboard


def calculate_losses(estimator, input_fn, labels):
  """Gets predictions and losses for samples.

  The assumptions are 1) the loss is cross-entropy loss, and 2) the user has
  specified prediction mode to return predictions, e.g.,
  when mode == tf.estimator.ModeKeys.PREDICT, the model function returns
  tf.estimator.EstimatorSpec(mode=mode, predictions=tf.nn.softmax(logits)).

  Args:
    estimator: model to make prediction
    input_fn: input function to be used in estimator.predict
    labels: array of size (n_samples, ), true labels of samples (integer valued)

  Returns:
    preds: probability vector of each sample
    loss: cross entropy loss of each sample
  """
  pred = np.array(list(estimator.predict(input_fn=input_fn)))
  loss = log_loss(labels, pred)
  return pred, loss

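
# Editor's note: a hedged sketch (not part of the library) of the model_fn
# shape that calculate_losses assumes. The feature key 'x', layer sizes, and
# learning rate are hypothetical; only the PREDICT-mode softmax contract
# matters.
def _predict_mode_model_fn_demo(features, labels, mode):
  """A minimal tf.compat.v1 model_fn returning softmax probs in PREDICT."""
  logits = tf.layers.dense(tf.layers.flatten(features['x']), 10)
  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(
        mode=mode, predictions=tf.nn.softmax(logits))
  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
  train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
      loss, global_step=tf.train.get_or_create_global_step())
  return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
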
class MembershipInferenceTrainingHook(tf.estimator.SessionRunHook):
  """Training hook to perform membership inference attack on epoch end."""

  def __init__(
      self,
      estimator,
      in_train, out_train,
      input_fn_constructor,
      slicing_spec: SlicingSpec = None,
      attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,),
      tensorboard_dir=None,
      tensorboard_merge_classifiers=False):
    """Initializes the hook.

    Args:
      estimator: model to be tested
      in_train: (in_training samples, in_training labels)
      out_train: (out_training samples, out_training labels)
      input_fn_constructor: a function that receives sample, label and
        constructs the input_fn for model prediction
      slicing_spec: slicing specification of the attack
      attack_types: a list of attacks, each of type AttackType
      tensorboard_dir: directory for tensorboard summary
      tensorboard_merge_classifiers: if true, plot different classifiers with
        the same slicing_spec and metric in the same figure
    """
    in_train_data, self._in_train_labels = in_train
    out_train_data, self._out_train_labels = out_train

    # Define the input functions for both in and out-training samples.
    self._in_train_input_fn = input_fn_constructor(in_train_data,
                                                   self._in_train_labels)
    self._out_train_input_fn = input_fn_constructor(out_train_data,
                                                    self._out_train_labels)
    self._estimator = estimator
    self._slicing_spec = slicing_spec
    self._attack_types = attack_types
    self._tensorboard_merge_classifiers = tensorboard_merge_classifiers
    if tensorboard_dir:
      if tensorboard_merge_classifiers:
        self._writers = {}
        with tf.Graph().as_default():
          for attack_type in attack_types:
            self._writers[attack_type.name] = tf.summary.FileWriter(
                os.path.join(tensorboard_dir, 'MI', attack_type.name))
      else:
        with tf.Graph().as_default():
          self._writers = tf.summary.FileWriter(
              os.path.join(tensorboard_dir, 'MI'))
      logging.info('Will write to tensorboard.')
    else:
      self._writers = None

  def end(self, session):
    results = run_attack_helper(self._estimator,
                                self._in_train_input_fn,
                                self._out_train_input_fn,
                                self._in_train_labels, self._out_train_labels,
                                self._slicing_spec,
                                self._attack_types)
    logging.info(results)

    att_types, att_slices, att_metrics, att_values = get_flattened_attack_metrics(
        results)
    print('Attack result:')
    print('\n'.join(['  %s: %.4f' % (', '.join([s, t, m]), v) for t, s, m, v in
                     zip(att_types, att_slices, att_metrics, att_values)]))

    # Write to tensorboard if tensorboard_dir is specified.
    global_step = self._estimator.get_variable_value('global_step')
    if self._writers is not None:
      write_results_to_tensorboard(results, self._writers, global_step,
                                   self._tensorboard_merge_classifiers)

def run_attack_on_tf_estimator_model(
    estimator, in_train, out_train,
    input_fn_constructor,
    slicing_spec: SlicingSpec = None,
    attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,)):
  """Performs the attack at the end of training.

  Args:
    estimator: model to be tested
    in_train: (in_training samples, in_training labels)
    out_train: (out_training samples, out_training labels)
    input_fn_constructor: a function that receives sample, label and constructs
      the input_fn for model prediction
    slicing_spec: slicing specification of the attack
    attack_types: a list of attacks, each of type AttackType
  Returns:
    Results of the attack
  """
  in_train_data, in_train_labels = in_train
  out_train_data, out_train_labels = out_train

  # Define the input functions for both in and out-training samples.
  in_train_input_fn = input_fn_constructor(in_train_data, in_train_labels)
  out_train_input_fn = input_fn_constructor(out_train_data, out_train_labels)

  # Call the helper to run the attack.
  results = run_attack_helper(estimator,
                              in_train_input_fn, out_train_input_fn,
                              in_train_labels, out_train_labels,
                              slicing_spec,
                              attack_types)
  logging.info('End of training attack:')
  logging.info(results)
  return results

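
# Editor's note: a hedged sketch (not part of the library) of wiring up the
# hook. The numpy_input_fn construction is one plausible input_fn_constructor;
# the 'x' feature key, batch size, and attack types are illustrative choices.
def _estimator_attack_demo(classifier, x_train, y_train, x_test, y_test):
  """Attaches the MI hook to an estimator's training loop."""
  def input_fn_constructor(x, y):
    return tf.estimator.inputs.numpy_input_fn(
        x={'x': x}, y=y, batch_size=64, num_epochs=1, shuffle=False)

  hook = MembershipInferenceTrainingHook(
      classifier, (x_train, y_train), (x_test, y_test),
      input_fn_constructor,
      attack_types=[AttackType.THRESHOLD_ATTACK])
  classifier.train(
      input_fn=input_fn_constructor(x_train, y_train), hooks=[hook])
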
def run_attack_helper(
    estimator,
    in_train_input_fn, out_train_input_fn,
    in_train_labels, out_train_labels,
    slicing_spec: SlicingSpec = None,
    attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,)):
  """A helper function to perform attack.

  Args:
    estimator: model to be tested
    in_train_input_fn: input_fn for in training data
    out_train_input_fn: input_fn for out of training data
    in_train_labels: in training labels
    out_train_labels: out of training labels
    slicing_spec: slicing specification of the attack
    attack_types: a list of attacks, each of type AttackType
  Returns:
    Results of the attack
  """
  # Compute predictions and losses.
  in_train_pred, in_train_loss = calculate_losses(estimator,
                                                  in_train_input_fn,
                                                  in_train_labels)
  out_train_pred, out_train_loss = calculate_losses(estimator,
                                                    out_train_input_fn,
                                                    out_train_labels)
  attack_input = AttackInputData(
      logits_train=in_train_pred, logits_test=out_train_pred,
      labels_train=in_train_labels, labels_test=out_train_labels,
      loss_train=in_train_loss, loss_test=out_train_loss
  )
  results = mia.run_attacks(attack_input,
                            slicing_spec=slicing_spec,
                            attack_types=attack_types)
  return results

@@ -21,11 +21,11 @@ from absl import logging

 import numpy as np
 import tensorflow.compat.v1 as tf
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
-from tensorflow_privacy.privacy.membership_inference_attack.tf_estimator_evaluation import MembershipInferenceTrainingHook
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.tf_estimator_evaluation import MembershipInferenceTrainingHook
-from tensorflow_privacy.privacy.membership_inference_attack.tf_estimator_evaluation import run_attack_on_tf_estimator_model
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.tf_estimator_evaluation import run_attack_on_tf_estimator_model


 FLAGS = flags.FLAGS
@@ -13,17 +13,17 @@
 # limitations under the License.

 # Lint as: python3
-"""Tests for tensorflow_privacy.privacy.membership_inference_attack.tf_estimator_evaluation."""
+"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.tf_estimator_evaluation."""

 from absl.testing import absltest

 import numpy as np
 import tensorflow.compat.v1 as tf

-from tensorflow_privacy.privacy.membership_inference_attack import tf_estimator_evaluation
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import tf_estimator_evaluation
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics


 class UtilsTest(absltest.TestCase):
@@ -19,8 +19,8 @@ from typing import Union

 import tensorflow as tf2
 import tensorflow.compat.v1 as tf1
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
-from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics


 def write_to_tensorboard(writers, tags, values, step):
@@ -13,12 +13,12 @@
 # limitations under the License.

 # Lint as: python3
-"""Tests for tensorflow_privacy.privacy.membership_inference_attack.utils."""
+"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils."""
 from absl.testing import absltest

 import numpy as np

-from tensorflow_privacy.privacy.membership_inference_attack import utils
+from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import utils


 class UtilsTest(absltest.TestCase):