Moving membership_inference_attack to privacy_tests/membership_inference_attack

PiperOrigin-RevId: 377860420
Vadym Doroshenko 2021-06-07 01:11:32 -07:00 committed by A. Unique TensorFlower
parent eaf9fbf969
commit c12a7acd9d
46 changed files with 2899 additions and 2681 deletions

26
LICENSE

@ -200,3 +200,29 @@
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
------------------
Files: privacy/membership_inference_attack/codelabs/third_party/seq2seq_membership_inference/*
MIT License
Copyright (c) 2019 Congzheng Song
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@ -1,269 +1,2 @@
# Membership inference attack
A good privacy-preserving model learns from the training data, but
doesn't memorize it. This library provides empirical tests for measuring
potential memorization.
Technically, the tests build classifiers that infer whether a particular sample
was present in the training set. The more accurate such a classifier is, the more
memorization is present and thus the less privacy-preserving the model is.
The privacy vulnerability (or memorization potential) is measured
via the area under the ROC-curve (`auc`) or via max{|fpr - tpr|} (`advantage`)
of the attack classifier. These measures are very closely related.
The tests provided by the library are "black box". That is, only the outputs of
the model are used (e.g., losses, logits, predictions). Neither model internals
(weights) nor input samples are required.
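As a rough illustration of the two measures, the sketch below scores a simple loss-threshold attack with scikit-learn. It is not the library's implementation: the function name `threshold_attack_metrics` and the synthetic losses are assumptions made for this example only.

```python
import numpy as np
from sklearn import metrics


def threshold_attack_metrics(loss_train, loss_test):
    """Scores a loss-threshold attack and returns (auc, advantage)."""
    # Training examples (members) are labelled 1, test examples 0.
    is_member = np.concatenate(
        [np.ones(len(loss_train)), np.zeros(len(loss_test))])
    # A lower loss suggests membership, so the negated loss is the score.
    scores = -np.concatenate([loss_train, loss_test])
    fpr, tpr, _ = metrics.roc_curve(is_member, scores)
    auc = metrics.auc(fpr, tpr)
    # Advantage as described above: max |fpr - tpr| over all thresholds.
    advantage = np.max(np.abs(tpr - fpr))
    return auc, advantage


# Synthetic losses (hypothetical): members tend to have a lower loss.
rng = np.random.default_rng(0)
loss_train = rng.exponential(scale=0.5, size=1000)
loss_test = rng.exponential(scale=1.0, size=1000)
auc, advantage = threshold_attack_metrics(loss_train, loss_test)
print("AUC: %.2f, advantage: %.2f" % (auc, advantage))
```

An AUC near 0.5 and an advantage near 0 would mean the attack cannot tell members from non-members; values near 1 would indicate strong memorization.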
## How to use
### Installation notes
To use the latest version of the MIA library, please install TF Privacy with
"pip install -U git+https://github.com/tensorflow/privacy". See
https://github.com/tensorflow/privacy/issues/151 for more details.
### Basic usage
The simplest possible usage is
```python
from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
# Suppose we have the labels as integers starting from 0
# labels_train shape: (n_train, )
# labels_test shape: (n_test, )
# Evaluate your model on training and test examples to get
# loss_train shape: (n_train, )
# loss_test shape: (n_test, )
attacks_result = mia.run_attacks(
    AttackInputData(
        loss_train = loss_train,
        loss_test = loss_test,
        labels_train = labels_train,
        labels_test = labels_test))
```
This example calls `run_attacks` with the default options to run a host of
(fairly simple) attacks behind the scenes (depending on which data is fed in),
and computes the most important measures.
> NOTE: The train and test sets are balanced internally, i.e., an equal number
> of in-training and out-of-training examples is chosen for the attacks
> (whichever has fewer examples). These are subsampled uniformly at random
> without replacement from the larger of the two.
Then, we can view the attack results by:
```python
print(attacks_result.summary())
# Example output:
# -> Best-performing attacks over all slices
# THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved an AUC of 0.59 on slice Entire dataset
# THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved an advantage of 0.20 on slice Entire dataset
```
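The balancing step mentioned in the note above can be pictured with a few lines of NumPy. This is only a sketch of the idea (subsample the larger set uniformly at random, without replacement, down to the size of the smaller one); the helper name `balance` and its `seed` parameter are hypothetical and not part of the library.

```python
import numpy as np


def balance(values_train, values_test, seed=0):
    """Subsamples the larger array, without replacement, to the smaller size."""
    rng = np.random.default_rng(seed)
    n = min(len(values_train), len(values_test))
    if len(values_train) > n:
        keep = rng.choice(len(values_train), size=n, replace=False)
        values_train = values_train[keep]
    if len(values_test) > n:
        keep = rng.choice(len(values_test), size=n, replace=False)
        values_test = values_test[keep]
    return values_train, values_test


loss_train = np.arange(50, dtype=float)  # stand-in for 50 training losses
loss_test = np.arange(10, dtype=float)   # stand-in for 10 test losses
balanced_train, balanced_test = balance(loss_train, loss_test)
print(balanced_train.shape, balanced_test.shape)  # (10,) (10,)
```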
### Other codelabs
Please head over to the [codelabs](https://github.com/tensorflow/privacy/tree/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs)
section for an overview of the library in action.
### Advanced usage
#### Specifying attacks to run
Sometimes, we have more information about the data, such as the logits and the
labels, and we may want finer-grained control of the attack, such as using more
complicated classifiers instead of the simple threshold attack, or looking at
the attack results by the examples' class.
In those cases, we can provide more information to `run_attacks`.
```python
from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
```
First, as before, we specify the input for the attack as an
`AttackInputData` object:
```python
# Evaluate your model on training and test examples to get
# logits_train shape: (n_train, n_classes)
# logits_test shape: (n_test, n_classes)
# loss_train shape: (n_train, )
# loss_test shape: (n_test, )
attack_input = AttackInputData(
    logits_train = logits_train,
    logits_test = logits_test,
    loss_train = loss_train,
    loss_test = loss_test,
    labels_train = labels_train,
    labels_test = labels_test)
```
Instead of `logits`, you can also specify
`probs_train` and `probs_test` as the predicted probability vectors of each
example.
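If only logits are at hand, probability vectors and per-example cross-entropy losses can be derived from them. The sketch below uses `scipy.special.softmax` on made-up values; the array contents are assumptions for illustration. Note that the input validation shown later in this commit rejects an `AttackInputData` with both logits and probabilities set, so pass one or the other.

```python
import numpy as np
from scipy import special

# Hypothetical logits for two examples and three classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])

# Softmax over the class axis turns logits into probability vectors.
probs = special.softmax(logits, axis=1)

# Per-example cross-entropy: negative log-probability of the true class.
loss = -np.log(probs[np.arange(labels.size), labels])

print(probs.sum(axis=1))  # each row sums to 1 (up to rounding)
print(loss.shape)         # (2,)
```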
Then, we specify some details of the attack.
The first part includes the specifications of the slicing of the data. For
example, we may want to evaluate the result on the whole dataset, or by class,
percentiles, or the correctness of the model's classification.
These can be specified by a `SlicingSpec` object.
```python
slicing_spec = SlicingSpec(
    entire_dataset = True,
    by_class = True,
    by_percentiles = False,
    by_classification_correctness = True)
```
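To make the notion of a slice concrete, the following sketch builds boolean masks for the slices requested above using plain NumPy. It only illustrates what each slice selects and is not how the library computes slices internally; the toy arrays are assumptions for this example.

```python
import numpy as np

# Toy data (hypothetical): 5 examples, 3 classes.
labels = np.array([0, 1, 1, 2, 0])
logits = np.array([[3.0, 0.1, 0.2],
                   [0.2, 2.5, 0.1],
                   [1.9, 0.3, 0.4],
                   [0.1, 0.2, 4.0],
                   [0.3, 0.2, 1.1]])
loss = np.array([0.1, 0.2, 1.8, 0.05, 1.3])

# An example is correctly classified if the arg-max class equals its label.
correct = np.argmax(logits, axis=1) == labels

slices = {"Entire dataset": np.ones(labels.size, dtype=bool)}
for c in np.unique(labels):
    slices["CLASS=%d" % c] = labels == c
slices["CORRECTLY_CLASSIFIED=True"] = correct
slices["CORRECTLY_CLASSIFIED=False"] = ~correct

# Each mask selects the per-example values an attack on that slice would see.
for name, mask in slices.items():
    print(name, loss[mask])
```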
The second part specifies the classifiers for the attacker to use.
Currently, our API supports five classifiers, including
`AttackType.THRESHOLD_ATTACK` for simple threshold attack,
`AttackType.LOGISTIC_REGRESSION`,
`AttackType.MULTI_LAYERED_PERCEPTRON`,
`AttackType.RANDOM_FOREST`, and
`AttackType.K_NEAREST_NEIGHBORS`
which use the corresponding machine learning models.
For some models, different classifiers can yield pretty different results.
We can put multiple classifiers in a list:
```python
attack_types = [
    AttackType.THRESHOLD_ATTACK,
    AttackType.LOGISTIC_REGRESSION
]
```
Now, we can call the `run_attacks` method with all specifications:
```python
attacks_result = mia.run_attacks(attack_input=attack_input,
                                 slicing_spec=slicing_spec,
                                 attack_types=attack_types)
```
This returns an object of type `AttackResults`. We can, for example, use the
following code to see the attack results per slice, as we have
requested attacks by class and by the model's classification correctness.
```python
print(attacks_result.summary(by_slices = True))
# Example output:
# -> Best-performing attacks over all slices
# THRESHOLD_ATTACK achieved an AUC of 0.75 on slice CORRECTLY_CLASSIFIED=False
# THRESHOLD_ATTACK achieved an advantage of 0.38 on slice CORRECTLY_CLASSIFIED=False
#
# Best-performing attacks over slice: "Entire dataset"
# LOGISTIC_REGRESSION achieved an AUC of 0.61
# THRESHOLD_ATTACK achieved an advantage of 0.22
#
# Best-performing attacks over slice: "CLASS=0"
# LOGISTIC_REGRESSION achieved an AUC of 0.62
# LOGISTIC_REGRESSION achieved an advantage of 0.24
#
# Best-performing attacks over slice: "CLASS=1"
# LOGISTIC_REGRESSION achieved an AUC of 0.61
# LOGISTIC_REGRESSION achieved an advantage of 0.19
#
# ...
#
# Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=True"
# LOGISTIC_REGRESSION achieved an AUC of 0.53
# THRESHOLD_ATTACK achieved an advantage of 0.05
#
# Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=False"
# THRESHOLD_ATTACK achieved an AUC of 0.75
# THRESHOLD_ATTACK achieved an advantage of 0.38
```
#### Viewing and plotting the attack results
We have seen an example of using `summary()` to view the attack results as text.
We also provide some other ways for inspecting the attack results.
To get the attack that achieves the maximum attacker advantage or AUC, we can do
```python
max_auc_attacker = attacks_result.get_result_with_max_auc()
max_advantage_attacker = attacks_result.get_result_with_max_attacker_advantage()
```
Then, for an individual attack, such as `max_auc_attacker`, we can check its type,
attacker advantage and AUC by
```python
print("Attack type with max AUC: %s, AUC of %.2f, Attacker advantage of %.2f" %
      (max_auc_attacker.attack_type,
       max_auc_attacker.roc_curve.get_auc(),
       max_auc_attacker.roc_curve.get_attacker_advantage()))
# Example output:
# -> Attack type with max AUC: THRESHOLD_ATTACK, AUC of 0.75, Attacker advantage of 0.38
```
We can also plot its ROC curve by
```python
import tensorflow_privacy.privacy.membership_inference_attack.plotting as plotting
figure = plotting.plot_roc_curve(max_auc_attacker.roc_curve)
```
which would give a figure like the one below
![roc_fig](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelab_roc_fig.png?raw=true)
Additionally, we provide functionality to convert the attack results into a
Pandas data frame:
```python
import pandas as pd
pd.set_option("display.max_rows", 8, "display.max_columns", None)
print(attacks_result.calculate_pd_dataframe())
# Example output:
#           slice feature slice value attack type  Attacker advantage       AUC
# 0        entire_dataset               threshold            0.216440  0.600630
# 1        entire_dataset                      lr            0.212073  0.612989
# 2                 class           0   threshold            0.226000  0.611669
# 3                 class           0          lr            0.239452  0.624076
# ..                  ...         ...         ...                 ...       ...
# 22  correctly_classfied        True   threshold            0.054907  0.471290
# 23  correctly_classfied        True          lr            0.046986  0.525194
# 24  correctly_classfied       False   threshold            0.379465  0.748138
# 25  correctly_classfied       False          lr            0.370713  0.737148
```
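Once the results are in a data frame, ordinary Pandas operations can be used to dig into them. The sketch below works on a small hand-built stand-in for the frame (a few rows copied from the example output above), since running the real attacks requires a trained model; only the column names are taken from that output.

```python
import pandas as pd

# Hand-built stand-in for attacks_result.calculate_pd_dataframe().
df = pd.DataFrame({
    "slice feature": ["entire_dataset", "entire_dataset", "class", "class"],
    "slice value": ["", "", "0", "0"],
    "attack type": ["threshold", "lr", "threshold", "lr"],
    "Attacker advantage": [0.216440, 0.212073, 0.226000, 0.239452],
    "AUC": [0.600630, 0.612989, 0.611669, 0.624076],
})

# Row of the single strongest attack by AUC.
worst = df.loc[df["AUC"].idxmax()]
print(worst["slice feature"], worst["slice value"], worst["attack type"])

# Highest AUC reached on each slice, regardless of attack type.
best_per_slice = df.groupby(["slice feature", "slice value"])["AUC"].max()
print(best_per_slice)
```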
### External guides / press mentions
* [Introductory blog post](https://franziska-boenisch.de/posts/2021/01/membership-inference/)
to the theory and the library by Franziska Boenisch from the Fraunhofer AISEC
institute.
* [Google AI Blog Post](https://ai.googleblog.com/2021/01/google-research-looking-back-at-2020.html#ResponsibleAI)
* [TensorFlow Blog Post](https://blog.tensorflow.org/2020/06/introducing-new-privacy-testing-library.html)
* [VentureBeat article](https://venturebeat.com/2020/06/24/google-releases-experimental-tensorflow-module-that-tests-the-privacy-of-ai-models/)
* [Tech Xplore article](https://techxplore.com/news/2020-06-google-tensorflow-privacy-module.html)
## Contact / Feedback
Fill out this
[Google form](https://docs.google.com/forms/d/1DPwr3_OfMcqAOA6sdelTVjIZhKxMZkXvs94z16UCDa4/edit)
or reach out to us at tf-privacy@google.com and let us know how you're using
this module. We're keen on hearing your stories, feedback, and suggestions!
## Contributing
If you wish to add novel attacks to the attack library, please check our
[guidelines](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/CONTRIBUTING.md).
## Copyright
Copyright 2021 - Google LLC
The sources from this folder were moved to
privacy/privacy_tests/membership_inference_attack.
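Code that must run against both the old and the new layout could try the new package path first and fall back to the old one. The sketch below is one possible approach, not something this commit provides: the helper `import_first_available` is hypothetical, and the two module paths are the ones named in the deprecation warning added by this commit.

```python
import importlib

NEW_PATH = "tensorflow_privacy.privacy.privacy_tests.membership_inference_attack"
OLD_PATH = "tensorflow_privacy.privacy.membership_inference_attack"


def import_first_available(module_names):
    """Returns the first importable module from module_names, or None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError as well; try the next candidate.
            continue
    return None


mia_package = import_first_available([NEW_PATH, OLD_PATH])
print("not installed" if mia_package is None else mia_package.__name__)
```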


@ -11,3 +11,13 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""The old location of Membership Inference Attack sources."""
import warnings
warnings.warn(
"\nMembership inference attack sources were moved. Please replace"
"\nimport tensorflow_privacy.privacy.membership_inference_attack\n"
"\nwith"
"\nimport tensorflow_privacy.privacy.privacy_tests.membership_inference_attack"
)


@ -13,807 +13,6 @@
# limitations under the License.
# Lint as: python3
"""Data structures representing attack inputs, configuration, outputs."""
import collections
import enum
import glob
import os
import pickle
from typing import Any, Iterable, Union
from dataclasses import dataclass
import numpy as np
import pandas as pd
from scipy import special
from sklearn import metrics
import tensorflow_privacy.privacy.membership_inference_attack.utils as utils
"""Moved to privacy_attack/membership_inference_attack."""
ENTIRE_DATASET_SLICE_STR = 'Entire dataset'
class SlicingFeature(enum.Enum):
"""Enum with features by which slicing is available."""
CLASS = 'class'
PERCENTILE = 'percentile'
CORRECTLY_CLASSIFIED = 'correctly_classified'
@dataclass
class SingleSliceSpec:
"""Specifies a slice.
The slice is defined by values in one feature - it might be a single value
(eg. slice of examples of the specific classification class) or some set of
values (eg. range of percentiles of the attacked model loss).
When feature is None, it means that the slice is the entire dataset.
"""
feature: SlicingFeature = None
value: Any = None
@property
def entire_dataset(self):
return self.feature is None
def __str__(self):
if self.entire_dataset:
return ENTIRE_DATASET_SLICE_STR
if self.feature == SlicingFeature.PERCENTILE:
return 'Loss percentiles: %d-%d' % self.value
return '%s=%s' % (self.feature.name, self.value)
@dataclass
class SlicingSpec:
"""Specification of a slicing procedure.
Each variable which is set specifies a slicing by different dimension.
"""
# When set to true, one of the slices is the whole dataset.
entire_dataset: bool = True
# Used in classification tasks for slicing by classes. It is assumed that
# classes are integers 0, 1, ... number of classes. When true one slice per
# each class is generated.
by_class: Union[bool, Iterable[int], int] = False
# if true, it generates 10 slices for percentiles of the loss - 0-10%, 10-20%,
# ... 90-100%.
by_percentiles: bool = False
# When true, a slice for correctly classified and a slice for misclassified
# examples will be generated.
by_classification_correctness: bool = False
def __str__(self):
"""Only keeps the True values."""
result = ['SlicingSpec(']
if self.entire_dataset:
result.append(' Entire dataset,')
if self.by_class:
if isinstance(self.by_class, Iterable):
result.append(' Into classes %s,' % self.by_class)
elif isinstance(self.by_class, int):
result.append(' Up to class %d,' % self.by_class)
else:
result.append(' By classes,')
if self.by_percentiles:
result.append(' By percentiles,')
if self.by_classification_correctness:
result.append(' By classification correctness,')
result.append(')')
return '\n'.join(result)
class AttackType(enum.Enum):
"""An enum defining attack types."""
LOGISTIC_REGRESSION = 'lr'
MULTI_LAYERED_PERCEPTRON = 'mlp'
RANDOM_FOREST = 'rf'
K_NEAREST_NEIGHBORS = 'knn'
THRESHOLD_ATTACK = 'threshold'
THRESHOLD_ENTROPY_ATTACK = 'threshold-entropy'
@property
def is_trained_attack(self):
"""Returns whether this type of attack requires training a model."""
return (self != AttackType.THRESHOLD_ATTACK) and (
self != AttackType.THRESHOLD_ENTROPY_ATTACK)
def __str__(self):
"""Returns LOGISTIC_REGRESSION instead of AttackType.LOGISTIC_REGRESSION."""
return '%s' % self.name
class PrivacyMetric(enum.Enum):
"""An enum for the supported privacy risk metrics."""
AUC = 'AUC'
ATTACKER_ADVANTAGE = 'Attacker advantage'
def __str__(self):
"""Returns 'AUC' instead of PrivacyMetric.AUC."""
return '%s' % self.value
def _is_integer_type_array(a):
return np.issubdtype(a.dtype, np.integer)
def _is_last_dim_equal(arr1, arr1_name, arr2, arr2_name):
"""Checks whether the last dimension of the arrays is the same."""
if arr1 is not None and arr2 is not None and arr1.shape[-1] != arr2.shape[-1]:
raise ValueError('%s and %s should have the same number of features.' %
(arr1_name, arr2_name))
def _is_array_one_dimensional(arr, arr_name):
"""Checks whether the array is one dimensional."""
if arr is not None and len(arr.shape) != 1:
raise ValueError('%s should be a one dimensional numpy array.' % arr_name)
def _is_np_array(arr, arr_name):
"""Checks whether array is a numpy array."""
if arr is not None and not isinstance(arr, np.ndarray):
raise ValueError('%s should be a numpy array.' % arr_name)
def _log_value(probs, small_value=1e-30):
"""Compute the log value on the probability. Clip probabilities close to 0."""
return -np.log(np.maximum(probs, small_value))
@dataclass
class AttackInputData:
"""Input data for running an attack.
This includes only the data, and not configuration.
"""
logits_train: np.ndarray = None
logits_test: np.ndarray = None
# Predicted probabilities for each class. They can be derived from logits,
# so they can be set only if logits are not explicitly provided.
probs_train: np.ndarray = None
probs_test: np.ndarray = None
# Contains ground-truth classes. Classes are assumed to be integers starting
# from 0.
labels_train: np.ndarray = None
labels_test: np.ndarray = None
# Explicitly specified loss. If provided, this is used instead of deriving
# loss from logits and labels
loss_train: np.ndarray = None
loss_test: np.ndarray = None
# Explicitly specified prediction entropy. If provided, this is used instead
# of deriving entropy from logits and labels
# (https://arxiv.org/pdf/2003.10595.pdf by Song and Mittal).
entropy_train: np.ndarray = None
entropy_test: np.ndarray = None
@property
def num_classes(self):
if self.labels_train is None or self.labels_test is None:
raise ValueError(
'Can\'t identify the number of classes as no labels were provided. '
'Please set labels_train and labels_test')
return int(max(np.max(self.labels_train), np.max(self.labels_test))) + 1
@property
def logits_or_probs_train(self):
"""Returns train logits or probs, whichever is not None."""
if self.logits_train is not None:
return self.logits_train
return self.probs_train
@property
def logits_or_probs_test(self):
"""Returns test logits or probs, whichever is not None."""
if self.logits_test is not None:
return self.logits_test
return self.probs_test
@staticmethod
def _get_entropy(logits: np.ndarray, true_labels: np.ndarray):
"""Computes the prediction entropy (by Song and Mittal)."""
if (np.absolute(np.sum(logits, axis=1) - 1) <= 1e-3).all():
probs = logits
else:
# Using softmax to compute probability from logits.
probs = special.softmax(logits, axis=1)
if true_labels is None:
# When not given ground truth label, we compute the
# normal prediction entropy.
# See the Equation (7) in https://arxiv.org/pdf/2003.10595.pdf
return np.sum(np.multiply(probs, _log_value(probs)), axis=1)
else:
# When given the ground truth label, we compute the
# modified prediction entropy.
# See the Equation (8) in https://arxiv.org/pdf/2003.10595.pdf
log_probs = _log_value(probs)
reverse_probs = 1 - probs
log_reverse_probs = _log_value(reverse_probs)
modified_probs = np.copy(probs)
modified_probs[range(true_labels.size),
true_labels] = reverse_probs[range(true_labels.size),
true_labels]
modified_log_probs = np.copy(log_reverse_probs)
modified_log_probs[range(true_labels.size),
true_labels] = log_probs[range(true_labels.size),
true_labels]
return np.sum(np.multiply(modified_probs, modified_log_probs), axis=1)
def get_loss_train(self):
"""Calculates (if needed) cross-entropy losses for the training set.
Returns:
Loss (or None if neither the loss nor the labels are present).
"""
if self.loss_train is None:
if self.labels_train is None:
return None
if self.logits_train is not None:
self.loss_train = utils.log_loss_from_logits(self.labels_train,
self.logits_train)
else:
self.loss_train = utils.log_loss(self.labels_train, self.probs_train)
return self.loss_train
def get_loss_test(self):
"""Calculates (if needed) cross-entropy losses for the test set.
Returns:
Loss (or None if neither the loss nor the labels are present).
"""
if self.loss_test is None:
if self.labels_test is None:
return None
if self.logits_test is not None:
self.loss_test = utils.log_loss_from_logits(self.labels_test,
self.logits_test)
else:
self.loss_test = utils.log_loss(self.labels_test, self.probs_test)
return self.loss_test
def get_entropy_train(self):
"""Calculates prediction entropy for the training set."""
if self.entropy_train is not None:
return self.entropy_train
return self._get_entropy(self.logits_train, self.labels_train)
def get_entropy_test(self):
"""Calculates prediction entropy for the test set."""
if self.entropy_test is not None:
return self.entropy_test
return self._get_entropy(self.logits_test, self.labels_test)
def get_train_size(self):
"""Returns size of the training set."""
if self.loss_train is not None:
return self.loss_train.size
if self.entropy_train is not None:
return self.entropy_train.size
return self.logits_or_probs_train.shape[0]
def get_test_size(self):
"""Returns size of the test set."""
if self.loss_test is not None:
return self.loss_test.size
if self.entropy_test is not None:
return self.entropy_test.size
return self.logits_or_probs_test.shape[0]
def validate(self):
"""Validates the inputs."""
if (self.loss_train is None) != (self.loss_test is None):
raise ValueError(
'loss_test and loss_train should both be either set or unset')
if (self.entropy_train is None) != (self.entropy_test is None):
raise ValueError(
'entropy_test and entropy_train should both be either set or unset')
if (self.logits_train is None) != (self.logits_test is None):
raise ValueError(
'logits_train and logits_test should both be either set or unset')
if (self.probs_train is None) != (self.probs_test is None):
raise ValueError(
'probs_train and probs_test should both be either set or unset')
if (self.logits_train is not None) and (self.probs_train is not None):
raise ValueError('Logits and probs can not be both set')
if (self.labels_train is None) != (self.labels_test is None):
raise ValueError(
'labels_train and labels_test should both be either set or unset')
if (self.labels_train is None and self.loss_train is None and
self.logits_train is None and self.entropy_train is None):
raise ValueError(
'At least one of labels, logits, losses or entropy should be set')
if self.labels_train is not None and not _is_integer_type_array(
self.labels_train):
raise ValueError('labels_train elements should have integer type')
if self.labels_test is not None and not _is_integer_type_array(
self.labels_test):
raise ValueError('labels_test elements should have integer type')
_is_np_array(self.logits_train, 'logits_train')
_is_np_array(self.logits_test, 'logits_test')
_is_np_array(self.probs_train, 'probs_train')
_is_np_array(self.probs_test, 'probs_test')
_is_np_array(self.labels_train, 'labels_train')
_is_np_array(self.labels_test, 'labels_test')
_is_np_array(self.loss_train, 'loss_train')
_is_np_array(self.loss_test, 'loss_test')
_is_np_array(self.entropy_train, 'entropy_train')
_is_np_array(self.entropy_test, 'entropy_test')
_is_last_dim_equal(self.logits_train, 'logits_train', self.logits_test,
'logits_test')
_is_last_dim_equal(self.probs_train, 'probs_train', self.probs_test,
'probs_test')
_is_array_one_dimensional(self.loss_train, 'loss_train')
_is_array_one_dimensional(self.loss_test, 'loss_test')
_is_array_one_dimensional(self.entropy_train, 'entropy_train')
_is_array_one_dimensional(self.entropy_test, 'entropy_test')
_is_array_one_dimensional(self.labels_train, 'labels_train')
_is_array_one_dimensional(self.labels_test, 'labels_test')
def __str__(self):
"""Return the shapes of variables that are not None."""
result = ['AttackInputData(']
_append_array_shape(self.loss_train, 'loss_train', result)
_append_array_shape(self.loss_test, 'loss_test', result)
_append_array_shape(self.entropy_train, 'entropy_train', result)
_append_array_shape(self.entropy_test, 'entropy_test', result)
_append_array_shape(self.logits_train, 'logits_train', result)
_append_array_shape(self.logits_test, 'logits_test', result)
_append_array_shape(self.probs_train, 'probs_train', result)
_append_array_shape(self.probs_test, 'probs_test', result)
_append_array_shape(self.labels_train, 'labels_train', result)
_append_array_shape(self.labels_test, 'labels_test', result)
result.append(')')
return '\n'.join(result)
def _append_array_shape(arr: np.array, arr_name: str, result):
if arr is not None:
result.append(' %s with shape: %s,' % (arr_name, arr.shape))
@dataclass
class RocCurve:
"""Represents ROC curve of a membership inference classifier."""
# Thresholds used to define points on ROC curve.
# Thresholds are not explicitly part of the curve, and are stored for
# debugging purposes.
thresholds: np.ndarray
# True positive rates based on thresholds
tpr: np.ndarray
# False positive rates based on thresholds
fpr: np.ndarray
def get_auc(self):
"""Calculates area under curve (aka AUC)."""
return metrics.auc(self.fpr, self.tpr)
def get_attacker_advantage(self):
"""Calculates membership attacker's (or adversary's) advantage.
This metric is inspired by https://arxiv.org/abs/1709.01604, specifically
by Definition 4. The difference here is that we calculate maximum advantage
over all available classifier thresholds.
Returns:
a single float number with membership attacker's advantage.
"""
return max(np.abs(self.tpr - self.fpr))
def __str__(self):
"""Returns AUC and advantage metrics."""
return '\n'.join([
'RocCurve(',
' AUC: %.2f' % self.get_auc(),
' Attacker advantage: %.2f' % self.get_attacker_advantage(), ')'
])
# (no. of training examples, no. of test examples) for the test.
DataSize = collections.namedtuple('DataSize', 'ntrain ntest')
@dataclass
class SingleAttackResult:
"""Results from running a single attack."""
# Data slice this result was calculated for.
slice_spec: SingleSliceSpec
# (no. of training examples, no. of test examples) for the test.
data_size: DataSize
attack_type: AttackType
# NOTE: roc_curve could theoretically be derived from membership scores.
# Currently, we store it explicitly since not all attack types support
# membership scores.
# TODO(b/175870479): Consider deriving ROC curve from the membership scores.
# ROC curve representing the accuracy of the attacker
roc_curve: RocCurve
# Membership score is some measure of confidence of this attacker that
# a particular sample is a member of the training set.
#
# This is NOT necessarily probability. The nature of this score depends on
# the type of attacker. Scores from different attacker types are not directly
# comparable, but can be compared in relative terms (e.g. considering order
# imposed by this measure).
#
# Membership scores for the training set samples. For a perfect attacker,
# all training samples will have higher scores than test samples.
membership_scores_train: np.ndarray = None
# Membership scores for the test set samples. For a perfect attacker, all
# test set samples will have lower scores than the training set samples.
membership_scores_test: np.ndarray = None
def get_attacker_advantage(self):
return self.roc_curve.get_attacker_advantage()
def get_auc(self):
return self.roc_curve.get_auc()
def __str__(self):
"""Returns SliceSpec, AttackType, AUC and advantage metrics."""
return '\n'.join([
'SingleAttackResult(',
' SliceSpec: %s' % str(self.slice_spec),
' DataSize: (ntrain=%d, ntest=%d)' % (self.data_size.ntrain,
self.data_size.ntest),
' AttackType: %s' % str(self.attack_type),
' AUC: %.2f' % self.get_auc(),
' Attacker advantage: %.2f' % self.get_attacker_advantage(), ')'
])
@dataclass
class SingleMembershipProbabilityResult:
"""Results from computing membership probabilities (denoted as privacy risk score in https://arxiv.org/abs/2003.10595).
This part shows how to leverage membership probabilities to perform attacks
by thresholding on them.
"""
# Data slice this result was calculated for.
slice_spec: SingleSliceSpec
train_membership_probs: np.ndarray
test_membership_probs: np.ndarray
def attack_with_varied_thresholds(self, threshold_list):
"""Performs an attack with the specified thresholds.
For each threshold value, we count how many training and test samples have
membership probabilities larger than the threshold and then compute the
precision and recall values. We skip the threshold value if it is larger
than every sample's membership probability.
Args:
threshold_list: List of provided thresholds
Returns:
An array of attack results.
"""
fpr, tpr, thresholds = metrics.roc_curve(
np.concatenate((np.ones(len(self.train_membership_probs)),
np.zeros(len(self.test_membership_probs)))),
np.concatenate(
(self.train_membership_probs, self.test_membership_probs)),
drop_intermediate=False)
precision_list = []
recall_list = []
meaningful_threshold_list = []
max_prob = max(self.train_membership_probs.max(),
self.test_membership_probs.max())
for threshold in threshold_list:
if threshold <= max_prob:
idx = np.argwhere(thresholds >= threshold)[-1][0]
meaningful_threshold_list.append(threshold)
precision_list.append(tpr[idx] / (tpr[idx] + fpr[idx]))
recall_list.append(tpr[idx])
return np.array(meaningful_threshold_list), np.array(
precision_list), np.array(recall_list)
def collect_results(self, threshold_list, return_roc_results=True):
"""The membership probability (from 0 to 1) represents each sample's probability of being in the training set.
Usually, we choose a list of threshold values from 0.5 (uncertain of
training or test) to 1 (100% certain of training)
to compute corresponding attack precision and recall.
Args:
threshold_list: List of provided thresholds
return_roc_results: Whether to return ROC results
Returns:
Summary string.
"""
meaningful_threshold_list, precision_list, recall_list = self.attack_with_varied_thresholds(
threshold_list)
summary = []
summary.append('\nMembership probability analysis over slice: \"%s\"' %
str(self.slice_spec))
for i in range(len(meaningful_threshold_list)):
summary.append(
' with %.4f as the threshold on membership probability, the precision-recall pair is (%.4f, %.4f)'
% (meaningful_threshold_list[i], precision_list[i], recall_list[i]))
if return_roc_results:
fpr, tpr, thresholds = metrics.roc_curve(
np.concatenate((np.ones(len(self.train_membership_probs)),
np.zeros(len(self.test_membership_probs)))),
np.concatenate(
(self.train_membership_probs, self.test_membership_probs)))
roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
summary.append(
' thresholding on membership probability achieved an AUC of %.2f' %
(roc_curve.get_auc()))
summary.append(
' thresholding on membership probability achieved an advantage of %.2f'
% (roc_curve.get_attacker_advantage()))
return summary
@dataclass
class MembershipProbabilityResults:
"""Membership probability results from multiple data slices."""
membership_prob_results: Iterable[SingleMembershipProbabilityResult]
def summary(self, threshold_list):
"""Returns the summary of membership probability analyses on all slices."""
summary = []
for single_result in self.membership_prob_results:
single_summary = single_result.collect_results(threshold_list)
summary.extend(single_summary)
return '\n'.join(summary)
@dataclass
class PrivacyReportMetadata:
"""Metadata about the evaluated model.
Used to create a privacy report based on AttackResults.
"""
accuracy_train: float = None
accuracy_test: float = None
loss_train: float = None
loss_test: float = None
model_variant_label: str = 'Default model variant'
epoch_num: int = None
class AttackResultsDFColumns(enum.Enum):
"""Columns for the Pandas DataFrame that stores AttackResults metrics."""
SLICE_FEATURE = 'slice feature'
SLICE_VALUE = 'slice value'
DATA_SIZE_TRAIN = 'train size'
DATA_SIZE_TEST = 'test size'
ATTACK_TYPE = 'attack type'
def __str__(self):
"""Returns 'slice value' instead of AttackResultsDFColumns.SLICE_VALUE."""
return '%s' % self.value
@dataclass
class AttackResults:
"""Results from running multiple attacks."""
single_attack_results: Iterable[SingleAttackResult]
privacy_report_metadata: PrivacyReportMetadata = None
def calculate_pd_dataframe(self):
"""Returns all metrics as a Pandas DataFrame."""
slice_features = []
slice_values = []
data_size_train = []
data_size_test = []
attack_types = []
advantages = []
aucs = []
for attack_result in self.single_attack_results:
slice_spec = attack_result.slice_spec
if slice_spec.entire_dataset:
slice_feature, slice_value = str(slice_spec), ''
else:
slice_feature, slice_value = slice_spec.feature.value, slice_spec.value
slice_features.append(str(slice_feature))
slice_values.append(str(slice_value))
data_size_train.append(attack_result.data_size.ntrain)
data_size_test.append(attack_result.data_size.ntest)
attack_types.append(str(attack_result.attack_type))
advantages.append(float(attack_result.get_attacker_advantage()))
aucs.append(float(attack_result.get_auc()))
df = pd.DataFrame({
str(AttackResultsDFColumns.SLICE_FEATURE): slice_features,
str(AttackResultsDFColumns.SLICE_VALUE): slice_values,
str(AttackResultsDFColumns.DATA_SIZE_TRAIN): data_size_train,
str(AttackResultsDFColumns.DATA_SIZE_TEST): data_size_test,
str(AttackResultsDFColumns.ATTACK_TYPE): attack_types,
str(PrivacyMetric.ATTACKER_ADVANTAGE): advantages,
str(PrivacyMetric.AUC): aucs
})
return df
def summary(self, by_slices=False) -> str:
"""Provides a summary of the metrics.
The summary provides the best-performing attacks for each requested data
slice.
Args:
by_slices : whether to prepare a per-slice summary.
Returns:
A string with a summary of all the metrics.
"""
summary = []
# Summary over all slices
max_auc_result_all = self.get_result_with_max_auc()
summary.append('Best-performing attacks over all slices')
summary.append(
' %s (with %d training and %d test examples) achieved an AUC of %.2f on slice %s'
% (max_auc_result_all.attack_type,
max_auc_result_all.data_size.ntrain,
max_auc_result_all.data_size.ntest,
max_auc_result_all.get_auc(),
max_auc_result_all.slice_spec))
max_advantage_result_all = self.get_result_with_max_attacker_advantage()
summary.append(
' %s (with %d training and %d test examples) achieved an advantage of %.2f on slice %s'
% (max_advantage_result_all.attack_type,
max_advantage_result_all.data_size.ntrain,
max_advantage_result_all.data_size.ntest,
max_advantage_result_all.get_attacker_advantage(),
max_advantage_result_all.slice_spec))
slice_dict = self._group_results_by_slice()
if by_slices and len(slice_dict.keys()) > 1:
for slice_str in slice_dict:
results = slice_dict[slice_str]
summary.append('\nBest-performing attacks over slice: \"%s\"' %
slice_str)
max_auc_result = results.get_result_with_max_auc()
summary.append(
' %s (with %d training and %d test examples) achieved an AUC of %.2f'
% (max_auc_result.attack_type,
max_auc_result.data_size.ntrain,
max_auc_result.data_size.ntest,
max_auc_result.get_auc()))
max_advantage_result = results.get_result_with_max_attacker_advantage()
summary.append(
' %s (with %d training and %d test examples) achieved an advantage of %.2f'
% (max_advantage_result.attack_type,
max_advantage_result.data_size.ntrain,
max_advantage_result.data_size.ntest,
max_advantage_result.get_attacker_advantage()))
return '\n'.join(summary)
def _group_results_by_slice(self):
"""Groups AttackResults into a dictionary keyed by the slice."""
slice_dict = {}
for attack_result in self.single_attack_results:
slice_str = str(attack_result.slice_spec)
if slice_str not in slice_dict:
slice_dict[slice_str] = AttackResults([])
slice_dict[slice_str].single_attack_results.append(attack_result)
return slice_dict
def get_result_with_max_auc(self) -> SingleAttackResult:
"""Get the result with maximum AUC for all attacks and slices."""
aucs = [result.get_auc() for result in self.single_attack_results]
if min(aucs) < 0.4:
print('Suspiciously low AUC detected: %.2f. '
'There might be a bug in the classifier' % min(aucs))
return self.single_attack_results[np.argmax(aucs)]
def get_result_with_max_attacker_advantage(self) -> SingleAttackResult:
"""Get the result with maximum advantage for all attacks and slices."""
return self.single_attack_results[np.argmax([
result.get_attacker_advantage() for result in self.single_attack_results
])]
def save(self, filepath):
"""Saves self to a pickle file."""
with open(filepath, 'wb') as out:
pickle.dump(self, out)
@classmethod
def load(cls, filepath):
"""Loads AttackResults from a pickle file."""
with open(filepath, 'rb') as inp:
return pickle.load(inp)
@dataclass
class AttackResultsCollection:
"""A collection of AttackResults."""
attack_results_list: Iterable[AttackResults]
def append(self, attack_results: AttackResults):
self.attack_results_list.append(attack_results)
def save(self, dirname):
"""Saves each AttackResults in the collection to its own pickle file in dirname."""
for i, attack_results in enumerate(self.attack_results_list):
filepath = os.path.join(dirname,
_get_attack_results_filename(attack_results, i))
attack_results.save(filepath)
@classmethod
def load(cls, dirname):
"""Loads AttackResultsCollection from all files in a directory."""
loaded_collection = AttackResultsCollection([])
for filepath in sorted(glob.glob('%s/*' % dirname)):
with open(filepath, 'rb') as inp:
loaded_collection.attack_results_list.append(pickle.load(inp))
return loaded_collection
def _get_attack_results_filename(attack_results: AttackResults, index: int):
"""Creates a filename for a specific set of AttackResults."""
metadata = attack_results.privacy_report_metadata
if metadata is not None:
return '%s_%s_epoch_%s.pickle' % (metadata.model_variant_label, index,
metadata.epoch_num)
return '%s.pickle' % index
def get_flattened_attack_metrics(results: AttackResults):
"""Get flattened attack metrics.
Args:
results: membership inference attack results.
Returns:
types: a list of attack types
slices: a list of slices
attack_metrics: a list of metric names
values: a list of metric values; the i-th element corresponds to attack_metrics[i]
"""
types = []
slices = []
attack_metrics = []
values = []
for attack_result in results.single_attack_results:
types += [str(attack_result.attack_type)] * 2
slices += [str(attack_result.slice_spec)] * 2
attack_metrics += ['adv', 'auc']
values += [float(attack_result.get_attacker_advantage()),
float(attack_result.get_auc())]
return types, slices, attack_metrics, values
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import * # pylint: disable=wildcard-import
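
The summaries in the removed module report an AUC and an attacker advantage for each attack result. As a rough, dependency-free sketch of how those two numbers relate to a ROC curve, the snippet below sweeps a threshold over membership scores and assumes the advantage is the largest gap between the true-positive and false-positive rates. The helpers `roc_points`, `auc` and `attacker_advantage` are hypothetical names used only for illustration; the library code in this diff obtains its curve from `sklearn.metrics.roc_curve`.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int],
               scores: Sequence[float]) -> List[Tuple[float, float]]:
  """Returns (fpr, tpr) points, predicting "member" when score >= threshold."""
  n_pos = sum(labels)
  n_neg = len(labels) - n_pos
  points = [(0.0, 0.0)]
  # Sweep thresholds from the highest score down to the lowest.
  for threshold in sorted(set(scores), reverse=True):
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    points.append((fp / n_neg, tp / n_pos))
  return points


def auc(points: List[Tuple[float, float]]) -> float:
  """Trapezoidal area under the (fpr, tpr) points."""
  return sum((x1 - x0) * (y0 + y1) / 2.0
             for (x0, y0), (x1, y1) in zip(points, points[1:]))


def attacker_advantage(points: List[Tuple[float, float]]) -> float:
  """Assumed definition: the largest |tpr - fpr| over all thresholds."""
  return max(abs(tpr - fpr) for fpr, tpr in points)


# 1 = training member, 0 = non-member; a higher score means "more likely a member".
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
points = roc_points(labels, scores)
print('AUC: %.2f, advantage: %.2f' % (auc(points), attacker_advantage(points)))
```

An AUC near 0.5 and an advantage near 0 would indicate that the scores barely separate members from non-members.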

@ -13,136 +13,6 @@
# limitations under the License.
# Lint as: python3
"""Specifying and creating AttackInputData slices."""
"""Moved to privacy_tests/membership_inference_attack."""
import collections
import copy
from typing import List
import numpy as np
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingFeature
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
def _slice_if_not_none(a, idx):
return None if a is None else a[idx]
def _slice_data_by_indices(data: AttackInputData, idx_train,
idx_test) -> AttackInputData:
"""Slices train fields with idx_train and test fields with idx_test."""
result = AttackInputData()
# Slice train data.
result.logits_train = _slice_if_not_none(data.logits_train, idx_train)
result.probs_train = _slice_if_not_none(data.probs_train, idx_train)
result.labels_train = _slice_if_not_none(data.labels_train, idx_train)
result.loss_train = _slice_if_not_none(data.loss_train, idx_train)
result.entropy_train = _slice_if_not_none(data.entropy_train, idx_train)
# Slice test data.
result.logits_test = _slice_if_not_none(data.logits_test, idx_test)
result.probs_test = _slice_if_not_none(data.probs_test, idx_test)
result.labels_test = _slice_if_not_none(data.labels_test, idx_test)
result.loss_test = _slice_if_not_none(data.loss_test, idx_test)
result.entropy_test = _slice_if_not_none(data.entropy_test, idx_test)
return result
def _slice_by_class(data: AttackInputData, class_value: int) -> AttackInputData:
idx_train = data.labels_train == class_value
idx_test = data.labels_test == class_value
return _slice_data_by_indices(data, idx_train, idx_test)
def _slice_by_percentiles(data: AttackInputData, from_percentile: float,
to_percentile: float):
"""Slices samples by loss percentiles."""
# Find from_percentile and to_percentile percentiles in losses.
loss_train = data.get_loss_train()
loss_test = data.get_loss_test()
losses = np.concatenate((loss_train, loss_test))
from_loss = np.percentile(losses, from_percentile)
to_loss = np.percentile(losses, to_percentile)
idx_train = (from_loss <= loss_train) & (loss_train <= to_loss)
idx_test = (from_loss <= loss_test) & (loss_test <= to_loss)
return _slice_data_by_indices(data, idx_train, idx_test)
def _indices_by_classification(logits_or_probs, labels, correctly_classified):
idx_correct = labels == np.argmax(logits_or_probs, axis=1)
return idx_correct if correctly_classified else np.invert(idx_correct)
def _slice_by_classification_correctness(data: AttackInputData,
correctly_classified: bool):
idx_train = _indices_by_classification(data.logits_or_probs_train,
data.labels_train,
correctly_classified)
idx_test = _indices_by_classification(data.logits_or_probs_test,
data.labels_test, correctly_classified)
return _slice_data_by_indices(data, idx_train, idx_test)
def get_single_slice_specs(slicing_spec: SlicingSpec,
num_classes: int = None) -> List[SingleSliceSpec]:
"""Returns slices of data according to slicing_spec."""
result = []
if slicing_spec.entire_dataset:
result.append(SingleSliceSpec())
# Create slices by class.
by_class = slicing_spec.by_class
if isinstance(by_class, bool):
if by_class:
assert num_classes, "When by_class == True, num_classes should be given."
assert 0 <= num_classes <= 1000, (
"Too many classes for slicing by class. "
f"Found {num_classes}.")
for c in range(num_classes):
result.append(SingleSliceSpec(SlicingFeature.CLASS, c))
elif isinstance(by_class, int):
result.append(SingleSliceSpec(SlicingFeature.CLASS, by_class))
elif isinstance(by_class, collections.abc.Iterable):
for c in by_class:
result.append(SingleSliceSpec(SlicingFeature.CLASS, c))
# Create slices by percentiles
if slicing_spec.by_percentiles:
for percent in range(0, 100, 10):
result.append(
SingleSliceSpec(SlicingFeature.PERCENTILE, (percent, percent + 10)))
# Create slices by correctness of the classifications.
if slicing_spec.by_classification_correctness:
result.append(SingleSliceSpec(SlicingFeature.CORRECTLY_CLASSIFIED, True))
result.append(SingleSliceSpec(SlicingFeature.CORRECTLY_CLASSIFIED, False))
return result
def get_slice(data: AttackInputData,
slice_spec: SingleSliceSpec) -> AttackInputData:
"""Returns a single slice of data according to slice_spec."""
if slice_spec.entire_dataset:
data_slice = copy.copy(data)
elif slice_spec.feature == SlicingFeature.CLASS:
data_slice = _slice_by_class(data, slice_spec.value)
elif slice_spec.feature == SlicingFeature.PERCENTILE:
from_percentile, to_percentile = slice_spec.value
data_slice = _slice_by_percentiles(data, from_percentile, to_percentile)
elif slice_spec.feature == SlicingFeature.CORRECTLY_CLASSIFIED:
data_slice = _slice_by_classification_correctness(data, slice_spec.value)
else:
raise ValueError('Unknown slice spec feature "%s"' % slice_spec.feature)
data_slice.slice_spec = slice_spec
return data_slice
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing import * # pylint: disable=wildcard-import
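
To make the percentile slicing in the removed module easier to follow, here is a small standard-library sketch of the same idea: percentile bounds are taken over the pooled train and test losses, and each set is then filtered with those bounds. `percentile` and `slice_by_percentiles` are hypothetical stand-ins written for this illustration; the interpolation is intended to mimic the default behaviour of `np.percentile`, which the original code calls.

```python
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
  """Linearly interpolated q-th percentile, with q in [0, 100]."""
  ordered = sorted(values)
  pos = (len(ordered) - 1) * q / 100.0
  lower = int(pos)
  upper = min(lower + 1, len(ordered) - 1)
  return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def slice_by_percentiles(
    loss_train: Sequence[float], loss_test: Sequence[float],
    from_percentile: float,
    to_percentile: float) -> Tuple[List[float], List[float]]:
  """Keeps the losses that fall inside the given percentile range."""
  # The bounds come from the pooled losses, not from each set separately.
  pooled = list(loss_train) + list(loss_test)
  from_loss = percentile(pooled, from_percentile)
  to_loss = percentile(pooled, to_percentile)

  def keep(loss: float) -> bool:
    return from_loss <= loss <= to_loss  # both ends are inclusive

  return ([l for l in loss_train if keep(l)],
          [l for l in loss_test if keep(l)])


loss_train = [0.1, 0.2, 0.3, 0.4]
loss_test = [0.5, 0.6, 0.7, 0.8]
low_train, low_test = slice_by_percentiles(loss_train, loss_test, 0, 50)
high_train, high_test = slice_by_percentiles(loss_train, loss_test, 50, 100)
print(low_train, low_test)
print(high_train, high_test)
```

Because both bounds are inclusive, an example whose loss sits exactly on a boundary would appear in two adjacent slices.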

@ -13,129 +13,6 @@
# limitations under the License.
# Lint as: python3
"""A callback and a function in keras for membership inference attack."""
"""Moved to privacy_tests/membership_inference_attack."""
import os
from typing import Iterable
from absl import logging
import tensorflow as tf
from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.membership_inference_attack.utils import log_loss
from tensorflow_privacy.privacy.membership_inference_attack.utils_tensorboard import write_results_to_tensorboard_tf2 as write_results_to_tensorboard
def calculate_losses(model, data, labels):
"""Calculate losses of model prediction on data, provided true labels.
Args:
model: model to make prediction
data: samples
labels: true labels of samples (integer valued)
Returns:
pred: probability vector of each sample
loss: cross entropy loss of each sample
"""
pred = model.predict(data)
loss = log_loss(labels, pred)
return pred, loss
class MembershipInferenceCallback(tf.keras.callbacks.Callback):
"""Callback to perform membership inference attack on epoch end."""
def __init__(
self,
in_train, out_train,
slicing_spec: SlicingSpec = None,
attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,),
tensorboard_dir=None,
tensorboard_merge_classifiers=False):
"""Initializes the callback.
Args:
in_train: (in_training samples, in_training labels)
out_train: (out_training samples, out_training labels)
slicing_spec: slicing specification of the attack
attack_types: a list of attacks, each of type AttackType
tensorboard_dir: directory for tensorboard summary
tensorboard_merge_classifiers: if true, plot different classifiers with
the same slicing_spec and metric in the same figure
"""
self._in_train_data, self._in_train_labels = in_train
self._out_train_data, self._out_train_labels = out_train
self._slicing_spec = slicing_spec
self._attack_types = attack_types
self._tensorboard_merge_classifiers = tensorboard_merge_classifiers
if tensorboard_dir:
if tensorboard_merge_classifiers:
self._writers = {}
for attack_type in attack_types:
self._writers[attack_type.name] = tf.summary.create_file_writer(
os.path.join(tensorboard_dir, 'MI', attack_type.name))
else:
self._writers = tf.summary.create_file_writer(
os.path.join(tensorboard_dir, 'MI'))
logging.info('Will write to tensorboard.')
else:
self._writers = None
def on_epoch_end(self, epoch, logs=None):
results = run_attack_on_keras_model(
self.model,
(self._in_train_data, self._in_train_labels),
(self._out_train_data, self._out_train_labels),
self._slicing_spec,
self._attack_types)
logging.info(results)
att_types, att_slices, att_metrics, att_values = get_flattened_attack_metrics(
results)
print('Attack result:')
print('\n'.join([' %s: %.4f' % (', '.join([s, t, m]), v) for t, s, m, v in
zip(att_types, att_slices, att_metrics, att_values)]))
# Write to tensorboard if tensorboard_dir is specified
if self._writers is not None:
write_results_to_tensorboard(results, self._writers, epoch,
self._tensorboard_merge_classifiers)
def run_attack_on_keras_model(
model, in_train, out_train,
slicing_spec: SlicingSpec = None,
attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,)):
"""Performs the attack on a trained model.
Args:
model: model to be tested
in_train: a (in_training samples, in_training labels) tuple
out_train: a (out_training samples, out_training labels) tuple
slicing_spec: slicing specification of the attack
attack_types: a list of attacks, each of type AttackType
Returns:
Results of the attack
"""
in_train_data, in_train_labels = in_train
out_train_data, out_train_labels = out_train
# Compute predictions and losses
in_train_pred, in_train_loss = calculate_losses(model, in_train_data,
in_train_labels)
out_train_pred, out_train_loss = calculate_losses(model, out_train_data,
out_train_labels)
attack_input = AttackInputData(
logits_train=in_train_pred, logits_test=out_train_pred,
labels_train=in_train_labels, labels_test=out_train_labels,
loss_train=in_train_loss, loss_test=out_train_loss
)
results = mia.run_attacks(attack_input,
slicing_spec=slicing_spec,
attack_types=attack_types)
return results
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.keras_evaluation import * # pylint: disable=wildcard-import
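
`calculate_losses` in the removed module delegates to a `log_loss` helper from `utils`, which is not shown in this part of the diff. The sketch below assumes that helper computes a per-example cross-entropy from probability vectors and integer labels, clipping tiny probabilities to avoid `log(0)`. The name `per_example_log_loss` and the `small_value` default are assumptions made for this illustration.

```python
import math
from typing import List, Sequence


def per_example_log_loss(labels: Sequence[int],
                         probs: Sequence[Sequence[float]],
                         small_value: float = 1e-8) -> List[float]:
  """Cross-entropy of each example: -log(probability of the true class)."""
  return [-math.log(max(p[y], small_value)) for y, p in zip(labels, probs)]


# Three examples with two classes each; the last one is confidently wrong.
probs = [[0.9, 0.1], [0.5, 0.5], [0.0, 1.0]]
labels = [0, 1, 0]
losses = per_example_log_loss(labels, probs)
print(['%.4f' % loss for loss in losses])
```

Per-example losses of this kind are what the threshold attack compares between training and held-out data, so they need to stay unaggregated.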

@ -13,320 +13,6 @@
# limitations under the License.
# Lint as: python3
"""Code that runs membership inference attacks based on the model outputs.
"""Moved to privacy_tests/membership_inference_attack."""
This file belongs to the new API for membership inference attacks. This file
will be renamed to membership_inference_attack.py after the old API is removed.
"""
from typing import Iterable
import numpy as np
from sklearn import metrics
from tensorflow_privacy.privacy.membership_inference_attack import models
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import DataSize
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import MembershipProbabilityResults
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import PrivacyReportMetadata
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import RocCurve
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleAttackResult
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleMembershipProbabilityResult
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.membership_inference_attack.dataset_slicing import get_single_slice_specs
from tensorflow_privacy.privacy.membership_inference_attack.dataset_slicing import get_slice
def _get_slice_spec(data: AttackInputData) -> SingleSliceSpec:
if hasattr(data, 'slice_spec'):
return data.slice_spec
return SingleSliceSpec()
def _run_trained_attack(attack_input: AttackInputData,
attack_type: AttackType,
balance_attacker_training: bool = True):
"""Classification attack done by ML models."""
attacker = None
if attack_type == AttackType.LOGISTIC_REGRESSION:
attacker = models.LogisticRegressionAttacker()
elif attack_type == AttackType.MULTI_LAYERED_PERCEPTRON:
attacker = models.MultilayerPerceptronAttacker()
elif attack_type == AttackType.RANDOM_FOREST:
attacker = models.RandomForestAttacker()
elif attack_type == AttackType.K_NEAREST_NEIGHBORS:
attacker = models.KNearestNeighborsAttacker()
else:
raise NotImplementedError('Attack type %s not implemented yet.' %
attack_type)
prepared_attacker_data = models.create_attacker_data(
attack_input, balance=balance_attacker_training)
attacker.train_model(prepared_attacker_data.features_train,
prepared_attacker_data.is_training_labels_train)
# Run the attacker on (permuted) test examples.
predictions_test = attacker.predict(prepared_attacker_data.features_test)
# Generate ROC curves with predictions.
fpr, tpr, thresholds = metrics.roc_curve(
prepared_attacker_data.is_training_labels_test, predictions_test)
roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
# NOTE: In the current setup we can't obtain membership scores for all
# samples, since some of them were used to train the attacker. This can be
# fixed by training several attackers to ensure each sample was left out
# in exactly one attacker (basically, this means performing cross-validation).
# TODO(b/175870479): Implement membership scores for predicted attackers.
return SingleAttackResult(
slice_spec=_get_slice_spec(attack_input),
data_size=prepared_attacker_data.data_size,
attack_type=attack_type,
roc_curve=roc_curve)
def _run_threshold_attack(attack_input: AttackInputData):
"""Runs a threshold attack on loss."""
ntrain, ntest = attack_input.get_train_size(), attack_input.get_test_size()
loss_train = attack_input.get_loss_train()
loss_test = attack_input.get_loss_test()
if loss_train is None or loss_test is None:
raise ValueError('Not possible to run threshold attack without losses.')
fpr, tpr, thresholds = metrics.roc_curve(
np.concatenate((np.zeros(ntrain), np.ones(ntest))),
np.concatenate((loss_train, loss_test)))
roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
return SingleAttackResult(
slice_spec=_get_slice_spec(attack_input),
data_size=DataSize(ntrain=ntrain, ntest=ntest),
attack_type=AttackType.THRESHOLD_ATTACK,
membership_scores_train=-attack_input.get_loss_train(),
membership_scores_test=-attack_input.get_loss_test(),
roc_curve=roc_curve)
def _run_threshold_entropy_attack(attack_input: AttackInputData):
ntrain, ntest = attack_input.get_train_size(), attack_input.get_test_size()
fpr, tpr, thresholds = metrics.roc_curve(
np.concatenate((np.zeros(ntrain), np.ones(ntest))),
np.concatenate(
(attack_input.get_entropy_train(), attack_input.get_entropy_test())))
roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
return SingleAttackResult(
slice_spec=_get_slice_spec(attack_input),
data_size=DataSize(ntrain=ntrain, ntest=ntest),
attack_type=AttackType.THRESHOLD_ENTROPY_ATTACK,
membership_scores_train=-attack_input.get_entropy_train(),
membership_scores_test=-attack_input.get_entropy_test(),
roc_curve=roc_curve)
def _run_attack(attack_input: AttackInputData,
attack_type: AttackType,
balance_attacker_training: bool = True,
min_num_samples: int = 1):
"""Runs membership inference attacks for specified input and type.
Args:
attack_input: input data for running an attack
attack_type: the attack to run
balance_attacker_training: Whether the training and test sets for the
membership inference attacker should have a balanced (roughly equal)
number of samples from the training and test sets used to develop
the model under attack.
min_num_samples: minimum number of examples in either training or test data.
Returns:
the attack result.
"""
attack_input.validate()
if min(attack_input.get_train_size(),
attack_input.get_test_size()) < min_num_samples:
return None
if attack_type.is_trained_attack:
return _run_trained_attack(attack_input, attack_type,
balance_attacker_training)
if attack_type == AttackType.THRESHOLD_ENTROPY_ATTACK:
return _run_threshold_entropy_attack(attack_input)
return _run_threshold_attack(attack_input)
def run_attacks(attack_input: AttackInputData,
slicing_spec: SlicingSpec = None,
attack_types: Iterable[AttackType] = (
AttackType.THRESHOLD_ATTACK,),
privacy_report_metadata: PrivacyReportMetadata = None,
balance_attacker_training: bool = True,
min_num_samples: int = 1) -> AttackResults:
"""Runs membership inference attacks on a classification model.
It runs attacks specified by attack_types on each attack_input slice which is
specified by slicing_spec.
Args:
attack_input: input data for running an attack
slicing_spec: specifies attack_input slices to run attack on
attack_types: attacks to run
privacy_report_metadata: the metadata of the model under attack.
balance_attacker_training: Whether the training and test sets for the
membership inference attacker should have a balanced (roughly equal)
number of samples from the training and test sets used to develop
the model under attack.
min_num_samples: minimum number of examples in either training or test data.
Returns:
the attack result.
"""
attack_input.validate()
attack_results = []
if slicing_spec is None:
slicing_spec = SlicingSpec(entire_dataset=True)
num_classes = None
if slicing_spec.by_class:
num_classes = attack_input.num_classes
input_slice_specs = get_single_slice_specs(slicing_spec, num_classes)
for single_slice_spec in input_slice_specs:
attack_input_slice = get_slice(attack_input, single_slice_spec)
for attack_type in attack_types:
attack_result = _run_attack(attack_input_slice, attack_type,
balance_attacker_training,
min_num_samples)
if attack_result is not None:
attack_results.append(attack_result)
privacy_report_metadata = _compute_missing_privacy_report_metadata(
privacy_report_metadata, attack_input)
return AttackResults(
single_attack_results=attack_results,
privacy_report_metadata=privacy_report_metadata)
def _compute_membership_probability(
attack_input: AttackInputData,
num_bins: int = 15) -> SingleMembershipProbabilityResult:
"""Computes each individual point's likelihood of being a member (denoted as privacy risk score in https://arxiv.org/abs/2003.10595).
For an individual sample, its privacy risk score is computed as the posterior
probability of being in the training set
after observing its prediction output by the target machine learning model.
Args:
attack_input: input data for computing membership probabilities
num_bins: the number of bins used to compute the training/test histogram
Returns:
membership probability results
"""
# Uses the provided loss or entropy. Otherwise computes the loss.
if attack_input.loss_train is not None and attack_input.loss_test is not None:
train_values = attack_input.loss_train
test_values = attack_input.loss_test
elif attack_input.entropy_train is not None and attack_input.entropy_test is not None:
train_values = attack_input.entropy_train
test_values = attack_input.entropy_test
else:
train_values = attack_input.get_loss_train()
test_values = attack_input.get_loss_test()
# Compute the histogram in the log scale
small_value = 1e-10
train_values = np.maximum(train_values, small_value)
test_values = np.maximum(test_values, small_value)
min_value = min(train_values.min(), test_values.min())
max_value = max(train_values.max(), test_values.max())
bins_hist = np.logspace(
np.log10(min_value), np.log10(max_value), num_bins + 1)
train_hist, _ = np.histogram(train_values, bins=bins_hist)
train_hist = train_hist / (len(train_values) + 0.0)
train_hist_indices = np.fmin(
np.digitize(train_values, bins=bins_hist), num_bins) - 1
test_hist, _ = np.histogram(test_values, bins=bins_hist)
test_hist = test_hist / (len(test_values) + 0.0)
test_hist_indices = np.fmin(
np.digitize(test_values, bins=bins_hist), num_bins) - 1
combined_hist = train_hist + test_hist
combined_hist[combined_hist == 0] = small_value
membership_prob_list = train_hist / (combined_hist + 0.0)
train_membership_probs = membership_prob_list[train_hist_indices]
test_membership_probs = membership_prob_list[test_hist_indices]
return SingleMembershipProbabilityResult(
slice_spec=_get_slice_spec(attack_input),
train_membership_probs=train_membership_probs,
test_membership_probs=test_membership_probs)
def run_membership_probability_analysis(
attack_input: AttackInputData,
slicing_spec: SlicingSpec = None) -> MembershipProbabilityResults:
"""Perform membership probability analysis on all given slice types.
Args:
attack_input: input data for computing membership probabilities
slicing_spec: specifies attack_input slices
Returns:
the membership probability results.
"""
attack_input.validate()
membership_prob_results = []
if slicing_spec is None:
slicing_spec = SlicingSpec(entire_dataset=True)
num_classes = None
if slicing_spec.by_class:
num_classes = attack_input.num_classes
input_slice_specs = get_single_slice_specs(slicing_spec, num_classes)
for single_slice_spec in input_slice_specs:
attack_input_slice = get_slice(attack_input, single_slice_spec)
membership_prob_results.append(
_compute_membership_probability(attack_input_slice))
return MembershipProbabilityResults(
membership_prob_results=membership_prob_results)
def _compute_missing_privacy_report_metadata(
metadata: PrivacyReportMetadata,
attack_input: AttackInputData) -> PrivacyReportMetadata:
"""Populates metadata fields if they are missing."""
if metadata is None:
metadata = PrivacyReportMetadata()
if metadata.accuracy_train is None:
metadata.accuracy_train = _get_accuracy(attack_input.logits_train,
attack_input.labels_train)
if metadata.accuracy_test is None:
metadata.accuracy_test = _get_accuracy(attack_input.logits_test,
attack_input.labels_test)
loss_train = attack_input.get_loss_train()
loss_test = attack_input.get_loss_test()
if metadata.loss_train is None and loss_train is not None:
metadata.loss_train = np.average(loss_train)
if metadata.loss_test is None and loss_test is not None:
metadata.loss_test = np.average(loss_test)
return metadata
def _get_accuracy(logits, labels):
"""Computes the accuracy if it is missing."""
if logits is None or labels is None:
return None
return metrics.accuracy_score(labels, np.argmax(logits, axis=1))
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.membership_inference_attack import * # pylint: disable=wildcard-import
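
The membership probability computation in the removed module can be approximated with plain Python to show the idea: bin the per-example values on a log scale, normalize the train and test histograms separately, and read off train / (train + test) for the bin each example falls into. This is a simplified sketch, not the library routine. `membership_probabilities` and its inner helpers are hypothetical, and the binning is done arithmetically rather than with `np.histogram` and `np.digitize`, so edge handling may differ slightly.

```python
import math
from typing import List, Sequence, Tuple


def membership_probabilities(
    train_values: Sequence[float],
    test_values: Sequence[float],
    num_bins: int = 15,
    small_value: float = 1e-10) -> Tuple[List[float], List[float]]:
  """Estimates P(member | value) per example from log-spaced histograms."""
  train = [max(v, small_value) for v in train_values]
  test = [max(v, small_value) for v in test_values]
  log_min = math.log10(min(train + test))
  log_max = math.log10(max(train + test))
  width = (log_max - log_min) / num_bins or 1.0  # guard: all values equal

  def bin_index(value: float) -> int:
    idx = int((math.log10(value) - log_min) / width)
    return min(max(idx, 0), num_bins - 1)  # the maximum goes in the last bin

  def normalized_hist(values: List[float]) -> List[float]:
    hist = [0.0] * num_bins
    for v in values:
      hist[bin_index(v)] += 1.0 / len(values)
    return hist

  train_hist = normalized_hist(train)
  test_hist = normalized_hist(test)
  # Empty bins get a tiny denominator so the ratio is defined (and equals 0).
  probs = [t / ((t + s) or small_value) for t, s in zip(train_hist, test_hist)]
  return ([probs[bin_index(v)] for v in train],
          [probs[bin_index(v)] for v in test])


train_probs, test_probs = membership_probabilities(
    [0.01, 0.02, 0.01, 0.02], [1.0, 2.0, 1.0, 0.02], num_bins=2)
print(train_probs)
print(test_probs)
```

In this toy input the single low-loss test example shares a bin with the training examples, so it receives the same high membership probability as they do.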

@ -13,198 +13,6 @@
# limitations under the License.
# Lint as: python3
"""Trained models for membership inference attacks."""
"""Moved to privacy_tests/membership_inference_attack."""
from dataclasses import dataclass
import numpy as np
from sklearn import ensemble
from sklearn import linear_model
from sklearn import model_selection
from sklearn import neighbors
from sklearn import neural_network
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import DataSize
@dataclass
class AttackerData:
"""Input data for an ML classifier attack.
This includes only the data, and not configuration.
"""
features_train: np.ndarray = None
# element-wise boolean array denoting if the example was part of training.
is_training_labels_train: np.ndarray = None
features_test: np.ndarray = None
# element-wise boolean array denoting if the example was part of training.
is_training_labels_test: np.ndarray = None
data_size: DataSize = None
def create_attacker_data(attack_input_data: AttackInputData,
test_fraction: float = 0.25,
balance: bool = True) -> AttackerData:
"""Prepare AttackInputData to train ML attackers.
Combines logits and losses and performs a random train-test split.
Args:
attack_input_data: Original AttackInputData
test_fraction: Fraction of the dataset to include in the test split.
balance: Whether the training and test sets for the membership inference
attacker should have a balanced (roughly equal) number of samples
from the training and test sets used to develop the model
under attack.
Returns:
AttackerData.
"""
attack_input_train = _column_stack(attack_input_data.logits_or_probs_train,
attack_input_data.get_loss_train())
attack_input_test = _column_stack(attack_input_data.logits_or_probs_test,
attack_input_data.get_loss_test())
if balance:
min_size = min(attack_input_data.get_train_size(),
attack_input_data.get_test_size())
attack_input_train = _sample_multidimensional_array(attack_input_train,
min_size)
attack_input_test = _sample_multidimensional_array(attack_input_test,
min_size)
ntrain, ntest = attack_input_train.shape[0], attack_input_test.shape[0]
features_all = np.concatenate((attack_input_train, attack_input_test))
labels_all = np.concatenate(((np.zeros(ntrain)), (np.ones(ntest))))
# Perform a train-test split
features_train, features_test, is_training_labels_train, is_training_labels_test = model_selection.train_test_split(
features_all, labels_all, test_size=test_fraction, stratify=labels_all)
return AttackerData(features_train, is_training_labels_train, features_test,
is_training_labels_test,
DataSize(ntrain=ntrain, ntest=ntest))
def _sample_multidimensional_array(array, size):
indices = np.random.choice(len(array), size, replace=False)
return array[indices]
def _column_stack(logits, loss):
"""Stacks logits and losses.
In case that only one exists, returns that one.
Args:
logits: logits array
loss: loss array
Returns:
stacked logits and losses (or only the one that exists).
"""
if logits is None:
return np.expand_dims(loss, axis=-1)
if loss is None:
return logits
return np.column_stack((logits, loss))
class TrainedAttacker:
"""Base class for training attack models."""
model = None
def train_model(self, input_features, is_training_labels):
"""Train an attacker model.
This is trained on examples from train and test datasets.
Args:
input_features : array-like of shape (n_samples, n_features) Training
vector, where n_samples is the number of samples and n_features is the
number of features.
is_training_labels : a vector of booleans of shape (n_samples, )
representing whether the sample is in the training set or not.
"""
raise NotImplementedError()
def predict(self, input_features):
"""Predicts whether input_features belongs to train or test.
Args:
input_features : A vector of features with the same semantics as
input_features passed to train_model.
Returns:
An array of probabilities denoting whether the example belongs to test.
"""
if self.model is None:
raise AssertionError(
'Model not trained yet. Please call train_model first.')
return self.model.predict_proba(input_features)[:, 1]
class LogisticRegressionAttacker(TrainedAttacker):
"""Logistic regression attacker."""
def train_model(self, input_features, is_training_labels):
lr = linear_model.LogisticRegression(solver='lbfgs')
param_grid = {
'C': np.logspace(-4, 2, 10),
}
model = model_selection.GridSearchCV(
lr, param_grid=param_grid, cv=3, n_jobs=1, verbose=0)
model.fit(input_features, is_training_labels)
self.model = model
class MultilayerPerceptronAttacker(TrainedAttacker):
"""Multilayer perceptron attacker."""
def train_model(self, input_features, is_training_labels):
mlp_model = neural_network.MLPClassifier()
param_grid = {
'hidden_layer_sizes': [(64,), (32, 32)],
'solver': ['adam'],
'alpha': [0.0001, 0.001, 0.01],
}
n_jobs = -1
model = model_selection.GridSearchCV(
mlp_model, param_grid=param_grid, cv=3, n_jobs=n_jobs, verbose=0)
model.fit(input_features, is_training_labels)
self.model = model
class RandomForestAttacker(TrainedAttacker):
"""Random forest attacker."""
def train_model(self, input_features, is_training_labels):
"""Setup a random forest pipeline with cross-validation."""
rf_model = ensemble.RandomForestClassifier()
param_grid = {
'n_estimators': [100],
'max_features': ['auto', 'sqrt'],
'max_depth': [5, 10, 20, None],
'min_samples_split': [2, 5, 10],
'min_samples_leaf': [1, 2, 4]
}
n_jobs = -1
model = model_selection.GridSearchCV(
rf_model, param_grid=param_grid, cv=3, n_jobs=n_jobs, verbose=0)
model.fit(input_features, is_training_labels)
self.model = model
class KNearestNeighborsAttacker(TrainedAttacker):
"""K nearest neighbor attacker."""
def train_model(self, input_features, is_training_labels):
knn_model = neighbors.KNeighborsClassifier()
param_grid = {
'n_neighbors': [3, 5, 7],
}
model = model_selection.GridSearchCV(
knn_model, param_grid=param_grid, cv=3, n_jobs=1, verbose=0)
model.fit(input_features, is_training_labels)
self.model = model
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.models import * # pylint: disable=wildcard-import


@ -13,74 +13,6 @@
# limitations under the License.
# Lint as: python3
"""Plotting functionality for membership inference attack analysis.
"""Moved to privacy_tests/membership_inference_attack."""
Functions to plot ROC curves and histograms as well as functionality to store
figures to colossus.
"""
from typing import Text, Iterable
import matplotlib.pyplot as plt
import numpy as np
from sklearn import metrics
def save_plot(figure: plt.Figure, path: Text, outformat='png'):
"""Store a figure to disk."""
if path is not None:
with open(path, 'wb') as f:
figure.savefig(f, bbox_inches='tight', format=outformat)
plt.close(figure)
def plot_curve_with_area(x: Iterable[float],
y: Iterable[float],
xlabel: Text = 'x',
ylabel: Text = 'y') -> plt.Figure:
"""Plot the curve defined by inputs and the area under the curve.
All entries of x and y are required to lie between 0 and 1.
For example, x could be recall and y precision, or x is fpr and y is tpr.
Args:
x: Values on x-axis (1d)
y: Values on y-axis (must be same length as x)
xlabel: Label for x axis
ylabel: Label for y axis
Returns:
The matplotlib figure handle
"""
fig = plt.figure()
plt.plot([0, 1], [0, 1], 'k', lw=1.0)
plt.plot(x, y, lw=2, label=f'AUC: {metrics.auc(x, y):.3f}')
plt.xlabel(xlabel)
plt.ylabel(ylabel)
plt.legend()
return fig
def plot_histograms(train: Iterable[float],
test: Iterable[float],
xlabel: Text = 'x',
thresh: float = None) -> plt.Figure:
"""Plot histograms of training versus test metrics."""
xmin = min(np.min(train), np.min(test))
xmax = max(np.max(train), np.max(test))
bins = np.linspace(xmin, xmax, 100)
fig = plt.figure()
plt.hist(test, bins=bins, density=True, alpha=0.5, label='test', log=True)
plt.hist(train, bins=bins, density=True, alpha=0.5, label='train', log=True)
if thresh is not None:
plt.axvline(thresh, c='r', label=f'threshold = {thresh:.3f}')
plt.xlabel(xlabel)
plt.ylabel('normalized counts (density)')
plt.legend()
return fig
def plot_roc_curve(roc_curve) -> plt.Figure:
"""Plot the ROC curve and the area under the curve."""
return plot_curve_with_area(
roc_curve.fpr, roc_curve.tpr, xlabel='FPR', ylabel='TPR')
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting import * # pylint: disable=wildcard-import


@ -13,126 +13,6 @@
# limitations under the License.
# Lint as: python3
"""Plotting code for ML Privacy Reports."""
from typing import Iterable
import matplotlib.pyplot as plt
import pandas as pd
"""Moved to privacy_tests/membership_inference_attack."""
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResultsCollection
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResultsDFColumns
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import ENTIRE_DATASET_SLICE_STR
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import PrivacyMetric
# Helper constants for DataFrame keys.
LEGEND_LABEL_STR = 'legend label'
EPOCH_STR = 'Epoch'
TRAIN_ACCURACY_STR = 'Train accuracy'
def plot_by_epochs(results: AttackResultsCollection,
privacy_metrics: Iterable[PrivacyMetric]) -> plt.Figure:
"""Plots privacy vulnerabilities vs epoch numbers.
If multiple privacy metrics are specified, the plot will feature
multiple subplots (one subplot per metric). Multiple model variants
are supported.
Args:
results: AttackResults for the plot
privacy_metrics: List of enumerated privacy metrics that should be plotted.
Returns:
A pyplot figure with privacy vs epoch plots.
"""
_validate_results(results.attack_results_list)
all_results_df = _calculate_combined_df_with_metadata(
results.attack_results_list)
return _generate_subplots(
all_results_df=all_results_df,
x_axis_metric='Epoch',
figure_title='Vulnerability per Epoch',
privacy_metrics=privacy_metrics)
def plot_privacy_vs_accuracy(results: AttackResultsCollection,
privacy_metrics: Iterable[PrivacyMetric]):
"""Plots privacy vulnerabilities vs accuracy.
If multiple privacy metrics are specified, the plot will feature
multiple subplots (one subplot per metric). Multiple model variants
are supported.
Args:
results: AttackResults for the plot
privacy_metrics: List of enumerated privacy metrics that should be plotted.
Returns:
A pyplot figure with privacy vs accuracy plots.
"""
_validate_results(results.attack_results_list)
all_results_df = _calculate_combined_df_with_metadata(
results.attack_results_list)
return _generate_subplots(
all_results_df=all_results_df,
x_axis_metric='Train accuracy',
figure_title='Privacy vs Utility Analysis',
privacy_metrics=privacy_metrics)
def _calculate_combined_df_with_metadata(results: Iterable[AttackResults]):
"""Adds metadata to the dataframe and concats them together."""
all_results_df = None
for attack_results in results:
attack_results_df = attack_results.calculate_pd_dataframe()
attack_results_df = attack_results_df.loc[attack_results_df[str(
AttackResultsDFColumns.SLICE_FEATURE)] == ENTIRE_DATASET_SLICE_STR]
attack_results_df.insert(0, EPOCH_STR,
attack_results.privacy_report_metadata.epoch_num)
attack_results_df.insert(
0, TRAIN_ACCURACY_STR,
attack_results.privacy_report_metadata.accuracy_train)
attack_results_df.insert(
0, LEGEND_LABEL_STR,
attack_results.privacy_report_metadata.model_variant_label + ' - ' +
attack_results_df[str(AttackResultsDFColumns.ATTACK_TYPE)])
if all_results_df is None:
all_results_df = attack_results_df
else:
all_results_df = pd.concat([all_results_df, attack_results_df],
ignore_index=True)
return all_results_df
def _generate_subplots(all_results_df: pd.DataFrame, x_axis_metric: str,
figure_title: str,
privacy_metrics: Iterable[PrivacyMetric]):
"""Create one subplot per privacy metric for a specified x_axis_metric."""
fig, axes = plt.subplots(
1, len(privacy_metrics), figsize=(5 * len(privacy_metrics) + 3, 5))
# Set a title for the entire group of subplots.
fig.suptitle(figure_title)
if len(privacy_metrics) == 1:
axes = (axes,)
for i, privacy_metric in enumerate(privacy_metrics):
legend_labels = all_results_df[LEGEND_LABEL_STR].unique()
for legend_label in legend_labels:
single_label_results = all_results_df.loc[all_results_df[LEGEND_LABEL_STR]
== legend_label]
sorted_label_results = single_label_results.sort_values(x_axis_metric)
axes[i].plot(sorted_label_results[x_axis_metric],
sorted_label_results[str(privacy_metric)])
axes[i].set_xlabel(x_axis_metric)
axes[i].set_title('%s for %s' % (privacy_metric, ENTIRE_DATASET_SLICE_STR))
plt.legend(legend_labels, loc='upper left', bbox_to_anchor=(1.02, 1))
fig.tight_layout(rect=[0, 0, 1, 0.93]) # Leave space for suptitle.
return fig
def _validate_results(results: Iterable[AttackResults]):
for attack_results in results:
if not attack_results or not attack_results.privacy_report_metadata:
raise ValueError('Privacy metadata is not defined.')
if attack_results.privacy_report_metadata.epoch_num is None:
raise ValueError('epoch_num in metadata is not defined.')
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting import * # pylint: disable=wildcard-import


@ -13,361 +13,6 @@
# limitations under the License.
# Lint as: python3
"""Code for membership inference attacks on seq2seq models.
"""Moved to privacy_tests/membership_inference_attack."""
Contains seq2seq specific logic for attack data structures, attack data
generation,
and the logistic regression membership inference attack.
"""
from typing import Iterator, List
from dataclasses import dataclass
import numpy as np
from scipy.stats import rankdata
from sklearn import metrics
from sklearn import model_selection
import tensorflow as tf
from tensorflow_privacy.privacy.membership_inference_attack import models
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import DataSize
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import PrivacyReportMetadata
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import RocCurve
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleAttackResult
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.membership_inference_attack.models import _sample_multidimensional_array
from tensorflow_privacy.privacy.membership_inference_attack.models import AttackerData
def _is_iterator(obj, obj_name):
"""Checks whether obj is a generator."""
if obj is not None and not isinstance(obj, Iterator):
raise ValueError('%s should be a generator.' % obj_name)
@dataclass
class Seq2SeqAttackInputData:
"""Input data for running an attack on seq2seq models.
This includes only the data, and not configuration.
"""
logits_train: Iterator[np.ndarray] = None
logits_test: Iterator[np.ndarray] = None
# Contains ground-truth token indices for the target sequences.
labels_train: Iterator[np.ndarray] = None
labels_test: Iterator[np.ndarray] = None
# Size of the target sequence vocabulary.
vocab_size: int = None
# Train, test size = number of batches in training, test set.
# These values need to be supplied by the user as logits, labels
# are lazy loaded for seq2seq models.
train_size: int = 0
test_size: int = 0
def validate(self):
"""Validates the inputs."""
if (self.logits_train is None) != (self.logits_test is None):
raise ValueError(
'logits_train and logits_test should both be either set or unset')
if (self.labels_train is None) != (self.labels_test is None):
raise ValueError(
'labels_train and labels_test should both be either set or unset')
if self.logits_train is None or self.labels_train is None:
raise ValueError(
'Labels, logits of training, test sets should all be set')
if (self.vocab_size is None or self.train_size is None or
self.test_size is None):
raise ValueError('vocab_size, train_size, test_size should all be set')
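# Note: `not int` is always False, so check the actual values with isinstance.
if self.vocab_size is not None and not isinstance(self.vocab_size, int):
raise ValueError('vocab_size should be of integer type')
if self.train_size is not None and not isinstance(self.train_size, int):
raise ValueError('train_size should be of integer type')
if self.test_size is not None and not isinstance(self.test_size, int):
raise ValueError('test_size should be of integer type')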
_is_iterator(self.logits_train, 'logits_train')
_is_iterator(self.logits_test, 'logits_test')
_is_iterator(self.labels_train, 'labels_train')
_is_iterator(self.labels_test, 'labels_test')
def __str__(self):
"""Returns the shapes of variables that are not None."""
result = ['AttackInputData(']
if self.vocab_size is not None and self.train_size is not None:
result.append(
'logits_train with shape (%d, num_sequences, num_tokens, %d)' %
(self.train_size, self.vocab_size))
result.append(
'labels_train with shape (%d, num_sequences, num_tokens, 1)' %
self.train_size)
if self.vocab_size is not None and self.test_size is not None:
result.append(
'logits_test with shape (%d, num_sequences, num_tokens, %d)' %
(self.test_size, self.vocab_size))
result.append(
'labels_test with shape (%d, num_sequences, num_tokens, 1)' %
self.test_size)
result.append(')')
return '\n'.join(result)
def _get_attack_features_and_metadata(
logits: Iterator[np.ndarray],
labels: Iterator[np.ndarray]) -> (np.ndarray, float, float):
"""Returns the average rank of tokens per batch of sequences and the loss.
Args:
logits: Logits returned by a seq2seq model, dim = (num_batches,
num_sequences, num_tokens, vocab_size).
labels: Target labels for the seq2seq model, dim = (num_batches,
num_sequences, num_tokens, 1).
Returns:
1. An array of average ranks, dim = (num_batches, 1).
Each average rank is calculated over ranks of tokens in sequences of a
particular batch.
2. Loss computed over all logits and labels.
3. Accuracy computed over all logits and labels.
"""
ranks = []
loss = 0.0
dataset_length = 0.0
correct_preds = 0
total_preds = 0
for batch_logits, batch_labels in zip(logits, labels):
# Compute average rank for the current batch.
batch_ranks = _get_batch_ranks(batch_logits, batch_labels)
ranks.append(np.mean(batch_ranks))
# Update overall loss metrics with metrics of the current batch.
batch_loss, batch_length = _get_batch_loss_metrics(batch_logits,
batch_labels)
loss += batch_loss
dataset_length += batch_length
# Update overall accuracy metrics with metrics of the current batch.
batch_correct_preds, batch_total_preds = _get_batch_accuracy_metrics(
batch_logits, batch_labels)
correct_preds += batch_correct_preds
total_preds += batch_total_preds
# Compute loss and accuracy for the dataset.
loss = loss / dataset_length
accuracy = correct_preds / total_preds
return np.array(ranks), loss, accuracy
def _get_batch_ranks(batch_logits: np.ndarray,
batch_labels: np.ndarray) -> np.ndarray:
"""Returns the ranks of tokens in a batch of sequences.
Args:
batch_logits: Logits returned by a seq2seq model, dim = (num_sequences,
num_tokens, vocab_size).
batch_labels: Target labels for the seq2seq model, dim = (num_sequences,
num_tokens, 1).
Returns:
An array of ranks of tokens in a batch of sequences, dim = (num_sequences,
num_tokens, 1)
"""
batch_ranks = []
for sequence_logits, sequence_labels in zip(batch_logits, batch_labels):
batch_ranks += _get_ranks_for_sequence(sequence_logits, sequence_labels)
return np.array(batch_ranks)
def _get_ranks_for_sequence(logits: np.ndarray,
labels: np.ndarray) -> List[float]:
"""Returns ranks for a sequence.
Args:
logits: Logits of a single sequence, dim = (num_tokens, vocab_size).
labels: Target labels of a single sequence, dim = (num_tokens, 1).
Returns:
An array of ranks for tokens in the sequence, dim = (num_tokens, 1).
"""
sequence_ranks = []
for logit, label in zip(logits, labels.astype(int)):
rank = rankdata(-logit, method='min')[label] - 1.0
sequence_ranks.append(rank)
return sequence_ranks
def _get_batch_loss_metrics(batch_logits: np.ndarray,
batch_labels: np.ndarray) -> (float, int):
"""Returns the loss, number of sequences for a batch.
Args:
batch_logits: Logits returned by a seq2seq model, dim = (num_sequences,
num_tokens, vocab_size).
batch_labels: Target labels for the seq2seq model, dim = (num_sequences,
num_tokens, 1).
"""
batch_loss = 0.0
batch_length = len(batch_logits)
for sequence_logits, sequence_labels in zip(batch_logits, batch_labels):
sequence_loss = tf.losses.sparse_categorical_crossentropy(
tf.keras.backend.constant(sequence_labels),
tf.keras.backend.constant(sequence_logits),
from_logits=True)
batch_loss += sequence_loss.numpy().sum()
return batch_loss / batch_length, batch_length
def _get_batch_accuracy_metrics(batch_logits: np.ndarray,
batch_labels: np.ndarray) -> (float, float):
"""Returns the number of correct predictions, total number of predictions for a batch.
Args:
batch_logits: Logits returned by a seq2seq model, dim = (num_sequences,
num_tokens, vocab_size).
batch_labels: Target labels for the seq2seq model, dim = (num_sequences,
num_tokens, 1).
"""
batch_correct_preds = 0.0
batch_total_preds = 0.0
for sequence_logits, sequence_labels in zip(batch_logits, batch_labels):
preds = tf.metrics.sparse_categorical_accuracy(
tf.keras.backend.constant(sequence_labels),
tf.keras.backend.constant(sequence_logits))
batch_correct_preds += preds.numpy().sum()
batch_total_preds += len(sequence_labels)
return batch_correct_preds, batch_total_preds
def create_seq2seq_attacker_data(
attack_input_data: Seq2SeqAttackInputData,
test_fraction: float = 0.25,
balance: bool = True,
privacy_report_metadata: PrivacyReportMetadata = PrivacyReportMetadata()
) -> AttackerData:
"""Prepares Seq2SeqAttackInputData to train ML attackers.
Uses logits and losses to generate ranks and performs a random train-test
split.
Also computes metadata (loss, accuracy) for the model under attack
and populates respective fields of PrivacyReportMetadata.
Args:
attack_input_data: Original Seq2SeqAttackInputData
test_fraction: Fraction of the dataset to include in the test split.
balance: Whether the training and test sets for the membership inference
attacker should have a balanced (roughly equal) number of samples from the
training and test sets used to develop the model under attack.
privacy_report_metadata: the metadata of the model under attack.
Returns:
AttackerData.
"""
attack_input_train, loss_train, accuracy_train = _get_attack_features_and_metadata(
attack_input_data.logits_train, attack_input_data.labels_train)
attack_input_test, loss_test, accuracy_test = _get_attack_features_and_metadata(
attack_input_data.logits_test, attack_input_data.labels_test)
if balance:
min_size = min(len(attack_input_train), len(attack_input_test))
attack_input_train = _sample_multidimensional_array(attack_input_train,
min_size)
attack_input_test = _sample_multidimensional_array(attack_input_test,
min_size)
features_all = np.concatenate((attack_input_train, attack_input_test))
ntrain, ntest = attack_input_train.shape[0], attack_input_test.shape[0]
# Reshape for classifying one-dimensional features
features_all = features_all.reshape(-1, 1)
labels_all = np.concatenate(((np.zeros(ntrain)), (np.ones(ntest))))
# Perform a train-test split
features_train, features_test, \
is_training_labels_train, is_training_labels_test = \
model_selection.train_test_split(
features_all, labels_all, test_size=test_fraction, stratify=labels_all)
# Populate accuracy, loss fields in privacy report metadata
privacy_report_metadata.loss_train = loss_train
privacy_report_metadata.loss_test = loss_test
privacy_report_metadata.accuracy_train = accuracy_train
privacy_report_metadata.accuracy_test = accuracy_test
return AttackerData(features_train, is_training_labels_train, features_test,
is_training_labels_test,
DataSize(ntrain=ntrain, ntest=ntest))
def run_seq2seq_attack(attack_input: Seq2SeqAttackInputData,
privacy_report_metadata: PrivacyReportMetadata = None,
balance_attacker_training: bool = True) -> AttackResults:
"""Runs membership inference attacks on a seq2seq model.
Args:
attack_input: input data for running an attack
privacy_report_metadata: the metadata of the model under attack.
balance_attacker_training: Whether the training and test sets for the
membership inference attacker should have a balanced (roughly equal)
number of samples from the training and test sets used to develop the
model under attack.
Returns:
the attack result.
"""
attack_input.validate()
# The attacker uses the average rank (a single number) of a seq2seq dataset
# record to determine membership. So only Logistic Regression is supported,
# as it makes the most sense for single-number features.
attacker = models.LogisticRegressionAttacker()
# Create attacker data and populate fields of privacy_report_metadata
privacy_report_metadata = privacy_report_metadata or PrivacyReportMetadata()
prepared_attacker_data = create_seq2seq_attacker_data(
attack_input_data=attack_input,
balance=balance_attacker_training,
privacy_report_metadata=privacy_report_metadata)
attacker.train_model(prepared_attacker_data.features_train,
prepared_attacker_data.is_training_labels_train)
# Run the attacker on (permuted) test examples.
predictions_test = attacker.predict(prepared_attacker_data.features_test)
# Generate ROC curves with predictions.
fpr, tpr, thresholds = metrics.roc_curve(
prepared_attacker_data.is_training_labels_test, predictions_test)
roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
attack_results = [
SingleAttackResult(
slice_spec=SingleSliceSpec(),
attack_type=AttackType.LOGISTIC_REGRESSION,
roc_curve=roc_curve,
data_size=prepared_attacker_data.data_size)
]
return AttackResults(
single_attack_results=attack_results,
privacy_report_metadata=privacy_report_metadata)
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia import * # pylint: disable=wildcard-import


@ -13,187 +13,6 @@
# limitations under the License.
# Lint as: python3
"""A hook and a function in tf estimator for membership inference attack."""
"""Moved to privacy_tests/membership_inference_attack."""
import os
from typing import Iterable
from absl import logging
import numpy as np
import tensorflow.compat.v1 as tf
from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.membership_inference_attack.utils import log_loss
from tensorflow_privacy.privacy.membership_inference_attack.utils_tensorboard import write_results_to_tensorboard
def calculate_losses(estimator, input_fn, labels):
"""Get predictions and losses for samples.
The assumptions are 1) the loss is cross-entropy loss, and 2) the user has
specified the prediction mode to return predictions, e.g.,
when mode == tf.estimator.ModeKeys.PREDICT, the model function returns
tf.estimator.EstimatorSpec(mode=mode, predictions=tf.nn.softmax(logits)).
Args:
estimator: model to make prediction
input_fn: input function to be used in estimator.predict
labels: array of size (n_samples, ), true labels of samples (integer valued)
Returns:
preds: probability vector of each sample
loss: cross entropy loss of each sample
"""
pred = np.array(list(estimator.predict(input_fn=input_fn)))
loss = log_loss(labels, pred)
return pred, loss
class MembershipInferenceTrainingHook(tf.estimator.SessionRunHook):
"""Training hook to perform membership inference attack on epoch end."""
def __init__(
self,
estimator,
in_train, out_train,
input_fn_constructor,
slicing_spec: SlicingSpec = None,
attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,),
tensorboard_dir=None,
tensorboard_merge_classifiers=False):
"""Initialize the hook.
Args:
estimator: model to be tested
in_train: (in_training samples, in_training labels)
out_train: (out_training samples, out_training labels)
input_fn_constructor: a function that receives sample, label and construct
the input_fn for model prediction
slicing_spec: slicing specification of the attack
attack_types: a list of attacks, each of type AttackType
tensorboard_dir: directory for tensorboard summary
tensorboard_merge_classifiers: if true, plot different classifiers with
the same slicing_spec and metric in the same figure
"""
in_train_data, self._in_train_labels = in_train
out_train_data, self._out_train_labels = out_train
# Define the input functions for both in and out-training samples.
self._in_train_input_fn = input_fn_constructor(in_train_data,
self._in_train_labels)
self._out_train_input_fn = input_fn_constructor(out_train_data,
self._out_train_labels)
self._estimator = estimator
self._slicing_spec = slicing_spec
self._attack_types = attack_types
self._tensorboard_merge_classifiers = tensorboard_merge_classifiers
if tensorboard_dir:
if tensorboard_merge_classifiers:
self._writers = {}
with tf.Graph().as_default():
for attack_type in attack_types:
self._writers[attack_type.name] = tf.summary.FileWriter(
os.path.join(tensorboard_dir, 'MI', attack_type.name))
else:
with tf.Graph().as_default():
self._writers = tf.summary.FileWriter(
os.path.join(tensorboard_dir, 'MI'))
logging.info('Will write to tensorboard.')
else:
self._writers = None
def end(self, session):
results = run_attack_helper(self._estimator,
self._in_train_input_fn,
self._out_train_input_fn,
self._in_train_labels, self._out_train_labels,
self._slicing_spec,
self._attack_types)
logging.info(results)
att_types, att_slices, att_metrics, att_values = get_flattened_attack_metrics(
results)
print('Attack result:')
print('\n'.join([' %s: %.4f' % (', '.join([s, t, m]), v) for t, s, m, v in
zip(att_types, att_slices, att_metrics, att_values)]))
# Write to tensorboard if tensorboard_dir is specified
global_step = self._estimator.get_variable_value('global_step')
if self._writers is not None:
write_results_to_tensorboard(results, self._writers, global_step,
self._tensorboard_merge_classifiers)
def run_attack_on_tf_estimator_model(
estimator, in_train, out_train,
input_fn_constructor,
slicing_spec: SlicingSpec = None,
attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,)):
"""Performs the attack at the end of training.
Args:
estimator: model to be tested
in_train: (in_training samples, in_training labels)
out_train: (out_training samples, out_training labels)
input_fn_constructor: a function that receives sample, label and construct
the input_fn for model prediction
slicing_spec: slicing specification of the attack
attack_types: a list of attacks, each of type AttackType
Returns:
Results of the attack
"""
in_train_data, in_train_labels = in_train
out_train_data, out_train_labels = out_train
# Define the input functions for both in and out-training samples.
in_train_input_fn = input_fn_constructor(in_train_data, in_train_labels)
out_train_input_fn = input_fn_constructor(out_train_data, out_train_labels)
# Call the helper to run the attack.
results = run_attack_helper(estimator,
in_train_input_fn, out_train_input_fn,
in_train_labels, out_train_labels,
slicing_spec,
attack_types)
logging.info('End of training attack:')
logging.info(results)
return results
def run_attack_helper(
estimator,
in_train_input_fn, out_train_input_fn,
in_train_labels, out_train_labels,
slicing_spec: SlicingSpec = None,
attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,)):
"""A helper function to perform attack.
Args:
estimator: model to be tested
in_train_input_fn: input_fn for in training data
out_train_input_fn: input_fn for out of training data
in_train_labels: in training labels
out_train_labels: out of training labels
slicing_spec: slicing specification of the attack
attack_types: a list of attacks, each of type AttackType
Returns:
Results of the attack
"""
# Compute predictions and losses
in_train_pred, in_train_loss = calculate_losses(estimator,
in_train_input_fn,
in_train_labels)
out_train_pred, out_train_loss = calculate_losses(estimator,
out_train_input_fn,
out_train_labels)
attack_input = AttackInputData(
logits_train=in_train_pred, logits_test=out_train_pred,
labels_train=in_train_labels, labels_test=out_train_labels,
loss_train=in_train_loss, loss_test=out_train_loss
)
results = mia.run_attacks(attack_input,
slicing_spec=slicing_spec,
attack_types=attack_types)
return results
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.tf_estimator_evaluation import * # pylint: disable=wildcard-import


@ -0,0 +1,7 @@
# Privacy tests
A good privacy-preserving model learns from the training data, but
doesn't memorize individual samples. Excessive memorization not only harms
the model's predictive power, but also presents a privacy risk.
This library provides empirical tests for measuring potential memorization.


@ -0,0 +1,269 @@
# Membership inference attack
A good privacy-preserving model learns from the training data, but
doesn't memorize it. This library provides empirical tests for measuring
potential memorization.
Technically, the tests build classifiers that infer whether a particular sample
was present in the training set. The more accurate such a classifier is, the
more memorization is present and thus the less privacy-preserving the model is.
The privacy vulnerability (or memorization potential) is measured
via the area under the ROC-curve (`auc`) or via max{|fpr - tpr|} (`advantage`)
of the attack classifier. These measures are very closely related.
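
To make the two measures concrete, here is a minimal sketch of a loss-threshold attack scored by AUC and advantage. It uses only NumPy and toy loss values. The helper names `roc_points` and `auc_and_advantage` are hypothetical and are not part of the library, which computes these quantities internally. In this sketch a label of 1 means "in the training set", and tied scores are not merged into a single threshold.

```python
import numpy as np


def roc_points(scores, is_member):
    """Return (fpr, tpr) over all thresholds; a higher score means 'more likely a member'."""
    order = np.argsort(-scores, kind="stable")
    hits = is_member[order].astype(float)
    tpr = np.concatenate(([0.0], np.cumsum(hits) / hits.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(1.0 - hits) / (1.0 - hits).sum()))
    return fpr, tpr


def auc_and_advantage(fpr, tpr):
    """Trapezoidal area under the ROC curve, and max |tpr - fpr|."""
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    advantage = float(np.max(np.abs(tpr - fpr)))
    return auc, advantage


# Toy data (assumption): training examples tend to have lower loss.
loss_train = np.array([0.1, 0.2, 0.3, 0.9])
loss_test = np.array([0.4, 0.8, 1.0, 1.2])
scores = -np.concatenate((loss_train, loss_test))  # low loss => high score
is_member = np.concatenate((np.ones(4), np.zeros(4)))

fpr, tpr = roc_points(scores, is_member)
auc, advantage = auc_and_advantage(fpr, tpr)
```

On this toy data the sketch gives an AUC of 0.875 and an advantage of 0.75. An attacker that cannot distinguish members from non-members would sit near an AUC of 0.5 and an advantage of 0.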
The tests provided by the library are "black box". That is, only the outputs of
the model are used (e.g., losses, logits, predictions). Neither model internals
(weights) nor input samples are required.
## How to use
### Installation notes
To use the latest version of the MIA library, please install TF Privacy with
`pip install -U git+https://github.com/tensorflow/privacy`. See
https://github.com/tensorflow/privacy/issues/151 for more details.
### Basic usage
The simplest possible usage is
```python
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
# Suppose we have the labels as integers starting from 0
# labels_train shape: (n_train, )
# labels_test shape: (n_test, )
# Evaluate your model on training and test examples to get
# loss_train shape: (n_train, )
# loss_test shape: (n_test, )
attacks_result = mia.run_attacks(
    AttackInputData(
        loss_train = loss_train,
        loss_test = loss_test,
        labels_train = labels_train,
        labels_test = labels_test))
```
This example calls `run_attacks` with the default options to run a host of
(fairly simple) attacks behind the scenes (depending on which data is fed in),
and computes the most important measures.
> NOTE: The train and test sets are balanced internally, i.e., an equal number
> of in-training and out-of-training examples is chosen for the attacks
> (whichever has fewer examples). These are subsampled uniformly at random
> without replacement from the larger of the two.
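
The balancing described in the note can be sketched as follows. This is an illustration of the idea only, not the library's code; the function name `balance` and its `seed` parameter are assumptions made for the example.

```python
import numpy as np


def balance(loss_train, loss_test, seed=0):
    """Subsamples both arrays to the size of the smaller one.

    Sampling is uniform at random and without replacement.
    """
    rng = np.random.default_rng(seed)
    n = min(len(loss_train), len(loss_test))
    idx_train = rng.choice(len(loss_train), size=n, replace=False)
    idx_test = rng.choice(len(loss_test), size=n, replace=False)
    return loss_train[idx_train], loss_test[idx_test]


loss_train = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
loss_test = np.array([1.0, 1.5, 2.0])
balanced_train, balanced_test = balance(loss_train, loss_test)
```

With equally sized sets, an attacker who always answers "member" is right half of the time, which keeps the AUC and advantage baselines easy to interpret.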
Then, we can view the attack results by:
```python
print(attacks_result.summary())
# Example output:
# -> Best-performing attacks over all slices
# THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved an AUC of 0.59 on slice Entire dataset
# THRESHOLD_ATTACK (with 50000 training and 10000 test examples) achieved an advantage of 0.20 on slice Entire dataset
```
### Other codelabs
Please head over to the [codelabs](https://github.com/tensorflow/privacy/tree/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs)
section for an overview of the library in action.
### Advanced usage
#### Specifying attacks to run
Sometimes, we have more information about the data, such as the logits and the
labels,
and we may want finer-grained control of the attack, such as using more
complicated classifiers instead of the simple threshold attack, or looking at the
attack results by the examples' class.
In those cases, we can provide more information to `run_attacks`.
```python
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
```
First, as before, we specify the input for the attack as an
`AttackInputData` object:
```python
# Evaluate your model on training and test examples to get
# logits_train shape: (n_train, n_classes)
# logits_test shape: (n_test, n_classes)
# loss_train shape: (n_train, )
# loss_test shape: (n_test, )
attack_input = AttackInputData(
    logits_train = logits_train,
    logits_test = logits_test,
    loss_train = loss_train,
    loss_test = loss_test,
    labels_train = labels_train,
    labels_test = labels_test)
```
Instead of `logits`, you can also specify
`probs_train` and `probs_test` as the predicted probability vectors of each
example.
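
If you only have logits and want probability vectors, a numerically stable softmax is one common way to obtain them. The sketch below uses plain NumPy; the helper name `softmax` is ours and is not part of the library API.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax for an array of shape (n_examples, n_classes)."""
    # Subtracting the row maximum avoids overflow in exp().
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


logits_train = np.array([[2.0, 0.0, 0.0],
                         [0.0, 0.0, 0.0]])
probs_train = softmax(logits_train)
```

Pass either logits or probabilities to `AttackInputData`, not both: its `validate()` method raises a `ValueError` when both are set.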
Then, we specify some details of the attack.
The first part includes the specifications of the slicing of the data. For
example, we may want to evaluate the result on the whole dataset, or by class,
percentiles, or the correctness of the model's classification.
These can be specified by a `SlicingSpec` object.
```python
slicing_spec = SlicingSpec(
    entire_dataset = True,
    by_class = True,
    by_percentiles = False,
    by_classification_correctness = True)
```
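
Conceptually, each slice is a subset of the examples selected by a boolean condition. The following sketch shows what slicing by class and by classification correctness amounts to, using NumPy masks on toy data; it illustrates the idea and does not reproduce the library's slicing code.

```python
import numpy as np

# Toy model outputs for four examples and three classes.
logits = np.array([[2.0, 0.1, 0.1],
                   [0.2, 1.5, 0.3],
                   [0.1, 0.4, 2.2],
                   [0.3, 0.2, 1.9]])
labels = np.array([0, 1, 1, 2])
loss = np.array([0.1, 0.3, 2.0, 0.2])

# Slice "CLASS=1": examples whose ground-truth label is 1.
class_1_mask = labels == 1
loss_class_1 = loss[class_1_mask]

# Slices "CORRECTLY_CLASSIFIED=True/False": compare argmax with the label.
predictions = np.argmax(logits, axis=1)
correct_mask = predictions == labels
loss_correct = loss[correct_mask]
loss_incorrect = loss[~correct_mask]
```

Attacks are then run separately on each subset, which is why the per-slice results can differ a lot from the result on the entire dataset.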
The second part specifies the classifiers for the attacker to use.
Currently, our API supports five classifiers, including
`AttackType.THRESHOLD_ATTACK` for simple threshold attack,
`AttackType.LOGISTIC_REGRESSION`,
`AttackType.MULTI_LAYERED_PERCEPTRON`,
`AttackType.RANDOM_FOREST`, and
`AttackType.K_NEAREST_NEIGHBORS`
which use the corresponding machine learning models.
For some models, different classifiers can yield quite different results.
We can put multiple classifiers in a list:
```python
attack_types = [
    AttackType.THRESHOLD_ATTACK,
    AttackType.LOGISTIC_REGRESSION
]
```
Now, we can call the `run_attacks` method with all specifications:
```python
attacks_result = mia.run_attacks(attack_input=attack_input,
                                 slicing_spec=slicing_spec,
                                 attack_types=attack_types)
```
This returns an object of type `AttackResults`. We can, for example, use the
following code to see the attack results per slice, since we have
requested attacks by class and by the model's classification correctness.
```python
print(attacks_result.summary(by_slices = True))
# Example output:
# -> Best-performing attacks over all slices
# THRESHOLD_ATTACK achieved an AUC of 0.75 on slice CORRECTLY_CLASSIFIED=False
# THRESHOLD_ATTACK achieved an advantage of 0.38 on slice CORRECTLY_CLASSIFIED=False
#
# Best-performing attacks over slice: "Entire dataset"
# LOGISTIC_REGRESSION achieved an AUC of 0.61
# THRESHOLD_ATTACK achieved an advantage of 0.22
#
# Best-performing attacks over slice: "CLASS=0"
# LOGISTIC_REGRESSION achieved an AUC of 0.62
# LOGISTIC_REGRESSION achieved an advantage of 0.24
#
# Best-performing attacks over slice: "CLASS=1"
# LOGISTIC_REGRESSION achieved an AUC of 0.61
# LOGISTIC_REGRESSION achieved an advantage of 0.19
#
# ...
#
# Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=True"
# LOGISTIC_REGRESSION achieved an AUC of 0.53
# THRESHOLD_ATTACK achieved an advantage of 0.05
#
# Best-performing attacks over slice: "CORRECTLY_CLASSIFIED=False"
# THRESHOLD_ATTACK achieved an AUC of 0.75
# THRESHOLD_ATTACK achieved an advantage of 0.38
```
#### Viewing and plotting the attack results
We have seen an example of using `summary()` to view the attack results as text.
We also provide some other ways for inspecting the attack results.
To get the attack that achieves the maximum attacker advantage or AUC, we can do
```python
max_auc_attacker = attacks_result.get_result_with_max_auc()
max_advantage_attacker = attacks_result.get_result_with_max_attacker_advantage()
```
Then, for an individual attack, such as `max_auc_attacker`, we can check its type,
attacker advantage and AUC by
```python
print("Attack type with max AUC: %s, AUC of %.2f, Attacker advantage of %.2f" %
      (max_auc_attacker.attack_type,
       max_auc_attacker.roc_curve.get_auc(),
       max_auc_attacker.roc_curve.get_attacker_advantage()))
# Example output:
# -> Attack type with max AUC: THRESHOLD_ATTACK, AUC of 0.75, Attacker advantage of 0.38
```
We can also plot its ROC curve by
```python
import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting as plotting
figure = plotting.plot_roc_curve(max_auc_attacker.roc_curve)
```
which would give a figure like the one below
![roc_fig](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelab_roc_fig.png?raw=true)
Additionally, we provide functionality to convert the attack results into a Pandas
data frame:
```python
import pandas as pd
pd.set_option("display.max_rows", 8, "display.max_columns", None)
print(attacks_result.calculate_pd_dataframe())
# Example output:
#           slice feature slice value attack type  Attacker advantage       AUC
# 0        entire_dataset               threshold            0.216440  0.600630
# 1        entire_dataset                      lr            0.212073  0.612989
# 2                 class           0   threshold            0.226000  0.611669
# 3                 class           0          lr            0.239452  0.624076
# ..                  ...         ...         ...                 ...       ...
# 22  correctly_classfied        True   threshold            0.054907  0.471290
# 23  correctly_classfied        True          lr            0.046986  0.525194
# 24  correctly_classfied       False   threshold            0.379465  0.748138
# 25  correctly_classfied       False          lr            0.370713  0.737148
```
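
Once the results are in a data frame, ordinary Pandas operations apply. As a small sketch, the frame below is built by hand from four rows of the example output above (it is not produced by the library here), and we pick the row with the highest AUC.

```python
import pandas as pd

# Hand-built stand-in for the output of calculate_pd_dataframe(),
# using the column names and four rows from the example output.
df = pd.DataFrame({
    "slice feature": ["entire_dataset", "entire_dataset",
                      "correctly_classfied", "correctly_classfied"],
    "slice value": ["", "", "False", "False"],
    "attack type": ["threshold", "lr", "threshold", "lr"],
    "Attacker advantage": [0.216440, 0.212073, 0.379465, 0.370713],
    "AUC": [0.600630, 0.612989, 0.748138, 0.737148],
})

# Row with the highest AUC across all slices and attack types.
best = df.loc[df["AUC"].idxmax()]
# Highest AUC within each slice feature.
best_auc_per_feature = df.groupby("slice feature")["AUC"].max()
```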
### External guides / press mentions
* [Introductory blog post](https://franziska-boenisch.de/posts/2021/01/membership-inference/)
to the theory and the library by Franziska Boenisch from the Fraunhofer AISEC
institute.
* [Google AI Blog Post](https://ai.googleblog.com/2021/01/google-research-looking-back-at-2020.html#ResponsibleAI)
* [TensorFlow Blog Post](https://blog.tensorflow.org/2020/06/introducing-new-privacy-testing-library.html)
* [VentureBeat article](https://venturebeat.com/2020/06/24/google-releases-experimental-tensorflow-module-that-tests-the-privacy-of-ai-models/)
* [Tech Xplore article](https://techxplore.com/news/2020-06-google-tensorflow-privacy-module.html)
## Contact / Feedback
Fill out this
[Google form](https://docs.google.com/forms/d/1DPwr3_OfMcqAOA6sdelTVjIZhKxMZkXvs94z16UCDa4/edit)
or reach out to us at tf-privacy@google.com and let us know how you're using
this module. We're keen on hearing your stories, feedback, and suggestions!
## Contributing
If you wish to add novel attacks to the attack library, please check our
[guidelines](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/CONTRIBUTING.md).
## Copyright
Copyright 2021 - Google LLC

View file

@ -0,0 +1,13 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View file

@ -2,7 +2,7 @@
## Introductory codelab
The easiest way to get started is to go through [the introductory codelab](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/codelab.ipynb).
The easiest way to get started is to go through [the introductory codelab](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/codelab.ipynb).
This trains a simple image classification model and tests it against a series
of membership inference attacks.
@ -10,18 +10,18 @@ For a more detailed overview of the library, please check the sections below.
## End to end example
As an alternative to the introductory codelab, we also have a standalone
[example.py](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/example.py).
[example.py](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/example.py).
## Sequence to sequence models
If you're interested in sequence to sequence model attacks, please see the
[seq2seq colab](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/third_party/seq2seq_membership_inference/seq2seq_membership_inference_codelab.ipynb).
[seq2seq colab](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/third_party/seq2seq_membership_inference/seq2seq_membership_inference_codelab.ipynb).
## Membership probability score
If you're interested in the membership probability score (also called privacy
risk score) developed by Song and Mittal, please see their
[membership probability codelab](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/membership_probability_codelab.ipynb).
[membership probability codelab](https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/membership_probability_codelab.ipynb).
The accompanying paper is on [arXiv](https://arxiv.org/abs/2003.10595).

View file

@ -53,10 +53,10 @@
"source": [
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" </td>\n",
"</table>"
]
@ -133,7 +133,7 @@
"source": [
"!pip3 install git+https://github.com/tensorflow/privacy\n",
"\n",
"from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia"
"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia"
]
},
{
@ -298,11 +298,11 @@
},
"outputs": [],
"source": [
"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData\n",
"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec\n",
"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType\n",
"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData\n",
"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec\n",
"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType\n",
"\n",
"import tensorflow_privacy.privacy.membership_inference_attack.plotting as plotting\n",
"import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting as plotting\n",
"\n",
"labels_train = np.argmax(y_train, axis=1)\n",
"labels_test = np.argmax(y_test, axis=1)\n",

View file

@ -28,18 +28,17 @@ from sklearn import metrics
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.utils import to_categorical
from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResultsCollection
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import PrivacyMetric
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import \
PrivacyReportMetadata
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
import tensorflow_privacy.privacy.membership_inference_attack.plotting as plotting
import tensorflow_privacy.privacy.membership_inference_attack.privacy_report as privacy_report
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResultsCollection
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyMetric
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyReportMetadata
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting as plotting
import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.privacy_report as privacy_report
def generate_random_cluster(center, scale, num_points):

View file

@ -53,10 +53,10 @@
"source": [
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/membership_probability_codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/membership_probability_codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/membership_probability_codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/membership_probability_codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" </td>\n",
"</table>"
]
@ -133,7 +133,7 @@
"source": [
"!pip3 install git+https://github.com/tensorflow/privacy\n",
"\n",
"from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia"
"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia"
]
},
{
@ -627,11 +627,11 @@
}
],
"source": [
"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData\n",
"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec\n",
"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType\n",
"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData\n",
"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec\n",
"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType\n",
"\n",
"import tensorflow_privacy.privacy.membership_inference_attack.plotting as plotting\n",
"import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting as plotting\n",
"\n",
"labels_train = np.argmax(y_train, axis=1)\n",
"labels_test = np.argmax(y_test, axis=1)\n",
@ -1190,9 +1190,9 @@
}
],
"source": [
"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec\n",
"from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingFeature\n",
"from tensorflow_privacy.privacy.membership_inference_attack.dataset_slicing import get_slice\n",
"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec\n",
"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingFeature\n",
"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing import get_slice\n",
"import matplotlib.pyplot as plt\n",
"class_list = np.arange(10)\n",
"num_images = 5\n",

View file

@ -13,10 +13,10 @@
"source": [
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/seq2seq_membership_inference_codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/seq2seq_membership_inference_codelab.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/membership_inference_attack/codelabs/seq2seq_membership_inference_codelab.ipynb.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/privacy_tests/membership_inference_attack/codelabs/seq2seq_membership_inference_codelab.ipynb.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" </td>\n",
"</table>"
]
@ -106,7 +106,7 @@
"source": [
"!pip3 install git+https://github.com/tensorflow/privacy\n",
"\n",
"from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia"
"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia"
]
},
{
@ -1142,9 +1142,9 @@
}
],
"source": [
"from tensorflow_privacy.privacy.membership_inference_attack.seq2seq_mia import Seq2SeqAttackInputData, \\\n",
"from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia import Seq2SeqAttackInputData, \\\n",
" run_seq2seq_attack\n",
"import tensorflow_privacy.privacy.membership_inference_attack.plotting as plotting\n",
"import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.plotting as plotting\n",
"\n",
"attack_input = Seq2SeqAttackInputData(\n",
" logits_train = logits_train_gen,\n",

View file

@ -0,0 +1,819 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Lint as: python3
"""Data structures representing attack inputs, configuration, outputs."""
import collections
import enum
import glob
import os
import pickle
from typing import Any, Iterable, Union
from dataclasses import dataclass
import numpy as np
import pandas as pd
from scipy import special
from sklearn import metrics
import tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils as utils
ENTIRE_DATASET_SLICE_STR = 'Entire dataset'
class SlicingFeature(enum.Enum):
  """Enum with features by which slicing is available."""
  CLASS = 'class'
  PERCENTILE = 'percentile'
  CORRECTLY_CLASSIFIED = 'correctly_classified'
@dataclass
class SingleSliceSpec:
  """Specifies a slice.

  The slice is defined by values in one feature - it might be a single value
  (eg. slice of examples of the specific classification class) or some set of
  values (eg. range of percentiles of the attacked model loss).

  When feature is None, it means that the slice is the entire dataset.
  """
  feature: SlicingFeature = None
  value: Any = None

  @property
  def entire_dataset(self):
    return self.feature is None

  def __str__(self):
    if self.entire_dataset:
      return ENTIRE_DATASET_SLICE_STR

    if self.feature == SlicingFeature.PERCENTILE:
      return 'Loss percentiles: %d-%d' % self.value

    return '%s=%s' % (self.feature.name, self.value)
@dataclass
class SlicingSpec:
  """Specification of a slicing procedure.

  Each variable which is set specifies a slicing by different dimension.
  """

  # When set to true, one of the slices is the whole dataset.
  entire_dataset: bool = True

  # Used in classification tasks for slicing by classes. It is assumed that
  # classes are integers 0, 1, ... number of classes. When true, one slice per
  # class is generated.
  by_class: Union[bool, Iterable[int], int] = False

  # If true, it generates 10 slices for percentiles of the loss - 0-10%, 10-20%,
  # ... 90-100%.
  by_percentiles: bool = False

  # When true, a slice for correctly classified and a slice for misclassified
  # examples will be generated.
  by_classification_correctness: bool = False

  def __str__(self):
    """Only keeps the True values."""
    result = ['SlicingSpec(']
    if self.entire_dataset:
      result.append(' Entire dataset,')
    if self.by_class:
      # bool is a subclass of int, so it is checked first; otherwise
      # by_class=True would be reported as 'Up to class 1'.
      if isinstance(self.by_class, bool):
        result.append(' By classes,')
      elif isinstance(self.by_class, Iterable):
        result.append(' Into classes %s,' % self.by_class)
      elif isinstance(self.by_class, int):
        result.append(' Up to class %d,' % self.by_class)
      else:
        result.append(' By classes,')
    if self.by_percentiles:
      result.append(' By percentiles,')
    if self.by_classification_correctness:
      result.append(' By classification correctness,')
    result.append(')')
    return '\n'.join(result)
class AttackType(enum.Enum):
  """An enum defining attack types."""
  LOGISTIC_REGRESSION = 'lr'
  MULTI_LAYERED_PERCEPTRON = 'mlp'
  RANDOM_FOREST = 'rf'
  K_NEAREST_NEIGHBORS = 'knn'
  THRESHOLD_ATTACK = 'threshold'
  THRESHOLD_ENTROPY_ATTACK = 'threshold-entropy'

  @property
  def is_trained_attack(self):
    """Returns whether this type of attack requires training a model."""
    return (self != AttackType.THRESHOLD_ATTACK) and (
        self != AttackType.THRESHOLD_ENTROPY_ATTACK)

  def __str__(self):
    """Returns LOGISTIC_REGRESSION instead of AttackType.LOGISTIC_REGRESSION."""
    return '%s' % self.name
class PrivacyMetric(enum.Enum):
  """An enum for the supported privacy risk metrics."""
  AUC = 'AUC'
  ATTACKER_ADVANTAGE = 'Attacker advantage'

  def __str__(self):
    """Returns 'AUC' instead of PrivacyMetric.AUC."""
    return '%s' % self.value
def _is_integer_type_array(a):
  return np.issubdtype(a.dtype, np.integer)


def _is_last_dim_equal(arr1, arr1_name, arr2, arr2_name):
  """Checks whether the last dimension of the arrays is the same."""
  if arr1 is not None and arr2 is not None and arr1.shape[-1] != arr2.shape[-1]:
    raise ValueError('%s and %s should have the same number of features.' %
                     (arr1_name, arr2_name))


def _is_array_one_dimensional(arr, arr_name):
  """Checks whether the array is one dimensional."""
  if arr is not None and len(arr.shape) != 1:
    raise ValueError('%s should be a one dimensional numpy array.' % arr_name)


def _is_np_array(arr, arr_name):
  """Checks whether array is a numpy array."""
  if arr is not None and not isinstance(arr, np.ndarray):
    raise ValueError('%s should be a numpy array.' % arr_name)


def _log_value(probs, small_value=1e-30):
  """Computes the negative log of the probabilities, clipping values near 0."""
  return -np.log(np.maximum(probs, small_value))
@dataclass
class AttackInputData:
  """Input data for running an attack.

  This includes only the data, and not configuration.
  """

  logits_train: np.ndarray = None
  logits_test: np.ndarray = None

  # Predicted probabilities for each class. They can be derived from logits,
  # so they can be set only if logits are not explicitly provided.
  probs_train: np.ndarray = None
  probs_test: np.ndarray = None

  # Contains ground-truth classes. Classes are assumed to be integers starting
  # from 0.
  labels_train: np.ndarray = None
  labels_test: np.ndarray = None

  # Explicitly specified loss. If provided, this is used instead of deriving
  # loss from logits and labels
  loss_train: np.ndarray = None
  loss_test: np.ndarray = None

  # Explicitly specified prediction entropy. If provided, this is used instead
  # of deriving entropy from logits and labels
  # (https://arxiv.org/pdf/2003.10595.pdf by Song and Mittal).
  entropy_train: np.ndarray = None
  entropy_test: np.ndarray = None

  @property
  def num_classes(self):
    if self.labels_train is None or self.labels_test is None:
      raise ValueError(
          'Can\'t identify the number of classes as no labels were provided. '
          'Please set labels_train and labels_test')
    return int(max(np.max(self.labels_train), np.max(self.labels_test))) + 1

  @property
  def logits_or_probs_train(self):
    """Returns train logits or probs, whichever is not None."""
    if self.logits_train is not None:
      return self.logits_train
    return self.probs_train

  @property
  def logits_or_probs_test(self):
    """Returns test logits or probs, whichever is not None."""
    if self.logits_test is not None:
      return self.logits_test
    return self.probs_test
  @staticmethod
  def _get_entropy(logits: np.ndarray, true_labels: np.ndarray):
    """Computes the prediction entropy (by Song and Mittal)."""
    if (np.absolute(np.sum(logits, axis=1) - 1) <= 1e-3).all():
      probs = logits
    else:
      # Using softmax to compute probability from logits.
      probs = special.softmax(logits, axis=1)
    if true_labels is None:
      # When not given ground truth label, we compute the
      # normal prediction entropy.
      # See the Equation (7) in https://arxiv.org/pdf/2003.10595.pdf
      return np.sum(np.multiply(probs, _log_value(probs)), axis=1)
    else:
      # When given the ground truth label, we compute the
      # modified prediction entropy.
      # See the Equation (8) in https://arxiv.org/pdf/2003.10595.pdf
      log_probs = _log_value(probs)
      reverse_probs = 1 - probs
      log_reverse_probs = _log_value(reverse_probs)
      modified_probs = np.copy(probs)
      modified_probs[range(true_labels.size),
                     true_labels] = reverse_probs[range(true_labels.size),
                                                  true_labels]
      modified_log_probs = np.copy(log_reverse_probs)
      modified_log_probs[range(true_labels.size),
                         true_labels] = log_probs[range(true_labels.size),
                                                  true_labels]
      return np.sum(np.multiply(modified_probs, modified_log_probs), axis=1)
  def get_loss_train(self):
    """Calculates (if needed) cross-entropy losses for the training set.

    Returns:
      Loss (or None if neither the loss nor the labels are present).
    """
    if self.loss_train is None:
      if self.labels_train is None:
        return None
      if self.logits_train is not None:
        self.loss_train = utils.log_loss_from_logits(self.labels_train,
                                                     self.logits_train)
      else:
        self.loss_train = utils.log_loss(self.labels_train, self.probs_train)
    return self.loss_train

  def get_loss_test(self):
    """Calculates (if needed) cross-entropy losses for the test set.

    Returns:
      Loss (or None if neither the loss nor the labels are present).
    """
    if self.loss_test is None:
      if self.labels_test is None:
        return None
      if self.logits_test is not None:
        self.loss_test = utils.log_loss_from_logits(self.labels_test,
                                                    self.logits_test)
      else:
        self.loss_test = utils.log_loss(self.labels_test, self.probs_test)
    return self.loss_test

  def get_entropy_train(self):
    """Calculates prediction entropy for the training set."""
    if self.entropy_train is not None:
      return self.entropy_train
    return self._get_entropy(self.logits_train, self.labels_train)

  def get_entropy_test(self):
    """Calculates prediction entropy for the test set."""
    if self.entropy_test is not None:
      return self.entropy_test
    return self._get_entropy(self.logits_test, self.labels_test)

  def get_train_size(self):
    """Returns size of the training set."""
    if self.loss_train is not None:
      return self.loss_train.size
    if self.entropy_train is not None:
      return self.entropy_train.size
    return self.logits_or_probs_train.shape[0]

  def get_test_size(self):
    """Returns size of the test set."""
    if self.loss_test is not None:
      return self.loss_test.size
    if self.entropy_test is not None:
      return self.entropy_test.size
    return self.logits_or_probs_test.shape[0]
  def validate(self):
    """Validates the inputs."""
    if (self.loss_train is None) != (self.loss_test is None):
      raise ValueError(
          'loss_test and loss_train should both be either set or unset')

    if (self.entropy_train is None) != (self.entropy_test is None):
      raise ValueError(
          'entropy_test and entropy_train should both be either set or unset')

    if (self.logits_train is None) != (self.logits_test is None):
      raise ValueError(
          'logits_train and logits_test should both be either set or unset')

    if (self.probs_train is None) != (self.probs_test is None):
      raise ValueError(
          'probs_train and probs_test should both be either set or unset')

    if (self.logits_train is not None) and (self.probs_train is not None):
      raise ValueError('Logits and probs can not be both set')

    if (self.labels_train is None) != (self.labels_test is None):
      raise ValueError(
          'labels_train and labels_test should both be either set or unset')

    if (self.labels_train is None and self.loss_train is None and
        self.logits_train is None and self.entropy_train is None):
      raise ValueError(
          'At least one of labels, logits, losses or entropy should be set')

    if self.labels_train is not None and not _is_integer_type_array(
        self.labels_train):
      raise ValueError('labels_train elements should have integer type')

    if self.labels_test is not None and not _is_integer_type_array(
        self.labels_test):
      raise ValueError('labels_test elements should have integer type')

    _is_np_array(self.logits_train, 'logits_train')
    _is_np_array(self.logits_test, 'logits_test')
    _is_np_array(self.probs_train, 'probs_train')
    _is_np_array(self.probs_test, 'probs_test')
    _is_np_array(self.labels_train, 'labels_train')
    _is_np_array(self.labels_test, 'labels_test')
    _is_np_array(self.loss_train, 'loss_train')
    _is_np_array(self.loss_test, 'loss_test')
    _is_np_array(self.entropy_train, 'entropy_train')
    _is_np_array(self.entropy_test, 'entropy_test')

    _is_last_dim_equal(self.logits_train, 'logits_train', self.logits_test,
                       'logits_test')
    _is_last_dim_equal(self.probs_train, 'probs_train', self.probs_test,
                       'probs_test')
    _is_array_one_dimensional(self.loss_train, 'loss_train')
    _is_array_one_dimensional(self.loss_test, 'loss_test')
    _is_array_one_dimensional(self.entropy_train, 'entropy_train')
    _is_array_one_dimensional(self.entropy_test, 'entropy_test')
    _is_array_one_dimensional(self.labels_train, 'labels_train')
    _is_array_one_dimensional(self.labels_test, 'labels_test')
def __str__(self):
"""Return the shapes of variables that are not None."""
result = ['AttackInputData(']
_append_array_shape(self.loss_train, 'loss_train', result)
_append_array_shape(self.loss_test, 'loss_test', result)
_append_array_shape(self.entropy_train, 'entropy_train', result)
_append_array_shape(self.entropy_test, 'entropy_test', result)
_append_array_shape(self.logits_train, 'logits_train', result)
_append_array_shape(self.logits_test, 'logits_test', result)
_append_array_shape(self.probs_train, 'probs_train', result)
_append_array_shape(self.probs_test, 'probs_test', result)
_append_array_shape(self.labels_train, 'labels_train', result)
_append_array_shape(self.labels_test, 'labels_test', result)
result.append(')')
return '\n'.join(result)
def _append_array_shape(arr: np.array, arr_name: str, result):
if arr is not None:
result.append(' %s with shape: %s,' % (arr_name, arr.shape))
@dataclass
class RocCurve:
"""Represents ROC curve of a membership inference classifier."""
# Thresholds used to define points on ROC curve.
# Thresholds are not explicitly part of the curve, and are stored for
# debugging purposes.
thresholds: np.ndarray
# True positive rates based on thresholds
tpr: np.ndarray
# False positive rates based on thresholds
fpr: np.ndarray
def get_auc(self):
"""Calculates area under curve (aka AUC)."""
return metrics.auc(self.fpr, self.tpr)
def get_attacker_advantage(self):
"""Calculates membership attacker's (or adversary's) advantage.
This metric is inspired by https://arxiv.org/abs/1709.01604, specifically
by Definition 4. The difference here is that we calculate maximum advantage
over all available classifier thresholds.
Returns:
a single float number with membership attacker's advantage.
"""
return max(np.abs(self.tpr - self.fpr))
def __str__(self):
"""Returns AUC and advantage metrics."""
return '\n'.join([
'RocCurve(',
' AUC: %.2f' % self.get_auc(),
' Attacker advantage: %.2f' % self.get_attacker_advantage(), ')'
])
# (no. of training examples, no. of test examples) for the test.
DataSize = collections.namedtuple('DataSize', 'ntrain ntest')
@dataclass
class SingleAttackResult:
"""Results from running a single attack."""
# Data slice this result was calculated for.
slice_spec: SingleSliceSpec
# (no. of training examples, no. of test examples) for the test.
data_size: DataSize
attack_type: AttackType
# NOTE: roc_curve could theoretically be derived from membership scores.
# Currently, we store it explicitly since not all attack types support
# membership scores.
# TODO(b/175870479): Consider deriving ROC curve from the membership scores.
# ROC curve representing the accuracy of the attacker
roc_curve: RocCurve
# Membership score is some measure of confidence of this attacker that
# a particular sample is a member of the training set.
#
# This is NOT necessarily probability. The nature of this score depends on
# the type of attacker. Scores from different attacker types are not directly
# comparable, but can be compared in relative terms (e.g. considering order
# imposed by this measure).
#
# Membership scores for the training set samples. For a perfect attacker,
# all training samples will have higher scores than test samples.
membership_scores_train: np.ndarray = None
# Membership scores for the test set samples. For a perfect attacker, all
# test set samples will have lower scores than the training set samples.
membership_scores_test: np.ndarray = None
def get_attacker_advantage(self):
return self.roc_curve.get_attacker_advantage()
def get_auc(self):
return self.roc_curve.get_auc()
def __str__(self):
"""Returns SliceSpec, AttackType, AUC and advantage metrics."""
return '\n'.join([
'SingleAttackResult(',
' SliceSpec: %s' % str(self.slice_spec),
' DataSize: (ntrain=%d, ntest=%d)' % (self.data_size.ntrain,
self.data_size.ntest),
' AttackType: %s' % str(self.attack_type),
' AUC: %.2f' % self.get_auc(),
' Attacker advantage: %.2f' % self.get_attacker_advantage(), ')'
])
@dataclass
class SingleMembershipProbabilityResult:
"""Results from computing membership probabilities.
Membership probabilities are denoted as the privacy risk score in
https://arxiv.org/abs/2003.10595. This class shows how to leverage them to
perform attacks by thresholding on them.
"""
# Data slice this result was calculated for.
slice_spec: SingleSliceSpec
train_membership_probs: np.ndarray
test_membership_probs: np.ndarray
def attack_with_varied_thresholds(self, threshold_list):
"""Performs an attack with the specified thresholds.
For each threshold value, we count how many training and test samples have
membership probabilities larger than the threshold and then compute the
precision and recall values. We skip the threshold value if it is larger
than every sample's membership probability.
Args:
threshold_list: List of provided thresholds
Returns:
A tuple of three arrays: the thresholds that were not skipped, and the
precision and recall values at each of them.
"""
fpr, tpr, thresholds = metrics.roc_curve(
np.concatenate((np.ones(len(self.train_membership_probs)),
np.zeros(len(self.test_membership_probs)))),
np.concatenate(
(self.train_membership_probs, self.test_membership_probs)),
drop_intermediate=False)
precision_list = []
recall_list = []
meaningful_threshold_list = []
max_prob = max(self.train_membership_probs.max(),
self.test_membership_probs.max())
for threshold in threshold_list:
if threshold <= max_prob:
idx = np.argwhere(thresholds >= threshold)[-1][0]
meaningful_threshold_list.append(threshold)
precision_list.append(tpr[idx] / (tpr[idx] + fpr[idx]))
recall_list.append(tpr[idx])
return np.array(meaningful_threshold_list), np.array(
precision_list), np.array(recall_list)
def collect_results(self, threshold_list, return_roc_results=True):
"""Summarizes attacks that threshold on membership probabilities.
The membership probability (from 0 to 1) represents each sample's
probability of being in the training set. Usually, we choose a list of
threshold values from 0.5 (uncertain of training or test) to 1 (100%
certain of training) to compute the corresponding attack precision and
recall.
Args:
threshold_list: List of provided thresholds
return_roc_results: Whether to return ROC results
Returns:
A list of summary lines.
"""
meaningful_threshold_list, precision_list, recall_list = self.attack_with_varied_thresholds(
threshold_list)
summary = []
summary.append('\nMembership probability analysis over slice: \"%s\"' %
str(self.slice_spec))
for i in range(len(meaningful_threshold_list)):
summary.append(
' with %.4f as the threshold on membership probability, the precision-recall pair is (%.4f, %.4f)'
% (meaningful_threshold_list[i], precision_list[i], recall_list[i]))
if return_roc_results:
fpr, tpr, thresholds = metrics.roc_curve(
np.concatenate((np.ones(len(self.train_membership_probs)),
np.zeros(len(self.test_membership_probs)))),
np.concatenate(
(self.train_membership_probs, self.test_membership_probs)))
roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
summary.append(
' thresholding on membership probability achieved an AUC of %.2f' %
(roc_curve.get_auc()))
summary.append(
' thresholding on membership probability achieved an advantage of %.2f'
% (roc_curve.get_attacker_advantage()))
return summary
@dataclass
class MembershipProbabilityResults:
"""Membership probability results from multiple data slices."""
membership_prob_results: Iterable[SingleMembershipProbabilityResult]
def summary(self, threshold_list):
"""Returns the summary of membership probability analyses on all slices."""
summary = []
for single_result in self.membership_prob_results:
single_summary = single_result.collect_results(threshold_list)
summary.extend(single_summary)
return '\n'.join(summary)
@dataclass
class PrivacyReportMetadata:
"""Metadata about the evaluated model.
Used to create a privacy report based on AttackResults.
"""
accuracy_train: float = None
accuracy_test: float = None
loss_train: float = None
loss_test: float = None
model_variant_label: str = 'Default model variant'
epoch_num: int = None
class AttackResultsDFColumns(enum.Enum):
"""Columns for the Pandas DataFrame that stores AttackResults metrics."""
SLICE_FEATURE = 'slice feature'
SLICE_VALUE = 'slice value'
DATA_SIZE_TRAIN = 'train size'
DATA_SIZE_TEST = 'test size'
ATTACK_TYPE = 'attack type'
def __str__(self):
"""Returns 'slice value' instead of AttackResultsDFColumns.SLICE_VALUE."""
return '%s' % self.value
@dataclass
class AttackResults:
"""Results from running multiple attacks."""
single_attack_results: Iterable[SingleAttackResult]
privacy_report_metadata: PrivacyReportMetadata = None
def calculate_pd_dataframe(self):
"""Returns all metrics as a Pandas DataFrame."""
slice_features = []
slice_values = []
data_size_train = []
data_size_test = []
attack_types = []
advantages = []
aucs = []
for attack_result in self.single_attack_results:
slice_spec = attack_result.slice_spec
if slice_spec.entire_dataset:
slice_feature, slice_value = str(slice_spec), ''
else:
slice_feature, slice_value = slice_spec.feature.value, slice_spec.value
slice_features.append(str(slice_feature))
slice_values.append(str(slice_value))
data_size_train.append(attack_result.data_size.ntrain)
data_size_test.append(attack_result.data_size.ntest)
attack_types.append(str(attack_result.attack_type))
advantages.append(float(attack_result.get_attacker_advantage()))
aucs.append(float(attack_result.get_auc()))
df = pd.DataFrame({
str(AttackResultsDFColumns.SLICE_FEATURE): slice_features,
str(AttackResultsDFColumns.SLICE_VALUE): slice_values,
str(AttackResultsDFColumns.DATA_SIZE_TRAIN): data_size_train,
str(AttackResultsDFColumns.DATA_SIZE_TEST): data_size_test,
str(AttackResultsDFColumns.ATTACK_TYPE): attack_types,
str(PrivacyMetric.ATTACKER_ADVANTAGE): advantages,
str(PrivacyMetric.AUC): aucs
})
return df
def summary(self, by_slices=False) -> str:
"""Provides a summary of the metrics.
The summary provides the best-performing attacks for each requested data
slice.
Args:
by_slices : whether to prepare a per-slice summary.
Returns:
A string with a summary of all the metrics.
"""
summary = []
# Summary over all slices
max_auc_result_all = self.get_result_with_max_auc()
summary.append('Best-performing attacks over all slices')
summary.append(
' %s (with %d training and %d test examples) achieved an AUC of %.2f on slice %s'
% (max_auc_result_all.attack_type,
max_auc_result_all.data_size.ntrain,
max_auc_result_all.data_size.ntest,
max_auc_result_all.get_auc(),
max_auc_result_all.slice_spec))
max_advantage_result_all = self.get_result_with_max_attacker_advantage()
summary.append(
' %s (with %d training and %d test examples) achieved an advantage of %.2f on slice %s'
% (max_advantage_result_all.attack_type,
max_advantage_result_all.data_size.ntrain,
max_advantage_result_all.data_size.ntest,
max_advantage_result_all.get_attacker_advantage(),
max_advantage_result_all.slice_spec))
slice_dict = self._group_results_by_slice()
if by_slices and len(slice_dict.keys()) > 1:
for slice_str in slice_dict:
results = slice_dict[slice_str]
summary.append('\nBest-performing attacks over slice: \"%s\"' %
slice_str)
max_auc_result = results.get_result_with_max_auc()
summary.append(
' %s (with %d training and %d test examples) achieved an AUC of %.2f'
% (max_auc_result.attack_type,
max_auc_result.data_size.ntrain,
max_auc_result.data_size.ntest,
max_auc_result.get_auc()))
max_advantage_result = results.get_result_with_max_attacker_advantage()
summary.append(
' %s (with %d training and %d test examples) achieved an advantage of %.2f'
% (max_advantage_result.attack_type,
max_advantage_result.data_size.ntrain,
max_advantage_result.data_size.ntest,
max_advantage_result.get_attacker_advantage()))
return '\n'.join(summary)
def _group_results_by_slice(self):
"""Groups AttackResults into a dictionary keyed by the slice."""
slice_dict = {}
for attack_result in self.single_attack_results:
slice_str = str(attack_result.slice_spec)
if slice_str not in slice_dict:
slice_dict[slice_str] = AttackResults([])
slice_dict[slice_str].single_attack_results.append(attack_result)
return slice_dict
def get_result_with_max_auc(self) -> SingleAttackResult:
"""Get the result with maximum AUC for all attacks and slices."""
aucs = [result.get_auc() for result in self.single_attack_results]
if min(aucs) < 0.4:
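# Adjacent literals are joined before '%' is applied, so the value is
# formatted into the whole message.
print('Suspiciously low AUC detected: %.2f. '
'There might be a bug in the classifier' % min(aucs))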
return self.single_attack_results[np.argmax(aucs)]
def get_result_with_max_attacker_advantage(self) -> SingleAttackResult:
"""Get the result with maximum advantage for all attacks and slices."""
return self.single_attack_results[np.argmax([
result.get_attacker_advantage() for result in self.single_attack_results
])]
def save(self, filepath):
"""Saves self to a pickle file."""
with open(filepath, 'wb') as out:
pickle.dump(self, out)
@classmethod
def load(cls, filepath):
"""Loads AttackResults from a pickle file."""
with open(filepath, 'rb') as inp:
return pickle.load(inp)
@dataclass
class AttackResultsCollection:
"""A collection of AttackResults."""
attack_results_list: Iterable[AttackResults]
def append(self, attack_results: AttackResults):
self.attack_results_list.append(attack_results)
def save(self, dirname):
"""Saves each AttackResults in the collection to its own pickle file."""
for i, attack_results in enumerate(self.attack_results_list):
filepath = os.path.join(dirname,
_get_attack_results_filename(attack_results, i))
attack_results.save(filepath)
@classmethod
def load(cls, dirname):
"""Loads AttackResultsCollection from all files in a directory."""
loaded_collection = AttackResultsCollection([])
for filepath in sorted(glob.glob('%s/*' % dirname)):
with open(filepath, 'rb') as inp:
loaded_collection.attack_results_list.append(pickle.load(inp))
return loaded_collection
def _get_attack_results_filename(attack_results: AttackResults, index: int):
"""Creates a filename for a specific set of AttackResults."""
metadata = attack_results.privacy_report_metadata
if metadata is not None:
return '%s_%s_epoch_%s.pickle' % (metadata.model_variant_label, index,
metadata.epoch_num)
return '%s.pickle' % index
def get_flattened_attack_metrics(results: AttackResults):
"""Get flattened attack metrics.
Args:
results: membership inference attack results.
Returns:
types: a list of attack types
slices: a list of slices
attack_metrics: a list of metric names
values: a list of metric values; the i-th element corresponds to
attack_metrics[i]
"""
types = []
slices = []
attack_metrics = []
values = []
for attack_result in results.single_attack_results:
types += [str(attack_result.attack_type)] * 2
slices += [str(attack_result.slice_spec)] * 2
attack_metrics += ['adv', 'auc']
values += [float(attack_result.get_attacker_advantage()),
float(attack_result.get_auc())]
return types, slices, attack_metrics, values
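
The two metrics reported throughout this file, AUC and attacker advantage, can be reproduced with NumPy alone. The sketch below is only an illustration under stated assumptions, not the library's implementation (which relies on `metrics.roc_curve` and `metrics.auc`); the helper names `roc_points`, `auc` and `attacker_advantage` are hypothetical, and higher scores are assumed to mean "more likely a training member".

```python
import numpy as np


def roc_points(scores_train, scores_test):
    """Returns (fpr, tpr) for thresholds on membership scores (high = member)."""
    scores_train = np.asarray(scores_train, dtype=float)
    scores_test = np.asarray(scores_test, dtype=float)
    scores = np.concatenate((scores_train, scores_test))
    is_member = np.concatenate(
        (np.ones(scores_train.size), np.zeros(scores_test.size)))

    # Sort by descending score: every prefix is a set predicted as "member".
    order = np.argsort(-scores, kind='stable')
    sorted_scores = scores[order]
    is_member = is_member[order]

    # Tied scores cannot be separated by a threshold, so keep only the last
    # index of each run of equal scores.
    last_of_run = np.r_[np.where(np.diff(sorted_scores))[0], scores.size - 1]
    true_positives = np.cumsum(is_member)[last_of_run]
    false_positives = np.cumsum(1 - is_member)[last_of_run]

    # Prepend the (0, 0) point, i.e. a threshold above every score.
    tpr = np.r_[0.0, true_positives / scores_train.size]
    fpr = np.r_[0.0, false_positives / scores_test.size]
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve using the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def attacker_advantage(fpr, tpr):
    """Maximum |TPR - FPR| over all thresholds, as in RocCurve above."""
    return float(np.max(np.abs(tpr - fpr)))


# Partially separable scores: one test sample outranks one training sample.
fpr, tpr = roc_points([0.9, 0.4], [0.6, 0.1])
print('AUC: %.2f' % auc(fpr, tpr))
print('Attacker advantage: %.2f' % attacker_advantage(fpr, tpr))
```

An attacker that cannot distinguish the two sets at all gives an AUC of 0.5 and an advantage of 0, which is why both numbers are read relative to those baselines rather than to 0 and 1.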

@ -13,25 +13,25 @@
# limitations under the License.
# Lint as: python3
"""Tests for tensorflow_privacy.privacy.membership_inference_attack.data_structures."""
"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures."""
import os
import tempfile
from absl.testing import absltest
from absl.testing import parameterized
import numpy as np
import pandas as pd
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import _log_value
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResultsCollection
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import DataSize
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import PrivacyReportMetadata
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import RocCurve
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleAttackResult
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleMembershipProbabilityResult
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingFeature
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import _log_value
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResultsCollection
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import DataSize
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyReportMetadata
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import RocCurve
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleAttackResult
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleMembershipProbabilityResult
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingFeature
class SingleSliceSpecTest(parameterized.TestCase):

@ -0,0 +1,148 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Lint as: python3
"""Specifying and creating AttackInputData slices."""
import collections.abc
import copy
from typing import List
import numpy as np
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingFeature
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
def _slice_if_not_none(a, idx):
return None if a is None else a[idx]
def _slice_data_by_indices(data: AttackInputData, idx_train,
idx_test) -> AttackInputData:
"""Slices train fields with idx_train and test fields with idx_test."""
result = AttackInputData()
# Slice train data.
result.logits_train = _slice_if_not_none(data.logits_train, idx_train)
result.probs_train = _slice_if_not_none(data.probs_train, idx_train)
result.labels_train = _slice_if_not_none(data.labels_train, idx_train)
result.loss_train = _slice_if_not_none(data.loss_train, idx_train)
result.entropy_train = _slice_if_not_none(data.entropy_train, idx_train)
# Slice test data.
result.logits_test = _slice_if_not_none(data.logits_test, idx_test)
result.probs_test = _slice_if_not_none(data.probs_test, idx_test)
result.labels_test = _slice_if_not_none(data.labels_test, idx_test)
result.loss_test = _slice_if_not_none(data.loss_test, idx_test)
result.entropy_test = _slice_if_not_none(data.entropy_test, idx_test)
return result
def _slice_by_class(data: AttackInputData, class_value: int) -> AttackInputData:
idx_train = data.labels_train == class_value
idx_test = data.labels_test == class_value
return _slice_data_by_indices(data, idx_train, idx_test)
def _slice_by_percentiles(data: AttackInputData, from_percentile: float,
to_percentile: float):
"""Slices samples by loss percentiles."""
# Find from_percentile and to_percentile percentiles in losses.
loss_train = data.get_loss_train()
loss_test = data.get_loss_test()
losses = np.concatenate((loss_train, loss_test))
from_loss = np.percentile(losses, from_percentile)
to_loss = np.percentile(losses, to_percentile)
idx_train = (from_loss <= loss_train) & (loss_train <= to_loss)
idx_test = (from_loss <= loss_test) & (loss_test <= to_loss)
return _slice_data_by_indices(data, idx_train, idx_test)
def _indices_by_classification(logits_or_probs, labels, correctly_classified):
idx_correct = labels == np.argmax(logits_or_probs, axis=1)
return idx_correct if correctly_classified else np.invert(idx_correct)
def _slice_by_classification_correctness(data: AttackInputData,
correctly_classified: bool):
idx_train = _indices_by_classification(data.logits_or_probs_train,
data.labels_train,
correctly_classified)
idx_test = _indices_by_classification(data.logits_or_probs_test,
data.labels_test, correctly_classified)
return _slice_data_by_indices(data, idx_train, idx_test)
def get_single_slice_specs(slicing_spec: SlicingSpec,
num_classes: int = None) -> List[SingleSliceSpec]:
"""Returns slices of data according to slicing_spec."""
result = []
if slicing_spec.entire_dataset:
result.append(SingleSliceSpec())
# Create slices by class.
by_class = slicing_spec.by_class
if isinstance(by_class, bool):
if by_class:
assert num_classes, "When by_class == True, num_classes should be given."
assert 0 <= num_classes <= 1000, (
f"Too many classes for slicing by classes. "
f"Found {num_classes}.")
for c in range(num_classes):
result.append(SingleSliceSpec(SlicingFeature.CLASS, c))
elif isinstance(by_class, int):
result.append(SingleSliceSpec(SlicingFeature.CLASS, by_class))
elif isinstance(by_class, collections.abc.Iterable):
for c in by_class:
result.append(SingleSliceSpec(SlicingFeature.CLASS, c))
# Create slices by percentiles
if slicing_spec.by_percentiles:
for percent in range(0, 100, 10):
result.append(
SingleSliceSpec(SlicingFeature.PERCENTILE, (percent, percent + 10)))
# Create slices by correctness of the classifications.
if slicing_spec.by_classification_correctness:
result.append(SingleSliceSpec(SlicingFeature.CORRECTLY_CLASSIFIED, True))
result.append(SingleSliceSpec(SlicingFeature.CORRECTLY_CLASSIFIED, False))
return result
def get_slice(data: AttackInputData,
slice_spec: SingleSliceSpec) -> AttackInputData:
"""Returns a single slice of data according to slice_spec."""
if slice_spec.entire_dataset:
data_slice = copy.copy(data)
elif slice_spec.feature == SlicingFeature.CLASS:
data_slice = _slice_by_class(data, slice_spec.value)
elif slice_spec.feature == SlicingFeature.PERCENTILE:
from_percentile, to_percentile = slice_spec.value
data_slice = _slice_by_percentiles(data, from_percentile, to_percentile)
elif slice_spec.feature == SlicingFeature.CORRECTLY_CLASSIFIED:
data_slice = _slice_by_classification_correctness(data, slice_spec.value)
else:
raise ValueError('Unknown slice spec feature "%s"' % slice_spec.feature)
data_slice.slice_spec = slice_spec
return data_slice
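
The percentile slicing above pools train and test losses, finds the two boundary losses, and builds one boolean mask per set. A minimal NumPy sketch of that step follows; the function name `percentile_masks` and the sample losses are made up for illustration and are not part of the library.

```python
import numpy as np


def percentile_masks(loss_train, loss_test, from_percentile, to_percentile):
    """Boolean masks selecting samples whose loss lies in a percentile band."""
    # Percentiles are taken over the pooled losses, so both sets share the
    # same loss boundaries.
    losses = np.concatenate((loss_train, loss_test))
    from_loss = np.percentile(losses, from_percentile)
    to_loss = np.percentile(losses, to_percentile)
    # Both comparisons are inclusive.
    idx_train = (from_loss <= loss_train) & (loss_train <= to_loss)
    idx_test = (from_loss <= loss_test) & (loss_test <= to_loss)
    return idx_train, idx_test


# Hypothetical losses: training samples have lower loss than test samples.
loss_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
loss_test = np.array([6.0, 7.0, 8.0, 9.0, 10.0])

low_train, low_test = percentile_masks(loss_train, loss_test, 0, 50)
print(int(low_train.sum()), int(low_test.sum()))
```

Because both bounds are inclusive, a sample whose loss equals a boundary value satisfies the condition for two adjacent bands. A slice can also end up containing samples from only one of the two sets, as in the example, which is worth keeping in mind when reading per-slice results.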

@ -13,17 +13,17 @@
# limitations under the License.
# Lint as: python3
"""Tests for tensorflow_privacy.privacy.membership_inference_attack.dataset_slicing."""
"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing."""
from absl.testing import absltest
import numpy as np
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingFeature
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.membership_inference_attack.dataset_slicing import get_single_slice_specs
from tensorflow_privacy.privacy.membership_inference_attack.dataset_slicing import get_slice
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingFeature
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing import get_single_slice_specs
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing import get_slice
def _are_all_fields_equal(lhs, rhs) -> bool:

@ -0,0 +1,141 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Lint as: python3
"""A callback and a function in keras for membership inference attack."""
import os
from typing import Iterable
from absl import logging
import tensorflow as tf
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils import log_loss
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils_tensorboard import write_results_to_tensorboard_tf2 as write_results_to_tensorboard
def calculate_losses(model, data, labels):
"""Calculate losses of model prediction on data, provided true labels.
Args:
model: model to make prediction
data: samples
labels: true labels of samples (integer valued)
Returns:
pred: probability vector of each sample
loss: cross entropy loss of each sample
"""
pred = model.predict(data)
loss = log_loss(labels, pred)
return pred, loss
class MembershipInferenceCallback(tf.keras.callbacks.Callback):
"""Callback to perform membership inference attack on epoch end."""
def __init__(
self,
in_train, out_train,
slicing_spec: SlicingSpec = None,
attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,),
tensorboard_dir=None,
tensorboard_merge_classifiers=False):
"""Initializes the callback.
Args:
in_train: (in_training samples, in_training labels)
out_train: (out_training samples, out_training labels)
slicing_spec: slicing specification of the attack
attack_types: a list of attacks, each of type AttackType
tensorboard_dir: directory for tensorboard summary
tensorboard_merge_classifiers: if true, plot different classifiers with
the same slicing_spec and metric in the same figure
"""
self._in_train_data, self._in_train_labels = in_train
self._out_train_data, self._out_train_labels = out_train
self._slicing_spec = slicing_spec
self._attack_types = attack_types
self._tensorboard_merge_classifiers = tensorboard_merge_classifiers
if tensorboard_dir:
if tensorboard_merge_classifiers:
self._writers = {}
for attack_type in attack_types:
self._writers[attack_type.name] = tf.summary.create_file_writer(
os.path.join(tensorboard_dir, 'MI', attack_type.name))
else:
self._writers = tf.summary.create_file_writer(
os.path.join(tensorboard_dir, 'MI'))
logging.info('Will write to tensorboard.')
else:
self._writers = None
def on_epoch_end(self, epoch, logs=None):
results = run_attack_on_keras_model(
self.model,
(self._in_train_data, self._in_train_labels),
(self._out_train_data, self._out_train_labels),
self._slicing_spec,
self._attack_types)
logging.info(results)
att_types, att_slices, att_metrics, att_values = get_flattened_attack_metrics(
results)
print('Attack result:')
print('\n'.join([' %s: %.4f' % (', '.join([s, t, m]), v) for t, s, m, v in
zip(att_types, att_slices, att_metrics, att_values)]))
# Write to tensorboard if tensorboard_dir is specified
if self._writers is not None:
write_results_to_tensorboard(results, self._writers, epoch,
self._tensorboard_merge_classifiers)
def run_attack_on_keras_model(
model, in_train, out_train,
slicing_spec: SlicingSpec = None,
attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,)):
"""Performs the attack on a trained model.
Args:
model: model to be tested
in_train: a (in_training samples, in_training labels) tuple
out_train: a (out_training samples, out_training labels) tuple
slicing_spec: slicing specification of the attack
attack_types: a list of attacks, each of type AttackType
Returns:
Results of the attack
"""
in_train_data, in_train_labels = in_train
out_train_data, out_train_labels = out_train
# Compute predictions and losses
in_train_pred, in_train_loss = calculate_losses(model, in_train_data,
in_train_labels)
out_train_pred, out_train_loss = calculate_losses(model, out_train_data,
out_train_labels)
attack_input = AttackInputData(
logits_train=in_train_pred, logits_test=out_train_pred,
labels_train=in_train_labels, labels_test=out_train_labels,
loss_train=in_train_loss, loss_test=out_train_loss
)
results = mia.run_attacks(attack_input,
slicing_spec=slicing_spec,
attack_types=attack_types)
return results
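
`calculate_losses` turns model predictions and integer labels into one loss value per sample, which is what the threshold attack consumes. The `log_loss` helper from `utils` is not shown in this part of the change, so the sketch below is an assumption about its behaviour: it takes the negative log of the probability assigned to the true class. The name `per_example_log_loss` and the `small_value` clip are hypothetical.

```python
import numpy as np


def per_example_log_loss(labels, probs, small_value=1e-8):
    """Cross-entropy of each sample given integer labels and class probabilities.

    Assumption: `probs` has shape (num_samples, num_classes) and rows sum to 1.
    `small_value` is an illustrative clip that avoids log(0).
    """
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=float)
    # Pick, for every row, the probability of that row's true class.
    true_class_probs = probs[np.arange(labels.size), labels]
    return -np.log(np.maximum(true_class_probs, small_value))


labels = np.array([0, 1, 1])
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
losses = per_example_log_loss(labels, probs)
print(losses)
```

The result is a one-dimensional array with one entry per sample, matching the shape that `AttackInputData.validate` expects for `loss_train` and `loss_test`.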

@ -20,11 +20,11 @@ from absl import flags
import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.membership_inference_attack.keras_evaluation import MembershipInferenceCallback
from tensorflow_privacy.privacy.membership_inference_attack.keras_evaluation import run_attack_on_keras_model
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.keras_evaluation import MembershipInferenceCallback
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.keras_evaluation import run_attack_on_keras_model
FLAGS = flags.FLAGS

@ -13,17 +13,17 @@
# limitations under the License.
# Lint as: python3
"""Tests for tensorflow_privacy.privacy.membership_inference_attack.keras_evaluation."""
"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.keras_evaluation."""
from absl.testing import absltest
import numpy as np
import tensorflow.compat.v1 as tf
from tensorflow_privacy.privacy.membership_inference_attack import keras_evaluation
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import keras_evaluation
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics
class UtilsTest(absltest.TestCase):

@ -0,0 +1,332 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Lint as: python3
"""Code that runs membership inference attacks based on the model outputs.
This file belongs to the new API for membership inference attacks. This file
will be renamed to membership_inference_attack.py after the old API is removed.
"""
from typing import Iterable
import numpy as np
from sklearn import metrics
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import models
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import DataSize
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import MembershipProbabilityResults
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyReportMetadata
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import RocCurve
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleAttackResult
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleMembershipProbabilityResult
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing import get_single_slice_specs
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.dataset_slicing import get_slice
def _get_slice_spec(data: AttackInputData) -> SingleSliceSpec:
if hasattr(data, 'slice_spec'):
return data.slice_spec
return SingleSliceSpec()
def _run_trained_attack(attack_input: AttackInputData,
attack_type: AttackType,
balance_attacker_training: bool = True):
"""Classification attack done by ML models."""
attacker = None
if attack_type == AttackType.LOGISTIC_REGRESSION:
attacker = models.LogisticRegressionAttacker()
elif attack_type == AttackType.MULTI_LAYERED_PERCEPTRON:
attacker = models.MultilayerPerceptronAttacker()
elif attack_type == AttackType.RANDOM_FOREST:
attacker = models.RandomForestAttacker()
elif attack_type == AttackType.K_NEAREST_NEIGHBORS:
attacker = models.KNearestNeighborsAttacker()
else:
raise NotImplementedError('Attack type %s not implemented yet.' %
attack_type)
prepared_attacker_data = models.create_attacker_data(
attack_input, balance=balance_attacker_training)
attacker.train_model(prepared_attacker_data.features_train,
prepared_attacker_data.is_training_labels_train)
# Run the attacker on (permuted) test examples.
predictions_test = attacker.predict(prepared_attacker_data.features_test)
# Generate ROC curves with predictions.
fpr, tpr, thresholds = metrics.roc_curve(
prepared_attacker_data.is_training_labels_test, predictions_test)
roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
# NOTE: In the current setup we can't obtain membership scores for all
# samples, since some of them were used to train the attacker. This can be
# fixed by training several attackers to ensure each sample was left out
# in exactly one attacker (basically, this means performing cross-validation).
# TODO(b/175870479): Implement membership scores for predicted attackers.
return SingleAttackResult(
slice_spec=_get_slice_spec(attack_input),
data_size=prepared_attacker_data.data_size,
attack_type=attack_type,
roc_curve=roc_curve)
def _run_threshold_attack(attack_input: AttackInputData):
"""Runs a threshold attack on loss."""
ntrain, ntest = attack_input.get_train_size(), attack_input.get_test_size()
loss_train = attack_input.get_loss_train()
loss_test = attack_input.get_loss_test()
if loss_train is None or loss_test is None:
raise ValueError('Not possible to run threshold attack without losses.')
fpr, tpr, thresholds = metrics.roc_curve(
np.concatenate((np.zeros(ntrain), np.ones(ntest))),
np.concatenate((loss_train, loss_test)))
roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
return SingleAttackResult(
slice_spec=_get_slice_spec(attack_input),
data_size=DataSize(ntrain=ntrain, ntest=ntest),
attack_type=AttackType.THRESHOLD_ATTACK,
membership_scores_train=-attack_input.get_loss_train(),
membership_scores_test=-attack_input.get_loss_test(),
roc_curve=roc_curve)
def _run_threshold_entropy_attack(attack_input: AttackInputData):
"""Runs a threshold attack on prediction entropy."""
ntrain, ntest = attack_input.get_train_size(), attack_input.get_test_size()
fpr, tpr, thresholds = metrics.roc_curve(
np.concatenate((np.zeros(ntrain), np.ones(ntest))),
np.concatenate(
(attack_input.get_entropy_train(), attack_input.get_entropy_test())))
roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
return SingleAttackResult(
slice_spec=_get_slice_spec(attack_input),
data_size=DataSize(ntrain=ntrain, ntest=ntest),
attack_type=AttackType.THRESHOLD_ENTROPY_ATTACK,
membership_scores_train=-attack_input.get_entropy_train(),
membership_scores_test=-attack_input.get_entropy_test(),
roc_curve=roc_curve)
def _run_attack(attack_input: AttackInputData,
attack_type: AttackType,
balance_attacker_training: bool = True,
min_num_samples: int = 1):
"""Runs membership inference attacks for specified input and type.
Args:
attack_input: input data for running an attack
attack_type: the attack to run
balance_attacker_training: Whether the training and test sets for the
membership inference attacker should have a balanced (roughly equal)
number of samples from the training and test sets used to develop
the model under attack.
min_num_samples: minimum number of examples in either training or test data.
Returns:
the attack result.
"""
attack_input.validate()
if min(attack_input.get_train_size(),
attack_input.get_test_size()) < min_num_samples:
return None
if attack_type.is_trained_attack:
return _run_trained_attack(attack_input, attack_type,
balance_attacker_training)
if attack_type == AttackType.THRESHOLD_ENTROPY_ATTACK:
return _run_threshold_entropy_attack(attack_input)
return _run_threshold_attack(attack_input)
def run_attacks(attack_input: AttackInputData,
slicing_spec: SlicingSpec = None,
attack_types: Iterable[AttackType] = (
AttackType.THRESHOLD_ATTACK,),
privacy_report_metadata: PrivacyReportMetadata = None,
balance_attacker_training: bool = True,
min_num_samples: int = 1) -> AttackResults:
"""Runs membership inference attacks on a classification model.
It runs attacks specified by attack_types on each attack_input slice which is
specified by slicing_spec.
Args:
attack_input: input data for running an attack
slicing_spec: specifies attack_input slices to run attack on
attack_types: attacks to run
privacy_report_metadata: the metadata of the model under attack.
balance_attacker_training: Whether the training and test sets for the
membership inference attacker should have a balanced (roughly equal)
number of samples from the training and test sets used to develop
the model under attack.
min_num_samples: minimum number of examples in either training or test data.
Returns:
the attack result.
"""
attack_input.validate()
attack_results = []
if slicing_spec is None:
slicing_spec = SlicingSpec(entire_dataset=True)
num_classes = None
if slicing_spec.by_class:
num_classes = attack_input.num_classes
input_slice_specs = get_single_slice_specs(slicing_spec, num_classes)
for single_slice_spec in input_slice_specs:
attack_input_slice = get_slice(attack_input, single_slice_spec)
for attack_type in attack_types:
attack_result = _run_attack(attack_input_slice, attack_type,
balance_attacker_training,
min_num_samples)
if attack_result is not None:
attack_results.append(attack_result)
privacy_report_metadata = _compute_missing_privacy_report_metadata(
privacy_report_metadata, attack_input)
return AttackResults(
single_attack_results=attack_results,
privacy_report_metadata=privacy_report_metadata)
def _compute_membership_probability(
attack_input: AttackInputData,
num_bins: int = 15) -> SingleMembershipProbabilityResult:
"""Computes each individual point's likelihood of being a member (denoted as privacy risk score in https://arxiv.org/abs/2003.10595).
For an individual sample, its privacy risk score is computed as the posterior
probability of being in the training set
after observing its prediction output by the target machine learning model.
Args:
attack_input: input data for computing membership probability
num_bins: the number of bins used to compute the training/test histogram
Returns:
membership probability results
"""
# Uses the provided loss or entropy. Otherwise computes the loss.
if attack_input.loss_train is not None and attack_input.loss_test is not None:
train_values = attack_input.loss_train
test_values = attack_input.loss_test
elif attack_input.entropy_train is not None and attack_input.entropy_test is not None:
train_values = attack_input.entropy_train
test_values = attack_input.entropy_test
else:
train_values = attack_input.get_loss_train()
test_values = attack_input.get_loss_test()
# Compute the histogram in the log scale
small_value = 1e-10
train_values = np.maximum(train_values, small_value)
test_values = np.maximum(test_values, small_value)
min_value = min(train_values.min(), test_values.min())
max_value = max(train_values.max(), test_values.max())
bins_hist = np.logspace(
np.log10(min_value), np.log10(max_value), num_bins + 1)
train_hist, _ = np.histogram(train_values, bins=bins_hist)
train_hist = train_hist / (len(train_values) + 0.0)
train_hist_indices = np.fmin(
np.digitize(train_values, bins=bins_hist), num_bins) - 1
test_hist, _ = np.histogram(test_values, bins=bins_hist)
test_hist = test_hist / (len(test_values) + 0.0)
test_hist_indices = np.fmin(
np.digitize(test_values, bins=bins_hist), num_bins) - 1
combined_hist = train_hist + test_hist
combined_hist[combined_hist == 0] = small_value
membership_prob_list = train_hist / (combined_hist + 0.0)
train_membership_probs = membership_prob_list[train_hist_indices]
test_membership_probs = membership_prob_list[test_hist_indices]
return SingleMembershipProbabilityResult(
slice_spec=_get_slice_spec(attack_input),
train_membership_probs=train_membership_probs,
test_membership_probs=test_membership_probs)
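
The histogram-based estimate in `_compute_membership_probability` can be reproduced with NumPy alone. The block below is a minimal sketch rather than the library code: the function name `membership_probability_sketch` and the sample loss values are made up for illustration, and it assumes per-sample losses are already available.

```python
import numpy as np


def membership_probability_sketch(train_values, test_values,
                                  num_bins=15, small_value=1e-10):
    """Per-sample probability of being a training member (hypothetical helper).

    Mirrors the steps above: clip to a small positive value, build
    log-spaced bins, normalise both histograms, and take the share of the
    training histogram in each bin.
    """
    train_values = np.maximum(np.asarray(train_values, dtype=float), small_value)
    test_values = np.maximum(np.asarray(test_values, dtype=float), small_value)

    min_value = min(train_values.min(), test_values.min())
    max_value = max(train_values.max(), test_values.max())
    bins = np.logspace(np.log10(min_value), np.log10(max_value), num_bins + 1)

    # Fraction of each set that falls into every bin.
    train_hist = np.histogram(train_values, bins=bins)[0] / len(train_values)
    test_hist = np.histogram(test_values, bins=bins)[0] / len(test_values)

    # Avoid division by zero in bins that contain no samples at all.
    combined = train_hist + test_hist
    combined[combined == 0] = small_value
    prob_per_bin = train_hist / combined

    def bin_index(values):
        # digitize is 1-based; the maximum value is folded into the last bin.
        return np.fmin(np.digitize(values, bins=bins), num_bins) - 1

    return prob_per_bin[bin_index(train_values)], prob_per_bin[bin_index(test_values)]


# Illustrative losses: members have low loss, non-members have high loss.
train_probs, test_probs = membership_probability_sketch(
    [1.0, 1.5, 2.0], [50.0, 80.0, 100.0])
```

With fully separated losses like these, every training sample lands in a bin that holds no test samples, so the sketch assigns the members a probability near 1 and the non-members a probability near 0. Overlapping loss distributions give intermediate values.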
def run_membership_probability_analysis(
attack_input: AttackInputData,
slicing_spec: SlicingSpec = None) -> MembershipProbabilityResults:
"""Perform membership probability analysis on all given slice types.
Args:
attack_input: input data for computing membership probabilities
slicing_spec: specifies attack_input slices
Returns:
the membership probability results.
"""
attack_input.validate()
membership_prob_results = []
if slicing_spec is None:
slicing_spec = SlicingSpec(entire_dataset=True)
num_classes = None
if slicing_spec.by_class:
num_classes = attack_input.num_classes
input_slice_specs = get_single_slice_specs(slicing_spec, num_classes)
for single_slice_spec in input_slice_specs:
attack_input_slice = get_slice(attack_input, single_slice_spec)
membership_prob_results.append(
_compute_membership_probability(attack_input_slice))
return MembershipProbabilityResults(
membership_prob_results=membership_prob_results)
def _compute_missing_privacy_report_metadata(
metadata: PrivacyReportMetadata,
attack_input: AttackInputData) -> PrivacyReportMetadata:
"""Populates metadata fields if they are missing."""
if metadata is None:
metadata = PrivacyReportMetadata()
if metadata.accuracy_train is None:
metadata.accuracy_train = _get_accuracy(attack_input.logits_train,
attack_input.labels_train)
if metadata.accuracy_test is None:
metadata.accuracy_test = _get_accuracy(attack_input.logits_test,
attack_input.labels_test)
loss_train = attack_input.get_loss_train()
loss_test = attack_input.get_loss_test()
if metadata.loss_train is None and loss_train is not None:
metadata.loss_train = np.average(loss_train)
if metadata.loss_test is None and loss_test is not None:
metadata.loss_test = np.average(loss_test)
return metadata
def _get_accuracy(logits, labels):
"""Computes accuracy from logits and labels, or returns None if either is missing."""
if logits is None or labels is None:
return None
return metrics.accuracy_score(labels, np.argmax(logits, axis=1))
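
The loss-threshold attack in `_run_threshold_attack` labels training samples 0 and test samples 1 and uses the raw loss as the score. As a rough illustration of what the resulting ROC curve measures, the area under it can be computed directly from pairwise comparisons. This is a sketch under stated assumptions, not part of the library: `threshold_attack_auc` is a hypothetical name, and the quadratic pairwise comparison is only practical for small inputs.

```python
import numpy as np


def threshold_attack_auc(loss_train, loss_test):
    """AUC of a loss-threshold attack (hypothetical helper).

    Uses the same convention as the code above: test samples are the
    positive class and the loss is the score, so the AUC is the
    probability that a random test sample has a higher loss than a random
    training sample, with ties counted as one half.
    """
    loss_train = np.asarray(loss_train, dtype=float)
    loss_test = np.asarray(loss_test, dtype=float)
    # diff[i, j] = loss_test[i] - loss_train[j] for every (test, train) pair.
    diff = loss_test[:, None] - loss_train[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


# Illustrative values: most test losses exceed the training losses.
auc = threshold_attack_auc([0.1, 0.2, 0.3], [0.25, 0.8, 0.9])
```

An AUC close to 0.5 means the losses of members and non-members are indistinguishable to this attack. Values well above 0.5 indicate that the model's loss leaks membership.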

@ -13,17 +13,17 @@
# limitations under the License.
# Lint as: python3
"""Tests for tensorflow_privacy.privacy.membership_inference_attack.utils."""
"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils."""
from absl.testing import absltest
import numpy as np
from tensorflow_privacy.privacy.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import DataSize
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingFeature
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import DataSize
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingFeature
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
def get_test_input(n_train, n_test):

@ -0,0 +1,210 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Lint as: python3
"""Trained models for membership inference attacks."""
from dataclasses import dataclass
import numpy as np
from sklearn import ensemble
from sklearn import linear_model
from sklearn import model_selection
from sklearn import neighbors
from sklearn import neural_network
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import DataSize
@dataclass
class AttackerData:
"""Input data for an ML classifier attack.
This includes only the data, and not configuration.
"""
features_train: np.ndarray = None
# element-wise 0/1 array: 0 if the example was part of training, 1 otherwise.
is_training_labels_train: np.ndarray = None
features_test: np.ndarray = None
# element-wise 0/1 array: 0 if the example was part of training, 1 otherwise.
is_training_labels_test: np.ndarray = None
data_size: DataSize = None
def create_attacker_data(attack_input_data: AttackInputData,
test_fraction: float = 0.25,
balance: bool = True) -> AttackerData:
"""Prepare AttackInputData to train ML attackers.
Combines logits and losses and performs a random train-test split.
Args:
attack_input_data: Original AttackInputData
test_fraction: Fraction of the dataset to include in the test split.
balance: Whether the training and test sets for the membership inference
attacker should have a balanced (roughly equal) number of samples
from the training and test sets used to develop the model
under attack.
Returns:
AttackerData.
"""
attack_input_train = _column_stack(attack_input_data.logits_or_probs_train,
attack_input_data.get_loss_train())
attack_input_test = _column_stack(attack_input_data.logits_or_probs_test,
attack_input_data.get_loss_test())
if balance:
min_size = min(attack_input_data.get_train_size(),
attack_input_data.get_test_size())
attack_input_train = _sample_multidimensional_array(attack_input_train,
min_size)
attack_input_test = _sample_multidimensional_array(attack_input_test,
min_size)
ntrain, ntest = attack_input_train.shape[0], attack_input_test.shape[0]
features_all = np.concatenate((attack_input_train, attack_input_test))
labels_all = np.concatenate(((np.zeros(ntrain)), (np.ones(ntest))))
# Perform a train-test split
features_train, features_test, is_training_labels_train, is_training_labels_test = model_selection.train_test_split(
features_all, labels_all, test_size=test_fraction, stratify=labels_all)
return AttackerData(features_train, is_training_labels_train, features_test,
is_training_labels_test,
DataSize(ntrain=ntrain, ntest=ntest))
def _sample_multidimensional_array(array, size):
indices = np.random.choice(len(array), size, replace=False)
return array[indices]
def _column_stack(logits, loss):
"""Stacks logits and losses.
If only one of them exists, returns that one.
Args:
logits: logits array
loss: loss array
Returns:
stacked logits and losses (or only one of them if the other is missing).
"""
if logits is None:
return np.expand_dims(loss, axis=-1)
if loss is None:
return logits
return np.column_stack((logits, loss))
class TrainedAttacker:
"""Base class for training attack models."""
model = None
def train_model(self, input_features, is_training_labels):
"""Train an attacker model.
This is trained on examples from train and test datasets.
Args:
input_features : array-like of shape (n_samples, n_features) Training
vector, where n_samples is the number of samples and n_features is the
number of features.
is_training_labels : a vector of booleans of shape (n_samples, )
representing whether the sample is in the training set or not.
"""
raise NotImplementedError()
def predict(self, input_features):
"""Predicts whether input_features belongs to train or test.
Args:
input_features : A vector of features with the same semantics as the
input_features passed to train_model.
Returns:
An array of probabilities denoting whether the example belongs to test.
"""
if self.model is None:
raise AssertionError(
'Model not trained yet. Please call train_model first.')
return self.model.predict_proba(input_features)[:, 1]
class LogisticRegressionAttacker(TrainedAttacker):
"""Logistic regression attacker."""
def train_model(self, input_features, is_training_labels):
lr = linear_model.LogisticRegression(solver='lbfgs')
param_grid = {
'C': np.logspace(-4, 2, 10),
}
model = model_selection.GridSearchCV(
lr, param_grid=param_grid, cv=3, n_jobs=1, verbose=0)
model.fit(input_features, is_training_labels)
self.model = model
class MultilayerPerceptronAttacker(TrainedAttacker):
"""Multilayer perceptron attacker."""
def train_model(self, input_features, is_training_labels):
mlp_model = neural_network.MLPClassifier()
param_grid = {
'hidden_layer_sizes': [(64,), (32, 32)],
'solver': ['adam'],
'alpha': [0.0001, 0.001, 0.01],
}
n_jobs = -1
model = model_selection.GridSearchCV(
mlp_model, param_grid=param_grid, cv=3, n_jobs=n_jobs, verbose=0)
model.fit(input_features, is_training_labels)
self.model = model
class RandomForestAttacker(TrainedAttacker):
"""Random forest attacker."""
def train_model(self, input_features, is_training_labels):
"""Set up a random forest pipeline with cross-validation."""
rf_model = ensemble.RandomForestClassifier()
param_grid = {
'n_estimators': [100],
'max_features': ['auto', 'sqrt'],
'max_depth': [5, 10, 20, None],
'min_samples_split': [2, 5, 10],
'min_samples_leaf': [1, 2, 4]
}
n_jobs = -1
model = model_selection.GridSearchCV(
rf_model, param_grid=param_grid, cv=3, n_jobs=n_jobs, verbose=0)
model.fit(input_features, is_training_labels)
self.model = model
class KNearestNeighborsAttacker(TrainedAttacker):
"""K nearest neighbor attacker."""
def train_model(self, input_features, is_training_labels):
knn_model = neighbors.KNeighborsClassifier()
param_grid = {
'n_neighbors': [3, 5, 7],
}
model = model_selection.GridSearchCV(
knn_model, param_grid=param_grid, cv=3, n_jobs=1, verbose=0)
model.fit(input_features, is_training_labels)
self.model = model
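
The data preparation in `create_attacker_data` (stack logits with losses, optionally balance the two groups, then label them) can be sketched with NumPy alone. The function name `build_attacker_features` and the `seed` parameter are assumptions made for this illustration. The stratified train/test split performed by `model_selection.train_test_split` is deliberately left out.

```python
import numpy as np


def build_attacker_features(logits_train, loss_train, logits_test, loss_test,
                            balance=True, seed=0):
    """Builds attacker features and 0/1 labels (hypothetical helper).

    Label 0 marks samples the target model was trained on and label 1
    marks held-out samples, matching the labelling used above.
    """
    rng = np.random.default_rng(seed)
    # One row per sample: the logits followed by the per-sample loss.
    feats_in = np.column_stack((logits_train, loss_train))
    feats_out = np.column_stack((logits_test, loss_test))

    if balance:
        # Subsample without replacement so both groups have the same size.
        n = min(len(feats_in), len(feats_out))
        feats_in = feats_in[rng.choice(len(feats_in), n, replace=False)]
        feats_out = feats_out[rng.choice(len(feats_out), n, replace=False)]

    features = np.concatenate((feats_in, feats_out))
    labels = np.concatenate((np.zeros(len(feats_in)), np.ones(len(feats_out))))
    return features, labels


# Illustrative shapes: 10 training samples, 4 test samples, 3 classes.
features, labels = build_attacker_features(
    np.zeros((10, 3)), np.ones(10), np.zeros((4, 3)), np.ones(4))
```

Balancing matters because an attacker trained on mostly members could score well on accuracy by always predicting "member", which would say nothing about leakage.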

@ -13,12 +13,12 @@
# limitations under the License.
# Lint as: python3
"""Tests for tensorflow_privacy.privacy.membership_inference_attack.data_structures."""
"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures."""
from absl.testing import absltest
import numpy as np
from tensorflow_privacy.privacy.membership_inference_attack import models
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import models
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
class TrainedAttackerTest(absltest.TestCase):

@ -0,0 +1,86 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Lint as: python3
"""Plotting functionality for membership inference attack analysis.
Functions to plot ROC curves and histograms as well as functionality to store
figures to disk.
"""
from typing import Text, Iterable, Optional
import matplotlib.pyplot as plt
import numpy as np
from sklearn import metrics
def save_plot(figure: plt.Figure, path: Text, outformat='png'):
"""Store a figure to disk."""
if path is not None:
with open(path, 'wb') as f:
figure.savefig(f, bbox_inches='tight', format=outformat)
plt.close(figure)
def plot_curve_with_area(x: Iterable[float],
y: Iterable[float],
xlabel: Text = 'x',
ylabel: Text = 'y') -> plt.Figure:
"""Plot the curve defined by inputs and the area under the curve.
All entries of x and y are required to lie between 0 and 1.
For example, x could be recall and y precision, or x is fpr and y is tpr.
Args:
x: Values on x-axis (1d)
y: Values on y-axis (must be same length as x)
xlabel: Label for x axis
ylabel: Label for y axis
Returns:
The matplotlib figure handle
"""
fig = plt.figure()
plt.plot([0, 1], [0, 1], 'k', lw=1.0)
plt.plot(x, y, lw=2, label=f'AUC: {metrics.auc(x, y):.3f}')
plt.xlabel(xlabel)
plt.ylabel(ylabel)
plt.legend()
return fig
def plot_histograms(train: Iterable[float],
test: Iterable[float],
xlabel: Text = 'x',
thresh: Optional[float] = None) -> plt.Figure:
"""Plot histograms of training versus test metrics."""
xmin = min(np.min(train), np.min(test))
xmax = max(np.max(train), np.max(test))
bins = np.linspace(xmin, xmax, 100)
fig = plt.figure()
plt.hist(test, bins=bins, density=True, alpha=0.5, label='test', log=True)
plt.hist(train, bins=bins, density=True, alpha=0.5, label='train', log=True)
if thresh is not None:
plt.axvline(thresh, c='r', label=f'threshold = {thresh:.3f}')
plt.xlabel(xlabel)
plt.ylabel('normalized counts (density)')
plt.legend()
return fig
def plot_roc_curve(roc_curve) -> plt.Figure:
"""Plot the ROC curve and the area under the curve."""
return plot_curve_with_area(
roc_curve.fpr, roc_curve.tpr, xlabel='FPR', ylabel='TPR')

@ -0,0 +1,138 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Lint as: python3
"""Plotting code for ML Privacy Reports."""
from typing import Iterable
import matplotlib.pyplot as plt
import pandas as pd
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResultsCollection
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResultsDFColumns
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import ENTIRE_DATASET_SLICE_STR
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyMetric
# Helper constants for DataFrame keys.
LEGEND_LABEL_STR = 'legend label'
EPOCH_STR = 'Epoch'
TRAIN_ACCURACY_STR = 'Train accuracy'
def plot_by_epochs(results: AttackResultsCollection,
privacy_metrics: Iterable[PrivacyMetric]) -> plt.Figure:
"""Plots privacy vulnerabilities vs epoch numbers.
In case multiple privacy metrics are specified, the plot will feature
multiple subplots (one subplot per metric). Multiple model variants
are supported.
Args:
results: AttackResultsCollection for the plot
privacy_metrics: List of enumerated privacy metrics that should be plotted.
Returns:
A pyplot figure with privacy vs epoch plots.
"""
_validate_results(results.attack_results_list)
all_results_df = _calculate_combined_df_with_metadata(
results.attack_results_list)
return _generate_subplots(
all_results_df=all_results_df,
x_axis_metric=EPOCH_STR,
figure_title='Vulnerability per Epoch',
privacy_metrics=privacy_metrics)
def plot_privacy_vs_accuracy(results: AttackResultsCollection,
privacy_metrics: Iterable[PrivacyMetric]):
"""Plots privacy vulnerabilities vs training accuracy.
In case multiple privacy metrics are specified, the plot will feature
multiple subplots (one subplot per metric). Multiple model variants
are supported.
Args:
results: AttackResultsCollection for the plot
privacy_metrics: List of enumerated privacy metrics that should be plotted.
Returns:
A pyplot figure with privacy vs accuracy plots.
"""
_validate_results(results.attack_results_list)
all_results_df = _calculate_combined_df_with_metadata(
results.attack_results_list)
return _generate_subplots(
all_results_df=all_results_df,
x_axis_metric=TRAIN_ACCURACY_STR,
figure_title='Privacy vs Utility Analysis',
privacy_metrics=privacy_metrics)
def _calculate_combined_df_with_metadata(results: Iterable[AttackResults]):
"""Adds metadata to each dataframe and concatenates them."""
all_results_df = None
for attack_results in results:
attack_results_df = attack_results.calculate_pd_dataframe()
attack_results_df = attack_results_df.loc[attack_results_df[str(
AttackResultsDFColumns.SLICE_FEATURE)] == ENTIRE_DATASET_SLICE_STR]
attack_results_df.insert(0, EPOCH_STR,
attack_results.privacy_report_metadata.epoch_num)
attack_results_df.insert(
0, TRAIN_ACCURACY_STR,
attack_results.privacy_report_metadata.accuracy_train)
attack_results_df.insert(
0, LEGEND_LABEL_STR,
attack_results.privacy_report_metadata.model_variant_label + ' - ' +
attack_results_df[str(AttackResultsDFColumns.ATTACK_TYPE)])
if all_results_df is None:
all_results_df = attack_results_df
else:
all_results_df = pd.concat([all_results_df, attack_results_df],
ignore_index=True)
return all_results_df
def _generate_subplots(all_results_df: pd.DataFrame, x_axis_metric: str,
figure_title: str,
privacy_metrics: Iterable[PrivacyMetric]):
"""Create one subplot per privacy metric for a specified x_axis_metric."""
fig, axes = plt.subplots(
1, len(privacy_metrics), figsize=(5 * len(privacy_metrics) + 3, 5))
# Set a title for the entire group of subplots.
fig.suptitle(figure_title)
if len(privacy_metrics) == 1:
axes = (axes,)
for i, privacy_metric in enumerate(privacy_metrics):
legend_labels = all_results_df[LEGEND_LABEL_STR].unique()
for legend_label in legend_labels:
single_label_results = all_results_df.loc[all_results_df[LEGEND_LABEL_STR]
== legend_label]
sorted_label_results = single_label_results.sort_values(x_axis_metric)
axes[i].plot(sorted_label_results[x_axis_metric],
sorted_label_results[str(privacy_metric)])
axes[i].set_xlabel(x_axis_metric)
axes[i].set_title('%s for %s' % (privacy_metric, ENTIRE_DATASET_SLICE_STR))
plt.legend(legend_labels, loc='upper left', bbox_to_anchor=(1.02, 1))
fig.tight_layout(rect=[0, 0, 1, 0.93]) # Leave space for suptitle.
return fig
def _validate_results(results: Iterable[AttackResults]):
for attack_results in results:
if not attack_results or not attack_results.privacy_report_metadata:
raise ValueError('Privacy metadata is not defined.')
if attack_results.privacy_report_metadata.epoch_num is None:
raise ValueError('epoch_num in metadata is not defined.')


@ -13,21 +13,20 @@
# limitations under the License.
# Lint as: python3
"""Tests for tensorflow_privacy.privacy.membership_inference_attack.privacy_report."""
"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.privacy_report."""
from absl.testing import absltest
import numpy as np
from tensorflow_privacy.privacy.membership_inference_attack import privacy_report
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import privacy_report
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResultsCollection
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import DataSize
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import \
PrivacyReportMetadata
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import RocCurve
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleAttackResult
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResultsCollection
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import DataSize
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyReportMetadata
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import RocCurve
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleAttackResult
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec
class PrivacyReportTest(absltest.TestCase):


@ -0,0 +1,373 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Lint as: python3
"""Code for membership inference attacks on seq2seq models.
Contains seq2seq specific logic for attack data structures, attack data
generation, and the logistic regression membership inference attack.
"""
from typing import Iterator, List
from dataclasses import dataclass
import numpy as np
from scipy.stats import rankdata
from sklearn import metrics
from sklearn import model_selection
import tensorflow as tf
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import models
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import DataSize
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyReportMetadata
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import RocCurve
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleAttackResult
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SingleSliceSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.models import _sample_multidimensional_array
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.models import AttackerData
def _is_iterator(obj, obj_name):
"""Raises ValueError if obj is set but is not an iterator (e.g. a generator)."""
if obj is not None and not isinstance(obj, Iterator):
raise ValueError('%s should be a generator.' % obj_name)
@dataclass
class Seq2SeqAttackInputData:
"""Input data for running an attack on seq2seq models.
This includes only the data, and not configuration.
"""
logits_train: Iterator[np.ndarray] = None
logits_test: Iterator[np.ndarray] = None
# Contains ground-truth token indices for the target sequences.
labels_train: Iterator[np.ndarray] = None
labels_test: Iterator[np.ndarray] = None
# Size of the target sequence vocabulary.
vocab_size: int = None
# train_size, test_size = number of batches in the training and test sets.
# These values must be supplied by the user because logits and labels
# are lazily loaded for seq2seq models.
train_size: int = 0
test_size: int = 0
def validate(self):
"""Validates the inputs."""
if (self.logits_train is None) != (self.logits_test is None):
raise ValueError(
'logits_train and logits_test should both be either set or unset')
if (self.labels_train is None) != (self.labels_test is None):
raise ValueError(
'labels_train and labels_test should both be either set or unset')
if self.logits_train is None or self.labels_train is None:
raise ValueError(
'Labels, logits of training, test sets should all be set')
if (self.vocab_size is None or self.train_size is None or
self.test_size is None):
raise ValueError('vocab_size, train_size, test_size should all be set')
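# `not int` is always False, so test the type of each value explicitly.
if self.vocab_size is not None and not isinstance(self.vocab_size, int):
raise ValueError('vocab_size should be of integer type')
if self.train_size is not None and not isinstance(self.train_size, int):
raise ValueError('train_size should be of integer type')
if self.test_size is not None and not isinstance(self.test_size, int):
raise ValueError('test_size should be of integer type')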
_is_iterator(self.logits_train, 'logits_train')
_is_iterator(self.logits_test, 'logits_test')
_is_iterator(self.labels_train, 'labels_train')
_is_iterator(self.labels_test, 'labels_test')
def __str__(self):
"""Returns the shapes of variables that are not None."""
result = ['AttackInputData(']
if self.vocab_size is not None and self.train_size is not None:
result.append(
'logits_train with shape (%d, num_sequences, num_tokens, %d)' %
(self.train_size, self.vocab_size))
result.append(
'labels_train with shape (%d, num_sequences, num_tokens, 1)' %
self.train_size)
if self.vocab_size is not None and self.test_size is not None:
result.append(
'logits_test with shape (%d, num_sequences, num_tokens, %d)' %
(self.test_size, self.vocab_size))
result.append(
'labels_test with shape (%d, num_sequences, num_tokens, 1)' %
self.test_size)
result.append(')')
return '\n'.join(result)
def _get_attack_features_and_metadata(
logits: Iterator[np.ndarray],
labels: Iterator[np.ndarray]) -> (np.ndarray, float, float):
"""Returns the average rank of tokens per batch, the loss and the accuracy.
Args:
logits: Logits returned by a seq2seq model, dim = (num_batches,
num_sequences, num_tokens, vocab_size).
labels: Target labels for the seq2seq model, dim = (num_batches,
num_sequences, num_tokens, 1).
Returns:
1. An array of average ranks, dim = (num_batches, 1).
Each average rank is calculated over ranks of tokens in sequences of a
particular batch.
2. Loss computed over all logits and labels.
3. Accuracy computed over all logits and labels.
"""
ranks = []
loss = 0.0
dataset_length = 0.0
correct_preds = 0
total_preds = 0
for batch_logits, batch_labels in zip(logits, labels):
# Compute average rank for the current batch.
batch_ranks = _get_batch_ranks(batch_logits, batch_labels)
ranks.append(np.mean(batch_ranks))
# Update overall loss metrics with metrics of the current batch.
batch_loss, batch_length = _get_batch_loss_metrics(batch_logits,
batch_labels)
loss += batch_loss
dataset_length += batch_length
# Update overall accuracy metrics with metrics of the current batch.
batch_correct_preds, batch_total_preds = _get_batch_accuracy_metrics(
batch_logits, batch_labels)
correct_preds += batch_correct_preds
total_preds += batch_total_preds
# Compute loss and accuracy for the dataset.
loss = loss / dataset_length
accuracy = correct_preds / total_preds
return np.array(ranks), loss, accuracy
def _get_batch_ranks(batch_logits: np.ndarray,
batch_labels: np.ndarray) -> np.ndarray:
"""Returns the ranks of tokens in a batch of sequences.
Args:
batch_logits: Logits returned by a seq2seq model, dim = (num_sequences,
num_tokens, vocab_size).
batch_labels: Target labels for the seq2seq model, dim = (num_sequences,
num_tokens, 1).
Returns:
An array of ranks of tokens in a batch of sequences, dim = (num_sequences,
num_tokens, 1)
"""
batch_ranks = []
for sequence_logits, sequence_labels in zip(batch_logits, batch_labels):
batch_ranks += _get_ranks_for_sequence(sequence_logits, sequence_labels)
return np.array(batch_ranks)
def _get_ranks_for_sequence(logits: np.ndarray,
labels: np.ndarray) -> List[float]:
"""Returns ranks for a sequence.
Args:
logits: Logits of a single sequence, dim = (num_tokens, vocab_size).
labels: Target labels of a single sequence, dim = (num_tokens, 1).
Returns:
An array of ranks for tokens in the sequence, dim = (num_tokens, 1).
"""
sequence_ranks = []
for logit, label in zip(logits, labels.astype(int)):
rank = rankdata(-logit, method='min')[label] - 1.0
sequence_ranks.append(rank)
return sequence_ranks
def _get_batch_loss_metrics(batch_logits: np.ndarray,
batch_labels: np.ndarray) -> (float, int):
"""Returns the loss and the number of sequences for a batch.
Args:
batch_logits: Logits returned by a seq2seq model, dim = (num_sequences,
num_tokens, vocab_size).
batch_labels: Target labels for the seq2seq model, dim = (num_sequences,
num_tokens, 1).
"""
batch_loss = 0.0
batch_length = len(batch_logits)
for sequence_logits, sequence_labels in zip(batch_logits, batch_labels):
sequence_loss = tf.losses.sparse_categorical_crossentropy(
tf.keras.backend.constant(sequence_labels),
tf.keras.backend.constant(sequence_logits),
from_logits=True)
batch_loss += sequence_loss.numpy().sum()
return batch_loss / batch_length, batch_length
def _get_batch_accuracy_metrics(batch_logits: np.ndarray,
batch_labels: np.ndarray) -> (float, float):
"""Returns the number of correct predictions and the total number of predictions for a batch.
Args:
batch_logits: Logits returned by a seq2seq model, dim = (num_sequences,
num_tokens, vocab_size).
batch_labels: Target labels for the seq2seq model, dim = (num_sequences,
num_tokens, 1).
"""
batch_correct_preds = 0.0
batch_total_preds = 0.0
for sequence_logits, sequence_labels in zip(batch_logits, batch_labels):
preds = tf.metrics.sparse_categorical_accuracy(
tf.keras.backend.constant(sequence_labels),
tf.keras.backend.constant(sequence_logits))
batch_correct_preds += preds.numpy().sum()
batch_total_preds += len(sequence_labels)
return batch_correct_preds, batch_total_preds
def create_seq2seq_attacker_data(
attack_input_data: Seq2SeqAttackInputData,
test_fraction: float = 0.25,
balance: bool = True,
privacy_report_metadata: PrivacyReportMetadata = PrivacyReportMetadata()
) -> AttackerData:
"""Prepares Seq2SeqAttackInputData to train ML attackers.
Uses logits and labels to generate ranks and performs a random train-test
split. Also computes metadata (loss, accuracy) for the model under attack
and populates the respective fields of PrivacyReportMetadata.
Args:
attack_input_data: Original Seq2SeqAttackInputData
test_fraction: Fraction of the dataset to include in the test split.
balance: Whether the training and test sets for the membership inference
attacker should have a balanced (roughly equal) number of samples from the
training and test sets used to develop the model under attack.
privacy_report_metadata: the metadata of the model under attack.
Returns:
AttackerData.
"""
attack_input_train, loss_train, accuracy_train = _get_attack_features_and_metadata(
attack_input_data.logits_train, attack_input_data.labels_train)
attack_input_test, loss_test, accuracy_test = _get_attack_features_and_metadata(
attack_input_data.logits_test, attack_input_data.labels_test)
if balance:
min_size = min(len(attack_input_train), len(attack_input_test))
attack_input_train = _sample_multidimensional_array(attack_input_train,
min_size)
attack_input_test = _sample_multidimensional_array(attack_input_test,
min_size)
features_all = np.concatenate((attack_input_train, attack_input_test))
ntrain, ntest = attack_input_train.shape[0], attack_input_test.shape[0]
# Reshape for classifying one-dimensional features
features_all = features_all.reshape(-1, 1)
labels_all = np.concatenate(((np.zeros(ntrain)), (np.ones(ntest))))
# Perform a train-test split
features_train, features_test, \
is_training_labels_train, is_training_labels_test = \
model_selection.train_test_split(
features_all, labels_all, test_size=test_fraction, stratify=labels_all)
# Populate accuracy, loss fields in privacy report metadata
privacy_report_metadata.loss_train = loss_train
privacy_report_metadata.loss_test = loss_test
privacy_report_metadata.accuracy_train = accuracy_train
privacy_report_metadata.accuracy_test = accuracy_test
return AttackerData(features_train, is_training_labels_train, features_test,
is_training_labels_test,
DataSize(ntrain=ntrain, ntest=ntest))
def run_seq2seq_attack(attack_input: Seq2SeqAttackInputData,
privacy_report_metadata: PrivacyReportMetadata = None,
balance_attacker_training: bool = True) -> AttackResults:
"""Runs membership inference attacks on a seq2seq model.
Args:
attack_input: input data for running an attack
privacy_report_metadata: the metadata of the model under attack.
balance_attacker_training: Whether the training and test sets for the
membership inference attacker should have a balanced (roughly equal)
number of samples from the training and test sets used to develop the
model under attack.
Returns:
the attack result.
"""
attack_input.validate()
# The attacker uses the average rank (a single number) of a seq2seq dataset
# record to determine membership. So only Logistic Regression is supported,
# as it makes the most sense for single-number features.
attacker = models.LogisticRegressionAttacker()
# Create attacker data and populate fields of privacy_report_metadata
privacy_report_metadata = privacy_report_metadata or PrivacyReportMetadata()
prepared_attacker_data = create_seq2seq_attacker_data(
attack_input_data=attack_input,
balance=balance_attacker_training,
privacy_report_metadata=privacy_report_metadata)
attacker.train_model(prepared_attacker_data.features_train,
prepared_attacker_data.is_training_labels_train)
# Run the attacker on (permuted) test examples.
predictions_test = attacker.predict(prepared_attacker_data.features_test)
# Generate ROC curves with predictions.
fpr, tpr, thresholds = metrics.roc_curve(
prepared_attacker_data.is_training_labels_test, predictions_test)
roc_curve = RocCurve(tpr=tpr, fpr=fpr, thresholds=thresholds)
attack_results = [
SingleAttackResult(
slice_spec=SingleSliceSpec(),
attack_type=AttackType.LOGISTIC_REGRESSION,
roc_curve=roc_curve,
data_size=prepared_attacker_data.data_size)
]
return AttackResults(
single_attack_results=attack_results,
privacy_report_metadata=privacy_report_metadata)
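
The attack feature computed in this file is the average rank of the ground-truth tokens within a batch. The following is a minimal, dependency-free sketch of that computation, not the library's code: `token_rank` and `average_batch_rank` are hypothetical names, labels are assumed to be plain integers rather than arrays of shape (num_tokens, 1), and the rank is counted directly instead of calling `scipy.stats.rankdata`. Counting the logits that are strictly greater than the label's logit gives the same zero-based value as `rankdata(-logit, method='min')[label] - 1`.

```python
from typing import List, Sequence


def token_rank(logits: Sequence[float], label: int) -> float:
    """Zero-based rank of the true token: how many logits beat it."""
    true_logit = logits[label]
    return float(sum(1 for value in logits if value > true_logit))


def average_batch_rank(batch_logits, batch_labels) -> float:
    """Mean rank over every token of every sequence in one batch."""
    ranks: List[float] = []
    for sequence_logits, sequence_labels in zip(batch_logits, batch_labels):
        for logits, label in zip(sequence_logits, sequence_labels):
            ranks.append(token_rank(logits, label))
    return sum(ranks) / len(ranks)


# Hypothetical batch: one sequence, two tokens, vocabulary of size 3.
example_logits = [[[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]]
example_labels = [[2, 0]]
example_average = average_batch_rank(example_logits, example_labels)
```

A batch whose true tokens the model ranks near the top gets a low average rank; the logistic regression attacker is then trained on this single number per batch.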


@ -13,15 +13,15 @@
# limitations under the License.
# Lint as: python3
"""Tests for tensorflow_privacy.privacy.membership_inference_attack.seq2seq_mia."""
"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia."""
from absl.testing import absltest
import numpy as np
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import PrivacyReportMetadata
from tensorflow_privacy.privacy.membership_inference_attack.seq2seq_mia import create_seq2seq_attacker_data
from tensorflow_privacy.privacy.membership_inference_attack.seq2seq_mia import run_seq2seq_attack
from tensorflow_privacy.privacy.membership_inference_attack.seq2seq_mia import Seq2SeqAttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import PrivacyReportMetadata
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia import create_seq2seq_attacker_data
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia import run_seq2seq_attack
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.seq2seq_mia import Seq2SeqAttackInputData
class Seq2SeqAttackInputDataTest(absltest.TestCase):


@ -0,0 +1,199 @@
# Copyright 2020, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Lint as: python3
"""A hook and a function in tf estimator for membership inference attack."""
import os
from typing import Iterable
from absl import logging
import numpy as np
import tensorflow.compat.v1 as tf
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import membership_inference_attack as mia
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackInputData
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils import log_loss
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils_tensorboard import write_results_to_tensorboard
def calculate_losses(estimator, input_fn, labels):
"""Get predictions and losses for samples.
The assumptions are 1) the loss is cross-entropy loss, and 2) the user has
specified the prediction mode to return predictions, e.g.,
when mode == tf.estimator.ModeKeys.PREDICT, the model function returns
tf.estimator.EstimatorSpec(mode=mode, predictions=tf.nn.softmax(logits)).
Args:
estimator: model to make prediction
input_fn: input function to be used in estimator.predict
labels: array of size (n_samples, ), true labels of samples (integer valued)
Returns:
preds: probability vector of each sample
loss: cross entropy loss of each sample
"""
pred = np.array(list(estimator.predict(input_fn=input_fn)))
loss = log_loss(labels, pred)
return pred, loss
class MembershipInferenceTrainingHook(tf.estimator.SessionRunHook):
"""Training hook to perform membership inference attack on epoch end."""
def __init__(
self,
estimator,
in_train, out_train,
input_fn_constructor,
slicing_spec: SlicingSpec = None,
attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,),
tensorboard_dir=None,
tensorboard_merge_classifiers=False):
"""Initialize the hook.
Args:
estimator: model to be tested
in_train: (in_training samples, in_training labels)
out_train: (out_training samples, out_training labels)
input_fn_constructor: a function that receives samples and labels and
constructs the input_fn for model prediction
slicing_spec: slicing specification of the attack
attack_types: a list of attacks, each of type AttackType
tensorboard_dir: directory for tensorboard summary
tensorboard_merge_classifiers: if true, plot different classifiers with
the same slicing_spec and metric in the same figure
"""
in_train_data, self._in_train_labels = in_train
out_train_data, self._out_train_labels = out_train
# Define the input functions for both in and out-training samples.
self._in_train_input_fn = input_fn_constructor(in_train_data,
self._in_train_labels)
self._out_train_input_fn = input_fn_constructor(out_train_data,
self._out_train_labels)
self._estimator = estimator
self._slicing_spec = slicing_spec
self._attack_types = attack_types
self._tensorboard_merge_classifiers = tensorboard_merge_classifiers
if tensorboard_dir:
if tensorboard_merge_classifiers:
self._writers = {}
with tf.Graph().as_default():
for attack_type in attack_types:
self._writers[attack_type.name] = tf.summary.FileWriter(
os.path.join(tensorboard_dir, 'MI', attack_type.name))
else:
with tf.Graph().as_default():
self._writers = tf.summary.FileWriter(
os.path.join(tensorboard_dir, 'MI'))
logging.info('Will write to tensorboard.')
else:
self._writers = None
def end(self, session):
results = run_attack_helper(self._estimator,
self._in_train_input_fn,
self._out_train_input_fn,
self._in_train_labels, self._out_train_labels,
self._slicing_spec,
self._attack_types)
logging.info(results)
att_types, att_slices, att_metrics, att_values = get_flattened_attack_metrics(
results)
print('Attack result:')
print('\n'.join([' %s: %.4f' % (', '.join([s, t, m]), v) for t, s, m, v in
zip(att_types, att_slices, att_metrics, att_values)]))
# Write to tensorboard if tensorboard_dir is specified
global_step = self._estimator.get_variable_value('global_step')
if self._writers is not None:
write_results_to_tensorboard(results, self._writers, global_step,
self._tensorboard_merge_classifiers)
def run_attack_on_tf_estimator_model(
estimator, in_train, out_train,
input_fn_constructor,
slicing_spec: SlicingSpec = None,
attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,)):
"""Performs the attack at the end of training.
Args:
estimator: model to be tested
in_train: (in_training samples, in_training labels)
out_train: (out_training samples, out_training labels)
input_fn_constructor: a function that receives samples and labels and
constructs the input_fn for model prediction
slicing_spec: slicing specification of the attack
attack_types: a list of attacks, each of type AttackType
Returns:
Results of the attack
"""
in_train_data, in_train_labels = in_train
out_train_data, out_train_labels = out_train
# Define the input functions for both in and out-training samples.
in_train_input_fn = input_fn_constructor(in_train_data, in_train_labels)
out_train_input_fn = input_fn_constructor(out_train_data, out_train_labels)
# Call the helper to run the attack.
results = run_attack_helper(estimator,
in_train_input_fn, out_train_input_fn,
in_train_labels, out_train_labels,
slicing_spec,
attack_types)
logging.info('End of training attack:')
logging.info(results)
return results
def run_attack_helper(
estimator,
in_train_input_fn, out_train_input_fn,
in_train_labels, out_train_labels,
slicing_spec: SlicingSpec = None,
attack_types: Iterable[AttackType] = (AttackType.THRESHOLD_ATTACK,)):
"""A helper function to perform the attack.
Args:
estimator: model to be tested
in_train_input_fn: input_fn for in training data
out_train_input_fn: input_fn for out of training data
in_train_labels: in training labels
out_train_labels: out of training labels
slicing_spec: slicing specification of the attack
attack_types: a list of attacks, each of type AttackType
Returns:
Results of the attack
"""
# Compute predictions and losses
in_train_pred, in_train_loss = calculate_losses(estimator,
in_train_input_fn,
in_train_labels)
out_train_pred, out_train_loss = calculate_losses(estimator,
out_train_input_fn,
out_train_labels)
attack_input = AttackInputData(
logits_train=in_train_pred, logits_test=out_train_pred,
labels_train=in_train_labels, labels_test=out_train_labels,
loss_train=in_train_loss, loss_test=out_train_loss
)
results = mia.run_attacks(attack_input,
slicing_spec=slicing_spec,
attack_types=attack_types)
return results
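
`calculate_losses` assumes that the estimator returns softmax probabilities and that the loss is cross-entropy. As a rough illustration of that assumption (not the implementation of `utils.log_loss`, whose body is not shown in this diff), the per-sample loss is the negative log of the probability assigned to the true class. The function name `per_sample_cross_entropy` and the `eps` clipping value are hypothetical.

```python
import math
from typing import List, Sequence


def per_sample_cross_entropy(labels: Sequence[int],
                             probs: Sequence[Sequence[float]],
                             eps: float = 1e-12) -> List[float]:
    """Cross-entropy of each sample given integer labels and probabilities."""
    # Clip at eps (an assumed value) so a zero probability does not hit log(0).
    return [-math.log(max(p[y], eps)) for y, p in zip(labels, probs)]


# Hypothetical softmax outputs for two samples over two classes.
example_losses = per_sample_cross_entropy([0, 1], [[0.5, 0.5], [0.25, 0.75]])
```

Samples seen during training tend to receive a lower loss than held-out samples, which is the kind of signal the attacks listed in `attack_types` look for.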


@ -21,11 +21,11 @@ from absl import logging
import numpy as np
import tensorflow.compat.v1 as tf
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.membership_inference_attack.tf_estimator_evaluation import MembershipInferenceTrainingHook
from tensorflow_privacy.privacy.membership_inference_attack.tf_estimator_evaluation import run_attack_on_tf_estimator_model
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import SlicingSpec
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.tf_estimator_evaluation import MembershipInferenceTrainingHook
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.tf_estimator_evaluation import run_attack_on_tf_estimator_model
FLAGS = flags.FLAGS


@ -13,17 +13,17 @@
# limitations under the License.
# Lint as: python3
"""Tests for tensorflow_privacy.privacy.membership_inference_attack.tf_estimator_evaluation."""
"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.tf_estimator_evaluation."""
from absl.testing import absltest
import numpy as np
import tensorflow.compat.v1 as tf
from tensorflow_privacy.privacy.membership_inference_attack import tf_estimator_evaluation
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import tf_estimator_evaluation
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackType
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics
class UtilsTest(absltest.TestCase):


@ -19,8 +19,8 @@ from typing import Union
import tensorflow as tf2
import tensorflow.compat.v1 as tf1
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.membership_inference_attack.data_structures import get_flattened_attack_metrics
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import AttackResults
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import get_flattened_attack_metrics
def write_to_tensorboard(writers, tags, values, step):


@ -13,12 +13,12 @@
# limitations under the License.
# Lint as: python3
"""Tests for tensorflow_privacy.privacy.membership_inference_attack.utils."""
"""Tests for tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.utils."""
from absl.testing import absltest
import numpy as np
from tensorflow_privacy.privacy.membership_inference_attack import utils
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import utils
class UtilsTest(absltest.TestCase):