2022-05-09 16:04:33 -06:00
|
|
|
## Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets
|
|
|
|
|
|
|
|
This directory contains code to reproduce results from the paper:
|
|
|
|
|
|
|
|
**"Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets"**<br>
|
|
|
|
https://arxiv.org/abs/2204.00032 <br>
|
|
|
|
by Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong and Nicholas Carlini
|
|
|
|
|
|
|
|
### INSTALLING
|
|
|
|
|
|
|
|
The experiments in this directory are built on top of the
|
|
|
|
[LiRA membership inference attack](../mi_lira_2021).
|
|
|
|
|
|
|
|
After following the [installation instructions](../mi_lira_2021#installing) for
|
|
|
|
LiRa, make sure the attack code is on your `PYTHONPATH`:
|
|
|
|
|
|
|
|
```bash
|
|
|
|
export PYTHONPATH="${PYTHONPATH}:../mi_lira_2021"
|
|
|
|
```
|
|
|
|
|
|
|
|
### RUNNING THE CODE
|
|
|
|
|
|
|
|
#### 1. Train the models
|
|
|
|
|
|
|
|
The first step in our attack is to train shadow models, with some data points
|
|
|
|
targeted by a poisoning attack. You can train 16 shadow models with the command
|
|
|
|
|
|
|
|
> bash scripts/train_demo.sh
|
|
|
|
|
|
|
|
or if you have multiple GPUs on your machine and want to train these models in
|
|
|
|
parallel, then modify and run
|
|
|
|
|
|
|
|
> bash scripts/train_demo_multigpu.sh
|
|
|
|
|
|
|
|
This will train 16 CIFAR-10 wide ResNet models to ~91% accuracy each, with 250
|
|
|
|
points targeted for poisoning. For each of these 250 targeted points, the
|
|
|
|
attacker adds 8 mislabeled poisoned copies of the point into the training set.
|
|
|
|
The training run will output a bunch of files under the directory exp/cifar10
|
|
|
|
with structure:
|
|
|
|
|
|
|
|
```
|
|
|
|
exp/cifar10/
|
|
|
|
- xtrain.npy
|
|
|
|
- ytain.npy
|
|
|
|
- poison_pos.npy
|
|
|
|
- experiment_N_of_16
|
|
|
|
-- hparams.json
|
|
|
|
-- keep.npy
|
|
|
|
-- ckpt/
|
|
|
|
--- 0000000100.npz
|
|
|
|
-- tb/
|
|
|
|
```
|
|
|
|
|
|
|
|
The following flags control the poisoning attack:
|
|
|
|
|
|
|
|
- `num_poison_targets (default=250)`. The number of targeted points.
|
|
|
|
- `poison_reps (default=8)`. The number of replicas per poison.
|
|
|
|
- `poison_pos_seed (default=0)`. The random seed to use to choose the target
|
|
|
|
points.
|
|
|
|
|
|
|
|
We recommend that `num_poison_targets * poison_reps < 5000` on CIFAR-10, as
|
|
|
|
otherwise the poisons introduce too much label noise and the model's accuracy
|
|
|
|
(and the attack's success rate) will be degraded.
|
|
|
|
|
|
|
|
#### 2. Perform inference and compute scores
|
|
|
|
|
|
|
|
Exactly as for LiRA, we then evaluate the models on the entire CIFAR-10 dataset,
|
|
|
|
and generate logit-scaled membership inference scores. See
|
|
|
|
[here](../mi_lira_2021#2-perform-inference) and
|
|
|
|
[here](../mi_lira_2021#3-compute-membership-inference-scores) for details.
|
|
|
|
|
|
|
|
```bash
|
|
|
|
python3 -m inference --logdir=exp/cifar10/
|
|
|
|
python3 -m score exp/cifar10/
|
|
|
|
```
|
|
|
|
|
|
|
|
### PLOTTING THE RESULTS
|
|
|
|
|
|
|
|
Finally we can generate pretty pictures, and run the plotting code
|
|
|
|
|
|
|
|
```bash
|
|
|
|
python3 plot_poison.py
|
|
|
|
```
|
|
|
|
|
|
|
|
which should give (something like) the following output
|
|
|
|
|
|
|
|
![Log-log ROC Curve for all attacks](fprtpr.png "Log-log ROC Curve")
|
|
|
|
|
|
|
|
```
|
|
|
|
Attack No poison (LiRA)
|
2023-06-13 01:57:46 -06:00
|
|
|
AUC 0.6992, Accuracy 0.6240, TPR@0.1%FPR of 0.0529
|
2022-05-09 16:04:33 -06:00
|
|
|
Attack No poison (Global threshold)
|
2023-06-13 01:57:46 -06:00
|
|
|
AUC 0.6200, Accuracy 0.6167, TPR@0.1%FPR of 0.0011
|
2022-05-09 16:04:33 -06:00
|
|
|
Attack With poison (LiRA)
|
2023-06-13 01:57:46 -06:00
|
|
|
AUC 0.9904, Accuracy 0.9617, TPR@0.1%FPR of 0.3730
|
2022-05-09 16:04:33 -06:00
|
|
|
Attack With poison (Global threshold)
|
2023-06-13 01:57:46 -06:00
|
|
|
AUC 0.9911, Accuracy 0.9580, TPR@0.1%FPR of 0.2130
|
2022-05-09 16:04:33 -06:00
|
|
|
```
|
|
|
|
|
|
|
|
where the baselines are LiRA and a simple global threshold on the membership
|
|
|
|
scores, both without poisoning. With poisoning, both LiRA and the global
|
|
|
|
threshold attack are boosted significantly. Note that because we only train a
|
|
|
|
few models, we use the fixed variance variant of LiRA.
|
|
|
|
|
|
|
|
### Citation
|
|
|
|
|
|
|
|
You can cite this paper with
|
|
|
|
|
|
|
|
```
|
|
|
|
@article{tramer2022truth,
|
|
|
|
title={Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets},
|
|
|
|
author={Tramer, Florian and Shokri, Reza and San Joaquin, Ayrton and Le, Hoang and Jagielski, Matthew and Hong, Sanghyun and Carlini, Nicholas},
|
|
|
|
journal={arXiv preprint arXiv:2204.00032},
|
|
|
|
year={2022}
|
|
|
|
}
|
|
|
|
```
|