## Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets
This directory contains code to reproduce results from the paper:
**"Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets"**
https://arxiv.org/abs/2204.00032
by Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong and Nicholas Carlini
### INSTALLING
The experiments in this directory are built on top of the
[LiRA membership inference attack](../mi_lira_2021).
After following the [installation instructions](../mi_lira_2021#installing) for
LiRA, make sure the attack code is on your `PYTHONPATH` (the relative path below
assumes you run the commands from this directory):
```bash
export PYTHONPATH="${PYTHONPATH}:../mi_lira_2021"
```
### RUNNING THE CODE
#### 1. Train the models
The first step in our attack is to train shadow models, with some data points
targeted by a poisoning attack. You can train 16 shadow models with the command

```bash
bash scripts/train_demo.sh
```

or, if you have multiple GPUs on your machine and want to train these models in
parallel, modify and run

```bash
bash scripts/train_demo_multigpu.sh
```

This will train 16 CIFAR-10 wide ResNet models to ~91% accuracy each, with 250
points targeted for poisoning. For each of these 250 targeted points, the
attacker adds 8 mislabeled poisoned copies of the point into the training set.
The training run writes its output files under the directory `exp/cifar10/`,
with the following structure:
```
exp/cifar10/
- xtrain.npy
- ytrain.npy
- poison_pos.npy
- experiment_N_of_16
-- hparams.json
-- keep.npy
-- ckpt/
--- 0000000100.npz
-- tb/
```
The following flags control the poisoning attack:
- `num_poison_targets` (default: 250). The number of targeted points.
- `poison_reps` (default: 8). The number of mislabeled copies added to the
  training set for each targeted point.
- `poison_pos_seed` (default: 0). The random seed used to choose the targeted
  points.
We recommend that `num_poison_targets * poison_reps < 5000` on CIFAR-10, as
otherwise the poisons introduce too much label noise and the model's accuracy
(and the attack's success rate) will be degraded.
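To make the roles of these flags concrete, here is a minimal NumPy sketch of how such a poisoned training set could be assembled. It is not the code in this directory: the function name `build_poisoned_set` is hypothetical, and it assumes that all copies of a target share one randomly chosen incorrect label. The label-flipping rule used by the actual training scripts may differ.

```python
import numpy as np


def build_poisoned_set(x_train, y_train, num_poison_targets=250, poison_reps=8,
                       poison_pos_seed=0, num_classes=10):
    """Hypothetical sketch: append mislabeled copies of randomly chosen targets.

    Returns the poisoned inputs, poisoned labels, and the indices of the targets.
    """
    # Enforce the recommendation above: too many poisons add too much label noise.
    total_poisons = num_poison_targets * poison_reps
    if total_poisons >= 5000:
        raise ValueError(
            f"{total_poisons} poisons; keep num_poison_targets * poison_reps < 5000")

    rng = np.random.default_rng(poison_pos_seed)
    # Pick the targeted points without replacement, seeded by poison_pos_seed.
    poison_pos = rng.choice(len(x_train), size=num_poison_targets, replace=False)

    # Assumption: every copy of a target gets the same randomly chosen wrong label.
    # Adding an offset in [1, num_classes - 1] modulo num_classes never returns y.
    offsets = rng.integers(1, num_classes, size=num_poison_targets)
    wrong_labels = (y_train[poison_pos] + offsets) % num_classes

    # Replicate each target poison_reps times and append to the clean data.
    x_poison = np.repeat(x_train[poison_pos], poison_reps, axis=0)
    y_poison = np.repeat(wrong_labels, poison_reps)
    x_out = np.concatenate([x_train, x_poison])
    y_out = np.concatenate([y_train, y_poison])
    return x_out, y_out, poison_pos
```

With the default flags, this adds 250 × 8 = 2,000 poisoned examples, below the recommended bound of 5,000.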
#### 2. Perform inference and compute scores
Exactly as for LiRA, we then evaluate the models on the entire CIFAR-10 dataset
and generate logit-scaled membership inference scores. See
[here](../mi_lira_2021#2-perform-inference) and
[here](../mi_lira_2021#3-compute-membership-inference-scores) for details.
```bash
python3 -m inference --logdir=exp/cifar10/
python3 -m score exp/cifar10/
```
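LiRA's logit-scaled score is `log(p / (1 - p))`, where `p` is the model's softmax confidence on the true label. Because the softmax normalizer cancels, this equals `z_y - logsumexp_{j != y} z_j` for raw logits `z`, which avoids overflow and `log(0)`. The sketch below illustrates that identity for a single query per example; it is not the `score.py` implementation (which also handles several augmented queries per example), and the name `logit_scaled_scores` is hypothetical.

```python
import numpy as np


def logit_scaled_scores(logits, labels):
    """Hypothetical helper: log(p_y / (1 - p_y)) per row, computed from raw logits.

    logits: (n, num_classes) array of pre-softmax outputs.
    labels: (n,) array of true class indices.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    rows = np.arange(logits.shape[0])
    z_true = logits[rows, labels]

    # Exclude the true class, then take a stable log-sum-exp over the others.
    others = logits.copy()
    others[rows, labels] = -np.inf
    m = others.max(axis=1, keepdims=True)
    lse_others = m[:, 0] + np.log(np.exp(others - m).sum(axis=1))

    # The softmax normalizer cancels: log(p_y / (1 - p_y)) = z_y - logsumexp_{j != y} z_j.
    return z_true - lse_others


# Row 0: confident and correct (positive score). Row 1: another class is preferred (negative).
print(logit_scaled_scores([[4.0, 0.0, -1.0], [0.0, 3.0, 0.0]], [0, 0]))
```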
### PLOTTING THE RESULTS
Finally, to generate the plots, run the plotting code
```bash
python3 plot_poison.py
```
which should produce output similar to the following:
![Log-log ROC Curve for all attacks](fprtpr.png "Log-log ROC Curve")
```
Attack No poison (LiRA)
AUC 0.6992, Accuracy 0.6240, TPR@0.1%FPR of 0.0529
Attack No poison (Global threshold)
AUC 0.6200, Accuracy 0.6167, TPR@0.1%FPR of 0.0011
Attack With poison (LiRA)
AUC 0.9904, Accuracy 0.9617, TPR@0.1%FPR of 0.3730
Attack With poison (Global threshold)
AUC 0.9911, Accuracy 0.9580, TPR@0.1%FPR of 0.2130
```
where the baselines are LiRA and a simple global threshold on the membership
scores, both without poisoning. With poisoning, both attacks improve
substantially: the TPR at 0.1% FPR rises from 5.3% to 37.3% for LiRA and from
0.11% to 21.3% for the global threshold attack, and both reach an AUC of about
0.99. Note that because we only train a few shadow models, we use the
fixed-variance variant of LiRA.
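The two attacks compared above can be sketched as follows. Both start from logit-scaled scores: the global threshold attack uses the target model's score directly, while fixed-variance LiRA compares it against per-example Gaussians fit to the shadow models' IN and OUT scores, sharing a single standard deviation per Gaussian across all examples. This is a simplified NumPy illustration under our own assumptions (the array shapes, estimating the shared variance by pooling residuals, one query per example); all names are hypothetical, and `plot_poison.py` may differ in its details. The demo at the end uses synthetic data only.

```python
import numpy as np


def gaussian_logpdf(x, mean, std):
    """Log-density of a normal distribution, elementwise."""
    return -0.5 * np.log(2 * np.pi * std**2) - (x - mean) ** 2 / (2 * std**2)


def lira_fixed_variance(shadow_scores, shadow_keep, target_scores):
    """Hypothetical sketch of fixed-variance LiRA.

    shadow_scores: (num_shadow, n) logit-scaled scores from the shadow models.
    shadow_keep:   (num_shadow, n) bool, True if the example was in that model's training set.
    target_scores: (n,) logit-scaled scores from the model under attack.
    Returns (n,) scores; higher means "more likely a member".
    """
    keep = np.asarray(shadow_keep, dtype=bool)
    scores_in = np.where(keep, shadow_scores, np.nan)
    scores_out = np.where(~keep, shadow_scores, np.nan)
    # Per-example means (NaN if an example is never IN, or never OUT).
    mean_in = np.nanmean(scores_in, axis=0)
    mean_out = np.nanmean(scores_out, axis=0)
    # Fixed variance: one std per Gaussian, pooled over all examples' residuals.
    std_in = np.nanstd(scores_in - mean_in)
    std_out = np.nanstd(scores_out - mean_out)
    return (gaussian_logpdf(target_scores, mean_in, std_in)
            - gaussian_logpdf(target_scores, mean_out, std_out))


def tpr_at_fpr(scores, is_member, target_fpr=0.001):
    """TPR of the rule 'score > threshold', with the threshold chosen so FPR <= target_fpr."""
    scores = np.asarray(scores, dtype=np.float64)
    scores = np.where(np.isnan(scores), -np.inf, scores)  # NaN never counts as a member
    is_member = np.asarray(is_member, dtype=bool)
    neg = np.sort(scores[~is_member])[::-1]  # non-member scores, descending
    k = int(np.floor(target_fpr * len(neg)))  # number of false positives allowed
    threshold = neg[k] if k < len(neg) else -np.inf
    return float(np.mean(scores[is_member] > threshold))


# Synthetic demo (illustrative data, not results from the paper).
rng = np.random.default_rng(0)
num_shadow, n = 16, 2000
keep = rng.random((num_shadow + 1, n)) < 0.5  # row 0 plays the target model
difficulty = rng.normal(0.0, 1.0, size=n)  # per-example offset shared by all models
scores = difficulty + 2.0 * keep + rng.normal(0.0, 0.5, size=keep.shape)

lira = lira_fixed_variance(scores[1:], keep[1:], scores[0])
print("LiRA (fixed variance) TPR@0.1%FPR:", tpr_at_fpr(lira, keep[0]))
print("Global threshold      TPR@0.1%FPR:", tpr_at_fpr(scores[0], keep[0]))
```

In this toy setup, the per-example means let LiRA cancel the shared "difficulty" offset that a single global threshold cannot, which is why the calibrated attack usually wins at low FPR.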
### Citation
You can cite this paper with
```
@article{tramer2022truth,
title={Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets},
author={Tramer, Florian and Shokri, Reza and San Joaquin, Ayrton and Le, Hoang and Jagielski, Matthew and Hong, Sanghyun and Carlini, Nicholas},
journal={arXiv preprint arXiv:2204.00032},
year={2022}
}
```