## Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets

This directory contains code to reproduce results from the paper:

**"Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets"**<br>
https://arxiv.org/abs/2204.00032 <br>
by Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong and Nicholas Carlini

### INSTALLING
The experiments in this directory are built on top of the
[LiRA membership inference attack](../mi_lira_2021).

After following the [installation instructions](../mi_lira_2021#installing) for
LiRA, make sure the attack code is on your `PYTHONPATH`:

```bash
export PYTHONPATH="${PYTHONPATH}:../mi_lira_2021"
```
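If the path is set correctly, the `inference` and `score` modules invoked in
step 2 below should be importable. Here is an optional sanity check, a minimal
sketch that assumes only those two module names:

```python
# Optional sanity check: verify the LiRA code is reachable on PYTHONPATH.
import importlib.util

for mod in ("inference", "score"):
    ok = importlib.util.find_spec(mod) is not None
    print(f"{mod}: {'found' if ok else 'MISSING -- check PYTHONPATH'}")
```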
### RUNNING THE CODE

#### 1. Train the models
The first step in our attack is to train shadow models, with some data points
targeted by a poisoning attack. You can train 16 shadow models with the command

> bash scripts/train_demo.sh

or if you have multiple GPUs on your machine and want to train these models in
parallel, then modify and run

> bash scripts/train_demo_multigpu.sh
This will train 16 CIFAR-10 wide ResNet models to ~91% accuracy each, with 250
points targeted for poisoning. For each of these 250 targeted points, the
attacker adds 8 mislabeled poisoned copies of the point into the training set.
The training run will output a bunch of files under the directory `exp/cifar10`
with the following structure:

```
exp/cifar10/
- xtrain.npy
- ytrain.npy
- poison_pos.npy
- experiment_N_of_16
-- hparams.json
-- keep.npy
-- ckpt/
--- 0000000100.npz
-- tb/
```
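For intuition, the label-flipping poisoning described above (8 mislabeled
copies of each of the 250 targeted points, whose indices are recorded in
`poison_pos.npy`) could be sketched as follows. This is hypothetical standalone
code with illustrative names, not the repo's actual training pipeline:

```python
import numpy as np

def add_label_flip_poisons(x_train, y_train, num_targets=250, reps=8,
                           num_classes=10, seed=0):
    """Append `reps` mislabeled copies of each targeted point (a sketch)."""
    rng = np.random.RandomState(seed)
    # Choose which training points to target (cf. poison_pos.npy above).
    poison_pos = rng.choice(len(x_train), size=num_targets, replace=False)

    # Replicate each targeted point `reps` times...
    x_poison = np.repeat(x_train[poison_pos], reps, axis=0)
    # ...and give every copy of a point the same incorrect label.
    wrong = (y_train[poison_pos] +
             rng.randint(1, num_classes, size=num_targets)) % num_classes
    y_poison = np.repeat(wrong, reps)

    return (np.concatenate([x_train, x_poison]),
            np.concatenate([y_train, y_poison]), poison_pos)
```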
The following flags control the poisoning attack:

- `num_poison_targets (default=250)`. The number of targeted points.
- `poison_reps (default=8)`. The number of mislabeled replicas added per
targeted point.
- `poison_pos_seed (default=0)`. The random seed used to choose the targeted
points.

We recommend keeping `num_poison_targets * poison_reps < 5000` on CIFAR-10, as
otherwise the poisons introduce too much label noise and the model's accuracy
(and the attack's success rate) will be degraded.
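With the default flags this budget is comfortably met: 250 targets with 8
replicas each is 2000 poisoned points, or 4% of CIFAR-10's 50,000 training
images. A quick check:

```python
num_poison_targets, poison_reps = 250, 8       # the default flag values
budget = num_poison_targets * poison_reps      # 2000 poisoned copies
assert budget < 5000, "too much label noise: expect degraded accuracy"
print(f"label-noise fraction: {budget / 50_000:.1%}")  # -> 4.0%
```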
#### 2. Perform inference and compute scores

Exactly as for LiRA, we then evaluate the models on the entire CIFAR-10 dataset
and generate logit-scaled membership inference scores. See
[here](../mi_lira_2021#2-perform-inference) and
[here](../mi_lira_2021#3-compute-membership-inference-scores) for details.

```bash
python3 -m inference --logdir=exp/cifar10/
python3 -m score exp/cifar10/
```
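The logit scaling referenced here maps the model's confidence p in the true
class to log(p / (1 - p)), which makes the per-example score distributions
approximately Gaussian. A minimal sketch of that transformation, assuming
softmax probabilities and integer labels (the repo's score step may differ in
numerical details):

```python
import numpy as np

def logit_scale(probs, labels, eps=1e-45):
    """log(p / (1 - p)) for the true-class probability p (a sketch)."""
    p = probs[np.arange(len(labels)), labels]  # confidence in the true class
    return np.log(p + eps) - np.log(1 - p + eps)
```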
### PLOTTING THE RESULTS

Finally, we can generate pretty pictures by running the plotting code

```bash
python3 plot_poison.py
```

which should give (something like) the following output:

![Log-log ROC Curve for all attacks](fprtpr.png "Log-log ROC Curve")
```
Attack No poison (LiRA)
   AUC 0.7025, Accuracy 0.6258, TPR@0.1%FPR of 0.0544
Attack No poison (Global threshold)
   AUC 0.6191, Accuracy 0.6173, TPR@0.1%FPR of 0.0012
Attack With poison (LiRA)
   AUC 0.9943, Accuracy 0.9653, TPR@0.1%FPR of 0.4945
Attack With poison (Global threshold)
   AUC 0.9922, Accuracy 0.9603, TPR@0.1%FPR of 0.3930
```
where the baselines are LiRA and a simple global threshold on the membership
scores, both without poisoning. With poisoning, both LiRA and the global
threshold attack are boosted significantly. Note that because we only train a
few models, we use the fixed-variance variant of LiRA.
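Concretely, the fixed-variance variant fits per-example Gaussian means from the
shadow models' IN and OUT scores but shares a single standard deviation across
examples, which is more stable when few shadow models are available. A hedged
sketch with illustrative names (not the repo's `plot_poison.py`):

```python
import numpy as np
from scipy.stats import norm

def lira_fixed_variance(target_score, in_scores, out_scores, global_std):
    """Likelihood-ratio test: per-example means, shared std (a sketch)."""
    log_in = norm.logpdf(target_score, loc=np.mean(in_scores),
                         scale=global_std)
    log_out = norm.logpdf(target_score, loc=np.mean(out_scores),
                          scale=global_std)
    return log_in - log_out  # larger => more likely a training member
```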
### Citation
You can cite this paper with

```
@article{tramer2022truth,
  title={Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets},
  author={Tramer, Florian and Shokri, Reza and San Joaquin, Ayrton and Le, Hoang and Jagielski, Matthew and Hong, Sanghyun and Carlini, Nicholas},
  journal={arXiv preprint arXiv:2204.00032},
  year={2022}
}
```