
Implementation of an RDP privacy accountant and smooth sensitivity analysis for the PATE framework. The underlying theory and supporting experiments appear in "Scalable Private Learning with PATE" by Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, Ulfar Erlingsson (ICLR 2018, https://arxiv.org/abs/1802.08908).

Overview

The PATE ('Private Aggregation of Teacher Ensembles') framework was introduced by Papernot et al. in "Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data" (ICLR 2017, https://arxiv.org/abs/1610.05755). The framework enables model-agnostic training that provably provides differential privacy of the training dataset.

The framework consists of the teachers, the student model, and the aggregator. The teachers are models trained on disjoint subsets of the training dataset. The student model has access to an insensitive (e.g., public) unlabelled dataset, which it labels by querying the ensemble of teachers through the aggregator. The aggregator tallies the outputs of the teacher models and either forwards a (noisy) aggregate to the student or refuses to answer.
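A minimal sketch of the aggregator's two possible responses follows. It assumes the Gaussian noisy-max (GNMax) aggregator and, for the refusal path, a noisy threshold check on the plurality count in the style of the Confident-GNMax variant from the ICLR 2018 paper. The helpers gnmax_aggregate and confident_gnmax, and their parameters, are hypothetical illustrations, not the API of core.py.

```python
import numpy as np


def gnmax_aggregate(votes, num_classes, sigma, rng):
    """Noisy argmax of the teachers' vote histogram (hypothetical helper)."""
    # Tally how many teachers predicted each class.
    counts = np.bincount(votes, minlength=num_classes).astype(float)
    # Perturb every count with independent N(0, sigma^2) noise, release the argmax.
    noisy_counts = counts + rng.normal(scale=sigma, size=num_classes)
    return int(np.argmax(noisy_counts))


def confident_gnmax(votes, num_classes, threshold, sigma1, sigma2, rng):
    """Answer only when the noisy plurality count clears the threshold."""
    counts = np.bincount(votes, minlength=num_classes)
    # Noisy threshold check: add N(0, sigma1^2) to the largest count.
    if counts.max() + rng.normal(scale=sigma1) < threshold:
        return None  # the aggregator refuses to answer this query
    # Otherwise label the query with GNMax at noise scale sigma2.
    return gnmax_aggregate(votes, num_classes, sigma2, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 250 hypothetical teachers, 230 of which agree on class 3.
    votes = np.array([3] * 230 + [1] * 20)
    print(confident_gnmax(votes, 10, threshold=150, sigma1=40.0, sigma2=20.0, rng=rng))
```

The threshold check and the final argmax draw fresh, independent noise, so a query that passes the check is still answered through a separate noisy release.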

Differential privacy is enforced by the aggregator. The privacy guarantees can be data-independent, meaning that they are a function solely of the aggregator's parameters. Alternatively, the privacy analysis can be data-dependent, which allows finer reasoning: under certain conditions on the input distribution (for instance, strong agreement among the teachers), the final privacy guarantees can be improved relative to the data-independent analysis. Data-dependent privacy guarantees may themselves be a function of sensitive data, so publishing them requires its own sanitization procedure. In our case, sanitization of data-dependent privacy guarantees proceeds via smooth sensitivity analysis.
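To illustrate why teacher agreement matters, the sketch below computes a union-bound estimate of q, the probability that GNMax with noise scale sigma returns a class other than the plurality class. The data-dependent analysis is driven by a bound of this kind; the sketch stops at q and does not derive the full data-dependent RDP bound. The function name q_upper_bound is hypothetical.

```python
import numpy as np
from scipy.special import erfc


def q_upper_bound(counts, sigma):
    """Union bound on Pr[GNMax does not return the plurality class].

    For Z_j ~ N(0, sigma^2) i.i.d., Z_i - Z_top ~ N(0, 2 * sigma^2), so
    Pr[n_i + Z_i > n_top + Z_top] = 0.5 * erfc((n_top - n_i) / (2 * sigma)).
    Summing over the runner-up classes gives the union bound.
    """
    counts = np.asarray(counts, dtype=float)
    top = int(np.argmax(counts))
    gaps = counts[top] - np.delete(counts, top)
    q = 0.5 * float(np.sum(erfc(gaps / (2.0 * sigma))))
    # The union bound can exceed 1; a probability cannot.
    return min(q, 1.0)


if __name__ == "__main__":
    print(q_upper_bound([230, 20, 0], sigma=40.0))  # strong consensus
    print(q_upper_bound([130, 120, 0], sigma=40.0))  # near tie
```

With strong consensus the gaps are large and q is tiny, whereas a near tie drives q toward 1/2; the former is the regime in which a data-dependent analysis can improve on the data-independent one.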

The common machinery used for all privacy analyses in this repository is Rényi differential privacy, or RDP (see https://arxiv.org/abs/1702.07476).
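Because RDP composes additively across queries, a data-independent accountant for GNMax fits in a few lines. The sketch below assumes the per-query bound eps(lambda) = lambda / sigma**2 for GNMax with Gaussian noise scale sigma (as stated in the ICLR 2018 paper) and the RDP-to-(epsilon, delta) conversion from the RDP paper linked above. The function name and the default list of orders are hypothetical choices, not those of core.py.

```python
import math


def gnmax_eps_data_independent(num_queries, sigma, delta, orders=None):
    """(epsilon, delta)-DP bound for num_queries answers released by GNMax.

    Per query, GNMax is assumed to satisfy (lambda, lambda / sigma**2)-RDP.
    RDP at a fixed order adds up across queries, and the total converts to
    eps = rdp + log(1 / delta) / (lambda - 1); we keep the best order.
    """
    if orders is None:
        # Hypothetical grid; every order must be strictly greater than 1.
        orders = [1.25, 1.5, 2, 3, 4, 5, 8, 16, 32, 64, 128, 256]
    best_eps, best_order = math.inf, None
    for lam in orders:
        rdp = num_queries * lam / sigma**2  # additive composition of RDP
        eps = rdp + math.log(1.0 / delta) / (lam - 1.0)
        if eps < best_eps:
            best_eps, best_order = eps, lam
    return best_eps, best_order


if __name__ == "__main__":
    print(gnmax_eps_data_independent(1000, sigma=40.0, delta=1e-5))
```

The function returns both the smallest epsilon over the listed orders and the order that achieves it; adding more orders to the grid can only keep or tighten the result.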

This repository contains implementations of privacy accountants and smooth sensitivity analysis for several data-independent and data-dependent mechanisms that together comprise the PATE framework.

Requirements

  • Python, version ≥ 2.7
  • absl (install with pip install absl-py)
  • numpy
  • scipy
  • sympy (for smooth sensitivity analysis)
  • unittest (for testing)

Self-testing

To verify the installation, run

$ python core_test.py
$ python smooth_sensitivity_test.py

Files in this directory

  • core.py — RDP privacy accountant for several vote aggregators (GNMax, Threshold, Laplace).

  • smooth_sensitivity.py — Smooth sensitivity analysis for GNMax and Threshold mechanisms.

  • core_test.py and smooth_sensitivity_test.py — Unit tests for the files above.

Contact information

You may direct your comments to mironov@google.com and pull requests to @ilyamironov.