Add tensorflow_privacy/ to list of git_files.

PiperOrigin-RevId: 274022715
Steve Chien 2019-10-10 13:08:38 -07:00 committed by A. Unique TensorFlower
parent b125e3a686
commit 6c03ce49fd
82 changed files with 0 additions and 14541 deletions

@@ -1,28 +0,0 @@
# How to Contribute
We'd love to accept your patches and contributions to this project. There are
just a few small guidelines you need to follow.
## Contributor License Agreement
Contributions to this project must be accompanied by a Contributor License
Agreement. You (or your employer) retain the copyright to your contribution;
this simply gives us permission to use and redistribute your contributions as
part of the project. Head over to <https://cla.developers.google.com/> to see
your current agreements on file or to sign a new one.
You generally only need to submit a CLA once, so if you've already submitted one
(even if it was for a different project), you probably don't need to do it
again.
## Code reviews
All submissions, including submissions by project members, require review. We
use GitHub pull requests for this purpose. Consult
[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
information on using pull requests.
## Community Guidelines
This project follows Google's
[Open Source Community Guidelines](https://opensource.google.com/conduct/).

@@ -1,202 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2018, The TensorFlow Privacy Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

@@ -1,113 +0,0 @@
# TensorFlow Privacy
This repository contains the source code for TensorFlow Privacy, a Python
library that includes implementations of TensorFlow optimizers for training
machine learning models with differential privacy. The library comes with
tutorials and analysis tools for computing the privacy guarantees provided.
The TensorFlow Privacy library is under continual development, always welcoming
contributions. In particular, we always welcome help towards resolving the
issues currently open.
## Setting up TensorFlow Privacy
### Dependencies
This library uses [TensorFlow](https://www.tensorflow.org/) to define machine
learning models. Therefore, installing TensorFlow (>= 1.14) is a prerequisite.
You can find instructions [here](https://www.tensorflow.org/install/). For
better performance, it is also recommended to install TensorFlow with GPU
support (detailed instructions on how to do this are available in the TensorFlow
installation documentation).
In addition to TensorFlow and its dependencies, other prerequisites are:
* `scipy` >= 0.17
* `mpmath` (for testing)
* `tensorflow_datasets` (for the RNN tutorial `lm_dpsgd_tutorial.py` only)
### Installing TensorFlow Privacy
First, clone this GitHub repository into a directory of your choice:
```
git clone https://github.com/tensorflow/privacy
```
You can then install the local package in "editable" mode in order to add it to
your `PYTHONPATH`:
```
cd privacy
pip install -e .
```
If you'd like to make contributions, we recommend first forking the repository
and then cloning your fork rather than cloning this repository directly.
## Contributing
Contributions are welcome! Bug fixes and new features can be initiated through
GitHub pull requests. To speed up the code review process, we ask that:
* When making code contributions to TensorFlow Privacy, you follow the `PEP8
with two spaces` coding style (the same as the one used by TensorFlow) in
your pull requests. In most cases this can be done by running `autopep8 -i
--indent-size 2 <file>` on the files you have edited.
* You should also check your code with pylint and TensorFlow's pylint
[configuration file](https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/tools/ci_build/pylintrc)
by running `pylint --rcfile=/path/to/the/tf/rcfile <edited file.py>`.
* When making your first pull request, you must
[sign the Google CLA](https://cla.developers.google.com/clas).
* We do not accept pull requests that add git submodules because of
[the problems that arise when maintaining git submodules](https://medium.com/@porteneuve/mastering-git-submodules-34c65e940407)
## Tutorials directory
To help you get started with the functionalities provided by this library, we
provide a detailed walkthrough [here](tutorials/walkthrough/walkthrough.md) that
will teach you how to wrap existing optimizers
(e.g., SGD, Adam, ...) into their differentially private counterparts using
TensorFlow (TF) Privacy. You will also learn how to tune the parameters
introduced by differentially private optimization and how to
measure the privacy guarantees provided using analysis tools included in TF
Privacy.
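As a taste of what the walkthrough covers, here is a minimal sketch of swapping
a vanilla optimizer for its differentially private counterpart. It assumes the
`DPGradientDescentGaussianOptimizer` class exported by
`privacy.optimizers.dp_optimizer` (the module referenced by
`compute_dp_sgd_privacy.py`); the hyperparameter values are illustrative only.
```
from tensorflow_privacy.privacy.optimizers import dp_optimizer

# DP-SGD: clip each microbatch gradient, add Gaussian noise, then average.
optimizer = dp_optimizer.DPGradientDescentGaussianOptimizer(
    l2_norm_clip=1.0,      # bound on each microbatch gradient's l2 norm
    noise_multiplier=1.1,  # noise stddev as a multiple of l2_norm_clip
    num_microbatches=256,  # number of microbatches per minibatch
    learning_rate=0.15)

# Train as usual, but supply a per-example (vector) loss so gradients can be
# clipped per microbatch before they are averaged and noised.
```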
In addition, the
`tutorials/` folder comes with scripts demonstrating how to use the library
features. The list of tutorials is described in the README included in the
tutorials directory.
NOTE: the tutorials are maintained carefully. However, they are not considered
part of the API and can change at any time without warning. You should not
write third-party code that imports the tutorials and expect the interface to
remain stable.
## Research directory
This folder contains code to reproduce results from research papers related to
privacy in machine learning. It is not maintained as carefully as the tutorials
directory, but rather intended as a convenient archive.
## Remarks
The content of this repository supersedes the `research/differential_privacy`
folder of the tensorflow/models
[repository](https://github.com/tensorflow/models/tree/master/research/differential_privacy).
## Contacts
If you have any questions that cannot be addressed by raising an issue, feel
free to contact:
* Galen Andrew (@galenmandrew)
* Steve Chien (@schien1729)
* Nicolas Papernot (@npapernot)
## Copyright
Copyright 2019 - Google LLC

@@ -1,14 +0,0 @@
package(default_visibility = ["//visibility:public"])
licenses(["notice"])
exports_files(["LICENSE"])
# This is here for backwards compatibility. New BUILD rules should depend on
# //third_party/py/tensorflow_privacy:privacy directly.
py_library(
name = "privacy",
deps = [
"//third_party/py/tensorflow_privacy",
],
)

@@ -1,97 +0,0 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Command-line script for computing privacy of a model trained with DP-SGD.
The script applies the RDP accountant to estimate the privacy budget of an
iterated Sampled Gaussian Mechanism. The mechanism's parameters are controlled
by flags.
Example:
compute_dp_sgd_privacy \
--N=60000 \
--batch_size=256 \
--noise_multiplier=1.12 \
--epochs=60 \
--delta=1e-5
The output states that DP-SGD with these parameters satisfies (2.92, 1e-5)-DP.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import sys
from absl import app
from absl import flags
# Opting out of loading all sibling packages and their dependencies.
sys.skip_tf_privacy_import = True
from tensorflow_privacy.privacy.analysis.rdp_accountant import compute_rdp # pylint: disable=g-import-not-at-top
from tensorflow_privacy.privacy.analysis.rdp_accountant import get_privacy_spent
FLAGS = flags.FLAGS
flags.DEFINE_integer('N', None, 'Total number of examples')
flags.DEFINE_integer('batch_size', None, 'Batch size')
flags.DEFINE_float('noise_multiplier', None, 'Noise multiplier for DP-SGD')
flags.DEFINE_float('epochs', None, 'Number of epochs (may be fractional)')
flags.DEFINE_float('delta', 1e-6, 'Target delta')
flags.mark_flag_as_required('N')
flags.mark_flag_as_required('batch_size')
flags.mark_flag_as_required('noise_multiplier')
flags.mark_flag_as_required('epochs')
def apply_dp_sgd_analysis(q, sigma, steps, orders, delta):
"""Compute and print results of DP-SGD analysis."""
# compute_rdp requires that sigma be the ratio of the standard deviation of
# the Gaussian noise to the l2-sensitivity of the function to which it is
# added. Hence, sigma here corresponds to the `noise_multiplier` parameter
# in the DP-SGD implementation found in privacy.optimizers.dp_optimizer
rdp = compute_rdp(q, sigma, steps, orders)
eps, _, opt_order = get_privacy_spent(orders, rdp, target_delta=delta)
print('DP-SGD with sampling rate = {:.3g}% and noise_multiplier = {} iterated'
' over {} steps satisfies'.format(100 * q, sigma, steps), end=' ')
print('differential privacy with eps = {:.3g} and delta = {}.'.format(
eps, delta))
print('The optimal RDP order is {}.'.format(opt_order))
if opt_order == max(orders) or opt_order == min(orders):
print('The privacy estimate is likely to be improved by expanding '
'the set of orders.')
def main(argv):
del argv # argv is not used.
q = FLAGS.batch_size / FLAGS.N # q - the sampling ratio.
if q > 1:
raise app.UsageError('N must be larger than the batch size.')
orders = ([1.25, 1.5, 1.75, 2., 2.25, 2.5, 3., 3.5, 4., 4.5] +
list(range(5, 64)) + [128, 256, 512])
steps = int(math.ceil(FLAGS.epochs * FLAGS.N / FLAGS.batch_size))
apply_dp_sgd_analysis(q, FLAGS.noise_multiplier, steps, orders, FLAGS.delta)
if __name__ == '__main__':
app.run(main)
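# A hedged usage sketch: the same analysis can be run directly from Python by
# calling apply_dp_sgd_analysis with the values from the module docstring
# (N=60000, batch_size=256, noise_multiplier=1.12, epochs=60, delta=1e-5) and
# the orders defined in main():
#
#   orders = ([1.25, 1.5, 1.75, 2., 2.25, 2.5, 3., 3.5, 4., 4.5] +
#             list(range(5, 64)) + [128, 256, 512])
#   apply_dp_sgd_analysis(q=256 / 60000, sigma=1.12,
#                         steps=int(math.ceil(60 * 60000 / 256)),
#                         orders=orders, delta=1e-5)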

@@ -1,257 +0,0 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""PrivacyLedger class for keeping a record of private queries."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
from distutils.version import LooseVersion
import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.analysis import tensor_buffer
from tensorflow_privacy.privacy.dp_query import dp_query
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
nest = tf.contrib.framework.nest
else:
nest = tf.nest
SampleEntry = collections.namedtuple( # pylint: disable=invalid-name
'SampleEntry', ['population_size', 'selection_probability', 'queries'])
GaussianSumQueryEntry = collections.namedtuple( # pylint: disable=invalid-name
'GaussianSumQueryEntry', ['l2_norm_bound', 'noise_stddev'])
def format_ledger(sample_array, query_array):
"""Converts array representation into a list of SampleEntries."""
samples = []
query_pos = 0
sample_pos = 0
for sample in sample_array:
population_size, selection_probability, num_queries = sample
queries = []
for _ in range(int(num_queries)):
query = query_array[query_pos]
assert int(query[0]) == sample_pos
queries.append(GaussianSumQueryEntry(*query[1:]))
query_pos += 1
samples.append(SampleEntry(population_size, selection_probability, queries))
sample_pos += 1
return samples
class PrivacyLedger(object):
"""Class for keeping a record of private queries.
The PrivacyLedger keeps a record of all queries executed over a given dataset
for the purpose of computing privacy guarantees.
"""
def __init__(self,
population_size,
selection_probability):
"""Initialize the PrivacyLedger.
Args:
population_size: An integer (may be variable) specifying the size of the
population, i.e. size of the training data used in each epoch.
selection_probability: A float (may be variable) specifying the
probability each record is included in a sample.
Raises:
ValueError: If selection_probability is 0.
"""
self._population_size = population_size
self._selection_probability = selection_probability
if tf.executing_eagerly():
if tf.equal(selection_probability, 0):
raise ValueError('Selection probability cannot be 0.')
init_capacity = tf.cast(tf.ceil(1 / selection_probability), tf.int32)
else:
if selection_probability == 0:
raise ValueError('Selection probability cannot be 0.')
init_capacity = np.int(np.ceil(1 / selection_probability))
# The query buffer stores rows corresponding to GaussianSumQueryEntries.
self._query_buffer = tensor_buffer.TensorBuffer(
init_capacity, [3], tf.float32, 'query')
self._sample_var = tf.Variable(
initial_value=tf.zeros([3]), trainable=False, name='sample')
# The sample buffer stores rows corresponding to SampleEntries.
self._sample_buffer = tensor_buffer.TensorBuffer(
init_capacity, [3], tf.float32, 'sample')
self._sample_count = tf.Variable(
initial_value=0.0, trainable=False, name='sample_count')
self._query_count = tf.Variable(
initial_value=0.0, trainable=False, name='query_count')
try:
# Newer versions of TF
self._cs = tf.CriticalSection()
except AttributeError:
# Older versions of TF
self._cs = tf.contrib.framework.CriticalSection()
def record_sum_query(self, l2_norm_bound, noise_stddev):
"""Records that a query was issued.
Args:
l2_norm_bound: The maximum l2 norm of the tensor group in the query.
noise_stddev: The standard deviation of the noise applied to the sum.
Returns:
An operation recording the sum query to the ledger.
"""
def _do_record_query():
with tf.control_dependencies(
[tf.assign(self._query_count, self._query_count + 1)]):
return self._query_buffer.append(
[self._sample_count, l2_norm_bound, noise_stddev])
return self._cs.execute(_do_record_query)
def finalize_sample(self):
"""Finalizes sample and records sample ledger entry."""
with tf.control_dependencies([
tf.assign(self._sample_var, [
self._population_size, self._selection_probability,
self._query_count
])
]):
with tf.control_dependencies([
tf.assign(self._sample_count, self._sample_count + 1),
tf.assign(self._query_count, 0)
]):
return self._sample_buffer.append(self._sample_var)
def get_unformatted_ledger(self):
return self._sample_buffer.values, self._query_buffer.values
def get_formatted_ledger(self, sess):
"""Gets the formatted query ledger.
Args:
sess: The tensorflow session in which the ledger was created.
Returns:
The query ledger as a list of SampleEntries.
"""
sample_array = sess.run(self._sample_buffer.values)
query_array = sess.run(self._query_buffer.values)
return format_ledger(sample_array, query_array)
def get_formatted_ledger_eager(self):
"""Gets the formatted query ledger.
Returns:
The query ledger as a list of SampleEntries.
"""
sample_array = self._sample_buffer.values.numpy()
query_array = self._query_buffer.values.numpy()
return format_ledger(sample_array, query_array)
class QueryWithLedger(dp_query.DPQuery):
"""A class for DP queries that record events to a PrivacyLedger.
QueryWithLedger should be the top-level query in a structure of queries that
may include sum queries, nested queries, etc. It should simply wrap another
query and contain a reference to the ledger. Any contained queries (including
those contained in the leaves of a nested query) should also contain a
reference to the same ledger object.
For example usage, see privacy_ledger_test.py.
"""
def __init__(self, query,
population_size=None, selection_probability=None,
ledger=None):
"""Initializes the QueryWithLedger.
Args:
query: The query whose events should be recorded to the ledger. Any
subqueries (including those in the leaves of a nested query) should also
contain a reference to the same ledger given here.
population_size: An integer (may be variable) specifying the size of the
population, i.e. size of the training data used in each epoch. May be
None if `ledger` is specified.
selection_probability: A float (may be variable) specifying the
probability each record is included in a sample. May be None if `ledger`
is specified.
ledger: A PrivacyLedger to use. Must be specified if either of
`population_size` or `selection_probability` is None.
"""
self._query = query
if population_size is not None and selection_probability is not None:
self.set_ledger(PrivacyLedger(population_size, selection_probability))
elif ledger is not None:
self.set_ledger(ledger)
else:
raise ValueError('One of (population_size, selection_probability) or '
'ledger must be specified.')
@property
def ledger(self):
return self._ledger
def set_ledger(self, ledger):
self._ledger = ledger
self._query.set_ledger(ledger)
def initial_global_state(self):
"""See base class."""
return self._query.initial_global_state()
def derive_sample_params(self, global_state):
"""See base class."""
return self._query.derive_sample_params(global_state)
def initial_sample_state(self, template):
"""See base class."""
return self._query.initial_sample_state(template)
def preprocess_record(self, params, record):
"""See base class."""
return self._query.preprocess_record(params, record)
def accumulate_preprocessed_record(self, sample_state, preprocessed_record):
"""See base class."""
return self._query.accumulate_preprocessed_record(
sample_state, preprocessed_record)
def merge_sample_states(self, sample_state_1, sample_state_2):
"""See base class."""
return self._query.merge_sample_states(sample_state_1, sample_state_2)
def get_noised_result(self, sample_state, global_state):
"""Ensures sample is recorded to the ledger and returns noised result."""
# Ensure sample_state is fully aggregated before calling get_noised_result.
with tf.control_dependencies(nest.flatten(sample_state)):
result, new_global_state = self._query.get_noised_result(
sample_state, global_state)
# Ensure inner queries have recorded before finalizing.
with tf.control_dependencies(nest.flatten(result)):
finalize = self._ledger.finalize_sample()
# Ensure finalizing happens.
with tf.control_dependencies([finalize]):
return nest.map_structure(tf.identity, result), new_global_state
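# A hedged usage sketch (privacy_ledger_test.py has complete examples): wrap a
# DPQuery in a QueryWithLedger so every sum query is recorded, then hand the
# formatted ledger to rdp_accountant.compute_rdp_from_ledger. The numeric
# values are illustrative only.
#
#   query = gaussian_query.GaussianSumQuery(l2_norm_clip=1.0, stddev=1.0)
#   query = QueryWithLedger(query, population_size=60000,
#                           selection_probability=256 / 60000)
#   ...  # run training steps that execute the query
#   ledger = query.ledger.get_formatted_ledger_eager()
#   rdp = rdp_accountant.compute_rdp_from_ledger(ledger, orders=range(2, 33))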

@@ -1,137 +0,0 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for PrivacyLedger."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow_privacy.privacy.analysis import privacy_ledger
from tensorflow_privacy.privacy.dp_query import gaussian_query
from tensorflow_privacy.privacy.dp_query import nested_query
from tensorflow_privacy.privacy.dp_query import test_utils
tf.enable_eager_execution()
class PrivacyLedgerTest(tf.test.TestCase):
def test_fail_on_probability_zero(self):
with self.assertRaisesRegexp(ValueError,
'Selection probability cannot be 0.'):
privacy_ledger.PrivacyLedger(10, 0)
def test_basic(self):
ledger = privacy_ledger.PrivacyLedger(10, 0.1)
ledger.record_sum_query(5.0, 1.0)
ledger.record_sum_query(2.0, 0.5)
ledger.finalize_sample()
expected_queries = [[5.0, 1.0], [2.0, 0.5]]
formatted = ledger.get_formatted_ledger_eager()
sample = formatted[0]
self.assertAllClose(sample.population_size, 10.0)
self.assertAllClose(sample.selection_probability, 0.1)
self.assertAllClose(sorted(sample.queries), sorted(expected_queries))
def test_sum_query(self):
record1 = tf.constant([2.0, 0.0])
record2 = tf.constant([-1.0, 1.0])
population_size = tf.Variable(0)
selection_probability = tf.Variable(1.0)
query = gaussian_query.GaussianSumQuery(
l2_norm_clip=10.0, stddev=0.0)
query = privacy_ledger.QueryWithLedger(
query, population_size, selection_probability)
# First sample.
tf.assign(population_size, 10)
tf.assign(selection_probability, 0.1)
test_utils.run_query(query, [record1, record2])
expected_queries = [[10.0, 0.0]]
formatted = query.ledger.get_formatted_ledger_eager()
sample_1 = formatted[0]
self.assertAllClose(sample_1.population_size, 10.0)
self.assertAllClose(sample_1.selection_probability, 0.1)
self.assertAllClose(sample_1.queries, expected_queries)
# Second sample.
tf.assign(population_size, 20)
tf.assign(selection_probability, 0.2)
test_utils.run_query(query, [record1, record2])
formatted = query.ledger.get_formatted_ledger_eager()
sample_1, sample_2 = formatted
self.assertAllClose(sample_1.population_size, 10.0)
self.assertAllClose(sample_1.selection_probability, 0.1)
self.assertAllClose(sample_1.queries, expected_queries)
self.assertAllClose(sample_2.population_size, 20.0)
self.assertAllClose(sample_2.selection_probability, 0.2)
self.assertAllClose(sample_2.queries, expected_queries)
def test_nested_query(self):
population_size = tf.Variable(0)
selection_probability = tf.Variable(1.0)
query1 = gaussian_query.GaussianAverageQuery(
l2_norm_clip=4.0, sum_stddev=2.0, denominator=5.0)
query2 = gaussian_query.GaussianAverageQuery(
l2_norm_clip=5.0, sum_stddev=1.0, denominator=5.0)
query = nested_query.NestedQuery([query1, query2])
query = privacy_ledger.QueryWithLedger(
query, population_size, selection_probability)
record1 = [1.0, [12.0, 9.0]]
record2 = [5.0, [1.0, 2.0]]
# First sample.
tf.assign(population_size, 10)
tf.assign(selection_probability, 0.1)
test_utils.run_query(query, [record1, record2])
expected_queries = [[4.0, 2.0], [5.0, 1.0]]
formatted = query.ledger.get_formatted_ledger_eager()
sample_1 = formatted[0]
self.assertAllClose(sample_1.population_size, 10.0)
self.assertAllClose(sample_1.selection_probability, 0.1)
self.assertAllClose(sorted(sample_1.queries), sorted(expected_queries))
# Second sample.
tf.assign(population_size, 20)
tf.assign(selection_probability, 0.2)
test_utils.run_query(query, [record1, record2])
formatted = query.ledger.get_formatted_ledger_eager()
sample_1, sample_2 = formatted
self.assertAllClose(sample_1.population_size, 10.0)
self.assertAllClose(sample_1.selection_probability, 0.1)
self.assertAllClose(sorted(sample_1.queries), sorted(expected_queries))
self.assertAllClose(sample_2.population_size, 20.0)
self.assertAllClose(sample_2.selection_probability, 0.2)
self.assertAllClose(sorted(sample_2.queries), sorted(expected_queries))
if __name__ == '__main__':
tf.test.main()

@@ -1,318 +0,0 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""RDP analysis of the Sampled Gaussian Mechanism.
Functionality for computing Renyi differential privacy (RDP) of an additive
Sampled Gaussian Mechanism (SGM). Its public interface consists of two methods:
compute_rdp(q, noise_multiplier, T, orders) computes RDP for SGM iterated
T times.
get_privacy_spent(orders, rdp, target_eps, target_delta) computes delta
(or eps) given RDP at multiple orders and
a target value for eps (or delta).
Example use:
Suppose that we have run an SGM applied to a function with l2-sensitivity 1.
Its parameters are given as a list of tuples (q_1, sigma_1, T_1), ...,
(q_k, sigma_k, T_k), and we wish to compute eps for a given delta.
The example code would be:
max_order = 32
orders = range(2, max_order + 1)
rdp = np.zeros_like(orders, dtype=float)
for q, sigma, T in parameters:
rdp += rdp_accountant.compute_rdp(q, sigma, T, orders)
eps, _, opt_order = rdp_accountant.get_privacy_spent(orders, rdp, target_delta=delta)
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import sys
import numpy as np
from scipy import special
import six
########################
# LOG-SPACE ARITHMETIC #
########################
def _log_add(logx, logy):
"""Add two numbers in the log space."""
a, b = min(logx, logy), max(logx, logy)
if a == -np.inf: # adding 0
return b
# Use exp(a) + exp(b) = (exp(a - b) + 1) * exp(b)
return math.log1p(math.exp(a - b)) + b # log1p(x) = log(x + 1)
def _log_sub(logx, logy):
"""Subtract two numbers in the log space. Answer must be non-negative."""
if logx < logy:
raise ValueError("The result of subtraction must be non-negative.")
if logy == -np.inf: # subtracting 0
return logx
if logx == logy:
return -np.inf # 0 is represented as -np.inf in the log space.
try:
# Use exp(x) - exp(y) = (exp(x - y) - 1) * exp(y).
return math.log(math.expm1(logx - logy)) + logy # expm1(x) = exp(x) - 1
except OverflowError:
return logx
def _log_print(logx):
"""Pretty print."""
if logx < math.log(sys.float_info.max):
return "{}".format(math.exp(logx))
else:
return "exp({})".format(logx)
def _compute_log_a_int(q, sigma, alpha):
"""Compute log(A_alpha) for integer alpha. 0 < q < 1."""
assert isinstance(alpha, six.integer_types)
# Initialize with 0 in the log space.
log_a = -np.inf
for i in range(alpha + 1):
log_coef_i = (
math.log(special.binom(alpha, i)) + i * math.log(q) +
(alpha - i) * math.log(1 - q))
s = log_coef_i + (i * i - i) / (2 * (sigma**2))
log_a = _log_add(log_a, s)
return float(log_a)
def _compute_log_a_frac(q, sigma, alpha):
"""Compute log(A_alpha) for fractional alpha. 0 < q < 1."""
# The two parts of A_alpha, integrals over (-inf,z0] and [z0, +inf), are
# initialized to 0 in the log space:
log_a0, log_a1 = -np.inf, -np.inf
i = 0
z0 = sigma**2 * math.log(1 / q - 1) + .5
while True: # do ... until loop
coef = special.binom(alpha, i)
log_coef = math.log(abs(coef))
j = alpha - i
log_t0 = log_coef + i * math.log(q) + j * math.log(1 - q)
log_t1 = log_coef + j * math.log(q) + i * math.log(1 - q)
log_e0 = math.log(.5) + _log_erfc((i - z0) / (math.sqrt(2) * sigma))
log_e1 = math.log(.5) + _log_erfc((z0 - j) / (math.sqrt(2) * sigma))
log_s0 = log_t0 + (i * i - i) / (2 * (sigma**2)) + log_e0
log_s1 = log_t1 + (j * j - j) / (2 * (sigma**2)) + log_e1
if coef > 0:
log_a0 = _log_add(log_a0, log_s0)
log_a1 = _log_add(log_a1, log_s1)
else:
log_a0 = _log_sub(log_a0, log_s0)
log_a1 = _log_sub(log_a1, log_s1)
i += 1
if max(log_s0, log_s1) < -30:
break
return _log_add(log_a0, log_a1)
def _compute_log_a(q, sigma, alpha):
"""Compute log(A_alpha) for any positive finite alpha."""
if float(alpha).is_integer():
return _compute_log_a_int(q, sigma, int(alpha))
else:
return _compute_log_a_frac(q, sigma, alpha)
def _log_erfc(x):
"""Compute log(erfc(x)) with high accuracy for large x."""
try:
return math.log(2) + special.log_ndtr(-x * 2**.5)
except NameError:
# If log_ndtr is not available, approximate as follows:
r = special.erfc(x)
if r == 0.0:
# Using the Laurent series at infinity for the tail of the erfc function:
# erfc(x) ~ exp(-x^2-.5/x^2+.625/x^4)/(x*pi^.5)
# To verify in Mathematica:
# Series[Log[Erfc[x]] + Log[x] + Log[Pi]/2 + x^2, {x, Infinity, 6}]
return (-math.log(math.pi) / 2 - math.log(x) - x**2 - .5 * x**-2 +
.625 * x**-4 - 37. / 24. * x**-6 + 353. / 64. * x**-8)
else:
return math.log(r)
def _compute_delta(orders, rdp, eps):
"""Compute delta given a list of RDP values and target epsilon.
Args:
orders: An array (or a scalar) of orders.
rdp: A list (or a scalar) of RDP guarantees.
eps: The target epsilon.
Returns:
Pair of (delta, optimal_order).
Raises:
ValueError: If input is malformed.
"""
orders_vec = np.atleast_1d(orders)
rdp_vec = np.atleast_1d(rdp)
if len(orders_vec) != len(rdp_vec):
raise ValueError("Input lists must have the same length.")
deltas = np.exp((rdp_vec - eps) * (orders_vec - 1))
idx_opt = np.argmin(deltas)
return min(deltas[idx_opt], 1.), orders_vec[idx_opt]
def _compute_eps(orders, rdp, delta):
"""Compute epsilon given a list of RDP values and target delta.
Args:
orders: An array (or a scalar) of orders.
rdp: A list (or a scalar) of RDP guarantees.
delta: The target delta.
Returns:
Pair of (eps, optimal_order).
Raises:
ValueError: If input is malformed.
"""
orders_vec = np.atleast_1d(orders)
rdp_vec = np.atleast_1d(rdp)
if len(orders_vec) != len(rdp_vec):
raise ValueError("Input lists must have the same length.")
eps = rdp_vec - math.log(delta) / (orders_vec - 1)
idx_opt = np.nanargmin(eps) # Ignore NaNs
return eps[idx_opt], orders_vec[idx_opt]
def _compute_rdp(q, sigma, alpha):
"""Compute RDP of the Sampled Gaussian mechanism at order alpha.
Args:
q: The sampling rate.
sigma: The std of the additive Gaussian noise.
alpha: The order at which RDP is computed.
Returns:
RDP at alpha, can be np.inf.
"""
if q == 0:
return 0
if q == 1.:
return alpha / (2 * sigma**2)
if np.isinf(alpha):
return np.inf
return _compute_log_a(q, sigma, alpha) / (alpha - 1)
def compute_rdp(q, noise_multiplier, steps, orders):
"""Compute RDP of the Sampled Gaussian Mechanism.
Args:
q: The sampling rate.
noise_multiplier: The ratio of the standard deviation of the Gaussian noise
to the l2-sensitivity of the function to which it is added.
steps: The number of steps.
orders: An array (or a scalar) of RDP orders.
Returns:
The RDPs at all orders, can be np.inf.
"""
if np.isscalar(orders):
rdp = _compute_rdp(q, noise_multiplier, orders)
else:
rdp = np.array([_compute_rdp(q, noise_multiplier, order)
for order in orders])
return rdp * steps
def get_privacy_spent(orders, rdp, target_eps=None, target_delta=None):
"""Compute delta (or eps) for given eps (or delta) from RDP values.
Args:
orders: An array (or a scalar) of RDP orders.
rdp: An array of RDP values. Must be of the same length as the orders list.
target_eps: If not None, the epsilon for which we compute the corresponding
delta.
target_delta: If not None, the delta for which we compute the corresponding
epsilon. Exactly one of target_eps and target_delta must be None.
Returns:
eps, delta, opt_order.
Raises:
ValueError: If target_eps and target_delta are messed up.
"""
if target_eps is None and target_delta is None:
raise ValueError(
"Exactly one out of eps and delta must be None. (Both are).")
if target_eps is not None and target_delta is not None:
raise ValueError(
"Exactly one out of eps and delta must be None. (None is).")
if target_eps is not None:
delta, opt_order = _compute_delta(orders, rdp, target_eps)
return target_eps, delta, opt_order
else:
eps, opt_order = _compute_eps(orders, rdp, target_delta)
return eps, target_delta, opt_order
def compute_rdp_from_ledger(ledger, orders):
"""Compute RDP of Sampled Gaussian Mechanism from ledger.
Args:
ledger: A formatted privacy ledger.
orders: An array (or a scalar) of RDP orders.
Returns:
RDP at all orders, can be np.inf.
"""
total_rdp = np.zeros_like(orders, dtype=float)
for sample in ledger:
# Compute equivalent z from l2_clip_bounds and noise stddevs in sample.
# See https://arxiv.org/pdf/1812.06210.pdf for derivation of this formula.
effective_z = sum([
(q.noise_stddev / q.l2_norm_bound)**-2 for q in sample.queries])**-0.5
total_rdp += compute_rdp(
sample.selection_probability, effective_z, 1, orders)
return total_rdp

@@ -1,177 +0,0 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for rdp_accountant.py."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
from absl.testing import absltest
from absl.testing import parameterized
from mpmath import exp
from mpmath import inf
from mpmath import log
from mpmath import npdf
from mpmath import quad
import numpy as np
from tensorflow_privacy.privacy.analysis import privacy_ledger
from tensorflow_privacy.privacy.analysis import rdp_accountant
class TestGaussianMoments(parameterized.TestCase):
#################################
# HELPER FUNCTIONS: #
# Exact computations using #
# multi-precision arithmetic. #
#################################
def _log_float_mp(self, x):
# Convert multi-precision input to float log space.
if x >= sys.float_info.min:
return float(log(x))
else:
return -np.inf
def _integral_mp(self, fn, bounds=(-inf, inf)):
integral, _ = quad(fn, bounds, error=True, maxdegree=8)
return integral
def _distributions_mp(self, sigma, q):
def _mu0(x):
return npdf(x, mu=0, sigma=sigma)
def _mu1(x):
return npdf(x, mu=1, sigma=sigma)
def _mu(x):
return (1 - q) * _mu0(x) + q * _mu1(x)
return _mu0, _mu # Closure!
def _mu1_over_mu0(self, x, sigma):
# Closed-form expression for N(1, sigma^2) / N(0, sigma^2) at x.
return exp((2 * x - 1) / (2 * sigma**2))
def _mu_over_mu0(self, x, q, sigma):
return (1 - q) + q * self._mu1_over_mu0(x, sigma)
def _compute_a_mp(self, sigma, q, alpha):
"""Compute A_alpha for arbitrary alpha by numerical integration."""
mu0, _ = self._distributions_mp(sigma, q)
a_alpha_fn = lambda z: mu0(z) * self._mu_over_mu0(z, q, sigma)**alpha
a_alpha = self._integral_mp(a_alpha_fn)
return a_alpha
# TEST ROUTINES
def test_compute_rdp_no_data(self):
# q = 0
self.assertEqual(rdp_accountant.compute_rdp(0, 10, 1, 20), 0)
def test_compute_rdp_no_sampling(self):
# q = 1, RDP = alpha / (2 * sigma^2)
self.assertEqual(rdp_accountant.compute_rdp(1, 10, 1, 20), 0.1)
def test_compute_rdp_scalar(self):
rdp_scalar = rdp_accountant.compute_rdp(0.1, 2, 10, 5)
self.assertAlmostEqual(rdp_scalar, 0.07737, places=5)
def test_compute_rdp_sequence(self):
rdp_vec = rdp_accountant.compute_rdp(0.01, 2.5, 50,
[1.5, 2.5, 5, 50, 100, np.inf])
self.assertSequenceAlmostEqual(
rdp_vec, [0.00065, 0.001085, 0.00218075, 0.023846, 167.416307, np.inf],
delta=1e-5)
params = ({'q': 1e-7, 'sigma': .1, 'order': 1.01},
{'q': 1e-6, 'sigma': .1, 'order': 256},
{'q': 1e-5, 'sigma': .1, 'order': 256.1},
{'q': 1e-6, 'sigma': 1, 'order': 27},
{'q': 1e-4, 'sigma': 1., 'order': 1.5},
{'q': 1e-3, 'sigma': 1., 'order': 2},
{'q': .01, 'sigma': 10, 'order': 20},
{'q': .1, 'sigma': 100, 'order': 20.5},
{'q': .99, 'sigma': .1, 'order': 256},
{'q': .999, 'sigma': 100, 'order': 256.1})
# pylint:disable=undefined-variable
@parameterized.parameters(p for p in params)
def test_compute_log_a_equals_mp(self, q, sigma, order):
# Compare the cheap computation of log(A) with an expensive, multi-precision
# computation.
log_a = rdp_accountant._compute_log_a(q, sigma, order)
log_a_mp = self._log_float_mp(self._compute_a_mp(sigma, q, order))
np.testing.assert_allclose(log_a, log_a_mp, rtol=1e-4)
def test_get_privacy_spent_check_target_delta(self):
orders = range(2, 33)
rdp = rdp_accountant.compute_rdp(0.01, 4, 10000, orders)
eps, _, opt_order = rdp_accountant.get_privacy_spent(
orders, rdp, target_delta=1e-5)
self.assertAlmostEqual(eps, 1.258575, places=5)
self.assertEqual(opt_order, 20)
def test_get_privacy_spent_check_target_eps(self):
orders = range(2, 33)
rdp = rdp_accountant.compute_rdp(0.01, 4, 10000, orders)
_, delta, opt_order = rdp_accountant.get_privacy_spent(
orders, rdp, target_eps=1.258575)
self.assertAlmostEqual(delta, 1e-5)
self.assertEqual(opt_order, 20)
def test_check_composition(self):
orders = (1.25, 1.5, 1.75, 2., 2.5, 3., 4., 5., 6., 7., 8., 10., 12., 14.,
16., 20., 24., 28., 32., 64., 256.)
rdp = rdp_accountant.compute_rdp(q=1e-4,
noise_multiplier=.4,
steps=40000,
orders=orders)
eps, _, opt_order = rdp_accountant.get_privacy_spent(orders, rdp,
target_delta=1e-6)
rdp += rdp_accountant.compute_rdp(q=0.1,
noise_multiplier=2,
steps=100,
orders=orders)
eps, _, opt_order = rdp_accountant.get_privacy_spent(orders, rdp,
target_delta=1e-5)
self.assertAlmostEqual(eps, 8.509656, places=5)
self.assertEqual(opt_order, 2.5)
def test_compute_rdp_from_ledger(self):
orders = range(2, 33)
q = 0.1
n = 1000
l2_norm_clip = 3.14159
noise_stddev = 2.71828
steps = 3
query_entry = privacy_ledger.GaussianSumQueryEntry(
l2_norm_clip, noise_stddev)
ledger = [privacy_ledger.SampleEntry(n, q, [query_entry])] * steps
z = noise_stddev / l2_norm_clip
rdp = rdp_accountant.compute_rdp(q, z, steps, orders)
rdp_from_ledger = rdp_accountant.compute_rdp_from_ledger(ledger, orders)
self.assertSequenceAlmostEqual(rdp, rdp_from_ledger)
if __name__ == '__main__':
absltest.main()

@@ -1,134 +0,0 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""A lightweight buffer for maintaining tensors."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
class TensorBuffer(object):
"""A lightweight buffer for maintaining lists.
The TensorBuffer accumulates tensors of the given shape into a tensor (whose
rank is one more than that of the given shape) via calls to `append`. The
current value of the accumulated tensor can be extracted via the property
`values`.
"""
def __init__(self, capacity, shape, dtype=tf.int32, name=None):
"""Initializes the TensorBuffer.
Args:
capacity: Initial capacity. Buffer will double in capacity each time it is
filled to capacity.
shape: The shape (as tuple or list) of the tensors to accumulate.
dtype: The type of the tensors.
name: A string name for the variable_scope used.
Raises:
ValueError: If the shape is empty (specifies scalar shape).
"""
shape = list(shape)
self._rank = len(shape)
self._name = name
self._dtype = dtype
if not self._rank:
raise ValueError('Shape cannot be scalar.')
shape = [capacity] + shape
with tf.variable_scope(self._name):
# We need to use a placeholder as the initial value to allow resizing.
self._buffer = tf.Variable(
initial_value=tf.placeholder_with_default(
tf.zeros(shape, dtype), shape=None),
trainable=False,
name='buffer',
use_resource=True)
self._current_size = tf.Variable(
initial_value=0, dtype=tf.int32, trainable=False, name='current_size')
self._capacity = tf.Variable(
initial_value=capacity,
dtype=tf.int32,
trainable=False,
name='capacity')
def append(self, value):
"""Appends a new tensor to the end of the buffer.
Args:
value: The tensor to append. Must match the shape specified in the
initializer.
Returns:
An op appending the new tensor to the end of the buffer.
"""
def _double_capacity():
"""Doubles the capacity of the current tensor buffer."""
padding = tf.zeros_like(self._buffer, self._buffer.dtype)
new_buffer = tf.concat([self._buffer, padding], axis=0)
if tf.executing_eagerly():
with tf.variable_scope(self._name, reuse=True):
self._buffer = tf.get_variable(
name='buffer',
dtype=self._dtype,
initializer=new_buffer,
trainable=False)
return self._buffer, tf.assign(self._capacity,
tf.multiply(self._capacity, 2))
else:
return tf.assign(
self._buffer, new_buffer,
validate_shape=False), tf.assign(self._capacity,
tf.multiply(self._capacity, 2))
update_buffer, update_capacity = tf.cond(
tf.equal(self._current_size, self._capacity),
_double_capacity, lambda: (self._buffer, self._capacity))
with tf.control_dependencies([update_buffer, update_capacity]):
with tf.control_dependencies([
tf.assert_less(
self._current_size,
self._capacity,
message='Appending past end of TensorBuffer.'),
tf.assert_equal(
tf.shape(value),
tf.shape(self._buffer)[1:],
message='Appending value of inconsistent shape.')
]):
with tf.control_dependencies(
[tf.assign(self._buffer[self._current_size, :], value)]):
return tf.assign_add(self._current_size, 1)
@property
def values(self):
"""Returns the accumulated tensor."""
begin_value = tf.zeros([self._rank + 1], dtype=tf.int32)
value_size = tf.concat([[self._current_size],
tf.constant(-1, tf.int32, [self._rank])], 0)
return tf.slice(self._buffer, begin_value, value_size)
@property
def current_size(self):
"""Returns the current number of tensors in the buffer."""
return self._current_size
@property
def capacity(self):
"""Returns the current capacity of the buffer."""
return self._capacity
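# A hedged usage sketch (the eager- and graph-mode tests exercise this fully):
# append fixed-shape tensors and read them back as a single stacked tensor.
#
#   buf = TensorBuffer(capacity=2, shape=[2, 3], dtype=tf.int32, name='buf')
#   append_op = buf.append(tf.zeros([2, 3], dtype=tf.int32))
#   # Once append_op has run, buf.values has shape [1, 2, 3]; the capacity
#   # doubles automatically when the buffer fills up.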

@@ -1,84 +0,0 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for tensor_buffer in eager mode."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow_privacy.privacy.analysis import tensor_buffer
tf.enable_eager_execution()
class TensorBufferTest(tf.test.TestCase):
"""Tests for TensorBuffer in eager mode."""
def test_basic(self):
size, shape = 2, [2, 3]
my_buffer = tensor_buffer.TensorBuffer(size, shape, name='my_buffer')
value1 = [[1, 2, 3], [4, 5, 6]]
my_buffer.append(value1)
self.assertAllEqual(my_buffer.values.numpy(), [value1])
value2 = [[4, 5, 6], [7, 8, 9]]
my_buffer.append(value2)
self.assertAllEqual(my_buffer.values.numpy(), [value1, value2])
def test_fail_on_scalar(self):
with self.assertRaisesRegexp(ValueError, 'Shape cannot be scalar.'):
tensor_buffer.TensorBuffer(1, ())
def test_fail_on_inconsistent_shape(self):
size, shape = 1, [2, 3]
my_buffer = tensor_buffer.TensorBuffer(size, shape, name='my_buffer')
with self.assertRaisesRegexp(
tf.errors.InvalidArgumentError,
'Appending value of inconsistent shape.'):
my_buffer.append(tf.ones(shape=[3, 4], dtype=tf.int32))
def test_resize(self):
size, shape = 2, [2, 3]
my_buffer = tensor_buffer.TensorBuffer(size, shape, name='my_buffer')
# Append three buffers. Third one should succeed after resizing.
value1 = [[1, 2, 3], [4, 5, 6]]
my_buffer.append(value1)
self.assertAllEqual(my_buffer.values.numpy(), [value1])
self.assertAllEqual(my_buffer.current_size.numpy(), 1)
self.assertAllEqual(my_buffer.capacity.numpy(), 2)
value2 = [[4, 5, 6], [7, 8, 9]]
my_buffer.append(value2)
self.assertAllEqual(my_buffer.values.numpy(), [value1, value2])
self.assertAllEqual(my_buffer.current_size.numpy(), 2)
self.assertAllEqual(my_buffer.capacity.numpy(), 2)
value3 = [[7, 8, 9], [10, 11, 12]]
my_buffer.append(value3)
self.assertAllEqual(my_buffer.values.numpy(), [value1, value2, value3])
self.assertAllEqual(my_buffer.current_size.numpy(), 3)
# Capacity should have doubled.
self.assertAllEqual(my_buffer.capacity.numpy(), 4)
if __name__ == '__main__':
tf.test.main()

@@ -1,72 +0,0 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for tensor_buffer in graph mode."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow_privacy.privacy.analysis import tensor_buffer
class TensorBufferTest(tf.test.TestCase):
"""Tests for TensorBuffer in graph mode."""
def test_noresize(self):
"""Test buffer does not resize if capacity is not exceeded."""
with self.cached_session() as sess:
size, shape = 2, [2, 3]
my_buffer = tensor_buffer.TensorBuffer(size, shape, name='my_buffer')
value1 = [[1, 2, 3], [4, 5, 6]]
with tf.control_dependencies([my_buffer.append(value1)]):
value2 = [[7, 8, 9], [10, 11, 12]]
with tf.control_dependencies([my_buffer.append(value2)]):
values = my_buffer.values
current_size = my_buffer.current_size
capacity = my_buffer.capacity
self.evaluate(tf.global_variables_initializer())
v, cs, cap = sess.run([values, current_size, capacity])
self.assertAllEqual(v, [value1, value2])
self.assertEqual(cs, 2)
self.assertEqual(cap, 2)
def test_resize(self):
"""Test buffer resizes if capacity is exceeded."""
with self.cached_session() as sess:
size, shape = 2, [2, 3]
my_buffer = tensor_buffer.TensorBuffer(size, shape, name='my_buffer')
value1 = [[1, 2, 3], [4, 5, 6]]
with tf.control_dependencies([my_buffer.append(value1)]):
value2 = [[7, 8, 9], [10, 11, 12]]
with tf.control_dependencies([my_buffer.append(value2)]):
value3 = [[13, 14, 15], [16, 17, 18]]
with tf.control_dependencies([my_buffer.append(value3)]):
values = my_buffer.values
current_size = my_buffer.current_size
capacity = my_buffer.capacity
self.evaluate(tf.global_variables_initializer())
v, cs, cap = sess.run([values, current_size, capacity])
self.assertAllEqual(v, [value1, value2, value3])
self.assertEqual(cs, 3)
self.assertEqual(cap, 4)
if __name__ == '__main__':
tf.test.main()

@@ -1,67 +0,0 @@
# BoltOn Subpackage
This package contains source code for the BoltOn method, a
differential-privacy (DP) technique that uses output perturbation and
leverages additional assumptions to provide a new way of approaching privacy
guarantees.
## BoltOn Description
This method uses 4 key steps to achieve privacy guarantees:
1. Adds noise to weights after training (output perturbation).
2. Projects weights to R, the radius of the hypothesis space,
after each batch. This value is configurable by the user.
3. Limits the learning rate.
4. Uses a strongly convex loss function (see compile and the usage sketch
below).
For more details on the strong convexity requirements, see:
Bolt-on Differential Privacy for Scalable Stochastic Gradient
Descent-based Analytics by Xi Wu et al. at https://arxiv.org/pdf/1606.04722.pdf
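The sketch below is illustrative only: the hyperparameter values are arbitrary
and the data is synthetic (see `bolton_tutorial.py` in the root tutorials
directory for a complete example). It shows the intended workflow: compile a
`BoltOnModel` with a strongly convex loss, then call `fit` with the privacy
parameters.
```python
import tensorflow as tf
from tensorflow_privacy.privacy.bolt_on import BoltOnModel
from tensorflow_privacy.privacy.bolt_on import StrongConvexBinaryCrossentropy

# Synthetic binary-classification data; inputs must be normalized so ||x|| < 1.
n_samples, input_dim = 20, 2
x = tf.random.uniform((n_samples, input_dim), maxval=0.5)
y = tf.cast(tf.reduce_sum(x, axis=1, keepdims=True) > 0.5, tf.float32)

model = BoltOnModel(n_outputs=1)
loss = StrongConvexBinaryCrossentropy(reg_lambda=1.0, c_arg=1.0,
                                      radius_constant=1.0)
# compile() wraps the passed optimizer with the BoltOn optimizer internally.
model.compile(optimizer=tf.optimizers.SGD(), loss=loss)
# fit() applies the BoltOn steps: learning-rate control, projection onto the
# R-ball after each batch, and output perturbation after training.
model.fit(x, y,
          batch_size=4,
          n_samples=n_samples,
          epsilon=2,
          noise_distribution='laplace')
```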
## Why BoltOn?
The major difference of the BoltOn method is that it injects noise after model
convergence, rather than adding noise to gradients or weights during training.
This approach requires the additional constraints listed in the Description.
If the use case and model satisfy these constraints, this is another approach
that can be trained to maximize utility while maintaining privacy.
The paper describes in detail the advantages and disadvantages of this approach
and its results compared to other methods, namely adding noise at each
iteration and adding no noise at all.
## Tutorials
This package has a tutorial that can be found in the root tutorials directory,
under `bolton_tutorial.py`.
## Contribution
This package was initially contributed by Georgian Partners with the hope of
growing the tensorflow/privacy library. There are several rich use cases for
epsilon-delta differential privacy in machine learning, some of which can be explored here:
https://medium.com/apache-mxnet/epsilon-differential-privacy-for-machine-learning-using-mxnet-a4270fe3865e
https://arxiv.org/pdf/1811.04911.pdf
## Stability
As we are pegged to TensorFlow 2.0, this package may encounter stability
issues during the ongoing development of TensorFlow 2.0.
This sub-package is currently stable for 2.0.0a0, 2.0.0b0, and 2.0.0b1. If you
would like to use this sub-package, please use one of these versions, as we
cannot guarantee it will work for all later releases. If you do find issues,
feel free to raise an issue with the contributors listed below.
## Contacts
In addition to the maintainers of tensorflow/privacy listed in the root
README.md, please feel free to contact members of Georgian Partners. In
particular,
* Georgian Partners (@georgianpartners)
* Ji Chao Zhang (@Jichaogp)
* Christopher Choquette (@cchoquette)
## Copyright
Copyright 2019 - Google LLC

View file

@ -1,29 +0,0 @@
# Copyright 2019, The TensorFlow Privacy Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BoltOn Method for privacy."""
import sys
from distutils.version import LooseVersion
import tensorflow as tf
if LooseVersion(tf.__version__) < LooseVersion("2.0.0"):
raise ImportError("Please upgrade your version "
"of tensorflow from: {0} to at least 2.0.0 to "
"use privacy/bolt_on".format(LooseVersion(tf.__version__)))
if hasattr(sys, "skip_tf_privacy_import"): # Useful for standalone scripts.
pass
else:
from tensorflow_privacy.privacy.bolt_on.models import BoltOnModel # pylint: disable=g-import-not-at-top
from tensorflow_privacy.privacy.bolt_on.optimizers import BoltOn # pylint: disable=g-import-not-at-top
from tensorflow_privacy.privacy.bolt_on.losses import StrongConvexHuber # pylint: disable=g-import-not-at-top
from tensorflow_privacy.privacy.bolt_on.losses import StrongConvexBinaryCrossentropy # pylint: disable=g-import-not-at-top

View file

@ -1,304 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Loss functions for BoltOn method."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow.python.framework import ops as _ops
from tensorflow.python.keras import losses
from tensorflow.python.keras.regularizers import L1L2
from tensorflow.python.keras.utils import losses_utils
from tensorflow.python.platform import tf_logging as logging
class StrongConvexMixin: # pylint: disable=old-style-class
"""Strong Convex Mixin base class.
Strong Convex Mixin base class for any loss function that will be used with
BoltOn model. Subclasses must be strongly convex and implement the
associated constants. They must also conform to the requirements of tf losses
(see super class).
For more details on the strong convexity requirements, see:
Bolt-on Differential Privacy for Scalable Stochastic Gradient
Descent-based Analytics by Xi Wu et. al.
"""
def radius(self):
"""Radius, R, of the hypothesis space W.
W is a convex set that forms the hypothesis space.
Returns:
R
"""
raise NotImplementedError("Radius not implemented for StrongConvex Loss"
"function: %s" % str(self.__class__.__name__))
def gamma(self):
"""Returns strongly convex parameter, gamma."""
raise NotImplementedError("Gamma not implemented for StrongConvex Loss"
"function: %s" % str(self.__class__.__name__))
def beta(self, class_weight):
"""Smoothness, beta.
Args:
class_weight: the class weights as scalar or 1d tensor, where its
dimensionality is equal to the number of outputs.
Returns:
Beta
"""
raise NotImplementedError("Beta not implemented for StrongConvex Loss"
"function: %s" % str(self.__class__.__name__))
def lipchitz_constant(self, class_weight):
"""Lipchitz constant, L.
Args:
class_weight: class weights used
Returns: L
"""
raise NotImplementedError("lipchitz constant not implemented for "
"StrongConvex Loss"
"function: %s" % str(self.__class__.__name__))
def kernel_regularizer(self):
"""Returns the kernel_regularizer to be used.
Any subclass should override this method if it needs a kernel_regularizer
(for example, if one is required for the loss function to be strongly convex).
"""
return None
def max_class_weight(self, class_weight, dtype):
"""The maximum weighting in class weights (max value) as a scalar tensor.
Args:
class_weight: class weights used
dtype: the data type for tensor conversions.
Returns:
maximum class weighting as tensor scalar
"""
class_weight = _ops.convert_to_tensor_v2(class_weight, dtype)
return tf.math.reduce_max(class_weight)
class StrongConvexHuber(losses.Loss, StrongConvexMixin):
"""Strong Convex version of Huber loss using l2 weight regularization."""
def __init__(self,
reg_lambda,
c_arg,
radius_constant,
delta,
reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE,
dtype=tf.float32):
"""Constructor.
Args:
reg_lambda: Weight regularization constant
c_arg: Penalty parameter C of the loss term
radius_constant: constant defining the length of the radius
delta: delta value in the Huber loss; the point at which the loss switches
from quadratic to absolute deviation.
reduction: reduction type to use. See super class
dtype: tf datatype to use for tensor conversions.
"""
if c_arg <= 0:
raise ValueError("c: {0}, should be >= 0".format(c_arg))
if reg_lambda <= 0:
raise ValueError("reg lambda: {0} must be positive".format(reg_lambda))
if radius_constant <= 0:
raise ValueError("radius_constant: {0}, should be >= 0".format(
radius_constant
))
if delta <= 0:
raise ValueError("delta: {0}, should be >= 0".format(
delta
))
self.C = c_arg # pylint: disable=invalid-name
self.delta = delta
self.radius_constant = radius_constant
self.dtype = dtype
self.reg_lambda = tf.constant(reg_lambda, dtype=self.dtype)
super(StrongConvexHuber, self).__init__(
name="strongconvexhuber",
reduction=reduction,
)
def call(self, y_true, y_pred):
"""Computes loss.
Args:
y_true: Ground truth values, encoded as -1 and 1.
y_pred: The predicted values.
Returns:
Loss values per sample.
"""
h = self.delta
z = y_pred * y_true
one = tf.constant(1, dtype=self.dtype)
four = tf.constant(4, dtype=self.dtype)
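# Piecewise Huberized hinge on the margin z = y_pred * y_true: zero loss
# when z > 1 + h, the quadratic (1 + h - z)^2 / (4 * h) when |1 - z| <= h,
# and the linear hinge 1 - z otherwise.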
if z > one + h: # pylint: disable=no-else-return
return _ops.convert_to_tensor_v2(0, dtype=self.dtype)
elif tf.math.abs(one - z) <= h:
return one / (four * h) * tf.math.pow(one + h - z, 2)
return one - z
def radius(self):
"""See super class."""
return self.radius_constant / self.reg_lambda
def gamma(self):
"""See super class."""
return self.reg_lambda
def beta(self, class_weight):
"""See super class."""
max_class_weight = self.max_class_weight(class_weight, self.dtype)
delta = _ops.convert_to_tensor_v2(self.delta,
dtype=self.dtype
)
return self.C * max_class_weight / (delta *
tf.constant(2, dtype=self.dtype)) + \
self.reg_lambda
def lipchitz_constant(self, class_weight):
"""See super class."""
# if class_weight is provided,
# it should be a vector of the same size of number of classes
max_class_weight = self.max_class_weight(class_weight, self.dtype)
lc = self.C * max_class_weight + \
self.reg_lambda * self.radius()
return lc
def kernel_regularizer(self):
"""Return l2 loss using 0.5*reg_lambda as the l2 term (as desired).
L2 regularization is required for this loss function to be strongly convex.
Returns:
The L2 regularizer layer for this loss function, with regularizer constant
set to 0.5 * reg_lambda.
"""
return L1L2(l2=self.reg_lambda/2)
class StrongConvexBinaryCrossentropy(
losses.BinaryCrossentropy,
StrongConvexMixin
):
"""Strongly Convex BinaryCrossentropy loss using l2 weight regularization."""
def __init__(self,
reg_lambda,
c_arg,
radius_constant,
from_logits=True,
label_smoothing=0,
reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE,
dtype=tf.float32):
"""StrongConvexBinaryCrossentropy class.
Args:
reg_lambda: Weight regularization constant
c_arg: Penalty parameter C of the loss term
radius_constant: constant defining the length of the radius
from_logits: True if the input are unscaled logits. False if they are
already scaled.
label_smoothing: amount of smoothing to perform on the labels, i.e. a
relaxation of trust in the labels (e.g. 1 -> 1-x, 0 -> 0+x). Note that the
impact of this parameter on privacy is not known, so the default should be
used.
reduction: reduction type to use. See super class
dtype: tf datatype to use for tensor conversions.
"""
if label_smoothing != 0:
logging.warning("The impact of label smoothing on privacy is unknown. "
"Use label smoothing at your own risk as it may not "
"guarantee privacy.")
if reg_lambda <= 0:
raise ValueError("reg lambda: {0} must be positive".format(reg_lambda))
if c_arg <= 0:
raise ValueError("c: {0}, should be >= 0".format(c_arg))
if radius_constant <= 0:
raise ValueError("radius_constant: {0}, should be >= 0".format(
radius_constant
))
self.dtype = dtype
self.C = c_arg # pylint: disable=invalid-name
self.reg_lambda = tf.constant(reg_lambda, dtype=self.dtype)
super(StrongConvexBinaryCrossentropy, self).__init__(
reduction=reduction,
name="strongconvexbinarycrossentropy",
from_logits=from_logits,
label_smoothing=label_smoothing,
)
self.radius_constant = radius_constant
def call(self, y_true, y_pred):
"""Computes loss.
Args:
y_true: Ground truth values.
y_pred: The predicted values.
Returns:
Loss values per sample.
"""
loss = super(StrongConvexBinaryCrossentropy, self).call(y_true, y_pred)
loss = loss * self.C
return loss
def radius(self):
"""See super class."""
return self.radius_constant / self.reg_lambda
def gamma(self):
"""See super class."""
return self.reg_lambda
def beta(self, class_weight):
"""See super class."""
max_class_weight = self.max_class_weight(class_weight, self.dtype)
return self.C * max_class_weight + self.reg_lambda
def lipchitz_constant(self, class_weight):
"""See super class."""
max_class_weight = self.max_class_weight(class_weight, self.dtype)
return self.C * max_class_weight + self.reg_lambda * self.radius()
def kernel_regularizer(self):
"""Return l2 loss using 0.5*reg_lambda as the l2 term (as desired).
L2 regularization is required for this loss function to be strongly convex.
Returns:
The L2 regularizer layer for this loss function, with regularizer constant
set to 0.5 * reg_lambda.
"""
return L1L2(l2=self.reg_lambda/2)

View file

@ -1,431 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Unit testing for losses."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from contextlib import contextmanager # pylint: disable=g-importing-member
from io import StringIO # pylint: disable=g-importing-member
import sys
from absl.testing import parameterized
import tensorflow as tf
from tensorflow.python.framework import test_util
from tensorflow.python.keras import keras_parameterized
from tensorflow.python.keras.regularizers import L1L2
from tensorflow_privacy.privacy.bolt_on.losses import StrongConvexBinaryCrossentropy
from tensorflow_privacy.privacy.bolt_on.losses import StrongConvexHuber
from tensorflow_privacy.privacy.bolt_on.losses import StrongConvexMixin
@contextmanager
def captured_output():
"""Capture std_out and std_err within context."""
new_out, new_err = StringIO(), StringIO()
old_out, old_err = sys.stdout, sys.stderr
try:
sys.stdout, sys.stderr = new_out, new_err
yield sys.stdout, sys.stderr
finally:
sys.stdout, sys.stderr = old_out, old_err
class StrongConvexMixinTests(keras_parameterized.TestCase):
"""Tests for the StrongConvexMixin."""
@parameterized.named_parameters([
{'testcase_name': 'beta not implemented',
'fn': 'beta',
'args': [1]},
{'testcase_name': 'gamma not implemented',
'fn': 'gamma',
'args': []},
{'testcase_name': 'lipchitz not implemented',
'fn': 'lipchitz_constant',
'args': [1]},
{'testcase_name': 'radius not implemented',
'fn': 'radius',
'args': []},
])
def test_not_implemented(self, fn, args):
"""Test that the given fn's are not implemented on the mixin.
Args:
fn: fn on Mixin to test
args: arguments to fn of Mixin
"""
with self.assertRaises(NotImplementedError):
loss = StrongConvexMixin()
getattr(loss, fn, None)(*args)
@parameterized.named_parameters([
{'testcase_name': 'radius not implemented',
'fn': 'kernel_regularizer',
'args': []},
])
def test_return_none(self, fn, args):
"""Test that fn of Mixin returns None.
Args:
fn: fn of Mixin to test
args: arguments to fn of Mixin
"""
loss = StrongConvexMixin()
ret = getattr(loss, fn, None)(*args)
self.assertEqual(ret, None)
class BinaryCrossentropyTests(keras_parameterized.TestCase):
"""Tests for the StrongConvexBinaryCrossentropy loss."""
@parameterized.named_parameters([
{'testcase_name': 'normal',
'reg_lambda': 1,
'C': 1,
'radius_constant': 1
}, # pylint: disable=invalid-name
])
def test_init_params(self, reg_lambda, C, radius_constant):
"""Test initialization for given arguments.
Args:
reg_lambda: initialization value for reg_lambda arg
C: initialization value for C arg
radius_constant: initialization value for radius_constant arg
"""
# test valid domains for each variable
loss = StrongConvexBinaryCrossentropy(reg_lambda, C, radius_constant)
self.assertIsInstance(loss, StrongConvexBinaryCrossentropy)
@parameterized.named_parameters([
{'testcase_name': 'negative c',
'reg_lambda': 1,
'C': -1,
'radius_constant': 1
},
{'testcase_name': 'negative radius',
'reg_lambda': 1,
'C': 1,
'radius_constant': -1
},
{'testcase_name': 'negative lambda',
'reg_lambda': -1,
'C': 1,
'radius_constant': 1
}, # pylint: disable=invalid-name
])
def test_bad_init_params(self, reg_lambda, C, radius_constant):
"""Test invalid domain for given params. Should return ValueError.
Args:
reg_lambda: initialization value for reg_lambda arg
C: initialization value for C arg
radius_constant: initialization value for radius_constant arg
"""
# test valid domains for each variable
with self.assertRaises(ValueError):
StrongConvexBinaryCrossentropy(reg_lambda, C, radius_constant)
@test_util.run_all_in_graph_and_eager_modes
@parameterized.named_parameters([
# [] for compatibility with tensorflow loss calculation
{'testcase_name': 'both positive',
'logits': [10000],
'y_true': [1],
'result': 0,
},
{'testcase_name': 'positive gradient negative logits',
'logits': [-10000],
'y_true': [1],
'result': 10000,
},
{'testcase_name': 'positive gradient positive logits',
'logits': [10000],
'y_true': [0],
'result': 10000,
},
{'testcase_name': 'both negative',
'logits': [-10000],
'y_true': [0],
'result': 0
},
])
def test_calculation(self, logits, y_true, result):
"""Test the call method to ensure it returns the correct value.
Args:
logits: unscaled output of model
y_true: label
result: correct loss calculation value
"""
logits = tf.Variable(logits, False, dtype=tf.float32)
y_true = tf.Variable(y_true, False, dtype=tf.float32)
loss = StrongConvexBinaryCrossentropy(0.00001, 1, 1)
loss = loss(y_true, logits)
self.assertEqual(loss.numpy(), result)
@parameterized.named_parameters([
{'testcase_name': 'beta',
'init_args': [1, 1, 1],
'fn': 'beta',
'args': [1],
'result': tf.constant(2, dtype=tf.float32)
},
{'testcase_name': 'gamma',
'fn': 'gamma',
'init_args': [1, 1, 1],
'args': [],
'result': tf.constant(1, dtype=tf.float32),
},
{'testcase_name': 'lipchitz constant',
'fn': 'lipchitz_constant',
'init_args': [1, 1, 1],
'args': [1],
'result': tf.constant(2, dtype=tf.float32),
},
{'testcase_name': 'kernel regularizer',
'fn': 'kernel_regularizer',
'init_args': [1, 1, 1],
'args': [],
'result': L1L2(l2=0.5),
},
])
def test_fns(self, init_args, fn, args, result):
"""Test that fn of BinaryCrossentropy loss returns the correct result.
Args:
init_args: init values for loss instance
fn: the fn to test
args: the arguments to above function
result: the correct result from the fn
"""
loss = StrongConvexBinaryCrossentropy(*init_args)
expected = getattr(loss, fn, lambda: 'fn not found')(*args)
if hasattr(expected, 'numpy') and hasattr(result, 'numpy'): # both tensor
expected = expected.numpy()
result = result.numpy()
if hasattr(expected, 'l2') and hasattr(result, 'l2'): # both l2 regularizer
expected = expected.l2
result = result.l2
self.assertEqual(expected, result)
@parameterized.named_parameters([
{'testcase_name': 'label_smoothing',
'init_args': [1, 1, 1, True, 0.1],
'fn': None,
'args': None,
'print_res': 'The impact of label smoothing on privacy is unknown.'
},
])
def test_prints(self, init_args, fn, args, print_res):
"""Test logger warning from StrongConvexBinaryCrossentropy.
Args:
init_args: arguments to init the object with.
fn: function to test
args: arguments to above function
print_res: print result that should have been printed.
"""
with captured_output() as (out, err): # pylint: disable=unused-variable
loss = StrongConvexBinaryCrossentropy(*init_args)
if fn is not None:
getattr(loss, fn, lambda *arguments: print('error'))(*args)
self.assertRegexMatch(err.getvalue().strip(), [print_res])
class HuberTests(keras_parameterized.TestCase):
"""tests for BinaryCrossesntropy StrongConvex loss."""
@parameterized.named_parameters([
{'testcase_name': 'normal',
'reg_lambda': 1,
'c': 1,
'radius_constant': 1,
'delta': 1,
},
])
def test_init_params(self, reg_lambda, c, radius_constant, delta):
"""Test initialization for given arguments.
Args:
reg_lambda: initialization value for reg_lambda arg
c: initialization value for C arg
radius_constant: initialization value for radius_constant arg
delta: the delta parameter for the huber loss
"""
# test valid domains for each variable
loss = StrongConvexHuber(reg_lambda, c, radius_constant, delta)
self.assertIsInstance(loss, StrongConvexHuber)
@parameterized.named_parameters([
{'testcase_name': 'negative c',
'reg_lambda': 1,
'c': -1,
'radius_constant': 1,
'delta': 1
},
{'testcase_name': 'negative radius',
'reg_lambda': 1,
'c': 1,
'radius_constant': -1,
'delta': 1
},
{'testcase_name': 'negative lambda',
'reg_lambda': -1,
'c': 1,
'radius_constant': 1,
'delta': 1
},
{'testcase_name': 'negative delta',
'reg_lambda': 1,
'c': 1,
'radius_constant': 1,
'delta': -1
},
])
def test_bad_init_params(self, reg_lambda, c, radius_constant, delta):
"""Test invalid domain for given params. Should return ValueError.
Args:
reg_lambda: initialization value for reg_lambda arg
c: initialization value for C arg
radius_constant: initialization value for radius_constant arg
delta: the delta parameter for the huber loss
"""
# test valid domains for each variable
with self.assertRaises(ValueError):
StrongConvexHuber(reg_lambda, c, radius_constant, delta)
# test the bounds and test varied delta's
@test_util.run_all_in_graph_and_eager_modes
@parameterized.named_parameters([
{'testcase_name': 'delta=1,y_true=1 z>1+h decision boundary',
'logits': 2.1,
'y_true': 1,
'delta': 1,
'result': 0,
},
{'testcase_name': 'delta=1,y_true=1 z<1+h decision boundary',
'logits': 1.9,
'y_true': 1,
'delta': 1,
'result': 0.01*0.25,
},
{'testcase_name': 'delta=1,y_true=1 1-z< h decision boundary',
'logits': 0.1,
'y_true': 1,
'delta': 1,
'result': 1.9**2 * 0.25,
},
{'testcase_name': 'delta=1,y_true=1 z < 1-h decision boundary',
'logits': -0.1,
'y_true': 1,
'delta': 1,
'result': 1.1,
},
{'testcase_name': 'delta=2,y_true=1 z>1+h decision boundary',
'logits': 3.1,
'y_true': 1,
'delta': 2,
'result': 0,
},
{'testcase_name': 'delta=2,y_true=1 z<1+h decision boundary',
'logits': 2.9,
'y_true': 1,
'delta': 2,
'result': 0.01*0.125,
},
{'testcase_name': 'delta=2,y_true=1 1-z < h decision boundary',
'logits': 1.1,
'y_true': 1,
'delta': 2,
'result': 1.9**2 * 0.125,
},
{'testcase_name': 'delta=2,y_true=1 z < 1-h decision boundary',
'logits': -1.1,
'y_true': 1,
'delta': 2,
'result': 2.1,
},
{'testcase_name': 'delta=1,y_true=-1 z>1+h decision boundary',
'logits': -2.1,
'y_true': -1,
'delta': 1,
'result': 0,
},
])
def test_calculation(self, logits, y_true, delta, result):
"""Test the call method to ensure it returns the correct value.
Args:
logits: unscaled output of model
y_true: label
delta: delta value for StrongConvexHuber loss.
result: correct loss calculation value
"""
logits = tf.Variable(logits, False, dtype=tf.float32)
y_true = tf.Variable(y_true, False, dtype=tf.float32)
loss = StrongConvexHuber(0.00001, 1, 1, delta)
loss = loss(y_true, logits)
self.assertAllClose(loss.numpy(), result)
@parameterized.named_parameters([
{'testcase_name': 'beta',
'init_args': [1, 1, 1, 1],
'fn': 'beta',
'args': [1],
'result': tf.Variable(1.5, dtype=tf.float32)
},
{'testcase_name': 'gamma',
'fn': 'gamma',
'init_args': [1, 1, 1, 1],
'args': [],
'result': tf.Variable(1, dtype=tf.float32),
},
{'testcase_name': 'lipchitz constant',
'fn': 'lipchitz_constant',
'init_args': [1, 1, 1, 1],
'args': [1],
'result': tf.Variable(2, dtype=tf.float32),
},
{'testcase_name': 'kernel regularizer',
'fn': 'kernel_regularizer',
'init_args': [1, 1, 1, 1],
'args': [],
'result': L1L2(l2=0.5),
},
])
def test_fns(self, init_args, fn, args, result):
"""Test that fn of BinaryCrossentropy loss returns the correct result.
Args:
init_args: init values for loss instance
fn: the fn to test
args: the arguments to above function
result: the correct result from the fn
"""
loss = StrongConvexHuber(*init_args)
expected = getattr(loss, fn, lambda: 'fn not found')(*args)
if hasattr(expected, 'numpy') and hasattr(result, 'numpy'): # both tensor
expected = expected.numpy()
result = result.numpy()
if hasattr(expected, 'l2') and hasattr(result, 'l2'): # both l2 regularizer
expected = expected.l2
result = result.l2
self.assertEqual(expected, result)
if __name__ == '__main__':
tf.test.main()

View file

@ -1,303 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BoltOn model for Bolt-on method of differentially private ML."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow.python.framework import ops as _ops
from tensorflow.python.keras import optimizers
from tensorflow.python.keras.models import Model
from tensorflow_privacy.privacy.bolt_on.losses import StrongConvexMixin
from tensorflow_privacy.privacy.bolt_on.optimizers import BoltOn
class BoltOnModel(Model): # pylint: disable=abstract-method
"""BoltOn episilon-delta differential privacy model.
The privacy guarantees are dependent on the noise that is sampled. Please
see the paper linked below for more details.
Uses 4 key steps to achieve privacy guarantees:
1. Adds noise to weights after training (output perturbation).
2. Projects weights to R after each batch
3. Limits learning rate
4. Use a strongly convex loss function (see compile)
For more details on the strong convexity requirements, see:
Bolt-on Differential Privacy for Scalable Stochastic Gradient
Descent-based Analytics by Xi Wu et al.
"""
def __init__(self,
n_outputs,
seed=1,
dtype=tf.float32):
"""Private constructor.
Args:
n_outputs: number of output classes to predict.
seed: random seed to use
dtype: data type to use for tensors
"""
super(BoltOnModel, self).__init__(name='bolton', dynamic=False)
if n_outputs <= 0:
raise ValueError('n_outputs = {0} is not valid. Must be > 0.'.format(
n_outputs
))
self.n_outputs = n_outputs
self.seed = seed
self._layers_instantiated = False
self._dtype = dtype
def call(self, inputs): # pylint: disable=arguments-differ
"""Forward pass of network.
Args:
inputs: inputs to neural network
Returns:
Output logits for the given inputs.
"""
return self.output_layer(inputs)
def compile(self,
optimizer,
loss,
kernel_initializer=tf.initializers.GlorotUniform,
**kwargs): # pylint: disable=arguments-differ
"""See super class. Default optimizer used in BoltOn method is SGD.
Args:
optimizer: The optimizer to use. This will be automatically wrapped
with the BoltOn Optimizer.
loss: The loss function to use. Must be a StrongConvex loss (extend the
StrongConvexMixin).
kernel_initializer: The kernel initializer to use for the single layer.
**kwargs: kwargs to keras Model.compile. See super.
"""
if not isinstance(loss, StrongConvexMixin):
raise ValueError('loss function must be a Strongly Convex and therefore '
'extend the StrongConvexMixin.')
if not self._layers_instantiated: # compile may be called multiple times
# for instance, if the input/outputs are not defined until fit.
self.output_layer = tf.keras.layers.Dense(
self.n_outputs,
kernel_regularizer=loss.kernel_regularizer(),
kernel_initializer=kernel_initializer(),
)
self._layers_instantiated = True
if not isinstance(optimizer, BoltOn):
optimizer = optimizers.get(optimizer)
optimizer = BoltOn(optimizer, loss)
super(BoltOnModel, self).compile(optimizer, loss=loss, **kwargs)
def fit(self,
x=None,
y=None,
batch_size=None,
class_weight=None,
n_samples=None,
epsilon=2,
noise_distribution='laplace',
steps_per_epoch=None,
**kwargs): # pylint: disable=arguments-differ
"""Reroutes to super fit with BoltOn delta-epsilon privacy requirements.
Note, inputs must be normalized s.t. ||x|| < 1.
Requirements are as follows:
1. Adds noise to weights after training (output perturbation).
2. Projects weights to R after each batch
3. Limits learning rate
4. Use a strongly convex loss function (see compile)
See super implementation for more details.
Args:
x: Inputs to fit on, see super.
y: Labels to fit on, see super.
batch_size: The batch size to use for training, see super.
class_weight: the class weights to be used. Can be a scalar or 1D tensor
whose dim == n_classes.
n_samples: the number of individual samples in x.
epsilon: privacy parameter, which trades off between utility and privacy.
See the bolt-on paper for more description.
noise_distribution: the distribution to pull noise from.
steps_per_epoch: Number of steps per training epoch, see super.
**kwargs: kwargs to keras Model.fit. See super.
Returns:
Output from super fit method.
"""
if class_weight is None:
class_weight_ = self.calculate_class_weights(class_weight)
else:
class_weight_ = class_weight
if n_samples is not None:
data_size = n_samples
elif hasattr(x, 'shape'):
data_size = x.shape[0]
elif hasattr(x, '__len__'):
data_size = len(x)
else:
data_size = None
batch_size_ = self._validate_or_infer_batch_size(batch_size,
steps_per_epoch,
x)
if batch_size_ is None:
batch_size_ = 32
# inferring batch_size to be passed to optimizer. batch_size must remain its
# initial value when passed to super().fit()
if batch_size_ is None:
raise ValueError('batch_size: {0} is an '
'invalid value'.format(batch_size_))
if data_size is None:
raise ValueError('Could not infer the number of samples. Please pass '
'this in using n_samples.')
with self.optimizer(noise_distribution,
epsilon,
self.layers,
class_weight_,
data_size,
batch_size_) as _:
out = super(BoltOnModel, self).fit(x=x,
y=y,
batch_size=batch_size,
class_weight=class_weight,
steps_per_epoch=steps_per_epoch,
**kwargs)
return out
def fit_generator(self,
generator,
class_weight=None,
noise_distribution='laplace',
epsilon=2,
n_samples=None,
steps_per_epoch=None,
**kwargs): # pylint: disable=arguments-differ
"""Fit with a generator.
This method is the same as fit except for when the passed dataset
is a generator. See super method and fit for more details.
Args:
generator: Inputs generator following Tensorflow guidelines, see super.
class_weight: the class weights to be used. Can be a scalar or 1D tensor
whose dim == n_classes.
noise_distribution: the distribution to get noise from.
epsilon: privacy parameter, which trades off utility and privacy. See
BoltOn paper for more description.
n_samples: number of individual samples in x
steps_per_epoch: Number of steps per training epoch, see super.
**kwargs: kwargs to keras Model.fit_generator. See super.
Returns:
Output from super fit_generator method.
"""
if class_weight is None:
class_weight = self.calculate_class_weights(class_weight)
if n_samples is not None:
data_size = n_samples
elif hasattr(generator, 'shape'):
data_size = generator.shape[0]
elif hasattr(generator, '__len__'):
data_size = len(generator)
else:
raise ValueError('The number of samples could not be determined. '
'If you are using a generator, please call this '
'method directly with the n_samples kwarg '
'passed.')
batch_size = self._validate_or_infer_batch_size(None, steps_per_epoch,
generator)
if batch_size is None:
batch_size = 32
with self.optimizer(noise_distribution,
epsilon,
self.layers,
class_weight,
data_size,
batch_size) as _:
out = super(BoltOnModel, self).fit_generator(
generator,
class_weight=class_weight,
steps_per_epoch=steps_per_epoch,
**kwargs)
return out
def calculate_class_weights(self,
class_weights=None,
class_counts=None,
num_classes=None):
"""Calculates class weighting to be used in training.
Args:
class_weights: str specifying type, array giving weights, or None.
class_counts: If class_weights is 'balanced', then an array of
the number of samples for each class.
num_classes: If class_weights is not None, then the number of
classes.
Returns:
class_weights as 1D tensor, to be passed to model's fit method.
"""
# Value checking
class_keys = ['balanced']
is_string = False
if isinstance(class_weights, str):
is_string = True
if class_weights not in class_keys:
raise ValueError('Detected string class_weights with '
'value: {0}, which is not one of {1}. '
'Please select a valid class_weight type '
'or pass an array.'.format(class_weights,
class_keys))
if class_counts is None:
raise ValueError('Class counts must be provided if using '
'class_weights=%s' % class_weights)
class_counts_shape = tf.Variable(class_counts,
trainable=False,
dtype=self._dtype).shape
if len(class_counts_shape) != 1:
raise ValueError('class counts must be a 1D array. '
'Detected: {0}'.format(class_counts_shape))
if num_classes is None:
raise ValueError('num_classes must be provided if using '
'class_weights=%s' % class_weights)
elif class_weights is not None:
if num_classes is None:
raise ValueError('You must pass a value for num_classes if '
'creating an array of class_weights')
# performing class weight calculation
if class_weights is None:
class_weights = 1
elif is_string and class_weights == 'balanced':
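# 'balanced' weights each class c as sum(class_counts) / (num_classes * class_counts[c]).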
num_samples = sum(class_counts)
weighted_counts = tf.dtypes.cast(tf.math.multiply(num_classes,
class_counts),
self._dtype)
class_weights = tf.Variable(num_samples, dtype=self._dtype) / \
tf.Variable(weighted_counts, dtype=self._dtype)
else:
class_weights = _ops.convert_to_tensor_v2(class_weights)
if len(class_weights.shape) != 1:
raise ValueError('Detected class_weights shape: {0} instead of '
'1D array'.format(class_weights.shape))
if class_weights.shape[0] != num_classes:
raise ValueError(
'Detected array length: {0} instead of: {1}'.format(
class_weights.shape[0],
num_classes))
return class_weights

View file

@ -1,548 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Unit testing for models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import tensorflow as tf
from tensorflow.python.framework import ops as _ops
from tensorflow.python.keras import keras_parameterized
from tensorflow.python.keras import losses
from tensorflow.python.keras.optimizer_v2.optimizer_v2 import OptimizerV2
from tensorflow.python.keras.regularizers import L1L2
from tensorflow_privacy.privacy.bolt_on import models
from tensorflow_privacy.privacy.bolt_on.losses import StrongConvexMixin
from tensorflow_privacy.privacy.bolt_on.optimizers import BoltOn
class TestLoss(losses.Loss, StrongConvexMixin):
"""Test loss function for testing BoltOn model."""
def __init__(self, reg_lambda, c_arg, radius_constant, name='test'):
super(TestLoss, self).__init__(name=name)
self.reg_lambda = reg_lambda
self.C = c_arg # pylint: disable=invalid-name
self.radius_constant = radius_constant
def radius(self):
"""Radius, R, of the hypothesis space W.
W is a convex set that forms the hypothesis space.
Returns:
radius
"""
return _ops.convert_to_tensor_v2(1, dtype=tf.float32)
def gamma(self):
"""Returns strongly convex parameter, gamma."""
return _ops.convert_to_tensor_v2(1, dtype=tf.float32)
def beta(self, class_weight): # pylint: disable=unused-argument
"""Smoothness, beta.
Args:
class_weight: the class weights as scalar or 1d tensor, where its
dimensionality is equal to the number of outputs.
Returns:
Beta
"""
return _ops.convert_to_tensor_v2(1, dtype=tf.float32)
def lipchitz_constant(self, class_weight): # pylint: disable=unused-argument
"""Lipchitz constant, L.
Args:
class_weight: class weights used
Returns:
L
"""
return _ops.convert_to_tensor_v2(1, dtype=tf.float32)
def call(self, y_true, y_pred):
"""Loss function that is minimized at the mean of the input points."""
return 0.5 * tf.reduce_sum(
tf.math.squared_difference(y_true, y_pred),
axis=1
)
def max_class_weight(self, class_weight):
"""the maximum weighting in class weights (max value) as a scalar tensor.
Args:
class_weight: class weights used
Returns:
maximum class weighting as tensor scalar
"""
if class_weight is None:
return 1
raise ValueError('')
def kernel_regularizer(self):
"""Returns the kernel_regularizer to be used.
Any subclass should override this method if it needs a kernel_regularizer
(for example, if one is required for the loss function to be strongly convex).
"""
return L1L2(l2=self.reg_lambda)
class TestOptimizer(OptimizerV2):
"""Test optimizer used for testing BoltOn model."""
def __init__(self):
super(TestOptimizer, self).__init__('test')
def compute_gradients(self):
return 0
def get_config(self):
return {}
def _create_slots(self, var):
pass
def _resource_apply_dense(self, grad, handle):
return grad
def _resource_apply_sparse(self, grad, handle, indices):
return grad
class InitTests(keras_parameterized.TestCase):
"""Tests for keras model initialization."""
@parameterized.named_parameters([
{'testcase_name': 'normal',
'n_outputs': 1,
},
{'testcase_name': 'many outputs',
'n_outputs': 100,
},
])
def test_init_params(self, n_outputs):
"""Test initialization of BoltOnModel.
Args:
n_outputs: number of output neurons
"""
# test valid domains for each variable
clf = models.BoltOnModel(n_outputs)
self.assertIsInstance(clf, models.BoltOnModel)
@parameterized.named_parameters([
{'testcase_name': 'invalid n_outputs',
'n_outputs': -1,
},
])
def test_bad_init_params(self, n_outputs):
"""test bad initializations of BoltOnModel that should raise errors.
Args:
n_outputs: number of output neurons
"""
# test invalid domains for each variable, especially noise
with self.assertRaises(ValueError):
models.BoltOnModel(n_outputs)
@parameterized.named_parameters([
{'testcase_name': 'string compile',
'n_outputs': 1,
'loss': TestLoss(1, 1, 1),
'optimizer': 'adam',
},
{'testcase_name': 'test compile',
'n_outputs': 100,
'loss': TestLoss(1, 1, 1),
'optimizer': TestOptimizer(),
},
])
def test_compile(self, n_outputs, loss, optimizer):
"""Test compilation of BoltOnModel.
Args:
n_outputs: number of output neurons
loss: instantiated TestLoss instance
optimizer: instantiated TestOptimizer instance
"""
# test compilation of valid tf.optimizer and tf.loss
with self.cached_session():
clf = models.BoltOnModel(n_outputs)
clf.compile(optimizer, loss)
self.assertEqual(clf.loss, loss)
@parameterized.named_parameters([
{'testcase_name': 'Not strong loss',
'n_outputs': 1,
'loss': losses.BinaryCrossentropy(),
'optimizer': 'adam',
},
{'testcase_name': 'Not valid optimizer',
'n_outputs': 1,
'loss': TestLoss(1, 1, 1),
'optimizer': 'ada',
}
])
def test_bad_compile(self, n_outputs, loss, optimizer):
"""test bad compilations of BoltOnModel that should raise errors.
Args:
n_outputs: number of output neurons
loss: instantiated TestLoss instance
optimizer: instantiated TestOptimizer instance
"""
# test compilation of an invalid tf.optimizer and a non-instantiated loss.
with self.cached_session():
with self.assertRaises((ValueError, AttributeError)):
clf = models.BoltOnModel(n_outputs)
clf.compile(optimizer, loss)
def _cat_dataset(n_samples, input_dim, n_classes, batch_size, generator=False):
"""Creates a categorically encoded dataset.
Creates a categorically encoded dataset (y is categorical).
returns the specified dataset either as a static array or as a generator.
Will have evenly split samples across each output class.
Each output class will be a different point in the input space.
Args:
n_samples: number of rows
input_dim: input dimensionality
n_classes: output dimensionality
batch_size: The desired batch_size
generator: False for array, True for generator
Returns:
X as (n_samples, input_dim), Y as (n_samples, n_outputs)
"""
x_stack = []
y_stack = []
for i_class in range(n_classes):
x_stack.append(
tf.constant(1*i_class, tf.float32, (n_samples, input_dim))
)
y_stack.append(
tf.constant(i_class, tf.float32, (n_samples, n_classes))
)
x_set, y_set = tf.stack(x_stack), tf.stack(y_stack)
if generator:
dataset = tf.data.Dataset.from_tensor_slices(
(x_set, y_set)
)
dataset = dataset.batch(batch_size=batch_size)
return dataset
return x_set, y_set
def _do_fit(n_samples,
input_dim,
n_outputs,
epsilon,
generator,
batch_size,
reset_n_samples,
optimizer,
loss,
distribution='laplace'):
"""Instantiate necessary components for fitting and perform a model fit.
Args:
n_samples: number of samples in dataset
input_dim: the sample dimensionality
n_outputs: number of output neurons
epsilon: privacy parameter
generator: True to create a generator, False to use an iterator
batch_size: batch_size to use
reset_n_samples: True to set n_samples to None prior to fitting.
False does nothing.
optimizer: instance of TestOptimizer
loss: instance of TestLoss
distribution: distribution to get noise from.
Returns:
BoltOnModel instance.
"""
clf = models.BoltOnModel(n_outputs)
clf.compile(optimizer, loss)
if generator:
x = _cat_dataset(
n_samples,
input_dim,
n_outputs,
batch_size,
generator=generator
)
y = None
# x = x.batch(batch_size)
x = x.shuffle(n_samples//2)
batch_size = None
if reset_n_samples:
n_samples = None
clf.fit_generator(x,
n_samples=n_samples,
noise_distribution=distribution,
epsilon=epsilon)
else:
x, y = _cat_dataset(
n_samples,
input_dim,
n_outputs,
batch_size,
generator=generator)
if reset_n_samples:
n_samples = None
clf.fit(x,
y,
batch_size=batch_size,
n_samples=n_samples,
noise_distribution=distribution,
epsilon=epsilon)
return clf
class FitTests(keras_parameterized.TestCase):
"""Test cases for keras model fitting."""
# @test_util.run_all_in_graph_and_eager_modes
@parameterized.named_parameters([
{'testcase_name': 'iterator fit',
'generator': False,
'reset_n_samples': True,
},
{'testcase_name': 'iterator fit no samples',
'generator': False,
'reset_n_samples': True,
},
{'testcase_name': 'generator fit',
'generator': True,
'reset_n_samples': False,
},
{'testcase_name': 'with callbacks',
'generator': True,
'reset_n_samples': False,
},
])
def test_fit(self, generator, reset_n_samples):
"""Tests fitting of BoltOnModel.
Args:
generator: True for generator test, False for iterator test.
reset_n_samples: True to reset the n_samples to None, False does nothing
"""
loss = TestLoss(1, 1, 1)
optimizer = BoltOn(TestOptimizer(), loss)
n_classes = 2
input_dim = 5
epsilon = 1
batch_size = 1
n_samples = 10
clf = _do_fit(
n_samples,
input_dim,
n_classes,
epsilon,
generator,
batch_size,
reset_n_samples,
optimizer,
loss,
)
self.assertEqual(hasattr(clf, 'layers'), True)
@parameterized.named_parameters([
{'testcase_name': 'generator fit',
'generator': True,
},
])
def test_fit_gen(self, generator):
"""Tests the fit_generator method of BoltOnModel.
Args:
generator: True to test with a generator dataset
"""
loss = TestLoss(1, 1, 1)
optimizer = TestOptimizer()
n_classes = 2
input_dim = 5
batch_size = 1
n_samples = 10
clf = models.BoltOnModel(n_classes)
clf.compile(optimizer, loss)
x = _cat_dataset(
n_samples,
input_dim,
n_classes,
batch_size,
generator=generator
)
x = x.batch(batch_size)
x = x.shuffle(n_samples // 2)
clf.fit_generator(x, n_samples=n_samples)
self.assertEqual(hasattr(clf, 'layers'), True)
@parameterized.named_parameters([
{'testcase_name': 'iterator no n_samples',
'generator': True,
'reset_n_samples': True,
'distribution': 'laplace'
},
{'testcase_name': 'invalid distribution',
'generator': True,
'reset_n_samples': True,
'distribution': 'not_valid'
},
])
def test_bad_fit(self, generator, reset_n_samples, distribution):
"""Tests fitting with invalid parameters, which should raise an error.
Args:
generator: True to test with generator, False is iterator
reset_n_samples: True to reset the n_samples param to None prior to
passing it to fit
distribution: distribution to get noise from.
"""
with self.assertRaises(ValueError):
loss = TestLoss(1, 1, 1)
optimizer = TestOptimizer()
n_classes = 2
input_dim = 5
epsilon = 1
batch_size = 1
n_samples = 10
_do_fit(
n_samples,
input_dim,
n_classes,
epsilon,
generator,
batch_size,
reset_n_samples,
optimizer,
loss,
distribution
)
@parameterized.named_parameters([
{'testcase_name': 'None class_weights',
'class_weights': None,
'class_counts': None,
'num_classes': None,
'result': 1},
{'testcase_name': 'class weights array',
'class_weights': [1, 1],
'class_counts': [1, 1],
'num_classes': 2,
'result': [1, 1]},
{'testcase_name': 'class weights balanced',
'class_weights': 'balanced',
'class_counts': [1, 1],
'num_classes': 2,
'result': [1, 1]},
])
def test_class_calculate(self,
class_weights,
class_counts,
num_classes,
result):
"""Tests the BOltonModel calculate_class_weights method.
Args:
class_weights: the class_weights to use
class_counts: count of number of samples for each class
num_classes: number of outputs neurons
result: expected result
"""
clf = models.BoltOnModel(1, 1)
expected = clf.calculate_class_weights(class_weights,
class_counts,
num_classes)
if hasattr(expected, 'numpy'):
expected = expected.numpy()
self.assertAllEqual(
expected,
result
)
@parameterized.named_parameters([
{'testcase_name': 'class weight not valid str',
'class_weights': 'not_valid',
'class_counts': 1,
'num_classes': 1,
'err_msg': 'Detected string class_weights with value: not_valid'},
{'testcase_name': 'no class counts',
'class_weights': 'balanced',
'class_counts': None,
'num_classes': 1,
'err_msg': 'Class counts must be provided if '
'using class_weights=balanced'},
{'testcase_name': 'no num classes',
'class_weights': 'balanced',
'class_counts': [1],
'num_classes': None,
'err_msg': 'num_classes must be provided if '
'using class_weights=balanced'},
{'testcase_name': 'class counts not array',
'class_weights': 'balanced',
'class_counts': 1,
'num_classes': None,
'err_msg': 'class counts must be a 1D array.'},
{'testcase_name': 'class counts array, no num classes',
'class_weights': [1],
'class_counts': None,
'num_classes': None,
'err_msg': 'You must pass a value for num_classes if '
'creating an array of class_weights'},
{'testcase_name': 'class counts array, improper shape',
'class_weights': [[1], [1]],
'class_counts': None,
'num_classes': 2,
'err_msg': 'Detected class_weights shape'},
{'testcase_name': 'class counts array, wrong number classes',
'class_weights': [1, 1, 1],
'class_counts': None,
'num_classes': 2,
'err_msg': 'Detected array length:'},
])
def test_class_errors(self,
class_weights,
class_counts,
num_classes,
err_msg):
"""Tests the BOltonModel calculate_class_weights method.
This test passes invalid params which should raise the expected errors.
Args:
class_weights: the class_weights to use.
class_counts: count of number of samples for each class.
num_classes: number of outputs neurons.
err_msg: The expected error message.
"""
clf = models.BoltOnModel(1, 1)
with self.assertRaisesRegexp(ValueError, err_msg): # pylint: disable=deprecated-method
clf.calculate_class_weights(class_weights,
class_counts,
num_classes)
if __name__ == '__main__':
tf.test.main()

View file

@ -1,388 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BoltOn Optimizer for Bolt-on method."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow.python.keras.optimizer_v2 import optimizer_v2
from tensorflow.python.ops import math_ops
from tensorflow_privacy.privacy.bolt_on.losses import StrongConvexMixin
_accepted_distributions = ['laplace'] # implemented distributions for noising
class GammaBetaDecreasingStep(
optimizer_v2.learning_rate_schedule.LearningRateSchedule):
"""Computes LR as minimum of 1/beta and 1/(gamma * step) at each step.
This is a required step for privacy guarantees.
"""
def __init__(self):
self.is_init = False
self.beta = None
self.gamma = None
def __call__(self, step):
"""Computes and returns the learning rate.
Args:
step: the current iteration number
Returns:
decayed learning rate to minimum of 1/beta and 1/(gamma * step) as per
the BoltOn privacy requirements.
"""
if not self.is_init:
raise AttributeError('Please initialize the {0} Learning Rate Scheduler. '
'This is performed automatically by using the '
'{1} as a context manager, '
'as desired.'.format(self.__class__.__name__,
BoltOn.__name__
)
)
dtype = self.beta.dtype
one = tf.constant(1, dtype)
return tf.math.minimum(tf.math.reduce_min(one/self.beta),
one/(self.gamma*math_ops.cast(step, dtype))
)
def get_config(self):
"""Return config to setup the learning rate scheduler."""
return {'beta': self.beta, 'gamma': self.gamma}
def initialize(self, beta, gamma):
"""Setups scheduler with beta and gamma values from the loss function.
Meant to be used with .fit as the loss params may depend on values passed to
fit.
Args:
beta: Smoothness value. See StrongConvexMixin
gamma: Strong Convexity parameter. See StrongConvexMixin.
"""
self.is_init = True
self.beta = beta
self.gamma = gamma
def de_initialize(self):
"""De initialize post fit, as another fit call may use other parameters."""
self.is_init = False
self.beta = None
self.gamma = None
class BoltOn(optimizer_v2.OptimizerV2):
"""Wrap another tf optimizer with BoltOn privacy protocol.
BoltOn optimizer wraps another tf optimizer to be used
as the visible optimizer to the tf model. No matter the optimizer
passed, "BoltOn" enables the bolt-on model to control the learning rate
based on the strongly convex loss.
To use the BoltOn method, you must:
1. instantiate it with an instantiated tf optimizer and StrongConvexLoss.
2. use it as a context manager around your .fit method internals.
This can be accomplished by the following:
optimizer = tf.optimizers.SGD()
loss = privacy.bolt_on.losses.StrongConvexBinaryCrossentropy()
bolton = BoltOn(optimizer, loss)
with bolton(*args) as _:
model.fit()
The args required for the context manager can be found in the __call__
method.
For more details on the strong convexity requirements, see:
Bolt-on Differential Privacy for Scalable Stochastic Gradient
Descent-based Analytics by Xi Wu et. al.
"""
def __init__(self, # pylint: disable=super-init-not-called
optimizer,
loss,
dtype=tf.float32,
):
"""Constructor.
Args:
optimizer: Optimizer_v2 or subclass to be used as the optimizer
(wrapped).
loss: StrongConvexLoss function that the model is being compiled with.
dtype: tf datatype to use for tensor conversions.
"""
if not isinstance(loss, StrongConvexMixin):
raise ValueError('loss function must be a Strongly Convex and therefore '
'extend the StrongConvexMixin.')
self._private_attributes = [
'_internal_optimizer',
'dtype',
'noise_distribution',
'epsilon',
'loss',
'class_weights',
'input_dim',
'n_samples',
'layers',
'batch_size',
'_is_init',
]
self._internal_optimizer = optimizer
self.learning_rate = GammaBetaDecreasingStep() # use the BoltOn Learning
# rate scheduler, as required for privacy guarantees. This will still need
# to get values from the loss function near the time that .fit is called
# on the model (when this optimizer will be called as a context manager)
self.dtype = dtype
self.loss = loss
self._is_init = False
def get_config(self):
"""Reroutes to _internal_optimizer. See super/_internal_optimizer."""
return self._internal_optimizer.get_config()
def project_weights_to_r(self, force=False):
"""Normalize the weights to the R-ball.
Args:
force: True to normalize regardless of previous weight values.
False to check if weights > R-ball and only normalize then.
Raises:
Exception: If not called from inside this optimizer context.
"""
if not self._is_init:
raise Exception('This method must be called from within the optimizer\'s '
'context.')
radius = self.loss.radius()
for layer in self.layers:
weight_norm = tf.norm(layer.kernel, axis=0)
if force:
layer.kernel = layer.kernel / (weight_norm / radius)
else:
layer.kernel = tf.cond(
tf.reduce_sum(tf.cast(weight_norm > radius, dtype=self.dtype)) > 0,
lambda k=layer.kernel, w=weight_norm, r=radius: k / (w / r), # pylint: disable=cell-var-from-loop
lambda k=layer.kernel: k # pylint: disable=cell-var-from-loop
)
def get_noise(self, input_dim, output_dim):
"""Sample noise to be added to weights for privacy guarantee.
Args:
input_dim: the input dimensionality for the weights
output_dim: the output dimensionality for the weights
Returns:
Noise in shape of layer's weights to be added to the weights.
Raises:
Exception: If not called from inside this optimizer's context.
"""
if not self._is_init:
raise Exception('This method must be called from within the optimizer\'s '
'context.')
loss = self.loss
distribution = self.noise_distribution.lower()
if distribution == _accepted_distributions[0]: # laplace
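# Laplace-mechanism noise: sample a uniformly random direction (a normalized
# Gaussian vector for each output class) and scale it by a Gamma-distributed
# magnitude, so the resulting noise has the heavy-tailed norm distribution
# used for output perturbation.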
per_class_epsilon = self.epsilon / (output_dim)
l2_sensitivity = (2 *
loss.lipchitz_constant(self.class_weights)) / \
(loss.gamma() * self.n_samples * self.batch_size)
unit_vector = tf.random.normal(shape=(input_dim, output_dim),
mean=0,
seed=1,
stddev=1.0,
dtype=self.dtype)
unit_vector = unit_vector / tf.math.sqrt(
tf.reduce_sum(tf.math.square(unit_vector), axis=0)
)
beta = l2_sensitivity / per_class_epsilon
alpha = input_dim # input_dim
gamma = tf.random.gamma([output_dim],
alpha,
beta=1 / beta,
seed=1,
dtype=self.dtype
)
return unit_vector * gamma
raise NotImplementedError('Noise distribution: {0} is not '
'a valid distribution'.format(distribution))
def from_config(self, *args, **kwargs): # pylint: disable=arguments-differ
"""Reroutes to _internal_optimizer. See super/_internal_optimizer."""
return self._internal_optimizer.from_config(*args, **kwargs)
def __getattr__(self, name):
"""Get attr.
Returns attributes listed in _private_attributes from this instance, and
everything else from the _internal_optimizer instance.
Args:
name: Name of attribute to get from this or aggregate optimizer.
Returns:
attribute from BoltOn if specified to come from self, else
from _internal_optimizer.
"""
if name == '_private_attributes' or name in self._private_attributes:
return getattr(self, name)
optim = object.__getattribute__(self, '_internal_optimizer')
try:
return object.__getattribute__(optim, name)
except AttributeError:
raise AttributeError(
"Neither '{0}' nor '{1}' object has attribute '{2}'"
"".format(self.__class__.__name__,
self._internal_optimizer.__class__.__name__,
name)
)
def __setattr__(self, key, value):
"""Set attribute to self instance if its the internal optimizer.
Reroute everything else to the _internal_optimizer.
Args:
key: attribute name
value: attribute value
"""
if key == '_private_attributes':
object.__setattr__(self, key, value)
elif key in self._private_attributes:
object.__setattr__(self, key, value)
else:
setattr(self._internal_optimizer, key, value)
def _resource_apply_dense(self, *args, **kwargs): # pylint: disable=arguments-differ
"""Reroutes to _internal_optimizer. See super/_internal_optimizer."""
return self._internal_optimizer._resource_apply_dense(*args, **kwargs) # pylint: disable=protected-access
def _resource_apply_sparse(self, *args, **kwargs): # pylint: disable=arguments-differ
"""Reroutes to _internal_optimizer. See super/_internal_optimizer."""
return self._internal_optimizer._resource_apply_sparse(*args, **kwargs) # pylint: disable=protected-access
def get_updates(self, loss, params):
"""Reroutes to _internal_optimizer. See super/_internal_optimizer."""
out = self._internal_optimizer.get_updates(loss, params)
self.project_weights_to_r()
return out
def apply_gradients(self, *args, **kwargs): # pylint: disable=arguments-differ
"""Reroutes to _internal_optimizer. See super/_internal_optimizer."""
out = self._internal_optimizer.apply_gradients(*args, **kwargs)
self.project_weights_to_r()
return out
def minimize(self, *args, **kwargs): # pylint: disable=arguments-differ
"""Reroutes to _internal_optimizer. See super/_internal_optimizer."""
out = self._internal_optimizer.minimize(*args, **kwargs)
self.project_weights_to_r()
return out
def _compute_gradients(self, *args, **kwargs): # pylint: disable=arguments-differ,protected-access
"""Reroutes to _internal_optimizer. See super/_internal_optimizer."""
return self._internal_optimizer._compute_gradients(*args, **kwargs) # pylint: disable=protected-access
def get_gradients(self, *args, **kwargs): # pylint: disable=arguments-differ
"""Reroutes to _internal_optimizer. See super/_internal_optimizer."""
return self._internal_optimizer.get_gradients(*args, **kwargs)
def __enter__(self):
"""Context manager call at the beginning of with statement.
Returns:
self, to be used in context manager
"""
self._is_init = True
return self
def __call__(self,
noise_distribution,
epsilon,
layers,
class_weights,
n_samples,
batch_size):
"""Accepts required values for bolton method from context entry point.
Stores them on the optimizer for use throughout fitting.
Args:
noise_distribution: the noise distribution to pick.
see _accepted_distributions and get_noise for possible values.
epsilon: privacy parameter. Lower gives more privacy but less utility.
layers: list of Keras/Tensorflow layers. Can be found as model.layers
class_weights: class_weights used, which may either be a scalar or 1D
tensor with dim == n_classes.
n_samples: number of rows/individual samples in the training set
batch_size: batch size used.
Returns:
self, to be used by the __enter__ method for context.
"""
if epsilon <= 0:
raise ValueError('Detected epsilon: {0}. '
'Valid range is 0 < epsilon < inf'.format(epsilon))
if noise_distribution not in _accepted_distributions:
raise ValueError('Detected noise distribution: {0} not one of: {1} valid '
'distributions'.format(noise_distribution,
_accepted_distributions))
self.noise_distribution = noise_distribution
self.learning_rate.initialize(self.loss.beta(class_weights),
self.loss.gamma())
self.epsilon = tf.constant(epsilon, dtype=self.dtype)
self.class_weights = tf.constant(class_weights, dtype=self.dtype)
self.n_samples = tf.constant(n_samples, dtype=self.dtype)
self.layers = layers
self.batch_size = tf.constant(batch_size, dtype=self.dtype)
return self
def __exit__(self, *args):
"""Exit call from with statement.
Used to:
1.reset the model and fit parameters passed to the optimizer
to enable the BoltOn Privacy guarantees. These are reset to ensure
that any future calls to fit with the same instance of the optimizer
will properly error out.
2.call post-fit methods normalizing/projecting the model weights and
adding noise to the weights.
Args:
*args: encompasses the type, value, and traceback values which are unused.
"""
self.project_weights_to_r(True)
for layer in self.layers:
input_dim = layer.kernel.shape[0]
output_dim = layer.units
noise = self.get_noise(input_dim,
output_dim,
)
layer.kernel = tf.math.add(layer.kernel, noise)
self.noise_distribution = None
self.learning_rate.de_initialize()
self.epsilon = -1
self.batch_size = -1
self.class_weights = None
self.n_samples = None
self.input_dim = None
self.layers = None
self._is_init = False
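# --- Illustrative usage sketch (editor's addition, not original source) ---
# A minimal sketch of how the context manager above is typically driven,
# mirroring the accompanying optimizer unit tests. TestOptimizer, TestLoss
# and TestModel refer to the helpers defined in that test file; all other
# values here are illustrative assumptions only.
#
# loss = TestLoss(1, 1, 1)
# bolton = opt.BoltOn(TestOptimizer(), loss)
# model = TestModel(1, (1,), 1)
# model.compile(bolton, loss)
# with bolton(noise_distribution='laplace', epsilon=2, layers=model.layers,
#             class_weights=1, n_samples=1, batch_size=1):
#   pass  # fitting would run here; weights are projected after each update
# # On exit, noise is added to each layer kernel and the fit state is reset.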

View file

@@ -1,579 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Unit testing for optimizers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import unittest
from absl.testing import parameterized
import tensorflow as tf
from tensorflow.python import ops as _ops
from tensorflow.python.framework import test_util
from tensorflow.python.keras import keras_parameterized
from tensorflow.python.keras import losses
from tensorflow.python.keras.initializers import constant
from tensorflow.python.keras.models import Model
from tensorflow.python.keras.optimizer_v2.optimizer_v2 import OptimizerV2
from tensorflow.python.keras.regularizers import L1L2
from tensorflow.python.platform import test
from tensorflow_privacy.privacy.bolt_on import optimizers as opt
from tensorflow_privacy.privacy.bolt_on.losses import StrongConvexMixin
class TestModel(Model): # pylint: disable=abstract-method
"""BoltOn episilon-delta model.
Uses 4 key steps to achieve privacy guarantees:
1. Adds noise to weights after training (output perturbation).
2. Projects weights to R after each batch
3. Limits learning rate
4. Use a strongly convex loss function (see compile)
For more details on the strong convexity requirements, see:
Bolt-on Differential Privacy for Scalable Stochastic Gradient
Descent-based Analytics by Xi Wu et. al.
"""
def __init__(self, n_outputs=2, input_shape=(16,), init_value=2):
"""Constructor.
Args:
n_outputs: number of output neurons.
input_shape: shape of the input layer.
init_value: initial value for the 'constant' kernel initializer.
"""
super(TestModel, self).__init__(name='bolton', dynamic=False)
self.n_outputs = n_outputs
self.layer_input_shape = input_shape
self.output_layer = tf.keras.layers.Dense(
self.n_outputs,
input_shape=self.layer_input_shape,
kernel_regularizer=L1L2(l2=1),
kernel_initializer=constant(init_value),
)
class TestLoss(losses.Loss, StrongConvexMixin):
"""Test loss function for testing BoltOn model."""
def __init__(self, reg_lambda, c_arg, radius_constant, name='test'):
super(TestLoss, self).__init__(name=name)
self.reg_lambda = reg_lambda
self.C = c_arg # pylint: disable=invalid-name
self.radius_constant = radius_constant
def radius(self):
"""Radius, R, of the hypothesis space W.
W is a convex set that forms the hypothesis space.
Returns:
a tensor
"""
return _ops.convert_to_tensor_v2(self.radius_constant, dtype=tf.float32)
def gamma(self):
"""Returns strongly convex parameter, gamma."""
return _ops.convert_to_tensor_v2(1, dtype=tf.float32)
def beta(self, class_weight): # pylint: disable=unused-argument
"""Smoothness, beta.
Args:
class_weight: the class weights as scalar or 1d tensor, where its
dimensionality is equal to the number of outputs.
Returns:
Beta
"""
return _ops.convert_to_tensor_v2(1, dtype=tf.float32)
def lipchitz_constant(self, class_weight): # pylint: disable=unused-argument
"""Lipchitz constant, L.
Args:
class_weight: class weights used
Returns:
constant L
"""
return _ops.convert_to_tensor_v2(1, dtype=tf.float32)
def call(self, y_true, y_pred):
"""Loss function that is minimized at the mean of the input points."""
return 0.5 * tf.reduce_sum(
tf.math.squared_difference(y_true, y_pred),
axis=1
)
def max_class_weight(self, class_weight, dtype=tf.float32):
"""the maximum weighting in class weights (max value) as a scalar tensor.
Args:
class_weight: class weights used
dtype: the data type for tensor conversions.
Returns:
maximum class weighting as tensor scalar
"""
if class_weight is None:
return 1
raise NotImplementedError('')
def kernel_regularizer(self):
"""Returns the kernel_regularizer to be used.
Any subclass should override this method if it requires a kernel_regularizer
(for example, if one is needed for the loss function to be strongly convex).
"""
return L1L2(l2=self.reg_lambda)
class TestOptimizer(OptimizerV2):
"""Optimizer used for testing the BoltOn optimizer."""
def __init__(self):
super(TestOptimizer, self).__init__('test')
self.not_private = 'test'
self.iterations = tf.constant(1, dtype=tf.float32)
self._iterations = tf.constant(1, dtype=tf.float32)
def _compute_gradients(self, loss, var_list, grad_loss=None):
return 'test'
def get_config(self):
return 'test'
def from_config(self, config, custom_objects=None):
return 'test'
def _create_slots(self):
return 'test'
def _resource_apply_dense(self, grad, handle):
return 'test'
def _resource_apply_sparse(self, grad, handle, indices):
return 'test'
def get_updates(self, loss, params):
return 'test'
def apply_gradients(self, grads_and_vars, name=None):
return 'test'
def minimize(self, loss, var_list, grad_loss=None, name=None):
return 'test'
def get_gradients(self, loss, params):
return 'test'
def limit_learning_rate(self):
return 'test'
class BoltonOptimizerTest(keras_parameterized.TestCase):
"""BoltOn Optimizer tests."""
@test_util.run_all_in_graph_and_eager_modes
@parameterized.named_parameters([
{'testcase_name': 'getattr',
'fn': '__getattr__',
'args': ['dtype'],
'result': tf.float32,
'test_attr': None},
{'testcase_name': 'project_weights_to_r',
'fn': 'project_weights_to_r',
'args': ['dtype'],
'result': None,
'test_attr': ''},
])
def test_fn(self, fn, args, result, test_attr):
"""test that a fn of BoltOn optimizer is working as expected.
Args:
fn: method of Optimizer to test
args: args to optimizer fn
result: the expected result
test_attr: None if the fn returns the test result. Otherwise, this is
the attribute of BoltOn to check against result with.
"""
tf.random.set_seed(1)
loss = TestLoss(1, 1, 1)
bolton = opt.BoltOn(TestOptimizer(), loss)
model = TestModel(1)
model.layers[0].kernel = \
model.layers[0].kernel_initializer((model.layer_input_shape[0],
model.n_outputs))
bolton._is_init = True # pylint: disable=protected-access
bolton.layers = model.layers
bolton.epsilon = 2
bolton.noise_distribution = 'laplace'
bolton.n_outputs = 1
bolton.n_samples = 1
res = getattr(bolton, fn, None)(*args)
if test_attr is not None:
res = getattr(bolton, test_attr, None)
if hasattr(res, 'numpy') and hasattr(result, 'numpy'): # both tensors/not
res = res.numpy()
result = result.numpy()
self.assertEqual(res, result)
@test_util.run_all_in_graph_and_eager_modes
@parameterized.named_parameters([
{'testcase_name': '1 value project to r=1',
'r': 1,
'init_value': 2,
'shape': (1,),
'n_out': 1,
'result': [[1]]},
{'testcase_name': '2 value project to r=1',
'r': 1,
'init_value': 2,
'shape': (2,),
'n_out': 1,
'result': [[0.707107], [0.707107]]},
{'testcase_name': '1 value project to r=2',
'r': 2,
'init_value': 3,
'shape': (1,),
'n_out': 1,
'result': [[2]]},
{'testcase_name': 'no project',
'r': 2,
'init_value': 1,
'shape': (1,),
'n_out': 1,
'result': [[1]]},
])
def test_project(self, r, shape, n_out, init_value, result):
"""test that a fn of BoltOn optimizer is working as expected.
Args:
r: Radius value for StrongConvex loss function.
shape: input_dimensionality
n_out: output dimensionality
init_value: the initial value for 'constant' kernel initializer
result: the expected output after projection.
"""
tf.random.set_seed(1)
def project_fn(r):
loss = TestLoss(1, 1, r)
bolton = opt.BoltOn(TestOptimizer(), loss)
model = TestModel(n_out, shape, init_value)
model.compile(bolton, loss)
model.layers[0].kernel = \
model.layers[0].kernel_initializer((model.layer_input_shape[0],
model.n_outputs))
bolton._is_init = True # pylint: disable=protected-access
bolton.layers = model.layers
bolton.epsilon = 2
bolton.noise_distribution = 'laplace'
bolton.n_outputs = 1
bolton.n_samples = 1
bolton.project_weights_to_r()
return _ops.convert_to_tensor_v2(bolton.layers[0].kernel, tf.float32)
res = project_fn(r)
self.assertAllClose(res, result)
@test_util.run_all_in_graph_and_eager_modes
@parameterized.named_parameters([
{'testcase_name': 'normal values',
'epsilon': 2,
'noise': 'laplace',
'class_weights': 1},
])
def test_context_manager(self, noise, epsilon, class_weights):
"""Tests the context manager functionality of the optimizer.
Args:
noise: noise distribution to pick
epsilon: epsilon privacy parameter to use
class_weights: class_weights to use
"""
@tf.function
def test_run():
loss = TestLoss(1, 1, 1)
bolton = opt.BoltOn(TestOptimizer(), loss)
model = TestModel(1, (1,), 1)
model.compile(bolton, loss)
model.layers[0].kernel = \
model.layers[0].kernel_initializer((model.layer_input_shape[0],
model.n_outputs))
with bolton(noise, epsilon, model.layers, class_weights, 1, 1) as _:
pass
return _ops.convert_to_tensor_v2(bolton.epsilon, dtype=tf.float32)
epsilon = test_run()
self.assertEqual(epsilon.numpy(), -1)
@parameterized.named_parameters([
{'testcase_name': 'invalid noise',
'epsilon': 1,
'noise': 'not_valid',
'err_msg': 'Detected noise distribution: not_valid not one of:'},
{'testcase_name': 'invalid epsilon',
'epsilon': -1,
'noise': 'laplace',
'err_msg': 'Detected epsilon: -1. Valid range is 0 < epsilon < inf'},
])
def test_context_domains(self, noise, epsilon, err_msg):
"""Tests the context domains.
Args:
noise: noise distribution to pick
epsilon: epsilon privacy parameter to use
err_msg: the expected error message
"""
@tf.function
def test_run(noise, epsilon):
loss = TestLoss(1, 1, 1)
bolton = opt.BoltOn(TestOptimizer(), loss)
model = TestModel(1, (1,), 1)
model.compile(bolton, loss)
model.layers[0].kernel = \
model.layers[0].kernel_initializer((model.layer_input_shape[0],
model.n_outputs))
with bolton(noise, epsilon, model.layers, 1, 1, 1) as _:
pass
with self.assertRaisesRegexp(ValueError, err_msg): # pylint: disable=deprecated-method
test_run(noise, epsilon)
@parameterized.named_parameters([
{'testcase_name': 'fn: get_noise',
'fn': 'get_noise',
'args': [1, 1],
'err_msg': 'This method must be called from within the '
'optimizer\'s context'},
])
def test_not_in_context(self, fn, args, err_msg):
"""Tests that the expected functions raise errors when not in context.
Args:
fn: the function to test
args: the arguments for said function
err_msg: expected error message
"""
def test_run(fn, args):
loss = TestLoss(1, 1, 1)
bolton = opt.BoltOn(TestOptimizer(), loss)
model = TestModel(1, (1,), 1)
model.compile(bolton, loss)
model.layers[0].kernel = \
model.layers[0].kernel_initializer((model.layer_input_shape[0],
model.n_outputs))
getattr(bolton, fn)(*args)
with self.assertRaisesRegexp(Exception, err_msg): # pylint: disable=deprecated-method
test_run(fn, args)
@parameterized.named_parameters([
{'testcase_name': 'fn: get_updates',
'fn': 'get_updates',
'args': [0, 0]},
{'testcase_name': 'fn: get_config',
'fn': 'get_config',
'args': []},
{'testcase_name': 'fn: from_config',
'fn': 'from_config',
'args': [0]},
{'testcase_name': 'fn: _resource_apply_dense',
'fn': '_resource_apply_dense',
'args': [1, 1]},
{'testcase_name': 'fn: _resource_apply_sparse',
'fn': '_resource_apply_sparse',
'args': [1, 1, 1]},
{'testcase_name': 'fn: apply_gradients',
'fn': 'apply_gradients',
'args': [1]},
{'testcase_name': 'fn: minimize',
'fn': 'minimize',
'args': [1, 1]},
{'testcase_name': 'fn: _compute_gradients',
'fn': '_compute_gradients',
'args': [1, 1]},
{'testcase_name': 'fn: get_gradients',
'fn': 'get_gradients',
'args': [1, 1]},
])
def test_rerouted_function(self, fn, args):
"""Tests rerouted function.
Tests that a method of the internal optimizer is correctly routed from
the BoltOn instance to the internal optimizer instance (TestOptimizer,
here).
Args:
fn: fn to test
args: arguments to that fn
"""
loss = TestLoss(1, 1, 1)
optimizer = TestOptimizer()
bolton = opt.BoltOn(optimizer, loss)
model = TestModel(3)
model.compile(optimizer, loss)
model.layers[0].kernel = \
model.layers[0].kernel_initializer((model.layer_input_shape[0],
model.n_outputs))
bolton._is_init = True # pylint: disable=protected-access
bolton.layers = model.layers
bolton.epsilon = 2
bolton.noise_distribution = 'laplace'
bolton.n_outputs = 1
bolton.n_samples = 1
self.assertEqual(
getattr(bolton, fn, lambda: 'fn not found')(*args),
'test'
)
@parameterized.named_parameters([
{'testcase_name': 'fn: project_weights_to_r',
'fn': 'project_weights_to_r',
'args': []},
{'testcase_name': 'fn: get_noise',
'fn': 'get_noise',
'args': [1, 1]},
])
def test_not_reroute_fn(self, fn, args):
"""Test function is not rerouted.
Test that a fn that should not be rerouted to the internal optimizer is
in fact not rerouted.
Args:
fn: fn to test
args: arguments to that fn
"""
def test_run(fn, args):
loss = TestLoss(1, 1, 1)
bolton = opt.BoltOn(TestOptimizer(), loss)
model = TestModel(1, (1,), 1)
model.compile(bolton, loss)
model.layers[0].kernel = \
model.layers[0].kernel_initializer((model.layer_input_shape[0],
model.n_outputs))
bolton._is_init = True # pylint: disable=protected-access
bolton.noise_distribution = 'laplace'
bolton.epsilon = 1
bolton.layers = model.layers
bolton.class_weights = 1
bolton.n_samples = 1
bolton.batch_size = 1
bolton.n_outputs = 1
res = getattr(bolton, fn, lambda: 'test')(*args)
if res != 'test':
res = 1
else:
res = 0
return _ops.convert_to_tensor_v2(res, dtype=tf.float32)
self.assertNotEqual(test_run(fn, args), 0)
@parameterized.named_parameters([
{'testcase_name': 'attr: _iterations',
'attr': '_iterations'}
])
def test_reroute_attr(self, attr):
"""Test a function is rerouted.
Test that attribute of internal optimizer is correctly rerouted to the
internal optimizer.
Args:
attr: attribute to test
"""
loss = TestLoss(1, 1, 1)
internal_optimizer = TestOptimizer()
optimizer = opt.BoltOn(internal_optimizer, loss)
self.assertEqual(getattr(optimizer, attr),
getattr(internal_optimizer, attr))
@parameterized.named_parameters([
{'testcase_name': 'attr does not exist',
'attr': '_not_valid'}
])
def test_attribute_error(self, attr):
"""Test rerouting of attributes.
Test that attribute of internal optimizer is correctly rerouted to the
internal optimizer
Args:
attr: attribute to test
"""
loss = TestLoss(1, 1, 1)
internal_optimizer = TestOptimizer()
optimizer = opt.BoltOn(internal_optimizer, loss)
with self.assertRaises(AttributeError):
getattr(optimizer, attr)
class SchedulerTest(keras_parameterized.TestCase):
"""GammaBeta Scheduler tests."""
@parameterized.named_parameters([
{'testcase_name': 'not in context',
'err_msg': 'Please initialize the GammaBetaDecreasingStep Learning Rate'
' Scheduler'
}
])
def test_bad_call(self, err_msg):
"""Test attribute of internal opt correctly rerouted to the internal opt.
Args:
err_msg: The expected error message from the scheduler bad call.
"""
scheduler = opt.GammaBetaDecreasingStep()
with self.assertRaisesRegexp(Exception, err_msg): # pylint: disable=deprecated-method
scheduler(1)
@parameterized.named_parameters([
{'testcase_name': 'step 1',
'step': 1,
'res': 0.5},
{'testcase_name': 'step 2',
'step': 2,
'res': 0.5},
{'testcase_name': 'step 3',
'step': 3,
'res': 0.333333333},
])
def test_call(self, step, res):
"""Test call.
Test that attribute of internal optimizer is correctly rerouted to the
internal optimizer
Args:
step: step number to 'GammaBetaDecreasingStep' 'Scheduler'.
res: expected result from call to 'GammaBetaDecreasingStep' 'Scheduler'.
"""
beta = _ops.convert_to_tensor_v2(2, dtype=tf.float32)
gamma = _ops.convert_to_tensor_v2(1, dtype=tf.float32)
scheduler = opt.GammaBetaDecreasingStep()
scheduler.initialize(beta, gamma)
step = _ops.convert_to_tensor_v2(step, dtype=tf.float32)
lr = scheduler(step)
self.assertAllClose(lr.numpy(), res)
if __name__ == '__main__':
test.main()

View file

@@ -1,140 +0,0 @@
package(default_visibility = ["//visibility:public"])
licenses(["notice"]) # Apache 2.0
py_library(
name = "dp_query",
srcs = ["dp_query.py"],
deps = [
"//third_party/py/distutils",
"//third_party/py/tensorflow",
],
)
py_library(
name = "gaussian_query",
srcs = ["gaussian_query.py"],
deps = [
":dp_query",
":normalized_query",
"//third_party/py/distutils",
"//third_party/py/tensorflow",
],
)
py_test(
name = "gaussian_query_test",
size = "small",
srcs = ["gaussian_query_test.py"],
python_version = "PY2",
deps = [
":gaussian_query",
":test_utils",
"//third_party/py/absl/testing:parameterized",
"//third_party/py/numpy",
"//third_party/py/six",
"//third_party/py/tensorflow",
],
)
py_library(
name = "no_privacy_query",
srcs = ["no_privacy_query.py"],
deps = [
":dp_query",
"//third_party/py/distutils",
"//third_party/py/tensorflow",
],
)
py_test(
name = "no_privacy_query_test",
size = "small",
srcs = ["no_privacy_query_test.py"],
python_version = "PY2",
deps = [
":no_privacy_query",
":test_utils",
"//third_party/py/absl/testing:parameterized",
"//third_party/py/tensorflow",
],
)
py_library(
name = "normalized_query",
srcs = ["normalized_query.py"],
deps = [
":dp_query",
"//third_party/py/distutils",
"//third_party/py/tensorflow",
],
)
py_test(
name = "normalized_query_test",
size = "small",
srcs = ["normalized_query_test.py"],
python_version = "PY2",
deps = [
":gaussian_query",
":normalized_query",
":test_utils",
"//third_party/py/tensorflow",
],
)
py_library(
name = "nested_query",
srcs = ["nested_query.py"],
deps = [
":dp_query",
"//third_party/py/distutils",
"//third_party/py/tensorflow",
],
)
py_test(
name = "nested_query_test",
size = "small",
srcs = ["nested_query_test.py"],
python_version = "PY2",
deps = [
":gaussian_query",
":nested_query",
":test_utils",
"//third_party/py/absl/testing:parameterized",
"//third_party/py/distutils",
"//third_party/py/numpy",
"//third_party/py/tensorflow",
],
)
py_library(
name = "quantile_adaptive_clip_sum_query",
srcs = ["quantile_adaptive_clip_sum_query.py"],
deps = [
":dp_query",
":gaussian_query",
":normalized_query",
"//third_party/py/tensorflow",
],
)
py_test(
name = "quantile_adaptive_clip_sum_query_test",
srcs = ["quantile_adaptive_clip_sum_query_test.py"],
python_version = "PY2",
deps = [
":quantile_adaptive_clip_sum_query",
":test_utils",
"//third_party/py/numpy",
"//third_party/py/tensorflow",
"//third_party/py/tensorflow_privacy/privacy/analysis:privacy_ledger",
],
)
py_library(
name = "test_utils",
srcs = ["test_utils.py"],
deps = [],
)

View file

@@ -1,225 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""An interface for differentially private query mechanisms.
The DPQuery class abstracts the differential privacy mechanism needed by DP-SGD.
The nomenclature is not specific to machine learning, but rather comes from
the differential privacy literature. Therefore, instead of talking about
examples, minibatches, and gradients, the code talks about records, samples and
queries. For more detail, please see the paper here:
https://arxiv.org/pdf/1812.06210.pdf
A common usage paradigm for this class is centralized DP-SGD training on a
fixed set of training examples, which we call "standard DP-SGD training."
In such training, SGD applies as usual by computing gradient updates from a set
of training examples that form a minibatch. However, each minibatch is broken
up into disjoint "microbatches." The gradient of each microbatch is computed
and clipped to a maximum norm, with the "records" for all such clipped gradients
forming a "sample" that constitutes the entire minibatch. Subsequently, that
sample can be "queried" to get an averaged, noised gradient update that can be
applied to model parameters.
In order to prevent inaccurate accounting of privacy parameters, the only
means of inspecting the gradients and updates of SGD training is via the use
of the below interfaces, and through the accumulation and querying of a
"sample state" abstraction. Thus, accessing data is indirect on purpose.
The DPQuery class also allows the use of a global state that may change between
samples. In the common situation where the privacy mechanism remains unchanged
throughout the entire training process, the global state is usually None.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import abc
from distutils.version import LooseVersion
import tensorflow as tf
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
nest = tf.contrib.framework.nest
else:
nest = tf.nest
class DPQuery(object):
"""Interface for differentially private query mechanisms."""
__metaclass__ = abc.ABCMeta
def set_ledger(self, ledger):
"""Supplies privacy ledger to which the query can record privacy events.
Args:
ledger: A `PrivacyLedger`.
"""
del ledger
raise TypeError(
'DPQuery type %s does not support set_ledger.' % type(self).__name__)
def initial_global_state(self):
"""Returns the initial global state for the DPQuery."""
return ()
def derive_sample_params(self, global_state):
"""Given the global state, derives parameters to use for the next sample.
Args:
global_state: The current global state.
Returns:
Parameters to use to process records in the next sample.
"""
del global_state # unused.
return ()
@abc.abstractmethod
def initial_sample_state(self, template):
"""Returns an initial state to use for the next sample.
Args:
template: A nested structure of tensors, TensorSpecs, or numpy arrays used
as a template to create the initial sample state. It is assumed that the
leaves of the structure are python scalars or some type that has
properties `shape` and `dtype`.
Returns:
An initial sample state.
"""
pass
def preprocess_record(self, params, record):
"""Preprocesses a single record.
This preprocessing is applied to one client's record, e.g. selecting vectors
and clipping them to a fixed L2 norm. This method can be executed in a
separate TF session, or even on a different machine, so it should not depend
on any TF inputs other than those provided as input arguments. In
particular, implementations should avoid accessing any TF tensors or
variables that are stored in self.
Args:
params: The parameters for the sample. In standard DP-SGD training,
the clipping norm for the sample's microbatch gradients (i.e.,
a maximum norm magnitude to which each gradient is clipped)
record: The record to be processed. In standard DP-SGD training,
the gradient computed for the examples in one microbatch, which
may be the gradient for just one example (for size 1 microbatches).
Returns:
A structure of tensors to be aggregated.
"""
del params # unused.
return record
@abc.abstractmethod
def accumulate_preprocessed_record(
self, sample_state, preprocessed_record):
"""Accumulates a single preprocessed record into the sample state.
This method is intended to only do simple aggregation, typically just a sum.
In the future, we might remove this method and replace it with a way to
declaratively specify the type of aggregation required.
Args:
sample_state: The current sample state. In standard DP-SGD training,
the accumulated sum of previous clipped microbatch gradients.
preprocessed_record: The preprocessed record to accumulate.
Returns:
The updated sample state.
"""
pass
def accumulate_record(self, params, sample_state, record):
"""Accumulates a single record into the sample state.
This is a helper method that simply delegates to `preprocess_record` and
`accumulate_preprocessed_record` for the common case when both of those
functions run on a single device.
Args:
params: The parameters for the sample. In standard DP-SGD training,
the clipping norm for the sample's microbatch gradients (i.e.,
a maximum norm magnitude to which each gradient is clipped)
sample_state: The current sample state. In standard DP-SGD training,
the accumulated sum of previous clipped microbatch gradients.
record: The record to accumulate. In standard DP-SGD training,
the gradient computed for the examples in one microbatch, which
may be the gradient for just one example (for size 1 microbatches).
Returns:
The updated sample state. In standard DP-SGD training, the set of
previous microbatch gradients with the addition of the record argument.
"""
preprocessed_record = self.preprocess_record(params, record)
return self.accumulate_preprocessed_record(
sample_state, preprocessed_record)
@abc.abstractmethod
def merge_sample_states(self, sample_state_1, sample_state_2):
"""Merges two sample states into a single state.
Args:
sample_state_1: The first sample state to merge.
sample_state_2: The second sample state to merge.
Returns:
The merged sample state.
"""
pass
@abc.abstractmethod
def get_noised_result(self, sample_state, global_state):
"""Gets query result after all records of sample have been accumulated.
Args:
sample_state: The sample state after all records have been accumulated.
In standard DP-SGD training, the accumulated sum of clipped microbatch
gradients (in the special case of microbatches of size 1, the clipped
per-example gradients).
global_state: The global state, storing long-term privacy bookkeeping.
Returns:
A tuple (result, new_global_state) where "result" is the result of the
query and "new_global_state" is the updated global state. In standard
DP-SGD training, the result is a gradient update comprising a noised
average of the clipped gradients in the sample state---with the noise and
averaging performed in a manner that guarantees differential privacy.
"""
pass
def zeros_like(arg):
"""A `zeros_like` function that also works for `tf.TensorSpec`s."""
try:
arg = tf.convert_to_tensor(arg)
except TypeError:
pass
return tf.zeros(arg.shape, arg.dtype)
class SumAggregationDPQuery(DPQuery):
"""Base class for DPQueries that aggregate via sum."""
def initial_sample_state(self, template):
return nest.map_structure(zeros_like, template)
def accumulate_preprocessed_record(self, sample_state, preprocessed_record):
return nest.map_structure(tf.add, sample_state, preprocessed_record)
def merge_sample_states(self, sample_state_1, sample_state_2):
return nest.map_structure(tf.add, sample_state_1, sample_state_2)
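# --- Illustrative usage sketch (editor's addition, not original source) ---
# A minimal sketch of the call sequence a training loop is expected to drive,
# assuming `query` is some concrete DPQuery (e.g. a GaussianSumQuery from
# gaussian_query.py) and `records` is a list of per-microbatch gradients:
#
# global_state = query.initial_global_state()
# params = query.derive_sample_params(global_state)
# sample_state = query.initial_sample_state(records[0])
# for record in records:
#   sample_state = query.accumulate_record(params, sample_state, record)
# result, global_state = query.get_noised_result(sample_state, global_state)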

View file

@@ -1,145 +0,0 @@
# Copyright 2018, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Implements DPQuery interface for Gaussian average queries.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
from distutils.version import LooseVersion
import tensorflow as tf
from tensorflow_privacy.privacy.dp_query import dp_query
from tensorflow_privacy.privacy.dp_query import normalized_query
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
nest = tf.contrib.framework.nest
else:
nest = tf.nest
class GaussianSumQuery(dp_query.SumAggregationDPQuery):
"""Implements DPQuery interface for Gaussian sum queries.
Accumulates clipped vectors, then adds Gaussian noise to the sum.
"""
# pylint: disable=invalid-name
_GlobalState = collections.namedtuple(
'_GlobalState', ['l2_norm_clip', 'stddev'])
def __init__(self, l2_norm_clip, stddev):
"""Initializes the GaussianSumQuery.
Args:
l2_norm_clip: The clipping norm to apply to the global norm of each
record.
stddev: The stddev of the noise added to the sum.
"""
self._l2_norm_clip = l2_norm_clip
self._stddev = stddev
self._ledger = None
def set_ledger(self, ledger):
self._ledger = ledger
def make_global_state(self, l2_norm_clip, stddev):
"""Creates a global state from the given parameters."""
return self._GlobalState(tf.cast(l2_norm_clip, tf.float32),
tf.cast(stddev, tf.float32))
def initial_global_state(self):
return self.make_global_state(self._l2_norm_clip, self._stddev)
def derive_sample_params(self, global_state):
return global_state.l2_norm_clip
def initial_sample_state(self, template):
return nest.map_structure(
dp_query.zeros_like, template)
def preprocess_record_impl(self, params, record):
"""Clips the l2 norm, returning the clipped record and the l2 norm.
Args:
params: The parameters for the sample.
record: The record to be processed.
Returns:
A tuple (preprocessed_records, l2_norm) where `preprocessed_records` is
the structure of preprocessed tensors, and l2_norm is the total l2 norm
before clipping.
"""
l2_norm_clip = params
record_as_list = nest.flatten(record)
clipped_as_list, norm = tf.clip_by_global_norm(record_as_list, l2_norm_clip)
return nest.pack_sequence_as(record, clipped_as_list), norm
def preprocess_record(self, params, record):
preprocessed_record, _ = self.preprocess_record_impl(params, record)
return preprocessed_record
def get_noised_result(self, sample_state, global_state):
"""See base class."""
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
def add_noise(v):
return v + tf.random_normal(tf.shape(v), stddev=global_state.stddev)
else:
random_normal = tf.random_normal_initializer(stddev=global_state.stddev)
def add_noise(v):
return v + random_normal(tf.shape(v))
if self._ledger:
dependencies = [
self._ledger.record_sum_query(
global_state.l2_norm_clip, global_state.stddev)
]
else:
dependencies = []
with tf.control_dependencies(dependencies):
return nest.map_structure(add_noise, sample_state), global_state
class GaussianAverageQuery(normalized_query.NormalizedQuery):
"""Implements DPQuery interface for Gaussian average queries.
Accumulates clipped vectors, adds Gaussian noise, and normalizes.
Note that we use "fixed-denominator" estimation: the denominator should be
specified as the expected number of records per sample. Accumulating the
denominator separately would also be possible but would be produce a higher
variance estimator.
"""
def __init__(self,
l2_norm_clip,
sum_stddev,
denominator):
"""Initializes the GaussianAverageQuery.
Args:
l2_norm_clip: The clipping norm to apply to the global norm of each
record.
sum_stddev: The stddev of the noise added to the sum (before
normalization).
denominator: The normalization constant (applied after noise is added to
the sum).
"""
super(GaussianAverageQuery, self).__init__(
numerator_query=GaussianSumQuery(l2_norm_clip, sum_stddev),
denominator=denominator)
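# --- Illustrative sketch (editor's addition, not original source) ---
# As the constructor above shows, a GaussianAverageQuery is simply a
# NormalizedQuery wrapping a GaussianSumQuery, so the two constructions below
# should be interchangeable: clip each record to L2 norm 3.0, add Gaussian
# noise with stddev 1.0 to the sum, then divide by the fixed denominator 8.0.
#
# avg_query = GaussianAverageQuery(
#     l2_norm_clip=3.0, sum_stddev=1.0, denominator=8.0)
# equivalent = normalized_query.NormalizedQuery(
#     numerator_query=GaussianSumQuery(l2_norm_clip=3.0, stddev=1.0),
#     denominator=8.0)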

View file

@@ -1,161 +0,0 @@
# Copyright 2018, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for GaussianAverageQuery."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import numpy as np
from six.moves import xrange
import tensorflow as tf
from tensorflow_privacy.privacy.dp_query import gaussian_query
from tensorflow_privacy.privacy.dp_query import test_utils
class GaussianQueryTest(tf.test.TestCase, parameterized.TestCase):
def test_gaussian_sum_no_clip_no_noise(self):
with self.cached_session() as sess:
record1 = tf.constant([2.0, 0.0])
record2 = tf.constant([-1.0, 1.0])
query = gaussian_query.GaussianSumQuery(
l2_norm_clip=10.0, stddev=0.0)
query_result, _ = test_utils.run_query(query, [record1, record2])
result = sess.run(query_result)
expected = [1.0, 1.0]
self.assertAllClose(result, expected)
def test_gaussian_sum_with_clip_no_noise(self):
with self.cached_session() as sess:
record1 = tf.constant([-6.0, 8.0]) # Clipped to [-3.0, 4.0].
record2 = tf.constant([4.0, -3.0]) # Not clipped.
query = gaussian_query.GaussianSumQuery(
l2_norm_clip=5.0, stddev=0.0)
query_result, _ = test_utils.run_query(query, [record1, record2])
result = sess.run(query_result)
expected = [1.0, 1.0]
self.assertAllClose(result, expected)
def test_gaussian_sum_with_changing_clip_no_noise(self):
with self.cached_session() as sess:
record1 = tf.constant([-6.0, 8.0]) # Clipped to [-3.0, 4.0].
record2 = tf.constant([4.0, -3.0]) # Not clipped.
l2_norm_clip = tf.Variable(5.0)
l2_norm_clip_placeholder = tf.placeholder(tf.float32)
assign_l2_norm_clip = tf.assign(l2_norm_clip, l2_norm_clip_placeholder)
query = gaussian_query.GaussianSumQuery(
l2_norm_clip=l2_norm_clip, stddev=0.0)
query_result, _ = test_utils.run_query(query, [record1, record2])
self.evaluate(tf.global_variables_initializer())
result = sess.run(query_result)
expected = [1.0, 1.0]
self.assertAllClose(result, expected)
sess.run(assign_l2_norm_clip, {l2_norm_clip_placeholder: 0.0})
result = sess.run(query_result)
expected = [0.0, 0.0]
self.assertAllClose(result, expected)
def test_gaussian_sum_with_noise(self):
with self.cached_session() as sess:
record1, record2 = 2.71828, 3.14159
stddev = 1.0
query = gaussian_query.GaussianSumQuery(
l2_norm_clip=5.0, stddev=stddev)
query_result, _ = test_utils.run_query(query, [record1, record2])
noised_sums = []
for _ in xrange(1000):
noised_sums.append(sess.run(query_result))
result_stddev = np.std(noised_sums)
self.assertNear(result_stddev, stddev, 0.1)
def test_gaussian_sum_merge(self):
records1 = [tf.constant([2.0, 0.0]), tf.constant([-1.0, 1.0])]
records2 = [tf.constant([3.0, 5.0]), tf.constant([-1.0, 4.0])]
def get_sample_state(records):
query = gaussian_query.GaussianSumQuery(l2_norm_clip=10.0, stddev=1.0)
global_state = query.initial_global_state()
params = query.derive_sample_params(global_state)
sample_state = query.initial_sample_state(records[0])
for record in records:
sample_state = query.accumulate_record(params, sample_state, record)
return sample_state
sample_state_1 = get_sample_state(records1)
sample_state_2 = get_sample_state(records2)
merged = gaussian_query.GaussianSumQuery(10.0, 1.0).merge_sample_states(
sample_state_1,
sample_state_2)
with self.cached_session() as sess:
result = sess.run(merged)
expected = [3.0, 10.0]
self.assertAllClose(result, expected)
def test_gaussian_average_no_noise(self):
with self.cached_session() as sess:
record1 = tf.constant([5.0, 0.0]) # Clipped to [3.0, 0.0].
record2 = tf.constant([-1.0, 2.0]) # Not clipped.
query = gaussian_query.GaussianAverageQuery(
l2_norm_clip=3.0, sum_stddev=0.0, denominator=2.0)
query_result, _ = test_utils.run_query(query, [record1, record2])
result = sess.run(query_result)
expected_average = [1.0, 1.0]
self.assertAllClose(result, expected_average)
def test_gaussian_average_with_noise(self):
with self.cached_session() as sess:
record1, record2 = 2.71828, 3.14159
sum_stddev = 1.0
denominator = 2.0
query = gaussian_query.GaussianAverageQuery(
l2_norm_clip=5.0, sum_stddev=sum_stddev, denominator=denominator)
query_result, _ = test_utils.run_query(query, [record1, record2])
noised_averages = []
for _ in range(1000):
noised_averages.append(sess.run(query_result))
result_stddev = np.std(noised_averages)
avg_stddev = sum_stddev / denominator
self.assertNear(result_stddev, avg_stddev, 0.1)
@parameterized.named_parameters(
('type_mismatch', [1.0], (1.0,), TypeError),
('too_few_on_left', [1.0], [1.0, 1.0], ValueError),
('too_few_on_right', [1.0, 1.0], [1.0], ValueError))
def test_incompatible_records(self, record1, record2, error_type):
query = gaussian_query.GaussianSumQuery(1.0, 0.0)
with self.assertRaises(error_type):
test_utils.run_query(query, [record1, record2])
if __name__ == '__main__':
tf.test.main()

View file

@@ -1,116 +0,0 @@
# Copyright 2018, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Implements DPQuery interface for queries over nested structures.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from distutils.version import LooseVersion
import tensorflow as tf
from tensorflow_privacy.privacy.dp_query import dp_query
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
nest = tf.contrib.framework.nest
else:
nest = tf.nest
class NestedQuery(dp_query.DPQuery):
"""Implements DPQuery interface for structured queries.
NestedQuery evaluates arbitrary nested structures of queries. Records must be
nested structures of tensors that are compatible (in type and arity) with the
query structure, but are allowed to have deeper structure within each leaf of
the query structure. For example, the nested query [q1, q2] is compatible with
the record [t1, t2] or [t1, (t2, t3)], but not with (t1, t2), [t1] or
[t1, t2, t3]. The entire substructure of each record corresponding to a leaf
node of the query structure is routed to the corresponding query. If the same
tensor should be consumed by multiple sub-queries, it can be replicated in the
record, for example [t1, t1].
NestedQuery is intended to allow privacy mechanisms for groups as described in
[McMahan & Andrew, 2018: "A General Approach to Adding Differential Privacy to
Iterative Training Procedures" (https://arxiv.org/abs/1812.06210)].
"""
def __init__(self, queries):
"""Initializes the NestedQuery.
Args:
queries: A nested structure of queries.
"""
self._queries = queries
def _map_to_queries(self, fn, *inputs, **kwargs):
def caller(query, *args):
return getattr(query, fn)(*args, **kwargs)
return nest.map_structure_up_to(
self._queries, caller, self._queries, *inputs)
def set_ledger(self, ledger):
self._map_to_queries('set_ledger', ledger=ledger)
def initial_global_state(self):
"""See base class."""
return self._map_to_queries('initial_global_state')
def derive_sample_params(self, global_state):
"""See base class."""
return self._map_to_queries('derive_sample_params', global_state)
def initial_sample_state(self, template):
"""See base class."""
return self._map_to_queries('initial_sample_state', template)
def preprocess_record(self, params, record):
"""See base class."""
return self._map_to_queries('preprocess_record', params, record)
def accumulate_preprocessed_record(
self, sample_state, preprocessed_record):
"""See base class."""
return self._map_to_queries(
'accumulate_preprocessed_record',
sample_state,
preprocessed_record)
def merge_sample_states(self, sample_state_1, sample_state_2):
return self._map_to_queries(
'merge_sample_states', sample_state_1, sample_state_2)
def get_noised_result(self, sample_state, global_state):
"""Gets query result after all records of sample have been accumulated.
Args:
sample_state: The sample state after all records have been accumulated.
global_state: The global state.
Returns:
A tuple (result, new_global_state) where "result" is a structure matching
the query structure containing the results of the subqueries and
"new_global_state" is a structure containing the updated global states
for the subqueries.
"""
estimates_and_new_global_states = self._map_to_queries(
'get_noised_result', sample_state, global_state)
flat_estimates, flat_new_global_states = zip(
*nest.flatten_up_to(self._queries, estimates_and_new_global_states))
return (
nest.pack_sequence_as(self._queries, flat_estimates),
nest.pack_sequence_as(self._queries, flat_new_global_states))
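# --- Illustrative sketch (editor's addition, not original source) ---
# Records must match the query structure in type and arity at the top level,
# but may be arbitrarily deeper within each leaf. Assuming gaussian_query is
# imported from tensorflow_privacy.privacy.dp_query:
#
# query = NestedQuery(
#     [gaussian_query.GaussianSumQuery(l2_norm_clip=1.0, stddev=0.0),
#      {'c': gaussian_query.GaussianSumQuery(l2_norm_clip=1.0, stddev=0.0)}])
# record = [3.0, {'c': (1.0, 2.0)}]  # the (1.0, 2.0) pair feeds the 'c' query
# # A tuple (3.0, {'c': ...}) or a too-short list [3.0] would raise instead.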

View file

@@ -1,148 +0,0 @@
# Copyright 2018, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for NestedQuery."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
from distutils.version import LooseVersion
import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.dp_query import gaussian_query
from tensorflow_privacy.privacy.dp_query import nested_query
from tensorflow_privacy.privacy.dp_query import test_utils
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
nest = tf.contrib.framework.nest
else:
nest = tf.nest
_basic_query = gaussian_query.GaussianSumQuery(1.0, 0.0)
class NestedQueryTest(tf.test.TestCase, parameterized.TestCase):
def test_nested_gaussian_sum_no_clip_no_noise(self):
with self.cached_session() as sess:
query1 = gaussian_query.GaussianSumQuery(
l2_norm_clip=10.0, stddev=0.0)
query2 = gaussian_query.GaussianSumQuery(
l2_norm_clip=10.0, stddev=0.0)
query = nested_query.NestedQuery([query1, query2])
record1 = [1.0, [2.0, 3.0]]
record2 = [4.0, [3.0, 2.0]]
query_result, _ = test_utils.run_query(query, [record1, record2])
result = sess.run(query_result)
expected = [5.0, [5.0, 5.0]]
self.assertAllClose(result, expected)
def test_nested_gaussian_average_no_clip_no_noise(self):
with self.cached_session() as sess:
query1 = gaussian_query.GaussianAverageQuery(
l2_norm_clip=10.0, sum_stddev=0.0, denominator=5.0)
query2 = gaussian_query.GaussianAverageQuery(
l2_norm_clip=10.0, sum_stddev=0.0, denominator=5.0)
query = nested_query.NestedQuery([query1, query2])
record1 = [1.0, [2.0, 3.0]]
record2 = [4.0, [3.0, 2.0]]
query_result, _ = test_utils.run_query(query, [record1, record2])
result = sess.run(query_result)
expected = [1.0, [1.0, 1.0]]
self.assertAllClose(result, expected)
def test_nested_gaussian_average_with_clip_no_noise(self):
with self.cached_session() as sess:
query1 = gaussian_query.GaussianAverageQuery(
l2_norm_clip=4.0, sum_stddev=0.0, denominator=5.0)
query2 = gaussian_query.GaussianAverageQuery(
l2_norm_clip=5.0, sum_stddev=0.0, denominator=5.0)
query = nested_query.NestedQuery([query1, query2])
record1 = [1.0, [12.0, 9.0]] # Clipped to [1.0, [4.0, 3.0]]
record2 = [5.0, [1.0, 2.0]] # Clipped to [4.0, [1.0, 2.0]]
query_result, _ = test_utils.run_query(query, [record1, record2])
result = sess.run(query_result)
expected = [1.0, [1.0, 1.0]]
self.assertAllClose(result, expected)
def test_complex_nested_query(self):
with self.cached_session() as sess:
query_ab = gaussian_query.GaussianSumQuery(
l2_norm_clip=1.0, stddev=0.0)
query_c = gaussian_query.GaussianAverageQuery(
l2_norm_clip=10.0, sum_stddev=0.0, denominator=2.0)
query_d = gaussian_query.GaussianSumQuery(
l2_norm_clip=10.0, stddev=0.0)
query = nested_query.NestedQuery(
[query_ab, {'c': query_c, 'd': [query_d]}])
record1 = [{'a': 0.0, 'b': 2.71828}, {'c': (-4.0, 6.0), 'd': [-4.0]}]
record2 = [{'a': 3.14159, 'b': 0.0}, {'c': (6.0, -4.0), 'd': [5.0]}]
query_result, _ = test_utils.run_query(query, [record1, record2])
result = sess.run(query_result)
expected = [{'a': 1.0, 'b': 1.0}, {'c': (1.0, 1.0), 'd': [1.0]}]
self.assertAllClose(result, expected)
def test_nested_query_with_noise(self):
with self.cached_session() as sess:
sum_stddev = 2.71828
denominator = 3.14159
query1 = gaussian_query.GaussianSumQuery(
l2_norm_clip=1.5, stddev=sum_stddev)
query2 = gaussian_query.GaussianAverageQuery(
l2_norm_clip=0.5, sum_stddev=sum_stddev, denominator=denominator)
query = nested_query.NestedQuery((query1, query2))
record1 = (3.0, [2.0, 1.5])
record2 = (0.0, [-1.0, -3.5])
query_result, _ = test_utils.run_query(query, [record1, record2])
noised_averages = []
for _ in range(1000):
noised_averages.append(nest.flatten(sess.run(query_result)))
result_stddev = np.std(noised_averages, 0)
avg_stddev = sum_stddev / denominator
expected_stddev = [sum_stddev, avg_stddev, avg_stddev]
self.assertArrayNear(result_stddev, expected_stddev, 0.1)
@parameterized.named_parameters(
('type_mismatch', [_basic_query], (1.0,), TypeError),
('too_many_queries', [_basic_query, _basic_query], [1.0], ValueError),
('query_too_deep', [_basic_query, [_basic_query]], [1.0, 1.0], TypeError))
def test_record_incompatible_with_query(
self, queries, record, error_type):
with self.assertRaises(error_type):
test_utils.run_query(nested_query.NestedQuery(queries), [record])
if __name__ == '__main__':
tf.test.main()

View file

@@ -1,70 +0,0 @@
# Copyright 2018, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Implements DPQuery interface for no privacy average queries."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from distutils.version import LooseVersion
import tensorflow as tf
from tensorflow_privacy.privacy.dp_query import dp_query
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
nest = tf.contrib.framework.nest
else:
nest = tf.nest
class NoPrivacySumQuery(dp_query.SumAggregationDPQuery):
"""Implements DPQuery interface for a sum query with no privacy.
Accumulates vectors without clipping or adding noise.
"""
def get_noised_result(self, sample_state, global_state):
"""See base class."""
return sample_state, global_state
class NoPrivacyAverageQuery(dp_query.SumAggregationDPQuery):
"""Implements DPQuery interface for an average query with no privacy.
Accumulates vectors and normalizes by the total number of accumulated vectors.
"""
def initial_sample_state(self, template):
"""See base class."""
return (super(NoPrivacyAverageQuery, self).initial_sample_state(template),
tf.constant(0.0))
def preprocess_record(self, params, record, weight=1):
"""Multiplies record by weight."""
weighted_record = nest.map_structure(lambda t: weight * t, record)
return (weighted_record, tf.cast(weight, tf.float32))
def accumulate_record(self, params, sample_state, record, weight=1):
"""Accumulates record, multiplying by weight."""
weighted_record = nest.map_structure(lambda t: weight * t, record)
return self.accumulate_preprocessed_record(
sample_state, (weighted_record, tf.cast(weight, tf.float32)))
def get_noised_result(self, sample_state, global_state):
"""See base class."""
sum_state, denominator = sample_state
return (
nest.map_structure(lambda t: t / denominator, sum_state),
global_state)
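# --- Illustrative usage sketch (editor's addition, not original source) ---
# NoPrivacyAverageQuery keeps a (weighted) running sum together with the
# total weight and divides on get_noised_result. With the records and weights
# used in the weighted-average test below, in eager mode:
#
# query = NoPrivacyAverageQuery()
# sample_state = query.initial_sample_state(tf.constant([0.0, 0.0]))
# sample_state = query.accumulate_record((), sample_state,
#                                         tf.constant([4.0, 0.0]), weight=1)
# sample_state = query.accumulate_record((), sample_state,
#                                         tf.constant([-1.0, 1.0]), weight=3)
# average, _ = query.get_noised_result(sample_state, ())
# # average == [0.25, 0.75]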

View file

@@ -1,77 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for NoPrivacyAverageQuery."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import tensorflow as tf
from tensorflow_privacy.privacy.dp_query import no_privacy_query
from tensorflow_privacy.privacy.dp_query import test_utils
class NoPrivacyQueryTest(tf.test.TestCase, parameterized.TestCase):
def test_sum(self):
with self.cached_session() as sess:
record1 = tf.constant([2.0, 0.0])
record2 = tf.constant([-1.0, 1.0])
query = no_privacy_query.NoPrivacySumQuery()
query_result, _ = test_utils.run_query(query, [record1, record2])
result = sess.run(query_result)
expected = [1.0, 1.0]
self.assertAllClose(result, expected)
def test_no_privacy_average(self):
with self.cached_session() as sess:
record1 = tf.constant([5.0, 0.0])
record2 = tf.constant([-1.0, 2.0])
query = no_privacy_query.NoPrivacyAverageQuery()
query_result, _ = test_utils.run_query(query, [record1, record2])
result = sess.run(query_result)
expected = [2.0, 1.0]
self.assertAllClose(result, expected)
def test_no_privacy_weighted_average(self):
with self.cached_session() as sess:
record1 = tf.constant([4.0, 0.0])
record2 = tf.constant([-1.0, 1.0])
weights = [1, 3]
query = no_privacy_query.NoPrivacyAverageQuery()
query_result, _ = test_utils.run_query(
query, [record1, record2], weights=weights)
result = sess.run(query_result)
expected = [0.25, 0.75]
self.assertAllClose(result, expected)
@parameterized.named_parameters(
('type_mismatch', [1.0], (1.0,), TypeError),
('too_few_on_left', [1.0], [1.0, 1.0], ValueError),
('too_few_on_right', [1.0, 1.0], [1.0], ValueError))
def test_incompatible_records(self, record1, record2, error_type):
query = no_privacy_query.NoPrivacySumQuery()
with self.assertRaises(error_type):
test_utils.run_query(query, [record1, record2])
if __name__ == '__main__':
tf.test.main()

View file

@@ -1,97 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Implements DPQuery interface for normalized queries.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
from distutils.version import LooseVersion
import tensorflow as tf
from tensorflow_privacy.privacy.dp_query import dp_query
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
nest = tf.contrib.framework.nest
else:
nest = tf.nest
class NormalizedQuery(dp_query.DPQuery):
"""DPQuery for queries with a DPQuery numerator and fixed denominator."""
# pylint: disable=invalid-name
_GlobalState = collections.namedtuple(
'_GlobalState', ['numerator_state', 'denominator'])
def __init__(self, numerator_query, denominator):
"""Initializer for NormalizedQuery.
Args:
numerator_query: A DPQuery for the numerator.
denominator: A value for the denominator. May be None if it will be
supplied via the set_denominator function before get_noised_result is
called.
"""
self._numerator = numerator_query
self._denominator = denominator
def set_ledger(self, ledger):
"""See base class."""
self._numerator.set_ledger(ledger)
def initial_global_state(self):
"""See base class."""
if self._denominator is not None:
denominator = tf.cast(self._denominator, tf.float32)
else:
denominator = None
return self._GlobalState(
self._numerator.initial_global_state(), denominator)
def derive_sample_params(self, global_state):
"""See base class."""
return self._numerator.derive_sample_params(global_state.numerator_state)
def initial_sample_state(self, template):
"""See base class."""
# NormalizedQuery has no sample state beyond the numerator state.
return self._numerator.initial_sample_state(template)
def preprocess_record(self, params, record):
return self._numerator.preprocess_record(params, record)
def accumulate_preprocessed_record(
self, sample_state, preprocessed_record):
"""See base class."""
return self._numerator.accumulate_preprocessed_record(
sample_state, preprocessed_record)
def get_noised_result(self, sample_state, global_state):
"""See base class."""
noised_sum, new_sum_global_state = self._numerator.get_noised_result(
sample_state, global_state.numerator_state)
def normalize(v):
return tf.truediv(v, global_state.denominator)
return (nest.map_structure(normalize, noised_sum),
self._GlobalState(new_sum_global_state, global_state.denominator))
def merge_sample_states(self, sample_state_1, sample_state_2):
"""See base class."""
return self._numerator.merge_sample_states(sample_state_1, sample_state_2)

View file

@@ -1,47 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for GaussianAverageQuery."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow_privacy.privacy.dp_query import gaussian_query
from tensorflow_privacy.privacy.dp_query import normalized_query
from tensorflow_privacy.privacy.dp_query import test_utils
class NormalizedQueryTest(tf.test.TestCase):
def test_normalization(self):
with self.cached_session() as sess:
record1 = tf.constant([-6.0, 8.0]) # Clipped to [-3.0, 4.0].
record2 = tf.constant([4.0, -3.0]) # Not clipped.
sum_query = gaussian_query.GaussianSumQuery(
l2_norm_clip=5.0, stddev=0.0)
query = normalized_query.NormalizedQuery(
numerator_query=sum_query, denominator=2.0)
query_result, _ = test_utils.run_query(query, [record1, record2])
result = sess.run(query_result)
expected = [0.5, 0.5]
self.assertAllClose(result, expected)
if __name__ == '__main__':
tf.test.main()

View file

@ -1,288 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Implements DPQuery interface for adaptive clip queries.
Instead of a fixed clipping norm specified in advance, the clipping norm is
dynamically adjusted to match a target fraction of clipped updates per sample,
where the actual fraction of clipped updates is itself estimated in a
differentially private manner. For details see Thakkar et al., "Differentially
Private Learning with Adaptive Clipping" [http://arxiv.org/abs/1905.03871].
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
from distutils.version import LooseVersion
import tensorflow as tf
from tensorflow_privacy.privacy.dp_query import dp_query
from tensorflow_privacy.privacy.dp_query import gaussian_query
from tensorflow_privacy.privacy.dp_query import normalized_query
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
nest = tf.contrib.framework.nest
else:
nest = tf.nest
class QuantileAdaptiveClipSumQuery(dp_query.DPQuery):
"""DPQuery for sum queries with adaptive clipping.
Clipping norm is tuned adaptively to converge to a value such that a specified
quantile of updates are clipped.
"""
# pylint: disable=invalid-name
_GlobalState = collections.namedtuple(
'_GlobalState', [
'l2_norm_clip',
'noise_multiplier',
'target_unclipped_quantile',
'learning_rate',
'sum_state',
'clipped_fraction_state'])
# pylint: disable=invalid-name
_SampleState = collections.namedtuple(
'_SampleState', ['sum_state', 'clipped_fraction_state'])
# pylint: disable=invalid-name
_SampleParams = collections.namedtuple(
'_SampleParams', ['sum_params', 'clipped_fraction_params'])
def __init__(
self,
initial_l2_norm_clip,
noise_multiplier,
target_unclipped_quantile,
learning_rate,
clipped_count_stddev,
expected_num_records):
"""Initializes the QuantileAdaptiveClipSumQuery.
Args:
initial_l2_norm_clip: The initial value of clipping norm.
noise_multiplier: Multiplier applied to l2_norm_clip to obtain the stddev of
the noise added to the output of the sum query.
target_unclipped_quantile: The desired quantile of updates which should be
unclipped. I.e., a value of 0.8 means a value of l2_norm_clip should be
found for which approximately 20% of updates are clipped each round.
learning_rate: The learning rate for the clipping norm adaptation. A
rate of r means that the clipping norm will change by a maximum of r at
each step. This maximum is attained when |unclipped_quantile -
target_unclipped_quantile| is 1.0.
clipped_count_stddev: The stddev of the noise added to the clipped_count.
Since the sensitivity of the clipped count is 0.5, as a rule of thumb it
should be about 0.5 for reasonable privacy.
expected_num_records: The expected number of records per round, used to
estimate the clipped count quantile.
"""
self._initial_l2_norm_clip = initial_l2_norm_clip
self._noise_multiplier = noise_multiplier
self._target_unclipped_quantile = target_unclipped_quantile
self._learning_rate = learning_rate
# Initialize sum query's global state with None, to be set later.
self._sum_query = gaussian_query.GaussianSumQuery(None, None)
# self._clipped_fraction_query is a DPQuery used to estimate the fraction of
# records that are clipped. It accumulates an indicator 0/1 of whether each
# record is clipped, and normalizes by the expected number of records. In
# practice, we accumulate clipped counts shifted by -0.5 so they are
# centered at zero. This makes the sensitivity of the clipped count query
# 0.5 instead of 1.0, since the maximum that a single record could affect
# the count is 0.5. Note that although the l2_norm_clip of the clipped
# fraction query is 0.5, no clipping will ever actually occur because the
# value of each record is always +/-0.5.
self._clipped_fraction_query = gaussian_query.GaussianAverageQuery(
l2_norm_clip=0.5,
sum_stddev=clipped_count_stddev,
denominator=expected_num_records)
def set_ledger(self, ledger):
"""See base class."""
self._sum_query.set_ledger(ledger)
self._clipped_fraction_query.set_ledger(ledger)
def initial_global_state(self):
"""See base class."""
initial_l2_norm_clip = tf.cast(self._initial_l2_norm_clip, tf.float32)
noise_multiplier = tf.cast(self._noise_multiplier, tf.float32)
target_unclipped_quantile = tf.cast(self._target_unclipped_quantile,
tf.float32)
learning_rate = tf.cast(self._learning_rate, tf.float32)
sum_stddev = initial_l2_norm_clip * noise_multiplier
sum_query_global_state = self._sum_query.make_global_state(
l2_norm_clip=initial_l2_norm_clip,
stddev=sum_stddev)
return self._GlobalState(
initial_l2_norm_clip,
noise_multiplier,
target_unclipped_quantile,
learning_rate,
sum_query_global_state,
self._clipped_fraction_query.initial_global_state())
def derive_sample_params(self, global_state):
"""See base class."""
# Assign values to variables that inner sum query uses.
sum_params = self._sum_query.derive_sample_params(global_state.sum_state)
clipped_fraction_params = self._clipped_fraction_query.derive_sample_params(
global_state.clipped_fraction_state)
return self._SampleParams(sum_params, clipped_fraction_params)
def initial_sample_state(self, template):
"""See base class."""
sum_state = self._sum_query.initial_sample_state(template)
clipped_fraction_state = self._clipped_fraction_query.initial_sample_state(
tf.constant(0.0))
return self._SampleState(sum_state, clipped_fraction_state)
def preprocess_record(self, params, record):
preprocessed_sum_record, global_norm = (
self._sum_query.preprocess_record_impl(params.sum_params, record))
# Note we are relying on the internals of GaussianSumQuery here. If we want
# to open this up to other kinds of inner queries we'd have to do this in a
# more general way.
l2_norm_clip = params.sum_params
# We accumulate clipped counts shifted by -0.5 so they are centered at zero.
# This makes the sensitivity of the clipped count query 0.5 instead of 1.0.
was_clipped = tf.cast(global_norm >= l2_norm_clip, tf.float32) - 0.5
preprocessed_clipped_fraction_record = (
self._clipped_fraction_query.preprocess_record(
params.clipped_fraction_params, was_clipped))
return preprocessed_sum_record, preprocessed_clipped_fraction_record
def accumulate_preprocessed_record(
self, sample_state, preprocessed_record, weight=1):
"""See base class."""
preprocessed_sum_record, preprocessed_clipped_fraction_record = preprocessed_record
sum_state = self._sum_query.accumulate_preprocessed_record(
sample_state.sum_state, preprocessed_sum_record)
clipped_fraction_state = self._clipped_fraction_query.accumulate_preprocessed_record(
sample_state.clipped_fraction_state,
preprocessed_clipped_fraction_record)
return self._SampleState(sum_state, clipped_fraction_state)
def merge_sample_states(self, sample_state_1, sample_state_2):
"""See base class."""
return self._SampleState(
self._sum_query.merge_sample_states(
sample_state_1.sum_state,
sample_state_2.sum_state),
self._clipped_fraction_query.merge_sample_states(
sample_state_1.clipped_fraction_state,
sample_state_2.clipped_fraction_state))
def get_noised_result(self, sample_state, global_state):
"""See base class."""
gs = global_state
noised_vectors, sum_state = self._sum_query.get_noised_result(
sample_state.sum_state, gs.sum_state)
del sum_state # Unused. To be set explicitly later.
clipped_fraction_result, new_clipped_fraction_state = (
self._clipped_fraction_query.get_noised_result(
sample_state.clipped_fraction_state,
gs.clipped_fraction_state))
# Unshift clipped percentile by 0.5. (See comment in preprocess_record.)
clipped_quantile = clipped_fraction_result + 0.5
unclipped_quantile = 1.0 - clipped_quantile
# Protect against out-of-range estimates.
unclipped_quantile = tf.minimum(1.0, tf.maximum(0.0, unclipped_quantile))
# Loss function is convex, with derivative in [-1, 1], and minimized when
# the true quantile matches the target.
loss_grad = unclipped_quantile - global_state.target_unclipped_quantile
new_l2_norm_clip = gs.l2_norm_clip - global_state.learning_rate * loss_grad
new_l2_norm_clip = tf.maximum(0.0, new_l2_norm_clip)
new_sum_stddev = new_l2_norm_clip * global_state.noise_multiplier
new_sum_query_global_state = self._sum_query.make_global_state(
l2_norm_clip=new_l2_norm_clip,
stddev=new_sum_stddev)
new_global_state = global_state._replace(
l2_norm_clip=new_l2_norm_clip,
sum_state=new_sum_query_global_state,
clipped_fraction_state=new_clipped_fraction_state)
return noised_vectors, new_global_state
class QuantileAdaptiveClipAverageQuery(normalized_query.NormalizedQuery):
"""DPQuery for average queries with adaptive clipping.
Clipping norm is tuned adaptively to converge to a value such that a specified
quantile of updates are clipped.
Note that we use "fixed-denominator" estimation: the denominator should be
specified as the expected number of records per sample. Accumulating the
denominator separately would also be possible but would be produce a higher
variance estimator.
"""
def __init__(
self,
initial_l2_norm_clip,
noise_multiplier,
denominator,
target_unclipped_quantile,
learning_rate,
clipped_count_stddev,
expected_num_records):
"""Initializes the AdaptiveClipAverageQuery.
Args:
initial_l2_norm_clip: The initial value of clipping norm.
noise_multiplier: Multiplier applied to l2_norm_clip to obtain the stddev of
the noise.
denominator: The normalization constant (applied after noise is added to
the sum).
target_unclipped_quantile: The desired quantile of updates which should be
unclipped.
learning_rate: The learning rate for the clipping norm adaptation. A
rate of r means that the clipping norm will change by a maximum of r at
each step. The maximum is attained when |unclipped_quantile -
target_unclipped_quantile| is 1.0.
clipped_count_stddev: The stddev of the noise added to the clipped_count.
Since the sensitivity of the clipped count is 0.5, as a rule of thumb it
should be about 0.5 for reasonable privacy.
expected_num_records: The expected number of records, used to estimate the
clipped count quantile.
"""
numerator_query = QuantileAdaptiveClipSumQuery(
initial_l2_norm_clip,
noise_multiplier,
target_unclipped_quantile,
learning_rate,
clipped_count_stddev,
expected_num_records)
super(QuantileAdaptiveClipAverageQuery, self).__init__(
numerator_query=numerator_query,
denominator=denominator)
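To make the adaptation step in `get_noised_result` above concrete, here is a standalone sketch (plain Python, no noise, no privacy accounting; it only reproduces the clip-update arithmetic). The values match `test_adaptation_target_zero` in the test file further down in this commit:
```
# Illustrative only: reproduces the clip-adaptation arithmetic from
# get_noised_result without noise or DP accounting.
record_norms = [8.5, 7.25]
clip, target_unclipped_quantile, learning_rate = 10.0, 0.0, 1.0

for _ in range(5):
  clipped = sum(1.0 for norm in record_norms if norm >= clip)
  unclipped_quantile = 1.0 - clipped / len(record_norms)
  loss_grad = unclipped_quantile - target_unclipped_quantile
  clip = max(0.0, clip - learning_rate * loss_grad)
  print(clip)  # 9.0, 8.0, 7.5, 7.0, 7.0
```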

View file

@ -1,296 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for QuantileAdaptiveClipSumQuery."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.analysis import privacy_ledger
from tensorflow_privacy.privacy.dp_query import quantile_adaptive_clip_sum_query
from tensorflow_privacy.privacy.dp_query import test_utils
tf.enable_eager_execution()
class QuantileAdaptiveClipSumQueryTest(tf.test.TestCase):
def test_sum_no_clip_no_noise(self):
record1 = tf.constant([2.0, 0.0])
record2 = tf.constant([-1.0, 1.0])
query = quantile_adaptive_clip_sum_query.QuantileAdaptiveClipSumQuery(
initial_l2_norm_clip=10.0,
noise_multiplier=0.0,
target_unclipped_quantile=1.0,
learning_rate=0.0,
clipped_count_stddev=0.0,
expected_num_records=2.0)
query_result, _ = test_utils.run_query(query, [record1, record2])
result = query_result.numpy()
expected = [1.0, 1.0]
self.assertAllClose(result, expected)
def test_sum_with_clip_no_noise(self):
record1 = tf.constant([-6.0, 8.0]) # Clipped to [-3.0, 4.0].
record2 = tf.constant([4.0, -3.0]) # Not clipped.
query = quantile_adaptive_clip_sum_query.QuantileAdaptiveClipSumQuery(
initial_l2_norm_clip=5.0,
noise_multiplier=0.0,
target_unclipped_quantile=1.0,
learning_rate=0.0,
clipped_count_stddev=0.0,
expected_num_records=2.0)
query_result, _ = test_utils.run_query(query, [record1, record2])
result = query_result.numpy()
expected = [1.0, 1.0]
self.assertAllClose(result, expected)
def test_sum_with_noise(self):
record1, record2 = 2.71828, 3.14159
stddev = 1.0
clip = 5.0
query = quantile_adaptive_clip_sum_query.QuantileAdaptiveClipSumQuery(
initial_l2_norm_clip=clip,
noise_multiplier=stddev / clip,
target_unclipped_quantile=1.0,
learning_rate=0.0,
clipped_count_stddev=0.0,
expected_num_records=2.0)
noised_sums = []
for _ in range(1000):
query_result, _ = test_utils.run_query(query, [record1, record2])
noised_sums.append(query_result.numpy())
result_stddev = np.std(noised_sums)
self.assertNear(result_stddev, stddev, 0.1)
def test_average_no_noise(self):
record1 = tf.constant([5.0, 0.0]) # Clipped to [3.0, 0.0].
record2 = tf.constant([-1.0, 2.0]) # Not clipped.
query = quantile_adaptive_clip_sum_query.QuantileAdaptiveClipAverageQuery(
initial_l2_norm_clip=3.0,
noise_multiplier=0.0,
denominator=2.0,
target_unclipped_quantile=1.0,
learning_rate=0.0,
clipped_count_stddev=0.0,
expected_num_records=2.0)
query_result, _ = test_utils.run_query(query, [record1, record2])
result = query_result.numpy()
expected_average = [1.0, 1.0]
self.assertAllClose(result, expected_average)
def test_average_with_noise(self):
record1, record2 = 2.71828, 3.14159
sum_stddev = 1.0
denominator = 2.0
clip = 3.0
query = quantile_adaptive_clip_sum_query.QuantileAdaptiveClipAverageQuery(
initial_l2_norm_clip=clip,
noise_multiplier=sum_stddev / clip,
denominator=denominator,
target_unclipped_quantile=1.0,
learning_rate=0.0,
clipped_count_stddev=0.0,
expected_num_records=2.0)
noised_averages = []
for _ in range(1000):
query_result, _ = test_utils.run_query(query, [record1, record2])
noised_averages.append(query_result.numpy())
result_stddev = np.std(noised_averages)
avg_stddev = sum_stddev / denominator
self.assertNear(result_stddev, avg_stddev, 0.1)
def test_adaptation_target_zero(self):
record1 = tf.constant([8.5])
record2 = tf.constant([-7.25])
query = quantile_adaptive_clip_sum_query.QuantileAdaptiveClipSumQuery(
initial_l2_norm_clip=10.0,
noise_multiplier=0.0,
target_unclipped_quantile=0.0,
learning_rate=1.0,
clipped_count_stddev=0.0,
expected_num_records=2.0)
global_state = query.initial_global_state()
initial_clip = global_state.l2_norm_clip
self.assertAllClose(initial_clip, 10.0)
# On the first two iterations, nothing is clipped, so the clip goes down
# by 1.0 (the learning rate). When the clip reaches 8.0, one record is
# clipped, so the clip goes down by only 0.5. After two more iterations,
# both records are clipped, and the clip norm stays there (at 7.0).
expected_sums = [1.25, 1.25, 0.75, 0.25, 0.0]
expected_clips = [9.0, 8.0, 7.5, 7.0, 7.0]
for expected_sum, expected_clip in zip(expected_sums, expected_clips):
actual_sum, global_state = test_utils.run_query(
query, [record1, record2], global_state)
actual_clip = global_state.l2_norm_clip
self.assertAllClose(actual_clip.numpy(), expected_clip)
self.assertAllClose(actual_sum.numpy(), (expected_sum,))
def test_adaptation_target_one(self):
record1 = tf.constant([-1.5])
record2 = tf.constant([2.75])
query = quantile_adaptive_clip_sum_query.QuantileAdaptiveClipSumQuery(
initial_l2_norm_clip=0.0,
noise_multiplier=0.0,
target_unclipped_quantile=1.0,
learning_rate=1.0,
clipped_count_stddev=0.0,
expected_num_records=2.0)
global_state = query.initial_global_state()
initial_clip = global_state.l2_norm_clip
self.assertAllClose(initial_clip, 0.0)
# On the first two iterations, both are clipped, so the clip goes up
# by 1.0 (the learning rate). When the clip reaches 2.0, only one record is
# clipped, so the clip goes up by only 0.5. After two more iterations,
# both records are clipped, and the clip norm stays there (at 3.0).
expected_sums = [0.0, 0.0, 0.5, 1.0, 1.25]
expected_clips = [1.0, 2.0, 2.5, 3.0, 3.0]
for expected_sum, expected_clip in zip(expected_sums, expected_clips):
actual_sum, global_state = test_utils.run_query(
query, [record1, record2], global_state)
actual_clip = global_state.l2_norm_clip
self.assertAllClose(actual_clip.numpy(), expected_clip)
self.assertAllClose(actual_sum.numpy(), (expected_sum,))
def test_adaptation_linspace(self):
# 21 records equally spaced from 0 to 10 in 0.5 increments.
# Test that with a decaying learning rate we converge to the correct
# median (5.0) within the 0.25 tolerance asserted below.
records = [tf.constant(x) for x in np.linspace(
0.0, 10.0, num=21, dtype=np.float32)]
learning_rate = tf.Variable(1.0)
query = quantile_adaptive_clip_sum_query.QuantileAdaptiveClipSumQuery(
initial_l2_norm_clip=0.0,
noise_multiplier=0.0,
target_unclipped_quantile=0.5,
learning_rate=learning_rate,
clipped_count_stddev=0.0,
expected_num_records=2.0)
global_state = query.initial_global_state()
for t in range(50):
tf.assign(learning_rate, 1.0 / np.sqrt(t+1))
_, global_state = test_utils.run_query(query, records, global_state)
actual_clip = global_state.l2_norm_clip
if t > 40:
self.assertNear(actual_clip, 5.0, 0.25)
def test_adaptation_all_equal(self):
# 20 equal records. Test that with a decaying learning rate we converge to
# that record's value and bounce around it.
records = [tf.constant(5.0)] * 20
learning_rate = tf.Variable(1.0)
query = quantile_adaptive_clip_sum_query.QuantileAdaptiveClipSumQuery(
initial_l2_norm_clip=0.0,
noise_multiplier=0.0,
target_unclipped_quantile=0.5,
learning_rate=learning_rate,
clipped_count_stddev=0.0,
expected_num_records=2.0)
global_state = query.initial_global_state()
for t in range(50):
tf.assign(learning_rate, 1.0 / np.sqrt(t+1))
_, global_state = test_utils.run_query(query, records, global_state)
actual_clip = global_state.l2_norm_clip
if t > 40:
self.assertNear(actual_clip, 5.0, 0.25)
def test_ledger(self):
record1 = tf.constant([8.5])
record2 = tf.constant([-7.25])
population_size = tf.Variable(0)
selection_probability = tf.Variable(1.0)
query = quantile_adaptive_clip_sum_query.QuantileAdaptiveClipSumQuery(
initial_l2_norm_clip=10.0,
noise_multiplier=1.0,
target_unclipped_quantile=0.0,
learning_rate=1.0,
clipped_count_stddev=0.0,
expected_num_records=2.0)
query = privacy_ledger.QueryWithLedger(
query, population_size, selection_probability)
# First sample.
tf.assign(population_size, 10)
tf.assign(selection_probability, 0.1)
_, global_state = test_utils.run_query(query, [record1, record2])
expected_queries = [[10.0, 10.0], [0.5, 0.0]]
formatted = query.ledger.get_formatted_ledger_eager()
sample_1 = formatted[0]
self.assertAllClose(sample_1.population_size, 10.0)
self.assertAllClose(sample_1.selection_probability, 0.1)
self.assertAllClose(sample_1.queries, expected_queries)
# Second sample.
tf.assign(population_size, 20)
tf.assign(selection_probability, 0.2)
test_utils.run_query(query, [record1, record2], global_state)
formatted = query.ledger.get_formatted_ledger_eager()
sample_1, sample_2 = formatted
self.assertAllClose(sample_1.population_size, 10.0)
self.assertAllClose(sample_1.selection_probability, 0.1)
self.assertAllClose(sample_1.queries, expected_queries)
expected_queries_2 = [[9.0, 9.0], [0.5, 0.0]]
self.assertAllClose(sample_2.population_size, 20.0)
self.assertAllClose(sample_2.selection_probability, 0.2)
self.assertAllClose(sample_2.queries, expected_queries_2)
if __name__ == '__main__':
tf.test.main()

View file

@ -1,49 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Utility methods for testing private queries.
Utility methods for testing private queries.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
def run_query(query, records, global_state=None, weights=None):
"""Executes query on the given set of records as a single sample.
Args:
query: A DPQuery to run.
records: An iterable containing records to pass to the query.
global_state: The current global state. If None, an initial global state is
generated.
weights: An optional iterable containing the weights of the records.
Returns:
A tuple (result, new_global_state) where "result" is the result of the
query and "new_global_state" is the updated global state.
"""
if not global_state:
global_state = query.initial_global_state()
params = query.derive_sample_params(global_state)
sample_state = query.initial_sample_state(next(iter(records)))
if weights is None:
for record in records:
sample_state = query.accumulate_record(params, sample_state, record)
else:
for weight, record in zip(weights, records):
sample_state = query.accumulate_record(
params, sample_state, record, weight)
return query.get_noised_result(sample_state, global_state)
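Because `run_query` returns the updated global state, multi-round callers thread that state back in on the next call; a small sketch (assuming the `gaussian_query` module deleted elsewhere in this commit):
```
# Sketch: thread global state across two rounds with run_query.
import tensorflow as tf

from tensorflow_privacy.privacy.dp_query import gaussian_query
from tensorflow_privacy.privacy.dp_query import test_utils

query = gaussian_query.GaussianSumQuery(l2_norm_clip=1.0, stddev=0.0)
records = [tf.constant([0.5]), tf.constant([-0.5])]

result, global_state = test_utils.run_query(query, records)
# Reuse the returned state for the next round instead of re-deriving it.
result, global_state = test_utils.run_query(query, records, global_state)
```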

View file

@ -1,239 +0,0 @@
# Copyright 2018, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Differentially private optimizers for TensorFlow."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from distutils.version import LooseVersion
import tensorflow as tf
from tensorflow_privacy.privacy.analysis import privacy_ledger
from tensorflow_privacy.privacy.dp_query import gaussian_query
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
nest = tf.contrib.framework.nest
else:
nest = tf.nest
def make_optimizer_class(cls):
"""Constructs a DP optimizer class from an existing one."""
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
parent_code = tf.train.Optimizer.compute_gradients.__code__
child_code = cls.compute_gradients.__code__
GATE_OP = tf.train.Optimizer.GATE_OP # pylint: disable=invalid-name
else:
parent_code = tf.optimizers.Optimizer._compute_gradients.__code__ # pylint: disable=protected-access
child_code = cls._compute_gradients.__code__ # pylint: disable=protected-access
GATE_OP = None # pylint: disable=invalid-name
if child_code is not parent_code:
tf.logging.warning(
'WARNING: Calling make_optimizer_class() on class %s that overrides '
'method compute_gradients(). Check to ensure that '
'make_optimizer_class() does not interfere with overridden version.',
cls.__name__)
class DPOptimizerClass(cls):
"""Differentially private subclass of given class cls."""
def __init__(
self,
dp_sum_query,
num_microbatches=None,
unroll_microbatches=False,
*args, # pylint: disable=keyword-arg-before-vararg, g-doc-args
**kwargs):
"""Initialize the DPOptimizerClass.
Args:
dp_sum_query: DPQuery object, specifying differential privacy
mechanism to use.
num_microbatches: How many microbatches into which the minibatch is
split. If None, will default to the size of the minibatch, and
per-example gradients will be computed.
unroll_microbatches: If true, processes microbatches within a Python
loop instead of a tf.while_loop. Can be used if using a tf.while_loop
raises an exception.
"""
super(DPOptimizerClass, self).__init__(*args, **kwargs)
self._dp_sum_query = dp_sum_query
self._num_microbatches = num_microbatches
self._global_state = self._dp_sum_query.initial_global_state()
# TODO(b/122613513): Set unroll_microbatches=True to avoid this bug.
# Beware: When num_microbatches is large (>100), enabling this parameter
# may cause an OOM error.
self._unroll_microbatches = unroll_microbatches
def compute_gradients(self,
loss,
var_list,
gate_gradients=GATE_OP,
aggregation_method=None,
colocate_gradients_with_ops=False,
grad_loss=None,
gradient_tape=None):
if callable(loss):
# TF is running in Eager mode, check we received a vanilla tape.
if not gradient_tape:
raise ValueError('When in Eager mode, a tape needs to be passed.')
vector_loss = loss()
if self._num_microbatches is None:
self._num_microbatches = tf.shape(vector_loss)[0]
sample_state = self._dp_sum_query.initial_sample_state(var_list)
microbatches_losses = tf.reshape(vector_loss,
[self._num_microbatches, -1])
sample_params = (
self._dp_sum_query.derive_sample_params(self._global_state))
def process_microbatch(i, sample_state):
"""Process one microbatch (record) with privacy helper."""
microbatch_loss = tf.reduce_mean(tf.gather(microbatches_losses, [i]))
grads = gradient_tape.gradient(microbatch_loss, var_list)
sample_state = self._dp_sum_query.accumulate_record(
sample_params, sample_state, grads)
return sample_state
for idx in range(self._num_microbatches):
sample_state = process_microbatch(idx, sample_state)
grad_sums, self._global_state = (
self._dp_sum_query.get_noised_result(
sample_state, self._global_state))
def normalize(v):
return v / tf.cast(self._num_microbatches, tf.float32)
final_grads = nest.map_structure(normalize, grad_sums)
grads_and_vars = list(zip(final_grads, var_list))
return grads_and_vars
else:
# TF is running in graph mode, check we did not receive a gradient tape.
if gradient_tape:
raise ValueError('When in graph mode, a tape should not be passed.')
# Note: it would be closer to the correct i.i.d. sampling of records if
# we sampled each microbatch from the appropriate binomial distribution,
# although that still wouldn't be quite correct because it would be
# sampling from the dataset without replacement.
if self._num_microbatches is None:
self._num_microbatches = tf.shape(loss)[0]
microbatches_losses = tf.reshape(loss, [self._num_microbatches, -1])
sample_params = (
self._dp_sum_query.derive_sample_params(self._global_state))
def process_microbatch(i, sample_state):
"""Process one microbatch (record) with privacy helper."""
grads, _ = zip(*super(cls, self).compute_gradients(
tf.reduce_mean(tf.gather(microbatches_losses,
[i])), var_list, gate_gradients,
aggregation_method, colocate_gradients_with_ops, grad_loss))
grads_list = [
g if g is not None else tf.zeros_like(v)
for (g, v) in zip(list(grads), var_list)
]
sample_state = self._dp_sum_query.accumulate_record(
sample_params, sample_state, grads_list)
return sample_state
if var_list is None:
var_list = (
tf.trainable_variables() + tf.get_collection(
tf.GraphKeys.TRAINABLE_RESOURCE_VARIABLES))
sample_state = self._dp_sum_query.initial_sample_state(var_list)
if self._unroll_microbatches:
for idx in range(self._num_microbatches):
sample_state = process_microbatch(idx, sample_state)
else:
# Use of while_loop here requires that sample_state be a nested
# structure of tensors. In general, we would prefer to allow it to be
# an arbitrary opaque type.
cond_fn = lambda i, _: tf.less(i, self._num_microbatches)
body_fn = lambda i, state: [tf.add(i, 1), process_microbatch(i, state)] # pylint: disable=line-too-long
idx = tf.constant(0)
_, sample_state = tf.while_loop(cond_fn, body_fn, [idx, sample_state])
grad_sums, self._global_state = (
self._dp_sum_query.get_noised_result(
sample_state, self._global_state))
def normalize(v):
return tf.truediv(v, tf.cast(self._num_microbatches, tf.float32))
final_grads = nest.map_structure(normalize, grad_sums)
return list(zip(final_grads, var_list))
return DPOptimizerClass
def make_gaussian_optimizer_class(cls):
"""Constructs a DP optimizer with Gaussian averaging of updates."""
class DPGaussianOptimizerClass(make_optimizer_class(cls)):
"""DP subclass of given class cls using Gaussian averaging."""
def __init__(
self,
l2_norm_clip,
noise_multiplier,
num_microbatches=None,
ledger=None,
unroll_microbatches=False,
*args, # pylint: disable=keyword-arg-before-vararg
**kwargs):
dp_sum_query = gaussian_query.GaussianSumQuery(
l2_norm_clip, l2_norm_clip * noise_multiplier)
if ledger:
dp_sum_query = privacy_ledger.QueryWithLedger(dp_sum_query,
ledger=ledger)
super(DPGaussianOptimizerClass, self).__init__(
dp_sum_query,
num_microbatches,
unroll_microbatches,
*args,
**kwargs)
@property
def ledger(self):
return self._dp_sum_query.ledger
return DPGaussianOptimizerClass
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
AdagradOptimizer = tf.train.AdagradOptimizer
AdamOptimizer = tf.train.AdamOptimizer
GradientDescentOptimizer = tf.train.GradientDescentOptimizer
else:
AdagradOptimizer = tf.optimizers.Adagrad
AdamOptimizer = tf.optimizers.Adam
GradientDescentOptimizer = tf.optimizers.SGD # pylint: disable=invalid-name
DPAdagradOptimizer = make_optimizer_class(AdagradOptimizer)
DPAdamOptimizer = make_optimizer_class(AdamOptimizer)
DPGradientDescentOptimizer = make_optimizer_class(GradientDescentOptimizer)
DPAdagradGaussianOptimizer = make_gaussian_optimizer_class(AdagradOptimizer)
DPAdamGaussianOptimizer = make_gaussian_optimizer_class(AdamOptimizer)
DPGradientDescentGaussianOptimizer = make_gaussian_optimizer_class(
GradientDescentOptimizer)
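For orientation, a hedged usage sketch of the Gaussian optimizer classes defined above (TF 1.x graph mode; the hyperparameter values are illustrative, not recommendations):
```
# Sketch: differentially private SGD on a per-example (vector) loss.
import tensorflow as tf

from tensorflow_privacy.privacy.optimizers import dp_optimizer

var0 = tf.Variable([0.0, 0.0])
data0 = tf.constant([[3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [-1.0, 0.0]])
# One loss value per example; the optimizer handles microbatching internally.
vector_loss = 0.5 * tf.reduce_sum(tf.squared_difference(data0, var0), axis=1)

opt = dp_optimizer.DPGradientDescentGaussianOptimizer(
    l2_norm_clip=1.0,        # illustrative value
    noise_multiplier=1.1,    # illustrative value
    num_microbatches=4,
    learning_rate=0.1)
train_op = opt.minimize(loss=vector_loss)
```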

View file

@ -1,130 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for differentially private optimizers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.analysis import privacy_ledger
from tensorflow_privacy.privacy.dp_query import gaussian_query
from tensorflow_privacy.privacy.optimizers import dp_optimizer
class DPOptimizerEagerTest(tf.test.TestCase, parameterized.TestCase):
def setUp(self):
tf.enable_eager_execution()
super(DPOptimizerEagerTest, self).setUp()
def _loss_fn(self, val0, val1):
return 0.5 * tf.reduce_sum(tf.squared_difference(val0, val1), axis=1)
@parameterized.named_parameters(
('DPGradientDescent 1', dp_optimizer.DPGradientDescentOptimizer, 1,
[-2.5, -2.5]),
('DPGradientDescent 2', dp_optimizer.DPGradientDescentOptimizer, 2,
[-2.5, -2.5]),
('DPGradientDescent 4', dp_optimizer.DPGradientDescentOptimizer, 4,
[-2.5, -2.5]),
('DPAdagrad 1', dp_optimizer.DPAdagradOptimizer, 1, [-2.5, -2.5]),
('DPAdagrad 2', dp_optimizer.DPAdagradOptimizer, 2, [-2.5, -2.5]),
('DPAdagrad 4', dp_optimizer.DPAdagradOptimizer, 4, [-2.5, -2.5]),
('DPAdam 1', dp_optimizer.DPAdamOptimizer, 1, [-2.5, -2.5]),
('DPAdam 2', dp_optimizer.DPAdamOptimizer, 2, [-2.5, -2.5]),
('DPAdam 4', dp_optimizer.DPAdamOptimizer, 4, [-2.5, -2.5]))
def testBaseline(self, cls, num_microbatches, expected_answer):
with tf.GradientTape(persistent=True) as gradient_tape:
var0 = tf.Variable([1.0, 2.0])
data0 = tf.Variable([[3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [-1.0, 0.0]])
dp_sum_query = gaussian_query.GaussianSumQuery(1.0e9, 0.0)
dp_sum_query = privacy_ledger.QueryWithLedger(
dp_sum_query, 1e6, num_microbatches / 1e6)
opt = cls(
dp_sum_query,
num_microbatches=num_microbatches,
learning_rate=2.0)
self.evaluate(tf.global_variables_initializer())
# Fetch params to validate initial values
self.assertAllClose([1.0, 2.0], self.evaluate(var0))
# Expected gradient is sum of differences divided by number of
# microbatches.
grads_and_vars = opt.compute_gradients(
lambda: self._loss_fn(var0, data0), [var0],
gradient_tape=gradient_tape)
self.assertAllCloseAccordingToType(expected_answer, grads_and_vars[0][0])
@parameterized.named_parameters(
('DPGradientDescent', dp_optimizer.DPGradientDescentOptimizer),
('DPAdagrad', dp_optimizer.DPAdagradOptimizer),
('DPAdam', dp_optimizer.DPAdamOptimizer))
def testClippingNorm(self, cls):
with tf.GradientTape(persistent=True) as gradient_tape:
var0 = tf.Variable([0.0, 0.0])
data0 = tf.Variable([[3.0, 4.0], [6.0, 8.0]])
dp_sum_query = gaussian_query.GaussianSumQuery(1.0, 0.0)
dp_sum_query = privacy_ledger.QueryWithLedger(dp_sum_query, 1e6, 1 / 1e6)
opt = cls(dp_sum_query, num_microbatches=1, learning_rate=2.0)
self.evaluate(tf.global_variables_initializer())
# Fetch params to validate initial values
self.assertAllClose([0.0, 0.0], self.evaluate(var0))
# Expected gradient is sum of differences.
grads_and_vars = opt.compute_gradients(
lambda: self._loss_fn(var0, data0), [var0],
gradient_tape=gradient_tape)
self.assertAllCloseAccordingToType([-0.6, -0.8], grads_and_vars[0][0])
@parameterized.named_parameters(
('DPGradientDescent', dp_optimizer.DPGradientDescentOptimizer),
('DPAdagrad', dp_optimizer.DPAdagradOptimizer),
('DPAdam', dp_optimizer.DPAdamOptimizer))
def testNoiseMultiplier(self, cls):
with tf.GradientTape(persistent=True) as gradient_tape:
var0 = tf.Variable([0.0])
data0 = tf.Variable([[0.0]])
dp_sum_query = gaussian_query.GaussianSumQuery(4.0, 8.0)
dp_sum_query = privacy_ledger.QueryWithLedger(dp_sum_query, 1e6, 1 / 1e6)
opt = cls(dp_sum_query, num_microbatches=1, learning_rate=2.0)
self.evaluate(tf.global_variables_initializer())
# Fetch params to validate initial values
self.assertAllClose([0.0], self.evaluate(var0))
grads = []
for _ in range(1000):
grads_and_vars = opt.compute_gradients(
lambda: self._loss_fn(var0, data0), [var0],
gradient_tape=gradient_tape)
grads.append(grads_and_vars[0][0])
# Test standard deviation is close to l2_norm_clip * noise_multiplier.
self.assertNear(np.std(grads), 2.0 * 4.0, 0.5)
if __name__ == '__main__':
tf.test.main()

View file

@ -1,241 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for differentially private optimizers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import mock
import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.analysis import privacy_ledger
from tensorflow_privacy.privacy.dp_query import gaussian_query
from tensorflow_privacy.privacy.optimizers import dp_optimizer
class DPOptimizerTest(tf.test.TestCase, parameterized.TestCase):
def _loss(self, val0, val1):
"""Loss function that is minimized at the mean of the input points."""
return 0.5 * tf.reduce_sum(tf.squared_difference(val0, val1), axis=1)
# Parameters for testing: optimizer, num_microbatches, expected answer.
@parameterized.named_parameters(
('DPGradientDescent 1', dp_optimizer.DPGradientDescentOptimizer, 1,
[-2.5, -2.5]),
('DPGradientDescent 2', dp_optimizer.DPGradientDescentOptimizer, 2,
[-2.5, -2.5]),
('DPGradientDescent 4', dp_optimizer.DPGradientDescentOptimizer, 4,
[-2.5, -2.5]),
('DPAdagrad 1', dp_optimizer.DPAdagradOptimizer, 1, [-2.5, -2.5]),
('DPAdagrad 2', dp_optimizer.DPAdagradOptimizer, 2, [-2.5, -2.5]),
('DPAdagrad 4', dp_optimizer.DPAdagradOptimizer, 4, [-2.5, -2.5]),
('DPAdam 1', dp_optimizer.DPAdamOptimizer, 1, [-2.5, -2.5]),
('DPAdam 2', dp_optimizer.DPAdamOptimizer, 2, [-2.5, -2.5]),
('DPAdam 4', dp_optimizer.DPAdamOptimizer, 4, [-2.5, -2.5]))
def testBaseline(self, cls, num_microbatches, expected_answer):
with self.cached_session() as sess:
var0 = tf.Variable([1.0, 2.0])
data0 = tf.Variable([[3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [-1.0, 0.0]])
dp_sum_query = gaussian_query.GaussianSumQuery(1.0e9, 0.0)
dp_sum_query = privacy_ledger.QueryWithLedger(
dp_sum_query, 1e6, num_microbatches / 1e6)
opt = cls(
dp_sum_query,
num_microbatches=num_microbatches,
learning_rate=2.0)
self.evaluate(tf.global_variables_initializer())
# Fetch params to validate initial values
self.assertAllClose([1.0, 2.0], self.evaluate(var0))
# Expected gradient is sum of differences divided by number of
# microbatches.
gradient_op = opt.compute_gradients(self._loss(data0, var0), [var0])
grads_and_vars = sess.run(gradient_op)
self.assertAllCloseAccordingToType(expected_answer, grads_and_vars[0][0])
@parameterized.named_parameters(
('DPGradientDescent', dp_optimizer.DPGradientDescentOptimizer),
('DPAdagrad', dp_optimizer.DPAdagradOptimizer),
('DPAdam', dp_optimizer.DPAdamOptimizer))
def testClippingNorm(self, cls):
with self.cached_session() as sess:
var0 = tf.Variable([0.0, 0.0])
data0 = tf.Variable([[3.0, 4.0], [6.0, 8.0]])
dp_sum_query = gaussian_query.GaussianSumQuery(1.0, 0.0)
dp_sum_query = privacy_ledger.QueryWithLedger(dp_sum_query, 1e6, 1 / 1e6)
opt = cls(dp_sum_query, num_microbatches=1, learning_rate=2.0)
self.evaluate(tf.global_variables_initializer())
# Fetch params to validate initial values
self.assertAllClose([0.0, 0.0], self.evaluate(var0))
# Expected gradient is sum of differences.
gradient_op = opt.compute_gradients(self._loss(data0, var0), [var0])
grads_and_vars = sess.run(gradient_op)
self.assertAllCloseAccordingToType([-0.6, -0.8], grads_and_vars[0][0])
@parameterized.named_parameters(
('DPGradientDescent', dp_optimizer.DPGradientDescentOptimizer),
('DPAdagrad', dp_optimizer.DPAdagradOptimizer),
('DPAdam', dp_optimizer.DPAdamOptimizer))
def testNoiseMultiplier(self, cls):
with self.cached_session() as sess:
var0 = tf.Variable([0.0])
data0 = tf.Variable([[0.0]])
dp_sum_query = gaussian_query.GaussianSumQuery(4.0, 8.0)
dp_sum_query = privacy_ledger.QueryWithLedger(dp_sum_query, 1e6, 1 / 1e6)
opt = cls(dp_sum_query, num_microbatches=1, learning_rate=2.0)
self.evaluate(tf.global_variables_initializer())
# Fetch params to validate initial values
self.assertAllClose([0.0], self.evaluate(var0))
gradient_op = opt.compute_gradients(self._loss(data0, var0), [var0])
grads = []
for _ in range(1000):
grads_and_vars = sess.run(gradient_op)
grads.append(grads_and_vars[0][0])
# Test standard deviation is close to l2_norm_clip * noise_multiplier.
self.assertNear(np.std(grads), 2.0 * 4.0, 0.5)
@mock.patch.object(tf, 'logging')
def testComputeGradientsOverrideWarning(self, mock_logging):
class SimpleOptimizer(tf.train.Optimizer):
def compute_gradients(self):
return 0
dp_optimizer.make_optimizer_class(SimpleOptimizer)
mock_logging.warning.assert_called_once_with(
'WARNING: Calling make_optimizer_class() on class %s that overrides '
'method compute_gradients(). Check to ensure that '
'make_optimizer_class() does not interfere with overridden version.',
'SimpleOptimizer')
def testEstimator(self):
"""Tests that DP optimizers work with tf.estimator."""
def linear_model_fn(features, labels, mode):
preds = tf.keras.layers.Dense(
1, activation='linear', name='dense').apply(features['x'])
vector_loss = tf.squared_difference(labels, preds)
scalar_loss = tf.reduce_mean(vector_loss)
dp_sum_query = gaussian_query.GaussianSumQuery(1.0, 0.0)
dp_sum_query = privacy_ledger.QueryWithLedger(dp_sum_query, 1e6, 1 / 1e6)
optimizer = dp_optimizer.DPGradientDescentOptimizer(
dp_sum_query,
num_microbatches=1,
learning_rate=1.0)
global_step = tf.train.get_global_step()
train_op = optimizer.minimize(loss=vector_loss, global_step=global_step)
return tf.estimator.EstimatorSpec(
mode=mode, loss=scalar_loss, train_op=train_op)
linear_regressor = tf.estimator.Estimator(model_fn=linear_model_fn)
true_weights = np.array([[-5], [4], [3], [2]]).astype(np.float32)
true_bias = 6.0
train_data = np.random.normal(scale=3.0, size=(200, 4)).astype(np.float32)
train_labels = np.matmul(train_data,
true_weights) + true_bias + np.random.normal(
scale=0.1, size=(200, 1)).astype(np.float32)
train_input_fn = tf.estimator.inputs.numpy_input_fn(
x={'x': train_data},
y=train_labels,
batch_size=20,
num_epochs=10,
shuffle=True)
linear_regressor.train(input_fn=train_input_fn, steps=100)
self.assertAllClose(
linear_regressor.get_variable_value('dense/kernel'),
true_weights,
atol=1.0)
@parameterized.named_parameters(
('DPGradientDescent', dp_optimizer.DPGradientDescentOptimizer),
('DPAdagrad', dp_optimizer.DPAdagradOptimizer),
('DPAdam', dp_optimizer.DPAdamOptimizer))
def testUnrollMicrobatches(self, cls):
with self.cached_session() as sess:
var0 = tf.Variable([1.0, 2.0])
data0 = tf.Variable([[3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [-1.0, 0.0]])
num_microbatches = 4
dp_sum_query = gaussian_query.GaussianSumQuery(1.0e9, 0.0)
dp_sum_query = privacy_ledger.QueryWithLedger(
dp_sum_query, 1e6, num_microbatches / 1e6)
opt = cls(
dp_sum_query,
num_microbatches=num_microbatches,
learning_rate=2.0,
unroll_microbatches=True)
self.evaluate(tf.global_variables_initializer())
# Fetch params to validate initial values
self.assertAllClose([1.0, 2.0], self.evaluate(var0))
# Expected gradient is sum of differences divided by number of
# microbatches.
gradient_op = opt.compute_gradients(self._loss(data0, var0), [var0])
grads_and_vars = sess.run(gradient_op)
self.assertAllCloseAccordingToType([-2.5, -2.5], grads_and_vars[0][0])
@parameterized.named_parameters(
('DPGradientDescent', dp_optimizer.DPGradientDescentGaussianOptimizer),
('DPAdagrad', dp_optimizer.DPAdagradGaussianOptimizer),
('DPAdam', dp_optimizer.DPAdamGaussianOptimizer))
def testDPGaussianOptimizerClass(self, cls):
with self.cached_session() as sess:
var0 = tf.Variable([0.0])
data0 = tf.Variable([[0.0]])
opt = cls(
l2_norm_clip=4.0,
noise_multiplier=2.0,
num_microbatches=1,
learning_rate=2.0)
self.evaluate(tf.global_variables_initializer())
# Fetch params to validate initial values
self.assertAllClose([0.0], self.evaluate(var0))
gradient_op = opt.compute_gradients(self._loss(data0, var0), [var0])
grads = []
for _ in range(1000):
grads_and_vars = sess.run(gradient_op)
grads.append(grads_and_vars[0][0])
# Test standard deviation is close to l2_norm_clip * noise_multiplier.
self.assertNear(np.std(grads), 2.0 * 4.0, 0.5)
if __name__ == '__main__':
tf.test.main()

View file

@ -1,153 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Vectorized differentially private optimizers for TensorFlow."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from distutils.version import LooseVersion
import tensorflow as tf
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
nest = tf.contrib.framework.nest
AdagradOptimizer = tf.train.AdagradOptimizer
AdamOptimizer = tf.train.AdamOptimizer
GradientDescentOptimizer = tf.train.GradientDescentOptimizer
parent_code = tf.train.Optimizer.compute_gradients.__code__
GATE_OP = tf.train.Optimizer.GATE_OP # pylint: disable=invalid-name
else:
nest = tf.nest
AdagradOptimizer = tf.optimizers.Adagrad
AdamOptimizer = tf.optimizers.Adam
GradientDescentOptimizer = tf.optimizers.SGD # pylint: disable=invalid-name
parent_code = tf.optimizers.Optimizer._compute_gradients.__code__ # pylint: disable=protected-access
GATE_OP = None # pylint: disable=invalid-name
def make_vectorized_optimizer_class(cls):
"""Constructs a vectorized DP optimizer class from an existing one."""
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
child_code = cls.compute_gradients.__code__
else:
child_code = cls._compute_gradients.__code__ # pylint: disable=protected-access
if child_code is not parent_code:
tf.logging.warning(
'WARNING: Calling make_optimizer_class() on class %s that overrides '
'method compute_gradients(). Check to ensure that '
'make_optimizer_class() does not interfere with overridden version.',
cls.__name__)
class DPOptimizerClass(cls):
"""Differentially private subclass of given class cls."""
def __init__(
self,
l2_norm_clip,
noise_multiplier,
num_microbatches=None,
*args, # pylint: disable=keyword-arg-before-vararg, g-doc-args
**kwargs):
"""Initialize the DPOptimizerClass.
Args:
l2_norm_clip: Clipping norm (max L2 norm of per microbatch gradients)
noise_multiplier: Ratio of the standard deviation to the clipping norm
num_microbatches: How many microbatches into which the minibatch is
split. If None, will default to the size of the minibatch, and
per-example gradients will be computed.
"""
super(DPOptimizerClass, self).__init__(*args, **kwargs)
self._l2_norm_clip = l2_norm_clip
self._noise_multiplier = noise_multiplier
self._num_microbatches = num_microbatches
def compute_gradients(self,
loss,
var_list,
gate_gradients=GATE_OP,
aggregation_method=None,
colocate_gradients_with_ops=False,
grad_loss=None,
gradient_tape=None):
if callable(loss):
# TF is running in Eager mode
raise NotImplementedError('Vectorized optimizer unavailable for TF2.')
else:
# TF is running in graph mode, check we did not receive a gradient tape.
if gradient_tape:
raise ValueError('When in graph mode, a tape should not be passed.')
batch_size = tf.shape(loss)[0]
if self._num_microbatches is None:
self._num_microbatches = batch_size
# Note: it would be closer to the correct i.i.d. sampling of records if
# we sampled each microbatch from the appropriate binomial distribution,
# although that still wouldn't be quite correct because it would be
# sampling from the dataset without replacement.
microbatch_losses = tf.reshape(loss, [self._num_microbatches, -1])
if var_list is None:
var_list = (
tf.trainable_variables() + tf.get_collection(
tf.GraphKeys.TRAINABLE_RESOURCE_VARIABLES))
def process_microbatch(microbatch_loss):
"""Compute clipped grads for one microbatch."""
microbatch_loss = tf.reduce_mean(microbatch_loss)
grads, _ = zip(*super(DPOptimizerClass, self).compute_gradients(
microbatch_loss,
var_list,
gate_gradients,
aggregation_method,
colocate_gradients_with_ops,
grad_loss))
grads_list = [
g if g is not None else tf.zeros_like(v)
for (g, v) in zip(list(grads), var_list)
]
# Clip gradients to have L2 norm of l2_norm_clip.
# Here, we use TF primitives rather than the built-in
# tf.clip_by_global_norm() so that operations can be vectorized
# across microbatches.
grads_flat = nest.flatten(grads_list)
squared_l2_norms = [tf.reduce_sum(tf.square(g)) for g in grads_flat]
global_norm = tf.sqrt(tf.add_n(squared_l2_norms))
div = tf.maximum(global_norm / self._l2_norm_clip, 1.)
clipped_flat = [g / div for g in grads_flat]
clipped_grads = nest.pack_sequence_as(grads_list, clipped_flat)
return clipped_grads
clipped_grads = tf.vectorized_map(process_microbatch, microbatch_losses)
def reduce_noise_normalize_batch(stacked_grads):
summed_grads = tf.reduce_sum(stacked_grads, axis=0)
noise_stddev = self._l2_norm_clip * self._noise_multiplier
noise = tf.random.normal(tf.shape(summed_grads),
stddev=noise_stddev)
noised_grads = summed_grads + noise
return noised_grads / tf.cast(self._num_microbatches, tf.float32)
final_grads = nest.map_structure(reduce_noise_normalize_batch,
clipped_grads)
return list(zip(final_grads, var_list))
return DPOptimizerClass
VectorizedDPAdagrad = make_vectorized_optimizer_class(AdagradOptimizer)
VectorizedDPAdam = make_vectorized_optimizer_class(AdamOptimizer)
VectorizedDPSGD = make_vectorized_optimizer_class(GradientDescentOptimizer)
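A similar hedged sketch for the vectorized variant (TF 1.x graph mode only; these classes raise `NotImplementedError` when the loss is a callable, i.e. under eager execution):
```
# Sketch: vectorized DP-SGD; clipping and noising are vectorized across
# microbatches via tf.vectorized_map.
import tensorflow as tf

from tensorflow_privacy.privacy.optimizers.dp_optimizer_vectorized import VectorizedDPSGD

var0 = tf.Variable([0.0, 0.0])
data0 = tf.constant([[3.0, 4.0], [6.0, 8.0]])
vector_loss = 0.5 * tf.reduce_sum(tf.squared_difference(data0, var0), axis=1)

opt = VectorizedDPSGD(
    l2_norm_clip=1.0,        # illustrative value
    noise_multiplier=1.1,    # illustrative value
    num_microbatches=2,
    learning_rate=0.1)
train_op = opt.minimize(loss=vector_loss)
```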

View file

@ -1,204 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for differentially private optimizers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import mock
import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.optimizers import dp_optimizer_vectorized
from tensorflow_privacy.privacy.optimizers.dp_optimizer_vectorized import VectorizedDPAdagrad
from tensorflow_privacy.privacy.optimizers.dp_optimizer_vectorized import VectorizedDPAdam
from tensorflow_privacy.privacy.optimizers.dp_optimizer_vectorized import VectorizedDPSGD
class DPOptimizerTest(tf.test.TestCase, parameterized.TestCase):
def _loss(self, val0, val1):
"""Loss function that is minimized at the mean of the input points."""
return 0.5 * tf.reduce_sum(tf.squared_difference(val0, val1), axis=1)
# Parameters for testing: optimizer, num_microbatches, expected answer.
@parameterized.named_parameters(
('DPGradientDescent 1', VectorizedDPSGD, 1, [-2.5, -2.5]),
('DPGradientDescent 2', VectorizedDPSGD, 2, [-2.5, -2.5]),
('DPGradientDescent 4', VectorizedDPSGD, 4, [-2.5, -2.5]),
('DPAdagrad 1', VectorizedDPAdagrad, 1, [-2.5, -2.5]),
('DPAdagrad 2', VectorizedDPAdagrad, 2, [-2.5, -2.5]),
('DPAdagrad 4', VectorizedDPAdagrad, 4, [-2.5, -2.5]),
('DPAdam 1', VectorizedDPAdam, 1, [-2.5, -2.5]),
('DPAdam 2', VectorizedDPAdam, 2, [-2.5, -2.5]),
('DPAdam 4', VectorizedDPAdam, 4, [-2.5, -2.5]))
def testBaseline(self, cls, num_microbatches, expected_answer):
with self.cached_session() as sess:
var0 = tf.Variable([1.0, 2.0])
data0 = tf.Variable([[3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [-1.0, 0.0]])
opt = cls(
l2_norm_clip=1.0e9,
noise_multiplier=0.0,
num_microbatches=num_microbatches,
learning_rate=2.0)
self.evaluate(tf.global_variables_initializer())
# Fetch params to validate initial values
self.assertAllClose([1.0, 2.0], self.evaluate(var0))
# Expected gradient is sum of differences divided by number of
# microbatches.
gradient_op = opt.compute_gradients(self._loss(data0, var0), [var0])
grads_and_vars = sess.run(gradient_op)
self.assertAllCloseAccordingToType(expected_answer, grads_and_vars[0][0])
@parameterized.named_parameters(
('DPGradientDescent', VectorizedDPSGD),
('DPAdagrad', VectorizedDPAdagrad),
('DPAdam', VectorizedDPAdam))
def testClippingNorm(self, cls):
with self.cached_session() as sess:
var0 = tf.Variable([0.0, 0.0])
data0 = tf.Variable([[3.0, 4.0], [6.0, 8.0]])
opt = cls(l2_norm_clip=1.0,
noise_multiplier=0.,
num_microbatches=1,
learning_rate=2.0)
self.evaluate(tf.global_variables_initializer())
# Fetch params to validate initial values
self.assertAllClose([0.0, 0.0], self.evaluate(var0))
# Expected gradient is sum of differences.
gradient_op = opt.compute_gradients(self._loss(data0, var0), [var0])
grads_and_vars = sess.run(gradient_op)
self.assertAllCloseAccordingToType([-0.6, -0.8], grads_and_vars[0][0])
@parameterized.named_parameters(
('DPGradientDescent', VectorizedDPSGD),
('DPAdagrad', VectorizedDPAdagrad),
('DPAdam', VectorizedDPAdam))
def testNoiseMultiplier(self, cls):
with self.cached_session() as sess:
var0 = tf.Variable([0.0])
data0 = tf.Variable([[0.0]])
opt = cls(l2_norm_clip=4.0,
noise_multiplier=8.0,
num_microbatches=1,
learning_rate=2.0)
self.evaluate(tf.global_variables_initializer())
# Fetch params to validate initial values
self.assertAllClose([0.0], self.evaluate(var0))
gradient_op = opt.compute_gradients(self._loss(data0, var0), [var0])
grads = []
for _ in range(5000):
grads_and_vars = sess.run(gradient_op)
grads.append(grads_and_vars[0][0])
# Test standard deviation is close to l2_norm_clip * noise_multiplier.
self.assertNear(np.std(grads), 4.0 * 8.0, 0.5)
@mock.patch.object(tf, 'logging')
def testComputeGradientsOverrideWarning(self, mock_logging):
class SimpleOptimizer(tf.train.Optimizer):
def compute_gradients(self):
return 0
dp_optimizer_vectorized.make_vectorized_optimizer_class(SimpleOptimizer)
mock_logging.warning.assert_called_once_with(
'WARNING: Calling make_optimizer_class() on class %s that overrides '
'method compute_gradients(). Check to ensure that '
'make_optimizer_class() does not interfere with overridden version.',
'SimpleOptimizer')
def testEstimator(self):
"""Tests that DP optimizers work with tf.estimator."""
def linear_model_fn(features, labels, mode):
preds = tf.keras.layers.Dense(
1, activation='linear', name='dense').apply(features['x'])
vector_loss = tf.squared_difference(labels, preds)
scalar_loss = tf.reduce_mean(vector_loss)
optimizer = VectorizedDPSGD(
l2_norm_clip=1.0,
noise_multiplier=0.,
num_microbatches=1,
learning_rate=1.0)
global_step = tf.train.get_global_step()
train_op = optimizer.minimize(loss=vector_loss, global_step=global_step)
return tf.estimator.EstimatorSpec(
mode=mode, loss=scalar_loss, train_op=train_op)
linear_regressor = tf.estimator.Estimator(model_fn=linear_model_fn)
true_weights = np.array([[-5], [4], [3], [2]]).astype(np.float32)
true_bias = 6.0
train_data = np.random.normal(scale=3.0, size=(200, 4)).astype(np.float32)
train_labels = np.matmul(train_data,
true_weights) + true_bias + np.random.normal(
scale=0.1, size=(200, 1)).astype(np.float32)
train_input_fn = tf.estimator.inputs.numpy_input_fn(
x={'x': train_data},
y=train_labels,
batch_size=20,
num_epochs=10,
shuffle=True)
linear_regressor.train(input_fn=train_input_fn, steps=100)
self.assertAllClose(
linear_regressor.get_variable_value('dense/kernel'),
true_weights,
atol=1.0)
@parameterized.named_parameters(
('DPGradientDescent', VectorizedDPSGD),
('DPAdagrad', VectorizedDPAdagrad),
('DPAdam', VectorizedDPAdam))
def testDPGaussianOptimizerClass(self, cls):
with self.cached_session() as sess:
var0 = tf.Variable([0.0])
data0 = tf.Variable([[0.0]])
opt = cls(
l2_norm_clip=4.0,
noise_multiplier=2.0,
num_microbatches=1,
learning_rate=2.0)
self.evaluate(tf.global_variables_initializer())
# Fetch params to validate initial values
self.assertAllClose([0.0], self.evaluate(var0))
gradient_op = opt.compute_gradients(self._loss(data0, var0), [var0])
grads = []
for _ in range(1000):
grads_and_vars = sess.run(gradient_op)
grads.append(grads_and_vars[0][0])
# Test standard deviation is close to l2_norm_clip * noise_multiplier.
self.assertNear(np.std(grads), 2.0 * 4.0, 0.5)
if __name__ == '__main__':
tf.test.main()

View file

@ -1,3 +0,0 @@
tensorflow>=1.13
mpmath
scipy>=0.17

View file

@ -1,9 +0,0 @@
# Research
This folder contains code to reproduce results from research papers. Currently,
the following papers are included:
* Semi-supervised Knowledge Transfer for Deep Learning from Private Training
Data (ICLR 2017): `pate_2017`
* Scalable Private Learning with PATE (ICLR 2018): `pate_2018`


@@ -1,123 +0,0 @@
# Learning private models with multiple teachers
This repository contains code to create a setup for learning privacy-preserving
student models by transferring knowledge from an ensemble of teachers trained
on disjoint subsets of the data for which privacy guarantees are to be provided.
Knowledge acquired by teachers is transferred to the student in a differentially
private manner by noisily aggregating the teacher decisions before feeding them
to the student during training.
The paper describing the approach is [arXiv:1610.05755](https://arxiv.org/abs/1610.05755)
## Dependencies
This model uses `TensorFlow` to perform numerical computations associated with
machine learning models, as well as common Python libraries such as `numpy`,
`scipy`, and `six`. Instructions to install these can be found in their
respective documentation.
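For example, assuming a working `pip` installation, these dependencies can
typically be installed with:
```
pip install tensorflow numpy scipy six
```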
## How to run
This repository supports the MNIST and SVHN datasets. The following
instructions are given for MNIST but can easily be adapted by replacing the
flag `--dataset=mnist` by `--dataset=svhn`.
There are two steps: teacher training and student training. Data will be
downloaded automatically when you start the teacher training. First, we train
an ensemble of teacher models; second, we train a student using predictions
made by this ensemble.
**Training the teachers:** first run the `train_teachers.py` file with at least
three flags specifying (1) the number of teachers, (2) the ID of the teacher
you are training among these teachers, and (3) the dataset on which to train.
For instance, to train teacher number 10 among an ensemble of 100 teachers for
MNIST, you use the following command:
```
python train_teachers.py --nb_teachers=100 --teacher_id=10 --dataset=mnist
```
Other flags like `train_dir` and `data_dir` can optionally be set to point to
the directories where model checkpoints and temporary data (like the dataset)
should be saved, respectively. The flag `max_steps` (default 3000) controls
the length of training. See `train_teachers.py` and `deep_cnn.py` for the
available flags and their descriptions.
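For instance, a possible invocation setting these optional flags (the paths
below are placeholders) is:
```
python train_teachers.py --nb_teachers=100 --teacher_id=10 --dataset=mnist --max_steps=3000 --train_dir=/tmp/train_dir --data_dir=/tmp
```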
**Training the student:** once the teachers are all trained, e.g., teachers
with IDs `0` to `99` are trained for `nb_teachers=100`, we are ready to train
the student. The student is trained by labeling some of the test data with
predictions from the teachers. The predictions are aggregated by counting the
votes assigned to each class among the ensemble of teachers, adding Laplacian
noise to these votes, and assigning the label with the maximum noisy vote count
to the sample. This is detailed in function `noisy_max` in the file
`aggregation.py`. To learn the student, use the following command:
```
python train_student.py --nb_teachers=100 --dataset=mnist --stdnt_share=5000
```
The flag `--stdnt_share=5000` indicates that the student should be able to
use the first `5000` samples of the dataset's test subset as unlabeled
training points (they will be labeled using the teacher predictions). The
remaining samples are used for evaluation of the student's accuracy, which
is displayed upon completion of training.
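For reference, the noisy aggregation of teacher votes described above boils
down to the following minimal sketch (a simplified, NumPy-only version of
`noisy_max` from `aggregation.py`; `teacher_labels` is assumed to be an
integer array of shape `(nb_teachers, nb_samples)`):
```
import numpy as np

def noisy_max_sketch(teacher_labels, lap_scale, nb_classes=10):
  """Label each sample with the class receiving the most noisy teacher votes."""
  nb_teachers, nb_samples = teacher_labels.shape
  result = np.zeros(nb_samples, dtype=np.int32)
  for i in range(nb_samples):
    # Count the votes each class received from the ensemble of teachers.
    label_counts = np.bincount(teacher_labels[:, i], minlength=nb_classes)
    # Add independent Laplacian noise to each per-class vote count.
    noisy_counts = label_counts + np.random.laplace(
        loc=0.0, scale=float(lap_scale), size=nb_classes)
    # The aggregated label is the class with the largest noisy count.
    result[i] = np.argmax(noisy_counts)
  return result
```
The full `noisy_max` in `aggregation.py` additionally supports returning the
clean (noise-free) vote counts, which are needed for the privacy analysis.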
## Using semi-supervised GANs to train the student
In the paper, we describe how to train the student in a semi-supervised
fashion using Generative Adversarial Networks. This can be reproduced for MNIST
by cloning the [improved-gan](https://github.com/openai/improved-gan)
repository, adding its `mnist_svhn_cifar10` directory to your `PATH` variable,
and then running the shell script
`train_student_mnist_250_lap_20_count_50_epochs_600.sh`.
```
export PATH="/path/to/improved-gan/mnist_svhn_cifar10":$PATH
sh train_student_mnist_250_lap_20_count_50_epochs_600.sh
```
## Alternative deeper convolutional architecture
Note that a deeper convolutional model is available. Both the default and
deeper model graphs are defined in `deep_cnn.py`, respectively by
functions `inference` and `inference_deeper`. Use the flag `--deeper=true`
to switch to that model when launching `train_teachers.py` and
`train_student.py`.
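For example (reusing the flags from the commands above):
```
python train_teachers.py --nb_teachers=100 --teacher_id=10 --dataset=mnist --deeper=true
python train_student.py --nb_teachers=100 --dataset=mnist --stdnt_share=5000 --deeper=true
```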
## Privacy analysis
In the paper, we detail how data-dependent differential privacy bounds can be
computed to estimate the cost of training the student. In order to reproduce
the bounds given in the paper, we include the labels predicted by our two
teacher ensembles: MNIST and SVHN. You can run the privacy analysis for each
dataset with the following commands:
```
python analysis.py --counts_file=mnist_250_teachers_labels.npy --indices_file=mnist_250_teachers_100_indices_used_by_student.npy
python analysis.py --counts_file=svhn_250_teachers_labels.npy --max_examples=1000 --delta=1e-6
```
To expedite experimentation with the privacy analysis of student training,
the `analysis.py` file is configured to download the labels produced by 250
teacher models for MNIST and SVHN when running the two commands included
above. These 250 teacher models were trained using the following command lines,
where `XXX` takes values between `0` and `249`:
```
python train_teachers.py --nb_teachers=250 --teacher_id=XXX --dataset=mnist
python train_teachers.py --nb_teachers=250 --teacher_id=XXX --dataset=svhn
```
Note that these labels may also be used in lieu of function `ensemble_preds`
in `train_student.py`, to compare the performance of alternative student model
architectures and learning techniques. This facilitates future work by
removing the need to train the MNIST and SVHN teacher ensembles when
proposing new student training approaches.
## Contact
To ask questions, please email `nicolas@papernot.fr` or open an issue on
the `tensorflow/models` issues tracker. Please assign issues to
[@npapernot](https://github.com/npapernot).


@@ -1,130 +0,0 @@
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from six.moves import xrange
def labels_from_probs(probs):
"""
Helper function: computes argmax along the last dimension of an array to
obtain labels (max prob or max logit value).
:param probs: numpy array where probabilities or logits are on the last dimension
:return: array with the same shape as the input except for the last
dimension, which is reduced away; entries are the resulting labels
"""
# Compute last axis index
last_axis = len(np.shape(probs)) - 1
# Label is argmax over last dimension
labels = np.argmax(probs, axis=last_axis)
# Return as np.int32
return np.asarray(labels, dtype=np.int32)
def noisy_max(logits, lap_scale, return_clean_votes=False):
"""
This aggregation mechanism takes the softmax/logit output of several models
resulting from inference on identical inputs and computes the noisy-max of
the votes for candidate classes to select a label for each sample: it
adds Laplacian noise to label counts and returns the most frequent label.
:param logits: logits or probabilities for each sample
:param lap_scale: scale of the Laplacian noise to be added to counts
:param return_clean_votes: if set to True, also returns clean votes (without
Laplacian noise). This can be used to perform the
privacy analysis of this aggregation mechanism.
:return: the aggregated labels and, if return_clean_votes is True, also the
clean vote counts for each class per sample and the original labels
produced by the teachers.
"""
# Compute labels from logits/probs and reshape array properly
labels = labels_from_probs(logits)
labels_shape = np.shape(labels)
labels = labels.reshape((labels_shape[0], labels_shape[1]))
# Initialize array to hold final labels
result = np.zeros(int(labels_shape[1]))
if return_clean_votes:
# Initialize array to hold clean votes for each sample
clean_votes = np.zeros((int(labels_shape[1]), 10))
# Parse each sample
for i in xrange(int(labels_shape[1])):
# Count number of votes assigned to each class
label_counts = np.bincount(labels[:, i], minlength=10)
if return_clean_votes:
# Store vote counts for export
clean_votes[i] = label_counts
# Cast in float32 to prepare before addition of Laplacian noise
label_counts = np.asarray(label_counts, dtype=np.float32)
# Sample independent Laplacian noise for each class
for item in xrange(10):
label_counts[item] += np.random.laplace(loc=0.0, scale=float(lap_scale))
# Result is the most frequent label
result[i] = np.argmax(label_counts)
# Cast labels to np.int32 for compatibility with deep_cnn.py feed dictionaries
result = np.asarray(result, dtype=np.int32)
if return_clean_votes:
# Returns several arrays, which are later saved:
# result: labels obtained from the noisy aggregation
# clean_votes: the number of teacher votes assigned to each sample and class
# labels: the labels assigned by teachers (before the noisy aggregation)
return result, clean_votes, labels
else:
# Only return labels resulting from noisy aggregation
return result
def aggregation_most_frequent(logits):
"""
This aggregation mechanism takes the softmax/logit output of several models
resulting from inference on identical inputs and computes the most frequent
label. It is deterministic (no noise injection like in noisy_max() above).
:param logits: logits or probabilities for each sample
:return: the most frequent label for each sample, as an np.int32 array
"""
# Compute labels from logits/probs and reshape array properly
labels = labels_from_probs(logits)
labels_shape = np.shape(labels)
labels = labels.reshape((labels_shape[0], labels_shape[1]))
# Initialize array to hold final labels
result = np.zeros(int(labels_shape[1]))
# Parse each sample
for i in xrange(int(labels_shape[1])):
# Count number of votes assigned to each class
label_counts = np.bincount(labels[:, i], minlength=10)
label_counts = np.asarray(label_counts, dtype=np.int32)
# Result is the most frequent label
result[i] = np.argmax(label_counts)
return np.asarray(result, dtype=np.int32)


@@ -1,304 +0,0 @@
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""
This script computes bounds on the privacy cost of training the
student model from noisy aggregation of labels predicted by teachers.
It should be used only after training the student (and therefore the
teachers as well). We however include the label files required to
reproduce key results from our paper (https://arxiv.org/abs/1610.05755):
the epsilon bounds for MNIST and SVHN students.
The command that computes the epsilon bound associated
with the training of the MNIST student model (100 label queries
with a (1/20)*2=0.1 epsilon bound each) is:
python analysis.py
--counts_file=mnist_250_teachers_labels.npy
--indices_file=mnist_250_teachers_100_indices_used_by_student.npy
The command that computes the epsilon bound associated
with the training of the SVHN student model (1000 label queries
with a (1/20)*2=0.1 epsilon bound each) is:
python analysis.py
--counts_file=svhn_250_teachers_labels.npy
--max_examples=1000
--delta=1e-6
"""
import os
import math
import numpy as np
from six.moves import xrange
import tensorflow as tf
from input import maybe_download
# These parameters can be changed to compute bounds for different failure rates
# or different model predictions.
tf.flags.DEFINE_integer("moments",8, "Number of moments")
tf.flags.DEFINE_float("noise_eps", 0.1, "Eps value for each call to noisymax.")
tf.flags.DEFINE_float("delta", 1e-5, "Target value of delta.")
tf.flags.DEFINE_float("beta", 0.09, "Value of beta for smooth sensitivity")
tf.flags.DEFINE_string("counts_file","","Numpy matrix with raw counts")
tf.flags.DEFINE_string("indices_file","",
"File containting a numpy matrix with indices used."
"Optional. Use the first max_examples indices if this is not provided.")
tf.flags.DEFINE_integer("max_examples",1000,
"Number of examples to use. We will use the first"
" max_examples many examples from the counts_file"
" or indices_file to do the privacy cost estimate")
tf.flags.DEFINE_float("too_small", 1e-10, "Small threshold to avoid log of 0")
tf.flags.DEFINE_bool("input_is_counts", False, "False if labels, True if counts")
FLAGS = tf.flags.FLAGS
def compute_q_noisy_max(counts, noise_eps):
"""returns ~ Pr[outcome != winner].
Args:
counts: a list of scores
noise_eps: privacy parameter for noisy_max
Returns:
q: the probability that outcome is different from true winner.
"""
# For noisy max, we only get an upper bound.
# Pr[ j beats i*] \leq (2 + gap(j,i*)) / (4 exp(gap(j,i*)))
# proof at http://mathoverflow.net/questions/66763/
# tight-bounds-on-probability-of-sum-of-laplace-random-variables
winner = np.argmax(counts)
counts_normalized = noise_eps * (counts - counts[winner])
counts_rest = np.array(
[counts_normalized[i] for i in xrange(len(counts)) if i != winner])
q = 0.0
for c in counts_rest:
gap = -c
q += (gap + 2.0) / (4.0 * math.exp(gap))
return min(q, 1.0 - (1.0/len(counts)))
def compute_q_noisy_max_approx(counts, noise_eps):
"""returns ~ Pr[outcome != winner].
Args:
counts: a list of scores
noise_eps: privacy parameter for noisy_max
Returns:
q: the probability that outcome is different from true winner.
"""
# For noisy max, we only get an upper bound.
# Pr[ j beats i*] \leq (2 + gap(j,i*)) / (4 exp(gap(j,i*)))
# proof at http://mathoverflow.net/questions/66763/
# tight-bounds-on-probability-of-sum-of-laplace-random-variables
# This code uses an approximation that is faster and easier
# to get local sensitivity bound on.
winner = np.argmax(counts)
counts_normalized = noise_eps * (counts - counts[winner])
counts_rest = np.array(
[counts_normalized[i] for i in xrange(len(counts)) if i != winner])
gap = -max(counts_rest)
q = (len(counts) - 1) * (gap + 2.0) / (4.0 * math.exp(gap))
return min(q, 1.0 - (1.0/len(counts)))
def logmgf_exact(q, priv_eps, l):
"""Computes the logmgf value given q and privacy eps.
The bound used is the min of three terms. The first term is from
https://arxiv.org/pdf/1605.02065.pdf.
The second term is based on the fact that when an event has probability (1-q) for
q close to zero, q can only change by a factor of exp(eps), which corresponds to a
much smaller multiplicative change in (1-q).
The third term comes directly from the privacy guarantee.
Args:
q: pr of non-optimal outcome
priv_eps: eps parameter for DP
l: moment to compute.
Returns:
Upper bound on logmgf
"""
if q < 0.5:
t_one = (1-q) * math.pow((1-q) / (1 - math.exp(priv_eps) * q), l)
t_two = q * math.exp(priv_eps * l)
t = t_one + t_two
try:
log_t = math.log(t)
except ValueError:
print("Got ValueError in math.log for values :" + str((q, priv_eps, l, t)))
log_t = priv_eps * l
else:
log_t = priv_eps * l
return min(0.5 * priv_eps * priv_eps * l * (l + 1), log_t, priv_eps * l)
def logmgf_from_counts(counts, noise_eps, l):
"""
The ReportNoisyMax mechanism with parameter noise_eps is 2*noise_eps-DP
in our setting, where one count can go up by one and another
can go down by one.
"""
q = compute_q_noisy_max(counts, noise_eps)
return logmgf_exact(q, 2.0 * noise_eps, l)
def sens_at_k(counts, noise_eps, l, k):
"""Return sensitivity at distane k.
Args:
counts: an array of scores
noise_eps: noise parameter used
l: moment whose sensitivity is being computed
k: distance
Returns:
sensitivity: at distance k
"""
counts_sorted = sorted(counts, reverse=True)
if 0.5 * noise_eps * l > 1:
print("l too large to compute sensitivity")
return 0
# Now we can assume that at k, gap remains positive
# or we have reached the point where logmgf_exact is
# determined by the first term and ind of q.
if counts[0] < counts[1] + k:
return 0
counts_sorted[0] -= k
counts_sorted[1] += k
val = logmgf_from_counts(counts_sorted, noise_eps, l)
counts_sorted[0] -= 1
counts_sorted[1] += 1
val_changed = logmgf_from_counts(counts_sorted, noise_eps, l)
return val_changed - val
def smoothed_sens(counts, noise_eps, l, beta):
"""Compute beta-smooth sensitivity.
Args:
counts: array of scores
noise_eps: noise parameter
l: moment of interest
beta: smoothness parameter
Returns:
smooth_sensitivity: a beta smooth upper bound
"""
k = 0
smoothed_sensitivity = sens_at_k(counts, noise_eps, l, k)
while k < max(counts):
k += 1
sensitivity_at_k = sens_at_k(counts, noise_eps, l, k)
smoothed_sensitivity = max(
smoothed_sensitivity,
math.exp(-beta * k) * sensitivity_at_k)
if sensitivity_at_k == 0.0:
break
return smoothed_sensitivity
def main(unused_argv):
##################################################################
# If we are reproducing results from paper https://arxiv.org/abs/1610.05755,
# download the required binaries with label information.
##################################################################
# Binaries for MNIST results
paper_binaries_mnist = \
["https://github.com/npapernot/multiple-teachers-for-privacy/blob/master/mnist_250_teachers_labels.npy?raw=true",
"https://github.com/npapernot/multiple-teachers-for-privacy/blob/master/mnist_250_teachers_100_indices_used_by_student.npy?raw=true"]
if FLAGS.counts_file == "mnist_250_teachers_labels.npy" \
or FLAGS.indices_file == "mnist_250_teachers_100_indices_used_by_student.npy":
maybe_download(paper_binaries_mnist, os.getcwd())
# Binaries for SVHN results
paper_binaries_svhn = ["https://github.com/npapernot/multiple-teachers-for-privacy/blob/master/svhn_250_teachers_labels.npy?raw=true"]
if FLAGS.counts_file == "svhn_250_teachers_labels.npy":
maybe_download(paper_binaries_svhn, os.getcwd())
input_mat = np.load(FLAGS.counts_file)
if FLAGS.input_is_counts:
counts_mat = input_mat
else:
# In this case, the input is the raw predictions. Transform them into per-class vote counts.
num_teachers, n = input_mat.shape
counts_mat = np.zeros((n, 10)).astype(np.int32)
for i in range(n):
for j in range(num_teachers):
counts_mat[i, int(input_mat[j, i])] += 1
n = counts_mat.shape[0]
num_examples = min(n, FLAGS.max_examples)
if not FLAGS.indices_file:
indices = np.array(range(num_examples))
else:
index_list = np.load(FLAGS.indices_file)
indices = index_list[:num_examples]
l_list = 1.0 + np.array(xrange(FLAGS.moments))
beta = FLAGS.beta
total_log_mgf_nm = np.array([0.0 for _ in l_list])
total_ss_nm = np.array([0.0 for _ in l_list])
noise_eps = FLAGS.noise_eps
for i in indices:
total_log_mgf_nm += np.array(
[logmgf_from_counts(counts_mat[i], noise_eps, l)
for l in l_list])
total_ss_nm += np.array(
[smoothed_sens(counts_mat[i], noise_eps, l, beta)
for l in l_list])
delta = FLAGS.delta
# We want delta = exp(alpha - eps l).
# Solving gives eps = (alpha - ln (delta))/l
eps_list_nm = (total_log_mgf_nm - math.log(delta)) / l_list
print("Epsilons (Noisy Max): " + str(eps_list_nm))
print("Smoothed sensitivities (Noisy Max): " + str(total_ss_nm / l_list))
# If beta < eps / 2 ln (1/delta), then adding noise Lap(1) * 2 SS/eps
# is eps,delta DP
# Also if beta < eps / 2(gamma +1), then adding noise 2(gamma+1) SS eta / eps
# where eta has density proportional to 1 / (1+|z|^gamma) is eps-DP
# Both from Corollary 2.4 in
# http://www.cse.psu.edu/~ads22/pubs/NRS07/NRS07-full-draft-v1.pdf
# Print the first one's scale
ss_eps = 2.0 * beta * math.log(1/delta)
ss_scale = 2.0 / ss_eps
print("To get an " + str(ss_eps) + "-DP estimate of epsilon, ")
print("..add noise ~ " + str(ss_scale))
print("... times " + str(total_ss_nm / l_list))
print("Epsilon = " + str(min(eps_list_nm)) + ".")
if min(eps_list_nm) == eps_list_nm[-1]:
print("Warning: May not have used enough values of l")
# Data independent bound, as mechanism is
# 2*noise_eps DP.
data_ind_log_mgf = np.array([0.0 for _ in l_list])
data_ind_log_mgf += num_examples * np.array(
[logmgf_exact(1.0, 2.0 * noise_eps, l) for l in l_list])
data_ind_eps_list = (data_ind_log_mgf - math.log(delta)) / l_list
print("Data independent bound = " + str(min(data_ind_eps_list)) + ".")
return
if __name__ == "__main__":
tf.app.run()


@@ -1,603 +0,0 @@
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from datetime import datetime
import math
import numpy as np
from six.moves import xrange
import tensorflow as tf
import time
import utils
FLAGS = tf.app.flags.FLAGS
# Basic model parameters.
tf.app.flags.DEFINE_integer('dropout_seed', 123, """seed for dropout.""")
tf.app.flags.DEFINE_integer('batch_size', 128, """Nb of images in a batch.""")
tf.app.flags.DEFINE_integer('epochs_per_decay', 350, """Nb epochs per decay""")
tf.app.flags.DEFINE_integer('learning_rate', 5, """100 * learning rate""")
tf.app.flags.DEFINE_boolean('log_device_placement', False, """see TF doc""")
# Constants describing the training process.
MOVING_AVERAGE_DECAY = 0.9999 # The decay to use for the moving average.
LEARNING_RATE_DECAY_FACTOR = 0.1 # Learning rate decay factor.
def _variable_on_cpu(name, shape, initializer):
"""Helper to create a Variable stored on CPU memory.
Args:
name: name of the variable
shape: list of ints
initializer: initializer for Variable
Returns:
Variable Tensor
"""
with tf.device('/cpu:0'):
var = tf.get_variable(name, shape, initializer=initializer)
return var
def _variable_with_weight_decay(name, shape, stddev, wd):
"""Helper to create an initialized Variable with weight decay.
Note that the Variable is initialized with a truncated normal distribution.
A weight decay is added only if one is specified.
Args:
name: name of the variable
shape: list of ints
stddev: standard deviation of a truncated Gaussian
wd: add L2Loss weight decay multiplied by this float. If None, weight
decay is not added for this Variable.
Returns:
Variable Tensor
"""
var = _variable_on_cpu(name, shape,
tf.truncated_normal_initializer(stddev=stddev))
if wd is not None:
weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss')
tf.add_to_collection('losses', weight_decay)
return var
def inference(images, dropout=False):
"""Build the CNN model.
Args:
images: Images returned from distorted_inputs() or inputs().
dropout: Boolean controlling whether to use dropout or not
Returns:
Logits
"""
if FLAGS.dataset == 'mnist':
first_conv_shape = [5, 5, 1, 64]
else:
first_conv_shape = [5, 5, 3, 64]
# conv1
with tf.variable_scope('conv1') as scope:
kernel = _variable_with_weight_decay('weights',
shape=first_conv_shape,
stddev=1e-4,
wd=0.0)
conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
bias = tf.nn.bias_add(conv, biases)
conv1 = tf.nn.relu(bias, name=scope.name)
if dropout:
conv1 = tf.nn.dropout(conv1, 0.3, seed=FLAGS.dropout_seed)
# pool1
pool1 = tf.nn.max_pool(conv1,
ksize=[1, 3, 3, 1],
strides=[1, 2, 2, 1],
padding='SAME',
name='pool1')
# norm1
norm1 = tf.nn.lrn(pool1,
4,
bias=1.0,
alpha=0.001 / 9.0,
beta=0.75,
name='norm1')
# conv2
with tf.variable_scope('conv2') as scope:
kernel = _variable_with_weight_decay('weights',
shape=[5, 5, 64, 128],
stddev=1e-4,
wd=0.0)
conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
biases = _variable_on_cpu('biases', [128], tf.constant_initializer(0.1))
bias = tf.nn.bias_add(conv, biases)
conv2 = tf.nn.relu(bias, name=scope.name)
if dropout:
conv2 = tf.nn.dropout(conv2, 0.3, seed=FLAGS.dropout_seed)
# norm2
norm2 = tf.nn.lrn(conv2,
4,
bias=1.0,
alpha=0.001 / 9.0,
beta=0.75,
name='norm2')
# pool2
pool2 = tf.nn.max_pool(norm2,
ksize=[1, 3, 3, 1],
strides=[1, 2, 2, 1],
padding='SAME',
name='pool2')
# local3
with tf.variable_scope('local3') as scope:
# Move everything into depth so we can perform a single matrix multiply.
reshape = tf.reshape(pool2, [FLAGS.batch_size, -1])
dim = reshape.get_shape()[1].value
weights = _variable_with_weight_decay('weights',
shape=[dim, 384],
stddev=0.04,
wd=0.004)
biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
if dropout:
local3 = tf.nn.dropout(local3, 0.5, seed=FLAGS.dropout_seed)
# local4
with tf.variable_scope('local4') as scope:
weights = _variable_with_weight_decay('weights',
shape=[384, 192],
stddev=0.04,
wd=0.004)
biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
if dropout:
local4 = tf.nn.dropout(local4, 0.5, seed=FLAGS.dropout_seed)
# compute logits
with tf.variable_scope('softmax_linear') as scope:
weights = _variable_with_weight_decay('weights',
[192, FLAGS.nb_labels],
stddev=1/192.0,
wd=0.0)
biases = _variable_on_cpu('biases',
[FLAGS.nb_labels],
tf.constant_initializer(0.0))
logits = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
return logits
def inference_deeper(images, dropout=False):
"""Build a deeper CNN model.
Args:
images: Images returned from distorted_inputs() or inputs().
dropout: Boolean controlling whether to use dropout or not
Returns:
Logits
"""
if FLAGS.dataset == 'mnist':
first_conv_shape = [3, 3, 1, 96]
else:
first_conv_shape = [3, 3, 3, 96]
# conv1
with tf.variable_scope('conv1') as scope:
kernel = _variable_with_weight_decay('weights',
shape=first_conv_shape,
stddev=0.05,
wd=0.0)
conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
biases = _variable_on_cpu('biases', [96], tf.constant_initializer(0.0))
bias = tf.nn.bias_add(conv, biases)
conv1 = tf.nn.relu(bias, name=scope.name)
# conv2
with tf.variable_scope('conv2') as scope:
kernel = _variable_with_weight_decay('weights',
shape=[3, 3, 96, 96],
stddev=0.05,
wd=0.0)
conv = tf.nn.conv2d(conv1, kernel, [1, 1, 1, 1], padding='SAME')
biases = _variable_on_cpu('biases', [96], tf.constant_initializer(0.0))
bias = tf.nn.bias_add(conv, biases)
conv2 = tf.nn.relu(bias, name=scope.name)
# conv3
with tf.variable_scope('conv3') as scope:
kernel = _variable_with_weight_decay('weights',
shape=[3, 3, 96, 96],
stddev=0.05,
wd=0.0)
conv = tf.nn.conv2d(conv2, kernel, [1, 2, 2, 1], padding='SAME')
biases = _variable_on_cpu('biases', [96], tf.constant_initializer(0.0))
bias = tf.nn.bias_add(conv, biases)
conv3 = tf.nn.relu(bias, name=scope.name)
if dropout:
conv3 = tf.nn.dropout(conv3, 0.5, seed=FLAGS.dropout_seed)
# conv4
with tf.variable_scope('conv4') as scope:
kernel = _variable_with_weight_decay('weights',
shape=[3, 3, 96, 192],
stddev=0.05,
wd=0.0)
conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.0))
bias = tf.nn.bias_add(conv, biases)
conv4 = tf.nn.relu(bias, name=scope.name)
# conv5
with tf.variable_scope('conv5') as scope:
kernel = _variable_with_weight_decay('weights',
shape=[3, 3, 192, 192],
stddev=0.05,
wd=0.0)
conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.0))
bias = tf.nn.bias_add(conv, biases)
conv5 = tf.nn.relu(bias, name=scope.name)
# conv6
with tf.variable_scope('conv6') as scope:
kernel = _variable_with_weight_decay('weights',
shape=[3, 3, 192, 192],
stddev=0.05,
wd=0.0)
conv = tf.nn.conv2d(conv5, kernel, [1, 2, 2, 1], padding='SAME')
biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.0))
bias = tf.nn.bias_add(conv, biases)
conv6 = tf.nn.relu(bias, name=scope.name)
if dropout:
conv6 = tf.nn.dropout(conv6, 0.5, seed=FLAGS.dropout_seed)
# conv7
with tf.variable_scope('conv7') as scope:
kernel = _variable_with_weight_decay('weights',
shape=[5, 5, 192, 192],
stddev=1e-4,
wd=0.0)
conv = tf.nn.conv2d(conv6, kernel, [1, 1, 1, 1], padding='SAME')
biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
bias = tf.nn.bias_add(conv, biases)
conv7 = tf.nn.relu(bias, name=scope.name)
# local1
with tf.variable_scope('local1') as scope:
# Move everything into depth so we can perform a single matrix multiply.
reshape = tf.reshape(conv7, [FLAGS.batch_size, -1])
dim = reshape.get_shape()[1].value
weights = _variable_with_weight_decay('weights',
shape=[dim, 192],
stddev=0.05,
wd=0)
biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
local1 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
# local2
with tf.variable_scope('local2') as scope:
weights = _variable_with_weight_decay('weights',
shape=[192, 192],
stddev=0.05,
wd=0)
biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
local2 = tf.nn.relu(tf.matmul(local1, weights) + biases, name=scope.name)
if dropout:
local2 = tf.nn.dropout(local2, 0.5, seed=FLAGS.dropout_seed)
# compute logits
with tf.variable_scope('softmax_linear') as scope:
weights = _variable_with_weight_decay('weights',
[192, FLAGS.nb_labels],
stddev=0.05,
wd=0.0)
biases = _variable_on_cpu('biases',
[FLAGS.nb_labels],
tf.constant_initializer(0.0))
logits = tf.add(tf.matmul(local2, weights), biases, name=scope.name)
return logits
def loss_fun(logits, labels):
"""Add L2Loss to all the trainable variables.
Add summary for "Loss" and "Loss/avg".
Args:
logits: Logits from inference().
labels: Labels from distorted_inputs or inputs(). 1-D tensor
of shape [batch_size]
Returns:
Loss tensor of type float.
"""
# Calculate the cross entropy between labels and predictions
labels = tf.cast(labels, tf.int64)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
logits=logits, labels=labels, name='cross_entropy_per_example')
# Calculate the average cross entropy loss across the batch.
cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
# Add to TF collection for losses
tf.add_to_collection('losses', cross_entropy_mean)
# The total loss is defined as the cross entropy loss plus all of the weight
# decay terms (L2 loss).
return tf.add_n(tf.get_collection('losses'), name='total_loss')
def moving_av(total_loss):
"""
Generates moving average for all losses
Args:
total_loss: Total loss from loss().
Returns:
loss_averages_op: op for generating moving averages of losses.
"""
# Compute the moving average of all individual losses and the total loss.
loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
losses = tf.get_collection('losses')
loss_averages_op = loss_averages.apply(losses + [total_loss])
return loss_averages_op
def train_op_fun(total_loss, global_step):
"""Train model.
Create an optimizer and apply to all trainable variables. Add moving
average for all trainable variables.
Args:
total_loss: Total loss from loss().
global_step: Integer Variable counting the number of training steps
processed.
Returns:
train_op: op for training.
"""
# Variables that affect learning rate.
nb_ex_per_train_epoch = int(60000 / FLAGS.nb_teachers)
num_batches_per_epoch = nb_ex_per_train_epoch / FLAGS.batch_size
decay_steps = int(num_batches_per_epoch * FLAGS.epochs_per_decay)
initial_learning_rate = float(FLAGS.learning_rate) / 100.0
# Decay the learning rate exponentially based on the number of steps.
lr = tf.train.exponential_decay(initial_learning_rate,
global_step,
decay_steps,
LEARNING_RATE_DECAY_FACTOR,
staircase=True)
tf.summary.scalar('learning_rate', lr)
# Generate moving averages of all losses and associated summaries.
loss_averages_op = moving_av(total_loss)
# Compute gradients.
with tf.control_dependencies([loss_averages_op]):
opt = tf.train.GradientDescentOptimizer(lr)
grads = opt.compute_gradients(total_loss)
# Apply gradients.
apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
# Add histograms for trainable variables.
for var in tf.trainable_variables():
tf.summary.histogram(var.op.name, var)
# Track the moving averages of all trainable variables.
variable_averages = tf.train.ExponentialMovingAverage(
MOVING_AVERAGE_DECAY, global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables())
with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
train_op = tf.no_op(name='train')
return train_op
def _input_placeholder():
"""
This helper function declares a TF placeholder for the graph input data
:return: TF placeholder for the graph input data
"""
if FLAGS.dataset == 'mnist':
image_size = 28
num_channels = 1
else:
image_size = 32
num_channels = 3
# Declare data placeholder
train_node_shape = (FLAGS.batch_size, image_size, image_size, num_channels)
return tf.placeholder(tf.float32, shape=train_node_shape)
def train(images, labels, ckpt_path, dropout=False):
"""
This function contains the loop that actually trains the model.
:param images: a numpy array with the input data
:param labels: a numpy array with the output labels
:param ckpt_path: a path (including name) where model checkpoints are saved
:param dropout: Boolean, whether to use dropout or not
:return: True if everything went well
"""
# Check training data
assert len(images) == len(labels)
assert images.dtype == np.float32
assert labels.dtype == np.int32
# Set default TF graph
with tf.Graph().as_default():
global_step = tf.Variable(0, trainable=False)
# Declare data placeholder
train_data_node = _input_placeholder()
# Create a placeholder to hold labels
train_labels_shape = (FLAGS.batch_size,)
train_labels_node = tf.placeholder(tf.int32, shape=train_labels_shape)
print("Done Initializing Training Placeholders")
# Build a Graph that computes the logits predictions from the placeholder
if FLAGS.deeper:
logits = inference_deeper(train_data_node, dropout=dropout)
else:
logits = inference(train_data_node, dropout=dropout)
# Calculate loss
loss = loss_fun(logits, train_labels_node)
# Build a Graph that trains the model with one batch of examples and
# updates the model parameters.
train_op = train_op_fun(loss, global_step)
# Create a saver.
saver = tf.train.Saver(tf.global_variables())
print("Graph constructed and saver created")
# Build an initialization operation to run below.
init = tf.global_variables_initializer()
# Create and init sessions
sess = tf.Session(config=tf.ConfigProto(log_device_placement=FLAGS.log_device_placement)) #NOLINT(long-line)
sess.run(init)
print("Session ready, beginning training loop")
# Initialize the number of batches
data_length = len(images)
nb_batches = math.ceil(data_length / FLAGS.batch_size)
for step in xrange(FLAGS.max_steps):
# for debug, save start time
start_time = time.time()
# Current batch number
batch_nb = step % nb_batches
# Current batch start and end indices
start, end = utils.batch_indices(batch_nb, data_length, FLAGS.batch_size)
# Prepare the dictionary to feed the session with
feed_dict = {train_data_node: images[start:end],
train_labels_node: labels[start:end]}
# Run training step
_, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
# Compute duration of training step
duration = time.time() - start_time
# Sanity check
assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
# Echo loss once in a while
if step % 100 == 0:
num_examples_per_step = FLAGS.batch_size
examples_per_sec = num_examples_per_step / duration
sec_per_batch = float(duration)
format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
'sec/batch)')
print (format_str % (datetime.now(), step, loss_value,
examples_per_sec, sec_per_batch))
# Save the model checkpoint periodically.
if step % 1000 == 0 or (step + 1) == FLAGS.max_steps:
saver.save(sess, ckpt_path, global_step=step)
return True
def softmax_preds(images, ckpt_path, return_logits=False):
"""
Compute softmax activations (probabilities) with the model saved in the path
specified as an argument
:param images: a np array of images
:param ckpt_path: a TF model checkpoint
:param return_logits: if set to True, return logits instead of probabilities
:return: probabilities (or logits if logits is set to True)
"""
# Compute nb samples and deduce nb of batches
data_length = len(images)
nb_batches = math.ceil(len(images) / FLAGS.batch_size)
# Declare data placeholder
train_data_node = _input_placeholder()
# Build a Graph that computes the logits predictions from the placeholder
if FLAGS.deeper:
logits = inference_deeper(train_data_node)
else:
logits = inference(train_data_node)
if return_logits:
# We are returning the logits directly (no need to apply softmax)
output = logits
else:
# Add softmax predictions to graph: will return probabilities
output = tf.nn.softmax(logits)
# Restore the moving average version of the learned variables for eval.
variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY)
variables_to_restore = variable_averages.variables_to_restore()
saver = tf.train.Saver(variables_to_restore)
# Will hold the result
preds = np.zeros((data_length, FLAGS.nb_labels), dtype=np.float32)
# Create TF session
with tf.Session() as sess:
# Restore TF session from checkpoint file
saver.restore(sess, ckpt_path)
# Parse data by batch
for batch_nb in xrange(0, int(nb_batches+1)):
# Compute batch start and end indices
start, end = utils.batch_indices(batch_nb, data_length, FLAGS.batch_size)
# Prepare feed dictionary
feed_dict = {train_data_node: images[start:end]}
# Run session ([0] because run returns a batch with len 1st dim == 1)
preds[start:end, :] = sess.run([output], feed_dict=feed_dict)[0]
# Reset graph to allow multiple calls
tf.reset_default_graph()
return preds


@@ -1,396 +0,0 @@
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import gzip
import math
import os
import sys
import tarfile
import numpy as np
from scipy.io import loadmat as loadmat
from six.moves import cPickle as pickle
from six.moves import urllib
from six.moves import xrange
import tensorflow as tf
FLAGS = tf.flags.FLAGS
def create_dir_if_needed(dest_directory):
"""Create directory if doesn't exist."""
if not tf.gfile.IsDirectory(dest_directory):
tf.gfile.MakeDirs(dest_directory)
return True
def maybe_download(file_urls, directory):
"""Download a set of files in temporary local folder."""
# Create directory if doesn't exist
assert create_dir_if_needed(directory)
# This list will include all URLS of the local copy of downloaded files
result = []
# For each file of the dataset
for file_url in file_urls:
# Extract filename
filename = file_url.split('/')[-1]
# If downloading from GitHub, remove suffix ?raw=True from local filename
if filename.endswith("?raw=true"):
filename = filename[:-9]
# Deduce local file url
#filepath = os.path.join(directory, filename)
filepath = directory + '/' + filename
# Add to result list
result.append(filepath)
# Test if file already exists
if not tf.gfile.Exists(filepath):
def _progress(count, block_size, total_size):
sys.stdout.write('\r>> Downloading %s %.1f%%' % (filename,
float(count * block_size) / float(total_size) * 100.0))
sys.stdout.flush()
filepath, _ = urllib.request.urlretrieve(file_url, filepath, _progress)
print()
statinfo = os.stat(filepath)
print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
return result
def image_whitening(data):
"""
Subtracts the mean of each image and divides by the adjusted standard deviation
(for stability). Operations are per image but performed for the entire array.
"""
assert len(np.shape(data)) == 4
# Compute number of pixels in image
nb_pixels = np.shape(data)[1] * np.shape(data)[2] * np.shape(data)[3]
# Subtract mean
mean = np.mean(data, axis=(1, 2, 3))
ones = np.ones(np.shape(data)[1:4], dtype=np.float32)
for i in xrange(len(data)):
data[i, :, :, :] -= mean[i] * ones
# Compute adjusted standard variance
adj_std_var = np.maximum(np.ones(len(data), dtype=np.float32) / math.sqrt(nb_pixels), np.std(data, axis=(1, 2, 3))) # pylint: disable=line-too-long
# Divide image
for i in xrange(len(data)):
data[i, :, :, :] = data[i, :, :, :] / adj_std_var[i]
print(np.shape(data))
return data
def extract_svhn(local_url):
"""Extract a MATLAB matrix into two numpy arrays with data and labels."""
with tf.gfile.Open(local_url, mode='r') as file_obj:
# Load MATLAB matrix using scipy IO
data_dict = loadmat(file_obj)
# Extract each dictionary (one for data, one for labels)
data, labels = data_dict['X'], data_dict['y']
# Set np type
data = np.asarray(data, dtype=np.float32)
labels = np.asarray(labels, dtype=np.int32)
# Transpose data to match TF model input format
data = data.transpose(3, 0, 1, 2)
# Fix the SVHN labels which label 0s as 10s
labels[labels == 10] = 0
# Fix label dimensions
labels = labels.reshape(len(labels))
return data, labels
def unpickle_cifar_dic(file_path):
"""Helper function: unpickles a dictionary (used for loading CIFAR)."""
file_obj = open(file_path, 'rb')
data_dict = pickle.load(file_obj)
file_obj.close()
return data_dict['data'], data_dict['labels']
def extract_cifar10(local_url, data_dir):
"""Extracts CIFAR-10 and return numpy arrays with the different sets."""
# These numpy dumps can be reloaded to avoid performing the pre-processing
# if they exist in the working directory.
# Changing the order of this list will ruin the indices below.
preprocessed_files = ['/cifar10_train.npy',
'/cifar10_train_labels.npy',
'/cifar10_test.npy',
'/cifar10_test_labels.npy']
all_preprocessed = True
for file_name in preprocessed_files:
if not tf.gfile.Exists(data_dir + file_name):
all_preprocessed = False
break
if all_preprocessed:
# Reload pre-processed training data from numpy dumps
with tf.gfile.Open(data_dir + preprocessed_files[0], mode='r') as file_obj:
train_data = np.load(file_obj)
with tf.gfile.Open(data_dir + preprocessed_files[1], mode='r') as file_obj:
train_labels = np.load(file_obj)
# Reload pre-processed testing data from numpy dumps
with tf.gfile.Open(data_dir + preprocessed_files[2], mode='r') as file_obj:
test_data = np.load(file_obj)
with tf.gfile.Open(data_dir + preprocessed_files[3], mode='r') as file_obj:
test_labels = np.load(file_obj)
else:
# Do everything from scratch
# Define lists of all files we should extract
train_files = ['data_batch_' + str(i) for i in xrange(1, 6)]
test_file = ['test_batch']
cifar10_files = train_files + test_file
# Check if all files have already been extracted
need_to_unpack = False
for file_name in cifar10_files:
if not tf.gfile.Exists(file_name):
need_to_unpack = True
break
# We have to unpack the archive
if need_to_unpack:
tarfile.open(local_url, 'r:gz').extractall(data_dir)
# Load training images and labels
images = []
labels = []
for train_file in train_files:
# Construct filename
filename = data_dir + '/cifar-10-batches-py/' + train_file
# Unpickle dictionary and extract images and labels
images_tmp, labels_tmp = unpickle_cifar_dic(filename)
# Append to lists
images.append(images_tmp)
labels.append(labels_tmp)
# Convert to numpy arrays and reshape in the expected format
train_data = np.asarray(images, dtype=np.float32)
train_data = train_data.reshape((50000, 3, 32, 32))
train_data = np.swapaxes(train_data, 1, 3)
train_labels = np.asarray(labels, dtype=np.int32).reshape(50000)
# Save so we don't have to do this again
np.save(data_dir + preprocessed_files[0], train_data)
np.save(data_dir + preprocessed_files[1], train_labels)
# Construct filename for test file
filename = data_dir + '/cifar-10-batches-py/' + test_file[0]
# Load test images and labels
test_data, test_images = unpickle_cifar_dic(filename)
# Convert to numpy arrays and reshape in the expected format
test_data = np.asarray(test_data, dtype=np.float32)
test_data = test_data.reshape((10000, 3, 32, 32))
test_data = np.swapaxes(test_data, 1, 3)
test_labels = np.asarray(test_images, dtype=np.int32).reshape(10000)
# Save so we don't have to do this again
np.save(data_dir + preprocessed_files[2], test_data)
np.save(data_dir + preprocessed_files[3], test_labels)
return train_data, train_labels, test_data, test_labels
def extract_mnist_data(filename, num_images, image_size, pixel_depth):
"""
Extract the images into a 4D tensor [image index, y, x, channels].
Values are rescaled from [0, 255] down to [-0.5, 0.5].
"""
if not tf.gfile.Exists(filename+'.npy'):
with gzip.open(filename) as bytestream:
bytestream.read(16)
buf = bytestream.read(image_size * image_size * num_images)
data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
data = (data - (pixel_depth / 2.0)) / pixel_depth
data = data.reshape(num_images, image_size, image_size, 1)
np.save(filename, data)
return data
else:
with tf.gfile.Open(filename+'.npy', mode='rb') as file_obj:
return np.load(file_obj)
def extract_mnist_labels(filename, num_images):
"""
Extract the labels into a vector of int64 label IDs.
"""
if not tf.gfile.Exists(filename+'.npy'):
with gzip.open(filename) as bytestream:
bytestream.read(8)
buf = bytestream.read(1 * num_images)
labels = np.frombuffer(buf, dtype=np.uint8).astype(np.int32)
np.save(filename, labels)
return labels
else:
with tf.gfile.Open(filename+'.npy', mode='rb') as file_obj:
return np.load(file_obj)
def ld_svhn(extended=False, test_only=False):
"""
Load the original SVHN data
Args:
extended: include extended training data in the returned array
test_only: disables loading of both train and extra -> large speed up
"""
# Define files to be downloaded
# WARNING: changing the order of this list will break indices (cf. below)
file_urls = ['http://ufldl.stanford.edu/housenumbers/train_32x32.mat',
'http://ufldl.stanford.edu/housenumbers/test_32x32.mat',
'http://ufldl.stanford.edu/housenumbers/extra_32x32.mat']
# Maybe download data and retrieve local storage urls
local_urls = maybe_download(file_urls, FLAGS.data_dir)
# Extract Train, Test, and Extended Train data
if not test_only:
# Load and applying whitening to train data
train_data, train_labels = extract_svhn(local_urls[0])
train_data = image_whitening(train_data)
# Load and applying whitening to extended train data
ext_data, ext_labels = extract_svhn(local_urls[2])
ext_data = image_whitening(ext_data)
# Load and applying whitening to test data
test_data, test_labels = extract_svhn(local_urls[1])
test_data = image_whitening(test_data)
if test_only:
return test_data, test_labels
else:
if extended:
# Stack train data with the extended training data
train_data = np.vstack((train_data, ext_data))
train_labels = np.hstack((train_labels, ext_labels))
return train_data, train_labels, test_data, test_labels
else:
# Return training and extended training data separately
return train_data, train_labels, test_data, test_labels, ext_data, ext_labels
def ld_cifar10(test_only=False):
"""Load the original CIFAR10 data."""
# Define files to be downloaded
file_urls = ['https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz']
# Maybe download data and retrieve local storage urls
local_urls = maybe_download(file_urls, FLAGS.data_dir)
# Extract archives and return different sets
dataset = extract_cifar10(local_urls[0], FLAGS.data_dir)
# Unpack tuple
train_data, train_labels, test_data, test_labels = dataset
# Apply whitening to input data
train_data = image_whitening(train_data)
test_data = image_whitening(test_data)
if test_only:
return test_data, test_labels
else:
return train_data, train_labels, test_data, test_labels
def ld_mnist(test_only=False):
"""Load the MNIST dataset."""
# Define files to be downloaded
# WARNING: changing the order of this list will break indices (cf. below)
file_urls = ['http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz',
]
# Maybe download data and retrieve local storage urls
local_urls = maybe_download(file_urls, FLAGS.data_dir)
# Extract it into np arrays.
train_data = extract_mnist_data(local_urls[0], 60000, 28, 1)
train_labels = extract_mnist_labels(local_urls[1], 60000)
test_data = extract_mnist_data(local_urls[2], 10000, 28, 1)
test_labels = extract_mnist_labels(local_urls[3], 10000)
if test_only:
return test_data, test_labels
else:
return train_data, train_labels, test_data, test_labels
def partition_dataset(data, labels, nb_teachers, teacher_id):
"""
Simple partitioning algorithm that returns the right portion of the data
needed by a given teacher out of a certain nb of teachers
Args:
data: input data to be partitioned
labels: output data to be partitioned
nb_teachers: number of teachers in the ensemble (affects size of each
partition)
teacher_id: id of partition to retrieve
"""
# Sanity check
assert len(data) == len(labels)
assert int(teacher_id) < int(nb_teachers)
# This will floor the possible number of batches
batch_len = int(len(data) / nb_teachers)
# Compute start, end indices of partition
start = teacher_id * batch_len
end = (teacher_id+1) * batch_len
# Slice partition off
partition_data = data[start:end]
partition_labels = labels[start:end]
return partition_data, partition_labels


@@ -1,49 +0,0 @@
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def accuracy(logits, labels):
"""
Return accuracy of the array of logits (or label predictions) wrt the labels
:param logits: this can either be logits, probabilities, or a single label
:param labels: the correct labels to match against
:return: the accuracy as a float
"""
assert len(logits) == len(labels)
if len(np.shape(logits)) > 1:
# Predicted labels are the argmax over axis 1
predicted_labels = np.argmax(logits, axis=1)
else:
# Input was already labels
assert len(np.shape(logits)) == 1
predicted_labels = logits
# Check against correct labels to compute correct guesses
correct = np.sum(predicted_labels == labels.reshape(len(labels)))
# Divide by number of labels to obtain accuracy
accuracy = float(correct) / len(labels)
# Return float value
return accuracy


@@ -1,205 +0,0 @@
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import aggregation
import deep_cnn
import input # pylint: disable=redefined-builtin
import metrics
import numpy as np
from six.moves import xrange
import tensorflow as tf
FLAGS = tf.flags.FLAGS
tf.flags.DEFINE_string('dataset', 'svhn', 'The name of the dataset to use')
tf.flags.DEFINE_integer('nb_labels', 10, 'Number of output classes')
tf.flags.DEFINE_string('data_dir', '/tmp', 'Temporary storage')
tf.flags.DEFINE_string('train_dir', '/tmp/train_dir', 'Where model checkpoints are saved')
tf.flags.DEFINE_string('teachers_dir','/tmp/train_dir',
'Directory where teachers checkpoints are stored.')
tf.flags.DEFINE_integer('teachers_max_steps', 3000,
'Number of steps teachers were ran.')
tf.flags.DEFINE_integer('max_steps', 3000, 'Number of steps to run student.')
tf.flags.DEFINE_integer('nb_teachers', 10, 'Teachers in the ensemble.')
tf.flags.DEFINE_integer('stdnt_share', 1000,
'Student share (last index) of the test data')
tf.flags.DEFINE_integer('lap_scale', 10,
'Scale of the Laplacian noise added for privacy')
tf.flags.DEFINE_boolean('save_labels', False,
'Dump numpy arrays of labels and clean teacher votes')
tf.flags.DEFINE_boolean('deeper', False, 'Activate deeper CNN model')
def ensemble_preds(dataset, nb_teachers, stdnt_data):
"""
Given a dataset, a number of teachers, and some input data, this helper
function queries each teacher for predictions on the data and returns
all predictions in a single array. These can then be aggregated into
one single prediction per input using aggregation.py (cf. function
prepare_student_data() below).
:param dataset: string corresponding to mnist, cifar10, or svhn
:param nb_teachers: number of teachers (in the ensemble) to learn from
:param stdnt_data: unlabeled student training data
:return: 3d array (teacher id, sample id, probability per class)
"""
# Compute shape of array that will hold probabilities produced by each
# teacher, for each training point, and each output class
result_shape = (nb_teachers, len(stdnt_data), FLAGS.nb_labels)
# Create array that will hold result
result = np.zeros(result_shape, dtype=np.float32)
# Get predictions from each teacher
for teacher_id in xrange(nb_teachers):
# Compute path of checkpoint file for teacher model with ID teacher_id
if FLAGS.deeper:
ckpt_path = FLAGS.teachers_dir + '/' + str(dataset) + '_' + str(nb_teachers) + '_teachers_' + str(teacher_id) + '_deep.ckpt-' + str(FLAGS.teachers_max_steps - 1) #NOLINT(long-line)
else:
ckpt_path = FLAGS.teachers_dir + '/' + str(dataset) + '_' + str(nb_teachers) + '_teachers_' + str(teacher_id) + '.ckpt-' + str(FLAGS.teachers_max_steps - 1) # NOLINT(long-line)
# Get predictions on our training data and store in result array
result[teacher_id] = deep_cnn.softmax_preds(stdnt_data, ckpt_path)
# This can take a while when there are a lot of teachers so output status
print("Computed Teacher " + str(teacher_id) + " softmax predictions")
return result
def prepare_student_data(dataset, nb_teachers, save=False):
"""
Takes a dataset name and the size of the teacher ensemble and prepares
training data for the student model, according to parameters indicated
in flags above.
:param dataset: string corresponding to mnist, cifar10, or svhn
:param nb_teachers: number of teachers (in the ensemble) to learn from
:param save: if set to True, will dump student training labels predicted by
the ensemble of teachers (with Laplacian noise) as npy files.
It also dumps the clean votes for each class (without noise) and
the labels assigned by teachers
:return: pairs of (data, labels) to be used for student training and testing
"""
assert input.create_dir_if_needed(FLAGS.train_dir)
# Load the dataset
if dataset == 'svhn':
test_data, test_labels = input.ld_svhn(test_only=True)
elif dataset == 'cifar10':
test_data, test_labels = input.ld_cifar10(test_only=True)
elif dataset == 'mnist':
test_data, test_labels = input.ld_mnist(test_only=True)
else:
print("Check value of dataset flag")
return False
# Make sure there is data leftover to be used as a test set
assert FLAGS.stdnt_share < len(test_data)
# Prepare [unlabeled] student training data (subset of test set)
stdnt_data = test_data[:FLAGS.stdnt_share]
# Compute teacher predictions for student training data
teachers_preds = ensemble_preds(dataset, nb_teachers, stdnt_data)
# Aggregate teacher predictions to get student training labels
if not save:
stdnt_labels = aggregation.noisy_max(teachers_preds, FLAGS.lap_scale)
else:
# Request clean votes and clean labels as well
stdnt_labels, clean_votes, labels_for_dump = aggregation.noisy_max(teachers_preds, FLAGS.lap_scale, return_clean_votes=True) #NOLINT(long-line)
# Prepare filepath for numpy dump of clean votes
filepath = FLAGS.data_dir + "/" + str(dataset) + '_' + str(nb_teachers) + '_student_clean_votes_lap_' + str(FLAGS.lap_scale) + '.npy' # NOLINT(long-line)
# Prepare filepath for numpy dump of clean labels
filepath_labels = FLAGS.data_dir + "/" + str(dataset) + '_' + str(nb_teachers) + '_teachers_labels_lap_' + str(FLAGS.lap_scale) + '.npy' # NOLINT(long-line)
# Dump clean_votes array
with tf.gfile.Open(filepath, mode='w') as file_obj:
np.save(file_obj, clean_votes)
# Dump labels_for_dump array
with tf.gfile.Open(filepath_labels, mode='w') as file_obj:
np.save(file_obj, labels_for_dump)
# Print accuracy of aggregated labels
ac_ag_labels = metrics.accuracy(stdnt_labels, test_labels[:FLAGS.stdnt_share])
print("Accuracy of the aggregated labels: " + str(ac_ag_labels))
# Store unused part of test set for use as a test set after student training
stdnt_test_data = test_data[FLAGS.stdnt_share:]
stdnt_test_labels = test_labels[FLAGS.stdnt_share:]
if save:
# Prepare filepath for numpy dump of labels produced by noisy aggregation
filepath = FLAGS.data_dir + "/" + str(dataset) + '_' + str(nb_teachers) + '_student_labels_lap_' + str(FLAGS.lap_scale) + '.npy' #NOLINT(long-line)
# Dump student noisy labels array
with tf.gfile.Open(filepath, mode='w') as file_obj:
np.save(file_obj, stdnt_labels)
return stdnt_data, stdnt_labels, stdnt_test_data, stdnt_test_labels
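# Sketch of the resulting split (illustrative; stdnt_share=1000 and a
# 10000-example test set are assumptions): the student trains on
# test_data[:1000] labeled by the noisy teacher aggregate, and is evaluated on
# test_data[1000:]; teacher predictions are only computed on the first
# stdnt_share examples.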
def train_student(dataset, nb_teachers):
"""
This function trains a student using predictions made by an ensemble of
teachers. The student and teacher models are trained using the same
neural network architecture.
:param dataset: string corresponding to mnist, cifar10, or svhn
:param nb_teachers: number of teachers (in the ensemble) to learn from
:return: True if student training went well
"""
assert input.create_dir_if_needed(FLAGS.train_dir)
# Call helper function to prepare student data using teacher predictions
stdnt_dataset = prepare_student_data(dataset, nb_teachers, save=True)
# Unpack the student dataset
stdnt_data, stdnt_labels, stdnt_test_data, stdnt_test_labels = stdnt_dataset
# Prepare checkpoint filename and path
if FLAGS.deeper:
ckpt_path = FLAGS.train_dir + '/' + str(dataset) + '_' + str(nb_teachers) + '_student_deeper.ckpt' #NOLINT(long-line)
else:
ckpt_path = FLAGS.train_dir + '/' + str(dataset) + '_' + str(nb_teachers) + '_student.ckpt' # NOLINT(long-line)
# Start student training
assert deep_cnn.train(stdnt_data, stdnt_labels, ckpt_path)
# Compute final checkpoint name for student (with max number of steps)
ckpt_path_final = ckpt_path + '-' + str(FLAGS.max_steps - 1)
# Compute student label predictions on remaining chunk of test set
student_preds = deep_cnn.softmax_preds(stdnt_test_data, ckpt_path_final)
  # Compute student accuracy on the held-out test data
precision = metrics.accuracy(student_preds, stdnt_test_labels)
print('Precision of student after training: ' + str(precision))
return True
def main(argv=None): # pylint: disable=unused-argument
# Run student training according to values specified in flags
assert train_student(FLAGS.dataset, FLAGS.nb_teachers)
if __name__ == '__main__':
tf.app.run()
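# Example invocation (illustrative; flag values are only an example, not
# prescribed settings):
#   python train_student.py --dataset=mnist --nb_teachers=250 \
#     --teachers_dir=/tmp/train_dir --stdnt_share=1000 --lap_scale=20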

View file

@ -1,25 +0,0 @@
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# Be sure to clone https://github.com/openai/improved-gan
# and add improved-gan/mnist_svhn_cifar10 to your PATH variable
# Download labels used to train the student
wget https://raw.githubusercontent.com/npapernot/multiple-teachers-for-privacy/master/mnist_250_student_labels_lap_20.npy
# Train the student using improved-gan
THEANO_FLAGS='floatX=float32,device=gpu,lib.cnmem=1' train_mnist_fm_custom_labels.py --labels mnist_250_student_labels_lap_20.npy --count 50 --epochs 600

View file

@ -1,101 +0,0 @@
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import deep_cnn
import input # pylint: disable=redefined-builtin
import metrics
import tensorflow as tf
tf.flags.DEFINE_string('dataset', 'svhn', 'The name of the dataset to use')
tf.flags.DEFINE_integer('nb_labels', 10, 'Number of output classes')
tf.flags.DEFINE_string('data_dir', '/tmp', 'Temporary storage')
tf.flags.DEFINE_string('train_dir', '/tmp/train_dir',
                       'Where model checkpoints are saved')
tf.flags.DEFINE_integer('max_steps', 3000, 'Number of training steps to run.')
tf.flags.DEFINE_integer('nb_teachers', 50, 'Teachers in the ensemble.')
tf.flags.DEFINE_integer('teacher_id', 0, 'ID of teacher being trained.')
tf.flags.DEFINE_boolean('deeper', False, 'Activate deeper CNN model')
FLAGS = tf.flags.FLAGS
def train_teacher(dataset, nb_teachers, teacher_id):
"""
This function trains a teacher (teacher id) among an ensemble of nb_teachers
models for the dataset specified.
  :param dataset: string corresponding to dataset (svhn, cifar10, or mnist)
:param nb_teachers: total number of teachers in the ensemble
:param teacher_id: id of the teacher being trained
:return: True if everything went well
"""
# If working directories do not exist, create them
assert input.create_dir_if_needed(FLAGS.data_dir)
assert input.create_dir_if_needed(FLAGS.train_dir)
# Load the dataset
if dataset == 'svhn':
train_data,train_labels,test_data,test_labels = input.ld_svhn(extended=True)
elif dataset == 'cifar10':
train_data, train_labels, test_data, test_labels = input.ld_cifar10()
elif dataset == 'mnist':
train_data, train_labels, test_data, test_labels = input.ld_mnist()
else:
print("Check value of dataset flag")
return False
# Retrieve subset of data for this teacher
data, labels = input.partition_dataset(train_data,
train_labels,
nb_teachers,
teacher_id)
print("Length of training data: " + str(len(labels)))
# Define teacher checkpoint filename and full path
if FLAGS.deeper:
filename = str(nb_teachers) + '_teachers_' + str(teacher_id) + '_deep.ckpt'
else:
filename = str(nb_teachers) + '_teachers_' + str(teacher_id) + '.ckpt'
ckpt_path = FLAGS.train_dir + '/' + str(dataset) + '_' + filename
# Perform teacher training
assert deep_cnn.train(data, labels, ckpt_path)
# Append final step value to checkpoint for evaluation
ckpt_path_final = ckpt_path + '-' + str(FLAGS.max_steps - 1)
# Retrieve teacher probability estimates on the test data
teacher_preds = deep_cnn.softmax_preds(test_data, ckpt_path_final)
# Compute teacher accuracy
precision = metrics.accuracy(teacher_preds, test_labels)
print('Precision of teacher after training: ' + str(precision))
return True
def main(argv=None): # pylint: disable=unused-argument
# Make a call to train_teachers with values specified in flags
assert train_teacher(FLAGS.dataset, FLAGS.nb_teachers, FLAGS.teacher_id)
if __name__ == '__main__':
tf.app.run()
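# Example invocation (illustrative; one such run is needed for each teacher_id
# in [0, nb_teachers) to build the full ensemble):
#   python train_teachers.py --dataset=mnist --nb_teachers=250 --teacher_id=0 \
#     --data_dir=/tmp --train_dir=/tmp/train_dir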

View file

@ -1,35 +0,0 @@
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
def batch_indices(batch_nb, data_length, batch_size):
"""
This helper function computes a batch start and end index
:param batch_nb: the batch number
:param data_length: the total length of the data being parsed by batches
:param batch_size: the number of inputs in each batch
:return: pair of (start, end) indices
"""
# Batch start and end index
start = int(batch_nb * batch_size)
end = int((batch_nb + 1) * batch_size)
# When there are not enough inputs left, we reuse some to complete the batch
if end > data_length:
shift = end - data_length
start -= shift
end -= shift
return start, end
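# Worked example (illustrative): with data_length=10 and batch_size=4, batch
# number 2 nominally spans [8, 12); since 12 > 10, both indices are shifted
# back by 2, yielding (6, 10), so the final batch is completed by reusing two
# earlier inputs.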

View file

@ -1,61 +0,0 @@
Scripts in support of the paper "Scalable Private Learning with PATE" by Nicolas
Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, Ulfar
Erlingsson (ICLR 2018, https://arxiv.org/abs/1802.08908).
### Requirements
* Python, version &ge; 2.7
* absl (see [here](https://github.com/abseil/abseil-py), or just type `pip install absl-py`)
* matplotlib
* numpy
* scipy
* sympy (for smooth sensitivity analysis)
* write access to the current directory (otherwise, output directories in download.py and *.sh
scripts must be changed)
## Reproducing Figures 1 and 5, and Table 2
Before running any of the analysis scripts, create the data/ directory and download votes files by running\
`$ python download.py`
To generate Figures 1 and 5 run\
`$ sh generate_figures.sh`\
The output is written to the figures/ directory.
For Table 2 run (may take several hours)\
`$ sh generate_table.sh`\
The output is written to the console.
For data-independent bounds (for comparison with Table 2), run\
`$ sh generate_table_data_independent.sh`\
The output is written to the console.
## Files in this directory
* generate_figures.sh &mdash; Master script for generating Figures 1 and 5.
* generate_table.sh &mdash; Master script for generating Table 2.
* generate_table_data_independent.sh &mdash; Master script for computing data-independent
bounds.
* rdp_bucketized.py &mdash; Script for producing Figure 1 (right) and Figure 5 (right).
* rdp_cumulative.py &mdash; Script for producing Figure 1 (middle) and Figure 5 (left).
* smooth_sensitivity_table.py &mdash; Script for generating Table 2.
* utility_queries_answered.py &mdash; Script for producing Figure 1 (left).
* plot_partition.py &mdash; Script for producing partition.pdf, a detailed breakdown of privacy
costs for Confident-GNMax with smooth sensitivity analysis (takes ~50 hours).
* plots_for_slides.py &mdash; Script for producing several plots for the slide deck.
* download.py &mdash; Utility script for populating the data/ directory.
* plot_ls_q.py &mdash; Not presently used.
All Python files take flags. Run `script_name.py --help` for help on flags.
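Counts files (e.g. `data/mnist_250_teachers.npy`, referenced by the generate_table scripts) store a
numpy array of teacher votes with one query per row and one class per column, so each row sums to
the number of teachers. As a quick sanity check &mdash; illustrative only, not part of the original
scripts &mdash; run\
`$ python -c "import numpy as np; v = np.load('data/mnist_250_teachers.npy'); print(v.shape, v[0].sum())"`\
which should report 10 columns and a per-row total of 250 for the MNIST ensemble.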

View file

@ -1,43 +0,0 @@
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Script to download votes files to the data/ directory.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from six.moves import urllib
import os
import tarfile
FILE_URI = 'https://storage.googleapis.com/pate-votes/votes.gz'
DATA_DIR = 'data/'
def download():
print('Downloading ' + FILE_URI)
tar_filename, _ = urllib.request.urlretrieve(FILE_URI)
print('Unpacking ' + tar_filename)
with tarfile.open(tar_filename, "r:gz") as tar:
tar.extractall(DATA_DIR)
print('Done!')
if __name__ == '__main__':
if not os.path.exists(DATA_DIR):
print('Data directory does not exist. Creating ' + DATA_DIR)
os.makedirs(DATA_DIR)
download()

View file

@ -1,43 +0,0 @@
#!/bin/bash
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
counts_file="data/glyph_5000_teachers.npy"
output_dir="figures/"
mkdir -p $output_dir
if [ ! -d "$output_dir" ]; then
echo "Directory $output_dir does not exist."
exit 1
fi
python rdp_bucketized.py \
--plot=small \
--counts_file=$counts_file \
--plot_file=$output_dir"noisy_thresholding_check_perf.pdf"
python rdp_bucketized.py \
--plot=large \
--counts_file=$counts_file \
--plot_file=$output_dir"noisy_thresholding_check_perf_details.pdf"
python rdp_cumulative.py \
--cache=False \
--counts_file=$counts_file \
--figures_dir=$output_dir
python utility_queries_answered.py --plot_file=$output_dir"utility_queries_answered.pdf"

View file

@ -1,93 +0,0 @@
#!/bin/bash
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
echo "Reproducing Table 2. Takes a couple of hours."
executable="python smooth_sensitivity_table.py"
data_dir="data"
echo
echo "######## MNIST ########"
echo
$executable \
--counts_file=$data_dir"/mnist_250_teachers.npy" \
--threshold=200 \
--sigma1=150 \
--sigma2=40 \
--queries=640 \
--delta=1e-5
echo
echo "######## SVHN ########"
echo
$executable \
--counts_file=$data_dir"/svhn_250_teachers.npy" \
--threshold=300 \
--sigma1=200 \
--sigma2=40 \
--queries=8500 \
--delta=1e-6
echo
echo "######## Adult ########"
echo
$executable \
--counts_file=$data_dir"/adult_250_teachers.npy" \
--threshold=300 \
--sigma1=200 \
--sigma2=40 \
--queries=1500 \
--delta=1e-5
echo
echo "######## Glyph (Confident) ########"
echo
$executable \
--counts_file=$data_dir"/glyph_5000_teachers.npy" \
--threshold=1000 \
--sigma1=500 \
--sigma2=100 \
--queries=12000 \
--delta=1e-8
echo
echo "######## Glyph (Interactive, Round 1) ########"
echo
$executable \
--counts_file=$data_dir"/glyph_round1.npy" \
--threshold=3500 \
--sigma1=1500 \
--sigma2=100 \
--delta=1e-8
echo
echo "######## Glyph (Interactive, Round 2) ########"
echo
$executable \
--counts_file=$data_dir"/glyph_round2.npy" \
--baseline_file=$data_dir"/glyph_round2_student.npy" \
--threshold=3500 \
--sigma1=2000 \
--sigma2=200 \
--teachers=5000 \
--delta=1e-8

View file

@ -1,99 +0,0 @@
#!/bin/bash
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
echo "Table 2 with data-independent analysis."
executable="python smooth_sensitivity_table.py"
data_dir="data"
echo
echo "######## MNIST ########"
echo
$executable \
--counts_file=$data_dir"/mnist_250_teachers.npy" \
--threshold=200 \
--sigma1=150 \
--sigma2=40 \
--queries=640 \
--delta=1e-5 \
--data_independent
echo
echo "######## SVHN ########"
echo
$executable \
--counts_file=$data_dir"/svhn_250_teachers.npy" \
--threshold=300 \
--sigma1=200 \
--sigma2=40 \
--queries=8500 \
--delta=1e-6 \
--data_independent
echo
echo "######## Adult ########"
echo
$executable \
--counts_file=$data_dir"/adult_250_teachers.npy" \
--threshold=300 \
--sigma1=200 \
--sigma2=40 \
--queries=1500 \
--delta=1e-5 \
--data_independent
echo
echo "######## Glyph (Confident) ########"
echo
$executable \
--counts_file=$data_dir"/glyph_5000_teachers.npy" \
--threshold=1000 \
--sigma1=500 \
--sigma2=100 \
--queries=12000 \
--delta=1e-8 \
--data_independent
echo
echo "######## Glyph (Interactive, Round 1) ########"
echo
$executable \
--counts_file=$data_dir"/glyph_round1.npy" \
--threshold=3500 \
--sigma1=1500 \
--sigma2=100 \
--delta=1e-8 \
--data_independent
echo
echo "######## Glyph (Interactive, Round 2) ########"
echo
$executable \
--counts_file=$data_dir"/glyph_round2.npy" \
--baseline_file=$data_dir"/glyph_round2_student.npy" \
--threshold=3500 \
--sigma1=2000 \
--sigma2=200 \
--teachers=5000 \
--delta=1e-8 \
--order=8.5 \
--data_independent

View file

@ -1,105 +0,0 @@
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Plots LS(q).
A script in support of the PATE2 paper. NOT PRESENTLY USED.
The output is written to a specified directory as a pdf file.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import os
import sys
sys.path.append('..') # Main modules reside in the parent directory.
from absl import app
from absl import flags
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt # pylint: disable=g-import-not-at-top
import numpy as np
import smooth_sensitivity as pate_ss
plt.style.use('ggplot')
FLAGS = flags.FLAGS
flags.DEFINE_string('figures_dir', '', 'Path where the output is written to.')
def compute_ls_q(sigma, order, num_classes):
def beta(q):
return pate_ss._compute_rdp_gnmax(sigma, math.log(q), order)
def bu(q):
return pate_ss._compute_bu_gnmax(q, sigma, order)
def bl(q):
return pate_ss._compute_bl_gnmax(q, sigma, order)
def delta_beta(q):
if q == 0 or q > .8:
return 0
beta_q = beta(q)
beta_bu_q = beta(bu(q))
beta_bl_q = beta(bl(q))
assert beta_bl_q <= beta_q <= beta_bu_q
return beta_bu_q - beta_q # max(beta_bu_q - beta_q, beta_q - beta_bl_q)
logq0 = pate_ss.compute_logq0_gnmax(sigma, order)
logq1 = pate_ss._compute_logq1(sigma, order, num_classes)
print(math.exp(logq1), math.exp(logq0))
xs = np.linspace(0, .1, num=1000, endpoint=True)
ys = [delta_beta(x) for x in xs]
return xs, ys
def main(argv):
del argv # Unused.
sigma = 20
order = 20.
num_classes = 10
# sigma = 20
# order = 25.
# num_classes = 10
x_axis, ys = compute_ls_q(sigma, order, num_classes)
fig, ax = plt.subplots()
fig.set_figheight(4.5)
fig.set_figwidth(4.7)
ax.plot(x_axis, ys, alpha=.8, linewidth=5)
  plt.xlabel(r'$q$', fontsize=16)
  plt.ylabel(r'$\Delta\beta(q)$', fontsize=16)
ax.tick_params(labelsize=14)
fout_name = os.path.join(FLAGS.figures_dir, 'ls_of_q.pdf')
  print('Saving the graph to ' + fout_name)
  fig.savefig(fout_name, bbox_inches='tight')
  plt.show()
plt.close('all')
if __name__ == '__main__':
app.run(main)

View file

@ -1,412 +0,0 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Produces two plots. One compares aggregators and their analyses. The other
illustrates sources of privacy loss for Confident-GNMax.
A script in support of the paper "Scalable Private Learning with PATE" by
Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar,
Ulfar Erlingsson (https://arxiv.org/abs/1802.08908).
The input is a file containing a numpy array of votes, one query per row, one
class per column. Ex:
43, 1821, ..., 3
31, 16, ..., 0
...
0, 86, ..., 438
The output is written to a specified directory and consists of two files.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import os
import pickle
import sys
sys.path.append('..') # Main modules reside in the parent directory.
from absl import app
from absl import flags
from collections import namedtuple
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt # pylint: disable=g-import-not-at-top
import numpy as np
import core as pate
import smooth_sensitivity as pate_ss
plt.style.use('ggplot')
FLAGS = flags.FLAGS
flags.DEFINE_boolean('cache', False,
'Read results of privacy analysis from cache.')
flags.DEFINE_string('counts_file', None, 'Counts file.')
flags.DEFINE_string('figures_dir', '', 'Path where figures are written to.')
flags.DEFINE_float('threshold', None, 'Threshold for step 1 (selection).')
flags.DEFINE_float('sigma1', None, 'Sigma for step 1 (selection).')
flags.DEFINE_float('sigma2', None, 'Sigma for step 2 (argmax).')
flags.DEFINE_integer('queries', None, 'Number of queries made by the student.')
flags.DEFINE_float('delta', 1e-8, 'Target delta.')
flags.mark_flag_as_required('counts_file')
flags.mark_flag_as_required('threshold')
flags.mark_flag_as_required('sigma1')
flags.mark_flag_as_required('sigma2')
Partition = namedtuple('Partition', ['step1', 'step2', 'ss', 'delta'])
def analyze_gnmax_conf_data_ind(votes, threshold, sigma1, sigma2, delta):
orders = np.logspace(np.log10(1.5), np.log10(500), num=100)
n = votes.shape[0]
rdp_total = np.zeros(len(orders))
answered_total = 0
answered = np.zeros(n)
eps_cum = np.full(n, None, dtype=float)
for i in range(n):
v = votes[i,]
if threshold is not None and sigma1 is not None:
q_step1 = np.exp(pate.compute_logpr_answered(threshold, sigma1, v))
rdp_total += pate.rdp_data_independent_gaussian(sigma1, orders)
else:
q_step1 = 1. # always answer
answered_total += q_step1
answered[i] = answered_total
rdp_total += q_step1 * pate.rdp_data_independent_gaussian(sigma2, orders)
eps_cum[i], order_opt = pate.compute_eps_from_delta(orders, rdp_total,
delta)
if i > 0 and (i + 1) % 1000 == 0:
print('queries = {}, E[answered] = {:.2f}, E[eps] = {:.3f} '
'at order = {:.2f}.'.format(
i + 1,
answered[i],
eps_cum[i],
order_opt))
sys.stdout.flush()
return eps_cum, answered
def analyze_gnmax_conf_data_dep(votes, threshold, sigma1, sigma2, delta):
# Short list of orders.
# orders = np.round(np.logspace(np.log10(20), np.log10(200), num=20))
# Long list of orders.
orders = np.concatenate((np.arange(20, 40, .2),
np.arange(40, 75, .5),
np.logspace(np.log10(75), np.log10(200), num=20)))
n = votes.shape[0]
num_classes = votes.shape[1]
num_teachers = int(sum(votes[0,]))
if threshold is not None and sigma1 is not None:
is_data_ind_step1 = pate.is_data_independent_always_opt_gaussian(
num_teachers, num_classes, sigma1, orders)
else:
is_data_ind_step1 = [True] * len(orders)
is_data_ind_step2 = pate.is_data_independent_always_opt_gaussian(
num_teachers, num_classes, sigma2, orders)
eps_partitioned = np.full(n, None, dtype=Partition)
order_opt = np.full(n, None, dtype=float)
ss_std_opt = np.full(n, None, dtype=float)
answered = np.zeros(n)
rdp_step1_total = np.zeros(len(orders))
rdp_step2_total = np.zeros(len(orders))
ls_total = np.zeros((len(orders), num_teachers))
answered_total = 0
for i in range(n):
v = votes[i,]
if threshold is not None and sigma1 is not None:
logq_step1 = pate.compute_logpr_answered(threshold, sigma1, v)
rdp_step1_total += pate.compute_rdp_threshold(logq_step1, sigma1, orders)
else:
logq_step1 = 0. # always answer
pr_answered = np.exp(logq_step1)
logq_step2 = pate.compute_logq_gaussian(v, sigma2)
rdp_step2_total += pr_answered * pate.rdp_gaussian(logq_step2, sigma2,
orders)
answered_total += pr_answered
rdp_ss = np.zeros(len(orders))
ss_std = np.zeros(len(orders))
for j, order in enumerate(orders):
if not is_data_ind_step1[j]:
ls_step1 = pate_ss.compute_local_sensitivity_bounds_threshold(v,
num_teachers, threshold, sigma1, order)
else:
ls_step1 = np.full(num_teachers, 0, dtype=float)
if not is_data_ind_step2[j]:
ls_step2 = pate_ss.compute_local_sensitivity_bounds_gnmax(
v, num_teachers, sigma2, order)
else:
ls_step2 = np.full(num_teachers, 0, dtype=float)
ls_total[j,] += ls_step1 + pr_answered * ls_step2
beta_ss = .49 / order
ss = pate_ss.compute_discounted_max(beta_ss, ls_total[j,])
sigma_ss = ((order * math.exp(2 * beta_ss)) / ss) ** (1 / 3)
rdp_ss[j] = pate_ss.compute_rdp_of_smooth_sensitivity_gaussian(
beta_ss, sigma_ss, order)
ss_std[j] = ss * sigma_ss
rdp_total = rdp_step1_total + rdp_step2_total + rdp_ss
answered[i] = answered_total
_, order_opt[i] = pate.compute_eps_from_delta(orders, rdp_total, delta)
order_idx = np.searchsorted(orders, order_opt[i])
# Since optimal orders are always non-increasing, shrink orders array
# and all cumulative arrays to speed up computation.
if order_idx < len(orders):
orders = orders[:order_idx + 1]
rdp_step1_total = rdp_step1_total[:order_idx + 1]
rdp_step2_total = rdp_step2_total[:order_idx + 1]
eps_partitioned[i] = Partition(step1=rdp_step1_total[order_idx],
step2=rdp_step2_total[order_idx],
ss=rdp_ss[order_idx],
delta=-math.log(delta) / (order_opt[i] - 1))
ss_std_opt[i] = ss_std[order_idx]
if i > 0 and (i + 1) % 1 == 0:
print('queries = {}, E[answered] = {:.2f}, E[eps] = {:.3f} +/- {:.3f} '
'at order = {:.2f}. Contributions: delta = {:.3f}, step1 = {:.3f}, '
'step2 = {:.3f}, ss = {:.3f}'.format(
i + 1,
answered[i],
sum(eps_partitioned[i]),
ss_std_opt[i],
order_opt[i],
eps_partitioned[i].delta,
eps_partitioned[i].step1,
eps_partitioned[i].step2,
eps_partitioned[i].ss))
sys.stdout.flush()
return eps_partitioned, answered, ss_std_opt, order_opt
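# Accounting sketch (restating the code above): at the optimal Renyi order
# lambda, the reported epsilon decomposes as
#   eps = rdp_step1(lambda) + rdp_step2(lambda) + rdp_ss(lambda)
#         + log(1/delta) / (lambda - 1),
# i.e. the sum of the four fields of Partition that the progress output prints.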
def plot_comparison(figures_dir, simple_ind, conf_ind, simple_dep, conf_dep):
"""Plots variants of GNMax algorithm and their analyses.
"""
def pivot(x_axis, eps, answered):
    y = np.full(len(x_axis), None, dtype=float)  # eps resampled onto x_axis
for i, x in enumerate(x_axis):
idx = np.searchsorted(answered, x)
if idx < len(eps):
y[i] = eps[idx]
return y
def pivot_dep(x_axis, data_dep):
eps_partitioned, answered, _, _ = data_dep
eps = [sum(p) for p in eps_partitioned] # Flatten eps
return pivot(x_axis, eps, answered)
xlim = 10000
x_axis = range(0, xlim, 10)
y_simple_ind = pivot(x_axis, *simple_ind)
y_conf_ind = pivot(x_axis, *conf_ind)
y_simple_dep = pivot_dep(x_axis, simple_dep)
y_conf_dep = pivot_dep(x_axis, conf_dep)
# plt.close('all')
fig, ax = plt.subplots()
fig.set_figheight(4.5)
fig.set_figwidth(4.7)
ax.plot(x_axis, y_simple_ind, ls='--', color='r', lw=3, label=r'Simple GNMax, data-ind analysis')
ax.plot(x_axis, y_conf_ind, ls='--', color='b', lw=3, label=r'Confident GNMax, data-ind analysis')
ax.plot(x_axis, y_simple_dep, ls='-', color='r', lw=3, label=r'Simple GNMax, data-dep analysis')
ax.plot(x_axis, y_conf_dep, ls='-', color='b', lw=3, label=r'Confident GNMax, data-dep analysis')
plt.xticks(np.arange(0, xlim + 1000, 2000))
plt.xlim([0, xlim])
plt.ylim(bottom=0)
plt.legend(fontsize=16)
ax.set_xlabel('Number of queries answered', fontsize=16)
ax.set_ylabel(r'Privacy cost $\varepsilon$ at $\delta=10^{-8}$', fontsize=16)
ax.tick_params(labelsize=14)
plot_filename = os.path.join(figures_dir, 'comparison.pdf')
print('Saving the graph to ' + plot_filename)
fig.savefig(plot_filename, bbox_inches='tight')
plt.show()
def plot_partition(figures_dir, gnmax_conf, print_order):
"""Plots an expert version of the privacy-per-answered-query graph.
Args:
figures_dir: A name of the directory where to save the plot.
eps: The cumulative privacy cost.
partition: Allocation of the privacy cost.
answered: Cumulative number of queries answered.
order_opt: The list of optimal orders.
"""
eps_partitioned, answered, ss_std_opt, order_opt = gnmax_conf
xlim = 10000
x = range(0, int(xlim), 10)
lenx = len(x)
y0 = np.full(lenx, np.nan, dtype=float) # delta
y1 = np.full(lenx, np.nan, dtype=float) # delta + step1
y2 = np.full(lenx, np.nan, dtype=float) # delta + step1 + step2
y3 = np.full(lenx, np.nan, dtype=float) # delta + step1 + step2 + ss
noise_std = np.full(lenx, np.nan, dtype=float)
y_right = np.full(lenx, np.nan, dtype=float)
for i in range(lenx):
idx = np.searchsorted(answered, x[i])
if idx < len(eps_partitioned):
y0[i] = eps_partitioned[idx].delta
y1[i] = y0[i] + eps_partitioned[idx].step1
y2[i] = y1[i] + eps_partitioned[idx].step2
y3[i] = y2[i] + eps_partitioned[idx].ss
noise_std[i] = ss_std_opt[idx]
y_right[i] = order_opt[idx]
# plt.close('all')
fig, ax = plt.subplots()
fig.set_figheight(4.5)
fig.set_figwidth(4.7)
fig.patch.set_alpha(0)
l1 = ax.plot(
x, y3, color='b', ls='-', label=r'Total privacy cost', linewidth=1).pop()
for y in (y0, y1, y2):
ax.plot(x, y, color='b', ls='-', label=r'_nolegend_', alpha=.5, linewidth=1)
ax.fill_between(x, [0] * lenx, y0.tolist(), facecolor='b', alpha=.5)
ax.fill_between(x, y0.tolist(), y1.tolist(), facecolor='b', alpha=.4)
ax.fill_between(x, y1.tolist(), y2.tolist(), facecolor='b', alpha=.3)
ax.fill_between(x, y2.tolist(), y3.tolist(), facecolor='b', alpha=.2)
ax.fill_between(x, (y3 - noise_std).tolist(), (y3 + noise_std).tolist(),
facecolor='r', alpha=.5)
plt.xticks(np.arange(0, xlim + 1000, 2000))
plt.xlim([0, xlim])
ax.set_ylim([0, 3.])
ax.set_xlabel('Number of queries answered', fontsize=16)
ax.set_ylabel(r'Privacy cost $\varepsilon$ at $\delta=10^{-8}$', fontsize=16)
# Merging legends.
if print_order:
ax2 = ax.twinx()
l2 = ax2.plot(
x, y_right, 'r', ls='-', label=r'Optimal order', linewidth=5,
alpha=.5).pop()
ax2.grid(False)
# ax2.set_ylabel(r'Optimal Renyi order', fontsize=16)
ax2.set_ylim([0, 200.])
# ax.legend((l1, l2), (l1.get_label(), l2.get_label()), loc=0, fontsize=13)
ax.tick_params(labelsize=14)
plot_filename = os.path.join(figures_dir, 'partition.pdf')
print('Saving the graph to ' + plot_filename)
fig.savefig(plot_filename, bbox_inches='tight', dpi=800)
plt.show()
def run_all_analyses(votes, threshold, sigma1, sigma2, delta):
simple_ind = analyze_gnmax_conf_data_ind(votes, None, None, sigma2,
delta)
conf_ind = analyze_gnmax_conf_data_ind(votes, threshold, sigma1, sigma2,
delta)
simple_dep = analyze_gnmax_conf_data_dep(votes, None, None, sigma2,
delta)
conf_dep = analyze_gnmax_conf_data_dep(votes, threshold, sigma1, sigma2,
delta)
return (simple_ind, conf_ind, simple_dep, conf_dep)
def run_or_load_all_analyses():
temp_filename = os.path.expanduser('~/tmp/partition_cached.pkl')
if FLAGS.cache and os.path.isfile(temp_filename):
print('Reading from cache ' + temp_filename)
with open(temp_filename, 'rb') as f:
all_analyses = pickle.load(f)
else:
fin_name = os.path.expanduser(FLAGS.counts_file)
print('Reading raw votes from ' + fin_name)
sys.stdout.flush()
votes = np.load(fin_name)
if FLAGS.queries is not None:
if votes.shape[0] < FLAGS.queries:
        raise ValueError('Expected at least {} rows, got {} in {}'.format(
FLAGS.queries, votes.shape[0], fin_name))
# Truncate the votes matrix to the number of queries made.
votes = votes[:FLAGS.queries, ]
all_analyses = run_all_analyses(votes, FLAGS.threshold, FLAGS.sigma1,
FLAGS.sigma2, FLAGS.delta)
print('Writing to cache ' + temp_filename)
with open(temp_filename, 'wb') as f:
pickle.dump(all_analyses, f)
return all_analyses
def main(argv):
del argv # Unused.
simple_ind, conf_ind, simple_dep, conf_dep = run_or_load_all_analyses()
figures_dir = os.path.expanduser(FLAGS.figures_dir)
plot_comparison(figures_dir, simple_ind, conf_ind, simple_dep, conf_dep)
plot_partition(figures_dir, conf_dep, True)
plt.close('all')
if __name__ == '__main__':
app.run(main)

View file

@ -1,283 +0,0 @@
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Plots graphs for the slide deck.
A script in support of the PATE2 paper. The input is a file containing a numpy
array of votes, one query per row, one class per column. Ex:
43, 1821, ..., 3
31, 16, ..., 0
...
0, 86, ..., 438
The output graphs are visualized using the TkAgg backend.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import os
import sys
sys.path.append('..') # Main modules reside in the parent directory.
from absl import app
from absl import flags
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt # pylint: disable=g-import-not-at-top
import numpy as np
import core as pate
import random
plt.style.use('ggplot')
FLAGS = flags.FLAGS
flags.DEFINE_string('counts_file', None, 'Counts file.')
flags.DEFINE_string('figures_dir', '', 'Path where figures are written to.')
flags.DEFINE_boolean('transparent', False, 'Set background to transparent.')
flags.mark_flag_as_required('counts_file')
def setup_plot():
fig, ax = plt.subplots()
fig.set_figheight(4.5)
fig.set_figwidth(4.7)
if FLAGS.transparent:
fig.patch.set_alpha(0)
return fig, ax
def plot_rdp_curve_per_example(votes, sigmas):
orders = np.linspace(1., 100., endpoint=True, num=1000)
orders[0] = 1.001
fig, ax = setup_plot()
for i in range(votes.shape[0]):
for sigma in sigmas:
logq = pate.compute_logq_gaussian(votes[i,], sigma)
rdp = pate.rdp_gaussian(logq, sigma, orders)
ax.plot(
orders,
rdp,
alpha=1.,
label=r'Data-dependent bound, $\sigma$={}'.format(int(sigma)),
linewidth=5)
for sigma in sigmas:
ax.plot(
orders,
pate.rdp_data_independent_gaussian(sigma, orders),
alpha=.3,
label=r'Data-independent bound, $\sigma$={}'.format(int(sigma)),
linewidth=10)
plt.xlim(xmin=1, xmax=100)
plt.ylim(ymin=0)
plt.xticks([1, 20, 40, 60, 80, 100])
plt.yticks([0, .0025, .005, .0075, .01])
plt.xlabel(r'Order $\alpha$', fontsize=16)
plt.ylabel(r'RDP value $\varepsilon$ at $\alpha$', fontsize=16)
ax.tick_params(labelsize=14)
plt.legend(loc=0, fontsize=13)
plt.show()
def plot_rdp_of_sigma(v, order):
sigmas = np.linspace(1., 1000., endpoint=True, num=1000)
fig, ax = setup_plot()
y = np.zeros(len(sigmas))
for i, sigma in enumerate(sigmas):
logq = pate.compute_logq_gaussian(v, sigma)
y[i] = pate.rdp_gaussian(logq, sigma, order)
ax.plot(sigmas, y, alpha=.8, linewidth=5)
plt.xlim(xmin=1, xmax=1000)
plt.ylim(ymin=0)
# plt.yticks([0, .0004, .0008, .0012])
ax.tick_params(labelleft='off')
plt.xlabel(r'Noise $\sigma$', fontsize=16)
plt.ylabel(r'RDP at order $\alpha={}$'.format(order), fontsize=16)
ax.tick_params(labelsize=14)
# plt.legend(loc=0, fontsize=13)
plt.show()
def compute_rdp_curve(votes, threshold, sigma1, sigma2, orders,
target_answered):
rdp_cum = np.zeros(len(orders))
answered = 0
for i, v in enumerate(votes):
v = sorted(v, reverse=True)
q_step1 = math.exp(pate.compute_logpr_answered(threshold, sigma1, v))
logq_step2 = pate.compute_logq_gaussian(v, sigma2)
rdp = pate.rdp_gaussian(logq_step2, sigma2, orders)
rdp_cum += q_step1 * rdp
answered += q_step1
if answered >= target_answered:
print('Processed {} queries to answer {}.'.format(i, target_answered))
return rdp_cum
assert False, 'Never reached {} answered queries.'.format(target_answered)
def plot_rdp_total(votes, sigmas):
orders = np.linspace(1., 100., endpoint=True, num=100)
orders[0] = 1.1
fig, ax = setup_plot()
target_answered = 2000
for sigma in sigmas:
rdp = compute_rdp_curve(votes, 5000, 1000, sigma, orders, target_answered)
ax.plot(
orders,
rdp,
alpha=.8,
label=r'Data-dependent bound, $\sigma$={}'.format(int(sigma)),
linewidth=5)
# for sigma in sigmas:
# ax.plot(
# orders,
# target_answered * pate.rdp_data_independent_gaussian(sigma, orders),
# alpha=.3,
# label=r'Data-independent bound, $\sigma$={}'.format(int(sigma)),
# linewidth=10)
plt.xlim(xmin=1, xmax=100)
plt.ylim(ymin=0)
plt.xticks([1, 20, 40, 60, 80, 100])
plt.yticks([0, .0005, .001, .0015, .002])
plt.xlabel(r'Order $\alpha$', fontsize=16)
plt.ylabel(r'RDP value $\varepsilon$ at $\alpha$', fontsize=16)
ax.tick_params(labelsize=14)
plt.legend(loc=0, fontsize=13)
plt.show()
def plot_data_ind_curve():
fig, ax = setup_plot()
orders = np.linspace(1., 10., endpoint=True, num=1000)
orders[0] = 1.01
ax.plot(
orders,
pate.rdp_data_independent_gaussian(1., orders),
alpha=.5,
color='gray',
linewidth=10)
# plt.yticks([])
plt.xlim(xmin=1, xmax=10)
plt.ylim(ymin=0)
plt.xticks([1, 3, 5, 7, 9])
ax.tick_params(labelsize=14)
plt.show()
def plot_two_data_ind_curves():
orders = np.linspace(1., 100., endpoint=True, num=1000)
orders[0] = 1.001
fig, ax = setup_plot()
for sigma in [100, 150]:
ax.plot(
orders,
pate.rdp_data_independent_gaussian(sigma, orders),
alpha=.3,
label=r'Data-independent bound, $\sigma$={}'.format(int(sigma)),
linewidth=10)
plt.xlim(xmin=1, xmax=100)
plt.ylim(ymin=0)
plt.xticks([1, 20, 40, 60, 80, 100])
plt.yticks([0, .0025, .005, .0075, .01])
plt.xlabel(r'Order $\alpha$', fontsize=16)
plt.ylabel(r'RDP value $\varepsilon$ at $\alpha$', fontsize=16)
ax.tick_params(labelsize=14)
plt.legend(loc=0, fontsize=13)
plt.show()
def scatter_plot(votes, threshold, sigma1, sigma2, order):
fig, ax = setup_plot()
x = []
y = []
for i, v in enumerate(votes):
if threshold is not None and sigma1 is not None:
q_step1 = math.exp(pate.compute_logpr_answered(threshold, sigma1, v))
else:
q_step1 = 1.
if random.random() < q_step1:
logq_step2 = pate.compute_logq_gaussian(v, sigma2)
x.append(max(v))
y.append(pate.rdp_gaussian(logq_step2, sigma2, order))
print('Selected {} queries.'.format(len(x)))
# Plot the data-independent curve:
# data_ind = pate.rdp_data_independent_gaussian(sigma, order)
# plt.plot([0, 5000], [data_ind, data_ind], color='tab:blue', linestyle='-', linewidth=2)
ax.set_yscale('log')
plt.xlim(xmin=0, xmax=5000)
plt.ylim(ymin=1e-300, ymax=1)
plt.yticks([1, 1e-100, 1e-200, 1e-300])
plt.scatter(x, y, s=1, alpha=0.5)
plt.ylabel(r'RDP at $\alpha={}$'.format(order), fontsize=16)
plt.xlabel(r'max count', fontsize=16)
ax.tick_params(labelsize=14)
plt.show()
def main(argv):
del argv # Unused.
fin_name = os.path.expanduser(FLAGS.counts_file)
print('Reading raw votes from ' + fin_name)
sys.stdout.flush()
plot_data_ind_curve()
plot_two_data_ind_curves()
v1 = [2550, 2200, 250] # based on votes[2,]
# v2 = [2600, 2200, 200] # based on votes[381,]
plot_rdp_curve_per_example(np.array([v1]), (100., 150.))
plot_rdp_of_sigma(np.array(v1), 20.)
votes = np.load(fin_name)
plot_rdp_total(votes[:12000, ], (100., 150.))
scatter_plot(votes[:6000, ], None, None, 100, 20) # w/o thresholding
scatter_plot(votes[:6000, ], 3500, 1500, 100, 20) # with thresholding
if __name__ == '__main__':
app.run(main)

View file

@ -1,264 +0,0 @@
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Illustrates how noisy thresholding check changes distribution of queries.
A script in support of the paper "Scalable Private Learning with PATE" by
Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar,
Ulfar Erlingsson (https://arxiv.org/abs/1802.08908).
The input is a file containing a numpy array of votes, one query per row, one
class per column. Ex:
43, 1821, ..., 3
31, 16, ..., 0
...
0, 86, ..., 438
The output is one of two graphs depending on the setting of the plot variable.
The output is written to a pdf file.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import os
import sys
sys.path.append('..') # Main modules reside in the parent directory.
from absl import app
from absl import flags
import core as pate
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt # pylint: disable=g-import-not-at-top
import numpy as np
from six.moves import xrange
plt.style.use('ggplot')
FLAGS = flags.FLAGS
flags.DEFINE_enum('plot', 'small', ['small', 'large'],
                  'Selects which of the two plots is produced.')
flags.DEFINE_string('counts_file', None, 'Counts file.')
flags.DEFINE_string('plot_file', '', 'Plot file to write.')
flags.mark_flag_as_required('counts_file')
def compute_count_per_bin(bin_num, votes):
"""Tabulates number of examples in each bin.
Args:
bin_num: Number of bins.
votes: A matrix of votes, where each row contains votes in one instance.
Returns:
Array of counts of length bin_num.
"""
sums = np.sum(votes, axis=1)
# Check that all rows contain the same number of votes.
assert max(sums) == min(sums)
s = max(sums)
counts = np.zeros(bin_num)
n = votes.shape[0]
for i in xrange(n):
v = votes[i,]
bin_idx = int(math.floor(max(v) * bin_num / s))
assert 0 <= bin_idx < bin_num
counts[bin_idx] += 1
return counts
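# Worked example (illustrative): with 5000 teachers (s == 5000) and
# bin_num == 5, a query whose most popular class receives 3500 votes falls in
# bin floor(3500 * 5 / 5000) == 3, i.e. the 60-80% agreement bucket plotted
# in main() below.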
def compute_privacy_cost_per_bins(bin_num, votes, sigma2, order):
"""Outputs average privacy cost per bin.
Args:
bin_num: Number of bins.
votes: A matrix of votes, where each row contains votes in one instance.
sigma2: The scale (std) of the Gaussian noise. (Same as sigma_2 in
Algorithms 1 and 2.)
order: The Renyi order for which privacy cost is computed.
Returns:
Expected eps of RDP (ignoring delta) per example in each bin.
"""
n = votes.shape[0]
bin_counts = np.zeros(bin_num)
bin_rdp = np.zeros(bin_num) # RDP at order=order
for i in xrange(n):
v = votes[i,]
logq = pate.compute_logq_gaussian(v, sigma2)
rdp_at_order = pate.rdp_gaussian(logq, sigma2, order)
bin_idx = int(math.floor(max(v) * bin_num / sum(v)))
assert 0 <= bin_idx < bin_num
bin_counts[bin_idx] += 1
bin_rdp[bin_idx] += rdp_at_order
if (i + 1) % 1000 == 0:
print('example {}'.format(i + 1))
sys.stdout.flush()
return bin_rdp / bin_counts
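# Note (illustrative): the return value is a per-bin average, so a bin with no
# examples yields NaN (0 / 0); for high-agreement bins the data-dependent RDP
# per query can be many orders of magnitude below 1, which is why main() plots
# it on a log scale reaching down to 1e-200.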
def compute_expected_answered_per_bin(bin_num, votes, threshold, sigma1):
"""Computes expected number of answers per bin.
Args:
bin_num: Number of bins.
votes: A matrix of votes, where each row contains votes in one instance.
threshold: The threshold against which check is performed.
sigma1: The std of the Gaussian noise with which check is performed. (Same
as sigma_1 in Algorithms 1 and 2.)
Returns:
Expected number of queries answered per bin.
"""
n = votes.shape[0]
bin_answered = np.zeros(bin_num)
for i in xrange(n):
v = votes[i,]
p = math.exp(pate.compute_logpr_answered(threshold, sigma1, v))
bin_idx = int(math.floor(max(v) * bin_num / sum(v)))
assert 0 <= bin_idx < bin_num
bin_answered[bin_idx] += p
if (i + 1) % 1000 == 0:
print('example {}'.format(i + 1))
sys.stdout.flush()
return bin_answered
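# Worked example (illustrative, assuming the Confident-GNMax check of the
# paper, i.e. answer iff max(v) + N(0, sigma1^2) >= threshold): with the
# "moderate" setting threshold=3500, sigma1=1500 used in main(), a query whose
# top class receives 4000 of 5000 votes is answered with probability
# Pr[N(0, 1500^2) >= -500], roughly 0.63.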
def main(argv):
del argv # Unused.
fin_name = os.path.expanduser(FLAGS.counts_file)
print('Reading raw votes from ' + fin_name)
sys.stdout.flush()
votes = np.load(fin_name)
votes = votes[:4000,] # truncate to 4000 samples
if FLAGS.plot == 'small':
bin_num = 5
m_check = compute_expected_answered_per_bin(bin_num, votes, 3500, 1500)
elif FLAGS.plot == 'large':
bin_num = 10
m_check = compute_expected_answered_per_bin(bin_num, votes, 3500, 1500)
a_check = compute_expected_answered_per_bin(bin_num, votes, 5000, 1500)
eps = compute_privacy_cost_per_bins(bin_num, votes, 100, 50)
else:
raise ValueError('--plot flag must be one of ["small", "large"]')
counts = compute_count_per_bin(bin_num, votes)
bins = np.linspace(0, 100, num=bin_num, endpoint=False)
plt.close('all')
fig, ax = plt.subplots()
if FLAGS.plot == 'small':
fig.set_figheight(5)
fig.set_figwidth(5)
ax.bar(
bins,
counts,
20,
color='orangered',
linestyle='dotted',
linewidth=5,
edgecolor='red',
fill=False,
alpha=.5,
align='edge',
label='LNMax answers')
ax.bar(
bins,
m_check,
20,
color='g',
alpha=.5,
linewidth=0,
edgecolor='g',
align='edge',
label='Confident-GNMax\nanswers')
elif FLAGS.plot == 'large':
fig.set_figheight(4.7)
fig.set_figwidth(7)
ax.bar(
bins,
counts,
10,
linestyle='dashed',
linewidth=5,
edgecolor='red',
fill=False,
alpha=.5,
align='edge',
label='LNMax answers')
ax.bar(
bins,
m_check,
10,
color='g',
alpha=.5,
linewidth=0,
edgecolor='g',
align='edge',
label='Confident-GNMax\nanswers (moderate)')
ax.bar(
bins,
a_check,
10,
color='b',
alpha=.5,
align='edge',
label='Confident-GNMax\nanswers (aggressive)')
ax2 = ax.twinx()
bin_centers = [x + 5 for x in bins]
ax2.plot(bin_centers, eps, 'ko', alpha=.8)
ax2.set_ylim([1e-200, 1.])
ax2.set_yscale('log')
ax2.grid(False)
ax2.set_yticks([1e-3, 1e-50, 1e-100, 1e-150, 1e-200])
plt.tick_params(which='minor', right='off')
ax2.set_ylabel(r'Per query privacy cost $\varepsilon$', fontsize=16)
plt.xlim([0, 100])
ax.set_ylim([0, 2500])
# ax.set_yscale('log')
ax.set_xlabel('Percentage of teachers that agree', fontsize=16)
ax.set_ylabel('Number of queries answered', fontsize=16)
vals = ax.get_xticks()
ax.set_xticklabels([str(int(x)) + '%' for x in vals])
ax.tick_params(labelsize=14, bottom=True, top=True, left=True, right=True)
ax.legend(loc=2, prop={'size': 16})
  # small plot: figures/noisy_thresholding_check_perf.pdf
  # large plot: figures/noisy_thresholding_check_perf_details.pdf
print('Saving the graph to ' + FLAGS.plot_file)
plt.savefig(os.path.expanduser(FLAGS.plot_file), bbox_inches='tight')
plt.show()
if __name__ == '__main__':
app.run(main)

View file

@ -1,378 +0,0 @@
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Plots three graphs illustrating cost of privacy per answered query.
A script in support of the paper "Scalable Private Learning with PATE" by
Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar,
Ulfar Erlingsson (https://arxiv.org/abs/1802.08908).
The input is a file containing a numpy array of votes, one query per row, one
class per column. Ex:
43, 1821, ..., 3
31, 16, ..., 0
...
0, 86, ..., 438
The output is written to a specified directory and consists of three pdf files.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import os
import pickle
import sys
sys.path.append('..') # Main modules reside in the parent directory.
from absl import app
from absl import flags
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt # pylint: disable=g-import-not-at-top
import numpy as np
import core as pate
plt.style.use('ggplot')
FLAGS = flags.FLAGS
flags.DEFINE_boolean('cache', False,
'Read results of privacy analysis from cache.')
flags.DEFINE_string('counts_file', None, 'Counts file.')
flags.DEFINE_string('figures_dir', '', 'Path where figures are written to.')
flags.mark_flag_as_required('counts_file')
def run_analysis(votes, mechanism, noise_scale, params):
"""Computes data-dependent privacy.
Args:
votes: A matrix of votes, where each row contains votes in one instance.
mechanism: A name of the mechanism ('lnmax', 'gnmax', or 'gnmax_conf')
noise_scale: A mechanism privacy parameter.
params: Other privacy parameters.
Returns:
Four lists: cumulative privacy cost epsilon, how privacy budget is split,
how many queries were answered, optimal order.
"""
def compute_partition(order_opt, eps):
order_opt_idx = np.searchsorted(orders, order_opt)
if mechanism == 'gnmax_conf':
p = (rdp_select_cum[order_opt_idx],
rdp_cum[order_opt_idx] - rdp_select_cum[order_opt_idx],
-math.log(delta) / (order_opt - 1))
else:
p = (rdp_cum[order_opt_idx], -math.log(delta) / (order_opt - 1))
return [x / eps for x in p] # Ensures that sum(x) == 1
# Short list of orders.
# orders = np.round(np.concatenate((np.arange(2, 50 + 1, 1),
# np.logspace(np.log10(50), np.log10(1000), num=20))))
# Long list of orders.
orders = np.concatenate((np.arange(2, 100 + 1, .5),
np.logspace(np.log10(100), np.log10(500), num=100)))
delta = 1e-8
n = votes.shape[0]
eps_total = np.zeros(n)
partition = [None] * n
order_opt = np.full(n, np.nan, dtype=float)
answered = np.zeros(n, dtype=float)
rdp_cum = np.zeros(len(orders))
rdp_sqrd_cum = np.zeros(len(orders))
rdp_select_cum = np.zeros(len(orders))
answered_sum = 0
for i in range(n):
v = votes[i,]
if mechanism == 'lnmax':
logq_lnmax = pate.compute_logq_laplace(v, noise_scale)
rdp_query = pate.rdp_pure_eps(logq_lnmax, 2. / noise_scale, orders)
rdp_sqrd = rdp_query ** 2
pr_answered = 1
elif mechanism == 'gnmax':
logq_gmax = pate.compute_logq_gaussian(v, noise_scale)
rdp_query = pate.rdp_gaussian(logq_gmax, noise_scale, orders)
rdp_sqrd = rdp_query ** 2
pr_answered = 1
elif mechanism == 'gnmax_conf':
logq_step1 = pate.compute_logpr_answered(params['t'], params['sigma1'], v)
logq_step2 = pate.compute_logq_gaussian(v, noise_scale)
q_step1 = np.exp(logq_step1)
logq_step1_min = min(logq_step1, math.log1p(-q_step1))
rdp_gnmax_step1 = pate.rdp_gaussian(logq_step1_min,
2 ** .5 * params['sigma1'], orders)
rdp_gnmax_step2 = pate.rdp_gaussian(logq_step2, noise_scale, orders)
rdp_query = rdp_gnmax_step1 + q_step1 * rdp_gnmax_step2
# The expression below evaluates
# E[(cost_of_step_1 + Bernoulli(pr_of_step_2) * cost_of_step_2)^2]
rdp_sqrd = (
rdp_gnmax_step1 ** 2 + 2 * rdp_gnmax_step1 * q_step1 * rdp_gnmax_step2
+ q_step1 * rdp_gnmax_step2 ** 2)
rdp_select_cum += rdp_gnmax_step1
pr_answered = q_step1
else:
raise ValueError(
'Mechanism must be one of ["lnmax", "gnmax", "gnmax_conf"]')
rdp_cum += rdp_query
rdp_sqrd_cum += rdp_sqrd
answered_sum += pr_answered
answered[i] = answered_sum
eps_total[i], order_opt[i] = pate.compute_eps_from_delta(
orders, rdp_cum, delta)
partition[i] = compute_partition(order_opt[i], eps_total[i])
if i > 0 and (i + 1) % 1000 == 0:
rdp_var = rdp_sqrd_cum / i - (
rdp_cum / i) ** 2 # Ignore Bessel's correction.
order_opt_idx = np.searchsorted(orders, order_opt[i])
eps_std = ((i + 1) * rdp_var[order_opt_idx]) ** .5 # Std of the sum.
print(
'queries = {}, E[answered] = {:.2f}, E[eps] = {:.3f} (std = {:.5f}) '
'at order = {:.2f} (contribution from delta = {:.3f})'.format(
i + 1, answered_sum, eps_total[i], eps_std, order_opt[i],
-math.log(delta) / (order_opt[i] - 1)))
sys.stdout.flush()
return eps_total, partition, answered, order_opt
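# Accounting sketch (restating the 'gnmax_conf' branch above): each query
# always pays the step-1 selection cost a = rdp_gnmax_step1 and, with
# probability p = q_step1, also pays the step-2 argmax cost b = rdp_gnmax_step2,
# so E[cost] = a + p*b (rdp_query) and E[cost^2] = a^2 + 2*a*p*b + p*b^2
# (rdp_sqrd); the running variance built from these feeds the eps_std printed
# every 1000 queries.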
def print_plot_small(figures_dir, eps_lap, eps_gnmax, answered_gnmax):
"""Plots a graph of LNMax vs GNMax.
Args:
figures_dir: A name of the directory where to save the plot.
eps_lap: The cumulative privacy costs of the Laplace mechanism.
eps_gnmax: The cumulative privacy costs of the Gaussian mechanism
answered_gnmax: The cumulative count of queries answered.
"""
xlim = 6000
x_axis = range(0, int(xlim), 10)
y_lap = np.zeros(len(x_axis), dtype=float)
y_gnmax = np.full(len(x_axis), np.nan, dtype=float)
for i in range(len(x_axis)):
x = x_axis[i]
y_lap[i] = eps_lap[x]
idx = np.searchsorted(answered_gnmax, x)
if idx < len(eps_gnmax):
y_gnmax[i] = eps_gnmax[idx]
fig, ax = plt.subplots()
fig.set_figheight(4.5)
fig.set_figwidth(4.7)
ax.plot(
x_axis, y_lap, color='r', ls='--', label='LNMax', alpha=.5, linewidth=5)
ax.plot(
x_axis,
y_gnmax,
color='g',
ls='-',
label='Confident-GNMax',
alpha=.5,
linewidth=5)
plt.xticks(np.arange(0, 7000, 1000))
plt.xlim([0, 6000])
plt.ylim([0, 6.])
plt.xlabel('Number of queries answered', fontsize=16)
plt.ylabel(r'Privacy cost $\varepsilon$ at $\delta=10^{-8}$', fontsize=16)
plt.legend(loc=2, fontsize=13) # loc=2 -- upper left
ax.tick_params(labelsize=14)
fout_name = os.path.join(figures_dir, 'lnmax_vs_gnmax.pdf')
print('Saving the graph to ' + fout_name)
fig.savefig(fout_name, bbox_inches='tight')
plt.show()
def print_plot_large(figures_dir, eps_lap, eps_gnmax1, answered_gnmax1,
eps_gnmax2, partition_gnmax2, answered_gnmax2):
"""Plots a graph of LNMax vs GNMax with two parameters.
Args:
figures_dir: A name of the directory where to save the plot.
eps_lap: The cumulative privacy costs of the Laplace mechanism.
eps_gnmax1: The cumulative privacy costs of the Gaussian mechanism (set 1).
answered_gnmax1: The cumulative count of queries answered (set 1).
eps_gnmax2: The cumulative privacy costs of the Gaussian mechanism (set 2).
partition_gnmax2: Allocation of eps for set 2.
answered_gnmax2: The cumulative count of queries answered (set 2).
"""
xlim = 6000
x_axis = range(0, int(xlim), 10)
lenx = len(x_axis)
y_lap = np.zeros(lenx)
y_gnmax1 = np.full(lenx, np.nan, dtype=float)
y_gnmax2 = np.full(lenx, np.nan, dtype=float)
y1_gnmax2 = np.full(lenx, np.nan, dtype=float)
for i in range(lenx):
x = x_axis[i]
y_lap[i] = eps_lap[x]
idx1 = np.searchsorted(answered_gnmax1, x)
if idx1 < len(eps_gnmax1):
y_gnmax1[i] = eps_gnmax1[idx1]
idx2 = np.searchsorted(answered_gnmax2, x)
if idx2 < len(eps_gnmax2):
y_gnmax2[i] = eps_gnmax2[idx2]
fraction_step1, fraction_step2, _ = partition_gnmax2[idx2]
y1_gnmax2[i] = eps_gnmax2[idx2] * fraction_step1 / (
fraction_step1 + fraction_step2)
fig, ax = plt.subplots()
fig.set_figheight(4.5)
fig.set_figwidth(4.7)
ax.plot(
x_axis,
y_lap,
color='r',
ls='dashed',
label='LNMax',
alpha=.5,
linewidth=5)
ax.plot(
x_axis,
y_gnmax1,
color='g',
ls='-',
label='Confident-GNMax (moderate)',
alpha=.5,
linewidth=5)
ax.plot(
x_axis,
y_gnmax2,
color='b',
ls='-',
label='Confident-GNMax (aggressive)',
alpha=.5,
linewidth=5)
ax.fill_between(
x_axis, [0] * lenx,
y1_gnmax2.tolist(),
facecolor='b',
alpha=.3,
hatch='\\')
ax.plot(
x_axis,
y1_gnmax2,
color='b',
ls='-',
label='_nolegend_',
alpha=.5,
linewidth=1)
ax.fill_between(
x_axis, y1_gnmax2.tolist(), y_gnmax2.tolist(), facecolor='b', alpha=.3)
plt.xticks(np.arange(0, 7000, 1000))
plt.xlim([0, xlim])
plt.ylim([0, 1.])
plt.xlabel('Number of queries answered', fontsize=16)
plt.ylabel(r'Privacy cost $\varepsilon$ at $\delta=10^{-8}$', fontsize=16)
plt.legend(loc=2, fontsize=13) # loc=2 -- upper left
ax.tick_params(labelsize=14)
fout_name = os.path.join(figures_dir, 'lnmax_vs_2xgnmax_large.pdf')
print('Saving the graph to ' + fout_name)
fig.savefig(fout_name, bbox_inches='tight')
plt.show()
def run_all_analyses(votes, lambda_laplace, gnmax_parameters, sigma2):
"""Sequentially runs all analyses.
Args:
votes: A matrix of votes, where each row contains votes in one instance.
lambda_laplace: The scale of the Laplace noise (lambda).
gnmax_parameters: A list of parameters for GNMax.
sigma2: Shared parameter for the GNMax mechanisms.
Returns:
Five lists whose length is the number of queries.
"""
print('=== Laplace Mechanism ===')
eps_lap, _, _, _ = run_analysis(votes, 'lnmax', lambda_laplace, None)
print()
  # Currently disabled; kept for reference:
# print('=== Gaussian Mechanism (simple) ===')
# eps, _, _, _ = run_analysis(votes[:n,], 'gnmax', sigma1, None)
eps_gnmax = [[] for p in gnmax_parameters]
partition_gmax = [[] for p in gnmax_parameters]
answered = [[] for p in gnmax_parameters]
order_opt = [[] for p in gnmax_parameters]
for i, p in enumerate(gnmax_parameters):
print('=== Gaussian Mechanism (confident) {}: ==='.format(p))
eps_gnmax[i], partition_gmax[i], answered[i], order_opt[i] = run_analysis(
votes, 'gnmax_conf', sigma2, p)
print()
return eps_lap, eps_gnmax, partition_gmax, answered, order_opt
def main(argv):
del argv # Unused.
lambda_laplace = 50. # corresponds to eps = 1. / lambda_laplace
  # Parameters of the GNMax mechanisms.
gnmax_parameters = ({
't': 1000,
'sigma1': 500
}, {
't': 3500,
'sigma1': 1500
}, {
't': 5000,
'sigma1': 1500
})
sigma2 = 100 # GNMax parameters differ only in Step 1 (selection).
ftemp_name = '/tmp/precomputed.pkl'
figures_dir = os.path.expanduser(FLAGS.figures_dir)
if FLAGS.cache and os.path.isfile(ftemp_name):
print('Reading from cache ' + ftemp_name)
with open(ftemp_name, 'rb') as f:
(eps_lap, eps_gnmax, partition_gmax, answered_gnmax,
orders_opt_gnmax) = pickle.load(f)
else:
fin_name = os.path.expanduser(FLAGS.counts_file)
print('Reading raw votes from ' + fin_name)
sys.stdout.flush()
votes = np.load(fin_name)
(eps_lap, eps_gnmax, partition_gmax,
answered_gnmax, orders_opt_gnmax) = run_all_analyses(
votes, lambda_laplace, gnmax_parameters, sigma2)
print('Writing to cache ' + ftemp_name)
with open(ftemp_name, 'wb') as f:
pickle.dump((eps_lap, eps_gnmax, partition_gmax, answered_gnmax,
orders_opt_gnmax), f)
print_plot_small(figures_dir, eps_lap, eps_gnmax[0], answered_gnmax[0])
print_plot_large(figures_dir, eps_lap, eps_gnmax[1], answered_gnmax[1],
eps_gnmax[2], partition_gmax[2], answered_gnmax[2])
plt.close('all')
if __name__ == '__main__':
app.run(main)

View file

@ -1,358 +0,0 @@
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Performs privacy analysis of GNMax with smooth sensitivity.
A script in support of the paper "Scalable Private Learning with PATE" by
Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar,
Ulfar Erlingsson (https://arxiv.org/abs/1802.08908).
Several flavors of the GNMax algorithm can be analyzed.
- Plain GNMax (argmax w/ Gaussian noise) is assumed when arguments threshold
and sigma2 are missing.
- Confident GNMax (thresholding + argmax w/ Gaussian noise) is used when
threshold, sigma1, and sigma2 are given.
- Interactive GNMax (two- or multi-round) is triggered by specifying
baseline_file, which provides baseline values for votes selection in Step 1.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import os
import sys
sys.path.append('..') # Main modules reside in the parent directory.
from absl import app
from absl import flags
import numpy as np
import core as pate
import smooth_sensitivity as pate_ss
FLAGS = flags.FLAGS
flags.DEFINE_string('counts_file', None, 'Counts file.')
flags.DEFINE_string('baseline_file', None, 'File with baseline scores.')
flags.DEFINE_boolean('data_independent', False,
'Force data-independent bounds.')
flags.DEFINE_float('threshold', None, 'Threshold for step 1 (selection).')
flags.DEFINE_float('sigma1', None, 'Sigma for step 1 (selection).')
flags.DEFINE_float('sigma2', None, 'Sigma for step 2 (argmax).')
flags.DEFINE_integer('queries', None, 'Number of queries made by the student.')
flags.DEFINE_float('delta', 1e-8, 'Target delta.')
flags.DEFINE_float(
'order', None,
'Fixes a Renyi DP order (if unspecified, finds an optimal order from a '
'hardcoded list).')
flags.DEFINE_integer(
'teachers', None,
'Number of teachers (if unspecified, derived from the counts file).')
flags.mark_flag_as_required('counts_file')
flags.mark_flag_as_required('sigma2')
def _check_conditions(sigma, num_classes, orders):
"""Symbolic-numeric verification of conditions C5 and C6.
The conditions on the beta function are verified by constructing the beta
function symbolically, and then checking that its derivative (computed
symbolically) is non-negative within the interval of conjectured monotonicity.
The last check is performed numerically.
"""
print('Checking conditions C5 and C6 for all orders.')
sys.stdout.flush()
conditions_hold = True
for order in orders:
cond5, cond6 = pate_ss.check_conditions(sigma, num_classes, order)
conditions_hold &= cond5 and cond6
if not cond5:
print('Condition C5 does not hold for order =', order)
elif not cond6:
print('Condition C6 does not hold for order =', order)
if conditions_hold:
print('Conditions C5-C6 hold for all orders.')
sys.stdout.flush()
return conditions_hold
def _compute_rdp(votes, baseline, threshold, sigma1, sigma2, delta, orders,
data_ind):
"""Computes the (data-dependent) RDP curve for Confident GNMax."""
rdp_cum = np.zeros(len(orders))
rdp_sqrd_cum = np.zeros(len(orders))
answered = 0
for i, v in enumerate(votes):
if threshold is None:
logq_step1 = 0 # No thresholding, always proceed to step 2.
rdp_step1 = np.zeros(len(orders))
else:
logq_step1 = pate.compute_logpr_answered(threshold, sigma1,
v - baseline[i,])
if data_ind:
rdp_step1 = pate.compute_rdp_data_independent_threshold(sigma1, orders)
else:
rdp_step1 = pate.compute_rdp_threshold(logq_step1, sigma1, orders)
if data_ind:
rdp_step2 = pate.rdp_data_independent_gaussian(sigma2, orders)
else:
logq_step2 = pate.compute_logq_gaussian(v, sigma2)
rdp_step2 = pate.rdp_gaussian(logq_step2, sigma2, orders)
q_step1 = np.exp(logq_step1)
rdp = rdp_step1 + rdp_step2 * q_step1
# The expression below evaluates
# E[(cost_of_step_1 + Bernoulli(pr_of_step_2) * cost_of_step_2)^2]
rdp_sqrd = (
rdp_step1**2 + 2 * rdp_step1 * q_step1 * rdp_step2 +
q_step1 * rdp_step2**2)
rdp_sqrd_cum += rdp_sqrd
rdp_cum += rdp
answered += q_step1
if ((i + 1) % 1000 == 0) or (i == votes.shape[0] - 1):
rdp_var = rdp_sqrd_cum / i - (
rdp_cum / i)**2 # Ignore Bessel's correction.
eps_total, order_opt = pate.compute_eps_from_delta(orders, rdp_cum, delta)
order_opt_idx = np.searchsorted(orders, order_opt)
eps_std = ((i + 1) * rdp_var[order_opt_idx])**.5 # Std of the sum.
print(
'queries = {}, E[answered] = {:.2f}, E[eps] = {:.3f} (std = {:.5f}) '
'at order = {:.2f} (contribution from delta = {:.3f})'.format(
i + 1, answered, eps_total, eps_std, order_opt,
-math.log(delta) / (order_opt - 1)))
sys.stdout.flush()
_, order_opt = pate.compute_eps_from_delta(orders, rdp_cum, delta)
return order_opt
def _find_optimal_smooth_sensitivity_parameters(
votes, baseline, num_teachers, threshold, sigma1, sigma2, delta, ind_step1,
ind_step2, order):
"""Optimizes smooth sensitivity parameters by minimizing a cost function.
The cost function is
exact_eps + cost of GNSS + two stds of noise,
  which captures the upper bound of the confidence interval of the sanitized
privacy budget.
Since optimization is done with full view of sensitive data, the results
cannot be released.
"""
rdp_cum = 0
answered_cum = 0
ls_cum = 0
# Define a plausible range for the beta values.
betas = np.arange(.3 / order, .495 / order, .01 / order)
cost_delta = math.log(1 / delta) / (order - 1)
for i, v in enumerate(votes):
if threshold is None:
log_pr_answered = 0
rdp1 = 0
ls_step1 = np.zeros(num_teachers)
else:
log_pr_answered = pate.compute_logpr_answered(threshold, sigma1,
v - baseline[i,])
if ind_step1: # apply data-independent bound for step 1 (thresholding).
rdp1 = pate.compute_rdp_data_independent_threshold(sigma1, order)
ls_step1 = np.zeros(num_teachers)
else:
rdp1 = pate.compute_rdp_threshold(log_pr_answered, sigma1, order)
ls_step1 = pate_ss.compute_local_sensitivity_bounds_threshold(
v - baseline[i,], num_teachers, threshold, sigma1, order)
pr_answered = math.exp(log_pr_answered)
answered_cum += pr_answered
if ind_step2: # apply data-independent bound for step 2 (GNMax).
rdp2 = pate.rdp_data_independent_gaussian(sigma2, order)
ls_step2 = np.zeros(num_teachers)
else:
logq_step2 = pate.compute_logq_gaussian(v, sigma2)
rdp2 = pate.rdp_gaussian(logq_step2, sigma2, order)
# Compute smooth sensitivity.
ls_step2 = pate_ss.compute_local_sensitivity_bounds_gnmax(
v, num_teachers, sigma2, order)
rdp_cum += rdp1 + pr_answered * rdp2
ls_cum += ls_step1 + pr_answered * ls_step2 # Expected local sensitivity.
if ind_step1 and ind_step2:
# Data-independent bounds.
cost_opt, beta_opt, ss_opt, sigma_ss_opt = None, 0., 0., np.inf
else:
# Data-dependent bounds.
cost_opt, beta_opt, ss_opt, sigma_ss_opt = np.inf, None, None, None
for beta in betas:
ss = pate_ss.compute_discounted_max(beta, ls_cum)
# Solution to the minimization problem:
# min_sigma {order * exp(2 * beta)/ sigma^2 + 2 * ss * sigma}
sigma_ss = ((order * math.exp(2 * beta)) / ss)**(1 / 3)
cost_ss = pate_ss.compute_rdp_of_smooth_sensitivity_gaussian(
beta, sigma_ss, order)
# Cost captures exact_eps + cost of releasing SS + two stds of noise.
cost = rdp_cum + cost_ss + 2 * ss * sigma_ss
if cost < cost_opt:
cost_opt, beta_opt, ss_opt, sigma_ss_opt = cost, beta, ss, sigma_ss
if ((i + 1) % 100 == 0) or (i == votes.shape[0] - 1):
eps_before_ss = rdp_cum + cost_delta
eps_with_ss = (
eps_before_ss + pate_ss.compute_rdp_of_smooth_sensitivity_gaussian(
beta_opt, sigma_ss_opt, order))
print('{}: E[answered queries] = {:.1f}, RDP at {} goes from {:.3f} to '
'{:.3f} +/- {:.3f} (ss = {:.4}, beta = {:.4f}, sigma_ss = {:.3f})'.
format(i + 1, answered_cum, order, eps_before_ss, eps_with_ss,
ss_opt * sigma_ss_opt, ss_opt, beta_opt, sigma_ss_opt))
sys.stdout.flush()
# Return optimal parameters for the last iteration.
return beta_opt, ss_opt, sigma_ss_opt
####################
# HELPER FUNCTIONS #
####################
def _load_votes(counts_file, baseline_file, queries):
counts_file_expanded = os.path.expanduser(counts_file)
print('Reading raw votes from ' + counts_file_expanded)
sys.stdout.flush()
votes = np.load(counts_file_expanded)
print('Shape of the votes matrix = {}'.format(votes.shape))
if baseline_file is not None:
baseline_file_expanded = os.path.expanduser(baseline_file)
print('Reading baseline values from ' + baseline_file_expanded)
sys.stdout.flush()
baseline = np.load(baseline_file_expanded)
if votes.shape != baseline.shape:
raise ValueError(
'Counts file and baseline file must have the same shape. Got {} and '
'{} instead.'.format(votes.shape, baseline.shape))
else:
baseline = np.zeros_like(votes)
if queries is not None:
if votes.shape[0] < queries:
raise ValueError('Expect {} rows, got {} in {}'.format(
queries, votes.shape[0], counts_file))
# Truncate the votes matrix to the number of queries made.
votes = votes[:queries,]
baseline = baseline[:queries,]
else:
print('Process all {} input rows. (Use --queries flag to truncate.)'.format(
votes.shape[0]))
return votes, baseline
def _count_teachers(votes):
s = np.sum(votes, axis=1)
num_teachers = int(max(s))
if min(s) != num_teachers:
raise ValueError(
'Matrix of votes is malformed: the number of votes is not the same '
'across rows.')
return num_teachers
def _is_data_ind_step1(num_teachers, threshold, sigma1, orders):
if threshold is None:
return True
return np.all(
pate.is_data_independent_always_opt_threshold(num_teachers, threshold,
sigma1, orders))
def _is_data_ind_step2(num_teachers, num_classes, sigma, orders):
return np.all(
pate.is_data_independent_always_opt_gaussian(num_teachers, num_classes,
sigma, orders))
def main(argv):
del argv # Unused.
if (FLAGS.threshold is None) != (FLAGS.sigma1 is None):
raise ValueError(
'--threshold flag and --sigma1 flag must be present or absent '
'simultaneously.')
if FLAGS.order is None:
# Long list of orders.
orders = np.concatenate((np.arange(2, 100 + 1, .5),
np.logspace(np.log10(100), np.log10(500),
num=100)))
# Short list of orders.
# orders = np.round(
# np.concatenate((np.arange(2, 50 + 1, 1),
# np.logspace(np.log10(50), np.log10(1000), num=20))))
else:
orders = np.array([FLAGS.order])
votes, baseline = _load_votes(FLAGS.counts_file, FLAGS.baseline_file,
FLAGS.queries)
if FLAGS.teachers is None:
num_teachers = _count_teachers(votes)
else:
num_teachers = FLAGS.teachers
num_classes = votes.shape[1]
order = _compute_rdp(votes, baseline, FLAGS.threshold, FLAGS.sigma1,
FLAGS.sigma2, FLAGS.delta, orders,
FLAGS.data_independent)
ind_step1 = _is_data_ind_step1(num_teachers, FLAGS.threshold, FLAGS.sigma1,
order)
ind_step2 = _is_data_ind_step2(num_teachers, num_classes, FLAGS.sigma2, order)
if FLAGS.data_independent or (ind_step1 and ind_step2):
print('Nothing to do here, all analyses are data-independent.')
return
if not _check_conditions(FLAGS.sigma2, num_classes, [order]):
return # Quit early: sufficient conditions for correctness fail to hold.
beta_opt, ss_opt, sigma_ss_opt = _find_optimal_smooth_sensitivity_parameters(
votes, baseline, num_teachers, FLAGS.threshold, FLAGS.sigma1,
FLAGS.sigma2, FLAGS.delta, ind_step1, ind_step2, order)
print('Optimal beta = {:.4f}, E[SS_beta] = {:.4}, sigma_ss = {:.2f}'.format(
beta_opt, ss_opt, sigma_ss_opt))
if __name__ == '__main__':
app.run(main)

View file

@ -1,90 +0,0 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import app
from absl import flags
import matplotlib
import os
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
plt.style.use('ggplot')
FLAGS = flags.FLAGS
flags.DEFINE_string('plot_file', '', 'Output file name.')
qa_lnmax = [500, 750] + list(range(1000, 12500, 500))
acc_lnmax = [43.3, 52.3, 59.8, 66.7, 68.8, 70.5, 71.6, 72.3, 72.6, 72.9, 73.4,
73.4, 73.7, 73.9, 74.2, 74.4, 74.5, 74.7, 74.8, 75, 75.1, 75.1,
75.4, 75.4, 75.4]
qa_gnmax = [456, 683, 908, 1353, 1818, 2260, 2702, 3153, 3602, 4055, 4511, 4964,
5422, 5875, 6332, 6792, 7244, 7696, 8146, 8599, 9041, 9496, 9945,
10390, 10842]
acc_gnmax = [39.6, 52.2, 59.6, 66.6, 69.6, 70.5, 71.8, 72, 72.7, 72.9, 73.3,
73.4, 73.4, 73.8, 74, 74.2, 74.4, 74.5, 74.5, 74.7, 74.8, 75, 75.1,
75.1, 75.4]
qa_gnmax_aggressive = [167, 258, 322, 485, 647, 800, 967, 1133, 1282, 1430,
1573, 1728, 1889, 2028, 2190, 2348, 2510, 2668, 2950,
3098, 3265, 3413, 3581, 3730]
acc_gnmax_aggressive = [17.8, 26.8, 39.3, 48, 55.7, 61, 62.8, 64.8, 65.4, 66.7,
66.2, 68.3, 68.3, 68.7, 69.1, 70, 70.2, 70.5, 70.9,
70.7, 71.3, 71.3, 71.3, 71.8]
def main(argv):
del argv # Unused.
plt.close('all')
fig, ax = plt.subplots()
fig.set_figheight(4.7)
fig.set_figwidth(5)
ax.plot(qa_lnmax, acc_lnmax, color='r', ls='--', linewidth=5., marker='o',
alpha=.5, label='LNMax')
ax.plot(qa_gnmax, acc_gnmax, color='g', ls='-', linewidth=5., marker='o',
alpha=.5, label='Confident-GNMax')
# ax.plot(qa_gnmax_aggressive, acc_gnmax_aggressive, color='b', ls='-', marker='o', alpha=.5, label='Confident-GNMax (aggressive)')
plt.xticks([0, 2000, 4000, 6000])
plt.xlim([0, 6000])
# ax.set_yscale('log')
plt.ylim([65, 76])
ax.tick_params(labelsize=14)
plt.xlabel('Number of queries answered', fontsize=16)
plt.ylabel('Student test accuracy (%)', fontsize=16)
plt.legend(loc=2, prop={'size': 16})
x = [400, 2116, 4600, 4680]
y = [69.5, 68.5, 74, 72.5]
annotations = [0.76, 2.89, 1.42, 5.76]
color_annotations = ['g', 'r', 'g', 'r']
for i, txt in enumerate(annotations):
ax.annotate(r'${\varepsilon=}$' + str(txt), (x[i], y[i]), fontsize=16,
color=color_annotations[i])
plot_filename = os.path.expanduser(FLAGS.plot_file)
plt.savefig(plot_filename, bbox_inches='tight')
plt.show()
if __name__ == '__main__':
app.run(main)

View file

@ -1,71 +0,0 @@
Implementation of an RDP privacy accountant and smooth sensitivity analysis for
the PATE framework. The underlying theory and supporting experiments appear in
"Scalable Private Learning with PATE" by Nicolas Papernot, Shuang Song, Ilya
Mironov, Ananth Raghunathan, Kunal Talwar, Ulfar Erlingsson (ICLR 2018,
https://arxiv.org/abs/1802.08908).
## Overview
The PATE ('Private Aggregation of Teacher Ensembles') framework was introduced
by Papernot et al. in "Semi-supervised Knowledge Transfer for Deep Learning from
Private Training Data" (ICLR 2017, https://arxiv.org/abs/1610.05755). The
framework enables model-agnostic training that provably provides [differential
privacy](https://en.wikipedia.org/wiki/Differential_privacy) of the training
dataset.
The framework consists of _teachers_, the _student_ model, and the _aggregator_. The
teachers are models trained on disjoint subsets of the training datasets. The student
model has access to an insensitive (e.g., public) unlabelled dataset, which is labelled by
interacting with the ensemble of teachers via the _aggregator_. The aggregator tallies
outputs of the teacher models, and either forwards a (noisy) aggregate to the student, or
refuses to answer.
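As a concrete illustration (a sketch added here, not part of the original text), the simplest aggregation rule used in this repository, plain GNMax, adds Gaussian noise to each per-class vote count and forwards the argmax; the vote counts and sigma below are purely illustrative.
```python
import numpy as np

def gnmax_aggregate(votes, sigma):
  """Plain GNMax: argmax of per-class teacher vote counts plus Gaussian noise."""
  noisy_counts = votes + np.random.normal(scale=sigma, size=votes.shape)
  return int(np.argmax(noisy_counts))

# 250 teachers voting over 10 classes for a single student query.
votes = np.array([130., 40., 20., 15., 10., 10., 10., 5., 5., 5.])
print(gnmax_aggregate(votes, sigma=40.))
```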
Differential privacy is enforced by the aggregator. The privacy guarantees can be _data-independent_,
which means that they are solely the function of the aggregator's parameters. Alternatively, privacy
analysis can be _data-dependent_, which allows for finer reasoning where, under certain conditions on
the input distribution, the final privacy guarantees can be improved relative to the data-independent
analysis. Data-dependent privacy guarantees may, by themselves, be a function of sensitive data and
therefore publishing these guarantees requires its own sanitization procedure. In our case
sanitization of data-dependent privacy guarantees proceeds via _smooth sensitivity_ analysis.
The common machinery used for all privacy analyses in this repository is the
R&eacute;nyi differential privacy, or RDP (see https://arxiv.org/abs/1702.07476).
This repository contains implementations of privacy accountants and smooth
sensitivity analysis for several data-independent and data-dependent mechanisms that together
comprise the PATE framework.
### Requirements
* Python, version &ge; 2.7
* absl (see [here](https://github.com/abseil/abseil-py), or just type `pip install absl-py`)
* numpy
* scipy
* sympy (for smooth sensitivity analysis)
* unittest (for testing)
### Self-testing
To verify the installation run
```bash
$ python core_test.py
$ python smooth_sensitivity_test.py
```
## Files in this directory
* core.py &mdash; RDP privacy accountant for several vote aggregators (GNMax,
  Threshold, Laplace); a minimal usage sketch follows this list.
* smooth_sensitivity.py &mdash; Smooth sensitivity analysis for GNMax and
Threshold mechanisms.
* core_test.py and smooth_sensitivity_test.py &mdash; Unit tests for the
files above.
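As a minimal usage sketch (added here for orientation; the sigma, query count, and delta are illustrative only), the accountant in core.py can be driven directly to bound the cost of a batch of GNMax queries with the data-independent bound:
```python
import numpy as np

import core as pate  # Assumes this directory is the working directory.

orders = np.concatenate((np.arange(2, 100, .5),
                         np.logspace(np.log10(100), np.log10(500), num=50)))
sigma, num_queries, delta = 40., 1000, 1e-8

rdp_per_query = pate.rdp_data_independent_gaussian(sigma, orders)
rdp_total = num_queries * rdp_per_query  # RDP composes additively over queries.
eps, order_opt = pate.compute_eps_from_delta(orders, rdp_total, delta)
print('eps = {:.3f} at order = {:.1f}'.format(eps, order_opt))
```
For data-dependent accounting, `compute_logq_gaussian` and `rdp_gaussian` replace the data-independent bound on a per-query basis.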
## Contact information
You may direct your comments to mironov@google.com and PRs to @ilyamironov.

View file

@ -1,370 +0,0 @@
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Core functions for RDP analysis in PATE framework.
This library comprises the core functions for doing differentially private
analysis of the PATE architecture and its various Noisy Max and other
mechanisms.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from absl import app
import numpy as np
import scipy.stats
def _logaddexp(x):
"""Addition in the log space. Analogue of numpy.logaddexp for a list."""
m = max(x)
return m + math.log(sum(np.exp(x - m)))
def _log1mexp(x):
"""Numerically stable computation of log(1-exp(x))."""
if x < -1:
return math.log1p(-math.exp(x))
elif x < 0:
return math.log(-math.expm1(x))
elif x == 0:
return -np.inf
else:
raise ValueError("Argument must be non-positive.")
def compute_eps_from_delta(orders, rdp, delta):
"""Translates between RDP and (eps, delta)-DP.
Args:
orders: A list (or a scalar) of orders.
rdp: A list of RDP guarantees (of the same length as orders).
delta: Target delta.
Returns:
Pair of (eps, optimal_order).
Raises:
ValueError: If input is malformed.
"""
if len(orders) != len(rdp):
raise ValueError("Input lists must have the same length.")
eps = np.array(rdp) - math.log(delta) / (np.array(orders) - 1)
idx_opt = np.argmin(eps)
return eps[idx_opt], orders[idx_opt]
#####################
# RDP FOR THE GNMAX #
#####################
def compute_logq_gaussian(counts, sigma):
"""Returns an upper bound on ln Pr[outcome != argmax] for GNMax.
Implementation of Proposition 7.
Args:
counts: A numpy array of scores.
sigma: The standard deviation of the Gaussian noise in the GNMax mechanism.
Returns:
logq: Natural log of the probability that outcome is different from argmax.
"""
n = len(counts)
variance = sigma**2
idx_max = np.argmax(counts)
counts_normalized = counts[idx_max] - counts
counts_rest = counts_normalized[np.arange(n) != idx_max] # exclude one index
# Upper bound q via a union bound rather than a more precise calculation.
logq = _logaddexp(
scipy.stats.norm.logsf(counts_rest, scale=math.sqrt(2 * variance)))
# A sketch of a more accurate estimate, which is currently disabled for two
# reasons:
# 1. Numerical instability;
# 2. Not covered by smooth sensitivity analysis.
# covariance = variance * (np.ones((n - 1, n - 1)) + np.identity(n - 1))
# logq = np.log1p(-statsmodels.sandbox.distributions.extras.mvnormcdf(
# counts_rest, np.zeros(n - 1), covariance, maxpts=1e4))
return min(logq, math.log(1 - (1 / n)))
def rdp_data_independent_gaussian(sigma, orders):
"""Computes a data-independent RDP curve for GNMax.
Implementation of Proposition 8.
Args:
sigma: Standard deviation of Gaussian noise.
orders: An array_like list of Renyi orders.
Returns:
    Upper bound on RDP for all orders. A scalar if orders is a scalar.
Raises:
ValueError: If the input is malformed.
"""
if sigma < 0 or np.any(orders <= 1): # not defined for alpha=1
raise ValueError("Inputs are malformed.")
variance = sigma**2
if np.isscalar(orders):
return orders / variance
else:
return np.atleast_1d(orders) / variance
def rdp_gaussian(logq, sigma, orders):
"""Bounds RDP from above of GNMax given an upper bound on q (Theorem 6).
Args:
logq: Natural logarithm of the probability of a non-argmax outcome.
sigma: Standard deviation of Gaussian noise.
orders: An array_like list of Renyi orders.
Returns:
    Upper bound on RDP for all orders. A scalar if orders is a scalar.
Raises:
ValueError: If the input is malformed.
"""
if logq > 0 or sigma < 0 or np.any(orders <= 1): # not defined for alpha=1
raise ValueError("Inputs are malformed.")
if np.isneginf(logq): # If the mechanism's output is fixed, it has 0-DP.
if np.isscalar(orders):
return 0.
else:
return np.full_like(orders, 0., dtype=np.float)
variance = sigma**2
# Use two different higher orders: mu_hi1 and mu_hi2 computed according to
# Proposition 10.
mu_hi2 = math.sqrt(variance * -logq)
mu_hi1 = mu_hi2 + 1
orders_vec = np.atleast_1d(orders)
ret = orders_vec / variance # baseline: data-independent bound
# Filter out entries where data-dependent bound does not apply.
mask = np.logical_and(mu_hi1 > orders_vec, mu_hi2 > 1)
rdp_hi1 = mu_hi1 / variance
rdp_hi2 = mu_hi2 / variance
log_a2 = (mu_hi2 - 1) * rdp_hi2
# Make sure q is in the increasing wrt q range and A is positive.
if (np.any(mask) and logq <= log_a2 - mu_hi2 *
(math.log(1 + 1 / (mu_hi1 - 1)) + math.log(1 + 1 / (mu_hi2 - 1))) and
-logq > rdp_hi2):
# Use log1p(x) = log(1 + x) to avoid catastrophic cancellations when x ~ 0.
log1q = _log1mexp(logq) # log1q = log(1-q)
log_a = (orders - 1) * (
log1q - _log1mexp((logq + rdp_hi2) * (1 - 1 / mu_hi2)))
log_b = (orders - 1) * (rdp_hi1 - logq / (mu_hi1 - 1))
# Use logaddexp(x, y) = log(e^x + e^y) to avoid overflow for large x, y.
log_s = np.logaddexp(log1q + log_a, logq + log_b)
ret[mask] = np.minimum(ret, log_s / (orders - 1))[mask]
assert np.all(ret >= 0)
if np.isscalar(orders):
return np.asscalar(ret)
else:
return ret
def is_data_independent_always_opt_gaussian(num_teachers, num_classes, sigma,
orders):
"""Tests whether data-ind bound is always optimal for GNMax.
Args:
num_teachers: Number of teachers.
num_classes: Number of classes.
sigma: Standard deviation of the Gaussian noise.
orders: An array_like list of Renyi orders.
Returns:
Boolean array of length |orders| (a scalar if orders is a scalar). True if
the data-independent bound is always the same as the data-dependent bound.
"""
unanimous = np.array([num_teachers] + [0] * (num_classes - 1))
logq = compute_logq_gaussian(unanimous, sigma)
rdp_dep = rdp_gaussian(logq, sigma, orders)
rdp_ind = rdp_data_independent_gaussian(sigma, orders)
return np.isclose(rdp_dep, rdp_ind)
###################################
# RDP FOR THE THRESHOLD MECHANISM #
###################################
def compute_logpr_answered(t, sigma, counts):
"""Computes log of the probability that a noisy threshold is crossed.
Args:
t: The threshold.
sigma: The stdev of the Gaussian noise added to the threshold.
counts: An array of votes.
Returns:
Natural log of the probability that max is larger than a noisy threshold.
"""
# Compared to the paper, max(counts) is rounded to the nearest integer. This
# is done to facilitate computation of smooth sensitivity for the case of
# the interactive mechanism, where votes are not necessarily integer.
return scipy.stats.norm.logsf(t - round(max(counts)), scale=sigma)
def compute_rdp_data_independent_threshold(sigma, orders):
# The input to the threshold mechanism has stability 1, compared to
# GNMax, which has stability = 2. Hence the sqrt(2) factor below.
return rdp_data_independent_gaussian(2**.5 * sigma, orders)
def compute_rdp_threshold(log_pr_answered, sigma, orders):
logq = min(log_pr_answered, _log1mexp(log_pr_answered))
# The input to the threshold mechanism has stability 1, compared to
# GNMax, which has stability = 2. Hence the sqrt(2) factor below.
return rdp_gaussian(logq, 2**.5 * sigma, orders)
def is_data_independent_always_opt_threshold(num_teachers, threshold, sigma,
orders):
"""Tests whether data-ind bound is always optimal for the threshold mechanism.
Args:
num_teachers: Number of teachers.
threshold: The cut-off threshold.
sigma: Standard deviation of the Gaussian noise.
orders: An array_like list of Renyi orders.
Returns:
Boolean array of length |orders| (a scalar if orders is a scalar). True if
the data-independent bound is always the same as the data-dependent bound.
"""
# Since the data-dependent bound depends only on max(votes), it suffices to
# check whether the data-dependent bounds are better than data-independent
# bounds in the extreme cases when max(votes) is minimal or maximal.
# For both Confident GNMax and Interactive GNMax it holds that
# 0 <= max(votes) <= num_teachers.
# The upper bound is trivial in both cases.
# The lower bound is trivial for Confident GNMax (and a stronger one, based on
# the pigeonhole principle, is possible).
# For Interactive GNMax (Algorithm 2), the lower bound follows from the
# following argument. Since the votes vector is the difference between the
# actual teachers' votes and the student's baseline, we need to argue that
# max(n_j - M * p_j) >= 0.
# The bound holds because sum_j n_j = sum M * p_j = M. Thus,
# sum_j (n_j - M * p_j) = 0, and max_j (n_j - M * p_j) >= 0 as needed.
logq1 = compute_logpr_answered(threshold, sigma, [0])
logq2 = compute_logpr_answered(threshold, sigma, [num_teachers])
rdp_dep1 = compute_rdp_threshold(logq1, sigma, orders)
rdp_dep2 = compute_rdp_threshold(logq2, sigma, orders)
rdp_ind = compute_rdp_data_independent_threshold(sigma, orders)
return np.isclose(rdp_dep1, rdp_ind) and np.isclose(rdp_dep2, rdp_ind)
#############################
# RDP FOR THE LAPLACE NOISE #
#############################
def compute_logq_laplace(counts, lmbd):
"""Computes an upper bound on log Pr[outcome != argmax] for LNMax.
Args:
counts: A list of scores.
lmbd: The lambda parameter of the Laplace distribution ~exp(-|x| / lambda).
Returns:
logq: Natural log of the probability that outcome is different from argmax.
"""
# For noisy max, we only get an upper bound via the union bound. See Lemma 4
# in https://arxiv.org/abs/1610.05755.
#
  # Pr[j beats i*] = (2 + gap(j, i*)) / 4 * exp(gap(j, i*)),
# proof at http://mathoverflow.net/questions/66763/
idx_max = np.argmax(counts)
counts_normalized = (counts - counts[idx_max]) / lmbd
counts_rest = np.array(
[counts_normalized[i] for i in range(len(counts)) if i != idx_max])
logq = _logaddexp(np.log(2 - counts_rest) + math.log(.25) + counts_rest)
return min(logq, math.log(1 - (1 / len(counts))))
def rdp_pure_eps(logq, pure_eps, orders):
"""Computes the RDP value given logq and pure privacy eps.
Implementation of https://arxiv.org/abs/1610.05755, Theorem 3.
The bound used is the min of three terms. The first term is from
https://arxiv.org/pdf/1605.02065.pdf.
  The second term is based on the fact that when an event has probability (1-q)
  for q close to zero, q can only change by a factor of exp(eps), which
  corresponds to a much smaller multiplicative change in (1-q).
The third term comes directly from the privacy guarantee.
Args:
logq: Natural logarithm of the probability of a non-optimal outcome.
pure_eps: eps parameter for DP
orders: array_like list of moments to compute.
Returns:
Array of upper bounds on rdp (a scalar if orders is a scalar).
"""
orders_vec = np.atleast_1d(orders)
q = math.exp(logq)
log_t = np.full_like(orders_vec, np.inf)
if q <= 1 / (math.exp(pure_eps) + 1):
logt_one = math.log1p(-q) + (
math.log1p(-q) - _log1mexp(pure_eps + logq)) * (
orders_vec - 1)
logt_two = logq + pure_eps * (orders_vec - 1)
log_t = np.logaddexp(logt_one, logt_two)
ret = np.minimum(
np.minimum(0.5 * pure_eps * pure_eps * orders_vec,
log_t / (orders_vec - 1)), pure_eps)
if np.isscalar(orders):
return np.asscalar(ret)
else:
return ret
def main(argv):
del argv # Unused.
if __name__ == "__main__":
app.run(main)

View file

@ -1,124 +0,0 @@
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for pate.core."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import unittest
import numpy as np
import core as pate
class PateTest(unittest.TestCase):
def _test_rdp_gaussian_value_errors(self):
# Test for ValueErrors.
with self.assertRaises(ValueError):
pate.rdp_gaussian(1.0, 1.0, np.array([2, 3, 4]))
with self.assertRaises(ValueError):
pate.rdp_gaussian(np.log(0.5), -1.0, np.array([2, 3, 4]))
with self.assertRaises(ValueError):
pate.rdp_gaussian(np.log(0.5), 1.0, np.array([1, 3, 4]))
def _test_rdp_gaussian_as_function_of_q(self):
# Test for data-independent and data-dependent ranges over q.
# The following corresponds to orders 1.1, 2.5, 32, 250
# sigmas 1.5, 15, 1500, 15000.
# Hand calculated -log(q0)s arranged in a 'sigma major' ordering.
neglogq0s = [
2.8, 2.6, 427, None, 4.8, 4.0, 4.7, 275, 9.6, 8.8, 6.0, 4, 12, 11.2,
8.6, 6.4
]
idx_neglogq0s = 0 # To iterate through neglogq0s.
orders = [1.1, 2.5, 32, 250]
sigmas = [1.5, 15, 1500, 15000]
for sigma in sigmas:
for order in orders:
curr_neglogq0 = neglogq0s[idx_neglogq0s]
idx_neglogq0s += 1
if curr_neglogq0 is None: # sigma == 1.5 and order == 250:
continue
rdp_at_q0 = pate.rdp_gaussian(-curr_neglogq0, sigma, order)
# Data-dependent range. (Successively halve the value of q.)
logq_dds = (-curr_neglogq0 - np.array(
[0, np.log(2), np.log(4), np.log(8)]))
# Check that in q_dds, rdp is decreasing.
for idx in range(len(logq_dds) - 1):
self.assertGreater(
pate.rdp_gaussian(logq_dds[idx], sigma, order),
pate.rdp_gaussian(logq_dds[idx + 1], sigma, order))
# Data-independent range.
q_dids = np.exp(-curr_neglogq0) + np.array([0.1, 0.2, 0.3, 0.4])
# Check that in q_dids, rdp is constant.
for q in q_dids:
self.assertEqual(rdp_at_q0, pate.rdp_gaussian(
np.log(q), sigma, order))
def _test_compute_eps_from_delta_value_error(self):
# Test for ValueError.
with self.assertRaises(ValueError):
pate.compute_eps_from_delta([1.1, 2, 3, 4], [1, 2, 3], 0.001)
def _test_compute_eps_from_delta_monotonicity(self):
# Test for monotonicity with respect to delta.
orders = [1.1, 2.5, 250.0]
sigmas = [1e-3, 1.0, 1e5]
deltas = [1e-60, 1e-6, 0.1, 0.999]
for sigma in sigmas:
list_of_eps = []
rdps_for_gaussian = np.array(orders) / (2 * sigma**2)
for delta in deltas:
list_of_eps.append(
pate.compute_eps_from_delta(orders, rdps_for_gaussian, delta)[0])
# Check that in list_of_eps, epsilons are decreasing (as delta increases).
sorted_list_of_eps = list(list_of_eps)
sorted_list_of_eps.sort(reverse=True)
self.assertEqual(list_of_eps, sorted_list_of_eps)
def _test_compute_q0(self):
# Stub code to search a logq space and figure out logq0 by eyeballing
# results. This code does not run with the tests. Remove underscore to run.
sigma = 15
order = 250
logqs = np.arange(-290, -270, 1)
count = 0
for logq in logqs:
count += 1
sys.stdout.write("\t%0.5g: %0.10g" %
(logq, pate.rdp_gaussian(logq, sigma, order)))
sys.stdout.flush()
if count % 5 == 0:
print("")
def test_rdp_gaussian(self):
self._test_rdp_gaussian_value_errors()
self._test_rdp_gaussian_as_function_of_q()
def test_compute_eps_from_delta(self):
self._test_compute_eps_from_delta_value_error()
self._test_compute_eps_from_delta_monotonicity()
if __name__ == "__main__":
unittest.main()

View file

@ -1,419 +0,0 @@
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Functions for smooth sensitivity analysis for PATE mechanisms.
This library implements functionality for doing smooth sensitivity analysis
for Gaussian Noise Max (GNMax), Threshold with Gaussian noise, and Gaussian
Noise with Smooth Sensitivity (GNSS) mechanisms.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from absl import app
import numpy as np
import scipy
import sympy as sp
import core as pate
################################
# SMOOTH SENSITIVITY FOR GNMAX #
################################
# Global dictionary for storing cached q0 values keyed by (sigma, order).
_logq0_cache = {}
def _compute_logq0(sigma, order):
key = (sigma, order)
if key in _logq0_cache:
return _logq0_cache[key]
logq0 = compute_logq0_gnmax(sigma, order)
_logq0_cache[key] = logq0 # Update the global variable.
return logq0
def _compute_logq1(sigma, order, num_classes):
logq0 = _compute_logq0(sigma, order) # Most likely already cached.
logq1 = math.log(_compute_bl_gnmax(math.exp(logq0), sigma, num_classes))
assert logq1 <= logq0
return logq1
def _compute_mu1_mu2_gnmax(sigma, logq):
# Computes mu1, mu2 according to Proposition 10.
mu2 = sigma * math.sqrt(-logq)
mu1 = mu2 + 1
return mu1, mu2
def _compute_data_dep_bound_gnmax(sigma, logq, order):
# Applies Theorem 6 in Appendix without checking that logq satisfies necessary
# constraints. The pre-conditions must be assured by comparing logq against
# logq0 by the caller.
variance = sigma**2
mu1, mu2 = _compute_mu1_mu2_gnmax(sigma, logq)
eps1 = mu1 / variance
eps2 = mu2 / variance
log1q = np.log1p(-math.exp(logq)) # log1q = log(1-q)
log_a = (order - 1) * (
log1q - (np.log1p(-math.exp((logq + eps2) * (1 - 1 / mu2)))))
log_b = (order - 1) * (eps1 - logq / (mu1 - 1))
return np.logaddexp(log1q + log_a, logq + log_b) / (order - 1)
def _compute_rdp_gnmax(sigma, logq, order):
logq0 = _compute_logq0(sigma, order)
if logq >= logq0:
return pate.rdp_data_independent_gaussian(sigma, order)
else:
return _compute_data_dep_bound_gnmax(sigma, logq, order)
def compute_logq0_gnmax(sigma, order):
"""Computes the point where we start using data-independent bounds.
Args:
sigma: std of the Gaussian noise
order: Renyi order lambda
Returns:
    logq0: The point above which the data-independent bound overtakes the
      data-dependent bound.
"""
def _check_validity_conditions(logq):
# Function returns true iff logq is in the range where data-dependent bound
# is valid. (Theorem 6 in Appendix.)
mu1, mu2 = _compute_mu1_mu2_gnmax(sigma, logq)
if mu1 < order:
return False
eps2 = mu2 / sigma**2
# Do computation in the log space. The condition below comes from Lemma 9
# from Appendix.
return (logq <= (mu2 - 1) * eps2 - mu2 * math.log(mu1 / (mu1 - 1) * mu2 /
(mu2 - 1)))
def _compare_dep_vs_ind(logq):
return (_compute_data_dep_bound_gnmax(sigma, logq, order) -
pate.rdp_data_independent_gaussian(sigma, order))
# Natural upper bounds on q0.
logub = min(-(1 + 1. / sigma)**2, -((order - .99) / sigma)**2, -1 / sigma**2)
assert _check_validity_conditions(logub)
# If data-dependent bound is already better, we are done already.
if _compare_dep_vs_ind(logub) < 0:
return logub
# Identifying a reasonable lower bound to bracket logq0.
loglb = 2 * logub # logub is negative, and thus loglb < logub.
while _compare_dep_vs_ind(loglb) > 0:
assert loglb > -10000, "The lower bound on q0 is way too low."
loglb *= 1.5
logq0, r = scipy.optimize.brentq(
_compare_dep_vs_ind, loglb, logub, full_output=True)
assert r.converged, "The root finding procedure failed to converge."
assert _check_validity_conditions(logq0) # just in case.
return logq0
def _compute_bl_gnmax(q, sigma, num_classes):
return ((num_classes - 1) / 2 * scipy.special.erfc(
1 / sigma + scipy.special.erfcinv(2 * q / (num_classes - 1))))
def _compute_bu_gnmax(q, sigma, num_classes):
return min(1, (num_classes - 1) / 2 * scipy.special.erfc(
-1 / sigma + scipy.special.erfcinv(2 * q / (num_classes - 1))))
def _compute_local_sens_gnmax(logq, sigma, num_classes, order):
"""Implements Algorithm 3 (computes an upper bound on local sensitivity).
(See Proposition 13 for proof of correctness.)
"""
logq0 = _compute_logq0(sigma, order)
logq1 = _compute_logq1(sigma, order, num_classes)
if logq1 <= logq <= logq0:
logq = logq1
beta = _compute_rdp_gnmax(sigma, logq, order)
beta_bu_q = _compute_rdp_gnmax(
sigma, math.log(_compute_bu_gnmax(math.exp(logq), sigma, num_classes)),
order)
beta_bl_q = _compute_rdp_gnmax(
sigma, math.log(_compute_bl_gnmax(math.exp(logq), sigma, num_classes)),
order)
return max(beta_bu_q - beta, beta - beta_bl_q)
def compute_local_sensitivity_bounds_gnmax(votes, num_teachers, sigma, order):
"""Computes a list of max-LS-at-distance-d for the GNMax mechanism.
A more efficient implementation of Algorithms 4 and 5 working in time
O(teachers*classes). A naive implementation is O(teachers^2*classes) or worse.
Args:
votes: A numpy array of votes.
num_teachers: Total number of voting teachers.
    sigma: Standard deviation of the Gaussian noise.
order: The Renyi order.
Returns:
A numpy array of local sensitivities at distances d, 0 <= d <= num_teachers.
"""
num_classes = len(votes) # Called m in the paper.
logq0 = _compute_logq0(sigma, order)
logq1 = _compute_logq1(sigma, order, num_classes)
logq = pate.compute_logq_gaussian(votes, sigma)
plateau = _compute_local_sens_gnmax(logq1, sigma, num_classes, order)
res = np.full(num_teachers, plateau)
if logq1 <= logq <= logq0:
return res
# Invariant: votes is sorted in the non-increasing order.
votes = sorted(votes, reverse=True)
res[0] = _compute_local_sens_gnmax(logq, sigma, num_classes, order)
curr_d = 0
go_left = logq > logq0 # Otherwise logq < logq1 and we go right.
# Iterate while the following is true:
# 1. If we are going left, logq is still larger than logq0 and we may still
# increase the gap between votes[0] and votes[1].
# 2. If we are going right, logq is still smaller than logq1.
while ((go_left and logq > logq0 and votes[1] > 0) or
(not go_left and logq < logq1)):
curr_d += 1
if go_left: # Try decreasing logq.
votes[0] += 1
votes[1] -= 1
idx = 1
# Restore the invariant. (Can be implemented more efficiently by keeping
# track of the range of indices equal to votes[1]. Does not seem to matter
# for the overall running time.)
while idx < len(votes) - 1 and votes[idx] < votes[idx + 1]:
votes[idx], votes[idx + 1] = votes[idx + 1], votes[idx]
idx += 1
else: # Go right, i.e., try increasing logq.
votes[0] -= 1
votes[1] += 1 # The invariant holds since otherwise logq >= logq1.
logq = pate.compute_logq_gaussian(votes, sigma)
res[curr_d] = _compute_local_sens_gnmax(logq, sigma, num_classes, order)
return res
##################################################
# SMOOTH SENSITIVITY FOR THE THRESHOLD MECHANISM #
##################################################
# A global dictionary of RDPs for various threshold values. Indexed by a 4-tuple
# (num_teachers, threshold, sigma, order).
_rdp_thresholds = {}
def _compute_rdp_list_threshold(num_teachers, threshold, sigma, order):
key = (num_teachers, threshold, sigma, order)
if key in _rdp_thresholds:
return _rdp_thresholds[key]
res = np.zeros(num_teachers + 1)
for v in range(0, num_teachers + 1):
logp = scipy.stats.norm.logsf(threshold - v, scale=sigma)
res[v] = pate.compute_rdp_threshold(logp, sigma, order)
_rdp_thresholds[key] = res
return res
def compute_local_sensitivity_bounds_threshold(counts, num_teachers, threshold,
sigma, order):
"""Computes a list of max-LS-at-distance-d for the threshold mechanism."""
def _compute_ls(v):
ls_step_up, ls_step_down = float("-inf"), float("-inf")
if v > 0:
ls_step_down = abs(rdp_list[v - 1] - rdp_list[v])
if v < num_teachers:
ls_step_up = abs(rdp_list[v + 1] - rdp_list[v])
    return max(ls_step_down, ls_step_up)  # At least one of the two is finite.
cur_max = int(round(max(counts)))
rdp_list = _compute_rdp_list_threshold(num_teachers, threshold, sigma, order)
ls = np.zeros(num_teachers)
for d in range(max(cur_max, num_teachers - cur_max)):
ls_up, ls_down = float("-inf"), float("-inf")
if cur_max + d <= num_teachers:
ls_up = _compute_ls(cur_max + d)
if cur_max - d >= 0:
ls_down = _compute_ls(cur_max - d)
ls[d] = max(ls_up, ls_down)
return ls
#############################################
# PROCEDURES FOR SMOOTH SENSITIVITY RELEASE #
#############################################
# A global dictionary of exponentially decaying arrays. Indexed by beta.
dict_beta_discount = {}
def compute_discounted_max(beta, a):
n = len(a)
if beta not in dict_beta_discount or (len(dict_beta_discount[beta]) < n):
dict_beta_discount[beta] = np.exp(-beta * np.arange(n))
return max(a * dict_beta_discount[beta][:n])
def compute_smooth_sensitivity_gnmax(beta, counts, num_teachers, sigma, order):
"""Computes smooth sensitivity of a single application of GNMax."""
  ls = compute_local_sensitivity_bounds_gnmax(counts, num_teachers, sigma,
                                              order)
return compute_discounted_max(beta, ls)
def compute_rdp_of_smooth_sensitivity_gaussian(beta, sigma, order):
"""Computes the RDP curve for the GNSS mechanism.
Implements Theorem 23 (https://arxiv.org/pdf/1802.08908.pdf).
"""
if beta > 0 and not 1 < order < 1 / (2 * beta):
raise ValueError("Order outside the (1, 1/(2*beta)) range.")
return order * math.exp(2 * beta) / sigma**2 + (
-.5 * math.log(1 - 2 * order * beta) + beta * order) / (
order - 1)
def compute_params_for_ss_release(eps, delta):
"""Computes sigma for additive Gaussian noise scaled by smooth sensitivity.
Presently not used. (We proceed via RDP analysis.)
Compute beta, sigma for applying Lemma 2.6 (full version of Nissim et al.) via
Lemma 2.10.
"""
# Rather than applying Lemma 2.10 directly, which would give suboptimal alpha,
# (see http://www.cse.psu.edu/~ads22/pubs/NRS07/NRS07-full-draft-v1.pdf),
# we extract a sufficient condition on alpha from its proof.
#
# Let a = rho_(delta/2)(Z_1). Then solve for alpha such that
# 2 alpha a + alpha^2 = eps/2.
a = scipy.special.ndtri(1 - delta / 2)
alpha = math.sqrt(a**2 + eps / 2) - a
beta = eps / (2 * scipy.special.chdtri(1, delta / 2))
return alpha, beta
#######################################################
# SYMBOLIC-NUMERIC VERIFICATION OF CONDITIONS C5--C6. #
#######################################################
def _construct_symbolic_beta(q, sigma, order):
mu2 = sigma * sp.sqrt(sp.log(1 / q))
mu1 = mu2 + 1
eps1 = mu1 / sigma**2
eps2 = mu2 / sigma**2
a = (1 - q) / (1 - (q * sp.exp(eps2))**(1 - 1 / mu2))
b = sp.exp(eps1) / q**(1 / (mu1 - 1))
s = (1 - q) * a**(order - 1) + q * b**(order - 1)
return (1 / (order - 1)) * sp.log(s)
def _construct_symbolic_bu(q, sigma, m):
return (m - 1) / 2 * sp.erfc(sp.erfcinv(2 * q / (m - 1)) - 1 / sigma)
def _is_non_decreasing(fn, q, bounds):
"""Verifies whether the function is non-decreasing within a range.
Args:
fn: Symbolic function of a single variable.
q: The name of f's variable.
bounds: Pair of (lower_bound, upper_bound) reals.
Returns:
True iff the function is non-decreasing in the range.
"""
diff_fn = sp.diff(fn, q) # Symbolically compute the derivative.
diff_fn_lambdified = sp.lambdify(
q,
diff_fn,
modules=[
"numpy", {
"erfc": scipy.special.erfc,
"erfcinv": scipy.special.erfcinv
}
])
r = scipy.optimize.minimize_scalar(
diff_fn_lambdified, bounds=bounds, method="bounded")
assert r.success, "Minimizer failed to converge."
return r.fun >= 0 # Check whether the derivative is non-negative.
def check_conditions(sigma, m, order):
"""Checks conditions C5 and C6 (Section B.4.2 in Appendix)."""
q = sp.symbols("q", positive=True, real=True)
beta = _construct_symbolic_beta(q, sigma, order)
q0 = math.exp(compute_logq0_gnmax(sigma, order))
cond5 = _is_non_decreasing(beta, q, (0, q0))
if cond5:
bl_q0 = _compute_bl_gnmax(q0, sigma, m)
bu = _construct_symbolic_bu(q, sigma, m)
delta_beta = beta.subs(q, bu) - beta
cond6 = _is_non_decreasing(delta_beta, q, (0, bl_q0))
else:
cond6 = False # Skip the check, since Condition 5 is false already.
return (cond5, cond6)
def main(argv):
del argv # Unused.
if __name__ == "__main__":
app.run(main)

View file

@ -1,126 +0,0 @@
# Copyright 2017 The 'Scalable Private Learning with PATE' Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for pate.smooth_sensitivity."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import unittest
import numpy as np
import smooth_sensitivity as pate_ss
class PateSmoothSensitivityTest(unittest.TestCase):
def test_check_conditions(self):
self.assertEqual(pate_ss.check_conditions(20, 10, 25.), (True, False))
self.assertEqual(pate_ss.check_conditions(30, 10, 25.), (True, True))
def _assert_all_close(self, x, y):
"""Asserts that two numpy arrays are close."""
self.assertEqual(len(x), len(y))
self.assertTrue(np.allclose(x, y, rtol=1e-8, atol=0))
def test_compute_local_sensitivity_bounds_gnmax(self):
counts1 = np.array([10, 0, 0])
sigma1 = .5
order1 = 1.5
answer1 = np.array(
[3.13503646e-17, 1.60178280e-08, 5.90681786e-03] + [5.99981308e+00] * 7)
# Test for "going right" in the smooth sensitivity computation.
out1 = pate_ss.compute_local_sensitivity_bounds_gnmax(
counts1, 10, sigma1, order1)
self._assert_all_close(out1, answer1)
counts2 = np.array([1000, 500, 300, 200, 0])
sigma2 = 250.
order2 = 10.
# Test for "going left" in the smooth sensitivity computation.
out2 = pate_ss.compute_local_sensitivity_bounds_gnmax(
counts2, 2000, sigma2, order2)
answer2 = np.array([0.] * 298 + [2.77693450548e-7, 2.10853979548e-6] +
[2.73113623988e-6] * 1700)
self._assert_all_close(out2, answer2)
def test_compute_local_sensitivity_bounds_threshold(self):
counts1_3 = np.array([20, 10, 0])
num_teachers = sum(counts1_3)
t1 = 16 # high threshold
sigma = 2
order = 10
out1 = pate_ss.compute_local_sensitivity_bounds_threshold(
counts1_3, num_teachers, t1, sigma, order)
answer1 = np.array([0] * 3 + [
1.48454129e-04, 1.47826870e-02, 3.94153241e-02, 6.45775697e-02,
9.01543247e-02, 1.16054002e-01, 1.42180452e-01, 1.42180452e-01,
1.48454129e-04, 1.47826870e-02, 3.94153241e-02, 6.45775697e-02,
9.01543266e-02, 1.16054000e-01, 1.42180452e-01, 1.68302106e-01,
1.93127860e-01
] + [0] * 10)
self._assert_all_close(out1, answer1)
t2 = 2 # low threshold
out2 = pate_ss.compute_local_sensitivity_bounds_threshold(
counts1_3, num_teachers, t2, sigma, order)
answer2 = np.array([
1.60212079e-01, 2.07021132e-01, 2.07021132e-01, 1.93127860e-01,
1.68302106e-01, 1.42180452e-01, 1.16054002e-01, 9.01543247e-02,
6.45775697e-02, 3.94153241e-02, 1.47826870e-02, 1.48454129e-04
] + [0] * 18)
self._assert_all_close(out2, answer2)
t3 = 50 # very high threshold (larger than the number of teachers).
out3 = pate_ss.compute_local_sensitivity_bounds_threshold(
counts1_3, num_teachers, t3, sigma, order)
answer3 = np.array([
1.35750725752e-19, 1.88990500499e-17, 2.05403154065e-15,
1.74298153642e-13, 1.15489723995e-11, 5.97584949325e-10,
2.41486826748e-08, 7.62150641922e-07, 1.87846248741e-05,
0.000360973025976, 0.000360973025976, 2.76377015215e-50,
1.00904975276e-53, 2.87254164748e-57, 6.37583360761e-61,
1.10331620211e-64, 1.48844393335e-68, 1.56535552444e-72,
1.28328011060e-76, 8.20047697109e-81
] + [0] * 10)
self._assert_all_close(out3, answer3)
# Fractional values.
counts4 = np.array([19.5, -5.1, 0])
t4 = 10.1
out4 = pate_ss.compute_local_sensitivity_bounds_threshold(
counts4, num_teachers, t4, sigma, order)
answer4 = np.array([
0.0620410301, 0.0875807131, 0.113451958, 0.139561671, 0.1657074530,
0.1908244840, 0.2070270720, 0.207027072, 0.169718100, 0.0575152142,
0.00678695871
] + [0] * 6 + [0.000536304908, 0.0172181073, 0.041909870] + [0] * 10)
self._assert_all_close(out4, answer4)
if __name__ == "__main__":
unittest.main()

View file

@ -1,32 +0,0 @@
# Copyright 2018, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""TensorFlow Privacy library setup file for pip."""
from setuptools import find_packages
from setuptools import setup
setup(name='tensorflow_privacy',
version='0.1.0',
url='https://github.com/tensorflow/privacy',
license='Apache-2.0',
install_requires=[
'scipy>=0.17',
'mpmath', # used in tests only
],
# Explicit dependence on TensorFlow is not supported.
# See https://github.com/tensorflow/tensorflow/issues/7166
extras_require={
'tf': ['tensorflow>=1.0.0'],
'tf_gpu': ['tensorflow-gpu>=1.0.0'],
},
packages=find_packages())

View file

@ -1,129 +0,0 @@
# Tutorials
This folder contains a set of tutorials that demonstrate the features of this
library.
As demonstrated on MNIST in `mnist_dpsgd_tutorial.py`, the easiest way to use
a differentially private optimizer is to modify an existing TF training loop
to replace an existing vanilla optimizer with its differentially private
counterpart implemented in the library.
Here is a list of all the tutorials included:
* `lm_dpsgd_tutorial.py`: learn a language model with differential privacy.
* `mnist_dpsgd_tutorial.py`: learn a convolutional neural network on MNIST with
differential privacy.
* `mnist_dpsgd_tutorial_eager.py`: learn a convolutional neural network on MNIST
with differential privacy using Eager mode.
* `mnist_dpsgd_tutorial_keras.py`: learn a convolutional neural network on MNIST
with differential privacy using tf.Keras.
* `mnist_lr_tutorial.py`: learn a differentially private logistic regression
model on MNIST. The model illustrates application of the
"amplification-by-iteration" analysis (https://arxiv.org/abs/1808.06651).
The rest of this README describes the different parameters used to configure
DP-SGD as well as expected outputs for the `mnist_dpsgd_tutorial.py` tutorial.
## Parameters
All of the optimizers share some privacy-specific parameters that need to
be tuned in addition to any existing hyperparameter. There are currently four
(a sketch of how they fit together follows this list):
* `learning_rate` (float): The learning rate of the SGD training algorithm. The
higher the learning rate, the more each update matters. If the updates are noisy
(such as when the additive noise is large compared to the clipping
threshold), the learning rate must be kept low for the training procedure to converge.
* `num_microbatches` (int): The input data for each step (i.e., batch) of your
original training algorithm is split into this many microbatches. Generally,
increasing this will improve your utility but slow down your training in terms
of wall-clock time. The total number of examples consumed in one global step
remains the same. This number should evenly divide your input batch size.
* `l2_norm_clip` (float): The cumulative gradient across all network parameters
from each microbatch will be clipped so that its L2 norm is at most this
value. You should set this to something close to some percentile of what
you expect the gradient from each microbatch to be. In previous experiments,
we've found numbers from 0.5 to 1.0 to work reasonably well.
* `noise_multiplier` (float): This governs the amount of noise added during
training. Generally, more noise results in better privacy and lower utility.
This generally has to be at least 0.3 to obtain rigorous privacy guarantees,
but smaller values may still be acceptable for practical purposes.
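For orientation, here is a minimal sketch of how these four parameters are
passed to a differentially private optimizer (a sketch only, assuming TF 1.x
graph mode; the toy tensors stand in for a real model's logits and labels, and
the parameter values are illustrative rather than recommendations):
```
import tensorflow as tf
from tensorflow_privacy.privacy.optimizers import dp_optimizer
# Toy logits and labels standing in for a real model's outputs (batch of 256).
logits = tf.keras.layers.Dense(10)(tf.ones([256, 32]))
labels = tf.zeros([256], dtype=tf.int32)
# DP-SGD clips and noises gradients per microbatch, so it is given the
# per-example (vector) loss rather than its mean.
vector_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
optimizer = dp_optimizer.DPGradientDescentGaussianOptimizer(
    learning_rate=0.15,
    num_microbatches=256,   # must evenly divide the batch size
    l2_norm_clip=1.0,       # clipping threshold for each microbatch gradient
    noise_multiplier=1.1)   # noise stddev divided by the clipping threshold
train_op = optimizer.minimize(
    loss=vector_loss, global_step=tf.train.get_or_create_global_step())
```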
## Measuring Privacy
Differential privacy can be expressed using two values, epsilon and delta.
Roughly speaking, they mean the following:
* epsilon gives a ceiling on how much the probability of a particular output
can increase by including (or removing) a single training example. We usually
want it to be a small constant (less than 10, or, for more stringent privacy
guarantees, less than 1). However, this is only an upper bound, and a large
value of epsilon may still mean good practical privacy.
* delta bounds the probability of an arbitrary change in model behavior.
We can usually set this to a very small number (1e-7 or so) without
compromising utility. A rule of thumb is to set it to be less than the inverse
of the training data size. For instance, the MNIST tutorials set delta=1e-5,
which is below 1/60000 for MNIST's 60,000 training examples.
To find out the epsilon given a fixed delta value for your model, follow the
approach demonstrated in the `compute_epsilon` function of
`mnist_dpsgd_tutorial.py` (and sketched below), where the arguments used to
call the RDP accountant (i.e., the tool used to compute the privacy guarantee)
are:
* `q` : The sampling ratio, defined as (number of examples consumed in one
step) / (total training examples).
* `noise_multiplier` : The noise_multiplier from your parameters above.
* `steps` : The number of global steps taken.
A detailed writeup of the theory behind the computation of epsilon and delta
is available at https://arxiv.org/abs/1908.10530.
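In code, such a computation might look like the following self-contained
sketch (the argument values are the MNIST tutorial defaults, so the result
should roughly match the epsilon reported in the expected output below):
```
from tensorflow_privacy.privacy.analysis.rdp_accountant import compute_rdp
from tensorflow_privacy.privacy.analysis.rdp_accountant import get_privacy_spent
def compute_epsilon(q, noise_multiplier, steps, delta=1e-5):
  """Returns the epsilon of (epsilon, delta)-DP spent after `steps` steps."""
  orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
  rdp = compute_rdp(q=q, noise_multiplier=noise_multiplier,
                    steps=steps, orders=orders)
  return get_privacy_spent(orders, rdp, target_delta=delta)[0]
# Default MNIST tutorial settings: 60000 examples, batch size 256, 60 epochs.
eps = compute_epsilon(q=256 / 60000.0,
                      noise_multiplier=1.1,
                      steps=60 * (60000 // 256),
                      delta=1e-5)
print('For delta=1e-5, the current epsilon is: %.2f' % eps)
```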
## Expected Output
When the `mnist_dpsgd_tutorial.py` script is run with the default parameters,
the output will contain the following lines (leaving out a lot of diagnostic
info):
```
...
Test accuracy after 1 epochs is: 0.774
For delta=1e-5, the current epsilon is: 1.03
...
Test accuracy after 2 epochs is: 0.877
For delta=1e-5, the current epsilon is: 1.11
...
Test accuracy after 60 epochs is: 0.966
For delta=1e-5, the current epsilon is: 3.01
```
## Using Command-Line Interface for Privacy Budgeting
Before launching a (possibly quite lengthy) training procedure, it is possible
to compute, quickly and accurately, privacy loss at any point of the training.
To do so, run the script `privacy/analysis/compute_dp_sgd_privacy.py`, which
does not have any TensorFlow dependencies. For example, executing
```
compute_dp_sgd_privacy.py --N=60000 --batch_size=256 --noise_multiplier=1.1 --epochs=60 --delta=1e-5
```
allows us to conclude, in a matter of seconds, that DP-SGD run with default
parameters satisfies differential privacy with eps = 3.01 and delta = 1e-05.
Note that the flags provided in the command above correspond to the tutorial in
`mnist_dpsgd_tutorial.py`. The command is applicable to other datasets, but the
values passed must be adapted (e.g., `N`, the number of training points).
## Select Parameters
The table below has a few sample parameters illustrating various
accuracy/privacy tradeoffs achieved by the MNIST tutorial in
`mnist_dpsgd_tutorial.py` (default parameters are in __bold__; privacy epsilon
is reported at delta=1e-5; accuracy is averaged over 10 runs, its standard
deviation is less than .3% in all cases).
| Learning rate | Noise multiplier | Clipping threshold | Number of microbatches | Number of epochs | Privacy eps | Accuracy |
| ------------- | ---------------- | ----------------- | ---------------------- | ---------------- | ----------- | -------- |
| 0.1 | | | __256__ | 20 | no privacy | 99.0% |
| 0.25 | 1.3 | 1.5 | __256__ | 15 | 1.19 | 95.0% |
| __0.15__ | __1.1__ | __1.0__ | __256__ |__60__ | 3.01 | 96.6% |
| 0.25 | 0.7 | 1.5 | __256__ | 45 | 7.10 | 97.0% |

View file

@ -1,187 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tutorial for bolt_on module, the model and the optimizer."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf # pylint: disable=wrong-import-position
from tensorflow_privacy.privacy.bolt_on import losses # pylint: disable=wrong-import-position
from tensorflow_privacy.privacy.bolt_on import models # pylint: disable=wrong-import-position
from tensorflow_privacy.privacy.bolt_on.optimizers import BoltOn # pylint: disable=wrong-import-position
# -------
# First, we will create a binary classification dataset with a single output
# dimension. The samples for each label are repeated data points at different
# points in space.
# -------
# Parameters for dataset
n_samples = 10
input_dim = 2
n_outputs = 1
# Create binary classification dataset:
x_stack = [tf.constant(-1, tf.float32, (n_samples, input_dim)),
tf.constant(1, tf.float32, (n_samples, input_dim))]
y_stack = [tf.constant(0, tf.float32, (n_samples, 1)),
tf.constant(1, tf.float32, (n_samples, 1))]
x, y = tf.concat(x_stack, 0), tf.concat(y_stack, 0)
print(x.shape, y.shape)
generator = tf.data.Dataset.from_tensor_slices((x, y))
generator = generator.batch(10)
generator = generator.shuffle(10)
# -------
# First, we will explore using the pre-built BoltOnModel, which is a thin
# wrapper around a Keras Model using a single-layer neural network.
# It automatically uses the BoltOn Optimizer which encompasses all the logic
# required for the BoltOn Differential Privacy method.
# -------
bolt = models.BoltOnModel(n_outputs) # tell the model how many outputs we have.
# -------
# Now, we will pick our optimizer and Strongly Convex Loss function. The loss
# must extend from StrongConvexMixin and implement the associated methods. Some
# existing loss functions are pre-implemented in bolt_on.losses.
# -------
optimizer = tf.optimizers.SGD()
reg_lambda = 1
C = 1
radius_constant = 1
loss = losses.StrongConvexBinaryCrossentropy(reg_lambda, C, radius_constant)
# -------
# For simplicity, we pick all parameters of the StrongConvexBinaryCrossentropy
# to be 1; these are all tunable and their impact is documented in
# losses.StrongConvexBinaryCrossentropy. We then compile the model with the
# chosen optimizer and loss, which will automatically wrap the chosen optimizer
# with the BoltOn Optimizer, ensuring the required components function as
# required for privacy guarantees.
# -------
bolt.compile(optimizer, loss)
# -------
# To fit the model, the optimizer will require additional information about
# the dataset and model. These parameters are:
# 1. the class_weights used
# 2. the number of samples in the dataset
# 3. the batch size, which the model will try to infer, if possible. If it
# cannot, you will be required to pass these explicitly to the fit method.
#
# In addition, there are two privacy parameters that can be altered:
# 1. epsilon, a float
# 2. noise_distribution, a valid string indicating the distribution to use
# (must be implemented)
#
# The BoltOnModel offers a helper method, .calculate_class_weight, to aid in
# class_weight calculation.
# required parameters
# -------
class_weight = None # default, use .calculate_class_weight for other values
batch_size = None # default, if it cannot be inferred, specify this
n_samples = None # default, if it cannot be inferred, specify this
# privacy parameters
epsilon = 2
noise_distribution = 'laplace'
bolt.fit(x,
y,
epsilon=epsilon,
class_weight=class_weight,
batch_size=batch_size,
n_samples=n_samples,
noise_distribution=noise_distribution,
epochs=2)
# -------
# We may also train on a generator object, or try different optimizers and loss
# functions. Below, we will see that we must pass the number of samples, as the
# fit method is unable to infer it for a generator.
# -------
optimizer2 = tf.optimizers.Adam()
bolt.compile(optimizer2, loss)
# required parameters
class_weight = None # default, use .calculate_class_weight for other values
batch_size = None # default, if it cannot be inferred, specify this
n_samples = None # default, if it cannot be inferred, specify this
# privacy parameters
epsilon = 2
noise_distribution = 'laplace'
try:
bolt.fit(generator,
epsilon=epsilon,
class_weight=class_weight,
batch_size=batch_size,
n_samples=n_samples,
noise_distribution=noise_distribution,
verbose=0)
except ValueError as e:
print(e)
# -------
# And now, re-running with the parameter set.
# -------
n_samples = 20
bolt.fit_generator(generator,
epsilon=epsilon,
class_weight=class_weight,
n_samples=n_samples,
noise_distribution=noise_distribution,
verbose=0)
# -------
# You don't have to use the BoltOn model to use the BoltOn method.
# There are only a few requirements:
# 1. make sure any requirements from the loss are implemented in the model.
# 2. instantiate the optimizer and use it as a context around the fit operation.
# -------
# -------------------- Part 2, using the Optimizer
# -------
# Here, we create our own model and setup the BoltOn optimizer.
# -------
class TestModel(tf.keras.Model): # pylint: disable=abstract-method
def __init__(self, reg_layer, number_of_outputs=1):
super(TestModel, self).__init__(name='test')
self.output_layer = tf.keras.layers.Dense(number_of_outputs,
kernel_regularizer=reg_layer)
def call(self, inputs): # pylint: disable=arguments-differ
return self.output_layer(inputs)
optimizer = tf.optimizers.SGD()
loss = losses.StrongConvexBinaryCrossentropy(reg_lambda, C, radius_constant)
optimizer = BoltOn(optimizer, loss)
# -------
# Now, we instantiate our model and satisfy requirement 1: since our loss
# requires L2 regularization over the kernel, we pass the loss's kernel
# regularizer to the model.
# -------
n_outputs = 1 # parameter for model and optimizer context.
test_model = TestModel(loss.kernel_regularizer(), n_outputs)
test_model.compile(optimizer, loss)
# -------
# We comply with requirement 2 and use the BoltOn Optimizer as a context
# around the fit method.
# -------
# parameters for context
noise_distribution = 'laplace'
epsilon = 2
class_weights = 1 # Previously, the fit method auto-detected the class_weights.
# Here, we need to pass the class_weights explicitly. 1 is the same as None.
n_samples = 20
batch_size = 5
with optimizer(
noise_distribution=noise_distribution,
epsilon=epsilon,
layers=test_model.layers,
class_weights=class_weights,
n_samples=n_samples,
batch_size=batch_size
) as _:
test_model.fit(x, y, batch_size=batch_size, epochs=2)

View file

@ -1,225 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Training a language model (recurrent neural network) with DP-SGD optimizer.
This tutorial uses a corpus of text from TensorFlow datasets unless a
FLAGS.data_dir is specified with the path to a directory containing two files
train.txt and test.txt corresponding to a training and test corpus.
Even though we haven't done any hyperparameter tuning, and the analytical
epsilon upper bound can't offer any strong guarantees, the benefits of training
with differential privacy can be clearly seen by examining the trained model.
In particular, such inspection can confirm that the set of training-data
examples that the model fails to learn (i.e., has high perplexity for) comprises
outliers and rare sentences outside the distribution to be learned (see examples
and a discussion in this blog post). This can be further confirmed by
testing the differentially-private model's propensity for memorization, e.g.,
using the exposure metric of https://arxiv.org/abs/1802.08232.
This example is described in more detail in this post: https://goo.gl/UKr7vH
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl import app
from absl import flags
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow_privacy.privacy.analysis import privacy_ledger
from tensorflow_privacy.privacy.analysis.rdp_accountant import compute_rdp
from tensorflow_privacy.privacy.analysis.rdp_accountant import get_privacy_spent
from tensorflow_privacy.privacy.optimizers import dp_optimizer
flags.DEFINE_boolean(
'dpsgd', True, 'If True, train with DP-SGD. If False, '
'train with vanilla SGD.')
flags.DEFINE_float('learning_rate', 0.001, 'Learning rate for training')
flags.DEFINE_float('noise_multiplier', 0.001,
'Ratio of the standard deviation to the clipping norm')
flags.DEFINE_float('l2_norm_clip', 1.0, 'Clipping norm')
flags.DEFINE_integer('batch_size', 256, 'Batch size')
flags.DEFINE_integer('epochs', 60, 'Number of epochs')
flags.DEFINE_integer(
'microbatches', 256, 'Number of microbatches '
'(must evenly divide batch_size)')
flags.DEFINE_string('model_dir', None, 'Model directory')
flags.DEFINE_string('data_dir', None, 'Directory containing the PTB data.')
FLAGS = flags.FLAGS
SEQ_LEN = 80
NB_TRAIN = 45000
def rnn_model_fn(features, labels, mode): # pylint: disable=unused-argument
"""Model function for a RNN."""
# Define RNN architecture using tf.keras.layers.
x = features['x']
x = tf.reshape(x, [-1, SEQ_LEN])
input_layer = x[:, :-1]
input_one_hot = tf.one_hot(input_layer, 256)
lstm = tf.keras.layers.LSTM(256, return_sequences=True).apply(input_one_hot)
logits = tf.keras.layers.Dense(256).apply(lstm)
# Calculate loss as a vector (to support microbatches in DP-SGD).
vector_loss = tf.nn.softmax_cross_entropy_with_logits(
labels=tf.cast(tf.one_hot(x[:, 1:], 256), dtype=tf.float32),
logits=logits)
# Define mean of loss across minibatch (for reporting through tf.Estimator).
scalar_loss = tf.reduce_mean(vector_loss)
# Configure the training op (for TRAIN mode).
if mode == tf.estimator.ModeKeys.TRAIN:
if FLAGS.dpsgd:
ledger = privacy_ledger.PrivacyLedger(
population_size=NB_TRAIN,
selection_probability=(FLAGS.batch_size / NB_TRAIN))
optimizer = dp_optimizer.DPAdamGaussianOptimizer(
l2_norm_clip=FLAGS.l2_norm_clip,
noise_multiplier=FLAGS.noise_multiplier,
num_microbatches=FLAGS.microbatches,
ledger=ledger,
learning_rate=FLAGS.learning_rate,
unroll_microbatches=True)
opt_loss = vector_loss
else:
optimizer = tf.train.AdamOptimizer(
learning_rate=FLAGS.learning_rate)
opt_loss = scalar_loss
global_step = tf.train.get_global_step()
train_op = optimizer.minimize(loss=opt_loss, global_step=global_step)
return tf.estimator.EstimatorSpec(mode=mode,
loss=scalar_loss,
train_op=train_op)
# Add evaluation metrics (for EVAL mode).
elif mode == tf.estimator.ModeKeys.EVAL:
eval_metric_ops = {
'accuracy':
tf.metrics.accuracy(
labels=tf.cast(x[:, 1:], dtype=tf.int32),
predictions=tf.argmax(input=logits, axis=2))
}
return tf.estimator.EstimatorSpec(mode=mode,
loss=scalar_loss,
eval_metric_ops=eval_metric_ops)
def load_data():
"""Load training and validation data."""
if not FLAGS.data_dir:
print('FLAGS.data_dir containing train.txt and test.txt was not specified, '
'using a substitute dataset from the tensorflow_datasets module.')
train_dataset = tfds.load(name='lm1b/subwords8k',
split=tfds.Split.TRAIN,
batch_size=NB_TRAIN,
shuffle_files=True)
test_dataset = tfds.load(name='lm1b/subwords8k',
split=tfds.Split.TEST,
batch_size=10000)
train_data = next(tfds.as_numpy(train_dataset))
test_data = next(tfds.as_numpy(test_dataset))
train_data = train_data['text'].flatten()
test_data = test_data['text'].flatten()
else:
train_fpath = os.path.join(FLAGS.data_dir, 'train.txt')
test_fpath = os.path.join(FLAGS.data_dir, 'test.txt')
train_txt = open(train_fpath).read().split()
test_txt = open(test_fpath).read().split()
keys = sorted(set(train_txt))
remap = {k: i for i, k in enumerate(keys)}
train_data = np.array([remap[x] for x in train_txt], dtype=np.uint8)
test_data = np.array([remap[x] for x in test_txt], dtype=np.uint8)
return train_data, test_data
def compute_epsilon(steps):
"""Computes epsilon value for given hyperparameters."""
if FLAGS.noise_multiplier == 0.0:
return float('inf')
orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
sampling_probability = FLAGS.batch_size / NB_TRAIN
rdp = compute_rdp(q=sampling_probability,
noise_multiplier=FLAGS.noise_multiplier,
steps=steps,
orders=orders)
  # Delta is set to 1e-5, which is below the inverse of the number of
  # training points (NB_TRAIN).
return get_privacy_spent(orders, rdp, target_delta=1e-5)[0]
def main(unused_argv):
tf.logging.set_verbosity(tf.logging.INFO)
if FLAGS.batch_size % FLAGS.microbatches != 0:
    raise ValueError('Number of microbatches should evenly divide batch_size')
# Load training and test data.
train_data, test_data = load_data()
# Instantiate the tf.Estimator.
conf = tf.estimator.RunConfig(save_summary_steps=1000)
lm_classifier = tf.estimator.Estimator(model_fn=rnn_model_fn,
model_dir=FLAGS.model_dir,
config=conf)
# Create tf.Estimator input functions for the training and test data.
batch_len = FLAGS.batch_size * SEQ_LEN
train_data_end = len(train_data) - len(train_data) % batch_len
test_data_end = len(test_data) - len(test_data) % batch_len
train_input_fn = tf.estimator.inputs.numpy_input_fn(
x={'x': train_data[:train_data_end]},
batch_size=batch_len,
num_epochs=FLAGS.epochs,
shuffle=False)
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
x={'x': test_data[:test_data_end]},
batch_size=batch_len,
num_epochs=1,
shuffle=False)
# Training loop.
steps_per_epoch = len(train_data) // batch_len
for epoch in range(1, FLAGS.epochs + 1):
print('epoch', epoch)
# Train the model for one epoch.
lm_classifier.train(input_fn=train_input_fn, steps=steps_per_epoch)
if epoch % 5 == 0:
name_input_fn = [('Train', train_input_fn), ('Eval', eval_input_fn)]
for name, input_fn in name_input_fn:
# Evaluate the model and print results
eval_results = lm_classifier.evaluate(input_fn=input_fn)
result_tuple = (epoch, eval_results['accuracy'], eval_results['loss'])
print(name, 'accuracy after %d epochs is: %.3f (%.4f)' % result_tuple)
# Compute the privacy budget expended so far.
if FLAGS.dpsgd:
eps = compute_epsilon(epoch * steps_per_epoch)
print('For delta=1e-5, the current epsilon is: %.2f' % eps)
else:
print('Trained with vanilla non-private SGD optimizer')
if __name__ == '__main__':
app.run(main)

View file

@ -1,212 +0,0 @@
# Copyright 2018, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Training a CNN on MNIST with differentially private SGD optimizer."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import app
from absl import flags
from distutils.version import LooseVersion
import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.analysis import privacy_ledger
from tensorflow_privacy.privacy.analysis.rdp_accountant import compute_rdp_from_ledger
from tensorflow_privacy.privacy.analysis.rdp_accountant import get_privacy_spent
from tensorflow_privacy.privacy.optimizers import dp_optimizer
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
GradientDescentOptimizer = tf.train.GradientDescentOptimizer
else:
GradientDescentOptimizer = tf.optimizers.SGD # pylint: disable=invalid-name
FLAGS = flags.FLAGS
flags.DEFINE_boolean(
'dpsgd', True, 'If True, train with DP-SGD. If False, '
'train with vanilla SGD.')
flags.DEFINE_float('learning_rate', .15, 'Learning rate for training')
flags.DEFINE_float('noise_multiplier', 1.1,
'Ratio of the standard deviation to the clipping norm')
flags.DEFINE_float('l2_norm_clip', 1.0, 'Clipping norm')
flags.DEFINE_integer('batch_size', 256, 'Batch size')
flags.DEFINE_integer('epochs', 60, 'Number of epochs')
flags.DEFINE_integer(
'microbatches', 256, 'Number of microbatches '
'(must evenly divide batch_size)')
flags.DEFINE_string('model_dir', None, 'Model directory')
class EpsilonPrintingTrainingHook(tf.estimator.SessionRunHook):
"""Training hook to print current value of epsilon after an epoch."""
def __init__(self, ledger):
"""Initalizes the EpsilonPrintingTrainingHook.
Args:
ledger: The privacy ledger.
"""
self._samples, self._queries = ledger.get_unformatted_ledger()
def end(self, session):
orders = [1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))
samples = session.run(self._samples)
queries = session.run(self._queries)
formatted_ledger = privacy_ledger.format_ledger(samples, queries)
rdp = compute_rdp_from_ledger(formatted_ledger, orders)
eps = get_privacy_spent(orders, rdp, target_delta=1e-5)[0]
print('For delta=1e-5, the current epsilon is: %.2f' % eps)
def cnn_model_fn(features, labels, mode):
"""Model function for a CNN."""
# Define CNN architecture using tf.keras.layers.
input_layer = tf.reshape(features['x'], [-1, 28, 28, 1])
y = tf.keras.layers.Conv2D(16, 8,
strides=2,
padding='same',
activation='relu').apply(input_layer)
y = tf.keras.layers.MaxPool2D(2, 1).apply(y)
y = tf.keras.layers.Conv2D(32, 4,
strides=2,
padding='valid',
activation='relu').apply(y)
y = tf.keras.layers.MaxPool2D(2, 1).apply(y)
y = tf.keras.layers.Flatten().apply(y)
y = tf.keras.layers.Dense(32, activation='relu').apply(y)
logits = tf.keras.layers.Dense(10).apply(y)
# Calculate loss as a vector (to support microbatches in DP-SGD).
vector_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=labels, logits=logits)
# Define mean of loss across minibatch (for reporting through tf.Estimator).
scalar_loss = tf.reduce_mean(vector_loss)
# Configure the training op (for TRAIN mode).
if mode == tf.estimator.ModeKeys.TRAIN:
if FLAGS.dpsgd:
ledger = privacy_ledger.PrivacyLedger(
population_size=60000,
selection_probability=(FLAGS.batch_size / 60000))
# Use DP version of GradientDescentOptimizer. Other optimizers are
# available in dp_optimizer. Most optimizers inheriting from
# tf.train.Optimizer should be wrappable in differentially private
# counterparts by calling dp_optimizer.optimizer_from_args().
optimizer = dp_optimizer.DPGradientDescentGaussianOptimizer(
l2_norm_clip=FLAGS.l2_norm_clip,
noise_multiplier=FLAGS.noise_multiplier,
num_microbatches=FLAGS.microbatches,
ledger=ledger,
learning_rate=FLAGS.learning_rate)
training_hooks = [
EpsilonPrintingTrainingHook(ledger)
]
opt_loss = vector_loss
else:
optimizer = GradientDescentOptimizer(learning_rate=FLAGS.learning_rate)
training_hooks = []
opt_loss = scalar_loss
global_step = tf.train.get_global_step()
train_op = optimizer.minimize(loss=opt_loss, global_step=global_step)
# In the following, we pass the mean of the loss (scalar_loss) rather than
# the vector_loss because tf.estimator requires a scalar loss. This is only
# used for evaluation and debugging by tf.estimator. The actual loss being
# minimized is opt_loss defined above and passed to optimizer.minimize().
return tf.estimator.EstimatorSpec(mode=mode,
loss=scalar_loss,
train_op=train_op,
training_hooks=training_hooks)
# Add evaluation metrics (for EVAL mode).
elif mode == tf.estimator.ModeKeys.EVAL:
eval_metric_ops = {
'accuracy':
tf.metrics.accuracy(
labels=labels,
predictions=tf.argmax(input=logits, axis=1))
}
return tf.estimator.EstimatorSpec(mode=mode,
loss=scalar_loss,
eval_metric_ops=eval_metric_ops)
def load_mnist():
"""Loads MNIST and preprocesses to combine training and validation data."""
train, test = tf.keras.datasets.mnist.load_data()
train_data, train_labels = train
test_data, test_labels = test
train_data = np.array(train_data, dtype=np.float32) / 255
test_data = np.array(test_data, dtype=np.float32) / 255
train_labels = np.array(train_labels, dtype=np.int32)
test_labels = np.array(test_labels, dtype=np.int32)
assert train_data.min() == 0.
assert train_data.max() == 1.
assert test_data.min() == 0.
assert test_data.max() == 1.
assert train_labels.ndim == 1
assert test_labels.ndim == 1
return train_data, train_labels, test_data, test_labels
def main(unused_argv):
tf.logging.set_verbosity(tf.logging.INFO)
if FLAGS.dpsgd and FLAGS.batch_size % FLAGS.microbatches != 0:
    raise ValueError('Number of microbatches should evenly divide batch_size')
# Load training and test data.
train_data, train_labels, test_data, test_labels = load_mnist()
# Instantiate the tf.Estimator.
mnist_classifier = tf.estimator.Estimator(model_fn=cnn_model_fn,
model_dir=FLAGS.model_dir)
# Create tf.Estimator input functions for the training and test data.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
x={'x': train_data},
y=train_labels,
batch_size=FLAGS.batch_size,
num_epochs=FLAGS.epochs,
shuffle=True)
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
x={'x': test_data},
y=test_labels,
num_epochs=1,
shuffle=False)
# Training loop.
steps_per_epoch = 60000 // FLAGS.batch_size
for epoch in range(1, FLAGS.epochs + 1):
# Train the model for one epoch.
mnist_classifier.train(input_fn=train_input_fn, steps=steps_per_epoch)
# Evaluate the model and print results
eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
test_accuracy = eval_results['accuracy']
print('Test accuracy after %d epochs is: %.3f' % (epoch, test_accuracy))
if __name__ == '__main__':
app.run(main)

View file

@ -1,153 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Training a CNN on MNIST in TF Eager mode with DP-SGD optimizer."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import app
from absl import flags
from distutils.version import LooseVersion
import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.analysis.rdp_accountant import compute_rdp
from tensorflow_privacy.privacy.analysis.rdp_accountant import get_privacy_spent
from tensorflow_privacy.privacy.optimizers.dp_optimizer import DPGradientDescentGaussianOptimizer
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
GradientDescentOptimizer = tf.train.GradientDescentOptimizer
tf.enable_eager_execution()
else:
GradientDescentOptimizer = tf.optimizers.SGD # pylint: disable=invalid-name
flags.DEFINE_boolean('dpsgd', True, 'If True, train with DP-SGD. If False, '
'train with vanilla SGD.')
flags.DEFINE_float('learning_rate', 0.15, 'Learning rate for training')
flags.DEFINE_float('noise_multiplier', 1.1,
'Ratio of the standard deviation to the clipping norm')
flags.DEFINE_float('l2_norm_clip', 1.0, 'Clipping norm')
flags.DEFINE_integer('batch_size', 250, 'Batch size')
flags.DEFINE_integer('epochs', 60, 'Number of epochs')
flags.DEFINE_integer('microbatches', 250, 'Number of microbatches '
'(must evenly divide batch_size)')
FLAGS = flags.FLAGS
def compute_epsilon(steps):
"""Computes epsilon value for given hyperparameters."""
if FLAGS.noise_multiplier == 0.0:
return float('inf')
orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
sampling_probability = FLAGS.batch_size / 60000
rdp = compute_rdp(q=sampling_probability,
noise_multiplier=FLAGS.noise_multiplier,
steps=steps,
orders=orders)
# Delta is set to 1e-5 because MNIST has 60000 training points.
return get_privacy_spent(orders, rdp, target_delta=1e-5)[0]
def main(_):
if FLAGS.dpsgd and FLAGS.batch_size % FLAGS.microbatches != 0:
    raise ValueError('Number of microbatches should evenly divide batch_size')
# Fetch the mnist data
train, test = tf.keras.datasets.mnist.load_data()
train_images, train_labels = train
test_images, test_labels = test
# Create a dataset object and batch for the training data
dataset = tf.data.Dataset.from_tensor_slices(
(tf.cast(train_images[..., tf.newaxis]/255, tf.float32),
tf.cast(train_labels, tf.int64)))
dataset = dataset.shuffle(1000).batch(FLAGS.batch_size)
# Create a dataset object and batch for the test data
eval_dataset = tf.data.Dataset.from_tensor_slices(
(tf.cast(test_images[..., tf.newaxis]/255, tf.float32),
tf.cast(test_labels, tf.int64)))
eval_dataset = eval_dataset.batch(10000)
# Define the model using tf.keras.layers
mnist_model = tf.keras.Sequential([
tf.keras.layers.Conv2D(16, 8,
strides=2,
padding='same',
activation='relu'),
tf.keras.layers.MaxPool2D(2, 1),
tf.keras.layers.Conv2D(32, 4, strides=2, activation='relu'),
tf.keras.layers.MaxPool2D(2, 1),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(32, activation='relu'),
tf.keras.layers.Dense(10)
])
# Instantiate the optimizer
if FLAGS.dpsgd:
opt = DPGradientDescentGaussianOptimizer(
l2_norm_clip=FLAGS.l2_norm_clip,
noise_multiplier=FLAGS.noise_multiplier,
num_microbatches=FLAGS.microbatches,
learning_rate=FLAGS.learning_rate)
else:
opt = GradientDescentOptimizer(learning_rate=FLAGS.learning_rate)
# Training loop.
steps_per_epoch = 60000 // FLAGS.batch_size
for epoch in range(FLAGS.epochs):
# Train the model for one epoch.
for (_, (images, labels)) in enumerate(dataset.take(-1)):
with tf.GradientTape(persistent=True) as gradient_tape:
# This dummy call is needed to obtain the var list.
logits = mnist_model(images, training=True)
var_list = mnist_model.trainable_variables
# In Eager mode, the optimizer takes a function that returns the loss.
def loss_fn():
logits = mnist_model(images, training=True) # pylint: disable=undefined-loop-variable,cell-var-from-loop
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=labels, logits=logits) # pylint: disable=undefined-loop-variable,cell-var-from-loop
# If training without privacy, the loss is a scalar not a vector.
if not FLAGS.dpsgd:
loss = tf.reduce_mean(loss)
return loss
if FLAGS.dpsgd:
grads_and_vars = opt.compute_gradients(loss_fn, var_list,
gradient_tape=gradient_tape)
else:
grads_and_vars = opt.compute_gradients(loss_fn, var_list)
opt.apply_gradients(grads_and_vars)
# Evaluate the model and print results
for (_, (images, labels)) in enumerate(eval_dataset.take(-1)):
logits = mnist_model(images, training=False)
correct_preds = tf.equal(tf.argmax(logits, axis=1), labels)
test_accuracy = np.mean(correct_preds.numpy())
print('Test accuracy after epoch %d is: %.3f' % (epoch, test_accuracy))
# Compute the privacy budget expended so far.
if FLAGS.dpsgd:
eps = compute_epsilon((epoch + 1) * steps_per_epoch)
print('For delta=1e-5, the current epsilon is: %.2f' % eps)
else:
print('Trained with vanilla non-private SGD optimizer')
if __name__ == '__main__':
app.run(main)

View file

@ -1,150 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Training a CNN on MNIST with Keras and the DP SGD optimizer."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import app
from absl import flags
from distutils.version import LooseVersion
import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.analysis.rdp_accountant import compute_rdp
from tensorflow_privacy.privacy.analysis.rdp_accountant import get_privacy_spent
from tensorflow_privacy.privacy.optimizers.dp_optimizer import DPGradientDescentGaussianOptimizer
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
GradientDescentOptimizer = tf.train.GradientDescentOptimizer
else:
GradientDescentOptimizer = tf.optimizers.SGD # pylint: disable=invalid-name
flags.DEFINE_boolean(
'dpsgd', True, 'If True, train with DP-SGD. If False, '
'train with vanilla SGD.')
flags.DEFINE_float('learning_rate', 0.15, 'Learning rate for training')
flags.DEFINE_float('noise_multiplier', 1.1,
'Ratio of the standard deviation to the clipping norm')
flags.DEFINE_float('l2_norm_clip', 1.0, 'Clipping norm')
flags.DEFINE_integer('batch_size', 250, 'Batch size')
flags.DEFINE_integer('epochs', 60, 'Number of epochs')
flags.DEFINE_integer(
'microbatches', 250, 'Number of microbatches '
'(must evenly divide batch_size)')
flags.DEFINE_string('model_dir', None, 'Model directory')
FLAGS = flags.FLAGS
def compute_epsilon(steps):
"""Computes epsilon value for given hyperparameters."""
if FLAGS.noise_multiplier == 0.0:
return float('inf')
orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
sampling_probability = FLAGS.batch_size / 60000
rdp = compute_rdp(q=sampling_probability,
noise_multiplier=FLAGS.noise_multiplier,
steps=steps,
orders=orders)
# Delta is set to 1e-5 because MNIST has 60000 training points.
return get_privacy_spent(orders, rdp, target_delta=1e-5)[0]
def load_mnist():
"""Loads MNIST and preprocesses to combine training and validation data."""
train, test = tf.keras.datasets.mnist.load_data()
train_data, train_labels = train
test_data, test_labels = test
train_data = np.array(train_data, dtype=np.float32) / 255
test_data = np.array(test_data, dtype=np.float32) / 255
train_data = train_data.reshape(train_data.shape[0], 28, 28, 1)
test_data = test_data.reshape(test_data.shape[0], 28, 28, 1)
train_labels = np.array(train_labels, dtype=np.int32)
test_labels = np.array(test_labels, dtype=np.int32)
train_labels = tf.keras.utils.to_categorical(train_labels, num_classes=10)
test_labels = tf.keras.utils.to_categorical(test_labels, num_classes=10)
assert train_data.min() == 0.
assert train_data.max() == 1.
assert test_data.min() == 0.
assert test_data.max() == 1.
return train_data, train_labels, test_data, test_labels
def main(unused_argv):
tf.logging.set_verbosity(tf.logging.INFO)
if FLAGS.dpsgd and FLAGS.batch_size % FLAGS.microbatches != 0:
    raise ValueError('Number of microbatches should evenly divide batch_size')
# Load training and test data.
train_data, train_labels, test_data, test_labels = load_mnist()
# Define a sequential Keras model
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(16, 8,
strides=2,
padding='same',
activation='relu',
input_shape=(28, 28, 1)),
tf.keras.layers.MaxPool2D(2, 1),
tf.keras.layers.Conv2D(32, 4,
strides=2,
padding='valid',
activation='relu'),
tf.keras.layers.MaxPool2D(2, 1),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(32, activation='relu'),
tf.keras.layers.Dense(10)
])
if FLAGS.dpsgd:
optimizer = DPGradientDescentGaussianOptimizer(
l2_norm_clip=FLAGS.l2_norm_clip,
noise_multiplier=FLAGS.noise_multiplier,
num_microbatches=FLAGS.microbatches,
learning_rate=FLAGS.learning_rate)
# Compute vector of per-example loss rather than its mean over a minibatch.
loss = tf.keras.losses.CategoricalCrossentropy(
from_logits=True, reduction=tf.losses.Reduction.NONE)
else:
optimizer = GradientDescentOptimizer(learning_rate=FLAGS.learning_rate)
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
# Compile model with Keras
model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
# Train model with Keras
model.fit(train_data, train_labels,
epochs=FLAGS.epochs,
validation_data=(test_data, test_labels),
batch_size=FLAGS.batch_size)
# Compute the privacy budget expended.
if FLAGS.dpsgd:
eps = compute_epsilon(FLAGS.epochs * 60000 // FLAGS.batch_size)
print('For delta=1e-5, the current epsilon is: %.2f' % eps)
else:
print('Trained with vanilla non-private SGD optimizer')
if __name__ == '__main__':
app.run(main)

View file

@ -1,207 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Training a CNN on MNIST with vectorized DP-SGD optimizer."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import app
from absl import flags
from distutils.version import LooseVersion
import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.analysis.rdp_accountant import compute_rdp
from tensorflow_privacy.privacy.analysis.rdp_accountant import get_privacy_spent
from tensorflow_privacy.privacy.optimizers import dp_optimizer_vectorized
flags.DEFINE_boolean(
'dpsgd', True, 'If True, train with DP-SGD. If False, '
'train with vanilla SGD.')
flags.DEFINE_float('learning_rate', .15, 'Learning rate for training')
flags.DEFINE_float('noise_multiplier', 1.1,
'Ratio of the standard deviation to the clipping norm')
flags.DEFINE_float('l2_norm_clip', 1.0, 'Clipping norm')
flags.DEFINE_integer('batch_size', 200, 'Batch size')
flags.DEFINE_integer('epochs', 60, 'Number of epochs')
flags.DEFINE_integer(
'microbatches', 200, 'Number of microbatches '
'(must evenly divide batch_size)')
flags.DEFINE_string('model_dir', None, 'Model directory')
FLAGS = flags.FLAGS
NUM_TRAIN_EXAMPLES = 60000
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
GradientDescentOptimizer = tf.train.GradientDescentOptimizer
else:
GradientDescentOptimizer = tf.optimizers.SGD # pylint: disable=invalid-name
def compute_epsilon(steps):
"""Computes epsilon value for given hyperparameters."""
if FLAGS.noise_multiplier == 0.0:
return float('inf')
orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
sampling_probability = FLAGS.batch_size / NUM_TRAIN_EXAMPLES
rdp = compute_rdp(q=sampling_probability,
noise_multiplier=FLAGS.noise_multiplier,
steps=steps,
orders=orders)
  # Delta is set to approximately 1 / (number of training points).
return get_privacy_spent(orders, rdp, target_delta=1e-5)[0]
def cnn_model_fn(features, labels, mode):
"""Model function for a CNN."""
# Define CNN architecture using tf.keras.layers.
input_layer = tf.reshape(features['x'], [-1, 28, 28, 1])
y = tf.keras.layers.Conv2D(16, 8,
strides=2,
padding='same',
activation='relu').apply(input_layer)
y = tf.keras.layers.MaxPool2D(2, 1).apply(y)
y = tf.keras.layers.Conv2D(32, 4,
strides=2,
padding='valid',
activation='relu').apply(y)
y = tf.keras.layers.MaxPool2D(2, 1).apply(y)
y = tf.keras.layers.Flatten().apply(y)
y = tf.keras.layers.Dense(32, activation='relu').apply(y)
logits = tf.keras.layers.Dense(10).apply(y)
# Calculate loss as a vector (to support microbatches in DP-SGD).
vector_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=labels, logits=logits)
# Define mean of loss across minibatch (for reporting through tf.Estimator).
scalar_loss = tf.reduce_mean(vector_loss)
# Configure the training op (for TRAIN mode).
if mode == tf.estimator.ModeKeys.TRAIN:
if FLAGS.dpsgd:
# Use DP version of GradientDescentOptimizer. Other optimizers are
# available in dp_optimizer. Most optimizers inheriting from
# tf.train.Optimizer should be wrappable in differentially private
# counterparts by calling dp_optimizer.optimizer_from_args().
optimizer = dp_optimizer_vectorized.VectorizedDPSGD(
l2_norm_clip=FLAGS.l2_norm_clip,
noise_multiplier=FLAGS.noise_multiplier,
num_microbatches=FLAGS.microbatches,
learning_rate=FLAGS.learning_rate)
opt_loss = vector_loss
else:
optimizer = GradientDescentOptimizer(learning_rate=FLAGS.learning_rate)
opt_loss = scalar_loss
global_step = tf.train.get_global_step()
train_op = optimizer.minimize(loss=opt_loss, global_step=global_step)
# In the following, we pass the mean of the loss (scalar_loss) rather than
# the vector_loss because tf.estimator requires a scalar loss. This is only
# used for evaluation and debugging by tf.estimator. The actual loss being
# minimized is opt_loss defined above and passed to optimizer.minimize().
return tf.estimator.EstimatorSpec(mode=mode,
loss=scalar_loss,
train_op=train_op)
# Add evaluation metrics (for EVAL mode).
elif mode == tf.estimator.ModeKeys.EVAL:
eval_metric_ops = {
'accuracy':
tf.metrics.accuracy(
labels=labels,
predictions=tf.argmax(input=logits, axis=1))
}
return tf.estimator.EstimatorSpec(mode=mode,
loss=scalar_loss,
eval_metric_ops=eval_metric_ops)
def load_mnist():
"""Loads MNIST and preprocesses to combine training and validation data."""
train, test = tf.keras.datasets.mnist.load_data()
train_data, train_labels = train
test_data, test_labels = test
train_data = np.array(train_data, dtype=np.float32) / 255
test_data = np.array(test_data, dtype=np.float32) / 255
train_labels = np.array(train_labels, dtype=np.int32)
test_labels = np.array(test_labels, dtype=np.int32)
assert train_data.min() == 0.
assert train_data.max() == 1.
assert test_data.min() == 0.
assert test_data.max() == 1.
assert train_labels.ndim == 1
assert test_labels.ndim == 1
return train_data, train_labels, test_data, test_labels
def main(unused_argv):
tf.logging.set_verbosity(tf.logging.INFO)
if FLAGS.dpsgd and FLAGS.batch_size % FLAGS.microbatches != 0:
    raise ValueError('Number of microbatches should evenly divide batch_size')
# Load training and test data.
train_data, train_labels, test_data, test_labels = load_mnist()
# Instantiate the tf.Estimator.
mnist_classifier = tf.estimator.Estimator(model_fn=cnn_model_fn,
model_dir=FLAGS.model_dir)
# Create tf.Estimator input functions for the training and test data.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
x={'x': train_data},
y=train_labels,
batch_size=FLAGS.batch_size,
num_epochs=FLAGS.epochs,
shuffle=True)
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
x={'x': test_data},
y=test_labels,
num_epochs=1,
shuffle=False)
# Training loop.
steps_per_epoch = NUM_TRAIN_EXAMPLES // FLAGS.batch_size
for epoch in range(1, FLAGS.epochs + 1):
# Train the model for one epoch.
mnist_classifier.train(input_fn=train_input_fn, steps=steps_per_epoch)
# Evaluate the model and print results
eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
test_accuracy = eval_results['accuracy']
print('Test accuracy after %d epochs is: %.3f' % (epoch, test_accuracy))
# Compute the privacy budget expended.
if FLAGS.dpsgd:
eps = compute_epsilon(epoch * NUM_TRAIN_EXAMPLES // FLAGS.batch_size)
print('For delta=1e-5, the current epsilon is: %.2f' % eps)
else:
print('Trained with vanilla non-private SGD optimizer')
if __name__ == '__main__':
app.run(main)

View file

@ -1,250 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""DP Logistic Regression on MNIST.
DP Logistic Regression on MNIST with support for privacy-by-iteration analysis.
Vitaly Feldman, Ilya Mironov, Kunal Talwar, and Abhradeep Thakurta.
"Privacy amplification by iteration."
In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS),
pp. 521-532. IEEE, 2018.
https://arxiv.org/abs/1808.06651.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from absl import app
from absl import flags
from distutils.version import LooseVersion
import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.analysis.rdp_accountant import compute_rdp
from tensorflow_privacy.privacy.analysis.rdp_accountant import get_privacy_spent
from tensorflow_privacy.privacy.optimizers import dp_optimizer
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
GradientDescentOptimizer = tf.train.GradientDescentOptimizer
else:
GradientDescentOptimizer = tf.optimizers.SGD # pylint: disable=invalid-name
FLAGS = flags.FLAGS
flags.DEFINE_boolean(
'dpsgd', True, 'If True, train with DP-SGD. If False, '
'train with vanilla SGD.')
flags.DEFINE_float('learning_rate', 0.001, 'Learning rate for training')
flags.DEFINE_float('noise_multiplier', 0.05,
'Ratio of the standard deviation to the clipping norm')
flags.DEFINE_integer('batch_size', 5, 'Batch size')
flags.DEFINE_integer('epochs', 5, 'Number of epochs')
flags.DEFINE_float('regularizer', 0, 'L2 regularizer coefficient')
flags.DEFINE_string('model_dir', None, 'Model directory')
flags.DEFINE_float('data_l2_norm', 8, 'Bound on the L2 norm of normalized data')
def lr_model_fn(features, labels, mode, nclasses, dim):
"""Model function for logistic regression."""
input_layer = tf.reshape(features['x'], tuple([-1]) + dim)
logits = tf.layers.dense(
inputs=input_layer,
units=nclasses,
kernel_regularizer=tf.contrib.layers.l2_regularizer(
scale=FLAGS.regularizer),
bias_regularizer=tf.contrib.layers.l2_regularizer(
scale=FLAGS.regularizer))
# Calculate loss as a vector (to support microbatches in DP-SGD).
vector_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=labels, logits=logits) + tf.losses.get_regularization_loss()
# Define mean of loss across minibatch (for reporting through tf.Estimator).
scalar_loss = tf.reduce_mean(vector_loss)
# Configure the training op (for TRAIN mode).
if mode == tf.estimator.ModeKeys.TRAIN:
if FLAGS.dpsgd:
# The loss function is L-Lipschitz with L = sqrt(2*(||x||^2 + 1)) where
# ||x|| is the norm of the data.
# We don't use microbatches (thus speeding up computation), since no
# clipping is necessary due to data normalization.
optimizer = dp_optimizer.DPGradientDescentGaussianOptimizer(
l2_norm_clip=math.sqrt(2 * (FLAGS.data_l2_norm**2 + 1)),
noise_multiplier=FLAGS.noise_multiplier,
num_microbatches=1,
learning_rate=FLAGS.learning_rate)
opt_loss = vector_loss
else:
optimizer = GradientDescentOptimizer(learning_rate=FLAGS.learning_rate)
opt_loss = scalar_loss
global_step = tf.train.get_global_step()
train_op = optimizer.minimize(loss=opt_loss, global_step=global_step)
# In the following, we pass the mean of the loss (scalar_loss) rather than
# the vector_loss because tf.estimator requires a scalar loss. This is only
# used for evaluation and debugging by tf.estimator. The actual loss being
# minimized is opt_loss defined above and passed to optimizer.minimize().
return tf.estimator.EstimatorSpec(
mode=mode, loss=scalar_loss, train_op=train_op)
# Add evaluation metrics (for EVAL mode).
elif mode == tf.estimator.ModeKeys.EVAL:
eval_metric_ops = {
'accuracy':
tf.metrics.accuracy(
labels=labels, predictions=tf.argmax(input=logits, axis=1))
}
return tf.estimator.EstimatorSpec(
mode=mode, loss=scalar_loss, eval_metric_ops=eval_metric_ops)
def normalize_data(data, data_l2_norm):
"""Normalizes data such that each samples has bounded L2 norm.
Args:
data: the dataset. Each row represents one samples.
data_l2_norm: the target upper bound on the L2 norm.
"""
for i in range(data.shape[0]):
norm = np.linalg.norm(data[i])
if norm > data_l2_norm:
data[i] = data[i] / norm * data_l2_norm
def load_mnist(data_l2_norm=float('inf')):
"""Loads MNIST and preprocesses to combine training and validation data."""
train, test = tf.keras.datasets.mnist.load_data()
train_data, train_labels = train
test_data, test_labels = test
train_data = np.array(train_data, dtype=np.float32) / 255
test_data = np.array(test_data, dtype=np.float32) / 255
train_data = train_data.reshape(train_data.shape[0], -1)
test_data = test_data.reshape(test_data.shape[0], -1)
idx = np.random.permutation(len(train_data)) # shuffle data once
train_data = train_data[idx]
train_labels = train_labels[idx]
normalize_data(train_data, data_l2_norm)
normalize_data(test_data, data_l2_norm)
train_labels = np.array(train_labels, dtype=np.int32)
test_labels = np.array(test_labels, dtype=np.int32)
return train_data, train_labels, test_data, test_labels
def print_privacy_guarantees(epochs, batch_size, samples, noise_multiplier):
"""Tabulating position-dependent privacy guarantees."""
if noise_multiplier == 0:
print('No differential privacy (additive noise is 0).')
return
  print('Under the conditions of Theorem 34 (https://arxiv.org/abs/1808.06651), '
        'the training procedure results in the following privacy guarantees.')
print('Out of the total of {} samples:'.format(samples))
steps_per_epoch = samples // batch_size
orders = np.concatenate(
[np.linspace(2, 20, num=181),
np.linspace(20, 100, num=81)])
delta = 1e-5
for p in (.5, .9, .99):
steps = math.ceil(steps_per_epoch * p) # Steps in the last epoch.
coef = 2 * (noise_multiplier * batch_size)**-2 * (
# Accounting for privacy loss
(epochs - 1) / steps_per_epoch + # ... from all-but-last epochs
1 / (steps_per_epoch - steps + 1)) # ... due to the last epoch
# Using RDP accountant to compute eps. Doing computation analytically is
# an option.
rdp = [order * coef for order in orders]
eps, _, _ = get_privacy_spent(orders, rdp, target_delta=delta)
print('\t{:g}% enjoy at least ({:.2f}, {})-DP'.format(
p * 100, eps, delta))
# Compute privacy guarantees for the Sampled Gaussian Mechanism.
rdp_sgm = compute_rdp(batch_size / samples, noise_multiplier,
epochs * steps_per_epoch, orders)
eps_sgm, _, _ = get_privacy_spent(orders, rdp_sgm, target_delta=delta)
print('By comparison, DP-SGD analysis for training done with the same '
'parameters and random shuffling in each epoch guarantees '
'({:.2f}, {})-DP for all samples.'.format(eps_sgm, delta))
def main(unused_argv):
tf.logging.set_verbosity(tf.logging.INFO)
if FLAGS.data_l2_norm <= 0:
raise ValueError('data_l2_norm must be positive.')
if FLAGS.dpsgd and FLAGS.learning_rate > 8 / FLAGS.data_l2_norm**2:
    raise ValueError('The amplification-by-iteration analysis requires '
                     'learning_rate <= 2 / beta, where beta is the smoothness '
                     'of the loss function and is upper bounded by ||x||^2 / 4 '
                     'with ||x|| being the largest L2 norm of the samples.')
# Load training and test data.
# Smoothness = ||x||^2 / 4 where ||x|| is the largest L2 norm of the samples.
# To get bounded smoothness, we normalize the data such that each sample has a
# bounded L2 norm.
train_data, train_labels, test_data, test_labels = load_mnist(
data_l2_norm=FLAGS.data_l2_norm)
# Instantiate tf.Estimator.
# pylint: disable=g-long-lambda
model_fn = lambda features, labels, mode: lr_model_fn(
features, labels, mode, nclasses=10, dim=train_data.shape[1:])
mnist_classifier = tf.estimator.Estimator(
model_fn=model_fn, model_dir=FLAGS.model_dir)
# Create tf.Estimator input functions for the training and test data.
# To analyze the per-user privacy loss, we keep the same orders of samples in
# each epoch by setting shuffle=False.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
x={'x': train_data},
y=train_labels,
batch_size=FLAGS.batch_size,
num_epochs=FLAGS.epochs,
shuffle=False)
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
x={'x': test_data}, y=test_labels, num_epochs=1, shuffle=False)
# Train the model.
num_samples = train_data.shape[0]
steps_per_epoch = num_samples // FLAGS.batch_size
mnist_classifier.train(
input_fn=train_input_fn, steps=steps_per_epoch * FLAGS.epochs)
# Evaluate the model and print results.
eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
print('Test accuracy after {} epochs is: {:.2f}'.format(
FLAGS.epochs, eval_results['accuracy']))
if FLAGS.dpsgd:
print_privacy_guarantees(
epochs=FLAGS.epochs,
batch_size=FLAGS.batch_size,
samples=num_samples,
noise_multiplier=FLAGS.noise_multiplier,
)
if __name__ == '__main__':
app.run(main)

View file

@ -1,134 +0,0 @@
# Copyright 2019, The TensorFlow Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Scratchpad for training a CNN on MNIST with DPSGD."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import tensorflow as tf
tf.flags.DEFINE_float('learning_rate', .15, 'Learning rate for training')
tf.flags.DEFINE_integer('batch_size', 256, 'Batch size')
tf.flags.DEFINE_integer('epochs', 15, 'Number of epochs')
FLAGS = tf.flags.FLAGS
def cnn_model_fn(features, labels, mode):
"""Model function for a CNN."""
# Define CNN architecture using tf.keras.layers.
input_layer = tf.reshape(features['x'], [-1, 28, 28, 1])
y = tf.keras.layers.Conv2D(16, 8,
strides=2,
padding='same',
activation='relu').apply(input_layer)
y = tf.keras.layers.MaxPool2D(2, 1).apply(y)
y = tf.keras.layers.Conv2D(32, 4,
strides=2,
padding='valid',
activation='relu').apply(y)
y = tf.keras.layers.MaxPool2D(2, 1).apply(y)
y = tf.keras.layers.Flatten().apply(y)
y = tf.keras.layers.Dense(32, activation='relu').apply(y)
logits = tf.keras.layers.Dense(10).apply(y)
# Calculate loss as a vector and as its average across minibatch.
vector_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
logits=logits)
scalar_loss = tf.reduce_mean(vector_loss)
# Configure the training op (for TRAIN mode).
if mode == tf.estimator.ModeKeys.TRAIN:
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
opt_loss = scalar_loss
global_step = tf.train.get_global_step()
train_op = optimizer.minimize(loss=opt_loss, global_step=global_step)
return tf.estimator.EstimatorSpec(mode=mode,
loss=scalar_loss,
train_op=train_op)
# Add evaluation metrics (for EVAL mode).
elif mode == tf.estimator.ModeKeys.EVAL:
eval_metric_ops = {
'accuracy':
tf.metrics.accuracy(
labels=labels,
predictions=tf.argmax(input=logits, axis=1))
}
return tf.estimator.EstimatorSpec(mode=mode,
loss=scalar_loss,
eval_metric_ops=eval_metric_ops)
def load_mnist():
"""Loads MNIST and preprocesses to combine training and validation data."""
train, test = tf.keras.datasets.mnist.load_data()
train_data, train_labels = train
test_data, test_labels = test
train_data = np.array(train_data, dtype=np.float32) / 255
test_data = np.array(test_data, dtype=np.float32) / 255
train_labels = np.array(train_labels, dtype=np.int32)
test_labels = np.array(test_labels, dtype=np.int32)
assert train_data.min() == 0.
assert train_data.max() == 1.
assert test_data.min() == 0.
assert test_data.max() == 1.
assert train_labels.ndim == 1
assert test_labels.ndim == 1
return train_data, train_labels, test_data, test_labels
def main(unused_argv):
tf.logging.set_verbosity(tf.logging.INFO)
# Load training and test data.
train_data, train_labels, test_data, test_labels = load_mnist()
# Instantiate the tf.Estimator.
mnist_classifier = tf.estimator.Estimator(model_fn=cnn_model_fn)
# Create tf.Estimator input functions for the training and test data.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
x={'x': train_data},
y=train_labels,
batch_size=FLAGS.batch_size,
num_epochs=FLAGS.epochs,
shuffle=True)
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
x={'x': test_data},
y=test_labels,
num_epochs=1,
shuffle=False)
# Training loop.
steps_per_epoch = 60000 // FLAGS.batch_size
for epoch in range(1, FLAGS.epochs + 1):
# Train the model for one epoch.
mnist_classifier.train(input_fn=train_input_fn, steps=steps_per_epoch)
# Evaluate the model and print results
eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
test_accuracy = eval_results['accuracy']
print('Test accuracy after %d epochs is: %.3f' % (epoch, test_accuracy))
if __name__ == '__main__':
tf.app.run()

View file

@ -1,431 +0,0 @@
# Machine Learning with Differential Privacy in TensorFlow
*Cross-posted from [cleverhans.io](http://www.cleverhans.io/privacy/2019/03/26/machine-learning-with-differential-privacy-in-tensorflow.html)*
Differential privacy is a framework for measuring the privacy guarantees
provided by an algorithm. Through the lens of differential privacy, we can
design machine learning algorithms that responsibly train models on private
data. Learning with differential privacy provides provable guarantees of
privacy, mitigating the risk of exposing sensitive training data in machine
learning. Intuitively, a model trained with differential privacy should not be
affected by any single training example, or small set of training examples, in its data set.
You may recall our [previous blog post on PATE](http://www.cleverhans.io/privacy/2018/04/29/privacy-and-machine-learning.html),
an approach that achieves private learning by carefully
coordinating the activity of several different ML
models [[Papernot et al.]](https://arxiv.org/abs/1610.05755).
In this post, you will learn how to train a differentially private model with
another approach that relies on Differentially
Private Stochastic Gradient Descent (DP-SGD) [[Abadi et al.]](https://arxiv.org/abs/1607.00133).
DP-SGD and PATE are two different ways to achieve the same goal of privacy-preserving
machine learning. DP-SGD makes fewer assumptions about the ML task than PATE,
but this comes at the expense of making modifications to the training algorithm.
Indeed, DP-SGD is
a modification of the stochastic gradient descent algorithm,
which is the basis for many optimizers that are popular in machine learning.
Models trained with DP-SGD have provable privacy guarantees expressed in terms
of differential privacy (we will explain what this means at the end of this
post). We will be using the [TensorFlow Privacy](https://github.com/tensorflow/privacy) library,
which provides an implementation of DP-SGD, to illustrate our presentation of DP-SGD
and provide a hands-on tutorial.
The only prerequisite for following this tutorial is to be able to train a
simple neural network with TensorFlow. If you are not familiar with
convolutional neural networks or how to train them, we recommend reading
[this tutorial first](https://www.tensorflow.org/tutorials/keras/basic_classification)
to get started with TensorFlow and machine learning.
Upon completing the tutorial presented in this post,
you will be able to wrap existing optimizers
(e.g., SGD, Adam, ...) into their differentially private counterparts using
TensorFlow (TF) Privacy. You will also learn how to tune the parameters
introduced by differentially private optimization. Finally, you will learn how to
measure the privacy guarantees provided, using analysis tools included in TF
Privacy.
## Getting started
Before we get started with DP-SGD and TF Privacy, we need to put together a
script that trains a simple neural network with TensorFlow.
In the interest of keeping this tutorial focused on the privacy aspects of
training, we've included
such a script as companion code for this blog post in the `walkthrough` [subdirectory](https://github.com/tensorflow/privacy/tree/master/tutorials/walkthrough) of the
`tutorials` directory in the [TensorFlow Privacy](https://github.com/tensorflow/privacy) repository. The code in the file `mnist_scratch.py`
trains a small
convolutional neural network on the MNIST dataset for handwriting recognition.
This script will be used as the basis for our exercise below.
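If you'd like to follow along locally, one way to get the companion code (assuming you have `git` and a working TensorFlow installation) is to clone the repository and move into the walkthrough directory:

```shell
git clone https://github.com/tensorflow/privacy.git
cd privacy/tutorials/walkthrough
```
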
Next, we highlight some important code snippets from the `mnist_scratch.py`
script.
The first snippet includes the definition of a convolutional neural network
using `tf.keras.layers`. The model contains two convolutional layers coupled
with max pooling layers, a fully-connected layer, and a softmax. The model's
output is a vector where each component indicates how likely the input is to be
in one of the 10 classes of the handwriting recognition problem we considered.
If any of this sounds unfamiliar, we recommend reading
[this tutorial first](https://www.tensorflow.org/tutorials/keras/basic_classification)
to get started with TensorFlow and machine learning.
```python
input_layer = tf.reshape(features['x'], [-1, 28, 28, 1])
y = tf.keras.layers.Conv2D(16, 8,
                           strides=2,
                           padding='same',
                           activation='relu').apply(input_layer)
y = tf.keras.layers.MaxPool2D(2, 1).apply(y)
y = tf.keras.layers.Conv2D(32, 4,
                           strides=2,
                           padding='valid',
                           activation='relu').apply(y)
y = tf.keras.layers.MaxPool2D(2, 1).apply(y)
y = tf.keras.layers.Flatten().apply(y)
y = tf.keras.layers.Dense(32, activation='relu').apply(y)
logits = tf.keras.layers.Dense(10).apply(y)
predicted_labels = tf.argmax(input=logits, axis=1)
```
The second snippet shows how the model is trained using the `tf.Estimator` API,
which takes care of all the boilerplate code required to form minibatches used
to train and evaluate the model. To prepare ourselves for the modifications we
will be making to provide differential privacy, we still expose the loop over
different epochs of learning: an epoch is defined as one pass over all of the
training points included in the training set.
```python
steps_per_epoch = 60000 // FLAGS.batch_size
for epoch in range(1, FLAGS.epochs + 1):
  # Train the model for one epoch.
  mnist_classifier.train(input_fn=train_input_fn, steps=steps_per_epoch)

  # Evaluate the model and print results
  eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
  test_accuracy = eval_results['accuracy']
  print('Test accuracy after %d epochs is: %.3f' % (epoch, test_accuracy))
```
We are now ready to train our MNIST model without privacy. The model should
achieve above 99% test accuracy after 15 epochs at a learning rate of 0.15 on
minibatches of 256 training points.
```shell
python mnist_scratch.py
```
### Stochastic Gradient Descent
Before we dive into how DP-SGD and TF Privacy can be used to provide differential privacy
during machine learning, we first provide a brief overview of the stochastic
gradient descent algorithm, which is one of the most popular optimizers for
neural networks.
Stochastic gradient descent is an iterative procedure. At each iteration, a
batch of data is randomly sampled from the training set (this is where the
*stochasticity* comes from). The error between the model's prediction and the
training labels is then computed. This error, also called the loss, is then
differentiated with respect to the model's parameters. These derivatives (or
*gradients*) tell us how we should update each parameter to bring the model
closer to predicting the correct label. Iteratively recomputing gradients and
applying them to update the model's parameters is what is referred to as the
*descent*. To summarize, the following steps are repeated until the model's
performance is satisfactory (a toy sketch of one iteration follows the list):
1. Sample a minibatch of training points `(x, y)` where `x` is an input and `y`
a label.
2. Compute loss (i.e., error) `L(theta, x, y)` between the model's prediction
`f_theta(x)` and label `y` where `theta` represents the model parameters.
3. Compute gradient of the loss `L(theta, x, y)` with respect to the model
parameters `theta`.
4. Multiply these gradients by the learning rate and apply the product to
update model parameters `theta`.
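To make these four steps concrete, here is a toy sketch of a single iteration. It is illustrative only: it uses NumPy and a linear model with squared loss rather than the MNIST network or any TensorFlow API, and the learning rate is just a placeholder value.

```python
import numpy as np


def sgd_step(theta, x_batch, y_batch, learning_rate=0.15):
  """One SGD iteration (steps 2-4) for a toy linear model with squared loss."""
  predictions = x_batch @ theta                     # f_theta(x)
  residuals = predictions - y_batch                 # step 2: prediction error
  gradients = x_batch.T @ residuals / len(x_batch)  # step 3: dL/dtheta
  return theta - learning_rate * gradients          # step 4: parameter update


# Step 1: sample a minibatch from a toy training set.
rng = np.random.default_rng(0)
x_batch, y_batch = rng.normal(size=(256, 10)), rng.normal(size=256)
theta = sgd_step(np.zeros(10), x_batch, y_batch)
```
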
### Modifications needed to make stochastic gradient descent a differentially private algorithm
Two modifications are needed to ensure that stochastic gradient descent is a
differentially private algorithm.
First, the sensitivity of each gradient needs to be bounded. In other words, we
need to limit how much each individual training point sampled in a minibatch can
influence the resulting gradient computation. This can be done by clipping each
gradient computed on each training point between steps 3 and 4 above.
Intuitively, this allows us to bound how much each training point can possibly
impact model parameters.
Second, we need to randomize the algorithm's behavior to make it statistically
impossible to know whether or not a particular point was included in the
training set by comparing the updates stochastic gradient descent applies when
it operates with or without this particular point in the training set. This is
achieved by sampling random noise and adding it to the clipped gradients.
Thus, here is the stochastic gradient descent algorithm adapted from above to be
differentially private (again followed by a toy sketch):
1. Sample a minibatch of training points `(x, y)` where `x` is an input and `y`
a label.
2. Compute loss (i.e., error) `L(theta, x, y)` between the model's prediction
`f_theta(x)` and label `y` where `theta` represents the model parameters.
3. Compute gradient of the loss `L(theta, x, y)` with respect to the model
parameters `theta`.
4. Clip gradients, per training example included in the minibatch, to ensure
each gradient has a known maximum Euclidean norm.
5. Add random noise to the clipped gradients.
6. Multiply these clipped and noised gradients by the learning rate and apply
the product to update model parameters `theta`.
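Continuing the toy NumPy example from above (and, again, not the TF Privacy implementation), the clipping and noising steps can be sketched as follows; the clipping norm and noise scale defaults are placeholders chosen only for illustration.

```python
import numpy as np


def dp_sgd_step(theta, x_batch, y_batch, learning_rate=0.15,
                l2_norm_clip=1.5, noise_multiplier=1.3, rng=None):
  """One DP-SGD iteration (steps 2-6) for the toy linear model above."""
  rng = rng or np.random.default_rng()
  clipped_sum = np.zeros_like(theta)
  for x, y in zip(x_batch, y_batch):
    grad = x * (x @ theta - y)                 # steps 2-3: per-example gradient
    norm = np.linalg.norm(grad)
    grad /= max(1.0, norm / l2_norm_clip)      # step 4: clip to l2_norm_clip
    clipped_sum += grad
  noise = rng.normal(scale=l2_norm_clip * noise_multiplier,
                     size=theta.shape)         # step 5: Gaussian noise
  noisy_mean = (clipped_sum + noise) / len(x_batch)
  return theta - learning_rate * noisy_mean    # step 6: parameter update
```
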
### Implementing DP-SGD with TF Privacy
It's now time to make changes to the code we started with to take into account
the two modifications outlined in the previous paragraph: gradient clipping and
noising. This is where TF Privacy kicks in: it provides code that wraps an
existing TF optimizer to create a variant that performs both of these steps
needed to obtain differential privacy.
As mentioned above, step 1 of the algorithm (forming minibatches of
training data and labels) is implemented by the `tf.Estimator` API in our
tutorial. We can thus go straight to step 2 of the algorithm outlined above and
compute the loss (i.e., model error) between the model's predictions and labels.
```python
vector_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
```
TensorFlow provides implementations of common losses; here we use the
cross-entropy loss, which is well suited to our classification problem. Note how we
computed the loss as a vector, where each component of the vector corresponds to
an individual training point and label. This is required to support the per-example
gradient manipulation performed later, in step 4.
We are now ready to create an optimizer. In TensorFlow, an optimizer object can
be instantiated by passing it a learning rate value, which is used in step 6
outlined above.
This is what the code would look like *without* differential privacy:
```python
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
train_op = optimizer.minimize(loss=scalar_loss)
```
Note that our code snippet assumes that a TensorFlow flag was
defined for the learning rate value.
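In `mnist_scratch.py`, that flag is defined near the top of the file:

```python
import tensorflow as tf

tf.flags.DEFINE_float('learning_rate', .15, 'Learning rate for training')

FLAGS = tf.flags.FLAGS
```
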
Now, we use the `optimizers.dp_optimizer` module of TF Privacy to implement the
optimizer with differential privacy. Under the hood, this code implements steps
3-6 of the algorithm above:
```python
optimizer = optimizers.dp_optimizer.DPGradientDescentGaussianOptimizer(
    l2_norm_clip=FLAGS.l2_norm_clip,
    noise_multiplier=FLAGS.noise_multiplier,
    num_microbatches=FLAGS.microbatches,
    learning_rate=FLAGS.learning_rate,
    population_size=60000)
train_op = optimizer.minimize(loss=vector_loss)
```
In these two code snippets, we used the stochastic gradient descent
optimizer, but it could be replaced by another optimizer implemented in
TensorFlow. For instance, `AdamOptimizer` can be replaced by `DPAdamGaussianOptimizer`. In addition to the standard optimizers already
included in TF Privacy, most optimizers that are subclasses of `tf.train.Optimizer`
can be made differentially private by calling `optimizers.dp_optimizer.make_gaussian_optimizer_class()`.
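For example, a sketch of the Adam-based variant, mirroring the constructor arguments used above (treat the exact signature as an assumption and check the library's `dp_optimizer` module for your version):

```python
# Sketch only: mirrors the arguments passed to DPGradientDescentGaussianOptimizer
# above; check optimizers.dp_optimizer for the exact constructor signature.
optimizer = optimizers.dp_optimizer.DPAdamGaussianOptimizer(
    l2_norm_clip=FLAGS.l2_norm_clip,
    noise_multiplier=FLAGS.noise_multiplier,
    num_microbatches=FLAGS.microbatches,
    learning_rate=FLAGS.learning_rate,
    population_size=60000)
train_op = optimizer.minimize(loss=vector_loss)
```
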
As you can see, only one line needs to change, but there are a few things going
on that are best unpacked before we continue. In addition to the learning rate, we
passed the size of the training set as the `population_size` parameter. This is
used to measure the strength of privacy achieved; we will come back to this
accounting aspect later.
More importantly, TF Privacy introduces three new hyperparameters to the
optimizer object: `l2_norm_clip`, `noise_multiplier`, and `num_microbatches`.
You may have deduced what `l2_norm_clip` and `noise_multiplier` are from the two
changes outlined above.
Parameter `l2_norm_clip` is the maximum Euclidean norm of each individual
gradient that is computed on an individual training example from a minibatch. This
parameter is used to bound the optimizer's sensitivity to individual training
points. Note how in order for the optimizer to be able to compute these per
example gradients, we must pass it a *vector* loss as defined previously, rather
than the loss averaged over the entire minibatch.
Next, the `noise_multiplier` parameter is used to control how much noise is
sampled and added to gradients before they are applied by the optimizer.
Generally, more noise results in better privacy (often, though not always, at
the expense of utility).
The third parameter relates to an aspect of DP-SGD that was not discussed
previously. In practice, clipping gradients on a per example basis can be
detrimental to the performance of our approach because computations can no
longer be batched and parallelized at the granularity of minibatches. Hence, we
introduce a new granularity by splitting each minibatch into multiple
microbatches [[McMahan et al.]](https://arxiv.org/abs/1812.06210). Rather than
clipping gradients on a per example basis, we clip them on a microbatch basis.
For instance, if we have a minibatch of 256 training examples, rather than
clipping each of the 256 gradients individually, we would clip 32 gradients
averaged over microbatches of 8 training examples when `num_microbatches=32`.
This allows for some degree of parallelism. Hence, one can think of
`num_microbatches` as a parameter that allows us to trade off performance (when
the parameter is set to a small value) with utility (when the parameter is set
to a value close to the minibatch size).
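To make the arithmetic concrete, here is a small, self-contained NumPy illustration of the shapes involved (this is not how the library implements microbatching internally):

```python
import numpy as np

batch_size, num_microbatches = 256, 32
vector_loss = np.random.rand(batch_size)  # one loss value per training example

# Each microbatch averages batch_size // num_microbatches = 8 example losses;
# gradients are then computed and clipped 32 times instead of 256 times.
microbatch_losses = vector_loss.reshape(num_microbatches, -1).mean(axis=1)
print(microbatch_losses.shape)  # (32,)
```
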
Once you've implemented all these changes, try training your model again with
the differentially private stochastic gradient optimizer. You can use the
following hyperparameter values to obtain a reasonable model (95% test
accuracy):
```python
learning_rate=0.25
noise_multiplier=1.3
l2_norm_clip=1.5
batch_size=256
epochs=15
num_microbatches=256
```
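If you follow the flag-based style of `mnist_scratch.py`, these values can be exposed as command-line flags. The definitions below are an assumption, chosen only to be consistent with the `FLAGS.*` names used in the optimizer snippet above:

```python
import tensorflow as tf

# Assumed flag definitions matching the FLAGS.* names used in the snippets above.
tf.flags.DEFINE_float('learning_rate', 0.25, 'Learning rate for training')
tf.flags.DEFINE_float('noise_multiplier', 1.3,
                      'Ratio of the noise standard deviation to the clipping norm')
tf.flags.DEFINE_float('l2_norm_clip', 1.5, 'Clipping norm for per-example gradients')
tf.flags.DEFINE_integer('batch_size', 256, 'Batch size')
tf.flags.DEFINE_integer('epochs', 15, 'Number of training epochs')
tf.flags.DEFINE_integer('microbatches', 256,
                        'Number of microbatches (must evenly divide batch_size)')
```
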
### Measuring the privacy guarantee achieved
At this point, we made all the changes needed to train our model with
differential privacy. Congratulations! Yet, we are still missing one crucial
piece of the puzzle: we have not computed the privacy guarantee achieved. Recall
the two modifications we made to the original stochastic gradient descent
algorithm: clip and randomize gradients.
It is intuitive to machine learning practitioners how clipping gradients limits
the ability of the model to overfit to any of its training points. In fact,
gradient clipping is commonly employed in machine learning even when privacy is
not a concern. The intuition for introducing randomness to a learning algorithm
that is already randomized is a little more subtle, but this additional
randomization is required to make it hard to tell which behavioral aspects of
the model defined by the learned parameters came from randomness and which came
from the training data. Without randomness, we would be able to ask questions
like: “What parameters does the learning algorithm choose when we train it on
this specific dataset?” With randomness in the learning algorithm, we instead
ask questions like: “What is the probability that the learning algorithm will
choose parameters in this set of possible parameters, when we train it on this
specific dataset?”
We use a version of differential privacy which requires that the probability of
learning any particular set of parameters stays roughly the same if we change a
single training example in the training set. This could mean adding a training
example, removing one, or changing the values within one training
example. The intuition is that if a single training point does not affect the
outcome of learning, the information contained in that training point cannot be
memorized and the privacy of the individual who contributed this data point to our
dataset is respected. We often refer to this probability as the privacy budget:
smaller privacy budgets correspond to stronger privacy guarantees.
The accounting required to compute the privacy budget spent to train our machine
learning model is another feature provided by TF Privacy. Knowing what level of
differential privacy was achieved allows us to put into perspective the drop in
utility that is often observed when switching to differentially private
optimization. It also allows us to compare two models objectively to determine
which of the two is more privacy-preserving.
Before we derive a bound on the privacy guarantee achieved by our optimizer, we
first need to identify all the parameters that are relevant to measuring the
potential privacy loss induced by training. These are the `noise_multiplier`,
the sampling ratio `q` (the probability of an individual training point being
included in a minibatch), and the number of `steps` the optimizer takes over the
training data. We simply report the `noise_multiplier` value provided to the
optimizer and compute the sampling ratio and number of steps as follows:
```python
noise_multiplier = FLAGS.noise_multiplier
sampling_probability = FLAGS.batch_size / 60000
steps = FLAGS.epochs * 60000 // FLAGS.batch_size
```
At a high level, the privacy analysis measures how including or excluding any
particular point in the training data is likely to change the probability that
we learn any particular set of parameters. In other words, the analysis measures
the difference between the distributions of model parameters on neighboring training
sets (pairs of any training sets with a Hamming distance of 1). In TF Privacy,
we use the Rényi divergence to measure this distance between distributions.
Indeed, our analysis is performed in the framework of Rényi Differential Privacy
(RDP), which is a generalization of pure differential privacy
[[Mironov]](https://arxiv.org/abs/1702.07476). RDP is a useful tool here because
it is particularly well suited to analyze the differential privacy guarantees
provided by sampling followed by Gaussian noise addition, which is how gradients
are randomized in the TF Privacy implementation of the DP-SGD optimizer.
We will express our differential privacy guarantee using two parameters,
`epsilon` and `delta` (their formal definition is recalled after the list below):
* Delta bounds the probability of our privacy guarantee not holding. A rule of
thumb is to set it to be less than the inverse of the training data size
(i.e., the population size). Here, we set it to `10^-5` because MNIST has
60000 training points.
* Epsilon measures the strength of our privacy guarantee. In the case of
differentially private machine learning, it gives a bound on how much the
probability of a particular model output can vary by including (or removing)
a single training example. We usually want it to be a small constant.
However, this is only an upper bound, and a large value of epsilon could
still mean good practical privacy.
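For reference, `epsilon` and `delta` are the two parameters of the standard definition of differential privacy that our analysis instantiates. Written in the same plain notation used elsewhere in this post: for a randomized training algorithm `M`, any two training sets `d` and `d'` differing in a single example, and any set `S` of possible parameter values,

```
Pr[ M(d) in S ]  <=  exp(epsilon) * Pr[ M(d') in S ]  +  delta
```

Smaller values of `epsilon` and `delta` make the two output distributions harder to tell apart, which is exactly the guarantee described above.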
The TF Privacy library provides two methods for deriving the privacy
guarantees achieved from the three parameters outlined in the last code snippet: `compute_rdp`
and `get_privacy_spent`.
These methods are found in its `analysis.rdp_accountant` module. Here is how to use them.
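Both functions need to be imported first. The exact package prefix depends on the version of the library you have installed, so treat the path below as an assumption and adjust it to your checkout:

```python
# Module path is an assumption; depending on the library version, the top-level
# package may be named `privacy` or `tensorflow_privacy`.
from tensorflow_privacy.privacy.analysis.rdp_accountant import compute_rdp
from tensorflow_privacy.privacy.analysis.rdp_accountant import get_privacy_spent
```
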
First, we need to define a list of orders at which the Rényi divergence will be
computed. While some finer points of how to use the RDP accountant are outside the
scope of this document, it is useful to keep in mind the following.
First, there is very little downside to expanding the list of orders for which RDP
is computed. Second, the computed privacy budget is typically not very sensitive to
the exact value of the order (being close enough will land you in the right neighborhood).
Finally, if you are targeting a particular range of epsilons (say, 1 to 10) and your delta is
fixed (say, `10^-5`), then your orders must cover the range between `1+ln(1/delta)/10≈2.15` and
`1+ln(1/delta)/1≈12.5`. This last rule may appear circular (how do you know what privacy
parameters you get without running the privacy accountant?!), but one or two adjustments
of the range of orders will usually suffice.
```python
orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
rdp = compute_rdp(q=sampling_probability,
                  noise_multiplier=FLAGS.noise_multiplier,
                  steps=steps,
                  orders=orders)
```
Then, the method `get_privacy_spent` computes the best `epsilon` for a given
value of delta (`target_delta`) by taking the minimum over all orders.
```python
epsilon = get_privacy_spent(orders, rdp, target_delta=1e-5)[0]
```
Running the code snippets above with the hyperparameter values used during
training will estimate the `epsilon` value that was achieved by the
differentially private optimizer, and thus the strength of the privacy guarantee
that comes with the model we trained. Once we have computed the value of `epsilon`,
interpreting it can be
difficult. One possibility is to purposely
insert secrets in the model's training set and measure how likely
they are to be leaked by a differentially private model
(compared to a non-private model) at inference time
[[Carlini et al.]](https://arxiv.org/abs/1802.08232).
### Putting all the pieces together
We covered a lot in this blog post! If you made all the changes discussed
directly into the `mnist_scratch.py` file, you should have been able to train a
differentially private neural network on MNIST and measure the privacy guarantee
achieved.
However, in case you ran into an issue or you'd like to see what a complete
implementation looks like, the "solution" to the tutorial presented in this blog
post can be [found](https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial.py) in the
tutorials directory of TF Privacy. It is the script called `mnist_dpsgd_tutorial.py`.