set_denominator was added so that the batch size doesn't need to be specified before constructing the optimizer, but it breaks the DPQuery abstraction. Now the optimizer uses a GaussianSumQuery instead of GaussianAverageQuery, and normalization by batch size is done inside the optimizer.
Also instead of creating all DPQueries with a PrivacyLedger and then wrapping with QueryWithLedger, it is now sufficient to create the queries with no ledger and QueryWithLedger will construct the ledger and pass it to all inner queries.
PiperOrigin-RevId: 251462353
1. Split DPQuery.accumulate_record function into preprocess_record and accumulate_preprocessed_record.
2. Add merge_sample_state function.
3. Add default implementations for some functions in DPQuery, and add base class SumAggregationDPQuery that implements some more. Only get_noised_result is still abstract.
4. Enforce that all states and parameters used as inputs and outputs to DPQuery functions are nested structures of tensors by replacing numbers with constants and Nones with empty tuples.
PiperOrigin-RevId: 247975791
At present, the script has no heavy dependencies except for the rdp_accountant, which is by itself pretty light-weight. However, importing rdp_accountant triggers __init__.py in third_party/py/tensorflow_privacy/privacy, which loads TF and all of tf.privacy. The CL adds a check to the __init__.py, which controls this behavior.
PiperOrigin-RevId: 243172355
Prior to this change the PrivacyLedger is running to keep a log of private queries, but the ledger is not actually used to compute the (epsilon, delta) guarantees. This CL adds a function to compute the RDP directly from the ledger.
Note I did verify that the tutorial builds and runs with the changes and for the first few iterations prints the same epsilon values as before the change.
PiperOrigin-RevId: 241063532
Moved query classes from dir optimizers into new dir dp_query. Added NormalizedQuery class for queries that divide the output of another query by a constant like GaussianAverageQuery.
PiperOrigin-RevId: 240167115
The global state for DP query is intended for aspects of the query that change across samples under the query's own control. It was therefore unnecessary to wrap "l2_norm_clip" and "sum_stddev" in the namedtuple _GlobalState for the basic GaussianQuery classes.
PiperOrigin-RevId: 237528962
__xrange()__ was removed in Python 3 in favor of a reworked version of __range()__.
[flake8](http://flake8.pycqa.org) testing of https://github.com/tensorflow/privacy on Python 3.7.1
$ __flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics__
```
./privacy/optimizers/gaussian_query_test.py:65:16: F821 undefined name 'xrange'
for _ in xrange(1000):
^
./research/pate_2018/ICLR2018/rdp_bucketized.py:79:12: F821 undefined name 'xrange'
for i in xrange(n):
^
./research/pate_2018/ICLR2018/rdp_bucketized.py:106:12: F821 undefined name 'xrange'
for i in xrange(n):
^
./research/pate_2018/ICLR2018/rdp_bucketized.py:139:12: F821 undefined name 'xrange'
for i in xrange(n):
^
4 F821 undefined name 'xrange'
4
```
__E901,E999,F821,F822,F823__ are the "_showstopper_" [flake8](http://flake8.pycqa.org) issues that can halt the runtime with a SyntaxError, NameError, etc. These 5 are different from most other flake8 issues which are merely "style violations" -- useful for readability but they do not effect runtime safety.
* F821: undefined name `name`
* F822: undefined name `name` in `__all__`
* F823: local variable name referenced before assignment
* E901: SyntaxError or IndentationError
* E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree
tf.nest.map_structure_up_to has changed so that map_structure_up_to(x, func, x, y) no longer raises an error when y is longer than x, for example x=[1,2], y=[1,2,3]. This broke one of our tests for nested query. Remove the test until (if and when) the old, more reasonable, behavior is restored.
PiperOrigin-RevId: 232057385