Add comments explaining the relationship between ML terminology and DP terminology.
PiperOrigin-RevId: 246926753
parent 9cece21d92
commit 82852c0e71
1 changed file with 45 additions and 6 deletions
@@ -13,6 +13,33 @@
 # limitations under the License.
 
 """An interface for differentially private query mechanisms.
+
+The DPQuery class abstracts the differential privacy mechanism needed by DP-SGD.
+
+The nomenclature is not specific to machine learning, but rather comes from
+the differential privacy literature. Therefore, instead of talking about
+examples, minibatches, and gradients, the code talks about records, samples,
+and queries. For more detail, please see the paper here:
+https://arxiv.org/pdf/1812.06210.pdf
+
+A common usage paradigm for this class is centralized DP-SGD training on a
+fixed set of training examples, which we call "standard DP-SGD training."
+In such training, SGD applies as usual by computing gradient updates from a set
+of training examples that form a minibatch. However, each minibatch is broken
+up into disjoint "microbatches." The gradient of each microbatch is computed
+and clipped to a maximum norm, with the "records" for all such clipped gradients
+forming a "sample" that constitutes the entire minibatch. Subsequently, that
+sample can be "queried" to get an averaged, noised gradient update that can be
+applied to model parameters.
+
+To prevent inaccurate accounting of privacy parameters, the only means of
+inspecting the gradients and updates of SGD training is via the interfaces
+below, and through the accumulation and querying of a "sample state"
+abstraction. Thus, accessing data is indirect on purpose.
+
+The DPQuery class also allows the use of a global state that may change between
+samples. In the common situation where the privacy mechanism remains unchanged
+throughout the entire training process, the global state is usually None.
 """
 
 from __future__ import absolute_import
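To make the terminology mapping concrete, below is a minimal NumPy sketch of a query object in the spirit of this interface, followed by one "standard DP-SGD" step. It is illustrative only: the class name, constructor parameters, and helper logic are invented for this example and are not the library's actual API.

import numpy as np


class ToyGaussianAverageQuery(object):
  """Toy query: clips records, sums them, and returns a noised average."""

  def __init__(self, l2_norm_clip, noise_multiplier, denominator):
    self._l2_norm_clip = l2_norm_clip        # DP: the sample "params".
    self._stddev = l2_norm_clip * noise_multiplier
    self._denominator = denominator          # ML: microbatches per minibatch.

  def initial_sample_state(self, template):
    # DP: an empty sample. ML: a zero running sum of clipped gradients.
    return np.zeros_like(template)

  def accumulate_record(self, params, sample_state, record):
    # DP: add one record to the sample.
    # ML: clip one microbatch gradient and fold it into the running sum.
    norm = np.linalg.norm(record)
    clipped = record * min(1.0, params / norm) if norm > 0 else record
    return sample_state + clipped

  def get_noised_result(self, sample_state, global_state):
    # DP: answer the query over the accumulated sample.
    # ML: emit the noised, averaged gradient update for the minibatch.
    noise = np.random.normal(0.0, self._stddev, size=sample_state.shape)
    return (sample_state + noise) / self._denominator, global_state


# One DP-SGD step: four microbatch gradients (records) form a sample,
# which is then queried for a private update.
query = ToyGaussianAverageQuery(
    l2_norm_clip=1.0, noise_multiplier=1.1, denominator=4)
grads = [np.random.randn(10) for _ in range(4)]   # stand-in gradients
state = query.initial_sample_state(grads[0])
for g in grads:
  state = query.accumulate_record(1.0, state, g)
update, _ = query.get_noised_result(state, global_state=None)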
@@ -62,12 +89,18 @@ class DPQuery(object):
     """Accumulates a single record into the sample state.
 
     Args:
-      params: The parameters for the sample.
-      sample_state: The current sample state.
-      record: The record to accumulate.
+      params: The parameters for the sample. In standard DP-SGD training,
+        the clipping norm for the sample's microbatch gradients (i.e.,
+        a maximum norm magnitude to which each gradient is clipped).
+      sample_state: The current sample state. In standard DP-SGD training,
+        the accumulated sum of previous clipped microbatch gradients.
+      record: The record to accumulate. In standard DP-SGD training,
+        the gradient computed for the examples in one microbatch, which
+        may be the gradient for just one example (for size 1 microbatches).
 
     Returns:
-      The updated sample state.
+      The updated sample state. In standard DP-SGD training, the set of
+      previous microbatch gradients with the addition of the record argument.
     """
     pass
 
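As a hedged illustration of the clip-and-sum semantics documented above (the function name and NumPy setting are assumptions for this sketch; the real method is abstract and implemented by concrete subclasses):

import numpy as np

def accumulate_record_sketch(params, sample_state, record):
  # params is the clipping norm C; record is one microbatch gradient g.
  # g is rescaled so its L2 norm is at most C, then added to the sum.
  norm = np.linalg.norm(record)
  scale = min(1.0, params / norm) if norm > 0 else 1.0
  return sample_state + scale * record

# A gradient of norm 5.0 clipped to C=1.0 contributes a vector of norm 1.0.
state = np.zeros(3)
state = accumulate_record_sketch(1.0, state, np.array([3.0, 4.0, 0.0]))
assert np.isclose(np.linalg.norm(state), 1.0)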
@@ -77,10 +110,16 @@ class DPQuery(object):
 
     Args:
       sample_state: The sample state after all records have been accumulated.
-      global_state: The global state.
+        In standard DP-SGD training, the accumulated sum of clipped microbatch
+        gradients (in the special case of microbatches of size 1, the clipped
+        per-example gradients).
+      global_state: The global state, storing long-term privacy bookkeeping.
 
     Returns:
       A tuple (result, new_global_state) where "result" is the result of the
-      query and "new_global_state" is the updated global state.
+      query and "new_global_state" is the updated global state. In standard
+      DP-SGD training, the result is a gradient update comprising a noised
+      average of the clipped gradients in the sample state, with the noise and
+      averaging performed in a manner that guarantees differential privacy.
     """
     pass
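A sketch of the returned tuple, assuming a Gaussian mechanism whose noise is calibrated to the clipping norm, which bounds the L2 sensitivity of the accumulated sum. The function name and default values are illustrative, not the library's API:

import numpy as np

def get_noised_result_sketch(sample_state, global_state,
                             l2_norm_clip=1.0, noise_multiplier=1.1,
                             num_microbatches=256):
  # Because each clipped record has norm at most l2_norm_clip, the sum's
  # L2 sensitivity is l2_norm_clip, so Gaussian noise with stddev
  # l2_norm_clip * noise_multiplier yields a differentially private sum.
  noise = np.random.normal(0.0, l2_norm_clip * noise_multiplier,
                           size=sample_state.shape)
  result = (sample_state + noise) / num_microbatches  # noised average
  return result, global_state  # global state unchanged in this sketch

update, _ = get_noised_result_sketch(np.zeros(10), global_state=None)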