setup docs

Algue-Rythme · Mar 19, 2024 · d866f11
1 parent 75c9faf
commit d866f11
Showing 18 changed files with 270 additions and 147 deletions.
diff --git a/.github/workflows/python-linters.yml b/.github/workflows/python-linters.yml
@@ -0,0 +1,28 @@
+name: lip-dp linters
+
+on:
+  push:
+    branches:
+      - main
+      - release-no-advertising
+  pull_request:
+    branches:
+      - main
+      - release-no-advertising
+
+jobs:
+  checks:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v3
+    - name: Set up Python 3.11
+      uses: actions/setup-python@v4
+      with:
+        python-version: 3.11
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install tox
+    - name: Check lint
+      run: tox -e py311-lint
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -2,9 +2,9 @@ name: tests
 
 on:
   push:
-    branches: ["release-no-advertising"]
+    branches: ["main", "release-no-advertising"]
   pull_request:
-    branches: ["release-no-advertising"]
+    branches: ["main", "release-no-advertising"]
 
 jobs:
   build-and-test:

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -45,7 +45,7 @@ repos:
     rev: v3.0.0a5
     hooks:
       - id: pylint
-        args: [--enable=unused-import --max-line-length=100, --disable=all]
+        args: [--disable=all]
 
 
   # - repo: https://github.com/commitizen-tools/commitizen

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -4,14 +4,14 @@ Thanks for taking the time to contribute!
 
 From opening a bug report to creating a pull request: every contribution is
 appreciated and welcome. If you're planning to implement a new feature or change
-the api please create an [issue first](https://https://github.com/deel-ai/dp-lipschitz/issues/new). This way we can ensure that your precious
+the api please create an [issue first](https://github.com/Algue-Rythme/lip-dp/issues). This way we can ensure that your precious
 work is not in vain.
 
 
 ## Setup with make
 
-- Clone the repo `git clone https://github.com/deel-ai/lipdp.git`.
-- Go to your freshly downloaded repo `cd lipdp`
+- Clone the repo `git clone git@github.com:Algue-Rythme/lip-dp.git`.
+- Go to your freshly downloaded repo `cd lip-dp`
 - Create a virtual environment and install the necessary dependencies for development:
 
   `make prepare-dev && source lipdp_dev_env/bin/activate`.
@@ -26,9 +26,8 @@ This command activate your virtual environment and launch the `tox` command.
 
 
 `tox` on the otherhand will do the following:
-- run pytest on the tests folder with python 3.6, python 3.7 and python 3.8
-> Note: If you do not have those 3 interpreters the tests would be only performs with your current interpreter
-- run pylint on the deel-datasets main files, also with python 3.6, python 3.7 and python 3.8
+- run pytest on the tests folder
+- run pylint on the deel-datasets main files
 > Note: It is possible that pylint throw false-positive errors. If the linting test failed please check first pylint output to point out the reasons.
 
 Please, make sure you run all the tests at least once before opening a pull request.
@@ -42,7 +41,7 @@ Basically, it will check that your code follow a certain number of convention. A
 
 After getting some feedback, push to your fork and submit a pull request. We
 may suggest some changes or improvements or alternatives, but for small changes
-your pull request should be accepted quickly (see [Governance policy](https://github.com/deel-ai/lipdp/blob/master/GOVERNANCE.md)).
+your pull request should be accepted quickly (see [Governance policy](https://github.com/Algue-Rythme/lip-dp/blob/release-no-advertising/GOVERNANCE.md)).
 
 Something that will increase the chance that your pull request is accepted:
 
@@ -51,4 +50,3 @@ Something that will increase the chance that your pull request is accepted:
 - Follow the existing coding style and run `make check_all` to check all files format.
 - Write a [good commit message](https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html) (we follow a lowercase convention).
 - For a major fix/feature make sure your PR has an issue and if it doesn't, please create one. This would help discussion with the community, and polishing ideas in case of a new feature.
-
diff --git a/README.md b/README.md
@@ -1,22 +1,26 @@
 <p align="center">
-<img src="./docs/assets/lipdp_logo.png" alt="lipdp_logo" width="350"/></p>
+<img src="./docs/assets/lipdp_logo.png" alt="lipdp_logo" width="300"/></p>
 <!-- Badge section -->
 <div align="center">
     <a href="#">
-        <img src="https://img.shields.io/badge/Python-3.9|3.10|3.11-efefef">
+        <img src="https://img.shields.io/badge/Python-3.9 | 3.10 | 3.11-efefef">
     </a>
     <a href="https://github.com/Algue-Rythme/lip-dp/actions/workflows/tests.yml">
         <img alt="Tests" src="https://github.com/Algue-Rythme/lip-dp/actions/workflows/tests.yml/badge.svg?branch=release-no-advertising">
     </a>
+    <a href="https://github.com/Algue-Rythme/lip-dp/actions/workflows/python-linters.yml">
+        <img alt="Linter" src="https://github.com/Algue-Rythme/lip-dp/actions/workflows/python-linters.yml/badge.svg?branch=release-no-advertising">
+    </a>
     <a href="#">
         <img src="https://img.shields.io/badge/License-MIT-efefef">
     </a>
 </div>
-<br>
+</p>
 
 <!-- Short description of your library -->
 <p align="center">
   <b>LipDP</b> is a Python toolkit dedicated to robust and certifiable learning under privacy guarantees.  
+</p>
 
 
 This package is the code for the paper "*DP-SGD Without Clipping: The Lipschitz Neural Network Way*" by Louis Béthune, Thomas Massena, Thibaut Boissin, Aurélien Bellet, Franck Mamalet, Yannick Prudent, Corentin Friedrich, Mathieu Serrurier, David Vigouroux, published at the **International Conference on Learning Representations (ICLR 2024)**. The paper is available on [arxiv](https://arxiv.org/abs/2305.16202).   

diff --git a/deel/lipdp/dynamic.py b/deel/lipdp/dynamic.py
@@ -20,6 +20,7 @@
 # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE.
+"""Dynamic gradient clipping for differential privacy."""
 import random
 from abc import abstractmethod
 
@@ -66,9 +67,11 @@ def on_train_begin(self, logs=None):
 
     def get_gradloss(self):
         """Computes the norm of gradient of the loss with respect to the model's output.
-        
-        Warning: this method is unsafe from a privacy perspective, as the true gradient bound is computed.
-        It is meant to be used with privacy-preserving methods only, such as the ones implemented in this module.
+
+        Warning: this method is unsafe from a privacy perspective,
+            as the true gradient bound is computed.
+        It is meant to be used with privacy-preserving methods only,
+            such as the ones implemented in this module.
         """
         batch = next(iter(self.ds_train.take(1)))
         imgs, labels = batch

diff --git a/deel/lipdp/model.py b/deel/lipdp/model.py
@@ -20,6 +20,7 @@
 # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE.
+"""Model class for differentially private training with Lipschitz constraints."""
 from dataclasses import dataclass
 
 import numpy as np

diff --git a/deel/lipdp/pipeline.py b/deel/lipdp/pipeline.py
@@ -354,9 +354,11 @@ def load_and_prepare_images_data(
         nb_samples_train=ds_info.splits["train"].num_examples,
         nb_samples_test=ds_info.splits["test"].num_examples,
         class_names=ds_info.features["label"].names,
-        nb_steps_per_epochs=ds_train.cardinality().numpy()
-        if ds_train.cardinality() > 0  # handle case cardinality return -1 (unknown)
-        else ds_info.splits["train"].num_examples / batch_size,
+        nb_steps_per_epochs=(
+            ds_train.cardinality().numpy()
+            if ds_train.cardinality() > 0  # handle case cardinality return -1 (unknown)
+            else ds_info.splits["train"].num_examples / batch_size
+        ),
         batch_size=batch_size,
         max_norm=bound_val,
     )
@@ -493,9 +495,11 @@ def prepare_tabular_data(
         nb_samples_train=x_train.shape[0],
         nb_samples_test=x_test.shape[0],
         class_names=[str(i) for i in range(nb_classes)],
-        nb_steps_per_epochs=ds_train.cardinality().numpy()
-        if ds_train.cardinality() > 0  # handle case cardinality return -1 (unknown)
-        else x_train.shape[0] / batch_size,
+        nb_steps_per_epochs=(
+            ds_train.cardinality().numpy()
+            if ds_train.cardinality() > 0  # handle case cardinality return -1 (unknown)
+            else x_train.shape[0] / batch_size
+        ),
         batch_size=batch_size,
         max_norm=bound_val,
     )
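Note on the `nb_steps_per_epochs` fallback in the two hunks above: when the dataset cardinality is not a positive number, the step count is derived from the number of training examples and the batch size. A minimal standalone sketch of that fallback follows. The helper name `steps_per_epoch` and the rounding with `math.ceil` are assumptions made for illustration; the committed code uses plain `/`, which yields a float.

```python
import math


def steps_per_epoch(cardinality: int, num_examples: int, batch_size: int) -> int:
    """Hypothetical helper (not part of lipdp) mirroring the fallback logic.

    cardinality: value reported by the dataset; non-positive means "not usable".
    num_examples: number of training examples.
    batch_size: number of examples per batch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if cardinality > 0:
        # The dataset knows its own length in batches: trust it.
        return cardinality
    # Assumption: round up so that a trailing partial batch counts as one step.
    return math.ceil(num_examples / batch_size)
```

With these assumptions, a known cardinality is returned unchanged, and an unusable one falls back to the rounded-up ratio, so the result is always an integer.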

diff --git a/deel/lipdp/sensitivity.py b/deel/lipdp/sensitivity.py
@@ -91,10 +91,14 @@ def fun(epoch):
         elif error < atol:
             # This branch should never be taken if fun is a non-decreasing function of the number of epochs.
             # fun is mathematcally non-decreasing, but numerical inaccuracy can lead to this case.
-            print(f"Numerical inaccuracy with error {error:.7f} in the dichotomy search: using a conservative value.")
+            print(
+                f"Numerical inaccuracy with error {error:.7f} in the dichotomy search: using a conservative value."
+            )
             return epochs_min - 1
         else:
-            assert False, f"Numerical inaccuracy with error {error:.7f}>{atol:.3f} in the dichotomy search."
+            raise AssertionError(
+                f"Numerical inaccuracy with error {error:.7f}>{atol:.3f} in the dichotomy search."
+            )
 
     return epochs_max
 
@@ -106,7 +110,7 @@ def gradient_norm_check(upper_bounds, model, examples):
     Args :
         upper_bounds: maximum gradient bounds for each layer (dictionnary of 'layers name ': 'bounds' pairs).
         model: The model containing the layers we are interested in. Layers must only have one trainable variable.
-        examples: a batch of examples to test on.  
+        examples: a batch of examples to test on.
     Returns :
         Boolean value. True corresponds to upper bound has been validated.
     """
@@ -117,19 +121,30 @@ def gradient_norm_check(upper_bounds, model, examples):
         assert len(layer.trainable_variables) < 2
         if len(layer.trainable_variables) == 1:
             assert len(layer.trainable_variables) == 1
-            train_var = layer.trainable_variables[0]
             var_name = layer.trainable_variables[0].name
             var_seen.add(var_name)
             bound = upper_bounds[var_name]
-            check_layer_gradient_norm(bound, layer, activations)
+            bound_check = check_layer_gradient_norm(bound, layer, activations)
+            assert (
+                bound_check
+            ), f"Gradient norm check failed for layer {layer.name} with bound {bound}."
         activations = post_activations
     for var_name in upper_bounds:
         assert var_name in var_seen
 
 
 def check_layer_gradient_norm(S, layer, activations):
+    """Check that the maximum gradient norm of a layer is less than S.
+
+    Args:
+        S: The maximum gradient norm.
+        layer: The layer to check.
+        activations: The input to the layer.
+    Returns:
+        Boolean value. True corresponds to upper bound has been validated.
+    """
     trainable_vars = layer.trainable_variables[0]
-    with tf.GradientTape() as tape:        
+    with tf.GradientTape() as tape:
         y_pred = layer(activations, training=True)
         flat_pred = tf.reshape(y_pred, (y_pred.shape[0], -1))
     jacobians = tape.jacobian(flat_pred, trainable_vars)
@@ -141,8 +156,8 @@ def check_layer_gradient_norm(S, layer, activations):
         (y_pred.shape[0], -1, np.prod(trainable_vars.shape)),
         name="Reshaped_Gradient",
     )
-    J_sigma = tf.linalg.svd(jacobians, full_matrices=False, compute_uv=False, name=None)
-    J_2norm = tf.reduce_max(J_sigma, axis=-1)
-    J_2norm = tf.reduce_max(J_2norm).numpy()
+    sigma = tf.linalg.svd(jacobians, full_matrices=False, compute_uv=False, name=None)
+    norm2 = tf.reduce_max(sigma, axis=-1)
+    norm2 = tf.reduce_max(norm2).numpy()
     atol = 1e-5
-    return J_2norm < S+atol
+    return norm2 < (S + atol)
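Note on the final comparison in `check_layer_gradient_norm`: it takes the largest singular value over all per-example Jacobians and checks that it stays below `S + atol`. The sketch below reproduces that comparison in pure Python for small dense matrices given as nested lists. It swaps `tf.linalg.svd` for power iteration, which only estimates the largest singular value, and the names `spectral_norm` and `within_bound` are hypothetical, not lipdp functions.

```python
import math
import random


def spectral_norm(matrix, iterations=200, seed=0):
    """Estimate the largest singular value of `matrix` by power iteration on M^T M.

    Assumes a non-empty rectangular list of lists of floats. A seeded random start
    makes an exactly orthogonal start vector unlikely, but does not rule it out.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    norm_v = math.sqrt(sum(x * x for x in v))
    v = [x / norm_v for x in v]
    for _ in range(iterations):
        # u = M v
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        # w = M^T u
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0  # v lies in the null space; for a zero matrix this is exact
        v = [x / norm_w for x in w]
    # The singular value estimate is ||M v|| for the converged unit vector v.
    u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return math.sqrt(sum(x * x for x in u))


def within_bound(jacobians, bound, atol=1e-5):
    """Return True if every Jacobian's estimated spectral norm is below bound + atol."""
    worst = max(spectral_norm(jac) for jac in jacobians)
    return worst < bound + atol
```

Because power iteration approaches the largest singular value from below, this sketch can accept a bound that an exact SVD would reject by a small margin; it is meant as an illustration of the comparison, not as a replacement for the SVD-based check.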
diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md
@@ -4,14 +4,14 @@ Thanks for taking the time to contribute!
 
 From opening a bug report to creating a pull request: every contribution is
 appreciated and welcome. If you're planning to implement a new feature or change
-the api please create an [issue first](https://https://github.com/deel-ai/dp-lipschitz/issues/new). This way we can ensure that your precious
+the api please create an [issue first](https://github.com/Algue-Rythme/lip-dp/issues). This way we can ensure that your precious
 work is not in vain.
 
 
 ## Setup with make
 
-- Clone the repo `git clone https://github.com/deel-ai/dp-lipschitz.git`.
-- Go to your freshly downloaded repo `cd lipdp`
+- Clone the repo `git clone git@github.com:Algue-Rythme/lip-dp.git`.
+- Go to your freshly downloaded repo `cd lip-dp`
 - Create a virtual environment and install the necessary dependencies for development:
 
   `make prepare-dev && source lipdp_dev_env/bin/activate`.
@@ -26,9 +26,8 @@ This command activate your virtual environment and launch the `tox` command.
 
 
 `tox` on the otherhand will do the following:
-- run pytest on the tests folder with python 3.6, python 3.7 and python 3.8
-> Note: If you do not have those 3 interpreters the tests would be only performs with your current interpreter
-- run pylint on the deel-datasets main files, also with python 3.6, python 3.7 and python 3.8
+- run pytest on the tests folder
+- run pylint on the deel-datasets main files
 > Note: It is possible that pylint throw false-positive errors. If the linting test failed please check first pylint output to point out the reasons.
 
 Please, make sure you run all the tests at least once before opening a pull request.
@@ -42,7 +41,7 @@ Basically, it will check that your code follow a certain number of convention. A
 
 After getting some feedback, push to your fork and submit a pull request. We
 may suggest some changes or improvements or alternatives, but for small changes
-your pull request should be accepted quickly (see [Governance policy](https://github.com/deel-ai/lipdp/blob/master/GOVERNANCE.md)).
+your pull request should be accepted quickly (see [Governance policy](https://github.com/Algue-Rythme/lip-dp/blob/release-no-advertising/GOVERNANCE.md)).
 
 Something that will increase the chance that your pull request is accepted:
 

diff --git a/docs/assets/residuals.png b/docs/assets/residuals.png