Add TabNet support (#168)

Tianzhang Cai · bcebere · robsdavis · web-flow · commit a4190e6941e5 · 2023-04-20T10:05:38.000+01:00
* first commit for the addition of the TabDDPM plugin

* Add DDPM test script and update DDPM plugin

* add TabDDPM class and refactor

* handle discrete cols and label generation

* add hparam space and update tests of DDPM

* debug and test DDPM

* update TensorDataLoader and training loop

* clear bugs

* debug for regression tasks

* debug for regression tasks; ALL TESTS PASSED

* remove the official repo of TabDDPM

* passed all pre-commit checks

* convert assert to conditional AssertionErrors

* added an auto annotation tool

* update auto-anno and generate annotations

* remove auto-anno and flake8 noqa

* add python<3.9 compatible annotations

* remove star import

* replace builtin type annos to typing annos

* resolve py38 compatibility issue

* tests/plugins/generic/test_ddpm.py

* change TabDDPM method signatures

* remove Iterator subscription

* update AssertionErrors, add EarlyStop callback, removed additional MLP, update logging

* remove TensorDataLoader, update test_ddpm

* update EarlyStopping

* add TabDDPM tutorial, update TabDDPM plugin and encoders

* add TabDDPM tutorial

* major update of FeatureEncoder and TabularEncoder

* add LogDistribution and LogIntDistribution

* update DDPM to use TabularEncoder

* update test_tabular_encoder and debug

* debug and DDPM tutorial OK

* debug LogDistribution and LogIntDistribution

* change discrete encoding of BinEncoder to passthrough; passed all tests in test_tabular_encoder

* add tabnet to plugins/core/models

* add factory.py, let DDPM use TabNet, refactor

* update docstrings and refactor

* fix type annotation compatibility

* make SkipConnection serializable

* fix TabularEncoder.activation_layout

* remove unnecessary code

* fix minor bug and add more nn models in factory

* update pandas and torch version requirement

* update pandas and torch version requirement

* update ddpm tutorial

* restore setup.cfg

* restore setup.cfg

* replace LabelEncoder with OrdinalEncoder

* update setup.cfg

* update setup.cfg

* debug datetimeDistribution

* clean

* update setup.cfg and goggle test

* move DDPM tutorial to tutorials/plugins

* update tabnet.py reference

* update tab_ddpm

* update

* try fixing goggle

* add more activations

* minor fix

* update

* update

* update

* update

* Update tabular_encoder.py

* Update test_goggle.py

* Update tabular_encoder.py

* update

* update

* default cat nonlin of goggle <- gumbel_softmax

* get_nonlin('softmax') <- GumbelSoftmax()

* remove debug logging

* update

* update

* fix merge

* update pip upgrade commands in workflows

* keep pip version to 23.0.1 in workflows

---------

Co-authored-by: Bogdan Cebere <bogdan.cebere@gmail.com>
Co-authored-by: Rob <62107751+robsdavis@users.noreply.github.com>
diff --git a/.github/workflows/test_full.yml b/.github/workflows/test_full.yml
@@ -27,8 +27,8 @@ jobs:
         if: ${{ matrix.os == 'macos-latest' }}
       - name: Install dependencies
         run: |
+            pip install pip==23.0.1
             pip install -r prereq.txt
-            pip install --upgrade pip
       - name: Test Core
         run: |
           pip install .[testing]
diff --git a/.github/workflows/test_pr.yml b/.github/workflows/test_pr.yml
@@ -54,8 +54,8 @@ jobs:
         if: ${{ matrix.os == 'macos-latest' }}
       - name: Install dependencies
         run: |
+            pip install pip==23.0.1
             pip install -r prereq.txt
-            pip install --upgrade pip
       - name: Test Core
         run: |
           pip install .[testing]
diff --git a/.github/workflows/test_tutorials.yml b/.github/workflows/test_tutorials.yml
@@ -32,8 +32,8 @@ jobs:
         if: ${{ matrix.os == 'macos-latest' }}
       - name: Install dependencies
         run: |
+            pip install pip==23.0.1
             pip install -r prereq.txt
-            pip install --upgrade pip
 
             pip install .[all]
 
diff --git a/src/synthcity/plugins/core/models/factory.py b/src/synthcity/plugins/core/models/factory.py
@@ -20,9 +20,9 @@
     DatetimeEncoder,
     FeatureEncoder,
     GaussianQuantileTransformer,
-    LabelEncoder,
     MinMaxScaler,
     OneHotEncoder,
+    OrdinalEncoder,
     RobustScaler,
     StandardScaler,
 )
@@ -74,7 +74,7 @@
 FEATURE_ENCODERS = dict(
     datetime=DatetimeEncoder,
     onehot=OneHotEncoder,
-    label=LabelEncoder,
+    ordinal=OrdinalEncoder,
     standard=StandardScaler,
     minmax=MinMaxScaler,
     robust=RobustScaler,
diff --git a/src/synthcity/plugins/core/models/tabnet.py b/src/synthcity/plugins/core/models/tabnet.py
@@ -1,3 +1,8 @@
+# TabNet: Attentive Interpretable Tabular Learning
+# Reference:
+# - https://arxiv.org/pdf/1908.07442.pdf
+# - https://github.com/dreamquark-ai/tabnet
+
 # stdlib
 from typing import List, Optional, Tuple