ART Unit Testing
The Adversarial Robustness Toolbox (ART) is a library which supports multiple frameworks simultaneously. For this reason, tests written for ART must be written with the understanding that they will be run across all frameworks supported by ART.
This page clarifies how tests should be written to achieve this, presenting the conventions used as well as the various test helper tools available in ART to simplify the process.
ART makes heavy use of pytest features such as fixtures. General information about fixtures can be found in the pytest documentation.
The following are good examples of ART tests that can be used as templates:
While debugging tests, it can be useful at times to run a given test with a specific framework. To do so, the command line argument `--mlFramework` can be specified along with the relevant framework name:

```bash
pytest -q tests/estimators/classification/test_common_deeplearning.py --mlFramework=pytorch
```

The `mlFramework` argument can be used with the following frameworks: `tensorflow`, `keras`, `keras_tf`, `pytorch`, `mxnet` and `scikitlearn`. If no framework is provided, ART will run the tests with a default framework of its choice.
In order to achieve framework agnosticism, ART provides a few pytest fixtures which hide any framework specific concerns of the test code within the pytest `conftest.py` files. This makes writing tests for ART much easier and cleaner. A list of all relevant ART fixtures can be found below.
As a general rule, tests should only implement the test logic, regardless of the framework being used. Any framework specific code should be hidden away in the relevant pytest `conftest.py` files.
The following example presents a typical ART test.
```python
import numpy as np
import pytest

from tests.utils import ARTTestException


@pytest.mark.framework_agnostic
def test_myTest(art_warning, get_default_mnist_subset, get_image_classifier_list):
    try:
        (x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist) = get_default_mnist_subset
        classifier, sess = get_image_classifier_list(one_classifier=True)

        # example test code
        labels = np.argmax(y_test_mnist, axis=1)
        accuracy_2 = np.sum(np.argmax(classifier.predict(x_test_mnist), axis=1) == labels) / x_test_mnist.shape[0]
        assert accuracy_2 == 0.99
    except ARTTestException as e:
        art_warning(e)
```
- `get_default_mnist_subset`: The test uses the `get_default_mnist_subset` fixture, which takes care of retrieving the MNIST dataset, shaped correctly for whatever framework the test will be run with. The PyTorch and TensorFlow frameworks, for example, expect different image channel orderings. This fixture provides the test with the channel ordering corresponding to the framework being used.
- `get_image_classifier_list`: The `get_image_classifier_list` fixture is used quite extensively within the tests and creates an image classifier using the framework the test is being run with. If a framework specific implementation for an ART component does not exist yet, the test will fail gracefully and simply output a warning to notify that the test could not be run with this framework due to a missing component.
- `@pytest.mark.framework_agnostic`: The `@pytest.mark.framework_agnostic` pytest marker should be used in most cases. It indicates that, although the test can be run successfully in any framework, it does not depend on any framework specific implementations. Hence there is no need to run the same test across all frameworks; one random framework will suffice, and ART will run the test with a framework of its choosing. While most tests fit this category, a few exceptions exist. Tests located in `test_common_deeplearning.py`, for example, must always be run with all frameworks, since they check whether the framework specific implementations of ART classifiers produce the exact same outputs.
- `try/except` and `art_warning`: In some cases, framework specific implementations of classifiers or other needed components will not have been implemented yet for a given framework. In order to move on gracefully to the next test, ART test code should be contained within a `try/except` clause, and an `art_warning` should be raised in the `except` block. This produces a report after testing completes, listing the component implementations currently missing for a given framework.
In addition to using fixtures, the following conventions are used across ART tests.
- Test file names and test names themselves should not contain any reference to specific frameworks. For instance, a test named `test_feature_pytorch` should be renamed to `test_feature`.
- As a rule of thumb, any framework specific test code (eg: `if framework == "tensorflow": do_this()`) should be placed in a relevant fixture in the appropriate `conftest.py` file (see below).
- In order to keep each test's framework specific limitations (if any) independent, please do not place tests within a test class. In other words, the following pattern

  ```python
  class TestMyNewFeature:
      def test_feature1(self, param1):
          pass

      def test_feature2(self, param1):
          pass
  ```

  should be replaced by the following:

  ```python
  def test_feature1(param1):
      pass


  def test_feature2(param1):
      pass
  ```

- In order to increase test code readability, please refrain from hardcoding `np.asarray()` values within the test code. Instead, please use the `store_expected_values` and `expected_values` fixtures for that purpose (see section below).
In order to keep ART tests maintainable over time, it is essential that we use standardised fixtures across all tests. Hence, whenever a new fixture is considered, please follow these guidelines:
- Before creating a pytest fixture, please ensure a similar one hasn't already been created for another test (these can be found in the `conftest.py` files within the project).
- If a similar fixture already exists, please refrain from creating a new one. Instead, either a) try to alter your test to use the existing fixture, or b) improve the existing fixture to take into account your new use case.
- If you feel there is really a need to create a new fixture, it should be placed in a `conftest.py` file located in the directory containing the test file that uses it. Please do not hesitate to contact the project owners before creating a new fixture.
- An ART-wide random generator master seed is already set within the project root `conftest.py` file. Hence there is no need to add `master_seed(1234)` calls within test code.
- If the same test needs to be run for multiple combinations of parameters, please do not create loops over each parameter combination. Instead, please use the standard pytest `@pytest.mark.parametrize` parameterization (eg: `test_deeplearning_common.py::test_loss_functions()`).
- If test code is repeated across tests, please instead encapsulate the repeated code in a method named `backend_<testing_this>` and call this method in each test (eg: `test_membership_inference.py::backend_check_accuracy()`).
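The last two guidelines can be sketched as follows. The helper name `backend_check_accuracy` mirrors the example above, but its body, the parameter values and the tolerance are made up for illustration:

```python
import numpy as np
import pytest


def backend_check_accuracy(predicted_labels, labels, tolerance):
    # Shared helper: repeated assertion logic lives in one place
    # and is called from every test that needs it.
    accuracy = np.sum(predicted_labels == labels) / len(labels)
    assert accuracy >= tolerance


# One test run per parameter value, instead of a loop inside the test.
@pytest.mark.parametrize("loss_name", ["categorical_crossentropy", "sparse_categorical_crossentropy"])
def test_loss_functions(loss_name):
    # ... build an estimator using `loss_name`, then e.g.:
    # backend_check_accuracy(predicted_labels, labels, tolerance=0.9)
    pass
```

With `parametrize`, each parameter combination is reported as a separate test, so a failure for one loss function does not mask the results of the others.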
Here is a list of the most common ART fixtures available when writing tests. They can be found in the pytest `conftest.py` files within the project.
Fixture Name | Purpose
---|---
`get_mnist_dataset` | provides the MNIST dataset with the image channels ordered for the framework being used
`get_iris_dataset` | provides the Iris dataset, shaped for the framework being used
`get_default_mnist_subset` | provides a smaller subset of the MNIST dataset
`mnist_shape` | provides the shape of the MNIST dataset based on where the channel is positioned
`image_iterator` | returns an image iterator specific to the framework the test is being run with
`image_data_generator` | provides the MNIST dataset as a data generator specific to the framework the test is being run with
`create_test_image` | creates a default test image
Fixture Name | Purpose
---|---
`image_dl_estimator` | provides an image deep learning estimator corresponding to the framework the test is being run with
`tabular_dl_estimator` | provides a tabular deep learning estimator corresponding to the framework the test is being run with
`image_dl_estimator_defended` | provides a defended version of the estimator returned by `image_dl_estimator`
`image_dl_estimator_for_attack` | provides an image deep learning estimator, for the framework the test is being run with, usable to perform a specific attack
`decision_tree_estimator` | returns a decision tree estimator specific to the framework the test is being run with
Fixture Name | Purpose
---|---
`framework` | returns the name of the framework the test is currently running with
`create_test_dir` | creates a temporary test directory
`store_expected_values` | stores any large values needed for a test in a json file; the `expected_values` fixture can be used thereafter to retrieve these values
`expected_values` | retrieves the values expected by a given test, previously stored using `store_expected_values`. This fixture identifies whether a value needed by the test depends on the framework the test is being run with.
Marker Name | Purpose
---|---
`@pytest.mark.framework_agnostic` | indicates that, although the test can be run successfully in any framework, it does not depend on any framework specific implementations. Hence there is no need to run the same test across all frameworks; one random framework will suffice.
`@pytest.mark.skipMlFramework("tensorflow", "scikitlearn", etc...)` | indicates that a test currently fails with, and should be skipped for, the given mlFramework values. Valid values are: `"tensorflow1"`, `"tensorflow2"`, `"keras"`, `"kerastf"`, `"pytorch"`, `"mxnet"`, `"scikitlearn"`, as well as `"tensorflow"` (shorthand for `"tensorflow1"`, `"tensorflow2"`), `"dl_frameworks"` (shorthand for `"tensorflow"`, `"keras"`, `"kerastf"`, `"pytorch"`, `"mxnet"`) and `"non_dl_frameworks"` (shorthand for `"scikitlearn"`)
`@pytest.mark.skip_travis()` | to be used in exceptional circumstances; indicates that this test should be ignored by Travis
DEPRECATED `@pytest.mark.only_with_platform("keras")` | this marker is deprecated and should only be used for legacy tests that are not yet framework agnostic; use `@pytest.mark.skipMlFramework` instead
At times, tests need to assert that a given component produces an expected value. Such expected values can be numerous and consist of very large arrays, which make the test code unnecessarily convoluted and much harder to read. ART provides two helper fixtures which cache any expected values required, and thus make your test code smaller and much more readable.
While writing your test, a first version of the test using hardcoded expected values can use the `store_expected_values` fixture in order to cache those values as follows:
```python
@pytest.mark.framework_agnostic
def test_myTest(get_default_mnist_subset, get_image_classifier_list, store_expected_values):
    try:
        (x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist) = get_default_mnist_subset
        classifier, sess = get_image_classifier_list(one_classifier=True)

        expected_value1 = np.asarray(
            [
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                2.3582461e-03,
                4.8802234e-04,
                1.6699843e-03,
                -6.4777887e-05,
                -1.4215634e-03,
                -1.3359448e-04,
                2.0448549e-03,
                2.8171093e-04,
                1.9665064e-04,
                1.5335126e-03,
                1.7000455e-03,
                -2.0136381e-04,
                6.4588618e-04,
                2.0524357e-03,
                2.1990810e-03,
                8.3692279e-04,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
            ]
        )
        # ... more expected values (expected_value2, etc.)

        # example test code
        labels = np.argmax(y_test_mnist, axis=1)
        accuracy_2 = np.sum(np.argmax(classifier.predict(x_test_mnist), axis=1) == labels) / x_test_mnist.shape[0]
        assert accuracy_2 == expected_value2  # expected_value2 holds the expected scalar accuracy
        store_expected_values(expected_value1, expected_value2, ...)
    except NotImplementedError as e:
        warnings.warn(UserWarning(e))
```
Once the expected values have been cached, the final version of the test can be made simpler and more readable by using the `expected_values` fixture as follows:

```python
@pytest.mark.framework_agnostic
def test_myTest(get_default_mnist_subset, get_image_classifier_list, expected_values):
    try:
        (x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist) = get_default_mnist_subset
        # create a classifier for the framework this test is being run with
        classifier, sess = get_image_classifier_list(one_classifier=True)

        # retrieve the cached expected values
        (expected_value1, expected_value2, ...) = expected_values

        # example test code
        labels = np.argmax(y_test_mnist, axis=1)
        accuracy_2 = np.sum(np.argmax(classifier.predict(x_test_mnist), axis=1) == labels) / x_test_mnist.shape[0]
        assert accuracy_2 == expected_value2  # expected_value2 holds the expected scalar accuracy
    except NotImplementedError as e:
        warnings.warn(UserWarning(e))
```