FedAS NaN issue on Tiny-Imagenet with Dirichlet(0.1), 20 clients #246

@appleiphonedddd

Hi

When training FedAS on Tiny-Imagenet with 20 clients and a Dirichlet(α=0.1) data partition, I encountered NaN values in the FIM trace after 182 rounds. This eventually caused a ValueError ("Input contains NaN.") in roc_auc_score during evaluation.
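A minimal sketch of how a NaN FIM trace could be caught before it spreads, assuming the per-client FIM traces are normalized into aggregation weights (not confirmed against `serveras.py`). The helper name `normalize_finite_weights` and the sample values are hypothetical, not part of the FedAS code.

```python
import math


def normalize_finite_weights(fim_traces):
    """Normalize per-client scores into weights, ignoring non-finite entries.

    Returns (weights, bad_clients), where bad_clients holds the indices
    whose score was NaN or infinite. Hypothetical helper, not FedAS code.
    """
    bad_clients = [i for i, t in enumerate(fim_traces) if not math.isfinite(t)]
    total = sum(t for t in fim_traces if math.isfinite(t))
    if total <= 0:
        raise ValueError("no finite, positive FIM trace to normalize")
    # Non-finite clients get weight 0 so one NaN cannot poison the average.
    weights = [t / total if math.isfinite(t) else 0.0 for t in fim_traces]
    return weights, bad_clients


# Made-up traces for three clients; client 1 has diverged.
weights, bad = normalize_finite_weights([2.0, float("nan"), 6.0])
print(weights, bad)
```

Logging `bad` each round would show which client first produces a non-finite trace, and in which round.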

Setup

Dataset: Tiny-Imagenet

Clients: 20

Data distribution: Dirichlet(α=0.1)

Algorithm: FedAS

Rounds: 300

-------------Round number: 182-------------

Evaluate global model
Traceback (most recent call last):
  File "/home/infor/Code/Master-thesis/system/main.py", line 554, in <module>
    run(args)
  File "/home/infor/Code/Master-thesis/system/main.py", line 367, in run
    server.train()
  File "/home/infor/Code/Master-thesis/system/flcore/servers/serveras.py", line 72, in train
    self.evaluate()
  File "/home/infor/Code/Master-thesis/system/flcore/servers/serverbase.py", line 229, in evaluate
    stats = self.test_metrics()
            ^^^^^^^^^^^^^^^^^^^
  File "/home/infor/Code/Master-thesis/system/flcore/servers/serverbase.py", line 203, in test_metrics
    ct, ns, auc = c.test_metrics()
                  ^^^^^^^^^^^^^^^^
  File "/home/infor/Code/Master-thesis/system/flcore/clients/clientbase.py", line 118, in test_metrics
    auc = metrics.roc_auc_score(y_true, y_prob, average='micro')
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/infor/miniconda3/envs/PFL/lib/python3.11/site-packages/sklearn/utils/_param_validation.py", line 218, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/infor/miniconda3/envs/PFL/lib/python3.11/site-packages/sklearn/metrics/_ranking.py", line 665, in roc_auc_score
    y_score = check_array(y_score, ensure_2d=False)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/infor/miniconda3/envs/PFL/lib/python3.11/site-packages/sklearn/utils/validation.py", line 1105, in check_array
    _assert_all_finite(
  File "/home/infor/miniconda3/envs/PFL/lib/python3.11/site-packages/sklearn/utils/validation.py", line 120, in _assert_all_finite
    _assert_all_finite_element_wise(
  File "/home/infor/miniconda3/envs/PFL/lib/python3.11/site-packages/sklearn/utils/validation.py", line 169, in _assert_all_finite_element_wise
    raise ValueError(msg_err)
ValueError: Input contains NaN.
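The traceback shows that `y_prob` already contains NaN when it reaches `roc_auc_score`, so the model outputs were non-finite before evaluation. A small diagnostic sketch, assuming `y_prob` is a 2-D list of per-class scores (a NumPy array can be converted with `.tolist()`); `nonfinite_rows` is a hypothetical helper and the values are made up for illustration.

```python
import math


def nonfinite_rows(y_prob):
    """Return indices of rows in y_prob that contain NaN or infinity."""
    return [
        i for i, row in enumerate(y_prob)
        if not all(math.isfinite(p) for p in row)
    ]


# Made-up scores: the second sample has NaN outputs.
y_prob = [[0.9, 0.1], [float("nan"), float("nan")], [0.2, 0.8]]
bad_rows = nonfinite_rows(y_prob)
print(bad_rows)
```

Called just before the `metrics.roc_auc_score` line in `clientbase.py`, a check like this would tell whether only a few samples or the whole client model has diverged.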

Additional information
Let me know if more logs or config files are needed.
