Add Datasize Tracking, persistent training loss json and share model … #4
Description
Summary of Changes:

I have implemented several key enhancements to the Federated Learning (FL) Client component to improve the training process's efficiency, fairness, and transparency. The main changes are listed below; a short illustrative sketch follows the list.

- Per-Epoch Loss Logging to JSON: The `train_model()` function now logs the average training loss for each epoch into a `training_loss_round_{round_num}.json` file, in addition to the existing text logs.
- Dataset Size Tracking: The client writes a `dataset_size.json` file that records the total number of samples used in training.
- Persistence of Training Loss JSON: The `training_loss_round_{round_num}.json` file is copied to the client's public folder (`public/fl/<project_name>/`), ensuring that the training logs remain accessible even after the project concludes.
- Sharing Model and Dataset Info: The trained model (`trained_model_round_{round_num}.pt`) and `dataset_size.json` are shared with the aggregator's side.
- Bug Fixes and Miscellaneous Improvements: Fixed a `FileNotFoundError` by ensuring the `fl_config.json` is correctly placed in the `running/<project_name>/` folder on the client side.
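To make the client-side flow concrete, here is a minimal sketch of how the per-epoch loss logging, dataset-size tracking, and public-folder persistence fit together. It is not the actual `train_model()` implementation from this PR: the function name `train_and_log`, its parameters, the default directory arguments, and the JSON keys (`avg_loss_per_epoch`, `num_samples`) are assumptions for illustration, while the file names and the `public/fl/<project_name>/` layout follow the description above.

```python
import json
import shutil
from pathlib import Path

import torch


def train_and_log(model, dataloader, loss_fn, optimizer,
                  round_num, project_name, epochs=5,
                  output_dir=Path("running"), public_dir=Path("public/fl")):
    """Minimal sketch: per-epoch loss logging, dataset-size tracking,
    and persistence of the loss JSON to the client's public folder."""
    epoch_losses = []

    for epoch in range(epochs):
        running_loss, num_batches = 0.0, 0
        for inputs, targets in dataloader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            num_batches += 1
        # Average training loss for this epoch.
        epoch_losses.append(running_loss / max(num_batches, 1))

    run_dir = output_dir / project_name
    run_dir.mkdir(parents=True, exist_ok=True)

    # Per-epoch loss logging to JSON (in addition to any text logs).
    loss_path = run_dir / f"training_loss_round_{round_num}.json"
    with open(loss_path, "w") as f:
        json.dump({"round": round_num, "avg_loss_per_epoch": epoch_losses}, f, indent=2)

    # Dataset size tracking: total number of samples used in training.
    with open(run_dir / "dataset_size.json", "w") as f:
        json.dump({"num_samples": len(dataloader.dataset)}, f, indent=2)

    # Persist the loss JSON to the client's public folder so it remains
    # accessible after the project concludes.
    public_project_dir = public_dir / project_name
    public_project_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(loss_path, public_project_dir / loss_path.name)

    # Save the round's trained weights so they can be shared with the
    # aggregator alongside dataset_size.json.
    torch.save(model.state_dict(), run_dir / f"trained_model_round_{round_num}.pt")

    return epoch_losses
```

Writing one JSON file per round keeps each round's log self-contained, which is what makes copying it to the public folder straightforward.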
Motivation:

These enhancements aim to make the federated learning pipeline more robust, fair, and efficient. Logging training loss in a structured format allows for better monitoring and analysis, while tracking dataset sizes ensures that the aggregation process fairly represents each participant's contribution. Persisting logs in the public folder enhances transparency, and the bug fixes ensure reliability and smooth operation.
Additional Context:

Affected Dependencies: Uses Python's built-in `json` module for handling JSON operations.

How has this been tested?
Unit Tests:

- Verified that `training_loss_round_{round_num}.json` is correctly created and updated during each training round.
- Verified that `dataset_size.json` accurately reflects the number of samples used in training.

Integration Tests:

- Verified that `training_loss_round_{round_num}.json` is copied to the public folder upon training completion.
- Verified that the trained model and `dataset_size.json` are successfully shared with the aggregator.

Manual Testing:

- Ran training rounds and confirmed that `training_loss_round_{round_num}.json` is generated and moved to the public folder.
- Inspected `dataset_size.json` and verified that it contains accurate dataset size information.
- Checked that the aggregator uses the clients' `dataset_size.json` files for weighted aggregation (a sketch of this weighting follows this section).
- Confirmed that the `FileNotFoundError` related to `fl_config.json` is resolved by correctly placing the configuration file in the `running/<project_name>/` folder.
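The aggregator itself is outside the scope of this diff, but the weighted aggregation referenced above can be pictured as a FedAvg-style average weighted by each client's sample count. The sketch below is illustrative only: the helper name `weighted_average`, the per-client directory layout, and the `num_samples` key are assumptions; the file names `dataset_size.json` and `trained_model_round_{round_num}.pt` are the ones introduced on the client side.

```python
import json
from pathlib import Path

import torch


def weighted_average(client_dirs, round_num):
    """Sketch of dataset-size-weighted (FedAvg-style) aggregation.

    Each client directory is assumed to contain the artifacts produced by
    the client: trained_model_round_{round_num}.pt and dataset_size.json.
    """
    state_dicts, sizes = [], []
    for client_dir in map(Path, client_dirs):
        size_info = json.loads((client_dir / "dataset_size.json").read_text())
        sizes.append(size_info["num_samples"])
        state_dicts.append(torch.load(client_dir / f"trained_model_round_{round_num}.pt"))

    total = float(sum(sizes))
    weights = [s / total for s in sizes]  # each client's share of all samples

    # Weighted sum of every parameter tensor across the client models.
    aggregated = {}
    for key in state_dicts[0]:
        aggregated[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return aggregated
```

With this weighting, a client that contributed twice as many samples influences the global model twice as much, which is the fairness property the dataset-size tracking is meant to enable.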
Instructions to Reproduce:

1. Set Up: Ensure that `fl_config.json`, `model.py`, and `global_model_weights.pt` are correctly placed in the `launch` folder.
2. Run Training: Start a training round and confirm that `training_loss_round_{round_num}.json` is being generated and copied to the public folder.
3. Verify Aggregation: Confirm that the aggregator receives `dataset_size.json` from each client and performs weighted aggregation.
4. Review Logs and Metrics: Check the `public/fl/<project_name>` folder to review the training loss JSON files and verify their persistence after training completion, and confirm that `dataset_size.json` is present and accurate. A small verification snippet is included after this list.
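For step 4, something like the following snippet can be used to skim the persisted artifacts. The project name and the exact location of `dataset_size.json` are assumptions (adjust them to your setup); the rest simply reads the files described above.

```python
import json
from pathlib import Path

# Assumed project name and layout for illustration; adjust to your setup.
project_name = "my_fl_project"
public_dir = Path("public/fl") / project_name

# Review the persisted per-round training loss logs.
for loss_file in sorted(public_dir.glob("training_loss_round_*.json")):
    print(loss_file.name, "->", json.loads(loss_file.read_text()))

# Confirm the dataset size record is present and readable
# (its exact location depends on how the client is configured).
size_file = public_dir / "dataset_size.json"
if size_file.exists():
    print("dataset_size.json ->", json.loads(size_file.read_text()))
else:
    print("dataset_size.json not found under", public_dir)
```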