Introduce functionality for chunking experiments #10

Closed
wants to merge 20 commits into from

mcw92 (Member) commented Oct 24, 2024

This PR introduces functionality for chunking experiments. The following changes have been made:

  • Make the synthetic data generation consistent throughout the code. In the serial case, the dataset generated with generate_and_distribute_synthetic_dataset without local or global imbalances now equals the completely balanced dataset generated with make_classification_dataset for the same random state. This keeps the strong-scaling experiment series with and without chunking comparable, because passing the same random state creates the same datasets.
  • Fix the passing of additional keyword arguments in both train_parallel_on_synthetic_data and train_parallel_on_balanced_synthetic_data. In the former, this was missing entirely. In addition, the argument parser lacked some of the keyword arguments of sklearn's make_classification and train_test_split, which are used under the hood.
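
The comparability argument in the first bullet rests on scikit-learn's seeding: make_classification and train_test_split return identical arrays when given the same integer random_state. The minimal sketch below demonstrates this with the sklearn functions directly. It does not call the project's own generate_and_distribute_synthetic_dataset or make_classification_dataset, and the helper make_split and the dataset sizes are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def make_split(random_state: int):
    """Hypothetical helper: generate a synthetic dataset and split it, both seeded by random_state."""
    # Illustrative sizes only; the project's experiments use their own configuration.
    X, y = make_classification(
        n_samples=1000,
        n_features=20,
        n_informative=5,
        n_classes=3,
        random_state=random_state,
    )
    # Returns [X_train, X_test, y_train, y_test].
    return train_test_split(X, y, test_size=0.25, random_state=random_state)


first = make_split(random_state=0)
second = make_split(random_state=0)
other = make_split(random_state=1)

# Same seed: every returned array is identical, so runs sharing a seed operate on the same data.
same_seed_identical = all(np.array_equal(a, b) for a, b in zip(first, second))
# Different seed: the generated training features differ.
other_seed_differs = not np.array_equal(first[0], other[0])
```

Seeding both the generator and the split is what makes the comparison hold end to end; seeding only make_classification would still produce different train/test partitions across runs.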
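
The keyword-argument fix in the second bullet amounts to exposing the relevant sklearn options on the command line and forwarding them to the right call. A minimal, stdlib-only sketch of that pattern follows; the option selection, the grouping tuples, and the helper names build_parser and collect_kwargs are assumptions for illustration, not the project's actual parser.

```python
import argparse

# Hypothetical grouping of CLI options by the sklearn function that consumes them.
MAKE_CLASSIFICATION_KEYS = ("n_samples", "n_features", "n_informative", "n_redundant", "flip_y", "class_sep")
TRAIN_TEST_SPLIT_KEYS = ("test_size",)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch: forward CLI options to sklearn")
    # Options intended for sklearn.datasets.make_classification.
    parser.add_argument("--n_samples", type=int, default=1000)
    parser.add_argument("--n_features", type=int, default=20)
    parser.add_argument("--n_informative", type=int, default=2)
    parser.add_argument("--n_redundant", type=int, default=2)
    parser.add_argument("--flip_y", type=float, default=0.01)
    parser.add_argument("--class_sep", type=float, default=1.0)
    # Option intended for sklearn.model_selection.train_test_split.
    parser.add_argument("--test_size", type=float, default=0.25)
    # Seed shared by both calls.
    parser.add_argument("--random_state", type=int, default=0)
    return parser


def collect_kwargs(args: argparse.Namespace) -> tuple[dict, dict]:
    """Split parsed options into keyword-argument dicts for the two sklearn calls."""
    options = vars(args)
    classification_kwargs = {key: options[key] for key in MAKE_CLASSIFICATION_KEYS}
    split_kwargs = {key: options[key] for key in TRAIN_TEST_SPLIT_KEYS}
    # Both calls receive the same seed so results are reproducible.
    classification_kwargs["random_state"] = args.random_state
    split_kwargs["random_state"] = args.random_state
    return classification_kwargs, split_kwargs


args = build_parser().parse_args(["--n_informative", "5", "--flip_y", "0.0", "--test_size", "0.2"])
classification_kwargs, split_kwargs = collect_kwargs(args)
# These dicts would then be unpacked as
#   X, y = make_classification(**classification_kwargs)
#   train_test_split(X, y, **split_kwargs)
```

Grouping the options by consumer keeps each sklearn call from receiving keyword arguments it does not accept, which would otherwise raise a TypeError.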

@mcw92 mcw92 requested a review from fluegelk October 24, 2024 11:32
@mcw92 mcw92 self-assigned this Oct 24, 2024
github-actions bot (Contributor) commented Oct 24, 2024

| Name | Stmts | Miss | Cover | Missing |
|---|---:|---:|---:|---|
| specialcouscous/__init__.py | 0 | 0 | 100% | |
| specialcouscous/rf_parallel.py | 119 | 9 | 92% | 89-93, 194, 198, 271-273, 447 |
| specialcouscous/synthetic_classification_data.py | 215 | 49 | 77% | 88-90, 185, 304-324, 358, 469, 471, 561-567, 585, 871-885, 1095-1151, 1224-1246 |
| specialcouscous/train.py | 260 | 1 | 99% | 525 |
| specialcouscous/utils/__init__.py | 61 | 33 | 46% | 31, 81-82, 106-287 |
| specialcouscous/utils/plot.py | 136 | 74 | 46% | 152, 277-302, 319-405, 421-547 |
| specialcouscous/utils/result_handling.py | 22 | 1 | 95% | 79 |
| specialcouscous/utils/slurm.py | 79 | 72 | 9% | 22-116, 133-149, 166-177 |
| specialcouscous/utils/timing.py | 35 | 0 | 100% | |
| TOTAL | 927 | 239 | 74% | |
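
Each Cover value is the share of executed statements, (Stmts - Miss) / Stmts. The column layout matches coverage.py's terminal report; assuming that is the tool behind this comment, percentages are rounded to the nearest integer, except that a file with missing statements is never shown as 100% (so 259/260, about 99.6%, appears as 99%). The sketch below re-derives the column under that assumed rule; display_cover is a hypothetical helper, not part of this project.

```python
def display_cover(stmts: int, miss: int, precision: int = 0) -> str:
    """Format (stmts - miss) / stmts as a percentage using an assumed coverage.py-style rule."""
    # A file with no statements counts as fully covered.
    pc = 100.0 * (stmts - miss) / stmts if stmts > 0 else 100.0
    near0 = 1.0 / 10**precision
    if 0 < pc < near0:
        # Never show 0% while something is covered.
        pc = near0
    elif 100.0 - near0 < pc < 100.0:
        # Never show 100% while something is still missing.
        pc = 100.0 - near0
    else:
        pc = round(pc, precision)
    return f"{pc:.{precision}f}%"


# (name, Stmts, Miss, Cover) as reported in the coverage table above.
ROWS = [
    ("specialcouscous/__init__.py", 0, 0, "100%"),
    ("specialcouscous/rf_parallel.py", 119, 9, "92%"),
    ("specialcouscous/synthetic_classification_data.py", 215, 49, "77%"),
    ("specialcouscous/train.py", 260, 1, "99%"),
    ("specialcouscous/utils/__init__.py", 61, 33, "46%"),
    ("specialcouscous/utils/plot.py", 136, 74, "46%"),
    ("specialcouscous/utils/result_handling.py", 22, 1, "95%"),
    ("specialcouscous/utils/slurm.py", 79, 72, "9%"),
    ("specialcouscous/utils/timing.py", 35, 0, "100%"),
    ("TOTAL", 927, 239, "74%"),
]

# Names of rows whose recomputed Cover differs from the reported one.
mismatches = [name for name, stmts, miss, shown in ROWS if display_cover(stmts, miss) != shown]
```

Note that the Missing column lists line ranges, which can span blank or comment lines, so summing the ranges does not reproduce the Miss count.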

codecov-commenter commented Oct 24, 2024

Codecov Report

Attention: Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.

Project coverage is 74.21%. Comparing base (864d122) to head (0e91832).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---:|---|
| specialcouscous/utils/__init__.py | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #10      +/-   ##
==========================================
+ Coverage   73.62%   74.21%   +0.59%     
==========================================
  Files           8        8              
  Lines         906      927      +21     
==========================================
+ Hits          667      688      +21     
  Misses        239      239              
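
For reference, the Codecov figures follow from the counts in the diff above as coverage = hits / lines. The patch line is an inference: 93.10345% with 2 missing lines implies 29 coverable changed lines, of which 27 are hit. The reported 74.21% and +0.59% are consistent with truncating, rather than rounding, to two decimal places.

$$
\begin{aligned}
\text{coverage} &= \frac{\text{hits}}{\text{lines}} \\
\text{base (main):}\quad \frac{667}{906} &\approx 73.6203\,\% \\
\text{head (\#10):}\quad \frac{688}{927} &\approx 74.2179\,\% \\
\text{patch:}\quad \frac{29 - 2}{29} = \frac{27}{29} &\approx 93.1034\,\%
\end{aligned}
$$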


mcw92 closed this Dec 9, 2024
Labels: None yet
Projects: None yet
2 participants