feat: Support for RS2 Downsampler #465

Merged
merged 14 commits into from
Jun 4, 2024

Conversation

MaxiBoether
Contributor

@MaxiBoether MaxiBoether commented Jun 3, 2024

This implements the random selection from the RS2 paper minus the learning rate scheduling adjustments.

Note that it is a bit suboptimal to use the downsampling infrastructure here (#466). We might want to think about making the selector a bit more dynamic, but for now, this will suffice to run experiments. #462 should be merged before this is reviewed.
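For readers unfamiliar with RS2, here is a minimal sketch of the random selection it refers to, assuming the basic scheme of training each epoch on a freshly chosen random subset. This is not the code added in this PR: the function name `rs2_epoch_subsets` and all of its parameters are made up for illustration, and dropping the leftover samples before a reshuffle in the without-replacement case is a simplification.

```python
import random


def rs2_epoch_subsets(sample_ids, ratio, num_epochs, with_replacement, seed=0):
    """Yield one random subset of sample_ids per epoch (hypothetical helper).

    with_replacement=True: draw an independent random subset in every epoch.
    with_replacement=False: shuffle once, walk through the permutation in
    consecutive chunks, and reshuffle only once it is exhausted.
    """
    rng = random.Random(seed)
    subset_size = max(1, int(len(sample_ids) * ratio))
    pool = []
    for _ in range(num_epochs):
        if with_replacement:
            # No duplicates within an epoch, but ids may repeat across epochs.
            yield rng.sample(sample_ids, subset_size)
            continue
        if len(pool) < subset_size:
            # Not enough unseen samples left: start a new shuffled pass.
            pool = list(sample_ids)
            rng.shuffle(pool)
        yield pool[:subset_size]
        pool = pool[subset_size:]


for epoch, subset in enumerate(rs2_epoch_subsets(list(range(10)), 0.5, 4, False)):
    print(epoch, sorted(subset))
```

Note that only the sample ids are needed to make the selection; no model output is involved.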

@MaxiBoether MaxiBoether changed the base branch from main to feature/MaxiBoether/disable-grad-downmsaple June 3, 2024 13:48
@MaxiBoether MaxiBoether changed the base branch from feature/MaxiBoether/disable-grad-downmsaple to main June 3, 2024 13:50

github-actions bot commented Jun 3, 2024

✅ Result of Pytest Coverage

---------- coverage: platform linux, python 3.12.3-final-0 -----------

Name Stmts Miss Cover
modyn/common/benchmark/stopwatch.py 26 0 100%
modyn/common/example_extension/example_extension.py 28 2 93%
modyn/common/ftp/ftp_server.py 31 18 42%
modyn/common/ftp/ftp_utils.py 83 69 17%
modyn/common/grpc/grpc_helpers.py 67 36 46%
modyn/common/trigger_sample/trigger_sample_storage.py 158 9 94%
modyn/config/schema/config.py 93 0 100%
modyn/config/schema/modyn_base_model.py 5 0 100%
modyn/config/schema/pipeline.py 245 20 92%
modyn/config/schema/sampling/downsampling_config.py 61 1 98%
modyn/database/abstract_database_connection.py 35 0 100%
modyn/database/partition_by_meta.py 33 12 64%
modyn/evaluator/evaluator.py 15 0 100%
modyn/evaluator/evaluator_entrypoint.py 32 3 91%
modyn/evaluator/internal/dataset/evaluation_dataset.py 75 3 96%
modyn/evaluator/internal/grpc/evaluator_grpc_server.py 22 0 100%
modyn/evaluator/internal/grpc/evaluator_grpc_servicer.py 165 14 92%
modyn/evaluator/internal/metric_factory.py 18 1 94%
modyn/evaluator/internal/metrics/abstract_decomposable_metric.py 10 1 90%
modyn/evaluator/internal/metrics/abstract_evaluation_metric.py 29 2 93%
modyn/evaluator/internal/metrics/abstract_holistic_metric.py 10 1 90%
modyn/evaluator/internal/metrics/accuracy.py 20 2 90%
modyn/evaluator/internal/metrics/f1_score.py 63 0 100%
modyn/evaluator/internal/metrics/roc_auc.py 36 1 97%
modyn/evaluator/internal/pytorch_evaluator.py 113 28 75%
modyn/evaluator/internal/utils/evaluation_info.py 9 0 100%
modyn/evaluator/internal/utils/evaluation_process_info.py 8 0 100%
modyn/evaluator/internal/utils/evaluator_messages.py 3 0 100%
modyn/metadata_database/metadata_base.py 3 0 100%
modyn/metadata_database/metadata_database_connection.py 55 3 95%
modyn/metadata_database/models/pipelines.py 24 1 96%
modyn/metadata_database/models/sample_training_metadata.py 15 0 100%
modyn/metadata_database/models/selector_state_metadata.py 47 10 79%
modyn/metadata_database/models/trained_models.py 18 0 100%
modyn/metadata_database/models/trigger_partitions.py 10 0 100%
modyn/metadata_database/models/trigger_training_metadata.py 14 0 100%
modyn/metadata_database/models/triggers.py 10 0 100%
modyn/metadata_database/utils/model_storage_strategy_config.py 21 2 90%
modyn/metadata_processor/internal/grpc/metadata_processor_grpc_servicer.py 18 0 100%
modyn/metadata_processor/internal/grpc/metadata_processor_server.py 24 0 100%
modyn/metadata_processor/internal/metadata_processor_manager.py 23 4 83%
modyn/metadata_processor/metadata_processor.py 11 0 100%
modyn/metadata_processor/metadata_processor_entrypoint.py 24 1 96%
modyn/metadata_processor/processor_strategies/abstract_processor_strategy.py 30 0 100%
modyn/metadata_processor/processor_strategies/basic_processor_strategy.py 17 2 88%
modyn/metadata_processor/processor_strategies/processor_strategy_type.py 6 1 83%
modyn/model_storage/internal/grpc/grpc_server.py 23 0 100%
modyn/model_storage/internal/grpc/model_storage_grpc_servicer.py 54 0 100%
modyn/model_storage/internal/model_storage_manager.py 118 5 96%
modyn/model_storage/internal/storage_strategies/abstract_difference_operator.py 11 2 82%
modyn/model_storage/internal/storage_strategies/abstract_model_storage_strategy.py 16 1 94%
modyn/model_storage/internal/storage_strategies/difference_operators/sub_difference_operator.py 12 0 100%
modyn/model_storage/internal/storage_strategies/difference_operators/xor_difference_operator.py 14 0 100%
modyn/model_storage/internal/storage_strategies/full_model_strategies/abstract_full_model_strategy.py 26 2 92%
modyn/model_storage/internal/storage_strategies/full_model_strategies/binary_full_model.py 16 0 100%
modyn/model_storage/internal/storage_strategies/full_model_strategies/pytorch_full_model.py 15 0 100%
modyn/model_storage/internal/storage_strategies/incremental_model_strategies/abstract_incremental_model_strategy.py 26 10 62%
modyn/model_storage/internal/storage_strategies/incremental_model_strategies/weights_difference.py 99 1 99%
modyn/model_storage/internal/utils/model_storage_policy.py 35 0 100%
modyn/model_storage/model_storage.py 27 3 89%
modyn/model_storage/model_storage_entrypoint.py 32 3 91%
modyn/models/articlenet/articlenet.py 30 16 47%
modyn/models/coreset_methods_support.py 29 1 97%
modyn/models/dlrm/cuda_ext/dot_based_interact.py 24 13 46%
modyn/models/dlrm/cuda_ext/fused_gather_embedding.py 16 16 0%
modyn/models/dlrm/cuda_ext/sparse_embedding.py 32 32 0%
modyn/models/dlrm/dlrm.py 67 9 87%
modyn/models/dlrm/nn/embeddings.py 123 64 48%
modyn/models/dlrm/nn/factories.py 24 9 62%
modyn/models/dlrm/nn/interactions.py 50 11 78%
modyn/models/dlrm/nn/mlps.py 77 23 70%
modyn/models/dlrm/nn/parts.py 60 4 93%
modyn/models/dlrm/setup.py 5 5 0%
modyn/models/dlrm/utils/install_lib.py 11 7 36%
modyn/models/dlrm/utils/utils.py 28 0 100%
modyn/models/dummy/dummy.py 12 0 100%
modyn/models/fmownet/fmownet.py 25 0 100%
modyn/models/resnet18/resnet18.py 28 0 100%
modyn/models/resnet50/resnet50.py 28 0 100%
modyn/models/resnet152/resnet152.py 28 0 100%
modyn/models/tokenizers/distill_bert_tokenizer.py 11 0 100%
modyn/models/yearbooknet/yearbooknet.py 23 0 100%
modyn/selector/internal/grpc/selector_grpc_servicer.py 78 22 72%
modyn/selector/internal/grpc/selector_server.py 33 12 64%
modyn/selector/internal/selector_manager.py 125 37 70%
modyn/selector/internal/selector_strategies/abstract_selection_strategy.py 125 8 94%
modyn/selector/internal/selector_strategies/coreset_strategy.py 66 6 91%
modyn/selector/internal/selector_strategies/downsampling_strategies/abstract_downsampling_strategy.py 29 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/craig_downsampling_strategy.py 18 12 33%
modyn/selector/internal/selector_strategies/downsampling_strategies/downsampling_scheduler.py 51 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/gradmatch_downsampling_strategy.py 14 8 43%
modyn/selector/internal/selector_strategies/downsampling_strategies/gradnorm_downsampling_strategy.py 6 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/kcentergreedy_downsampling_strategy.py 14 8 43%
modyn/selector/internal/selector_strategies/downsampling_strategies/loss_downsampling_strategy.py 6 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/no_downsampling_strategy.py 10 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/rho_loss_downsampling_strategy.py 56 4 93%
modyn/selector/internal/selector_strategies/downsampling_strategies/rs2_downsampling_strategy.py 10 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/submodular_downsampling_strategy.py 20 14 30%
modyn/selector/internal/selector_strategies/downsampling_strategies/uncertainty_downsampling_strategy.py 15 9 40%
modyn/selector/internal/selector_strategies/downsampling_strategies/utils.py 7 0 100%
modyn/selector/internal/selector_strategies/freshness_sampling_strategy.py 130 12 91%
modyn/selector/internal/selector_strategies/new_data_strategy.py 98 10 90%
modyn/selector/internal/selector_strategies/presampling_strategies/abstract_balanced_strategy.py 57 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/abstract_presampling_strategy.py 23 1 96%
modyn/selector/internal/selector_strategies/presampling_strategies/label_balanced_presampling_strategy.py 7 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/no_presampling_strategy.py 16 1 94%
modyn/selector/internal/selector_strategies/presampling_strategies/random_no_replacement_presampling_strategy.py 42 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/random_presampling_strategy.py 17 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/trigger_balanced_presampling_strategy.py 13 1 92%
modyn/selector/internal/selector_strategies/presampling_strategies/utils.py 9 0 100%
modyn/selector/internal/selector_strategies/utils.py 10 0 100%
modyn/selector/internal/storage_backend/abstract_storage_backend.py 34 7 79%
modyn/selector/internal/storage_backend/database/database_storage_backend.py 85 7 92%
modyn/selector/internal/storage_backend/local/local_storage_backend.py 136 5 96%
modyn/selector/selector.py 82 14 83%
modyn/selector/selector_entrypoint.py 31 3 90%
modyn/supervisor/entrypoint.py 31 3 90%
modyn/supervisor/internal/eval_strategies/abstract_eval_strategy.py 8 1 88%
modyn/supervisor/internal/eval_strategies/matrix_eval_strategy.py 17 0 100%
modyn/supervisor/internal/eval_strategies/offset_eval_strategy.py 22 0 100%
modyn/supervisor/internal/evaluation_result_writer/abstract_evaluation_result_writer.py 16 2 88%
modyn/supervisor/internal/evaluation_result_writer/json_result_writer.py 23 1 96%
modyn/supervisor/internal/evaluation_result_writer/tensorboard_result_writer.py 13 0 100%
modyn/supervisor/internal/grpc/enums.py 55 0 100%
modyn/supervisor/internal/grpc/supervisor_grpc_server.py 25 7 72%
modyn/supervisor/internal/grpc/supervisor_grpc_servicer.py 35 0 100%
modyn/supervisor/internal/grpc/template_msg.py 26 0 100%
modyn/supervisor/internal/grpc_handler.py 301 36 88%
modyn/supervisor/internal/pipeline_executor/models.py 256 34 87%
modyn/supervisor/internal/pipeline_executor/pipeline_executor.py 361 18 95%
modyn/supervisor/internal/supervisor.py 144 17 88%
modyn/supervisor/internal/triggers/amounttrigger.py 15 0 100%
modyn/supervisor/internal/triggers/datadrifttrigger.py 102 28 73%
modyn/supervisor/internal/triggers/embedding_encoder_utils/embedding_encoder.py 30 19 37%
modyn/supervisor/internal/triggers/embedding_encoder_utils/embedding_encoder_downloader.py 50 31 38%
modyn/supervisor/internal/triggers/timetrigger.py 26 3 88%
modyn/supervisor/internal/triggers/trigger.py 21 1 95%
modyn/supervisor/internal/triggers/trigger_datasets/dataloader_info.py 16 13 19%
modyn/supervisor/internal/triggers/trigger_datasets/fixed_keys_dataset.py 72 3 96%
modyn/supervisor/internal/triggers/trigger_datasets/online_trigger_dataset.py 17 1 94%
modyn/supervisor/internal/triggers/utils.py 50 37 26%
modyn/supervisor/internal/utils/evaluation_status_reporter.py 31 0 100%
modyn/supervisor/internal/utils/pipeline_info.py 30 9 70%
modyn/supervisor/internal/utils/training_status_reporter.py 24 3 88%
modyn/tests/common/example_extension/test_example_extension.py 13 0 100%
modyn/tests/common/grpc/test_grpc_helpers.py 3 0 100%
modyn/tests/common/trigger_sample/test_trigger_sample_storage.py 128 0 100%
modyn/tests/config/schema/test_pipeline.py 35 0 100%
modyn/tests/config/test_config_integrity.py 36 1 97%
modyn/tests/conftest.py 39 0 100%
modyn/tests/database/test_abstract_database_connection.py 19 0 100%
modyn/tests/evaluator/internal/dataset/test_evaluation_dataset.py 131 2 98%
modyn/tests/evaluator/internal/grpc/test_evaluator_grpc_server.py 20 0 100%
modyn/tests/evaluator/internal/grpc/test_evaluator_grpc_servicer.py 365 16 96%
modyn/tests/evaluator/internal/metrics/test_accuracy.py 45 0 100%
modyn/tests/evaluator/internal/metrics/test_f1_score.py 53 0 100%
modyn/tests/evaluator/internal/metrics/test_roc_auc.py 31 0 100%
modyn/tests/evaluator/internal/test_metric_factory.py 13 0 100%
modyn/tests/evaluator/internal/test_pytorch_evaluator.py 163 19 88%
modyn/tests/evaluator/test_evaluator.py 30 0 100%
modyn/tests/evaluator/test_evaluator_entrypoint.py 21 0 100%
modyn/tests/metadata_database/models/test_pipelines.py 50 0 100%
modyn/tests/metadata_database/models/test_sample_training_metadata.py 40 0 100%
modyn/tests/metadata_database/models/test_selector_state_metadata.py 46 0 100%
modyn/tests/metadata_database/models/test_trained_models.py 48 0 100%
modyn/tests/metadata_database/models/test_trigger_training_metadata.py 38 0 100%
modyn/tests/metadata_database/models/test_triggers.py 33 0 100%
modyn/tests/metadata_database/test_metadata_database_connection.py 47 0 100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_grpc_servicer.py 26 0 100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_server.py 27 0 100%
modyn/tests/metadata_processor/internal/test_metadata_processor_manager.py 42 3 93%
modyn/tests/metadata_processor/processor_strategies/test_abstract_processor_strategy.py 60 0 100%
modyn/tests/metadata_processor/processor_strategies/test_basic_processor_strategy.py 43 0 100%
modyn/tests/metadata_processor/test_metadata_processor.py 22 3 86%
modyn/tests/metadata_processor/test_metadata_processor_entrypoint.py 21 0 100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_server.py 16 0 100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_servicer.py 100 0 100%
modyn/tests/model_storage/internal/storage_strategies/difference_operators/test_sub_difference_operator.py 16 0 100%
modyn/tests/model_storage/internal/storage_strategies/difference_operators/test_xor_difference_operator.py 16 0 100%
modyn/tests/model_storage/internal/storage_strategies/full_model_strategies/test_binary_full_model.py 27 1 96%
modyn/tests/model_storage/internal/storage_strategies/full_model_strategies/test_pytorch_full_model.py 36 1 97%
modyn/tests/model_storage/internal/storage_strategies/incremental_model_strategies/test_weights_difference.py 88 2 98%
modyn/tests/model_storage/internal/test_model_storage_manager.py 217 1 99%
modyn/tests/model_storage/internal/utils/test_model_storage_policy.py 28 0 100%
modyn/tests/model_storage/test_model_storage.py 37 0 100%
modyn/tests/model_storage/test_model_storage_entrypoint.py 21 0 100%
modyn/tests/models/test_bert_tokenizer.py 24 0 100%
modyn/tests/models/test_dlrm.py 46 0 100%
modyn/tests/models/test_dummy.py 8 0 100%
modyn/tests/models/test_embedding_recorder.py 27 0 100%
modyn/tests/models/test_fmownet.py 25 0 100%
modyn/tests/models/test_resnet18.py 22 0 100%
modyn/tests/models/test_resnet50.py 22 0 100%
modyn/tests/models/test_resnet152.py 22 0 100%
modyn/tests/models/test_yearbook_net.py 47 0 100%
modyn/tests/selector/internal/grpc/test_selector_grpc_servicer.py 132 0 100%
modyn/tests/selector/internal/grpc/test_selector_server.py 16 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_abstract_downsampling_strategy.py 14 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_gradnorm_downsampling_strategy.py 14 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_loss_downsampling_strategy.py 18 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_no_downsampling_strategy.py 6 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_rho_loss_downsampling_strategy.py 112 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_rs2_downsampling_strategy.py 18 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_scheduler.py 131 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_abstract_balanced_strategy.py 14 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_empty_presampling_strategy.py 0 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_label_balanced_presampling_strategy.py 165 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_random_no_replacement_presampling_strategy.py 52 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_random_presampling_strategy.py 86 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_trigger_balanced_presampling.py 140 0 100%
modyn/tests/selector/internal/selector_strategies/test_abstract_selection_strategy.py 170 0 100%
modyn/tests/selector/internal/selector_strategies/test_coreset_strategy.py 246 0 100%
modyn/tests/selector/internal/selector_strategies/test_freshness_sampling_strategy.py 300 0 100%
modyn/tests/selector/internal/selector_strategies/test_new_data_strategy.py 500 0 100%
modyn/tests/selector/internal/storage_backend/database/test_database_storage_backend.py 123 0 100%
modyn/tests/selector/internal/storage_backend/local/test_local_storage_backend.py 84 0 100%
modyn/tests/selector/internal/storage_backend/utils.py 16 5 69%
modyn/tests/selector/internal/test_selector_manager.py 148 5 97%
modyn/tests/selector/test_selector.py 95 5 95%
modyn/tests/selector/test_selector_entrypoint.py 25 0 100%
modyn/tests/supervisor/internal/eval_strategies/test_matrix_eval_strategy.py 16 0 100%
modyn/tests/supervisor/internal/eval_strategies/test_offset_eval_strategy.py 8 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_abstract_evaluation_result_writer.py 7 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_json_result_writer.py 16 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_tensorboard_result_writer.py 21 0 100%
modyn/tests/supervisor/internal/grpc/test_supervisor_grpc_server.py 29 1 97%
modyn/tests/supervisor/internal/grpc/test_supervisor_grpc_servicer.py 54 0 100%
modyn/tests/supervisor/internal/pipeline_executor/test_pipeline_executor.py 348 6 98%
modyn/tests/supervisor/internal/test_grpc_handler.py 287 0 100%
modyn/tests/supervisor/internal/test_supervisor.py 179 5 97%
modyn/tests/supervisor/internal/triggers/test_amounttrigger.py 25 0 100%
modyn/tests/supervisor/internal/triggers/test_datadrifttrigger.py 94 1 99%
modyn/tests/supervisor/internal/triggers/test_timetrigger.py 21 0 100%
modyn/tests/supervisor/internal/triggers/test_trigger.py 5 0 100%
modyn/tests/supervisor/internal/triggers/trigger_datasets/test_fixed_keys_dataset.py 123 2 98%
modyn/tests/supervisor/internal/triggers/trigger_datasets/test_online_trigger_dataset.py 28 2 93%
modyn/tests/supervisor/test_entrypoint.py 25 0 100%
modyn/tests/trainer_server/internal/data/key_sources/test_local_key_source.py 89 0 100%
modyn/tests/trainer_server/internal/data/key_sources/test_selector_key_source.py 92 0 100%
modyn/tests/trainer_server/internal/data/test_data_utils.py 22 1 95%
modyn/tests/trainer_server/internal/data/test_local_dataset_writer.py 59 0 100%
modyn/tests/trainer_server/internal/data/test_online_dataset.py 367 5 99%
modyn/tests/trainer_server/internal/data/test_per_class_online_dataset.py 53 3 94%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_server.py 17 0 100%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_servicer.py 406 8 98%
modyn/tests/trainer_server/internal/metadata_collector/test_metadata_collector.py 41 0 100%
modyn/tests/trainer_server/internal/trainer/metadata_pytorch_callbacks/test_loss_callback.py 51 1 98%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/deepcore_comparison_tests_utils.py 21 1 95%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_abstract_matrix_downsampling_strategy.py 77 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_abstract_remote_downsampling_strategy.py 12 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_craig_remote_downsampling.py 260 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_get_tensor_subset.py 56 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_gradmatch_downsampling_strategy.py 120 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_gradnorm_downsample.py 96 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_kcenter_downsampling_strategy.py 108 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_loss_downsample.py 86 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_rs2_downsampling.py 123 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_submodular_downsampling_strategy.py 103 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_uncertainty_downsampling_strategy.py 51 0 100%
modyn/tests/trainer_server/internal/trainer/test_batch_accumulator.py 93 0 100%
modyn/tests/trainer_server/internal/trainer/test_pytorch_trainer.py 412 34 92%
modyn/tests/trainer_server/test_trainer_server.py 34 0 100%
modyn/tests/trainer_server/test_trainer_server_entrypoint.py 21 0 100%
modyn/tests/utils/test_timer.py 22 0 100%
modyn/tests/utils/test_utils.py 175 0 100%
modyn/trainer_server/custom_lr_schedulers/dlrm_lr_scheduler/dlrm_scheduler.py 33 33 0%
modyn/trainer_server/internal/dataset/data_utils.py 17 2 88%
modyn/trainer_server/internal/dataset/key_sources/abstract_key_source.py 21 5 76%
modyn/trainer_server/internal/dataset/key_sources/local_key_source.py 23 1 96%
modyn/trainer_server/internal/dataset/key_sources/selector_key_source.py 54 2 96%
modyn/trainer_server/internal/dataset/local_dataset_writer.py 55 3 95%
modyn/trainer_server/internal/dataset/online_dataset.py 308 29 91%
modyn/trainer_server/internal/dataset/per_class_online_dataset.py 14 0 100%
modyn/trainer_server/internal/grpc/trainer_server_grpc_server.py 22 0 100%
modyn/trainer_server/internal/grpc/trainer_server_grpc_servicer.py 244 38 84%
modyn/trainer_server/internal/metadata_collector/metadata_collector.py 33 0 100%
modyn/trainer_server/internal/mocks/mock_metadata_processor.py 22 2 91%
modyn/trainer_server/internal/trainer/batch_accumulator.py 30 0 100%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/base_callback.py 15 3 80%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/loss_callback.py 21 0 100%
modyn/trainer_server/internal/trainer/pytorch_trainer.py 516 152 71%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_matrix_downsampling_strategy.py 69 4 94%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_per_label_remote_downsample_strategy.py 9 1 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_remote_downsampling_strategy.py 38 4 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/cossim.py 28 17 39%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/euclidean.py 29 12 59%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/k_center_greedy.py 38 4 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/orthogonal_matching_pursuit.py 66 34 48%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/shuffling.py 9 0 100%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/submodular_function.py 103 15 85%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/submodular_optimizer.py 116 78 33%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_craig_downsampling.py 98 7 93%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_grad_match_downsampling_strategy.py 16 1 94%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_gradnorm_downsampling.py 46 5 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_kcenter_greedy_downsampling_strategy.py 14 0 100%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_loss_downsampling.py 37 5 86%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_rs2_downsampling.py 44 1 98%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_submodular_downsampling_strategy.py 29 3 90%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_uncertainty_downsampling_strategy.py 64 18 72%
modyn/trainer_server/internal/utils/metric_type.py 3 0 100%
modyn/trainer_server/internal/utils/trainer_messages.py 4 0 100%
modyn/trainer_server/internal/utils/training_info.py 53 2 96%
modyn/trainer_server/internal/utils/training_process_info.py 10 0 100%
modyn/trainer_server/trainer_server.py 19 0 100%
modyn/trainer_server/trainer_server_entrypoint.py 32 3 91%
modyn/utils/timer.py 8 0 100%
modyn/utils/utils.py 161 13 92%
TOTAL 18490 1583 91%
Coverage HTML written to
Required test coverage of
=============== 2434 passed, 8079

@MaxiBoether MaxiBoether changed the base branch from main to feature/MaxiBoether/disable-grad-downmsaple June 3, 2024 14:06
@MaxiBoether MaxiBoether changed the base branch from feature/MaxiBoether/disable-grad-downmsaple to main June 3, 2024 14:06

github-actions bot commented Jun 3, 2024

Line Coverage: -% ( % to main)
Branch Coverage: -% ( % to main)

@MaxiBoether MaxiBoether requested a review from XianzheMa June 3, 2024 15:09
modyn/config/schema/sampling/downsampling_config.py (outdated, resolved)
        target = torch.randint(0, 10, (10,))

        for _ in range(3):
            downsampler.inform_samples(sample_ids, data, target)
Collaborator

Each call to inform_samples should be provided with a different set of sample_ids

Contributor Author

Why? That would not be the case in the trainer server / PyTorch trainer due to the nature of downsampling, and it would not make a difference either.

Collaborator

Yeah, it does not make a difference here, as we just test the shape. But naturally they should be different because, in sample_and_batch in pytorch_trainer.py, we first iterate over the dataloader and keep informing the downsampler about each batch in _iterate_dataloader_and_compute_scores:

self._downsampler.inform_samples(sample_ids, model_output, target, embeddings)

The sample_ids come from the dataloader and should naturally be distinct, right? (They are the keys of the samples.)

Contributor Author

No, they are not. What differs is the model output (on which true downsamplers sample), but the list of samples is always the same, since the trigger training set from the selector does not change between epochs. Since RS2 only relies on the IDs, it should not matter. The IDs will in all cases be identical across epochs.

Collaborator

@XianzheMa XianzheMa Jun 4, 2024

I think we have a misunderstanding. I am saying that consecutive calls to inform_samples between two select_points calls should contain different sample ids.

I copy the code of _iterate_dataloader_and_compute_scores here:

        for batch_number, batch in enumerate(dataloader):
            self.update_queue(AvailableQueues.DOWNSAMPLING, batch_number, number_of_samples, training_active=False)

            sample_ids, target, data = self.preprocess_batch(batch)
            number_of_samples += len(sample_ids)

            with torch.inference_mode(mode=(not self._downsampler.requires_grad)):
                with torch.autocast(self._device_type, enabled=self._amp):
                    # compute the scores and accumulate them
                    model_output = self._model.model(data)
                    embeddings = self.get_embeddings_if_recorded()
                    self._downsampler.inform_samples(sample_ids, model_output, target, embeddings)

You see: we load one batch after another from the dataloader. One inform_samples call does not contain the entire dataset but just one batch, and the first batch must have different sample ids than the second batch. That means that as long as we do not call select_points in between, the inform_samples calls should contain different sample ids.

I am not talking about sample ids across epochs. Those definitely do not change.
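To make the two statements concrete (distinct ids across batches within an epoch, identical ids across epochs), here is a small stand-alone sketch. It does not use the real dataloader; `iterate_batches`, the keys, and the batch size are made-up stand-ins.

```python
def iterate_batches(sample_ids, batch_size):
    """Yield consecutive batches of keys, like a non-shuffling dataloader."""
    for start in range(0, len(sample_ids), batch_size):
        yield sample_ids[start:start + batch_size]


trigger_training_set = list(range(10))  # made-up sample keys

epochs = []
for _ in range(2):
    seen_in_epoch = []
    for batch_ids in iterate_batches(trigger_training_set, batch_size=4):
        # Within one epoch, no key may show up in two different batches.
        assert not set(batch_ids) & set(seen_in_epoch)
        seen_in_epoch.extend(batch_ids)
    epochs.append(seen_in_epoch)

# Across epochs, the set of keys is identical.
assert set(epochs[0]) == set(epochs[1])
```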

Collaborator

@XianzheMa XianzheMa Jun 4, 2024

i.e. if we have

downsampler.inform_samples(...)
downsampler.select_points(...)
downsampler.inform_samples(...)

Then the first inform_samples call can have the same sample ids as the second inform_samples.

But when we do

downsampler.inform_samples(...)
downsampler.inform_samples(...)
downsampler.select_points(...)
downsampler.inform_samples(...)
downsampler.inform_samples(...)

Suppose the whole dataset contains two batches. Then the first two inform_samples calls should contain different sample_ids.

In this unit test, we only keep calling inform_samples(...) without calling select_points(...), so each call should contain distinct sample_ids.
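The call contract described above can be sketched with a toy class. This is a hypothetical stand-in, not Modyn's downsampler: it tracks only sample ids, takes no model output or targets, and raising on duplicate ids is an assumption made here to make the contract visible.

```python
class ToyDownsampler:
    """Hypothetical stand-in that only tracks sample ids."""

    def __init__(self):
        self._pending_ids = []

    def inform_samples(self, sample_ids):
        # Ids seen since the last select_points call must not repeat.
        duplicates = set(sample_ids) & set(self._pending_ids)
        if duplicates:
            raise ValueError(f"ids informed twice before select_points: {sorted(duplicates)}")
        self._pending_ids.extend(sample_ids)

    def select_points(self):
        # Return everything informed so far and start a new round.
        selected = list(self._pending_ids)
        self._pending_ids = []
        return selected


downsampler = ToyDownsampler()
downsampler.inform_samples([0, 1])  # batch 1
downsampler.inform_samples([2, 3])  # batch 2: distinct from batch 1
first_round = downsampler.select_points()
downsampler.inform_samples([0, 1])  # same ids again are fine after select_points
```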

Collaborator

Anyway, I think it does not really make a difference here to use different sample ids. But I still think consecutive inform_samples calls (without a select_points call in between) should contain distinct sample ids.

Contributor Author

I understand your point and agree with your description, but I still don't understand why you are suggesting it here :D The code is this:

    with torch.inference_mode(mode=(not downsampler.requires_grad)):
        sample_ids = list(range(10))
        data = torch.randn(10, 10)
        target = torch.randint(0, 10, (10,))

        for _ in range(3):
            downsampler.inform_samples(sample_ids, data, target)
            selected_ids, weights = downsampler.select_points()

So the loop is the epoch loop (!). Since sample_ids = list(range(10)), we have no duplicate samples within an epoch and consistent samples across epochs. This is exactly what you describe. I am not sure whether I am missing something or you just confused this loop with something else. I am merging this for now and am happy to do a follow-up PR in case I am missing something here.

Contributor Author

"In this unit test, we only keep calling inform_samples(...) without calling select_points(...),"

I don't get it. Isn't it directly below? :D

@MaxiBoether MaxiBoether requested a review from XianzheMa June 4, 2024 13:58
@MaxiBoether
Contributor Author

Answered your comments and addressed them where possible for now :)

Collaborator

@XianzheMa XianzheMa left a comment

There are still some points to address, but after that, feel free to merge🚀, thanks a lot!

@XianzheMa
Collaborator

Feel free to merge the PR! Thanks for the further explanation!!!!

@MaxiBoether MaxiBoether merged commit 9fd2b80 into main Jun 4, 2024
24 checks passed
@MaxiBoether MaxiBoether deleted the feature/MaxiBoether/rs2 branch June 5, 2024 21:11