feat: Support for RS2 Downsampler #465
Conversation
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_rs2_downsampling.py
target = torch.randint(0, 10, (10,))

for _ in range(3):
    downsampler.inform_samples(sample_ids, data, target)
Each call to inform_samples should be provided with a different set of sample_ids.
Why? That would not be the case in the trainer server / pytorch trainer due to the nature of downsampling, and it also will not make a difference.
Yeah, it does not make a difference here, as we just test the shape. But naturally they should be different: in sample_and_batch in pytorch_trainer.py, we first iterate over the dataloader and keep informing each batch in _iterate_dataloader_and_compute_scores:

self._downsampler.inform_samples(sample_ids, model_output, target, embeddings)

The sample_ids come from the dataloader and should be naturally distinct, right? (They are the keys of the samples.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, they are not. What differs is the model output (on which true downsamplers sample), but the list of samples is always the same, since the trigger training set from the selector does not change between epochs. Since RS2 only relies on the IDs, it should not matter. The IDs will in all cases be identical across epochs.
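To make that concrete: since RS2 ignores the model output and only works on the accumulated IDs, a per-epoch selection can be sketched as below. This is a minimal illustrative sketch, not the PR's actual implementation in remote_rs2_downsampling.py; the class name, the downsampling_ratio argument, and the uniform weights are assumptions.

import random

import torch


class RandomSubsetSelector:
    # Illustrative only: per-epoch random subset selection over accumulated sample IDs.
    def __init__(self, downsampling_ratio: float) -> None:
        self.downsampling_ratio = downsampling_ratio  # fraction of informed samples kept per epoch
        self._sample_ids: list[int] = []

    def inform_samples(self, sample_ids: list[int], model_output=None, target=None) -> None:
        # Only the IDs are stored; the forward-pass outputs are irrelevant for random selection.
        self._sample_ids.extend(sample_ids)

    def select_points(self) -> tuple[list[int], torch.Tensor]:
        if not self._sample_ids:
            return [], torch.empty(0)
        target_size = max(1, int(self.downsampling_ratio * len(self._sample_ids)))
        selected = random.sample(self._sample_ids, target_size)  # drawn without replacement within an epoch
        self._sample_ids = []  # reset so the next epoch starts fresh
        # Uniform weights, since the selection is not score-based.
        return selected, torch.ones(target_size)

Whether the same IDs are informed again in the next epoch therefore changes nothing about the selection logic; only the random draw differs.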
I think we have a misunderstanding. I am saying that consecutive calls to inform_samples within two select_points call boundaries should contain different sample ids. I copy the code of _iterate_dataloader_and_compute_scores here:
for batch_number, batch in enumerate(dataloader):
    self.update_queue(AvailableQueues.DOWNSAMPLING, batch_number, number_of_samples, training_active=False)

    sample_ids, target, data = self.preprocess_batch(batch)
    number_of_samples += len(sample_ids)

    with torch.inference_mode(mode=(not self._downsampler.requires_grad)):
        with torch.autocast(self._device_type, enabled=self._amp):
            # compute the scores and accumulate them
            model_output = self._model.model(data)
            embeddings = self.get_embeddings_if_recorded()
            self._downsampler.inform_samples(sample_ids, model_output, target, embeddings)
You see: we load one batch after another from the dataloader. One inform_samples call does not contain the entire dataset but just one batch. The first batch must have different sample ids than the second batch. That means that if we do not call select_points in the middle, then the inform_samples calls should contain different sample ids. I am not talking about sample ids across epochs; those definitely do not change.
i.e. if we have
downsampler.inform_samples(...)
downsampler.select_points(...)
downsampler.inform_samples(...)
Then the first inform_samples call can have the same sample ids as the second inform_samples.
But when we do
downsampler.inform_samples(...)
downsampler.inform_samples(...)
downsampler.select_points(...)
downsampler.inform_samples(...)
downsampler.inform_samples(...)
Suppose the whole dataset contains two batches. Then the first two inform_samples calls should contain different sample_ids.

In this unit test, we only keep calling inform_samples(...) without calling select_points(...), so each call should contain distinct sample_ids.
Anyway, I think it does not really make a difference here to use different sample ids. But I still do think that consecutive inform_samples calls (without a select_points call in the middle) should contain distinct sample ids.
I understand your point and agree with your description, but I still don't understand why you are suggesting it here :D The code is this:
with torch.inference_mode(mode=(not downsampler.requires_grad)):
    sample_ids = list(range(10))
    data = torch.randn(10, 10)
    target = torch.randint(0, 10, (10,))

    for _ in range(3):
        downsampler.inform_samples(sample_ids, data, target)
        selected_ids, weights = downsampler.select_points()
so the loop is the epoch loop (!). Since sample_ids = list(range(10)), we don't have duplicate samples within the same epoch, and the samples are consistent across epochs. This is exactly as you describe. I am not sure if I am missing something or you just confused this loop with something else. I am merging this for now and am happy to do a follow-up PR in case I am missing something here.
" this unit test, we only keep calling inform_samples(...) without calling select_points(...),"
i dont get it. isn't it directly below :D?
Answered your comments and addressed them where possible for now :) |
There are still some points to address, but after that, feel free to merge🚀, thanks a lot!
Feel free to merge the PR! Thanks for the further explanation!!!!
This implements the random selection from the RS2 paper minus the learning rate scheduling adjustments.
Note that it is a bit suboptimal to use the downsampling infrastructure here (#466). We might want to think about making the selector a bit more dynamic, but for now, this will suffice to run experiments. #462 should be merged before this is reviewed.
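For context, the core of what this PR implements can be sketched independently of Modyn: RS2 redraws a uniform random subset of the training set every epoch and trains only on that subset. The sketch below is illustrative only; the function name, the ratio parameter, and the without-replacement draw are assumptions, and the paper's learning rate schedule adjustment is omitted here on purpose, as it is in this PR.

import random
from typing import Callable


def rs2_training_loop(
    dataset_ids: list[int],
    ratio: float,
    epochs: int,
    train_one_epoch: Callable[[list[int]], None],
) -> None:
    # Illustrative RS2-style loop: each epoch trains on a freshly drawn uniform random subset.
    subset_size = max(1, int(ratio * len(dataset_ids)))
    for _ in range(epochs):
        epoch_subset = random.sample(dataset_ids, subset_size)  # redrawn every epoch, without replacement
        train_one_epoch(epoch_subset)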