Scarf is an automatic configuration tuning framework for Apache Flink. It consists of:
- Knob selection acceleration through workload clustering
- Multi-objective Reinforcement Learning (MORL)-based offline-online learning
- Knowledge transfer via a topology-agnostic GNN-based actor-critic network
This tuner is implemented in Python 3.12. To run the tuner, install the packages listed in requirements.txt.
This tuner is tested against Flink 2.0 on Java 17, running in YARN application mode with Hadoop 3.4.1.
The workloads are located in the flink-jobs/ directory. You need to compile the JAR file and upload it to HDFS using flink-jobs/build.sh.
First, fill in the cluster address, job information and hyperparameters in config/config.yaml. The meaning of each configuration is described in utils/config.py.
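The steps that follow repeatedly set or clear tuner.loadDir between runs. As a minimal sketch, assuming the dotted names map to nested keys in config/config.yaml (loadDir under tuner) and that the file has already been parsed into a Python dict by a YAML library of your choice, a helper such as the hypothetical with_load_dir below could derive each per-stage variant without touching the original. The directory names are placeholders, not paths the tuner actually produces.

```python
import copy
from typing import Optional


def with_load_dir(config: dict, load_dir: Optional[str]) -> dict:
    """Return a copy of config with tuner.loadDir set, or cleared when None.

    Assumption: 'tuner.loadDir' means the key 'loadDir' nested under 'tuner'.
    """
    updated = copy.deepcopy(config)          # leave the caller's dict untouched
    tuner = updated.setdefault("tuner", {})  # create the section if it is missing
    tuner["loadDir"] = load_dir              # None stands for "no value"
    return updated


# Hypothetical values for illustration only.
base = {"tuner": {"saveDir": "output/"}}
analysis_cfg = with_load_dir(base, "output/coldstart-run")
offline_cfg = with_load_dir(analysis_cfg, None)
```

Writing the resulting dict back to config/config.yaml is left to whichever YAML library is used.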
Run:
python main.py --mode selection --stage coldstart --config config/config.yaml
An output folder will be created under tuner.saveDir in the config file.
Place the output directory in tuner.loadDir in the config file, and run:
python main.py --mode selection --stage analysis --config config/config.yaml
Place the output directories of historical tasks in selection/speedup.py, and run:
python main.py --mode selection --stage cluster --config config/config.yaml
Remove the value of tuner.loadDir in the config file, fill in the selected knobs in the knobs section of the config file, and run:
python main.py --mode offline --config config/config.yaml
To transfer knowledge from a previously trained task, place that task's output directory in tuner.loadDir, and run:
python main.py --mode offline --config config/config.yaml
Place the output directory of the offline-trained task in tuner.loadDir, and run:
python main.py --mode online --config config/config.yaml
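The invocations above differ only in --mode and --stage. The following sketch collects them in order; it only builds the argument lists and does not run anything, because the config file has to be edited between steps. The names build_command and PIPELINE are hypothetical and not part of the tuner.

```python
import sys
from typing import List, Optional

CONFIG_PATH = "config/config.yaml"


def build_command(mode: str, stage: Optional[str] = None,
                  config: str = CONFIG_PATH) -> List[str]:
    """Build the argument list for one main.py invocation."""
    cmd = [sys.executable, "main.py", "--mode", mode]
    if stage is not None:          # only the selection mode takes a stage here
        cmd += ["--stage", stage]
    cmd += ["--config", config]
    return cmd


# Order of the distinct invocations as listed in this README.
PIPELINE = [
    ("selection", "coldstart"),
    ("selection", "analysis"),
    ("selection", "cluster"),
    ("offline", None),
    ("online", None),
]

commands = [build_command(mode, stage) for mode, stage in PIPELINE]
for cmd in commands:
    # Skip the interpreter path so the output matches the commands above.
    print("python", " ".join(cmd[1:]))
```

Each list can be passed to subprocess.run(cmd, check=True) once the config file has been updated for that step.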