error when running the benchmark #35

@lethean1

Hi, I ran into a problem running your benchmark.
I did the following to install Salus.
First, I started the server:

docker run --rm -it registry.gitlab.com/salus/salus

and got the following output:

[2022-12-19 08:59:16.965984] [1] [default] [I] Running build type: Debug
[2022-12-19 08:59:16.966143] [1] [default] [I] Verbose logging level: 0 file: verbose.log
[2022-12-19 08:59:16.966157] [1] [default] [I] Performance logging: disabled file: verbose.log
[2022-12-19 08:59:16.966168] [1] [default] [I] Allocation logging: enabled
[2022-12-19 08:59:16.966177] [1] [default] [I] Scheduling parameters:
[2022-12-19 08:59:16.966187] [1] [default] [I]     Policy: pack
[2022-12-19 08:59:16.966197] [1] [default] [I]     MaxQueueHeadWaiting: 50
[2022-12-19 08:59:16.966206] [1] [default] [I]     WorkConservative: on
[2022-12-19 08:59:16.966333] [41] [default] [I] TaskExecutor scheduling thread started
[2022-12-19 08:59:16.966463] [42] [default] [I] ExecutionEngine scheduling thread started
[2022-12-19 08:59:16.966914] [1] [default] [I] Starting server listening at tcp://*:5501

Then I ran the benchmark in the same Docker container:

pip3 install -r requirements.txt
python3 -m benchmarks.driver card308

and got this error:

root@349ebb6dcb74:~/Salus# python3 -m benchmarks.driver card308
[2022-12-19 13:06:40,304] [cli] [INFO] Running experiment: benchmarks.exps.card308
[2022-12-19 13:06:40,304] [cli] [INFO] Saving log files to: /root/Salus/scripts/templogs/card308
[2022-12-19 13:06:40,307] [benchmarks.exps.card308] [INFO] **** Saving SavedModel: vgg11eval_1
[2022-12-19 13:06:40,307] [benchmarks.exps.card308] [INFO] **** Location: /root/Salus/scripts/templogs/card308
[2022-12-19 13:06:40,308] [benchmarks.driver.utils.utils] [INFO] Using temporary directory: /dev/shm/tmpu8dbzl_q
[2022-12-19 13:06:40,308] [benchmarks.driver.workload] [INFO] Starting workload `vgg11eval_1' on TF with output file: /dev/shm/tmpu8dbzl_q/vgg11eval_1.tf.1iter.0.output
[2022-12-19 13:06:40,308] [benchmarks.driver.runner] [INFO] Starting workload with cmd: ['stdbuf', '-o0', '-e0', '--', 'python', 'tf_cnn_benchmarks.py', '--display_every=1', '--num_gpus=1', '--variable_update=parameter_server', '--nodistortions', '--executor=tf', '--num_batches=1', '--batch_size=1', '--model_dir=/symbiotic/peifeng/tf_cnn_benchmarks_models/legacy_checkpoint_models/vgg11', '--model=vgg11', '--eval_block=true', '--eval', '--saved_model_dir=/symbiotic/peifeng/tf_cnn_benchmarks_models/saved_models/vgg11']
[2022-12-19 13:06:40,312] [benchmarks.exps] [INFO] Waiting all workloads to finish
[2022-12-19 13:06:40,339] [benchmarks.driver.server] [INFO] Workload vgg11eval_1 exited with 1
Press enter to continue...
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/Salus/benchmarks/driver/__main__.py", line 212, in <module>
    sys.exit(main())
  File "/root/Salus/benchmarks/driver/__main__.py", line 198, in main
    expm.main(argv)
  File "/root/Salus/benchmarks/exps/card308.py", line 64, in main
    run_tf(FLAGS.save_dir, wl)
  File "/root/Salus/benchmarks/exps/__init__.py", line 124, in run_tf
    raise RuntimeError(f'Workload {w.canonical_name} did not finish cleanly: {w.proc.returncode}')
RuntimeError: Workload vgg11eval_1 did not finish cleanly: 1

Any help solving this problem is appreciated!
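
The driver log above reports only that the workload exited with 1; the workload's own output goes to a file in a temporary directory under /dev/shm. One way to see the underlying error is to rerun the workload command by hand and capture its stderr. The following is a minimal sketch of that idea, not part of Salus: `run_and_capture` is a hypothetical helper, and the demo command is a stand-in child process that fails on purpose so the sketch runs anywhere.

```python
import subprocess
import sys


def run_and_capture(cmd, cwd=None):
    """Run cmd and return (returncode, stdout, stderr) as text.

    Hypothetical helper for illustration; it is not part of the Salus driver.
    """
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in for the real workload: a child process that writes to stderr
# and exits with code 1, like the failing workload in the log.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('demo failure', file=sys.stderr); sys.exit(1)",
]

code, out, err = run_and_capture(demo_cmd)
print("exit code:", code)
print("stderr:", err.strip())
```

To apply this to the real failure, `demo_cmd` would be replaced with the list printed on the "Starting workload with cmd" log line, and `cwd` set to the directory that contains tf_cnn_benchmarks.py. That directory is not shown in the log, so it has to be located in the container first.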
