@luarss
Contributor

@luarss luarss commented Jan 11, 2026

* Introduced `TensorBoardLogger` class for logging metrics during sweeps.
* Updated `sweep` function to integrate TensorBoard logging.
* Enhanced `consumer` function to log metrics after each parameter run.

Signed-off-by: Jack Luar <[email protected]>
@luarss luarss added the autotuner Flow autotuner label Jan 11, 2026
@luarss luarss requested a review from vvbandeira January 11, 2026 17:21
@luarss
Contributor Author

luarss commented Jan 11, 2026

@jeffng-or I back-ported the feature. Could you please check out this branch and let me know if it works?

@jeffng-or
Contributor

@jeffng-or Back-ported the feature, could you please checkout this branch and let me know if it works?

Great, thanks! I will check it out and let you know how it goes.

@jeffng-or
Contributor

It looks like the code is trying to write the SDC file to tools/AutoTuner/src/constraint.sdc, which isn't writable and also isn't in a trial-specific directory:

(consumer pid=509) [INFO TUN-0007] Scheduling run for parameter {'_SDC_CLK_PERIOD': 250}.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/distributed.py", line 676, in <module>
    main()
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/distributed.py", line 672, in main
    sweep()
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/distributed.py", line 605, in sweep
    ray.get(workers)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2771, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 919, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(PermissionError): ray::consumer() (pid=509, ip=172.17.0.2)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 678, in consumer
    metric_file, _ = ray.get(
ray.exceptions.RayTaskError(PermissionError): ray::openroad_distributed() (pid=499, ip=172.17.0.2)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 646, in openroad_distributed
    config = parse_config(
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 257, in parse_config
    write_sdc(sdc, path, sdc_original, constraints_sdc)
  File "/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner/utils.py", line 116, in write_sdc
    with open(file_name, "w") as file:
PermissionError: [Errno 13] Permission denied: '/OpenROAD-flow-scripts/tools/AutoTuner/src/constraint.sdc'

I'm running within a Docker container where I've mounted the tools/AutoTuner/src/autotuner directory, but not tools/AutoTuner/src, so the src directory is not writable.
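The failure mode above can be checked before launching a sweep. The following is a minimal sketch, assuming (as the traceback suggests) that the SDC file lands in the current working directory; `is_writable_dir` is a hypothetical helper for illustration, not part of AutoTuner.

```python
import os
import tempfile


def is_writable_dir(path: str) -> bool:
    """Return True if a file can actually be created inside `path`.

    Hypothetical helper, not an AutoTuner function.
    """
    if not os.path.isdir(path):
        return False
    try:
        # Creating a temporary file exercises the real operation instead of
        # only inspecting permission bits; the file is removed on close.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    cwd = os.getcwd()
    status = "writable" if is_writable_dir(cwd) else "NOT writable"
    print(f"{cwd}: {status}")
```

Run from /OpenROAD-flow-scripts/tools/AutoTuner/src inside the container, this should report whether the directory that the sweep writes into accepts new files.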

Here's the script that I use to start the container:

#!/bin/bash

#
# Method to use docker CLI to determine if we're using docker or podman
#
# Sets container_engine global variable with either "docker" or "podman"
#
get_container_engine () {
    local DOCKER_VERSION_STRING=$(docker --version 2> /dev/null)

    if [[ "$DOCKER_VERSION_STRING" == *"Docker"* ]]; then
        container_engine="docker"
    elif [[ "$DOCKER_VERSION_STRING" == *"podman"* ]]; then
        container_engine="podman"
    else
        echo "Unable to determine container engine using docker CLI"
        exit 1
    fi
}

if [ $# -lt 1 ]; then
    echo "Usage: run_at_docker.sh <port_num>"
    exit 1
fi

port_num=$1
get_container_engine

if [[ $container_engine == "podman" ]]; then
    user_args="--privileged --userns=keep-id"
else
    user_args="-u $(id -u ${USER}):$(id -g ${USER})"
fi

host_dir=$(pwd)
# $user_args is intentionally unquoted so it splits into separate options
docker run --privileged --rm -it -p "$port_num:$port_num" \
    $user_args \
    -v "$host_dir":/OpenROAD-flow-scripts/flow:Z \
    -v "$host_dir"/../tools/AutoTuner/src/autotuner:/OpenROAD-flow-scripts/tools/AutoTuner/src/autotuner:Z \
    -v /workspace/rapidus/current/rapidus:/rapidus:Z \
    -v /platforms/Rapidus/2HP:/platforms/Rapidus/2HP:Z \
    autotuner:1.0 bash

Here's the Dockerfile that I used to build the autotuner:1.0 container:

# syntax=docker/dockerfile:1
#
# Installs ORFS from docker image 
#

FROM openroad/orfs-verific:v3.0-4385-g4ae3d761e

# install AT required packages
RUN pip3 install -U -r /OpenROAD-flow-scripts/tools/AutoTuner/requirements.txt
RUN pip3 install torchvision

# ORFS installation dir
WORKDIR /OpenROAD-flow-scripts/tools/AutoTuner/src

To build the docker image:

docker build -t autotuner:1.0 -f Dockerfile .

To start the container:

./run_at_docker.sh 6008

Within the container:

python3 -m autotuner.distributed --design gcd --platform rapidus2hp --config /OpenROAD-flow-scripts/flow/designs/rapidus2hp/gcd/autotuner.json --experiment sweep --jobs 20 sweep

Member

@vvbandeira vvbandeira left a comment

@luarss
Please address Jeff's concerns and request a new review when he is satisfied.

@jeffng-or
Contributor

So, here are some differences that I see between tune and sweep:

  • tune calls openroad_distributed from a trial specific directory (e.g. /tmp/ray/session_2026-01-12_22-41-04_998954_29112/artifacts/2026-01-12_22-41-07/tune-tune/working_dirs/variant-AutoTunerBase-30ce30d8-ray)
  • sweep calls openroad_distributed from os.getcwd()
  • In my case, I'm calling the AT from /OpenROAD-flow-scripts/tools/AutoTuner/src, which is located in the docker image filesystem and isn't writable
  • I can work around this by changing my container mount point to mount tools/AutoTuner/src instead of tools/AutoTuner/src/autotuner

Maybe we should be writing the SDC file under the experiment directory, which would be under flow/logs? At least we'd know that the directory is writable.
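To make the suggestion concrete, here is a rough sketch of resolving a per-trial SDC location under an experiment directory. The names `trial_sdc_path`, `experiment_dir`, and `trial_name` are hypothetical and chosen for illustration; this is not how AutoTuner's `write_sdc` or `parse_config` currently work.

```python
import os


def trial_sdc_path(experiment_dir: str, trial_name: str,
                   sdc_name: str = "constraint.sdc") -> str:
    """Return a per-trial path for the generated SDC file.

    Hypothetical helper: creates <experiment_dir>/<trial_name>/ if needed so
    that each trial writes its own copy instead of sharing one in the cwd.
    """
    trial_dir = os.path.join(experiment_dir, trial_name)
    os.makedirs(trial_dir, exist_ok=True)
    return os.path.join(trial_dir, sdc_name)


if __name__ == "__main__":
    # Illustrative directory and trial names only.
    base = os.path.join("logs", "example-experiment")
    for period in (250, 260):
        print(trial_sdc_path(base, f"clk-period-{period}"))
```

Because every trial gets its own directory, concurrent trials would no longer overwrite a single shared constraint.sdc, and the location is writable whenever the experiment directory is.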

After I make the change, the AT starts running trials. As it's running, I'm noticing the following:

  • When I "grep -w core_clock" in logs/rapidus2hp/gcd/sweep-sweep/*/OpenROAD-flow-scripts/tools/AutoTuner/src/constraint.sdc/metrics.json, all of them report that the clock frequency is 290. So, I'm not sure that the SDC file is being uniquely created for each trial. This is further reinforced when I compare the metrics.json files for two runs, which are virtually identical.
  • I have --jobs set to 20, but it doesn't look like 20 jobs are run in parallel. The job has been running for an hour, but no data has been written to logs/rapidus2hp/gcd/sweep-sweep since the first five minutes of the run.
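The grep check in the first bullet can be scripted so that it is easy to repeat after a fix. This is a sketch under the assumption that each metrics.json is valid JSON and that the clock value sits under a key containing `core_clock`; the exact key name and nesting are not shown in this thread, so the search is deliberately generic.

```python
import glob
import json


def find_values(node, needle):
    """Yield (key, value) pairs whose key contains `needle`, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if needle in key:
                yield key, value
            yield from find_values(value, needle)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, needle)


def clock_values_by_file(pattern, needle="core_clock"):
    """Map each metrics file matching `pattern` to its matching values."""
    result = {}
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path) as fh:
            data = json.load(fh)
        result[path] = [value for _, value in find_values(data, needle)]
    return result


if __name__ == "__main__":
    pattern = "logs/rapidus2hp/gcd/sweep-sweep/**/metrics.json"
    for path, values in clock_values_by_file(pattern).items():
        print(path, values)
```

If every file prints the same value, the trials are most likely sharing one SDC file rather than each receiving its own.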

@jeffng-or
Contributor

The job ran overnight without completing, so there's something off. Please use the following flow for testing:

  • docker load -i /home/jeffng/Jan2026Demo/v3.0-4385-g4ae3d761e.tar
  • docker build -t autotuner:1.0 -f Dockerfile . (Use Dockerfile posted above)
  • git checkout 4ae3d76 (in your ORFS workspace)
  • Replace designs/rapidus2hp/gcd/autotuner.json with the content below
  • Execute run_at_docker.sh 6007 (the script above - note that you'll have to change the /workspace/rapidus/current/rapidus path to /platforms or wherever your rapidus workspace is)
  • export PLATFORM_HOME=/rapidus (in docker container)
  • Execute the python3 call above

autotuner.json

{
    "_SDC_FILE_PATH": "constraint.sdc",
    "_SDC_CLK_PERIOD": {
        "type": "int",
        "minmax": [
            180,
            300
        ],
        "step": 10
    }
}
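For reference when checking results, the values this config should sweep over can be enumerated with a small sketch. Whether AutoTuner treats the upper bound of `minmax` as inclusive is not stated in this thread, so `sweep_values` (a hypothetical helper) takes it as an explicit flag.

```python
def sweep_values(spec: dict, inclusive: bool = True) -> list[int]:
    """Expand an integer {"minmax": [lo, hi], "step": s} spec into a list.

    Hypothetical helper; `inclusive` controls whether `hi` itself is kept.
    """
    lo, hi = spec["minmax"]
    stop = hi + 1 if inclusive else hi
    return list(range(lo, stop, spec["step"]))


if __name__ == "__main__":
    clk_period = {"type": "int", "minmax": [180, 300], "step": 10}
    print(sweep_values(clk_period, inclusive=True))
    print(sweep_values(clk_period, inclusive=False))
```

The number of distinct clock periods found across the trial outputs should match the length of one of these lists.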

Once it works for you, I can try again.
