Copilot AI commented Nov 20, 2025

Add disk usage diagnostics to capture disk space information before heavy build steps:

  • Analyze the linux.yml workflow structure to identify where to insert diagnostics
  • Add disk diagnostics step before the main Build step (line 132)
  • Add disk diagnostics step before the "Build and verify the snap" step (line 308)
  • Add single artifact upload step at the end to capture all diagnostics
  • Verify the YAML syntax is correct
  • Run security checks (CodeQL) - no issues found
  • Address review feedback
  • All tasks completed

Summary

Added comprehensive disk usage diagnostics to .github/workflows/linux.yml to help diagnose the failing job https://github.com/canonical/multipass/actions/runs/19534052585/job/55935120817 which may be caused by disk space exhaustion.

Changes Made

Two diagnostic checkpoints added:

  1. Before Build step (line 132): Captures disk state before initial snapcraft build
  2. Before Snap Build step (line 308): Captures disk state before full snap creation

Each checkpoint captures:

  • df -h - disk usage in human-readable format
  • df -i - inode usage
  • du on top directories (/root, /home, /tmp, /var, /usr, runner temp/workspace)
  • Top 50 largest entries on / (single filesystem only)
  • All files >50M with their sizes (sorted numerically)
  • Snapcraft logs if available (full log from /home/runner/.local/state/snapcraft/log/)
  • GitHub Actions warning if free space < 2GB (first checkpoint only)

Implementation details:

  • Uses if: always() to ensure diagnostics run even on job failure
  • Uses continue-on-error: true to prevent diagnostic failures from failing the job
  • Single upload step at the end captures all diagnostic files
  • Artifact name: runner-disk-diagnostics-${{ matrix.build-type }}
  • Artifacts uploaded via actions/upload-artifact@v4 with wildcard pattern
  • Removed the -x flag (set -euxo pipefail is now set -euo pipefail) to reduce noise
  • Removed empty echo lines between sections
  • Changed ls -lh to ls -l and sort -k5 -h to sort -k5 -n so file sizes sort numerically as byte counts
  • Removed snapcraft log capture from first checkpoint (logs don't exist yet)
  • Removed disk space warning from second checkpoint (not useful after build)
  • Changed snapcraft log from tail to full cat in second checkpoint
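
Taken together, the changes listed above could plausibly produce a script along the following lines. This is a reconstruction from the bullet points only, not the contents of commit 1fc922a. The function name dump_disk_diagnostics and its two parameters (the directory to scan and the output file) are assumptions, introduced so that the sketch can be tried on a small directory rather than on /.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of a post-review diagnostics dump.
# dump_disk_diagnostics and its arguments are illustrative names, not from the PR.
set -euo pipefail   # no -x: tracing was dropped to reduce log noise

dump_disk_diagnostics() {
  local root="$1" out="$2"
  {
    echo "==== df -h ===="
    df -h || true
    echo "==== df -i ===="
    df -i || true
    echo "==== top 50 largest entries under ${root} (single filesystem) ===="
    du -ahx "$root" 2>/dev/null | sort -rh | head -n 50 || true
    echo "==== files >50M, sorted by size in bytes ===="
    # ls -l prints raw byte counts in field 5, so a plain numeric sort orders them.
    find "$root" -xdev -type f -size +50M -exec ls -l {} + 2>/dev/null \
      | sort -k5 -n | tail -n 50 || true
  } > "$out"
}
```

Under these assumptions, calling dump_disk_diagnostics / runner-disk-diagnostics-before-build.txt would correspond to the first checkpoint. Each command is followed by || true so that a permission error or a closed pipe does not abort the step under set -e and pipefail.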

Security

  • CodeQL analysis completed: No security issues found
  • All shell commands use proper error handling with || true to prevent failures
  • File operations safely handle missing files/directories

Original prompt

Problem:
The failing CI run (job 55935120817) may be caused by the runner running out of disk space while building snapcraft/Flutter artifacts. GitHub-hosted runners do not publish ephemeral-disk metrics per-run, so we need to record disk usage at runtime to confirm or rule out space exhaustion.

Change requested:
Add diagnostic steps to .github/workflows/dynamic-ci.yml that capture disk usage and large files right before the heavy build steps (snapcraft / flutter build) and upload them as an artifact so they can be inspected after the run.

Target file:
.github/workflows/dynamic-ci.yml (use ref 9c630ed to reference current workflow)
Link: https://github.com/canonical/multipass/blob/9c630ed129a024d6d97ebf1f50d9162c9053e8a5/.github/workflows/dynamic-ci.yml

What to add:
Insert the following two steps immediately before the steps that run snapcraft / the heavy build (or at minimum, before the failing build step). Use if: always() so the diagnostics are recorded on both success and failure; use continue-on-error inside the step to avoid failing the job because of diagnostics.

YAML snippet to add:

  - name: Dump runner disk diagnostics
    if: always()
    run: |
      set -euxo pipefail
      OUT=runner-disk-diagnostics.txt
      echo "==== df -h ====" > "$OUT"
      df -h >> "$OUT" || true
      echo "" >> "$OUT"
      echo "==== df -i ====" >> "$OUT"
      df -i >> "$OUT" || true
      echo "" >> "$OUT"
      echo "==== du top dirs ====" >> "$OUT"
      du -sh /root /home /tmp /var /usr "${RUNNER_TEMP:-/tmp}" "${RUNNER_WORKSPACE:-/github/workspace}" 2>/dev/null >> "$OUT" || true
      echo "" >> "$OUT"
      echo "==== top 50 largest entries on / (no other FS) ====" >> "$OUT"
      du -ahx / 2>/dev/null | sort -rh | head -n 50 >> "$OUT" || true
      echo "" >> "$OUT"
      echo "==== find files >50M ====" >> "$OUT"
      find / -xdev -type f -size +50M -exec ls -lh {} \; 2>/dev/null | sort -k5 -h | tail -n 50 >> "$OUT" || true
      echo "" >> "$OUT"
      # warn if available space < 2GB
      FREE_KB=$(df --output=avail -k / | tail -n1 | tr -d ' ')
      if [ -n "$FREE_KB" ] && [ "$FREE_KB" -lt $((2*1024*1024)) ]; then
        echo "##[warning] Less than 2GB available on / ($(($FREE_KB/1024)) MB)" >> "$OUT"
      fi
      # capture snapcraft log if present (log path seen in failing job)
      if ls /home/runner/.local/state/snapcraft/log/snapcraft-*.log 1> /dev/null 2>&1; then
        echo "" >> "$OUT"
        echo "==== snapcraft log tail ====" >> "$OUT"
        tail -n 400 /home/runner/.local/state/snapcraft/log/snapcraft-*.log >> "$OUT" || true
      fi
    continue-on-error: true

  - name: Upload runner disk diagnostics
    if: always()
    uses: actions/upload-artifact@v4
    with:
      name: runner-disk-diagnostics
      path: runner-disk-diagnostics.txt

Notes / rationale:

  • Running these diagnostics will let you confirm whether the runner ran out of disk/inodes before or during the build that failed copying libflutter_linux_gtk.so.
  • Place the steps before the snapcraft build or heavy Flutter build step, and keep if: always() on both so that the upload still runs when an earlier step fails.
  • The script captures df, inode usage, top directories, largest files, and any snapcraft logs referenced in the failing job logs.
  • The artifact will be retained with the run and can be downloaded for inspection.

Deliverable:
Create a branch, add the snippet to .github/workflows/dynamic-ci.yml, and open a pull request titled: "ci: add disk usage diagnostics to dynamic-ci.yml". The PR should include the exact YAML insertion and a short description linking to the failing run: https://github.com/canonical/multipass/actions/runs/19534052585/job/55935120817

If you want I can also:

  • Add an automatic warning/early-fail when free space is below a configurable threshold (example uses 2GB), or
  • Limit ccache size (ccache -M 1G) and report ccache size concurrently, or
  • Place diagnostics both before and after particular steps to see growth during the job.

Please confirm you want me to create the PR with this change in canonical/multipass.

This pull request was created as a result of the prompt above, from Copilot chat.



Copilot AI self-assigned this Nov 20, 2025
Co-authored-by: ricab <6698114+ricab@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add disk usage diagnostics before heavy build steps" to "ci: add disk diagnostics before snapcraft builds" Nov 20, 2025
Copilot AI requested a review from ricab November 20, 2025 16:22
Comment on lines 140 to 142
echo "==== df -i ====" >> "$OUT"
df -i >> "$OUT" || true
echo "" >> "$OUT"

Collaborator

Please remove these lines.

Author

Done in 1fc922a - removed empty echo lines.

du -ahx / 2>/dev/null | sort -rh | head -n 50 >> "$OUT" || true
echo "" >> "$OUT"
echo "==== find files >50M ====" >> "$OUT"
find / -xdev -type f -size +50M -exec ls -lh {} \; 2>/dev/null | sort -k5 -h | tail -n 50 >> "$OUT" || true

Collaborator

Would sort know to sort sizes with M/G suffixes? Probably not, so remove h from ls -lh please.

Author

Fixed in 1fc922a - changed to ls -l and sort -k5 -n for proper numerical sorting.
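
The effect of the ls -l plus sort -k5 -n combination can be illustrated with a few made-up listing lines. The helper name sort_by_size and the sample paths and sizes are hypothetical. For what it is worth, GNU sort's -h option does compare human-readable suffixes such as M and G, so the earlier ls -lh plus sort -k5 -h pairing would probably have ordered correctly on GNU coreutils too; sorting raw byte counts is simply the less surprising choice.

```shell
# Sort `ls -l`-style lines by the size column (field 5), smallest first.
# sort_by_size is an illustrative helper name, not taken from the PR.
sort_by_size() {
  sort -k5 -n
}

# Made-up listing lines; sizes are raw byte counts as printed by ls -l.
sample='-rw-r--r-- 1 runner runner 1073741824 Nov 20 16:00 /tmp/big.img
-rw-r--r-- 1 runner runner 52428801 Nov 20 16:00 /tmp/small.bin
-rw-r--r-- 1 runner runner 209715200 Nov 20 16:00 /tmp/medium.so'

printf '%s\n' "$sample" | sort_by_size | awk '{ print $5, $NF }'
# prints:
# 52428801 /tmp/small.bin
# 209715200 /tmp/medium.so
# 1073741824 /tmp/big.img
```

Piping the result through tail -n 50, as the workflow does, then keeps the 50 largest files with the biggest one last.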

Comment on lines 157 to 162
# capture snapcraft log if present (log path seen in failing job)
if ls /home/runner/.local/state/snapcraft/log/snapcraft-*.log 1> /dev/null 2>&1; then
echo "" >> "$OUT"
echo "==== snapcraft log tail ====" >> "$OUT"
tail -n 400 /home/runner/.local/state/snapcraft/log/snapcraft-*.log >> "$OUT" || true
fi

Collaborator

This wouldn't exist at this point, so remove these lines please.

Author

Removed in 1fc922a - snapcraft log capture removed from the "before build" step.

Comment on lines 165 to 170
- name: Upload runner disk diagnostics (before build)
if: always()
uses: actions/upload-artifact@v4
with:
name: runner-disk-diagnostics-before-build-${{ matrix.build-type }}
path: runner-disk-diagnostics-before-build.txt

Collaborator

Better do this only once at the end (together with the diagnostics from after the build).

Author

Consolidated in 1fc922a - single upload step at the end now captures all diagnostic files using wildcard pattern.

- name: Dump runner disk diagnostics (before build)
if: always()
run: |
set -euxo pipefail

Collaborator

Please remove -x, we shouldn't need all the noise. We can always add it back.

Author

Removed in 1fc922a - changed from set -euxo pipefail to set -euo pipefail.

- name: Dump runner disk diagnostics (before snap build)
if: always()
run: |
set -euxo pipefail

Collaborator

Please remove -x, we shouldn't need all the noise. We can always add it back.

Author

Removed in 1fc922a - changed from set -euxo pipefail to set -euo pipefail.

Comment on lines 334 to 336
echo "==== df -i ====" >> "$OUT"
df -i >> "$OUT" || true
echo "" >> "$OUT"

Collaborator

Remove please

Author

Done in 1fc922a - removed empty echo lines.

du -ahx / 2>/dev/null | sort -rh | head -n 50 >> "$OUT" || true
echo "" >> "$OUT"
echo "==== find files >50M ====" >> "$OUT"
find / -xdev -type f -size +50M -exec ls -lh {} \; 2>/dev/null | sort -k5 -h | tail -n 50 >> "$OUT" || true

Collaborator

Remove -h, same as above.

Author

Fixed in 1fc922a - changed to ls -l and sort -k5 -n for proper numerical sorting.

Comment on lines 347 to 350
FREE_KB=$(df --output=avail -k / | tail -n1 | tr -d ' ')
if [ -n "$FREE_KB" ] && [ "$FREE_KB" -lt $((2*1024*1024)) ]; then
echo "##[warning] Less than 2GB available on / ($(($FREE_KB/1024)) MB)" >> "$OUT"
fi

Collaborator

Please remove these lines. No point after the build.

Author

Removed in 1fc922a - disk space warning removed from the "before snap build" step.
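
For reference, the threshold in the check discussed here works out as follows: df -k reports available space in KiB, so 2 GiB is 2*1024*1024 = 2097152 KiB. The sketch below isolates that comparison in a helper; the name is_below_gib is hypothetical. It also reads the value from POSIX-format df output (df -Pk) instead of the GNU-only --output=avail used in the workflow, and it echoes a ::warning:: workflow command to stdout, on the assumption that a run annotation is wanted. As far as I can tell, the ##[warning] text appended to $OUT ends up only in the uploaded file.

```shell
# is_below_gib is a hypothetical helper name, not taken from the PR.
# Succeeds (exit 0) when free_kb, given in KiB, is below threshold_gib GiB.
is_below_gib() {
  local free_kb="$1" threshold_gib="$2"
  [ "$free_kb" -lt $((threshold_gib * 1024 * 1024)) ]
}

# Available KiB on /, read from POSIX-format df output (line 2, column 4).
free_kb=$(df -Pk / | awk 'NR == 2 { print $4 }')

if [ -n "$free_kb" ] && is_below_gib "$free_kb" 2; then
  # Written to stdout, the ::warning:: workflow command becomes a run annotation.
  echo "::warning::Less than 2GB available on / ($((free_kb / 1024)) MB)"
fi
```

Passing the threshold as an argument would also cover the "configurable threshold" option mentioned in the original prompt.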

Comment on lines 352 to 356
if ls /home/runner/.local/state/snapcraft/log/snapcraft-*.log 1> /dev/null 2>&1; then
echo "" >> "$OUT"
echo "==== snapcraft log tail ====" >> "$OUT"
tail -n 400 /home/runner/.local/state/snapcraft/log/snapcraft-*.log >> "$OUT" || true
fi

Collaborator

Please upload the full snapcraft log file instead, together with the other files, in the next step.

Author

Updated in 1fc922a - changed to capture full snapcraft log with cat instead of tail, and consolidated upload at the end.

Collaborator

ricab commented Nov 20, 2025

@copilot please address the review above.

Co-authored-by: ricab <6698114+ricab@users.noreply.github.com>
Author

Copilot AI commented Nov 20, 2025

@copilot please address the review above.

All review comments addressed in commit 1fc922a.

Copilot AI requested a review from ricab November 20, 2025 17:01
ricab added the no-merge label Nov 20, 2025

codecov bot commented Nov 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.47%. Comparing base (43d1350) to head (1fc922a).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4520   +/-   ##
=======================================
  Coverage   89.47%   89.47%           
=======================================
  Files         243      243           
  Lines       13877    13877           
=======================================
  Hits        12416    12416           
  Misses       1461     1461           


Collaborator

ricab commented Nov 21, 2025

We don't seem to be nearing space limits today.

Keeping this around for later.

ricab changed the title from "ci: add disk diagnostics before snapcraft builds" to "[ci] Upload disk and snapcraft diagnostics" Nov 21, 2025
ricab removed their request for review November 21, 2025 12:20