feat: Earthly cache watcher utility (#269)
* feat: initial commit

* feat: initial file watcher

* refactor: minor

* feat: config

* feat: interval

* feat: handle size exceeding

* feat: trigger events

* refactor: check sizes

* refactor: print

* feat: journal

* feat: conf

* fix: logging

* feat: pyproject

* docs: add proper readme

* fix: growth indexes

* fix: logging

* feat: default options

* chore: format

* chore: lintfix

* ci: fix earthfile

* fix: logging

* fix: app.service

* feat: config argument

* refactor: service file

* fix: service script

* fix: params

* fix: to simple service instead of forking

* fix: watch dir location

* feat: layer watching

* chore: type annotation

* chore: lint

* chore: log layer name

* fix: type

* fix: first checks

* chore: lintfix

* fix: is file check

* feat: init log

* feat: log init

* fix: log large layer

* fix: layer name

* fix: parameters and notes

* refactor: minor code

* feat: print number formatter

* refactor: minor number formatter

* feat: safe delete

* feat: handle file accessing

* feat: overall compacting

* fix: default config path in systemd service

* fix: default.conf

* docs: systemd installation

* feat: handle move

* fix: growth index iteration

* feat: trigger once

* chore: warning to error

* fix: empty set

* fix: has triggered layer

* fix: layer discard

* feat: loguru

* fix: markdownlint

* chore: sort import

* chore: lintfix

* chore: rufflint fix

* fix: log level info

---------

Co-authored-by: Oleksandr Prokhorenko <[email protected]>
apskhem and minikin authored Jul 24, 2024
1 parent d7ddf44 commit a57863a
Showing 8 changed files with 694 additions and 0 deletions.
10 changes: 10 additions & 0 deletions utilities/earthly-cache-watcher/Earthfile
@@ -0,0 +1,10 @@
VERSION 0.8

IMPORT github.com/input-output-hk/catalyst-ci/earthly/python:v3.1.7 AS python-ci

check:
    FROM python-ci+python-base

    COPY . .

    DO python-ci+CHECK
110 changes: 110 additions & 0 deletions utilities/earthly-cache-watcher/README.md
@@ -0,0 +1,110 @@
<!-- cspell: words loguru inotify journalctl -->

# Earthly Cache Watcher

Logs an error when cache layers reach their maximum size limit.

## Functionality

* Watches file changes in a specified directory.
* Triggers events when either an individual file or
  a watched directory grows beyond certain criteria.
* Main triggering criteria: a single file exceeds its size limit,
  the watched directory exceeds its size limit,
  or the watched directory grows by more than a set amount within a time interval.
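
The three criteria above can be sketched as a single check. This is a minimal illustration, not the code in `main.py`: the `Limits` class and `check_triggers` function are hypothetical names, the threshold names and default values are borrowed from the configuration parameters of this utility, and "exceeds" is assumed to mean strictly greater than.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    # Hypothetical container; defaults mirror the documented default values.
    large_layer_size: int = 1073741824  # 1GB
    max_cache_size: int = 536870912000  # 500GB
    max_time_window_growth_size: int = 53687091200  # 50GB


def check_triggers(
    file_size: int, dir_size: int, window_growth: int, limits: Limits
) -> list[str]:
    """Return the names of the thresholds that the given sizes exceed."""
    triggered = []
    # Criterion 1: a single file is larger than the large-layer threshold.
    if file_size > limits.large_layer_size:
        triggered.append("large_layer_size")
    # Criterion 2: the watched directory as a whole is too large.
    if dir_size > limits.max_cache_size:
        triggered.append("max_cache_size")
    # Criterion 3: the directory grew too much within one time window.
    if window_growth > limits.max_time_window_growth_size:
        triggered.append("max_time_window_growth_size")
    return triggered
```

All sizes are in bytes. A caller would pass the size of the file that changed, the current total size of the watched directory, and the growth accumulated in the current interval.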

## Configuration Parameters

The following parameters are configurable:

* `watch_dir` - The directory to watch recursively. (default: `.`)
* `large_layer_size` - The size above which an individual file
  is considered a large file. (default: `1073741824` bytes)
* `max_cache_size` - The maximum total size
  of `watch_dir`. (default: `536870912000` bytes)
* `time_window` - The length of the interval over which growth
  in the size of `watch_dir` is measured. (default: `10` secs)
* `max_time_window_growth_size` - The maximum growth in the size of `watch_dir`
  allowed within one `time_window`. (default: `53687091200` bytes)
* `log_file_accessing_err` - Whether to log errors that occur during file access. (default: `True`)

Typically, these settings are read from the specified configuration file.
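
As a rough sketch of how such a file could be read, the snippet below parses `key = value # comment` lines and falls back to the documented defaults. It assumes the plain-text layout used by `default.conf`; `DEFAULTS` and `parse_config` are hypothetical names, and the actual loader in `main.py` may work differently.

```python
DEFAULTS = {
    "watch_dir": ".",
    "large_layer_size": 1073741824,
    "max_cache_size": 536870912000,
    "time_window": 10,
    "max_time_window_growth_size": 53687091200,
    "log_file_accessing_err": True,
}


def parse_config(text: str) -> dict:
    """Parse `key = value # comment` lines, starting from the defaults."""
    config = dict(DEFAULTS)
    for raw_line in text.splitlines():
        # Drop trailing comments; a value containing '#' would be cut short.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key not in DEFAULTS:
            continue  # ignore unknown keys in this sketch
        default = DEFAULTS[key]
        # bool must be tested before int because bool is a subclass of int.
        if isinstance(default, bool):
            config[key] = value.lower() == "true"
        elif isinstance(default, int):
            config[key] = int(value)
        else:
            config[key] = value
    return config
```

Passing the contents of a configuration file to `parse_config` yields a dictionary with every parameter set, so a partial file only needs to list the values that differ from the defaults.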

## System Setup

If the system has many files to watch, consider raising the maximum number
of inotify watches with `sysctl`:

```bash
sudo sysctl fs.inotify.max_user_watches=25000000
echo 'fs.inotify.max_user_watches=25000000' | sudo tee -a /etc/sysctl.conf
```

Adjust the value to fit your requirements.

## Systemd Unit Installation

Run the following commands to install the program as a systemd service unit:

```bash
systemctl daemon-reload
systemctl enable /path/to/your/watchdog.service
systemctl start watchdog
```

To view the status and logs, use these two commands:

```bash
systemctl status watchdog
```

Or

```bash
journalctl -xeu watchdog.service
```

## Logging Example

Logging example using `loguru`:

```json
{
  "text": "read config from '/root/catalyst-ci/utilities/earthly-cache-watcher/default.conf'\n",
  "record": {
    "elapsed": {
      "repr": "0:00:00.007240",
      "seconds": 0.00724
    },
    "exception": null,
    "extra": {},
    "file": {
      "name": "main.py",
      "path": "/root/catalyst-ci/utilities/earthly-cache-watcher/main.py"
    },
    "function": "main",
    "level": {
      "icon": "ℹ️",
      "name": "INFO",
      "no": 20
    },
    "line": 298,
    "message": "read config from '/root/catalyst-ci/utilities/earthly-cache-watcher/default.conf'",
    "module": "main",
    "name": "__main__",
    "process": {
      "id": 59917,
      "name": "MainProcess"
    },
    "thread": {
      "id": 8615431168,
      "name": "MainThread"
    },
    "time": {
      "repr": "2024-07-04 19:22:31.458044+07:00",
      "timestamp": 1720095751.458044
    }
  }
}
```

Note: The log output above is prettified; the actual output is a single-line message.
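
Because each log entry is one line of JSON, it can be inspected with the standard `json` module. The sketch below pulls out the level name and message using the field layout shown in the example; `summarize_log_line` is a hypothetical helper, not part of this utility.

```python
import json


def summarize_log_line(line: str) -> str:
    """Return 'LEVEL message' for one serialized log line."""
    # Each line is a JSON object with a nested "record" object.
    record = json.loads(line)["record"]
    return f'{record["level"]["name"]} {record["message"]}'
```

This could be applied to each line of captured log output to get a compact view of the events.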
8 changes: 8 additions & 0 deletions utilities/earthly-cache-watcher/default.conf
@@ -0,0 +1,8 @@
# cspell: words runc overlayfs

watch_dir = /var/lib/docker/volumes/earthly-satellite_earthly-tmp/_data/buildkit/runc-overlayfs/snapshots/snapshots
large_layer_size = 1073741824 # 1GB
max_cache_size = 536870912000 # 500GB
time_window = 10 # 10 secs
max_time_window_growth_size = 53687091200 # 50GB
log_file_accessing_err = True
45 changes: 45 additions & 0 deletions utilities/earthly-cache-watcher/helper.py
@@ -0,0 +1,45 @@
import os


def get_subdirectory_name(working_dir_path: str, path: str):
    """
    Extracts the direct subdirectory name from the given path within
    the specified working directory.
    Parameters:
        working_dir_path (str): The absolute path of the current working directory.
        path (str): The absolute path from which to extract the direct subdir name.
    Returns:
        str | None: The name of the direct subdirectory if the given path is within
        the working directory; otherwise, None.
    Example:
        >>> working_dir = "/home/user/projects"
        >>> given_path = "/home/user/projects/subdir1/file.txt"
        >>> get_subdirectory_name(working_dir, given_path)
        'subdir1'
        >>> given_path_invalid = "/home/user/projects1/subdir1/file.txt"
        >>> get_subdirectory_name(working_dir, given_path_invalid)
        None
    """
    working_dir_path = os.path.abspath(working_dir_path)
    path = os.path.abspath(path)

    if (
        os.path.commonpath([working_dir_path])
        != os.path.commonpath([working_dir_path, path])
    ):
        return None

    relative_path = os.path.relpath(path, working_dir_path)
    parts = relative_path.split(os.sep)

    if parts:
        return parts[0]
    return None


def add_or_init(obj: dict[str, int], key: str, value: int):
    obj.setdefault(key, 0)
    obj[key] += value