borg2: borg {compact,create,..} fails with "Failed to release the lock" #8390

mirko opened this issue Sep 18, 2024 · 17 comments · Fixed by #8394


mirko commented Sep 18, 2024

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

BUG / QUESTION

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg2 2.0.0b11.dev3+gc278a8e8.d20240915 with latest master/HEAD of borgstore

Full borg command line that led to the problem (leave away excludes and passwords)

  --- EXEC: borg -r sftp://HOST//storage/backup_mirko/borg2 create [..]
[..]
d rootfs
d .
terminating with success status, rc 0
  --- EXEC: borg -r sftp://HOST//storage/backup_mirko/borg2 prune -v --list --keep-daily=30 --keep-weekly= --keep-monthly=1 --keep-yearly=1
usage: borg prune [-h] [--critical] [--error] [--warning] [--info] [--debug] [--debug-topic TOPIC] [-p] [--iec] [--log-json]
                  [--lock-wait SECONDS] [--show-version] [--show-rc] [--umask M] [--remote-path PATH] [--upload-ratelimit RATE]
                  [--upload-buffer UPLOAD_BUFFER] [--debug-profile FILE] [--rsh RSH] [--socket [PATH]] [-r REPO] [-n] [--list]
                  [--short] [--list-pruned] [--list-kept] [--format FORMAT] [--keep-within INTERVAL] [--keep-last SECONDLY]
                  [--keep-minutely MINUTELY] [-H HOURLY] [-d DAILY] [-w WEEKLY] [-m MONTHLY] [-y YEARLY] [-a PATTERN]
                  [--oldest TIMESPAN | --newest TIMESPAN] [--older TIMESPAN | --newer TIMESPAN]
borg prune: error: argument -w/--keep-weekly: invalid int value: ''
  --- EXEC: borg -r sftp://HOST//storage/backup_mirko/borg2 compact
Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Error:

NotLocked: Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Describe the problem you're observing.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

I scripted around borg (without checking exit codes) and had a typo in the borg prune line, causing it to fail due to the invalid argument.

The next command to be executed was borg compact, which then surprisingly failed with this error:

NotLocked: Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Not sure what's going on here - borg compact failing to release a lock feels rather strange.

Include any warning/errors/backtraces from the system logs

  --- EXEC: borg -r sftp://HOST//storage/backup_mirko/borg2 prune -v --list --keep-daily=30 --keep-weekly= --keep-monthly=1 --keep-yearly=1
usage: borg prune [-h] [--critical] [--error] [--warning] [--info] [--debug] [--debug-topic TOPIC] [-p] [--iec] [--log-json]
                  [--lock-wait SECONDS] [--show-version] [--show-rc] [--umask M] [--remote-path PATH] [--upload-ratelimit RATE]
                  [--upload-buffer UPLOAD_BUFFER] [--debug-profile FILE] [--rsh RSH] [--socket [PATH]] [-r REPO] [-n] [--list]
                  [--short] [--list-pruned] [--list-kept] [--format FORMAT] [--keep-within INTERVAL] [--keep-last SECONDLY]
                  [--keep-minutely MINUTELY] [-H HOURLY] [-d DAILY] [-w WEEKLY] [-m MONTHLY] [-y YEARLY] [-a PATTERN]
                  [--oldest TIMESPAN | --newest TIMESPAN] [--older TIMESPAN | --newer TIMESPAN]
borg prune: error: argument -w/--keep-weekly: invalid int value: ''
  --- EXEC: borg -r sftp://HOST//storage/backup_mirko/borg2 compact
Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Error:

NotLocked: Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

If reporting bugs, please include the following:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 176, in wrapper
    return method(self, args, repository=repository, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/compact_cmd.py", line 156, in do_compact
    ArchiveGarbageCollector(repository, manifest).garbage_collect()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/compact_cmd.py", line 40, in garbage_collect
    self.repository_chunks = self.get_repository_chunks()
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/compact_cmd.py", line 53, in get_repository_chunks
    result = self.repository.list(limit=LIST_SCAN_LIMIT, marker=marker)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 339, in list
    self._lock_refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 163, in _lock_refresh
    self.lock.refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 237, in refresh
    raise LockTimeout(str(self.store))  # our lock was killed, there is no safe way to continue.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
borg.storelocking.LockTimeout: Failed to create/acquire the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (timeout).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 622, in main
    exit_code = archiver.run(args)
                ^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 516, in run
    rc = func(args)
         ^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 151, in wrapper
    with repository:
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 138, in __exit__
    self.close()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 204, in close
    self.lock.release()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 194, in release
    raise NotLocked(str(self.store))
borg.storelocking.NotLocked: Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Platform: Linux thar 6.10.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.9-1 (2024-09-08) x86_64
Linux: Unknown Linux
Borg: 2.0.0b11.dev3+gc278a8e8.d20240915  Python: CPython 3.11.9 msgpack: 1.1.0 fuse: None [pyfuse3,llfuse]
PID: 356139  CWD: /tmp/.lxc-backups/gsmk
sys.argv: ['/usr/local/bin/borg', '-r', 'sftp://HOST//storage/backup_mirko/borg2', 'compact']
SSH_ORIGINAL_COMMAND: None

ThomasWaldmann commented Sep 18, 2024

The main issue here seems to be this happening while it was doing repository.list():

    raise LockTimeout(str(self.store))  # our lock was killed, there is no safe way to continue.

That other locking error is just something triggered by that, when it __exit__s the context manager.

So, it HAD the lock (and regularly refreshed it while doing repository.list()), but noticed that the lock vanished.

So, did you use break-lock in parallel?

Did your machine go to sleep while doing repository.list?
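
For context, a minimal sketch of how such TTL-style store locking behaves (toy names, not the actual borg.storelocking API, assuming a lock entry with a time-to-live): the holder refreshes its lock regularly, and if the lock entry vanishes in between, it is not safe to continue.

import time

class StaleLockError(Exception):
    """Raised when our own lock entry has disappeared from the store."""

class TTLLock:
    """Toy TTL lock: the holder must refresh it regularly, otherwise other
    clients may consider it stale and remove ("kill") it."""

    def __init__(self, store, ttl=60):
        self.store = store  # dict-like shared store (toy stand-in for borgstore)
        self.ttl = ttl
        self.key = "locks/me"

    def acquire(self):
        self.store[self.key] = time.monotonic() + self.ttl

    def refresh(self):
        # If another client already removed our (seemingly stale) lock entry,
        # there is no safe way to continue -- analogous to the LockTimeout above.
        if self.key not in self.store:
            raise StaleLockError("our lock vanished while we still held it")
        self.store[self.key] = time.monotonic() + self.ttl

    def release(self):
        if self.key not in self.store:
            raise RuntimeError("was not locked")  # analogous to NotLocked above
        del self.store[self.key]

# Long-running operations (like repository.list()) must call refresh() often
# enough; if they don't, the entry expires and a concurrent borg may remove it.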


mirko commented Sep 18, 2024

So, did you use break-lock in parallel?

Did your machine go to sleep while doing repository.list?

Nope and nope.

Thinking about it, though, another possibility is that at that moment I had 2 instances of borg running. Though definitely targeting different repos (one SFTP, the other local). Is that something I shouldn't do? From the error, the lock appears to be repo-specific and not global.

ThomasWaldmann commented:

Borg processes dealing with different repos don't influence each other (except for shared resources: CPU, RAM, I/O). With current borg2, a lot even works in parallel on the same repo.

I'll have a look at the locking code later; the storelocking is still rather fresh code, maybe there is a bug.

ThomasWaldmann added this to the 2.0.0b11 milestone Sep 18, 2024

mirko commented Sep 18, 2024

OK, it's at least not caused by the failing borg prune line, nor is it specific to borg compact:

BACKUP CONTAINER: android
>>> CONTAINER: (android) - METHOD: borg2
  --- SRC_DIR:  /tmp/.lxc-backups/android
  --- DST_URL:  sftp://HOST//storage/backup_mirko/borg2
  --- ARCHIVE:  vm:android_2024-09-18T15:07:34+02:00
  --- EXEC: borg -r sftp://HOST//storage/backup_mirko/borg2 prune -v --list --keep-last=14 --keep-within=2w --keep-daily=30 --keep-weekly=8 --keep-monthly=1 --keep-yearly=1 -a sh:vm:android_* --dry-run
Connected (version 2.0, client dropbear)
Authentication (publickey) successful!
[chan 0] Opened sftp connection (server version 3)
[chan 0] sftp session closed.
  --- EXEC: borg -r sftp://HOST//storage/backup_mirko/borg2 create --show-rc vm:android_2024-09-18T15:07:34+02:00 . 
Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Error:

NotLocked: Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

If reporting bugs, please include the following:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 176, in wrapper
    return method(self, args, repository=repository, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 268, in do_create
    create_inner(archive, cache, fso)
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 169, in create_inner
    self._rec_walk(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 527, in _rec_walk
    self._rec_walk(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 527, in _rec_walk
    self._rec_walk(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 467, in _rec_walk
    status = self._process_any(
             ^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 284, in _process_any
    return fso.process_file(
           ^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archive.py", line 1359, in process_file
    if not cache.seen_chunk(chunk.id):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/cache.py", line 614, in seen_chunk
    entry = self.chunks.get(id, ChunkIndexEntry(0, None))
            ^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/cache.py", line 610, in chunks
    self._chunks = build_chunkindex_from_repo(self.repository)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/cache.py", line 576, in build_chunkindex_from_repo
    result = repository.list(limit=LIST_SCAN_LIMIT, marker=marker)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 339, in list
    self._lock_refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 163, in _lock_refresh
    self.lock.refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 237, in refresh
    raise LockTimeout(str(self.store))  # our lock was killed, there is no safe way to continue.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
borg.storelocking.LockTimeout: Failed to create/acquire the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (timeout).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 622, in main
    exit_code = archiver.run(args)
                ^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 516, in run
    rc = func(args)
         ^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 151, in wrapper
    with repository:
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 138, in __exit__
    self.close()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 204, in close
    self.lock.release()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 194, in release
    raise NotLocked(str(self.store))
borg.storelocking.NotLocked: Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Platform: Linux thar 6.10.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.9-1 (2024-09-08) x86_64
Linux: Unknown Linux
Borg: 2.0.0b11.dev3+gc278a8e8.d20240915  Python: CPython 3.11.9 msgpack: 1.1.0 fuse: None [pyfuse3,llfuse]
PID: 466264  CWD: /tmp/.lxc-backups/android
sys.argv: ['/usr/local/bin/borg', '-r', 'sftp://HOST//storage/backup_mirko/borg2', 'create', '--show-rc', 'vm:android_2024-09-18T15:07:34+02:00', '.']
SSH_ORIGINAL_COMMAND: None


terminating with error status, rc 74


ThomasWaldmann commented Sep 18, 2024

Can you check whether this fixes it?

diff --git a/src/borg/repository.py b/src/borg/repository.py
index 23c3b150..66db295c 100644
--- a/src/borg/repository.py
+++ b/src/borg/repository.py
@@ -336,11 +336,11 @@ def list(self, limit=None, marker=None):
         list <limit> infos starting from after id <marker>.
         each info is a tuple (id, storage_size).
         """
-        self._lock_refresh()
         collect = True if marker is None else False
         result = []
         infos = self.store.list("data")  # generator yielding ItemInfos
         while True:
+            self._lock_refresh()
             try:
                 info = next(infos)
             except StoreObjectNotFound:

I suspect that if the backend is rather slow, the lock might time out while fetching LIST_SCAN_LIMIT (100000) items from store.list(). That must never happen, because in that case any other borg might kill that lock.

ThomasWaldmann changed the title from 'After failing "borg prune", "borg compact" also fails with "Failed to release the lock" with SFTP backend' to 'borg2 sftp: "borg compact" fails with "Failed to release the lock"' Sep 18, 2024

mirko commented Sep 19, 2024

Backup completed overnight without any error, with the above proposed change applied. However, performance unfortunately is devastating.
Backing up 64 GB of data took almost 8 h. No compression options were passed, so defaults. This might very well be paramiko, but I also noticed borg exhausting my CPU when I checked occasionally.
Transferring data via e.g. dd if=/dev/zero bs=64M count=1024 | ssh HOST "( dd of=/storage/backup_mirko/borg2/NULL.img bs=64M oflag=direct )" goes through at ~100 Mbps.

ThomasWaldmann commented:

Oh, that's slow.

Maybe add the sftp related insights there: borgbackup/borgstore#44

Also add the connection speed and the round-trip time (ping) of that connection there.

Default compression is lz4.

Exhausting CPU sounds interesting; can you determine what it is busy with?
I'd rather have expected it to idle most of the time if the store is so slow.

ThomasWaldmann commented:

You could also try a 64 MiB sftp upload on that connection for comparison (not an ssh pipe, that works differently).


mirko commented Sep 19, 2024

First, back to the original issue: I hit it again, this time with the local file backend:

>>> ROOT - METHOD: borg2
  --- SRC_DIR:  /
  --- DST_URL:  /mnt/borg2/
  --- ARCHIVE:  root_2024-09-19T09:01:22+02:00
  --- EXEC: borg -r /mnt/borg2/ prune -v --list --keep-last=14 --keep-within=2w --keep-daily=30 --keep-weekly=8 --keep-monthly=1 --keep-yearly=1 -a 'sh:root_*' --dry-run
  --- EXEC: borg -r /mnt/borg2/ create --show-rc 'sys:root_2024-09-19T09:01:22+02:00' /
/home/mirko/.cache/mozilla/firefox/oxpi9a9p.XXXX/cache2/entries/4826AC1B36DDDE16CF4CEABA5279381E0DE30B1A: stat: [Errno 2] No such file or directory: '4826AC1B36DDDE16CF4CEABA5279381E0DE30B1A'
Failed to release the lock <Store(url='file:///mnt/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Error:

NotLocked: Failed to release the lock <Store(url='file:///mnt/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

If reporting bugs, please include the following:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 176, in wrapper
    return method(self, args, repository=repository, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 268, in do_create
    create_inner(archive, cache, fso)
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 169, in create_inner
    self._rec_walk(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 527, in _rec_walk
    self._rec_walk(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 527, in _rec_walk
    self._rec_walk(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 527, in _rec_walk
    self._rec_walk(
  [Previous line repeated 6 more times]
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 467, in _rec_walk
    status = self._process_any(
             ^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 284, in _process_any
    return fso.process_file(
           ^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archive.py", line 1377, in process_file
    self.process_file_chunks(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archive.py", line 1139, in process_file_chunks
    chunk_entry = chunk_processor(chunk)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archive.py", line 1129, in chunk_processor
    chunk_entry = cache.add_chunk(chunk_id, {}, data, stats=stats, wait=False, ro_type=ROBJ_FILE_STREAM)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/cache.py", line 654, in add_chunk
    self.repository.put(id, cdata, wait=wait)
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 400, in put
    self._lock_refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 163, in _lock_refresh
    self.lock.refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 237, in refresh
    raise LockTimeout(str(self.store))  # our lock was killed, there is no safe way to continue.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
borg.storelocking.LockTimeout: Failed to create/acquire the lock <Store(url='file:///mnt/borg2', levels=[('config/', [0]), ('data/', [2])])> (timeout).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 622, in main
    exit_code = archiver.run(args)
                ^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 516, in run
    rc = func(args)
         ^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 151, in wrapper
    with repository:
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 138, in __exit__
    self.close()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 204, in close
    self.lock.release()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 194, in release
    raise NotLocked(str(self.store))
borg.storelocking.NotLocked: Failed to release the lock <Store(url='file:///mnt/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Platform: Linux thar 6.10.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.9-1 (2024-09-08) x86_64
Linux: Unknown Linux
Borg: 2.0.0b11.dev3+gc278a8e8.d20240915  Python: CPython 3.11.9 msgpack: 1.1.0 fuse: None [pyfuse3,llfuse]
PID: 896117  CWD: /home/mirko
sys.argv: ['/usr/local/bin/borg', '-r', '/mnt/borg2/', 'create', '--show-rc', 'sys:root_2024-09-19T09:01:22+02:00', '/']
SSH_ORIGINAL_COMMAND: None


terminating with error status, rc 74

mirko changed the title from 'borg2 sftp: "borg compact" fails with "Failed to release the lock"' to 'borg2: borg {compact,create,..] fails with "Failed to release the lock"' Sep 19, 2024
mirko changed the title from 'borg2: borg {compact,create,..] fails with "Failed to release the lock"' to 'borg2: borg {compact,create,..} fails with "Failed to release the lock"' Sep 19, 2024

ThomasWaldmann commented Sep 19, 2024

That's unexpected. But maybe update your local master branch, so that my recent repository.list fix gets in there.

Can you say how long it ran before crashing within that borg invocation?


mirko commented Sep 19, 2024

That's unexpected. But maybe update your local master branch, so that my recent repository.list fix gets in there.

2a20ebe does not seem to differ from what you proposed earlier in this ticket and what I applied manually (and which is still applied).

Can you say how long it ran before crashing within that borg invocation?

real 67m17.522s
user 42m30.524s
sys 7m44.778s


mirko commented Sep 19, 2024

You could also try to do a 64MiB sftp upload on that connection for comparison (not: ssh pipe, that works differently).

11.9 MB/s via sftp put

ThomasWaldmann commented:

#8396 should help debugging the locking issue, use --debug.


mirko commented Sep 19, 2024

#8396 should help debugging the locking issue, use --debug.

Ongoing. What I also noticed when using the SFTP backend, and am now seeing with --debug enabled, is that literally hours are spent on listdir():

[..]
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/ce')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/cf')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d0')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d1')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d2')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d3')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d4')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d5')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d6')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d7')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d8')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d9')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/da')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/db')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/dc')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/dd')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/de')
[..]

That at least explains why the lock could time out when it is not refreshed within the respective loop.


ThomasWaldmann commented Sep 19, 2024

@mirko that's when it queries all the chunk IDs from the remote; there is no cache for that yet.

It distributes the objects using 2 levels of directories (e.g. 12/34/ contains all chunks that have an ID starting with 1234), so that is up to 65536+256+1 dirs.

It could be configured to use only 1 level of directories (that would be 256+1 dirs), but that would potentially put a lot of objects into a single directory, so I thought 2 levels are more appropriate for that kind of storage.
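
As an illustration of that layout, a minimal hypothetical sketch (nested_path is a made-up helper, not borgstore's actual code) mapping a hex chunk ID to its nested data/ path:

def nested_path(chunk_id_hex: str, levels: int = 2) -> str:
    """Map a hex chunk ID to its nested data/ path, one directory level per
    ID byte, e.g. levels=2 turns "1234abcd..." into "data/12/34/1234abcd...".
    With 2 levels, a fully populated repo has up to 256*256 leaf directories."""
    parts = [chunk_id_hex[2 * i : 2 * i + 2] for i in range(levels)]
    return "/".join(["data", *parts, chunk_id_hex])

# nested_path("1234abcd")            -> "data/12/34/1234abcd"
# nested_path("1234abcd", levels=1)  -> "data/12/1234abcd"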


mirko commented Sep 19, 2024


Fair enough - it's still listing, though - and it had already been going for 2h+ when I wrote the last comment. Mind, we're talking about a repo on an SFTP backend, which for some reason already appears to be really slow (with paramiko and/or borg).

So we're now at 6h+ and the backup hasn't even started :)

EDIT:

[chan 0] listdir(b'/storage/backup_mirko/borg2/data/ff/ff')
queried 1679538 chunk IDs in 25815.41996549 s (18 requests), ~2.21 kB/s

Now borg has been doing something at 100% CPU for a while already, without printing any (debug) statement.

EDIT2: Started backing up the next VM into the same repo. Again, it is going through that listdir() loop. Is this happening for every borg create command? If so, the previous round took 7+ hours at a speed of 2.2 kB/s. Suboptimal, to say the least.
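
(Rough arithmetic for context, assuming roughly one SFTP round trip per listdir() call: 2 directory levels mean 65536+256+1 = 65793 listings, so 25815 s / 65793 ≈ 0.39 s per call - the listing phase is dominated by per-request latency rather than bandwidth.)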

ThomasWaldmann commented:

Yeah, that is unbearable. Yes, it currently always builds the chunks index ad hoc, only in memory (after querying all object IDs from the repo); there is no local cache for that yet.

I have some ideas on how to optimize this; I will do that after finishing the files cache work.
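
For readers following along, a rough conceptual sketch of what that ad-hoc chunk index build amounts to (build_chunkindex_adhoc is a hypothetical name; the real code path is build_chunkindex_from_repo in borg/cache.py, as seen in the tracebacks above):

def build_chunkindex_adhoc(repository, limit=100000):
    """Conceptual sketch: page through all (id, storage_size) pairs of the
    repository and collect them into an in-memory index. This listing phase
    is why a borg create into a "cold" repo spends hours in listdir() first."""
    index = {}
    marker = None
    while True:
        batch = repository.list(limit=limit, marker=marker)  # [(id, size), ...]
        if not batch:
            break
        for chunk_id, size in batch:
            index[chunk_id] = size
        marker = batch[-1][0]  # continue after the last id returned
    return index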
