Retry feature when a scheduled backup fails #566
Hmm, it's an interesting idea. Backrest provides error policies on hooks, which can be used to implement a "preflight" check that the network is available, etc., i.e. that a backup is likely to succeed if attempted.

I'm a bit concerned about blanket retries on backup operations. I'm not sure backrest could reliably differentiate between an error that prevents a snapshot from being created and an error that happened after the snapshot was created, say during restic's cleanup steps. In the latter case it might keep retrying forever while still creating new snapshots each time.

It might be possible for backrest to integrate a preflight check, or a hook type that explicitly verifies the repo is available (e.g. retry later if it can't cat the repo's config, or some such thing). But I'm not sure that would help with flaky reads on the storage provider's end of things. Ideally this is something to work around in rclone's restic serve implementation if at all possible.
I am actually surprised this is not available. You can't really guarantee with hooks that a backup will go through; unexpected conditions can occur during a long backup run.

For example, I have set up daily backups on a lot of my VMs, and sometimes a long backup just fails for whatever reason. In my case I use a remote S3-compatible backend directly, not rclone. So when backing up, say, 180 GB of data and something happens network-wise, the snapshot is never actually taken for that day. I found out yesterday that one specific snapshot for a VM had not been taken for 3 days in a row (I take 3 snapshots per VM, splitting them to cover directories with very rare changes, medium-frequency changes, and very frequent changes) because the data center happened to have upstream issues with BGP flapping at the exact time the job was scheduled to run each day. That means 3 days with no backups.

A basic retry-on-fail feature with exponential backoff and a max-retry setting (0 to disable, and as the default) would solve this problem and would never get into retrying forever. E.g. retry at most 10 times, each time doubling the waiting period, plus a hook flag we could use to send an email when that max-retry count is reached, so we know something is wrong that begs close inspection.
Would be super cool! Had the following error today:

subprocess ssh: Connection closed by XX.XX.XX.XX port 23
Fatal: unable to open repository at sftp:backup:/home/restic: unable to start the sftp session, error: error receiving version packet from server: server unexpectedly closed connection: unexpected EOF

So a retry would have been great.
Hi!
I have used this tool and it is nearly perfect in every way. As a backend I use pCloud through the rclone provider, and it behaves well. Sometimes it still fails for random reasons, and I would like a retry option with exponential backoff that retries X times when something fails, in line with the existing scheduling.