Retry feature when a scheduled backup fails #566
Hmm, it's an interesting idea. Backrest provides error policies on hooks, which can be used to implement a "preflight" check that the network is available, etc., i.e. that a backup is likely to succeed if attempted.

I'm a bit concerned about blanket retries on backup operations. I'm not sure backrest could reliably differentiate between an error that prevents a snapshot from being created and an error that happened after the snapshot was created, say during restic's cleanup steps. In the latter case it might keep retrying forever while still creating new snapshots each time.

It might be possible for backrest to integrate a preflight check, or a hook type that explicitly verifies the repo is available (e.g. retry later if it can't cat the repo's config, or some such thing). But I'm not sure that would help with flaky reads on the storage provider's end of things. Ideally this is something to work around in rclone's restic serve implementation if at all possible.
I am actually surprised this is not available. You can't really guarantee with hooks that a backup will go through; unexpected conditions can occur during a long backup run.

For example, I have set up daily backups on a lot of my VMs, and sometimes a long backup just fails for whatever reason. In my case I use a remote S3-compatible backend directly, not rclone. So when backing up, say, 180 GB of data and something happens network-wise, the snapshot is never actually taken for that day. I found out yesterday that one specific snapshot for a VM had not been taken for 3 days in a row (I take 3 snapshots per VM, splitting them to cover directories with very rare changes, medium-frequency changes, and very frequent changes) because the data center happened to have upstream issues with BGP flapping at the exact time the job was scheduled to run each day. That means 3 days with no backups.

A basic retry-on-fail feature with exponential backoff and a max-retry setting (0 to disable, and as the default) would solve this problem and would never get into retrying forever. E.g. retry at most 10 times, each time doubling the waiting period, plus a hook flag we could use to send an email when that max-retry count is reached, so we know something is wrong that begs close inspection.
Would be super cool! Had the following error today:

subprocess ssh: Connection closed by XX.XX.XX.XX port 23
Fatal: unable to open repository at sftp:backup:/home/restic: unable to start the sftp session, error: error receiving version packet from server: server unexpectedly closed connection: unexpected EOF

So a retry would have been great.
Hi!
I have used this tool and it is nearly perfect in every way. As a backend I use pCloud through the rclone provider, and it behaves well. Sometimes it still fails for random reasons, and I would like a retry option with exponential backoff that retries X times when something fails, in line with the existing scheduling.