Crash when medium is running out of space? #78

Open
VARGA-Peter opened this issue May 26, 2018 · 6 comments

Comments

@VARGA-Peter

VARGA-Peter commented May 26, 2018

While running into #77, I think I also hit these issues:

  1. Could there be an uncaught crash when the backup medium runs out of space?
  2. I also noticed that the snapshots remained on the host.
  3. No email was sent.

Maybe you can find time to check it out.

Thank you

@NAUbackup
Owner

A precheck is not a bad idea. When things crash, there are often remnants left behind, and of course no email is sent if the script itself crashes.

@VARGA-Peter
Author

Below is the email I received after I cleaned up the backup medium and restarted the backup.
As you can see, the failure for APA-SBS201WX was reported correctly [2018/05/26 04:46:40], since there was not enough space, but then APA-TS230W became a zombie and the script crashed.

I don't know Python, but in my C# and C++ code I wrap all functions in a try/catch block that sends an email saying something went wrong, because I previously had the same problem of not being notified in such situations.

2018/05/26 04:00:01,vmbackup.py,APA92,begin
2018/05/26 04:00:01,vm-export,APA92,begin,APA-APP231LY
2018/05/26 04:02:49,vm-export,APA92,end,SUCCESS APA-APP231LY,elapse:2 size:6G
2018/05/26 04:02:49,vm-export,APA92,begin,APA-SBS201WX
2018/05/26 04:46:40,vm-export,APA92,end,VM-EXPORT-FAIL APA-SBS201WX
                                         ^ this is OK and correct
2018/05/26 04:46:40,vm-export,APA92,begin,APA-TS230W
        and now we have undefined behaviour
2018/05/26 13:00:01,vmbackup.py,APA92,begin
2018/05/26 13:00:01,vm-export,APA92,begin,APA-APP231LY
2018/05/26 13:02:48,vm-export,APA92,end,SUCCESS APA-APP231LY,elapse:2 size:6G
2018/05/26 13:02:48,vm-export,APA92,begin,APA-SBS201WX
2018/05/26 13:56:45,vm-export,APA92,end,SUCCESS APA-SBS201WX,elapse:53 size:183G
2018/05/26 13:56:45,vm-export,APA92,begin,APA-TS230W
2018/05/26 14:26:15,vm-export,APA92,end,SUCCESS APA-TS230W,elapse:29 size:92G
2018/05/26 14:26:15,vm-export,APA92,begin,APA-BKP102WX
2018/05/26 14:33:38,vm-export,APA92,end,SUCCESS APA-BKP102WX,elapse:7 size:19G
2018/05/26 14:33:38,vmbackup.py,APA92,end,SUCCESS,S:4 W:0 E:0

@NAUbackup
Owner

Yes, Python has a try/except construct that could be used to that end.
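For illustration, the try/except idea could look roughly like the sketch below. It is not taken from vmbackup.py: `run_with_crash_report`, `job` and `notify` are hypothetical names, and `notify` stands in for whatever mail routine the script already uses.

```python
import traceback


def run_with_crash_report(job, notify):
    """Run job(); if it raises, pass the traceback to notify() and return 1.

    Both arguments are hypothetical: job is the backup entry point,
    notify(subject, body) is whatever sends the status email.
    """
    try:
        job()
    except Exception:
        # Capture the full traceback so the report shows where the run died.
        report = traceback.format_exc()
        try:
            notify("vmbackup crashed", report)
        except Exception:
            # The notification itself failed (e.g. mail server unreachable).
            pass
        return 1
    return 0
```

A wrapper like this only helps when the failure surfaces as a Python exception. A run that hangs or is killed from outside would still need the cleanup check at the next launch.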

Note that cleanup does take place if an old attempted backup is found using process_backup_dir(tmp_vm_backup_dir) and the embedded function get_last_backup_dir_that_failed, so this does get cleaned up, but not until the next run. The issue is currently that if the script fails or is killed off by someone, any pre-cleanup may not be caught in time to act upon, hence the checking for failed backups when the script is launched afresh. In part, it's a matter of philosophy of approach. This check is done both for VDI and full VM backups.

@NAUbackup
Owner

As to a precheck, what might be useful: a warning if the backup storage is over 90% full? If the backup fails, the nature of the failure would have to be understood (such as the lack of disk space). I don't think the script would be clever enough to trap the condition that caused the failure, and if something happens with one VM, it is usually desirable to go ahead with the others, assuming the error only affected that one VM.
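A minimal sketch of such a fullness warning, assuming Python 3.3+ for `shutil.disk_usage` (on Python 2, `os.statvfs` would be the closest equivalent). The function names and the message format are made up for illustration and are not part of the script.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Return the used share of a filesystem as a percentage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def space_warning(path, threshold_percent=90.0):
    """Return a warning string if the filesystem holding path is fuller
    than threshold_percent, otherwise None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    pct = percent_used(usage.total, usage.used)
    if pct > threshold_percent:
        return "WARNING: %s is %.1f%% full (threshold %.1f%%)" % (
            path, pct, threshold_percent)
    return None
```

Returning a message instead of aborting keeps the current behaviour of carrying on with the remaining VMs.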

@VARGA-Peter
Author

I think 90% is just a number, and in my situation it wouldn't help, as you can see from the email above. SBS needs 60% of the total space of all VMs. As I mentioned before, adding the option to remove the oldest version FIRST would solve this problem.

Yes, it was my own planning mistake not to consider this when I created the iSCSI storage for the backup. The point is that the NAS is now almost full, and reorganizing it is not trivial because I would then have to change all storage assignments.
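The "remove the oldest version first" idea could be sketched as follows. It assumes each VM has its own directory containing one subdirectory per backup, named so that plain sorting gives chronological order. `prune_oldest_first` and its parameters are hypothetical, not existing script functions.

```python
import os
import shutil


def prune_oldest_first(vm_backup_dir, max_backups):
    """Delete the oldest backups BEFORE a new one is taken, so that at most
    max_backups exist once the new backup has been written.

    Returns the names of the removed backup directories.
    """
    if max_backups < 1:
        raise ValueError("max_backups must be at least 1")
    existing = sorted(
        name for name in os.listdir(vm_backup_dir)
        if os.path.isdir(os.path.join(vm_backup_dir, name))
    )
    keep = max_backups - 1  # leave one slot free for the upcoming backup
    to_remove = existing[:max(0, len(existing) - keep)]
    for name in to_remove:
        shutil.rmtree(os.path.join(vm_backup_dir, name))
    return to_remove
```

The trade-off is that if the new backup then fails, the VM is left with one restore point fewer than before, which is presumably why pruning after a successful backup is the more conservative default.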

@NAUbackup
Owner

NAUbackup commented May 27, 2018

Could maybe do this: Look for the largest existing backup for a VM and see if there is at least that much space still available (+10% or so) before starting a backup series on a particular VM? Would have to handle both NFS and CIFS storage.
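A rough sketch of that precheck, under the same assumed layout of one subdirectory per backup inside a per-VM directory. All function names are hypothetical. The free-space figure is passed in by the caller so that the NFS and CIFS cases can supply it however works best for them, for example from `os.statvfs` on the mount point.

```python
import os


def dir_size(path):
    """Total size in bytes of all regular files below path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_backup_size(vm_backup_dir):
    """Size of the largest existing backup for one VM, or 0 if none exist."""
    sizes = [
        dir_size(os.path.join(vm_backup_dir, name))
        for name in os.listdir(vm_backup_dir)
        if os.path.isdir(os.path.join(vm_backup_dir, name))
    ]
    return max(sizes) if sizes else 0


def has_room_for_backup(vm_backup_dir, free_bytes, headroom=0.10):
    """True if free_bytes covers the largest previous backup plus headroom."""
    needed = largest_backup_size(vm_backup_dir) * (1.0 + headroom)
    return free_bytes >= needed
```

A VM with no earlier backup always passes this check, so a first run would still need some other estimate, such as the VM's disk size.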
