Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

luci-app-statistics: Add backup/restore for RRD statistics #6646

Merged
merged 1 commit into from
Nov 4, 2023

Conversation

jtkohl
Copy link
Contributor

@jtkohl jtkohl commented Oct 25, 2023

Add a backup/restore capability for rrd data storage in luci_statistics. The data storage is typically in /tmp and does not survive reboot or sysupgrade. This adds an option for the administrator to configure the RRD plugin, so that the RRD data are are preserved with a backup copy in the overlay file system.

This works for shutdown/reboot, sysupgrade (backup config files, restore config files, and true sysupgrade).

Also fix a bug where starting luci_statistics for the first time would not get a restart a running collectd

@jtkohl
Copy link
Contributor Author

jtkohl commented Oct 25, 2023

USE CASES:

  • sysupgrade (complete); sysupgrade -c; sysupgrade -o:
    during upgrade, sysupgrade_backup is put into the archive
    after boot, the new sysupgrade backup is unpacked and available

    either immediately (if luci_statistics included in the image) or after
    install of luci_statistics: starting statistics restores the sysupgrade backup

  • sysupgrade -l (list):
    generate list of files but don't actually generate the backup

  • sysupgrade -r (restore backup), then reboot:
    sysupgrade_backup is restored into place
    during shutdown the sysupgrade_backup is left alone
    after reboot use the sysupgrade_backup

  • sysupgrade -b (make backup):
    create a sysupgrade_backup and include it in the archive

    • continued operation: next shutdown/restart will not use the
      sysupgrade_backup (but does clean it up)

    • another sysupgrade -b: recreate new sysupgrade_backup and behave as if
      it's the first

  • sysupgrade -b, run, then sysupgrade -r (different backup)
    then stop statistics: mismatched trust files, will restore.

  • sysupgrade -f (restore conf file and update at the same time): yes, trusted and restored

  • interactive mode (-i): works as expected

  • test mode: sysupgrade -T -v -v -i -f /tmp/backup /tmp/openwrt*gz : OK; sysupgrade -T -v -v -i /tmp/openwrt*.gz : OK

@jtkohl jtkohl force-pushed the luci-app-statistics-backups branch from 4790845 to 86b39bc Compare October 25, 2023 02:31
USE_PROCD=1

# We only want to restore a sysupgrade backup file if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to add a note about this being better than a cronjob, as it doesn't constantly write to flash on devices with non-durable storage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added something, let me know if you like it.

Copy link
Contributor

@efahl efahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@jtkohl jtkohl force-pushed the luci-app-statistics-backups branch from 86b39bc to 6c654ce Compare October 26, 2023 23:14
@jtkohl
Copy link
Contributor Author

jtkohl commented Oct 26, 2023

I also added a README.backups explaining the use of /etc/luci_statistics directory

@jtkohl
Copy link
Contributor Author

jtkohl commented Oct 26, 2023

@hnyman what do you think?

@jtkohl jtkohl force-pushed the luci-app-statistics-backups branch from 6c654ce to b8c232a Compare October 26, 2023 23:16
@hnyman
Copy link
Contributor

hnyman commented Oct 27, 2023

I didn't test it, yet, but one comment regarding verbosity:

The init script and any file included in the .ipk package should be kept short and terse, and verbose explanations into the commit message, please.

Right now the init script looks like a novel ;-)
Please shorten the explanations included in the package.

@efahl
Copy link
Contributor

efahl commented Oct 27, 2023

Please shorten the explanations included in the package.

I wondered about that. I assume since the scripts eat flash space, so minimizing them is best practice for OpenWrt/LEDE, embedded distros in general.

Would it be appropriate for @jtkohl to just move all that doc to the readme? Actually, on second thought, I'd guess the best thing would be to put it in the wiki: https://openwrt.org/docs/guide-user/luci/luci_app_statistics

@jtkohl
Copy link
Contributor Author

jtkohl commented Oct 27, 2023

I didn't test it, yet, but one comment regarding verbosity:

The init script and any file included in the .ipk package should be kept short and terse, and verbose explanations into the commit message, please.

Right now the init script looks like a novel ;-) Please shorten the explanations included in the package.

Sure thing... I'm still learning my way around the OpenWrt development norms.

How about putting the design notes into the Makefile for the package, or as a README in the sources? And also I'm happy to put notes in the wiki if desired.

EDIT: I'm happy to update the wiki section on preserving statistics, if/when this is merged into the project's repository.

@jtkohl jtkohl force-pushed the luci-app-statistics-backups branch from b8c232a to c0422e9 Compare October 28, 2023 00:51
Copy link
Contributor

@hnyman hnyman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some comments, generic in nature, aiming mainly for simplification.

I have not yet tested your code, but the idea looks ok in principle.

I wonder about the backup file list generation. Is it not enough to just provide a directory to backup? Does it need real evaluation? (What actually gets added to the "filelist" after the sed code?)
E.g. in my use case, the wlan interface names might change depending on 22.03, 23.05 and master due to changes in the default naming logic, so the exact files to be backuped do vary a bit depending on the specific build on the same router as there are rrd records unused by a specific Openwrt version. (Well, I have named my wlan interfaces to avoid that, but in principle...)

Similarly, I am thinking about the logic to remove backup files.
Is that crash-proof? If the router crashes and reboots, what happens? Is something restored?

Some context: I flash a different main/master test build roughly every 3 days to my routers, and they do crash every now and then. My current manual (cronjob) "store a backup a day as .tar.gz" approach ensures me that I have the last night's copy. What happens with your new scheme if the router crashes and reboots and then continues operation ? The backup made at the last sysupgrade survives? Or is lost? (Would daily backups be possible?)

(I am currently using daily timestamp-named files: if the router crashes and reboots (and there is no automatic restore), I do not want the next night's empty short-term backup to overwrite the single backup, but I want to also preserve the earlier backups for manually selecting a suitable restore from a before-crash backup. So the standard backup name symlinks to the most recent daily backup, but the daily backups are individual files.) Sure, I am minority, but the long-term monthly/yearly trend is the most useful tool to see gradual changes. And sure, I do need to manually remove the old backups.)


### restart collectd if it was running before us
/etc/init.d/collectd status >/dev/null 2>&1 && /etc/init.d/collectd stop >/dev/null 2>&1
/etc/init.d/collectd start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just having a "collectd restart" here?

currently this is actually not about "restart collectd if it was running before us", but it is "stop collectd if running + always start collectd".

If you want "restart collectd if it was running before us", it might be just "/etc/init.d/collectd status >/dev/null 2>&1 && /etc/init.d/collectd restart"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm....
reading further, there actually was a "collectd" restart that you removed and replaced with a more polished logic. git diff hide that well at the first glance.
Not sure if that is worthwhile. (I like KISS principle.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm.... reading further, there actually was a "collectd" restart that you removed and replaced with a more polished logic. git diff hide that well at the first glance. Not sure if that is worthwhile. (I like KISS principle.)

This is what got lost when moving the docs to README and out of the source code comments.

The problem I found (really, this is unrelated to the backups/restores) is that if you install luci_statistics with opkg install luci_statistics, it will load the dependencies first. collectd gets installed and started with its default configuration. Then luci_statistics gets installed and started, whereupon it creates a new config file. But in this case, it never restarts collectd so the new config file is not yet used.

This problem does not occur if the system is built with both collectd and luci_statistics in the base image. In that case, the first time it boots, luci_statistics goes first and sets up the config (with collectd not yet running), due to the /etc/rc.d/SXX ordering.

Copy link
Contributor

@hnyman hnyman Oct 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in this case, it never restarts collectd so the new config file is not yet used.

Well, there was a "collectd restart" (that you removed) so it should restart collectd once luci_statistics starts. Interesting if it does not happen.

This problem does not occur if the system is built with both collectd and luci_statistics in the base image

Yeah, I have it always that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, there was a "collectd restart" (that you removed) so it should restart collectd once luci_statistics starts. Interesting if it does not happen.

That previous restart of collectd is only inside the code to restart luci_statistics, not in the code to start luci_statistics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently this is actually not about "restart collectd if it was running before us", but it is "stop collectd if running + always start collectd".

If you want "restart collectd if it was running before us", it might be just "/etc/init.d/collectd status >/dev/null 2>&1 && /etc/init.d/collectd restart"

I think we want "stop if running and then start". Otherwise, if collectd is not actually running, then starting just luci_statistics won't really start gathering statistics. In the current code (before this PR), the sysadmin needs to know that if both collectd & luci_statistcs are stopped, then they have to start luci_statistics and then collectd in order to get statistics gathering started.

So I can either update the comment to explain that we want "stop collectd if necessary and always start it", or leave the comment and change the code to "restart collectd if it was running". I of course prefer the former.

@jtkohl
Copy link
Contributor Author

jtkohl commented Oct 28, 2023

I wonder about the backup file list generation. Is it not enough to just provide a directory to backup? Does it need real evaluation? (What actually gets added to the "filelist" after the sed code?)

We want to create a new backup at the time of sysupgrade, so for non-CONF_BACKUP_LIST cases we do that.
For the restore logic to work correctly, we need to ensure that only one of the twin files will survive a sysupgrade. An administrator can unwittingly break things by putting /etc/luci_statistics into the backup list. But even worse, using the sysupgrade -o or -c options (preserve all of /etc, or preserve entire overlay) will break the restore logic.
The sed command removes lines matching /etc/luci_statistics and the rest of the function puts back what we need to preserve, just the two file names echoed at the end.

You can test this out with sysupgrade -l (and add in -c or -o if you like) to see the net result in the backed-up file list.

@jtkohl
Copy link
Contributor Author

jtkohl commented Oct 28, 2023

Similarly, I am thinking about the logic to remove backup files.
Is that crash-proof? If the router crashes and reboots, what happens? Is something restored?

I'm adding the following to the README.md.

During disorderly reboot

In a system crash or other disorderly reboot, the shutdown scripts do
not run. What remains on the system is the previous contents of
/etc/luci_statistics.

  • If the system never started luci_statistics, or it was cleanly shut
    down before the crash, then there is no difference in behavior from
    normal startup: we restore either the sysupgrade backup (if
    luci_statistics had never run) or the regular backup (if
    luci_statistics was cleanly stopped)

  • If luci_statistics and collectd were running at the time of the crash,
    there could be a regular backup and a sysupgrade backup present, plus
    volatile data in /tmp (which are lost in the crash). The regular
    backup would be from the most recent time the system cleanly stopped
    luci_statistics. During the subsequent reboot/service start up:

    • If there is a sysupgrade backup on disk from having run sysupgrade -b, with both twin files matching (meaning the administrator had
      taken a backup sometime during the life of the system, before the
      crash), they are ignored and a regular backup (if any) is restored.

    • If the sysupgrade backup has mismatched twin files or only one twin,
      then it is used to restore state. This would be the case if a
      sysupgrade restored configuration (sysupgrade -r), whether or not
      it did an orderly shutdown/reboot, or if the file system were
      damaged in a crash and only one of the twin files survived.

@jtkohl jtkohl force-pushed the luci-app-statistics-backups branch from c0422e9 to 9782dea Compare October 28, 2023 11:56
Add a backup/restore capability for rrd data storage in
luci_statistics.  The data storage is typically in /tmp and does not
survive reboot or sysupgrade.  This adds an option for the
administrator to configure the RRD plugin, so that the RRD data are
are preserved with a backup copy in the overlay file system.

This works for shutdown/reboot, sysupgrade (backup config files,
restore config files, and true sysupgrade).

Also fix a bug where starting luci_statistics for the first time would
not get a restart a running collectd: during install of the package
when it is not included in the base flashed image, collectd might be
started when it got installed/configured before this package gets
installed/configured.  So we need to check if it's running, and
restart it to use the luci_statistics configuration.

Signed-off-by: John Kohl <[email protected]>
@jtkohl jtkohl force-pushed the luci-app-statistics-backups branch from 9782dea to ad98af3 Compare October 30, 2023 00:34
@jtkohl
Copy link
Contributor Author

jtkohl commented Oct 30, 2023

had to update /etc/init.d/luci_statistics, we must not run uci during every parse of the script, since that then gets run during image assembly. So now the uci calls are inside a function that isn't called during image assembly (enable/disable of packages).

https://openwrt.org/docs/guide-developer/procd-init-scripts#init_scripts_during_compilation

@jtkohl
Copy link
Contributor Author

jtkohl commented Nov 1, 2023

As it happens, I had a brief power failure today and got to test out the result. I last ran a backup (restarting statistics) last night, and so that backup is what got restored after the power failure, then more data was gathered between power restored and now. Here's a sample graph:

image

@hnyman
Copy link
Contributor

hnyman commented Nov 4, 2023

Thanks.
I first applied this manually to my RT3200 and then also built an image with this included.

  • Sysupgrade worked ok,
  • a managed reboot worked ok, and
  • the last backup was nicely restored after a manual power-off.

Looks ok to me.

I will merge this as it is, but a few suggestions for further development:

  • the backup option is well hidden in the RRDtool options.

    • There might be a secondary (or the only) instance of the option on the main Setup page.
    • Or there might at least be a dummy value with text about backup the option status.
  • the backup option explanation text might include an URL link to the README with explanations. Once I merge this, there is permanent file in GitHub.
    (Should the README be renamed to be more backup-specific?)

  • backup cron jobs:
    As I test the newest stuff a lot, I am prepared for crashes. So, I currently run a cron job at 02:00 to store stats as a tar.gz so that I do not lose more than one day of stats. I might change that to use the "sysupgrade -b" now.

    In general, it might be useful to offer an option regarding daily backup to permanent flash, e.g. a cronjob of stats backup running e.g. at 03:00.

    Not sure if that is easily possible: Adding a cronjob could be straightforward, but detecting/deleting/disabling one might be more cumbersome, and it would require write rights to global root crontab. But the readme might contain a line offering advice abouta suitable line to be added manually.

@hnyman hnyman merged commit 574caf9 into openwrt:master Nov 4, 2023
@hnyman
Copy link
Contributor

hnyman commented Nov 4, 2023

there might at least be a dummy value with text about backup the option status.

Crude example about that:

image

--- /rom/www/luci-static/resources/view/statistics/collectd.js
+++ /www/luci-static/resources/view/statistics/collectd.js
@@ -40,6 +40,8 @@
        render: function(plugins) {
                var m, s, o, enabled;

+               var rrd_backup = uci.get("luci_statistics", "collectd_rrdtool", "backup") == "1" ? _("Backup enabled") : _("Backup disabled. See RRD plugin config");
+
                for (var i = 0; i < plugins.length; i++)
                        plugins[plugins[i].name] = plugins[i];

@@ -82,6 +84,8 @@
                o.optional = true;
                o.depends('Hostname', '');

+               o = s.option(form.DummyValue, 'backup_status', _('RRD database backup'), rrd_backup);
+
                var groupNames = [
                        'general', _('General plugins'),
                        'network', _('Network plugins'),

@jtkohl
Copy link
Contributor Author

jtkohl commented Nov 4, 2023

Thank you! I will take a look at your suggestions next week when I have some more time available.

Any possibility of back porting to 23.05 branch?

@hnyman
Copy link
Contributor

hnyman commented Nov 4, 2023

Backporting to 23.05 might be possible after a few weeks, when there is more evidence that there are no major negative surprises.

@jtkohl
Copy link
Contributor Author

jtkohl commented Nov 6, 2023

  • As I test the newest stuff a lot, I am prepared for crashes. So, I currently run a cron job at 02:00 to store stats as a tar.gz so that I do not lose more than one day of stats. I might change that to use the "sysupgrade -b" now.

You could do /etc/init.d/sysupgrade backup if you only care about backing up statistics and not the entire config. That will create the "normal" backup file that survives crashes and orderly reboots. The sysupgrade backup is only needed if you want the backup packaged in a way to survive a sysupgrade.

@hnyman
Copy link
Contributor

hnyman commented Nov 6, 2023

You could do /etc/init.d/sysupgrade backup if you only care about backing up statistics

I guess that you meant /etc/init.d/luci_statistics backup

@jtkohl
Copy link
Contributor Author

jtkohl commented Nov 6, 2023

You could do /etc/init.d/sysupgrade backup if you only care about backing up statistics

I guess that you meant /etc/init.d/luci_statistics backup

yup. oops! sorry, was traveling and not thinking 100% straight, clearly!

@jtkohl
Copy link
Contributor Author

jtkohl commented Nov 11, 2023

  • the backup option explanation text might include an URL link to the README with explanations. Once I merge this, there is permanent file in GitHub.

ah, I'm somewhat new to GitHub (been using/developing other SCMs for 30+ years) so that would be like https://github.com/openwrt/luci/blob/master/applications/luci-app-statistics/README.md ?

  (Should the README be renamed to be more backup-specific?)

I thought about a different name for README, and then decided to leave it as README and use a heading for Backups...figuring if anybody wanted to document anything else about the package, they could add other sections to the README. But if you prefer to rename it, I can do that...let me know?

@jtkohl
Copy link
Contributor Author

jtkohl commented Nov 11, 2023

the backup option is well hidden in the RRDtool options.

* There might be a secondary (or the only) instance of the option on the main Setup page.

* Or there might at least be a dummy value with text about backup the option status.

I get the point. I struggle with the best place for this in the LuCi UI. The backups are really all about the rrdtool database, so putting it on the rrd plugin config seems right.
But the admin in many cases never needs to visit the rrd plugin config page: the defaults are generally workable. So putting it on the main page seems good from a "hey you want to think about this", but seems wrong from a "this only applies when rrdtool is configured" perspective.

Is there a way for the collectd.js page logic to see the current values of the settings from the rrdplugin.js form? With your initial prototype, it only tracks the value as seen by UCI, not the temporary values in the LuCI GUI. I'm new to LuCi programming so any hints here would be helpful.

@hnyman
Copy link
Contributor

hnyman commented Nov 11, 2023

Seemed to work ok in master, that I backported it to 23.05

@hnyman
Copy link
Contributor

hnyman commented Nov 11, 2023

While looking at the help output, I tested the two extra commands, ane one of tem sems to fail?

root@router1:/# /etc/init.d/luci_statistics backup

root@router1:/# /etc/init.d/luci_statistics sysupgrade_backup
sed: -i requires an argument
/etc/rc.common: line 167: can't create : nonexistent directory
/etc/rc.common: line 168: can't create : nonexistent directory

@jtkohl
Copy link
Contributor Author

jtkohl commented Nov 11, 2023

While looking at the help output, I tested the two extra commands, ane one of tem sems to fail?

root@router1:/# /etc/init.d/luci_statistics backup

root@router1:/# /etc/init.d/luci_statistics sysupgrade_backup
sed: -i requires an argument
/etc/rc.common: line 167: can't create : nonexistent directory
/etc/rc.common: line 168: can't create : nonexistent directory

Ah, it's a documentation/usage problem, and insufficient protection. sysupgrade_backup is used by /lib/upgrade/luci_statistics-add-conffiles.sh which passes it a filename.
I'll tweak the script to fail if missing a file name and tweak the help to explain that

@hnyman
Copy link
Contributor

hnyman commented Nov 11, 2023

sysupgrade_backup is used by /lib/upgrade/luci_statistics-add-conffiles.sh which passes it a filename.
I'll tweak the script to fail if missing a file name and tweak the help to explain that

Is it meant to be used from CLI?
If not, then there is no need to mention it in the EXTRA_HELP item.
If yes, the instructions should be clarified

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants