-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add docker healthcheck to all containers #415
Comments
Agree on both the nice-to-have and that it is simple enough to implement. I just did some experiments with the core MING components. Node-RED already has a health check but the others don't. This is what I came up with:
The result on my test Pi (lots of experimental containers running):
In the case of Mosquitto, we're building that from a Dockerfile so I'd lean towards adding it there. Ditto for MariaDB (which would also take care of nextcloud's database). The others (the ones we're not building from Dockerfiles) would need augmented service definitions. But, taking a step back, I'm not sure what would happen if an upstream image started to supply its own health check. I assume (without checking) that the last one in the chain (ie IOTstack's) would prevail. What I'm more concerned about is if an upstream container started providing a better test. For example, Mosquitto issue 10 was opened in 2016 with very little movement but a recent post proposes running
Aside from one or two cosmetic issues (eg not every IOTstack user will have gone to the trouble of setting up credentials so those variables would need to be quoted, so they'd turn into null strings) that's quite promising. I was tempted to use it but a bit of testing on my own system revealed a few wrinkles. The success of the Also, at least on my system, the
If I set up a retained message:
and repeat the tests, both return "0" immediately. Anyway, outside the container always gets what I think of as the correct answer while inside the container the mileage varies depending on the situation and is, accordingly, prone to returning false "unhealthy" messages. An improved scheme might start with a retained However, I think you can probably see my point. Assuming all these issues could be addressed, this would actually be a better health check and I wouldn't want to risk getting in its way if it was adopted by the Eclipse people. Thoughts? |
How about this for Mosquitto? Dockerfile:
Healthcheck script:
Basic operationCredentialsShould work out-of-the-box on systems that do not have password schemes. For those that do, the following will need to be added to the service definition in environment:
- HEALTHCHECK_USER=someusername
- HEALTHCHECK_PASSWORD=somepassword In the original version of this, I wrote:
I have since checked what happens and, indeed, all the quote marks do get passed through verbatim so I have updated the script to handle that problem for the username, password and topic variables. I also explicitly checked the return code from the subscribe. Listener portIn the reasonably unlikely event someone is using something other than internal port 1883, there's: environment:
- HEALTHCHECK_PORT=12345 Test topicThe test topic defaults to environment:
- HEALTHCHECK_TOPIC=some/other/topic Basic testThe Dockerfile runs the test every 30 seconds so an external subscriber should be able to see the timestamps appearing with that frequency:
What do you think? |
Oh, if I deliberately force a bad port by adding:
the result is:
|
I had some pull requests for Mosquitto open already. I did a lot more testing and I'm pretty happy with it so I've pushed the changes into the existing PRs: I added a chunk of words about the topic to the IOTstack Mosquitto documentation on the master branch PR. The easiest way to see it in advance of the PR being accepted/rejected is via the PR branch at: |
Nice, I was going to look into this a little, but it looks like you jumped right on it. This also shows up nicely in portainer and lets you pull it in reports like cadvisor and nodeexporter. Not sure about the precedent and override of built in checks, have you tested it with the nodered one? Either way its good, and someone can expand on it later if they want to add anything advanced like influxdb real consistency checks, or actual file system stuff |
Follows on from suggestion in [Issue 415](SensorsIot#415) to add health-check to more containers. See also [PR 406](SensorsIot@dbb6217). Changes: * Adds `iotstack_healthcheck.sh` script to template. * Adds commands to Dockerfile to copy that script into the local image and activate health-checking on launch. * Describes health-check functionality in the MariaDB documentation. * References MariaDB health-check documentation in NextCloud documentation.
Follows on from suggestion in [Issue 415](SensorsIot#415) to add health-check to more containers. See also [PR 406](SensorsIot@dbb6217). Changes: * Adds `iotstack_healthcheck.sh` script to template. * Adds commands to Dockerfile to copy that script into the local image and activate health-checking on launch. * Reduces old-menu MariaDB documentation to a stub pointing to new-menu documentation (this is already the situation for old-menu NextCloud documentation).
Follows on from suggestion in [Issue 415](SensorsIot#415) to add health-check to more containers. See also [PR 406](SensorsIot@dbb6217). Changes: * Adds `iotstack_healthcheck.sh` script to template. * Moves Dockerfile into `buildFiles` directory, and adds commands to copy the health-check script into the local image and activate health-checking on launch. Does not change any documentation on experimental branch.
@tablatronix I know that I can disable the built-in check that comes with the base Node-RED image, and that I can do it in either I think the Mosquitto script will turn out to be pretty robust and, if anyone does go to the trouble of proposing a similar PR for Eclipse-Mosquitto, having "ours" pre-empt "theirs" probably won't amount to a hill of beans.
I've just submitted PRs for adding a similar script to MariaDB (which will be inherited by Nextcloud_DB): In this case, the script runs If I knew a bit more about MySQL/MariaDB I'd probably try to fashion something that went further. It's this scenario that worries me more because there's much greater potential for a true MySQL guru to come up with a really good health-check script, and I wouldn't want my "I suppose it's a bit better than having no health-checking at all" solution to block something better coming to us from upstream. |
Sorry, I have no experience at all with Git or Docker but I thought I'd try and help get a health check merged in to Mosquitto. Please be gentle with your criticism ;-) |
just noticed the topic may be incorrect in healthcheck.sh. I'm sure they'll change that |
Adds health-check functionality to Grafana and InfluxDB 1.8, as discussed in SensorsIot#415. Health-check functionality already added to Mosquitto via SensorsIot#406. Closes SensorsIot#415 Signed-off-by: Phill Kelley <[email protected]>
Adds health-check functionality to Grafana and InfluxDB 1.8, as discussed in SensorsIot#415. Health-check functionality already added to Mosquitto via SensorsIot#409. Closes SensorsIot#415 Signed-off-by: Phill Kelley <[email protected]>
Adds health-check functionality to Grafana and InfluxDB 1.8, as discussed in SensorsIot#415. Health-check functionality already added to Mosquitto via SensorsIot#410. Closes SensorsIot#415 Signed-off-by: Phill Kelley <[email protected]>
It would be a nice have to have healthchecks for all containers.
Most can be pretty trivial, some examples might exists already for most services
https://docs.docker.com/engine/reference/builder/#healthcheck
The text was updated successfully, but these errors were encountered: