Use s-nail host mail server (FNAL/CERN) #1566
I have the following workaround for the exim4 mail server failing inside docker. In general, the problem is that some users are created at build time (exim4 creates a couple of users), and we lose them when we bind mount the users from the host. We need the host users in order to run as e.g. cmst1, but since the host at CERN, for example, runs s-nail rather than exim4, its local users do not include anything related to exim4. We also had issues with cron due to this bind-mounting approach, which we are already working around. In my opinion, this change should work around the exim4 problem, but in the future a general solution would be to stop bind mounting /etc/group and /etc/passwd into the docker container and somehow create them at build time, so that services like cron or exim4 do not fail.
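To make the lost-users problem concrete, here is a minimal sketch that lists accounts present in a passwd file saved from the image but absent from the host passwd file that gets bind mounted over it. The function name `missing_users` and both file paths are assumptions for illustration; nothing like this is part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this PR.
# $1 = copy of /etc/passwd saved from the image at build time (assumed to exist)
# $2 = the host /etc/passwd that is bind mounted into the container
missing_users() {
    local image_passwd="$1" host_passwd="$2"
    # While reading the host file, remember every user name.
    # While reading the image file, print the names the host does not have.
    awk -F: '
        FILENAME == ARGV[1] { seen[$1] = 1; next }
        !($1 in seen)       { print $1 }
    ' "$host_passwd" "$image_passwd"
}
```

Any name it prints is a build-time account (for example one created by a package install) that would have to be re-added after the bind mount.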
I completely agree with you, Kenyi.
looking good
@khurtado can you please also test this at FNAL? Once we confirm this is working at both CERN and FNAL (and in the virtual env, but Todor already confirmed it works for him), we can move forward and also work on a new hot-patch for the agents.
@amaltaro It seems I will need to work a bit more on this. The test at FNAL failed. The reason is that both s-nail on the host and exim4 in docker listen on port 25, and we use the host network. I will need to work around that as well (in the config).
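A quick way to confirm this kind of clash is to check whether something on the host already listens on port 25 before starting a mail server in a container that shares the host network. The sketch below parses `ss -ltn` style output from stdin; the function name `port_is_listening` is an assumption for illustration and is not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this PR.
# Reads `ss -ltn` style output on stdin and succeeds if any LISTEN socket
# has a local address ending in ":<port>".
port_is_listening() {
    local port="$1"
    awk -v suffix=":${port}" '
        # Column 4 of `ss -ltn` is "Local Address:Port".
        $1 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Example usage on the host:
#   ss -ltn | port_is_listening 25 && echo "port 25 is already taken"
```

With host networking the container shares the host's port space, so a second mail server cannot bind the same port.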
1a02c01 to 83875c5
@khurtado I am not sure whether I have already mentioned this, but if needed, we should also consider reaching out to the CERN/FNAL teams and requesting new packages/services/etc. to be installed, in case that would help to resolve this issue.
@amaltaro @todor-ivanov
So, I changed the solution to use the host-level mail agent instead (re-adding system users/groups was still needed for this to work well). The current solution works at both CERN and FNAL. There is one thing I would likely change though: it installs the s-nail package at runtime (this package is about 500KB in size and installs no additional dependencies; when I checked with "docker ps -s", the additional size of the container was about 5MB in total). The installation could instead go into the base image: https://github.com/dmwm/CMSKubernetes/blob/master/docker/pypi/wmagent-base/Dockerfile But of course, for that I would need to
If you guys approve the rest of the code here and prefer the s-nail installation in the base image, let me know and I can create the PR to proceed with the steps above.
Given all the problems we have had with this cronjob and email setup since the migration to docker, I am starting to think that we should make a little extra effort and integrate all this logic into AgentStatusWatcher. The open question is: how do we notify our team when a component is restarted automatically? I think we would have the same problems, as the Python libraries are probably using the system mail service. Any thoughts?
Alan, please keep in mind the following:
Here is my suggestion:
In Go, it would be a one-page server and a static executable (i.e. no run-time dependencies). It would be accessible in CMSWEB, reachable from anywhere, usable from any language via the HTTP protocol, and it would support customizable auth/authz like any other service we run, etc. In my view, it is the easiest, most portable and most scalable solution. Here is an example of a working one-page alerts web server and here is how it can be used from any programming language (here curl shows its functionality):
@amaltaro I think we should first put some effort into fixing the root problem that this issue and the cron issue have in common: getting rid of the host-level passwd/group binding for the production user/group IDs inside docker. I say this because we may encounter issues with other services in the future because of this binding trick. For the mail issue, considering that this solution is working and uses the host-level mail agent, I would vote for using it now and perhaps opening another issue with the tag "new feature", containing a feature migration proposal, and assigning a priority accordingly. In my opinion, while reimplementing this logic would be nice, if the goal is to make maintenance easier, then getting rid of the binding trick would already remove a lot of the hacks we are doing at runtime and make maintenance much easier.
hi @khurtado
I couldn't agree more!
Hi @khurtado, I had only two inline comments concerning the readability of the code, neither of them game-changing. In case we decide to continue in that direction, consider this my approval of this PR.
@@ -148,3 +172,13 @@ userStatus="$(docker exec -u root -it wmagent sh -c "passwd -S $wmaUser" | awk '
if [ "${userStatus:0:1}" == "P" ]; then
    docker exec -u root -it wmagent sh -c "echo $wmaUser:$wmaUser | chpasswd"
fi

# Install s-nail
This would indeed be better placed in the Dockerfile, either wmagent or wmbase, which might also alleviate/remove the need for the package configuration here.
@todor-ivanov Thank you! I implemented your change proposals for the nested conditionals.
thanks @khurtado
@khurtado can you please update this PR (Dockerfile and scripts) with the new image tag available in the CERN Registry? Please see: #1573 (comment)
@amaltaro I just made the changes and tested with 2.3.8rc10 (which has the latest htcondor python bindings, as 10.9.0 is not on pypi anymore and the image could not be built with it).
@khurtado thank you for making the relevant changes and testing them out.
I left one comment in the code, which I think would further improve the way we are dealing with this mail service setup. In case we ever need to review this, we might consider it.
@@ -148,3 +169,10 @@ userStatus="$(docker exec -u root -it wmagent sh -c "passwd -S $wmaUser" | awk '
if [ "${userStatus:0:1}" == "P" ]; then
    docker exec -u root -it wmagent sh -c "echo $wmaUser:$wmaUser | chpasswd"
fi

# Configure s-nail to use the host s-nail mail server
docker exec -u root -it wmagent cp /etc/s-nail_host.rc /etc/s-nail.rc
I was thinking that we could keep the modified s-nail.rc under $HOST_MOUNT_DIR/admin/etc/ and simply mount it from there to /etc/s-nail.rc. This way we could do some of this tweaking only once in the lifetime of an agent.
The update-alternatives step still seems to be required though, but perhaps it could go into the wmagent Dockerfile?
Regarding update-alternatives, I agree this could be a further improvement.
Regarding keeping the modified s-nail.rc in $HOST_MOUNT_DIR/admin/etc/: what I am afraid of in this case is that an s-nail update on the host could modify the host config, so that our copy would no longer be in sync (with a potential failure). We would need to add a verification step to make sure the config we have in $HOST_MOUNT_DIR/admin/etc/ without the tweaks matches the host config, or otherwise update it and then tweak it.
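A verification step like the one described above could be sketched as follows. This is only an illustration: the function name `host_rc_unchanged` and the file name `s-nail.rc.orig` for the untweaked copy are assumptions, not something that exists in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical check, not part of this PR.
# $1 = current host config
# $2 = untweaked copy of the host config that we saved earlier
# Succeeds only if both files are readable and byte-for-byte identical.
host_rc_unchanged() {
    local host_rc="$1" saved_rc="$2"
    [ -r "$host_rc" ] && [ -r "$saved_rc" ] && cmp -s "$host_rc" "$saved_rc"
}

# Example usage (paths are assumptions):
#   if ! host_rc_unchanged /etc/s-nail.rc "$HOST_MOUNT_DIR/admin/etc/s-nail.rc.orig"; then
#       echo "host s-nail config changed: refresh the saved copy and re-apply the tweaks"
#   fi
```

Comparing against an untweaked copy keeps the check simple, at the cost of storing two files (pristine and tweaked) per agent.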
That's a good point! Thank you for this clarification, Kenyi.
@amaltaro Thank you!
Let me merge this before we forget. This is great timing as well, as we are about to publish a final stable release of WMAgent.
Fixes dmwm/WMCore#12159
Depends on #1573
Issue: Both CERN and FNAL already run mail agents at the host level. Deploying a mail server inside docker (like exim4) conflicts with the host server, since both would operate on port 25. Running an independent mail agent on a different port can lead to firewall issues as well.
Solution: Both CERN and FNAL use the same mail agent (s-nail). Therefore, configure the containers to use the host mail agent instead.