Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenVPN signals to systemd that it is ready before the VPN connection is up #617

Open
me-and opened this issue Oct 1, 2024 · 6 comments

Comments

@me-and
Copy link

me-and commented Oct 1, 2024

Describe the bug
OpenVPN sends the READY=1 sd_notify message to systemd once it has completed initialisation, before the VPN tunnel is established. This means that systemd units that depend on the VPN connection being up by ordering themselves after the OpenVPN unit will fail.

To Reproduce
Broad outline:

  1. Create a systemd unit for an OpenVPN connection
  2. Create a systemd unit that can only succeed once the OpenVPN connection is up, and order it after the OpenVPN unit with Requires= and After=
  3. Start the second unit. Note that it fails.

My specific scenario has CIFS mount and automount units; the mount will only succeed if the VPN has connected. The trimmed-for-simplicity unit config looks like this:

/etc/systemd/system/openvpn-pdnet.service:

[Unit]
After=network.target
Description=OpenVPN instance ‘pdnet’

[Service]
ExecStart=openvpn --suppress-timestamps --config <file>
Restart=always
Type=notify

/etc/fstab:

//example.org/Directory /mnt cifs rw,credentials=/etc/creds,x-systemd.automount,x-systemd.mount-timeout=60s,nofail,x-systemd.requires=openvpn-pdnet.service,x-systemd.after=openvpn-pdnet.service 0 0

Expected behavior
OpenVPN does not signal to systemd that it is ready until the VPN connection is actually up, so that systemd units configured to only start after the OpenVPN unit can safely rely on the connection being available.

Version information:

  • OS: NixOS 24.05
  • OpenVPN version: 2.6.12

Additional context
The current behaviour is deliberate, and was introduced by e83a868 in 2017; before that commit the behaviour I'd like was in place. Given this is deliberate behaviour, I'm opening this report to discuss whether we want a fix; the fix itself is pretty straightforward assuming there's consensus that this is actually a bug.

There are three justifications given in the commit message for that change:

  • "First, it adds challenges if --chroot is used in the configuration; this is already fixed."

    I don't use --chroot so I'm not quite sure what the challenges are, but if they've already been fixed then they presumably don't need this fix as well?

  • "Secondly, it will cause havoc on static key p2p mode configurations where the log line above ["Initialization Sequence Completed"] will not happen before either sides have completed establishing a connection"

    I also don't use these configurations, and I don't have a sense of what havoc this causes. But it seems correct to me that the OpenVPN client would not report itself as ready to systemd before the connection has been established.

  • "And thirdly, if a client configuration fails to establish a connection within 90 seconds, it will also fail. For the third case this may
    not be a critical issue itself, as the host just needs to get an Internet access established first - which in some scenarios may take much longer than those 90 seconds systemd grants after the OpenVPN client configuration is started."

    I do understand this issue, but I think this is the wrong solution. Or at least it's the wrong solution now; it may well be that systemd didn't have better options when this change was made in 2017. The point of reporting that a unit is ready is that systemd knows the unit can be used and other units that depend on it can run. If it's taking a long time for OpenVPN to fully establish a connection, it's correct that systemd should know and be able to act on that situation.

    If this is causing problems, the correct solutions IMVHO are either (a) setting a longer timeout before systemd declares the unit has failed, using TimeoutStartSec=, or (b) where there is some other blocking requirement like Internet access, to order the OpenVPN unit after that access has been established, e.g. by specifying Wants=network-online.target and After=network-online.target.

As I say, I definitely don't understand the ramifications of at least two of the problems outlined above, and I could well believe I don't fully appreciate the third either. Nonetheless, being able to order one systemd unit after another is exactly what the READY=1 notification is intended for, and this functionality is broken with the current OpenVPN code. At the very least, I'd like the behaviour to be configurable.

I have a patch more-or-less ready to go for what I consider to be the preferable behaviour, and I'm currently running OpenVPN with that patch in place.

@me-and
Copy link
Author

me-and commented Oct 1, 2024

The patch I'm currently using is at https://github.com/me-and/nixcfg/blob/d53a799b29de3a80889db1c13b391fef4d422113/overlays/openvpn/openvpn.diff, although I expect it'll want a small amount of tidying for style.

@cron2
Copy link
Contributor

cron2 commented Oct 1, 2024

What does the patch do on openvpn server instances (where "waiting for an established VPN session" might never happen)?

@me-and
Copy link
Author

me-and commented Oct 1, 2024

Ah! Good question; I hadn't realised this code was used for server initialisation as well as client initialisation.

In the server scenario, I'd expect OpenVPN to emit the "ready" signal as soon as it can accept incoming VPN connections, which probably means it'll need to emit the signal in different places depending on whether it's acting as a client or server.

I also now suddenly have a much clearer idea of what the p2p use case is. I imagine that mode is the clearest use case for making this configurable, as I can absolutely see people wanting to rely on both OpenVPN being available for connections and for OpenVPN to be fully connected.

@cron2
Copy link
Contributor

cron2 commented Oct 2, 2024

Introducing a new config option is the traditional way we solve things, and now we have hundreds of options that nobody can remember why they were introduced :-) - so this needs good consideration.

Your use case ("client, outbound connection, must wait") and the server case ("signal systemd as soon as incoming connections can be accepted") could be distinguished by looking at the --client option. But... other users might be happy with OpenVPN just starting and succeeding "eventually", not having something wait for it... would that interfere with anything?

p2p mode comes in two kinds - one is the old static key mode, which basically does not negotiate anything (so, "READY" is reached "when everything is ready"). Then, there's TLS p2p mode, which does negotiate, and has a tls-client and tls-server role... so these could again be differentiated by config.

OTOH, maybe we do want an option... pinging @flichtenheld to get more brains.

@flichtenheld
Copy link
Member

Note that is is very helpful to also go to the Trac tickets referenced in the commit (https://community.openvpn.net/openvpn/ticket/827, https://community.openvpn.net/openvpn/ticket/801). They show much more about the motivation behind the patch and @dsommers there acknowledges the problems that are raised in this issue. So as you said, this is definitely expected behavior and not an accidental regression. So just reverting this patch will probably be not the correct solution.

Ultimately I guess there are too many use-cases to satisfy everyone by the default behavior. A --client with stable internet but complex dependencies might want different behavior than a --client on the road with intermittent connections. And a --server might want to have different behavior again.

So far I saw two suggestions to make the behavior more flexible:

  1. As suggested above, add a config option (default might depend on --client/--server/etc.). Relatively straightforward, one more option...
  2. In 827 it was suggested that a possible solution might be an additional systemd service that could monitor openvpn and be used to report an additional status. That would probably need to talk to openvpn on the management interface to get a accurate status. This might be relatively complex.

@me-and
Copy link
Author

me-and commented Oct 2, 2024

@flichtenheld Thank you! I hadn't known how/where to find the Trac tickets, but that makes a lot of sense.

Having an additional service that could provide additional state information is the approach taken by several other parts of systemd, for example network.target vs network-online.target, or time-set.target vs time-sync.target. But, as you say, that seems like it'll add a lot of complexity, and would only be more useful than the config option approach if someone needed to use the same configuration but have different operations ordered after both the initial setup being completed and after the VPN was fully established.

My inclination would be config options. Something like sd-notify on-init vs sd-notify on-connect, with on-connect being the default behaviour for clients, and on-init being the current behaviour and the default otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants