Run service after network-online.target #129
"The service has to run after the network has been configured" - why? What is the problem you are seeing that this fixes?
In my Fedora system with NetworkManager, the vlan interface on which I run the sqm scripts doesn't exist before network.target, and even if the dependency
With my patch, the layer_cake is configured correctly at startup. Do you want me to open an issue?
We can keep the discussion here, no need to open a separate issue. I'd just like to understand a bit more what's causing the failure (and get this documented in the commit message) before changing this. So a couple more questions:
And lastly, but a bit tangential, why are you running sqm-scripts on the VLAN device and not the physical device in the first place? :)
This is an x86_64 box that acts as my router and I'm running Fedora 33 on it. The interfaces are configured by NetworkManager, and I use shorewall to configure the iptables rules (I've disabled the traffic shaping capabilities of shorewall with CLEAR_TC=No and TC_ENABLED=No).
My WAN interface is enp4s0 over VLAN 20. There's no other VLAN on this physical interface. Should I apply the SQM on enp4s0 only? Am I doing this wrong?
The .device units are autogenerated and they do exist:
This is the journal of the service unit with the default settings of `Before=network.target` and debug logging:
Juan Orti Alcaine writes:
> This is a x86_64 box that acts as my router and I'm running Fedora 33
> on it. The interfaces are configured by NetworkManager, and I use
> shorewall to configure the iptables rules (I've disabled the traffic
> shaping capabilities of shorewall with CLEAR_TC=No and TC_ENABLED=No).
> My WAN interface is enp4s0 over VLAN 20. There's no other VLAN on this
> physical interface. Should I apply the SQM on enp4s0 only? am I doing
> this wrong?

Well with that setup you can do either, as long as you adjust the
overhead appropriately. If you run it on the VLAN interface, CAKE won't
see the VLAN tag, so it needs to be accounted for in the 'overhead'
parameter. Whereas if you run it on the physical interface, it sees the
tagged packets and won't need any adjustments. From the log it looks
like you're using the default 38-byte overhead which corresponds to an
ethernet header *without* VLAN tags. So with this setup, CAKE will
slightly underestimate the size of packets on the wire, so the shaping
bandwidth will be slightly off. So I'd suggest either running
sqm-scripts on the physical interface, or adding another four bytes of
overhead (by adding the 'ether-vlan' keyword to the CAKE config, or just
manually setting 'overhead 42').
(And if you had different VLANs going to different destinations, running
on the physical interface would mean you were shaping them all, which
would likely not have been appropriate; but with only one VLAN there's
no difference, really).
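As a rough illustration of the overhead arithmetic above, here is a minimal shell sketch. The 38-byte and 4-byte figures come from this thread; the `cake_overhead` helper, the `RATE` value and the printed `tc` invocation are assumptions for illustration only, and with sqm-scripts the overhead would normally be set in the interface's configuration rather than by hand.

```shell
#!/bin/sh
# Sketch: per-packet overhead for CAKE on plain vs. VLAN-tagged Ethernet.
ETHERNET_OVERHEAD=38   # Ethernet framing without VLAN tags (default seen in the log)
VLAN_TAG_BYTES=4       # one 802.1Q tag

# cake_overhead N: overhead in bytes when N VLAN tags are invisible to the qdisc.
# (Hypothetical helper, not part of sqm-scripts.)
cake_overhead() {
    echo $(( ETHERNET_OVERHEAD + VLAN_TAG_BYTES * $1 ))
}

IFACE="enp4s0.20"      # VLAN interface from this thread
RATE="100mbit"         # hypothetical shaping rate; substitute your own

# Printed rather than executed, since changing qdiscs requires root.
echo "tc qdisc replace dev $IFACE root cake bandwidth $RATE overhead $(cake_overhead 1)"
```

When shaping on the physical interface instead, the tag is already part of the packet CAKE sees, so the zero-tag value (`cake_overhead 0`) applies.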
> The .device units are autogenerated and they do exist:
> ```
> # systemctl status sys-subsystem-net-devices-enp4s0.20.device
> ● sys-subsystem-net-devices-enp4s0.20.device - /sys/subsystem/net/devices/enp4s0.20
> Loaded: loaded
> Active: active (plugged) since Wed 2020-12-09 10:37:13 CET; 13min ago
> Device: /sys/devices/virtual/net/enp4s0.20
> # systemctl status sys-subsystem-net-devices-enp4s0.device
> ● sys-subsystem-net-devices-enp4s0.device - I211 Gigabit Network Connection
> Loaded: loaded
> Active: active (plugged) since Wed 2020-12-09 10:36:48 CET; 13min ago
> Device: /sys/devices/pci0000:00/0000:00:04.0/0000:04:00.0/net/enp4s0
> ```
> This is the journal of the service unit with the default settings of `Before=network.target` and debug logging:

Right, so this looks like sqm-scripts is started and then immediately
stopped again two seconds later? The stop-sqm timestamps correspond to
the time the device unit appeared. That's a bit odd, but maybe there's
some kind of flip-flop as NetworkManager is setting up the VLAN device?
In which case maybe your suggestion of just changing to run after the
network is configured is not a bad one.
Could you post a corresponding log with your change, please?
Thanks for your comments. Looking at the log in more detail, I see that NetworkManager tries to mangle the qdiscs and fails with this error: `platform-linux: do-delete-tfilter[9: -65536]: failure 22 (Invalid argument - Parent Qdisc doesn't exists)`. After that, NetworkManager marks the connection as failed and brings the interface down. This is the cause of sqm stopping on the interface.
If sqm starts before NetworkManager has finished configuring the network interfaces, it can cause NetworkManager to fail with the following error:

platform-linux: do-delete-tfilter[9: -65536]: failure 22 (Invalid argument - Parent Qdisc doesn't exists)

After that, the interface is torn down, causing sqm to stop.

To avoid that problem, wait until the network interfaces have been fully configured by ordering the systemd unit after network-online.target.

Signed-off-by: Juan Orti Alcaine <[email protected]>
Juan Orti Alcaine writes:
> Thanks for your comments.
> Looking at the log in more detail, I see that NetworkManager tries to
> mangle the qdiscs and fails with this error: `platform-linux:
> do-delete-tfilter[9: -65536]: failure 22 (Invalid argument - Parent
> Qdisc doesn't exists)`.
> After that, NetworkManager marks the connection as failed and brings
> the interface down. This is the cause of sqm stopping on the
> interface.

Ah, I see. This sounds like a bug in NetworkManager triggered by the
vlan interface using 'noqueue' by default. I guess you can work around
it by moving the sqm-scripts instance to the physical interface? It does
indicate that changing the run order is not necessarily the right fix,
though...
FYI I've opened this bug to NetworkManager: https://bugzilla.redhat.com/show_bug.cgi?id=1906024
The service has to run after the network has been configured, so change
the ordering of the systemd unit after network-online.target.
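For readers unfamiliar with systemd ordering, the following is a minimal sketch of what "ordering the unit after network-online.target" usually looks like. It is not the diff from this PR: the `print_unit_ordering` helper is hypothetical, only the ordering lines are shown, and whether the real unit also carries `Wants=` is an assumption based on common systemd practice.

```shell
#!/bin/sh
# Sketch only: prints the [Unit] ordering lines under discussion.
# This is NOT the project's actual unit file; all other directives are omitted.
print_unit_ordering() {
    cat <<'EOF'
[Unit]
# Pull in network-online.target and start only after it has been reached.
Wants=network-online.target
After=network-online.target
# No Before=network.target here: network-online.target is itself ordered
# after network.target, so keeping both would create an ordering cycle.
EOF
}

print_unit_ordering
```

Note that network-online.target only delays startup if a wait-online service (on this setup, presumably NetworkManager-wait-online.service) is enabled.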