Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add-on with "devices:" config entry does not allow to access device nodes anymore #2690

Closed
jens-maus opened this issue Mar 5, 2021 · 28 comments · Fixed by #3421
Closed

add-on with "devices:" config entry does not allow to access device nodes anymore #2690

jens-maus opened this issue Mar 5, 2021 · 28 comments · Fixed by #3421
Labels
enhancement no-stale Avoids issue going stale

Comments

@jens-maus
Copy link

jens-maus commented Mar 5, 2021

Describe the issue you are experiencing

Since the logic for the devices: entry in the config.json of add-on has changed some supervisor versions ago, I am not able to properly access certain device nodes in /dev even thought these device nodes are perfectly defined in the corresponding config.json of the developed test add-on (see https://github.com/jens-maus/ha-addon-raspmatic/blob/main/home-assistant-addon-strict/config.json#L21-L32)

What is the used version of the Supervisor?

supervisor-2021.03.04

What type of installation are you running?

Home Assistant OS

Which operating system are you running on?

Home Assistant Operating System

What is the version of your installed operating system?

6.0dev

What version of Home Assistant Core is installed?

core-2021.3.1

Steps to reproduce the issue

  1. Install latest HAos
  2. Use the following test Addon-Shop URL: https://github.com/jens-maus/ha-addon-raspmatic
  3. Install the RaspberryMatic CCU (strict) add-on listed
  4. Start the add-on, then exec a shell within the started add-on container (using docker exec -it ...)
  5. execute cat /dev/zram0 and see that it returns
    root@339c13ac-raspberrymatic-strict:~# cat /dev/zram0
    cat: can't open '/dev/zram0': Operation not permitted
  6. Install the other test add-on RaspberryMatic CCU from the same shop
  7. Start it with protected mode disabled and see that the same command returns no permission error anymore:
    root@339c13ac-raspberrymatic:~# cat /dev/zram0
    /ïcXiÅEϤⓌhassos-zramswapSWAPSPACE2
  8. Note, that the only difference between these two test add-ons is, that the config.json of the "strict" add-on version is, that full_access is not set to true and the devices: entry is used and defines /dev/zram0 as a valid access device, thought the strict add-on is still not able to access the device properly.

Anything in the Supervisor logs that might be useful for us?

The only logout I see when starting the strict add-on is:

21-03-05 12:37:13 INFO (SyncWorker_1) [supervisor.docker.addon] Starting Docker add-on ghcr.io/jens-maus/raspberrymatic with version 3.57.4.20210303-802244b
21-03-05 12:37:15 INFO (MainThread) [supervisor.hardware.monitor] Detecting HardwareAction.ADD usb hardware /dev/bus/usb/002/001
21-03-05 12:37:15 INFO (MainThread) [supervisor.hardware.monitor] Detecting HardwareAction.ADD usb hardware /dev/bus/usb/002/002
21-03-05 12:37:15 INFO (MainThread) [supervisor.hardware.monitor] Detecting HardwareAction.ADD serial hardware /dev/input/event3 - /dev/input/by-id/usb-VMware_VMware_Virtual_USB_Mouse-event-mouse
21-03-05 12:37:15 INFO (MainThread) [supervisor.hardware.monitor] Detecting HardwareAction.ADD usb hardware /dev/bus/usb/002/003
21-03-05 12:37:15 INFO (MainThread) [supervisor.hardware.monitor] Detecting HardwareAction.ADD audio hardware - /dev/snd/hwC0D0
21-03-05 12:37:15 INFO (MainThread) [supervisor.hardware.monitor] Detecting HardwareAction.ADD audio hardware - /dev/snd/pcmC0D0c
21-03-05 12:37:15 INFO (MainThread) [supervisor.hardware.monitor] Detecting HardwareAction.ADD audio hardware - /dev/snd/pcmC0D0p
21-03-05 12:37:15 INFO (MainThread) [supervisor.hardware.monitor] Detecting HardwareAction.ADD audio hardware - /dev/snd/controlC0
21-03-05 12:37:15 INFO (MainThread) [supervisor.hardware.monitor] Detecting HardwareAction.ADD usb hardware /dev/bus/usb/001/001
21-03-05 12:37:15 INFO (MainThread) [supervisor.host.sound] Updating PulseAudio information
21-03-05 12:37:15 INFO (MainThread) [supervisor.hardware.monitor] Detecting HardwareAction.ADD audio hardware - /dev/snd/seq
21-03-05 12:37:15 INFO (MainThread) [supervisor.hardware.monitor] Detecting HardwareAction.ADD audio hardware - /dev/snd/timer

See, that I didn't see any further output that might be helpful in seeing if the supervisor provides enough access to the add-on to be able to access the device nodes it requests access for using the devices: section in config.json. In addition, I couldn't even spot something in the docker inspect output of the container that shows up access assignment for the configured device nodes.

As outlined, in some previous versions a similar add-on config.json without full_access and enough privileged capabilities setting perfectly allowed to access all defined device nodes from the previous devices: config setting. However, after the change (and potentially also some other additional changes) this behaviour somehow broke and now I always have to start the add-on in privileged/non-protected mode to see it accessing the device nodes it requires.

@jens-maus jens-maus added the bug label Mar 5, 2021
@jens-maus
Copy link
Author

jens-maus commented Mar 5, 2021

Please allow me to directly tag @pvizeli and @agners here to get their attention to this ticket/issue since I feel they could immediately help in trying to help or identify the root cause here since we lately had contact regarding new realtime feature in the supervisor which I actually now try to use in my add-on development work here.

So if you guys have any idea what might be wrong here, please let me know. Some versions ago I could really perfectly start the add-on without any elevated --privileged docker support and could use all device nodes it requires. This seems to broke/change lately and I don't know why...

@pvizeli
Copy link
Member

pvizeli commented Mar 5, 2021

Yes, zram is blocked because we not allow add-ons to access to such nodes. We use that for the OS. You can only access to that node with disabled protection mode. If you need zram, your add-on is not compatible with our eco system, you can use this add-on only with disabled protection mode.

@pvizeli pvizeli removed the bug label Mar 5, 2021
@jens-maus
Copy link
Author

jens-maus commented Mar 5, 2021

I was using /dev/zram0 here just as an example and for simplicity reasons. The dev nodes my add-on requires/wants to access without having to disable protected mode would be:

    "/dev/raw-uart",
    "/dev/raw-uart1",
    "/dev/mmd_hmip",
    "/dev/mmd_bidcos",
    "/dev/eq3loop",
    "/dev/i2c-0",
    "/dev/i2c-1",

And even thought I defined these dev nodes in config.json the add-on container can't access these nodes. However, in past supervisor versions this was actually possible. So is there any possibility in allowing an add-on to define the dev nodes it requires without forcing an add-on to to completly disable protected mode? That would be indeed highly appreciated and helpful. Any please node, all thes dev nodes the add-on generates itself by loading the corresponding kernel modules using the SYS_MODULE capability being added to config.json. Thus, the add-on should also be able to actually use them if it is responsible for generating/clearing them afterwards.

@jens-maus
Copy link
Author

@pvizeli Sorry to bother you once again, but is there any chance that add-ons can use the devices: entry in their config.json to actually define device nodes they generated on their own and for which they will gain read/write access to without having to disable protection mode?

I really think it would be worth allowing add-ons to explicitly define device nodes (while internally keeping a black list for no-go device nodes exclusively protected by the supervisor) they would like to have access (e.g. because they generate them on their own) to without forcing them into unprotected mode and thus confuse users of such add-ons by having to disable protection mode altogether. I would really like to get my add-on working without having to disable protection mode altogether. As said, zram was just mentioned as an example above and it would be fine if the supervisor blocks it. But all the device nodes the add-on requires are actually generated by itself and thus it should be IMHO fine if the add-on is allowed to access them.

@pvizeli
Copy link
Member

pvizeli commented Mar 28, 2021

If they show themselves as uart interface, they should get access over uart: true based on udev and https://www.kernel.org/doc/Documentation/admin-guide/devices.txt you need to make sure that you get the right major number or at least udev subsystem is the correct setup for the device, then the only magic which you need is uart: true otherwise you can use the device's mapping to map specific devices if they are present, but they need also show correctly on the host udev, you can find it out if they are listening on ha hardware info.

Make also sure, that you have not run an udev daemon inside your container because they get out of sync over time. Just add udev: true into your config and you have the udev information directly from host system without the need of starting an udev daemon.

@jens-maus
Copy link
Author

@pvizeli Thanks for the hint. I will try to test/analyze that and let you know what I find out. But please note that not all devices my add-on need are actualy uart devices. See the list, it also requires access to, e.g. /dev/i2c-0 to be able to query i2c devices.

@pvizeli
Copy link
Member

pvizeli commented Mar 29, 2021

Right:

uart: true
devices:
  - /dev/i2c-0

That will allow you to access all uart devices (subsystem=tty) and if available, the /dev/i2c-0. But that work only for devices which you see on ha hardware info, at least for uart the major mapping need to be correct or the udev subsystem.

@jens-maus
Copy link
Author

@pvizeli Thanks again, I changed the config.json now and added uart: true and at least the uart devices and the /dev/i2c-0 can be accessed now even with protected mode enable. However, during startup my add-on modprobes a kernel module that during runtime generates a /dev/eq3loop device and even thought I added that device node to devices: (see https://github.com/jens-maus/ha-addon-raspmatic/blob/main/home-assistant-addon-strict/config.json#L21-L29) I cannot access it and the application trying to use it returns the following error:

2021/03/29 20:33:26.081 <Warning> UpstreamCharConnection could not open master port /dev/eq3loop: Operation not permitted

This application tries to access the /dev/eq3loop using a simple int fd = open (masterdev, O_RDWR | O_NONBLOCK); syscall operation, which however is rejected. So is there a way or additional config.json setting which would allow the application to access this device after it has been created by the respective modprobe has been issued?

@pvizeli
Copy link
Member

pvizeli commented Mar 30, 2021

my add-on modprobes a kernel module that during runtime generates a /dev/eq3loop device right, that will only work if it's using the common https://www.kernel.org/doc/Documentation/admin-guide/devices.txt for uart/tty. It's still a container that is a bit static with dynamic devices. Let me see if we can adjust cgroup rules at runtime, so we can patch tty subsystem devices which are not in devices.txt available. Maybe that is possible

@jens-maus
Copy link
Author

my add-on modprobes a kernel module that during runtime generates a /dev/eq3loop device right, that will only work if it's using the common https://www.kernel.org/doc/Documentation/admin-guide/devices.txt for uart/tty. It's still a container that is a bit static with dynamic devices. Let me see if we can adjust cgroup rules at runtime, so we can patch tty subsystem devices which are not in devices.txt available. Maybe that is possible

That would be great, thanks. The actual /dev/eq3loop device that the eq3_char_loop kernel module generates (see home-assistant/operating-system#1266) registers itself as:

crw-------    1 root     root      246,   0 Mar  1 19:33 /dev/eq3loop

Please note, on top of that another process of my add-on (multimacd) which uses this /dev/eqloop device will then generate slave char devices (/dev/mmd_hmip, /dev/mmd_bidcos) with different minor numbers:

crw-------    1 root     root      246,   2 Mar  1 19:33 /dev/mmd_bidcos
crw-------    1 root     root      246,   1 Mar  1 19:33 /dev/mmd_hmip

Thus, the protected container would also need to be able to access these slave char devices, which is currently not possible.

@pvizeli
Copy link
Member

pvizeli commented Mar 30, 2021

Can you use somethings like: https://github.com/home-assistant/operating-system/blob/dev/buildroot-external/rootfs-overlay/usr/lib/udev/rules.d/99-hmip-rfusb.rules ?

It's not possible to adjust the cgroup devices permission at runtime. Means you need call /addons/self/restart which reevaluates the permissions. How they devices look inside udev (ha hardware info)?

@jens-maus
Copy link
Author

jens-maus commented Mar 30, 2021

Can you use somethings like: https://github.com/home-assistant/operating-system/blob/dev/buildroot-external/rootfs-overlay/usr/lib/udev/rules.d/99-hmip-rfusb.rules ?

I don't think that this is possible. As said the eq3_char_loop kernel module is loaded on runtime during add-on start and not initiated by connecting a hardware and then generates the /dev/eq3loop as its master device and then listens for connection and the process multimacd then opens /dev/eq3loop and causes the kernel module to generate two slave devices (/dev/mmd_hmip and /dev/mmd_bidcos) which are used to route corresponding homematic/homematicIP connection packages to different processes (rfd and HMIPServer).

It's not possible to adjust the cgroup devices permission at runtime. Means you need call /addons/self/restart which reevaluates the permissions. How they devices look inside udev (ha hardware info)?

The devices in question look like:

- attributes:
    DEVNAME: /dev/eq3loop
    DEVPATH: /devices/virtual/eq3loop/eq3loop
    MAJOR: "235"
    MINOR: "0"
    SUBSYSTEM: eq3loop
  by_id: null
  dev_path: /dev/eq3loop
  name: eq3loop
  subsystem: eq3loop
  sysfs: /sys/devices/virtual/eq3loop/eq3loop
- attributes:
    DEVNAME: /dev/mmd_hmip
    DEVPATH: /devices/virtual/eq3loop/mmd_hmip
    MAJOR: "235"
    MINOR: "1"
    SUBSYSTEM: eq3loop
  by_id: null
  dev_path: /dev/mmd_hmip
  name: mmd_hmip
  subsystem: eq3loop
  sysfs: /sys/devices/virtual/eq3loop/mmd_hmip
- attributes:
    DEVNAME: /dev/mmd_bidcos
    DEVPATH: /devices/virtual/eq3loop/mmd_bidcos
    MAJOR: "235"
    MINOR: "2"
    SUBSYSTEM: eq3loop
  by_id: null
  dev_path: /dev/mmd_bidcos
  name: mmd_bidcos
  subsystem: eq3loop
  sysfs: /sys/devices/virtual/eq3loop/mmd_bidcos

Would really be great to have some methodology in the supervisor or config.json to allow my add-on to still be started in protected mode while at the same time being able to access these three dynamically generated char loop devices. And as said before, some supervisor version before this was really working by just specifying these devices in the devices: array, but then suddenly stopped at some later supervisor version.

@pvizeli
Copy link
Member

pvizeli commented Mar 31, 2021

If you have the device exists on devices and the kernel module is loaded and the device is created on the time which the add-on starts, you get cgroup access to the device. But if the add-on start before the kernel module is loaded, the system doesn't know these devices and can't look up the major number to generate the cgroup permissions. You need to restart your add-on to become accessible to these special devices, there is no dynamic cgroup handling exists for containers today.

But what you saw is, you can't use the full_access and devices at the same time and I think you see also a warning in the logs about this. So long you use full_access, the devices is ignored because this add-on will run with full access, maybe would also make sense to ignore uart, usb etc in that case

@jens-maus
Copy link
Author

jens-maus commented Mar 31, 2021

@pvizeli Thanks for your comments.

Let me explain first that of course the add-on works fine with full_access and disabled protection mode. So that's not the point here. But as you probably guessed already, I would like to get the add-on working without full_access and thus without having to force users to disable protection mode in the end so that the add-on runs with elevated security.

Most of the stuff within the add-on already works with protection mode enabled and in the past (some supervisor versions ago) the add-on also worked completely in protection mode. But at some point in time it stopped because you probably changed the logic regarding devices: and the cgroup to some extend. The current state of affairs is, that apart from the above three dynamic devices all functionality works. The other uart devices also listed in the config.json (/dev/raw-uart) can be perfectly accessed. The only thing that is missing is, to allow the container to not only generate these dynamic devices (the modprobe works already) during runtime, but to simply use them also via simple open() calls. So is there really no way that you don't allow device access only based on MAJOR/MINOR numbering but also based on device node name (e.g. /dev/eq3loop)? In the end, that's exactly what worked in the past when I added /dev/eq3loop and the other two to the devices: entry in the config.json file. Or isn't it possible to enhance the devices: entry to also directly specify MAJOR:MINOR numbering so that I could add these numbers in the config.json instead?

Edit:
Without knowing the supervisor internals, what about this approach in using the devices.allow cgroup entry for this purpose? See https://stackoverflow.com/questions/24225647/docker-a-way-to-give-access-to-a-host-usb-or-serial-device

@pvizeli
Copy link
Member

pvizeli commented Apr 2, 2021

Your driver uses not specify MAJOR:MINOR, they are different based on Linux Kernel and which kind of devices the user have. You will maybe show all time the same number, but that number is only for this system and maybe some other with exactly the same configuration/kernel. The reason why we evaluate the major un runtime for such devices.

There is another way, initialize. You can write an add-on with that service type which just handles the kernel drivers. The Supervisor starts such an add-on just on the first start to allow handling drivers/options for later normal system start.

Using devices.allow would be nice, but is also complex, we need first to update our system to get access to host cgroup devices. But I will definitive looking into that because it would make the device handling from static containers more dynamic which such kind of devices.

@jens-maus
Copy link
Author

Your driver uses not specify MAJOR:MINOR, they are different based on Linux Kernel and which kind of devices the user have. You will maybe show all time the same number, but that number is only for this system and maybe some other with exactly the same configuration/kernel. The reason why we evaluate the major un runtime for such devices.

You are right. I didn't consider this. However, what I still don't understand is, why this was already working in the past when then devices: config.json accepted device nodes together with the *:rwm syntax? And when you reworked all this the add-on started to require the protection mode, unfortunately.

There is another way, initialize. You can write an add-on with that service type which just handles the kernel drivers. The Supervisor starts such an add-on just on the first start to allow handling drivers/options for later normal system start.

Using devices.allow would be nice, but is also complex, we need first to update our system to get access to host cgroup devices. But I will definitive looking into that because it would make the device handling from static containers more dynamic which such kind of devices.

That would be great, indeed. Is there any planned ETA for that? Because I would rather not write another initialize add-on just for loading the kernel drivers if that's not required. That just complicates things in the end and is prone to users tending to forget that step. So if you can get this devices.allow cgroup thingy implemented it would be great. Furthermore, loading the eq3_char_loop kernel driver is just one step of the picture. This just brings up the /dev/eq3loop kernel driver. The other two slave devices /dev/mmd_hmip and /dev/mmd_bidcos are actually generated and removed on runtime by the multimacd application that is used during startup of the HomeMatic/RaspberryMatic add-on. Thus, using a initialize add-on type would probably not help in the end. So if you can work on this improvement of a more fine grained configuration for dynamic devices also being possible within add-ons, this would be highly appreciated! Thanks.

@github-actions
Copy link

github-actions bot commented Jun 5, 2021

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 5, 2021
@jens-maus
Copy link
Author

This issue indeed still exists and is relevant since I would really like to get my RaspberryMatic HA add-on to not force users into disabling protection mode. And some work has already been started on getting this situation improved. @pvizeli and ETA for getting this done?

@github-actions github-actions bot removed the stale label Jun 5, 2021
@pvizeli
Copy link
Member

pvizeli commented Jun 9, 2021

I need to finish #2811 - but it looks like I can't finish it this month, too many stuff on my list :(

@jens-maus
Copy link
Author

No problem. Take your time, I just wanted to make sure you won't forget it. If during the next months it will be possible to remove the requirement for disabling protection mode for my add-on, this is fine with me. Thanks in advance!

@github-actions
Copy link

github-actions bot commented Aug 8, 2021

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 8, 2021
@jens-maus
Copy link
Author

@pvizeli two more months have past since my last comment here and it seems you have finished #2811. Any chance you can work on this issue anytime soon because with the latest HA core my RaspberryMatic HA addon is now even rated 1 even thought protected mode is not disabled. And this would definitely make some users wonder if it is safe to use this addon or not while IMHO it is quite uninvasive in its current sense.

@github-actions github-actions bot removed the stale label Aug 8, 2021
@github-actions
Copy link

github-actions bot commented Oct 7, 2021

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@jens-maus
Copy link
Author

Please allow me to reopen this ticket. Is there any chance that there will be any progress here since os-agent is now merge for a long time and it would be great to get this issue fixes/resolved somehow.

@jens-maus
Copy link
Author

@pvizeli Thx for PR #3421 merge. So in which public (and when) supervisor update this change will be integrated so that I can test it?

@agners
Copy link
Member

agners commented Jan 26, 2022

Whenever something gets merged onto main it gets built and a new dev release hits the dev channel (see https://github.com/home-assistant/supervisor/actions/runs/1751928414, in thise case 2022.01.dev2604 is the version you are looking for). On a installation, use:

ha supervisor options --channel=dev
ha supervisor reload
ha supervisor update

@jens-maus
Copy link
Author

Thanks for the hints regarding updating the supervisor to the latest dev version. And "hooray" 🎉 I could perfectly verify that with these changes here from @pvizeli together with the latest HAos 8.0dev version a RaspberryMatic add-on with enabled protection mode (full_access removed from config.yaml) everything works perfectly.

See here:
Bildschirmfoto 2022-01-28 um 10 47 00
Bildschirmfoto 2022-01-28 um 10 47 13

So the RaspberryMatic add-on now gets a green 6 rating as an add-on while it is perfectly able to load/unload all necessary kernel modules for using all homematic rf modules.

As it seems the only things missing are now:

  1. Release current dev supervisor as a stable release to the public
  2. Get HaOS 7.3 release with the dualcopro pull request integrated

Afterwards we/I can then try to get multicast udp running. Thus solving home-assistant/plugin-multicast#17 and to jens-maus/RaspberryMatic#1373

So when exactly is (1) and (2) going to be happening?

@pvizeli
Copy link
Member

pvizeli commented Jan 28, 2022

  1. I guess Monday, will make an beta release today
  2. Is somethings in control from by @agners

@github-actions github-actions bot locked and limited conversation to collaborators Feb 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement no-stale Avoids issue going stale
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants