-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
drivers/quota.Control: lock a mutex when called #1604
base: main
Are you sure you want to change the base?
Conversation
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the local object level LGTM.
(Internal users don’t need this because the internal user is the overlay driver, called under the layer store lock — but CRI-O is calling driver.ReadWriteDiskUsage
.)
Oh, actually external callers have direct access to the graph driver via store.GraphDriver
??! What does that mean for #1214 ? Do we need to re-add that, and add other synchronization on all the public methods of all drivers?
Does the ReadWriteDiskUsage
need to care about concurrent layer updates / removals? I.e. should the actual locking happen at the ReadWriteDiskUsage
API level, to synchronize with all other layer updates by the graph driver?
Or is the right decision to pretend that the public access to the graph driver mostly doesn’t exist, it’s the caller’s responsibility not to do that, and we treat ReadWriteDiskUsage
as a special case highly-optimized for concurrency? If that’s the intent, I think it would be useful to have widespread comments pointing that special case out.
(I appreciate that this fixes #1394 as is, and it’s better to have this PR than to have nothing. It’s just… a whole new can of worms I was not aware of. Even if the individual disk usage computation was fine vs. concurrent updates/removals, and it would be fine to return such mostly-invalid data, I’m at least worried about things like |
It's not exposed through the |
The "quotas" map and "next project ID" field don't take well to being created and updated concurrently in multiple threads. Throw a lock around the whole object when any of its exported methods are called. Signed-off-by: Nalin Dahyabhai <[email protected]>
edb07d4
to
8db5cc4
Compare
Rebased. |
The CRI-O backtrace in the original report points at https://github.com/cri-o/cri-o/blob/80228a5d6a4b26e9ed2363ca6a936f5fb604c607/internal/lib/stats/stats_server.go#L189-L197 , which uses a published method of Line 154 in 9e425af
(The Podman user I can find is https://github.com/containers/podman/blob/b15510694b81929dbc107d354fdb0a83d0bfaa2a/libpod/runtime_volume_common.go#L181-L197 , but that’s a private single-use |
Yikes, I thought we'd dropped that somewhere along the way. |
WDYT about marking That would require giving CRI-O a new |
The "quotas" map and "next project ID" field don't take well to being created and updated concurrently in multiple threads. Throw a lock around the whole object when any of its exported methods are called. Fixes #1394.