[storage] Remove NVMe parameters from front-end that don't belong #206
Conversation
Many of these will end up on the middle/back-end instead. Signed-off-by: Ben Walker <[email protected]>
Changes look good.
I would like to see how we propose to solve the bridge to Marvell and Nvidia if we remove those?
see https://github.com/opiproject/opi-marvell-bridge/blob/main/mrvl_nvme_json.rpc_methods.pdf
I'm ok if we move it to the middle or back, but before we remove them, maybe we should add a middle/back example first...?
wdyt?
Which Marvell call breaks if these are removed? I tried to read over their code and I don't see any of these used. Separately, as an FYI, they're missing several of their RPCs from that PDF documentation - they're in the Go code but not in the PDF. I also see several places where the subsystem_id is used as the SubNQN in that bridge - those are not the same thing. It needs to use the subsystem_id to look up the subsystem object, which has the NQN in it.
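For illustration, here is a minimal Go sketch of the lookup pattern described above: resolving a namespace's subsystem_id to the subsystem object and using that object's NQN as the SubNQN, rather than passing the id through directly. The Subsystem type, the in-memory map, and the subNQNFor helper are hypothetical stand-ins, not the actual bridge code.

```go
package main

import "fmt"

// Subsystem is a hypothetical stand-in for the bridge's subsystem object.
type Subsystem struct {
	ID  string // opaque object key, e.g. "subsystem-1"
	NQN string // the actual SubNQN stored on the subsystem object
}

// subsystems is an illustrative in-memory store keyed by subsystem_id.
var subsystems = map[string]*Subsystem{
	"subsystem-1": {ID: "subsystem-1", NQN: "nqn.2022-09.io.example:subsys1"},
}

// subNQNFor resolves a subsystem_id to the SubNQN by looking up the
// subsystem object, instead of treating the id itself as the NQN.
func subNQNFor(subsystemID string) (string, error) {
	s, ok := subsystems[subsystemID]
	if !ok {
		return "", fmt.Errorf("subsystem %q not found", subsystemID)
	}
	return s.NQN, nil
}

func main() {
	nqn, err := subNQNFor("subsystem-1")
	if err != nil {
		panic(err)
	}
	fmt.Println("SubNQN to pass to the device:", nqn)
}
```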
all looks good, one question inline
@@ -195,23 +195,12 @@ message NVMeNamespaceSpec {
   // subsystem for this namespace
   common.v1.ObjectKey subsystem_id = 2;

-  // key of the PCIe controller object that will host this namespace.
-  common.v1.ObjectKey controller_id = 3;
So the namespace is associated with the controller via subsystem_id?
In #177 I was proposing that we do not need to support attaching namespaces to individual controllers (for the NVMe front-end). We only need to support attaching namespaces to subsystems. That means mrvl_nvm_ctrlr_attach_ns isn't ever used (you use mrvl_nvm_subsys_alloc_ns instead).
Attaching a namespace to a subsystem, rather than to a controller, is the "normal" thing to do in NVMe. Attaching a namespace to just one controller is the "weird corner case you can technically do but probably shouldn't" case, and it's likely best modeled as making a subsystem-attached namespace "visible" to a controller, rather than attaching it to a controller outright. This visibility configuration is what the NVM attach/detach namespace commands do in the spec - the namespace is already attached to the subsystem and those commands set up visibility rules for a given controller.
From our past discussions, I'm worried that this particular point isn't well understood and that most people are thinking the normal thing to do is to attach a namespace to a controller. Are we aligned on this particular point or does this need some dedicated discussion?
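To make the model above concrete, here is a small Go sketch of the relationships being argued for; the struct and field names are hypothetical (they loosely mirror the proto in this PR, not the generated OPI types). A namespace carries only a subsystem reference, and per-controller visibility is a separate, optional rule corresponding to the NVM attach/detach namespace commands.

```go
package main

import "fmt"

// Hypothetical, simplified front-end objects for illustration only.
type Subsystem struct {
	ID  string
	NQN string
}

type Controller struct {
	ID          string
	SubsystemID string // a controller belongs to exactly one subsystem
}

type Namespace struct {
	ID          string
	SubsystemID string // attached to the subsystem, not to a controller
}

// VisibilityRule models the corner case: the namespace is already attached
// to the subsystem, and this rule only controls whether a specific
// controller can see it (what NVM attach/detach namespace does in the spec).
type VisibilityRule struct {
	NamespaceID  string
	ControllerID string
	Visible      bool
}

func main() {
	subsys := Subsystem{ID: "subsys-0", NQN: "nqn.2023-01.example:tenant0"}
	ctrl := Controller{ID: "ctrl-0", SubsystemID: subsys.ID}
	ns := Namespace{ID: "ns-0", SubsystemID: subsys.ID} // no controller_id field

	rule := VisibilityRule{NamespaceID: ns.ID, ControllerID: ctrl.ID, Visible: true}
	fmt.Printf("%+v\n%+v\n%+v\n%+v\n", subsys, ctrl, ns, rule)
}
```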
I understand, just a few more questions. Thanks for your patience @benlwalker.
So if we only have a single subsystem for all the emulated NVMe devices, and if the NS is a property of the subsystem, like you say, then every VF/PF we create will see ALL namespaces at once? I don't like being forced to create multiple subsystems... for example for multipathing purposes...
thanks @benlwalker for describing the 'attach' semantics.
As @glimchb pointed out, with this change we can work things out by creating a subsystem for every controller that needs to be associated with a namespace. It is a bit of extra work, but does it take away anything you could do before that you can't do now?
We discussed a bit on Slack, but to recap:
> So if we only have a single subsystem for all the emulated NVMe devices, and if the NS is a property of the subsystem, like you say, then every VF/PF we create will see ALL namespaces at once?
This is a true statement, but you would not want to create a single subsystem for all controllers in any design. I've asked for use cases where you would want to do that and I haven't heard any. I think we need to get to some consensus that using a single subsystem for all controllers is a design mistake. Maybe on the call today or here we can try to make some forward progress.
> As @glimchb pointed out, with this change we can work things out by creating a subsystem for every controller that needs to be associated with a namespace. It is a bit of extra work, but does it take away anything you could do before that you can't do now?
I'm not very confident I understand what you're saying here. But yes, for a multitude of reasons you do want to create a subsystem per tenant, which is usually per controller. It's the only way to support live migration, provide the right performance isolation guarantees (against resets mostly), not leak information about other tenants, and get the NN value suitable for booting.
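As a purely illustrative Go sketch of the per-tenant layout described above (the names and NQNs are made up, and this is not an OPI API call sequence): each tenant gets its own subsystem, its controllers live in that subsystem, and it only ever enumerates its own namespaces.

```go
package main

import "fmt"

// Tenant is a hypothetical grouping used only to illustrate the layout:
// one subsystem per tenant, holding that tenant's controllers and namespaces.
type Tenant struct {
	Name        string
	SubsysNQN   string
	Controllers []string
	Namespaces  []string
}

func main() {
	tenants := []Tenant{
		{Name: "tenant-a", SubsysNQN: "nqn.2023-01.example:tenant-a", Controllers: []string{"ctrl-a0"}, Namespaces: []string{"ns-a0"}},
		{Name: "tenant-b", SubsysNQN: "nqn.2023-01.example:tenant-b", Controllers: []string{"ctrl-b0"}, Namespaces: []string{"ns-b0", "ns-b1"}},
	}
	// Each tenant's controllers see only the namespaces of their own
	// subsystem, so nothing leaks across tenants and a reset in one
	// subsystem cannot disturb another tenant.
	for _, t := range tenants {
		fmt.Printf("%s -> %s: controllers=%v namespaces=%v\n",
			t.Name, t.SubsysNQN, t.Controllers, t.Namespaces)
	}
}
```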
I agree that between tenants you want to create separate subsystems. But inside the same tenant? A single one? Several?
I think maybe @jainvipin is saying that creating many subsystems is a heavy burden on DPU resources, so you might hit a limit...?
The use case we talked about is multipathing with 2 ports, where you need 2 devices presented to the host instead of 4...
@glimchb - I am okay with configuring multiple subsystems so that multiple controllers can refer to different namespaces. That way, removing the reference from controller to namespace is not a problem.
I was hoping to see a proposal on the backend APIs and then remove these... but it should be okay to take them out first and then discuss the backend APIs.
I agree, I would like to see us add things before removing things as well.
Ok - I don't believe these changes require any updates to the Marvell bridge. I can't see the NVIDIA one.
I don't think these are the same thing. If you look at how CreateNvmeNamespace is implemented on the Marvell bridge, it calls
I limited myself to only removing things that cannot possibly work as part of a front-end configuration. So none of the functionality I'm removing was implementable anyway, and we may as well get rid of it now. We're not losing anything, only gaining clarity.
Whoops, I didn't realize that the GitHub pdf viewer has a "more pages" button - I see it now. Thanks.
I see there is no agreement here, but I feel we can come back to this discussion with more concrete use cases and implementation examples once/if we upstream the NVIDIA and AMD code.
Hope the team can give it another go in that case.
Merging this for now and moving on.