[storage] Remove NVMe parameters from front-end that don't belong #206

Merged 1 commit into opiproject:main on Nov 17, 2022

Conversation

@benlwalker (Author)

Many of these will end up on the middle/back-end instead.

Signed-off-by: Ben Walker <[email protected]>
@benlwalker benlwalker requested a review from a team as a code owner November 14, 2022 19:41
@glimchb glimchb added the Storage and Merge Candidate labels Nov 14, 2022
@sandersms (Contributor) left a comment

Changes look good.

@glimchb (Member) left a comment

I would like to see how we propose to solve the bridge to Marvell and NVIDIA if we remove these.
See https://github.com/opiproject/opi-marvell-bridge/blob/main/mrvl_nvme_json.rpc_methods.pdf

I'm ok if we move it to the middle or back-end, but before we remove them, maybe we should add a middle/back-end example first?

wdyt?

@benlwalker (Author)

> I would like to see how we propose to solve the bridge to Marvell and NVIDIA if we remove these.
> See https://github.com/opiproject/opi-marvell-bridge/blob/main/mrvl_nvme_json.rpc_methods.pdf
>
> I'm ok if we move it to the middle or back-end, but before we remove them, maybe we should add a middle/back-end example first?

Which Marvell call breaks if these are removed? I tried to read over their code and I don't see any of these used. Separately, as an FYI, they're missing several of their RPCs from that PDF documentation. They're in the Go code but not in the PDF.

I also see several places where the subsystem_id is used as the SubNQN in that bridge - those are not the same thing. It needs to use the subsystem_id to look up the subsystem object, which has the NQN in it.
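
To illustrate the distinction, here is a rough sketch in proto terms (message and field names here are purely illustrative, not the actual OPI definitions): the id is just the key used to look the object up, while the NQN is a property stored inside the subsystem object.

```proto
syntax = "proto3";

package illustrative.v1;

// Sketch only: the subsystem object is keyed by an opaque id; its NQN is a
// separate field inside the object.
message SubsystemSpec {
  string id = 1;   // opaque object key, e.g. "subsystem-1"
  string nqn = 2;  // the actual SubNQN, e.g. "nqn.2022-11.example:subsys-1"
}

// A namespace references the subsystem by its object key, never by the NQN
// directly; the NQN comes from resolving that key to the subsystem object.
message NamespaceSpec {
  string subsystem_id = 1;  // must be resolved to a SubsystemSpec to get the NQN
}
```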

@jainvipin (Contributor) left a comment

All looks good, one question inline.

@@ -195,23 +195,12 @@ message NVMeNamespaceSpec {
   // subsystem for this namespace
   common.v1.ObjectKey subsystem_id = 2;

-  // key of the PCIe controller object that will host this namespace.
-  common.v1.ObjectKey controller_id = 3;
@jainvipin (Contributor)

So the namespace is associated with controller via subsystem_id?

@glimchb (Member)

See mrvl_nvm_ctrlr_attach_ns in the PDF - it uses Ctrl_id.

Question: how do we accommodate this? Do we create new methods in OPI for attach/detach of a namespace (#177), or do we associate the controller with the namespace in a different way? Thoughts?

@benlwalker (Author)

In #177 I was proposing that we do not need to support attaching namespaces to individual controllers (for the NVMe front-end). We only need to support attaching namespaces to subsystems. That means mrvl_nvm_ctrlr_attach_ns isn't ever used (you use mrvl_nvm_subsys_alloc_ns instead).

Attaching a namespace to a subsystem, rather than to a controller, is the "normal" thing to do in NVMe. Attaching a namespace to just one controller is the "weird corner case you can technically do but probably shouldn't" case, and it's likely best modeled as making a subsystem-attached namespace "visible" to a controller, rather than attaching it to a controller outright. This visibility configuration is what the NVM attach/detach namespace commands do in the spec - the namespace is already attached to the subsystem and those commands set up visibility rules for a given controller.
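
To make that distinction concrete, here is a rough sketch in proto terms (message and field names are illustrative only - they are not the actual definitions from this repo or from #177). The namespace object itself never references a controller; per-controller visibility is layered on top.

```proto
syntax = "proto3";

package illustrative.v1;

// Sketch only: the namespace is created inside a subsystem and carries
// no controller reference at all.
message NamespaceSpec {
  string subsystem_id = 1;  // the subsystem that owns this namespace
}

// Per-controller visibility, mirroring the spec's Namespace Attachment command:
// the namespace stays attached to the subsystem; these requests only control
// which controllers inside that subsystem can see it.
message AttachNamespaceToControllerRequest {
  string namespace_id = 1;
  string controller_id = 2;
}

message DetachNamespaceFromControllerRequest {
  string namespace_id = 1;
  string controller_id = 2;
}
```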

From our past discussions, I'm worried that this particular point isn't well understood and that most people are thinking the normal thing to do is to attach a namespace to a controller. Are we aligned on this particular point or does this need some dedicated discussion?

@glimchb (Member)

I understand, just a few more questions - thanks for the patience @benlwalker.
So if we only have a single subsystem for all the emulated NVMe devices, and the namespace is a property of the subsystem like you say, then every VF/PF we create will see ALL namespaces at once? I don't like being forced to create multiple subsystems... for example for multipathing purposes...

@jainvipin (Contributor)

Thanks @benlwalker for describing the 'attach' semantics.
As @glimchb pointed out, with this change we can still make things work, but we have to create a subsystem for every controller that needs to be associated with a namespace. That is a bit of extra work, but will it prevent anything you could do otherwise?

@benlwalker (Author)

We discussed a bit on Slack, but to recap:

> So if we only have a single subsystem for all the emulated NVMe devices, and the namespace is a property of the subsystem like you say, then every VF/PF we create will see ALL namespaces at once?

This is a true statement, but you would not want to create a single subsystem for all controllers in any design. I've asked for use cases where you would want to do that and I haven't heard any. I think we need to get to some consensus that using a single subsystem for all controllers is a design mistake. Maybe on the call today or here we can try to make some forward progress.

> As @glimchb pointed out, with this change we can still make things work, but we have to create a subsystem for every controller that needs to be associated with a namespace. That is a bit of extra work, but will it prevent anything you could do otherwise?

I'm not very confident I understand what you're saying here. But yes, for a multitude of reasons you do want to create a subsystem per tenant, which is usually per controller. It's the only way to support live migration, provide the right performance isolation guarantees (against resets mostly), not leak information about other tenants, and get an NN (Number of Namespaces) value suitable for booting.
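
Sketching that layout (again, names are made up for illustration and are not OPI definitions): each tenant gets its own subsystem, and everything the tenant can see hangs off that one object.

```proto
syntax = "proto3";

package illustrative.v1;

// Sketch only: one of these per tenant. Because controllers and namespaces are
// grouped under the tenant's own subsystem, resets stay inside the tenant, the
// reported namespace count (NN) reflects only that tenant's namespaces, and no
// information about other tenants leaks through.
message TenantSubsystemLayout {
  string subsystem_nqn = 1;            // e.g. "nqn.2022-11.example:tenant-a"
  repeated string controller_ids = 2;  // the PF/VF controllers handed to this tenant
  repeated string namespace_ids = 3;   // namespaces visible only to this tenant
}
```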

@glimchb (Member)

I agree that between tenants you want to create separate subsystems. But inside the same tenant? A single one? Several?

I think maybe @jainvipin is saying that creating many subsystems is a heavy burden on DPU resources, so you might hit a limit...?

The use case we talked about is multipathing with 2 ports, where you need 2 devices presented to the host instead of 4...

@jainvipin (Contributor) commented Nov 17, 2022

@glimchb - I am okay with configuring multiple subsystems so that multiple controllers can each refer to a different namespace. This way, removing the reference from Controller to namespace is not a problem.

@jainvipin (Contributor) commented Nov 15, 2022

I was hoping to see a proposal on the backend APIs and then remove these... but it should be okay to first take them out and then discuss the backend APIs

@glimchb (Member) commented Nov 15, 2022

> I would like to see how we propose to solve the bridge to Marvell and NVIDIA if we remove these.
> See https://github.com/opiproject/opi-marvell-bridge/blob/main/mrvl_nvme_json.rpc_methods.pdf
> I'm ok if we move it to the middle or back-end, but before we remove them, maybe we should add a middle/back-end example first?

> Which Marvell call breaks if these are removed? I tried to read over their code and I don't see any of these used. Separately, as an FYI, they're missing several of their RPCs from that PDF documentation. They're in the Go code but not in the PDF.

> I also see several places where the subsystem_id is used as the SubNQN in that bridge - those are not the same thing. It needs to use the subsystem_id to look up the subsystem object, which has the NQN in it.

• I'm not saying those changes are breaking, just asking if they are...
• The missing RPCs from the PDF that need to be added to the Marvell implementation are attach/detach of namespace. They aren't in the code currently because we are still discussing whether this is going to be an OPI API or not: [storage]: Front-end Attach / Detach Namespace Proto-Bufs #177
• Correct, subsystem_id vs SubNQN are not the same thing, and that part of the bridge implementation should change.

@glimchb (Member) commented Nov 15, 2022

> I was hoping to see a proposal on the backend APIs and then remove these... but it should be okay to first take them out and then discuss the backend APIs

I agree, I would like to see us add things before we remove things as well.

@benlwalker (Author)

> I'm not saying those changes are breaking, just asking if they are...

Ok - I don't believe these changes require any updates to the Marvell bridge. I can't see the NVIDIA one.

> The missing RPCs from the PDF that need to be added to the Marvell implementation are attach/detach of namespace. They aren't in the code currently because we are still discussing whether this is going to be an OPI API or not:
> [storage]: Front-end Attach / Detach Namespace Proto-Bufs #177

I don't think these are the same thing. If you look at how CreateNvmeNamespace is implemented in the Marvell bridge, it calls mrvl_nvm_subsys_alloc_ns, which isn't documented in the PDF. I don't think there's any discussion about whether we need to be able to add/remove namespaces to/from subsystems.

> I was hoping to see a proposal on the backend APIs and then remove these... but it should be okay to first take them out and then discuss the backend APIs
>
> I agree, I would like to see us add things before we remove things as well.

I limited myself to only removing things that cannot possibly work as part of a front-end configuration. So I'm not removing any functionality that could actually have been implemented, and we may as well get rid of these now. We're not losing anything, only gaining clarity.

@glimchb (Member) commented Nov 15, 2022

> I don't think these are the same thing. If you look at how CreateNvmeNamespace is implemented in the Marvell bridge, it calls mrvl_nvm_subsys_alloc_ns, which isn't documented in the PDF. I don't think there's any discussion about whether we need to be able to add/remove namespaces to/from subsystems.

mrvl_nvm_subsys_alloc_ns is on page 7 of the PDF

@benlwalker (Author)

> mrvl_nvm_subsys_alloc_ns is on page 7 of the PDF

Whoops, I didn't realize that the GitHub PDF viewer has a "more pages" button - I see it now. Thanks.

@glimchb (Member) left a comment

I see there is no agreement here, but I feel we can come back to this discussion with more concrete use cases and implementation examples once/if we upstream the NVIDIA and AMD code.
Hope the team can give it another go in that case.

Merging this for now and moving on.

@glimchb glimchb merged commit ca2c25b into opiproject:main Nov 17, 2022