[storage] Transaction support #212
base: main
Conversation
A transaction is created by calling CreateTransaction on the TransactionService, which allocates a txn_id. This txn_id can (optionally) be included in many storage RPCs. If a storage RPC contains a valid txn_id, its execution will be delayed until the transaction is later executed using ExecuteTransaction.

When a transaction is executed, all of the steps within that transaction either succeed or fail as a unit. When a transaction completes, the state of the system will be either as if the transaction never occurred (for a failure) or as if all steps succeeded.

Transactions must have limited lifetimes after they are created to deal with clients creating transactions and then going away without executing or deleting them. The agent implementing this service should periodically perform cleanup as it deems necessary.

Signed-off-by: Ben Walker <[email protected]>
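To make the intended flow concrete, here is a minimal Go sketch of how a client might drive the proposed service. All type and method names (TransactionService, StorageService, CreateNvmeSubsystem, and so on) are hypothetical stand-ins rather than the generated OPI stubs; only the create / tag-with-txn_id / execute sequence is taken from the description above.

```go
package main

import "fmt"

// Hypothetical stand-in for the proposed transaction service; not the real OPI stubs.
type TransactionService interface {
	CreateTransaction() (txnID string, err error) // allocates a txn_id
	ExecuteTransaction(txnID string) error        // applies all queued steps as a unit
	DeleteTransaction(txnID string) error         // discards an unexecuted transaction
}

// Hypothetical stand-in for a storage service whose RPCs accept an optional
// txn_id; if it is set, execution is deferred until ExecuteTransaction.
type StorageService interface {
	CreateNvmeSubsystem(name, txnID string) error
	CreateNvmeNamespace(name, txnID string) error
}

// provision queues two storage calls under one transaction so the agent
// applies them atomically: afterwards either both objects exist, or the
// system looks as if the transaction never happened.
func provision(txn TransactionService, storage StorageService) error {
	txnID, err := txn.CreateTransaction()
	if err != nil {
		return err
	}
	if err := storage.CreateNvmeSubsystem("subsys0", txnID); err != nil {
		_ = txn.DeleteTransaction(txnID) // best-effort cleanup of the unused transaction
		return err
	}
	if err := storage.CreateNvmeNamespace("ns0", txnID); err != nil {
		_ = txn.DeleteTransaction(txnID)
		return err
	}
	return txn.ExecuteTransaction(txnID)
}

func main() {
	fmt.Println("see provision() for the intended call sequence")
}
```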
I like the idea, not approving officially yet, so others can comment and review before we merge
Usually REST APIs don't have a concept of inter-related resources that need to be atomically committed. However, we should adopt this if it makes sense for us. Overall I like the idea... but we can keep it optional (naturally, if a transaction is not started then everything works as usual). A few things that warrant discussion:
I wouldn't make this support optional. As I mentioned elsewhere, idempotency is crucial for CSI drivers that want to use this API. To achieve this, they can repeat all API calls that haven't completed successfully before, but I have the impression that this is going to be tricky for some of these fine-grained API calls (please correct me if I'm wrong). It can also leave the system in a partially configured state. Both a single, big API call and many smaller calls with transaction support avoid that. I'm fine with both approaches.
Agree with @pohly. Especially when dealing with xPUs, making this optional would allow the HW to be left in some partially configured/inconsistent state. I see no reason for leaving such a backdoor open - @jainvipin? If you are concerned that creating transactions may impose some performance tax, I would not worry about that, since these are configuration and management calls - once the offload machinery is set up, data plane traffic would not be affected.
I find this confusing. We are not talking here about existing CSI drivers for setting up remote storage backends in particular, but rather about frontend configuration of "local targets", for which new CSI drivers need to be written and which are supposed to utilize the OPI API. It is up to the implementation to guarantee the idempotency requirement - the transaction idea is to facilitate this.
It's part of a CSI driver if its implementation of calls like
This implementation of
Every single API call has to be defined so that the caller can figure out whether it has completed or needs to be called again (idempotency). The introduction of a transaction ID is a simple way of achieving this because calls have no side effect until the transaction completes. I just see one problem: what if
nobody is writing new drivers from scratch, we are adding OPI to many already existing options in CSI node drivers
And it is that part where OPI gRPC calls have to support implementing an idempotent
Whether you extend or write from scratch, it is up to you to guarantee idempotency. A transaction API can greatly facilitate this.
It will not create a new interface. It will use GET to check if the interface already exists and not create a new one...
I agree with this, they can help. But whether CSI driver providers need this help or not is another story.
It could do that, but the version I showed didn't, to demonstrate the problem 😁 Has someone tried writing down what a correct implementation of
we are adding OPI support to existing CSI drivers now, this is where all the small fixes to APIs come from on my side...
one piece of feedback we already got, that even with this, is why we are creating https://github.com/opiproject/goopicsi
Idempotency and transaction semantics are totally independent.
Idempotency can be handled, for example, using https://google.aip.dev/133#user-specified-ids
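For illustration, here is a small self-contained Go sketch of the retry pattern that user-specified IDs (AIP-133) enable, independent of any transaction support. The in-memory store and the function names (createVolume, ensureVolume) are made up for the example; the point is only that a caller-chosen ID lets a repeated call map onto "already exists", which the caller can treat as success.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrAlreadyExists = errors.New("already exists")

// store simulates server-side state keyed by the caller-chosen ID.
var store = map[string]bool{}

// createVolume is a stand-in for any create RPC that lets the caller pick the ID.
func createVolume(id string) error {
	if store[id] {
		return ErrAlreadyExists
	}
	store[id] = true
	return nil
}

// ensureVolume is idempotent: because the caller picked the ID, retrying a
// call that actually succeeded earlier returns "already exists", which can
// safely be treated as success.
func ensureVolume(id string) error {
	err := createVolume(id)
	if errors.Is(err, ErrAlreadyExists) {
		return nil
	}
	return err
}

func main() {
	fmt.Println(ensureVolume("vol-1")) // <nil>
	fmt.Println(ensureVolume("vol-1")) // <nil> again: safe to retry
}
```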
Well, things are working just fine without transactions for us in our deployments, so I can ascertain that this is not mandatory. The DPU is not the highest-level manageability entity in the infrastructure, so partial failures can be handled quite efficiently up in the caller as well. Having said that, enable/disable (#178) captures turning on the volume only when everything is successful, giving the user full control.

xPUs can very likely implement two-phase commit (required by transactional semantics), so we could introduce the concept, but mandating it would impose things that are not needed for NVMe, at least not for now. In fact, I'd say let's go without transactional semantics for now and see if anyone runs into a problem that we need to solve here - at least here at AMD, we don't see a problem without transactions (for NVMe).

In fact, transactional semantics (especially multiple simultaneous transactions) is going to break idempotency, because an update on an object in one transaction could be voided by a commit on the same object in another transaction. So we are creating a sense of idempotency at the transaction level and not the object level.
Adding transactions facilitates making the APIs idempotent. A single CSI call likely results in many OPI calls, each of which modifies state and can fail. Either you check a bunch of state in the CSI plugin implementation, submit these one by one, and on a failure attempt to roll it all back, or we have a transaction API that does the whole set as a unit and lets the CSI plugin check many fewer things and not deal with rollback. The transaction API also deals with parallelism in a nicer way. You can build up transactions in parallel, but I'd highly recommend that the agent implementations take a global lock around the ExecuteTransaction call to serialize everything within that xPU (see the sketch after this comment). Overlapped provisioning is extremely problematic.
While having a golang wrapper around OPI is convenient, I'd highly discourage that as a place where we implement any key features of OPI, such as transactionality or idempotence. This is because not all orchestration systems are written in Go; some are Python, Java, C#, or JavaScript. Critical functionality in OPI can't be in a golang-specific module.
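A rough agent-side Go sketch of the serialization suggested above: a single mutex held for the duration of ExecuteTransaction, so transactions never apply concurrently on one xPU. The agent struct, the step representation, and the elided rollback are illustrative assumptions, not actual OPI agent code.

```go
package main

import (
	"fmt"
	"sync"
)

type step func() error

type agent struct {
	mu   sync.Mutex        // global lock: one transaction applies at a time
	txns map[string][]step // steps queued per txn_id while the transaction is built up
}

// ExecuteTransaction applies all queued steps as a unit. On the first failure
// it would roll back the already-applied steps (rollback elided here) so the
// system looks as if the transaction never happened.
func (a *agent) ExecuteTransaction(txnID string) error {
	a.mu.Lock()
	defer a.mu.Unlock()

	steps, ok := a.txns[txnID]
	if !ok {
		return fmt.Errorf("unknown txn_id %q", txnID)
	}
	for i, s := range steps {
		if err := s(); err != nil {
			// Roll back steps[0:i] here before returning.
			return fmt.Errorf("step %d failed: %w", i, err)
		}
	}
	delete(a.txns, txnID)
	return nil
}

func main() {
	a := &agent{txns: map[string][]step{"txn-1": {
		func() error { fmt.Println("create subsystem"); return nil },
		func() error { fmt.Println("create namespace"); return nil },
	}}}
	fmt.Println(a.ExecuteTransaction("txn-1"))
}
```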
You've just pushed the idempotence logic up into the orchestration plugin. There's nothing inherently wrong with that approach, other than that now every orchestration system has to implement it. We're only arguing that it's more effective for the agent running on the xPU to implement it, given the knowledge of the xPU internals that the agent has. It's not at all about what works and what doesn't, it's about what's best.
Again, idempotency has nothing to do with transaction semantics. And as for handling failure, the caller of the API will need to handle the failure regardless of whether a transaction fails or an object operation fails. But I do understand that transactional failure handling may be simpler if the orchestrator wants to avoid withdrawing an object upon partial failure. Also note that I am not saying there is anything wrong with the approach; in fact, I like the idea of a user wanting to do an atomic commit on multiple objects. What I am objecting to is making this the only way to do it, because I'd refuse to believe that is necessary, based on my deployment experience. Of course, we can make it mandatory despite that. In fact, I'd challenge that we run the current APIs in production and see if we even need transactional semantics in the storage APIs.
Is that already supposed to be idempotent? From the OPI proto file it is not clear to me how |
in development, PRs in progress, you are welcome to contribute as well, this is open
one simple option is just to return the same
PRs are welcome
Typically, per REST semantics, you'd do a POST on an object and then PUT (to update, as many times as needed). The point I was trying to make is that I can have transaction semantics and yet fail upon a double create, so I may not get idempotency with transaction semantics. If the backend doesn't implement idempotency, we can seek that as the goal, not mix it with transactions.
But doesn't that then have to be in the API spec? Clients cannot rely on this unless such behavior is required from an implementation and thus guaranteed.
I'm afraid I don't know enough about these individual APIs to suggest enhancements. I probably wouldn't even get the terminology right 😅
https://google.aip.dev/133#user-specified-ids and:
A transaction is created by calling CreateTransaction on the
TransactionService, which allocates a txn_id. This txn_id can
(optionally) be included in many storage RPCs. If a storage RPC contains
a valid txn_id, its execution will be delayed until the transaction is
later executed using ExecuteTransaction.
When a transaction is executed, all of the steps within that transaction
either succeed or fail as a unit. When a transaction completes, the state
of the system will be either as if the transaction never occurred (for a
failure) or as if all steps succeeded.
Transactions must have limited lifetimes after they are created to deal
with clients creating transactions and then going away without executing
or deleting them. The agent implementing this service should periodically
perform cleanup as it deems necessary.
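As a sketch of the lifetime handling described above (the field names, TTL policy, and trigger are assumptions for illustration, not the actual agent design), the agent could keep a creation timestamp per pending transaction and periodically reap entries that were never executed or deleted:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pendingTxn holds a created-but-not-yet-executed transaction.
type pendingTxn struct {
	createdAt time.Time
	// queued steps would live here
}

type txnTable struct {
	mu   sync.Mutex
	txns map[string]pendingTxn
	ttl  time.Duration // how long an unexecuted transaction may live
}

// reap drops transactions older than ttl; the agent would call this from a
// ticker or before allocating a new txn_id.
func (t *txnTable) reap(now time.Time) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for id, txn := range t.txns {
		if now.Sub(txn.createdAt) > t.ttl {
			delete(t.txns, id) // client went away: as if the transaction never existed
		}
	}
}

func main() {
	t := &txnTable{
		txns: map[string]pendingTxn{"txn-1": {createdAt: time.Now().Add(-2 * time.Hour)}},
		ttl:  time.Hour,
	}
	t.reap(time.Now())
	fmt.Println(len(t.txns)) // 0: the stale transaction was cleaned up
}
```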