[storage] Transaction support #212
base: main
Conversation
A transaction is created by calling CreateTransaction on the TransactionService, which allocates a txn_id. This txn_id can (optionally) be included in many storage RPCs. If a storage RPC contains a valid txn_id, its execution will be delayed until the transaction is later executed using ExecuteTransaction.

When a transaction is executed, all of the steps within that transaction either succeed or fail as a unit. When a transaction completes, the state of the system will be either as if the transaction never occurred (for a failure) or as if all steps succeeded.

Transactions must have limited lifetimes after they are created to deal with clients creating transactions and then going away without executing or deleting them. The agent implementing this service should periodically perform cleanup as it deems necessary.

Signed-off-by: Ben Walker <[email protected]>
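To make the intended flow concrete, here is a minimal Go sketch of how a client might drive the proposed service. All type and method names (TransactionService, StorageService, CreateNvmeSubsystem, and so on) are hypothetical stand-ins rather than the generated OPI stubs; only the create / tag-with-txn_id / execute sequence is taken from the description above.

```go
package main

import "fmt"

// Hypothetical stand-in for the proposed transaction service; not the real OPI stubs.
type TransactionService interface {
	CreateTransaction() (txnID string, err error) // allocates a txn_id
	ExecuteTransaction(txnID string) error        // applies all queued steps as a unit
	DeleteTransaction(txnID string) error         // discards an unexecuted transaction
}

// Hypothetical stand-in for a storage service whose RPCs accept an optional
// txn_id; if it is set, execution is deferred until ExecuteTransaction.
type StorageService interface {
	CreateNvmeSubsystem(name, txnID string) error
	CreateNvmeNamespace(name, txnID string) error
}

// provision queues two storage calls under one transaction so the agent
// applies them atomically: afterwards either both objects exist, or the
// system looks as if the transaction never happened.
func provision(txn TransactionService, storage StorageService) error {
	txnID, err := txn.CreateTransaction()
	if err != nil {
		return err
	}
	if err := storage.CreateNvmeSubsystem("subsys0", txnID); err != nil {
		_ = txn.DeleteTransaction(txnID) // best-effort cleanup of the unused transaction
		return err
	}
	if err := storage.CreateNvmeNamespace("ns0", txnID); err != nil {
		_ = txn.DeleteTransaction(txnID)
		return err
	}
	return txn.ExecuteTransaction(txnID)
}

func main() {
	fmt.Println("see provision() for the intended call sequence")
}
```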
I like the idea, not approving officially yet, so others can comment and review before we merge
Usually REST APIs don't have a concept of inter-related resources that need to be atomically committed. However, we should adopt this if it makes sense for us. Overall I like the idea... but we can keep it optional (naturally, if a transaction is not started then everything works as usual). A few things that warrant discussion:
I wouldn't make this support optional. As I mentioned elsewhere, idempotency is crucial for CSI drivers that want to use this API. To achieve this, they can repeat all API calls that haven't completed successfully before, but I have the impression that this is going to be tricky for some of these fine-grained API calls (please correct me if I'm wrong). It can also leave the system in a partially configured state. Both a single, big API call and many smaller calls with transaction support avoid that. I'm fine with both approaches.
Agree with @pohly. Especially when dealing with xPUs, making this optional would allow the HW to be left in some partially configured/inconsistent state. I see no reason for leaving such a backdoor open - @jainvipin? If you are concerned that creating transactions may impose some performance tax, I would not worry about that, since these are configuration and management calls - once the offload machinery is set up, data plane traffic would not be affected.
I find this confusing. We are not talking here about existing CSI drivers for setting up remote storage backends in particular, but rather about frontend configuration of "local targets", for which new CSI drivers need to be written and which are supposed to utilize the OPI API. It is up to the implementation to guarantee the idempotency requirement - the transaction idea is to facilitate this.
It's part of a CSI driver if its implementation of calls like
This implementation of
Every single API call has to be defined so that the caller can figure out whether it has completed or needs to be called again (idempotency). The introduction of a transaction ID is a simple way of achieving this because calls have no side effect until the transaction completes. I just see one problem: what if
nobody is writing new drivers from scratch, we are adding OPI to many already existing options in CSI node drivers
And it is that part where OPI gRPC calls have to support implementing an idempotent
Whether you extend or write from scratch, it is up to you to guarantee idempotency. A transaction API can greatly facilitate this.
It will not create a new interface. It will use GET to check if the interface already exists and not create a new one...
I agree with this, they can help. But whether CSI driver providers need this help or not is another story.
It could do that, but the version I showed didn't, to demonstrate the problem 😁 Has someone tried writing down what a correct implementation of
we are adding OPI support to existing CSI drivers now, this is where all the small fixes to APIs come from on my side...
one piece of feedback we already got, that even with this, is why we are creating https://github.com/opiproject/goopicsi
Idempotency and transaction semantics are totally independent.
Idempotency can be handled, for example, using https://google.aip.dev/133#user-specified-ids
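For illustration, here is a small self-contained Go sketch of the retry pattern that user-specified IDs (AIP-133) enable, independent of any transaction support. The in-memory store and the function names (createVolume, ensureVolume) are made up for the example; the point is only that a caller-chosen ID lets a repeated call map onto "already exists", which the caller can treat as success.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrAlreadyExists = errors.New("already exists")

// store simulates server-side state keyed by the caller-chosen ID.
var store = map[string]bool{}

// createVolume is a stand-in for any create RPC that lets the caller pick the ID.
func createVolume(id string) error {
	if store[id] {
		return ErrAlreadyExists
	}
	store[id] = true
	return nil
}

// ensureVolume is idempotent: because the caller picked the ID, retrying a
// call that actually succeeded earlier returns "already exists", which can
// safely be treated as success.
func ensureVolume(id string) error {
	err := createVolume(id)
	if errors.Is(err, ErrAlreadyExists) {
		return nil
	}
	return err
}

func main() {
	fmt.Println(ensureVolume("vol-1")) // <nil>
	fmt.Println(ensureVolume("vol-1")) // <nil> again: safe to retry
}
```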
Well, things are working just fine without transactions for us in our deployments, so I can ascertain that this is not mandatory. The DPU is not the highest-level manageability entity in the infrastructure, so partial failures can be handled quite efficiently up in the caller as well. Having said that, enable/disable (#178) captures turning on the volume only when everything is successful, giving the user full control.

xPUs can very likely implement two-phase commit (required by transactional semantics), so we could introduce the concept, but mandating it would impose things that are not needed for NVMe, at least not for now. In fact, I'd say let's go without transactional semantics for now and see if anyone runs into a problem that we need to solve here - at least here at AMD, we don't see a problem without transactions (for NVMe).

In fact, transactional semantics (especially multiple simultaneous transactions) is going to break idempotency, because an update on an object in one transaction could be voided by a commit on the same object in another transaction. So we are creating a sense of idempotency at the transaction level and not the object level.
Adding transactions facilitates making the APIs idempotent. A single CSI call likely results in many OPI calls, each of which modifies state and can fail. Either you check a bunch of state in the CSI plugin implementation, submit these one by one, and on a failure attempt to roll it all back, or we have a transaction API that does the whole set as a unit and lets the CSI plugin check many fewer things and not deal with rollback. The transaction API also deals with parallelism in a nicer way. You can build up transactions in parallel, but I'd highly recommend that the agent implementations take a global lock around the ExecuteTransaction call to serialize everything within that xPU (see the sketch after this comment). Overlapped provisioning is extremely problematic.
While having a golang wrapper around OPI is convenient, I'd highly discourage that as a place where we implement any key features of OPI, such as transactionality or idempotence. This is because not all orchestration systems are written in Go; some are Python, Java, C#, or JavaScript. Critical functionality in OPI can't be in a golang-specific module.
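A rough agent-side Go sketch of the serialization suggested above: a single mutex held for the duration of ExecuteTransaction, so transactions never apply concurrently on one xPU. The agent struct, the step representation, and the elided rollback are illustrative assumptions, not actual OPI agent code.

```go
package main

import (
	"fmt"
	"sync"
)

type step func() error

type agent struct {
	mu   sync.Mutex        // global lock: one transaction applies at a time
	txns map[string][]step // steps queued per txn_id while the transaction is built up
}

// ExecuteTransaction applies all queued steps as a unit. On the first failure
// it would roll back the already-applied steps (rollback elided here) so the
// system looks as if the transaction never happened.
func (a *agent) ExecuteTransaction(txnID string) error {
	a.mu.Lock()
	defer a.mu.Unlock()

	steps, ok := a.txns[txnID]
	if !ok {
		return fmt.Errorf("unknown txn_id %q", txnID)
	}
	for i, s := range steps {
		if err := s(); err != nil {
			// Roll back steps[0:i] here before returning.
			return fmt.Errorf("step %d failed: %w", i, err)
		}
	}
	delete(a.txns, txnID)
	return nil
}

func main() {
	a := &agent{txns: map[string][]step{"txn-1": {
		func() error { fmt.Println("create subsystem"); return nil },
		func() error { fmt.Println("create namespace"); return nil },
	}}}
	fmt.Println(a.ExecuteTransaction("txn-1"))
}
```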
You've just pushed the idempotence logic up into the orchestration plugin. There's nothing inherently wrong with that approach, other than that now every orchestration system has to implement it. We're only arguing that it's more effective for the agent running on the xPU to implement it, given the knowledge of the xPU internals that the agent has. It's not at all about what works and what doesn't, it's about what's best.
Again, idempotency has nothing to do with transaction semantics. And as for handling failure, the caller of the API will need to handle the failure regardless of whether a transaction fails or an object operation fails. But I do understand that transactional failure handling may be simpler if the orchestrator wants to avoid withdrawing an object upon partial failure. Also note that I am not saying there is anything wrong with the approach; in fact, I like the idea of a user wanting to do an atomic commit on multiple objects. What I am objecting to is making this the only way to do it, because I'd refuse to believe that is necessary, based on my deployment experience. Of course, we can make it mandatory despite that. In fact, I'd challenge that we run the current APIs in production and see if we even need transactional semantics in the storage APIs.
Is that already supposed to be idempotent? From the OPI proto file it is not clear to me how |
in development, PRs in progress, you are welcome to contribute as well, this is open
one simple option is just to return the same
PRs are welcome
Typically, per REST semantics, you'd do a POST on an object and then PUT (to update, as many times as needed). The point I was trying to make is that I can have transaction semantics and yet fail upon a double create, so I may not get idempotency with transaction semantics. If the backend doesn't implement idempotency, we can seek that as the goal, not mix it with transactions.
But doesn't that then have to be in the API spec? Clients cannot rely on this unless such behavior is required from an implementation and thus guaranteed.
I'm afraid I don't know enough about these individual APIs to suggest enhancements. I probably wouldn't even get the terminology right 😅
https://google.aip.dev/133#user-specified-ids and:
A transaction is created by calling CreateTransaction on the
TransactionService, which allocates a txn_id. This txn_id can
(optionally) be included in many storage RPCs. If a storage RPC contains
a valid txn_id, its execution will be delayed until the transaction is
later executed using ExecuteTransaction.
When a transaction is executed, all of the steps within that transaction
either succeed or fail as a unit. When a transaction completes, the state
of the system will be either as if the transaction never occurred (for a
failure) or as if all steps succeeded.
Transactions must have limited lifetimes after they are created to deal
with clients creating transactions and then going away without executing
or deleting them. The agent implementing this service should periodically
perform cleanup as it deems necessary.
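As a sketch of the lifetime handling described above (the field names, TTL policy, and trigger are assumptions for illustration, not the actual agent design), the agent could keep a creation timestamp per pending transaction and periodically reap entries that were never executed or deleted:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pendingTxn holds a created-but-not-yet-executed transaction.
type pendingTxn struct {
	createdAt time.Time
	// queued steps would live here
}

type txnTable struct {
	mu   sync.Mutex
	txns map[string]pendingTxn
	ttl  time.Duration // how long an unexecuted transaction may live
}

// reap drops transactions older than ttl; the agent would call this from a
// ticker or before allocating a new txn_id.
func (t *txnTable) reap(now time.Time) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for id, txn := range t.txns {
		if now.Sub(txn.createdAt) > t.ttl {
			delete(t.txns, id) // client went away: as if the transaction never existed
		}
	}
}

func main() {
	t := &txnTable{
		txns: map[string]pendingTxn{"txn-1": {createdAt: time.Now().Add(-2 * time.Hour)}},
		ttl:  time.Hour,
	}
	t.reap(time.Now())
	fmt.Println(len(t.txns)) // 0: the stale transaction was cleaned up
}
```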