Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMD XXXX: Enable Core BPF Programs #2

Closed
wants to merge 1 commit into from

Conversation

buffalojoec
Copy link
Owner

@buffalojoec buffalojoec commented Nov 9, 2023

Roadmap

This is SIMD 1/4 expected for Multi-Client Feature Gates. See #3.

Goals:

  • An official process for migrating native programs to BPF
  • Expanded control over pending feature activations
  • Automatic feature activation selection based on stake weight with supporting
    software
  • Decentralized control over queuing new runtime features

Resulting Architecture:

  1. 👉 Enable Core BPF Programs: The official process by which core
    contributors migrate native programs to "core" BPF programs and how those
    programs are upgraded.
  2. Feature Control: Features can be created by anyone. Each is owned by the
    core Feature Gate program at Feature111111111111111111111111111111111111,
    which gives core contributors added control over pending activations.
  3. Feature Recognition & Activation: Features qualify for activation based
    on stake support of nodes who recognize the feature in their software
    version.
  4. Feature Queuing: A governance process nominates eligible features that
    should be queued for activation.

Summary

This SIMD outlines the process by which native programs will become "core" BPF
programs, and how changes to these programs will be managed.

@buffalojoec
Copy link
Owner Author

buffalojoec commented Nov 9, 2023

Note:

It's kind of scary, but I think inevitable, that the process of migrating a native program to core BPF will basically consist of re-writing the program as BPF and then swapping it with the process outlined here. There's no way to really do it incrementally because the address must match (😳).

A good process for failover in case of any problems upgrading a core BPF program would be good to have. I was thinking something like the following:

  • A backup account is used to store the most recent version of the target program before it was upgraded
  • Somehow, the newly upgraded program is tested for newly introduced bugs
  • If anything fails, the runtime can replace the program with the previous version

However I'm not really sure how we could test it via runtime.

proposals/enable-core-bpf-programs.md Outdated Show resolved Hide resolved
proposals/enable-core-bpf-programs.md Outdated Show resolved Hide resolved
## Detailed Design

Core BPF programs shall be non-upgradeable BPF programs deployed with the
BPFLoader.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BPFLoader.
`BPFLoader2111111111111111111111111111111111`.

?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this confused me when I went to test a few things out. For example, if I deploy my own program and use

solana program deploy --final my_program.so

then the upgrade authority gets set to None, but it's still owned by the upgradeable loader and still has a data account. Whereas a program like Tokenkeg is owned by the BPF Loader and does not have a data account.

I would prefer the latter setup if possible for core BPF programs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, solana program deploy uses the upgradeable loader by default. I think that we supported both in the cli for a while, but at this point, we would have to write a new deployment cli to use the non-upgradeable loader.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to enforce core BPF programs to be one single program account owned by the non-upgradeable loader, just to add to the distinction. Wdyt?

Comment on lines 90 to 91
The process for migrating an existing native program to a core BPF program shall
be the same as upgrading a core BPF program.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can it be exactly the same? Presumably we want to run some checks on the native-program account in question, and these will have to be different than would be run against a BPF program. Maybe "essentially" the same?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What sort of checks are you thinking of?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Is this account owned by the native loader?
  • Is the account state what we expect, ie. does it contain the name of the program we expect to be replacing?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this to the proposal, but I think your second bullet depends on the discussion of which loader should own core BPF programs, and whether or not they should be single-account programs.

proposals/enable-core-bpf-programs.md Outdated Show resolved Hide resolved
proposals/enable-core-bpf-programs.md Outdated Show resolved Hide resolved
proposals/enable-core-bpf-programs.md Outdated Show resolved Hide resolved
Comment on lines 124 to 137
Here is an example of such a function:

```rust
enum TargetProgram {
System,
Stake,
Vote,
/* ... */
}

fn upgrade_core_program(target_program: TargetProgram, source: Pubkey) {
/* replacement logic */
}
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, are you implying with this example that the runtime will only permits programs in some list of Core BPF Programs to be upgraded via this process? Let's make it clear in the design description whether any non-upgradeable BPF program can be upgraded this way, or just a set list of Core BPF Programs.
Then remove this stub code.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and if we add a new program we add it to the list! This would be for maximum safety when using this function in a feature gate, versus two arbitrary program addresses provided as parameters.

Let's make it clear in the design description whether any non-upgradeable BPF program can be upgraded this way, or just a set list of Core BPF Programs.

I think it makes sense to fixate it to a list, since that restricts the use of this method - which I think is good. I think it should be pretty restrictive which programs are allowed to be upgraded like this, and explicitly laid out in the code when a feature gate is created for it.

Wdyt?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, a list does make sense, I'm fine with it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like you said, we can't enforce Firedancer to use such an approach, so I've included it as a suggestion.

proposals/enable-core-bpf-programs.md Outdated Show resolved Hide resolved

## Detailed Design

Core BPF programs shall be non-upgradeable BPF programs deployed with
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want these programs to use PRv2 ABI/API? If so, LoaderV411111111111111111111111111111111111 would be used to deploy the programs.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Loader v4 is live we can use that instead! It would make sense for sure.

2. Generate a new keypair for the **source** program.
3. Deploy the program to the **source** address.
4. Generate a new keypair for the **feature gate**.
5. Create a new feature gate for replacing the **target** program with the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does replacing mean the contents of source program account will be copied over to target program account? If so, it'll be useful mention the constraints that must be met before the target program is replaced. For example, should the program authority for source and target be the same?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this is what I mean. And yeah, good shout. We should have a series of pretty robust checks on the swap.

Copy link

@Lichtso Lichtso Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that is precisely the mechanics of an upgradeable program:
Prepare a buffer account, verify that the program is ok, then replace the target program and delete the buffer. Only difference would be that a feature gate would send the upgrade transaction.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right. How instead would you propose to make an upgrade transaction only possible through feature gate?

An additional optional field could be added to feature gate issues for the
**target program** being upgraded.

A hard-coded list of all supported core BPF programs should be used to conduct
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect this list to be frozen, or could more core programs be added in future? I am guessing it's the latter, and feature gates will be used to update the list. Just want to clear up my understanding.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely latter.

@Lichtso
Copy link

Lichtso commented Nov 13, 2023

A couple of thoughts:

  1. For on-chain programs we introduced delayed visibility, meaning that the upgraded program appears to be closed for the rest of the slot in which it was upgraded in order to prevent pipeline stalls. For these migrated built-in programs there should be an exception as they can not be closed even for a single transaction.

  2. Removing these built-in programs from the protocol by deploying them on-chain is a very welcome reduction of complexity of the protocol. However, it also means that these built-in programs do not get the benefit of multiple implementations (from the multiple validator projects) anymore.

  3. For quality assurance in both: testing of new versions of these migrated built-ins and their continuous operation, it might be interesting to have multiple implementations being deployed on chain under the same address and running transactions through them simultaneously, to see that they come to the same conclusions. This could also be done as a form of random sampling instead of all the time. This however probably belongs into a different proposal.

@buffalojoec
Copy link
Owner Author

buffalojoec commented Nov 14, 2023

Hey thanks for sharing these thoughts, it's a big help! Seems we may want a lot of VM/PRv2 considerations in here.

  1. For on-chain programs we introduced delayed visibility, meaning that the upgraded program appears to be closed for the rest of the slot in which it was upgraded in order to prevent pipeline stalls. For these migrated built-in programs there should be an exception as they can not be closed even for a single transaction.

This seems a bit strange, although I'm sure you have valid reasons. Where can I read more about this change?

Agreed they have to be exempt from this delay. I will add it to our growing list of special permissions!

  1. Removing these built-in programs from the protocol by deploying them on-chain is a very welcome reduction of complexity of the protocol. However, it also means that these built-in programs do not get the benefit of multiple implementations (from the multiple validator projects) anymore.

This is true, and I think it's worth surveying a bit on whether or not it's worth the trade-off.

  1. For quality assurance in both: testing of new versions of these migrated built-ins and their continuous operation, it might be interesting to have multiple implementations being deployed on chain under the same address and running transactions through them simultaneously, to see that they come to the same conclusions. This could also be done as a form of random sampling instead of all the time. This however probably belongs into a different proposal.

This also piqued my interest when it was proposed a few times before/during Breakpoint. Sounds like you've also been considering it.

We can move this discussion somewhere more relevant, but it does have some slight implications on this SIMD, since here we're defining what constitutes a "core BPF program", and the ability to simulate alongside multiple implementations of each might be worth partially detailing in here.

Fwiw, implementing a BPF program multiple different ways to test against each other for reliance is not too different than multiple clients agreeing on execution results.

@Lichtso
Copy link

Lichtso commented Nov 14, 2023

This seems a bit strange, although I'm sure you have valid reasons. Where can I read more about this change?

@buffalojoec Here: solana-labs/solana#29654. We also considered a seamless transition of old to new in redeployment of programs but the accounts-db and snapshot can only hold one version of a program in a slot / fork so it was ruled out.

@buffalojoec
Copy link
Owner Author

Closing to move to official SIMD repo.

solana-foundation#88

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants