Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Multiple Python Versions #175

Closed
wants to merge 10 commits into from
Closed

Support Multiple Python Versions #175

wants to merge 10 commits into from

Conversation

HassanAbouelela
Copy link
Member

@HassanAbouelela HassanAbouelela commented Mar 15, 2023

Closes #158
Adds support for evaluating code in different python versions via the version argument.

This PR is a PoC to discuss the approach, and find issues with the idea. There's a versions.json file which specifies the supported python versions, and a make target for re-generating the relevant configuration files and dockerfile to enable the new versions. Packages are isolated based on python version.

The current approach was chosen for being relatively easy, quick, and lightweight when creating the container, however it is a little hacky. I am open to suggestions on other good methods for installing multiple python versions. Part of this proposal is to publish two different snekbox images, one with all images, and a lightweight one with just one version for developers.

Other things I've attempted included building all the versions in the container, manually and with pyenv, but those inevitably take way too long and produce really large containers. I've also tried adding deadsnakes and installing from apt, but I struggled to actually get it working within our image. This might still be a viable solution if we want to investigate it.

Adds support for having multiple evaluation python versions installed in
the docker container. A utility to automatically generate correct
dockerfile instructions and nsjail mounts based on the available
versions is also included.

Signed-off-by: Hassan Abouelela <[email protected]>
Adds a version argument to the eval API, and a GET endpoint to retrieve
all enabled versions.

Signed-off-by: Hassan Abouelela <[email protected]>
@HassanAbouelela HassanAbouelela added type: feature New feature or request area: API Related to or causes API changes area: backend Related to internal functionality and utilities priority: 2 - normal labels Mar 15, 2023
Signed-off-by: Hassan Abouelela <[email protected]>
Copy link
Member

@MarkKoz MarkKoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this decouple the interpreter the web app uses and what eval/nsjail uses?

snekbox/api/resources/eval.py Outdated Show resolved Hide resolved
snekbox/api/resources/eval.py Outdated Show resolved Hide resolved
Comment on lines 35 to 38
"version": {
"type": "string",
"oneOf": [{"const": name} for name in _VERSION_DISPLAY_NAMES],
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative proposal: pass a path to an executable instead of a version. This would leave the door open for supporting arbitrary executables in the future. So far nothing in the API is actually specific to Python or even a compiler/interpreter. The downside is that it's less user-friendly to know the full path to an executable than it is to specify a Python version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So instead of discovering which versions are available, users could discover which binaries are available and what their paths are. We don't need to list everything that's mounted; can continue to use the same approach of configuring that in a JSON file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting idea, I'll into implementing it. I'd probably keep the display names in the JSON files so end-users (bot) can still display human-readable names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another approach would be to base it off nsjail config files. Each binary (including different Python versions) would have its own config file, and we would scan a directory for config files. The API would accept a config name. This would offer more flexibility in supporting multiple binaries, as different binaries may require different mounts, environment variables, etc. I don't think this approach is mutually exclusive with my previous proposal, so we can revisit this in the future as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how much of the config will be repeated. I imagine a lot of the python (for the cpython implementations) configs will be very repetitive, but even other interpreters would presumably need similar cgroup configuration, mounts, etc. We might be able to get around the duplication problem by providing overrides to be applied in python (or if nsjail supports combining configs?) instead of multiple config files. Either way, it's a problem for another day.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, fair enough if we stick with something like versions.json. However, I am considering that maybe arbitrary paths is better. It's more flexible, it's less complex, and requires no additional configuration. What problem or use case are we trying to solve by restricting the binaries to a predefined set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need to define the python versions statically somewhere at any rate, so we can actually define them in the dockerfile. We might be able to get away with an image that has all python versions already so we don't need to specify them, but that would require an external image most likely. I linked an example in the original issue, but that is no longer maintained.

On a more general level though, I'm not sure arbitrary binaries is the best approach. Different binaries would require different configurations, be that resources, mounts, or permissions. The most flexible system would be statically defined sets of configurations - harking back to your multiple configs idea - and at that point we'd end up with end users selecting from a pre-defined set of human names.

This system could actually define multiple binaries per config-set as well by passing the path as an argument, or more ideally, defining the paths in a dict and passing the name to the API which resolves the correct binary.

Copy link
Member

@MarkKoz MarkKoz Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most flexible system would be statically defined sets of configurations

I agree, and we want to leave the door open for that in the future.

Different binaries would require different configurations, be that resources, mounts, or permissions.

This is true, but not always. It is feasible for a single NsJail config to support multiple binaries (though it can only have one default entry point I think).

We'd need to define the python versions statically somewhere at any rate

That only applies to Python (so far), and I consider it an implementation detail as far as the API is concerned. I believe we need a stronger reason to influence the design of our API.

defining the paths in a dict and passing the name to the API which resolves the correct binary.

I find this redundant. It seems simpler and more intuitive to create an NsJail config, which effectively defines an environment, and have everything in that environment be available via the API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is feasible for a single NsJail config to support multiple binaries (though it can only have one default entry point I think).

This is true, but we can probably just avoid defining a default entry point in the first place. That's more or less what this PR does for python.

It seems simpler and more intuitive to create an NsJail config, which effectively defines an environment, and have everything in that environment be available via the API.

My problem is it moves the burden of knowing what binaries are available on the API from the project (which can reasonably control the matter) to the user (which can not directly affect it). For instance, if we use this PR to implement support for multiple python versions on bot, we can't trivially define what versions are available to the bot without essentially hardcoding it. A change to snekbox would necessitate a change to the users as well.

Regardless of which approach we choose, it'd be our burden as maintainers to ensure if we provide a binary for eval in one version, that we also have it available for future versions, or declaring an incompatibility if needed. This is much harder when we don't actually know what binaries users need, and have to assume everything is used.

Explicitly defining which binaries the API supports (and returning them in /info) alleviates both problem. At that point a display name is just the icing on the cake.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My problem is it moves the burden of knowing what binaries are available on the API from the project (which can reasonably control the matter) to the user (which can not directly affect it).

I don't necessarily see that as a problem. I've seen snekbox as primarily an internal service for which users build their own front ends. Thus, the users are in control of both, and they will be aware of what is available in the snekbox environment. It'd be a different story if the snekbox API was directly available on the internet for anyone to use, but I do not think that is the use case we should be prioritising.

This is much harder when we don't actually know what binaries users need, and have to assume everything is used.

We could include a list of binaries in our documentation, and it does not need to be exhaustive. We can just list the Python binaries. If users want to use other binaries, then that's on them, and we won't guarantee the availability of those undocumented binaries.

@HassanAbouelela
Copy link
Member Author

HassanAbouelela commented Mar 15, 2023

It is sort of decoupled in the sense they are not enforced to be the same version as the one in the pyproject. The server will use one of the versions in versions.json, particularly the one labeled main. They are the same binaries though.

@MarkKoz
Copy link
Member

MarkKoz commented Mar 15, 2023

I think the version the application uses should always match what we target with pip-compile. The frozen dependencies may not be compatible across Python versions, but this is not something I have looked into.

I don't know if there is any way for us to actually enforce this. I suppose we can mostly rely on CI to tell us whether the dependencies are compatible.

@HassanAbouelela
Copy link
Member Author

The main challenge is figuring that version out during the dockerfile creation step, and verifying the version string against whatever is configured in the file. It might be doable, do we want that enforcement?

@MarkKoz
Copy link
Member

MarkKoz commented Mar 15, 2023

No, I don't think we need to do anything for that. I think the tests + image build than run in CI gives us confidence in the dependencies being compatible. It's not worth adding complexity to our image. If there was a straightforward way, then sure.

Remove the pointless function and allow the file-level constants to be
imported by external callers.

Signed-off-by: Hassan Abouelela <[email protected]>
Move the information endpoint from the eval resource into its own
location. This endpoint currently only returns the python version but
can easily be expanded with more info later.

Signed-off-by: Hassan Abouelela <[email protected]>
Signed-off-by: Hassan Abouelela <[email protected]>
@coveralls
Copy link

coveralls commented Mar 17, 2023

Coverage Status

Coverage: 85.594% (+0.3%) from 85.255% when pulling 3d9eff4 on multi-version into 9acc6f5 on main.

@HassanAbouelela HassanAbouelela force-pushed the multi-version branch 3 times, most recently from 12a7de9 to 2e8c91d Compare March 18, 2023 01:24
Adds an image with only one python version installed. This can be useful
in development environments for other projects where the multi-version
features are less useful than a more efficient container.

Signed-off-by: Hassan Abouelela <[email protected]>
@HassanAbouelela
Copy link
Member Author

Nearly done, just need to wrap up some dependency work. I've tested the new CI on my fork.

@ChrisLovering
Copy link
Member

Closing this now that #181 is merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Related to or causes API changes area: backend Related to internal functionality and utilities priority: 2 - normal type: feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow Evaluation Using Multiple Python Versions
4 participants