[framework] game registry #99
Comments
The eval scripts rely on the structure of
You can adjust the games loading code here: `clembench/clemgame/__init__.py`, lines 37 to 47 (commit c6a4546).
Thinking a bit more about this. What do we want to achieve?
Thinking about this from the perspective of
What could an entry of the registry look like?
How are locations denoted? Would it be enough to have the convention that game code lives in sister directories by default? So at least in the official What I am not sure about is how to realise the capability of selecting by properties, or what that would even mean... I thought it could be nice to have this be a list, so that the same game can be part of more than one collection. But that doesn't work with the current unification mechanism, at least. It also doesn't really work with the way things are set up at the moment, because our version numbers track not only changes in instances but also changes in code. So to reconstruct a particular version of the benchmark, one needs to make sure that the code being linked to is at the right version. Aha. Maybe this could work:
This means that for reproducibility purposes, if one wants to re-run an older benchmark, one needs to check out the games repository at a particular revision, then rename it to conform with this, and then can call Alright. Looks like this wouldn't be too dramatic a change. The game loading/identification code needs to change, but hopefully the rest can stay. (A problem might be the resource location methods?) What are the chances that most of the code for dealing with spec JSONs can be re-used, @phisad?
Just adding a note here. This:
would not work, as it would lead to multiple entries with But I think it is valid that older versions need to be identified via their code. The understanding would have to be that an entry in the registry denotes a combination of code and instances (because that's what is packaged together). So the information that a particular game has been part of several versions of the benchmark doesn't belong in a single entry, if each entry only denotes a game in a particular version of the benchmark. (There are two goals here, I think, which only partially overlap: one is to be able to easily access collections of games, and the other is to be able to reproduce older versions of the benchmark.)
Here's a first template for the game registry @sherzod-hakimov and I briefly discussed yesterday:
My approach to the collections would then be similar to what we already discussed for the instance files (and results structure): And how about addressing the reproducibility aspect by marking each version as a Release in the new game repository (assuming that we are able to mirror the "evolution" of the games there)? This still requires a manual checkout of the required version, but the model registry would not need to be changed (assuming the version above).
Can you elaborate on this? I don't understand what One desideratum that I see is to make this work as much as possible without changing the mechanism that we arrived at for the model registry, or at the very least only adding to that mechanism. (Not only because it's quite elegant, but also because we don't want to duplicate functionality.) So when thinking about this, one important thing to understand is that the mechanism works via unification. (At least that was the idea...) Basically, the specification (= feature structure) that is used (i.e., used to select the backend and then passed on to it) is the first registry entry that unifies with the specification (feature structure) that is given. This makes it possible to find a fully specified specification just by giving one feature value (e.g., `"name": "gpt-4"`), because that will find the first entry that has this value for this feature; all other feature values will then come from that entry. But it also makes it possible to extend an existing entry, by specifying a feature that it doesn't mention, and to block an existing entry from matching, by specifying a feature differently (and hence making what is passed on the command line the full entry). Another nice property is that the consumer (in the case of the model registry, the backend) will simply ignore features that it doesn't care about. So this, together with the fact that there is only a convention for the structure and not a real schema, makes it possible to put additional information into the registry that may be used elsewhere (e.g., in the model registry, the number of parameters, which is only used for creating plots). Something that does not yet work, and would require some thinking if we want it, is unification into sets or lists. So selecting something that is specified as
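To make the first-match unification described above concrete, here is a minimal sketch in Python. It assumes flat feature structures (plain dicts without nesting) and uses made-up registry entries; it is an illustration of the idea, not the actual clembench lookup code.

```python
from typing import Any

Spec = dict[str, Any]

# Hypothetical registry entries, for illustration only.
REGISTRY: list[Spec] = [
    {"name": "model-a", "backend": "backend-x", "n_params": "7B"},
    {"name": "model-b", "backend": "backend-y"},
]


def unifies(query: Spec, entry: Spec) -> bool:
    """Flat unification: every feature present in both must have the same value."""
    return all(entry[key] == value for key, value in query.items() if key in entry)


def lookup(query: Spec, registry: list[Spec]) -> Spec:
    """Return the first entry that unifies with `query`, extended by the query's features.

    If no entry unifies (e.g. because the query gives a feature a different value),
    the query itself is used as the full specification.
    """
    for entry in registry:
        if unifies(query, entry):
            # Shared features agree, so merging simply adds the query's extra features.
            return {**entry, **query}
    return dict(query)


print(lookup({"name": "model-a"}, REGISTRY))                      # found via one feature
print(lookup({"name": "model-a", "temperature": 0.0}, REGISTRY))  # entry extended
print(lookup({"name": "model-a", "backend": "local"}, REGISTRY))  # entry blocked
```

Because values are compared with `==`, a query such as `{"collection": "v1.0"}` would not unify with an entry whose (hypothetical) `collection` value is a list like `["v1.0", "v2.0"]`; this is the set/list case that does not work yet.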
I think the first thing to get clarity about is what an entry in the registry is meant to specify: Is it a particular game as such (in which case it makes sense to list all benchmark versions it was involved in), or is it, more modestly, one directory containing game code and instances? I'm leaning towards the latter, because then, together with your proposal, we could easily capture the reproducibility use case:
(This would mean, however, that with each new release we need to add a batch of entries like the one above. The "unmarked" entry (
But in general, I like the idea of putting all of that information in there. It should be possible to ask questions like "what are all games (game directories) that use multiple images?" or "what are all games (game directories) for which Spanish instances exist?".
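Questions like these need a different operation than the first-match lookup: they select all entries that have the queried properties. A minimal sketch, assuming hypothetical field names (`game_name`, `image`, `languages`) and placeholder games; it also treats a list value as a set to test membership in, which is one possible answer to the open list-unification question rather than something the current mechanism does.

```python
from typing import Any

# Hypothetical entries; field names and values are placeholders.
GAMES: list[dict[str, Any]] = [
    {"game_name": "game_a", "image": "multiple", "languages": ["en", "es"]},
    {"game_name": "game_b", "image": "none", "languages": ["en"]},
    {"game_name": "game_c", "image": "single", "languages": ["en", "de", "es"]},
]


def has_properties(entry: dict[str, Any], query: dict[str, Any]) -> bool:
    """True if the entry states every queried feature with a compatible value.

    A list-valued feature matches if it contains the queried value.
    """
    for key, wanted in query.items():
        if key not in entry:
            return False  # an entry that says nothing about a feature is not selected
        value = entry[key]
        if isinstance(value, list):
            if wanted not in value:
                return False
        elif value != wanted:
            return False
    return True


def select(query: dict[str, Any], registry: list[dict[str, Any]]) -> list[str]:
    """Names of all entries (not just the first) that have the queried properties."""
    return [entry["game_name"] for entry in registry if has_properties(entry, query)]


print(select({"image": "multiple"}, GAMES))  # games that use multiple images
print(select({"languages": "es"}, GAMES))    # games with Spanish instances
```

Unlike first-match unification, an entry that does not mention a queried feature is excluded here: for "which games use multiple images?", silence should probably not count as a yes.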
To answer your question from further above: I did actually understand the unification approach and will look into how it can be used for the game registry once I have a working version for loading the games interactively in general.
But how? Where does one specify what
As far as I understood, the use case is a bit different here than with the backends.
Come to think of it, I would place this difference elsewhere, so as not to break the mechanism. Maybe something like
I think we need to be careful to keep game, directory containing game and instances, experiment, language of experiment, instance, etc. etc. separate, and be clear about what this mechanism should do.
Hi! Barging in here with a bit of an outsider's perspective (I am working on multilingual versions of taboo and wordle). First of all, I think it's a great step towards systematization. Personally, based on the discussion here, I would add two things to the template suggested by @AnneBeyer:
First, for each game, there are three game "variables" we need to consider: version, language, and game variant (e.g., vanilla wordle, wordle with clue, wordle with critic).
Version: to be honest, I don't understand the definition of version. Initially I thought it was the version of the framework itself: there was clembench v0.9 and some games relied on this implementation (say, taboo-0.9). Then clembench v1.0 came, the old games' implementations were updated accordingly (taboo-1.0), the old implementations were tagged/archived(?), and some new games were added.
Language should influence instance generation and resources only; the game logic should be language-agnostic. So, if the structure of instances is likely to change between versions, version and language should be coupled.
Game variants are different games sharing a lot of code. So I believe that while the implementations should be refactored and unified under the same parent game folder, each entry in the game registry should reflect a specific game variant, as suggested.
Second, if I understand the purpose of introducing collections correctly, I would implement them with arbitrary tags and then programmatically filter/combine games based on their tags (and probably define collections as a combination of tags and other parameters, such as versions, languages, etc., based on the registry). So the registry entry can follow this structure.
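The tag-based approach suggested here could look roughly like the following sketch. It assumes one entry per game variant and language, each carrying a version and a set of free-form tags; the tag names and versions are invented for illustration, and a collection is simply a set of required tags plus optional version and language constraints.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GameEntry:
    name: str  # one entry per game variant
    version: str
    lang: str
    tags: set[str] = field(default_factory=set)


@dataclass
class Collection:
    required_tags: set[str]
    version: Optional[str] = None     # None: any version
    langs: Optional[set[str]] = None  # None: any language

    def members(self, registry: list[GameEntry]) -> list[GameEntry]:
        """All entries carrying every required tag and meeting the other constraints."""
        return [
            e for e in registry
            if self.required_tags <= e.tags
            and (self.version is None or e.version == self.version)
            and (self.langs is None or e.lang in self.langs)
        ]


# Invented entries and tags, for illustration only.
REGISTRY = [
    GameEntry("taboo", "1.0", "en", {"text"}),
    GameEntry("taboo", "1.0", "de", {"text"}),
    GameEntry("wordle", "1.0", "en", {"text", "wordle-family"}),
    GameEntry("wordle_withclue", "1.0", "en", {"text", "wordle-family"}),
    GameEntry("taboo", "0.9", "en", {"text"}),
]

wordle_family = Collection({"wordle-family"}, version="1.0")
german_text_games = Collection({"text"}, version="1.0", langs={"de"})
print([e.name for e in wordle_family.members(REGISTRY)])
```

Since collections are computed from tags rather than stored in the entries, the same game can belong to several collections without list-valued features in the registry.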
Another high-level thing I would like to discuss is the approach of keeping the framework and the games in one place. I think it would be much easier to maintain the whole setup if the two were separated: one repo with the core framework logic and the games in separate repos. In this case, each game repo would contain a descriptor file similar to the game registry entry, and the registry would be a key-value map like
I'm a little bit worried about overloading this. We should identify a main purpose, maybe something like "the primary function of an entry is to map between identifying characteristics and a directory". I think what's written above is compatible with that, but it would be good to keep it in mind. With that said, and inspired by your last comment, @YanaPalacheva (although I don't fully understand it), maybe we could even do something a bit Hugging Face-like and allow the path to be a repository identifier? The maximal solution would then be to automatically check out the repo if it isn't in Or, more modestly, we could just have the understanding that the path points to a repository. Either way, we could then add to the version description something like
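A sketch of the more ambitious variant, in which the path in an entry may be either a local directory or a repository identifier that is checked out on demand. The detection heuristic, the `checkout_root` directory, and all function names are assumptions for illustration; the repository addresses in the checks are placeholders.

```python
import subprocess
from pathlib import Path


def is_repository(location: str) -> bool:
    """Heuristic: URLs and scp-style git addresses are repositories, anything else a local path."""
    return location.startswith(("https://", "http://", "git@")) or location.endswith(".git")


def checkout_dir(location: str, checkout_root: Path) -> Path:
    """Local directory for a repository, named after the last component of its address."""
    # Treat ':' as a separator too, so scp-style addresses like "git@host:repo.git" work.
    name = location.rstrip("/").replace(":", "/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return checkout_root / name


def ensure_local(location: str, checkout_root: Path) -> Path:
    """Return a local directory for `location`, cloning the repository on first use."""
    if not is_repository(location):
        return Path(location)
    target = checkout_dir(location, checkout_root)
    if not target.exists():
        subprocess.run(["git", "clone", location, str(target)], check=True)
    return target
```

Keeping the plain-path case untouched means existing entries with local paths would keep working; only entries that look like repository addresses trigger a checkout.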
First step: Second step: Third step:
This actually also overlaps with #62 |
Instead of making assumptions about where the code for a game has to live in order to run the game via `cli.py`, we could introduce a `game_registry.json` (in the style of the `model_registry.json`) that is the only thing that needs to live in a known location. (Suggestion by Sherzod.)
The entries for each game in there could then point to a directory where the code is, besides holding all kinds of other information about the game that could be useful for whatever consumer, e.g., `image: none | single | multiple`, etc. etc. The scripts for running a benchmark could then filter on these entries and automatically pull together what is needed.
But mostly, this would make it possible to move the game code outside of the main repository. We could still make default assumptions (for example, that it lives in a sister directory: if this is `cb-code/clembench`, it lives in `cb-code/clemgames`, and entries in the registry could have the relative path `../clemgames/GAME`), but with our usual mechanism, these could be overwritten or adapted if you want to put games elsewhere.
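A minimal sketch of how a consumer such as `cli.py` might read such a registry, assuming a JSON list of entries with hypothetical `game_name` and `path` fields plus the `image` feature mentioned above. Entries without a `path` fall back to the sister-directory convention, relative paths are resolved against the directory of the main repository (here `cb-code/clembench`), and absolute paths are used as-is.

```python
import json
import os
from pathlib import Path

# Hypothetical contents of game_registry.json.
REGISTRY_JSON = """
[
  {"game_name": "game_a", "image": "none"},
  {"game_name": "game_b", "image": "multiple", "path": "/opt/games/game_b"},
  {"game_name": "game_c", "image": "single", "path": "../my_games/game_c"}
]
"""


def game_dir(entry: dict, registry_dir: Path) -> Path:
    """Directory holding a game's code, resolved against the registry's directory."""
    # Default convention: the game lives in the sister directory ../clemgames/GAME.
    rel = entry.get("path", f"../clemgames/{entry['game_name']}")
    # Joining with an absolute path yields that path; normpath collapses "..".
    return Path(os.path.normpath(registry_dir / rel))


def load_registry(text: str, registry_dir: Path) -> list[dict]:
    """Parse the registry and replace each entry's path with a resolved one."""
    entries = json.loads(text)
    for entry in entries:
        entry["path"] = str(game_dir(entry, registry_dir))
    return entries


def filter_games(entries: list[dict], **features) -> list[dict]:
    """Keep entries whose features equal all the given values."""
    return [e for e in entries if all(e.get(k) == v for k, v in features.items())]


entries = load_registry(REGISTRY_JSON, Path("/srv/cb-code/clembench"))
for e in filter_games(entries, image="single"):
    print(e["game_name"], e["path"])
```

Because only the registry has a fixed location, moving a game elsewhere means editing its `path` field; the loading code itself stays the same.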