
Idea: Compatibility Matrix #20

Closed
nitsanavni opened this issue Sep 6, 2024 · 5 comments

Comments

@nitsanavni

I guess some features work some of the time with some of the LMs.

Idea: Document a Compatibility Matrix

Can take inspiration from, e.g., MDN Compatibility Tables.

I imagine one axis to be the LM, and the other a SudoLang feature, syntax, etc.
Cells could be boolean (green / red), or could show a percentage for estimated success frequency.
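As an illustrative sketch of the proposal (the model and feature names are hypothetical, not real benchmark data), the matrix could be collected as per-cell pass/trial counts and rendered as success frequencies:

```python
# Hypothetical sketch of the proposed compatibility matrix:
# one axis is the LM, the other a SudoLang feature; each cell holds
# an estimated success frequency (pass rate) rather than a plain boolean.

from collections import defaultdict


class CompatibilityMatrix:
    def __init__(self):
        # (model, feature) -> [passes, trials]
        self._cells = defaultdict(lambda: [0, 0])

    def record(self, model: str, feature: str, passed: bool) -> None:
        cell = self._cells[(model, feature)]
        cell[0] += int(passed)
        cell[1] += 1

    def success_rate(self, model: str, feature: str):
        # Returns None for untested cells, else passes / trials.
        passes, trials = self._cells[(model, feature)]
        return passes / trials if trials else None


# Example with made-up trial results:
matrix = CompatibilityMatrix()
matrix.record("model-a", "interfaces", True)
matrix.record("model-a", "interfaces", True)
matrix.record("model-a", "interfaces", False)
print(matrix.success_rate("model-a", "interfaces"))  # 2 passes out of 3 trials
```

A boolean (green/red) view would just be a threshold over the same data, so collecting frequencies first keeps both rendering options open.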

@ericelliott
Contributor

Thank you. 🙏

So far, I have not conclusively identified any feature of SudoLang that is NOT supported by all tested models. If you have found some, please share them.

@nitsanavni
Author

Could also be a way to extend SudoLang to "harder" constructs (harder for LLMs), that may only be reliable on certain models.

@nitsanavni
Author

Or, it could be a way to assess weaker, smaller models — how small can you go before some SudoLang constructs become unusable?

@ericelliott
Contributor

ericelliott commented Sep 25, 2024

The language is designed to work well without any special prompting across all sufficiently advanced language models, and so far, I have not identified any features that simply don't work on some models but do on others. Because the goal is to remain highly intuitive to all models (and people), I have no desire to develop "harder" constructs for SudoLang. With that in mind, a "compatibility" matrix would be kinda boring: there would be a check mark in every cell for every model, with no real differentiation, although some models are less good at tracking variable changes due to attention limitations. See below.

That said, it might be a good idea to provide an overview of how well various models are suited to complex instruction following and reasoning in general, which does impact the effectiveness of models at processing SudoLang and following instructions well. Spoiler: GPT-4o is not so great. Claude 3.5 is currently a clear winner (best bang-for-buck). OpenAI o1 is pretty great, but slow and expensive. Llama 3+ 70b+ are great. Google Gemma 2 27b beats GPT-4o 👀, but is a little uncreative.

@nitsanavni
Author

Makes sense. Thanks for taking the time to think through this and explain!
