Idea: Compatibility Matrix #20
Comments
Thank you. 🙏 So far, I have not conclusively identified any feature of SudoLang that is NOT supported by all tested models. If you know of any, please share them.
Could also be a way to extend SudoLang to "harder" constructs (harder for LLMs), that may only be reliable on certain models.
Or, could be a way to assess weaker, smaller models - how small can you go until some SudoLang constructs become unusable?
The language is designed to work well without any special prompting across all sufficiently advanced language models, and so far, I have not identified any features that just don't work at all on some models vs others. Because the goal is to remain highly intuitive to all models (and people), I have no desire to develop "harder" constructs for SudoLang. With that in mind, a "compatibility" matrix would be kinda boring: there would be a check mark in every cell for every model, with no real differentiation, although some models are less good at tracking variable changes due to attention limitations. See below. That said, it might be a good idea to provide an overview of how well various models are suited to complex instruction following and reasoning in general, which does affect how well models process SudoLang and follow instructions. Spoiler: GPT-4o is not so great. Claude 3.5 is currently a clear winner (best bang-for-buck). OpenAI o1 is pretty great, but slow and expensive. Llama 3+ 70b+ are great. Google Gemma 2 27b beats GPT-4o 👀, but is a little uncreative.
Makes sense. Thanks for taking the time to think through this and explain!
I guess some features work some of the time with some of the LMs.
Idea: Document a Compatibility Matrix
Can take inspiration from, e.g., MDN Compatibility Tables.
I imagine one axis to be the LM, and the other a SudoLang feature, syntax, etc.
Cells could be boolean (green / red), or could show a percentage for estimated success frequency.
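To make the idea concrete, here is a minimal sketch of how such a matrix could be computed and rendered, assuming someone has already collected pass/fail results per model and feature. The function names `success_matrix` and `render`, the trial tuple format, and the model and feature labels are all hypothetical placeholders, not part of SudoLang or any existing tooling.

```python
from collections import defaultdict


def success_matrix(trials):
    """Aggregate (model, feature, passed) tuples into {feature: {model: success_rate}}."""
    # counts[feature][model] = [passes, total runs]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for model, feature, passed in trials:
        cell = counts[feature][model]
        cell[0] += 1 if passed else 0
        cell[1] += 1
    return {
        feature: {model: passes / total for model, (passes, total) in row.items()}
        for feature, row in counts.items()
    }


def render(matrix, models):
    """Render the matrix as a Markdown table: one row per feature, one column per model."""
    header = "| Feature | " + " | ".join(models) + " |"
    separator = "|" + " --- |" * (len(models) + 1)
    lines = [header, separator]
    for feature in sorted(matrix):
        cells = []
        for model in models:
            rate = matrix[feature].get(model)
            # Untested model/feature pairs are shown as "n/a" rather than 0%.
            cells.append("n/a" if rate is None else f"{rate:.0%}")
        lines.append(f"| {feature} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Hypothetical data: placeholder model and feature names, made-up outcomes.
trials = [
    ("model-a", "feature-x", True),
    ("model-a", "feature-x", False),
    ("model-b", "feature-x", True),
]
print(render(success_matrix(trials), ["model-a", "model-b"]))
```

A boolean (green / red) view would just be a threshold on the same rates, so storing percentages keeps both options open.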