
Idea: Compatibility Matrix #20

Closed
nitsanavni opened this issue Sep 6, 2024 · 5 comments

Comments

@nitsanavni

I guess some features work some of the time with some of the LMs.

Idea: Document a Compatibility Matrix

Can take inspiration from, e.g., MDN Compatibility Tables.

I imagine one axis to be the LM, and the other a SudoLang feature, syntax, etc.
Cells could be boolean (green / red), or could show a percentage for estimated success frequency.
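As an illustrative sketch of the proposal (the model and feature names are hypothetical, not real benchmark data), the matrix could be collected as per-cell pass/trial counts and rendered as success frequencies:

```python
# Hypothetical sketch of the proposed compatibility matrix:
# one axis is the LM, the other a SudoLang feature; each cell holds
# an estimated success frequency (pass rate) rather than a plain boolean.

from collections import defaultdict


class CompatibilityMatrix:
    def __init__(self):
        # (model, feature) -> [passes, trials]
        self._cells = defaultdict(lambda: [0, 0])

    def record(self, model: str, feature: str, passed: bool) -> None:
        cell = self._cells[(model, feature)]
        cell[0] += int(passed)
        cell[1] += 1

    def success_rate(self, model: str, feature: str):
        # Returns None for untested cells, else passes / trials.
        passes, trials = self._cells[(model, feature)]
        return passes / trials if trials else None


# Example with made-up trial results:
matrix = CompatibilityMatrix()
matrix.record("model-a", "interfaces", True)
matrix.record("model-a", "interfaces", True)
matrix.record("model-a", "interfaces", False)
print(matrix.success_rate("model-a", "interfaces"))  # 2 passes out of 3 trials
```

A boolean (green/red) view would just be a threshold over the same data, so collecting frequencies first keeps both rendering options open.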

@ericelliott
Contributor

Thank you. 🙏

So far, I have not conclusively identified any feature of SudoLang that is NOT supported by all tested models. If you have found some, please share them.

@nitsanavni
Author

Could also be a way to extend SudoLang to "harder" constructs (harder for LLMs), that may only be reliable on certain models.

@nitsanavni
Author

Or, it could be a way to assess weaker, smaller models — how small can you go before some SudoLang constructs become unusable?

@ericelliott
Contributor

ericelliott commented Sep 25, 2024

The language is designed to work well without any special prompting across all sufficiently advanced language models, and so far, I have not identified any features that simply don't work on some models but do on others. Because the goal is to remain highly intuitive to all models (and people), I have no desire to develop "harder" constructs for SudoLang. With that in mind, a "compatibility" matrix would be kinda boring: there would be a check mark in every cell for every model, with no real differentiation, although some models are less good at tracking variable changes due to attention limitations. See below.

That said, it might be a good idea to provide an overview of how well various models are suited to complex instruction following and reasoning in general, which does impact the effectiveness of models at processing SudoLang and following instructions well. Spoiler: GPT-4o is not so great. Claude 3.5 is currently a clear winner (best bang-for-buck). OpenAI o1 is pretty great, but slow and expensive. Llama 3+ 70b+ are great. Google Gemma 2 27b beats GPT-4o 👀, but is a little uncreative.

@nitsanavni
Author

Makes sense. Thanks for taking the time to think through this and explain!
