
Add prompt testing using an LLM testing suite #174

Open · vmesel opened this issue Apr 7, 2024 · 5 comments

Comments

@vmesel (Member) commented Apr 7, 2024

We need to help our users avoid prompt regressions that slip through unnoticed. To achieve this, we should implement a way for users to run a test suite over their own test cases with a user-defined similarity score threshold. It must be simple to configure and extensible beyond TOML files.
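As a rough illustration only (the TOML layout, file name, and the pluggable `ask_llm`/`embed` callables are all hypothetical, not existing dialog code), a TOML-driven suite with a user-set similarity threshold could look something like this, assuming Python 3.11+ for `tomllib`:

```python
import tomllib
from typing import Callable, Sequence

# Hypothetical shape of a user-written suite file (e.g. prompt_tests.toml).
EXAMPLE_SUITE = """
[settings]
similarity_threshold = 0.85   # user-defined pass/fail cutoff

[[cases]]
question = "What is your refund policy?"
expected_answer = "We offer a 30-day full refund."
"""

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

def run_suite(
    raw_toml: str,
    ask_llm: Callable[[str], str],            # the app's prompt/LLM call
    embed: Callable[[str], Sequence[float]],  # any embedding model
) -> list[tuple[str, float, bool]]:
    """Return (question, similarity, passed) for every case in the suite."""
    suite = tomllib.loads(raw_toml)
    threshold = suite["settings"]["similarity_threshold"]
    results = []
    for case in suite["cases"]:
        answer = ask_llm(case["question"])
        score = cosine(embed(answer), embed(case["expected_answer"]))
        results.append((case["question"], score, score >= threshold))
    return results
```

Keeping the parsing step separate from `run_suite` would make it straightforward to plug in sources other than TOML later.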

@avelino (Member) commented Apr 7, 2024

https://github.com/confident-ai/deepeval?tab=readme-ov-file#writing-your-first-test-case

This week @lgabs shared this project with me; apparently we could use it to test our prompts.
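Based on the linked README, a first deepeval test case looks roughly like the sketch below; the class and metric names are taken from that page and may differ in newer deepeval releases:

```python
# Roughly the "write your first test case" pattern from the deepeval README;
# typically run with `deepeval test run test_prompts.py`.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_refund_prompt():
    metric = AnswerRelevancyMetric(threshold=0.7)  # user-set cutoff
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        # in practice this would be the output of our own prompt/chain
        actual_output="We offer a 30-day full refund at no extra cost.",
        retrieval_context=["All customers are eligible for a 30-day full refund."],
    )
    assert_test(test_case, [metric])
```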

@vmesel (Member, Author) commented Apr 7, 2024 via email

@lgabs (Collaborator) commented Apr 7, 2024

Yeah, I haven't been able to study LLM evals in much depth, but I did see that the community seems to evaluate LLM applications with several standard metrics, where another LLM grades the application's outputs against expected results (this evaluator could even be a free local LLM, since its task is much simpler).

I saved this Andrew Ng short course to watch soon; maybe it'll help 🤘
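For reference, a bare-bones sketch of the LLM-as-judge idea described above; the endpoint and model name are placeholders (e.g. a free local model served behind an OpenAI-compatible API), not anything dialog currently ships:

```python
# A second model grades the application's output against the expected answer
# and returns a 0-1 score. Uses the OpenAI Python client against a
# placeholder local endpoint/model.
from openai import OpenAI

JUDGE_PROMPT = (
    "Rate from 0.0 to 1.0 how well the candidate answer matches the expected "
    "answer in meaning. Reply with only the number.\n\n"
    "Expected: {expected}\nCandidate: {candidate}"
)

def judge(expected: str, candidate: str,
          base_url: str = "http://localhost:11434/v1",  # placeholder endpoint
          model: str = "llama3") -> float:              # placeholder model
    client = OpenAI(base_url=base_url, api_key="not-needed")
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            expected=expected, candidate=candidate)}],
        temperature=0,
    )
    # naive parsing; a real implementation should guard against
    # non-numeric replies from the judge model
    return float(reply.choices[0].message.content.strip())
```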

@lgabs (Collaborator) commented Apr 7, 2024

Also, I think it would be a good idea for us to use the same common dataset for local dev and for tests that depend on a dataset (generating embeddings, or even these LLM evals). One idea is to download one from Hugging Face, like this wiki_qa. What do you think? This would be a new issue, of course.

These LLM evals change a lot depending on the domain, so I think we could just write good documentation on how to add these tests rather than trying to write generic cases for all datasets.
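A sketch of what pulling wiki_qa from Hugging Face for shared local-dev/eval fixtures could look like; the dataset id, split name, and column names come from the dataset card and should be double-checked:

```python
# label == 1 marks answers annotated as correct in wiki_qa.
from datasets import load_dataset

wiki_qa = load_dataset("wiki_qa", split="validation")
positives = wiki_qa.filter(lambda row: row["label"] == 1)

for row in positives.select(range(3)):
    print(row["question"], "->", row["answer"])
```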

@vmesel (Member, Author) commented Apr 7, 2024 via email
