
5th of July Updates #6

Open
alexandersimoes opened this issue Jul 5, 2024 · 2 comments


@alexandersimoes

No description provided.

@alebjanes
Contributor

Updates on my side:

  1. Evaluation set: the 100-question evaluation set we'll use across all approaches is ready, along with the correct answers and the values those answers should contain.
  2. New content in the corpus: for the RAG to work, I added the content needed to answer these questions to the corpus (for all available years). This includes content for broad product categories (like dairy, salmon, chicken, etc.) that are composed of multiple HS codes.
  3. RAG evaluation: I'm now running the RAG evaluation, which takes each of these 100 questions, fetches the top-k results using similarity search, and passes them as context to an LLM. I'll try a few combinations, varying top k (5 or 10), the embedding model, and the final LLM (to evaluate the cost of using gpt-4 vs. gpt-3.5 here). As an initial result, the first run got 81/100 questions correct. A sketch of the loop is below.
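
For reference, here is a minimal sketch of that retrieve-then-answer loop. It assumes the OpenAI Python client, a vector index exposing a `similarity_search(query, k)` method, and an `eval_set.json` file of question/expected-value pairs; those names are assumptions for illustration, not our actual code.

```python
"""Minimal sketch of the RAG evaluation loop described above.

Assumed (not from the comment): the OpenAI Python client, a vector index
with a `similarity_search(query, k)` method, and an `eval_set.json` file
of {"question": ..., "expected_value": ...} records.
"""
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def answer_with_rag(index, question, top_k=5, model="gpt-3.5-turbo"):
    # Retrieve the top-k most similar corpus chunks via similarity search
    # and pass them as context to the LLM.
    chunks = index.similarity_search(question, k=top_k)
    context = "\n\n".join(chunk.text for chunk in chunks)
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content


def run_eval(index, path="eval_set.json", top_k=5, model="gpt-3.5-turbo"):
    # An answer counts as correct when it contains the expected value --
    # a simple proxy for the grading described in the comment.
    with open(path) as f:
        eval_set = json.load(f)
    correct = sum(
        item["expected_value"] in answer_with_rag(index, item["question"], top_k, model)
        for item in eval_set
    )
    print(f"{correct}/{len(eval_set)} correct")
```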

@pippo-sci
Contributor

pippo-sci commented Jul 8, 2024

Fine-tuning results with both a random sample and Ale's test set (RAG-only questions):

| Model | Accuracy |
| ----- | -------- |
| TinyLlama, 1 epoch | 0% |
| TinyLlama, 10 epochs | 0% |
| TinyLlama, 50 epochs | 0% |
| Llama2, 1 epoch | 0% |

The main issue is that the model learns the text around the numbers but gets the numbers wrong; in fact, it returns a different number every time it is queried. As a side effect, the TinyLlama models lost their ability to answer other inputs.

Next steps:

  • Apply another metric that measures how far off the predicted values are, to check whether there is any difference between the models (sketched below)
  • Test fine-tuning on API URLs
  • Test a stripped-down version of the multilayer approach
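
One way that "how far off" metric could look, as a sketch: pull the first number out of each generated answer and score it with relative error against the gold value. The regex, the penalty for missing numbers, and the aggregation are all assumptions, not the metric pippo-sci necessarily has in mind.

```python
# Sketch of a numeric-distance metric for fine-tuned model outputs.
# Assumption: answers embed the target value as the first numeric token.
import re


def first_number(text):
    # Grab the first numeric token, allowing thousands separators and decimals.
    match = re.search(r"-?\d[\d,]*\.?\d*", text)
    return float(match.group().replace(",", "")) if match else None


def mean_relative_error(predictions, expected_values):
    errors = []
    for pred_text, expected in zip(predictions, expected_values):
        value = first_number(pred_text)
        if value is None:
            errors.append(1.0)  # no number produced: count as maximally wrong
        else:
            errors.append(abs(value - expected) / max(abs(expected), 1e-9))
    return sum(errors) / len(errors)


# Example: compare two fine-tuned models on the same question set.
# mre_tiny = mean_relative_error(tinyllama_answers, gold_values)
# mre_llama2 = mean_relative_error(llama2_answers, gold_values)
```

Unlike the exact-match accuracy above (where every model scores 0%), a continuous error like this can still rank models whose outputs are all wrong but wrong by different amounts.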
