Add ability to delete tokens (undo feed) #334
Conversation
Good job, I read through it and it looks good. Could you provide a link to your demo program? I would like to play around a bit with it.
Sure! Here's the demo code: https://gist.github.com/steventrouble/47cca1306bf3a41c4f62dc698d71f5af. I extracted it rather hastily from my project crate, but I think it should compile fine. Hopefully I didn't remove too much functionality when I yanked it out, and hopefully it's somewhat comprehensible. It's a little late over here 😅
Hi there! Great work on this; this is a very useful feature to have. The only thing I'm not sure about is the interaction with the model's memory tensors. Do the logits match with no further user intervention? My understanding is that the process of feeding prompts/running inference will mutate the memory tensors, so the logits will not match. (Feel free to correct me if I'm wrong - it's very hard to get visibility into exactly what's happening with the tensors, so I have to resort to conjecture.)
You bring up a great point. There's not a lot of visibility into the buffers. I verified with a hex editor that GptNeoX only modifies the memory corresponding to the latest token, but I'd be happy to do more tests. Do you have ggml copies of tiny models for each of the model types? That would save me a lot of time when testing. Also, I'm not sure what the policy is on integration tests, but I'd love to be able to submit some kind of repeatable test so others can verify it works on other hardware.
@steventrouble Currently we don't have integration tests, but we sure would like to have some :D (see #319). The problem is that we don't have miniature versions of our models, which makes testing with the original models kinda difficult.
@LLukas22 Not sure how much progress you've made on this already, but I can give you mock models that compress to about 2MB each. Does that work for your needs? I set most of the model weights and biases to 0, then compress it with gzip. We should be able to do this with any model. Here's an example with GptNeoX: https://github.com/steventrouble/llm/tree/smoke/crates/llm/tests
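For reference, producing such an archive is just ordinary gzip compression: zeroed weights compress extremely well, which is how a multi-gigabyte model shrinks to a couple of megabytes. A minimal sketch using the `flate2` crate (the file names are placeholders, and the weight-zeroing step is assumed to have happened already):

```rust
use flate2::{write::GzEncoder, Compression};
use std::{fs::File, io};

fn main() -> io::Result<()> {
    // A model file whose weights/biases have mostly been set to zero.
    let mut input = File::open("mock-gptneox.bin")?;
    let output = File::create("mock-gptneox.bin.gz")?;

    // Long runs of zero bytes compress to almost nothing.
    let mut encoder = GzEncoder::new(output, Compression::best());
    io::copy(&mut input, &mut encoder)?;
    encoder.finish()?;
    Ok(())
}
```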
Awesome! I saw your integration tests PR 😄 I wrote a basic integration test, but I'll hold off on uploading until your PR is stable. Then I'll update my PR to add the test in. Ping me when I can pick this up again!
Uhm, I don't know how these zeroed-out models will behave, but won't they always output the same results, since they'll zero out the input after the first matmul?
A few of the neurons will still have values, so it'll output random gibberish. It's mostly useful as a lightweight smoke test, to make sure the tensor loader still works and that the compute graph doesn't crash at inference time.
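For intuition, here is a toy illustration of both points above (pure illustration, not model code): with a zeroed weight matrix the matmul contributes nothing, so the output is the same for every input, but it isn't all-zero, because the few remaining non-zero parameters (biases, norms) still carry through.

```rust
// Toy matmul: with w all zeros, the result is zero for any input x.
fn matmul(w: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    w.iter()
        .map(|row| row.iter().zip(x).map(|(wi, xi)| wi * xi).sum())
        .collect()
}

fn main() {
    let w = vec![vec![0.0f32; 4]; 4]; // zeroed weights, as in the mock models
    let bias = [0.1f32, -0.2, 0.3, 0.0]; // a few non-zero parameters remain
    for x in [[1.0f32, 2.0, 3.0, 4.0], [9.0, 8.0, 7.0, 6.0]] {
        let y: Vec<f32> = matmul(&w, &x)
            .iter()
            .zip(&bias)
            .map(|(yi, bi)| yi + bi)
            .collect();
        println!("{y:?}"); // same output for every input: [0.1, -0.2, 0.3, 0.0]
    }
}
```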
Maybe we could use these tiny models; currently we're just using the smallest available models for each architecture 🤷 But honestly, we can just use the original models, as the GitHub Actions runners can download them in about 30 seconds 😅 If you want, you could pull the newest main into this branch and try to create an integration test for the token deletion 👍
Sweet! I'm on it 😃
Helps newbies like me set up the repo correctly.
Alright, all tested. Deleting tokens works on all the models! (except llama 😢)
Hm, that's strange. Are you recalculating the K/V memory, or are you removing the last n entries from it?
@LLukas22 Neither, I leave the memory unchanged. The old values are still in the memory, but it's fine because that memory is never read. E.g. in GPT-J, accesses to `memory_k` and `memory_v` only cover the current session length (see llm/crates/models/gptj/src/lib.rs, line 185 at 7d6eee3).
P.S.: I accidentally clicked "close issue", sorry 😅 Could you re-open the PR?
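To make that mechanism concrete, here is a minimal hypothetical sketch of the bookkeeping (the `Session` struct and `delete_tokens` name are illustrative stand-ins, not the crate's actual types): deletion shrinks the decoded-token list and the past-token count, and leaves the K/V tensors alone.

```rust
// Illustrative only: a toy stand-in for the real inference session.
struct Session {
    decoded_tokens: Vec<u32>, // tokens fed so far
    n_past: usize,            // how many K/V slots are considered valid
}

impl Session {
    fn delete_tokens(&mut self, num: usize) -> Result<(), String> {
        if num > self.n_past {
            return Err("cannot delete more tokens than were fed".into());
        }
        // The K/V memory itself is left untouched: attention only reads
        // slots 0..n_past, and feeding new tokens overwrites the rest.
        self.decoded_tokens.truncate(self.decoded_tokens.len() - num);
        self.n_past -= num;
        Ok(())
    }
}

fn main() {
    let mut s = Session { decoded_tokens: vec![5, 7, 11, 13], n_past: 4 };
    s.delete_tokens(2).unwrap();
    assert_eq!((s.decoded_tokens.len(), s.n_past), (2, 2));
}
```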
TBH, I'm not sure why it doesn't work for llama. I've looked over the code, and all accesses to `memory_k` and `memory_v` seem to use `session_len`. Is it possible it's storing state in the scratch buffers, or statically? Or that there's some subtle bug in one of the `memory_k`/`memory_v` accesses?
Maybe it has something to do with the scratch buffers. If I find some time, I'll give it a look and check why the results diverge.
Aha! Found the issue. The test was wrong, and the model works fine with delete. I used a debugger to verify that it only touches the section of memory corresponding to the current token, and that the scratch buffers don't (seem to) carry state between calls. The issue was with the test string itself 😄
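The test idea itself is worth sketching, as a hedged outline (the `TestModel` trait and its methods are placeholders, not the crate's real API): a session that deletes and re-feeds tokens should produce the same logits as a fresh session fed the final text directly, proving the stale K/V entries are never read.

```rust
// Placeholder trait standing in for a loadable model/session.
trait TestModel: Default {
    fn feed(&mut self, text: &str);
    fn delete_tokens(&mut self, num: usize);
    fn logits(&self) -> Vec<f32>;
}

fn check_delete_matches_fresh_session<M: TestModel>() {
    let mut edited = M::default();
    edited.feed("The quick brown fox");
    edited.delete_tokens(1); // assume "fox" tokenizes to a single token
    edited.feed("dog");      // session now represents "The quick brown dog"

    let mut fresh = M::default();
    fresh.feed("The quick brown dog");

    // If deletion leaked state, the logits would diverge here.
    assert_eq!(edited.logits(), fresh.logits());
}
```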
This'll help when I add my tests, because otherwise this file will explode in size.
Note that llama.json fails the tests, so it likely doesn't support token deletion. I may investigate further, though.
(Fixed a temp change I made while debugging. I'm still getting the hang of git again after 8 years of Perforce; thanks for your patience!)
Is this ready for a review?
Yup, it looks good for review. Thanks for your patience! Next time I contribute there will be a lot less back and forth.
LGTM, good job 👍
Maybe a few names/messages need to be changed 🤔
Looks great outside of what Lukas mentioned - ready to merge once those are fixed!
Fantastic work - thank you very much 🚀
The goal of this PR is to enable token feed undo. This is mostly useful for autocomplete functionality (e.g. keyboards or coding assistants), where the text changes often around the cursor's position.
Closes #332
Tested on `GptNeoX` by writing a demo program that uses the `delete_token` method to minimize model refreshes. I tested several use cases; the performance is pretty solid. Deleting 10 tokens takes less than a millisecond on my box.
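A hedged sketch of that autocomplete flow (the trait and method names below are made up for illustration; only the delete-tokens idea comes from this PR): when the user edits near the cursor, undo only the tokens after the edit point and feed the replacement, rather than rebuilding the whole session.

```rust
// Hypothetical trait standing in for the real session type; only the
// shape of the calls matters here, not the crate's actual API.
trait UndoableSession {
    fn delete_tokens(&mut self, num: usize);
    fn feed_text(&mut self, text: &str);
}

fn on_edit<S: UndoableSession>(session: &mut S, changed_tokens: usize, new_text: &str) {
    session.delete_tokens(changed_tokens); // cheap: bookkeeping only
    session.feed_text(new_text);           // only the new tokens are evaluated
}
```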
A potential issue with this PR is that you can't delete the start-of-content token, because `decoded_tokens.truncate` would fail. It might also fail on other special tokens, but I'm not sure how to trigger those as part of my testing.
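A sketch of the guard this implies (all names here are hypothetical): reject deletions that would remove the start-of-content token up front, rather than letting the truncate fail later.

```rust
// Hypothetical guard: `bot_token` is the start-of-content token id.
fn checked_delete(decoded_tokens: &mut Vec<u32>, num: usize, bot_token: u32) -> Result<(), String> {
    let keep = decoded_tokens
        .len()
        .checked_sub(num)
        .ok_or("cannot delete more tokens than were fed")?;
    if keep == 0 && decoded_tokens.first() == Some(&bot_token) {
        return Err("cannot delete the start-of-content token".into());
    }
    decoded_tokens.truncate(keep);
    Ok(())
}
```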