- Slides: lecture_llm.pdf
- Practice session: practice.ipynb
- Video (in Russian): lecture, practice
- Slides (lecture_llm.pdf) contain many links for further reading
- How post-training quantization works: https://arxiv.org/abs/2208.07339
- An overview of running large models: https://huggingface.co/docs/accelerate/package_reference/big_modeling (a short loading sketch follows this list)
- A general library for different adapter types: https://adapterhub.ml/
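
To tie the first two links together, here is a minimal sketch of loading a model with post-training 8-bit quantization plus accelerate's big-model dispatch. The model name is just an example and the exact arguments follow the transformers + bitsandbytes integration, so treat this as an illustration rather than lecture code:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "facebook/opt-6.7b"  # example model; any decoder-only LM works the same way

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        load_in_8bit=True,    # post-training 8-bit quantization (LLM.int8 via bitsandbytes)
        device_map="auto",    # accelerate decides how to place layers across GPUs / CPU
        torch_dtype=torch.float16,
    )
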
If for some reason you're not satisfied with the model.generate interface, you can write your own inference code with iterative forward passes. Here's how it's done:
prefix = "Mark Zuckerberg is" # same as above
batch = tokenizer(prefix, return_tensors='pt')
past_key_values = None
with torch.cuda.amp.autocast():
for i in range(50):
outputs = model.forward(**batch, use_cache=True, past_key_values=past_key_values)
probs = outputs.logits[0, -1].div(0.8).softmax(-1)
token = torch.multinomial(probs, 1).view([])
print(tokenizer.decode(token), end=' ', flush=True)
past_key_values = outputs.past_key_values
batch = dict(input_ids=outputs.logits[0, -1].argmax(-1).reshape(1, 1),
attention_mask=torch.ones(1, past_key_values[0][0].shape[-2] + 1, device='cuda'))
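
For comparison, here is roughly the same thing through the standard interface. This is a sketch rather than lecture code; the sampling settings mirror the loop above (temperature 0.8, 50 new tokens):

    batch = tokenizer("Mark Zuckerberg is", return_tensors='pt').to('cuda')
    with torch.cuda.amp.autocast():
        output_ids = model.generate(**batch, max_new_tokens=50, do_sample=True,
                                    temperature=0.8, use_cache=True)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
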
The code below converts training-optimized 8-bit weights into an inference-optimized layout. This should make inference significantly faster within the same memory footprint. However, once you do this you can no longer run training -- there is no way to convert back after the first optimized forward pass!

    import bitsandbytes as bnb

    model.config.use_cache = True  # enable the key/value cache for generation
    for module in model.modules():
        if isinstance(module, bnb.nn.Linear8bitLt):
            # let the next forward pass convert this layer's weights to the fast inference-only layout
            module.state.memory_efficient_backward = False
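
To check the claimed speedup, you could time generation after the conversion. A minimal sketch; the prompt, token counts, and timing code here are illustrative assumptions, not part of the lecture materials:

    import time

    inputs = tokenizer("Mark Zuckerberg is", return_tensors='pt').to('cuda')
    _ = model.generate(**inputs, max_new_tokens=8)  # the first forward pass performs the one-time weight conversion

    start = time.perf_counter()
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
    tokens_per_second = (output_ids.shape[1] - inputs['input_ids'].shape[1]) / (time.perf_counter() - start)
    print(f"{tokens_per_second:.1f} tokens/s")
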