Releases: TransformerLensOrg/TransformerLens
v2.10.0
Huge update! This is likely going to be the last big 2.x update. It greatly improves model implementation accuracy and adds some of the newer Qwen models.
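For example, the newly supported Qwen2.5 family should load like any other supported model. A minimal sketch; the exact checkpoint identifier below is an assumption, so check the supported-models list for your version:

```python
from transformer_lens import HookedTransformer

# Hypothetical checkpoint name; any Qwen2.5 size added in #809 works the same way.
model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B")

logits = model("The capital of France is")
print(logits.shape)  # [batch, seq_len, d_vocab]
```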
What's Changed
- Remove einsum in forward pass in AbstractAttention by @degenfabian in #783
- Colab compatibility bug fixes by @degenfabian in #794
- Remove einsum usage from create_alibi_bias function by @degenfabian in #781
- Actions token access by @bryce13950 in #797
- Remove einsum in apply_causal_mask in abstract_attention.py by @degenfabian in #782
- clarified arguments a bit for hook_points by @bryce13950 in #799
- Remove einsum in logit_attrs in ActivationCache by @degenfabian in #788
- Remove einsum in compute_head_results in ActivationCache by @degenfabian in #789
- Remove einsum usage in refactor_factored_attn_matrices in HookedTransformer by @degenfabian in #791
- Remove einsum usage in _get_w_in_matrix in SVDInterpreter by @degenfabian in #792
- Remove einsum usage in forward function of BertMLMHead by @degenfabian in #793
- Set default_prepend_bos to False in Bloom model configuration by @degenfabian in #806
- Remove einsum in complex_attn_linear by @degenfabian in #790
- Add a demo of collecting activations from a single location in the model (see the sketch after this list) by @adamkarvonen in #807
- Add support for Qwen_with_Questions by @degenfabian in #811
- Added support for Qwen2.5 by @israel-adewuyi in #809
- Updated devcontainers to use python3.11 by @jonasrohw in #812
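The single-location activation demo from #807 is a notebook; here is a hedged sketch of the same idea using `run_with_cache` with a `names_filter` (the notebook itself may take a different approach):

```python
from transformer_lens import HookedTransformer
import transformer_lens.utils as utils

model = HookedTransformer.from_pretrained("gpt2")

# Cache only the residual stream after block 5, instead of every activation.
hook_name = utils.get_act_name("resid_post", 5)
_, cache = model.run_with_cache("Hello world", names_filter=hook_name)

print(cache[hook_name].shape)  # [batch, seq_len, d_model]
```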
New Contributors
- @israel-adewuyi made their first contribution in #809
- @jonasrohw made their first contribution in #812
Full Changelog: v2.9.1...v2.10.0
v2.9.1
A minor dependency update to address a change in an external dependency.
What's Changed
- added typeguard dependency by @bryce13950 in #786
Full Changelog: v2.9.0...v2.9.1
v2.9.0
Lots of accuracy improvements! A number of models are behaving closer to how they behave in Transformers, and a new internal configuration option has been added to allow for more ease of use!
What's Changed
- Fix the bug that `attention_mask` and `past_kv_cache` could not work together (sketched after this list) by @yzhhr in #772
- Set prepend_bos to false by default for Bloom model family by @degenfabian in #775
- Fix Bloom-family models producing weird outputs when `use_past_kv_cache` is set to True by @degenfabian in #777
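A minimal sketch of the combination fixed in #772, assuming the current `utils.get_attention_mask` helper and `HookedTransformerKeyValueCache` API:

```python
from transformer_lens import HookedTransformer
from transformer_lens.past_key_value_caching import HookedTransformerKeyValueCache
import transformer_lens.utils as utils

model = HookedTransformer.from_pretrained("gpt2")

# Batched prompts of different lengths get padded, so a mask is needed.
tokens = model.to_tokens(["Hello there", "A much longer prompt than the first one"])
attention_mask = utils.get_attention_mask(
    model.tokenizer, tokens, model.cfg.default_prepend_bos
)

# Before #772 these two arguments could not be combined.
past_kv_cache = HookedTransformerKeyValueCache.init_cache(
    model.cfg, model.cfg.device, batch_size=tokens.shape[0]
)
logits = model(tokens, attention_mask=attention_mask, past_kv_cache=past_kv_cache)
```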
New Contributors
- @yzhhr made their first contribution in #772
- @degenfabian made their first contribution in #775
Full Changelog: v2.8.1...v2.9.0
v2.8.1
New notebook for comparing models, and a bug fix for dealing with newer LLaMA models!
What's Changed
- Logit comparator tool by @curt-tigges in #765
- Add support for NTK-by-Part Rotary Embedding & set correct rotary base for Llama-3.1 series by @Hzfinfdu in #764
Full Changelog: v2.8.0...v2.8.1
v2.8.0
What's Changed
- add transformer diagram by @akozlo in #749
- Demo colab compatibility by @bryce13950 in #752
- Add support for `Mistral-Nemo-Base-2407` model by @ryanhoangt in #751
- Fix the bug that the `tokenize_and_concatenate` function was not working for small datasets by @xy-z-code in #725
- added new block for recent diagram, and colab compatibility notebook by @bryce13950 in #758
- Add warning and halt execution for incorrect T5 model usage by @vatsalrathod16 in #757
- New issue template for reporting model compatibility by @bryce13950 in #759
- Add configurations for Llama 3.1 models(Llama-3.1-8B and Llama-3.1-70B) by @vatsalrathod16 in #761
New Contributors
- @akozlo made their first contribution in #749
- @ryanhoangt made their first contribution in #751
- @xy-z-code made their first contribution in #725
- @vatsalrathod16 made their first contribution in #757
Full Changelog: v2.7.1...v2.8.0
v2.7.1
What's Changed
- Updated broken Slack link by @neelnanda-io in #742
- `from_pretrained` has correct return type (i.e. `HookedSAETransformer.from_pretrained` returns `HookedSAETransformer`; see the sketch after this list) by @callummcdougall in #743
- Avoid warning in `utils.download_file_from_hf` by @albertsgarde in #739
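A small sketch of what the #743 fix buys you; no new API here, just the corrected type inference:

```python
from transformer_lens import HookedSAETransformer

# Static checkers (mypy/pyright) now infer HookedSAETransformer here rather
# than the HookedTransformer base class, so SAE-specific methods type-check
# without a cast.
model = HookedSAETransformer.from_pretrained("gpt2")
assert isinstance(model, HookedSAETransformer)
```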
New Contributors
- @albertsgarde made their first contribution in #739
Full Changelog: v2.7.0...v2.7.1
v2.7.0
Llama 3.2 support! There is also new compatibility added to the function `test_prompt` to allow for multiple prompts, as well as a minor typo fix.
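For reference, basic usage of `utils.test_prompt`; per this release it also accepts multiple prompts for side-by-side comparison (hedged: check the docstring for the exact list form in your version):

```python
from transformer_lens import HookedTransformer
import transformer_lens.utils as utils

model = HookedTransformer.from_pretrained("gpt2")

# Prints the model's top predictions and the rank of the expected answer.
utils.test_prompt("The Eiffel Tower is in the city of", " Paris", model)
```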
What's Changed
- Typo hooked encoder by @bryce13950 in #732
- `utils.test_prompt` compares multiple prompts by @callummcdougall in #733
- Model llama 3.2 by @bryce13950 in #734
Full Changelog: v2.6.0...v2.7.0
v2.6.0
Another nice little feature update! You now have the ability to ungroup the grouped query attention head component through a new config parameter `ungroup_grouped_query_attention`!
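A conceptual sketch of what ungrouping means (plain PyTorch, not the library's internal code): with grouped-query attention, several query heads share one key/value head, and ungrouping repeats each shared head so every query head gets its own copy:

```python
import torch

# Example GQA shapes: 32 query heads sharing 8 key/value heads.
n_heads, n_kv_heads, d_model, d_head = 32, 8, 4096, 128
W_K_grouped = torch.randn(n_kv_heads, d_model, d_head)

# Ungrouping repeats each K/V head for the query heads that share it,
# making per-head analysis uniform across query heads.
W_K_ungrouped = W_K_grouped.repeat_interleave(n_heads // n_kv_heads, dim=0)
assert W_K_ungrouped.shape == (n_heads, d_model, d_head)
```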
What's Changed
- Ungrouping GQA by @hannamw & @FlyingPumba in #713
Full Changelog: v2.5.0...v2.6.0
v2.5.0
Nice little release! This release adds a new parameter named `first_n_layers` that allows you to specify how many layers of a model you want to load.
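A minimal sketch, assuming `first_n_layers` is accepted as a keyword argument to `from_pretrained` as added in #717:

```python
from transformer_lens import HookedTransformer

# Load only the first two blocks of GPT-2; useful when you only study
# early layers and want to save memory.
model = HookedTransformer.from_pretrained("gpt2", first_n_layers=2)
print(model.cfg.n_layers)  # 2
```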
What's Changed
- Fix typo in bug issue template by @JasonGross in #715
- HookedTransformerConfig docstring: `weight_init_mode` => `init_mode` by @JasonGross in #716
- Allow loading only first n layers by @joelburget in #717
Full Changelog: v2.4.1...v2.5.0
v2.4.1
Little update to the code usage, but a huge update for memory consumption! TransformerLens now needs almost half the memory it previously required to load a model, thanks to a change in how TransformerLens models are loaded.
What's Changed
- Removed einsum causing error when `use_attn_result` is enabled (see the sketch after this list) by @oliveradk in #660
- revised loading to recycle state dict by @bryce13950 in #706
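A sketch of the code path exercised by the #660 fix; `set_use_attn_result` and the `result` hook are existing TransformerLens APIs:

```python
from transformer_lens import HookedTransformer
import transformer_lens.utils as utils

model = HookedTransformer.from_pretrained("gpt2")

# Caches each attention head's output separately (memory-hungry for big models).
model.set_use_attn_result(True)

_, cache = model.run_with_cache("Hello world")
result = cache[utils.get_act_name("result", 0)]
print(result.shape)  # [batch, seq_len, n_heads, d_model]
```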
New Contributors
- @oliveradk made their first contribution in #660
Full Changelog: v2.4.0...v2.4.1