# An Evolved Universal Transformer Memory

📄 [Paper] | 🤗 [Hugging Face] | 📁 [Dataset]

## Installation

We provide two conda environments for installing this repository:

For the full set of dependencies with fixed versions (provided to ensure some level of long-term reproducibility):

```bash
conda env create --file=env.yaml
```

For a more minimal and less constrained set of dependencies (intended for future development/extensions):

```bash
conda env create --file=env_minimal.yaml
```
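
In either case, remember to activate the environment before running any of the commands below. A minimal sketch, assuming the environment created from `env.yaml` is named `namm` (an assumption on our part; check the `name:` field in the YAML for the actual value):

```bash
# Activate the freshly created environment; "namm" is an assumed name,
# use the value of the "name:" field in env.yaml (or env_minimal.yaml).
conda activate namm

# Quick sanity check that the PyTorch install is visible (training and
# evaluation below are launched via torchrun).
python -c "import torch; print(torch.__version__)"
```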

## Usage

### Training

Training with the incremental setup described in our work can be replicated via the following Hydra commands (an end-to-end sketch chaining all three stages is given after the individual commands):

Stage 1 training:

```bash
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i1.yaml
```

Stage 2 training:

```bash
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i2.yaml init_from='path/to/stage1/results/ckpt.pt'
```

Stage 3 training:

```bash
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i3.yaml init_from='path/to/stage2/results/ckpt.pt'
```
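
A minimal end-to-end sketch of the three-stage pipeline, assuming 8 GPUs and hypothetical checkpoint locations `results/stage{N}/ckpt.pt` (the actual output paths depend on your Hydra run configuration, so substitute the paths your runs produce):

```bash
#!/usr/bin/env bash
# Incremental three-stage training, run back to back.
# ASSUMPTIONS: 8 GPUs; the results/stage*/ckpt.pt paths are illustrative
# placeholders, not the repository's fixed output locations.
export NUM_OF_GPUs=8

# Stage 1: train from scratch.
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i1.yaml

# Stage 2: resume from the stage-1 checkpoint.
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i2.yaml \
    init_from='results/stage1/ckpt.pt'

# Stage 3: resume from the stage-2 checkpoint.
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i3.yaml \
    init_from='results/stage2/ckpt.pt'
```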

### Evaluation

Evaluation on the full set of LongBench tasks can be replicated for both trained NAMMs with the following command:

```bash
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval.yaml init_from='path/to/results/ckpt.pt'
```

Evaluation on the full set of ChouBun tasks can be replicated with the following command:

```bash
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval_choubun.yaml init_from='path/to/results/ckpt.pt'
```
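
For instance, a single-GPU LongBench evaluation of the final stage-3 checkpoint might look like the sketch below (the checkpoint path is a hypothetical placeholder, as above):

```bash
# Single-GPU LongBench evaluation; substitute the checkpoint your training run produced.
torchrun --standalone --nproc_per_node=1 main.py run@_global_=namm_bam_eval.yaml \
    init_from='results/stage3/ckpt.pt'
```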

## Additional notes

Using wandb to log the results (through the Hydra setting `wandb_log=true`) requires authenticating to the wandb server via the following command:

```bash
wandb login
```

and providing your account's API key (which you should be able to find here).
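
On non-interactive machines (e.g., cluster jobs), wandb also reads the standard `WANDB_API_KEY` environment variable, so you can export the key instead of logging in interactively; this is a general wandb feature rather than something specific to this repository:

```bash
# Non-interactive alternative: export the API key before launching training.
# Replace the placeholder with the key from your wandb account settings.
export WANDB_API_KEY='your-api-key'
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i1.yaml wandb_log=true
```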

### Gated models (e.g., Llama)

Using gated models requires authenticating to the Hugging Face Hub by running:

```bash
huggingface-cli login
```

and providing one of your account's access tokens (which you should be able to find here).
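
Similarly, for non-interactive environments, `huggingface_hub` recognizes the standard `HF_TOKEN` environment variable, so exporting the token works in place of the interactive login; again, this is a general Hugging Face feature rather than repository-specific behavior:

```bash
# Non-interactive alternative: export a read-access token before launching.
# Replace the placeholder with a token from your Hugging Face account settings.
export HF_TOKEN='hf_your-access-token'
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval.yaml init_from='path/to/results/ckpt.pt'
```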

## Bibtex

To cite our work, you can use the following:

```bibtex
@article{sakana2024memory,
       title={An Evolved Universal Transformer Memory},
       author={Edoardo Cetin and Qi Sun and Tianyu Zhao and Yujin Tang},
       year={2024},
       eprint={2410.13166},
       archivePrefix={arXiv},
       primaryClass={cs.LG},
       url={https://arxiv.org/abs/2410.13166},
}
```