
cream-logo

An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding

Updates

  • (2024.09.26) Our paper has been accepted by NeurIPS 2024 🔥🔥.
  • (2024.06.11) Paper released on arXiv.

🚀 Overview

We propose Continuity-Relativity indExing with gAussian Middle (CREAM), which interpolates positional encodings by manipulating position indices.

Apart from being simple, CREAM is training-efficient: it only requires fine-tuning at the pre-trained context window (e.g., Llama 2-4K) and can extend LLMs to a much longer target context length (e.g., 256K).

To ensure that the model focuses more on the information in the middle, we introduce a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning, thus alleviating the “Lost-in-the-Middle” problem faced by long-context LLMs.
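
To make the sampling idea concrete, here is a minimal, illustrative sketch (not the authors' implementation) of drawing the start position of a middle segment with a truncated Gaussian centered on the midpoint of the target context. The function name, segment length, and sigma_frac below are hypothetical choices, and SciPy is assumed to be available:

from scipy.stats import truncnorm

def sample_middle_start(target_len, seg_len, sigma_frac=0.15):
    # valid start positions for a seg_len-token middle segment
    lo, hi = 0, target_len - seg_len
    mu = (lo + hi) / 2                            # center of the target context
    scale = sigma_frac * target_len               # spread, as a fraction of length
    a, b = (lo - mu) / scale, (hi - mu) / scale   # standardized truncation bounds
    return int(truncnorm.rvs(a, b, loc=mu, scale=scale))

# e.g., place a 4K middle chunk inside a 256K target context
start = sample_middle_start(target_len=256_000, seg_len=4_000)

Because the distribution is truncated to the valid range but peaked at the center, fine-tuning sees mid-context positions far more often than a uniform sampler would.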

Experimental results show that CREAM successfully extends LLMs to the target length for both Base and Chat versions of Llama2-7B with “Never Miss A Beat”.

⚙️ Installation

# clone project
git clone git@github.com:wutong4012/CREAM.git
cd CREAM

# create conda environment
conda create -n cream python=3.9
conda activate cream

# install requirements
pip install -r requirements.txt
conda install -c nvidia cuda-nvcc
pip install flash_attn-2.5.7+cu122torch2.2cxx11abiFALSE-cp39-cp39-linux_x86_64.whl

# replace lm_eval in lm-evaluation-harness
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
# then replace its lm_eval folder with the modified lm_eval folder from this repository

💡 How to run

You can download all of the fine-tuning and evaluation data from pile_4k_train, pile_val, ShareGPT_4k_train, ShareGPT_val, gov_report, proof-pile, book3, pg19_long, LongChat-Lines, Needle in a Haystack, and LongBench.

Attention: you must update the "root" path in every file under the scripts folder.

Train model

bash scripts/run_CREAM.sh 8 linear llama2 5946 CREAM

bash scripts/run_CREAM_chat.sh 8 linear llama2_chat 5946 CREAM
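
The linear argument presumably selects linear positional interpolation (in the sense of Chen et al., 2023), which rescales position indices so a longer context fits inside the pretrained window. An illustrative sketch of that rescaling, not the repository's code:

def interpolate_positions(positions, pretrained_len=4096, target_len=262144):
    # linear positional interpolation: rescale indices by the extension factor
    scale = target_len / pretrained_len    # e.g., 64x for 4K -> 256K
    return [p / scale for p in positions]  # fractional positions fed to RoPE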

Evaluate model

bash scripts/eval_longchat_lines.sh 8 linear llama2 CREAM 1000

bash scripts/eval_lost_in_the_middle.sh 8 linear llama2 CREAM 1000

bash scripts/eval_needle.sh 8 linear llama2_chat CREAM 100

bash scripts/eval_longbench.sh 8 linear llama2_chat CREAM 100

bash scripts/eval_ppl.sh 8 linear llama2 CREAM 1000

bash scripts/eval_long_ppl.sh 64 linear llama2 CREAM 1000

bash scripts/eval_benchmark.sh 8 linear llama2 CREAM 1000

⚽ Evaluation Results

LongChat-Lines

Lost in the Middle

Needle in a Haystack

LongBench

Acknowledgement

Data / Code:

📜 Citation

Please cite our paper if you use CREAM in your work:

@inproceedings{wu2024cream,
    title={An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding},
    author={Wu, Tong and Zhao, Yanpeng and Zheng, Zilong},
    booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
    volume={37},
    year={2024}
}
