Merge branch 'main' of github.com:frankaging/llms-switch into main
frankaging committed Apr 6, 2024
2 parents dfb4dec + e12c279 commit fd7d6ef
Showing 14 changed files with 51,408 additions and 6 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -22,6 +22,7 @@ task_steer.py
templates.py
trainer.py
tmp/
+*.DS_Store


# Byte-compiled / optimized / DLL files
17 changes: 13 additions & 4 deletions README.md
@@ -3,6 +3,15 @@
<p>State-of-the-art Representation Fine-Tuning (ReFT) methods</p>
</h3>

+> [!WARNING]
+> **Hey hey! Corrections to the preprint:** We and members of the community have identified a few errors:
+> - (1) The hyperparameter settings in Tables 5 and 6 of the Appendix should be swapped, i.e., GSM8K is the task where we apply interventions to all layers. We have released our training wandb logs in our [LoReFT](https://github.com/frankaging/pyreft/tree/main/examples/loreft) folder; check those to reproduce for now!
+> - (2) The UltraLM citation is wrong; we will correct it.
+> - (3) Commonsense170K is not 100 times larger than GSM8K :) (170/8).
+>
+> We will update our arXiv paper on Monday (April 8th, 2024). Sorry guys! Till then, happy ReFTing!

# A _Powerful_, _Parameter-Efficient_, and _Interpretable_ way of fine-tuning
Want to try a fine-tuning method that uses a fraction of the parameter count of SoTA PEFTs while achieving potentially better performance? Introducing **pyreft**, a **representation fine-tuning (ReFT)** library that supports adapting internal language model representations via trainable interventions. With fewer fine-tuning parameters and more robust performance, **pyreft** can boost fine-tuning efficiency, decrease fine-tuning costs, and open the door to studying the interpretability of adapted parameters.
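
To make "trainable interventions" concrete, here is a minimal sketch in the spirit of the pyreft quickstart: it wraps a pretrained LM with a single rank-4 LoReFT intervention on one layer's residual stream. Names such as `ReftConfig`, `get_reft_model`, and `LoreftIntervention` follow the library's examples; treat the exact signatures as assumptions that may vary across versions.

```python
import torch
import transformers
import pyreft

# load any HuggingFace causal LM as the frozen base model
model = transformers.AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16, device_map="cuda")

# attach a single rank-4 LoReFT intervention to layer 8's residual stream;
# only the intervention's parameters are trainable
reft_config = pyreft.ReftConfig(representations={
    "layer": 8,
    "component": "block_output",
    "low_rank_dimension": 4,
    "intervention": pyreft.LoreftIntervention(
        embed_dim=model.config.hidden_size, low_rank_dimension=4),
})
reft_model = pyreft.get_reft_model(model, reft_config)
reft_model.print_trainable_parameters()  # a tiny fraction of the base model
```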

@@ -15,7 +24,7 @@ Want to try a fine-tuning method that uses a fraction of SoTA PEFT parameters co
- Sharing fine-tuned results easily on HuggingFace

> [!TIP]
-> **Powerful and Parameter-Efficient:** Read [Our ReFT paper]() for an introduction of representation fine-tuning (ReFT) and its performance.
+> **Powerful and Parameter-Efficient:** Read [our ReFT paper](https://arxiv.org/abs/2404.03592) for an introduction to representation fine-tuning (ReFT) and its performance.
> [!TIP]
> **Interpretable Finetuning:** Read [Composable ReFT](https://github.com/frankaging/pyreft/tree/main/examples/composition) for a sneak peek at the interpretable nature of ReFT.
@@ -203,7 +212,7 @@ Note that Llama-2 models can follow instructions zero-shot. We encourge people t
**Usage and License Notices**: Our chat-model is intended and licensed for research use only. The model is licensed under CC BY-NC 4.0 (allowing only non-commercial use) and should not be used outside of research purposes.


-## Why you should use ReFT as opppose to PEFT?
+## Why you should use ReFT as opposed to PEFT?

There are various benefits, such as saving memory and storage. Beyond that, ReFT is more interpretable and extensible than PEFT: the interventions we learn are simply a causal abstraction of the task you are training for, without touching any model weights. The intervention site search space is large and can include any token position, which makes ReFT more flexible. We showcase ReFT's performance on various benchmarks against popular PEFTs such as LoRA and its newer variants (e.g., DoRA) in our paper.
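
As a concrete example of such an intervention, the paper's LoReFT operator edits a hidden representation h only within a learned low-rank subspace: h ← h + Rᵀ(Wh + b − Rh), where R has orthonormal rows. Below is a minimal PyTorch sketch of that formula; it is an illustration, not the library's implementation, and the class and attribute names are made up.

```python
import torch

class LoreftSketch(torch.nn.Module):
    """Sketch of a LoReFT-style intervention: h <- h + R^T (W h + b - R h)."""

    def __init__(self, embed_dim: int, rank: int):
        super().__init__()
        # R: rank x embed_dim projection, constrained to have orthonormal rows
        self.proj = torch.nn.utils.parametrizations.orthogonal(
            torch.nn.Linear(embed_dim, rank, bias=False))
        # W, b: learned linear map producing the target subspace values
        self.source = torch.nn.Linear(embed_dim, rank)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # edit h only along the rank-dimensional subspace spanned by R's rows;
        # the base model's weights are never touched
        return h + (self.source(h) - self.proj(h)) @ self.proj.weight

h = torch.randn(2, 16, 768)       # (batch, seq, hidden)
edited = LoreftSketch(768, 4)(h)  # same shape, low-rank edit applied
```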

@@ -223,8 +232,8 @@ Make sure you cite the **ReFT** paper:
@article{wuandarora2024reft,
title={ReFT: Representation Finetuning for Language Models},
author={Wu, Zhengxuan* and Arora, Aryaman* and Wang, Zheng and Geiger, Atticus and Jurafsky, Dan and Manning, Christopher D. and Potts, Christopher},
-  booktitle={arXiv:xxxx.xxxxx},
-  url={arxiv.org/abs/xxxx.xxxxx},
+  booktitle={arXiv:2404.03592},
+  url={arxiv.org/abs/2404.03592},
year={2024}
}
```
5 changes: 5 additions & 0 deletions examples/icl/README.md
@@ -0,0 +1,5 @@
# In-context learning (ICL) and ReFT

Based on the notebook [`reft_icl.ipynb`](https://github.com/stanfordnlp/pyreft/blob/main/examples/icl/reft_icl.ipynb).

ReFT requires a minimal number of trainable parameters, but what if ReFT were also quick and easy to adapt to a new task? Here, we explore the limits of ReFT in an ICL-style setting, where training uses only a handful of examples.
3 changes: 3 additions & 0 deletions examples/overhead/README.md
@@ -0,0 +1,3 @@
# Inference-time Overhead Analysis

Based on the notebook [`inference.ipynb`](https://github.com/stanfordnlp/pyreft/blob/main/examples/overhead/inference.ipynb).
Binary file removed examples/plots/plot.pdf
4 changes: 2 additions & 2 deletions examples/plots/plot.py
@@ -83,7 +83,7 @@

df = DataFrame(stats_flat)
df["params"] *= 0.01
df["color"] = df["name"].isin(["LoReFT"])
df["color"] = ~df["name"].isin(["LoReFT"])
df["model"] = df["model"].astype("category")
df["model"].cat.set_categories(MODEL_ORDER, inplace=True)
df["task"] = df["task"].astype("category")
@@ -101,4 +101,4 @@
panel_grid_minor_x=element_blank(), panel_grid_minor_y=element_blank(), axis_text=element_text(size=7),
strip_text=element_text(weight="bold"))
)
plot.save("plot.pdf", width=9, height=4, dpi=300)
plot.save("plot.svg", width=9, height=4, dpi=300)