LLMSurvey

A collection of papers and resources related to Large Language Models.

The organization of the papers follows our survey "A Survey of Large Language Models" (paper page: http://arxiv.org/abs/2303.18223).

Please let us know if you find a mistake or have any suggestions by e-mail: [email protected]

(We suggest also ccing another e-mail, [email protected], in case of any delivery issues.)

If you find our survey useful for your research, please cite the following paper:

@article{LLMSurvey,
    title={A Survey of Large Language Models},
    author={Zhao, Wayne Xin and Zhou, Kun and Li, Junyi and Tang, Tianyi and Wang, Xiaolei and Hou, Yupeng and Min, Yingqian and Zhang, Beichen and Zhang, Junjie and Dong, Zican and Du, Yifan and Yang, Chen and Chen, Yushuo and Chen, Zhipeng and Jiang, Jinhao and Ren, Ruiyang and Li, Yifan and Tang, Xinyu and Liu, Zikang and Liu, Peiyu and Nie, Jian-Yun and Wen, Ji-Rong},
    year={2023},
    journal={arXiv preprint arXiv:2303.18223},
    url={http://arxiv.org/abs/2303.18223}
}

🚀(New) We have released the Chinese book of our survey!

The Chinese book focuses on providing explanations for beginners in the field of LLMs, aiming to present a comprehensive framework and roadmap. It is suitable for senior undergraduate and junior graduate students with a foundation in deep learning, and can serve as an introductory technical book. You can download the Chinese book at https://llmbook-zh.github.io/.

chinese_version

🚀(New) The trends of the number of papers related to LLMs on arXiv

Here are the trends of the cumulative numbers of arXiv papers that contain the keyphrases “language model” (since June 2018) and “large language model” (since October 2019), respectively.

arxiv_llms

The statistics are calculated by exact-match querying of the keyphrases in titles or abstracts, aggregated by month. We set different x-axis ranges for the two keyphrases because research on “language models” began earlier. We label the points corresponding to important landmarks in the research progress of LLMs. A sharp increase occurs after the release of ChatGPT: the average number of arXiv papers per day that contain “large language model” in the title or abstract goes from 0.40 to 8.58.
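As an illustration of the counting procedure described above, here is a minimal Python sketch over hypothetical paper records (the real statistics were computed over arXiv metadata; the record format below is an assumption for illustration):

```python
from collections import Counter
from itertools import accumulate

def cumulative_monthly_counts(papers, keyphrase):
    """Count papers whose title or abstract contains `keyphrase`
    (exact, case-insensitive substring match), group the hits by
    month, then accumulate the counts over time."""
    hits = Counter()
    for p in papers:
        text = (p["title"] + " " + p["abstract"]).lower()
        if keyphrase.lower() in text:
            hits[p["month"]] += 1            # month encoded as "YYYY-MM"
    months = sorted(hits)
    return dict(zip(months, accumulate(hits[m] for m in months)))

# Hypothetical records, for illustration only.
papers = [
    {"title": "A large language model study", "abstract": "...", "month": "2023-01"},
    {"title": "Vision", "abstract": "large language model probing", "month": "2023-02"},
    {"title": "Unrelated", "abstract": "graph networks", "month": "2023-02"},
]
print(cumulative_monthly_counts(papers, "large language model"))
# {'2023-01': 1, '2023-02': 2}
```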

🚀(New) Technical Evolution of GPT-series Models

A brief illustration of the technical evolution of GPT-series models. We plot this figure mainly based on the papers, blog articles, and official APIs from OpenAI. Solid lines denote explicit evidence for the evolution path between two models (e.g., an official statement that a new model is developed based on a base model), while dashed lines denote a relatively weaker evolution relation.

gpt-series

🚀(New) Evolutionary Graph of LLaMA Family

An evolutionary graph of the research work conducted on LLaMA. Due to the huge number of variants, we cannot include all of them in this figure, including much excellent work.

LLaMA_family

To support incremental updates, we share the source file of this figure and welcome readers to add their desired models by submitting pull requests on our GitHub page. If you're interested, please request the source file by application.

🚀(New) Prompts

We collect useful tips for designing prompts, gathered from online notes and our authors' experiences, and also show the related ingredients and principles (introduced in Section 8.1).

prompt examples

Please click here to view more detailed information.

We welcome everyone to contribute more relevant tips in the form of issues. After selection, we will regularly update them on GitHub and credit the sources.

🚀(New) Experiments

Instruction Tuning Experiments

We explore the effect of different types of instructions when fine-tuning LLMs (i.e., LLaMA (7B)), and examine the usefulness of several instruction improvement strategies.

instruction_tuning_table

Please click here to view more detailed information.

Ability Evaluation Experiments

We conduct a fine-grained evaluation on the abilities discussed in Section 7.1 and Section 7.2. For each kind of ability, we select representative tasks and datasets for conducting evaluation experiments to examine the corresponding performance of LLMs.

ability_main

Please click here to view more detailed information.

We also welcome support in the form of computing power for conducting more comprehensive experiments.


Timeline of LLMs

LLMs_timeline

List of LLMs

| Category | Model | Release Time | Size (B) | Link |
| --- | --- | --- | --- | --- |
| Publicly Accessible | T5 | 2019/10 | 11 | Paper |
| | mT5 | 2021/03 | 13 | Paper |
| | PanGu-α | 2021/05 | 13 | Paper |
| | CPM-2 | 2021/05 | 198 | Paper |
| | T0 | 2021/10 | 11 | Paper |
| | GPT-NeoX-20B | 2022/02 | 20 | Paper |
| | CodeGen | 2022/03 | 16 | Paper |
| | Tk-Instruct | 2022/04 | 11 | Paper |
| | UL2 | 2022/02 | 20 | Paper |
| | OPT | 2022/05 | 175 | Paper |
| | YaLM | 2022/06 | 100 | GitHub |
| | NLLB | 2022/07 | 55 | Paper |
| | BLOOM | 2022/07 | 176 | Paper |
| | GLM | 2022/08 | 130 | Paper |
| | Flan-T5 | 2022/10 | 11 | Paper |
| | mT0 | 2022/11 | 13 | Paper |
| | Galactica | 2022/11 | 120 | Paper |
| | BLOOMZ | 2022/11 | 176 | Paper |
| | OPT-IML | 2022/12 | 175 | Paper |
| | Pythia | 2023/01 | 12 | Paper |
| | LLaMA | 2023/02 | 65 | Paper |
| | Vicuna | 2023/03 | 13 | Blog |
| | ChatGLM | 2023/03 | 6 | GitHub |
| | CodeGeeX | 2023/03 | 13 | Paper |
| | Alpaca | 2023/03 | 7 | Blog |
| | Koala | 2023/04 | 13 | Blog |
| | Mistral | 2023/09 | 7 | Blog |
| Closed Source | GShard | 2020/01 | 600 | Paper |
| | GPT-3 | 2020/05 | 175 | Paper |
| | LaMDA | 2021/05 | 137 | Paper |
| | HyperCLOVA | 2021/06 | 82 | Paper |
| | Codex | 2021/07 | 12 | Paper |
| | ERNIE 3.0 | 2021/07 | 10 | Paper |
| | Jurassic-1 | 2021/08 | 178 | Paper |
| | FLAN | 2021/10 | 137 | Paper |
| | MT-NLG | 2021/10 | 530 | Paper |
| | Yuan 1.0 | 2021/10 | 245 | Paper |
| | Anthropic | 2021/12 | 52 | Paper |
| | WebGPT | 2021/12 | 175 | Paper |
| | Gopher | 2021/12 | 280 | Paper |
| | ERNIE 3.0 Titan | 2021/12 | 260 | Paper |
| | GLaM | 2021/12 | 1200 | Paper |
| | InstructGPT | 2022/01 | 175 | Paper |
| | AlphaCode | 2022/02 | 41 | Paper |
| | Chinchilla | 2022/03 | 70 | Paper |
| | PaLM | 2022/04 | 540 | Paper |
| | Cohere | 2022/06 | 54 | Homepage |
| | AlexaTM | 2022/08 | 20 | Paper |
| | Luminous | 2022/09 | 70 | Docs |
| | Sparrow | 2022/09 | 70 | Paper |
| | WeLM | 2022/09 | 10 | Paper |
| | U-PaLM | 2022/10 | 540 | Paper |
| | Flan-PaLM | 2022/10 | 540 | Paper |
| | Flan-U-PaLM | 2022/10 | 540 | Paper |
| | GPT-4 | 2023/03 | - | Paper |
| | PanGu-Σ | 2023/03 | 1085 | Paper |

Paper List

Resources of LLMs

Publicly Available Models

  1. T5: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2020. [Paper] [Checkpoint]
  2. mT5: "mT5: A massively multilingual pre-trained text-to-text transformer". Linting Xue et al. NAACL 2021. [Paper] [Checkpoint]
  3. PanGu-α: "PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation". Wei Zeng et al. arXiv 2021. [Paper] [Checkpoint]
  4. CPM-2: "CPM-2: Large-scale Cost-effective Pre-trained Language Models". Zhengyan Zhang et al. arXiv 2021. [Paper] [Checkpoint]
  5. T0: "Multitask Prompted Training Enables Zero-Shot Task Generalization". Victor Sanh et al. ICLR 2022. [Paper] [Checkpoint]
  6. GPT-NeoX-20B: "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". Sid Black et al. arXiv 2022. [Paper] [Checkpoint]
  7. CodeGen: "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis". Erik Nijkamp et al. arXiv 2022. [Paper] [Checkpoint]
  8. Tk-Instruct: "Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks". Yizhong Wang et al. EMNLP 2022. [Paper] [Checkpoint]
  9. UL2: "UL2: Unifying Language Learning Paradigms". Yi Tay et al. arXiv 2022. [Paper] [Checkpoint]
  10. OPT: "OPT: Open Pre-trained Transformer Language Models". Susan Zhang et al. arXiv 2022. [Paper] [Checkpoint]
  11. NLLB: "No Language Left Behind: Scaling Human-Centered Machine Translation". NLLB Team. arXiv 2022. [Paper] [Checkpoint]
  12. BLOOM: "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". BigScience Workshop. arXiv 2022. [Paper] [Checkpoint]
  13. GLM: "GLM-130B: An Open Bilingual Pre-trained Model". Aohan Zeng et al. arXiv 2022. [Paper] [Checkpoint]
  14. Flan-T5: "Scaling Instruction-Finetuned Language Models". Hyung Won Chung et al. arXiv 2022. [Paper] [Checkpoint]
  15. mT0 && BLOOMZ: "Crosslingual Generalization through Multitask Finetuning". Niklas Muennighoff et al. arXiv 2022. [Paper] [Checkpoint]
  16. Galactica: "Galactica: A Large Language Model for Science". Ross Taylor et al. arXiv 2022. [Paper] [Checkpoint]
  17. OPT-IML: "OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization". Srinivasan Iyer et al. arXiv 2022. [Paper] [Checkpoint]
  18. CodeGeeX: "CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X". Qinkai Zheng et al. arXiv 2023. [Paper] [Checkpoint]
  19. Pythia: "Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling". Stella Biderman et al. arXiv 2023. [Paper] [Checkpoint]
  20. LLaMA: "LLaMA: Open and Efficient Foundation Language Models". Hugo Touvron et al. arXiv 2023. [Paper] [Checkpoint]

Closed-source Models

  1. GShard: "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding". Dmitry Lepikhin et al. ICLR 2021. [Paper]
  2. GPT-3: "Language Models are Few-Shot Learners". Tom B. Brown et al. NeurIPS 2020. [Paper]
  3. LaMDA: "LaMDA: Language Models for Dialog Applications". Romal Thoppilan et al. arXiv 2022. [Paper]
  4. HyperCLOVA: "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers". Boseop Kim et al. EMNLP 2021. [Paper]
  5. Codex: "Evaluating Large Language Models Trained on Code". Mark Chen et al. arXiv 2021. [Paper]
  6. ERNIE 3.0: "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". Yu Sun et al. arXiv 2021. [Paper]
  7. Jurassic-1: "Jurassic-1: Technical details and evaluation". Opher Lieber et al. 2021. [Paper]
  8. FLAN: "Finetuned Language Models Are Zero-Shot Learners". Jason Wei et al. ICLR 2022. [Paper]
  9. MT-NLG: "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". Shaden Smith et al. arXiv 2022. [Paper]
  10. Yuan 1.0: "Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning". Shaohua Wu et al. arXiv 2021. [Paper]
  11. Anthropic: "A General Language Assistant as a Laboratory for Alignment" . Amanda Askell et al. arXiv 2021. [Paper]
  12. WebGPT: "WebGPT: Browser-assisted question-answering with human feedback" . Reiichiro Nakano et al. arXiv 2021. [Paper]
  13. Gopher: "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". Jack W. Rae et al. arXiv 2021. [Paper]
  14. ERNIE 3.0 Titan: "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". Shuohuan Wang et al. arXiv 2021. [Paper]
  15. GLaM: "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts". Nan Du et al. ICML 2022. [Paper]
  16. InstructGPT: "Training language models to follow instructions with human feedback". Long Ouyang et al. arXiv 2022. [Paper]
  17. AlphaCode: "Competition-Level Code Generation with AlphaCode". Yujia Li et al. arXiv 2022. [Paper]
  18. Chinchilla: "Training Compute-Optimal Large Language Models". Jordan Hoffmann et al. arXiv 2022. [Paper]
  19. PaLM: "PaLM: Scaling Language Modeling with Pathways". Aakanksha Chowdhery et al. arXiv 2022. [Paper]
  20. AlexaTM: "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model". Saleh Soltan et al. arXiv 2022. [Paper]
  21. Sparrow: "Improving alignment of dialogue agents via targeted human judgements". Amelia Glaese et al. arXiv 2022. [Paper]
  22. WeLM: "WeLM: A Well-Read Pre-trained Language Model for Chinese". Hui Su et al. arXiv 2022. [Paper]
  23. U-PaLM: "Transcending Scaling Laws with 0.1% Extra Compute". Yi Tay et al. arXiv 2022. [Paper]
  24. Flan-PaLM && Flan-U-PaLM: "Scaling Instruction-Finetuned Language Models". Hyung Won Chung et al. arXiv 2022. [Paper]
  25. GPT-4: "GPT-4 Technical Report". OpenAI. arXiv 2023. [Paper]
  26. PanGu-Σ: "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing". Xiaozhe Ren et al. arXiv 2023. [Paper]

Commonly Used Corpora

  1. BookCorpus: "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books". Yukun Zhu et al. ICCV 2015. [Paper] [Source]
  2. Gutenberg: [Source]
  3. CommonCrawl: [Source]
  4. C4: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2020. [Paper] [Source]
  5. CC-stories-R: "A Simple Method for Commonsense Reasoning". Trieu H. Trinh et al. arXiv 2018. [Paper] [Source]
  6. CC-NEWS: "RoBERTa: A Robustly Optimized BERT Pretraining Approach". Yinhan Liu et al. arXiv 2019. [Paper] [Source]
  7. RealNews: "Defending Against Neural Fake News". Rowan Zellers et al. NeurIPS 2019. [Paper] [Source]
  8. OpenWebText: [Source]
  9. Pushshift.io: "The Pushshift Reddit Dataset". Jason Baumgartner et al. AAAI 2020. [Paper] [Source]
  10. Wikipedia: [Source]
  11. BigQuery: [Source]
  12. The Pile: "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". Leo Gao et al. arXiv 2021. [Paper] [Source]
  13. ROOTS: "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset". Laurençon et al. NeurIPS 2022 Datasets and Benchmarks Track. [paper]

Library Resource

  1. Transformers: "Transformers: State-of-the-Art Natural Language Processing". Thomas Wolf et al. EMNLP 2020. [Paper] [Source]
  2. DeepSpeed: "Deepspeed: System optimizations enable training deep learning models with over 100 billion parameters". Rasley et al. KDD 2020. [Paper] [Source]
  3. Megatron-LM: "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism". Mohammad Shoeybi et al. arXiv 2019. [Paper] [Source]
  4. JAX: [Source]
  5. Colossal-AI: "Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training". Zhengda Bian et al. arXiv 2021. [Paper] [Source]
  6. BMTrain: [Source]
  7. FastMoE: "FastMoE: A Fast Mixture-of-Expert Training System". Jiaao He et al. arXiv 2021. [Paper] [Source]

Deep Learning Frameworks

  1. PyTorch: "PyTorch: An Imperative Style, High-Performance Deep Learning Library". Adam Paszke et al. NeurIPS 2019. [Paper] [Source]
  2. TensorFlow: "TensorFlow: A system for large-scale machine learning". Martín Abadi et al. OSDI 2016. [Paper] [Source]
  3. MXNet: "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems". Tianqi Chen et al. arXiv 2015. [Paper] [Source]
  4. PaddlePaddle: "PaddlePaddle: An Open-Source Deep Learning Platform from Industrial Practice". Yanjun Ma et al. Frontiers of Data and Computing 2019. [Paper] [Source]
  5. MindSpore: "Huawei MindSpore AI Development Framework" . Huawei Technologies Co., Ltd. Artificial Intelligence Technology 2022. [Paper] [Source]
  6. OneFlow: "OneFlow: Redesign the Distributed Deep Learning Framework from Scratch" . Jinhui Yuan et al. arXiv 2021. [Paper] [Source]

Pre-training

Data Collection

  1. "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset". Laurençon et al. NeurIPS 2022 Datasets and Benchmarks Track. [paper]
  2. "Deduplicating Training Data Makes Language Models Better". Katherine Lee et al. ACL 2022. [paper]
  3. "Deduplicating Training Data Mitigates Privacy Risks in Language Models". Nikhil Kandpal et al. ICML 2022. [paper]
  4. "Scaling Laws and Interpretability of Learning from Repeated Data". Danny Hernandez et al. arXiv 2022. [paper]
  5. "A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity". Shayne Longpre et al. arXiv 2023. [paper]

Architecture

Mainstream Architectures

Causal Decoder

  1. "Language Models are Few-Shot Learners". Tom B. Brown et al. NeurIPS 2020. [paper]
  2. "OPT: Open Pre-trained Transformer Language Models". Susan Zhang et al. arXiv 2022. [paper]
  3. "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". Teven Le Scao et al. arXiv 2022. [paper]
  4. "Training Compute-Optimal Large Language Models". Jordan Hoffmann et al. arXiv 2022. [paper]
  5. "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". Jack W. Rae et al. arXiv 2021. [paper]
  6. "Galactica: A Large Language Model for Science". Ross Taylor et al. arXiv 2022. [paper]
  7. "PaLM: Scaling Language Modeling with Pathways". Aakanksha Chowdhery et al. arXiv 2022. [paper]
  8. "Jurassic-1: Technical Details and Evaluation". Opher Lieber et al. AI21 Labs 2021. [paper]
  9. "LaMDA: Language Models for Dialog Applications". Romal Thoppilan et al. arXiv 2022. [paper]

Prefix Decoder

  1. "GLM-130B: An Open Bilingual Pre-trained Model". Aohan Zeng et al. arXiv 2022. [paper]
  2. "GLM: General Language Model Pretraining with Autoregressive Blank Infilling". Zhengxiao Du et al. ACL 2022. [paper]
  3. "Transcending Scaling Laws with 0.1% Extra Compute". Yi Tay et al. arXiv 2022. [paper]
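To illustrate the architectural difference between the two decoder families above, here is a minimal NumPy sketch of their attention masks: in a causal decoder every token attends only to earlier positions, while a prefix decoder lets the prefix (condition) tokens attend bidirectionally among themselves. This is illustrative only; real implementations apply the mask to attention logits.

```python
import numpy as np

def causal_mask(n):
    # Causal decoder: each token attends to itself and earlier positions.
    return np.tril(np.ones((n, n), dtype=bool))

def prefix_mask(n, prefix_len):
    # Prefix decoder: the first `prefix_len` tokens attend bidirectionally
    # among themselves; the remaining tokens keep causal attention.
    mask = causal_mask(n)
    mask[:prefix_len, :prefix_len] = True
    return mask

print(causal_mask(4).astype(int))
print(prefix_mask(4, prefix_len=2).astype(int))
```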

MoE

  1. "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity". William Fedus et al. JMLR 2022. [paper]
  2. "Unified Scaling Laws for Routed Language Models". Aidan Clark et al. ICML 2022. [paper]

SSM

  1. "Pretraining Without Attention". Junxiong Wang et al. arXiv 2022. [paper]
  2. "Efficiently Modeling Long Sequences with Structured State Spaces". Albert Gu et al. ICLR 2022. [paper]
  3. "Long Range Language Modeling via Gated State Spaces". Harsh Mehta et al. arXiv 2022. [paper]
  4. "Hungry Hungry Hippos: Towards Language Modeling with State Space Models". Daniel Y. Fu et al. ICLR 2023. [paper]

Detailed Configuration

Layer Normalization

  1. RMSNorm: "Root Mean Square Layer Normalization". Biao Zhang et al. NeurIPS 2019. [paper]
  2. DeepNorm: "DeepNet: Scaling Transformers to 1,000 Layers". Hongyu Wang et al. arXiv 2022. [paper]
  3. Sandwich-LN: "CogView: Mastering Text-to-Image Generation via Transformers". Ming Ding et al. NeurIPS 2021. [paper]
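As a concrete reference for the first entry above, a minimal NumPy sketch of RMSNorm: unlike LayerNorm, there is no mean subtraction and no bias term, only a rescaling by the root mean square of the activations.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm (Zhang & Sennrich, 2019): divide by the root mean square
    of the last-axis activations, then apply a learned gain."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain

x = np.array([[3.0, 4.0]])
out = rms_norm(x, gain=np.ones(2))
# RMS of [3, 4] is sqrt((9 + 16) / 2) ≈ 3.5355, so out ≈ [[0.8485, 1.1314]]
```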

Position Encoding

  1. T5 bias: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2020. [paper]
  2. ALiBi: "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation". Ofir Press et al. ICLR 2022. [paper]
  3. RoPE: "RoFormer: Enhanced Transformer with Rotary Position Embedding". Jianlin Su et al. arXiv 2021. [paper]
  4. xPos: "A Length-Extrapolatable Transformer". Yutao Sun et al. arXiv 2022. [paper]
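For reference, a minimal NumPy sketch of RoPE as described in the RoFormer paper: consecutive feature pairs are rotated by position-dependent angles, so inner products between rotated queries and keys depend only on the relative position. Shapes and naming here are illustrative, not the paper's implementation.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotary position embedding: rotate consecutive pairs of feature
    dimensions by an angle proportional to the token position.
    `x` has shape (..., d) with d even."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)   # one frequency per pair
    theta = pos * inv_freq
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = np.array([1.0, 0.0, 1.0, 0.0])
# Position 0 leaves the vector unchanged; later positions rotate it.
assert np.allclose(rope(q, pos=0), q)
```

Because rotation preserves norms and composes additively, the dot product between two rotated vectors depends only on their position offset, which is the property that enables length extrapolation studies.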

Attention

  1. Multi-query attention: "Fast Transformer Decoding: One Write-Head is All You Need". Noam Shazeer. arXiv 2019. [paper]
  2. FlashAttention: "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness". Tri Dao et al. NeurIPS 2022. [paper]
  3. PagedAttention: "vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention". Woosuk Kwon et al. 2023. Paper (stay tuned) [Official Website]
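A minimal NumPy sketch of multi-query attention (the first entry above): all query heads share a single key/value head, which shrinks the KV cache at decoding time. Masking, batching, and projections are omitted for brevity; shapes are illustrative.

```python
import numpy as np

def multi_query_attention(q, k, v):
    """Multi-query attention (Shazeer, 2019).
    q: (heads, seq, d) -- one query per head.
    k, v: (seq, d)     -- a single shared key/value head."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                       # (heads, seq, seq)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ v                                  # (heads, seq, d)

rng = np.random.default_rng(0)
out = multi_query_attention(rng.normal(size=(8, 5, 16)),
                            rng.normal(size=(5, 16)),
                            rng.normal(size=(5, 16)))
print(out.shape)  # (8, 5, 16)
```

The design choice: with h heads, the KV cache shrinks by a factor of h compared to standard multi-head attention, at a small quality cost.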

Analysis

  1. "What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?". Thomas Wang et al. ICML 2022. [paper]
  2. "What Language Model to Train if You Have One Million GPU Hours?". Teven Le Scao et al. Findings of EMNLP 2022. [paper]
  3. "Examining Scaling and Transfer of Language Model Architectures for Machine Translation". Biao Zhang et al. ICML 2022. [paper]
  4. "Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?". Yi Tay et al. arXiv 2022. [paper]
  5. "Do Transformer Modifications Transfer Across Implementations and Applications?". Sharan Narang et al. EMNLP 2021. [paper]

Training Algorithms

  1. "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism". Mohammad Shoeybi et al. arXiv 2019. [paper]
  2. "An Efficient 2D Method for Training Super-Large Deep Learning Models". Qifan Xu et al. arXiv 2021. [paper]
  3. "Tesseract: Parallelize the Tensor Parallelism Efficiently". Boxiang Wang et al. ICPP 2022. [paper]
  4. "Maximizing Parallelism in Distributed Training for Huge Neural Networks". Zhengda Bian et al. arXiv 2021. [paper]
  5. "GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism". Yanping Huang et al. NeurIPS 2019. [paper]
  6. "PipeDream: Fast and Efficient Pipeline Parallel DNN Training". Aaron Harlap et al. arXiv 2018. [paper]
  7. "ZeRO: Memory Optimizations Toward Training Trillion Parameter Models". Samyam Rajbhandari et al. SC 2020. [paper]
  8. "ZeRO-Offload: Democratizing Billion-Scale Model Training". Jie Ren et al. USENIX 2021. [paper]

Pre-training on Code

LLMs for Program Synthesis

  1. "Evaluating Large Language Models Trained on Code". Mark Chen et al. arXiv 2021. [paper]
  2. "Program Synthesis with Large Language Models". Jacob Austin et al. arXiv 2021. [paper]
  3. "Show Your Work: Scratchpads for Intermediate Computation with Language Models". Maxwell Nye et al. arXiv 2021. [paper]
  4. "A Systematic Evaluation of Large Language Models of Code". Frank F. Xu et al. arXiv 2022. [paper]
  5. "Competition-Level Code Generation with AlphaCode". Yujia Li et al. Science 2022. [paper]
  6. "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis". Erik Nijkamp et al. ICLR 2023. [paper]
  7. "InCoder: A Generative Model for Code Infilling and Synthesis". Daniel Fried et al. ICLR 2023. [paper]
  8. "CodeT: Code Generation with Generated Tests". Bei Chen et al. ICLR 2023. [paper]
  9. "StarCoder: may the source be with you!". Raymond Li et al. arXiv 2023. [paper]

NLP Tasks Formatted as Code

  1. "Language Models of Code are Few-Shot Commonsense Learners". Aman Madaan et al. EMNLP 2022. [paper]
  2. "Autoformalization with Large Language Models". Yuhuai Wu et al. NeurIPS 2022. [paper]

Adaptation Tuning

Instruction Tuning

  1. "Multi-Task Deep Neural Networks for Natural Language Understanding". Xiaodong Liu et al. ACL 2019. [Paper] [Homepage]
  2. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2020. [Paper] [Checkpoint]
  3. "Muppet: Massive Multi-task Representations with Pre-Finetuning". Armen Aghajanyan et al. EMNLP 2021. [Paper] [Checkpoint]
  4. "Cross-Task Generalization via Natural Language Crowdsourcing Instructions". Swaroop Mishra et al. ACL 2022. [Paper] [Collection]
  5. "Finetuned Language Models Are Zero-Shot Learners". Jason Wei et al. ICLR 2022. [Paper] [Homepage]
  6. "Multitask Prompted Training Enables Zero-Shot Task Generalization". Victor Sanh et al. ICLR 2022. [Paper] [Checkpoint]
  7. "PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts". Stephen H. Bach et al. ACL 2022. [Paper] [Collection]
  8. "Training language models to follow instructions with human feedback". Long Ouyang et al. arXiv 2022. [Paper]
  9. "Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks". Yizhong Wang et al. EMNLP 2022. [Paper] [Collection] [Checkpoint]
  10. "MVP: Multi-task Supervised Pre-training for Natural Language Generation". Tianyi Tang et al. arXiv 2022. [Paper] [Collection] [Checkpoint]
  11. "Crosslingual Generalization through Multitask Finetuning". Niklas Muennighoff et al. arXiv 2022. [Paper] [Collection] [Checkpoint]
  12. "Scaling Instruction-Finetuned Language Models". Hyung Won Chung et al. arXiv 2022. [Paper] [Homepage]
  13. "Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor". Or Honovich et al. arXiv 2022. [Paper] [Homepage]
  14. "Self-Instruct: Aligning Language Model with Self Generated Instructions". Yizhong Wang et al. arXiv 2022. [Paper] [Homepage]
  15. "OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization". Srinivasan Iyer et al. arXiv 2022. [Paper] [Checkpoint]
  16. "The Flan Collection: Designing Data and Methods for Effective Instruction Tuning". Shayne Longpre et al. arXiv 2023. [Paper] [Homepage]
  17. "Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning". Renze Lou et al. arXiv 2023. [Paper]
  18. "Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning". Hao Chen et al. arXiv 2023. [Paper]
  19. "LIMA: Less Is More for Alignment". Chunting Zhou et al. arXiv 2023. [Paper]
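To make the data format used in instruction tuning concrete, here is a minimal sketch of turning an (instruction, input, output) instance into a single training string. The template follows the Alpaca style; the exact wording is one of many possible choices, not a prescribed standard.

```python
# Alpaca-style template; any particular phrasing is an illustrative choice.
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

def format_instance(instruction, inp, output):
    """Flatten one supervised instance into a training string."""
    return TEMPLATE.format(instruction=instruction, input=inp, output=output)

text = format_instance("Translate to French.", "Good morning.", "Bonjour.")
print(text)
```

During fine-tuning, the loss is usually computed only on the response tokens, so the model learns to produce outputs conditioned on the instruction and input.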

Alignment Tuning

  1. "TAMER: Training an Agent Manually via Evaluative Reinforcement". W. Bradley Knox et al. ICDL 2008. [Paper]
  2. "Interactive Learning from Policy-Dependent Human Feedback". James MacGlashan et al. ICML 2017. [Paper]
  3. "Deep Reinforcement Learning from Human Preferences". Paul Christiano et al. NIPS 2017. [Paper]
  4. "Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces". Garrett Warnell et al. AAAI 2018. [Paper]
  5. "Fine-Tuning Language Models from Human Preferences". Daniel M. Ziegler et al. arXiv 2019. [Paper]
  6. "Learning to summarize from human feedback". Nisan Stiennon et al. NeurIPS 2020. [Paper]
  7. "Alignment of Language Agents". Zachary Kenton et al. arXiv 2021. [Paper]
  8. "Recursively Summarizing Books with Human Feedback". Jeff Wu et al. arXiv 2021. [Paper]
  9. "A General Language Assistant as a Laboratory for Alignment". Amanda Askell et al. arXiv 2021. [Paper]
  10. "WebGPT: Browser-assisted question-answering with human feedback". Reiichiro Nakano et al. arXiv 2021. [Paper]
  11. "Training language models to follow instructions with human feedback". Long Ouyang et al. arXiv 2022. [Paper]
  12. "Teaching language models to support answers with verified quotes". Jacob Menick et al. arXiv 2022. [Paper]
  13. "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". Yuntao Bai et al. arXiv 2022. [Paper]
  14. "Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning". Deborah Cohen et al. arXiv 2022. [Paper]
  15. "Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned". Deep Ganguli et al. arXiv 2022. [Paper]
  16. "Improving alignment of dialogue agents via targeted human judgements". Amelia Glaese et al. arXiv 2022. [Paper]
  17. "Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization". Rajkumar Ramamurthy et al. arXiv 2022. [Paper]
  18. "Scaling Laws for Reward Model Overoptimization". Leo Gao et al. arXiv 2022. [Paper]
  19. "The Wisdom of Hindsight Makes Language Models Better Instruction Followers". Tianjun Zhang et al. arXiv 2023. [Paper]
  20. "RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment". Hanze Dong et al. arXiv 2023. [Paper]
  21. "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment". Rishabh Bhardwaj et al. arXiv 2023. [Paper]

Parameter-Efficient Model Adaptation

  1. "Parameter-Efficient Transfer Learning for NLP". Neil Houlsby et al. ICML 2019. [Paper] [GitHub]
  2. "MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer". Jonas Pfeiffer et al. EMNLP 2020. [Paper] [GitHub]
  3. "AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts". Taylor Shin et al. EMNLP 2020. [Paper] [GitHub]
  4. "Prefix-Tuning: Optimizing Continuous Prompts for Generation". Xiang Lisa Li et al. ACL 2021. [Paper] [GitHub]
  5. "GPT Understands, Too". Xiao Liu et al. arXiv 2021. [Paper] [GitHub]
  6. "The Power of Scale for Parameter-Efficient Prompt Tuning". Brian Lester et al. EMNLP 2021. [Paper]
  7. "LoRA: Low-Rank Adaptation of Large Language Models". Edward J. Hu et al. arXiv 2021. [Paper] [GitHub]
  8. "Towards a Unified View of Parameter-Efficient Transfer Learning". Junxian He et al. ICLR 2022. [Paper] [GitHub]
  9. "P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks". Xiao Liu et al. ACL 2022. [Paper] [GitHub]
  10. "DyLoRA: Parameter-Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation". Mojtaba Valipour et al. EACL 2023. [Paper] [GitHub]
  11. "Parameter-efficient fine-tuning of large-scale pre-trained language models". Ning Ding et al. Nature Machine Intelligence 2023. [Paper] [GitHub]
  12. "Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning". Qingru Zhang et al. arXiv 2023. [Paper] [GitHub]
  13. "LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention". Renrui Zhang et al. arXiv 2023. [Paper] [GitHub]
  14. "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models". Zhiqiang Hu et al. arXiv 2023. [Paper] [GitHub]
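A minimal NumPy sketch of the LoRA idea from the list above: the pre-trained weight W is frozen, and a low-rank update B·A (rank r ≪ d) is learned, scaled by alpha / r. Initialization follows the paper's convention (A small random, B zero), so training starts as an exact no-op on the base model.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Forward pass with a LoRA adapter: x @ (W + scale * B @ A).T,
    computed without materializing the merged weight."""
    r = A.shape[0]
    return x @ W.T + (x @ A.T) @ B.T * (alpha / r)

d_in, d_out, r = 16, 16, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))      # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                # trainable, zero init
x = rng.normal(size=(2, d_in))

# With B = 0 the adapter contributes nothing: identical to the base model.
assert np.allclose(lora_forward(x, W, A, B, alpha=8), x @ W.T)
```

Only A and B (2·d·r parameters per layer, versus d² for full fine-tuning) are updated, which is why LoRA adapters are cheap to store and swap.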

Memory-Efficient Model Adaptation

  1. "A Survey of Quantization Methods for Efficient Neural Network Inference". Amir Gholami et al. arXiv 2021. [Paper]
  2. "8-bit Optimizers via Block-wise Quantization". Tim Dettmers et al. arXiv 2021. [Paper]
  3. "Compression of Generative Pre-trained Language Models via Quantization". Chaofan Tao et al. ACL 2022. [Paper]
  4. "ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers". Zhewei Yao et al. NeurIPS 2022. [Paper] [GitHub]
  5. "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale". Tim Dettmers et al. arXiv 2022. [Paper] [GitHub]
  6. "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers". Elias Frantar et al. ICLR 2023. [Paper] [GitHub]
  7. "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models". Guangxuan Xiao et al. arXiv 2022. [Paper] [GitHub]
  8. "The case for 4-bit precision: k-bit Inference Scaling Laws". Tim Dettmers et al. arXiv 2022. [Paper]
  9. "ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation". Zhewei Yao et al. arXiv 2023. [Paper]
  10. "QLoRA: Efficient Finetuning of Quantized LLMs". Tim Dettmers et al. arXiv 2023. [Paper] [GitHub]
  11. "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models". Zechun Liu et al. arXiv 2023. [Paper]
  12. "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration". Ji Lin et al. arXiv 2023. [Paper] [GitHub]
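As a building block common to several of the methods above, a minimal sketch of absmax (symmetric) int8 quantization. Real methods such as LLM.int8(), GPTQ, or AWQ add per-column scaling, outlier handling, or second-order weight updates on top of this basic scheme.

```python
import numpy as np

def quantize_int8(w):
    """Absmax (symmetric) quantization: map the largest-magnitude value
    to ±127 and round everything else onto the int8 grid."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.1, -0.5, 0.25, 1.27], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Per-element reconstruction error is bounded by half a quantization step.
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```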

Utilization

In-Context Learning (ICL)

  1. "An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels". Taylor Sorensen et al. ACL 2022. [Paper]
  2. "What Makes Good In-Context Examples for GPT-3?". Jiachang Liu et al. ACL 2022. [Paper]
  3. "Learning to retrieve prompts for in-context learning". Ohad Rubin et al. NAACL 2022. [Paper]
  4. "Diverse demonstrations improve in-context compositional generalization". Itay Levy et al. arXiv 2022. [Paper]
  5. "Demystifying Prompts in Language Models via Perplexity Estimation". Hila Gonen et al. arXiv 2022. [Paper]
  6. "Active Example Selection for In-Context Learning". Yiming Zhang et al. EMNLP 2022. [Paper]
  7. "Self-adaptive In-context Learning". Zhiyong Wu et al. arXiv 2022. [Paper]
  8. "Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity". Yao Lu et al. ACL 2022. [Paper]
  9. "Structured Prompting: Scaling In-Context Learning to 1,000 Examples". Yaru Hao et al. arXiv 2022. [Paper]
  10. "The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning". Xi Ye et al. arXiv 2022. [Paper]
  11. "Cross-Task Generalization via Natural Language Crowdsourcing Instructions". Swaroop Mishra et al. ACL 2022. [Paper]
  12. "Prompt-Augmented Linear Probing: Scaling Beyond the Limit of Few-shot In-Context Learner". Hyunsoo Cho et al. arXiv 2022. [Paper]
  13. "An Explanation of In-context Learning as Implicit Bayesian Inference". Sang Michael Xie et al. ICLR 2022. [Paper]
  14. "Calibrate Before Use: Improving Few-Shot Performance of Language Models". Zihao Zhao et al. ICML 2021. [Paper]
  15. "Data distributional properties drive emergent in-context learning in transformers". Stephanie C. Y. Chan et al. arXiv 2022. [Paper]
  16. "In-context Learning and Induction Heads". Catherine Olsson et al. arXiv 2022. [Paper]
  17. "On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model". Seongjin Shin et al. NAACL 2022. [Paper]
  18. "Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?". Sewon Min et al. EMNLP 2022. [Paper]
  19. "Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale". Hritik Bansal et al. arXiv 2022. [Paper]
  20. "Transformers as algorithms: Generalization and implicit model selection in in-context learning". Yingcong Li et al. arXiv 2023. [Paper]
  21. "Transformers learn in-context by gradient descent". Johannes von Oswald et al. arXiv 2022. [Paper]
  22. "What learning algorithm is in-context learning? investigations with linear models". Ekin Akyürek et al. arXiv 2022. [Paper]
  23. "A Survey for In-context Learning". Qingxiu Dong et al. arXiv 2023. [Paper]
  24. "What In-Context Learning 'Learns' In-Context: Disentangling Task Recognition and Task Learning". Jane Pan et al. arXiv 2023. [Paper]
  25. "The Learnability of In-Context Learning". Noam Wies et al. arXiv 2023. [Paper]
  26. "Do Prompt-Based Models Really Understand the Meaning of Their Prompts?". Albert Webson et al. NAACL 2022. [Paper]
  27. "Larger language models do in-context learning differently". Jerry Wei et al. arXiv 2023. [Paper]
  28. "Meta-in-context learning in large language models". Julian Coda-Forno et al. arXiv 2023. [Paper]
  29. "Symbol tuning improves in-context learning in language models". Jerry Wei et al. arXiv 2023. [Paper]
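Several of the papers above study how the choice, order, and formatting of demonstrations affect ICL performance. As a concrete anchor, here is a minimal sketch of how a few-shot ICL prompt is typically assembled; the function name, template, and examples are our own illustration, not taken from any one paper:

```python
def format_icl_prompt(demonstrations, query, task_instruction=""):
    """Assemble a few-shot in-context learning prompt.

    Each demonstration is an (input, output) pair; the query is appended
    with an empty answer slot for the model to complete.
    """
    parts = [task_instruction] if task_instruction else []
    for x, y in demonstrations:
        parts.append(f"Input: {x}\nOutput: {y}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

demos = [("great movie!", "positive"), ("waste of time.", "negative")]
prompt = format_icl_prompt(
    demos, "an instant classic.",
    task_instruction="Classify the sentiment of each review.")
print(prompt)
```

Demonstration-selection methods (e.g. retrieval-based or perplexity-based selection above) decide which pairs populate `demonstrations`; ordering methods decide their sequence within the prompt.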

Chain-of-Thought Reasoning (CoT)

  1. "Automatic Chain of Thought Prompting in Large Language Models". Zhuosheng Zhang et al. arXiv 2022. [Paper]
  2. "Chain of Thought Prompting Elicits Reasoning in Large Language Models". Jason Wei et al. arXiv 2022. [Paper]
  3. "STaR: Bootstrapping Reasoning With Reasoning". Eric Zelikman et al. arXiv 2022. [Paper]
  4. "Large language models are zero-shot reasoners". Takeshi Kojima et al. arXiv 2022. [Paper]
  6. "Complexity-Based Prompting for Multi-Step Reasoning". Yao Fu et al. arXiv 2022. [Paper]
  7. "Language Models are Multilingual Chain-of-Thought Reasoners". Freda Shi et al. arXiv 2022. [Paper]
  8. "Rationale-Augmented Ensembles in Language Models". Xuezhi Wang et al. arXiv 2022. [Paper]
  9. "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models". Denny Zhou et al. arXiv 2022. [Paper]
  10. "Multimodal Chain-of-Thought Reasoning in Language Models". Zhuosheng Zhang et al. arXiv 2023. [Paper]
  11. "Self-Consistency Improves Chain of Thought Reasoning in Language Models". Xuezhi Wang et al. arXiv 2022. [Paper]
  12. "Large Language Models Can Self-Improve". Jiaxin Huang et al. arXiv 2022. [Paper]
  13. "Training Verifiers to Solve Math Word Problems". Karl Cobbe et al. arXiv 2021. [Paper]
  14. "On the Advance of Making Language Models Better Reasoners". Yifei Li et al. arXiv 2022. [Paper]
  15. "Large Language Models are reasoners with Self-Verification". Yixuan Weng et al. arXiv 2022. [Paper]
  16. "Teaching small language models to reason". Lucie Charlotte Magister et al. arXiv 2022. [Paper]
  17. "Large language models are reasoning teachers". Namgyu Ho et al. arXiv 2022. [Paper]
  18. "The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning". Xi Ye et al. arXiv 2022. [Paper]
  19. "Scaling Instruction-Finetuned Language Models". Hyung Won Chung et al. arXiv 2022. [Paper]
  20. "Solving Quantitative Reasoning Problems with Language Models". Aitor Lewkowycz et al. arXiv 2022. [Paper]
  21. "Text and patterns: For effective chain of thought, it takes two to tango". Aman Madaan et al. arXiv 2022. [Paper]
  22. "Challenging BIG-Bench tasks and whether chain-of-thought can solve them". Mirac Suzgun et al. arXiv 2022. [Paper]
  23. "Reasoning with Language Model Prompting: A Survey". Shuofei Qiao et al. arXiv 2022. [Paper]
  24. "Towards Reasoning in Large Language Models: A Survey". Jie Huang et al. arXiv 2022. [Paper]
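Two recurring recipes in the list above are the zero-shot CoT trigger ("Large language models are zero-shot reasoners") and self-consistency decoding ("Self-Consistency Improves Chain of Thought Reasoning in Language Models"). A minimal sketch of both, with helper names of our own choosing:

```python
from collections import Counter

def zero_shot_cot_prompt(question):
    # Zero-shot CoT (Kojima et al., 2022): append a reasoning trigger
    # so the model generates intermediate steps before the answer.
    return f"Q: {question}\nA: Let's think step by step."

def self_consistency(final_answers):
    # Self-consistency (Wang et al., 2022): sample several chains of
    # thought, parse the final answer from each, take the majority vote.
    return Counter(final_answers).most_common(1)[0][0]

print(zero_shot_cot_prompt("A farm has 3 cows and buys 2 more. How many cows now?"))
print(self_consistency(["5", "5", "6", "5"]))  # prints "5"
```

In practice the sampled answers come from decoding the same CoT prompt multiple times with a nonzero temperature; the aggregation step itself is model-free.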

Planning for Complex Task Solving

  1. "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models". Denny Zhou et al. ICLR 2023. [Paper]
  2. "PAL: Program-aided Language Models". Luyu Gao et al. ICML 2023. [Paper]
  3. "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models". Lei Wang et al. ACL 2023. [Paper]
  4. "ProgPrompt: Generating Situated Robot Task Plans using Large Language Models". Ishika Singh et al. ICRA 2023. [Paper]
  5. "Tree of Thoughts: Deliberate Problem Solving with Large Language Models". Shunyu Yao et al. arXiv 2023. [Paper]
  6. "Voyager: An Open-Ended Embodied Agent with Large Language Models". Guanzhi Wang et al. arXiv 2023. [Paper]
  7. "Reflexion: Language Agents with Verbal Reinforcement Learning". Noah Shinn et al. arXiv 2023. [Paper]
  8. "Multimodal Procedural Planning via Dual Text-Image Prompting". Yujie Lu et al. arXiv 2023. [Paper]
  9. "Self-planning Code Generation with Large Language Model". Xue Jiang et al. arXiv 2023. [Paper]
  10. "Decomposed Prompting: A Modular Approach for Solving Complex Tasks". Tushar Khot et al. ICLR 2023. [Paper]
  11. "Toolformer: Language Models Can Teach Themselves to Use Tools". Timo Schick et al. arXiv 2023. [Paper]
  12. "HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face". Yongliang Shen et al. arXiv 2023. [Paper]
  13. "Faithful Chain-of-Thought Reasoning". Qing Lyu et al. arXiv 2023. [Paper]
  14. "LLM+P: Empowering Large Language Models with Optimal Planning Proficiency". Bo Liu et al. arXiv 2023. [Paper]
  15. "Reasoning with Language Model is Planning with World Model". Shibo Hao et al. arXiv 2023. [Paper]
  16. "Generative Agents: Interactive Simulacra of Human Behavior". Joon Sung Park et al. arXiv 2023. [Paper]
  17. "ReAct: Synergizing Reasoning and Acting in Language Models". Shunyu Yao et al. ICLR 2023. [Paper]
  18. "ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models". Zhipeng Chen et al. arXiv 2023. [Paper]
  19. "Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents". Zihao Wang et al. arXiv 2023. [Paper]
  20. "AdaPlanner: Adaptive Planning from Feedback with Language Models". Haotian Sun et al. arXiv 2023. [Paper]
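Many of the methods above (e.g. Plan-and-Solve, Decomposed Prompting) share a plan-then-execute loop: the model first drafts a plan, then executes each step conditioned on the results so far. A schematic sketch of that loop, with a toy stand-in for the model call (all names and prompts here are our own illustration):

```python
def plan_and_solve(task, llm):
    """Schematic plan-then-execute loop; `llm` is any text-in/text-out
    callable standing in for a real model API."""
    plan = llm(f"Devise a step-by-step plan for: {task}")
    steps = [s for s in plan.splitlines() if s.strip()]
    results = []
    for step in steps:
        context = "\n".join(results)
        # Each step sees the task plus everything completed so far.
        results.append(llm(f"Task: {task}\nDone so far:\n{context}\nNow do: {step}"))
    return results[-1] if results else ""

# A toy "model" that just echoes structured strings, so the loop runs.
def toy_llm(prompt):
    if prompt.startswith("Devise"):
        return "1. restate the task\n2. give the answer"
    return f"[completed] {prompt.splitlines()[-1]}"

print(plan_and_solve("add 2 and 3", toy_llm))
```

Feedback-based variants (Reflexion, AdaPlanner) extend this loop by feeding execution errors back into the planner and revising the remaining steps.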

Capacity Evaluation

  1. "Measuring Massive Multitask Language Understanding". Dan Hendrycks et al. ICLR 2021. [Paper]
  2. "Persistent Anti-Muslim Bias in Large Language Models". Abubakar Abid et al. AIES 2021. [Paper]
  3. "Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models". Alex Tamkin et al. arXiv 2021. [Paper]
  4. "BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments". Sanjana Srivastava et al. CoRL 2021. [Paper]
  5. "Program Synthesis with Large Language Models". Jacob Austin et al. arXiv 2021. [Paper]
  6. "Training Verifiers to Solve Math Word Problems". Karl Cobbe et al. arXiv 2021. [Paper]
  7. "Show Your Work: Scratchpads for Intermediate Computation with Language Models". Maxwell I. Nye et al. arXiv 2021. [Paper]
  8. "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents". Wenlong Huang et al. ICML 2022. [Paper]
  9. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". Jason Wei et al. NeurIPS 2022. [Paper]
  10. "Training language models to follow instructions with human feedback". Long Ouyang et al. arXiv 2022. [Paper]
  11. "Competition-Level Code Generation with AlphaCode". Yujia Li et al. Science 2022. [Paper]
  12. "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances". Michael Ahn et al. arXiv 2022. [Paper]
  13. "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". Yuntao Bai et al. arXiv 2022. [Paper]
  14. "Autoformalization with Large Language Models". Yuhuai Wu et al. NeurIPS 2022. [Paper]
  15. "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models". Aarohi Srivastava et al. arXiv 2022. [Paper]
  16. "Exploring Length Generalization in Large Language Models". Cem Anil et al. NeurIPS 2022. [Paper]
  17. "Few-shot Learning with Retrieval Augmented Language Models". Gautier Izacard et al. arXiv 2022. [Paper]
  18. "Limitations of Language Models in Arithmetic and Symbolic Induction". Jing Qian et al. arXiv 2022. [Paper]
  19. "Code as Policies: Language Model Programs for Embodied Control". Jacky Liang et al. arXiv 2022. [Paper]
  20. "ProgPrompt: Generating Situated Robot Task Plans using Large Language Models". Ishika Singh et al. arXiv 2022. [Paper]
  21. "Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans". John J. Nay et al. arXiv 2022. [Paper]
  22. "Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought". Abulhair Saparov et al. ICLR 2023. [Paper]
  23. "Language Models are Multilingual Chain-of-Thought Reasoners". Freda Shi et al. ICLR 2023. [Paper]
  24. "Re3: Generating Longer Stories With Recursive Reprompting and Revision". Kevin Yang et al. EMNLP 2022. [Paper]
  25. "Language Models of Code are Few-Shot Commonsense Learners". Aman Madaan et al. EMNLP 2022. [Paper]
  26. "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them". Mirac Suzgun et al. arXiv 2022. [Paper]
  27. "Large Language Models Can Self-Improve". Jiaxin Huang et al. arXiv 2022. [Paper]
  28. "Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs". Albert Q. Jiang et al. ICLR 2023. [Paper]
  29. "Holistic Evaluation of Language Models". Percy Liang et al. arXiv 2022. [Paper]
  30. "PAL: Program-aided Language Models". Luyu Gao et al. arXiv 2022. [Paper]
  31. "Legal Prompt Engineering for Multilingual Legal Judgement Prediction". Dietrich Trautmann et al. arXiv 2022. [Paper]
  32. "How Does ChatGPT Perform on the Medical Licensing Exams? The Implications of Large Language Models for Medical Education and Knowledge Assessment". Aidan Gilson et al. medRxiv 2022. [Paper]
  33. "ChatGPT: The End of Online Exam Integrity?". Teo Susnjak et al. arXiv 2022. [Paper]
  34. "Large Language Models are reasoners with Self-Verification". Yixuan Weng et al. arXiv 2022. [Paper]
  35. "Self-Instruct: Aligning Language Model with Self Generated Instructions". Yizhong Wang et al. arXiv 2022. [Paper]
  36. "ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports". Katharina Jeblick et al. arXiv 2022. [Paper]
  37. "The End of Programming". Matt Welsh et al. ACM 2023. [Paper]
  38. "Chatgpt goes to law school". Choi Jonathan H et al. SSRN 2023. [Paper]
  39. "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection". Biyang Guo et al. arXiv 2023. [Paper]
  40. "Is ChatGPT A Good Translator? A Preliminary Study". Wenxiang Jiao et al. arXiv 2023. [Paper]
  41. "Could an Artificial-Intelligence agent pass an introductory physics course?". Gerd Kortemeyer et al. arXiv 2023. [Paper]
  42. "Mathematical Capabilities of ChatGPT". Simon Frieder et al. arXiv 2023. [Paper]
  43. "Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models". Zhihong Shao et al. arXiv 2023. [Paper]
  44. "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning". Thomas Carta et al. arXiv 2023. [Paper]
  45. "Evaluating ChatGPT as an Adjunct for Radiologic Decision-Making". Arya Yao et al. medRxiv 2023. [Paper]
  46. "Theory of Mind May Have Spontaneously Emerged in Large Language Models". Michal Kosinski et al. arXiv 2023. [Paper]
  47. "A Categorical Archive of ChatGPT Failures". Ali Borji et al. arXiv 2023. [Paper]
  48. "A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity". Yejin Bang et al. arXiv 2023. [Paper]
  49. "Toolformer: Language Models Can Teach Themselves to Use Tools". Timo Schick et al. arXiv 2023. [Paper]
  50. "Is ChatGPT a General-Purpose Natural Language Processing Task Solver?". Chengwei Qin et al. arXiv 2023. [Paper]
  51. "How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation". Hendy Amr et al. arXiv 2023. [Paper]
  52. "Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT". Qihuang Zhong et al. arXiv 2023. [Paper]
  53. "Zero-Shot Information Extraction via Chatting with ChatGPT". Xiang Wei et al. arXiv 2023. [Paper]
  54. "ChatGPT: Jack of all trades, master of none". Jan Kocon et al. arXiv 2023. [Paper]
  55. "On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective". Jindong Wang et al. arXiv 2023. [Paper]
  56. "Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback". Baolin Peng et al. arXiv 2023. [Paper]
  57. "An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP)". Paulo Shakarian et al. arXiv 2023. [Paper]
  58. "How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks". Chen Xuanting et al. arXiv 2023. [Paper]
  59. "The utility of ChatGPT for cancer treatment information". Shen Chen et al. medRxiv 2023. [Paper]
  60. "Can ChatGPT Assess Human Personalities? A General Evaluation Framework". Haocong Rao et al. arXiv 2023. [Paper]
  61. "Will Affective Computing Emerge from Foundation Models and General AI? A First Evaluation on ChatGPT.". Mostafa M. Amin et al. arXiv 2023. [Paper]
  62. "Exploring the Feasibility of ChatGPT for Event Extraction.". Jun Gao et al. arXiv 2023. [Paper]
  63. "Does Synthetic Data Generation of LLMs Help Clinical Text Mining?". Tang Ruixiang et al. arXiv 2023. [Paper]
  64. "Consistency Analysis of ChatGPT". Myeongjun Jang et al. arXiv 2023. [Paper]
  65. "Planning with Large Language Models for Code Generation". Shun Zhang et al. ICLR 2023. [Paper]
  66. "Evaluation of ChatGPT as a Question Answering System for Answering Complex Questions". Yiming Tan et al. arXiv 2023. [Paper]
  67. "GPT-4 Technical Report". OpenAI et al. OpenAI 2023. [Paper]
  68. "A Short Survey of Viewing Large Language Models in Legal Aspect". Zhongxiang Sun et al. arXiv 2023. [Paper]
  69. "ChatGPT Participates in a Computer Science Exam". Sebastian Bordt et al. arXiv 2023. [Paper]
  70. "A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models". Junjie Ye et al. arXiv 2023. [Paper]
  71. "On the Educational Impact of ChatGPT: Is Artificial Intelligence Ready to Obtain a University Degree?". Kamil Malinka et al. arXiv 2023. [Paper]
  72. "Sparks of Artificial General Intelligence: Early experiments with GPT-4". Sébastien Bubeck et al. arXiv 2023. [Paper]
  73. "Is ChatGPT A Good Keyphrase Generator? A Preliminary Study". Mingyang Song et al. arXiv 2023. [Paper]
  74. "Capabilities of GPT-4 on Medical Challenge Problems". Harsha Nori et al. arXiv 2023. [Paper]
  75. "Can we trust the evaluation on ChatGPT?". Rachith Aiyappa et al. arXiv 2023. [Paper]
  76. "ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks". Fabrizio Gilardi et al. arXiv 2023. [Paper]
  77. "Evaluation of ChatGPT for NLP-based Mental Health Applications". Bishal Lamichhane et al. arXiv 2023. [Paper]
  78. "ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models". Bian Ning et al. arXiv 2023. [Paper]
  79. "Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams". Desnes Nunes et al. arXiv 2023. [Paper]
  80. "Humans in Humans Out: On GPT Converging Toward Common Sense in both Success and Failure". Philipp Koralus et al. arXiv 2023. [Paper]
  81. "Yes but.. Can ChatGPT Identify Entities in Historical Documents?". Carlos-Emiliano González-Gallardo et al. arXiv 2023. [Paper]
  82. "Uncovering ChatGPT's Capabilities in Recommender Systems". Sunhao Dai et al. arXiv 2023. [Paper]
  83. "Editing Large Language Models: Problems, Methods, and Opportunities". Yunzhi Yao et al. arXiv 2023. [Paper]
  84. "Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity". Terry Yue Zhuo et al. arXiv 2023. [Paper]
  85. "On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex". Terry Yue Zhuo et al. EACL 2023. [Paper]
  86. "A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets". Md Tahmid Rahman Laskar et al. ACL 2023. [Paper]
  87. "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment". Rishabh Bhardwaj et al. arXiv 2023. [Paper]

The Team

Here is the list of our student contributors in each section.

| Section | Student Contributors |
| --- | --- |
| The whole paper | Kun Zhou, Junyi Li |
| Overview & Resources of LLMs | Yingqian Min (Lead), Chen Yang |
| Pretraining | Yupeng Hou (Lead), Junjie Zhang, Zican Dong, Yushuo Chen |
| Adaptation Tuning | Tianyi Tang (Lead), Jinhao Jiang, Ruiyang Ren, Zikang Liu, Peiyu Liu |
| Utilization | Xiaolei Wang (Lead), Yifan Du, Xinyu Tang |
| Capacity Evaluation | Beichen Zhang (Lead), Zhipeng Chen, Yifan Li |

Acknowledgments

The authors would like to thank Yankai Lin and Yutao Zhu for proofreading this paper. Since the first release of this paper, we have received a number of valuable comments from the readers. We sincerely thank the readers who have written to us with constructive suggestions and comments: Tyler Suard, Damai Dai, Liang Ding, Stella Biderman, Kevin Gray, Jay Alammar and Yubo Feng.

Update Log

| Version | Time | Update Content |
| --- | --- | --- |
| V1 | 2023/03/31 | The initial version. |
| V2 | 2023/04/09 | Add the affiliation information. Revise Figure 1 and Table 1 and clarify the corresponding selection criterion for LLMs. Improve the writing. Correct some minor errors. |
| V3 | 2023/04/11 | Correct the errors for library resources. |
| V4 | 2023/04/12 | Revise Figure 1 and Table 1 and clarify the release date of LLMs. |
| V5 | 2023/04/16 | Add a new Section 2.2 about the technical evolution of GPT-series models. |
| V6 | 2023/04/24 | Add some new models in Table 1 and Figure 1. Add the discussion about scaling laws. Add some explanations about the model sizes for emergent abilities (Section 2.1). Add an illustrative figure for the attention patterns for different architectures in Figure 4. Add the detailed formulas in Table 4. |
| V7 | 2023/04/25 | Revise some copy errors in figures and tables. |
| V8 | 2023/04/27 | Add efficient tuning in Section 5.3. |
| V9 | 2023/04/28 | Revise Section 5.3. |
| V10 | 2023/05/07 | Revise Table 1, Table 2, and some minor points. |
| V11 (major revision) | 2023/06/29 | – Section 1: add Figure 1 for the trends of published LLM papers in arXiv;<br>– Section 2: add Figure 3 for GPT's evolution and the corresponding discussion;<br>– Section 3: add Figure 4 for LLaMA family and the corresponding discussion;<br>– Section 5: add latest discussion about the synthetic data formatting of instruction tuning in Section 5.1.1, the empirical analysis for instruction tuning in Section 5.1.4, parameter-efficient model adaptation in Section 5.3 and memory-efficient adaptation in Section 5.4;<br>– Section 6: add latest discussion about the underlying mechanism of ICL in Section 6.1.3, planning for complex task solving in Section 6.3;<br>– Section 7: add Table 10 for representative datasets for evaluating advanced abilities of LLMs, and empirical ability evaluation in Section 7.3.2;<br>– Section 8: add prompt design;<br>– Section 9: add the discussions on applications of LLMs in finance and scientific research domains; |