Staying up to date with Large Language Models (no hype).
This is my plan for staying excited about large language models:
- Stop worrying about missing the latest trendy Python framework.
- Make a reading list that covers the important LLM innovations (from both academia and industry) and work through it. I want the list to contain both papers (academia) and commentary on successful practical applications of LLMs (industry).
- Subscribe to the email newsletters and LinkedIn, YouTube and Twitter accounts of significant LLM innovators and commentators (avoiding sources that focus on hype and speculation).
Here is my (constantly evolving) reading list:
Status | Title | Value | Year | Topic | Type | Notes | Link(s) |
---|---|---|---|---|---|---|---|
✅ read | ReAct: Synergizing Reasoning and Acting in Language Models | ★★★★★ | 2023 | LLM Prompting | paper | Compelling and readable | https://arxiv.org/abs/2210.03629 |
✅ read | Self-consistency improves chain of thought reasoning in language models | ★★★★☆ | 2022 | LLM Prompting | paper | | https://arxiv.org/abs/2203.11171 |
✅ read | Hermes: A Text-to-SQL solution at Swiggy | ★★★★★ | | | blog | | https://bytes.swiggy.com/hermes-a-text-to-sql-solution-at-swiggy-81573fb4fb6e |
✅ read | A survey on large language model based autonomous agents | ★★★★★ | 2024 | | paper | | https://link.springer.com/article/10.1007/s11704-024-40231-1 |
✅ read | Reflexion: Language Agents with Verbal Reinforcement Learning | ★★★★★ | 2023 | LLM Agents | paper | Simple but very powerful concept | https://arxiv.org/abs/2303.11366 |
✅ read | RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation | ★★★★★ | 2024 | LLM Evaluation | paper | Very convincing and elegant framework. Very thorough research | https://arxiv.org/abs/2408.08067 |
✅ read | RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models | | 2024 | LLM Evaluation | paper | This framework is also used in the RAGChecker paper | https://arxiv.org/abs/2405.14486 |
✅ read | Cognitive Architectures for Language Agents | ★★★★★ | 2023 | LLM Agents | paper | notes | https://arxiv.org/abs/2309.02427 |
✅ read | Scaling Test Time Compute with Open Models | ★★★★★ | 2024 | Test Time Compute | blog | Much more readable than the original paper | https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute |
✅ read | Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration | ★★★★★ | 2024 | reasoning | paper | I loved this paper! Such a straightforward but creative concept | https://arxiv.org/abs/2307.05300 |
❌ not started | KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation | | 2024 | RAG | paper | | |
✅ read | Reasoning over Uncertain Text by Generative Large Language Models | ★★★☆☆ | 2024 | | paper | Less applied than I'd hoped (more theoretical), but still quite interesting | https://arxiv.org/abs/2402.09614v3 |
❌ not started | Understanding Multimodal LLMs | | 2024 | Multi-Modal LLMs | blog | notes | https://www.linkedin.com/pulse/understanding-multimodal-llms-sebastian-raschka-phd-t7h5c |
❌ not started | Large Language Models Understand and Can Be Enhanced by Emotional Stimuli | | 2023 | LLM Prompting | paper | | |
❌ not started | The Llama 3 Herd of Models | | 2024 | Foundation Models | paper | | |
◐ partially read | Build a Large Language Model (from scratch) | ★★★★★ | | LLM Architecture | book | If I could read only one thing, this would be it | |
✅ read | Orchestrating Agents: Routines and Handoffs | ★★★★☆ | 2024 | LLM Agents | blog | Very clear. Nice Python code examples with no additional frameworks required | https://cookbook.openai.com/examples/orchestrating_agents |
✅ read | Endless Jailbreaks with Bijection Learning | ★★★★★ | 2024 | LLM Security | paper | Very simple but powerful technique | https://arxiv.org/abs/2410.01294 |
✅ read | Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models | ★★★★☆ | 2024 | | paper | Very novel, effective and interpretable technique | https://arxiv.org/abs/2411.00492 |
◐ partially read | Fine-tuning Embedding Models for RAG | ★★★★★ | 2024 | Embeddings | blog | Impressive performance gain. Contains lots of useful code | https://www.philschmid.de/fine-tune-embedding-model-for-rag |
◐ partially read | Graph Retrieval-Augmented Generation: A Survey | ★★★☆☆ | 2024 | RAG | paper | notes | https://arxiv.org/abs/2408.08921 |
❌ not started | Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack | | 2024 | LLM Security | paper | | https://arxiv.org/abs/2404.01833 |
✅ read | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ★★★★☆ | 2018 | Foundation Models | paper | | https://arxiv.org/abs/1810.04805 |
❌ not started | Chain-of-Thought Reasoning Without Prompting | | 2024 | | paper | notes | https://arxiv.org/abs/2402.10200 |
✅ read | o1 isn't a chat model (and that's the point) | ★★★★☆ | 2025 | Test Time Compute | blog | | https://www.latent.space/p/o1-skill-issue |
❌ not started | Tree of Thoughts: Deliberate Problem Solving with Large Language Models | | 2023 | | paper | notes | https://arxiv.org/abs/2305.10601 |
❌ not started | Graph of Thoughts: Solving Elaborate Problems with Large Language Models | | 2023 | LLM Agents | paper | notes | https://arxiv.org/abs/2308.09687 |
❌ not started | GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | | 2024 | LLM Limitations | paper | | https://arxiv.org/abs/2410.05229v1 |
❌ not started | From Local to Global: A Graph RAG Approach to Query-Focused Summarization | | 2024 | RAG | paper | notes | |
◐ partially read | What We've Learned From a Year of Building with LLMs | | 2024 | llm-ops | blog | | https://applied-llms.org |
❌ not started | Large Language Models are Zero-Shot Reasoners | | 2022 | LLM Prompting | paper | notes | https://arxiv.org/abs/2205.11916 |
❌ not started | Planning with Large Language Models via Corrective Re-prompting | | 2022 | | paper | notes | https://openreview.net/pdf?id=cMDMRBe1TKs |
❌ not started | Evaluating and enhancing probabilistic reasoning in language models | | 2024 | | blog | notes | https://research.google/blog/evaluating-and-enhancing-probabilistic-reasoning-in-language-models/ |
❌ not started | ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models | | 2023 | LLM Agents | paper | notes | https://arxiv.org/abs/2305.18323 |
❌ not started | Sparse Priming Representations | | 2023 | | GitHub repo | | https://github.com/daveshap/SparsePrimingRepresentations https://www.youtube.com/watch?v=YjdmYCd6y0M |
❌ not started | BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents | | 2023 | LLM Agents | paper | | https://arxiv.org/abs/2308.05960 https://github.com/salesforce/BOLAA |
◐ partially read | ML and LLM system design: 450 case studies to learn from | ★★★★★ | 2024 | llm-ops | website | notes | https://www.evidentlyai.com/ml-system-design |
❌ not started | Your RAG application is a communication system | | 2024 | RAG | blog | notes | https://superlinked.com/vectorhub/articles/rag-application-communication-system |
❌ not started | Reader-LM: Small Language Models for Cleaning and Converting HTML to Markdown | | 2024 | | blog | notes | https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown/ |
❌ not started | Writing in the Margins: Better Inference Pattern for Long Context Retrieval | | 2024 | RAG | paper | notes | https://www.arxiv.org/abs/2408.14906 |
❌ not started | Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters | | 2024 | | paper | notes | https://arxiv.org/abs/2408.03314 |
❌ not started | LightRAG: Simple and Fast Retrieval-Augmented Generation | | 2024 | RAG | paper | notes | https://arxiv.org/abs/2410.05779 |
❌ not started | GPT-4 Technical Report | | 2023 | LLM Architecture | paper | notes | https://arxiv.org/abs/2303.08774 |
❌ not started | Sparks of Artificial General Intelligence: Early experiments with GPT-4 | | 2023 | AGI | paper | notes | https://arxiv.org/abs/2303.12712 |
❌ not started | A Survey of Large Language Models | | ongoing (started 2023) | | paper (survey) | notes | https://arxiv.org/abs/2303.18223 |
❌ not started | QLoRA: Efficient Finetuning of Quantized LLMs | | 2023 | Quantization | paper | notes | https://arxiv.org/abs/2305.14314 |
❌ not started | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | | 2024 | LLM Architecture | paper | notes | https://arxiv.org/abs/2312.00752 |
❌ not started | Towards Expert-Level Medical Question Answering with Large Language Models | | 2023 | | paper | notes | https://arxiv.org/abs/2305.09617 |
❌ not started | ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering | | 2024 | RAG | paper | notes | https://arxiv.org/abs/2410.05077 |
◐ partially read | Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization | | 2024 | LLM Persona | paper (survey) | | https://arxiv.org/abs/2406.01171 |
◐ partially read | Quantifying the Persona Effect in LLM Simulations | | 2024 | LLM Persona | paper | notes | https://arxiv.org/abs/2402.10811 |
❌ not started | Looking Inward: Language Models Can Learn About Themselves by Introspection | | 2024 | | paper | notes | https://arxiv.org/abs/2410.13787 |
❌ not started | Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences | | 2024 | LLM Evaluation | paper | notes | https://arxiv.org/abs/2404.12272 |
Also to check out:
- https://magazine.sebastianraschka.com/p/llm-research-papers-the-2024-list
- SmolLM2
- (paper) Efficient Few-Shot Learning Without Prompts
LLM sampling techniques:
- temperature sampling (Ackley et al., 1985; Ficler & Goldberg, 2017)
- top-k sampling (Fan et al., 2018; Holtzman et al., 2018; Radford et al., 2019)
- nucleus sampling (Holtzman et al., 2020)
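All three techniques reshape the model's next-token probability distribution before a token is drawn. The sketch below is a minimal, dependency-free Python illustration, assuming the model has already produced a list of raw scores (logits) for each vocabulary token; the function names and example logits are hypothetical and not taken from any library. Temperature divides the logits before the softmax (lower values sharpen the distribution), top-k keeps only the k most probable tokens, and nucleus (top-p) keeps the smallest set of most probable tokens whose cumulative probability reaches p. Applying them in the order temperature, then top-k, then top-p is one common convention, assumed here.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalise(probs, keep):
    """Zero out every token not in `keep` and rescale the rest to sum to 1."""
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]


def top_k_filter(probs, k):
    """Top-k: keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalise(probs, set(order[:k]))


def nucleus_filter(probs, top_p):
    """Nucleus (top-p): keep the smallest set of most probable tokens
    whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return _renormalise(probs, keep)


def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Draw one token index: temperature first, then optional top-k and top-p."""
    probs = softmax_with_temperature(logits, temperature)
    if top_k is not None:
        probs = top_k_filter(probs, top_k)
    if top_p is not None:
        probs = nucleus_filter(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, with the hypothetical logits `[2.0, 1.0, 0.5, -1.0]`, calling `sample_token(logits, temperature=0.7, top_k=3, top_p=0.9, rng=random.Random(0))` draws only from the highest-scoring tokens that survive both filters, while `top_k=1` reduces sampling to greedy decoding. Passing a seeded `random.Random` instance keeps runs reproducible.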
Subscribed to/followed:
- Data Elixir
- Andrej Karpathy
- Sebastian Raschka
- AI Tidbits
- AlphaSignal
- Data Science Weekly
- The Batch (Andrew Ng)
Interesting projects:
Project Name | Description | Link(s) |
---|---|---|
optillm | | https://github.com/codelion/optillm |
References and Resources: