Model Overview: TinyLlama is a compact 1.1 billion parameter language model pre-trained on around 1 trillion tokens, leveraging the architecture and tokenizer of Llama 2 and advancements from the open-source community.
Model Architecture: TinyLlama adopts a Transformer architecture similar to Llama 2, featuring a hidden size of 2048, an intermediate hidden size of 5632, a context length of 2048, 22 layers, and a vocabulary size of 32,000 (a configuration sketch follows this summary).
Speed Optimizations: Incorporates Fully Sharded Data Parallel (FSDP), FlashAttention for efficient training, and grouped-query attention to reduce memory overhead (a FlashAttention loading sketch follows this summary).
Training Performance: Achieves a training throughput of 24,000 tokens per second per A100-40G GPU, requiring significantly fewer GPU hours compared to models like Pythia-1.0B and MPT-1.3B.
Comparative Analysis: Outperforms similar-sized open-source language models like OPT-1.3B and Pythia-1.4B in various downstream tasks.
Evaluation on Commonsense Reasoning Tasks: Demonstrates superior performance on tasks like HellaSwag, OpenBookQA, WinoGrande, ARC-Easy, ARC-Challenge, BoolQ, and PIQA.
Problem-solving Capabilities: Evaluated using the InstructEval benchmark including tasks like MMLU, BIG-Bench Hard, DROP, and HumanEval, where TinyLlama shows better problem-solving skills compared to existing models.
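
As a concrete reference for the architecture point above, here is a minimal sketch expressing the reported hyperparameters as a Hugging Face LlamaConfig. The head counts (32 query heads, 4 key-value heads for grouped-query attention) follow the published TinyLlama configuration; treat this as illustrative rather than the authors' exact training setup.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# TinyLlama hyperparameters as reported in the paper; the head counts follow
# the published checkpoint configuration and are included for completeness.
config = LlamaConfig(
    vocab_size=32_000,             # Llama 2 tokenizer vocabulary
    hidden_size=2048,              # model (embedding) dimension
    intermediate_size=5632,        # feed-forward hidden dimension
    num_hidden_layers=22,          # transformer blocks
    num_attention_heads=32,        # query heads
    num_key_value_heads=4,         # grouped-query attention
    max_position_embeddings=2048,  # context length
)

model = LlamaForCausalLM(config)
print(f"Parameters: {model.num_parameters() / 1e9:.2f}B")  # ~1.1B
```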
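The training-side speedups (FSDP, FlashAttention, fused kernels) live in the authors' training codebase. As a small illustration of FlashAttention on the inference side, the sketch below loads a TinyLlama checkpoint through transformers with the flash_attention_2 backend; the checkpoint name and the availability of the flash-attn package on a recent GPU are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name on the Hugging Face Hub; the flash_attention_2
# backend requires the flash-attn package and an Ampere-or-newer GPU.
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # half precision to cut memory use
    attn_implementation="flash_attention_2",  # FlashAttention kernels
    device_map="auto",                        # requires accelerate
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```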
Link: https://arxiv.org/abs/2401.02385