Efficient Deep Learning Systems

This repository contains materials for the Efficient Deep Learning Systems course taught at the Faculty of Computer Science of HSE University and Yandex School of Data Analysis.

This branch corresponds to the ongoing 2025 course. For the full materials of past years, see the "Past versions" section.

Syllabus

Week 1: Introduction
- Lecture: Course overview and organizational details. Core concepts of the GPU architecture and CUDA API.
- Seminar: CUDA operations in PyTorch. Introduction to benchmarking.
Week 2: Experiment tracking, model and data versioning, testing DL code in Python
- Lecture: Experiment management basics and pipeline versioning. Configuring Python applications. Intro to regular and property-based testing.
- Seminar: Example DVC+Weights & Biases project walkthrough. Intro to testing with pytest.
Week 3: Training optimizations, FP16/BF16/FP8 formats, profiling deep learning code
- Lecture: Measuring performance of GPU-accelerated software. Mixed-precision training. Data storage and loading optimizations. Tools for profiling deep learning workloads.
- Seminar: Automatic Mixed Precision in PyTorch. Dynamic padding for sequence data and JPEG decoding benchmarks. Basics of profiling with py-spy, PyTorch Profiler, Memory Snapshot and Nsight Systems.
Week 4: Data-parallel training and All-Reduce
- Lecture: Introduction to distributed training. Data-parallel training of neural networks. All-Reduce and its efficient implementations.
- Seminar: Introduction to PyTorch Distributed. Data-parallel training primitives.
Week 5: Training large models
- Lecture: Tensor, pipeline, and sequence parallelism. Gradient checkpointing and offloading.
- Seminar: Gradient checkpointing and tensor parallelism in practice.
Week 6: Sharded data-parallel training, distributed training optimizations
- Lecture: Fully sharded data-parallel training and its optimizations.
- Seminar: In-depth overview of FSDP2.
Week 7: Python web application deployment
- Lecture/Seminar: Building and deploying production-ready web services. App & web servers, Docker, Prometheus, API via HTTP and gRPC.
Week 8: LLM inference optimizations and software
- Lecture: Inference speed metrics. KV caching, batch inference, continuous batching. FlashAttention with its modifications and PagedAttention. Overview of popular LLM serving frameworks.
- Seminar: Implementation of KV caching. Basics of the Triton language. Layer fusion in PyTorch and Triton. Liger Kernels. FlashAttention and FlexAttention in practice.
Week 9: Efficient model inference
- Lecture: Speculative decoding, architecture optimizations, quantization, and knowledge distillation.
- Seminar: Introduction to speculative decoding. Matrix multiplication in Triton for different scenarios.
Week 10: Guest lecture

Grading

There will be several home assignments (spread over multiple weeks) on the following topics:

- Training pipelines and code profiling
- Distributed and memory-efficient training
- Deploying and optimizing models for production

The final grade is a weighted sum of per-assignment grades. Please refer to the course page of your institution for details.
