This repository has been archived by the owner on May 20, 2024. It is now read-only.

eBPF Developer Tutorial and Knowledge Base: Learning eBPF Step by Step with Tools

GitHub Gitee Mirror

This is an eBPF development tutorial based on CO-RE (Compile Once, Run Everywhere). It takes you from beginner to advanced eBPF development through basic concepts, code examples, and real-world applications. Unlike BCC-based tutorials, it uses frameworks such as libbpf, Cilium, libbpf-rs, and eunomia-bpf, with examples in languages such as C, Go, and Rust.

This tutorial does not cover complex concepts and scenario introductions. Its main purpose is to provide examples of eBPF tools (very short, starting with twenty lines of code!) to help eBPF application developers quickly grasp eBPF development methods and techniques. The tutorial content can be found in the directory, with each directory being an independent eBPF tool example.

The tutorial focuses on eBPF examples in observability, networking, security, and more.

Table of Contents

Getting Started Examples

This section contains simple eBPF program examples and introductions. It primarily utilizes the eunomia-bpf framework to simplify development and introduces the basic usage and development process of eBPF.

  • lesson 0-introduce Introduces basic concepts of eBPF and common development tools
  • lesson 1-helloworld Develops the simplest "Hello World" program using eBPF and introduces the basic framework and development process of eBPF
  • lesson 2-kprobe-unlink Uses kprobe in eBPF to capture the unlink system call
  • lesson 3-fentry-unlink Uses fentry in eBPF to capture the unlink system call
  • lesson 4-opensnoop Uses eBPF to capture the system calls that processes make to open files, and filters by process PID in eBPF using global variables
  • lesson 5-uprobe-bashreadline Uses uprobe in eBPF to capture the readline function calls in bash
  • lesson 6-sigsnoop Captures the system calls that processes make to send signals and uses a hash map to store state
  • lesson 7-execsnoop Captures process execution events and prints output to user space through a perf event array
  • lesson 8-exitsnoop Captures process exit events and prints output to user space using a ring buffer
  • lesson 9-runqlat Captures process scheduling delays and records them in histogram format
  • lesson 10-hardirqs Captures interrupt events using hardirqs or softirqs
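
As a rough illustration of the "histogram format" mentioned for lesson 9, the sketch below shows power-of-two (log2) bucketing of latency samples in plain user-space C. It is not the tutorial's eBPF code: `MAX_SLOTS`, `log2_slot`, `hist_record`, and `hist_print` are hypothetical names chosen for this example, and a real tool would do the bucketing inside the eBPF program and keep the counters in a BPF map.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_SLOTS 26 /* hypothetical bucket count for this sketch */

/* Bucket index for v: floor(log2(v)), with 0 and 1 both mapped to bucket 0. */
unsigned int log2_slot(uint64_t v)
{
    unsigned int slot = 0;

    while (v > 1) {
        v >>= 1;
        slot++;
    }
    return slot;
}

/* Count one latency sample (in microseconds) in the histogram. */
void hist_record(uint32_t slots[MAX_SLOTS], uint64_t latency_us)
{
    unsigned int slot = log2_slot(latency_us);

    if (slot >= MAX_SLOTS)
        slot = MAX_SLOTS - 1; /* clamp outliers into the last bucket */
    slots[slot]++;
}

/* Print every non-empty bucket as "low -> high : count". */
void hist_print(const uint32_t slots[MAX_SLOTS])
{
    for (unsigned int i = 0; i < MAX_SLOTS; i++) {
        if (slots[i] == 0)
            continue;
        uint64_t low = (i == 0) ? 0 : (1ULL << i);
        uint64_t high = (1ULL << (i + 1)) - 1;
        printf("%10llu -> %-10llu : %u\n",
               (unsigned long long)low, (unsigned long long)high,
               (unsigned int)slots[i]);
    }
}
```

Calling `hist_record` once per sample and `hist_print` at the end gives a compact view of the latency distribution. Power-of-two buckets keep the counter array small while still covering a range from microseconds to seconds.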

Advanced Documents and Examples

This section builds complete eBPF projects, mainly based on libbpf, and applies them to various practical scenarios.

In-Depth Topics

This section covers advanced topics related to eBPF, including using eBPF programs on Android, possible attacks and defenses using eBPF programs, and complex tracing. Combining the user-mode and kernel-mode aspects of eBPF can bring great power (as well as security risks).

Android:

Networking and tracing:

Security:

Other:

Continuously updating...

Why write this tutorial?

While learning eBPF, we were inspired and helped by the bcc python developer tutorial. From today's perspective, however, libbpf is a better choice for developing eBPF applications, yet there seem to be few tutorials that introduce eBPF development based on libbpf and BPF CO-RE through examples and tools. We therefore started this project, organizing it in a similar way to the bcc python developer tutorial but using libbpf with CO-RE for development.

This project is mainly based on libbpf-bootstrap and eunomia-bpf frameworks, and uses eunomia-bpf to help simplify the development of some user-space libbpf eBPF code, allowing developers to focus on kernel-space eBPF code development.

  • We also provide a small tool called GPTtrace, which uses ChatGPT to automatically write eBPF programs and trace Linux systems through natural language descriptions. This tool allows you to interactively learn eBPF programs: GPTtrace
  • Feel free to raise any questions or issues related to eBPF learning, or bugs encountered in practice, in the issue or discussion section of this repository. We will do our best to help you!

GitHub Templates: Easily build eBPF projects and development environments, compile and run eBPF programs online with one click

When starting a new eBPF project, are you confused about how to set up the environment and choose a programming language? Don't worry, we have prepared a series of GitHub templates for you to quickly start a brand new eBPF project. Just click the Use this template button on GitHub to get started.

  • https://github.com/eunomia-bpf/libbpf-starter-template: eBPF project template based on the C language and libbpf framework

These starter templates include the following features:

  • A Makefile to build the project with a single command
  • A Dockerfile to automatically create a containerized environment for your eBPF project and publish it to GitHub Packages
  • GitHub Actions to automate the build, test, and release processes
  • All dependencies required for eBPF development

By setting an existing repository as a template, you and others can quickly generate new repositories with the same basic structure, eliminating the need for manual creation and configuration. With GitHub template repositories, developers can focus on the core functionality and logic of their projects without wasting time on the setup and structure. For more information about template repositories, see the official documentation: https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-template-repository

When you create a new repository using one of the eBPF project templates mentioned above, you can easily set up and launch an online development environment with GitHub Codespaces. Here are the steps to compile and run eBPF programs using GitHub Codespaces:

  1. Click the Code button in your new repository and select the Open with Codespaces option.

  2. GitHub will create a new Codespace for you, which may take a few minutes depending on your network speed and the size of the repository.

  3. Once your Codespace is launched and ready to use, you can open the terminal and navigate to your project directory.

  4. Follow the instructions in the corresponding repository to compile and run eBPF programs.

With Codespaces, you can easily create, manage, and share cloud-based development environments, making your development process faster and more reliable. You can develop with Codespaces anywhere, on any device; all you need is a computer with a web browser. Additionally, GitHub Codespaces supports pre-configured environments, customized development containers, and customizable development experiences to meet your development needs.

After writing code in a codespace and making a commit, GitHub Actions will compile and automatically publish the container image. Then, you can use Docker to run this eBPF program anywhere with just one command, for example:

$ sudo docker run --rm -it --privileged ghcr.io/eunomia-bpf/libbpf-rs-template:latest
[sudo] password for xxx: 
Tracing run queue latency higher than 10000 us
TIME     COMM             TID     LAT(us)       
12:09:19 systemd-udevd    30786   18300         
12:09:19 systemd-udevd    30796   21941         
12:09:19 systemd-udevd    30793   10323         
12:09:19 systemd-udevd    30795   14827         
12:09:19 systemd-udevd    30790   17973         
12:09:19 systemd-udevd    30793   12328         
12:09:19 systemd-udevd    30796   28721

Why do we need tutorials based on libbpf and BPF CO-RE?

Historically, developers building BPF applications, such as programs attached to tracepoints, could choose the BCC framework to load BPF programs into the kernel. BCC embeds a Clang compiler that compiles BPF code at runtime and tailors it to the specific host kernel. This was the only way to develop maintainable BPF applications against constantly changing kernel internals. The article "BPF Portability and CO-RE" describes BPF portability and the introduction of CO-RE in detail, explaining why BCC used to be the only viable option and why libbpf is now considered a better choice. Last year, libbpf saw significant improvements in functionality and sophistication, eliminating many differences with BCC (especially for tracepoint applications) and adding many new and powerful features that BCC does not support (such as global variables and BPF skeletons).

Admittedly, BCC does its best to simplify the work of BPF developers, but the convenience it provides sometimes makes problems harder to locate and fix. Users must remember its naming conventions and the autogenerated structures for tracepoints, and they must rely on BCC rewriting their code to read kernel data and access kprobe parameters. When using BPF maps, they have to write half-object-oriented C code that does not completely match what happens in the kernel. Furthermore, BCC requires a large amount of boilerplate code in user space, where even the most trivial parts must be configured manually.

As mentioned above, BCC relies on runtime compilation and embeds a large LLVM/Clang library, which creates certain gaps between BCC and an ideal usage scenario:

  • High resource utilization (memory and CPU) at compile time, which may interfere with the main workload on busy servers.
  • It relies on the kernel headers package, which must be installed on each target host. Even then, if certain kernel contents are not exposed through public header files, type definitions need to be copied and pasted into the BPF code.
  • Even the smallest compile-time errors can only be detected at runtime, followed by recompiling and restarting the user-space application. This greatly affects the iteration time of development (and increases frustration...).

Libbpf + BPF CO-RE (Compile Once - Run Everywhere) takes a different approach, considering BPF programs as normal user-space programs: they only need to be compiled into small binaries that can be deployed on target hosts without modification. libbpf acts as a loader for BPF programs, responsible for configuration work (relocating, loading, and verifying BPF programs, creating BPF maps, attaching to BPF hooks, etc.), and developers only need to focus on the correctness and performance of BPF programs. This approach minimizes overhead, eliminates dependencies, and improves the overall developer experience.
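
To make "compiled once, deployed without modification" more concrete, here is a toy user-space model of the relocation idea. Real libbpf uses BTF type information from the target kernel and patches field offsets in the BPF instructions at load time. The sketch below only imitates the principle with a name-to-offset lookup, and every name in it (`task_v1`, `task_v2`, `field_info`, `resolve_offset`, `read_pid`) is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two hypothetical "kernel versions" that lay out the same record differently. */
struct task_v1 { int32_t pid; char comm[16]; };
struct task_v2 { uint64_t flags; char comm[16]; int32_t pid; };

/* Hypothetical stand-in for the type information a target kernel exposes. */
struct field_info { const char *name; size_t offset; };

const struct field_info layout_v1[] = {
    { "pid",  offsetof(struct task_v1, pid)  },
    { "comm", offsetof(struct task_v1, comm) },
};

const struct field_info layout_v2[] = {
    { "flags", offsetof(struct task_v2, flags) },
    { "comm",  offsetof(struct task_v2, comm)  },
    { "pid",   offsetof(struct task_v2, pid)   },
};

/* "Relocation": resolve a field name to its offset on the target, or -1. */
long resolve_offset(const struct field_info *layout, size_t n, const char *field)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(layout[i].name, field) == 0)
            return (long)layout[i].offset;
    }
    return -1;
}

/* Read the pid from an opaque record using the offset resolved for the target. */
int read_pid(const void *task, const struct field_info *layout, size_t n,
             int32_t *out)
{
    long off = resolve_offset(layout, n, "pid");

    if (off < 0)
        return -1; /* field does not exist on this target */
    memcpy(out, (const char *)task + off, sizeof(*out));
    return 0;
}
```

The same `read_pid` code works against both layouts because the offset is looked up on the target instead of being fixed when the program is compiled. This is the property that removes the need for kernel headers and a compiler on every host.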

In terms of API and code conventions, libbpf adheres to the philosophy of "least surprise", where most things need to be stated explicitly: no header files are implied, and no code is rewritten. Plain C code and a few helper macros are enough to eliminate the most tedious steps. What users write is what gets executed, and the structure of a BPF application corresponds one-to-one to what is eventually verified and executed by the kernel.

Reference: BCC to Libbpf Conversion Guide (Translation) - Deep Dive into eBPF

eunomia-bpf

eunomia-bpf is an open-source eBPF dynamic loading runtime and development toolkit designed to simplify the development, building, distribution, and execution of eBPF programs. It is based on the libbpf CO-RE lightweight development framework.

With eunomia-bpf, you can:

  • Write only the libbpf kernel-mode code when writing eBPF programs or tools, with kernel-mode export information retrieved automatically.
  • Use Wasm to develop eBPF user-mode programs, controlling the loading and execution of the entire eBPF program and handling the related data inside the Wasm virtual machine.
  • Package pre-compiled eBPF programs into universal JSON or Wasm modules for distribution across architectures and kernel versions, allowing dynamic loading and execution without recompilation.

eunomia-bpf consists of a compilation toolchain and a runtime library. Compared to traditional frameworks like BCC and native libbpf, it greatly simplifies the development of eBPF programs: in most cases, only the kernel-mode code needs to be written to build, package, and publish a complete eBPF application. At the same time, the kernel-mode eBPF code remains compatible with mainstream development frameworks such as libbpf, libbpfgo, and libbpf-rs. When user-mode code needs to be written, multiple languages can be used with the help of WebAssembly. Compared to script tools like bpftrace, eunomia-bpf offers similar convenience without being limited to tracing scenarios, and it can be used in other fields such as networking and security.

Let ChatGPT Help Us

This tutorial uses ChatGPT to learn how to write eBPF programs. At the same time, we try to teach ChatGPT how to write eBPF programs. The general steps are as follows:

  1. Teach it the basic knowledge of eBPF programming.
  2. Show it some cases: hello world, basic structure of eBPF programs, how to use eBPF programs for tracing, and let it start writing tutorials.
  3. Manually adjust the tutorials and correct errors in the code and documents.
  4. Feed the modified code back to ChatGPT for further learning.
  5. Try to make ChatGPT generate eBPF programs and corresponding tutorial documents automatically!

The complete conversation log can be found here: ChatGPT.md

We have also built a demo of a command-line tool. Through training in this tutorial, it can automatically write eBPF programs and trace Linux systems using natural language descriptions: https://github.com/eunomia-bpf/GPTtrace
