This begins after Phil-Opp’s excellent post on heap allocation: We have a kernel with basic terminal and keyboard I/O, and kernel memory allocation from a fixed size heap. In this section we’ll work out how to interrupt a thread, access its context (i.e. the registers), and then change the context to switch threads.
The code here roughly corresponds to EuraliOS commit 2a515e40. There are differences, as bugs were fixed, code tidied up, and the order of changes adjusted based on experience in later sections.
Pre-emptive multitasking works by interrupting the CPU and swapping out one process for another by saving the state of all the registers, and changing the registers to the state of another task. The usual way to do this is with the timer interrupt which now looks like:
extern "x86-interrupt" fn timer_interrupt_handler(
_stack_frame: InterruptStackFrame) {
print!(".");
unsafe {
PICS.lock()
.notify_end_of_interrupt(InterruptIndex::Timer.as_u8());
}
}
All interrupt handlers currently use the x86-interrupt calling convention. With that convention the compiler automatically inserts code to save scratch registers to the current stack, and passes a pointer to the exception stack frame to the handler function. It was introduced in this PR which has some useful discussion and links.
The x86-interrupt calling convention works well for most interrupt handlers, but it has some limitations: It doesn’t (as far as I can tell) provide a way to access and modify the register values which are restored when the handler returns. Instead we’ll write a naked function to handle saving and restoring register values ourselves.
First we need to define a structure which stores the register values,
so we’ll be able to access and modify them. In interrupts.rs
define
a Context
struct:
#[derive(Debug)]
#[repr(packed)]
pub struct Context {
// These are pushed in the handler function
pub r15: usize,
pub r14: usize,
pub r13: usize,
pub r12: usize,
pub r11: usize,
pub r10: usize,
pub r9: usize,
pub r8: usize,
pub rbp: usize,
pub rsi: usize,
pub rdi: usize,
pub rdx: usize,
pub rcx: usize,
pub rbx: usize,
pub rax: usize,
// Below is the exception stack frame pushed by the CPU on interrupt
// Note: For some interrupts (e.g. Page fault), an error code is pushed here
rip: usize, // Instruction pointer
cs: usize, // Code segment
rflags: usize, // Processor flags
rsp: usize, // Stack pointer
ss: usize, // Stack segment
// Here the CPU may push values to align the stack on a 16-byte boundary (for SSE)
}
We now define a function to replace the timer handler:
#[naked]
pub extern "x86-interrupt" fn timer_handler_naked (
_stack_frame: InterruptStackFrame) {
unsafe {
...
}
}
Here the x86-interrupt
calling convention is specified to satisfy
the requirements of the IDT set_handler_fn
method, but doesn’t have
any effect because the function is [naked]
and so the compiler doesn’t
add any code around ours.
Because naked functions are still somewhat experimental, we
need to enable this feature in lib.rs
:
#![feature(naked_functions)]
While we’re editing lib.rs
, add another feature we’ll need soon:
#![feature(asm_sym)]
which enables the sym
operands in assembly blocks, used to take
the address of symbols.
Now we can fill in the unsafe
block in the timer_handler_naked
function. In interrupts.rs
import the asm!
macro:
use core::arch::asm;
and then create an asm!
block inside the unsafe scope of
timer_handler_naked
. Note that naked functions in recent Rust
versions can only contain a single asm
block, and can’t mix Rust and
asm code, because the mix of naked functions, asm and Rust code
sometimes led to surprising behaviour. We therefore call out to Rust
code as quickly as possible:
asm!(
// Disable interrupts
"cli",
// Push registers
"push rax",
"push rbx",
"push rcx",
"push rdx",
"push rdi",
"push rsi",
"push rbp",
"push r8",
"push r9",
"push r10",
"push r11",
"push r12",
"push r13",
"push r14",
"push r15",
// First argument in rdi with C calling convention
"mov rdi, rsp",
// Call the hander function
"call {handler}",
// Pop scratch registers
"pop r15",
"pop r14",
"pop r13",
"pop r12",
"pop r11",
"pop r10",
"pop r9",
"pop r8",
"pop rbp",
"pop rsi",
"pop rdi",
"pop rdx",
"pop rcx",
"pop rbx",
"pop rax",
// Enable interrupts
"sti",
// Interrupt return
"iretq",
// Note: Getting the handler pointer here using `sym` operand, because
// an `in` operand would clobber a register that we need to save, and we
// can't have two asm blocks
handler = sym timer_handler,
options(noreturn)
);
This function pushes the values of the registers onto the stack
(note: the same stack as the interrupted process). Because stacks move
downwards in memory the registers are pushed onto the stack in the
order they appear in the Context
struct from bottom to top.
The sym timer_handler
argument takes the address of a symbol (the
timer_handler
function we haven’t written yet) and passes it in as
handler
so that it can be called. Now we can implement this handler
function, choosing the “C” calling convention so that the first
argument is in the RDI
register:
extern "C" fn timer_handler(context: &mut Context) {
print!("+");
// Tell the PIC that the interrupt has been processed
unsafe {
PICS.lock()
.notify_end_of_interrupt(InterruptIndex::Timer.as_u8());
}
}
Note that the C calling convention here means some registers will be pushed onto the stack again, which is a bit inefficient but will have to do for now. We then set the IDT entry to the new handler:
idt[InterruptIndex::Timer.as_usize()]
.set_handler_fn(timer_handler_naked); // new
While we’re at it we can also add interrupt handlers for page and general protection faults (GPF), which for now will just print error messages. This gives us more useful information than just a double fault.
use x86_64::structures::idt::PageFaultErrorCode;
use crate::hlt_loop;
extern "x86-interrupt" fn page_fault_handler(
stack_frame: InterruptStackFrame,
error_code: PageFaultErrorCode,
) {
use x86_64::registers::control::Cr2;
println!("EXCEPTION: PAGE FAULT");
println!("Accessed Address: {:?}", Cr2::read());
println!("Error Code: {:?}", error_code);
println!("{:#?}", stack_frame);
hlt_loop();
}
extern "x86-interrupt" fn general_protection_fault_handler(
stack_frame: InterruptStackFrame,
_error_code: u64) {
panic!("EXCEPTION: GENERAL PROTECTION FAULT\n{:#?}", stack_frame);
}
and add these handlers to the IDT:
idt.page_fault.
set_handler_fn(page_fault_handler).
set_stack_index(gdt::PAGE_FAULT_IST_INDEX);
idt.general_protection_fault.
set_handler_fn(general_protection_fault_handler).
set_stack_index(gdt::GENERAL_PROTECTION_FAULT_IST_INDEX);
That uses two constants defined in gdt.rs
:
pub const PAGE_FAULT_IST_INDEX: u16 = 0;
pub const GENERAL_PROTECTION_FAULT_IST_INDEX: u16 = 0;
All that set_stack_index
method does is select an index in the TSS
to load a stack pointer from when the interrupt occurs, ensuring that
these handlers run with a known-good stack. There are ways this can go
wrong even with a “good” stack pointer. For example if the page tables
are messed up then the stack pointer or handler function virtual
addresses may not be mapped to a physical address, in which case
typically the double fault handler can’t be found either, a triple
fault occurs and the CPU resets.
First check that this compiles and runs (cargo run
). We can then
check that the values of the registers are saved and restored
correctly. In the new timer_handler
function we can print some
registers, change them, and then check that the change is seen in the
interrupted code.
In main.rs
, before the hlt_loop()
call we can try this:
// Set some registers
unsafe {
asm!("mov r11, 0x4242",
"mov rdi, 0x22",
"mov rcx, 0x93"
);
}
// Wait for an interrupt
unsafe {asm!("hlt");}
// Get the register values
let (r11, rdi, rcx): (i64, i64, i64);
unsafe {asm!("nop",
lateout("r11") r11,
lateout("rdi") rdi,
lateout("rcx") rcx);}
println!("R11: 0x{:x} RDI: 0x{:x} RCX: 0x{:x}", r11, rdi, rcx);
which will need the asm!
macro so put use core::arch::asm;
near
the top of main.rs
. When this runs we should see
R11: 0x4242 RDI: 0x22 RCX: 0x93
, so the registers weren’t modified by
the interrupt.
Now in interrupt.rs
we can access and modify some registers, and check
that they are set correctly:
extern "C" fn timer_handler(context: &mut Context) {
print!("<0x{:x}, 0x{:x}>", context.r11, context.rcx);
context.r11 = context.rdi + 0x5321;
context.rcx = 0xdeadbeef;
// Tell the PIC that the interrupt has been processed
unsafe {
PICS.lock()
.notify_end_of_interrupt(InterruptIndex::Timer.as_u8());
}
}
So we now print the values of some registers, and modify the r11
and rcx
registers. Running again we should see:
<0x4242, 0x93> R11: 0x5343 RDI: 0x22 RCX: 0xDEADBEEF It did not crash! <0x5343, 0xDEADBEEF><0x5343, 0xDEADBEEF>...
This shows that the timer handler can read and modify the process state, which we’ll need when we want to switch processes.
To turn this into a test case we can keep around, we need some way to test the
wrapper code while replacing the timer_handler
.
Based on the MOROS code, turn the timer_handler_naked
function into a macro:
macro_rules! wrap {
($func: ident => $wrapper:ident) => {
#[naked]
pub extern "x86-interrupt" fn $wrapper (_stack_frame: InterruptStackFrame) {
unsafe{
...
}
}
};
}
which can create the hander as before:
wrap!(timer_handler => timer_handler_naked);
Running (cargo run
) should give the same result as before, but now we can
write tests for the macro by wrapping different functions.
We can make a standalone test like the stack_overflow
test by making the Context
struct
members public, exporting the wrap
function (renamed to interrupt_wrap
), and setting up a minimal
IDT in the test case. This is in tests/interrupt_wrap.rs
.
We’re now going to use the timer interrupt handler to switch between threads. When the timer interrupt handler starts, it pushes the context onto the stack, and when the handler finishes it pops the context off the stack. To change thread we just need to change the stack pointer inside the handler.
There is a lot of confusing information online about interrupt handling and context switching in x86. Most of this confusion is due to the different behaviour which has evolved over the last 40-some years: Real mode, 32-bit protected mode and 64-bit long mode all work somewhat differently. For 64-bit mode there is a good summary of exceptions here and a good explanation of context switching here.
In 64-bit mode interrupts can switch to known-good stacks which are listed in the Interrupt Stack Table (IST) in the Task State Segment. The IST has 7 entries per core that we can use (IST1 to IST7). The index into this table which should be used for each interrupt is specified in the Interrupt Descriptor Table (IDT). We’ve already used this to set good stacks for some of our handlers, and now we can use it to also switch tasks.
First we’ll set an IST index for the timer interrupt handler in the
IDT table (interrupts.rs
):
idt[InterruptIndex::Timer.as_usize()]
.set_handler_fn(timer_handler_naked)
.set_stack_index(gdt::TIMER_INTERRUPT_INDEX);
where the index is different from the index used for the fault
handlers (which are all 0 for now). In gdt.rs
:
pub const DOUBLE_FAULT_IST_INDEX: u16 = 0;
pub const PAGE_FAULT_IST_INDEX: u16 = 0;
pub const GENERAL_PROTECTION_FAULT_IST_INDEX: u16 = 0;
pub const TIMER_INTERRUPT_INDEX: u16 = 1; // New
The trick is that we are now going to change TSS entry 1 so that each thread stores its context in a different stack. Every time we switch thread we’re going to change the TSS entry to the new thread’s kernel stack, so when the timer interrupt occurs the thread’s context will be saved to its own kernel stack. To make this work we need to make the TSS mutable as well as static.
In gdt.rs
the TSS lazy static is now wrapped in a Mutex. In the
initialisation we set the timer interrupt entry to be the same as the
double fault index so it’s got a sensible value:
use spin::Mutex; // New
use lazy_static::lazy_static;
lazy_static! {
static ref TSS: Mutex<TaskStateSegment> = { // Modified Mutex<>
let mut tss = TaskStateSegment::new();
tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize] = {
const STACK_SIZE: usize = 4096 * 5;
static mut STACK: [u8; STACK_SIZE] = [0; STACK_SIZE];
let stack_start = VirtAddr::from_ptr(unsafe { &STACK });
let stack_end = stack_start + STACK_SIZE;
stack_end
};
tss.interrupt_stack_table[TIMER_INTERRUPT_INDEX as usize] =
tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize]; // New
Mutex::new(tss) // Modified Mutex::new
};
}
To set and retrieve values we can define some helper functions:
unsafe fn tss_reference() -> &'static TaskStateSegment {
let tss_ptr = &*TSS.lock() as *const TaskStateSegment;
& *tss_ptr
}
pub fn set_interrupt_stack_table(index: usize, stack_end: VirtAddr) {
TSS.lock().interrupt_stack_table[index] = stack_end;
}
The tss_reference()
function is only for internal use because we
need to set the TSS address in the GDT. The
set_interrupt_stack_table()
function is public because we’ll call it
to modify the timer interrupt stack entry in the TSS.
The Global Descriptor Table (GDT) is now changed to call
tss_reference()
to put the TSS address into the GDT. (This code was
originally adapted from MOROS).
lazy_static! {
static ref GDT: (GlobalDescriptorTable, Selectors) = {
let mut gdt = GlobalDescriptorTable::new();
let code_selector = gdt.add_entry(Descriptor::kernel_code_segment());
let data_selector = gdt.add_entry(Descriptor::kernel_data_segment());
let tss_selector = gdt.add_entry(Descriptor::tss_segment(
unsafe {tss_reference()}));
(gdt, Selectors {code_selector, data_selector, tss_selector})
}
}
struct Selectors {
code_selector: SegmentSelector,
data_selector: SegmentSelector,
tss_selector: SegmentSelector,
}
pub fn get_kernel_segments() -> (SegmentSelector, SegmentSelector) {
(GDT.1.code_selector, GDT.1.data_selector)
}
Note that we don’t need to modify the GDT after initialisation, unlike the TSS, so it doesn’t need a Mutex.
Now we have some of the basic mechanisms, with the timer interrupt handler and ability to modify the TSS entry, we need to create some structures to create and keep track of threads.
In a new source file process.rs
we can create a struct Thread
:
extern crate alloc;
use alloc::vec::Vec;
struct Thread {
kernel_stack: Vec<u8>,
user_stack: Vec<u8>,
kernel_stack_end: u64, // This address goes in the TSS
user_stack_end: u64,
context: u64, // Address of Context on kernel stack
}
We need a “user” stack which is in use while the thread is doing
whatever it’s supposed to do, a kernel stack which we’ll use to store
the Context
and use in the code which runs when a timer interrupt
occurs. The end of the kernel stack (kernel_stack_end
) is the address
that goes into the TSS (index 1, TIMER_INTERRUPT_INDEX
)
To keep track of threads we need to know which process is currently
running, and which threads are waiting. Most operating systems seem to
use some kind of linked list for this, but lists in Rust
are.. awkward. Instead we’ll use a VecDeque which we can use as a
queue by pushing Thread
structures onto one end and popping them off
the other. This is a static variable so to make writing thread safe we’ll
wrap the VecDeque in a RwLock spin lock.
use spin::RwLock;
use lazy_static::lazy_static;
use alloc::{boxed::Box, collections::vec_deque::VecDeque};
lazy_static! {
static ref RUNNING_QUEUE: RwLock<VecDeque<Box<Thread>>> =
RwLock::new(VecDeque::new());
static ref CURRENT_THREAD: RwLock<Option<Box<Thread>>> =
RwLock::new(None);
}
So we’re going to put Thread
structs in boxes, and put the boxes in
RUNNING_QUEUE
. When it’s a thread’s turn to run it’s going to be
taken out of the queue and put into CURRENT_THREAD
.
Let’s make a function to create a new kernel thread. It needs
to take a function pointer as input, create a Thread
struct,
and put it into RUNNING_QUEUE
.
const KERNEL_STACK_SIZE: usize = 4096 * 2;
const USER_STACK_SIZE: usize = 4096 * 5;
pub fn new_kernel_thread(function: fn()->()) {
let new_thread = {
let kernel_stack = Vec::with_capacity(KERNEL_STACK_SIZE);
let kernel_stack_end = (VirtAddr::from_ptr(kernel_stack.as_ptr())
+ KERNEL_STACK_SIZE).as_u64()
let user_stack = Vec::with_capacity(USER_STACK_SIZE);
let user_stack_end = (VirtAddr::from_ptr(user_stack.as_ptr())
+ USER_STACK_SIZE).as_u64() as usize;
let context = kernel_stack_end - INTERRUPT_CONTEXT_SIZE as u64;
Box::new(Thread {
kernel_stack,
user_stack,
kernel_stack_end,
user_stack_end,
context})
};
// Set context registers
// Add Thread to RUNNING_QUEUE
}
To set the Thread context registers we can do:
let context = unsafe {&mut *(new_thread.context as *mut Context)};
context.rip = function as usize; // Instruction pointer
context.rsp = new_thread.user_stack_end; // Stack pointer
context.rflags = 0x200; // Interrupts enabled
let (code_selector, data_selector) = gdt::get_kernel_segments();
context.cs = code_selector.0 as usize;
context.ss = data_selector.0 as usize;
and finally to insert the new thread into running queue we use:
use x86_64::instructions::interrupts;
...
interrupts::without_interrupts(|| {
RUNNING_QUEUE.write().push_back(new_thread);
});
where we are disabling interrupts while modifying the queue, so we don’t get a timer interrupt half way through which then tries to modify the queue.
We can try this out by creating a function in main.rs
:
use blog_os::process;
fn kernel_thread_main() {
println!("Kernel thread start");
loop {
println!("<< 1 >>");
x86_64::instructions::hlt();
}
}
fn kernel_main(boot_info: &'static BootInfo) -> ! {
println!("Hello World{}", "!");
blog_os::init();
memory::init(boot_info);
// Launch a kernel thread
process::new_kernel_thread(kernel_thread_main);
blog_os::hlt_loop();
}
Currently that won’t do anything because the thread is never scheduled. When it is run it should keep printing “<< 1 >>”.
The final step is to modify the timer interrupt handler (again) so
that it uses the RUNNING_QUEUE
to get the next thread to run, and
switches the kernel stacks.
We need to be able to change stack, but this change needs to happen just before the interrupt occurs because any use of the stack, like calling or returning from a function, will modify the context and do unexpected things. We’ll therefore change the stack in the naked timer interrupt function:
...
"push r14",
"push r15",
"mov rdi, rsp",
"call {handler}",
// New: stack pointer is in RAX
"cmp rax, 0",
"je 2f", // if rax != 0 {
"mov rsp, rax", // rsp = rax;
"2:", // }
"pop r15",
"pop r14",
...
The return value from a function with C calling convention is in the RAX register. We now check if the return value is non-zero, and if so we make it the new stack pointer. The registers will then be popped from this new stack.
The handler function can just call a function in process.rs
(which
we’re going to define soon):
use crate::process;
extern "C" fn timer_handler(context_addr: usize) -> usize {
let next_stack = process::schedule_next(context_addr);
unsafe {
PICS.lock()
.notify_end_of_interrupt(InterruptIndex::Timer.as_u8());
}
next_stack
}
The schedule_next
function is where we have to decide which thread
runs next, i.e the thread scheduling. This scheduling decision has to
be made quickly, and there has been a lot of work done on optimum
strategies. This is a useful page on CPU scheduling strategies. For now
we’ll just implement the simplest Round Robin method, in which the
threads take turns and each gets the same amount of time.
pub fn schedule_next(context_addr: usize) -> usize {
let mut running_queue = RUNNING_QUEUE.write();
let mut current_thread = CURRENT_THREAD.write();
if let Some(mut thread) = current_thread.take() {
// Save the location of the Context struct
thread.context = context_addr as u64;
// Put to the back of the queue
running_queue.push_back(thread);
}
// Get the next thread in the queue
*current_thread = running_queue.pop_front();
match current_thread.as_ref() {
Some(thread) => {
// Set the kernel stack for the next interrupt
gdt::set_interrupt_stack_table(
gdt::TIMER_INTERRUPT_INDEX as usize,
VirtAddr::new(thread.kernel_stack_end));
// Point the stack to the new context
thread.context as usize
},
None => 0 // Timer handler won't modify stack
}
}
This function takes the Thread struct from CURRENT_THREAD
(if there
is one), stores the context address that should be the same most of
the time but not always, and puts it to the back of the queue. The
thread at the front of the queue is then taken, the timer interrupt
stack entry is set, and the context location is returned. This
function handles the case that there are no threads, which occurs if a
timer interrupt occurs before the first kernel thread is started.
We haven’t worried about disabling interrupts while modifying anything
in this schedule_next
function because interrupts are disabled in
the timer interrupt handler (the cli
instruction, sti
re-enables).
To see if this works, we can modify the kernel thread functions, try calling functions and starting other threads:
fn kernel_thread_main() {
println!("Kernel thread start");
// Launch another kernel thread
process::new_kernel_thread(test_kernel_fn2);
loop {
println!("<< 1 >>");
x86_64::instructions::hlt();
}
}
fn test_kernel_fn2() {
println!("Hello from kernel function 2!");
loop {
println!(" << 2 >>");
x86_64::instructions::hlt();
}
}
which should now produce something like
<< 1 >> << 2 >> << 1 >> << 2 >> …
Now that the kernel can switch between multiple threads, we’re ready to start isolating them from each other. We’ll also want to be able to load programs from memory and (eventually) from disk. This is what we’ll start working on in the next section.