Skip to content

Latest commit

 

History

History
723 lines (629 loc) · 25.1 KB

01-interrupts-processes.org

File metadata and controls

723 lines (629 loc) · 25.1 KB

Interrupts and processes

This begins after Phil-Opp’s excellent post on heap allocation: We have a kernel with basic terminal and keyboard I/O, and kernel memory allocation from a fixed size heap. In this section we’ll work out how to interrupt a thread, access its context (i.e. the registers), and then change the context to switch threads.

The code here roughly corresponds to EuraliOS commit 2a515e40. There are differences, as bugs were fixed, code tidied up, and the order of changes adjusted based on experience in later sections.

Accessing processor context

Pre-emptive multitasking works by interrupting the CPU and swapping out one process for another by saving the state of all the registers, and changing the registers to the state of another task. The usual way to do this is with the timer interrupt which now looks like:

extern "x86-interrupt" fn timer_interrupt_handler(
    _stack_frame: InterruptStackFrame) {
    print!(".");
    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Timer.as_u8());
    }
}

All interrupt handlers currently use the x86-interrupt calling convention. With that convention the compiler automatically inserts code to save scratch registers to the current stack, and passes a pointer to the exception stack frame to the handler function. It was introduced in this PR which has some useful discussion and links.

The x86-interrupt calling convention works well for most interrupt handlers, but it has some limitations: It doesn’t (as far as I can tell) provide a way to access and modify the register values which are restored when the handler returns. Instead we’ll write a naked function to handle saving and restoring register values ourselves.

First we need to define a structure which stores the register values, so we’ll be able to access and modify them. In interrupts.rs define a Context struct:

#[derive(Debug)]
#[repr(packed)]
pub struct Context {
    // These are pushed in the handler function
    pub r15: usize,
    pub r14: usize,
    pub r13: usize,

    pub r12: usize,
    pub r11: usize,
    pub r10: usize,
    pub r9: usize,

    pub r8: usize,
    pub rbp: usize,
    pub rsi: usize,
    pub rdi: usize,

    pub rdx: usize,
    pub rcx: usize,
    pub rbx: usize,
    pub rax: usize,
    // Below is the exception stack frame pushed by the CPU on interrupt
    // Note: For some interrupts (e.g. Page fault), an error code is pushed here
    rip: usize,     // Instruction pointer
    cs: usize,      // Code segment
    rflags: usize,  // Processor flags
    rsp: usize,     // Stack pointer
    ss: usize,      // Stack segment
    // Here the CPU may push values to align the stack on a 16-byte boundary (for SSE)
}

We now define a function to replace the timer handler:

#[naked]
pub extern "x86-interrupt" fn timer_handler_naked (
   _stack_frame: InterruptStackFrame) {
  unsafe {
  ...
  }
}

Here the x86-interrupt calling convention is specified to satisfy the requirements of the IDT set_handler_fn method, but doesn’t have any effect because the function is [naked] and so the compiler doesn’t add any code around ours.

Because naked functions are still somewhat experimental, we need to enable this feature in lib.rs:

#![feature(naked_functions)]

While we’re editing lib.rs, add another feature we’ll need soon:

#![feature(asm_sym)]

which enables the sym operands in assembly blocks, used to take the address of symbols.

Now we can fill in the unsafe block in the timer_handler_naked function. In interrupts.rs import the asm! macro:

use core::arch::asm;

and then create an asm! block inside the unsafe scope of timer_handler_naked. Note that naked functions in recent Rust versions can only contain a single asm block, and can’t mix Rust and asm code, because the mix of naked functions, asm and Rust code sometimes led to surprising behaviour. We therefore call out to Rust code as quickly as possible:

asm!(
    // Disable interrupts
    "cli",
    // Push registers
    "push rax",
    "push rbx",
    "push rcx",
    "push rdx",

    "push rdi",
    "push rsi",
    "push rbp",
    "push r8",

    "push r9",
    "push r10",
    "push r11",
    "push r12",

    "push r13",
    "push r14",
    "push r15",

    // First argument in rdi with C calling convention
    "mov rdi, rsp",
    // Call the hander function
    "call {handler}",

    // Pop scratch registers
    "pop r15",
    "pop r14",
    "pop r13",

    "pop r12",
    "pop r11",
    "pop r10",
    "pop r9",

    "pop r8",
    "pop rbp",
    "pop rsi",
    "pop rdi",

    "pop rdx",
    "pop rcx",
    "pop rbx",
    "pop rax",
    // Enable interrupts
    "sti",
    // Interrupt return
    "iretq",
    // Note: Getting the handler pointer here using `sym` operand, because
    // an `in` operand would clobber a register that we need to save, and we
    // can't have two asm blocks
    handler = sym timer_handler,
    options(noreturn)
);

This function pushes the values of the registers onto the stack (note: the same stack as the interrupted process). Because stacks move downwards in memory the registers are pushed onto the stack in the order they appear in the Context struct from bottom to top.

The sym timer_handler argument takes the address of a symbol (the timer_handler function we haven’t written yet) and passes it in as handler so that it can be called. Now we can implement this handler function, choosing the “C” calling convention so that the first argument is in the RDI register:

extern "C" fn timer_handler(context: &mut Context) {
    print!("+");

    // Tell the PIC that the interrupt has been processed
    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Timer.as_u8());
    }
}

Note that the C calling convention here means some registers will be pushed onto the stack again, which is a bit inefficient but will have to do for now. We then set the IDT entry to the new handler:

idt[InterruptIndex::Timer.as_usize()]
            .set_handler_fn(timer_handler_naked); // new

While we’re at it we can also add interrupt handlers for page and general protection faults (GPF), which for now will just print error messages. This gives us more useful information than just a double fault.

use x86_64::structures::idt::PageFaultErrorCode;
use crate::hlt_loop;

extern "x86-interrupt" fn page_fault_handler(
    stack_frame: InterruptStackFrame,
    error_code: PageFaultErrorCode,
) {
    use x86_64::registers::control::Cr2;

    println!("EXCEPTION: PAGE FAULT");
    println!("Accessed Address: {:?}", Cr2::read());
    println!("Error Code: {:?}", error_code);
    println!("{:#?}", stack_frame);
    hlt_loop();
}

extern "x86-interrupt" fn general_protection_fault_handler(
    stack_frame: InterruptStackFrame,
    _error_code: u64) {
    panic!("EXCEPTION: GENERAL PROTECTION FAULT\n{:#?}", stack_frame);
}

and add these handlers to the IDT:

idt.page_fault.
    set_handler_fn(page_fault_handler).
    set_stack_index(gdt::PAGE_FAULT_IST_INDEX);
idt.general_protection_fault.
    set_handler_fn(general_protection_fault_handler).
    set_stack_index(gdt::GENERAL_PROTECTION_FAULT_IST_INDEX);

That uses two constants defined in gdt.rs:

pub const PAGE_FAULT_IST_INDEX: u16 = 0;
pub const GENERAL_PROTECTION_FAULT_IST_INDEX: u16 = 0;

All that set_stack_index method does is select an index in the TSS to load a stack pointer from when the interrupt occurs, ensuring that these handlers run with a known-good stack. There are ways this can go wrong even with a “good” stack pointer. For example if the page tables are messed up then the stack pointer or handler function virtual addresses may not be mapped to a physical address, in which case typically the double fault handler can’t be found either, a triple fault occurs and the CPU resets.

Testing

First check that this compiles and runs (cargo run). We can then check that the values of the registers are saved and restored correctly. In the new timer_handler function we can print some registers, change them, and then check that the change is seen in the interrupted code.

In main.rs, before the hlt_loop() call we can try this:

// Set some registers
unsafe {
  asm!("mov r11, 0x4242",
       "mov rdi, 0x22",
       "mov rcx, 0x93"
  );
}

// Wait for an interrupt
unsafe {asm!("hlt");}

// Get the register values
let (r11, rdi, rcx): (i64, i64, i64);
unsafe {asm!("nop",
             lateout("r11") r11,
             lateout("rdi") rdi,
             lateout("rcx") rcx);}
println!("R11: 0x{:x} RDI: 0x{:x} RCX: 0x{:x}", r11, rdi, rcx);

which will need the asm! macro so put use core::arch::asm; near the top of main.rs. When this runs we should see R11: 0x4242 RDI: 0x22 RCX: 0x93, so the registers weren’t modified by the interrupt.

Now in interrupt.rs we can access and modify some registers, and check that they are set correctly:

extern "C" fn timer_handler(context: &mut Context) {
    print!("<0x{:x}, 0x{:x}>", context.r11, context.rcx);
    context.r11 = context.rdi + 0x5321;
    context.rcx = 0xdeadbeef;

    // Tell the PIC that the interrupt has been processed
    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Timer.as_u8());
    }
}

So we now print the values of some registers, and modify the r11 and rcx registers. Running again we should see:

<0x4242, 0x93> R11: 0x5343 RDI: 0x22 RCX: 0xDEADBEEF
It did not crash!
<0x5343, 0xDEADBEEF><0x5343, 0xDEADBEEF>...

This shows that the timer handler can read and modify the process state, which we’ll need when we want to switch processes.

Test case

To turn this into a test case we can keep around, we need some way to test the wrapper code while replacing the timer_handler. Based on the MOROS code, turn the timer_handler_naked function into a macro:

macro_rules! wrap {
    ($func: ident => $wrapper:ident) => {
        #[naked]
        pub extern "x86-interrupt" fn $wrapper (_stack_frame: InterruptStackFrame) {
            unsafe{
              ...
            }
        }
    };
}

which can create the hander as before:

wrap!(timer_handler => timer_handler_naked);

Running (cargo run) should give the same result as before, but now we can write tests for the macro by wrapping different functions.

We can make a standalone test like the stack_overflow test by making the Context struct members public, exporting the wrap function (renamed to interrupt_wrap), and setting up a minimal IDT in the test case. This is in tests/interrupt_wrap.rs.

Kernel task switching

We’re now going to use the timer interrupt handler to switch between threads. When the timer interrupt handler starts, it pushes the context onto the stack, and when the handler finishes it pops the context off the stack. To change thread we just need to change the stack pointer inside the handler.

There is a lot of confusing information online about interrupt handling and context switching in x86. Most of this confusion is due to the different behaviour which has evolved over the last 40-some years: Real mode, 32-bit protected mode and 64-bit long mode all work somewhat differently. For 64-bit mode there is a good summary of exceptions here and a good explanation of context switching here.

In 64-bit mode interrupts can switch to known-good stacks which are listed in the Interrupt Stack Table (IST) in the Task State Segment. The IST has 7 entries per core that we can use (IST1 to IST7). The index into this table which should be used for each interrupt is specified in the Interrupt Descriptor Table (IDT). We’ve already used this to set good stacks for some of our handlers, and now we can use it to also switch tasks.

First we’ll set an IST index for the timer interrupt handler in the IDT table (interrupts.rs):

idt[InterruptIndex::Timer.as_usize()]
    .set_handler_fn(timer_handler_naked)
    .set_stack_index(gdt::TIMER_INTERRUPT_INDEX);

where the index is different from the index used for the fault handlers (which are all 0 for now). In gdt.rs:

pub const DOUBLE_FAULT_IST_INDEX: u16 = 0;
pub const PAGE_FAULT_IST_INDEX: u16 = 0;
pub const GENERAL_PROTECTION_FAULT_IST_INDEX: u16 = 0;
pub const TIMER_INTERRUPT_INDEX: u16 = 1; // New

The trick is that we are now going to change TSS entry 1 so that each thread stores its context in a different stack. Every time we switch thread we’re going to change the TSS entry to the new thread’s kernel stack, so when the timer interrupt occurs the thread’s context will be saved to its own kernel stack. To make this work we need to make the TSS mutable as well as static.

In gdt.rs the TSS lazy static is now wrapped in a Mutex. In the initialisation we set the timer interrupt entry to be the same as the double fault index so it’s got a sensible value:

use spin::Mutex; // New
use lazy_static::lazy_static;

lazy_static! {
    static ref TSS: Mutex<TaskStateSegment> = { // Modified Mutex<>
        let mut tss = TaskStateSegment::new();
        tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize] = {
            const STACK_SIZE: usize = 4096 * 5;
            static mut STACK: [u8; STACK_SIZE] = [0; STACK_SIZE];

            let stack_start = VirtAddr::from_ptr(unsafe { &STACK });
            let stack_end = stack_start + STACK_SIZE;
            stack_end
        };

        tss.interrupt_stack_table[TIMER_INTERRUPT_INDEX as usize] =
            tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize]; // New

        Mutex::new(tss) // Modified Mutex::new
    };
}

To set and retrieve values we can define some helper functions:

unsafe fn tss_reference() -> &'static TaskStateSegment {
    let tss_ptr = &*TSS.lock() as *const TaskStateSegment;
    & *tss_ptr
}

pub fn set_interrupt_stack_table(index: usize, stack_end: VirtAddr) {
    TSS.lock().interrupt_stack_table[index] = stack_end;
}

The tss_reference() function is only for internal use because we need to set the TSS address in the GDT. The set_interrupt_stack_table() function is public because we’ll call it to modify the timer interrupt stack entry in the TSS.

The Global Descriptor Table (GDT) is now changed to call tss_reference() to put the TSS address into the GDT. (This code was originally adapted from MOROS).

lazy_static! {
    static ref GDT: (GlobalDescriptorTable, Selectors) = {
        let mut gdt = GlobalDescriptorTable::new();

        let code_selector = gdt.add_entry(Descriptor::kernel_code_segment());
        let data_selector = gdt.add_entry(Descriptor::kernel_data_segment());
        let tss_selector = gdt.add_entry(Descriptor::tss_segment(
            unsafe {tss_reference()}));

        (gdt, Selectors {code_selector, data_selector, tss_selector})
    }
}

struct Selectors {
    code_selector: SegmentSelector,
    data_selector: SegmentSelector,
    tss_selector: SegmentSelector,
}

pub fn get_kernel_segments() -> (SegmentSelector, SegmentSelector) {
  (GDT.1.code_selector, GDT.1.data_selector)
}

Note that we don’t need to modify the GDT after initialisation, unlike the TSS, so it doesn’t need a Mutex.

Now we have some of the basic mechanisms, with the timer interrupt handler and ability to modify the TSS entry, we need to create some structures to create and keep track of threads.

Threads and processes

In a new source file process.rs we can create a struct Thread:

extern crate alloc;
use alloc::vec::Vec;

struct Thread {
    kernel_stack: Vec<u8>,
    user_stack: Vec<u8>,
    kernel_stack_end: u64, // This address goes in the TSS
    user_stack_end: u64,
    context: u64, // Address of Context on kernel stack
}

We need a “user” stack which is in use while the thread is doing whatever it’s supposed to do, a kernel stack which we’ll use to store the Context and use in the code which runs when a timer interrupt occurs. The end of the kernel stack (kernel_stack_end) is the address that goes into the TSS (index 1, TIMER_INTERRUPT_INDEX)

To keep track of threads we need to know which process is currently running, and which threads are waiting. Most operating systems seem to use some kind of linked list for this, but lists in Rust are.. awkward. Instead we’ll use a VecDeque which we can use as a queue by pushing Thread structures onto one end and popping them off the other. This is a static variable so to make writing thread safe we’ll wrap the VecDeque in a RwLock spin lock.

use spin::RwLock;
use lazy_static::lazy_static;
use alloc::{boxed::Box, collections::vec_deque::VecDeque};

lazy_static! {
    static ref RUNNING_QUEUE: RwLock<VecDeque<Box<Thread>>> =
        RwLock::new(VecDeque::new());

    static ref CURRENT_THREAD: RwLock<Option<Box<Thread>>> =
        RwLock::new(None);
}

So we’re going to put Thread structs in boxes, and put the boxes in RUNNING_QUEUE. When it’s a thread’s turn to run it’s going to be taken out of the queue and put into CURRENT_THREAD.

Let’s make a function to create a new kernel thread. It needs to take a function pointer as input, create a Thread struct, and put it into RUNNING_QUEUE.

const KERNEL_STACK_SIZE: usize = 4096 * 2;
const USER_STACK_SIZE: usize = 4096 * 5;

pub fn new_kernel_thread(function: fn()->()) {
    let new_thread = {
        let kernel_stack = Vec::with_capacity(KERNEL_STACK_SIZE);
        let kernel_stack_end = (VirtAddr::from_ptr(kernel_stack.as_ptr())
                               + KERNEL_STACK_SIZE).as_u64()
        let user_stack = Vec::with_capacity(USER_STACK_SIZE);
        let user_stack_end = (VirtAddr::from_ptr(user_stack.as_ptr())
                              + USER_STACK_SIZE).as_u64() as usize;
        let context = kernel_stack_end - INTERRUPT_CONTEXT_SIZE as u64;

        Box::new(Thread {
            kernel_stack,
            user_stack,
            kernel_stack_end,
            user_stack_end,
            context})
    };
    // Set context registers
    // Add Thread to RUNNING_QUEUE
}

To set the Thread context registers we can do:

let context = unsafe {&mut *(new_thread.context as *mut Context)};
context.rip = function as usize; // Instruction pointer
context.rsp = new_thread.user_stack_end; // Stack pointer
context.rflags = 0x200; // Interrupts enabled

let (code_selector, data_selector) = gdt::get_kernel_segments();
context.cs = code_selector.0 as usize;
context.ss = data_selector.0 as usize;

and finally to insert the new thread into running queue we use:

use x86_64::instructions::interrupts;
...
    interrupts::without_interrupts(|| {
        RUNNING_QUEUE.write().push_back(new_thread);
    });

where we are disabling interrupts while modifying the queue, so we don’t get a timer interrupt half way through which then tries to modify the queue.

We can try this out by creating a function in main.rs:

use blog_os::process;

fn kernel_thread_main() {
    println!("Kernel thread start");
    loop {
        println!("<< 1 >>");
        x86_64::instructions::hlt();
    }
}

fn kernel_main(boot_info: &'static BootInfo) -> ! {
    println!("Hello World{}", "!");

    blog_os::init();
    memory::init(boot_info);

    // Launch a kernel thread
    process::new_kernel_thread(kernel_thread_main);
    blog_os::hlt_loop();
}

Currently that won’t do anything because the thread is never scheduled. When it is run it should keep printing “<< 1 >>”.

Scheduling

The final step is to modify the timer interrupt handler (again) so that it uses the RUNNING_QUEUE to get the next thread to run, and switches the kernel stacks.

We need to be able to change stack, but this change needs to happen just before the interrupt occurs because any use of the stack, like calling or returning from a function, will modify the context and do unexpected things. We’ll therefore change the stack in the naked timer interrupt function:

...
"push r14",
"push r15",

"mov rdi, rsp",
"call {handler}",
// New: stack pointer is in RAX
"cmp rax, 0",
"je 2f",        // if rax != 0 {
"mov rsp, rax", //   rsp = rax;
"2:",           // }

"pop r15",
"pop r14",
...

The return value from a function with C calling convention is in the RAX register. We now check if the return value is non-zero, and if so we make it the new stack pointer. The registers will then be popped from this new stack.

The handler function can just call a function in process.rs (which we’re going to define soon):

use crate::process;

extern "C" fn timer_handler(context_addr: usize) -> usize {
    let next_stack = process::schedule_next(context_addr);
    unsafe {
        PICS.lock()
            .notify_end_of_interrupt(InterruptIndex::Timer.as_u8());
    }
    next_stack
}

The schedule_next function is where we have to decide which thread runs next, i.e the thread scheduling. This scheduling decision has to be made quickly, and there has been a lot of work done on optimum strategies. This is a useful page on CPU scheduling strategies. For now we’ll just implement the simplest Round Robin method, in which the threads take turns and each gets the same amount of time.

pub fn schedule_next(context_addr: usize) -> usize {
    let mut running_queue = RUNNING_QUEUE.write();
    let mut current_thread = CURRENT_THREAD.write();

    if let Some(mut thread) = current_thread.take() {
        // Save the location of the Context struct
        thread.context = context_addr as u64;
        // Put to the back of the queue
        running_queue.push_back(thread);
    }
    // Get the next thread in the queue
    *current_thread = running_queue.pop_front();
    match current_thread.as_ref() {
        Some(thread) => {
            // Set the kernel stack for the next interrupt
            gdt::set_interrupt_stack_table(
              gdt::TIMER_INTERRUPT_INDEX as usize,
              VirtAddr::new(thread.kernel_stack_end));
            // Point the stack to the new context
            thread.context as usize
          },
        None => 0  // Timer handler won't modify stack
    }
}

This function takes the Thread struct from CURRENT_THREAD (if there is one), stores the context address that should be the same most of the time but not always, and puts it to the back of the queue. The thread at the front of the queue is then taken, the timer interrupt stack entry is set, and the context location is returned. This function handles the case that there are no threads, which occurs if a timer interrupt occurs before the first kernel thread is started.

We haven’t worried about disabling interrupts while modifying anything in this schedule_next function because interrupts are disabled in the timer interrupt handler (the cli instruction, sti re-enables).

To see if this works, we can modify the kernel thread functions, try calling functions and starting other threads:

fn kernel_thread_main() {
    println!("Kernel thread start");

    // Launch another kernel thread
    process::new_kernel_thread(test_kernel_fn2);

    loop {
        println!("<< 1 >>");
        x86_64::instructions::hlt();
    }
}

fn test_kernel_fn2() {
    println!("Hello from kernel function 2!");
    loop {
        println!("       << 2 >>");
        x86_64::instructions::hlt();
    }
}

which should now produce something like

<< 1 >> << 2 >> << 1 >> << 2 >> …

Now that the kernel can switch between multiple threads, we’re ready to start isolating them from each other. We’ll also want to be able to load programs from memory and (eventually) from disk. This is what we’ll start working on in the next section.