In the last section we worked out how to run a program in ring 3, but we couldn’t protect programs from each other, or run more than one without manually choosing memory ranges for each program.
In this section we will
- Isolate programs from each other by creating separate page tables for each process. We will want to keep the kernel pages mapped, and add the user pages.
- Prepare for multi-threaded processes by creating an allocator for user thread stacks.
In the process we’ll learn about memory layout, page tables, and how to switch between them.
First add a new function to memory.rs which creates a new empty level 4 page table for the user process:
fn create_empty_pagetable() -> (*mut PageTable, u64) {
    // Need to borrow as mutable so that we can allocate new frames
    // and so modify the frame allocator
    let memory_info = unsafe { MEMORY_INFO.as_mut().unwrap() };

    // Get a frame to store the level 4 table
    let level_4_table_frame = memory_info.frame_allocator.allocate_frame().unwrap();
    let phys = level_4_table_frame.start_address(); // Physical address
    let virt = memory_info.physical_memory_offset + phys.as_u64(); // Kernel virtual address
    let page_table_ptr: *mut PageTable = virt.as_mut_ptr();

    // Clear all entries in the page table
    unsafe {
        (*page_table_ptr).zero();
    }

    (page_table_ptr, phys.as_u64())
}
Note that it returns both a pointer (i.e. a virtual memory address) to the new page table, and also the physical address of the table. That physical address is what needs to be written to the CR3 register in order for this page table to be used.
If we switch to this page table then none of the kernel code or data
will be accessible. Before using this page table we therefore have to
add all the kernel pages which are in the currently active page table.
The simplest, brute-force way to do this is to create a new set of page tables for each process. That avoids any possibility of pages being made available to multiple processes, but duplicates the kernel page tables. In memory.rs the MemoryInfo struct can be extended to include a reference to the kernel page table:
struct MemoryInfo {
    boot_info: &'static BootInfo,
    physical_memory_offset: VirtAddr,
    frame_allocator: BootInfoFrameAllocator,
    kernel_l4_table: &'static mut PageTable // new
}
which can be set in the init
function:
pub fn init(boot_info: &'static BootInfo) {
    // ...

    // Store boot_info for later calls
    unsafe {
        MEMORY_INFO = Some(MemoryInfo {
            boot_info,
            physical_memory_offset,
            frame_allocator,
            kernel_l4_table: level_4_table // new
        })
    };
    // ...
}
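The create_kernel_only_pagetable function used below isn’t shown in this section. A minimal sketch of what it could look like, assuming it simply creates an empty table and copies over every populated level 4 entry from the stored kernel table:

```rust
pub fn create_kernel_only_pagetable() -> (*mut PageTable, u64) {
    let memory_info = unsafe { MEMORY_INFO.as_mut().unwrap() };

    let (table_ptr, table_physaddr) = create_empty_pagetable();

    // Copy every level 4 entry which the kernel table has populated.
    // Kernel pages are thereby shared by all processes; user mappings
    // will be added to the remaining (empty) entries later.
    unsafe {
        for (i, entry) in memory_info.kernel_l4_table.iter().enumerate() {
            if !entry.is_unused() {
                (*table_ptr)[i] = entry.clone();
            }
        }
    }
    (table_ptr, table_physaddr)
}
```

With this scheme only the level 4 entries are copied, so the lower-level kernel tables are shared: a later change to a kernel level 3 (or below) table is automatically visible in every process.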
In the new_user_thread
function in process.rs
we can now get the
page table pointer and physical address, and then switch to the new
table:
if let Ok(obj) = object::File::parse(bin) {
    let (user_page_table_ptr, user_page_table_physaddr) =
        memory::create_kernel_only_pagetable(); // New

    unsafe {
        asm!("mov cr3, {addr}", addr = in(reg) user_page_table_physaddr); // New
    }
...
The page table is switched before allocating memory and reading the ELF data, so that the new mappings go into the process’s own page table. The asm block writes the physical address of the page table to CR3 (Control Register 3), which also triggers a TLB flush. Since we are not changing the kernel pages, their TLB entries could be kept rather than flushed by setting the Global bit in the page table entries, as explained on the OSDev wiki.
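We don’t set the Global bit here, but as a sketch of what it would involve using the x86_64 crate (where exactly the kernel mappings are created is left open): global pages only take effect once CR4.PGE is enabled, and the GLOBAL flag is then added to the kernel page table entries:

```rust
use x86_64::registers::control::{Cr4, Cr4Flags};
use x86_64::structures::paging::PageTableFlags;

// Global pages are only honoured once CR4.PGE is set
unsafe {
    Cr4::update(|flags| flags.insert(Cr4Flags::PAGE_GLOBAL));
}

// Kernel mappings would then be created with the GLOBAL flag,
// so their TLB entries survive writes to CR3:
let kernel_flags = PageTableFlags::PRESENT
    | PageTableFlags::WRITABLE
    | PageTableFlags::GLOBAL;
```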
The user program should now run with the new page table, producing the same output as before! Unfortunately this page table is currently used for all threads, and creating a new user process will change the page table for all threads. To really separate processes we need to change page tables during context switches.
To isolate threads from each other, we’re now going to give each thread its own page table. In process.rs add a field to the Thread struct:
struct Thread {
    /// Thread ID
    tid: u64,

    /// Page table physical address
    page_table_physaddr: u64, // New
    ...
}
In the new_kernel_thread
function we’ll set this address to
zero, to indicate that no page table switch is needed, since
kernel pages are mapped in all tables.
Box::new(Thread {
    tid: 0,
    page_table_physaddr: 0, // New
In the new_user_thread function we’ll store the physical address of the new page table:
Box::new(Thread {
    tid: 0,
    page_table_physaddr: user_page_table_physaddr, // New
We’re going to need to switch page tables in a couple of places now
(the context switch and new user thread code) so let’s define
a function in memory.rs
to do this:
pub fn switch_to_pagetable(physaddr: u64) {
    unsafe {
        asm!("mov cr3, {addr}",
             addr = in(reg) physaddr);
    }
}
And add use core::arch::asm; near the top of memory.rs. We can then call this function in new_user_thread, replacing the unsafe asm block:
memory::switch_to_pagetable(user_page_table_physaddr);
At this point it’s also very important to consider interrupts in our new_user_thread function: it switches to a new page table and then modifies it. If a context switch occurs while this is happening, the page table will be switched and the changes will be made to the wrong tables. We can either disable interrupts while working with the new page table, or the context switch needs to save and restore each thread’s page table.
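A minimal sketch of the first option, using the x86_64 crate’s without_interrupts helper (exactly which code needs to sit inside the closure depends on how new_user_thread is structured):

```rust
use x86_64::instructions::interrupts;

interrupts::without_interrupts(|| {
    // No timer interrupt (and so no context switch) can occur
    // inside this closure, so the new page table stays active
    // while we modify it
    memory::switch_to_pagetable(user_page_table_physaddr);

    // ... allocate frames, copy the ELF data, allocate stacks ...
});
```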
Finally in process.rs
the function schedule_next
, which is called
by the timer interrupt to switch context, can be modified:
match current_thread.as_ref() {
    Some(thread) => {
        gdt::set_interrupt_stack_table(
            gdt::TIMER_INTERRUPT_INDEX as usize,
            VirtAddr::new(thread.kernel_stack_end));
        if thread.page_table_physaddr != 0 {
            memory::switch_to_pagetable(thread.page_table_physaddr); // New
        }
        thread.context as usize
An optimisation here would be to switch page tables only if the new one differs from the currently active one, e.g. when there is only one running thread no switch is needed.
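One way to implement that optimisation (a sketch; the raw asm version above works just as well) is to read the currently active table with Cr3::read from the x86_64 crate and skip the write when nothing would change:

```rust
use x86_64::registers::control::Cr3;

pub fn switch_to_pagetable(physaddr: u64) {
    // Writing CR3 flushes the TLB, so skip it if this
    // page table is already the active one
    if Cr3::read().0.start_address().as_u64() == physaddr {
        return;
    }
    unsafe {
        asm!("mov cr3, {addr}",
             addr = in(reg) physaddr);
    }
}
```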
To try this out we need to run two userspace programs simultaneously.
In main.rs
we have the entry point:
entry_point!(kernel_entry);

fn kernel_entry(boot_info: &'static BootInfo) -> ! {
    blog_os::init();
    memory::init(boot_info);
    syscalls::init();

    #[cfg(test)]
    test_main();

    process::new_kernel_thread(kernel_thread_main);

    blog_os::hlt_loop();
}
which sets up some basic kernel functions, then starts a kernel
thread and waits for it to be scheduled. At this point we go
to the kernel_thread_main
function, and can launch two
of the same programs:
fn kernel_thread_main() {
    println!("Kernel thread start");

    process::new_user_thread(include_bytes!("../user/hello"));
    process::new_user_thread(include_bytes!("../user/hello"));

    blog_os::hlt_loop();
}
To see if both threads are running side by side, we can add some delays between outputs in each thread. For now this will just be brute-force nop loops in hello.rs:
use core::arch::asm;

#[no_mangle]
pub unsafe extern "sysv64" fn _start() -> ! {
    print!("Hello from user world! {}", 42);
    for i in 1..10 {
        println!("{}", i);
        for _ in 1..10000000 { // wait
            unsafe { asm!("nop"); }
        }
    }
    loop {}
}
When run you should see two counters interleaved, each counting up to 9.
This doesn’t necessarily need to be done now, but it’ll be useful to
label threads somehow, and having unique numbers comes in handy
occasionally. The easiest way to make a unique number is with a
counter: In process.rs
we can add:
lazy_static! {
    // ...
    static ref UNIQUE_COUNTER: RwLock<u64> = RwLock::new(0);
}
and then a function which returns a different number each time it is called:
pub fn unique_id() -> u64 {
    interrupts::without_interrupts(|| {
        let mut counter = UNIQUE_COUNTER.write();
        *counter += 1;
        *counter
    })
}
Everywhere we create a new Thread
object we can now write:
Box::new(Thread {
    tid: unique_id(), // new
We’re working up to enabling multi-threaded programs in the next
section. Those will share a page table and other resources, but it’s
important that they have separate stacks. Currently we use a fixed
(virtual) address for the user stack (const USER_STACK_START: u64 =
0x5200000;
) which we hope doesn’t overlap with data loaded from the
ELF file. This approach won’t work for two or more threads sharing the
same virtual memory space.
To give each thread a different region of (virtual) memory to use as a stack, we’ll need to understand a bit better how the virtual memory is being used. These Python routines are useful, as they convert between virtual addresses and page table indices:
def page_table_indices(vaddr):
    return ((vaddr >> 39) & 511, (vaddr >> 30) & 511,
            (vaddr >> 21) & 511, (vaddr >> 12) & 511,
            vaddr & 4095)

def page_table_address(indices):
    return ((indices[0] << 39) + (indices[1] << 30) +
            (indices[2] << 21) + (indices[3] << 12) + indices[4])
We’re currently loading our user programs starting at 0x5000000, corresponding to indices (0, 0, 40, 0, 0), i.e. level 4 and level 3 table index 0, level 2 index 40, level 1 index 0 and frame offset 0. A common choice is to use the lower half of the 32-bit address space for user code, and reserve the upper half between 0x80000000 (page table indices (0, 2, 0, 0, 0)) and 0x100000000 (indices (0, 4, 0, 0, 0)) for kernel code.
Above 0x100000000 (beyond 32-bit addresses) we can use any address range we like for things like stack and heap allocations. A small part of this huge address space is already used:
- The bootloader maps all memory starting at address 0x18000000000, corresponding to page table indices (3, 0, 0, 0, 0).
- The kernel heap (in allocator.rs) starts at 0x_4444_4444_0000, which is page index (136, 273, 34, 64, 0).
For now we can reserve a single level 1 page table, which has 512 entries. If we allocate 8 pages to each thread then we’ll be able to have up to 64 threads per page table. We can divide each 8-page slot up into (from low to high):
- One unused page as a guard; accessing this (user stack overflow) will trigger a page fault.
- Seven user stack pages (28k).
A user stack underflow accesses the unused guard page in the next set of thread pages, so this should hopefully prevent some hard-to-find bugs which would occur if threads started writing over each other’s stacks.
Note that these stacks are quite small: 7 pages is 28k. Linux allocates two pages per thread for kernel stacks, but 8MB for user stacks. The user stack size can be found by running
$ ulimit -s
which gives the user stack size in kB. We’ll have to find a different solution later if larger user stacks are needed.
We can choose an arbitrary page table, for example (5, 0, 0, *, *), which maps virtual memory addresses 0x28000000000 to 0x28000200000. We can store this choice as a set of indices in memory.rs:
const THREAD_STACK_PAGE_INDEX: [u8; 3] = [5, 0, 0];
so that we can quickly find the page table by following entries.
We then define a new function allocate_user_stack:
pub fn allocate_user_stack(
    level_4_table: *mut PageTable
) -> Result<(u64, u64), &'static str> {
    ...
}
which will take the level 4 (top-level) page table for the process,
and return either a pair of u64
integers with the start and end
address of the user stack, or an error string.
First we need to find the level 1 table by following entries. Since page table entries store physical addresses, we need to use the physical memory offset to convert these to virtual addresses:
let memory_info = unsafe { MEMORY_INFO.as_mut().unwrap() };

let mut table = unsafe { &mut *level_4_table };
for index in THREAD_STACK_PAGE_INDEX {
    let entry = &mut table[index as usize];
    if entry.is_unused() {
        ...
    }
    table = unsafe {
        &mut *(memory_info.physical_memory_offset
               + entry.addr().as_u64()).as_mut_ptr()
    };
}
This walks through the page tables, using indices in the
THREAD_STACK_PAGE_INDEX
array. As written, it relies on
the tables already being allocated. If an entry is unused,
we will allocate a new page table with:
let (new_table_ptr, new_table_physaddr) = create_empty_pagetable();
entry.set_addr(PhysAddr::new(new_table_physaddr),
               PageTableFlags::PRESENT |
               PageTableFlags::WRITABLE |
               PageTableFlags::USER_ACCESSIBLE);
Note: We could do something more clever by allocating just one page table and mapping one of its entries to itself. Then we would only need one new page table per process, rather than three, and memory translation might be faster. For now we’ll just do the simple thing, and revisit this once we decide how to allocate heap memory for user processes.
Having got a reference to the level 1 page table, we now need to find a group of 8 pages which are available. We divide the 512 entries into 64 slots. Since most processes will probably have far fewer than 64 threads, the chances are quite good that a random slot will be unused. If the slot is used, then we’ll go through the slots sequentially.
Getting a random number is harder than it seems, and isn’t really needed here,
so we’ll just use unique_id
which should be good enough:
use crate::process;
let n_start = process::unique_id();
then iterate through each slot n
, cycling around the 64 slots in the
page table, and check if it is used. The first page in each 8-page
slot is always left empty as a guard page, so we check the n * 8 + 1
page table entry:
for i in 0..64 {
    let n = ((n_start + i) % 64) as usize;
    if table[n * 8 + 1].is_unused() {
        // Found an empty slot
    }
}
Err("All thread stack slots full")
If all the slots are full then we leave the for loop and return an error message.
Slot n consists of page table entries [n * 8] to [n * 8 + 7], of which the first is left empty as the guard page and the remaining 7 need to be allocated:
for j in 1..8 {
    let entry = &mut table[n * 8 + j];
    let frame = memory_info.frame_allocator.allocate_frame()
        .ok_or("Failed to allocate frame")?;
    entry.set_addr(frame.start_address(),
                   PageTableFlags::PRESENT |
                   PageTableFlags::WRITABLE |
                   PageTableFlags::USER_ACCESSIBLE);
}
All that remains is to calculate and return the virtual address range of the stack we’ve just allocated:
let slot_address: u64 =
    ((THREAD_STACK_PAGE_INDEX[0] as u64) << 39) +
    ((THREAD_STACK_PAGE_INDEX[1] as u64) << 30) +
    ((THREAD_STACK_PAGE_INDEX[2] as u64) << 21) +
    (((n * 8) as u64) << 12);

return Ok((slot_address + 4096,
           slot_address + 8 * 4096));
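For example, slot n = 3 gives slot_address = 0x28000000000 + (3 * 8) * 4096 = 0x28000018000, so the returned range is (0x28000019000, 0x28000020000): the stack occupies the seven pages above the guard page, and the second value is the initial stack pointer, since the stack grows downwards.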
To test this out we will modify the Thread
struct, and the kernel
and user thread creation functions in process.rs
.
The Thread
struct can now be modified to store the end addresses of
both kernel and user stacks, and use only one Vec
to allocate stack
space on the kernel heap. Keeping the kernel stack on the kernel heap
makes switching pagetables easier because the kernel stack is then
mapped in all page tables.
struct Thread {
    tid: u64,
    page_table_physaddr: u64,
    kernel_stack: Vec<u8>,
    kernel_stack_end: u64,
    user_stack_end: u64, // new
    context: u64,
}
The fmt
implementation for Thread
also needs to be modified to print the stack locations.
In the new_kernel_thread
function, we can now allocate
both kernel and “user” stacks in the same Vec
:
let new_thread = {
    let kernel_stack = Vec::with_capacity(KERNEL_STACK_SIZE + USER_STACK_SIZE); // new
    let kernel_stack_start = VirtAddr::from_ptr(kernel_stack.as_ptr());
    let kernel_stack_end = (kernel_stack_start + KERNEL_STACK_SIZE).as_u64(); // new
    let user_stack_end = kernel_stack_end + (USER_STACK_SIZE as u64); // new

    Box::new(Thread {
        tid: unique_id(),
        page_table_physaddr: 0,
        kernel_stack,
        kernel_stack_end,
        user_stack_end, // new
        context: kernel_stack_end - INTERRUPT_CONTEXT_SIZE as u64,
    })
};
The stack pointer in the new context is set to the user stack end:
context.rsp = new_thread.user_stack_end as usize;
In new_user_thread we can use the new allocate_user_stack function:
let new_thread = {
    let kernel_stack = Vec::with_capacity(KERNEL_STACK_SIZE);
    let kernel_stack_start = VirtAddr::from_ptr(kernel_stack.as_ptr());
    let kernel_stack_end = (kernel_stack_start + KERNEL_STACK_SIZE).as_u64();

    let (_user_stack_start, user_stack_end) =
        memory::allocate_user_stack(user_page_table_ptr)?; // new

    Box::new(Thread {
        tid: unique_id(),
        page_table_physaddr: user_page_table_physaddr,
        kernel_stack,
        kernel_stack_end,
        user_stack_end, // new
        context: kernel_stack_end - INTERRUPT_CONTEXT_SIZE as u64
    })
};
and then set context.rsp
in the same way as for kernel threads.
Hopefully the code still runs as before, context switching between two user processes. In the next section we’ll use the stack allocator and add syscalls to let user threads spawn new threads.