In section 12 we wrote a basic pci
program, which scans the
PCI bus for devices and makes it available to other processes.
Then in the last section we worked on enabling programs to open
connections, and ensuring that return messages go to the right
program. In this section we’ll start adding device drivers,
beginning with a network interface card, the RTL8139.
In addition to the OSDev page, the MOROS source code was very useful while developing this.
There are two ways to read and write data to PCI cards: via I/O ports and by direct memory access (DMA). The addresses are stored in Base Address Registers (BARs) in the configuration registers, and are explained in this OSDev wiki page.
Reading the BAR involves writing to the CONFIG_ADDRESS
port, then
reading from CONFIG_DATA
(see section 12). If multiple device
drivers try to do this simultaneously then they may be unlucky and
be interrupted in between writing the address and reading the value,
with undefined consequences. To prevent this we’ll only access
the configuration registers via the pci
program.
In pci/src/main.rs
we need to add a message handler for a new pci::READ_BAR
message type:
syscalls::Message::Short(
pci::READ_BAR, address, bar_id) => {
if address > 0xFFFF_FFFF || bar_id > 5 {
// Out of range
syscalls::send(0,
syscalls::Message::Short(
pci::NOTFOUND,
0xFFFF_FFFF_FFFF_FFFF, 0));
continue;
}
let bar_value =
PciLocation::from_address(address as u32)
.read_register(4 + bar_id as u8);
syscalls::send(0,
syscalls::Message::Short(
pci::BAR,
bar_value as u64, bar_id));
}
This either returns pci::NOTFOUND
, or a pci::BAR
message
type containing the value. The new message types are added to
euralios_std/src/message.rs
:
pub mod pci {
pub const FIND_DEVICE: u64 = 256;
pub const READ_BAR: u64 = 257; // New
pub const ADDRESS: u64 = 384;
pub const NOTFOUND: u64 = 385;
pub const BAR: u64 = 386; // New
}
We can now read and print BAR0 in rtl8139
by sending a message
to pci
:
let bar0 = match syscalls::send_receive(
handle,
syscalls::Message::Short(
pci::READ_BAR, address, 0)).unwrap() {
syscalls::Message::Short(pci::BAR,
bar_value, _) => bar_value,
_ => panic!("rtl8139 unexpected reply: {:?}", reply)
};
debug_println!("BAR0: {:08X}", bar0);
which prints BAR0: 0000C001
. The final bit is 1
so this is
an I/O space BAR. To get the 16-bit I/O address we need
BAR0 & 0xFFFC
to mask the lowest two bits.
We’re going to often have to use send_receive
to send
messages to another process and wait for a reply, a kind of
Remote Procedure Call (RPC). The code above to read the BAR
has quite a bit of boilerplate, and doesn’t even handle the
case that the Rendezvous is busy and we need to wait and retry.
To wrap this up we can add a function rcall()
in
euralios_std/src/message.rs
pub fn rcall(
handle: u32,
data1: u64,
data2: u64,
data3: u64,
expect_rdata1: Option<u64>
) -> Result<(u64, u64, u64), u64> {
...
}
The idea is that a user passes in the handle and message data, and can
optionally specify the value expected in the return data1 part of the
message. The code calls send_receive()
in a loop; if the Rendezvous
is blocked then it waits and retries up to a maximum number of times.
If successful then it returns the three data values; if not then an
error code.
The rtl8139
program can now be simplified to:
#[no_mangle]
fn main() {
debug_println!("[rtl8139] Starting driver");
let handle = syscalls::open("/pci").expect("Couldn't open pci");
// Use PCI program to look for device
let (msg_type, address, _) = rcall(handle, pci::FIND_DEVICE,
0x10EC, 0x8139,
None).unwrap();
if msg_type != pci::ADDRESS {
debug_println!("[rtl8139] Device not found. Exiting.");
return;
}
debug_println!("[rtl8139] Found at address: {:08X}", address);
// Read BAR0 to get the I/O address
let (_, bar0, _) = rcall(handle, pci::READ_BAR,
address, 0,
Some(pci::BAR)).unwrap();
let ioaddr = (bar0 & 0xFFFC) as u16;
debug_println!("[rtl8139] BAR0: {:08X}. I/O addr: {:04X}", bar0, ioaddr);
}
Following the OSDev page and MOROS source code, we first need to reset the network card:
struct Device {
ioaddr: u16,
}
impl Device {
fn reset(&mut self) -> Result<(), &'static str> {
...
Ok(())
}
}
We’ll need to read and write to ports quite often, so can define some functions to help:
fn outportb(ioaddr: u16, value: u8) {
unsafe {
asm!("out dx, al",
in("dx") ioaddr,
in("al") value,
options(nomem, nostack));
}
}
fn inb(ioaddr: u16) -> u8 {
let value: u8;
unsafe {
asm!("in al, dx",
in("dx") ioaddr,
lateout("al") value,
options(nomem, nostack));
}
value
}
Resetting consists of powering on:
outportb(self.ioaddr + 0x52, 0);
starting a software reset:
outportb(self.ioaddr + 0x37, 0x10);
and then waiting for the reset bit to be cleared:
const MAX_ATTEMPTS: usize = 1000;
let mut retry = 0;
while (inb(self.ioaddr + 0x37) & 0x10) != 0 {
retry += 1;
if retry > MAX_ATTEMPTS {
return Err("Timeout");
}
// Wait for a bit
for _i in 0..100000 {
unsafe{ asm!("nop"); }
}
}
It would be nice if we had a sleep
or yield
syscall
so that we might do something useful while waiting. For now
we just call nop
many times.
The Media Access Control address is used to uniquely identify a network interface on a local network. It is the low-level address which is needed to actually deliver a packet of data to the specific intended recipient device.
We are probably going to need MAC addresses quite often, so will add
it to the standard library in a new file, euralios_std/src/net.rs
:
pub struct MacAddress {
octet: [u8; 6]
}
along with some methods to convert to and from arrays, intended to be the same as the mac_address crate:
impl MacAddress {
/// Create a new MacAddress from bytes
pub fn new(octet: [u8; 6]) -> MacAddress {
MacAddress{octet}
}
/// Return the address as an array of bytes
pub fn bytes(&self) -> [u8; 6] {
self.octet
}
}
and a Display
trait for pretty printing:
use core::fmt;
impl fmt::Display for MacAddress {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
for i in 0..5 {
write!(f, "{:02X}:", self.octet[i])?;
}
write!(f, "{:02X}", self.octet[5])
}
}
Now adapting the MOROS code we can write a method to read the MAC
address from the RTL8139 I/O address in rtl8139/src/main.rs
:
impl Device {
fn reset(&mut self) -> Result<(), &'static str> {
...
}
fn mac_address(&self) -> MacAddress { // New
let mut octet: [u8; 6] = [0; 6];
for ind in 0..octet.len() {
octet[ind] = inb(self.ioaddr + ind as u16);
}
MacAddress::new(octet)
}
}
After getting the ioaddr
, the main()
function can now
reset the network card and read the MAC address:
let mut device = Device{ioaddr};
match device.reset() {
Ok(()) => debug_println!("[rtl8139] Device reset OK"),
Err(message) => {
debug_println!("[rtl8139] Device failed to reset: {}", message);
return;
}
}
debug_println!("[rtl8139] MAC address {}", device.mac_address());
To check that this produces the result we expect, we can try
changing the MAC address that QEMU assigns to the network card.
In kernel/Cargo.toml
we can choose the MAC address e.g.
[package.metadata.bootimage]
run-args = ["-nic", "user,model=rtl8139,mac=00:11:22:33:44:55"]
sets the MAC address to 00:11:22:33:44:55
. Running the code
produces something like figure fig-reset:
I’ve tidied up some of the output which isn’t really needed now
(like the ELF segments), and started putting the name of the
program at the start of the line (e.g [rtl8139]
or [pci]
)
because their outputs may be interleaved.
When data is received the network card is going to write data to memory, so we need to give it a physical memory address to write to. This is a problem because our driver is a user space program which doesn’t have access to page tables.
It is recommended that the receive buffer be 8k + 16 bytes long, just over 2 pages long, so we need three consecutive pages mapped to three consecutive frames. The address is 32 bits, so all of this memory must be in physical memory below 4Gb.
We’ll therefore add a system call malloc
to allocate chunks of
memory. As discussed in section 6, the Linux mmap()
syscall does
something like this, allocating pages which can be free’d back to the
operating system. We also need to be able to allocate and pass around
chunks of memory for large messages where we want to transfer more
than a few registers between processes (see section 11), so now seems
like a good time to do this.
The malloc
system call will need:
- The number of pages to allocate
- Whether the frames need to be consecutive
- Whether they must be in 32-bit address space
For simple memory allocation where the frames don’t need to be consecutive a lazy on-demand allocation can be used. Where frames must be consecutive then we need to add functionality to the frame allocator to find consecutive frames.
We need to quickly find a set of pages to allocate to put in our frames. Fortunately the 64-bit address space is extravagantly large so finding space is fairly straightforward.
As in section 3 and section 6 we can use the the page_table_address
function to work out addresses:
- The user stack pages are (5,0,0,0,0) to (5,0,1,0,0) i.e. addresses
0x28000000000 to 0x28000200000, 2Mb in total.
- The user heap is (5,0,3,0,0) to (5,0,23,0,0), addresses
0x28000600000 to 0x28002e00000, a total of 0x2800000 bytes or 40Mb.
To keep this simple we could use (5,1,0,0,0) to (6,0,0,0,0) for memory chunks. Each chunk will be an entry in the L3 table, so (5,1,*,*,*) is one chunk, (5,2,*,*,*) is another. This means:
- Each chunk is limited to 1Gb maximum
- Each process can have up to 511 chunks (because index 0 is already used)
- Moving a memory chunk from one process to another just requires moving one L3 table entry.
In kernel/src/memory.rs
we need to add a method to the frame
allocator, which will search for consecutive frames. This is not going
to be very efficient, but this isn’t an operation which will be needed
often; only device drivers will really care if their frames are
consecutive or if they are 32-bit addressable. The outline of the
function is below; many details are similar to what was done in
section 5 or can be found in the EuraliOS repository.
impl MultilevelBitmapFrameAllocator {
fn consecutive_frames(
&mut self,
needed_frames: u64,
max_address: u64
) -> Option<u64> {
// Restrict frame range to those with physical address below max_address
for frame in range {
if frame is available {
count += 1;
if count == needed_frames {
// Found run of frames
// Mark frames as taken and return
}
} else {
count = 0; // Run ended
}
}
}
}
Like before, we can also write a small wrapper to convert to PhysFrame:
fn allocate_consecutive_frames(
&mut self,
needed_frames: u64,
max_address: u64
) -> Option<PhysFrame> {
if let Some(frame_number) = self.consecutive_frames(needed_frames, max_address) {
// Convert from frame number to physical address
return PhysFrame::from_start_address(
self.frame_phys_addr + frame_number * 4096).ok();
}
None
}
Then we need a function which will create the page table entries, starting at a given virtual address:
pub fn create_consecutive_pages(
level_4_physaddr: u64,
start_addr: VirtAddr,
num_frames: u64,
max_physaddr: u64)
-> Result<PhysAddr, MapToError<Size4KiB>> {
let memory_info = unsafe {MEMORY_INFO.as_mut().unwrap()};
let frame_allocator = &mut memory_info.frame_allocator;
// Try to allocate a consecutive set of frames
let start_frame = frame_allocator
.allocate_consecutive_frames(num_frames, max_physaddr)
.ok_or(MapToError::FrameAllocationFailed)?;
let frame_range = PhysFrame::range(start_frame, start_frame + num_frames);
let page_range = {
let start_page = Page::containing_address(start_addr);
Page::range(start_page, start_page + num_frames)
};
let l4_table: &mut PageTable = unsafe {
&mut *(memory_info.physical_memory_offset
+ level_4_physaddr).as_mut_ptr()};
let mut mapper = unsafe {
OffsetPageTable::new(l4_table,
memory_info.physical_memory_offset)};
for (page, frame) in page_range.zip(frame_range) {
println!("Page: {:?} -> Frame: {:?}", page, frame);
unsafe {
mapper.map_to_with_table_flags(page,
frame,
// Writeable by user
PageTableFlags::PRESENT |
PageTableFlags::WRITABLE |
PageTableFlags::USER_ACCESSIBLE,
// Parent table flags include writable
PageTableFlags::PRESENT |
PageTableFlags::WRITABLE |
PageTableFlags::USER_ACCESSIBLE,
frame_allocator)?.flush()
};
}
Ok(start_frame.start_address())
}
That function needs the starting virtual address of the page range to be used. We’ll need a function to find an empty page table entry which can be used:
const MEMORY_CHUNK_L4_ENTRY: usize = 5;
const MEMORY_CHUNK_L3_FIRST: usize = 1;
const MEMORY_CHUNK_L3_LAST: usize = 511;
pub fn find_available_page_chunk(
level_4_physaddr: u64
) -> Option<VirtAddr> {
let memory_info = unsafe {MEMORY_INFO.as_mut().unwrap()};
let l4_table: &mut PageTable = unsafe {
&mut *(memory_info.physical_memory_offset
+ level_4_physaddr).as_mut_ptr()};
let l4_entry = &mut l4_table[MEMORY_CHUNK_L4_ENTRY];
if l4_entry.is_unused() {
// L3 table not allocated -> Create
let (_new_table_ptr, new_table_physaddr) = create_empty_pagetable();
l4_entry.set_addr(PhysAddr::new(new_table_physaddr),
PageTableFlags::PRESENT |
PageTableFlags::WRITABLE |
PageTableFlags::USER_ACCESSIBLE);
}
let l3_table: &PageTable = unsafe {
& *(memory_info.physical_memory_offset
+ l4_entry.addr().as_u64()).as_ptr()};
for ind in MEMORY_CHUNK_L3_FIRST..=MEMORY_CHUNK_L3_LAST {
let entry = &l3_table[ind];
if entry.is_unused() {
return Some(VirtAddr::new(((MEMORY_CHUNK_L4_ENTRY as u64) << 39) |
(ind << 30) as u64));
}
}
None
}
Bringing these pieces together, we can write a function in
kernel/src/process.rs
which will find an available chunk of pages
and allocate frames, either a consecutive set or on-demand:
pub fn new_memory_chunk(
num_pages: u64,
max_physaddr: u64
) -> Result<(VirtAddr, PhysAddr), usize> {
// Get the current thread
if let Some(thread) = CURRENT_THREAD.read().as_ref() {
println!("Thread {} new chunk {} pages",
thread.tid(), num_pages);
// Virtual address of the available page chunk
let start_addr = match memory::find_available_page_chunk(
thread.page_table_physaddr) {
Some(values) => values,
None => return Err(syscalls::SYSCALL_ERROR_MEMORY)
};
if max_physaddr != 0 {
// Allocate a consecutive set of frames
let physaddr = match memory::create_consecutive_pages(
thread.page_table_physaddr,
start_addr,
num_pages,
max_physaddr) {
Ok(physaddr) => physaddr,
Err(_) => return Err(syscalls::SYSCALL_ERROR_MEMORY)
};
return Ok((start_addr, physaddr));
} else {
// User doesn't need frames to be consecutive
// -> Allocate frames only when actually used
if memory::create_user_ondemand_pages(
thread.page_table_physaddr,
start_addr,
num_pages).is_err() {
return Err(syscalls::SYSCALL_ERROR_MEMORY);
}
// Note: physical address not returned because
// the frames are not guaranteed to be
// consecutive in physical address.
return Ok((start_addr, PhysAddr::new(0)));
}
}
Err(syscalls::SYSCALL_ERROR_THREAD)
}
Now we can write the syscall interface to use the
create_consecutive_pages
function to allocate memory chunks.
In the library code euralios_std/src/syscalls.rs
we can define a
handle type
#[derive(Debug)]
pub struct MemoryHandle(u64);
To try and make a safe interface for users, this handle can’t be copied but refers to a unique region of memory. User code will be able to pass it to other processes, and the memory will be free’d when the handle is dropped. To access the memory users can get a reference:
impl MemoryHandle {
/// Get a reference with lifetime tied to MemoryHandle
pub unsafe fn as_ref<T>(&self) -> &T {
& *(self.0 as *const T)
}
/// Get a mutable reference with lifetime tied to MemoryHandle
pub unsafe fn as_mut_ref<T>(&mut self) -> &mut T {
&mut *(self.0 as *mut T)
}
}
Those references will have the same lifetime as the handle, enabling user code to avoid using the memory region after it is free’d.
Rather than just returning a number, we can make errors a bit more ergonomic
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub struct SyscallError(u64);
This will allow us to print error messages, and by deriving
PartialEq
and Eq
traits we can match on particular errors.
While we’re adding handles for memory regions, we can make one for
communication handles. They also represent unique resources which can
be made (by the open
syscall), and sent between processes.
#[derive(Debug)]
pub struct CommHandle(u32);
We can define a couple of handles for input and output:
pub const STDIN:CommHandle = CommHandle(0);
pub const STDOUT:CommHandle = CommHandle(1);
Note that these work a bit differently from Unix stdin and stdout: In EuraliOS programs can send messages to stdin, and receive them from stdout.
Now the receive()
function can take a reference to a CommHandle
:
pub fn receive(
handle: &CommHandle
) -> Result<Message, SyscallError> {
...
}
and similarly for the send()
, send_receive()
, open()
and
rcall()
functions. In the pci/src/main.rs
code we can now replace
handle 0
with &STDIN
which is longer but (perhaps) more
descriptive, and in the rtl8139/src/main.rs
code we now pass
references to handle
to the rcall()
function.
We can now write a syscall to allocate a region of memory, optionally specifying an upper limit on the address:
pub fn malloc(
num_bytes: u64,
max_physaddr: u64
) -> Result<(MemoryHandle, u64), SyscallError> {
let num_pages = (num_bytes >> 12) +
if (num_bytes & 4095) != 0 {1} else {0};
let error: u64;
let handle: u16;
let virtaddr: u64;
let physaddr: u64;
unsafe {
asm!("mov rax, 7", // syscall function
"syscall",
in("rdi") num_pages, // First argument
in("rsi") max_physaddr, // Second argument
out("rax") error,
lateout("rdi") virtaddr,
lateout("rsi") physaddr,
out("rcx") _,
out("r11") _);
}
if error == 0 {
Ok((MemoryHandle(virtaddr), physaddr))
} else {
Err(SyscallError(error))
}
}
The inputs are the number of bytes needed (which is rounded up to
calculate the number of pages), and a second argument which is the
maximum physical memory address (max_physaddr
). If that is zero then
frames will be allocated on demand; if non-zero then consecutive
frames will be allocated before returning. The return values are the
memory handle and starting physical address (if consecutive).
In kernel/src/syscalls.rs
we’ll dispatch syscall 7 to sys_malloc()
:
fn sys_malloc(
context_ptr: *mut Context,
num_pages: u64,
max_physaddr: u64
) {
let context = unsafe {&mut (*context_ptr)};
match process::new_memory_chunk(
num_pages,
max_physaddr) {
Ok((virtaddr, physaddr)) => {
context.rax = 0; // No error
context.rdi = virtaddr.as_u64() as usize;
context.rsi = physaddr.as_u64() as usize;
}
Err(code) => {
context.rax = code;
context.rdi = 0;
context.rsi = 0;
context.rdx = 0;
}
}
}
which calls the new_memory_chunk()
function in process.rs
The MemoryHandle
object will own a chunk of memory, and
free it when it is dropped. Syscall 8 is sys_free
, which
takes the virtual address of the start of the memory chunk.
We can then implement drop()
as:
impl Drop for MemoryHandle {
fn drop(&mut self) {
let error: u64;
unsafe {
asm!("syscall",
in("rax") SYSCALL_FREE,
in("rdi") self.0, // First argument
lateout("rax") error,
out("rcx") _,
out("r11") _);
}
if error != 0 {
debug_println!("MemoryHandle::drop({:X}) error {}", self.0, error);
}
}
}
Trying this out in rtl8139
with:
let result = syscalls::malloc(8192 + 16, 0xFFFF_FFFF);
debug_println!("Received: {:?}", result);
We can now pass this physical address to the Device struct:
let (rx_buffer, rx_buffer_physaddr) =
syscalls::malloc(8192 + 16, 0xFFFF_FFFF).unwrap();
let mut device = Device{ioaddr,
rx_buffer_physaddr:(rx_buffer_physaddr as u32)};
then in Device::reset()
:
// Set the receive buffer
outportd(self.ioaddr + REG_RX_ADDR, self.rx_buffer_physaddr);
// Configure receive buffer
outportd(self.ioaddr + REG_RX_CONFIG, 0xf);
// Enable receive and transmitter
outportb(self.ioaddr + REG_CMD, 0x0C);
where we have defined some constants for the registers:
const REG_RX_ADDR: u16 = 0x30;
const REG_CMD: u16 = 0x37;
const REG_RX_CONFIG: u16 = 0x44;
Like MOROS we can now try polling for received packets
impl Device {
fn receive_packet(&self) {
if inb(self.ioaddr + REG_CMD) & CR_BUFFER_EMPTY
== CR_BUFFER_EMPTY {
return; // No packet
}
debug_println!("Received packet!");
}
}
and in main()
after resetting the device just keep calling this
function:
loop { device.receive_packet(); }
To be able to test this we need a way to send packets to
the QEMU network card. QEMU networking has several options
for making the guest operating system accessible to the host
and outside world, but the easiest seems to be to forward
a TCP port. In kernel/Cargo.toml
we can set QEMU arguments
to forward port 5555 on the host to port 23 on the guest:
run-args = ["-nic", "user,model=rtl8139,hostfwd=tcp::5555-:23"]
When this is run, as soon as you run on the host something like
telnet
or ssh
and connect to port 5555 e.g.
telnet localhost 5555
should keep writing “Received packet!” because we’re not taking the data out of the buffer yet.
Rx buffer, when not empty, will contain:
- header (2 bytes)
- length (2 bytes)
- packet (length - 4 bytes)
- crc (4 bytes)
The receive_packet()
function will allocate a chunk of memory,
copy the data from the receive buffer into it, and return the
MemoryHandle
. The plan is that MemoryHandle
can be forwarded
on to higher levels of the network stack implemented in a separate
program.
fn receive_packet(&self) -> Option<MemoryHandle> {
if inb(self.ioaddr + REG_CMD) & CR_BUFFER_EMPTY
== CR_BUFFER_EMPTY {
return None
}
debug_println!("Received packet!");
let capr = inw(self.ioaddr + REG_CAPR);
let cbr = inw(self.ioaddr + REG_CBR);
// CAPR starts at 65520 and with the pad it overflows to 0
let offset = ((capr as u64) + RX_BUFFER_PAD) & 0xFFFF;
let header = unsafe{*((self.rx_buffer.as_u64() + offset) as *const u16)};
if header & ROK != ROK {
debug_println!(" => Packet not ok");
outportw(self.ioaddr + REG_CAPR, cbr);
return None;
}
// Length of the packet
let length = unsafe{*((self.rx_buffer.as_u64() + offset + 2) as *const u16)};
// Receive buffer, including header (u16), length (u16) and crc (u32)
let src_data = (self.rx_buffer.as_u64() + offset) as *const u8;
// Copy data into a separate memory chunk which can be
// sent to other processes. Use malloc syscall to get a MemoryHandle.
let (mem_handle, _) = syscalls::malloc((length + 4) as u64, 0).ok()?;
let dest_data = mem_handle.as_u64() as *mut u8;
unsafe{
ptr::copy_nonoverlapping(src_data, dest_data,
(length + 4) as usize);
}
// Update buffer read pointer
let rx_offset = ((offset as u16) + length + 4 + 3) & !3;
outportw(self.ioaddr + REG_CAPR,
rx_offset - (RX_BUFFER_PAD as u16));
Some(mem_handle)
}
Running telnet
or other command should now produce output as the memory handles are
allocated and free’d.
In section 16 we’ll start building up the network stack by implementing a simple Address Resolution Protocol (ARP) program on top of this network driver. In the next section we first make the inter-process messaging more robust.