EuraliOS really needs a simple text editor, to be able to start doing useful things like changing the configuration from within the operating system itself.
Ultimately it would probably be best to support a library like Crossterm or Termion (used in Redox OS), that many programs can use to provide a terminal interface. For now we’ll write a simple editor, as an excuse to learn something about text editor data structures. It is also a way to add things to EuraliOS, like command-line arguments and environment variables, that user programs need.
The first thing our text editor needs is a way to read command line arguments. Well, maybe not the first thing, but it’s what seemed achievable…
Command-line arguments in Rust are collected by calling
std::env::args()
(see the Rust book). In EuraliOS the interface
will be:
use euralios_std::env;
...
let args: Vec<String> = env::args().collect();
To make this work the command-line arguments must be passed
from the parent to the new process. One way would be to send a
message from the parent to the child process, containing the
arguments. This would add more syscalls to the process startup,
and create possible deadlocks for the parent process if the
child didn’t start correctly. Instead we can use the parameter
string that is already passed to the exec
syscall.
In Multiple Users the parameter string was used to specify the Virtual File System (VFS) of the child process. This uses a simple format with single character commands (e.g. ‘C’ copies the parent VFS, ‘-’ removes a mount path). We now add another command ‘A’ that specifies a null-terminated string.
The library interface to exec
is euralios_std::syscalls::exec
. We
add an args: Vec<&str>
input, push an ‘A’ character to the parameter
string followed by these strings separated by 0x03
, the ETX
End-of-text ASCII control character, and terminating with a 0
byte.
In the kernel sys_exec
function the argument string is copied into a
Vec<u8>
on the kernel heap so that it’s accessible when the page
table is changed. It’s then passed as a new input to
new_user_thread
in process.rs
.
The argument string is passed to the new process on the stack:
Stacks grow downwards, to the string is put on the stack above
the length of string. Then register rdx
is set to the address
of the string length. In the _start
function (that calls the
user main()
) we get args_address
from rdx
and then store
the u8
slice in a static mut
variable:
ARGS_SLICE = unsafe{
let length = *(args_address as *mut i32) as usize;
slice::from_raw_parts((args_address + 4) as *const u8, length)
};
Once a program starts, this slice is only read by calling an
internal function get_args -> &'static [u8]
.
The user accesses this through the env::args()
function:
pub fn args() -> impl Iterator<Item = String> {
// Convert bytes into a vector of owned strings
get_args()
.split(|&b| b == 0x03)
.map(|arg| String::from_utf8_lossy(arg).into_owned())
}
This splits the string on 0x03
chars, as used in the
euralios_std::syscalls::exec
function. We can now start
an editor and get the command-line arguments:
use euralios_std::{println, env};
#[no_mangle]
fn main() {
for arg in env::args() {
println!("Arg: {}", arg);
}
}
When we open the file we use the path given on the command line. If the caller uses the absolute path then it works, but relative paths do not: The text editor doesn’t know the shell’s working directory!
Linux stores each processor’s current working directory (and lots of
other information) in its task_struct
data structure as described
here. In the spirit of keeping things simple, we’re instead going to
store the current directory as an environment variable: The kernel
doesn’t need to know about working directories (probably).
Maybe we could treat environment variables as a kind of argument, but
for now they are treated separately: In the new process parameter
string we used ‘A’ to begin the arguments; now we can add ‘E’ to begin
a null-terminated environment string. This is collected into a
Vec<u8>
in kernel memory, and passed as another member of the
process::Params
parameter struct that is passed to
process::new_user_thread
. The environment string (if there is one)
is put on the user stack underneath the arguments, with the length of
the string as a u32
underneath that. We set the RDI
register to
point to that string length. [The choice of RDI
is fairly
arbitrary, though some registers e.g RBX are reserved for LLVM use].
When a new process starts, the _start()
entry point in
euralios_std::lib
is called. This now reads heap information,
argument and environment addresses:
#[no_mangle]
pub unsafe extern "sysv64" fn _start() -> ! {
// Information passed from the operating system
let heap_start: usize;
let heap_size: usize;
let args_address: usize;
let env_address: usize;
asm!("",
lateout("rax") heap_start,
lateout("rcx") heap_size,
lateout("rdx") args_address,
lateout("rdi") env_address,
options(pure, nomem, nostack)
);
We’re going to create a private static variable to point to the environment string (as done for the arguments) and public function to read it:
static mut ENV_SLICE: &[u8] = b"";
pub fn get_env() -> &'static [u8] {
return unsafe{ ENV_SLICE };
}
It has to be mut
but we’re only going to modify it once in _start
,
where we read the lengh and then create a slice pointing to the
environment section of the stack.
if env_address != 0 {
ENV_SLICE = unsafe{
let length = *(env_address as *mut i32) as usize;
slice::from_raw_parts((env_address + 4) as *const u8, length)
};
}
We can now implement some of the functions in std::env, including
env::vars()
to iterate over environment variables. Like arguments,
we use byte 0x03 (the ASCII End-of-Text record separator) to separate
key-value pairs.
Note: For now all of these use get_env()
to get a static string
slice. If we want to implement std::env
functions to set
environment variables, then we’ll need to allocate memory to store the
new values. Most programs won’t need to modify their environment
variables, so this memory should probably be allocated the first time
a variable is set.
In the shell program we now create an environment string with the
current working directory and pass it to syscalls::exec
to be
inserted into the parameter string:
let mut env_string = String::from("PWD=");
env_string.push_str(current_directory
.as_os_str().to_str().unwrap());
syscalls::exec(
&bin,
0, // Permission flags
exe_input2,
syscalls::STDOUT.clone(),
VFS::shared(),
args,
&env_string
)?;
In future this could be extended to allow users to set more environment variables to be passed to programs.
The editor program can now get and print its environment variables using:
use euralios_std::env;
// Print all environment variables
for (key, value) in env::vars() {
print!("ENV '{key}' = '{value}'\n");
}
// Get the current working directory
if let Ok(pwd) = env::var("PWD") {
print!("pwd = {pwd}\n");
}
We can then wrap this call to env::var=
in a function
env::current_dir
pub fn current_dir() -> Result<PathBuf, VarError> {
let pwd = var("PWD")?;
Ok(PathBuf::from(
OsString::from(pwd)))
}
Now in fs::File
when we open or create a file we modify the path
if the current directory (“PWD”) is set in the environment:
pub fn open<P: AsRef<Path>>(path: P) -> Result<File, SyscallError> {
let pwd_or_err = env::current_dir();
let handle = if path.as_ref().is_relative() & pwd_or_err.is_ok() {
let mut abspath = pwd_or_err.unwrap();
abspath.push(path.as_ref());
syscalls::open(abspath.as_os_str(), message::O_READ)?
} else {
syscalls::open(path.as_ref().as_os_str(), message::O_READ)?
};
Ok(File(handle))
}
Finally, we can pass a command-line argument to the text editor, and use paths relative to the current working directory.
Now that we can tell the editor which file to open, we need to choose
how to represent the data in memory. Here we’re going to implement a
Piece table, one of several data structures used in text editors
(others include Gap buffers and Ropes. For small files a simple Vec
is probably sufficient, or a Vec
of lines and each line a Vec
of
u8
).
Piece tables have three components: The original text, that is read and never modified, an “Add buffer” that is added to (never removed) as the user types, and the piece table itself. The piece table describes the order of pieces of text in the document. Each piece can come from either the Original buffer or the Add buffer, and has a starting position in that buffer and a length. When a file is first opened there is only one piece, the whole of the Original buffer. As text is added or deleted, more pieces are created. When the text is displayed on screen or saved to file, the pieces are assembled in the order in the piece table.
We can define a Piece as:
#[derive(Clone, Copy)]
enum Piece {
// Start, len [bytes]
Original{start: usize, len: usize},
Add{start: usize, len: usize},
}
A file can be represented as the Original and Add buffers, and the list of pieces:
struct File {
path: String, // The path to the file
original: Vec<u8>, // Original contents of the file
add: Vec<u8>, // Add buffer (append only)
pieces: Vec<Piece>,
}
To display a file we need to print each of the pieces in the right order. The simplest way is to iterate through the pieces:
fn display(file: &File) {
// Move cursor to (0,0) then erase below
print!("\x1b[H\x1bJ");
for (piece_index, piece) in file.pieces.iter().enumerate() {
let bytes = match piece {
Piece::Original{start: start,
len: len} => {
&file.original[(*start)..(start + len)]
},
Piece::Add{start: start,
len: len} => {
print!("\x1b[31m"); // Set foreground to red
&file.add[(*start)..(start + len)]
}
};
print!("{}\x1b[m", unsafe{str::from_utf8_unchecked(bytes)});
}
}
Control codes are used to do things like reset the cursor and erase the screen, and change the foreground color to red when printing text from the Add buffer. The different colors are useful for debugging.
The above code works but is very inefficient: It calls print!
many
times, producing a flickering effect when typing because the screen is
cleared and then filled in again every time the display is
refreshed. It also doesn’t allow for things like line numbering,
display a cursor location, or displaying part of a long file. We’ll fix
those things one by one…
The piece table provides a nice way to insert and delete text anywhere,
but makes moving around, say one line up or down, a little fiddly.
To keep track of a location in the file we can define a Cursor
:
#[derive(Clone, Copy)]
struct Cursor {
piece: usize, // Index of the piece the cursor is in
pos: usize, // Position inside the piece
}
Some operations that are needed in many places are moving to the next and previous character:
impl File {
fn next(&self, cursor: Cursor) -> Option<Cursor> {
if cursor.piece == self.pieces.len() {
return None; // End of the file
}
if cursor.pos == self.pieces[cursor.piece].len() - 1 {
return Some(Cursor{piece: cursor.piece + 1,
pos: 0});
}
Some(Cursor{piece: cursor.piece,
pos: cursor.pos + 1})
}
fn previous(&self, cursor: Cursor) -> Option<Cursor> {
if cursor.pos == 0 {
if cursor.piece == 0 {
return None; // Start of the file
}
return Some(Cursor {piece: cursor.piece - 1,
pos: self.pieces[cursor.piece - 1].len() - 1});
}
Some(Cursor{piece: cursor.piece,
pos: cursor.pos - 1})
}
}
Note that the cursor can be on the character after the end of the file
(pieces
equal to file.pieces.len()
), because it’s the location
where the next character will go. This needs to be handled
in the function that returns the character at the cursor location:
impl File {
fn at(&self, cursor: Cursor) -> u8 {
if cursor.piece == self.pieces.len() {
return 0;
}
match self.pieces[cursor.piece] {
Piece::Original{start: start, ..} =>
self.original[start + cursor.pos],
Piece::Add{start: start, ..} =>
self.add[start + cursor.pos]
}
}
}
With the next
and previous
methods we can move around the file
character by character, e.g. an ArrowRight
character can be handled
with:
} else if ch == console::sequences::ArrowRight {
// Shift to next character
if let Some(cursor) = file.next(file.cursor) {
file.cursor = cursor;
}
Moving up and down lines requires a bit more work: We need to know
which column the cursor is on, and find which character is at the
corresponding column on the line above or below. To implement this
it is useful to have File::find
and File::rfind
methods e.g.
// Reverse find, starting from the given cursor
// If found, returns a Cursor pointing to the location, and the number of characters traversed
fn rfind(&self, start: Cursor, pattern: u8) -> Option<(Cursor, usize)> {
// Number of characters traversed
let mut ntraversed: usize = 0;
let mut cursor = start;
loop {
if let Some(c) = self.previous(cursor) {
cursor = c;
} else {
break None;
}
// Count characters. NOTE: Assumes ASCII
ntraversed += 1;
// Get the byte at this location
if self.at(cursor) == pattern {
break Some((cursor, ntraversed));
}
}
}
With this we can handle ArrowUp
characters by counting backwards to the beginning
of the line, then finding the start of the previous line, and counting forwards to
find the right column:
} else if ch == console::sequences::ArrowUp {
// Scan backwards until finding a `\n`
if let Some((line_end, mut nchars)) = file.rfind(file.cursor, b'\n') {
// Scan backwards again
let mut cursor = if let Some((line_start, _)) = file.rfind(line_end, b'\n') {
line_start
} else {
// Start of file
nchars -= 1;
Cursor{piece: 0, pos: 0}
};
// Now move forward nchars or until end of line
for _ in 0..nchars {
cursor = file.next(cursor).unwrap();
if file.at(cursor) == b'\n' {
break;
}
}
file.cursor = cursor;
}
A nice feature to add would be to keep track of the line number over
repeated ArrowUp
and ArrowDown
characters, so that the column
of the original line is kept, even if the cursor goes through
lines with fewer characters.
A status bar takes up one of the 25 lines of text, but is useful to show the name of the file being edited, and indicate when a file has changed. I like the nano interface with its list of shortcut keys at the bottom. At least it’s clear how to exit the editor.
Putting a bar at the bottom of the screen is harder than it seems: We would need to know how many rows the screen has. We’re using VGA mode so we could hard-wire this, but for now we’ll just put the bar at the top of the screen instead. [Note: A little lower down we’ll have to hard-wire the number of lines anyway, to know when to stop printing a page. By then the bar at the top had grown on me].
The code below creates a top line with a blue background. The file path is displayed on the left, followed by an indicator of whether the file has changed since the last save (in yellow). To the right we print the shortcuts for Save and Quit in light blue: It’s nice to tell new users how to get out of the program.
// Move cursor to (0,0)
// Set background to blue (44m); foreground white (37m)
print!("\x1b[H\x1b[44m\x1b[37m {}", &file.path);
if file.changed {
// Foreground yellow (33m)
print!(" \x1b[33mchanged\x1b[37m");
} else {
print!(" ");
}
// Clear to the right (K), reset colors
print!(" \x1b[36m^S Save ^Q Quit\x1b[K\x1b[m\n");
The `display` function is fast enough, but quite inefficient: It makes
multiple calls to print
, where one would be enough. Each call to
print
allocates a new memory chunk, copies text into it, and sends
the chunk in a message to the VGA driver to be processed.
Instead we can eliminate most of this copying, and only send one message, by writing everything into one buffer and sending that. Allocate some memory:
let buffer_limit = 8000; // Maximum number of bytes
let (mut mem_handle, _) = syscalls::malloc(buffer_limit as u64, 0).unwrap();
Now we need to wrap this memory handle in an object that implements
the core::fmt::Write
interface.
struct Buffer<'a>(&'a mut [u8], usize);
impl Write for Buffer<'_> {
fn write_str(&mut self, s: &str) -> fmt::Result {
let space_left = self.0.len() - self.1;
if space_left > s.len() {
self.0[self.1..][..s.len()].copy_from_slice(s.as_bytes());
self.1 += s.len();
Ok(())
} else {
Err(fmt::Error)
}
}
}
We can now replace all print!(...)
calls with write!(buffer,...)
.
Then at the end of the display
function we send the whole buffer
to stdout
:
_ = message::rcall(&syscalls::STDOUT,
message::WRITE,
(buffer.1 as u64).into(), // Length
mem_handle.into(), // Memory
None);
The (very) basic text editor is now working!
Basic text editing is now working, but we can’t move around by going up and down lines or edit files longer than a single screen.
We now need to keep track of the first character on the screen, and its line number. The fiddly part of this (I found) was ensuring that the line number and character index stay in sync as the page is moved up and down:
// Describe where to begin display
display_start: Cursor,
line_start: usize,
To keep track of the location in the piece buffer we use the same
Cursor
struct as for the editing cursor, and a usize
for the
first line number. In the display
function we skip pieces
before the display_start
piece, and handle the case where
the display start is inside a piece:
for (piece_index, piece) in file.pieces.iter().enumerate() {
if piece_index < file.display_start.piece {
continue;
}
let mut cursor_pos = file.cursor.pos;
let bytes = if piece_index == file.display_start.piece {
if cursor_pos < file.display_start.pos {
cursor_pos = 0;
} else {
cursor_pos -= file.display_start.pos;
}
Moving the page up and down is complicated because we want to start the page on the character after the new line (‘\n’). If a line is empty, however, then the next character is also a new line. The way that seemed to work was to first figure out how to move one line, and then repeat it ten times:
} else if ch == console::sequences::PageDown {
for _ in 0..10 {
// If start is a newline, the first line is empty
if file.at(file.display_start) != b'\n' {
// Find the end of the first line
if let Some((newline,_)) = file.find(file.display_start, b'\n') {
file.display_start = newline;
} else {
continue; // Already at the end
}
}
// Start display on the character after the new line
if let Some(cursor) = file.next(file.display_start) {
file.display_start = cursor;
file.line_start += 1
}
}
There are many things still to be done to make this editor more usable. In no particular order:
- Handle UTF8 characters. The
next
,previous
,find
andrfind
routines move around the text buffer in bytes, rather than characters. EuraliOS currently can’t display (or enter) non-ASCII characters so this is hard to test for now. - Handle the delete key
- Keep cursor column number across multiple
ArrowUp
andArrowDown
characters. - Ensure that the editing cursor is always on screen: Currently the page up/down characters don’t move the cursor, and moving the cursor outside the display doesn’t change the display. Editing while the cursor is before the display start can cause a panic because it changes the piece count.
- Add an Undo feature. This can be done by tracking changes to the piece table, or by pushing inverse operations into a stack.
- Add search, syntax highlighting, and other creature comforts.