Skip to content

Latest commit

 

History

History
622 lines (539 loc) · 22.6 KB

27-text-editor.org

File metadata and controls

622 lines (539 loc) · 22.6 KB

A text editor for EuraliOS

EuraliOS really needs a simple text editor, to be able to start doing useful things like changing the configuration from within the operating system itself.

Ultimately it would probably be best to support a library like Crossterm or Termion (used in Redox OS), that many programs can use to provide a terminal interface. For now we’ll write a simple editor, as an excuse to learn something about text editor data structures. It is also a way to add things to EuraliOS, like command-line arguments and environment variables, that user programs need.

The first thing our text editor needs is a way to read command line arguments. Well, maybe not the first thing, but it’s what seemed achievable…

Command line arguments

Command-line arguments in Rust are collected by calling std::env::args() (see the Rust book). In EuraliOS the interface will be:

use euralios_std::env;
...
let args: Vec<String> = env::args().collect();

To make this work the command-line arguments must be passed from the parent to the new process. One way would be to send a message from the parent to the child process, containing the arguments. This would add more syscalls to the process startup, and create possible deadlocks for the parent process if the child didn’t start correctly. Instead we can use the parameter string that is already passed to the exec syscall.

In Multiple Users the parameter string was used to specify the Virtual File System (VFS) of the child process. This uses a simple format with single character commands (e.g. ‘C’ copies the parent VFS, ‘-’ removes a mount path). We now add another command ‘A’ that specifies a null-terminated string.

The library interface to exec is euralios_std::syscalls::exec. We add an args: Vec<&str> input, push an ‘A’ character to the parameter string followed by these strings separated by 0x03, the ETX End-of-text ASCII control character, and terminating with a 0 byte.

In the kernel sys_exec function the argument string is copied into a Vec<u8> on the kernel heap so that it’s accessible when the page table is changed. It’s then passed as a new input to new_user_thread in process.rs.

The argument string is passed to the new process on the stack: Stacks grow downwards, to the string is put on the stack above the length of string. Then register rdx is set to the address of the string length. In the _start function (that calls the user main()) we get args_address from rdx and then store the u8 slice in a static mut variable:

ARGS_SLICE = unsafe{
    let length = *(args_address as *mut i32) as usize;
    slice::from_raw_parts((args_address + 4) as *const u8, length)
};

Once a program starts, this slice is only read by calling an internal function get_args -> &'static [u8]. The user accesses this through the env::args() function:

pub fn args() -> impl Iterator<Item = String> {
  // Convert bytes into a vector of owned strings
  get_args()
      .split(|&b| b == 0x03)
      .map(|arg| String::from_utf8_lossy(arg).into_owned())
}

This splits the string on 0x03 chars, as used in the euralios_std::syscalls::exec function. We can now start an editor and get the command-line arguments:

use euralios_std::{println, env};

#[no_mangle]
fn main() {
    for arg in env::args() {
        println!("Arg: {}", arg);
    }
}

Environment variables

When we open the file we use the path given on the command line. If the caller uses the absolute path then it works, but relative paths do not: The text editor doesn’t know the shell’s working directory!

Linux stores each processor’s current working directory (and lots of other information) in its task_struct data structure as described here. In the spirit of keeping things simple, we’re instead going to store the current directory as an environment variable: The kernel doesn’t need to know about working directories (probably).

Maybe we could treat environment variables as a kind of argument, but for now they are treated separately: In the new process parameter string we used ‘A’ to begin the arguments; now we can add ‘E’ to begin a null-terminated environment string. This is collected into a Vec<u8> in kernel memory, and passed as another member of the process::Params parameter struct that is passed to process::new_user_thread. The environment string (if there is one) is put on the user stack underneath the arguments, with the length of the string as a u32 underneath that. We set the RDI register to point to that string length. [The choice of RDI is fairly arbitrary, though some registers e.g RBX are reserved for LLVM use].

When a new process starts, the _start() entry point in euralios_std::lib is called. This now reads heap information, argument and environment addresses:

#[no_mangle]
pub unsafe extern "sysv64" fn _start() -> ! {
    // Information passed from the operating system
    let heap_start: usize;
    let heap_size: usize;
    let args_address: usize;
    let env_address: usize;
    asm!("",
         lateout("rax") heap_start,
         lateout("rcx") heap_size,
         lateout("rdx") args_address,
         lateout("rdi") env_address,
         options(pure, nomem, nostack)
    );

We’re going to create a private static variable to point to the environment string (as done for the arguments) and public function to read it:

static mut ENV_SLICE: &[u8] = b"";

pub fn get_env() -> &'static [u8] {
    return unsafe{ ENV_SLICE };
}

It has to be mut but we’re only going to modify it once in _start, where we read the lengh and then create a slice pointing to the environment section of the stack.

if env_address != 0 {
    ENV_SLICE = unsafe{
        let length = *(env_address as *mut i32) as usize;
        slice::from_raw_parts((env_address + 4) as *const u8, length)
    };
}

We can now implement some of the functions in std::env, including env::vars() to iterate over environment variables. Like arguments, we use byte 0x03 (the ASCII End-of-Text record separator) to separate key-value pairs.

Note: For now all of these use get_env() to get a static string slice. If we want to implement std::env functions to set environment variables, then we’ll need to allocate memory to store the new values. Most programs won’t need to modify their environment variables, so this memory should probably be allocated the first time a variable is set.

In the shell program we now create an environment string with the current working directory and pass it to syscalls::exec to be inserted into the parameter string:

let mut env_string = String::from("PWD=");
env_string.push_str(current_directory
                      .as_os_str().to_str().unwrap());
syscalls::exec(
        &bin,
        0, // Permission flags
        exe_input2,
        syscalls::STDOUT.clone(),
        VFS::shared(),
        args,
        &env_string
    )?;

In future this could be extended to allow users to set more environment variables to be passed to programs.

The editor program can now get and print its environment variables using:

use euralios_std::env;

// Print all environment variables
for (key, value) in env::vars() {
    print!("ENV '{key}' = '{value}'\n");
}

// Get the current working directory
if let Ok(pwd) = env::var("PWD") {
    print!("pwd = {pwd}\n");
}

We can then wrap this call to env::var= in a function env::current_dir

pub fn current_dir() -> Result<PathBuf, VarError> {
    let pwd = var("PWD")?;
    Ok(PathBuf::from(
        OsString::from(pwd)))
}

Now in fs::File when we open or create a file we modify the path if the current directory (“PWD”) is set in the environment:

pub fn open<P: AsRef<Path>>(path: P) -> Result<File, SyscallError> {
    let pwd_or_err = env::current_dir();
    let handle = if path.as_ref().is_relative() & pwd_or_err.is_ok() {
        let mut abspath = pwd_or_err.unwrap();
        abspath.push(path.as_ref());
        syscalls::open(abspath.as_os_str(), message::O_READ)?
    } else {
        syscalls::open(path.as_ref().as_os_str(), message::O_READ)?
    };
    Ok(File(handle))
}

Finally, we can pass a command-line argument to the text editor, and use paths relative to the current working directory.

Text editor piece table

Now that we can tell the editor which file to open, we need to choose how to represent the data in memory. Here we’re going to implement a Piece table, one of several data structures used in text editors (others include Gap buffers and Ropes. For small files a simple Vec is probably sufficient, or a Vec of lines and each line a Vec of u8).

Piece tables have three components: The original text, that is read and never modified, an “Add buffer” that is added to (never removed) as the user types, and the piece table itself. The piece table describes the order of pieces of text in the document. Each piece can come from either the Original buffer or the Add buffer, and has a starting position in that buffer and a length. When a file is first opened there is only one piece, the whole of the Original buffer. As text is added or deleted, more pieces are created. When the text is displayed on screen or saved to file, the pieces are assembled in the order in the piece table.

We can define a Piece as:

#[derive(Clone, Copy)]
enum Piece {
    // Start, len [bytes]
    Original{start: usize, len: usize},
    Add{start: usize, len: usize},
}

A file can be represented as the Original and Add buffers, and the list of pieces:

struct File {
    path: String, // The path to the file
    original: Vec<u8>, // Original contents of the file
    add: Vec<u8>, // Add buffer (append only)
    pieces: Vec<Piece>,
}

To display a file we need to print each of the pieces in the right order. The simplest way is to iterate through the pieces:

fn display(file: &File) {
    // Move cursor to (0,0) then erase below
    print!("\x1b[H\x1bJ");
    for (piece_index, piece) in file.pieces.iter().enumerate() {
        let bytes = match piece {
            Piece::Original{start: start,
                            len: len} => {
                &file.original[(*start)..(start + len)]
            },
            Piece::Add{start: start,
                       len: len} => {
                print!("\x1b[31m"); // Set foreground to red
                &file.add[(*start)..(start + len)]
            }
        };
        print!("{}\x1b[m", unsafe{str::from_utf8_unchecked(bytes)});
    }
}

Control codes are used to do things like reset the cursor and erase the screen, and change the foreground color to red when printing text from the Add buffer. The different colors are useful for debugging.

The above code works but is very inefficient: It calls print! many times, producing a flickering effect when typing because the screen is cleared and then filled in again every time the display is refreshed. It also doesn’t allow for things like line numbering, display a cursor location, or displaying part of a long file. We’ll fix those things one by one…

Moving around in the buffer

The piece table provides a nice way to insert and delete text anywhere, but makes moving around, say one line up or down, a little fiddly. To keep track of a location in the file we can define a Cursor:

  #[derive(Clone, Copy)]
struct Cursor {
    piece: usize, // Index of the piece the cursor is in
    pos: usize, // Position inside the piece
}

Some operations that are needed in many places are moving to the next and previous character:

impl File {
    fn next(&self, cursor: Cursor) -> Option<Cursor> {
        if cursor.piece == self.pieces.len() {
            return None; // End of the file
        }
        if cursor.pos == self.pieces[cursor.piece].len() - 1 {
            return Some(Cursor{piece: cursor.piece + 1,
                               pos: 0});
        }
        Some(Cursor{piece: cursor.piece,
                    pos: cursor.pos + 1})
    }

    fn previous(&self, cursor: Cursor) -> Option<Cursor> {
        if cursor.pos == 0 {
            if cursor.piece == 0 {
                return None; // Start of the file
            }
            return Some(Cursor {piece: cursor.piece - 1,
                                pos: self.pieces[cursor.piece - 1].len() - 1});
        }
        Some(Cursor{piece: cursor.piece,
                    pos: cursor.pos - 1})
    }
}

Note that the cursor can be on the character after the end of the file (pieces equal to file.pieces.len()), because it’s the location where the next character will go. This needs to be handled in the function that returns the character at the cursor location:

impl File {
    fn at(&self, cursor: Cursor) -> u8 {
        if cursor.piece == self.pieces.len() {
            return 0;
        }
        match self.pieces[cursor.piece] {
            Piece::Original{start: start, ..} =>
                self.original[start + cursor.pos],
            Piece::Add{start: start, ..} =>
                self.add[start + cursor.pos]
        }
    }
}

With the next and previous methods we can move around the file character by character, e.g. an ArrowRight character can be handled with:

} else if ch == console::sequences::ArrowRight {
    // Shift to next character
    if let Some(cursor) = file.next(file.cursor) {
        file.cursor = cursor;
    }

Moving up and down lines requires a bit more work: We need to know which column the cursor is on, and find which character is at the corresponding column on the line above or below. To implement this it is useful to have File::find and File::rfind methods e.g.

// Reverse find, starting from the given cursor
// If found, returns a Cursor pointing to the location, and the number of characters traversed
fn rfind(&self, start: Cursor, pattern: u8) -> Option<(Cursor, usize)> {
    // Number of characters traversed
    let mut ntraversed: usize = 0;
    let mut cursor = start;
    loop {
        if let Some(c) = self.previous(cursor) {
            cursor = c;
        } else {
            break None;
        }

        // Count characters. NOTE: Assumes ASCII
        ntraversed += 1;

        // Get the byte at this location
        if self.at(cursor) == pattern {
            break Some((cursor, ntraversed));
        }
    }
}

With this we can handle ArrowUp characters by counting backwards to the beginning of the line, then finding the start of the previous line, and counting forwards to find the right column:

} else if ch == console::sequences::ArrowUp {
    // Scan backwards until finding a `\n`
    if let Some((line_end, mut nchars)) = file.rfind(file.cursor, b'\n') {
        // Scan backwards again
        let mut cursor = if let Some((line_start, _)) = file.rfind(line_end, b'\n') {
            line_start
        } else {
            // Start of file
            nchars -= 1;
            Cursor{piece: 0, pos: 0}
        };
        // Now move forward nchars or until end of line
        for _ in 0..nchars {
            cursor = file.next(cursor).unwrap();
            if file.at(cursor) == b'\n' {
                break;
            }
        }
        file.cursor = cursor;
    }

A nice feature to add would be to keep track of the line number over repeated ArrowUp and ArrowDown characters, so that the column of the original line is kept, even if the cursor goes through lines with fewer characters.

Adding a status bar

A status bar takes up one of the 25 lines of text, but is useful to show the name of the file being edited, and indicate when a file has changed. I like the nano interface with its list of shortcut keys at the bottom. At least it’s clear how to exit the editor.

Putting a bar at the bottom of the screen is harder than it seems: We would need to know how many rows the screen has. We’re using VGA mode so we could hard-wire this, but for now we’ll just put the bar at the top of the screen instead. [Note: A little lower down we’ll have to hard-wire the number of lines anyway, to know when to stop printing a page. By then the bar at the top had grown on me].

The code below creates a top line with a blue background. The file path is displayed on the left, followed by an indicator of whether the file has changed since the last save (in yellow). To the right we print the shortcuts for Save and Quit in light blue: It’s nice to tell new users how to get out of the program.

// Move cursor to (0,0)
// Set background to blue (44m); foreground white (37m)
print!("\x1b[H\x1b[44m\x1b[37m  {}", &file.path);
if file.changed {
    // Foreground yellow (33m)
    print!(" \x1b[33mchanged\x1b[37m");
} else {
    print!("        ");
}
// Clear to the right (K), reset colors
print!("                    \x1b[36m^S Save ^Q Quit\x1b[K\x1b[m\n");

More efficient output

The `display` function is fast enough, but quite inefficient: It makes multiple calls to print, where one would be enough. Each call to print allocates a new memory chunk, copies text into it, and sends the chunk in a message to the VGA driver to be processed.

Instead we can eliminate most of this copying, and only send one message, by writing everything into one buffer and sending that. Allocate some memory:

let buffer_limit = 8000; // Maximum number of bytes
let (mut mem_handle, _) = syscalls::malloc(buffer_limit as u64, 0).unwrap();

Now we need to wrap this memory handle in an object that implements the core::fmt::Write interface.

struct Buffer<'a>(&'a mut [u8], usize);
impl Write for Buffer<'_> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let space_left = self.0.len() - self.1;
        if space_left > s.len() {
            self.0[self.1..][..s.len()].copy_from_slice(s.as_bytes());
            self.1 += s.len();
            Ok(())
        } else {
            Err(fmt::Error)
        }
    }
}

We can now replace all print!(...) calls with write!(buffer,...). Then at the end of the display function we send the whole buffer to stdout:

_ = message::rcall(&syscalls::STDOUT,
                   message::WRITE,
                   (buffer.1 as u64).into(), // Length
                   mem_handle.into(), // Memory
                   None);

The (very) basic text editor is now working!

./img/27-01-basic-editor.png

Editing longer files

Basic text editing is now working, but we can’t move around by going up and down lines or edit files longer than a single screen.

We now need to keep track of the first character on the screen, and its line number. The fiddly part of this (I found) was ensuring that the line number and character index stay in sync as the page is moved up and down:

// Describe where to begin display
display_start: Cursor,
line_start: usize,

To keep track of the location in the piece buffer we use the same Cursor struct as for the editing cursor, and a usize for the first line number. In the display function we skip pieces before the display_start piece, and handle the case where the display start is inside a piece:

for (piece_index, piece) in file.pieces.iter().enumerate() {
    if piece_index < file.display_start.piece {
        continue;
    }

    let mut cursor_pos = file.cursor.pos;

    let bytes = if piece_index == file.display_start.piece {
        if cursor_pos < file.display_start.pos {
            cursor_pos = 0;
        } else {
            cursor_pos -= file.display_start.pos;
        }

Moving the page up and down is complicated because we want to start the page on the character after the new line (‘\n’). If a line is empty, however, then the next character is also a new line. The way that seemed to work was to first figure out how to move one line, and then repeat it ten times:

} else if ch == console::sequences::PageDown {
    for _ in 0..10 {
        // If start is a newline, the first line is empty
        if file.at(file.display_start) != b'\n' {
            // Find the end of the first line
            if let Some((newline,_)) = file.find(file.display_start, b'\n') {
                file.display_start = newline;
            } else {
                continue; // Already at the end
            }
        }
        // Start display on the character after the new line
        if let Some(cursor) = file.next(file.display_start) {
            file.display_start = cursor;
            file.line_start += 1
        }
    }

Still to do

There are many things still to be done to make this editor more usable. In no particular order:

  • Handle UTF8 characters. The next, previous, find and rfind routines move around the text buffer in bytes, rather than characters. EuraliOS currently can’t display (or enter) non-ASCII characters so this is hard to test for now.
  • Handle the delete key
  • Keep cursor column number across multiple ArrowUp and ArrowDown characters.
  • Ensure that the editing cursor is always on screen: Currently the page up/down characters don’t move the cursor, and moving the cursor outside the display doesn’t change the display. Editing while the cursor is before the display start can cause a panic because it changes the piece count.
  • Add an Undo feature. This can be done by tracking changes to the piece table, or by pushing inverse operations into a stack.
  • Add search, syntax highlighting, and other creature comforts.