Skip to content

Latest commit

 

History

History
313 lines (268 loc) · 15.6 KB

18-gopher.org

File metadata and controls

313 lines (268 loc) · 15.6 KB

Gopher

Gopher is a simple protocol which was introduced in 1991 and at one time was an alternative to HTTP and the WWW. It is still in use as a text-based way to browse information over the internet. It uses TCP port 70, and one of the main gopher servers is gopher.floodgap.com.

It’s a simple protocol which doesn’t really require a browser to use: We can use telnet to send and receive messages and see how this works. We open a connection to gopher.floodgap.com on port 70:

$ telnet gopher.floodgap.com 70
Trying 192.80.49.99...
Connected to gopher.floodgap.com.
Escape character is '^]'.

And then pressing Enter produces:

iWelcome to Floodgap Systems' official gopher server.		error.host	1
iFloodgap has served the gopher community since 1999		error.host	1
i(formerly gopher.ptloma.edu).		error.host	1
i 		error.host	1
iWe run Bucktooth 0.2.9 on xinetd as our server system.		error.host	1
igopher.floodgap.com is an IBM Power 520 Express with a 2-way		error.host	1
i4.2GHz POWER6 CPU and 8GB of RAM, running AIX 6.1 +patches.		error.host	1
iSend [email protected] your questions and suggestions.		error.host	1
i 		error.host	1
i***********************************************************		error.host	1
i**              OVER 20 YEARS SERVING YOU!               **		error.host	1
i**               Plain text is beautiful!                **		error.host	1
i***********************************************************		error.host	1
<snip>
.
Connection closed by foreign host.

If we want a different document then we enter the “selector” string before sending the CR-LF. For example typing gopher/welcome then Enter should produce:

Floodgap HELP: What is Gopher?
updated 27 December 2000

   gopher  n.  1. Any of various short tailed, burrowing mammals of
   the family Geomyidae, of North America.  2. (Amer. colloq.)
   Native or inhabitant of Minnesota: the Gopher State.
   3. (Amer. colloq.) One who runs errands, does odd-jobs, fetches
   or delivers documents for office staff.  4. (computer tech.)
   Software following a simple protocol for tunneling through a TCP/IP
   internet.

Welcome to Gopherspace!

Virtual File System

In section 13 we implemented a very basic Virtual File System (VFS), which has allowed programs to open particular paths e.g “/dev/nic” to get a communication handle.

Opening “/tcp/192.80.49.99/70” should result in the tcp server opening a new socket, connecting to the given address (192.80.49.99, gopher.floodgap.com) and port (70). The returned communication handle should allow sending (writing) and receiving (reading) data through that socket.

One way to do this would put everything behind one syscall:

  1. The client calls the open syscall with the “/tcp/192.80.49.99/70” string
  2. The kernel checks the client’s VFS, choosing the mount point which matches the longest part of the string, in this case “/tcp”.
  3. The kernel sends an OPEN message to the rendezvous at that mount point, passing the rest of the string, e.g. “192.80.49.99/70”.
  4. The tcp process receives the OPEN message, returning a new handle for this connection. Inside the tcp process a new thread might be spawned to service the new connection.
  5. The kernel returns the new handle to the client

Unfortunately this gets more complicated than hoped, so in the spirit of making the kernel as lazy as possible we make the standard library do most of the work:

  1. The client calls the open syscall with the “/tcp/192.80.49.99/70” string
  2. The kernel checks the client’s VFS, choosing the mount point which matches the longest part of the string, in this case “/tcp”.
  3. The kernel returns a communication handle to that rendezvous to the client, along with the number of characters matched (4 in this case).
  4. The client sends an OPEN message using the new handle, passing the rest of the string, e.g. “192.80.49.99/70”.
  5. The tcp process receives the OPEN message, returning a new handle for this connection. Inside the tcp process a new thread might be spawned to service the new connection.
  6. The kernel returns the new handle to the client

This now involves three syscalls: the original open, a malloc call to create a chunk to store the path, and a sendreceive call to send the OPEN message.

At some point we’ll have to add permissions to this mechanism, to prevent users from modifying each others’ files or messing up the system. In EuraliOS this will be mainly capabilities based: A process can only access a resource if it has a communication handle to it, and each process can be given its own separate VFS with only the resources it’s allowed to access. The open syscall (and OPEN message) should have permissions flag, so the ways that the communication handle is used can be restricted.

First gopher request

In gopher/src/main.rs:

let handle = syscalls::open("/tcp/192.80.49.99/70").expect("Couldn't open");

let data = [0x0D, 0x0A]; // CR LF

let result = rcall(&handle,
                   message::WRITE, (data.len() as u64).into(),
                   syscalls::MemoryHandle::from_u8_slice(&data).into(),
                   None);

debug_println!("Returned: {:?}", result);

Looking at the packets going through the network card with tcpdump -r dump.dat we see:

ARP, Request who-has gopher.floodgap.com (Broadcast) tell 0.0.0.0, length 28

and then nothing. No machines on the local network know what hardware address 192.80.49.99 (gopher.floodgap.com) has, and if they did then telling 0.0.0.0 wouldn’t do much good.

Now the ARP request gets the hardware address of the gateway, rather than gopher.floodgap.com and then sends IP packets to gopher.floodgap.com through the gateway:

ARP, Request who-has 10.0.2.2 (Broadcast) tell 10.0.2.15, length 28
ARP, Reply 10.0.2.2 is-at 52:55:0a:00:02:02 (oui Unknown), length 50
IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [S], seq 1043035874, win 1024, options [mss 1446,wscale 0,sackOK,eol], length 0
IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [S.], seq 64001, ack 1043035875, win 65535, options [mss 1460], length 0
IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 1, win 1024, length 0

./img/18-01-write.png

./img/18-02-read.png

./img/18-03-gopher.png

Failing to read a second page

The TCP connection is currenly quite unreliable: If we try reading two pages soon after each other then the connection hangs while sending the request: When sending may_send remains false because the socket is in the SYN_SEND state.

To diagnose this, we can use tcpdump to inspect the packets being sent. First we have the DHCP requests and replies:

17:59:17.645296 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:12:34:56 (oui Unknown), length 262
17:59:17.645384 IP 10.0.2.2.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 548
17:59:17.652697 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:12:34:56 (oui Unknown), length 274
17:59:17.652721 IP 10.0.2.2.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 548

then the ARP request to get the gateway hardware address:

17:59:17.673175 ARP, Request who-has 10.0.2.2 (Broadcast) tell 10.0.2.15, length 28
17:59:17.673266 ARP, Reply 10.0.2.2 is-at 52:55:0a:00:02:02 (oui Unknown), length 50

then the first request and reply:

17:59:17.732320 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [S], seq 1043035874, win 1448, options [mss 1446,wscale 0,sackOK,eol], length 0
17:59:17.800152 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [S.], seq 64001, ack 1043035875, win 65535, options [mss 1460], length 0
17:59:17.843614 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 1, win 1460, length 0

where we send a SYN packet ([S] flag) to establish a connection; gopher.floodgap.com.gopher responds with a SYN-ACK ([S.]), establishing a connection. We then push some data ([P.] flag), and floodgap acknowledges:

17:59:17.864604 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [P.], seq 1:3, ack 1, win 1460, length 2
17:59:17.864712 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [.], ack 3, win 65535, length 0

Then floodgap pushes some data and there’s a series of acknowledgements “ok..ok..ok” etc:

17:59:17.929359 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [P.], seq 1:70, ack 3, win 65535, length 69
17:59:17.960059 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 70, win 1460, length 0
17:59:17.960178 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [.], seq 70:1510, ack 3, win 65535, length 1440
17:59:17.976770 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 1510, win 1460, length 0
17:59:17.976811 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [.], seq 1510:2950, ack 3, win 65535, length 1440
17:59:17.996857 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 2950, win 1460, length 0
17:59:17.996928 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [.], seq 2950:4390, ack 3, win 65535, length 1440
17:59:18.009477 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 4390, win 1460, length 0

Floodgap is done, so sends a couple of packets with the FIN flag (finish, flag [FP.]), indicating that the session is finished.

17:59:18.009514 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [FP.], seq 4390:5424, ack 3, win 65535, length 1034
17:59:19.254681 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [FP.], seq 4390:5424, ack 3, win 65535, length 1034

We then select a different page to load (or the same page), sending a SYN packet to establish a new session:

17:59:21.643464 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [S], seq 2972242379, win 1448, options [mss 1446,wscale 0,sackOK,eol],
length 0

Floodgap responds saying that the session is finished, and we get into an endless cycle of insisting that we want a connection, and Floodgap insisting that the session is finished:

17:59:21.643826 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [F.], seq 5424, ack 3, win 65535, length 0
17:59:22.254872 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [F.], seq 4390, ack 3, win 65535, length 0
17:59:22.392248 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [S], seq 2972242379, win 1448, options [mss 1446,wscale 0,sackOK,eol], length 0
17:59:22.392320 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [F.], seq 4391, ack 3, win 65535, length 0
17:59:23.106789 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [S], seq 2972242379, win 1448, options [mss 1446,wscale 0,sackOK,eol], length 0
17:59:23.106864 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [F.], seq 4392, ack 3, win 65535, length 0
...

I think this is happening because A) we never sent a finish or reset packet to floodgap, so never closed our side of the connection; B) we used the same port number for both connections (because it’s hard-wired to 49152). Fixing either of these may solve our problem.

Aborting the connection

Calling TcpSocket::abort() on the socket when closing after receiving a CLOSE message results in a RST reset packet ([R.] flag) being sent:

...
22:02:41.534160 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 17281, win 1460, length 0
22:02:41.534293 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [FP.], seq 17281:18089, ack 19, win 65535, length 808
22:02:41.569211 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [R.], seq 19, ack 18090, win 1460, length 0

This allows another request to be sent, and we can keep reading pages! If we try reloading too quickly however, we still find that this reset is not sent, and we still get stuck. I suspect this is because a new TCP socket is opened on the same port number before smoltcp can send the reset packet.

Using different port numbers

The port used to receive packets when we open a temporary session is called an Ephemeral port, and usually use numbers 49152–65535. To generate a random port number for each session for we could just use the time stamp counter to get a “random” number. There are 16384 available ports, so the chance of any two sessions accidentally sharing a port is low. Unfortunately the Birthday problem implies that once we have just 150 sessions, there is about a 50% chance that two of them share a port number: The probability of them all being different is 1 * (1 - 1/16384) * (1 - 2/16384) * ... and can be calculated with:

def p(n):
    result = 1.
    for i in range(1, n):
        result *= 1. - i / 16384
    return result

where n is the number of sessions, and when n is about 150, p(n) is about 0.5.

There is a list of ephemeral port allocation strategies used and an Internet Engineering Task Force (IETF) RFC 6056 on “Recommendations for Transport-Protocol Port Randomization” which suggests some solutions: The problem seems to be a trade-off, with more randomisation perhaps improving security but also increasing chances of collisions.

For now the more robust choice seems to be to just allocate sequentially, so we have zero chance of collisions until 16384 sessions have been opened. Since it needs to be thread-safe, we can use the atomic::AtomicU16 type:

use core::sync::atomic::{AtomicU16, Ordering};

fn ephemeral_port_number() -> u16 {
    static PORT: AtomicU16 = AtomicU16::new(49152);
    PORT.fetch_update(Ordering::SeqCst, Ordering::SeqCst,
                      |p| Some(if p == 65535 {49152}
                               else {p + 1})).unwrap()
}

Remaining issues

One relatively minor - though puzzling - issue is that apparently we’re not rendering text files correctly. Following the first link to the gopher/proxy selector, we read the text file and just print every line and get figure fig-broken:

./img/18-04-broken.png

The Gopher specification is RFC 1436 but doesn’t say very much about how text files should be rendered. This is definitely one for later.

The bigger issue is that we can only read files from one IP address, corresponding to floodgap.com. As nice as that site is, a gopher browser which can only visit one IP address isn’t much use. To fix this we need to be able to convert domain names like gopher.floodgap.com into IP addresses. We’ll do this using the Domain Name System in section 20.

First we’ll take a slight detour into timing in the next section, so that the TCP stack can use times to implement things like timeouts.