Gopher is a simple protocol which was introduced in 1991 and at one
time was an alternative to HTTP and the WWW. It is still in use as a
text-based way to browse information over the internet. It uses TCP
port 70, and one of the main gopher servers is gopher.floodgap.com.
It’s a simple protocol which doesn’t really require a browser to use:
we can use telnet to send and receive messages and see how this
works. We open a connection to gopher.floodgap.com on port 70:
$ telnet gopher.floodgap.com 70
Trying 192.80.49.99...
Connected to gopher.floodgap.com.
Escape character is '^]'.
And then pressing Enter produces:
iWelcome to Floodgap Systems' official gopher server. error.host 1
iFloodgap has served the gopher community since 1999 error.host 1
i(formerly gopher.ptloma.edu). error.host 1
i error.host 1
iWe run Bucktooth 0.2.9 on xinetd as our server system. error.host 1
igopher.floodgap.com is an IBM Power 520 Express with a 2-way error.host 1
i4.2GHz POWER6 CPU and 8GB of RAM, running AIX 6.1 +patches. error.host 1
iSend [email protected] your questions and suggestions. error.host 1
i error.host 1
i*********************************************************** error.host 1
i** OVER 20 YEARS SERVING YOU! ** error.host 1
i** Plain text is beautiful! ** error.host 1
i*********************************************************** error.host 1
<snip>
.
Connection closed by foreign host.
If we want a different document then we enter the “selector” string
before sending the CR-LF. For example typing gopher/welcome then
Enter should produce:
Floodgap HELP: What is Gopher?
updated 27 December 2000
gopher n. 1. Any of various short tailed, burrowing mammals of
the family Geomyidae, of North America. 2. (Amer. colloq.)
Native or inhabitant of Minnesota: the Gopher State.
3. (Amer. colloq.) One who runs errands, does odd-jobs, fetches
or delivers documents for office staff. 4. (computer tech.)
Software following a simple protocol for tunneling through a TCP/IP
internet.
Welcome to Gopherspace!
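The exchange above is small enough to sketch in a few lines of Rust. This is only an illustration, not the EuraliOS client: the names MenuItem, build_request and parse_menu_line are made up here, and it assumes the menu fields are separated by TAB characters as described in RFC 1436 (the tabs are not visible in the listing above).

```rust
/// One line of a Gopher menu: a type character followed by the display
/// text, selector, host and port, separated by TABs (RFC 1436 layout).
/// The struct and field names are hypothetical, chosen for this sketch.
#[derive(Debug, PartialEq)]
struct MenuItem {
    kind: char,
    display: String,
    selector: String,
    host: String,
    port: u16,
}

/// Build the bytes a client sends: the selector followed by CR LF.
/// An empty selector (just pressing Enter) asks for the top-level menu.
fn build_request(selector: &str) -> Vec<u8> {
    let mut request = selector.as_bytes().to_vec();
    request.extend_from_slice(b"\r\n");
    request
}

/// Parse a single menu line. Returns None for the terminating "." line
/// and for lines which do not contain all four fields.
fn parse_menu_line(line: &str) -> Option<MenuItem> {
    let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
    if line == "." {
        return None;
    }
    let mut chars = line.chars();
    let kind = chars.next()?;
    // Everything after the type character is TAB-separated
    let mut fields = chars.as_str().split('\t');
    let display = fields.next()?.to_string();
    let selector = fields.next()?.to_string();
    let host = fields.next()?.to_string();
    let port = fields.next()?.trim().parse::<u16>().ok()?;
    Some(MenuItem { kind, display, selector, host, port })
}
```

Calling build_request("gopher/welcome") gives the bytes typed in the telnet session, and parse_menu_line splits an "i" (information) line into its text and the placeholder host and port.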
In section 13 we implemented a very basic Virtual File System (VFS), which allows programs to open particular paths, e.g. “/dev/nic”, to get a communication handle.
Opening “/tcp/192.80.49.99/70” should result in the tcp server opening a new socket, connecting to the given address (192.80.49.99, gopher.floodgap.com) and port (70). The returned communication handle should allow sending (writing) and receiving (reading) data through that socket.
One way to do this would be to put everything behind one syscall:
- The client calls the open syscall with the “/tcp/192.80.49.99/70” string
- The kernel checks the client’s VFS, choosing the mount point which matches the longest part of the string, in this case “/tcp”.
- The kernel sends an OPEN message to the rendezvous at that mount point, passing the rest of the string, e.g. “192.80.49.99/70”.
- The tcp process receives the OPEN message, returning a new handle for this connection. Inside the tcp process a new thread might be spawned to service the new connection.
- The kernel returns the new handle to the client
Unfortunately this gets more complicated than hoped, so in the spirit of making the kernel as lazy as possible we make the standard library do most of the work:
- The client calls the open syscall with the “/tcp/192.80.49.99/70” string
- The kernel checks the client’s VFS, choosing the mount point which matches the longest part of the string, in this case “/tcp”.
- The kernel returns a communication handle to that rendezvous to the client, along with the number of characters matched (4 in this case).
- The client sends an OPEN message using the new handle, passing the rest of the string, e.g. “192.80.49.99/70”.
- The tcp process receives the OPEN message, returning a new handle for this connection. Inside the tcp process a new thread might be spawned to service the new connection.
- The kernel returns the new handle to the client
This now involves three syscalls: the original open, a malloc call
to create a chunk to store the path, and a sendreceive call to
send the OPEN message.
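The two pieces of string handling in these steps, the longest-prefix match done by the kernel and the parsing of the remainder done by the tcp process, might look roughly like the sketch below. The function names and the mount table layout are assumptions for illustration, and it uses std types for brevity where the real kernel would be no_std.

```rust
use std::net::Ipv4Addr;

/// Find the mount point which matches the longest prefix of `path`.
/// Returns the index into `mounts` and the number of characters matched.
/// (Hypothetical helper; the real VFS data structure may differ.)
fn longest_mount(mounts: &[&str], path: &str) -> Option<(usize, usize)> {
    let mut best: Option<(usize, usize)> = None;
    for (index, &mount) in mounts.iter().enumerate() {
        // Only accept matches which end on a path component boundary,
        // so that "/tcp" does not match "/tcpdump"
        let on_boundary = mount.ends_with('/')
            || path.len() == mount.len()
            || path.as_bytes().get(mount.len()) == Some(&b'/');
        if path.starts_with(mount) && on_boundary {
            if best.map_or(true, |(_, len)| mount.len() > len) {
                best = Some((index, mount.len()));
            }
        }
    }
    best
}

/// Parse the remainder of the path, e.g. "192.80.49.99/70", into an
/// address and port. A leading '/' left over from the match is ignored.
fn parse_tcp_target(rest: &str) -> Option<(Ipv4Addr, u16)> {
    let (addr, port) = rest.trim_start_matches('/').split_once('/')?;
    Some((addr.parse::<Ipv4Addr>().ok()?, port.parse::<u16>().ok()?))
}
```

With mounts "/dev/nic" and "/tcp", the path “/tcp/192.80.49.99/70” matches “/tcp” with 4 characters, and the rest parses to the address 192.80.49.99 and port 70.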
At some point we’ll have to add permissions to this mechanism, to
prevent users from modifying each others’ files or messing up the
system. In EuraliOS this will be mainly capabilities based: A process
can only access a resource if it has a communication handle to it, and
each process can be given its own separate VFS with only the resources
it’s allowed to access. The open syscall (and OPEN message) should
have a permissions flag, so that the ways the communication handle
can be used are restricted.
In gopher/src/main.rs:
let handle = syscalls::open("/tcp/192.80.49.99/70").expect("Couldn't open");
let data = [0x0D, 0x0A]; // CR LF
let result = rcall(&handle,
                   message::WRITE, (data.len() as u64).into(),
                   syscalls::MemoryHandle::from_u8_slice(&data).into(),
                   None);
debug_println!("Returned: {:?}", result);
Looking at the packets going through the network card with tcpdump -r dump.dat we see:
ARP, Request who-has gopher.floodgap.com (Broadcast) tell 0.0.0.0, length 28
and then nothing. No machines on the local network know what hardware address 192.80.49.99 (gopher.floodgap.com) has, and if they did then telling 0.0.0.0 wouldn’t do much good.
Now the ARP request asks for the hardware address of the gateway,
rather than that of gopher.floodgap.com, and IP packets are then sent
to gopher.floodgap.com through the gateway:
ARP, Request who-has 10.0.2.2 (Broadcast) tell 10.0.2.15, length 28
ARP, Reply 10.0.2.2 is-at 52:55:0a:00:02:02 (oui Unknown), length 50
IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [S], seq 1043035874, win 1024, options [mss 1446,wscale 0,sackOK,eol], length 0
IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [S.], seq 64001, ack 1043035875, win 65535, options [mss 1460], length 0
IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 1, win 1024, length 0
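The decision being made here is which address to resolve with ARP. A minimal sketch of that rule is below; it is not the code in the network stack, the name next_hop is invented, and the /24 prefix length used in the example is an assumption about the emulated network.

```rust
use std::net::Ipv4Addr;

/// Choose the address whose hardware address must be found with ARP:
/// the destination itself if it is on our subnet, otherwise the gateway.
fn next_hop(dest: Ipv4Addr, local: Ipv4Addr, prefix_len: u32, gateway: Ipv4Addr) -> Ipv4Addr {
    // Build the subnet mask, guarding against shifting by 32 bits or more
    let prefix_len = prefix_len.min(32);
    let mask: u32 = if prefix_len == 0 {
        0
    } else {
        u32::MAX << (32 - prefix_len)
    };
    if (u32::from(dest) & mask) == (u32::from(local) & mask) {
        dest
    } else {
        gateway
    }
}
```

For our address 10.0.2.15 with gateway 10.0.2.2, a destination of 192.80.49.99 is outside the subnet, so the ARP request is for 10.0.2.2, as in the capture above.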
The TCP connection is currently quite unreliable: if we try reading two pages soon
after each other then the connection hangs while sending the request.
When sending, may_send remains false because the socket is in the
SYN-SENT state.
To diagnose this, we can use tcpdump to inspect the packets being sent.
First we have the DHCP requests and replies:
17:59:17.645296 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:12:34:56 (oui Unknown), length 262
17:59:17.645384 IP 10.0.2.2.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 548
17:59:17.652697 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:12:34:56 (oui Unknown), length 274
17:59:17.652721 IP 10.0.2.2.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 548
then the ARP request to get the gateway hardware address:
17:59:17.673175 ARP, Request who-has 10.0.2.2 (Broadcast) tell 10.0.2.15, length 28
17:59:17.673266 ARP, Reply 10.0.2.2 is-at 52:55:0a:00:02:02 (oui Unknown), length 50
then the first request and reply:
17:59:17.732320 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [S], seq 1043035874, win 1448, options [mss 1446,wscale 0,sackOK,eol], length 0
17:59:17.800152 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [S.], seq 64001, ack 1043035875, win 65535, options [mss 1460], length 0
17:59:17.843614 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 1, win 1460, length 0
where we send a SYN packet ([S] flag) to establish a connection;
gopher.floodgap.com responds with a SYN-ACK ([S.]), and we
acknowledge ([.]), establishing a connection. We then push some
data ([P.] flag), and floodgap acknowledges:
17:59:17.864604 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [P.], seq 1:3, ack 1, win 1460, length 2
17:59:17.864712 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [.], ack 3, win 65535, length 0
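When reading these captures it helps to expand tcpdump's flag abbreviations. The small helper below is only a reading aid written for this section (describe_flags is not part of tcpdump or smoltcp); it covers the letters which appear in these logs.

```rust
/// Expand a tcpdump flags field such as "[S.]" into TCP flag names.
/// Characters other than those handled below are ignored.
fn describe_flags(flags: &str) -> Vec<&'static str> {
    flags
        .trim_matches(|c: char| c == '[' || c == ']')
        .chars()
        .filter_map(|c| match c {
            'S' => Some("SYN"), // open a connection
            'F' => Some("FIN"), // sender has finished sending
            'P' => Some("PSH"), // push data to the application
            'R' => Some("RST"), // reset (abort) the connection
            '.' => Some("ACK"), // acknowledgement
            _ => None,
        })
        .collect()
}
```

So "[S.]" is a SYN with an ACK, and "[P.]" is pushed data which also acknowledges what has been received so far.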
Then floodgap pushes some data and there’s a series of acknowledgements “ok..ok..ok” etc:
17:59:17.929359 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [P.], seq 1:70, ack 3, win 65535, length 69
17:59:17.960059 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 70, win 1460, length 0
17:59:17.960178 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [.], seq 70:1510, ack 3, win 65535, length 1440
17:59:17.976770 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 1510, win 1460, length 0
17:59:17.976811 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [.], seq 1510:2950, ack 3, win 65535, length 1440
17:59:17.996857 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 2950, win 1460, length 0
17:59:17.996928 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [.], seq 2950:4390, ack 3, win 65535, length 1440
17:59:18.009477 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 4390, win 1460, length 0
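Floodgap is done, so sends a couple of packets with the FIN flag
(finish, flag [FP.]), indicating that the session is finished. The
second has the same sequence numbers as the first, so it is a
retransmission: we never acknowledged the first one.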
17:59:18.009514 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [FP.], seq 4390:5424, ack 3, win 65535, length 1034
17:59:19.254681 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [FP.], seq 4390:5424, ack 3, win 65535, length 1034
We then select a different page to load (or the same page), sending a SYN packet to establish a new session:
17:59:21.643464 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [S], seq 2972242379, win 1448, options [mss 1446,wscale 0,sackOK,eol], length 0
Floodgap responds saying that the session is finished, and we get into an endless cycle of insisting that we want a connection, and Floodgap insisting that the session is finished:
17:59:21.643826 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [F.], seq 5424, ack 3, win 65535, length 0
17:59:22.254872 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [F.], seq 4390, ack 3, win 65535, length 0
17:59:22.392248 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [S], seq 2972242379, win 1448, options [mss 1446,wscale 0,sackOK,eol], length 0
17:59:22.392320 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [F.], seq 4391, ack 3, win 65535, length 0
17:59:23.106789 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [S], seq 2972242379, win 1448, options [mss 1446,wscale 0,sackOK,eol], length 0
17:59:23.106864 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [F.], seq 4392, ack 3, win 65535, length 0
...
I think this is happening because A) we never sent a finish or reset packet to floodgap, so never closed our side of the connection; B) we used the same port number for both connections (because it’s hard-wired to 49152). Fixing either of these may solve our problem.
Calling TcpSocket::abort() on the socket when closing after receiving
a CLOSE message results in a RST (reset) packet ([R.] flag) being
sent:
...
22:02:41.534160 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [.], ack 17281, win 1460, length 0
22:02:41.534293 IP gopher.floodgap.com.gopher > 10.0.2.15.49152: Flags [FP.], seq 17281:18089, ack 19, win 65535, length 808
22:02:41.569211 IP 10.0.2.15.49152 > gopher.floodgap.com.gopher: Flags [R.], seq 19, ack 18090, win 1460, length 0
This allows another request to be sent, and we can keep reading pages! If we try reloading too quickly however, we still find that this reset is not sent, and we still get stuck. I suspect this is because a new TCP socket is opened on the same port number before smoltcp can send the reset packet.
The port used to receive packets when we open a temporary session is
called an ephemeral port, and these usually use numbers 49152–65535.
To generate a random port number for each session we could just use
the time stamp counter to get a “random” number. There are 16384
available ports, so the chance of any two sessions accidentally
sharing a port is low. Unfortunately the Birthday problem implies that
once we have just 150 sessions, there is about a 50% chance that two
of them share a port number: The probability of them all being
different is 1 * (1 - 1/16384) * (1 - 2/16384) * ... and can be
calculated with:
def p(n):
    result = 1.
    for i in range(1, n):
        result *= 1. - i / 16384
    return result
where n is the number of sessions, and when n is about 150, p(n)
is about 0.5.
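The figure of about 150 can also be estimated without a loop. The following is the standard birthday-problem approximation, using 1 - x ≈ exp(-x) for small x, with N = 16384 ports; it is an estimate rather than an exact result.

```latex
p(n) = \prod_{i=1}^{n-1}\left(1 - \frac{i}{N}\right)
     \approx \exp\left(-\sum_{i=1}^{n-1}\frac{i}{N}\right)
     = \exp\left(-\frac{n(n-1)}{2N}\right),
\qquad N = 16384

p(n) = \tfrac{1}{2}
\;\Rightarrow\; n(n-1) \approx 2N\ln 2 \approx 22713
\;\Rightarrow\; n \approx 151
```

This agrees with the value found numerically: the number of sessions needed for a 50% collision chance grows only as the square root of the number of ports.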
There is a list of ephemeral port allocation strategies in use, and the Internet Engineering Task Force (IETF) RFC 6056, “Recommendations for Transport-Protocol Port Randomization”, suggests some solutions. The problem seems to be a trade-off, with more randomisation perhaps improving security but also increasing the chance of collisions.
For now the more robust choice seems to be to just allocate sequentially, so we have zero chance of collisions until 16384 sessions have been opened. Since it needs to be thread-safe, we can use the atomic::AtomicU16 type:
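use core::sync::atomic::{AtomicU16, Ordering};

fn ephemeral_port_number() -> u16 {
    static PORT: AtomicU16 = AtomicU16::new(49152);
    // fetch_update returns the previous value, which is the port to use;
    // the stored value wraps back to 49152 after 65535
    PORT.fetch_update(Ordering::SeqCst, Ordering::SeqCst,
                      |p| Some(if p == 65535 {49152}
                               else {p + 1})).unwrap()
}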
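To check the wrap-around behaviour without allocating 16384 ports, the same rule can be written with the counter passed in as a parameter. This is a test-friendly variant made up for illustration (next_port and the two constants are not names from EuraliOS), and it uses std rather than core so that it runs on a host machine.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

const FIRST_EPHEMERAL: u16 = 49152;
const LAST_EPHEMERAL: u16 = 65535;

/// Same allocation rule as ephemeral_port_number, but operating on a
/// counter supplied by the caller so the starting value can be chosen.
fn next_port(counter: &AtomicU16) -> u16 {
    counter
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |p| {
            Some(if p == LAST_EPHEMERAL { FIRST_EPHEMERAL } else { p + 1 })
        })
        // The closure always returns Some, so this cannot fail
        .unwrap()
}
```

Starting the counter at 65534, successive calls return 65534 and 65535 and then wrap to 49152, because fetch_update hands back the value from before the update.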
One relatively minor - though puzzling - issue is that apparently we’re
not rendering text files correctly. Following the first link to the
gopher/proxy selector, we read the text file and just print every
line, and get the result in figure fig-broken:
The Gopher specification is RFC 1436 but doesn’t say very much about how text files should be rendered. This is definitely one for later.
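What RFC 1436 does describe is the framing of a text file: lines end in CR LF, the file ends with a line containing a single period, and a line which begins with a period has an extra period prepended which the client should strip. The sketch below handles just that framing; decode_text_file is a name invented here, and this may well not be the cause of the rendering problem above.

```rust
/// Decode the body of a Gopher text file response into lines,
/// following the framing described in RFC 1436.
fn decode_text_file(raw: &str) -> Vec<String> {
    let mut lines = Vec::new();
    // str::lines splits on LF and also removes a trailing CR
    for line in raw.lines() {
        if line == "." {
            // A lone period marks the end of the file
            break;
        }
        // Remove the extra period added to lines which start with one
        let line = if line.starts_with("..") { &line[1..] } else { line };
        lines.push(line.to_string());
    }
    lines
}
```

Anything after the terminating period line is ignored, and a response which is cut off before the terminator is returned as far as it was received.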
The bigger issue is that we can only read files from one IP address,
corresponding to floodgap.com. As nice as that site is, a gopher
browser which can only visit one IP address isn’t much use. To fix
this we need to be able to convert domain names like
gopher.floodgap.com into IP addresses. We’ll do this using the
Domain Name System in section 20.
First we’ll take a slight detour into timing in the next section, so that the TCP stack can use times to implement things like timeouts.