We may release some updates/bug fixes to the infrastructure files during the project.
If we do, the only thing you will have to do is run the following command in the repository in your VM:
git pull https://gitlab.ethz.ch/nsg/public/adv-net-2020-project.git master
During this year's project, you and your group are responsible for configuring the network shown below. Your primary objective is to deliver as much traffic (i.e., maximize the number of successfully delivered packets) as possible for different traffic patterns and failures.
As in the exercises, we provide each group of students with a VM on which everything is already pre-installed.
This time, your VM has more CPUs, more memory, and a larger hard drive to ensure better performance when testing the network.
To access your VM, you can follow the guidelines that we wrote for the exercises. The login info for your group's VM is in `login_XX.txt`.
To get started with the project, log into your VM and clone this repository into a directory called `infrastructure` in your home directory. This is done with the following commands:
cd ~
git clone <THIS-REPO> infrastructure
- Overview
- Configuring the network
- Building the topology and running scenarios
- Frequently asked questions (FAQ)
Below, we give you the key points about the provided topology and your task: successfully delivering as much traffic as possible for different traffic and failure scenarios.
- We provide you with the topology shown above, consisting of hosts, P4 switches, and FRR routers. All interfaces and addresses are pre-configured and you already have basic connectivity in your network. That is, each host can reach any other host, which you can verify using `ping` or similar.
- The links between hosts, switches, and routers have different capacities, as shown in the figure.
- There are six hosts (`h1`--`h6`) connected to your network. Traffic will be sent/received by these hosts, but you have otherwise no control over them, i.e. you will not configure the hosts.
- Each host is connected to a P4 switch (`S1`--`S6`) at the edge of your network, which you can program during this challenge. All P4 switches are managed by a central controller (not shown), which you can also program. The switches and controller we provide are already capable of L2 forwarding to ensure connectivity in your network. In addition to modifying the P4 program of each switch, you also have access to the Linux nodes the switches are running on, which allows you to run commands such as `tc`.
- In the core of your network are four routers running FRR. The interfaces as well as OSPF are already configured to ensure connectivity. The OSPF weight of each link is initialized to 10. As for the switches, you can configure both FRR and the underlying Linux node.
During this project, the hosts exchange UDP traffic according to traffic scenarios. Your task is to deliver as much of this traffic as possible. Three main factors will make your life difficult:
- The volume of exchanged traffic will often exceed the capacity of the shortest paths in the network (in its initial configuration).
- You only have limited setup time before traffic starts.
- While traffic is exchanged, links will fail randomly according to failure scenarios.
The traffic and failure scenarios are defined in the `scenarios` directory. They are CSV files with the file ending `.traffic` for traffic scenarios and `.failure` for failure scenarios. We will explain them in detail below. Feel free to add your own scenarios if you want to experiment!
In each traffic scenario file, the first line indicates what each column is used for. With that information, you can build your own custom scenarios and further test your solution.
In general, each scenario lasts one minute. Your router configuration and controller will be started at the same time as the scenario. To give your network time to stabilize (e.g. to exchange messages or install table rules), traffic and failures will only begin a few seconds after the start of the scenario.
The `default.traffic` scenario is defined as follows:
- Hosts exchange UDP traffic. We use the term flow to describe UDP traffic from one host to another.
- Hosts can be both senders and destinations of flows at the same time.
- Flows begin 5 seconds after the scenario starts (`t=5`) and end after 50 seconds (`t=55`). The receivers listen 5 more seconds for remaining traffic. Anything arriving after one minute (`t=60`) is considered lost.
- Flows belong to different classes: gold, silver, and bronze. Intuitively, gold traffic matters most, followed by silver, then bronze. Hosts indicate the class of their traffic using the `TOS` field in the IP header: 128 for gold, 64 for silver, 32 for bronze.
- The UDP payload will not be larger than 1400 bytes (but can be smaller). Adding the Ethernet (14 bytes), IP (20 bytes), and UDP (8 bytes) headers, sent packets will therefore not be larger than 1442 bytes.
There are other traffic scenarios to test your configuration. They do not necessarily follow the specification above, but might be helpful to test your configuration.
- `default-*`: different node combinations.
- `hard`: Each class sends at 12 Mbps.
- `two-flows`: Each host sends two flows instead of one.
- `circle`: `H1` sends to `H2`, `H2` to `H3`, ..., and `H6` to `H1`.
- `incast`: `H1` does not send traffic; all other hosts send traffic to `H1`. There are 1 gold, 2 silver, and 2 bronze flows in this scenario.
The `default.failure` scenario is defined as follows:
- After 10 seconds (`t=10`), links will fail pseudo-randomly.
- Each failure starts at a random time between `t=10` and `t=35`, and lasts 10 seconds.
- Failures will never partition the network, i.e. your network will always stay connected. In particular, this means links between hosts and switches never fail.
- Multiple failures may occur at the same time, but there will never be more than three simultaneous failures.
There are other failure scenarios to test your configuration. They do not necessarily follow the specification above, but might be helpful to test your configuration.
- `default-*`: same rules, but different random seeds for other failures.
- `hard`: Up to 8 failures at the same time.
- `persistent-*`: Fail 1, 2, or 3 links from `t=5` up to `t=50`, i.e. no links go up and down except at the start and end of the experiment.
Like with traffic scenarios, feel free to build your own failure scenarios!
To evaluate the performance of your solution, we use the following process.
- For each flow, we measure the packet reception ratio (PRR), the fraction of correctly received packets.
- The combined score is the weighted average of all PRRs. Gold traffic will account for `10/15` of the combined score, silver traffic for `4/15`, and bronze traffic for `1/15` (regardless of the number of flows per class). Consequently, your combined score will be between `0` (nothing arrived) and `1` (everything arrived). See the sketch below this list for one way this could be computed.
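As an illustration, here is a minimal sketch of how such a score could be computed in Python, assuming (our interpretation of the rules above) that each class weight applies to the average PRR of that class:

```python
# Minimal sketch, assuming the class weight applies to the per-class average PRR.
WEIGHTS = {"gold": 10.0 / 15, "silver": 4.0 / 15, "bronze": 1.0 / 15}

def combined_score(prrs_by_class):
    """prrs_by_class maps a class name to a list of per-flow PRRs (each between 0 and 1)."""
    score = 0.0
    for cls, weight in WEIGHTS.items():
        prrs = prrs_by_class.get(cls, [])
        if prrs:
            score += weight * (sum(prrs) / len(prrs))
    return score

# Example: all gold packets arrive, half of the silver packets, no bronze packets.
print(combined_score({"gold": [1.0], "silver": [0.5, 0.5], "bronze": [0.0, 0.0]}))
# 10/15 * 1.0 + 4/15 * 0.5 + 1/15 * 0.0 = 0.8
```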
Keep in mind that having a high score with one scenario does not mean that your solution performs well in general.
With scenarios that do not follow the `default` specification (e.g. `incast`), it might not be possible to reach a score of 1 because the network simply does not provide sufficient bandwidth.
We provide a leaderboard where you can compare your solution against the other groups. To assess the performance of your solution, we run it on our server and on traffic and failure scenarios that we keep private. We now give more detail on how we compute your score on the leaderboard.
- We test your solution on multiple scenarios, and for each scenario we do multiple runs.
- For each scenario, there are always exactly 6 flows in total.
- We always use at least one flow of each class (gold, silver, and bronze).
- Each scenario lasts 1 minute.
- Each scenario is run both with and without failures, and your network should perform well in both cases.
- Finally, we assess your solution using the packet reception ratio. The final score will be the minimum of the scores obtained with and without failures.
We recommend following the `default` specification presented above to create custom scenarios similar to our evaluation; however, you are free to use more flows or shorter/longer scenarios if you would like to.
Your complete network configuration should go into the `configuration` directory.
You will find sub-directories for routers, switches, and a controller.
Here, we explain the purpose of each file, and in the next section we introduce the commands to apply the configuration.
Some files already contain an initial configuration to ensure that your network is connected. You are free to change any existing configuration (unless you are explicitly told not to change it). However, do not rename or move the provided files.
You are already familiar with the router configuration scripts from the MPLS and Multicast exercises.
In `configuration/routers`, you will find a configuration script for each router.
The commands in this script will be executed in a terminal on the corresponding router node.
The most important command is `vtysh`, the virtual terminal used to configure FRR.
Any lines you add here will be sent to `vtysh` as if you were entering them one by one.
However, keep in mind that you can add additional commands before/after the `vtysh` command!
Any command that works in the router terminal can be used, for example `tc`.
Your `configuration/routers/R1.sh` could look like this:
vtysh << EOM
(...)
EOM
echo "Configuring tc."
tc (some command)
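For instance, a minimal sketch of what such a `vtysh` block could contain, assuming you want to change an OSPF link cost (the interface name `port_R1_R2` is made up; check the real interface names on your routers first):

```bash
# Sketch only: the interface name below is hypothetical.
vtysh << EOM
configure terminal
interface port_R1_R2
ip ospf cost 50
exit
exit
EOM
```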
Similar to the routers, you will find `S*.sh` scripts corresponding to each switch in `configuration/p4switches`.
Unlike the routers, the switches do not have a `vtysh`, but you can still use other commands if you need to.
In this folder, you can also find the P4 program `switch.p4`, which will be installed on all switches.
The default program is already capable of L2 forwarding.
While all switches share the same program, their functionality can differ greatly, depending on which rules are installed on each switch, which register values are set, etc.
Finally, there is an `S*-commands.txt` file corresponding to each switch, which you may use for static P4 commands (if any).
In addition, there is a central controller that is connected to all P4 switches.
This controller is implemented in Python 2 in `configuration/controller.py`.
We cannot (yet) use Python 3 because of some dependencies that do not support it. Make sure that your controller works with Python 2.
The controller is started with the selected traffic scenario and automatically parses the CSV file for you.
The controller variable `traffic` contains a list of dictionaries.
Each dict contains the information of one flow, such as `src`, `dst`, etc.
The provided controller already installs the required rules for L2 forwarding onto the switches.
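As a minimal sketch of how you could inspect this in `controller.py` (any keys beyond `src` and `dst` are assumptions, so print the dictionaries once to see which fields actually exist):

```python
# Minimal sketch (Python 2): inspect the parsed traffic scenario.
# Only "src" and "dst" are documented above; print each dict to discover the other keys.
for flow in traffic:
    print("Flow from %s to %s: %s" % (flow["src"], flow["dst"], flow))
    # e.g. decide here which rules to install for this flow
```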
⛔ The controller may only interact with the P4 switches, and nothing else. Aside from the (already provided) traffic scenario, the controller must not attempt to read any traffic or failure files, nor run any commands interacting with the infrastructure, in particular no `docker` or `ovs-ofctl` commands.
Finally, you may find yourself in need of some additional dependencies.
If anything has to be installed, be sure to add this requirement to `configuration/requirements.txt`. (What are requirement files?)
Only requirements from PyPI are allowed, i.e. requirements that you can install "normally" via `pip install <req>`.
Pip is a powerful tool that is capable of installing requirements from github and other sources, but you must not do this for this project. We will not install any requirement that does not come from PyPI, and using them may result in your controller not working.
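For illustration, a `requirements.txt` could simply list one PyPI package per line, optionally pinned to a version (the packages below are only an example, not a recommendation):

```text
scapy==2.4.3
networkx
```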
To make operating the network with routers, switches, and a controller as easy as possible, we have prepared the script `cli.sh` for you.
With this command line interface (CLI), you can run the whole project pipeline from build to traffic generation, run individual steps of the pipeline, and access nodes to try out commands.
Usage: cli.sh [FLAGS] COMMAND [ARGS...]
Main commands:
run-pipeline [t] [f] Run everything from build over config to scenario.
run-pipeline-dry [t] Run everything *except* traffic or failures.
list-scenarios Show all available traffic and failure scenarios.
access [node] [cmd...] Access node. If no cmd is given, open default shell.
monitor Print the bit rate of each link in real time.
Commands to run individual steps of the pipeline:
build Build the project topology.
cleanup Cleans the project topology.
install-requirements Install python requirements.
configure-nodes Send configuration commands to router and switch nodes.
start-switches Compile the p4 programs and start all switches.
run-controller [t] Run the controller.
run-scenario [t] [f] Generate traffic and failures.
Additional management commands:
stop-switches Stop all p4 switches.
reboot-switches Reboot all p4 switches.
Arguments:
[t] Traffic scenario name. See list-scenarios.
[f] Failure scenario name. See list-scenarios.
[node] Node name, e.g. H1, S1 or R1. Capitalization does not matter.
[cmd...] Any command(s) and arguments.
Flags:
--no-opt Do *not* use optimized p4 switches (build).
--pcap Enable pcap for non-optimized p4 switches (start-switches).
--debug-conf Configure nodes sequentially, printing output (configure-nodes).
There are some flags that allow you to change the behavior of `./cli.sh`, mostly intended to make debugging easier.
The flags are accepted by both individual commands and `run-pipeline`.
Make sure to put all flags before the command, like this:
./cli.sh --no-opt --pcap run-pipeline
./cli.sh --no-opt --pcap start-switches
Commands ignore flags that are irrelevant for them, e.g. `start-switches` would ignore `--no-opt`, which is a `build` flag.
See the respective commands below for a detailed explanation of each flag.
This command runs everything. It is also the command that we will use to evaluate your configuration, so it's your best option to check whether everything works.
You can either run it without any arguments, or provide the names of both the traffic (`t`) and failure (`f`) scenarios to run.
When providing scenarios, use the scenario name (without directory or file ending).
For example, `./cli.sh run-pipeline example1 example2` will use the traffic scenario `scenarios/example1.traffic` and the failure scenario `scenarios/example2.failure` (they do not exist, the names are just chosen for illustration).
You can combine each traffic scenario with any failure scenario.
If you do not provide any arguments, `default.traffic` and `default.failure` are used.
You may be wondering about the timeline of events during the pipeline. Concretely, the following happens first:
- The network is built (`./cli.sh build`).
- Requirements are installed (`./cli.sh install-requirements`).
- The P4 programs are compiled and the P4 switches are started, along with static rules (`./cli.sh start-switches`).
- Configuration scripts are executed (`./cli.sh configure-nodes`).
The first time you build the network, it will take a few minutes because the script pulls several docker images; afterwards, it should be faster.
Right after the configuration commands have been sent, the following happens at the same time:
- Traffic and failure generation is started (`./cli.sh run-scenario`).
- The controller for the P4 switches is started (`./cli.sh run-controller`).
As described above, traffic and failures will not begin immediately, so your routers and controller will have a few seconds to converge to their final state.
If you want to run the individual steps manually, it's a good idea to stick to this general order of commands. Running the commands individually gives you more flexibility, e.g. you can recompile your P4 programs without rebuilding the whole topology.
During development, you might often need to start the whole setup, but without actually starting any scenarios, such that you can experiment yourself.
To facilitate this, you can run `./cli.sh run-pipeline-dry`, which does exactly this: it runs everything except traffic or failures.
You can still provide the name of a traffic scenario, which is passed to your controller.
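For example, to bring up the whole setup and pass the `default` traffic scenario to your controller without generating any traffic or failures:

```bash
./cli.sh run-pipeline-dry default
```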
If you are unsure which scenarios are available, use this command.
$ ./cli.sh list-scenarios
Traffic scenarios:
circle
default-2
default-3
default
hard
none
two-flows
Failure scenarios:
default-2
default-3
default
hard
none
persistent-1
persistent-2
persistent-3
The `none` scenarios simply do nothing, and you can use them to test traffic without failures, etc.
Similar to the `./access.sh` script from previous exercises, this command allows you to open a terminal or run commands on each node in the network.
Be careful! This command is only intended for trying out stuff. If you want anything to be considered during evaluation, you must put it in one of the configuration files (see Configuring the network above)!
To open a terminal, just provide the node name without any parameters, e.g. `./cli.sh access H1` for host 1 or `./cli.sh access R1` for router 1.
By default, this will open `bash` for hosts and switches, and the FRR `vtysh` for routers.
You can also open a shell explicitly, e.g. `./cli.sh access R1 bash` opens `bash` instead of `vtysh` on router R1.
The same syntax can also be used to directly run a command, which is particularly useful for `tcpdump`:
$ ./cli.sh access h1 tcpdump -ni S1switch
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on S1switch, link-type EN10MB (Ethernet), capture size 262144 bytes
Remember that you can also run e.g. `./cli.sh access h1 ifconfig` to check interface names.
Capitalization does not matter; both `H1` and `h1`, etc., are accepted.
With multiple flows and failures, it can be cumbersome to track what's going on. We have prepared a monitoring command that displays the current link load for all links in real time.
The command displays the bit rate for both directions of each link. The bit rate from the left end to the right end of a link is the value on the left (in parentheses), whereas the bit rate from the right end to the left end is the value on the right. If a link is strictly vertical, the bit rate for the top-to-bottom direction is on the left and the bit rate for the bottom-to-top direction is on the right.
Make sure your terminal is high and wide enough to fit the whole display!
Use `./cli.sh build` and `./cli.sh cleanup` to build and tear down all containers.
By default, we use an optimized p4 image that allows high throughput.
The `--no-opt` flag uses a non-optimized image instead, which logs more events for easier debugging.
Also, if you want to use the `--pcap` flag, you must use `--no-opt`.
After building, you can verify that the containers are up using `sudo docker ps`. You should see a container for every node in the network.
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2ab1000cecd8 thomahol/d_host "/usr/sbin/docker-st…" 8 minutes ago Up 8 minutes 1_S6host
134565454cde thomahol/d_p4_opt "/usr/sbin/docker-st…" 8 minutes ago Up 8 minutes 1_S6router
1_S5host
(...)
If you use the `--no-opt` flag, you will see the `thomahol/d_p4` image instead.
Executes all `.sh` scripts in the `configuration/` directory on the respective nodes.
By default, all nodes are configured in parallel, to avoid situations where one node blocks another.
Also, all output is hidden by default, as output lines from different nodes would show up intermingled and would be hard to match to a specific node.
As this can make debugging difficult, you can provide the `--debug-conf` flag.
This flag forces the nodes to be configured one after another, and displays all output.
`./cli.sh start-switches` both compiles the P4 program in `configuration/` and starts all P4 switches with it.
Use this command if you want to recompile your P4 program and start switches with the new program.
The stop and reboot commands do not recompile the P4 program.
When you have built non-optimized p4 switches using `--no-opt`, you may use the `--pcap` flag when starting switches to collect all traffic.
You can find the pcap files in `~/infrastructure/shared/S*/pcap/`.
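To inspect a capture afterwards, you can for example read it back with tcpdump (the file name below is just a placeholder):

```bash
sudo tcpdump -enr ~/infrastructure/shared/S1/pcap/<some-capture>.pcap
```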
Especially the pcap files can take up a lot of disk space! If you run out of space, make sure to clean your pcaps and logs.
Installs all controller requirements in `configuration/requirements.txt`.
Starts the controller in `configuration/controller.py`.
The chosen traffic scenario (`[t]`) is passed to the controller as well.
If you do not specify a traffic scenario, the `default` scenario is used.
When you run the commands manually and rely on the traffic scenario in your controller, make sure to pass the correct scenario to the controller!
`run-pipeline` automatically takes care of this.
Runs traffic and failure scenarios. The arguments are the same as for `run-pipeline`.
All links except the ones between the switches and the hosts.
No. We give more detail in this section of the README.
No. If you do so anyway, your host configuration will not be used when we run your solution on our server. You can only configure your network via the configuration files available in the `configuration` directory.
No, it does not.
This is because the optimized switch is compiled with the digest capability disabled.
Fortunately, you can still use `copy_to_cpu` to send messages from the data plane to the control plane. We show how to use `copy_to_cpu` in this example.
You should use `iperf3` instead of `iperf`; it is installed on all hosts as well as on the routers and switches.
You can use `ping -Q 0x80`; this should work just fine.
This is the expected behavior. The problem with `iperf3` is that it sends UDP packets in sequences of bursts instead of sending them at a constant rate. Because of that, the queues in the devices quickly fill up during a burst, and many packets are then dropped.
If you want to measure the available bandwidth with UDP traffic, we provide the `udp.py` script, which is available on every host in the `/home` directory.
You can start a UDP flow sending 4 Mbps for 5 seconds with the following commands:
python3 -i udp.py
>>> send_udp_flow("2.0.0.1", rate="4M", duration=5, packet_size=1500, batch_size=1)
Here, the packets will have a size of 1500 bytes and will be sent one by one (`batch_size=1`). If you want to see how many packets arrived, you can run the following command on the destination node, where the first parameter is the source IP and the second parameter is the destination port.
python3 -i udp.py
>>> recv_udp_flow("1.0.0.1", 5001)
Then, with `ctrl+c`, you can see the number of received packets.
`iperf3` can only use TOS values that are multiples of 4 (with the option `-S` or `--tos`). If you want to use a TOS value that is not a multiple of 4, you can use the `udp.py` script that we provide, with the `tos` option on the sender side. For instance, if you want to use a TOS of 27:
python3 -i udp.py
>>> send_udp_flow("2.0.0.1", rate="4M", duration=5, packet_size=1500, batch_size=1, tos=27)
FRRouting does not support everything you have seen during the lecture, or some features that you have seen in configuration examples for real routers. Check the FRRouting documentation to see which features are available. To save yourselves some time searching, we collect known limitations below:
- OSPF does not support LFAs (loop-free alternates).
In the MPLS exercise, we used the ethertype `0x8848`, which means "MPLS multicast label switched packet". You actually need to use the ethertype `0x8847` instead, which means "MPLS label switched packet"; otherwise the routers will not process the MPLS packets. We have updated the MPLS exercise solution, but make sure to also update your P4 code.
Switch nodes run `bmv2` (the P4 software switch implementation), which sends and receives packets on the interfaces using `libpcap` and binds to them using `raw` sockets with `ETH_P_ALL`. As a consequence, `iptables` does not intercept any packets. Furthermore, and more importantly for us, when using `tc filter` we can no longer match on `protocol ip`, since the packet protocol is unknown to the underlying structures. Thus, to use `tc filter` on our switch interfaces, we have to replace the first command below with the second.
Does not work:
tc filter add dev intf parent 1: protocol ip prio 1 u32 match ip dsfield 1 0xff flowid 1:10
Works:
tc filter add dev intf parent 1: protocol all prio 1 u32 match ip dsfield 1 0xff flowid 1:10
When working with `tc`, it is very important that you also rate-limit the traffic to the interface maximum; otherwise your priorities, fair queueing, etc. will not work. Why? Your device interfaces have "unlimited" bandwidth; we do all the rate limiting in middle nodes that we added. Thus, if you use priorities or fair queueing, packets are dequeued so fast that it is as if there were no queueing at all. You must therefore rate limit at the same time as you use priorities or other mechanisms.
Important: if you use `htb`, make sure the sum of the children's `rate`s does not exceed the parent `rate`; otherwise the child classes will be able to send above the limit. Furthermore, remember that the `rate` is just the guaranteed rate for a given class; you can set the sum of the rates to at most the parent rate (or the link bandwidth), or to a lower value. This is very important to prevent lower-priority traffic from stealing bandwidth from higher-priority traffic classes. A sketch of such a setup is shown below.
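As an illustration, a minimal sketch of an `htb` hierarchy on a hypothetical interface `intf` with an assumed 10 Mbps link (interface name, rates, and class layout are examples only; adapt them to the real link capacities):

```bash
# Sketch only: rate limit to the assumed link capacity (10 Mbps) and give each
# class a guaranteed share; the child rates sum up to the parent rate, not more.
tc qdisc add dev intf root handle 1: htb default 30
tc class add dev intf parent 1:  classid 1:1  htb rate 10mbit ceil 10mbit
tc class add dev intf parent 1:1 classid 1:10 htb rate 6mbit ceil 10mbit   # e.g. gold
tc class add dev intf parent 1:1 classid 1:20 htb rate 3mbit ceil 10mbit   # e.g. silver
tc class add dev intf parent 1:1 classid 1:30 htb rate 1mbit ceil 10mbit   # e.g. bronze
# Classify packets by their TOS byte (note "protocol all" on switch interfaces, see above)
tc filter add dev intf parent 1: protocol all prio 1 u32 match ip dsfield 0x80 0xff flowid 1:10
```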
You can send packets to the controller or clone/mirror packets to it as much as you want, with only one limitation: you are not allowed to buffer normal traffic in the controller and then inject it back into the network, nor are you allowed to use the controller as a forwarding node, for example forwarding normal traffic like `S1 -> Controller -> S6`.
You are allowed to inject as many packets as you want from the controller to the switches (but be careful with CPU and bandwidth usage). You can use those packets either to `clock` the switches or to make the switches forward them somewhere.
To send packets to the controller, you can either forward a packet to the CPU port (but then you lose that packet) or clone it. You can find an example here: https://github.com/nsg-ethz/p4-learning/tree/master/exercises/04-L2_Learning. There you can see that, to receive packets, we use `scapy` and the `sniff` function: https://github.com/nsg-ethz/p4-learning/blob/master/exercises/04-L2_Learning/solution/l2_learning_controller.py#L123. You are free to use that or any other function or Python library to get packets from interfaces. However, keep in mind that `sniff` will block your program's execution. You can either listen to multiple interfaces at the same time, or use threads to deal with all the switches in parallel.
For that, use the topology object: you can use either `topo.get_cpu_port_intf(sw_name)` or `topo.get_ctl_cpu_intf(sw_name)`. A sketch of how receiving could look is shown below.
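As a minimal sketch, assuming `topo` is the topology object and `get_p4switches()` returns the switch names (an assumption based on the p4-utils topology API; the callback is just a placeholder):

```python
# Sketch only (Python 2): sniff on each switch's CPU port in a separate thread,
# since the blocking scapy sniff() call would otherwise stall the controller.
import threading
from scapy.all import sniff

def handle_packet(pkt):
    print("Received %d bytes" % len(pkt))  # process cloned/forwarded packets here

def sniff_cpu_port(intf):
    sniff(iface=intf, prn=handle_packet)

for sw_name in topo.get_p4switches():
    t = threading.Thread(target=sniff_cpu_port,
                         args=(topo.get_cpu_port_intf(sw_name),))
    t.daemon = True
    t.start()
```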
To send packets from the controller to a switch, you need to inject raw packets into the respective switch's `cpu` port interface. To get its name, use the topology object.
You can send packets using any Python tool/library that allows sending `raw` packets, e.g. `scapy` or raw Linux sockets. For simplicity, we recommend using `scapy` and the `sendp` function (https://0xbharath.github.io/art-of-packet-crafting-with-scapy/scapy/sending_recieving/index.html). However, if you need more performance, you can use Python raw sockets (https://sira.dev/2019/06/24/layer-2-socket-programming.html). A sketch using `sendp` follows below.
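As a minimal sketch, again assuming `topo` is the topology object (the Ethernet address and payload are arbitrary placeholders):

```python
# Sketch only (Python 2): inject a packet into a switch's CPU port with scapy.
from scapy.all import Ether, sendp

cpu_intf = topo.get_cpu_port_intf("S1")          # CPU port interface of switch S1
pkt = Ether(dst="ff:ff:ff:ff:ff:ff") / "hello"   # craft whatever packet you need
sendp(pkt, iface=cpu_intf, verbose=False)
```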