Skip to content

Protocol

paidforby edited this page Apr 8, 2020 · 20 revisions

LoRa Layer 2 Networking (LoRa Mesh)

A LoRaLayer2 (LL2) network is a hetreogenous wireless mesh network that utilizes the LoRa modulation scheme for point-to-point or point-to-multipoint connections between nodes. The following document describes the LoRaLayer2 protocol developed for use on LoRa mesh networks.

Outline:

Existing Work

The LoRaLayer2 protocol is heavily reliant on existing LoRa technology and development.

Physical properties

LoRa refers to a proprietary modulation scheme developed by Semtech. For more info, the physical modulation was reverse engineered by Matt Knight, https://www.link-labs.com/blog/what-is-lora. The modulation scheme should not be confused with LoRaWAN, the protocol stack for the Things Network, https://www.link-labs.com/blog/what-is-lorawan.

Mesh routing protocols

There are a number of mesh routing protocols and alogrithms that have been implemented for TCP/IP networks. The work done on the LoRaLayer2 routing protocol is particularly inspired by jech's work on the Babel routing protocol.

Layer 2 protocol

There are a few network stacks already being devleoped for LoRa-based networks that are worth looking into,

  • Things Network - star topology, mostly for sensor networks
  • Reticulum - pre-alpha, may be capabale of ad-hoc style meshing, physical layer agnostic
  • 6LoWPAN - IPv6 network stack for low power wireless personal area networks, "industry" standard (i.e. little publicly-available documentation), related to Thread
  • IEEE 802.15.4e - MAC amendement to IEEE standard for wireless personal area networks (WPANs)
  • PJON - an open-source network protocol able to connect devices using most physical layers and media, some LoRa support

Goals, Challenges, Assumptions

Below are some of the initial goals, known limitations, and baseline assumptions for building the LoRaLayer2 protocol.

Desired capabilities

Implement a network stack for LoRa mesh networks that is capable of,

  • Communication between any two nodes on network via LoRa, without the need for a server or gateway.
  • Distance-vector, loop-avoiding, routing of packets on network with limited knowledge of network topology
  • Minimal design that could be easily reimplemented on many devices (e.g. Arduino microcontrollers, OpenWrt routers, Linux computers)
  • Connectionless protocol that does not require acknowledgements, so as not to over use air space

Challenges and limitations

Network structure

The structure of a LoRa mesh network is assumed to be x number of devices (or nodes) with a single radio transceiver interface that transmits messages omnidirectionally such that, under ideal conditions, all nodes should receive every message transmitted by a neighboring node. LoRa packets take a relatively long time (200ms - 1s) to be transmited, therefore the duty-cycle of devices should be proportional to the number of devices on the network (or number of neighbors?) to guarentee a certain RF air quality.

LoRa airtime

The airtime of a LoRa packet means the time it takes for a LoRa transceiver to transmit a single packet. Due to LoRa's modulation scheme it takes a fairly long time to transmit a packet. This transmit time (or packet airtime) is a factor of size of the packet being sent and a few transceiver settings that are set in advance and for our purpose represent constants. LoRa airtime for packet of size N can be calculated as follows,

airtime(N) = 8 + (max(ceil((8 * N - 4 * SF + 28 + 16 - 20 * (1 - HD)) / (4 * (SF - 2 * DR))) * CR, 0));

where, N is the number of bytes in the packet's payload
SF is the spreading factor (between 6-12) for our simulations this is currently set to 9
HD is a header bit, 0 for implicit header, 1 for explicit header
DR is a bit for low data optimization, 1 when enabled, 0 when disabled
CR is the coding rate (1 corresponding to 4/5 and 4 to 4/8)

adapted from page 31 of the Semtech's SX1276 documentation

Addressing

There are two possible ways of looking at addressing on a LoRa mesh.

  1. single node addressing
  2. multicast addressing

Single node addressing implies that messages are sent with a single destination address. Every disaster radio node has a unique 4 byte address. Intitially, we are using the last 4 bytes of MAC address of the nodes WiFi interface as its unique address. This should be unique across the network since it corresponds to the last byte of the OID and the three bytes of device ID which are given to the ESP32 mircocontroller by Espressif. If we begin supporting other microcontrollers we may need to reevalute how addresses are determines.

Multicast addressing implies that messages are sent with multiple destinations, these destinations have a shared 4 byte multicast address. This would enable nodes to subscribe to a channel by accepting messages for a specific multicast address. This may assume prior knowledge of the multicast address, though we may choose to implement some form of discovery eventually.

There are two addresses reserved for device function,

  • FF:FF:FF:FF limited broadcast to all neighboring nodes
  • aF:FF:FF:FF packets containing routing table information meant for LL2 routing table managment

Packet Structure

Semetech's specifications for LoRa tranceivers include a header (set explicit or implicit). When set in explicit mode, a header will be sent that contains the payload's length, coding rate, and the presence of a CRC, (link to RFM doc) https://revspace.nl/DecodingLora. In implicit mode, only a preamble is sent to intitiate communication.

Currently, we are using an explicit header, though we should consider switching to implicit,

Byte 0 Byte 1 Byte 2 - 5 Byte 6 - 9 Byte 10 Byte 11 - 256
ttl totalLength source nextHop sequence datagram

where,
ttl is the "time to live", i.e. the number of hops allowed before message is discarded by a node
totalLength is the entire length of the paket (header + data)
source is the 4 byte address of node that originated the message
nextHop is the 4 byte address of next hop of the route
sequence is the global message counter that determines packet loss rate
datagram is the content of packet to be interpreted by Layer 3 applications

An optional CRC may be included at the end of the payload (it's unclear how or when this is checked, it may be internal to the SX1276?).

Routing

Devices on a LoRa mesh network are assumed to only have a single radio transceiver (though we plan on adding support for two). This means that every routing decision comes down to a single question, Should the node retransmit a message or not? This decision is based on a few of conditions,

  1. Am I the intended nextHop address?
  2. Is the destination address in the node's routing table?
  3. Has the packet exceeded it's time to live?

LL2 routing is managed by two tables that answer two questions,

  • the neighbor table - who's around me?
  • the routing table - who's around who's around me...?

Neighbor table

A neighbor table is generated by both regular messages (i.e. chat, map, etc) and hello packets. When a regular message has not been sent after a specified timeout period, a hello packet will be sent to ensure a node is not dropped from a neighboring node's routing table. A neighbor table entry consists of the following,

struct NeighborTableEntry{
    uint8_t address[ADDR_LENGTH];
    uint8_t lastReceived;
    uint8_t packet_success;
    uint8_t metric;
};

address is 6 byte address of neighbor node
lastReceived is the sequence number of last received message from this address used to recalculate packet loss
packet_success is the instanteanous success rate of packets from this address
metric is a 0 to 255 representation of quality of link to the neighbor address

Hello packets

A hello packet should be thought of as an empty routing table packet. It is the packet which intiates route discovery. In order to avoid flooding the network, hello packets have a TTL of 1, meaning they are only transmitted a single hop to immeadiate neighbors. This implements one of the default multicast addresses, FF:FF:FF:FF, which refers to all immediate neighbors.

Routing table

A routing table entry on a node in a LoRa mesh network consists of the following,

struct RoutingTableEntry{
    uint8_t destination[ADDR_LENGTH];
    uint8_t nextHop[ADDR_LENGTH];
    uint8_t distance;
    uint8_t lastReceived;
    uint8_t metric;
};

where,
destination is the 6 byte address of end point of route
nextHop is the 6 byte address of neighbor through which end point maybe reached
distance is the number of hops from end point
lastReceived is the sequence number of last received message from destination used to recalculate packet loss
metric is a 0 to 255 representation of quality of link between the destination address and its final hop (should be average metric of entire route?)

Neighbor table entries are routing entries where the destination and next hop are the same and the number of hops is known to be one (zero hops being the localhost). When a neighbor is found and added to the neighbor table, it is also added to the routing table as a route via itself and a single hop.

The routing table is generated by nodes advertising their routes to their neighbors. The advertising of routes requires larger packets to be transmitted and will therefore be sent less often than hello packets.

Routing table packets

Every so often (maybe after a change in the network, or on set intervals) a node transmits a routing table packet to advertise it's routes to it's neighbors. Much like hello packets, routing tables packets have a ttl of 1 and are addressed to 0xFFFFFFFFFF, that is all immediate neighbors. The data of this packet containing n routes is structured like so,

Byte 11 to 14 Byte 15 Byte 16 ... Byte n * 6 to (n * 6)+3 Byte (n * 6)+4 Byte (n * 6)+5
routeTable[0].destination routeTable[0].distance routeTable[0]. metric routeTable[n].destination routeTable[n].distance routeTable[n].metric

On reception by a neighboring node, this data is parsed, the distance values are incremented to reflect the hop by which the routes were received, and the routes contained are compared against all exisiting routes in the node's routing table. After compairsion, the node either, adds the route to its table, updates an existing route with a new metric or better nextHop, or drops the route and does nothing because it already has a better route or the route refers to the nodes local address.

Dues to limited packet size (240 bytes of data), a routing table packet is limited to 30 routes (8 bytes/route). After discovering more than 30 routes, a node needs to send multiple packets to share its entire routing table. Rather than assuming that neighboring nodes can receive sequential packet and append them to one another, we assume the worst-case scenario of a lossy network and instead send a random assortment of 30 routes at a time. This way, it is less likely neighboring node will completely miss the existence of a route because of a poor quality link to the transmitting node (e.g. what if the neighboring node is missing every other third packet? then it might miss the entire last third of the transmitting node's routing table).

Metrics

In order to avoid over using air space (exceeding duty cycle), the routing protocol is built into normal packet structure. This way every packet can be used to update the routing table and route metrics. If a node reaches a timeout period, based on acceptable duty cycle, without transmitting it should transmit a "hello, I'm alive" packet, so other nodes do not drop it from their routing table.

For route selection, a node uses two metrics,

  • hop-count
  • packet success

A node will almost always prefer the shortest number of hops, unless the packet success rate drops below a certain threshold. This threshold can be tuned on a per node basis.

Packet success is calculated from the sequence number in the header of packets and is represented by a 0 - 255 value. If a node misses a packet from neighbor, the packet success value is decreased by 16, meaning that after 16 packets are missed packet success will be reduced to 0.

It is possible to add additional metrics as weighted averages of packet success. For development purposes, we are implementing a random weighted metric that represents potential packet loss on the network.

Multi-hop metrics are an average of the metric of each hop. They are calculated as follows

hopRatio = 1/route.distance;
metric = neighborTable[entry].metric * (hopRatio) + route.metric * (1-hopRatio)

where, route.distance is the number of hops between node X and route destination, node Z neighborTable[entry] is the neighbor table entry corresponding to node Y which sent routing packet neighborTable[entry].metric is the metric for the link between node X and node Y route.metric is the "rest of route" metric sent in routing packet representing the average metric between node Y and node Z, calculated as shown above

This way the routing metrics are recursviely calculated as a average without node X knowing the metric of every link between Y and Z.

Node States

When a node first joins a LoRa mesh network it must go through a number of steps before it can begin routing.

Discovery

During the discovery phase, a node transmits hello packets to introduce itself to its neighbors and listens for other packets being sent in its range so it can discover its neighbors and add them to its neighbor and routing tables. After a set timeout or some minimum number of neighbors is discovered a node can enter the learning phase.

Learning

During the learning phase a node shares its routing table with its neighbors and listens for routing packets being sent from neighbors. It then parses the routing packets as described above and updates its routing table accordingly. This way it learns about the neighbors of its neighbors, i.e. nodes that are multiple hops away. The learning phase is arguably never complete, as new nodes could appear at any moment, however, after a certain point, a node should have enough routes to begin routing packets and therefore enter the forwarding state

After learning phase has reached an adequate point, a routing table may look like so,

Routing table example:

1 hops from 26a8f7dd via 26a8f7dd metric 195 
1 hops from 77339dd7 via 77339dd7 metric 215 
1 hops from 36110216 via 36110216 metric 214 
2 hops from e5093e8c via 77339dd7 metric 205 
2 hops from 9bd3a4aa via 77339dd7 metric 187 
2 hops from 6767e872 via 26a8f7dd metric 228 
2 hops from ecca8bd9 via 36110216 metric 244 
3 hops from 65a48843 via 36110216 metric 244 
3 hops from 89d39582 via 36110216 metric 213 
5 hops from 0bfeb0f4 via 36110216 metric 246 
6 hops from 590f4402 via 36110216 metric 210 

Forwarding

In the forwarding state, a node has discovered enough neighbors and learned enough about the network to be considered a good route by some number of (at least one) neighboring node(s). This should be consider the normal operating state of a disaster radio node, as it can now be used to send addressed message and retransmit messages addressed to nodes for which it has routes.

Convergence

Convergence implies that every node has knowledge of a route to every other node in the network.

Similar to the Babel, a TCP/IP mesh networking protocol, a node in a LoRa mesh network can be said to know the address of every node in the network and the nextHop to reach that address, but it does not know the entire topology of the network. This reduces the memory required to maintain a routing table and the time needed for convergence.

Given the physical constraints of LoRa modulation, a LoRa mesh network converges relatively slowly (approx. on order of 1 minute for every 6 hops and perhaps longer under non-ideal conditions). Attempting to increase the convergence time can result in high packet loss or, on a physical network, RF interference from neighboring nodes. For convergence to occur, a route must be relayed from one end of the network to the other. At a ~10s transmission intervals, routes may take more than a minute to traverse a 6 hop wide network.
The ideal convergence time for a LoRa mesh can be calculated as follows,

convergence(n) = (airtime + delay) * (width) * 2

where, airtime is the time is takes for node to transmit a single routing table packet delay is the time between a node transmiting routing table packets width is the maximum hops possible on the network, depends on network structure, but often one half the total nodes

For example, in a 15 node network, assuming a network width of 7 hops, a message delay of 10s, and an approximate transmit time of 200ms,

converegence(15) = (0.2 + 10) * (7) * 2 = 142.8s

Routed Packets

After entering a forwarding state (ideally once the network has converged), routed packets can be formed by adding the destination address as the first four bytes of the datagram.

Packets must be given an intended destination by a Layer 3 application. For example, if you receive a chat message from a user at specific node, the chat app will need to reply to that message by placing the correct address in the

A datagram is formed like so,

Byte 11 to 14 Byte 15 Byte 16 - 256
destination type message

The originating node checks their routing table and finds the entry which matches the datagram's destination and then creates a LL2 packet the corresponding nextHop in the packet header. This header then encapsulates the orginal datagram, preserving the destination, type, and message.

When this packet is received by neighboring nodes, they check if the nextHop specified matches their local address. If it does not match, the packet is dropped. If the nextHop does match their local address, they decrement the ttl in the header and then check their routing table to find the nextHop for the packet and replace their local address in the original packet with the nextHop from their routing table. The message is then pushed on to buffer and queued to be transmitted.

Route selection

Route selection is done on node-by-node basis, there is no intention to implement source-specific routing. Each node maintains only possible routes with lowest distance to a destination in it routing table. When a node receives a routed packet that it is intended to retransmit, it will refer to it's routing table find the best nextHop. By default, a node will select the nextHop that gives the lowest distance route.

Current Implementations

Actively being developed for use on ESP32 dev boards:
https://github.com/sudomesh/LoRaLayer2

The network stack and routing protocol outlined in this wiki were originally developed in a network simulator, https://github.com/sudomesh/disaster-radio-simulator
Note: the simulator is out-of-date and does not reflect many of the latest improvements to the LL2 protocol.

Clone this wiki locally