taspelund
Contributor

This PR adds initial support for IPv6 static routing in maghemite, with the very beginnings of BGP support (really just originated IPv6 prefixes).
This addresses a few bugs found along the way (e.g. #529), adds unit tests for old/new functionality, and does a good bit of code cleanup, e.g.

  • mostly standardizing on oxnet::IpNet in generic spots and rdb::Prefix in db-specific spots, rather than several unique types
  • moving common functionality into mg-common where it can be used by all other crates w/o creating circular dependencies

taspelund added 14 commits July 25, 2025 12:38
Removes one of the multiple duplicate types that implement an IP Prefix.

Signed-off-by: Trey Aspelund <[email protected]>
Moves to ddm/oxnet types to represent an IP Prefix instead of custom
types. Step 2 in standardizing around oxnet Prefix types.

Signed-off-by: Trey Aspelund <[email protected]>
Remove unused BgpAttributes4, BgpAttributes6, OriginChangeSet, and
Status types.  There were a couple other unused types that seemed more
forward-looking, so I left those in place (i.e. fn to_buf(), Policy4Key)

Signed-off-by: Trey Aspelund <[email protected]>
Updates Db to incorporate new methods that support IPv6.
The RIB is now split into three separate types:
- Rib: Holds Prefix (protocol-agnostic enum)
- Rib4: Holds Prefix4 (IPv4 specific struct)
- Rib6: Holds Prefix6 (IPv6 specific struct)
Updates bestpaths() to be completely prefix-agnostic (now accepts paths
as an argument in lieu of a prefix and Rib).
Callers of prefix add/remove methods are now responsible for managing
locks for the appropriate RIB structures, allowing each protocol's RIB
to be locked independently and giving the caller the option to reduce
how many times the lock needs to be acquired/released.

Signed-off-by: Trey Aspelund <[email protected]>
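The split described in the commit above can be sketched roughly as follows. This is an illustration only, not maghemite's code: `Path`, `Db`, `add_prefixes4`, the tuple keys, and the "lowest `rib_priority` wins" rule are simplified assumptions. It shows the two ideas from the commit message: each protocol's RIB sits behind its own lock that the caller acquires, and `bestpaths()` looks only at a set of candidate paths.

```rust
use std::collections::BTreeMap;
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};
use std::sync::Mutex;

/// Hypothetical, simplified path; the real type carries more attributes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Path {
    pub nexthop: IpAddr,
    pub rib_priority: u8,
}

/// Per-family RIBs keyed by (network address, prefix length). Assumed shape.
pub type Rib4 = BTreeMap<(Ipv4Addr, u8), Vec<Path>>;
pub type Rib6 = BTreeMap<(Ipv6Addr, u8), Vec<Path>>;

/// Each family has its own lock, so v4 and v6 updates do not contend.
pub struct Db {
    pub rib4: Mutex<Rib4>,
    pub rib6: Mutex<Rib6>,
}

/// Prefix-agnostic selection: only the candidate paths are inspected.
/// Assumption for this sketch: the lowest rib_priority wins, ties are kept.
pub fn bestpaths(paths: &[Path]) -> Vec<Path> {
    let Some(best) = paths.iter().map(|p| p.rib_priority).min() else {
        return Vec::new();
    };
    paths
        .iter()
        .filter(|p| p.rib_priority == best)
        .cloned()
        .collect()
}

/// The caller passes in an already-locked RIB, so a batch of routes can be
/// added under a single lock acquisition.
pub fn add_prefixes4(rib: &mut Rib4, routes: &[((Ipv4Addr, u8), Path)]) {
    for (prefix, path) in routes {
        let entry = rib.entry(*prefix).or_default();
        if !entry.contains(path) {
            entry.push(path.clone());
        }
    }
}
```

A caller would take `db.rib4.lock()` once, pass the guard to `add_prefixes4`, and release it when the batch is done, leaving `db.rib6` untouched.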
Initial implementation of IPv6 Static Routing:
- New mgd APIs
- New mgadm commands (mirroring v4 static routing commands)

Signed-off-by: Trey Aspelund <[email protected]>
During creation of Prefix4/Prefix6 structs, an Ipv{4,6}Addr was blindly
allowed w/o any validation or zeroing of the host bits.  This allowed
for the RIB to hold multiple entries for routes w/ the same underlying
subnet (e.g. 1.1.1.1/24 and 1.1.1.2/24 would be considered unique keys,
but the underlying subnet is still 1.1.1.0/24), possibly even with
differing next-hop values.
This commit addresses that issue by adding a step to zero the host bits
during the creation of new Prefix4/Prefix6 structs.  Figuring out a
migration strategy for the on-disk sled::Db remains a concern, but
this prevents new problematic entries from being created.

Fixes: #529

Signed-off-by: Trey Aspelund <[email protected]>
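A minimal sketch of the host-bit zeroing described in the commit above, using only the standard library. The struct layout, field names, and the clamping of over-long lengths are assumptions made for illustration; the real `Prefix4`/`Prefix6` constructors in rdb may differ.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Hypothetical stand-in for rdb's Prefix4 (field names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Prefix4 {
    pub value: Ipv4Addr,
    pub length: u8,
}

/// Hypothetical stand-in for rdb's Prefix6 (field names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Prefix6 {
    pub value: Ipv6Addr,
    pub length: u8,
}

impl Prefix4 {
    /// Builds a prefix and clears every bit to the right of `length`.
    /// Lengths above 32 are clamped to 32 in this sketch.
    pub fn new(addr: Ipv4Addr, length: u8) -> Self {
        let length = length.min(32);
        // checked_shl returns None for a shift of 32 (the /0 case).
        let mask = u32::MAX.checked_shl(32 - u32::from(length)).unwrap_or(0);
        Self { value: Ipv4Addr::from(u32::from(addr) & mask), length }
    }
}

impl Prefix6 {
    /// Same idea for IPv6, with a 128-bit mask.
    pub fn new(addr: Ipv6Addr, length: u8) -> Self {
        let length = length.min(128);
        let mask = u128::MAX.checked_shl(128 - u32::from(length)).unwrap_or(0);
        Self { value: Ipv6Addr::from(u128::from(addr) & mask), length }
    }
}
```

With a constructor like this, 1.1.1.1/24 and 1.1.1.2/24 both normalize to 1.1.1.0/24 and therefore map to the same RIB key.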
Second half of the fix for #529. First we moved to a constructor that
enforced all host bits be zeroed when creating a Prefix* type. Now, we
check whether the routes in the on-disk db are valid (host bits unset)
and remove them if they're invalid.
This also moves most of the logic into methods of the Prefix* types,
with the exception of the mg-admin-client mirrors of them, as progenitor
can't know about methods/traits from the openapi spec so that logic
has to live somewhere.

Fixes: #529

Signed-off-by: Trey Aspelund <[email protected]>
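The on-disk cleanup described in the commit above might look something like the following. This is a sketch under stated assumptions: a `BTreeMap` stands in for the sled tree, the value is a plain next-hop string, and `host_bits_are_unset` and `purge_invalid` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::net::Ipv4Addr;

/// (network address, prefix length); an assumed key shape for this sketch.
type Key = (Ipv4Addr, u8);

/// Returns true when every bit to the right of `length` is zero.
pub fn host_bits_are_unset(addr: Ipv4Addr, length: u8) -> bool {
    if length > 32 {
        return false;
    }
    // checked_shr returns None for a shift of 32 (the /32 case, no host bits).
    let host_mask = u32::MAX.checked_shr(u32::from(length)).unwrap_or(0);
    u32::from(addr) & host_mask == 0
}

/// Drops entries whose key still has host bits set and reports what was
/// removed. The BTreeMap is a stand-in for the on-disk store.
pub fn purge_invalid(db: &mut BTreeMap<Key, String>) -> Vec<Key> {
    let mut removed = Vec::new();
    db.retain(|&(addr, length), _nexthop| {
        let valid = host_bits_are_unset(addr, length);
        if !valid {
            removed.push((addr, length));
        }
        valid
    });
    removed
}
```

Returning the removed keys lets the caller log what was dropped at startup.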
Some helpful parsing and retry macros were sitting in bgp and other
crates couldn't use them... so I moved them to mg-common.

Signed-off-by: Trey Aspelund <[email protected]>
Adds APIs, db methods, mgadm commands and unit tests for IPv6 BGP Origin

Signed-off-by: Trey Aspelund <[email protected]>
Signed-off-by: Trey Aspelund <[email protected]>
@taspelund taspelund requested a review from rcgoodfellow August 8, 2025 22:45
@taspelund taspelund self-assigned this Aug 8, 2025
@taspelund taspelund added the bgp (Border Gateway Protocol), mgd (Maghemite daemon), customer (for any bug reports or feature requests tied to customer requests), static (Static Routing), and rust (pull requests that update rust code) labels Aug 8, 2025
Signed-off-by: Trey Aspelund <[email protected]>
@taspelund taspelund linked an issue Aug 8, 2025 that may be closed by this pull request
Moves RIB queries from /bgp to /rib (this will likely require
a corresponding update in Nexus), as rib_in and rib_loc are not
specific to BGP.
Moves RIB logic into its own set of files (mgd/src/rib_admin.rs,
mgadm/src/rib.rs).
Refactors print_rib() and moves it into rib.rs.
Adds support for server-side filtering of RIB queries by AddressFamily
and Protocol (BGP/Static).
Removes a bunch of duplicate types and impls by using progenitor's
"replace" directive in generate_api!() macro to automatically map
progenitor's auto-derived types back to their original rdb types (this
is something we should do further refactoring on, as there is quite a
lot of type/impl duplication going on).

Fixes: #525

Signed-off-by: Trey Aspelund <[email protected]>
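As an illustration of the server-side filtering mentioned in the commit above, here is a small sketch. The `Route` struct, the enum variants, and `filter_rib` are hypothetical; only the idea of optional AddressFamily and Protocol filters comes from the commit message.

```rust
use std::net::IpAddr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AddressFamily {
    Ipv4,
    Ipv6,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    Bgp,
    Static,
}

/// Hypothetical, flattened RIB entry used only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Route {
    pub prefix: IpAddr,
    pub length: u8,
    pub protocol: Protocol,
}

fn family_of(route: &Route) -> AddressFamily {
    match route.prefix {
        IpAddr::V4(_) => AddressFamily::Ipv4,
        IpAddr::V6(_) => AddressFamily::Ipv6,
    }
}

/// Applies each filter only when it is present; `None` means "match all".
pub fn filter_rib<'a>(
    routes: &'a [Route],
    af: Option<AddressFamily>,
    proto: Option<Protocol>,
) -> Vec<&'a Route> {
    routes
        .iter()
        .filter(|r| af.map_or(true, |want| family_of(r) == want))
        .filter(|r| proto.map_or(true, |want| r.protocol == want))
        .collect()
}
```

Filtering on the server keeps the client from having to download the whole RIB just to print one address family or protocol.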
@@ -83,23 +42,28 @@ fn test_basic_peering() {
// Ensure that r1's peer session to r2 has gone back to connect.
r2.shutdown();
d2.shutdown();
wait_for_eq!(r1_session.state(), FsmStateKind::Connect);
Collaborator

Why are we expecting state machine behavior changes? It looks like tests are now failing here.

Contributor Author

The main reason I modified this is that I believe the observed FSM state is timing-dependent, not consistent.

It seemed to me like the goal of this test case is to confirm that "shutdown" on one side brings the session down.
If so, then looking for "!established" is a less timing-dependent way to confirm that things have been completely torn down.

Otherwise, the state on the active side could be "connect", "opensent" (since the listener isn't torn down on the shutdown side, the transport can still come up) or "idle".
I was hoping to avoid a scenario where the test fails because the active peer is in "idle" or "opensent" and we were only looking for "connect".

I don't have super strong opinions about this though, so if reverting that change means the test passes consistently again, then I'm fine with that.

Contributor Author

I'm not sure at what point this test became unreliable. I'll go back through and figure out which of my changes borked things. I don't suspect the issue is with the logic change of "state != established", but I could be wrong
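To make the trade-off in this thread concrete, here is a rough sketch of a polling helper that accepts a predicate instead of a single expected state. `wait_until` and the local `FsmStateKind` enum are hypothetical stand-ins (the test suite uses its own `wait_for_eq!` macro); the point is only that "not Established" tolerates Idle, Connect, or OpenSent, while "equals Connect" does not.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Local stand-in for the BGP FSM states; not the crate's own enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FsmStateKind {
    Idle,
    Connect,
    Active,
    OpenSent,
    OpenConfirm,
    Established,
}

/// Polls `state` until `pred` holds or `timeout` elapses.
/// Returns the state that satisfied the predicate, or None on timeout.
pub fn wait_until(
    mut state: impl FnMut() -> FsmStateKind,
    pred: impl Fn(FsmStateKind) -> bool,
    timeout: Duration,
) -> Option<FsmStateKind> {
    let deadline = Instant::now() + timeout;
    loop {
        let current = state();
        if pred(current) {
            return Some(current);
        }
        if Instant::now() >= deadline {
            return None;
        }
        sleep(Duration::from_millis(10));
    }
}
```

A test could then write `wait_until(|| session_state(), |s| s != FsmStateKind::Established, timeout)` (with `session_state` being whatever accessor the test has) for the looser check, or pass `|s| s == FsmStateKind::Connect` for the stricter one.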

};

for p in prefixes {
update.nlri.push(p.clone());
Collaborator

Shouldn't v6 be in mp_reach_nlri/mp_unreach_nlri?

Contributor Author

Eventually, yes. MP_{REACH,UNREACH}_NLRI hasn't been implemented yet
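For reference, the attribute the reviewer mentions is laid out in RFC 4760 as AFI (2 bytes), SAFI (1 byte), next-hop length (1 byte), next hop, one reserved byte, then the NLRI as (bit length, truncated prefix) pairs. The sketch below encodes that layout for IPv6 unicast with a single global next hop. It is not maghemite code: `encode_mp_reach_ipv6` is a hypothetical helper, and it assumes host bits are already zeroed and that the attribute fits within a BGP message.

```rust
use std::net::Ipv6Addr;

const ATTR_FLAG_OPTIONAL: u8 = 0x80;
const ATTR_FLAG_EXTENDED_LENGTH: u8 = 0x10;
const ATTR_TYPE_MP_REACH_NLRI: u8 = 14;
const AFI_IPV6: u16 = 2;
const SAFI_UNICAST: u8 = 1;

/// Encodes one MP_REACH_NLRI path attribute (flags, type, length, value).
/// Assumes the value stays well under 64 KiB, as any BGP message must.
pub fn encode_mp_reach_ipv6(nexthop: Ipv6Addr, prefixes: &[(Ipv6Addr, u8)]) -> Vec<u8> {
    let mut value = Vec::new();
    value.extend_from_slice(&AFI_IPV6.to_be_bytes());
    value.push(SAFI_UNICAST);
    value.push(16); // next-hop length: one global IPv6 address
    value.extend_from_slice(&nexthop.octets());
    value.push(0); // reserved, must be zero
    for (addr, length) in prefixes {
        let length = (*length).min(128);
        // Only the bytes needed to hold `length` bits are sent.
        let nbytes = (usize::from(length) + 7) / 8;
        value.push(length);
        value.extend_from_slice(&addr.octets()[..nbytes]);
    }

    let mut out = Vec::with_capacity(value.len() + 4);
    if value.len() > 255 {
        // Two-byte length field when the value does not fit in one byte.
        out.push(ATTR_FLAG_OPTIONAL | ATTR_FLAG_EXTENDED_LENGTH);
        out.push(ATTR_TYPE_MP_REACH_NLRI);
        out.extend_from_slice(&(value.len() as u16).to_be_bytes());
    } else {
        out.push(ATTR_FLAG_OPTIONAL);
        out.push(ATTR_TYPE_MP_REACH_NLRI);
        out.push(value.len() as u8);
    }
    out.extend_from_slice(&value);
    out
}
```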

The idle hold timer was not being handled properly before. This led to
situations where a session configured as passive would transition
from the Idle FSM state into Connect, when passive sessions should only
ever go into Active.  This adds explicit handling of the timer in the
FSM when transitions to the Idle state occur or when the
IdleHoldTimerExpires FSM event is received.

Signed-off-by: Trey Aspelund <[email protected]>
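The intended behavior from the commit above can be summarized in a tiny transition function. This is a deliberately reduced sketch: the `State` and `Event` enums and `next_state` are hypothetical, and only the one rule stated in the commit message is modeled (passive sessions leave Idle for Active, others for Connect).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Connect,
    Active,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    IdleHoldTimerExpires,
    /// Placeholder for every event this sketch does not model.
    Other,
}

/// Returns the next state for the single transition modeled here.
pub fn next_state(current: State, event: Event, passive: bool) -> State {
    match (current, event) {
        // A passive session waits for the peer, so it must not go to Connect.
        (State::Idle, Event::IdleHoldTimerExpires) if passive => State::Active,
        (State::Idle, Event::IdleHoldTimerExpires) => State::Connect,
        // Everything else is left unchanged in this sketch.
        (state, _) => state,
    }
}
```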
The Dispatcher loop was calling Listener::bind() after any event,
regardless of whether that was a normal/good event, e.g. accept()'ing a
connection, or a bad/error event, e.g. an accept() or send() failure.
This moves the accept() and send() logic into an inner loop and sets up
flow control such that a normal event will not trigger a re-binding of
the listener socket. The output below is from two successful test runs
with additional debug logging added, one with the flow control changes and one
without.

Without loop updates (169 binds occur):
```
treyaspelund@Tallon-IV 01:25:46 PM | ~/git/maghemite  trey/ipv6 ○
‣ grep 'binding' bgp/r1.basic_peering.log | wc
     169     676   26343
```

With loop updates (only 1 bind occurs):
```
treyaspelund@Tallon-IV 01:24:24 PM | ~/git/maghemite  trey/ipv6 ✓
‣ grep 'binding' bgp/r1.basic_peering.log
{"msg":"bgp dispatcher binding 1.0.0.1:179","v":0,"name":"slog-rs","level":20,"time":"2025-08-22T13:23:51.058233-04:00","hostname":"Tallon-IV","pid":22176}
```

Signed-off-by: Trey Aspelund <[email protected]>
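The control flow described in the commit above can be modeled without sockets. In this sketch the `bind` and `next_event` closures are hypothetical stand-ins for `Listener::bind()` and the accept()/send() work, and `Outcome` and `run_dispatcher` are invented names; the point is that only an error leaves the inner loop and triggers a re-bind.

```rust
/// Result of one pass through the accept/send step (names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Accepted,
    AcceptFailed,
    SendFailed,
    Shutdown,
}

/// Outer loop binds; inner loop handles events until one of them fails.
pub fn run_dispatcher(mut bind: impl FnMut(), mut next_event: impl FnMut() -> Outcome) {
    'rebind: loop {
        bind();
        loop {
            match next_event() {
                // Normal event: keep using the existing listener.
                Outcome::Accepted => continue,
                // Error event: fall back to the outer loop and bind again.
                Outcome::AcceptFailed | Outcome::SendFailed => continue 'rebind,
                Outcome::Shutdown => return,
            }
        }
    }
}
```

Feeding this a sequence of two accepts, one accept failure, one more accept, and a shutdown results in exactly two binds: the initial one and one after the failure.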
Labels
bgp Border Gateway Protocol customer For any bug reports or feature requests tied to customer requests mgd Maghemite daemon rust Pull requests that update rust code static Static Routing
Development

Successfully merging this pull request may close these issues.

IPv6 support for static routing
2 participants