
v6.4.0 Route Servers - Resilience and UI Based Community Filtering

@barryo barryo released this 24 May 18:39
· 19 commits to master since this release

This release provides significant new features for route server resilience, UI-based community filtering, and many smaller improvements and bug fixes.

Release Summary

git --no-pager diff --shortstat v6.3.1 master
 218 files changed, 21503 insertions(+), 28569 deletions(-)

Upgrade Instructions

🚨 🎥 There is a tutorial video demonstrating this upgrade including all the required changes on route collectors and route servers available on YouTube here.

The official upgrade instructions can be found here. Follow these, including the database migrations.

Edit your .env file and add the following:

TELESCOPE_ENABLED=false
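If you script your upgrades, a setting like this can be applied idempotently. The following is a minimal sketch only: `set_env_var` is a hypothetical helper (not part of IXP Manager) that rewrites a key in a dotenv-style file without duplicating it, and the `.env` path in the usage note is an assumption about your install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in a dotenv-style file exactly once.
# Any existing definition of the key is dropped and the new one is appended.
set_env_var() {
    local file="$1" key="$2" value="$3" tmp
    touch "$file"
    tmp="$(mktemp)"
    # Keep every line except an existing definition of this key.
    # grep exits non-zero when nothing is printed, which is not an error here.
    grep -v "^${key}=" "$file" > "$tmp" || true
    echo "${key}=${value}" >> "$tmp"
    # Write back through the original file so its ownership and mode are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, `set_env_var /srv/ixpmanager/.env TELESCOPE_ENABLED false` could be run repeatedly and would leave a single `TELESCOPE_ENABLED=false` line in the file.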

Once that is complete, and assuming you have read these release notes in full, proceed as follows:

  1. Edit your router definitions on IXP Manager to set up router pairs.
  2. Replace your route server and route collector update scripts with the new scripts linked below.
  3. Update the cadence of the router update scripts and enable the route server filtering UI.

Route Server Resiliency

For IXPs, route servers are considered a critical production service and most IXPs deploy them in redundant pairs. This is usually implemented with dedicated hardware (servers with dual PSU, hardware RAID, and out-of-band management access) deployed in different points of presence.

The update scripts previously provided by IXP Manager suggested reconfiguring these servers about four times per day, with the cron jobs offset so that the two servers of a pair would never update at the same time. The hope was that if there was an issue, only one server of the resilient pair would be affected, and engineers would have time to react and prevent updates on the other working server. Some IXPs added additional logic to the scripts to check if the other server was functional before performing a reconfiguration, but this was often limited to pings and a simple check to see if Bird was running.

This release adds a significant new resilience mechanism by pairing servers. In the IXP Manager router UI, you can now select another router to pair with the one you are editing. You would select pairs as follows:

  • For route servers deployed in pairs, rs1-ipv4 should be paired with rs2-ipv4 and vice versa - be sure to set the paired server in each individual server.
  • For route collectors, quarantine route collectors and AS112 services where you would normally have a single instance, you can pair the ipv4 version with the ipv6 version, ensuring at least one will always be running. For example, pair rc1-ipv4 with rc1-ipv6 and vice versa.

Once your pairs are set up, you need to deploy the new router update scripts as follows:

There is no need to use different scripts for route collectors and servers. Traditionally, at INEX, these scripts were developed slightly differently from each other (e.g., the collector script updates both IPv4 and IPv6 versions and provides more informative output, whereas the route server script takes a specific route server handle to update). We may merge these in the future.

You can use these scripts exactly as they are on an Ubuntu server changing only the configuration lines at the top:

APIKEY="your-api-key"
URLROOT="https://ixp.example.com"
BIRDBIN="/usr/sbin/bird"

The collector script takes an additional configuration option for the handles of the servers to update - e.g.:

HANDLES="rc1-ipv4 rc1-ipv6"

These new scripts now work as follows:

  1. NEW: Obtain a local script lock preventing more than one update script from executing at a time.
  2. NEW: Obtain a configuration lock from IXP Manager.
    • This involves making an API call to /api/v4/router/get-update-lock/$handle, which IXP Manager then processes and returns HTTP code 200 if the lock is acquired and the update can proceed.
    • A lock is not granted if the router is paused for updates within IXP Manager (new per-router option in the router's dropdown menu on the router list page).
    • A lock is not granted if another process has already acquired a configuration lock for this router.
    • A lock is also not granted if the router's partner is locked. This major new resiliency addition prevents two paired route servers from being updated in parallel.
    • The update script will be aborted if IXP Manager is unavailable or in maintenance mode.
  3. If a lock is acquired, the script will then download the latest configuration from IXP Manager.
  4. The script will do some basic sanity checks on the downloaded configuration:
    • First, check that the HTTP request to pull the new configuration succeeded.
    • Second, check that the downloaded file exists and is non-zero in size.
    • Third, ensure at least two BGP protocol definitions are in the configuration file.
    • Lastly, the script has Bird parse the downloaded file to ensure validity.
  5. NEW: The update script will now compare the newly downloaded configuration to the running configuration.
    • If there are differences, the old configuration is backed up, and the Bird daemon will be reloaded.
    • If no differences exist, the Bird daemon will not be reloaded.
  6. A check is performed to ensure the Bird daemon is actually running and, if not, it is started.
  7. IMPROVED: A final API call is made to IXP Manager via /api/v4/router/updated/$handle to release the lock and update the timestamp.
    • A significant improvement here is the use of an until api-succeeds, sleep 60, retry construct to ensure the lock is released even when there are transient network issues / IXP Manager maintenance modes / server maintenance, etc.
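The locking steps (1, 2 and 7) can be sketched roughly as follows. This is an illustration and not the shipped script: the two API paths come from these notes, but the POST method, the `X-IXP-Manager-API-Key` header, the `mkdir`-based local lock and the `RETRY_SLEEP` variable are assumptions made for the example.

```shell
#!/usr/bin/env bash
APIKEY="your-api-key"
URLROOT="https://ixp.example.com"
RETRY_SLEEP=60          # assumed name; seconds between release attempts

# Step 1 (assumed technique): mkdir is atomic, so only one caller can
# create the lock directory; a second caller fails immediately.
acquire_local_lock() { mkdir "$1" 2>/dev/null; }
release_local_lock() { rmdir "$1"; }

# Step 2: ask IXP Manager for a configuration lock. With --fail, curl
# exits non-zero on an HTTP error response, so no lock means failure.
get_update_lock() {
    curl --fail -s -X POST -H "X-IXP-Manager-API-Key: ${APIKEY}" \
        "${URLROOT}/api/v4/router/get-update-lock/$1" >/dev/null
}

# Step 7: release the lock and update the timestamp, retrying until the
# API call succeeds so a brief outage cannot leave the lock held.
release_update_lock() {
    until curl --fail -s -X POST -H "X-IXP-Manager-API-Key: ${APIKEY}" \
        "${URLROOT}/api/v4/router/updated/$1" >/dev/null; do
        sleep "$RETRY_SLEEP"
    done
}
```

A caller would take the local lock first, give up straight away if `get_update_lock` fails, and call `release_update_lock` only after the download, checks and any reload have finished.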

Adding step (5) above (only reload on changes) now allows the update script to be safely run as frequently as every few minutes, which is necessary for the UI-based community filtering to be effective.
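As a rough illustration of the sanity checks and the reload-only-on-change logic, consider the sketch below. The function names, the `protocol bgp` match pattern and the use of `birdc configure` for the reload are assumptions for the example; the shipped scripts may differ, and your install may need a control socket passed to `birdc` with `-s`.

```shell
#!/usr/bin/env bash
BIRDBIN="${BIRDBIN:-/usr/sbin/bird}"

# Hypothetical helper: the file must be non-empty, contain at least two
# BGP protocol definitions, and be accepted by Bird's parse-only mode.
validate_config() {
    local conf="$1"
    [ -s "$conf" ] || return 1
    [ "$(grep -c 'protocol bgp' "$conf")" -ge 2 ] || return 1
    "$BIRDBIN" -p -c "$conf" >/dev/null 2>&1
}

# Hypothetical placeholder for the actual reload command.
reload_bird() { birdc configure >/dev/null; }

# Install the new configuration and reload only when it differs from the
# running one; the previous configuration is kept as a .old backup.
reload_if_changed() {
    local new="$1" running="$2"
    if [ -f "$running" ] && cmp -s "$new" "$running"; then
        echo "unchanged"
        return 0
    fi
    if [ -f "$running" ]; then
        cp -p "$running" "${running}.old"
    fi
    mv "$new" "$running"
    reload_bird
    echo "reloaded"
}
```

Because an unchanged configuration results in no reload at all, running this every few minutes costs only a download and a file comparison.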

You should still offset the updates between router pairs, as the script will give up if a lock cannot be obtained. Future improvements could allow for some retries.

For additional information with UI images, see slides 25-30 in this presentation PDF.

Route Server Community Filtering via the UI

Community-based filtering is the standard way to allow route server participants at an IXP to control their routing policy. IXP Manager has supported - and set - the standard across the industry since route servers were introduced at INEX in 2007.

Such filtering is essential to maximise participation with route servers as the member is essentially outsourcing their routing policy to the IXP, and many would be uncomfortable or unable to do this without these basic controls.

Community-based filtering in practice can be difficult for participants at both ends of the network-size scale:

  • Small networks rarely touch their border routers and may be both unfamiliar and uncomfortable with the necessary concepts and configuration to use them. This is especially true in a stressful situation when they urgently need to apply communities for the first time.
  • Large networks may need cumbersome change control procedures or, in some cases, their automated provisioning pipeline may not even support them.

We must also remember that community filtering is only half the story - the participant will still need to apply route filters to the routes they learn from the route servers (community filtering applies to how their prefixes are propagated by the route servers).

This release of IXP Manager introduces a new feature which allows IXP members to configure route server filtering in a web-based UI. This will move the configuration complexity from the member and their router to the IXP's route servers. The actual mechanism of filtering is unchanged - just where it happens moves:

  • The route server will apply community tags to the member's routes immediately at ingress rather than the member doing it on egress.
  • In the other direction, the route server will filter routes to be advertised to the member on egress rather than the member doing it on ingress.

We expect this to work for >=90% of use cases. A member with a more complex routing policy should handle it on their own routers anyway.

The implementation in IXP Manager uses two database tables - a staging table and a production table. When a member first creates or subsequently edits their filters, this will happen in the staging table. Once they are satisfied their routing intentions are complete, they can commit the changes to the production table. As each router processes its next configuration update, the configuration comparison described in the previous section will detect the differences, and the router configuration will be updated. It is important, therefore, to have the route servers update at least once every 10 minutes.

To give IXP administrators time to update and increase the frequency of their route server update scripts, this UI feature is disabled by default. To enable it, add the following to your .env file:

IXP_FE_FRONTEND_DISABLED_RS_FILTERS=false 

Sample cadence for two route servers:

rs1-ipv4: 0,10,20,30,40,50 *     * * *
rs1-ipv6: 2,12,22,32,42,52 *     * * *
rs2-ipv4: 5,15,25,35,45,55 *     * * *
rs2-ipv6: 7,17,27,37,47,57 *     * * *
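The minute lists above follow a simple pattern: a per-router offset repeated every 10 minutes. As a small sketch, the lines could be generated like this; the script path and the `root` user field (as used in `/etc/cron.d` files) are hypothetical, and in practice each server would carry only its own entries.

```shell
#!/usr/bin/env bash
# Build a comma-separated minute list: offset, offset+interval, ... (< 60).
cron_minutes() {
    local offset="$1" interval="$2" m out=""
    m="$offset"
    while [ "$m" -lt 60 ]; do
        out="${out:+$out,}$m"
        m=$((m + interval))
    done
    echo "$out"
}

# Hypothetical path to the router update script.
SCRIPT="/usr/local/sbin/update-route-server.sh"

# Print one cron line per router handle using the offsets from the sample.
print_schedule() {
    local entry handle offset
    for entry in rs1-ipv4:0 rs1-ipv6:2 rs2-ipv4:5 rs2-ipv6:7; do
        handle="${entry%%:*}"
        offset="${entry##*:}"
        echo "$(cron_minutes "$offset" 10) * * * * root $SCRIPT $handle"
    done
}

print_schedule
```

Keeping the members of a pair a few minutes apart matters because an update gives up rather than waits when its partner holds the lock.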

You will also want to give your members an indication of when they can expect changes to go live. For that, you can skin the introductory text as follows:

IXPROOT=/srv/ixpmanager   # adjust to suit your own install
cd $IXPROOT
mkdir -p resources/skins/{your-skin-name}/rs-filter
cp resources/views/rs-filter/introduction.foil.php resources/skins/{your-skin-name}/rs-filter
vi resources/skins/{your-skin-name}/rs-filter/introduction.foil.php
# add a bullet such as: <li>All changes made will be synced to production within 10 minutes.</li>

For additional information with UI images, see slides 8-18 in this presentation PDF.

Frontend Unit Testing

This release includes extensive improvements and significant additional new coverage for the Laravel Dusk UI tests from the project's newest developer, @griphons.

These tests emulate a browser user and perform all the standard create, update, view and delete tasks on the UI and confirm those changes propagate as expected to the database.

Smaller Improvements, Fixes and Security

Platform Updates and Security Fixes

  • Laravel Framework upgraded from v8 to v9. v9 is the final release with PHP 8.0 support.
  • Tailwind CSS upgraded from v1 to v3.
  • parsedown/laravel now abandoned - imported the two files directly (MIT license) via 5648032
  • Third party library, dompdf, updated due to security issues
  • Third party library, zendesk, updated due to security issues

Additional New Features and Improvements

  • Added vlan.export_to_ixf option to allow operator to select which vlans are exported in the IX-F export schema
  • Add notes and p/o number to customer detail
  • No longer logging automated update events for route servers which were spamming the database logs on every router update.
  • Reorder the sflow processing perl script for performance and clarity - 1e8908e
  • Mail testing tool improved via cb22b5e
  • #877 IRRDB updates -> continue to next customer on error
  • #880 Whois PeeringDB closed Jan 2024 - switched to PeeringDB API now

Bug Fixes

  • Fix exception thrown (always) when editing the switchport of a core bundle - db24038
  • Fix a BGP template issue that prevented route servers with 32-bit ASNs from working - 20a8995
  • Fix sed command line issue in tools/runtime/dns-arpa/update-dns-from-ixp-manager.sh - 5aca8f9
  • Fix capitalisation via PR #879 for snmp polling - 0a1f8e8
  • #855 Aggregate MRTG graphs are including the reseller uplink interfaces as well as the peering interfaces, creating incorrect graphs/stats
  • #862 Notes preview in admin edit panels is not correctly rendered
  • #865 Migrations fail on initial setup with an empty database
  • #873 Can not update IRRDB if only IPv6 is configured