First commit of vpc_fast_convergence script #74

Open
wants to merge 3 commits into
base: master
49 changes: 49 additions & 0 deletions nx-os/python/vpc_fast_convergence/README.md
@@ -0,0 +1,49 @@
# vPC Fast Convergence

## Description
The `vpc_fast_convergence.py` script delays bringing up vPC (Virtual Port Channel) member ports on Cisco Nexus 9000 switches running NX-OS. It is useful when the switch needs time to finish initializing before the vPC carries traffic.

On switches running NX-OS 9.x and earlier, when a vPC peer switch comes back up after a reboot, the vPC peer-link switches traffic over immediately. Because some interfaces have not been fully initialized at that point, traffic is lost.

To solve this problem, the script first applies `shutdown` to all vPC member ports, then applies `no shutdown` to them one by one after initialization is complete.
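
The sequence described here can be sketched off-box as a plain function that only builds the ordered steps and never touches a switch. This is an illustration, not the script's code: `build_bringup_plan` and the step tuples are hypothetical, and the 300-second and 5-second values mirror the script's `SLEEPTIMER` and `DELAYUPTIMER` defaults.

```python
def build_bringup_plan(ports, hold_seconds=300, per_port_seconds=5):
    """Return ordered (action, argument) steps for a staged vPC bring-up.

    Hypothetical helper for illustration; it builds a plan and sends nothing.
    """
    plan = []
    # Phase 1: shut every vPC member port.
    for port in ports:
        plan.append(("cli", "conf t ; interface %s ; shutdown" % port))
    # Phase 2: hold all ports down while the switch finishes initializing.
    plan.append(("sleep", hold_seconds))
    # Phase 3: bring ports back one at a time, pausing after each.
    for port in ports:
        plan.append(("cli", "conf t ; interface %s ; no shutdown" % port))
        plan.append(("sleep", per_port_seconds))
    return plan


if __name__ == "__main__":
    for step in build_bringup_plan(["Po10", "Po20"]):
        print(step)
```

Separating the plan from its execution makes the ordering easy to review in a lab before any command reaches a production switch.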

Several known issues and enhancements relate to vPC convergence, including:

- [CSCvw14768 10 seconds Packet loss observed when VPC peer is joining back VPC after reload](https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvw14768)
- [CSCvu13461 Packet loss for seconds when VPC leg bring up](https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvu13461)
- [CSCwa20455 N9K : spanning-tree vPC convergence code commit to kr3f_dev](https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwa20455)

Upgrading to the latest NX-OS release to pick up the latest fixes and enhancements is always the best practice. If you cannot upgrade, you can use this script to avoid the packet loss, but test it in a lab environment before deploying it in production.

## Prerequisites
- Python 3.x
- Cisco Nexus 9000 switch running NX-OS
- TOR switch only

## Usage
1. Clone the repository or download the `vpc_fast_convergence.py` script.
2. Copy the script to `bootflash:` on the switch using SCP or any other method. The EEM applet runs it from `bootflash:vpc_fast_convergence.py`.
3. Run the following command to install the script on the switch. This creates an EEM applet that runs the script when Module 1 comes online:
```
N9K# python3 vpc_fast_convergence.py install
```
4. To uninstall the script, run the following command:
```
N9K# python3 vpc_fast_convergence.py uninstall
```
5. To verify that the EEM applet is installed, run the following command:
```
N9K# sh run eem

!Command: show running-config eem
!Running configuration last done at: Sat Feb 24 18:50:03 2024
!Time: Sat Feb 24 18:50:10 2024

version 9.3(12) Bios:version 05.47
event manager applet vpc_fast_convergence
event syslog pattern "Module 1 is online"
action 1 cli python bootflash:vpc_fast_convergence.py enable

```
6. Do NOT run the script with the `enable` option manually unless you are sure of the consequences. `enable` is intended for the EEM applet only: it shuts down all vPC member ports and delays vPC convergence, which may cause a network outage.
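
For a sense of how long that delay lasts, the window can be estimated from the two timers defined in the script (`SLEEPTIMER = 300`, `DELAYUPTIMER = 5`). A rough sketch, assuming the time taken by the CLI commands themselves is negligible; `estimate_delay_seconds` is a hypothetical helper, not part of the script:

```python
SLEEPTIMER = 300    # seconds all vPC member ports stay shut (script default)
DELAYUPTIMER = 5    # seconds to wait after each `no shutdown` (script default)


def estimate_delay_seconds(port_count, hold=SLEEPTIMER, per_port=DELAYUPTIMER):
    """Return (first_port_up, last_port_up, script_done) offsets in seconds.

    Offsets are measured from the moment the ports are shut down and
    ignore the time the CLI commands themselves take (an assumption).
    """
    if port_count < 1:
        raise ValueError("port_count must be at least 1")
    first_up = hold
    last_up = hold + (port_count - 1) * per_port
    done = hold + port_count * per_port
    return first_up, last_up, done


if __name__ == "__main__":
    print(estimate_delay_seconds(20))
```

With 20 vPCs and the default timers, the last member port comes back roughly 395 seconds after the shutdown.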
84 changes: 84 additions & 0 deletions nx-os/python/vpc_fast_convergence/vpc_fast_convergence.py
@@ -0,0 +1,84 @@
from cli import cli, clid
from time import sleep
import json
import argparse
import syslog
import logging
import sys

logging.basicConfig(level=logging.INFO, format="vPC Fast Convergence: %(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

SLEEPTIMER = 300
# Sleep timer to wait vPC convergence, default is 300 seconds
DELAYUPTIMER = 5
# Delay timer to no shutdown delay for each vPC member port, default is 5 seconds


def vpc_fast_convergence(delay_timer, delay_port_timer):
    # Collect the vPC member ports from the structured `show vpc` output.
    try:
        result = json.loads(clid('show vpc'))
        vpc_ports = result['TABLE_vpc']['ROW_vpc']
    except Exception as e:
        syslog.syslog(3, 'vpc_fast_convergence: Unable to get vPC member ports, Error: %s' % e)
        sys.exit(0)

    # Shut all member ports, then wait for the switch to finish initializing.
    syslog.syslog(3, 'vpc_fast_convergence: Will shutdown all vPC member ports to wait vPC convergence!')
    for i in vpc_ports:
        cli('conf t ; interface %s ; shutdown' % i['vpc-ifindex'])
    sleep(delay_timer)

    # Bring the member ports back one at a time, pausing after each.
    syslog.syslog(3, 'vpc_fast_convergence: Start up all vPC member ports!')
    for i in vpc_ports:
        port = i['vpc-ifindex']
        try:
            cli('conf t ; interface %s ; no shutdown' % port)
        except KeyboardInterrupt:
            logger.info("User interrupt, exiting!")
            sys.exit(0)
        except Exception as e:
            syslog.syslog(3, 'vpc_fast_convergence: Unable to no shutdown vPC member ports %s, Error: %s' % (port, e))
        sleep(delay_port_timer)


def install_script():
    try:
        cli('conf t ; event manager applet vpc_fast_convergence ; '
            'event syslog pattern "Module 1 is online" ; '
            'action 1 cli python bootflash:vpc_fast_convergence.py enable')
        logger.info("Script installed successfully. "
                    "Automatic backup of running configuration to startup configuration is in progress.")
        cli('copy running-config startup-config')
    except Exception as e:
        logger.error('Install script failed, please try later! Error: %s' % e)


def uninstall_script():
    try:
        cli('conf t ; no event manager applet vpc_fast_convergence')
        logger.info("Script uninstalled successfully. "
                    "Automatic backup of running configuration to startup configuration is in progress.")
        cli('copy running-config startup-config')
    except Exception as e:
        logger.error('Uninstall script failed, please try later! Error: %s' % e)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("option", help="install/uninstall",
                        type=str)
    args = parser.parse_args()
    if args.option == "enable":
        # enable is a hidden option for EEM applet only
        vpc_fast_convergence(SLEEPTIMER, DELAYUPTIMER)
    elif args.option == "install":
        install_script()
    elif args.option == "uninstall":
        uninstall_script()
    else:
        logger.error("Invalid option, please use install or uninstall!")


if __name__ == '__main__':
    main()
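
One parsing case worth checking: NX-OS structured output commonly returns a table with a single row as a JSON object rather than a list, in which case iterating `ROW_vpc` would yield dictionary keys instead of rows. The sketch below normalizes both shapes. `as_rows` and `vpc_member_ports` are hypothetical helpers, and the sample payloads are made up for illustration rather than captured from a switch.

```python
import json


def as_rows(row):
    """Return `row` as a list, wrapping a single dict row."""
    if row is None:
        return []
    return row if isinstance(row, list) else [row]


def vpc_member_ports(show_vpc_json):
    """Extract vPC member port names from `show vpc` JSON text."""
    data = json.loads(show_vpc_json)
    rows = as_rows(data.get("TABLE_vpc", {}).get("ROW_vpc"))
    return [r["vpc-ifindex"] for r in rows]


if __name__ == "__main__":
    # Illustrative payloads only; real `show vpc` output has more fields.
    single = '{"TABLE_vpc": {"ROW_vpc": {"vpc-id": "10", "vpc-ifindex": "Po10"}}}'
    many = '{"TABLE_vpc": {"ROW_vpc": [{"vpc-ifindex": "Po10"}, {"vpc-ifindex": "Po20"}]}}'
    print(vpc_member_ports(single))
    print(vpc_member_ports(many))
```

On the switch, the JSON text would come from `clid('show vpc')` as in the script above.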