Design Consideration - The effects of VLAN IDs on Spanning-Tree convergence

When designing a network, consideration should be given to separating traffic into VLANs. This is done for logical separation, security and performance reasons.

This affects the convergence of a Layer 2 network. Most Cisco switches running STP (Spanning-Tree Protocol) run one instance per VLAN. Cisco calls this PVST+ (Per-VLAN Spanning-Tree Plus) or Rapid PVST+ (often written RPVST+), depending on whether you are running the original 802.1D STP (which is very legacy) or RSTP (Rapid Spanning-Tree Protocol). This means that for every individual VLAN, an independent instance of spanning-tree is running to calculate the topology.
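
As a rough sketch, the per-VLAN mode is selected globally and can be checked as shown below. This assumes a Catalyst-style IOS switch; the default mode and the modes on offer vary by platform and software version.

```
! Run one rapid spanning-tree instance per VLAN
Switch#conf t
Switch(config)#spanning-tree mode rapid-pvst
Switch(config)#end
! Confirm which mode the switch is actually running
Switch#show spanning-tree summary
```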

When there is a topology change (signalled between switches by a TCN, Topology Change Notification, in legacy STP, or by the TC flag in RSTP), each of these instances has to recalculate the topology and make the relevant port changes.

The control plane of a switch runs on a processor with a limited number of cores: in older switches that was one, in newer ones it can be up to 8. Even with 8 cores, in a large network with 1000 VLANs there will still be a wait while the processor deals with all of these spanning-tree instances changing at the same time.
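
To get a feel for how much work the control plane has to do on a given switch, something like the following can help. Both are standard IOS show commands, but the output differs between platforms, so treat this as a starting point rather than a recipe.

```
! How many VLANs (and therefore spanning-tree instances) are active
Switch#show spanning-tree summary totals
! Which processes are using the CPU, busiest first
Switch#show processes cpu sorted
```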

Example

Here is an example with only 4 VLANs (1, 2, 99, 100). The topology is just two switches, with two links between them (no EtherChannel).
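
A lab like this could be rebuilt along the following lines. This is a reconstruction, not the original configuration: the VLAN IDs and interface names come from this example, and the root priority of 24576 is taken from the output further down, but the trunk settings are assumptions.

```
! On both switches: create the VLANs (VLAN 1 exists by default)
vlan 2,99,100
!
! Assumed: both inter-switch links are 802.1Q trunks.
! Some platforms only support dot1q and reject the encapsulation command.
interface range GigabitEthernet0/0 - 1
 switchport trunk encapsulation dot1q
 switchport mode trunk
!
spanning-tree mode rapid-pvst
!
! On the root switch only: lower the bridge priority for every VLAN
spanning-tree vlan 1,2,99,100 priority 24576
```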

Switch#sh spann vl 2

VLAN0002
  Spanning tree enabled protocol rstp
  Root ID    Priority    24578
             Address     0c77.ded6.3200
             Cost        4
             Port        1 (GigabitEthernet0/0)
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32770  (priority 32768 sys-id-ext 2)
             Address     0c77.de5f.f700
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time  300 sec

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi0/0               Root FWD 4         128.1    P2p 
Gi0/1               Altn BLK 4         128.2    P2p 


Switch#debug spanning-tree events
Spanning Tree event debugging is on
Switch#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
Switch(config)#int gi0/0
Switch(config-if)#shut 
Switch(config-if)#
Jan  9 10:31:01.538: RSTP(1): updt roles, root port Gi0/0 going down
Jan  9 10:31:01.539: RSTP(1): Gi0/1 is now root port
Jan  9 10:31:01.540: RSTP(2): updt roles, root port Gi0/0 going down
Jan  9 10:31:01.540: RSTP(2): Gi0/1 is now root port
Jan  9 10:31:01.541: RSTP(99): updt roles, root port Gi0/0 going down
Jan  9 10:31:01.541: RSTP(99): Gi0/1 is now root port
Jan  9 10:31:01.542: RSTP(100): updt roles, root port Gi0/0 going down
Jan  9 10:31:01.543: RSTP(100): Gi0/1 is now root port
Jan  9 10:31:01.565: STP[1]: Generating TC trap for port GigabitEthernet0/1
Jan  9 10:31:01.566: STP[2]: Generating TC trap for port GigabitEthernet0/1
Jan  9 10:31:01.567: STP[99]: Generating TC trap for port GigabitEthernet0/1
Jan  9 10:31:01.569: STP[100]: Generating TC trap for port GigabitEthernet0/1
Jan  9 10:31:03.500: %LINK-5-CHANGED: Interface GigabitEthernet0/0, changed state to administratively down
Jan  9 10:31:04.500: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/0, changed state to down
Switch(config-if)#

Looking through the debug log, you can see that the switch processes the VLANs sequentially, in numerical order. The number in brackets is the instance ID, which is also the VLAN ID.

Now, this happened in a fraction of a second (about 5 ms between the first VLAN 1 message and the last VLAN 100 role update), so why does it matter? In an overloaded network, where the switches have high CPU load (and there has just been a topology change, remember, so lots of things will be reconverging), this delay can grow considerably. The network might have 1000 VLANs, each needing to converge at layer 2. After that there might be 1000 instances of HSRP, or multiple routing protocols within VRFs. All of these also need CPU time.

What does this mean?

This means we should use lower-numbered VLANs for our most sensitive and critical traffic. A common example of this is voice, but the most important traffic depends on your organisation's requirements. In low-latency networks, such as those in the financial industry, this is much more critical. These VLANs converge first, so in the event of a topology change they should see the least disruption.
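
As an illustration only, a numbering plan that follows this advice might look like the sketch below. The VLAN IDs and names are hypothetical and should be replaced with whatever suits your own traffic priorities.

```
! Hypothetical plan: the most critical traffic gets the lowest VLAN IDs
vlan 2
 name VOICE
vlan 3
 name TRADING
! Less sensitive traffic goes on higher IDs
vlan 99
 name GUEST
```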

We should also be looking at migrating away from per-VLAN RSTP (Rapid PVST+) and moving to MST (Multiple Spanning-Tree). This is a method of administratively grouping VLANs into spanning-tree instances. Most redundant layer 2 networks are designed around pairs of switches, which means you typically only have two valid topologies. Why run 1000 RSTP instances when in reality there are only 2 topologies? It's just 500 of the same thing, twice - using CPU and memory and slowing down convergence events.

In the example below, two instances are configured (MST0 and MST1). Of the four VLANs in use, 1 and 2 map to MST0 and 99 and 100 map to MST1, so half the VLANs sit on each instance. You can see that instead of the 4 updates we received in the above example, this time there are only 2:

Switch#sh spann mst     

##### MST0    vlans mapped:   1-50,101-4094
Bridge        address 0c77.de5f.f700  priority      32768 (32768 sysid 0)
Root          address 0c77.ded6.3200  priority      24576 (24576 sysid 0)
              port    Gi0/0           path cost     0        
Regional Root address 0c77.ded6.3200  priority      24576 (24576 sysid 0)
                                      internal cost 20000     rem hops 19
Operational   hello time 2 , forward delay 15, max age 20, txholdcount 6 
Configured    hello time 2 , forward delay 15, max age 20, max hops    20

Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Gi0/0            Root FWD 20000     128.1    P2p 
Gi0/1            Altn BLK 20000     128.2    P2p 

##### MST1    vlans mapped:   51-100
Bridge        address 0c77.de5f.f700  priority      32769 (32768 sysid 1)
Root          address 0c77.ded6.3200  priority      24577 (24576 sysid 1)
              port    Gi0/0           cost          20000     rem hops 19

Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Gi0/0            Root FWD 20000     128.1    P2p 
Gi0/1            Altn BLK 20000     128.2    P2p 


Switch#debug spanning-tree mstp state 
Switch#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
Switch(config)#int gi0/0
Switch(config-if)#shut
Switch(config-if)#
Jan  9 11:04:14.214: MST[1]: Gi0/0 state change forwarding -> disabled
Jan  9 11:04:14.215: MST[1]: Gi0/1 state change blocking -> forwarding
Jan  9 11:04:14.216: MST[0]: Gi0/0 state change forwarding -> disabled
Jan  9 11:04:14.216: MST[0]: Gi0/0 state change disabled -> blocking
Jan  9 11:04:14.217: MST[0]: Gi0/1 state change blocking -> forwarding
Jan  9 11:04:14.238: STP[0]: Generating TC trap for port GigabitEthernet0/1
Jan  9 11:04:14.239: STP[1]: Generating TC trap for port GigabitEthernet0/1
Jan  9 11:04:16.175: %LINK-5-CHANGED: Interface GigabitEthernet0/0, changed state to administratively down
Jan  9 11:04:17.180: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/0, changed state to down
Switch(config-if)#

The big difference here is that as more VLANs are added, the output in the first example gets longer: each VLAN generates its own debug entries as it converges individually. With MST there will only ever be two sets of events, however many VLANs are mapped, as there are only two spanning-tree instances.
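
For reference, a configuration along these lines would produce the VLAN-to-instance mapping shown in the MST output above. The region name and revision number are placeholders, since the original configuration is not shown, and the priority line is inferred from the root priority of 24576 in that output.

```
spanning-tree mode mst
!
spanning-tree mst configuration
! Placeholder region name and revision number
 name REGION1
 revision 1
! VLANs 51-100 go to instance 1; everything else stays in instance 0 (the IST)
 instance 1 vlan 51-100
!
! On the root switch only (assumed): lower the priority for both instances
spanning-tree mst 0-1 priority 24576
```

The region name, revision number and VLAN mapping must be identical on every switch in the region, otherwise the switches treat each other as belonging to different regions. The mapping can be checked with show spanning-tree mst configuration.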
