Cisco Nexus 1000v Module in “Other” state

What you don’t need while you are checking your morning emails and drinking your first cup of coffee of the day is to receive an email saying that the VSM for the Nexus 1k has rebooted.

By the time we logged on to the Nexus 1000v, it was back up. “show system redundancy status” showed both VSMs (supervisors) as being up and in HA. The “show version” and “show logs” output definitely revealed a reload though – the uptime was less than an hour, and the logs only went back as far as the reload.

Well that’s good then. It’s back up, it’s in HA. No service disruptions – the VEMs kept forwarding traffic, exactly as they should. Pressure off, let’s just figure out why it happened and make sure it doesn’t happen again. Cue one of the VMware guys shouting over – “Is there any reason why we can’t vMotion any VMs? It says the distributed switch is unavailable!”

CCIE Written Blueprint: 1.3.c Interpret packet capture

This is a very short section! I didn’t see the point in harping on about Wireshark, as I use it most days at work. And the IOS embedded packet capture was discussed at length further up the blueprint (i.e. in a previous blog post).

1.3.c Interpret packet capture

1.3.c (i) Using Wireshark trace analyzer

A packet capture can be obtained using a hub or, more commonly, a SPAN / RSPAN port. Functionality includes filtering, tracing sessions, reassembling conversations, etc. Knowing the protocols, and therefore what to expect to see, is key. Actually using Wireshark is a whole other video series!

1.3.c (ii) Using IOS embedded packet capture

As described above in 1.3.a (iii) Embedded packet capture. In my experience it is almost always better to save the capture as a PCAP, export it and open it in Wireshark. If needed, the show monitor capture commands can display the captured information on the device itself.

CCIE Written Blueprint: 1.3.b Apply troubleshooting methodologies

This is another difficult section in the blueprint to write about. I find troubleshooting techniques and methodologies to be quite personal; no two people’s brains work the same way. I guess this is based on how I do things and some tips I’ve received from a few people over the years.

1.3.b (i) Diagnose the root cause of networking issue (analyze symptoms, identify and describe root cause)

Read the information in the support ticket very carefully, and take into consideration all of the symptoms. Take particular note of anything that may have changed around the time the symptoms started. This should give you a rough area to begin with – L2, L3, a specific routing protocol, etc. Verify that the fault is as described. Then either work hop by hop, or use the split-half approach (test the midpoint of the path and discard the half that works), to try and isolate the problem.
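
As a rough illustration of the split-half approach, here is a minimal Python sketch. It assumes the hops along the path are known in order, and that every hop before the fault passes the test while every hop from the fault onwards fails it. The `reachable` callback and the hop names are hypothetical stand-ins for whatever test you actually run (a ping, a show command, etc.).

```python
from typing import Callable, Optional, Sequence


def first_failing_hop(hops: Sequence[str],
                      reachable: Callable[[str], bool]) -> Optional[str]:
    """Binary-search an ordered path for the first hop that fails the test.

    Assumes hops before the fault pass and hops at or after it fail.
    Returns None when every hop passes.
    """
    low, high = 0, len(hops)  # index of the first failing hop lies in [low, high]
    while low < high:
        mid = (low + high) // 2
        if reachable(hops[mid]):
            low = mid + 1   # fault is further along the path
        else:
            high = mid      # fault is at this hop or earlier
    return hops[low] if low < len(hops) else None


# Hypothetical path where everything from "dist-2" onwards is unreachable.
path = ["access-1", "dist-1", "core-1", "dist-2", "access-2", "server"]
broken = {"dist-2", "access-2", "server"}
print(first_failing_hop(path, lambda hop: hop not in broken))
```

With six hops this needs at most three checks, versus up to six when walking the path hop by hop.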

1.3.b (ii) Design and implement valid solutions according to constraints

Within the guidelines of what is permitted within the scope of the network design (or exam question!), draft a solution. Write it down if needed for clarity. Review this solution in your head before implementing. Think outside the box – is the solution you are proposing going to have any knock-on effects on other services? Implement. If it doesn’t work, don’t jump in head first and start changing things. Step back, reassess, and start the process again. Otherwise you end up changing so many things that you no longer know what you did.

1.3.b (iii) Verify and monitor resolution

Use the appropriate show commands to verify that everything has worked as expected. Test end to end connectivity.

CCIE Written Blueprint: 1.3.a Use IOS troubleshooting tools

1.3.a Use IOS troubleshooting tools

1.3.a (i) debug, conditional debug

Debugs can be used on a wide range of functions (debug ?). Some debugs can be very noisy. Debug conditions can be set to filter out some of the noise – for example, debug condition interface fa0/0 will limit the debug output to events involving that interface. Undebug all does not remove conditions; they must be specifically removed with the undebug condition command. Debugs can be quite processor intensive, so it is wise to check whether the device can handle the load, and to turn them off when they are no longer required.

1.3.a (ii) ping, traceroute with extended options

The extended ping / traceroute allow the use of specific IP headers to test different network scenarios. Options are:

CCIE Written Blueprint: 1.2.a Evaluate proposed changes to a network

As the blueprint goes, this is, in my opinion, the most vague topic to write about. It is dependent on the understanding of the topics, and how the changes will impact the existing network. I have skimmed through this really, with the intention of covering the topics in their actual topic sections. I am pretty used to evaluating impact – I seem to spend my entire life writing change orders and determining “disruptiveness”.

1.2.a Evaluate proposed changes to a network

This is a difficult section to write a paragraph about, as it is based on the understanding of the core topics, analysing the proposed changes and deciding how they will impact / affect the existing network infrastructure. These will be covered in more detail in their specific sections.

1.2.a (i) Changes to routing protocol parameters

Could include things like metrics, additional routes, redistribution. How these changes will impact existing services, etc.

1.2.a (ii) Migrate parts of a network to IPv6

Involves looking at IPv6 transition mechanisms: 6to4 tunnels, Teredo, ISATAP, dual stack, etc. Impact on existing services, interoperability, etc.

CCIE Written Blueprint: 1.1.f Explain UDP operations

This topic made me think about the starvation stuff. I suppose it is pretty obvious that UDP wouldn’t back off if WRED was employed, but it’s something I never really thought about.

I found a few good videos on YouTube which gave some good RTP/RTCP overviews.

1.1.f (i) Starvation

TCP Starvation / UDP Dominance is experienced in times of congestion where UDP and TCP streams are assigned to the same class. UDP has no built-in congestion control to make it back off when packets are dropped, but TCP does, so TCP keeps backing off and yields even more bandwidth to the UDP streams, to the point where UDP takes over completely. WRED does not help here, as the drops it causes do not slow the UDP streams down.
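
The effect can be seen in a toy Python model. This is only a sketch under loud assumptions: one TCP-like flow that adds one unit of rate per round and halves its rate on congestion, one UDP-like flow that sends at a constant rate, and drops shared in proportion to offered load. The function name, the rates and the capacity are all made up for illustration; it is not a model of any real queueing implementation.

```python
def simulate_shared_class(capacity: float, udp_rate: float,
                          rounds: int = 1000) -> tuple[float, float]:
    """Return the average delivered rate per round as (tcp, udp)."""
    tcp_rate = 1.0
    tcp_delivered = 0.0
    udp_delivered = 0.0
    for _ in range(rounds):
        offered = tcp_rate + udp_rate
        if offered > capacity:
            # Congestion: both flows lose traffic in proportion to what they sent...
            scale = capacity / offered
            tcp_delivered += tcp_rate * scale
            udp_delivered += udp_rate * scale
            # ...but only the TCP-like flow reacts by backing off.
            tcp_rate = max(1.0, tcp_rate / 2)
        else:
            tcp_delivered += tcp_rate
            udp_delivered += udp_rate
            tcp_rate += 1.0  # additive increase while there is headroom
    return tcp_delivered / rounds, udp_delivered / rounds


for udp in (2.0, 6.0, 9.0):
    tcp_avg, udp_avg = simulate_shared_class(capacity=10.0, udp_rate=udp)
    print(f"UDP sending {udp}: TCP gets {tcp_avg:.2f}, UDP gets {udp_avg:.2f}")
```

As the constant-rate flow sends more, the TCP-like flow is squeezed into whatever is left, while the UDP-like flow keeps nearly everything it sends.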

The best way to resolve this is to …

CCIE Written Blueprint: 1.1.e Explain TCP operations

TCP – I thought I’d glance over this section. Turns out there was some stuff I’d never heard of, such as the bandwidth delay product.

1.1.e (i) IPv4 and IPv6 PMTU

Path MTU Discovery works by sending packets with the DF bit set. If a packet is larger than the MTU of a link along the path, the router at that link drops it and returns an ICMP Destination Unreachable (Fragmentation Needed and DF Set) message; for IPv6 the equivalent is an ICMPv6 Packet Too Big message. The sender then reduces its packet size and tries again, and the largest size that gets through is the MTU for the path. Note that this relies on ICMP traffic being permitted through the network.
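
The sender-side logic can be sketched as a toy Python simulation. It follows the RFC 1191 behaviour, where the sender starts at its first-hop MTU and the ICMP message reports the MTU of the link that could not carry the packet. The function name and the link MTU values are hypothetical, chosen purely for illustration.

```python
def discover_path_mtu(link_mtus: list[int]) -> tuple[int, int]:
    """Simulate PMTUD over a path, given each link's MTU in order.

    Returns (path_mtu, packets_sent).
    """
    size = link_mtus[0]  # start with the MTU of the first hop
    sent = 0
    while True:
        sent += 1
        # The first link too small for the packet drops it and reports its MTU.
        reported = next((mtu for mtu in link_mtus if mtu < size), None)
        if reported is None:
            return size, sent  # packet reached the destination unfragmented
        size = reported        # retry at the MTU the router reported


print(discover_path_mtu([1500, 1400, 1500, 1300]))
```

If the ICMP messages are filtered somewhere along the path, the sender never learns the smaller MTU, which is why blocking ICMP breaks this process.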

1.1.e (ii) MSS

The Maximum Segment Size is the maximum amount of data, in bytes, that can be received in a single TCP segment, excluding the TCP and IP headers. This is separate to MTU – a large TCP segment can be fragmented across multiple packets; the MSS refers to the reassembled size.
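
Although the two are separate, hosts usually derive the MSS they advertise from the interface MTU. A minimal Python sketch of that arithmetic, assuming minimum-size headers with no IP or TCP options (the function and constant names are just for illustration):

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options
IPV6_HEADER_BYTES = 40  # fixed IPv6 header, no extension headers
TCP_HEADER_BYTES = 20   # minimum TCP header, no options


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in one unfragmented packet of size mtu."""
    ip_header = IPV6_HEADER_BYTES if ipv6 else IPV4_HEADER_BYTES
    return mtu - ip_header - TCP_HEADER_BYTES


print(mss_for_mtu(1500))             # IPv4 over standard Ethernet
print(mss_for_mtu(1500, ipv6=True))  # IPv6 over standard Ethernet
```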

1.1.e (iii) Latency

CCIE Written Blueprint: 1.1.d Explain IP operations

1.1.d (i) ICMP unreachable, redirect

ICMP Unreachable

Generated by a host or gateway to indicate that the packet was discarded because the destination is unreachable. It is not generated for multicast traffic. It is sub-divided into 16 codes (0 to 15) as follows:

Code Value | Message Subtype | Description
0 | Network Unreachable | The datagram could not be delivered to the network specified in the network ID portion of the IP address. Usually means a problem with routing but could also be caused by a bad address.
1 | Host Unreachable | The datagram was delivered to the network specified in the network ID portion of the IP address but could not be sent to the specific host indicated in the address. Again, this usually implies a routing issue.
2 | Protocol Unreachable | The protocol specified in the Protocol field was invalid for the host to which the datagram was delivered.
3 | Port Unreachable | The destination port specified in the UDP or TCP header was invalid.
4 | Fragmentation Needed and DF Set | The MTU is smaller than the packet size, and the router is not allowed to fragment the packet. This message is what Path MTU Discovery relies on: the sender sets the DF bit and uses these replies to find the largest packet size the path can handle.
5 | Source Route Failed | Generated if a source route was specified for the datagram in an option but a router could not forward the datagram to the next step in the route.
6 | Destination Network Unknown | Not used; Code 0 is used instead.
7 | Destination Host Unknown | The host specified is not known. This is usually generated by a router local to the destination host and usually means a bad address.
8 | Source Host Isolated | Obsolete, no longer used.
9 | Communication with Destination Network is Administratively Prohibited | The source device is not allowed to send to the network where the destination device is located.
10 | Communication with Destination Host is Administratively Prohibited | The source device is allowed to send to the network where the destination device is located, but not to that particular device.
11 | Destination Network Unreachable for Type of Service | The network specified in the IP address cannot be reached due to inability to provide the service specified in the Type Of Service field of the datagram header.
12 | Destination Host Unreachable for Type of Service | The destination host specified in the IP address cannot be reached due to inability to provide the service specified in the datagram’s Type Of Service field.
13 | Communication Administratively Prohibited | The datagram could not be forwarded due to filtering that blocks the message based on its contents.
14 | Host Precedence Violation | Sent by a first-hop router (the first router to handle a sent datagram) when the Precedence value in the Type Of Service field is not permitted.
15 | Precedence Cutoff In Effect | Sent by a router when receiving a datagram whose Precedence value (priority) is lower than the minimum allowed for the network at that time.

CCIE Written Blueprint: 1.1.c Explain general network challenges

 

1.1.c (i) Unicast flooding

One of the main causes is asymmetric routing. This is covered in 1.1.c(iii). Useful document here: http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6000-series-switches/23563-143.html

The primary impact of this is that all hosts connected in that VLAN receive the traffic. Suppose two 10 Gbps servers are communicating, and asymmetric routing is taking place; if there is a 100 Mbps host on the same switch, it is going to receive ALL traffic from the server, effectively saturating its link.

STP TCNs (topology change notifications) cause forwarding table entries to age out more quickly than their normal timers. If there is a flapping link causing STP reconvergence, this can cause excessive unicast flooding. Configuring PortFast on all edge interfaces limits TCNs.