Nexus 7000 Software Bug - Flash RAID Errors - Part 2

Wed 12 August 2015

This is a continuation of a previous post.

The last post finished where we thought all was good, because the flash status code was reading 0xF0, which we were told means both flash drives are healthy. What we noticed, though, was that the diag test for compact flash (test 7) was still failing on some of the sups. Initially Cisco told us that this was cosmetic and meant nothing.

However, we sent them some logs for review and confirmation, and they came back and told us we still had dodgy flash on 2 sups. In the output of slot [5/6] show system internal raid, where the partitions are listed, you are looking for [UU] against each partition. [UU] means both disks in the mirror are up. An underscore, like in the following output, means you have a bad disk:

Personalities : [raid1] 
md6 : active raid1 sde6[0]
77888 blocks [2/1] [U_] <<< out of 2 disks, 1 is active

md5 : active raid1 sde5[0]
78400 blocks [2/1] [U_] <<<

md4 : active raid1 sde4[0]
39424 blocks [2/1] [U_] <<< 

md3 : active raid1 sde3[0]
1802240 blocks [2/1] [U_] <<<
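
If you have a lot of sups to check, eyeballing this output gets old quickly. Below is a minimal Python sketch of the same check, assuming you have already captured the command output as plain text (how you collect it is up to you). The function name degraded_arrays is hypothetical, and the healthy md5 entry in the sample, with its second member sdd5[1], is made up for illustration rather than taken from our devices.

```python
import re

# Array header line, e.g. "md6 : active raid1 sde6[0]"
ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# Member map, e.g. "[U_]" (one disk missing) or "[UU]" (both up)
STATUS_RE = re.compile(r"\[([U_]+)\]")


def degraded_arrays(raid_output):
    """Return {array_name: member_map} for every array whose map contains '_'."""
    degraded = {}
    current = None
    for line in raid_output.splitlines():
        header = ARRAY_RE.match(line.strip())
        if header:
            # Remember which array the next "blocks" line belongs to.
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            if "_" in status.group(1):
                degraded[current] = status.group(1)
            current = None
    return degraded


# Sample text: md6 is copied from the output above, md5 is an
# invented healthy entry so both cases are exercised.
sample = """\
Personalities : [raid1]
md6 : active raid1 sde6[0]
77888 blocks [2/1] [U_]

md5 : active raid1 sde5[0] sdd5[1]
78400 blocks [2/2] [UU]
"""

print(degraded_arrays(sample))  # {'md6': 'U_'}
```

An empty result means every partition reported [UU]. Anything else lists the partitions that still have a missing disk.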

Fortunately, as we had already done the hard fix, and it was only a single failed flash, the fix was non-disruptive.

TAC provided us with their Flash Recovery Tool Version 10.0(2). When you unzip it, you get a ".gbin" file and a PDF. The PDF contains some very detailed instructions on when you can and can't use it (basically, single flash failure per sup only), exactly how to use it, and how to verify the result. To summarise, you FTP the file onto the device and then launch it by running load <filename>. It runs through automatically and recovers any faulty flash drives it can on both sups. You just run it from the primary, and it fixes both. Simple! And...totally non-disruptive. Obviously I would still take the usual precautions - maintenance window, backups, blah blah.

After it finishes, you have to restart the diag tests to clear down the old failures, and then, for us at least, it was all fixed.

Next, it's time for a software upgrade. Our TAC guy confirmed that the bug is cleared in 6.2(14), so that's likely where we'll be heading.
