This is a continuation of a previous post.
The last post finished where we thought all was good, because the flash status code was reading 0xF0, which we were told means both flash drives are healthy. What we noticed though was that the diag tests were still failing for compact flash - test 7 - on some of the sups. Initially Cisco told us that this was cosmetic, and meant nothing.
However, we sent them some logs for review and confirmation, and they
came back and told us we still had dodgy flash on 2 sups. In the output of
slot [5/6] show system internal raid, where the partitions are
listed, you are looking for
[UU]. UU means both disks in the mirror are good. An underscore, like
in the following output, means you have a bad disk:
Personalities : [raid1]
md6 : active raid1 sde6
      77888 blocks [2/1] [U_] <<< out of 2 disks, 1 is active
md5 : active raid1 sde5
      78400 blocks [2/1] [U_] <<<
md4 : active raid1 sde4
      39424 blocks [2/1] [U_] <<<
md3 : active raid1 sde3
      1802240 blocks [2/1] [U_] <<
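If you have a lot of sups to check, eyeballing this gets old quickly. Below is a minimal Python sketch of how you might scan saved output for degraded arrays. It assumes you have already captured the command output as text by whatever means you normally use; the function name `find_degraded_arrays` and the `SAMPLE` text are my own illustration, not anything Cisco ships.

```python
import re

# Matches an md array entry and its status, for example:
#   md6 : active raid1 sde6
#         77888 blocks [2/1] [U_]
# Groups: array name, expected disks, active disks, per-disk flags.
MD_PATTERN = re.compile(
    r"(md\d+)\s*:.*?\[(\d+)/(\d+)\]\s*\[([U_]+)\]", re.DOTALL
)


def find_degraded_arrays(raid_output):
    """Return {array_name: flags} for every array missing a member.

    An array counts as degraded if its flags contain an underscore
    or if fewer disks are active than expected.
    """
    degraded = {}
    for name, total, active, flags in MD_PATTERN.findall(raid_output):
        if "_" in flags or int(active) < int(total):
            degraded[name] = flags
    return degraded


# Hypothetical sample: one degraded array, one healthy array.
SAMPLE = """Personalities : [raid1]
md6 : active raid1 sde6
      77888 blocks [2/1] [U_]
md5 : active raid1 sde5
      78400 blocks [2/2] [UU]
"""

if __name__ == "__main__":
    for array, flags in find_degraded_arrays(SAMPLE).items():
        print(f"{array} is degraded: [{flags}]")
```

An empty result means every array it found reported all of its disks as up. It is only a text check, so I would still confirm anything it flags with TAC.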
Fortunately, as we had already done the hard fix, and it was only a single failed flash, the fix this time was non-disruptive.
TAC provided us with their Flash Recovery Tool Version 10.0(2). When you
unzip it, you get a ".gbin" file and a PDF. The PDF contains some very
detailed instructions on when you can and can't use it (basically,
single flash failure per sup only), and exactly how to use it and how to
verify the result. To summarise, you FTP the file onto the device and then run
load <filename>. It runs through automatically and
recovers any faulty flash drives it can on both sups. You just
run it from the primary, and it fixes both. Simple! And...totally
non-disruptive. Obviously I would still take the usual precautions -
maintenance window, backups, bla bla.
After it finishes you have to restart the diag tests to clear those down and then, for us at least, it was all fixed.
Next, it's time for a software upgrade. Our TAC guy confirmed that the bug is cleared in 6.2(14), so that's likely where we'll be heading.