UCS – VIF Hold Down

Last night I had a maintenance window that went fine, apart from a little hiccup that I wanted to write about. The maintenance was simply to add another block of WWN initiators to our configuration, and everything ran smoothly.

Now, the reason for adding this new block is simply that we were out of WWNs and needed to turn up 4 new UCS B200 blades. Well, since I have the new WWNs, might as well use them, right? This is where the error came up. It raised a bunch of alarms for our networking team, which in turn gave me a bit of ass-ache.

The fault code that was raised was F0283, with a message in one of these formats:

[transport] VIF [chassisId] / [slotId] [switchId]-[id] down, reason: [stateQual]
[transport] VIF [chassisId] / [id] [switchId]-[id] down, reason: [stateQual]

In our case, the stateQual was VIF Hold Down.
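If you end up with a pile of these alarms and want to sort them by reason, the message template above is regular enough to pull apart with a regex. What follows is only a rough sketch in Python, not anything Cisco ships: the pattern is my own reading of the template, the function name `parse_vif_down` is made up, and the sample message is hypothetical rather than copied from a real log.

```python
import re
from typing import Optional

# Pattern derived from the F0283 message template quoted above.
# Assumption: fields are whitespace-separated tokens. The field after
# the slash is [slotId] in one variant and [id] in the other, so it is
# captured here under the single name "slot_or_id".
FAULT_PATTERN = re.compile(
    r"^(?P<transport>\S+) VIF (?P<chassis>\S+) / (?P<slot_or_id>\S+) "
    r"(?P<switch>[^\s-]+)-(?P<vif>\S+) down, reason: (?P<reason>.+)$"
)


def parse_vif_down(message: str) -> Optional[dict]:
    """Split a 'VIF ... down' message into its fields, or return None."""
    match = FAULT_PATTERN.match(message.strip())
    return match.groupdict() if match else None


# Hypothetical sample message, invented for illustration only.
sample = "fc VIF 1 / 3 A-1234 down, reason: VIF Hold Down"
parsed = parse_vif_down(sample)
print(parsed)

# Flag the case discussed in this post.
if parsed and parsed["reason"] == "VIF Hold Down":
    print("Hold-down on fabric", parsed["switch"], "VIF", parsed["vif"])
```

Grouping the parsed results by `transport` and `reason` should make it easier to see at a glance whether the alarms are coming from the HBA side or the Ethernet side.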

While looking for this error, I found the Cisco UCS Faults and Error Messages Reference, which basically explained that “Endpoint(switch/fabric interconnect) reports the connectivity state on virtual interface as one of: a.down, b.errored, c.unavailable.”

Since this was a brand spanking new install and no zoning was present, everything pointed to the ports not being fully initialized, since nothing had loaded a driver or attempted to make them active. Once we have a good first boot, these errors should clear themselves up, as our firmware has the appropriate boot code for the HBAs; it's just a bit of a chicken-and-egg issue. To test this theory, I simply booted the ESXi boot CD with the Cisco boot drivers on it and, lo and behold, the errors went away.

The confusing part is that this “looks” like an Ethernet error when really it should be an HBA error. Someone on the support forum had a similar issue as well. Also, UCS throws a major link-down alarm even though the link was never really up to begin with, as this is a new install. A warning-level indication and a clearer error code showing where the error actually is would make this a lot easier to debug.

So, it appears to be harmless in our use case, but I wanted to make a note of it here and maybe help out some other bastard with the same issue.