Puppet Presentation

January 25th, 2012 | by | sysadmin, tips & tricks

For those looking for the slides from the puppet presentation that I gave last week, here they are.

I’ll be working on getting a screencast of the demo up in the next week or so. Too many other things are distracting me at the moment.

Pizza and Puppet

December 15th, 2011 | by | in the news, sysadmin, tips & tricks

For those looking to see my handsome mug in person and listen to my beautiful tenor voice, I will be giving a Puppet demonstration at the January 2012 CIALUG meeting. Details on the event can be found on the events page of the cialug.org site.

Puppetlabs has stepped up to sponsor the event, so besides my insightful talk there will be pizza, plus t-shirts that will be handed out while supplies last or through whatever game we come up with to give them away. I have a whole box of them, so hopefully many of you will be sporting the latest in puppet wear before the night is through.

So put it in your calendar now to join me for Pizza and Puppet on January 18th at 7PM in the LightEdge corporate headquarters.

vmkfstools for the win

December 14th, 2011 | by | sysadmin, vmware

Got a call tonight concerning a VM that was having some issues. We had p2v’d our Virtual Center and when the new vCenter came online, apparently DRS decided to balance the heck out of things and somewhere in the process, a VM got squished and had a variety of issues.

When I was brought in, it wasn’t powering up and was throwing an error concerning a lock. So here is what I tried.

  1. Remove the VM from inventory and re-add it. Based on past support calls with VMware, this trick seems to be a tried and true ‘do this first’ method of debugging a VM. I’m not going to go into great detail on why this might work, but suffice it to say, I’m dealing with a file system lock and re-adding the VM really does nothing for me here. In theory, it might remove the lock, but unfortunately this method failed.
  2. Snapshot shuffle. This is where you take a snapshot of the VM and then attempt to do a Delete All in the snapshot manager. The VM was acting as though it had a snapshot on the system and wasn’t letting it go. By doing a Delete All, VMware will look for other snapshots and remove them if they are in the chain, even if they are not showing up in the snapshot manager. This method also failed.
  3. vmkfstools for the win. I’m a unix admin so getting in the shell doesn’t bother me. Others might find this a bit intimidating. I ssh’d into the host, found my VM directory and ran a check on each of the VMDK files using vmkfstools. Each came back clean, so I moved on to breaking the lock, also with vmkfstools. The command technically reported that it didn’t complete successfully, but I found that it did indeed work and I was able to power on the VM. Mission accomplished! Here is a rundown of the commands used in this method:


    To check a VMDK:
    vmkfstools -x check virtualMachine.vmdk

    To remove a lock:
    vmkfstools -B virtualMachine.vmdk

    To list all vmkfstools options:
    vmkfstools -h
    -or-
    man vmkfstools
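
If the VM has more than one disk, the check step can be wrapped in a small loop instead of typing it once per file. This is only a sketch, not what was actually run that night: the `check_vmdks` function and the `VMKFSTOOLS` override variable are made up for illustration, and skipping the `-flat`, `-delta` and `-ctk` files (so that only the descriptor files get checked) is an assumption about how the VM directory is laid out.

```shell
#!/bin/sh
# Sketch: run "vmkfstools -x check" against every VMDK descriptor in a VM
# directory. VMKFSTOOLS is a hypothetical override, handy for a dry run.
VMKFSTOOLS="${VMKFSTOOLS:-vmkfstools}"

check_vmdks() {
    vm_dir="$1"
    failed=0
    for disk in "$vm_dir"/*.vmdk; do
        # If nothing matched, the glob is left unexpanded; skip it.
        [ -e "$disk" ] || continue
        case "$disk" in
            # Assumption: skip extent, snapshot delta and change-tracking
            # files and only check the descriptors that point at them.
            *-flat.vmdk|*-delta.vmdk|*-ctk.vmdk) continue ;;
        esac
        echo "Checking $disk"
        if ! $VMKFSTOOLS -x check "$disk"; then
            echo "Check reported a problem: $disk" >&2
            failed=1
        fi
    done
    # Non-zero if any check failed.
    return "$failed"
}
```

Something like `check_vmdks /vmfs/volumes/datastore1/virtualMachine` would then walk the whole directory (the datastore and VM names here are placeholders). Setting `VMKFSTOOLS="echo vmkfstools"` first prints the commands without running them. The lock-breaking step is deliberately left out of the loop, since that is not something to fire at every disk blindly.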

Hopefully you don’t run into this issue anytime soon, but if you do, try a few of these methods and see where you end up. If all else fails, build a new VM and restore from backup. Or…call VMware support, they’re really good at this sort of thing.

UCS – VIF Hold Down

November 22nd, 2011 | by | sysadmin

Last night I had a maintenance window that went fine, apart from a little hiccup that I wanted to write about. The maintenance was simply to add another block of WWN initiators to our configuration, and everything ran smoothly.

Now, the reason for adding this new block is simply that we were out of WWNs and needed to turn up 4 new UCS B-200 blades. Well, since I have the new WWNs, might as well use them, right? This is where the error came up. It raised a bunch of alarms for our networking team, which in turn gave me a bit of ass-ache.

The fault raised was F0283, with a message in this format.

[transport] VIF [chassisId] / [slotId] [switchId]-[id] down, reason: [stateQual]
[transport] VIF [chassisId] / [id] [switchId]-[id] down, reason: [stateQual]

In our case, the stateQual was VIF Hold Down.

While looking for this error, I found the Cisco UCS Faults and Error Messages Reference which basically explained that “Endpoint(switch/fabric interconnect) reports the connectivity state on virtual interface as one of: a.down, b.errored, c.unavailable.”

Since this was a brand spanking new install and no zoning was present, everything pointed to the ports not being fully initialized, since nothing had loaded a driver or attempted to make them active. Once we have a good first boot, these errors should clear themselves up, as our firmware has the appropriate bootcode for the HBAs; it’s just a bit of a chicken and egg issue. To test this theory, I simply loaded the ESXi boot CD with the Cisco boot drivers on it and lo and behold, the errors went away.

The confusing part is that this “looks” like an ethernet error when really it should be an HBA error. Someone on the support forum had a similar issue as well. Also, UCS throws a major link down alarm even though the link was never really up to begin with, as this is a new install. A warning-level indication and a clearer error code showing where the error actually is would make this a lot easier to debug.

So, it appears to be harmless in our use case, but I wanted to make a note of it here and maybe help out some other bastard with the same issue.
