Tour of VMware’s vCenter Operations Manager

As a former network operations guy, I came to depend on the ability to monitor systems, view capacity and performance metrics, and have all relevant information at my fingertips when troubleshooting. As a VMware administrator, it was ideal to have all of this information in one place. That’s why I came to love VMware vCenter Operations Manager (vCOps) – the one monitoring solution built by the same engineers who built the virtualization platform that is the cornerstone of most enterprise environments today.

Sure, there are other monitoring solutions out there for vSphere – some much simpler than vCOps – but the depth of visibility, gathered metrics, capacity planning and reporting you get from vCOps makes it a hard-to-beat solution. Of course, with its exhaustive array of graphs, tables and dashboards, it can be a bit intimidating and might even send some folks off to those simpler solutions. For this reason, I decided to provide an online tour of the features and capabilities of VMware’s robust solution and show how to get the most out of it. This will be a series of posts published over the next few weeks, the first of which will be a quick-start guide to installing vCOps and getting it up and running. Before I publish that post, however, here is an extensive list of official VMware resources as well as some other fantastic resources provided by folks in the community to get anyone started. Hope you find this useful.

Downloads

Product and Support

VMware Training

Third-Party Resources


File Lock on Full VMFS Volume

We recently had a VMFS volume fill up due to over-provisioning, which caused the VMs on the datastore to stop responding.  Typically the solution is easy – free up space on the volume by migrating VMs off the datastore, or increase the space on the underlying volume and expand the datastore.  Since this was just a development environment, we did not have an enterprise-grade array with features such as volume autogrow, nor did we have the luxury of additional space to add to the volume.  We realized we would have to move files off the datastore to free up space and allow the VMs to “breathe” again.  We quickly discovered, however, that we could not migrate VMs or delete any files from the volume.

We were prompted with an error when attempting a VM migration or a file deletion from the vSphere Client.  We also tried removing files via the service console, which returned the following error:

rm: cannot remove <filename>: Input/output error

It appeared that the files were locked.  Thankfully, we discovered a quick solution.  One of the servers in the cluster had a lock on a file on the full volume but had no free space to release the lock.  The only way to manually force the release was to attempt to remove any one file from this volume on each of the hosts in the cluster.  The command succeeds on whichever host in the cluster is holding the lock.
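In practice, that just means logging into each host’s service console in turn and trying to delete the same expendable file until one host succeeds.  Here is a rough sketch of what that looks like (the datastore and file names below are hypothetical placeholders; pick a file you can actually afford to lose):

# cd /vmfs/volumes/dev-datastore01
# rm stale-vmware.log

The rm fails with the Input/output error above on every host except the one holding the lock; on that host it succeeds and frees up the lock and a bit of space.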

VMware wrote this KB article stating exactly this solution:  http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1011592

Thankfully, this worked for us and allowed us to free up enough space to perform normal operations on the VMFS volume and get the stopped VMs running once again.

ESX or vSphere Host Not Responding

We just discovered that one of our older host servers was in a non-responsive state in vCenter.  After confirming network connectivity to the host server and its virtual machines, we determined that the host management service must have hung.

The issue was resolved by running the following command after logging into the service console:

# service mgmt-vmware restart

About a minute after successfully restarting the host agent service, the host returned to a connected state and full management of the host resumed.
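Had that not been enough, restarting the vCenter agent (vpxa) on the host is the usual next step (covered, as I recall, in the KB articles below).  A quick sketch of the relevant service console commands, as a reminder rather than a checked procedure:

# service mgmt-vmware status
# service vmware-vpxa restart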

Great VMware KB articles to reference:

Diagnosing an ESX/ESXi host that is disconnected or not responding in vCenter Server:  http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003409

Restarting the Management agents on an ESX or ESXi Server:  http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1003490

Additional note:

Ray Heffer noted in his blog that if the restart hangs, then the process causing the issue must be killed.  We did not need to take this step, but if this situation occurs, Ray has some great notes for killing the conflicting process.
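The general idea (a sketch from memory, not Ray’s exact steps) is to find the hung host agent process and kill it before restarting the service:

# ps auxwww | grep hostd
# kill -9 <PID of the hung vmware-hostd process>
# service mgmt-vmware restart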


Installing Dell OpenManage on ESXi 4.1

I know this topic has been written about in the past, but I figured it would be a good subject for my first technical post.  Just yesterday I had to remind myself how to install Dell OpenManage Server Administrator on a VMware ESXi 4.1 host server.  Since ESXi does not include the Service Console (old news by now, isn’t it?!), there is no way to install the OMSA client on the host itself.  Instead, you install the OpenManage Offline Bundle and VIB on the host, enable the CIM providers, and then connect to the host using a locally installed OMSA client.  Dell’s documentation can be found here:  http://support.dell.com/support/edocs/software/smsom/6.3/en/omsa_ig/html/instesxi.htm

Here are the quick and dirty steps using the vSphere CLI.

1.  Download and install the latest version of vSphere CLI:  http://www.vmware.com/support/developer/vcli/  (Requires a MyVMware account.)

2.  Download the latest Dell OpenManage Offline Bundle and VIB for ESXi from http://support.dell.com.  So far the latest I’ve found for ESXi 4.1 is OM-SrvAdmin-Dell-Web-6.5.0-2247.VIB-ESX41i_A01.zip.

3.  Shut down all VMs on the host and place the host in maintenance mode.

4.  Navigate to the vSphere CLI working directory, typically C:\Program Files (x86)\VMware\VMware vSphere CLI\bin.

5.  Install the OpenManage bundle on the ESXi host server using the following syntax (a worked example follows these steps):  vihostupdate.pl --server <IP address of ESXi host> -i -b <path to Dell OpenManage file>

6.  Restart the ESXi host server after confirmation of successful installation.
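For reference, here is roughly what the full command looks like with real values plugged in (the host IP and download path are made-up placeholders; substitute your own):

cd "C:\Program Files (x86)\VMware\VMware vSphere CLI\bin"
vihostupdate.pl --server 192.168.10.21 -i -b C:\Temp\OM-SrvAdmin-Dell-Web-6.5.0-2247.VIB-ESX41i_A01.zip

The CLI should prompt for the host’s root credentials if --username and --password are not supplied, and it will report whether the bundle installed successfully.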

Not quite done yet… In order for the newly installed Server Administrator Web Server to communicate with the CIM providers, these providers must be enabled.  To do this through the vSphere Client, follow these next steps:

1.  Logon to the ESXi host using the vSphere Client.

2.  Go to the Configuration tab of the respective host.

3.  Under the Software section, click on Advanced Settings.

4.  In the Advanced Settings, find UserVars, then change the value of the CIMOEMProviderEnabled field to 1 and click OK.  Note:  The Dell documentation points to the wrong variable to change; it should be CIMOEMProviderEnabled (singular).

5.  Execute the Restart Management Agents option from the Direct Console User Interface (DCUI) of the ESXi host.  This will hopefully allow the CIM providers to be enabled without rebooting the host; if it does not, a reboot will be necessary.

Bonus:  The above setting can also be modified using the vSphere CLI with the following command:  vicfg-advcfg.pl --server <ip_address of ESXi host> --username <user_name> --password <password> --set 1 UserVars.CIMOEMProviderEnabled
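To confirm the change took effect, the same tool can read the value back (using --get, if memory serves):

vicfg-advcfg.pl --server <ip_address of ESXi host> --username <user_name> --password <password> --get UserVars.CIMOEMProviderEnabled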

In order to pull up OpenManage, the OMSA client must be installed on a local desktop or server.

1.  From http://support.dell.com, download and install the latest OpenManage Server Administrator Managed Node for your version of OS.

2.  Type the Hostname/IP Address, Username and Password for the selected host server.  Be sure to check the “Ignore certificate warnings” checkbox.

That’s it!  It’s a little more involved than the old way of installing Server Administrator directly on ESX servers with the Service Console, but it still allows for full functionality with the smaller footprint afforded by ESXi.