Aug 18, 2018
 
VMware vSphere Mobile Watchlist Logo

Did you know that you can monitor and manage your VMware vSphere environment (ESXi hosts, clusters, and VMs) remotely with the “VMware vSphere Mobile Watchlist” app on your Android phone? Well, you can!

Download link: https://play.google.com/store/apps/details?id=com.vmware.beacon

Please note, shortly after this post, VMware removed the availability of this app. If you had it installed prior, you may continue to use it.

Update – June 12th, 2019: VMware released a new vSphere Mobile Client in alpha. For more information, see my new post at: https://www.stephenwagner.com/2019/06/12/vsphere-mobile-client-android-mobile-devices/

The VMware vSphere Mobile Watchlist (VMware Watchlist) Android App

For some time now, I’ve been using this neat little app from VMware (available for download here) to monitor and manage my vSphere cluster remotely. You can use the app while directly on your LAN, or via VPN (I use it with OpenVPN to connect to my Sophos UTM). I’ve even used it while on airplanes using the on board in-flight WiFi.

I’m posting about this because I’ve never actually heard anyone talk about the app (which I find strange), so I’m assuming others aren’t aware of its existence either.

The app runs extremely well on my Samsung S8+, Samsung S9+, and Samsung Tab E LTE tablet. I haven’t run into any issues or app crashes.

Let’s take a look at the app

vSphere Mobile Watchlist Login Prompt

The above screen is where you initially log in. I use my Active Directory credentials (since I have my vCenter server integrated with AD).

vSphere Mobile Watchlist Hosts and VM list

vSphere Mobile Watchlist Hosts and VMs

In the default view (shown above), you can view a brief summary of your ESXi hosts, as well as a list of virtual machines running.

vSphere Mobile Watchlist Host Information

After selecting an ESXi host, you can view the host’s resources, details, and related objects, as well as flip over to view host options.

vSphere Mobile Watchlist Host Options

Under host options, you can enter maintenance mode, reboot the host, shut down the host, disconnect the host, or view the host’s sensor data.

vSphere Mobile Watchlist Host Sensor List and Fan Data

vSphere Mobile Watchlist Host Sensor Data (Fans)

Checking the HPE ProLiant DL360p Gen8 fan sensor data with VMware Watchlist.

vSphere Mobile Watchlist Host Sensor Data (Temperature Sensor List)

vSphere Mobile Watchlist Host Sensor Data (Temperature)

Checking the HPE ProLiant DL360p Gen8 temperature sensor data with VMware Watchlist. While not shown above, you can select individual items to pull the actual temperature values. Please note that the temperature values are missing a decimal point (example: 2100 = 21.00 Celsius).
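As a quick illustration of reading those values, here is a minimal Python sketch. It assumes every raw reading is in hundredths of a degree, which is inferred only from the single example above (2100 = 21.00 Celsius); the function name is made up for this example and is not part of the app.

```python
def raw_to_celsius(raw_value: int) -> float:
    """Convert a raw sensor reading to degrees Celsius.

    Assumption: the app shows hundredths of a degree with the decimal
    point dropped, as in the example reading 2100 = 21.00 Celsius.
    """
    return raw_value / 100


if __name__ == "__main__":
    print(raw_to_celsius(2100))
```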

vSphere Mobile Watchlist VM Information

vSphere Mobile Watchlist VM Information View

When selecting a VM (Virtual Machine) from the default view, you can view the VM’s Resources (CPU, Memory, and Storage), Details (IP Addresses, DNS hostnames, Guest OS, VMware Tools Status), related objects, and a list of other VMs running on the same host.

vSphere Mobile Watchlist VM Options

Flipping over to the VM options, we have the ability to power off, suspend, reset, shut down, or gracefully restart the VM. We also have some snapshot functionality to take a snapshot or manage VM snapshots.

Additional Notes

In my environment I have two HPE DL360p Gen8 Servers and the sensor data is fully supported (I used the HPE custom ESXi install image which includes host drivers).

Apr 29, 2018
 
Directory Services Restore Mode

When running Veeam Backup and Replication, a Microsoft Windows Server Domain Controller may boot into safe mode and directory services restore mode.

About a week ago, I loaded Veeam Backup and Replication into my test environment. It’s a fantastic product, and it’s working great; however, today I had a little bit of an issue with a DC running Windows Server 2016 Server Core.

I woke up to a notification that the backup failed due to a VSS snapshot issue. Now I know that VSS can be a little picky at times, so I decided to restart the guest VM. After restarting, it came back up, was pingable, and appeared to be running fine; however, the backup kept failing with new errors, the event log was looking very strange on the server, and numerous services that were set to automatic were not starting up.

This specific server was installed using Server Core mode, so it has no GUI and is administered via command prompt over RDP, or via remote management utilities. After RDP’ing into the server, I noticed the “Safe Mode” branding on each corner of the display, which was very odd. I restarted the server again, this time trying to start Active Directory Services manually via services.msc.

This presented:

Event ID: 16652
Source: Directory-Services-SAM
General Description: The domain controller is booting to directory services restore mode.

Screenshot:

Directory Services Restore Mode

The domain controller is booting to directory services restore mode.

 

This surprised me (and scared me for that matter). I immediately started searching the internet to find out what would have caused this…

To my relief, I read numerous sites that advise that when an active backup is running on a guest VM which is a domain controller, Veeam activates directory services restore mode temporarily, so in the event of a restore, it will boot to this mode automatically. In my case, the switch was not changed back during the backup failure.

Running the following command in a command prompt verifies that the safeboot switch is set to dsrepair:

bcdedit /v

To disable directory services restore mode, type the following in a command prompt:

bcdedit /deletevalue safeboot

Restart the server and the issue should be resolved!
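If you would rather check for the switch from a script than read the output by eye, the check can be sketched in Python as below. This is a minimal sketch that only parses text: the sample output is a hand-written illustration of the general layout rather than a capture from the affected server, and the function name is hypothetical.

```python
import re
from typing import Optional


def safeboot_value(bcdedit_output: str) -> Optional[str]:
    """Return the value of the safeboot entry, or None if it is not set."""
    for line in bcdedit_output.splitlines():
        # Require whitespace right after "safeboot" so that other entries
        # starting with the same word (e.g. safebootalternateshell) are skipped.
        match = re.match(r"^\s*safeboot\s+(\S+)", line, re.IGNORECASE)
        if match:
            return match.group(1)
    return None


# Illustrative sample only (assumed layout, not real captured output).
SAMPLE_OUTPUT = """\
Windows Boot Loader
-------------------
description             Windows Server
safeboot                DsRepair
"""

if __name__ == "__main__":
    value = safeboot_value(SAMPLE_OUTPUT)
    if value is not None:
        print(f"safeboot is set to {value}; clear it with: bcdedit /deletevalue safeboot")
    else:
        print("safeboot is not set")
```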

Apr 17, 2018
 

With the news of VMware vSphere 6.7 being released today, a lot of you are looking for the download links for the 6.7 download (including vSphere 6.7, ESXi 6.7, etc…). I couldn’t find it myself, but after doing some scouring through alternative URLs, I came across the link.

VMware vSphere 6.7 Download Link

Here’s the link: https://my.vmware.com/web/vmware/info/slug/datacenter_cloud_infrastructure/vmware_vsphere/6_7

HPE Specific (HPE Customization for ESXi) Version 6.7 is available at: https://www.hpe.com/us/en/servers/hpe-esxi.html

Unfortunately the page is blank at the moment, however you can bet the download and product listing will be added shortly!

UPDATE 10:15AM MST: The Download link is now live!

More information on the release of vSphere 6.7 can be found here, here, here, here, here, and here.

An article on the upgrade can be found at: https://blogs.vmware.com/vsphere/2018/05/upgrading-vcenter-server-appliance-6-5-6-7.html

Happy Virtualizing!

Jan 20, 2018
 
10ZiG 5948q Zero Client

As promised in my previous post which covered my first impressions, here are some pictures and video of the 10ZiG 5948q zero client in action! In the videos I demonstrate video playback as well as the USB redirection capabilities of the 10ZiG Zero Client and VMware Horizon View. Scroll down to the bottom for videos!

If you’re interested in 10ZiG products and looking to buy, don’t hesitate to reach out to me for information and/or a quote! We can configure and sell 10ZiG Zero Clients (and thin clients), help with solution design and deployment, and provide consulting services! We sell and ship to Canada and the USA!

Pictures

10ZiG 5948q Zero Client
10ZiG 5948qv Zero Client VMware Horizon View
10ZiG 5900 Series Zero Client VMware Horizon View Login
10ZiG 5948q Series Zero Client Configuration Menu

Videos

In this video, I demonstrate the capability of the 10ZiG 5948q zero client connected to a VMware Horizon View server (via a Unified Access Gateway) playing a video from YouTube. Please note that the ESXi server does not have a GPU and 3D rendering is disabled for the test (this is as low performance as it gets).

In this video, I demonstrate the capability of the 10ZiG 5948q zero client using USB redirection on a live VDI session.

And finally, here’s a video of a 10ZiG zero client cold boot for those that are interested.

And remember, my company Digitally Accurate Inc. is a VMware Solutions Provider and 10ZiG Partner. I’m also regularly posting content on these on the corporate blog as well!

Jan 18, 2018
 

The Problem

I run a Sophos UTM firewall appliance in my VMware vSphere environment and noticed the other day that I was getting warnings on the space used on the ESXi host for the thin-provisioned vmdk file for the guest VM. I thought “Hey, this is weird”, so I enabled SSH and logged in to check my volumes. Everything looked fine and my disk usage was great! So what gives?

After spending some more time troubleshooting and not finding much, I thought to myself “What if it’s not unmapping unused blocks from the vmdk to the host ESXi machine?”. What is unmapping, you ask? When files get deleted in a guest VM, the free blocks aren’t automatically “unmapped” and released back to the host hypervisor in some cases.

Two things need to happen:

  1. The guest VM has to release these blocks (notify the hypervisor that it’s not using them, making the vmdk smaller)
  2. The host has to reclaim these and issue the unmap command to the storage (freeing up the space on the SAN/storage itself)

On a side note: In ESXi 6.5 and when using VMFS version 6 (VMFS6), the datastores can be configured for automatic unmapping. You can still kick it off manually, but many administrators would prefer it to happen automatically in the background with low priority (low I/O).

Most of my guest VMs automatically do the first step (releasing the blocks back to the host). On Windows this occurs with the defrag utility, which issues trim commands and “trims” the volumes. On Linux this occurs with the fstrim command. All my guest VMs do this automatically, with the exception of the Sophos UTM appliance.

The fix

First, a warning: Enable SSH on the Sophos UTM at your own risk. You need to know what you are doing. This may also pose a security risk, so SSH should be disabled once you are finished. You’ll need to “su” to root once you log in with the “loginuser” account.

This fix applies not only to the Sophos UTM, but also to most other Linux-based guest virtual machines.

Now to fix the issue, I used the “df” command, which provides a list of the filesystems, their mount points, and the free storage for those filesystems. I’ve included an example below (this wasn’t the full list):

hostname:/root # df
Filesystem                       1K-blocks     Used Available Use% Mounted on
/dev/sda6                          5412452  2832960   2281512  56% /
udev                               3059712       72   3059640   1% /dev
tmpfs                              3059712      100   3059612   1% /dev/shm
/dev/sda1                           338875    15755    301104   5% /boot
/dev/sda5                         98447760 13659464  79513880  15% /var/storage
/dev/sda7                        129002700  4624468 117474220   4% /var/log
/dev/sda8                          5284460   274992   4717988   6% /tmp
/dev                               3059712       72   3059640   1% /var/storage/chroot-clientlessvpn/dev


You’ll need to run the fstrim command on every mountpoint for file systems “/dev/sdaX” (X means you’ll be doing this for multiple mountpoints). In the example above, you’ll need to run it on “/”, “/boot”, “/var/storage”, “/var/log”, “/tmp”, and any other mountpoints that use “/dev/sdaX” filesystems.

Two examples:

fstrim / -v

fstrim /var/storage -v

Again, you’ll repeat this for all mount points for your /dev/sdaX storage (X is replaced with the volume number). The command above only works with mountpoints, and not the actual device mappings.
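To avoid typing each command by hand, the list of fstrim commands can be derived from the df output. The following Python sketch only builds the command strings and does not run anything; the function name and the default “/dev/sda” prefix are assumptions for illustration, and it assumes mount points contain no spaces.

```python
def fstrim_commands(df_output: str, device_prefix: str = "/dev/sda") -> list:
    """Build one fstrim command per mount point backed by a matching device."""
    commands = []
    for line in df_output.splitlines():
        fields = line.split()
        # A df data row has six columns; shorter lines (shell prompt, blanks)
        # are skipped. The header row is dropped by the prefix check below.
        if len(fields) < 6:
            continue
        filesystem, mount_point = fields[0], fields[5]
        if filesystem.startswith(device_prefix):
            commands.append(f"fstrim {mount_point} -v")
    return commands


# The df listing from the example above.
DF_SAMPLE = """\
Filesystem                       1K-blocks     Used Available Use% Mounted on
/dev/sda6                          5412452  2832960   2281512  56% /
udev                               3059712       72   3059640   1% /dev
tmpfs                              3059712      100   3059612   1% /dev/shm
/dev/sda1                           338875    15755    301104   5% /boot
/dev/sda5                         98447760 13659464  79513880  15% /var/storage
/dev/sda7                        129002700  4624468 117474220   4% /var/log
/dev/sda8                          5284460   274992   4717988   6% /tmp
/dev                               3059712       72   3059640   1% /var/storage/chroot-clientlessvpn/dev
"""

if __name__ == "__main__":
    for command in fstrim_commands(DF_SAMPLE):
        print(command)
```

Note that the last row (filesystem “/dev”) is skipped because it does not start with “/dev/sda”.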

Time to release the unused blocks to the SAN:

The above completes the first step of releasing the storage back to the host. Now you can either let the automatic unmap occur slowly over time if you’re using VMFS6, or you can manually kick it off. I decided to manually kick it off using the steps I have listed at: https://www.stephenwagner.com/2017/02/07/vmfs-unmap-command-on-vsphere-6-5-with-vmfs-6-runs-repeatedly/

You’ll need to use esxcli to do this. I simply enabled SSH on my ESXi hosts temporarily.

Please note: Using the unmap command on ESXi hosts is very storage I/O intensive. Do this during a maintenance window, or at a time of low I/O, as this will perform MAJOR I/O on your hosts…

I issue the command (replace “DATASTORENAME” with the name of your datastore):

esxcli storage vmfs unmap --volume-label=DATASTORENAME --reclaim-unit=8

This could run for hours, possibly days, depending on your “reclaim-unit” value (this is the number of VMFS blocks the command unmaps per iteration). In this example I chose 8, but most people use something larger like 100 or 200 to reduce the load and the time for the command to complete (lower values reclaim space in smaller chunks, so the command takes longer to execute).
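To get a feel for why a small reclaim unit takes so much longer, here is a rough back-of-the-envelope sketch in Python. It assumes the reclaim unit counts VMFS blocks per iteration and that the VMFS block size is 1 MiB; both are assumptions for illustration, and the function name is made up. It counts iterations only and says nothing about actual run time, which depends on your storage.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def unmap_iterations(free_bytes: int, reclaim_unit: int, block_size: int = MIB) -> int:
    """Estimate how many unmap iterations are needed to cover the free space."""
    if reclaim_unit <= 0 or block_size <= 0:
        raise ValueError("reclaim_unit and block_size must be positive")
    # Ceiling division using integers only, to avoid float rounding.
    free_blocks = -(-free_bytes // block_size)
    return -(-free_blocks // reclaim_unit)


if __name__ == "__main__":
    # Treats a full 10 TiB as free space, purely for illustration.
    for unit in (8, 100, 200):
        print(unit, unmap_iterations(10 * TIB, unit))
```

Under these assumptions a reclaim unit of 8 needs 25 times as many iterations as a reclaim unit of 200.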

I let this run for 2 hours on a 10TB datastore, however it may take way longer (possibly 6+ hours, to days).

Finally, not only are we left with a smaller vmdk file, but we’ve released the space back to the SAN as well!

Jan 06, 2018
 

Last night I updated my VMware VDI environment to VMware Horizon 7.4.0. For the most part the upgrade went smoothly; however, I discovered an issue (probably unrelated to the upgrade itself, and more likely just previously overlooked). When connecting with Google Chrome to VMware Horizon HTML Access via the UAG (Unified Access Gateway), an error pops up after pressing the button saying “Failed to connect to the Connection Server”.

The Problem:

This error pops up ONLY when using Chrome, and ONLY when connecting through the UAG. If you use a different browser (Firefox, IE), this issue will not occur. If you connect using Chrome to the connection server itself, this issue will not occur. It took me hours to find out what was causing this as virtually nothing popped up when searching for a solution.

Finally I stumbled across a VMware document explaining that View Connection Server instances and security servers that reside behind a gateway (such as a UAG or Access Point) must be aware of the address that browsers use to connect to the gateway for HTML Access.

The VMware document is here: https://docs.vmware.com/en/VMware-Horizon-7/7.0/com.vmware.horizon-view.installation.doc/GUID-FE26A9DE-E344-42EC-A1EE-E1389299B793.html

To resolve this:

On the view connection server, create a file called “locked.properties” in “install_directory\VMware\VMware View\Server\sslgateway\conf\”.

If you have a single UAG/Access Point, populate this file with:

portalHost=view-gateway.example.com

If you have multiple UAG/Access Points, populate the file with:

portalHost.1=view-gateway-1.example.com
portalHost.2=view-gateway-2.example.com
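If you manage several gateways, the file contents can be generated rather than typed. The Python sketch below follows the two formats shown above (a plain portalHost entry for one gateway, numbered entries for several); the function name is hypothetical and the hostnames are the placeholder examples from this post.

```python
def locked_properties(gateway_hosts) -> str:
    """Return locked.properties content for one or more gateway hostnames."""
    hosts = list(gateway_hosts)
    if not hosts:
        raise ValueError("at least one gateway hostname is required")
    if len(hosts) == 1:
        # Single UAG/Access Point: unnumbered entry.
        return f"portalHost={hosts[0]}\n"
    # Multiple UAG/Access Points: entries numbered from 1.
    return "".join(
        f"portalHost.{index}={host}\n" for index, host in enumerate(hosts, start=1)
    )


if __name__ == "__main__":
    print(locked_properties(["view-gateway-1.example.com", "view-gateway-2.example.com"]), end="")
```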

Restart the server

The issue should now be resolved!

On a side note, I also deleted my VMware Unified Access Gateway VMs and deployed the updated version that ships with Horizon 7.4.0 (VMware Unified Access Gateway 3.2.0). There was an issue importing the configuration from the export backup I took from the previous version, so I had to configure everything from scratch (installing certificates, configuring URLs, etc.). Be aware of this issue when importing a configuration.

 

Oct 27, 2017
 

I went to re-deploy some vDP appliances today and noticed a newer version was made available a few months ago (vSphere Data Protection 6.1.5). After downloading the vSphereDataProtection-6.1.5.ova file, I went to deploy it to my vSphere cluster and it failed due to an invalid certificate and a message reading “The OVF package is signed with an invalid certificate”.

I went ahead and downloaded the certificate to see what was wrong with it. While the publisher was valid, the certificate was only valid from September 5th, 2016 to September 7th, 2017, and today was October 27th, 2017. It looks like the guys at VMware should have generated a new cert before releasing it.
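The date arithmetic is easy to verify. This small Python sketch uses the validity window and deployment date mentioned above; the function names are made up for illustration, and it compares calendar dates only (no times or time zones).

```python
from datetime import date


def certificate_is_valid(not_before: date, not_after: date, on_date: date) -> bool:
    """True if on_date falls inside the certificate's validity window."""
    return not_before <= on_date <= not_after


def days_since_expiry(not_after: date, on_date: date) -> int:
    """Days elapsed since the certificate expired (negative if still valid)."""
    return (on_date - not_after).days


NOT_BEFORE = date(2016, 9, 5)
NOT_AFTER = date(2017, 9, 7)
DEPLOY_DATE = date(2017, 10, 27)

if __name__ == "__main__":
    print(certificate_is_valid(NOT_BEFORE, NOT_AFTER, DEPLOY_DATE))
    print(days_since_expiry(NOT_AFTER, DEPLOY_DATE))
```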

 

 

To resolve this, you need to repackage the OVA file and skip the certificate using the VMware Open Virtualization Format Tool (ovftool) available at https://code.vmware.com/tool/ovf/4.1.0

Once you download and install this, the executable can be found in your Program Files\VMware\VMware OVF Tool folder.

Open a command prompt and change to the above directory and run the following:

ovftool.exe --skipManifestCheck c:\folder\vSphereDataProtection-6.1.5.ova c:\folder\vdpgood.ova

This command will repackage the OVA without the certificate and save it as the new file named vdpgood.ova above. Afterwards, deploy it to your vSphere environment and all should be working!

 

Feb 18, 2017
 
Windows Server Volume Shadow Copy Volumes Snapshot Screenshot

On VMware vSphere ESXi 6.5, 6.7, and 7.0, a condition exists where one is unable to take a quiesced snapshot. This is an issue that affects quite a few people, and numerous forum threads can be found on the internet from those searching for the solution.

This issue can occur either when taking manual snapshots of virtual machines with “Quiesce guest filesystem” selected, or when using snapshot based backup applications such as vSphere Data Protection (vSphere vDP), Veeam, or other applications that utilize quiesced snapshots.

The Issue

I experienced this problem on one of my test VMs (Windows Server 2012 R2), however I believe it can occur on newer versions of Windows Server as well, including Windows Server 2016 and Windows Server 2019.

When this issue occurs, the snapshot will fail and the following errors will be present:

An error occurred while taking a snapshot: Failed to quiesce the virtual machine.
An error occurred while saving the snapshot: Failed to quiesce the virtual machine.

Performing standard troubleshooting, I restarted the VM, checked for VSS provider errors, and confirmed that the Windows Services involved with snapshots were in their correct state and configuration. Unfortunately this had no effect, and everything was configured the way it should be.

I also tried to re-install VMware Tools, which had no effect.

PLEASE NOTE: If you experience this issue, you should confirm the services are in their correct state and configuration, as outlined in VMware KB: 1007696. Source: https://kb.vmware.com/s/article/1007696

The Fix

In the days leading up to the failure, when things were running properly, I did notice that the quiesced snapshots for that VM were taking a long time to process, but were still functioning correctly before the failure.

This morning during troubleshooting, I went ahead and deleted all the Windows Volume Shadow Copies (VSS Snapshots), which are internal to the Virtual Machine itself. These are the shadow copies that the Windows guest operating system takes of its own filesystem (completely unrelated to VMware).

To my surprise after doing this, not only was I able to create a quiesced snapshot, but the snapshot processed almost instantly (200x faster than previously when it was functioning).

If you’re comfortable deleting all your snapshots, it may also be a good idea to fully disable and then re-enable the VSS Snapshots on the volume to make sure they are completely deleted and reset.

I’m assuming this was causing a high load for the VMware snapshot to process, and a timeout was being hit on snapshot creation, which caused the issue. While Windows volume shadow copies are unrelated to VMware snapshots, they both utilize the same VSS (Volume Shadow Copy Service) system inside of Windows to function and process. One must also keep in mind that the Windows volume shadow copies will of course be part of a VMware snapshot since they are stored inside of the VMDK (the virtual disk) file.

PLEASE NOTE: Deleting your Windows Volume Shadow copies will delete your Windows volume snapshots inside of the virtual machine. You will lose the ability to restore files and folders from previous volume shadow copy snapshots. Be aware of what this means and what you are doing before attempting this fix.

Feb 14, 2017
 

Years ago, HPE released the GL200 firmware for their HPE MSA 2040 SAN that allowed users to provision and use virtual disk groups (and virtual volumes). This firmware came with a whole bunch of features such as Read Cache, performance tiering, thin provisioning of virtual disk group based volumes, and being able to allocate and commission new virtual disk groups as required.

(Please Note: On virtual disk groups, you cannot add a single disk to an already created disk group, you must either create another disk group (best practice to create with the same number of disks, same RAID type, and same disk type), or migrate data, delete and re-create the disk group.)

The biggest thing with virtual storage was the fact that volumes created on virtual disk groups could span across multiple disk groups and provide access to different types of data, over different disks that offered different performance capabilities. Essentially, via an automated process internal to the MSA 2040, the SAN would place highly used data (hot data) on faster media such as SSD based disk groups, and place regularly/seldom used data (cold data) on slower types of media such as Enterprise SAS disks, or archival MDL SAS disks.

(Please Note: Using the performance tier requires either purchasing a performance tiering license or buying an HPE MSA 2042, which bundles the license and additionally comes with SSD drives for use with “Read Cache” or “Performance tier”.)

When the firmware was first released, I had no impulse to try it out since I have 24 x 900GB SAS disks (only one type of storage), and of course everything was running great, so why change it? With that being said, I’ve wanted and planned to one day kill off my linear storage groups and implement the virtual disk groups. The key reasons for me are thin provisioning (the MSA 2040 supports the “DELETE” VAAI function) and virtual based snapshots (in my environment, I require over-commitment of the volume). As a side note, as of ESXi 6.5, ESXi now regularly unmaps unused blocks when using the VMFS-6 filesystem (if left enabled), which is great for SANs using thin provisioning that support the “DELETE” VAAI function.

My environment consisted of 2 linear disk groups, 12 disks in RAID5 owned by controller A, and 12 disks in RAID5 owned by controller B (24 disks total). Two weekends ago, I went ahead and migrated all my VMs to the other datastore (on the other volume), deleted the linear disk group, created a virtual disk group, and then migrated all the VMs back, deleted my second linear volume, and created a virtual disk group.

Overall the process was very easy and fast. No downtime is required for this operation if you’re licensed for Storage vMotion in your vSphere environment.

During testing, I’ve noticed absolutely no performance loss using virtual vs linear, except for some functions that utilize the VAAI storage providers, which of course run faster on the virtual disk groups since the work is offloaded to the SAN. This was a major concern for me, as linear block based storage is accessed more directly than virtual disk groups, which add an extra level of software involvement between the controllers and disks (block based access vs file based access for the iSCSI targets being provided by the controllers).

Unfortunately since I have no SSDs and no extra room for disks, I won’t be able to try the performance tiering, but I’m looking forward to it in the future.

I highly recommend implementing virtual disk groups on your HPE MSA 2040 SAN!

Feb 08, 2017
 

When running vSphere 6.5, 6.7, or 7.0 (or later) and utilizing a VMFS6 datastore, we now have access to automatic LUN reclaim, which automatically unmaps unused blocks on your LUNs. This is very handy for thin provisioned storage.

Essentially when you unmap blocks, it “tells” the storage (SAN) that unused (deleted or moved data) blocks aren’t being used anymore and to unmap them, which decreases the allocated size on the storage layer and frees up storage space. Your storage LUN must support VAAI and the “Delete” function.

Now taking this a step further, most of you have noticed that storage reclaim in the vSphere web client has only two settings for priority: none or low.

For those of you who feel daring or want to spice life up a bit, you can manually increase the priority of the automated space reclamation through the esxcli command. While I can’t recommend this (obviously VMware chose to hide these options due to performance considerations), you can follow these instructions to raise the priority.

Manually Configure Storage Reclaim (UNMAP) Priority

To view the current settings:

esxcli storage vmfs reclaim config get --volume-label=DATASTORENAME

To set ESXi reclaim/unmap priority to medium:

esxcli storage vmfs reclaim config set --volume-label=DATASTORENAME --reclaim-priority=medium

To set ESXi reclaim/unmap priority to high:

esxcli storage vmfs reclaim config set --volume-label=DATASTORENAME --reclaim-priority=high

You can confirm these settings took effect by running the first “get” command to view the settings, or view the datastore in the storage section of the vSphere client. While the vSphere client will reflect the higher priority setting, if you change it lower and then want to change it back higher, you’ll need to use the esxcli command to bring it up to a higher priority again.
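If you script this across several datastores, a small helper can build the command strings and reject typos in the priority before anything reaches the host. This Python sketch only assembles the text of the esxcli commands shown above and does not execute them; the function name is hypothetical, and the accepted priority list (none, low, medium, high) is taken from the values mentioned in this post.

```python
import shlex

VALID_PRIORITIES = ("none", "low", "medium", "high")


def reclaim_priority_command(datastore: str, priority: str) -> str:
    """Build the esxcli command that sets the reclaim priority on a datastore."""
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"priority must be one of {VALID_PRIORITIES}, got {priority!r}")
    # shlex.quote protects datastore names containing spaces or shell characters.
    return (
        "esxcli storage vmfs reclaim config set "
        f"--volume-label={shlex.quote(datastore)} --reclaim-priority={priority}"
    )


if __name__ == "__main__":
    print(reclaim_priority_command("DATASTORENAME", "medium"))
```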

Happy Virtualizing! Leave a comment!