Jan 072024
 
VMware Horizon View Logo

This guide will outline the instructions to Disable the VMware Horizon Session Bar. These instructions can be used to disable the Horizon Session Bar (also known as the Horizon Client Menu Bar or Shade Bar) for full screen Horizon VDI sessions.

Horizon Client Menu Bar (Shade)

The Horizon Client Menu Bar, or “Shade”, is the Session bar at the top of full screen VMware Horizon VDI Sessions.

This Menu Bar provides information on the connection, ability to send key sequences, connect USB devices, restart a VDI guest VM and more.

In same cases, users or administrators may want to disable the Shade.

Disable the Horizon Client Menu Bar (Shade)

There are multiple ways that you can disable the shade including using GPOs as well as the registry on client systems. Please note that if you are setting up clients in Kiosk mode, the shade will be automatically disabled and these instructions aren’t required.

Disable Horizon Shade using GPO

To disable the Shade with GPOs, create a Group Policy Object (or edit the local group policy on the client system), and navigate to the following location:

User Configuration -> Policies -> Administrative Templates -> VMware Horizon Client Configuration

Here, we will set Enable the shade to “Disabled”, as show below:

Disable VMware Horizon Shade using GPO to set “Enable the Shade” to Disabled

Disable Horizon Shade using Registry

To disable the Shade using registry on the client system, navigate to the following registry key:

HKEY_LOCAL_MACHINE\Software\VMware, Inc.\VMware VDM\Client\

Here, we can create a String (REG_SZ) value called EnableShade and set it to False which will disable the Shade.

Additional Information

Jan 052024
 
NVIDIA vGPU Installed in VMware ESXi Host

You may experience GPU issues with the VMware Horizon Indirect Display Driver in your environment when using 3rd party applications which incorrectly utilize the incorrect display adapter. This results with the inability to use and/or run GPU accelerated workloads including VDI, AI, and ML.

This issue effects NVIDIA vGPU (both vGPU and vDGA passthrough), AMD MxGPU, and Intel Data Center GPU Flex GPUs using SR-IOV, in any deployment where the VMware Indirect Display Driver is installed.

When this issue occurs, the application will incorrectly query the capabilities of the VMware Indirect Display Adapter instead of the GPU that is presented to the VM, resulting in a scenario where the application isn’t aware of the capabilities of the GPU you are utilizing, failing to utilize the GPU, and hardware acceleration, such as hardware encoding (NVENC) and hardware decoding.

What is the VMware Horizon Indirect Display Driver

The VMware Horizon Indirect Display Driver, also known as the VMware Indirect Display Driver, is a “virtual” display driver that isn’t bound to a specific hypervisor, and works with many deployments because of the lack of that limitation.

GPU Issues with the VMware Horizon Indirect Display Driver Enabled

This driver is installed with the VMware Horizon agent, and can work in conjunction with hardware acceleration, including GPUs (such as NVIDIA vGPU, AMD MxGPU, and Intel Data Center GPUs using SR-IOV).

Under normal circumstances, the VMware Horizon Indirect Display Driver is prioritized as a fallback driver for remoting protocols, except in environments where no hypervisor or GPU display drivers are available (like Horizon Cloud on Azure) in which case it would become the priority.

The Problem

Applications designed to use a GPU, may not be able to correctly identify which display adapter to use on the VM. While you may have a GPU, vGPU, or 3D acceleration in your environment, the application may be unaware of the device and/or its capabilities.

This is caused by the application either not correctly using the preferred primary display adapter (GPU and/or vGPU), or not being designed to handle multiple display adapters (and drivers).

Example Scenario:

When using CyberLink PowerDirector 360 in a VMware Horizon environment with an NVIDIA vGPU, the application will query the VM’s Windows instance for hardware acceleration capabilities, specifically hardware encoding, hardware decoding, and use of APIs like NVIDIA’s NVENC encoder. In this scenario, while the VM does have an NVIDIA vGPU workstation profile attached with a valid NVIDIA RTX Virtual Workstation (vWS) license, the application is only aware of the VMware Indirect Display Driver and it’s capabilities. This results in all hardware accelerated encoding and decoding capabilities to be disabled.

Example Symptoms

  • 3D Acceleration not detected by application
  • CUDA Cores not available for application
  • OpenCL not available
  • DirectX and Direct3D usage unavailable

In all scenarios, the VM will appear to have 3D acceleration, however one or multiple applications won’t have access.

The Solution

Thanks to the design of the VMware Indirect Display Driver, it should be prioritized in a fashion that it’s used only when other display drivers aren’t available (including NVIDIA vGPU), or system resources aren’t available; however, some 3rd party application may not be able to reference the prioritization, or support multi-GPU (multi display driver), resulting in the incorrect display adapter being used.

As a workaround, you can remove the VMware Indirect Display Driver from the Windows instance running in the VM.

NVIDIA vGPU with VMware Horizon Indirect Display Driver Removed

Please note that simply disabling the “VMware Horizon Indirect Display Driver” will not suffice. A full removal (Right Click, “Uninstall Device”) is required to workaround this issue. Additionally, upgrading or re-installing the VMware Horizon Agent will re-install the VMware Indirect Display Driver.

Oct 072023
 
Installing VDI optimized New Teams client application on Windows VDI

In this guide we will deploy and install the new Microsoft Teams for VDI (Virtual Desktop Infrastructure) client, and enable Microsoft Teams Media Optimization on VMware Horizon.

This guide replaces and supersedes my old guide “Microsoft (Classic) Teams VDI Optimization for VMware Horizon” which covered the old Classic Teams client and VDI optimizations. The new Microsoft Teams app requires the same special considerations on VDI, and requires special installation instructions to function VMware Horizon and other VDI environments.

You can run the old and new Teams applications side by side in your environment as you transition users.

New Teams client with toggle for old version running on VMware Horizon VDI with optimization
Switch between New Teams and old Teams on VDI

Let’s cover what the new Microsoft Teams app is about, and how to install it in your VDI deployment.

Please note: VDI (Virtual Desktop Infrastructure) support for the new Teams client went G.A. (Generally Availabile) on December 05, 2023. Additionally, Classic teams will go end of support on June 30, 2024.

Table of Contents

Please see below for a table of contents:

The New Microsoft Teams App

On October 05, 2023, Microsoft announced the availability of the new Microsoft Teams application for Windows and Mac computers. This application is a complete rebuild from the old client, and provides numerous enhancements with performance, resource utilization, and memory management.

New Microsoft Teams app VDI optimized with Toggle for new/old version

Ultimately, it’s way faster, and consumes way less memory. And fortunately for us, it supports media optimizations for VDI environments.

My close friend and colleague, mobile jon, did a fantastic in-depth Deep Dive into the New Microsoft Teams and it’s inner workings that I highly recommend reading.

Interestingly enough, it uses the same media optimization channels for VDI as the old client used, so enablement and/or migrating from the old version is very simple if you’re running VMware Horizon, Citrix, AVD, and/or Windows 365.

Install New Microsoft Teams for VDI

While installing the new Teams is fairly simple for non-VDI environment (by simply either enabling the new version in the Teams Admin portal, or using your application manager to deploy the installer), a special method is required to deploy on your VDI images, whether persistent or non-persistent.

Do not include and bundle the Microsoft Teams install with your Microsoft 365 (Office 365) deployment as these need to be installed separately.

Please Note: If you have deployed non-persistent VDI (Instant Clones), you’ll want to make sure you disable auto-updates, as these should be performed manually on the base image. For persistent VDI, you will want auto updates enabled. See below for more information on configurating auto-updates.

You will also need to enable Microsoft Teams Media Optimization for the VDI platform you are using (in my case and example, VMware Horizon).

Considerations for New Teams on VDI

  • Auto-updates can be disabled via a registry key
  • New Teams client app uses the same VDI media optimization channels as the old teams (for VMware Horizon, Citrix, AVD, and W365)
    • If you have already enabled Media Optimization for Teams on VDI for the old version, you can simply install the client using the special bulk installer for all users as shown below, as the new client uses the existing media optimizations.
  • While it is recommended to uninstall the old client and install the new client, you can choose to run both versions side by side together, providing an option to your users as to which version they would like to use.

Enable Media Optimization for Microsoft Teams on VDI

If you haven’t previously for the old client, you’ll need to enable the Teams Media Optimizations for VDI for your VDI platform.

For VMware Horizon, we’ll create a GPO and set the “Enable HTML5 Features” and “Enable Media Optimization for Microsoft Teams”, to “Enabled”. If you have done this for the old Teams app, you can skip this.

Please see below for the GPO setting locations:

Computer Configuration -> Policies -> Administrative Templates -> VMware View Agent Configuration -> VMware HTML5 Features -> Enable VMware HTML5 Features
Computer Configuration -> Policies -> Administrative Templates -> VMware View Agent Configuration -> VMware HTML5 Features -> VMware WebRTC Redirection Features -> Enable Media Optimization for Microsoft Teams

When installing the VMware Horizon client on Windows computers, you’ll need to make sure you check and enable the “Media Optimization for Microsoft Teams” option on the installer if prompted. Your install may automatically include Teams Optimization and not prompt.

Screenshot of VMware View Client Install with Microsoft Teams Optimization
VMware Horizon Client Install with Media Optimization for Microsoft Teams

If you are using a thin client or zero client, you’ll need to make sure you have the required firmware version installed, and any applicable vendor plugins installed and/or configurables enabled.

Install New Microsoft Teams client on VDI

At this time, we will now install the new Teams app on to both non-persistent images, and persistent VDI VM guests. This method performs a live download and provisions as Administrator. If running this un-elevated, an elevation prompt will appear:

  1. Download the new Microsoft Teams Bootstrapper: https://go.microsoft.com/fwlink/?linkid=2243204&clcid=0x409
  2. On your persistent or non-persistent VM, run the following command as an administrator: teamsbootstrapper.exe -p
  3. Restart the VM (and/or seal your image for deployment)
Installing
Install the new Teams for VDI (Virtual Desktop Infrastructure) with teamsbootstrapper.exe

See below for an example of the deployment:

C:\Users\Administrator.DOMAIN\Downloads>teamsbootstrapper.exe -p
{
  "success": true
}

You’ll note that running the command returns success equals true, and Teams is now installed for all users on this machine.

Install New Microsoft Teams client on VDI (Offline Installer using MSIX package)

Additionally, you can perform an offline installation by also downloading the MSI-X packages and running the following command:

teamsbootstrapper.exe -p -o "C:\LOCATION\MSTeams-x64.msix"
New Teams admin provisioned offline install for VDI
New Teams admin provisioned offline install for VDI

For the offline installation, you’ll need to download the appropriate MSI-X file in additional to the bootstrapper above. See below for download links:

Disable New Microsoft Teams Client Auto Updates

For non-persistent environments, you’ll want to disable the auto update feature and install updates manually on your base image.

To disable auto-updates for the new Teams client, configure the registry key below on your base image:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Teams

Create a DWORD value called “disableAutoUpdate”, and set to value of “1”.

New Teams app disappears after Optimization with OSOT

If you are using the VMware Operating System Optimization Tool (OSOT), you may notice that after installing New Teams in your base or golden image, that it disappears when publishing and pushing the image to your desktop pool.

The New Teams application is a Windows Store app, and organizations commonly choose to remove all Windows Store apps inside the golden image using the OSOT tool when optimizing the image. Doing this will remove New Teams from your image.

To workaround this issue, you’ll need to choose “Keep all Windows Store Applications” in the OSOT common options, which won’t remove Teams.

Using New Microsoft Teams with FSLogix Profile Containers

When using the new Teams client with FSLogix Profile Containers on non-persistent VDI, you must upgrade to FSLogix version 2.9.8716.30241 to support the new teams client.

Confirm New Microsoft Teams VDI Optimization is working

To confirm that VDI Optimization is working on New Teams, open New Teams, click the “…” in the top right next to your user icon, click “Settings”, then click on “About Teams” on the far bottom of the Settings menu.

New Teams showing “VMware Media Optimized”

You’ll notice “VMware Media Optimized” which indicates VDI Optimization for VMware Horizon is functioning. The text will reflect for other platforms as well.

Uninstall New Microsoft Teams on VDI

The Teams Boot Strap utility can also remove teams for all users on this machine as well by using the “-x” flag. Please see below for all the options for “teamsbootstrapper.exe”:

C:\Users\Administrator.DOMAIN\Downloads>teamsbootstrapper.exe --help
Provisioning program for Microsoft Teams.

Usage: teamsbootstrapper.exe [OPTIONS]

Options:
  -p, --provision-admin    Provision Teams for all users on this machine.
  -x, --deprovision-admin  Remove Teams for all users on this machine.
  -h, --help               Print help

Install New Microsoft Teams on VMware App Volumes / Citrix App Layering

Using the New Teams bootstrapper, it appears that it evades and doesn’t work with App Packaging and App attaching technologies such as VMware App Volumes and Citrix Application layering.

The New Teams bootstrapper downloads and installs an MSIX app package to the computer running the bootstrapper.

To deploy and install new Teams on VMware App Volumes or Citrix App Layering (or other app technologies), you’ll most likely need to download and import the MSIX package in to the application manager and deploy using that.

Conclusion

It’s great news that we finally have a better performing Microsoft Teams client that supports VDI optimizations. With new Teams support for VDI reaching GA, and with the extensive testing I’ve performed in my own environment, I’d highly recommend switching over at your convenience!

Jul 232023
 
Azure AD SSO with Horizon

With the release of VMware Horizon 2303, VMware Horizon now supports Hybrid Azure AD Join with Azure AD Connect using Instant Clones and non-persistent VDI.

So what exactly does this mean? It means you can now use Azure SSO using PRT (Primary Refresh Token) to authenticate and access on-premise and cloud based applications and resources.

What else? It allows you to use conditional access!

What is Hybrid Azure AD Join, and why would we want to do it with Azure AD Connect?

Historically, it was a bit challenging when it came to Understanding Microsoft Azure AD SSO with VDI (click to read the post and/or see the video), and special considerations had to be made when an organization wished to implement SSO between their on-prem non-persistent VDI deployment and Azure AD.

Screenshot of a Hybrid Azure AD Joined login
Hybrid Azure AD Joined Login

Azure AD SSO, the old way

The old way to accomplish this was to either implement Azure AD with ADFS, or use Seamless SSO. ADFS was bulky and annoying to manage, and Seamless SSO was actually intended to enable SSO on “downlevel devices” (older operating systems before Windows 10).

For customers without ADFS, I would always recommend using Seamless SSO to enable SSO on non-persistent VDI Instant Clones, until now!

Azure AD SSO, the new way with Azure AD Connect and Azure SSO PRTs

According to the release notes for VMware Horizon 2303:

Hybrid Azure Active Directory for SSO is now supported on instant clone desktop pools. See KB 89127 for details.

This means we can now enable and use Azure SSO with PRTs (Primary Refresh Tokens) using Azure AD Connect and non-persistent VDI Instant Clones.

Azure SSO with PRT and Non-Persistent VDI

This is actually a huge deal because not only does it allow us to use the preferred method for performing SSO with Azure, but it also allows us to start using fancy Azure features like conditional access!

Requirements for Hybrid Azure AD Join with non-persistent VDI and Azure AD Connect

In order to utilize Hybrid Join and PRTs with non-persistent VDI on Horizon, you’ll need the following:

  • VMware Horizon 2303 (or later)
  • Active Directory
  • Azure AD Connect (Implemented, Configured, and Functioning)
    • Azure AD Hybrid Domain Join must be enabled
    • OU and Object filtering must include the non-persistent computer objects and computer accounts
  • Create a VMware Horizon Non-Persistent Desktop Pool for Instant Clones
    • “Allow Reuse of Existing Computer Accounts” must be checked

When you configure this, you’ll notice that after provisioning a desktop pool (or pushing a new snapshot), that there may be a delay for PRTs to be issued. This is expected, however the PRT will be issued eventually, and subsequent desktops shouldn’t experience issues unless you have a limited number available.

*Please note: VMware still notes that ADFS is the preferred way for fast issuance of the PRT.

While VMware does recommend ADFS for performance when issuing PRTs, in my own testing I had no problems or complaints, however when deploying this in production I’d recommend that because of the PRT delay after deploying the pool or a new snapshot, to do this after hours or SSO will not function for some users who immediately get a new desktop.

Additional Considerations

Please note the following:

  • When switching from ADFS to Azure AD Connect, the sign-in process may change for users.
    • You must prepare the users for the change.
  • When using locally stored identifies and/or cached credentials, enabling Azure SSO may change the login process, or cause issues for users signing in.
    • You may have to delete saved credentials in the users persistent profile
    • You may have to adjust GPOs to account for Azure SSO
    • You may have to modify settings in your profile persistent solution
      • Example: “RoamIdentity” on FSLogix
  • I recommend testing before implementing
    • Test Environment
    • Test with new/blank user profiles
    • Test with existing users

If you’re coming from an environment that was previously using Seamless SSO for non-persistent VDI, you can create new test desktop pools that use newly created Active Directory OU containers and adjust the OU filtering appropriately to include the test OUs for synchronization to Azure AD with Azure AD Connect. This way you’re only syncing the test desktop pool, while allowing Seamless SSO to continue to function for existing desktop pools.

How to test Azure AD Hybrid Join, SSO, and PRT

To test the current status of Azure AD Hybrid Join, SSO, and PRT, you can use the following command:

dsregcmd /status

To check if the OS is Hybrid Domain joined, you’ll see the following:

+----------------------------------------------------------------------+
| Device State                                                         |
+----------------------------------------------------------------------+

             AzureAdJoined : YES
          EnterpriseJoined : NO
              DomainJoined : YES
                DomainName : DOMAIN

As you can see above, “AzureADJoined” is “YES”.

Further down the output, you’ll find information related to SSO and PRT Status:

+----------------------------------------------------------------------+
| SSO State                                                            |
+----------------------------------------------------------------------+

                AzureAdPrt : YES
      AzureAdPrtUpdateTime : 2023-07-23 19:46:19.000 UTC
      AzureAdPrtExpiryTime : 2023-08-06 19:46:18.000 UTC
       AzureAdPrtAuthority : https://login.microsoftonline.com/XXXXXXXX-XXXX-XXXXXXX
             EnterprisePrt : NO
    EnterprisePrtAuthority :
                 OnPremTgt : NO
                  CloudTgt : YES
         KerbTopLevelNames : XXXXXXXXXXXXX

Here we can see that “AzureAdPrt” is YES which means we have a valid Primary Refresh Token issued by Azure AD SSO because of the Hybrid Join.

Mar 052023
 
NVIDIA vGPU

In this NVIDIA vGPU Troubleshooting Guide, I’ll help show you how to troubleshoot NVIDIA vGPU issues on VMware platforms, including VMware Horizon and VMware Tanzu. This guide applies to the full vGPU platform, so it’s relevant for VDI, AI, ML, and Kubernetes workloads, as well other virtualization platforms.

This guide will provide common troubleshooting methods, along with common issues and problems associated with NVIDIA vGPU as well as their fixes.

Please note, there are numerous other additional methods available to troubleshoot your NVIDIA vGPU deployment, including 3rd party tools. This is a general document provided as a means to get started learning how to troubleshoot vGPU.

NVIDIA vGPU

NVIDIA vGPU is a technology platform that includes a product line of GPUs that provide virtualized GPUs (vGPU) for Virtualization environments. Using a vGPU, you can essentially “slice” up a physical GPU and distribute Virtual GPUs to a number of Virtual Machines and/or Kubernetes containers.

NVIDIA vGPU Installed in VMware ESXi Host
NVIDIA vGPU Installed in VMware ESXi Host

These virtual machines and containers can then use these vGPU’s to provide accelerated workloads including VDI (Virtual Desktop Infrastructure), AI (Artificial Intelligence), and ML (Machine Learning).

While the solution works beautifully, when deployed incorrectly or if the solution isn’t maintained, issues can occur requiring troubleshooting and remediation.

At the end of this blog post, you’ll find some additional (external) links and resources, which will assist further in troubleshooting.

Troubleshooting

Below, you’ll find a list of my most commonly used troubleshooting methods.

Please click on an item below which will take you directly to the section in this post.

Common Problems

Below is a list of problems and issues I commonly see customers experience or struggle with in their vGPU enabled VMware environments.

Please click on an item below which will take you directly to the section in this post.

vGPU Troubleshooting

Using “nvidia-smi”

The NVIDIA vGPU driver comes with a utility called the “NVIDIA System Management Interface”. This CLI program allows you to monitor, manage, and query your NVIDIA vGPU (including non-vGPU GPUs).

Screenshot of "nvidia-smi" command running on VMware ESXi host with NVIDIA GPU
NVIDIA vGPU “nvidia-smi” command

Simply running the command with no switches or flags, allow you to query and pull basic information on your vGPU, or multiple vGPUs.

For a list of available switches, you can run: “nvidia-smi -h”.

Running “nvidia-smi” on the ESXi Host

To use “nvidia-smi” on your VMware ESXi host, you’ll need to SSH in and/or enable console access.

When you launch “nvidia-smi” on the ESXi host, you’ll see information on the physical GPU, as well as the VM instances that are consuming a virtual GPU (vGPU). This usage will also provide information like fan speeds, temperatures, power usage and GPU utilization.

[root@ESXi-Host:~] nvidia-smi
Sat Mar  4 21:26:05 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.07    Driver Version: 525.85.07    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A2           On   | 00000000:04:00.0 Off |                  Off |
|  0%   36C    P8     8W /  60W |   7808MiB / 16380MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2108966    C+G   VM-WS02                          3904MiB |
|    0   N/A  N/A   2108989    C+G   VM-WS01                          3904MiB |
+-----------------------------------------------------------------------------+

This will aid with troubleshooting potential issues specific to the host or the VM. The following pieces of information are helpful:

  • Driver Version
  • GPU Fan and Temperature Information
  • Power Usage
  • GPU Utilization (GPU-Util)
  • ECC Information and Error Count
  • Virtual Machine VMs assigned a vGPU
  • vGPU Type (C+G means Compute and Graphics)

Additionally, instead of running once, you can issue “nvidia-smi -l x” replacing “x” with the number of seconds you’d like it to auto-loop and refresh.

Example:

nvidia-smi -l 3

The above would refresh and loop “nvidia-smi” every 3 seconds.

For vGPU specific information from the ESXi host, you can run:

nvidia-smi vgpu
root@ESXi-Host:~] nvidia-smi vgpu
Mon Mar  6 11:47:44 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.07              Driver Version: 525.85.07                 |
|---------------------------------+------------------------------+------------+
| GPU  Name                       | Bus-Id                       | GPU-Util   |
|      vGPU ID     Name           | VM ID     VM Name            | vGPU-Util  |
|=================================+==============================+============|
|   0  NVIDIA A2                  | 00000000:04:00.0             |   0%       |
|      3251713382  NVIDIA A2-4Q   | 2321577  VMWS01              |      0%    |
+---------------------------------+------------------------------+------------+

This command shows information on the vGPU instances currently provisioned.

There are also a number of switches you can throw at this to get even more information on vGPU including scheduling, vGPU types, accounting, and more. Run the following command to view the switches:

nvidia-smi vgpu -h

Another common switch I use on the ESXi host with vGPU for troubleshooting is: “nvidia-smi -q”, which provides lots of information on the physical GPU in the host:

[root@ESXi-HOST:~] nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                 : Sat Mar  4 21:26:18 2023
Driver Version                            : 525.85.07
CUDA Version                              : Not Found
vGPU Driver Capability
        Heterogenous Multi-vGPU           : Supported

Attached GPUs                             : 1
GPU 00000000:04:00.0
    Product Name                          : NVIDIA A2
    Product Brand                         : NVIDIA
    Product Architecture                  : Ampere
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    vGPU Device Capability
        Fractional Multi-vGPU             : Not Supported
        Heterogeneous Time-Slice Profiles : Supported
        Heterogeneous Time-Slice Sizes    : Not Supported
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Enabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : XXXN0TY0SERIALZXXX
    GPU UUID                              : GPU-de23234-3450-6456-e12d-bfekgje82743a
    Minor Number                          : 0
    VBIOS Version                         : 94.07.5B.00.92
    MultiGPU Board                        : No
    Board ID                              : 0x400
    Board Part Number                     : XXX-XXXXX-XXXX-XXX
    GPU Part Number                       : XXXX-XXX-XX
    Module ID                             : 1
    Inforom Version
        Image Version                     : G179.0220.00.01
        OEM Object                        : 2.0
        ECC Object                        : 6.16
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : Host VGPU
        Host VGPU Mode                    : SR-IOV
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x04
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x25B610DE
        Bus Id                            : 00000000:04:00.0
        Sub System Id                     : 0x157E10DE
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 1
                Device Current            : 1
                Device Max                : 4
                Host Max                  : N/A
            Link Width
                Max                       : 16x
                Current                   : 8x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
        Atomic Caps Inbound               : N/A
        Atomic Caps Outbound              : N/A
    Fan Speed                             : 0 %
    Performance State                     : P8
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 16380 MiB
        Reserved                          : 264 MiB
        Used                              : 7808 MiB
        Free                              : 8306 MiB
    BAR1 Memory Usage
        Total                             : 16384 MiB
        Used                              : 1 MiB
        Free                              : 16383 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : Disabled
        Pending                           : Disabled
    ECC Errors
        Volatile
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
        Aggregate
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows
        Correctable Error                 : 0
        Uncorrectable Error               : 0
        Pending                           : No
        Remapping Failure Occurred        : No
        Bank Remap Availability Histogram
            Max                           : 64 bank(s)
            High                          : 0 bank(s)
            Partial                       : 0 bank(s)
            Low                           : 0 bank(s)
            None                          : 0 bank(s)
    Temperature
        GPU Current Temp                  : 37 C
        GPU T.Limit Temp                  : N/A
        GPU Shutdown Temp                 : 96 C
        GPU Slowdown Temp                 : 93 C
        GPU Max Operating Temp            : 86 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 8.82 W
        Power Limit                       : 60.00 W
        Default Power Limit               : 60.00 W
        Enforced Power Limit              : 60.00 W
        Min Power Limit                   : 35.00 W
        Max Power Limit                   : 60.00 W
    Clocks
        Graphics                          : 210 MHz
        SM                                : 210 MHz
        Memory                            : 405 MHz
        Video                             : 795 MHz
    Applications Clocks
        Graphics                          : 1770 MHz
        Memory                            : 6251 MHz
    Default Applications Clocks
        Graphics                          : 1770 MHz
        Memory                            : 6251 MHz
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 1770 MHz
        SM                                : 1770 MHz
        Memory                            : 6251 MHz
        Video                             : 1650 MHz
    Max Customer Boost Clocks
        Graphics                          : 1770 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : 650.000 mV
    Fabric
        State                             : N/A
        Status                            : N/A
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2108966
            Type                          : C+G
            Name                          : VM-WS02
            Used GPU Memory               : 3904 MiB
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2108989
            Type                          : C+G
            Name                          : VM-WS01
            Used GPU Memory               : 3904 MiB

As you can see, you can pull quite a bit of information in detail from the vGPU, as well as the VM processes.

Running “nvidia-smi” on the VM Guest

You can also run “nvidia-smi” inside of the guest VM, which will provide you information on the vGPU instance that is being provided to that specific VM, along with information on the guest VM’s processes that are utilizing the GPU.

Screenshot of "nvidia-smi" running on guest virtual machine VM
“nvidia-smi” Running on Guest VM

This is helpful for providing information on the guest VM’s usage of the vGPU instance, as well as processes that require GPU usage.

Virtual Machine log files

Each Virtual Machine has a “vmware.log” file inside of the VM’s folder on the datastore.

To identify logging events pertaining to NVIDIA vGPU, you can search for the “vmiop” string inside of the vmware.log file.

Example:

cat /vmfs/volumes/DATASTORE/VirtualMachineName/vmware.log | grep -i vmiop

The above will read out any lines inside of the log that have the “vmiop” string inside of them. The “-i” flag instructs grep to ignore case sensitivity.

This logs provide initialization information, licensing information, as well as XID error codes and faults.

ESXi Host log files

Additionally, since the ESXi host is running the vGPU Host Driver (vGPU Manager), it also has logs that pertain and assist with vGPU troubleshooting.

Some commands you can run are:

cat /var/log/vmkernel.log | grep -i vmiop
cat /var/log/vmkernel.log | grep -i nvrm
cat /var/log/vmkernel.log | grep -i nvidia

The above commands will pull NVIDIA vGPU related log items from the ESXi log files.

Using “dxdiag” in the guest VM

Microsoft has a tool called “dxdiag” which provides diagnostic infromation for testing and troubleshooting video (and sound) with DirectX.

I find this tool very handy for quickly verifying

Microsoft DirectX "dxdiag" showing information on vGPU
NVIDIA vGPU with Microsoft DirectX “dxdiag” tool

As you can see:

  • DirectDraw Acceleration: Enabled
  • Direct3D Acceleration: Enabled
  • AGP Texture Acceleration: Enabled
  • DirectX 12 Ultimate: Enabled

The above show that hardware acceleration is fully functioning with DirectX. This is a indicator that things are generally working as expected. If you have a vGPU and one of the first three is showing as disabled, then you have a problem that requires troubleshooting. Additionally, if you do not see your vGPU card, then you have a problem that requires troubleshooting.

Please Note: You may not see “DirectX 12 Ultimate” as this is related to licensing.

Using the “VMware Horizon Performance Monitor”

The VMware Horizon Performance Monitor, is a great tool that can be installed by the VMware Horizon Agent, that allows you to pull information (stats, connection information, etc) for the session. Please note that this is not installed by default, and must be selected when running the Horizon Agent installer.

When it comes to troubleshooting vGPU, it’s handy to use this too to confirm you’re getting H.264 or H.265/HEVC offload from the vGPU instance, and also get information on how many FPS (Frames Per Second) you’re getting from the session.

VMware Horizon Performance Monitor showing vGPU NVIDIA NvEnc HEVC as encoder type
VMware Horizon Performance Tracker with NVIDIA vGPU

Once opening, you’ll change the view above using the specified selector, and you can see what the “Encoder Name” is being used to encode the session.

Examples of GPU Offload “Encoder Name” types:

  • NVIDIA NvEnc HEVC 4:2:0 – This is using the vGPU offload using HEVC
  • NVIDIA NvEnc HEVC 4:4:4 – This is using the vGPU offload using HEVC high color accuracy
  • NVIDIA NvEnc H264 4:2:0 – This is using the vGPU offload using H.264
  • NVIDIA NvEnc H264 4:4:4 – This is using the vGPU offload using H.264 high color accuracy

Examples of Software (CPU) Session “Encoder Name” types:

  • BlastCodec – New VMware Horizon “Blast Codec”
  • h264 4:2:0 – Software CPU encoded h.264

If you’re seeing “NVIDIA NvEnc” in the encoder name, then the encoding is being offloaded to the GPU resulting in optimum performance. If you don’t see it, it’s most likely using the CPU for encoding, which is not optimal if you have a vGPU, and requires further troubleshooting.

NVIDIA vGPU Known Issues

Depending on the version of vGPU that you are running, there can be “known issues”.

When viewing the NVIDIA vGPU Documentation, you can view known issues, and fixes that NVIDIA may provide. Please make sure to reference the documentation specific to the version you’re running and/or the version that fixes the issues you’re experiencing.

vGPU Common Problems

There are a number of common problems that I come across when I’m contacted to assist with vGPU deployments.

Please see below for some of the most common issues I experience, along with their applicable fix/workaround.

XID Error Codes

When viewing your Virtual Machine VM or ESXi log file, and experiencing an XID error or XID fault, you can usually look up the error codes.

Typically, vGPU errors will provide an “XiD Error” code, which can be looked up on NVIDIA’s Xid Messages page here: XID Errors :: GPU Deployment and Management Documentation (nvidia.com).

The table on this page allows you to lookup the XID code, find the cause, and also provides information if the issue is realted to “HW Error” (Hardware Error), “Driver Error”, “User App Error”, “System Memory Corruption”, “Bus Error”, “Thermal Issue”, or “FB Corruption”.

An example:

2023-02-26T23:33:24.396Z Er(02) vthread-2108265 - vmiop_log: (0x0): XID 45 detected on physical_chid:0x60f, guest_chid:0xf
2023-02-26T23:33:36.023Z Er(02) vthread-2108266 - vmiop_log: (0x0): Timeout occurred, reset initiated.
2023-02-26T23:33:36.023Z Er(02) vthread-2108266 - vmiop_log: (0x0): TDR_DUMP:0x52445456 0x00e207e8 0x000001cc 0x00000001
2023-02-26T23:33:36.023Z Er(02) vthread-2108266 - vmiop_log: (0x0): TDR_DUMP:0x00989680 0x00000000 0x000001bb 0x0000000f
2023-02-26T23:33:36.023Z Er(02) vthread-2108266 - vmiop_log: (0x0): TDR_DUMP:0x00000100 0x00000000 0x0000115e 0x00000000
2023-02-26T23:33:36.023Z Er(02) vthread-2108266 - vmiop_log: (0x0): TDR_DUMP:0x00000000 0x00000000 0x00001600 0x00000000
2023-02-26T23:33:36.023Z Er(02) vthread-2108266 - vmiop_log: (0x0): TDR_DUMP:0x00002214 0x00000000 0x00000000 0x00000000

2023-02-26T23:33:36.024Z Er(02) vthread-2108266 - vmiop_log: (0x0): TDR_DUMP:0x64726148 0x00736964 0x00000000 0x00000000
2023-02-26T23:33:36.068Z Er(02) vthread-2108265 - vmiop_log: (0x0): XID 43 detected on physical_chid:0x600, guest_chid:0x0

One can see XID code 45, as well as XID code 43, which after looking up on NVIDIA’s document, states:

  • XID 43 – GPU stopped processing
    • Possible Cause: Driver Error
    • Possible Cause: User App Error
  • XID 45 – Preemptive cleanup, due to previous errors — Most likely to see when running multiple cuda applications and hitting a DBE
    • Possible Cause: Driver Error

In the situation above, one can deduce that the issue is either Driver Error, Application Error, or a combination of both. In this specific case, you could try changing drivers to troubleshoot.

vGPU Licensing

You may experience issues in your vGPU deployment due to licensing issues. Depending on how you have you environment configured, you may be running in an unlicensed mode and not be aware.

In the event that the vGPU driver cannot obtain a valid license, it will run for 20 minutes with full capabilities. After that the performance and functionality will start to degrade. After 24 hours it will degrade even further.

Some symptoms of issues experienced when unlicensed:

  • Users experiencing laggy VDI sessions
  • Performance issues
  • Frames per Second (FPS) limited to 15 fps or 3 fps
  • Applications using OpenCL, CUDA, or other accelerated APIs fail

Additionally, some error messages and event logs may occur:

  • Event ID 2, “NVIDIA OpenGL Driver” – “The NVIDIA OpenGL driver has not been able to initialize a connection with the GPU.”
  • AutoCAD/Revit – “Hardware Acceleration is disabled. Software emulation mode is in use.”
  • “Guest is unlicensed”

Please see below for screenshots of said errors:

Additonally, when looking at the Virtual Machine VM vmware.log (inside of the VM’s folder on the ESXi datastore), you may see:

Guest is unlicensed. Cannot allocate more than 0x55 channels!
VGPU message 6 failed, result code: 0x1a

If this occurs, you’ll need to troubleshoot your vGPU licensing and resolve any issues occurring.

vGPU Type (vGPU Profile) mismatch

When using the default (“time-sliced”) vGPU deployment method, only a single vGPU type can be used on virtual machines or containers per physical GPU. Essentially all VMs or containers utilizing the physical GPU must use the same vGPU type.

If the physical GPU card has multiple GPUs (GPU chips), then a different type can be used on each physical GPU chip on the same card. 2 x GPUs on a single card = 2 different vGPU types.

Additionally, if you have multiple cards inside of a single host, the number of vGPU types you can deployed is based off the total number of GPUs across the total number of cards in your host.

If you configure multiple vGPU types and cannot support it, you will have issues starting VMs, as shown below:

Cannot power on VM with vGPU due to insufficient resources
Cannot power on VM with vGPU: Power on Failure, Insuffiecient resources

The error reads as follows:

Power On Failures

vCenter Server was unable to find a suitable host to power on the following virtual machines for the reasons listed below.

Insufficient resources. One or more devices (pciPassthru0) required by VM VDIWS01 are not available on host ESXi-Host.

Additionally, if provisioning via VMware Horizon, you may see: “NVIDIA GRID vGPU Support has detected a mismatch with the supported vGPUs”

Note: If you are using MIG (Multi Instance GPU), this does not apply as different MIG types can be applied to VMs from the same card/GPU.

vGPU or Passthrough with 16GB+ of Video RAM Memory

When attaching a vGPU to a VM, or passing through a GPU to a VM, with 16GB or more of Video RAM (Framebuffer memory), you may run in to a situation where the VM will not boot.

This is because the VM cannot map that large of memory space to be accesible for use.

Please see my blog post GPU or vGPU Passthrough with 16GB+ of video memory, for more information as well as the fix.

vGPU VM Freezes during VMware vMotion

Your users may report issues where their VDI guest VM freezes for a period of time during use. This could be caused due to VMware vMotion moving the virtual machine from one VMware ESXi host to another.

Please see my blog post NVIDIA vGPU VM Freezes during VMware vMotion: vGPU STUN Time for more information.

“ERR!” State

When experiencing issues, you may notice that “nvidia-smi” throws “ERR!” in the view. See the example below:

nvidia-smi showing ERR! error state on VMware ESXi host with vGPU
NVIDIA vGPU “nvidia-smi” reporting “ERR!”

This is an indicator that you’re in a fault or error state, and would recommend checking the ESXi Host log files, and the Virtual Machine log files for XID codes to identify the problem.

vGPU Driver Mismatch

When vGPU is deployed, drivers are installed on the VMware ESXi host (vGPU Manager Driver), as well as the guest VM virtual machine (guest VM driver).

Guest VM vGPU driver mismatch with VMware ESXi host
NVIDIA vGPU Driver Mismatch

These two drivers must be compatible with each other. As per NVIDIA’s Documentation, see below for compatibility:

  • NVIDIA vGPU Manager with guest VM drivers from the same release
  • NVIDIA vGPU Manager with guest VM drivers from different releases within the same major release branch
  • NVIDIA vGPU Manager from a later major release branch with guest VM drivers from the previous branch

Additionally, if you’re using the LTS (Long Term Support Branch), the additional compatibility note applies.

  • NVIDIA vGPU Manager from a later long-term support branch with guest VM drivers from the previous long-term support branch

If you have a vGPU driver mismatch, you’ll likely see Event ID 160 from “nvlddmkm” reporting:

NVIDIA driver version mismatch error: Guest driver is incompatible with host drive.

To resolve this, you’ll need to change drivers on the ESXi host and/or Guest VM to a supported combination.

Upgrading NVIDIA vGPU

When upgrading NVIDIA vGPU drivers on the host, you may experience issues or errors stating that the NVIDIA vGPU modules or services are loaded and in use, stopping your ability to upgrade.

Normally an upgrade would be preformed by placing the host in maintenance mode and running:

esxcli software vib update -d /vmfs/volumes/DATASTORE/Files/vGPU-15/NVD-VGPU-702_525.85.07-1OEM.702.0.0.17630552_21166599.zip

However, this fails due to modules that are loaded and in use by the NVIDIA vGPU Manager Services.

Before attempting to upgrade (or uninstall and re-install), place the host in maintenance mode and run the following command:

/etc/init.d/nvdGpuMgmtDaemon stop

This should allow you to proceed with the upgrade and/or re-install.

VMware Horizon Black Screen

If you experiencing a blank or black screen when connecting to a VDI session with an NVIDIA vGPU on VMware Horizon, it may not even be related to the vGPU deployment.

To troubleshoot the VMware Horizon Black Screen, please review my guide on how to troubleshoot a VMware Horizon Blank Screen.

VM High CPU RDY (High CPU Ready)

CPU RDY (CPU Ready) is a state when a VM is ready and waiting to be scheduled on a physical host’s CPU. In more detail, the VM’s vCPUs are ready to be scheduled on the ESXi host’s pCPUs.

In rare cases, I have observed situations where VMs with a vGPU and high CPU RDY times, experience instability. I believe this is due to timing conflicts with the vGPU’s time slicing, and the VM’s CPU waiting to be scheduled.

To check VM CPU RDY, you can use one of the following methods:

  1. Run “esxtop” from the CLI using the console or SSH
  2. View the hosts performance stats on vCenter
    • Select host, “Monitor”, “Advanced”, “Chart Options”, de-select all, select “Readiness Average %”

When viewing the CPU RDY time in a VDI environment, generally we’d like to see CPU RDY at 3 or lower. Anything higher than 3 may cause latency or user experience issues, or even vGPU issues at higher values.

For your server virtualization environment (non-VDI and no vGPU), CPU Ready times are not as big of a consideration.

vGPU Profiles Missing from VMware Horizon

When using newer GPUs with older versions of VMware Horizon, you may encounter an issue with non-persistent instant clones resulting in a provisioning error.

This is caused by missing vGPU Types or vGPU Profiles, and requires either downloading the latest definitions, or possibly creating your own.

For more information on this issue, please see my post NVIDIA A2 vGPU Profiles Missing from VMware Horizon causing provision failure.

Issues with the VMware Horizon Indirect Display Driver

You may experience vGPU (and GPU) related issues when using certain applications due to the presence of the VMware Horizon Indirect Display Driver in the Virtual Machine. This is due to the application either querying the incorrect Display Adapter (VMware Indirect Display Driver), or due to lack of multi-display adapter support in the application.

The application, when detecting vGPU and/or GPU capabilities, may query the Indirect Display Adapter, instead of the vGPU in the VM. Resulting in failing to detect the vGPU and/or GPU capabilities.

To workaround this issue, uninstall the VMware Horizon Indirect Display Adapter from the Device Manager in the VM. Please note that if you simply disable it, the issue will still occur as the device must be uninstalled from the Device Manager.

Additionally, under normal circumstances you do not want to modify, change, or remove this display adapter. However this is only a workaround if you are experiencing this issue. Subsequent updates of the VMware Horizon agent will re-install this adapter.

For more information on this issue, please see GPU issues with the VMware Horizon Indirect Display Driver.

Please see these these additional external links and resources which may assist.

Oct 252022
 
Screenshot of Horizon Agent for Linux on Ubuntu 22.04 LTS

Today I’m going to show you the process to install Horizon Agent for Linux on Ubuntu 22.04 LTS. We’ll be installing the Horizon Agent for Linux from VMware Horizon 8 version 2209.

The official documentation from VMware is helpful, but unfortunately doesn’t provide all the information to get up and running quickly, which is why I’ve put together this guide as a “Quick Start”.

Please note, that this is just a guide to get to the point where you can install NVIDIA vGPU drivers and have installed the Horizon Agent for Linux on the VM. This will provide you with a persistent VM that you can use with Horizon, and the instructions can be adapted for use in a non-persistent instant clone environment as well.

Screenshot of Horizon Agent for Linux on Ubuntu 22.04 LTS
Horizon Agent for Linux on Ubuntu 22.04 LTS

I highly recommend reading VMware’s documentation for Linux Desktops and Applications in Horizon.

Requirements

  • VMware Horizon 8 (I’m running VMware Horizon 8 2209)
  • Horizon Enterprise or Horizon for Linux Licensing
  • Ubuntu 22.04 LTS Installer ISO (download here)
  • Horizon Agent for Linux (download here)
  • Functioning internal DNS

Instructions

  1. Create a VM on your vCenter Server, attached the Ubuntu 22.04 LTS ISO, and install Ubuntu
  2. Install any Root CA’s or modifications you need for network access (usually not needed unless you’re on an enterprise network)
  3. Update Ubuntu as root
    apt update
    apt upgrade
    reboot
  4. Install software needed for VMware Horizon Agent for Linux as root
    apt install make gcc libglvnd-dev open-vm-tools open-vm-tools-dev open-vm-tools-desktop
  5. Install your software (Chrome, etc.)
  6. Install NVIDIA vGPU drivers if you are using NVIDIA vGPU (this must be performed before install the Horizon Agent). Make sure the installer modifies and configures the X configuration files.
  7. Install the Horizon Agent For Linux as root (accepting TOS, enabling audio, and disabling SSO).
    See Command-line Options for Installing Horizon Agent for Linux
    ./install_viewagent.sh -A yes -a yes -S no
  8. Reboot the Ubuntu VM
  9. Log on to your Horizon Connection Server
  10. Create a manual pool and configure it
  11. Add the Ubuntu 22.04 LTS VM to the manual desktop pool
  12. Entitle the User account to the desktop pool and assign to the VM
  13. Connect to the Ubuntu 22.04 Linux VDI VM from the VMware Horizon Client

You should now be able to connect to the Ubuntu Linux VDI VM using the VMware Horizon client. Additionally, if you installed the vGPU drivers for NVIDIA vGPU, you should have full 3D acceleration and functionality.

Oct 032022
 
NVIDIA A2 vGPU

When deploying automated desktop pools with NVIDIA vGPU on VMware Horizon with an NVIDIA A2 GPU, you may notice provisioning fails with an error.

Error during Provisioning Cloning of VM VM-NAME-01 has failed: Fault type is UNKNOWN_FAULT_FATAL - No GPU capable host available for provisioning VM-NAME-01 with profile nvidia_a2-4q. Please refer to VMware KB 59271 for more details.

Further, when visiting VMware KB 59271 and performing the instructions, provisioning still continues to fail.

Screenshot of error message Automated vGPU Desktop Pool fails to provision due to missing vGPU profiles
Automated vGPU Desktop Pool fails to provision due to missing vGPU profiles

Essentially, at present there is no “supported” to resolve this issue without applying the fix listed in this post. Additionally, if you’re a VMware customer with an active support agreement, I would recommend opening a ticket with VMware Support so that it can be addressed in a future release.

The Problem

The NVIDIA A2 GPU is fairly new, along with VMware vSphere support. Even newer, is the support for vGPU and VMware Horizon, requiring the latest drivers (vGPU Drivers versions 14.2 released August 2022) to enable vGPU profiles for the card.

After troubleshooting this, I noted that the “graphic-profiles.properties” file in “C:\Program Files\VMware\VMware View\Server\broker\conf” did not contain any NVIDIA A2 vGPU Profiles. Additionally, the file available on the VMware KB was also missing these profiles.

The Fix

To fix this, I referenced the NVIDIA vGPU User Guide to note the vGPU profiles allowed on the card, and created my own entries for the configuration file.

After adding these entries, restarting the server (or service), I was able to provision NVIDIA A2 enabled vGPU desktop pools.

To resolve this issue, add the following entries to your “graphic-profiles.properties” file in “C:\Program Files\VMware\VMware View\Server\broker\conf” (note, the contents of the file is case-sensitive):

# NVIDIA A2 Profiles
# Q-Series Virtual GPU Types for NVIDIA A2
nvidia_a2-16q=1
nvidia_a2-8q=2
nvidia_a2-4q=4
nvidia_a2-2q=8
nvidia_a2-1q=16

# B-Series Virtual GPU Types for NVIDIA A2
nvidia_a2-2b=8
nvidia_a2-1b=16

# C-Series Virtual GPU Types for NVIDIA A2
nvidia_a2-16c=1
nvidia_a2-8c=2
nvidia_a2-4c=4

# A-Series Virtual GPU Types for NVIDIA A2
nvidia_a2-16a=1
nvidia_a2-8a=2
nvidia_a2-4a=4
nvidia_a2-2a=8
nvidia_a2-1a=16

After restarting the server or services, you should now be able to use the NVIDIA A2 vGPU profiles with VMware Horizon automated (vGPU) desktop pools.

You should be able to use this fix for other new vGPU cards that have been recently released where the profiles have not been configured for Horizon. VMware is likely to fix this in future released of VMware Horizon.

Aug 142022
 
HP Printer on VDI

When it comes to troubleshooting login times with non-persistent VDI (VMware Horizon Instant Clones), I often find delays associated with printer drivers not being included in the golden image. In this post, I’m going to show you how to add a printer driver to an Instant Clone golden image!

Printing with non-persistent VDI and Instant Clones

In most environments, printers will be mapped for users during logon. If a printer is mapped or added and the driver is not added to the golden image, it will usually be retrieved from the print server and installed, adding to the login process and ultimately leading to a delay.

Due of the nature of non-persistent VDI and Instant Clones, every time the user goes to login and get’s a new VM, the driver will then be downloaded and installed each of these times, creating a redundant process wasting time and network bandwidth.

To avoid this, we need to inject the required printer drivers in to the golden image. You can add numerous drivers and should include all the drivers that any and all the users are expecting to use.

An important consideration: Try using Universal Print Drivers as much as possible. Universal Printer Drivers often support numerous different printers, which allows you to install one driver to support many different printers from the same vendor.

How to add a printer driver to an instant clone golden image

Below, I’ll show you how to inject a driver in to the Instant Clone golden image. Note that this doesn’t actually add a printer, but only installs the printer driver in to the Windows operating system so it is available for a printer to be configured and/or mapped.

Let’s get started! In this example we’ll add the HP Universal Driver. These instructions work on both Windows 10 and Windows 11 (as well as Windows Server operating systems):

  1. Click Start, type in “Print Management” and open the “Print Management”. You can also click Start, Run, and type “printmanagement.msc”.
    Launch Print Management
  2. On the left hand side, expand “Print Servers”, then expand your computer name, and select “Drivers”.
    Print Management Drivers
  3. Right click on “Drivers” and select “Add Driver”.
    Print Management Add Driver
  4. When the “Welcome to the Add Printer Driver Wizard” opens, click Next.
    Add Printer Driver Wizard
  5. Leave the default for the architecture. It should default to the architecture of the golden image.
  6. When you are at the “Printer Driver Selection” stage, click on “Have Disk”.
    Print Management Add Printer Driver Location
  7. Browse to the location of your printer driver. In this example, we navigate to the extracted HP Universal Print Driver.
    Browse Printer Driver Location
  8. Select the driver you want to install.
    VDI Select Printer Driver to Install
  9. Click on Finish to complete the driver installation.
    Finish installing Instant Clone Printer Driver

The driver you installed should now appear in the list as it has been installed in to the operating system and is now available should a user add a printer, or have a printer automatically mapped.

Screenshot of Printer Driver installed on non-persistent VDI Instant Clone golden image
Printer Driver installed on Non-Persistent Instance Clone Golden Image

Now seal, snap, and deploy your image, and you’re good to go!

Jul 172022
 
VMware vSphere ESXi with vTPM from NKP

It’s been coming for a while: The requirement to deploy VMs with a TPM module… Today I’ll be showing you the easiest and quickest way to create and deploy Virtual Machines with vTPM on VMware vSphere ESXi!

As most of you know, Windows 11 has a requirement for Secureboot as well as a TPM module. It’s with no doubt that we’ll also possibly see this requirement with future Microsoft Windows Server operating systems.

While users struggle to deploy TPM modules on their own workstations to be eligible for the Windows 11 upgrade, ESXi administrators are also struggling with deploying Virtual TPM modules, or vTPM modules on their virtualized infrastructure.

What is a TPM Module?

TPM stands for Trusted Platform Module. A Trusted Platform Module, is a piece of hardware (or chip) inside or outside of your computer that provides secured computing features to the computer, system, or server that it’s attached to.

This TPM modules provides things like a random number generator, storage of encryption keys and cryptographic information, as well as aiding in secure authentication of the host system.

In a virtualization environment, we need to emulate this physical device with a Virtual TPM module, or vTPM.

What is a Virtual TPM (vTPM) Module?

A vTPM module is a virtualized software instance of a traditional physical TPM module. A vTPM can be attached to Virtual Machines and provide the same features and functionality that a physical TPM module would provide to a physical system.

vTPM modules can be can be deployed with VMware vSphere ESXi, and can be used to deploy Windows 11 on ESXi.

Deployment of vTPM modules, require a Key Provider on the vCenter Server.

For more information on vTPM modules, see VMware’s “Virtual Trust Platform Module Overview” documentation.

Deploying vTPM (Virtual TPM Modules) on VMware vSphere ESXi

In order to deploy vTPM modules (and VM encryption, vSAN Encryption) on VMware vSphere ESXi, you need to configure a Key Provider on your vCenter Server.

Traditionally, this would be accomplished with a Standard Key Provider utilizing a Key Management Server (KMS), however this required a 3rd party KMS server and is what I would consider a complex deployment.

VMware has made this easy as of vSphere 7 Update 2 (7U2), with the Native Key Provider (NKP) on the vCenter Server.

The Native Key Provider, allows you to easily deploy technologies such as vTPM modules, VM encryption, vSAN encryption, and the best part is, it’s all built in to vCenter Server.

Enabling VMware Native Key Provider (NKP)

To enable NKP across your vSphere infrastructure:

  1. Log on to your vCenter Server
  2. Select your vCenter Server from the Inventory List
  3. Select “Key Providers”
  4. Click on “Add”, and select “Add Native Key Provider”
  5. Give the new NKP a friendly name
  6. De-select “Use key provider only with TPM protected ESXi hosts” to allow your ESXi hosts without a TPM to be able to use the native key provider.

In order to activate your new native key provider, you need to click on “Backup” to make sure you have it backed up. Keep this backup in a safe place. After the backup is complete, you NKP will be active and usable by your ESXi hosts.

Screenshot of VMware vCenter Server with Native Key Provider (NKP) Configured
VMware vCenter with Native Key Provider (NKP) Configured

There’s a few additional things to note:

  • Your ESXi hosts do NOT require a physical TPM module in order to use the Native Key Provider
    • Just make sure you disable the checkbox “Use key provider only with TPM protected ESXi hosts”
  • NKP can be used to enable vTPM modules on all editions of vSphere
  • If your ESXi hosts have a TPM module, using the Native Key Provider with your hosts TPM modules can provide enhanced security
    • Onboard TPM module allows keys to be stored and used if the vCenter server goes offline
  • If you delete the Native Key Provider, you are also deleting all the keys stored with it.
    • Make sure you have it backed up
    • Make sure you don’t have any hosts/VMs using the NKP before deleting

You can now deploy vTPM modules to virtual machines in your VMware environment.

Jun 182022
 
Nvidia GRID Logo

When performing a VMware vMotion on a Virtual Machine with an NVIDIA vGPU attached to it, the VM may freeze during migration. Additionally, when performing a vMotion on a VM without a vGPU, the VM does not freeze during migration.

So why is it that adding a vGPU to a VM causes it to become frozen during vMotion? This is referred to as the VM Stun Time.

I’m going to explain why this happens, and what you can do to reduce these STUN times.

VMware vMotion

First, let’s start with traditional vMotion without a vGPU attached.

VMware vMotion with vSphere and ESXi
VMware vMotion with vSphere

vMotion allows us to live migrate a Virtual Machine instance from one ESXi host, to another, with (visibly) no downtime. You’ll notice that I put “visibly” in brackets…

When performing a vMotion, vSphere will migrate the VM’s memory from the source to destination host and create checkpoints. It will then continue to copy memory deltas including changes blocks after the initial copy.

Essentially vMotion copies the memory of the instance, then initiates more copies to copy over the changes after the original transfer was completed, until the point where it’s all copied and the instance is now running on the destination host.

VMware vMotion with vGPU

For some time, we have had the ability to perform a vMotion with a VM that as a GPU attached to it.

VMware vSphere with NVIDIA vGPU
VMware VMs with vGPU

However, in this situation things work slightly different. When performing a vMotion, it’s not only the system RAM memory that needs to be transferred, but the GPU’s memory (VRAM) as well.

Unfortunately the checkpoint/delta transfer technology that’s used with then system RAM isn’t available to transfer the GPU, which means that the VM has to be stunned (frozen) to stop it so that the video RAM can be transferred and then the instance can be initialized on the destination host.

STUN Time

The STUN time is essentially the time it takes to transfer the video RAM (framebuffer) from one host to another.

When researching this, you may find examples of the time it takes to transfer various sizes of VRAM. An example would be from VMware’s documentation “Using vMotion to Migrate vGPU Virtual Machines“:

NVIDIA vGPU Estimated STUN Times
Expected STUN Times for vMotion with vGPU on 10Gig vMotion NIC

However, it will always vary depending on a number of factors. These factors include:

  • vMotion Network Speed
  • vMotion Network Optimization
    • Multi-NIC vMotion to utilize multiple NICs
    • Multi-vmk vMotion to optimize and saturate single NICs
  • Server Load
  • Network Throughput
  • The number of VM’s that are currently being migrated with vMotion

As you can see, there’s a number of things that play in to this. If you have a single 10Gig link for vMotion and you’re migrating many VMs with a vGPU, it’s obviously going to take longer than if you were just migrating a single VM with a vGPU.

Optimizing and Minimizing vGPU STUN Time

There’s a number of things we can look at to minimize the vGPU STUN times. This includes:

  • Upgrading networking throughput with faster NICs
  • Optimizing vMotion (Configure multiple vMotion VMK adapters to saturate a NIC)
  • Configure Multi-NIC vMotion (Utilize multiple physical NICs to increase vMotion throughput)
  • Reduce DRS aggressiveness
  • Migrate fewer VMs at the same time

All of the above can be implemented together, which I would actually recommend.

In short, the faster we migrate the VM, the less the STUN Time will be. Check out my blog post on Optimizing VMware vMotion which includes how to perform the above recommendations.

Hope this helps!