If you’re like me, you want to make sure that your environment is as optimized as possible. I recently noticed that my NVIDIA A2 vGPU was reporting the vGPU PCIe Link Speed and Generation that the card was using was below what it was supposed to be running at on my VMware vSphere ESXi host.
I needed to find out if this was being reported incorrectly, if there was an issue, or something else effecting this. In my case, the specific GPU I was using is supposed to support PCIe Gen4, and has a physical connector supporting 4x, my host has PCIe Gen3 slots, so I should at least be getting Gen3 speeds.
When running the command “nvidia-smi -q”, the GPU was reporting that it was only running at PCIe Gen 1 speeds, as shown below:
GPU Link Info
Max : 3
Current : 1
Device Current : 1
Device Max : 4
Host Max : N/A
Max : 16x
Current : 8x
Something else worth noting, is that the card states that it supports a 16x interface, when it actually only physical has a 8x connector. I believe they use this chip on another board that has multiple GPUs on a single board that actually supports 16x.
You could say I was quite puzzled. Why would the card only be running at PCIe Generation 1 speeds? I thought it could be any of the scenarios below:
Dynamic mode that alternates when required (possibly for power savings)
Hardware Limitation (I’m using this in an older server)
Unfortunately, when searching the internet, I couldn’t find many references to this metric, however I did find references from other user’s copy/pastes of “nvidia-smi -q” which had the same values (running PCIe Gen1), even with beefier and more high-end cards.
After some more searching, I finally came across an NVIDIA technical document titled “Useful nvidia-smi Queries” that states that the current PCIe Generation Link speed “may be reduced when the GPU is not in use”. This confirms that it’s dynamic and adjusts when needed.
Finally, I decided to give some games a shot in a couple of the VMs, and to my surprise when running a game, the “Device Current” and “Current” PCIe Generation changed to PCIe Gen3 (note that my server isn’t capable of PCIe Gen4, which is the cards maximum), as shown below:
GPU Link Info
Max : 3
Current : 3
Device Current : 3
Device Max : 4
Host Max : N/A
Max : 16x
Current : 8x
In conclusion, if you notice this in your environment, do not be alarmed as this is completely normal and expected behavior.
When deploying automated desktop pools with NVIDIA vGPU on VMware Horizon with an NVIDIA A2 GPU, you may notice provisioning fails with an error.
Error during Provisioning Cloning of VM VM-NAME-01 has failed: Fault type is UNKNOWN_FAULT_FATAL - No GPU capable host available for provisioning VM-NAME-01 with profile nvidia_a2-4q. Please refer to VMware KB 59271 for more details.
Further, when visiting VMware KB 59271 and performing the instructions, provisioning still continues to fail.
Essentially, at present there is no “supported” to resolve this issue without applying the fix listed in this post. Additionally, if you’re a VMware customer with an active support agreement, I would recommend opening a ticket with VMware Support so that it can be addressed in a future release.
The NVIDIA A2 GPU is fairly new, along with VMware vSphere support. Even newer, is the support for vGPU and VMware Horizon, requiring the latest drivers (vGPU Drivers versions 14.2 released August 2022) to enable vGPU profiles for the card.
After troubleshooting this, I noted that the “graphic-profiles.properties” file in “C:\Program Files\VMware\VMware View\Server\broker\conf” did not contain any NVIDIA A2 vGPU Profiles. Additionally, the file available on the VMware KB was also missing these profiles.
To fix this, I referenced the NVIDIA vGPU User Guide to note the vGPU profiles allowed on the card, and created my own entries for the configuration file.
After adding these entries, restarting the server (or service), I was able to provision NVIDIA A2 enabled vGPU desktop pools.
To resolve this issue, add the following entries to your “graphic-profiles.properties” file in “C:\Program Files\VMware\VMware View\Server\broker\conf” (note, the contents of the file is case-sensitive):
After restarting the server or services, you should now be able to use the NVIDIA A2 vGPU profiles with VMware Horizon automated (vGPU) desktop pools.
You should be able to use this fix for other new vGPU cards that have been recently released where the profiles have not been configured for Horizon. VMware is likely to fix this in future released of VMware Horizon.
When either directly passing through a GPU, or attaching an NVIDIA vGPU to a Virtual Machine on VMware ESXi that has more than 16GB of Video Memory, you may run in to a situation where the VM fails to boot with the error “Module ‘DevicePowerOn’ power on failed.”. Special considerations are required when performing GPU or vGPU Passthrough with 16GB+ of video memory.
This issue is specifically caused by memory mapping a GPU or vGPU device that has 16GB of memory or higher, and could involve both the host system (the ESXi host) and/or the Virtual Machine configuration.
In this post, I’ll address the considerations and requirements to passthrough these devices to virtual machines in your environment.
In the order of occurrence, it’s usually VM configuration related, however if the recommendations in the “VM Configuration Considerations” section do not resolve the issue, please proceed to reviewing the “ESXi Host Considerations” section.
Please note that if the issue is host related, other errors may be present, or the device may not even be visible to ESXi.
VM GPU and vGPU Configuration Considerations
First and foremost, all new VMs should be created using the “EFI” Firmware type. EFI provides numerous advantages in device access and memory mapping versus the older style “BIOS” firmware types.
To do this, create a new virtual machine, navigate to “VM Options”, expand “Boot Options”, and confirm/change the Firmware to “EFI”. I recommend this for all new VMs, and not only for VMs accessing GPUs or vGPUs with over 16GB of memory. Please note that you shouldn’t change an existing VM, and should do this on a fresh new VM.
With performing GPU or vGPU Passthrough with 16GB+ of video memory, you’ll need to create a couple of entries under “Advanced” settings to properly configure access to these PCIe devices and provide the proper environment for memory mapping. The lack of these settings is specifically what causes the “Module ‘DevicePowerOn’ power on failed.” error.
Under the VM settings, head over to “VM Options”, expand “Advanced” and click on “Edit Configuration”, click on “Add Configuration Params”, and add the following entries:
You’ll notice that while our GPU or vGPU profile may have 16GB of memory, we need to double that value, and set it for the “pciPassthru.64bitMMIOSizeGB” variable. If your card or vGPU profile had 32GB, you’d set it to “64”.
Additionally if you were passing through multiple GPUs or vGPU devices, you’d need to factor all the memory being mapped, and double the combined amount.
ESXi GPU and vGPU Host Considerations
On most new and modern servers, the host level doesn’t require any special configuration as they are already designed to pass through such devices to the hypervisor properly. However in some special cases, and/or when using older servers, you may need to modify configuration and settings in the UEFI or BIOS.
If setting the VM Configuration above still results in the same error (or possibly other errors), than you most likely need to make modifications to the ESXi hosts BIOS/UEFI/RBSU to allow the proper memory mapping of the PCIe device, in our case being the GPU.
This is where things get a bit tricky because every server manufacturer has different settings that will need to be configured.
Look for the following settings, or settings with similar terminology:
“Memory Mapping Above 4G”
“Above 4G Decoding”
“PCI Express 64-Bit BAR Support”
“64-Bit IOMMU Mapping”
Once you find the correct setting or settings, enable them.
Every vendor could be using different terminology and there may be other settings that need to be configured that I don’t have listed above. In my case, I had to go in to a secret “SERVICE OPTIONS” menu on my HPE Proliant DL360p Gen8, as documented here.
After performing the recommendations in this guide, you should now be able to passthrough devices with over 16GB of memory.
When performing a VMware vMotion on a Virtual Machine with an NVIDIA vGPU attached to it, the VM may freeze during migration. Additionally, when performing a vMotion on a VM without a vGPU, the VM does not freeze during migration.
So why is it that adding a vGPU to a VM causes it to become frozen during vMotion? This is referred to as the VM Stun Time.
I’m going to explain why this happens, and what you can do to reduce these STUN times.
First, let’s start with traditional vMotion without a vGPU attached.
vMotion allows us to live migrate a Virtual Machine instance from one ESXi host, to another, with (visibly) no downtime. You’ll notice that I put “visibly” in brackets…
When performing a vMotion, vSphere will migrate the VM’s memory from the source to destination host and create checkpoints. It will then continue to copy memory deltas including changes blocks after the initial copy.
Essentially vMotion copies the memory of the instance, then initiates more copies to copy over the changes after the original transfer was completed, until the point where it’s all copied and the instance is now running on the destination host.
VMware vMotion with vGPU
For some time, we have had the ability to perform a vMotion with a VM that as a GPU attached to it.
However, in this situation things work slightly different. When performing a vMotion, it’s not only the system RAM memory that needs to be transferred, but the GPU’s memory (VRAM) as well.
Unfortunately the checkpoint/delta transfer technology that’s used with then system RAM isn’t available to transfer the GPU, which means that the VM has to be stunned (frozen) to stop it so that the video RAM can be transferred and then the instance can be initialized on the destination host.
The STUN time is essentially the time it takes to transfer the video RAM (framebuffer) from one host to another.
However, it will always vary depending on a number of factors. These factors include:
vMotion Network Speed
vMotion Network Optimization
Multi-NIC vMotion to utilize multiple NICs
Multi-vmk vMotion to optimize and saturate single NICs
The number of VM’s that are currently being migrated with vMotion
As you can see, there’s a number of things that play in to this. If you have a single 10Gig link for vMotion and you’re migrating many VMs with a vGPU, it’s obviously going to take longer than if you were just migrating a single VM with a vGPU.
Optimizing and Minimizing vGPU STUN Time
There’s a number of things we can look at to minimize the vGPU STUN times. This includes:
Upgrading networking throughput with faster NICs
Optimizing vMotion (Configure multiple vMotion VMK adapters to saturate a NIC)
Originally when an environment was configured with an Nvidia GRID K1 or K2 card, not only does the card provide 3D acceleration and rendering, but it also offloads the VMware BLAST h264 stream (the visual session) so that the CPU doesn’t have to. This results in less CPU usage and provides a streamlined experience for the user.
This functionality was handled via NVFBC (Nvidia Frame Buffer Capture) which was part of the Nvidia Capture SDK (formerly known as GRID SDK). This function allowed the video card to capture the video frame buffer and encode it using NVENC (Nvidia Encoder).
Ultimately after spending hours troubleshooting, I learned that NVFBC has been deprecated and is no longer support, hence why it’s no longer functioning. I also checked and noticed that tools (such as nvfbcenable) were no longer bundled with the VMware Horizon agent. One can assume that the agent doesn’t even attempt to check or use this function.
Before I was aware of this, I noticed that while 3D Acceleration and graphics were functioning, I was experiencing high CPU usage. Upon further investigation I noticed that my VMware BLAST sessions were not offloading h264 encoding to the video card.
You’ll notice above that under the “Encoder” section, the “Encoder Name” was listed as “h264 4:2:0”. Normally this would say “NVIDIA NvEnc H264” (or “NVIDIA NvEnc HEVC” on newer cards) if it was being offloaded to the GPU.
Looking at a VMware Blast session (Blast-Worker-SessionId1.log), the following lines can be seen.
You’ll notice it tries to load the proper functions, however it fails.
Unfortunately the only solution is to upgrade to newer or different hardware.
The GRID K1 and GRID K2 cards have reached their EOL (End of Life) and are no longer support. The drivers are not being maintained or updated so I doubt they will take advantage of the newer frame buffer capture functions of Windows 10.
Newer hardware and solutions have incorporated this change and use a different means of frame buffer capture.
To resolve this in my own homelab, I plan to migrate to an AMD FirePro S7150x2.
Since I’ve installed and configured my Nvidia GRID K1, I’ve been wanting to do a graphics quality demo video. I finally had some time to put a demo together.
I wanted to highlight what type of graphics can be achieved in a VDI environment. Even using an old Nvidia GRID K1 card, we can still achieve amazing graphical performance in a virtual desktop environment.
This demo outlines 3D accelerated graphics provided by vGPU.
Please see below for the video:
VMware Horizon View 7.8
NVidia GRID K1
GRID vGPU Profile: GRID K180q
HPE ML310e Gen8 V2
ESXi 6.5 U2
Virtual Desktop: Windows 10 Enterprise
Game: Steam – Counter-Strike Global Offensive (CS:GO)
Resolution of the Virtual Desktop is set to 1024×768
Blast Extreme is the protocol used
Graphics on game are set to max
Motion is smooth in person, screen recorder caused some jitter
This video was then edited on that VM using CyberLink PowerDirector
I can’t tell you how excited I am that after many years, I’ve finally gotten my hands on and purchased an Nvidia Quadro K1 GPU. This card will be used in my homelab to learn, and demo Nvidia GRID accelerated graphics on VMware Horizon View. In this post I’ll outline the details, installation, configuration, and thoughts. And of course I’ll have plenty of pictures below!
The focus will be to use this card both with vGPU, as well as 3D accelerated vSGA inside in an HPE server running ESXi 6.5 and VMware Horizon View 7.8.
Please Note: Some, most, or all of what I’m doing is not officially supported by Nvidia, HPE, and/or VMware. I am simply doing this to learn and demo, and there was a real possibility that it may not have worked since I’m not following the vendor HCL (Hardware Compatibility lists). If you attempt to do this, or something similar, you do so at your own risk.
For some time I’ve been trying to source either an Nvidia GRID K1/K2 or an AMD FirePro S7150 to get started with a simple homelab/demo environment. One of the reasons for the time it took was I didn’t want to spend too much on it, especially with the chances it may not even work.
Essentially, I have 3 Servers:
HPE DL360p Gen8 (Dual Proc, 128GB RAM)
HPE DL360p Gen8 (Dual Proc, 128GB RAM)
HPE ML310e Gen8 v2 (Single Proc, 32GB RAM)
For the DL360p servers, while the servers are beefy enough, have enough power (dual redundant power supplies), and resources, unfortunately the PCIe slots are half-height. In order for me to use a dual-height card, I’d need to rig something up to have an eGPU (external GPU) outside of the server.
As for the ML310e, it’s an entry level tower server. While it does support dual-height (dual slot) PCIe cards, it only has a single 350W power supply, misses some fancy server technologies (I’ve had issues with VT-d, etc), and only a single processor. I should be able to install the card, however I’m worried about powering it (it has no 6pin PCIe power connector), and having ESXi be able to use it.
Finally, I was worried about cooling. The GRID K1 and GRID K2 are typically passively cooled and meant to be installed in to rack servers with fans running at jet engine speeds. If I used the DL360p with an external setup, this would cause issues. If I used the ML310e internally, I had significant doubts that cooling would be enough. The ML310e did have the plastic air baffles, but only had one fan for the expansion cards area, and of course not all the air would pass through the GRID K1 card.
Because of a limited budget, and the possibility I may not even be able to get it working, I didn’t want to spend too much. I found an eBay user local in my city who had a couple Grid K1 and Grid K2 cards, as well as a bunch of other cool stuff.
We spoke and he decided to give me a wicked deal on the Grid K1 card. I thought this was a fantastic idea as the power requirements were significantly less (more likely to work on the ML310e) on the K1 card at 130 W max power, versus the K2 card at 225 W max power.
We set a time and a place to meet. Preemptively I ran out to a local supply store to purchase an LP4 power adapter splitter, as well as a LP4 to 6pin PCIe power adapter. There were no available power connectors inside of the ML310e server so this was needed. I still thought the chances of this working were slim…
I also decided to go ahead and download the Nvidia GRID Software Package. This includes the release notes, user guide, ESXi vib driver (includes vSGA, vGPU), as well as guest drivers for vGPU and pass through. The package also includes the GRID vGPU Manager. The driver I used was from: https://www.nvidia.com/Download/driverResults.aspx/144909/en-us
To install, I copied over the vib file “NVIDIA-vGPU-kepler-VMware_ESXi_6.5_Host_Driver_367.130-1OEM.6220.127.116.1198673.vib” to a datastore, enabled SSH, and then ran the following command to install:
The command completed successfully and I shut down the host. Now I waited to meet.
We finally met and the transaction went smooth in a parking lot (people were staring at us as I handed him cash, and he handed me a big brick of something folded inside of grey static wrap). The card looked like it was in beautiful shape, and we had a good but brief chat. I’ll definitely be purchasing some more hardware from him.
Installing the card in the ML310e was difficult and took some time with care. First I had to remove the plastic air baffle. Then I had issues getting it inside of the case as the back bracket was 1cm too long to be able to put the card in. I had to finesse and slide in on and angle but finally got it installed. The back bracket (front side of case) on the other side slid in to the blue plastic case bracket. This was nice as the ML310e was designed for extremely long PCIe expansion cards and has a bracket on the front side of the case to help support and hold the card up as well.
For power I disconnected the DVD-ROM (who uses those anyways, right?), and connected the LP5 splitter and the LP5 to 6pin power adapter. I finally hooked it up to the card.
I laid the cables out nicely and then re-installed the air baffle. Everything was snug and tight.
Please see below for pictures of the Nvidia GRID K1 installed in the ML310e Gen8 V2.
Powering on the server was a tense moment for me. A few things could have happened:
Server won’t power on
Server would power on but hang & report health alert
Nvidia GRID card could overheat
Nvidia GRID card could overheat and become damaged
Nvidia GRID card could overheat and catch fire
Server would boot but not recognize the card
Server would boot, recognize the card, but not work
Server would boot, recognize the card, and work
With great suspense, the server powered on as per normal. No errors or health alerts were presented.
I logged in to iLo on the server, and watched the server perform a BIOS POST, and start it’s boot to ESXi. Everything was looking well and normal.
After ESXi booted, and the server came online in vCenter. I went to the server and confirmed the GRID K1 was detected. I went ahead and configured 2 GPUs for vGPU, and 2 GPUs for 3D vSGA.
I restarted the X.org service (required when changing the options above), and proceeded to add a vGPU to a virtual machine I already had configured and was using for VDI. You do this by adding a “Shared PCI Device”, selecting “NVIDIA GRID vGPU”, and I chose to use the highest profile available on the K1 card called “grid_k180q”.
After adding and selecting ok, you should see a warning telling you that must allocate and reserve all resources for the virtual machine, click “ok” and continue.
Power On and Testing
I went ahead and powered on the VM. I used the vSphere VM console to install the Nvidia GRID driver package (included in the driver ZIP file downloaded earlier) on the guest. I then restarted the guest.
After restarting, I logged in via Horizon, and could instantly tell it was working. Next step was to disable the VMware vSGA Display Adapter in the “Device Manager” and restart the host again.
Upon restarting again, to see if I had full 3D acceleration, I opened DirectX diagnostics by clicking on “Start” -> “Run” -> “dxdiag”.
It worked! Now it was time to check the temperature of the card to make sure nothing was overheating. I enabled SSH on the ESXi host, logged in, and ran the “nvidia-smi” command.
According to this, the different GPUs ranged from 33C to 50C which was PERFECT! Further testing under stress, and I haven’t gotten a core to go above 56. The ML310e still has an option in the BIOS to increase fan speed, which I may test in the future if the temps get higher.
With “nvidia-smi” you can see the 4 GPUs, power usage, temperatures, memory usage, GPU utilization, and processes. This is the main GPU manager for the card. There are some other flags you can use for relevant information.
Overall I’m very impressed, and it’s working great. While I haven’t tested any games, it’s working perfect for videos, music, YouTube, and multi-monitor support on my 10ZiG 5948qv. I’m using 2 displays with both running at 1920×1080 for resolution.
I’m looking forward to doing some tests with this VM while continuing to use vGPU. I will also be doing some testing utilizing 3D Accelerated vSGA.
The two coolest parts of this project are:
3D Acceleration and Hardware h.264 Encoding on VMware Horizon
Getting a GRID K1 working on an HPE ML310e Gen8 v2
Highly recommend getting a setup like this for your own homelab!
Uses and Projects
Well, I’m writing this “Uses and Projects” section after I wrote the original article (it’s now March 8th, 2020). I have to say I couldn’t be impressed more with this setup, using it as my daily driver.
Since I’ve set this up, I’ve used it remotely while on airplanes, working while travelling, even for video editing.
Some of the projects (and posts) I’ve done, can be found here:
Essentially Fermi based GPUs utilize WDDM 1.3 mode, whereas the newer architectures of Maxwell and Kepler support WDDM 2.0. In Windows 10, it is not able to load multiple display drivers using different WDDM versions.
For a really long time I waited and no updates enabled the functionality until September when I performed an update, and out of nowhere they started to work. I assumed they fixed the issue permanently, however after updating once again, I lost the capabilities. In this case I reverted to the last driver.
I’m not sure if they updated the Fermi driver to support WDDM 2.0, but I just know it started working. And then after a short while, with another driver update stopped working again. Again, the driver rollback fixed the issue.
I recently upgraded to the latest build of Windows 10, and completely lost the ability once again, and lost the ability to rollback drivers.
It was time to find out exactly what driver version WORKS with both Kepler, Fermi, and Maxwell architectures.
After playing around, I found the WORKING NVidia driver version to be: 358.50
Load this version up, and you’ll be good to go! Hope it saves you some time!
Privacy & Cookies Policy