Yesterday, December 12th 2019, I powered on my Windows 10 1909 workstation to see that the start menu wasn’t working, along with the notification tray. I could launch programs from the taskbar, but search, start, and notifications were not functioning.
Since my workstation is running as a VDI instance, I checked vSphere and noticed the VM was running at extremely high CPU. Inside of the workstation, I opened up the event log and found numerous errors pertaining to the User Shell Experience, as well as multiple Windows 10 apps (UWP apps).
I tried to troubleshoot this using multiple methods found online on google. It sounds like this is a common issue for the past couple months, but no one has been able to find a fix.
Finally, after 14 hours of frusteration, I finally decided to restore the workstaton (VM) from a snapshot backup the night before. Powering it on the start menu was working. I installed some updates and everything is still working great.
If anyone has any information on this, please post it in the comments! I was surprised this isn’t easily fixable and actually required a restore from backup. I’m assuming numerous others are experiencing this issue.
I can’t tell you how excited I am that after many years, I’ve finally gotten my hands on and purchased an Nvidia Quadro K1 GPU. This card will be used in my homelab to learn, and demo Nvidia GRID accelerated graphics on VMware Horizon View. In this post I’ll outline the details, installation, configuration, and thoughts. And of course I’ll have plenty of pictures below!
The focus will be to use this card both with vGPU, as well as 3D accelerated vSGA inside in an HPe server running ESXi 6.5 and VMware Horizon View 7.8.
Please Note: Some, most, or all of what I’m doing is not officially supported by Nvidia, HPe, and/or VMware. I am simply doing this to learn and demo, and there was a real possibility that it may not have worked since I’m not following the vendor HCL (Hardware Compatibility lists). If you attempt to do this, or something similar, you do so at your own risk.
For some time I’ve been trying to source either an Nvidia GRID K1/K2 or an AMD FirePro S7150 to get started with a simple homelab/demo environment. One of the reasons for the time it took was I didn’t want to spend too much on it, especially with the chances it may not even work.
Essentially, I have 3 Servers:
HPe DL360p Gen8 (Dual Proc, 128GB RAM)
HPe DL360p Gen8 (Dual Proc, 128GB RAM)
HPe ML310e Gen8 v2 (Single Proc, 32GB RAM)
For the DL360p servers, while the servers are beefy enough, have enough power (dual redundant power supplies), and resources, unfortunately the PCIe slots are half-height. In order for me to use a dual-height card, I’d need to rig something up to have an eGPU (external GPU) outside of the server.
As for the ML310e, it’s an entry level tower server. While it does support dual-height (dual slot) PCIe cards, it only has a single 350W power supply, misses some fancy server technologies (I’ve had issues with VT-d, etc), and only a single processor. I should be able to install the card, however I’m worried about powering it (it has no 6pin PCIe power connector), and having ESXi be able to use it.
Finally, I was worried about cooling. The GRID K1 and GRID K2 are typically passively cooled and meant to be installed in to rack servers with fans running at jet engine speeds. If I used the DL360p with an external setup, this would cause issues. If I used the ML310e internally, I had significant doubts that cooling would be enough. The ML310e did have the plastic air baffles, but only had one fan for the expansion cards area, and of course not all the air would pass through the GRID K1 card.
Because of a limited budget, and the possibility I may not even be able to get it working, I didn’t want to spend too much. I found an eBay user local in my city who had a couple Grid K1 and Grid K2 cards, as well as a bunch of other cool stuff.
We spoke and he decided to give me a wicked deal on the Grid K1 card. I thought this was a fantastic idea as the power requirements were significantly less (more likely to work on the ML310e) on the K1 card at 130 W max power, versus the K2 card at 225 W max power.
We set a time and a place to meet. Preemptively I ran out to a local supply store to purchase an LP4 power adapter splitter, as well as a LP4 to 6pin PCIe power adapter. There were no available power connectors inside of the ML310e server so this was needed. I still thought the chances of this working were slim…
I also decided to go ahead and download the Nvidia GRID Software Package. This includes the release notes, user guide, ESXi vib driver (includes vSGA, vGPU), as well as guest drivers for vGPU and pass through. The package also includes the GRID vGPU Manager. The driver I used was from: https://www.nvidia.com/Download/driverResults.aspx/144909/en-us
To install, I copied over the vib file “NVIDIA-vGPU-kepler-VMware_ESXi_6.5_Host_Driver_367.130-1OEM.6220.127.116.1198673.vib” to a datastore, enabled SSH, and then ran the following command to install:
The command completed successfully and I shut down the host. Now I waited to meet.
We finally met and the transaction went smooth in a parking lot (people were staring at us as I handed him cash, and he handed me a big brick of something folded inside of grey static wrap). The card looked like it was in beautiful shape, and we had a good but brief chat. I’ll definitely be purchasing some more hardware from him.
Installing the card in the ML310e was difficult and took some time with care. First I had to remove the plastic air baffle. Then I had issues getting it inside of the case as the back bracket was 1cm too long to be able to put the card in. I had to finesse and slide in on and angle but finally got it installed. The back bracket (front side of case) on the other side slid in to the blue plastic case bracket. This was nice as the ML310e was designed for extremely long PCIe expansion cards and has a bracket on the front side of the case to help support and hold the card up as well.
For power I disconnected the DVD-ROM (who uses those anyways, right?), and connected the LP5 splitter and the LP5 to 6pin power adapter. I finally hooked it up to the card.
I laid the cables out nicely and then re-installed the air baffle. Everything was snug and tight.
Please see below for pictures of the Nvidia GRID K1 installed in the ML310e Gen8 V2.
Powering on the server was a tense moment for me. A few things could have happened:
Server won’t power on
Server would power on but hang & report health alert
Nvidia GRID card could overheat
Nvidia GRID card could overheat and become damaged
Nvidia GRID card could overheat and catch fire
Server would boot but not recognize the card
Server would boot, recognize the card, but not work
Server would boot, recognize the card, and work
With great suspense, the server powered on as per normal. No errors or health alerts were presented.
I logged in to iLo on the server, and watched the server perform a BIOS POST, and start it’s boot to ESXi. Everything was looking well and normal.
After ESXi booted, and the server came online in vCenter. I went to the server and confirmed the GRID K1 was detected. I went ahead and configured 2 GPUs for vGPU, and 2 GPUs for 3D vSGA.
I restarted the X.org service (required when changing the options above), and proceeded to add a vGPU to a virtual machine I already had configured and was using for VDI. You do this by adding a “Shared PCI Device”, selecting “NVIDIA GRID vGPU”, and I chose to use the highest profile available on the K1 card called “grid_k180q”.
After adding and selecting ok, you should see a warning telling you that must allocate and reserve all resources for the virtual machine, click “ok” and continue.
Power On and Testing
I went ahead and powered on the VM. I used the vSphere VM console to install the Nvidia GRID driver package (included in the driver ZIP file downloaded earlier) on the guest. I then restarted the guest.
After restarting, I logged in via Horizon, and could instantly tell it was working. Next step was to disable the VMware vSGA Display Adapter in the “Device Manager” and restart the host again.
Upon restarting again, to see if I had full 3D acceleration, I opened DirectX diagnostics by clicking on “Start” -> “Run” -> “dxdiag”.
It worked! Now it was time to check the temperature of the card to make sure nothing was overheating. I enabled SSH on the ESXi host, logged in, and ran the “nvidia-smi” command.
According to this, the different GPUs ranged from 33C to 50C which was PERFECT! Further testing under stress, and I haven’t gotten a core to go above 56. The ML310e still has an option in the BIOS to increase fan speed, which I may test in the future if the temps get higher.
With “nvidia-smi” you can see the 4 GPUs, power usage, temperatures, memory usage, GPU utilization, and processes. This is the main GPU manager for the card. There are some other flags you can use for relevant information.
Overall I’m very impressed, and it’s working great. While I haven’t tested any games, it’s working perfect for videos, music, YouTube, and multi-monitor support on my 10ZiG 5948qv. I’m using 2 displays with both running at 1920×1080 for resolution.
I’m looking forward to doing some tests with this VM while continuing to use vGPU. I will also be doing some testing utilizing 3D Accelerated vSGA.
The two coolest parts of this project are:
3D Acceleration and Hardware h.264 Encoding on VMware Horizon
Getting a GRID K1 working on an HPe ML310e Gen8 v2
Leave a comment and let me know what you think! Or leave a question!
If you are running Microsoft Windows in a domain environment with WSUS configured, you may notice that you’re not able to install some FODs (Features on Demand), or use the “Turn Windows features on or off”. This will stop you from installing things like the RSAT tools, .NET Framework, Language Speech packs, etc…
You may see “failure to download files”, “cannot download”, or errors like “0x800F0954” when running DISM to install packages.
To resolve this, you need to modify your domain’s group policy settings to allow your workstations to query Windows Update servers for additional content. The workstations will still use your WSUS server for approvals, downloads, and updates, however in the event content is not found, it will query Windows Update.
Enable download of “Optional features” directly from Windows Update
Open the group policy editor on your domain
Create a new GPO, or modify an existing one. Make sure it applies to the computers you’d like
Navigate to “Computer Configuration”, “Policies”, “Administrative Templates”, and then “System”.
Double click or open “Specify settings for optional component installation and component repair”
Make sure “Never attempt to download payload from Windows Update” is NOT checked
Make sure “Download repair content and optional features directly from Windows Update instead of Windows Server Update Services (WSUS)” IS checked.
Wait for your GPO to update, or run “gpupdate /force” on the workstations.
Please see an example of the configuration below:
Download repair content and optional features directly from Windows Update instead of Windows Server Update Services (WSUS)
You should now be able to download/install RSAT, .NET, Speech language packs, and more!
Just a few words of warning when upgrading your VMware vSphere Windows 10 virtual machines to Windows 10 Version 1809 (October Update). When upgrading, after the first restart, you may notice multiple BSOD (Blue Screen of Death) with the error “Driver PNP Watchdog”. This will fail the upgrade. This issue may also occur on the Windows 10 Version 1903 (May Update).
Update – November 14 2018: This issue is still occurring on upgrades using the re-released November version of the October update.
Update and Fix – November 26th 2018: A very big thank you goes out to my reader Werner, who advised that the issue only occurs if the VM is in a snapshotted state. After his comment on this post, I decided to try upgraded without the VM in a snapshot state and it worked! Thanks Werner!
When the upgrade fails, the system will re-attempt until utlimately failing and reverting to the previous version of Windows 10.
In my case, I had a successful upgrade on numerous physical workstations, and a snapshot, so I decided to uninstall both the VMware tools agent, and VMware Horizon View agent. This had no affect and the VM still wouldn’t perform an upgrade.
I’m not sure if it’s the fact that it’s a VM, the VMware tools install, or the VMware Horizon View agent install, however I highly recommend waiting to upgrade until all the bugs get sorted out.
After you upgrade Microsoft Windows 10 to version 1809 (October Update) or later, you may notice that your RSAT (Remote Server Administration Tools) have uninstalled and that you cannot download or install RSAT on the new version of Windows 10. This is because Microsoft has provided the RSAT tools as part of “Features on Demand” on Windows 10 itself. This will apply to all future Windows 10 releases.
Some of you may not be familiar with using the “Features on Demand” or “DISM” tool on Windows, so I decided to write up this little post to assist you in installing RSAT on the latest version of Windows 10.
Please Note: These instructions should also be valid for Windows 10 May 2019 Update Build 1903.
Install RSAT on Windows 10 1809 (and higher)
To install RSAT on Windows 10 version 1809, open an elevated command and run the following command:
After installing Windows 10 Fall Creators Update (Windows 10 Version 1709), I’m noticing that on one of my multi-monitor machines it’s showing blue colors as purple on one of the displays.
This is very visible when highlighting text, viewing the blue Facebook logo and banner, or any other blue content. When dragging something across both displays (window is shown on both displays) you can see the color differences. However, one interesting thing, is that when dragging from one display to the other, for the last 10% or so when moving, it’ll quickly change to the proper blue before leaving the display, which means this is software related since it will briefly show the proper blue.
After spending over an hour troubleshooting, it’s totally unrelated to monitor drivers (color configurations), video drivers, etc… and I cannot find any configuration to fix this. Also, searching on the internet I cannot find any other occurrences.
Please comment if you have any information, or are experiencing the same issue!
Update: I’ve seen 2 other posts of people reporting issues with colors, but no one is going in to detail. I’ve found that the color differences actually show up in screenshots as well (the color changes depending on which display it’s on).
Update October 25th, 2017 – Very odd update… I went ahead and tried using the “Calibrate display color”, and while I didn’t really change any settings, on completion of the wizard the colors are now fixed on my display. I’m thinking this is an issue or bug in Windows 10 Fall Creators Update.
Well, it’s October 18th 2017 and the Fall Creators update (Feature update to Windows 10, version 1709) is now available for download. In my particular environment, I use WSUS to deploy and manage updates.
Update: It’s now May 2018, and this article also applies to Windows 10 April 2018 update version 1803 as well!
Update: It’s now October 2018, and this article also applies to Windows 10 October 2018 update version 1809 as well!
Update: It’s now May 2019, and this article also applies to Windows 10 May 2019 update version 1903 as well!
I went ahead earlier today and approved the updates for deployment, however I noticed an issue on multiple Windows 10 machines, where the Windows Update client would get stuck on Downloading updates 0% status.
I checked a bunch of things, but noticed that it simply couldn’t download the updates from my WSUS server. Further investigation found that the feature updates are packaged in .esd files and IIS may not be able to serve these properly without a minor modification. I remember applying this fix in the past, however I’m assuming it was removed by a prior update on my Windows Server 2012 R2 server.
If you are experiencing this issue, here’s the fix:
On your server running WSUS and IIS, open up the IIS manager.
Expand Sites, and select “WSUS Administration”
On the right side, under IIS, select “MIME Types”
Make sure there is not a MIME type for .esd, if there is, you’re having a different issue, if not, continue with the instructions.
Click on “Add” on the right Actions pane.
File name extension will be “.esd” (without quotations), and MIME type will be “application/octet-stream” (without quotations).
Reset IIS or restart WSUS/IIS server
You’ll notice the clients will now update without a problem! Happy Updating!