Joe Cooper and I (Stephen Wagner) talk about AI Prototyping and AI Development with NVIDIA vGPU powered Virtualized Workstations.
Using NVIDIA vGPU technology, NIMs (NVIDIA Inference Microservices), and VDI, you can enable high-powered, private, and secure AI Development Workstations.
These environments can be spun up on your VMware infrastructure using NVIDIA datacenter GPUs and NVIDIA NIMs, with Omnissa Horizon or Citrix for delivery.
When upgrading VMware ESXi hosts that use NVIDIA vGPU with VMware vCenter and vLCM (vSphere Lifecycle Manager), you may find that vLCM fails to upgrade and remediate the host.
This error appears in tasks as a general failure. When monitoring remediation inside vLCM, you’ll see an error about a service, module, or VIB that is currently in use, which blocks the update and/or upgrade.
Cause
I suspect this is occurring with vGPU release 18.3 (host driver 570.158.02) because the host driver has a version change, while the GPU monitoring and management daemon does not (it stays at 570.148.06). Since the GPU daemon isn’t touched, its services do not stop, which keeps the NVIDIA ESXi vGPU host driver loaded in the kernel and stops the vLCM remediation from completing.
Resolution
I tried a number of different things to resolve this, such as stopping services, re-attempting, then attempting to unload the NVIDIA vGPU kernel driver, however none of these provided a quick fix.
To resolve this issue, I stopped all the NVIDIA services, uninstalled the vGPU host driver and management daemon, restarted the host, checked compliance, and then remediated the host. After that, remediation completed successfully.
Steps to perform these actions:
Place the host in maintenance mode
SSH in to the ESXi host
Run the following command to identify the NVIDIA driver and GPU management daemon:
esxcli software vib list | grep -i NVD
This will return the NVIDIA VIBs, example below:
NVD-VMware_ESXi_8.0.0_Driver
nvdgpumgmtdaemon
Stop the NVIDIA vGPU and related services using the following commands (some of these may already be stopped):
/etc/init.d/nvdGpuMgmtDaemon stop
/etc/init.d/gpuManager stop
/etc/init.d/xorg stop
Uninstall the NVIDIA vGPU Host Driver, and Management daemon using the following commands:
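The uninstall commands themselves aren't reproduced here, so what follows is only a rough sketch of what this step could look like. It assumes the VIB names from the example listing above (yours may differ, so check the output of the list command first) and uses the standard esxcli software vib remove command. The run and remove_nvidia_vibs helpers and the DRY_RUN variable are hypothetical names made up for this example. By default it only prints the commands so nothing gets removed by accident.

```shell
#!/bin/sh
# Sketch only: the helper names and DRY_RUN are hypothetical, not part of any NVIDIA or VMware tooling.
# VIB names are assumed from the example listing; confirm yours with:
#   esxcli software vib list | grep -i NVD
DRY_RUN="${DRY_RUN:-1}"

run() {
  # In dry-run mode, print the command; otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

remove_nvidia_vibs() {
  # Remove each VIB by name (-n is the short form of --vibname).
  for vib in "$@"; do
    run esxcli software vib remove -n "$vib"
  done
}

remove_nvidia_vibs NVD-VMware_ESXi_8.0.0_Driver nvdgpumgmtdaemon
```

Once the printed commands look right for your host, running the script with DRY_RUN=0 performs the removal. As described above, the host is then restarted before checking compliance and remediating.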
A friendly reminder that it’s time to upgrade (or start planning), since VMware vSphere 7 is reaching end of life on October 2nd, 2025. This means that if you’re running VMware vSphere 7 in your environment, VMware will no longer release updates or security patches, or provide support for your environment.
Please note: You will require an active subscription to be entitled to, and have access to, the updates and upgrades. You’ll also want to check the interoperability matrices and HCLs to make sure your hardware is supported.
Upgrade Path for VMware vSphere Standard, vSphere Enterprise Plus
There’s never been a better time to upgrade (literally) with the pending EOL. For customers running VMware vSphere Standard (VVS) or those with VMware vSphere Enterprise Plus subscriptions, your upgrade path will be to vSphere 8.
Upgrade Path for VMware vSphere Foundation, VMware Cloud Foundation
For customers who are currently licensed for VMware vSphere Foundation (VVF) or VMware Cloud Foundation (VCF), you’ll be able to upgrade either to vSphere 8 products, or to the nice and shiny new VMware vSphere Foundation 9 (VVF 9) or VMware Cloud Foundation 9 (VCF 9).
Upgrading VMware vCenter Server
You’ll always want to upgrade your VMware vCenter instance first (except when using VCF, as the procedures are different and out of the scope of this post). Just a reminder that this is generally an easy process: the vCenter Server Installer ISO deploys a new appliance VM, and the workflow then migrates and upgrades your data to the new appliance, shutting down the old one.
Always make sure to perform a VAMI file-based backup, in addition to a snapshot of the previous vCSA appliance. I usually disable DRS and HA before the backup/snapshot as well, as this allows easier recovery in the event of a failed vCenter upgrade.
Upgrading VMware ESXi Hosts
When it comes to your VMware ESXi hosts, I recommend (as I do to customers) using vLCM (vSphere Lifecycle Manager) and Image based Updates if possible, as this makes the upgrade a breeze (and supports Quick Boot). Note that baseline updates are deprecated.
If the hardware in your cluster comes from a single vendor (for example, HPE, Cisco, or Dell), you can use cluster based (and cluster focused) vLCM Image based Updates.
When you change your cluster to Image based Updates (irreversible for the cluster once created), you’ll be able to choose your target ESXi version, specify the Vendor add-on, and then customize additional components (such as adding the NVIDIA vGPU Host Driver and GPU Management daemon, storage plugins, etc).
After creating your image, you’ll then be able to apply it to your hosts. This can be used for minor updates, and also larger upgrades (such as VMware ESXi 7 to 8).