Jun 22 2025
 

When upgrading VMware ESXi hosts through VMware vCenter with vLCM (vSphere Lifecycle Manager), remediation may fail on hosts that have the NVIDIA vGPU host driver installed.

This error appears in tasks as a general failure. When monitoring remediation inside vLCM, you’ll see an error stating that a service, module, or VIB is currently in use, which blocks the update or upgrade.

vGPU and vLCM remediation

Cause

I suspect this is occurring with vGPU release 18.3 (host driver 570.158.02) because the host driver version changes while the GPU monitoring and management daemon version does not (it stays at 570.148.06). Since the GPU daemon isn’t touched, its services do not stop, which keeps the NVIDIA ESXi vGPU host driver loaded in the kernel and prevents the vLCM remediation from completing.
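To see whether the driver is in fact still loaded on a host, one option is to look for it in the loaded kernel module list. This is a minimal sketch, assuming the module name contains "nvidia"; the function name `nvidia_module_loaded` is hypothetical and only for illustration.

```shell
#!/bin/sh
# Succeeds if a loaded kernel module whose name contains "nvidia" is listed.
# Assumption: the vGPU host driver's module name contains "nvidia".
nvidia_module_loaded() {
    vmkload_mod -l | grep -qi nvidia
}

# Only run the check where vmkload_mod exists (i.e. on an ESXi host).
if command -v vmkload_mod >/dev/null 2>&1; then
    if nvidia_module_loaded; then
        echo "NVIDIA kernel module is still loaded"
    else
        echo "NVIDIA kernel module is not loaded"
    fi
fi
```

If the module is still reported as loaded after the NVIDIA services have been stopped, that is consistent with the "in use" error described above.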

Resolution

I tried a number of different things to resolve this, such as stopping services, re-attempting the remediation, and attempting to unload the NVIDIA vGPU kernel driver; however, none of these provided a quick fix.

To resolve this issue, I stopped all the NVIDIA services, uninstalled the vGPU host driver and management daemon, restarted the host, checked compliance, and then remediated the host. Remediation then completed successfully.

Steps to perform these actions:

  1. Place the host in maintenance mode
  2. SSH in to the ESXi host
  3. Run the following command to identify the NVIDIA driver and GPU management daemon:
    • esxcli software vib list | grep -i NVD
  4. This will return the NVIDIA VIBs, example below:
    • NVD-VMware_ESXi_8.0.0_Driver
    • nvdgpumgmtdaemon
  5. Stop the NVIDIA vGPU and related services using the following commands (some of these may already be stopped):
    • /etc/init.d/nvdGpuMgmtDaemon stop
    • /etc/init.d/gpuManager stop
    • /etc/init.d/xorg stop
  6. Uninstall the NVIDIA vGPU host driver and management daemon using the following commands (use the exact VIB names returned in step 4, as they can differ between releases):
    • esxcli software vib remove -n NVD-VMware_ESXi_8.0.0_Driver
    • esxcli software vib remove -n nvdgpumgmtdaemon
  7. Reboot the host
  8. Check vLCM compliance (don’t skip this)
  9. Remediate the host
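
Steps 3 through 6 can also be collected into a small script to run over SSH once the host is in maintenance mode. The following is only a sketch under a few assumptions: the function names and the `DRY_RUN` variable are hypothetical, `awk` is assumed to be available in the ESXi shell, and the VIB names are taken from the first column of the `esxcli software vib list` output. It defaults to a dry run that only prints the commands, so the list of matched VIBs can be reviewed before anything is removed.

```shell
#!/bin/sh
# Sketch of steps 3-6. DRY_RUN=1 (the default) only prints what would be run.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY-RUN: $*"
    else
        "$@"
    fi
}

# Steps 3-4: print the names (first column) of installed VIBs matching "nvd".
list_nvidia_vibs() {
    esxcli software vib list | grep -i nvd | awk '{print $1}'
}

# Step 5: stop the services; some may already be stopped, so ignore failures.
stop_nvidia_services() {
    for svc in nvdGpuMgmtDaemon gpuManager xorg; do
        run "/etc/init.d/$svc" stop || true
    done
}

# Step 6: remove every VIB found by list_nvidia_vibs, in listing order.
remove_nvidia_vibs() {
    for vib in $(list_nvidia_vibs); do
        run esxcli software vib remove -n "$vib"
    done
}

# Only act when esxcli is available (i.e. on an ESXi host).
if command -v esxcli >/dev/null 2>&1; then
    stop_nvidia_services
    remove_nvidia_vibs
    echo "Next: reboot the host, check vLCM compliance, then remediate."
fi
```

Running it once as-is shows the planned commands; running it again with `DRY_RUN=0` executes them. The reboot, compliance check, and remediation (steps 7 through 9) are left as manual actions.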

After performing these steps, you’ll be able to successfully remediate the host, resulting in upgraded NVIDIA vGPU drivers.
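
To confirm the upgrade from the host itself, the installed driver VIB can be checked against the expected host driver version. This is a rough sketch, assuming the VIB's version column contains the driver version string (570.158.02 for the release discussed here) and that the driver VIB's name contains "Driver"; the function name `driver_version_matches` and the `EXPECTED` variable are hypothetical.

```shell
#!/bin/sh
# Expected host driver version; override by setting EXPECTED in the environment.
EXPECTED="${EXPECTED:-570.158.02}"

# Succeeds if an installed NVIDIA driver VIB line contains the given version.
# Assumption: the version column of the VIB includes the driver version string.
driver_version_matches() {
    esxcli software vib list | grep -i nvd | grep -i driver | grep -q -F "$1"
}

# Only run the check where esxcli exists (i.e. on an ESXi host).
if command -v esxcli >/dev/null 2>&1; then
    if driver_version_matches "$EXPECTED"; then
        echo "Host driver is at $EXPECTED"
    else
        echo "Host driver is not at $EXPECTED"
    fi
fi
```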

  One Response to “Failure to Upgrade and Remediate with vLCM and vGPU on ESXi”

  1. I had this issue and used this process, which worked for me.
    In vSphere Lifecycle Manager, upload the new NVIDIA driver and daemon.
    At the cluster level, under Updates, edit the image to remove the NVIDIA driver and daemon.
    Save the image, check compliance, and remediate all hosts to remove the NVIDIA software.
    After all hosts are done, edit the image again and add the latest NVIDIA driver and daemon.
    Save the image, check compliance, and remediate all hosts; they should then update normally.
