Sep 04, 2022

When directly passing through a GPU, or attaching an NVIDIA vGPU with more than 16GB of video memory, to a Virtual Machine on VMware ESXi, you may run into a situation where the VM fails to boot with the error “Module ‘DevicePowerOn’ power on failed.” Special considerations are required when performing GPU or vGPU passthrough with 16GB+ of video memory.

This issue is specifically caused by memory mapping a GPU or vGPU device that has 16GB of memory or more, and can involve the ESXi host’s configuration, the Virtual Machine’s configuration, or both.

In this post, I’ll address the considerations and requirements to pass these devices through to virtual machines in your environment.

In order of likelihood, the cause is usually VM configuration related; however, if the recommendations in the “VM Configuration Considerations” section do not resolve the issue, please proceed to the “ESXi Host Considerations” section.

Please note that if the issue is host related, other errors may be present, or the device may not even be visible to ESXi.

VM GPU and vGPU Configuration Considerations

First and foremost, all new VMs should be created using the “EFI” firmware type. EFI provides numerous advantages in device access and memory mapping over the older “BIOS” firmware type.

VM Firmware type EFI

To do this, create a new virtual machine, navigate to “VM Options”, expand “Boot Options”, and confirm/change the Firmware to “EFI”. I recommend this for all new VMs, not only for VMs accessing GPUs or vGPUs with over 16GB of memory. Please note that you shouldn’t change the firmware type on an existing VM with an installed operating system, as doing so can leave the guest unbootable; do this on a fresh new VM.
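For reference, selecting EFI corresponds to the following entry in the VM’s .vmx file (shown here for context only; use the UI setting above to change it):

firmware = "efi"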

When performing GPU or vGPU passthrough with 16GB+ of video memory, you’ll need to create a couple of entries under “Advanced” settings to properly configure access to these PCIe devices and provide the proper environment for memory mapping. The lack of these settings is specifically what causes the “Module ‘DevicePowerOn’ power on failed.” error.

Under the VM settings, head over to “VM Options”, expand “Advanced” and click on “Edit Configuration”, click on “Add Configuration Params”, and add the following entries:

pciPassthru.use64bitMMIO="TRUE"
pciPassthru.64bitMMIOSizeGB=32

Example below:

VM GPU and vGPU Memory Settings for 16GB or higher memory mapping

You’ll notice that while our GPU or vGPU profile may have 16GB of memory, we need to double that value, and set it for the “pciPassthru.64bitMMIOSizeGB” variable. If your card or vGPU profile had 32GB, you’d set it to “64”.

Additionally, if you’re passing through multiple GPUs or vGPU devices, you’ll need to factor in all the memory being mapped and double the combined amount.
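For example, if a single VM had two 16GB vGPU profiles attached (a hypothetical configuration for illustration), the combined 32GB would be doubled to 64:

pciPassthru.use64bitMMIO="TRUE"
pciPassthru.64bitMMIOSizeGB=64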

ESXi GPU and vGPU Host Considerations

On most new and modern servers, the host level doesn’t require any special configuration, as they are already designed to pass such devices through to the hypervisor properly. However, in some special cases, and/or when using older servers, you may need to modify configuration and settings in the UEFI or BIOS.

If setting the VM configuration above still results in the same error (or possibly other errors), then you most likely need to make modifications to the ESXi host’s BIOS/UEFI/RBSU to allow the proper memory mapping of the PCIe device, in our case the GPU.

This is where things get a bit tricky because every server manufacturer has different settings that will need to be configured.

Look for the following settings, or settings with similar terminology:

  • “Memory Mapping Above 4G”
  • “Above 4G Decoding”
  • “PCI Express 64-Bit BAR Support”
  • “64-Bit IOMMU Mapping”

Once you find the correct setting or settings, enable them.

Every vendor may use different terminology, and there may be other settings that need to be configured that I don’t have listed above. In my case, I had to go into a secret “SERVICE OPTIONS” menu on my HPE Proliant DL360p Gen8, as documented here.
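If you’re not sure whether the host itself is failing to map the device, the vmkernel log can offer clues. As a rough check (from an SSH session on the host), you can search it for BAR or MMIO mapping messages around the time of the failed power-on:

grep -iE "bar|mmio" /var/log/vmkernel.log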

After performing the recommendations in this guide, you should now be able to pass through devices with over 16GB of memory.


Sep 04, 2022

With VMware ESXi 6.5 and 6.7 going End of Life on October 15th, 2022, many of you are looking for options to update hosts in your homelab; in my case, that means putting ESXi 7.0 on HPE Proliant DL360p Gen8 servers.

As far as support goes, HPE last provided a custom ESXi installer for version 6.5 U3, released in December 2019. That was the last “Pre-Gen9 custom image” released; ESXi 7.0 on the DL360p Gen8 is totally unsupported.

Update: Check out my post covering ESXi 8.0 on HPE Proliant DL360p Gen8 servers!

ESXi 6.7 or higher on the Gen8 Servers

The jump from 6.5 to 6.7 was a little easier, as you could use the 6.5 custom installer, and then upgrade to 6.7. For the most part, as long as the hardware itself was supported, you were in pretty good shape.

Additionally, with the HPE vibsdepot loaded into VMware Update Manager (now known as Lifecycle Manager), you could also keep all the HPE drivers and agents up to date.

ESXi 7.0 on the Gen8 Servers

Some were lucky enough to upgrade their existing installs to 7 with few or no problems; however, the general consensus online was to expect issues. There were some major driver changes, which I believe at one point led to an advisory to perform a fresh install when upgrading from older versions, even on fully supported configurations with newer generation servers such as the Proliant Gen9 and Gen10.

In my setup, I have the following:

  • 2 x HPE Proliant DL360p Gen8 Servers
    • Dual Intel Xeon E5-2660v2 Processors in each server
    • USB and/or SD for booting ESXi
    • No other internal storage
  • External SAN iSCSI Storage

Concerns and Considerations

My main concern was not only to have a stable and functioning ESXi 7 instance, but also, if possible, to keep the HPE drivers, agents, and iLO integrations.

Keep in mind that while this setup is completely unsupported, you still need to make sure that the individual components of your configuration, such as the processor and PCIe cards, are supported, even if the server as a whole is not.

Make sure you reference your hardware on the VMware Compatibility Guide (HCL).

In my case, my processors were supported; however, my RAID controller was not. So theoretically, since I’m not using my RAID controllers, I should be fine.

The process – Installing ESXi 7.0

I was able to install ESXi 7.0 on my HPE Proliant Gen8 servers by performing the following steps:

  1. Download the Generic ESXi installer from VMware directly.
    1. Link: Download VMware vSphere
  2. Download the “HPE Custom Addon for ESXi 7.0”.
    1. Link: HPE Custom Addon for ESXi 7.0 U3 for July 2022
  3. Boot server, install using the Generic Installer downloaded above.
  4. Mount NFS or iSCSI datastore.
  5. Copy HPE Custom Addon for ESXi zip file to datastore.
  6. Enable SSH on host (or use console).
  7. Place host in to maintenance mode.
  8. Run "esxcli software vib install -d /vmfs/volumes/datastore-name/folder-name/HPE-703.0.0.10.9.1.5-Jul2022-Addon-depot.zip" from the command line.
  9. The install will run and complete successfully.
  10. Restart your server as needed. You’ll now notice that not only were the HPE drivers installed, but also agents like the Agentless Management agent and the iLO integrations, which you can verify as shown below.
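Once the host is back up, you can quickly confirm that the HPE components are present by listing the installed VIBs from an SSH session (a quick sanity check, not an official validation step):

esxcli software vib list | grep -i hpe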

You’ll now have a functioning instance.

HP Proliant DL360p Gen8 running ESXi 7.0

In my case everything was working, except for the “Smart Array P420i” RAID Controller, which I don’t use anyway.
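If you want to check how (or whether) a controller was claimed by a driver, you can list the host’s storage adapters from the command line; the P420i would normally appear here alongside its driver if it were claimed:

esxcli storage core adapter list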

Additionally, if you have a vCenter instance running, make sure that you add the HPE vibsdepot repo to your Lifecycle Manager. After you add the repo, create a baseline and attach it to the host, then proceed to scan, stage, and remediate the server, which will further update all the HPE-specific drivers and software.
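If you need the repo address, HPE’s online depot has typically been reachable at https://vibsdepot.hpe.com; verify the current URL and index path against HPE’s documentation before adding it as a download source in Lifecycle Manager.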