Nov 202022
 

Today I want to talk about Memory Deduplication on ESXi with Transparent Page Sharing (TPS). This is a technology that isn’t widely known about, even amongst IT professionals with significant experience with VMware products.

And you may ask “Memory Deduplication, why aren’t we using this?!?” as it sounds like a pretty cool piece of technology… Well, I’m about to tell you why you’re not (Inter-VM), and share a few examples of where you would want to enable this.

I also want to show you how to enable TPS globally (Inter-VM), and also discuss TPS being used with VMware Horizon and VDI.

What is Transparent Page Sharing (TPS)?

Transparent Page Sharing is the process in which ESXi can provide memory deduplication by storing duplicate memory pages as a single page on the physical memory of the host. This process stops the system from storing redundant memory pages, and thus frees up physical memory for other uses.

If my memory serves me right, this was originally enabled by default in ESX/ESXi version 5, but was later globally disabled due to security vulnerabilities and concerns.

Note, that TPS is still enabled by default from within the same VM, even today with ESXi 8.

Security Concerns

I recall two potential scenarios and security concerns which led to VMware changing the original default behavior for TPS.

  • Scenario 1 included a concern about an attacker gaining access to a VM, and then having the ability to modify the memory contents of another VM.
  • Scenario 2 included a concern where an attacker may be able to get access to encryption keys used on another system.

A quick search led to a KB titled “Security considerations and disallowing inter-Virtual Machine Transparent Page Sharing (2080735)“, which outlines the details of scenario 2, along with stating “This technique works only in a highly controlled system configured in a non-standard way that VMware believes would not be recreated in a production environment”.

With that being said, it sounds like this would be an extremely difficult attack that requires systems to be configured in a non-standard way.

Current status of TPS

Believe it or not, TPS and memory deduplication is still enabled, however it’s only deduplicating pages from within the same VM. TPS is not deduplicating pages from multiple VMs.

Additionally, VMware has given us controls to configure TPS to allow it amongst multiple VMs, or even enable it globally across the ESXi host.

See below for the settings to configure TPS on ESXi via “Advanced Settings”:

A table providing configurable options for Transparent Page Sharing (TPS) on VMware vSphere ESXi
Transparent Page Sharing (TPS) Settings for ESXi Host

The above table was provided by VMware’s “Additional Transparent Page Sharing management capabilities and new default settings (2097593)” KB.

In short, you could enable TPS globally (Inter-VM) by setting “Mem.ShareForceSalting” in “Advanced Settings”, to a value of “0”. You can also use the salting to configure groups of VMs that are allow to share memory pages.

Additionally, you can tweak the behavior of TPS by modifying some of the settings shown below:

TPS Memory Sharing Settings

As you can see you can configure things like the scanning occurrence (Mem.ShareScanTime) of how often the system will check for memory pages that can be shared/deduplicated and other settings.

TPS is enabled, but not working

So, you may have decided to enable TPS in your environment, but you’re noticing that either no, or very little memory pages are being marked as shared.

ESXi Memory Graph showing Memory Deduplication from TPS
TPS Memory Deduplication – Amount of host physical memory that backs shared guest physical memory

In the example above, you’ll notice that on a loaded host, with TPS enabled globally (Inter-VM, amongst all VMs), that the host is only deduplicating 1,052KB of memory.

This is because you will most often only see TPS being heavily utilized on an ESXi host that has over-committed memory, there’s also a chance that you simply don’t have enough memory pages that can be duplicated.

Memory Deduplication, TPS, and VMware Horizon VDI

Because VMware Horizon utilizes the “vmfork” with “Just-in-Time” desktop delivery, non-persistent VDI will benefit from some level of memory deduplication by default when using Instant Clones with non-persistent VDI. This is because non-persistent VDI guests are spawned from a running base image.

Additionally, you can further implement, enable, and configure TPS by configuring some Transparent Page Sharing options inside of the VMware Horizon Administration console.

When creating a Desktop Pool, you can set the “Transparent Page Sharing” open to “Virtual Machine” (Memory dedupe inside of the VM only), “Pool” (Memory dedupe across the Desktop Pool), “Pod” (Dedupe across the pod), or “Global” (Full Inter-VM memory deduplication across the ESXi host).

If you enabled TPS on the ESXi host globally, these settings are null and not used.

TPS Use Cases

So you might be asking when it’s a good time to use TPS?

  • The Homelab – When is a homelab not a good reason to try something? Looking to save some memory and overcommit memory resources? Implement TPS.
  • VDI Environments – On highly dense hosts, you may consider implementing TPS at some level to maximize the utilization of resources, however you must be aware of the security consequences and factor this in when configuring TPS.
  • Environments with no Sensitive Information – It’s hard to imagine, but if you have an environment that doesn’t contain any sensitive information and doesn’t use any security keys, it would be suitable to enable TPS.

I’m sure there’s a number of other use cases, so leave a comment if you can think of one.

Conclusion

In my opinion Transparent Page Sharing is a technology that should not be forgotten and discarded. VMware admins should be aware of it, how to configure it, and what the implications are of using it.

If you are considering enabling TPS in your environment, you must factor in the potential security consequences of doing so.

Oct 302022
 
vGPU nvidia-smi GPU Link Info

If you’re like me, you want to make sure that your environment is as optimized as possible. I recently noticed that my NVIDIA A2 vGPU was reporting the vGPU PCIe Link Speed and Generation that the card was using was below what it was supposed to be running at on my VMware vSphere ESXi host.

I needed to find out if this was being reported incorrectly, if there was an issue, or something else effecting this. In my case, the specific GPU I was using is supposed to support PCIe Gen4, and has a physical connector supporting 4x, my host has PCIe Gen3 slots, so I should at least be getting Gen3 speeds.

NVIDIA A2 vGPU

The Problem

When running the command “nvidia-smi -q”, the GPU was reporting that it was only running at PCIe Gen 1 speeds, as shown below:

        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 1
                Device Current            : 1
                Device Max                : 4
                Host Max                  : N/A
            Link Width
                Max                       : 16x
                Current                   : 8x

Something else worth noting, is that the card states that it supports a 16x interface, when it actually only physical has a 8x connector. I believe they use this chip on another board that has multiple GPUs on a single board that actually supports 16x.

You could say I was quite puzzled. Why would the card only be running at PCIe Generation 1 speeds? I thought it could be any of the scenarios below:

  • Dynamic mode that alternates when required (possibly for power savings)
  • Hardware issue
  • Hardware Limitation (I’m using this in an older server)
  • Software issues
  • Configuration issue

Unfortunately, when searching the internet, I couldn’t find many references to this metric, however I did find references from other user’s copy/pastes of “nvidia-smi -q” which had the same values (running PCIe Gen1), even with beefier and more high-end cards.

The Solution

After some more searching, I finally came across an NVIDIA technical document titled “Useful nvidia-smi Queries” that states that the current PCIe Generation Link speed “may be reduced when the GPU is not in use”. This confirms that it’s dynamic and adjusts when needed.

Finally, I decided to give some games a shot in a couple of the VMs, and to my surprise when running a game, the “Device Current” and “Current” PCIe Generation changed to PCIe Gen3 (note that my server isn’t capable of PCIe Gen4, which is the cards maximum), as shown below:

        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 3
                Device Current            : 3
                Device Max                : 4
                Host Max                  : N/A
            Link Width
                Max                       : 16x
                Current                   : 8x

In conclusion, if you notice this in your environment, do not be alarmed as this is completely normal and expected behavior.

Sep 042022
 

When either directly passing through a GPU, or attaching an NVIDIA vGPU to a Virtual Machine on VMware ESXi that has more than 16GB of Video Memory, you may run in to a situation where the VM fails to boot with the error “Module ‘DevicePowerOn’ power on failed.”. Special considerations are required when performing GPU or vGPU Passthrough with 16GB+ of video memory.

This issue is specifically caused by memory mapping a GPU or vGPU device that has 16GB of memory or higher, and could involve both the host system (the ESXi host) and/or the Virtual Machine configuration.

In this post, I’ll address the considerations and requirements to passthrough these devices to virtual machines in your environment.

In the order of occurrence, it’s usually VM configuration related, however if the recommendations in the “VM Configuration Considerations” section do not resolve the issue, please proceed to reviewing the “ESXi Host Considerations” section.

Please note that if the issue is host related, other errors may be present, or the device may not even be visible to ESXi.

VM GPU and vGPU Configuration Considerations

First and foremost, all new VMs should be created using the “EFI” Firmware type. EFI provides numerous advantages in device access and memory mapping versus the older style “BIOS” firmware types.

VM Firmware type EFI

To do this, create a new virtual machine, navigate to “VM Options”, expand “Boot Options”, and confirm/change the Firmware to “EFI”. I recommend this for all new VMs, and not only for VMs accessing GPUs or vGPUs with over 16GB of memory. Please note that you shouldn’t change an existing VM, and should do this on a fresh new VM.

With performing GPU or vGPU Passthrough with 16GB+ of video memory, you’ll need to create a couple of entries under “Advanced” settings to properly configure access to these PCIe devices and provide the proper environment for memory mapping. The lack of these settings is specifically what causes the “Module ‘DevicePowerOn’ power on failed.” error.

Under the VM settings, head over to “VM Options”, expand “Advanced” and click on “Edit Configuration”, click on “Add Configuration Params”, and add the following entries:

pciPassthru.use64bitMMIO=”TRUE”
pciPassthru.64bitMMIOSizeGB=32

Example below:

VM GPU and vGPU Memory Settings for 16GB or higher memory mapping

You’ll notice that while our GPU or vGPU profile may have 16GB of memory, we need to double that value, and set it for the “pciPassthru.64bitMMIOSizeGB” variable. If your card or vGPU profile had 32GB, you’d set it to “64”.

Additionally if you were passing through multiple GPUs or vGPU devices, you’d need to factor all the memory being mapped, and double the combined amount.

ESXi GPU and vGPU Host Considerations

On most new and modern servers, the host level doesn’t require any special configuration as they are already designed to pass through such devices to the hypervisor properly. However in some special cases, and/or when using older servers, you may need to modify configuration and settings in the UEFI or BIOS.

If setting the VM Configuration above still results in the same error (or possibly other errors), than you most likely need to make modifications to the ESXi hosts BIOS/UEFI/RBSU to allow the proper memory mapping of the PCIe device, in our case being the GPU.

This is where things get a bit tricky because every server manufacturer has different settings that will need to be configured.

Look for the following settings, or settings with similar terminology:

  • “Memory Mapping Above 4G”
  • “Above 4G Decoding”
  • “PCI Express 64-Bit BAR Support”
  • “64-Bit IOMMU Mapping”

Once you find the correct setting or settings, enable them.

Every vendor could be using different terminology and there may be other settings that need to be configured that I don’t have listed above. In my case, I had to go in to a secret “SERVICE OPTIONS” menu on my HPE Proliant DL360p Gen8, as documented here.

After performing the recommendations in this guide, you should now be able to passthrough devices with over 16GB of memory.

Additional Resources:

Sep 042022
 

With VMware ESXi 6.5 and 6.7 going End of Life on October 15th, 2022, many of you are looking for options to update hosts in your homelab, especially in my case putting ESXi 7.0 on HP Proliant DL360p Gen8 servers.

As far as support goes, HPE last provided a custom installer for ESXi for versions 6.5 U3 which was released December of 2019. This was the “last Pre-Gen9 custom image” released, as ESXi 7.0 on the DL360p Gen8 is totally unsupported.

ESXi 6.7 or higher on the Gen8 Servers

The jump from 6.5 to 6.7 was a little easier, as you could use the 6.5 custom installer, and then upgrade to 6.7. For the most part, as long as the hardware itself was supported, you were in pretty good shape.

Additionally, with the HPE vibsdepot loaded in to VMware Update Manager (now known as Lifecycle Manager), you could also keep all the HPE drivers and agents up to date.

ESXi 7.0 on the Gen8 Servers

Some were lucky enough to upgrade their current installs to 7 with no or limited problems, however the general consensus online was to expect problems. There were some major driver changes, which I think at one point led to an advisory to perform a fresh install, even if you had a fully supported configuration with newer generation servers such as the Proliant Gen9 and Gen10 servers, when upgrading from older versions.

In my setup, I have the following:

  • 2 x HPE Proliant DL360p Gen8 Servers
    • Dual Intel Xeon E5-2660v2 Processors in each server
    • USB and/or SD for booting ESXi
    • No other internal storage
  • External SAN iSCSI Storage

Concerns and Considerations

My main concern is to not only have a stable and functioning ESXi 7 instance, but I also, if possible would like to have the HPE drivers, agents, and integrations with iLO.

You must consider that while this is completely unsupported, you do need to make sure that the components of your current configuration are supported, such as the processor and PCIe cards, even if the server as a whole is not supported.

Make sure you reference your hardware on the VMware Compatibility Guide (HCL).

In my case, my processors were supported, however my RAID controller was not. So theoretically, since I’m not using my RAID controllers, I should be fine.

The process – Installing ESXi 7.0

I was able to install ESXi 7.0 on my HPE Proliant Gen8 servers, by performing the following steps.

  1. Download the Generic ESXi installer from VMware directly.
    1. Link: Download VMware vSphere
  2. Download the “HPE Custom Addon for ESXi 7.0”.
    1. Link: HPE Custom Addon for ESXi 7.0 U3 for July 2022
  3. Boot server, install using the Generic Installer downloaded above.
  4. Mount NFS or iSCSI datastore.
  5. Copy HPE Custom Addon for ESXi zip file to datastore.
  6. Enable SSH on host (or use console).
  7. Place host in to maintenance mode.
  8. Run “esxcli software vib install -d /vmfs/volumes/datastore-name/folder-name/HPE-703.0.0.10.9.1.5-Jul2022-Addon-depot.zip” from the command line.
  9. The install will run and complete successfully.
  10. Restart your server as needed, you’ll now notice that not only were HPE drivers installed, but also agents like the Agentless management agent, and iLO integrations.

You’ll now have a functioning instance.

HP Proliant DL360p Gen8 running ESXi 7.0

In my case everything was working, except for the “Smart Array P420i” RAID Controller, which I don’t use anyways.

Additionally, if you have a vCenter instance running, make sure that you add the HPE vibsdepot repo to your Lifecycle Manager. After you add the repo, create a baseline, and attach the baseline to the host, go ahead and proceed to scan, stage, and remediate the server which will then further update all the HPE specific drivers and software.

Jul 172022
 
VMware vSphere ESXi with vTPM from NKP

It’s been coming for a while: The requirement to deploy VMs with a TPM module… Today I’ll be showing you the easiest and quickest way to create and deploy Virtual Machines with vTPM on VMware vSphere ESXi!

As most of you know, Windows 11 has a requirement for Secureboot as well as a TPM module. It’s with no doubt that we’ll also possibly see this requirement with future Microsoft Windows Server operating systems.

While users struggle to deploy TPM modules on their own workstations to be eligible for the Windows 11 upgrade, ESXi administrators are also struggling with deploying Virtual TPM modules, or vTPM modules on their virtualized infrastructure.

What is a TPM Module?

TPM stands for Trusted Platform Module. A Trusted Platform Module, is a piece of hardware (or chip) inside or outside of your computer that provides secured computing features to the computer, system, or server that it’s attached to.

This TPM modules provides things like a random number generator, storage of encryption keys and cryptographic information, as well as aiding in secure authentication of the host system.

In a virtualization environment, we need to emulate this physical device with a Virtual TPM module, or vTPM.

What is a Virtual TPM (vTPM) Module?

A vTPM module is a virtualized software instance of a traditional physical TPM module. A vTPM can be attached to Virtual Machines and provide the same features and functionality that a physical TPM module would provide to a physical system.

vTPM modules can be can be deployed with VMware vSphere ESXi, and can be used to deploy Windows 11 on ESXi.

Deployment of vTPM modules, require a Key Provider on the vCenter Server.

For more information on vTPM modules, see VMware’s “Virtual Trust Platform Module Overview” documentation.

Deploying vTPM (Virtual TPM Modules) on VMware vSphere ESXi

In order to deploy vTPM modules (and VM encryption, vSAN Encryption) on VMware vSphere ESXi, you need to configure a Key Provider on your vCenter Server.

Traditionally, this would be accomplished with a Standard Key Provider utilizing a Key Management Server (KMS), however this required a 3rd party KMS server and is what I would consider a complex deployment.

VMware has made this easy as of vSphere 7 Update 2 (7U2), with the Native Key Provider (NKP) on the vCenter Server.

The Native Key Provider, allows you to easily deploy technologies such as vTPM modules, VM encryption, vSAN encryption, and the best part is, it’s all built in to vCenter Server.

Enabling VMware Native Key Provider (NKP)

To enable NKP across your vSphere infrastructure:

  1. Log on to your vCenter Server
  2. Select your vCenter Server from the Inventory List
  3. Select “Key Providers”
  4. Click on “Add”, and select “Add Native Key Provider”
  5. Give the new NKP a friendly name
  6. De-select “Use key provider only with TPM protected ESXi hosts” to allow your ESXi hosts without a TPM to be able to use the native key provider.

In order to activate your new native key provider, you need to click on “Backup” to make sure you have it backed up. Keep this backup in a safe place. After the backup is complete, you NKP will be active and usable by your ESXi hosts.

Screenshot of VMware vCenter Server with Native Key Provider (NKP) Configured
VMware vCenter with Native Key Provider (NKP) Configured

There’s a few additional things to note:

  • Your ESXi hosts do NOT require a physical TPM module in order to use the Native Key Provider
    • Just make sure you disable the checkbox “Use key provider only with TPM protected ESXi hosts”
  • NKP can be used to enable vTPM modules on all editions of vSphere
  • If your ESXi hosts have a TPM module, using the Native Key Provider with your hosts TPM modules can provide enhanced security
    • Onboard TPM module allows keys to be stored and used if the vCenter server goes offline
  • If you delete the Native Key Provider, you are also deleting all the keys stored with it.
    • Make sure you have it backed up
    • Make sure you don’t have any hosts/VMs using the NKP before deleting

You can now deploy vTPM modules to virtual machines in your VMware environment.

Jun 192022
 
VMware vSphere 7 Logo

We all know that vMotion is awesome, but what is even more awesome? Optimizing VMware vMotion to make it redundant and faster!

vMotion allows us to migrate live Virtual Machines from one ESXi host to another without any downtime. This allows us to perform physical maintenance on the ESXi hosts, update and restart the hosts, and also load balance VMs across the hosts. We can even take this a step further use DRS (Distributed Resource Scheduler) automation to intelligently load the hosts on VM boot and to dynamically load balance the VMs as they run.

Picture of VMware vMotion diagram
VMware vMotion

In this post, I’m hoping to provide information on how to fully optimize and use vMotion to it’s full potential.

VMware vMotion

Most of you are probably running vMotion in your environment, whether it’s a homelab, dev environment, or production environment.

I typically see vMotion deployed on the existing data network in smaller environments, I see it deployed on it’s own network in larger environments, and in very highly configured environments I see it being used with the vMotion TCP stack.

While you can preform a vMotion with 1Gb networking, you certainly almost always want at least 10Gb networking for the vMotion network, to avoid any long running VMs. Typically most IT admins are happy with live migration vMotion’s in the seconds, and not the minutes.

VMware vMotion Optimization

So you might ask, if vMotion is working and you’re satisfied, what is there to optimize? There’s actually a few things, but first let’s talk about what we can improve on.

We’re aiming for improvements with:

  • Throughput/Speed
    • Faster vMotion
      • Faster Speed
      • Less Time
    • Migrate more VMs
      • Evacuate hosts faster
      • Enable more aggressive DRS
      • Migrate many VMs at once very quickly
  • Redundancy
    • Redundant vMotion Interfaces (NICs and Uplinks)
  • More Complex vMotion Configurations
    • vMotion over different subnets and VLANs
      • vMotion routed over Layer 3 networks

To achieve the above, we can focus on the following optimizations:

  1. Enable Jumbo Frames
  2. Saturation of NIC/Uplink for vMotion
  3. Multi-NIC/Uplink vMotion
  4. Use of the vMotion TCP Stack

Let’s get to it!

Enable Jumbo Frames

I can’t stress enough how important it is to use Jumbo Frames for specialized network traffic on high speed network links. I highly recommend you enable Jumbo Frames on your vMotion network.

Note, that you’ll need to have a physical switch and NICs that supports Jumbo frames.

In my own high throughput testing on a 10Gb link, without using Jumbo frames I was only able to achieve transfer speeds of ~6.7Gbps, whereas enabling Jumbo Frames allowed me to achieve speeds of ~9.8Gbps.

When enabling this inside of vSphere and/or ESXi, you’ll need to make sure you change and update the applicable vmk adapter, vSwitch/vDS switches, and port groups. Additionally as mentioned above you’ll need to enable it on your physical switches.

You may assume that once you configure a vMotion enabled NIC, that when performing migrations you will be able to fully saturate it. This is not necessarily the case!

When performing a vMotion, the vmk adapter is bound to a single thread (or CPU core). Depending on the power of your processor and the speed of the NIC, you may not actually be able to fully saturate a single 10Gb uplink.

In my own testing in my homelab, I needed to have a total of 2 VMK adapters to saturate a single 10Gb link.

If you’re running 40Gb or even 100Gb, you definitely want to look at adding multiple VMK adapters to your vMotion network to be able to fully saturate a single NIC or Uplink.

You can do this by simply configuring multiple VMK adapters per host with different IP addresses on the same subnet.

One important thing to mention is that if you have multiple physical NICs and Uplinks connected to your vMotion switch, this change will not help you utilize multiple physical interfaces (NICs/Uplinks). See “Multi-NIC/Uplink vMotion”.

Please note: As of VMware vSphere 7 Update 2, the above is not required as vMotion has been optimized to use multiple streams to fully saturate the interface. See VMware’s blog post “Faster vMotion Makes Balancing Workloads Invisible” for more information.

Multi-NIC/Uplink vMotion

Another situation is where we may want to utilize multiple NICs and Uplinks for vMotion. When implemented correctly, this can provide load balancing (additional throughput) as well as redundancy on the vMotion network.

If you were to simply add additional NIC interfaces as Uplinks to your vMotion network, this would add redundancy in the event of a failover but it wouldn’t actually result in increased speed or throughput as special configuration is required.

To take advantage of the additional bandwidth made available by additional Uplinks, we need to specially configure multiple portgroups on the switch (vSwitch or vDS Distributed Switch), and configure each portgroup to only use one of the Uplinks as the “Active Uplink” with the rest of the uplinks under “Standby Uplink”.

Example Configuration

  • vSwitch or vDS Switch
    • Portgroup 1
      • Active Uplink: Uplink 1
      • Standby Uplinks: Uplink 2, Uplink 3, Uplink 4
    • Portgroup 2
      • Active Uplink: Uplink 2
      • Standby Uplinks: Uplink 1, Uplink 3, Uplink 4
    • Portgroup 3
      • Active Uplink: Uplink 3
      • Standby Uplinks: Uplink 1, Uplink 2, Uplink 4
    • Portgroup 4
      • Active Uplink: Uplink 4
      • Standby Uplinks: Uplink 1, Uplink 2, Uplink 3

You would then place a single or multiple vmk adapters on each of the portgroups per host, which would result in essentially mapping the vmk(s) to the specific uplink. This will allow you to utilize multiple NICs for vMotion.

And remember, you may not be able to fully saturate a NIC interface (as stated above) with a single vmk adapter, so I highly recommend creating multiple vmk adapters on each of the Port groups above to make sure that you’re not only using multiple NICs, but that you can also fully saturate each of the NICs.

For more information, see VMware’s KB “Multiple-NIC vMotion in vSphere (2007467)“.

Use of the vMotion TCP Stack

VMware released the vMotion TCP Stack to provided added security to vMotion capabilities, as well as introduce vMotion over multiple subnets (routed vMotion over layer 3).

Using the vMotion TCP Stack, you can isolate and have the vMotion network using it’s own gateway separate from the other vmk adapters using the traditional TCP stack on the ESXi host.

This stack is optimized for vMotion.

Please note, that troubleshooting processes may be different when Troubleshooting vMotion using the vMotion TCP/IP Stack (click the link for my blog post on troubleshooting).

For more information, see VMware’s Documentation on “vMotion TCP/IP Stack“.

Additional resources:

VMware – How to Tune vMotion for Lower Migration Times?

Jun 182022
 
Nvidia GRID Logo

When performing a VMware vMotion on a Virtual Machine with an NVIDIA vGPU attached to it, the VM may freeze during migration. Additionally, when performing a vMotion on a VM without a vGPU, the VM does not freeze during migration.

So why is it that adding a vGPU to a VM causes it to become frozen during vMotion? This is referred to as the VM Stun Time.

I’m going to explain why this happens, and what you can do to reduce these STUN times.

VMware vMotion

First, let’s start with traditional vMotion without a vGPU attached.

VMware vMotion with vSphere and ESXi
VMware vMotion with vSphere

vMotion allows us to live migrate a Virtual Machine instance from one ESXi host, to another, with (visibly) no downtime. You’ll notice that I put “visibly” in brackets…

When performing a vMotion, vSphere will migrate the VM’s memory from the source to destination host and create checkpoints. It will then continue to copy memory deltas including changes blocks after the initial copy.

Essentially vMotion copies the memory of the instance, then initiates more copies to copy over the changes after the original transfer was completed, until the point where it’s all copied and the instance is now running on the destination host.

VMware vMotion with vGPU

For some time, we have had the ability to perform a vMotion with a VM that as a GPU attached to it.

VMware vSphere with NVIDIA vGPU
VMware VMs with vGPU

However, in this situation things work slightly different. When performing a vMotion, it’s not only the system RAM memory that needs to be transferred, but the GPU’s memory (VRAM) as well.

Unfortunately the checkpoint/delta transfer technology that’s used with then system RAM isn’t available to transfer the GPU, which means that the VM has to be stunned (frozen) to stop it so that the video RAM can be transferred and then the instance can be initialized on the destination host.

STUN Time

The STUN time is essentially the time it takes to transfer the video RAM (framebuffer) from one host to another.

When researching this, you may find examples of the time it takes to transfer various sizes of VRAM. An example would be from VMware’s documentation “Using vMotion to Migrate vGPU Virtual Machines“:

NVIDIA vGPU Estimated STUN Times
Expected STUN Times for vMotion with vGPU on 10Gig vMotion NIC

However, it will always vary depending on a number of factors. These factors include:

  • vMotion Network Speed
  • vMotion Network Optimization
    • Multi-NIC vMotion to utilize multiple NICs
    • Multi-vmk vMotion to optimize and saturate single NICs
  • Server Load
  • Network Throughput
  • The number of VM’s that are currently being migrated with vMotion

As you can see, there’s a number of things that play in to this. If you have a single 10Gig link for vMotion and you’re migrating many VMs with a vGPU, it’s obviously going to take longer than if you were just migrating a single VM with a vGPU.

Optimizing and Minimizing vGPU STUN Time

There’s a number of things we can look at to minimize the vGPU STUN times. This includes:

  • Upgrading networking throughput with faster NICs
  • Optimizing vMotion (Configure multiple vMotion VMK adapters to saturate a NIC)
  • Configure Multi-NIC vMotion (Utilize multiple physical NICs to increase vMotion throughput)
  • Reduce DRS aggressiveness
  • Migrate fewer VMs at the same time

All of the above can be implemented together, which I would actually recommend.

In short, the faster we migrate the VM, the less the STUN Time will be. Check out my blog post on Optimizing VMware vMotion which includes how to perform the above recommendations.

Hope this helps!

May 012021
 
Picture of NVMe Storage Server Project

For over a year and a half I have been working on building a custom NVMe Storage Server for my homelab. I wanted to build a high speed storage system similar to a NAS or SAN, backed with NVMe drives that provides iSCSI, NFS, and SMB Windows File Shares to my network.

The computers accessing the NVMe Storage Server would include VMware ESXi hosts, Raspberry Pi SBCs, and of course Windows Computers and Workstations.

The focus of this project is on high throughput (in the GB/sec) and IOPS.

The current plan for the storage environment is for video editing, as well as VDI VM storage. This can and will change as the project progresses.

The History

More and more businesses are using all-flash NVMe and SSD based storage systems, so I figured there’s no reason why I can’t have build and have my own budget custom all NVMe flash NAS.

This is the story of how I built my own NVMe based Storage Server.

The first version of the NVMe Storage Server consisted of the IO-PEX40152 card with 4 x 2TB Sabrent Rocket 4 NVMe drives inside of an HPE Proliant DL360p Gen8 Server. The server was running ESXi with TrueNAS virtualized, and the PCIe card passed through to the TrueNAS VM.

The results were great, the performance was amazing, and both servers had access to the NFS export via 2 x 10Gb SFP+ networking.

There were three main problems with this setup:

  1. Virtualized – Once a month I had an ESXi PSOD. This was either due to overheating of the IO-PEX40152 card because of modifications I made, or bugs with the DL360p servers and PCIe passthrough.
  2. NFS instead of iSCSI – Because TrueNAS was virtualized inside of the host that was using it for storage, I had to use NFS since the host virtualizing TrueNAS would also be accessing the data on the TrueNAS VM. When shutting down the host, you need to shut down TrueNAS first. NFS disconnects are handled way healthier than iSCSI disconnects (which can cause corruption even if no files are being used).
  3. CPU Cores maxed on data transfer – When doing initial testing, I was maxing out the CPU cores assigned to the TrueNAS VM because the data transfers were so high. I needed a CPU and setup that was better fit.

Version 1 went great, but you can see some things needed to be changed. I decided to go with a dedicated server, not virtualize TrueNAS, and go for a newer CPU with a higher Ghz speed.

And so, version 2 was born (built). Keep reading and scrolling for pictures!

The Hardware

On version 2 of the project, the hardware includes:

Notes on the Hardware:

  • While the ML310e Gen8 v2 server is a cheap low entry server, it’s been a fantastic team member of my homelab.
  • HPE Dual 10G Port 560SFP+ adapters can be found brand new in unsealed boxes on eBay at very attractive prices. Using HPE Parts inside of HPE Servers, avoids the fans from spinning up fast.
  • The ML310e Gen8 v2 has some issues with passing through PCIe cards to ESXi. Works perfect when not passing through.

The new NVMe Storage Server

I decided to repurpose an HPE Proliant ML310e Gen8 v2 Server. This server was originally acting as my Nvidia Grid K1 VDI server, because it supported large PCIe cards. With the addition of my new AMD S7150 x2 hacked in/on to one of my DL360p Gen8’s, I no longer needed the GRID card in this server and decided to repurpose it.

Picture of an HPe ML310e Gen8 v2 with NVMe Storage
HPe ML310e Gen8 v2 with NVMe Storage

I installed the IOCREST IO-PEX40152 card in to the PCIe 16x slot, with 4 x 2TB Sabrent Rocket 4 NVME drives.

Picture of IOCREST IO-PEX40152 with GLOTRENDS M.2 NVMe SSD Heatsink on Sabrent Rocket 4 NVME
IOCREST IO-PEX40152 with GLOTRENDS M.2 NVMe SSD Heatsink on Sabrent Rocket 4 NVME

While the server has a PCIe 16x wide slot, it only has an 8x bus going to the slot. This means we will have half the capable speed vs the true 16x slot. This however does not pose a problem because we’ll be maxing out the 10Gb NICs long before we max out the 8x bus speed.

I also installed an HPE Dual Port 560SFP+ NIC in to the second slot. This will allow a total of 2 x 10Gb network connections from the server to the Ubiquiti UniFi US-16-XG 10Gb network switch, the backbone of my network.

The Server also have 4 x Hot Swappable HD bays on the front. When configured in HBA mode (via the BIOS), these are accessible by TrueNAS and can be used. I plan on populating these with 4 x 4TB HPE MDL SATA Hot Swappable drives to act as a replication destination for the NVMe pool and/or slower magnetic long-term storage.

Front view of HPE ML310e Gen8 v2 with Hotswap Drive bays
HPE ML310e Gen8 v2 with Hotswap Drive bays

I may also try to give WD RED Pro drives a try, but I’m not sure if they will cause the fans to speed up on the server.

TrueNAS Installation and Configuration

For the initial Proof-Of-Concept for version 2, I decided to be quick and dirty and install it to a USB stick. I also waited until I installed TrueNAS on to the USB stick and completed basic configuration before installing the Quad NVMe PCIe card and 10Gb NIC. I’m using a USB 3.0 port on the back of the server for speed, as I can’t verify if the port on the motherboard is USB 2 or USB 3.

Picture of a TrueNAS USB Stick on HPE ML310e Gen8 v2
TrueNAS USB Stick on HPE ML310e Gen8 v2

TrueNAS installation worked without any problems whatsoever on the ML310e. I configured the basic IP, time, accounts, and other generic settings. I then proceeded to install the PCIe cards (storage and networking).

Screenshot of TrueNAS Dashboard Installed on NVMe Storage Server
TrueNAS Installed on NVMe Storage Server

All NVMe drives were recognized, along with the 2 HDDs I had in the front Hot-swap bays (sitting on an HP B120i Controller configured in HBA mode).

Screenshot of available TrueNAS NVMe Disks
TrueNAS NVMe Disks

The 560SFP+ NIC also was detected without any issues and available to configure.

Dashboard Screenshot of TrueNAS 560SFP+ 10Gb NIC
TrueNAS 560SFP+ 10Gb NIC

Storage Configuration

I’ve already done some testing and created a guide on FreeNAS and TrueNAS ZFS Optimizations and Considerations for SSD and NVMe, so I made sure to use what I learned in this version of the project.

I created a striped pool (no redundancy) of all 4 x 2TB NVMe drives. This gave us around 8TB of usable high speed NVMe storage. I also created some datasets and a zVOL for iSCSI.

Screenshot of NVMe TrueNAS Storage Pool with Datasets and zVol
NVMe TrueNAS Storage Pool with Datasets and zVol

I chose to go with the defaults for compression to start with. I will be testing throughput and achievable speeds in the future. You should always test this in every and all custom environments as the results will always vary.

Network Configuration

Initial configuration was done via the 1Gb NIC connection to my main LAN network. I had to change this as the 10Gb NIC will be directly connected to the network backbone and needs to access the LAN and Storage VLANs.

I went ahead and configured a VLAN Interface on VLAN 220 for the Storage network. Connections for iSCSI and NFS will be made on this network as all my ESXi servers have vmknics configured on this VLAN for storage. I also made sure to configure an MTU of 9000 for jumbo frames (packets) to increase performance. Remember that all hosts must have the same MTU to communicate.

Screenshot of 10Gb NIC on Storage VLAN
10Gb NIC on Storage VLAN

Next up, I had to create another VLAN interface for the LAN network. This would be used for management, as well as to provide Windows File Share (SMB/Samba) access to the workstations on the network. We leave the MTU on this adapter as 1500 since that’s what my LAN network is using.

Screenshot of 10Gb NIC on LAN VLAN
10Gb NIC on LAN VLAN

As a note, I had to delete the configuration for the existing management settings (don’t worry, it doesn’t take effect until you hit test) and configure the VLAN interface for my LANs VLAN and IP. I tested the settings, confirmed it was good, and it was all setup.

At this point, only the 10Gb NIC is now being used so I went ahead and disconnected the 1Gb network cable.

Sharing Setup and Configuration

It’s now time to configure the sharing protocols that will be used. As mentioned before, I plan on deploying iSCSI, NFS, and Windows File Shares (SMB/Samba).

iSCSI and NFS Configuration

Normally, for a VMware ESXi virtualization environment, I would always usually prefer iSCSI based storage, however I also wanted to configure NFS to test throughput of both with NVMe flash storage.

Earlier, I created the datasets for all my my NFS exports and a zVOL volume for iSCSI.

Note, that in order to take advantage of the VMware VAAI storage directives (enhancements), you must use a zVOL to present an iSCSI target to an ESXi host.

For NFS, you can simply create a dataset and then export it.

For iSCSI, you need to create a zVol and then configure the iSCSI Target settings and make it available.

SMB (Windows File Shares)

I needed to create a Windows File Share for file based storage from Windows computers. I plan on using the Windows File Share for high-speed storage of files for video editing.

Using the dataset I created earlier, I configured a Windows Share, user accounts, and tested accessing it. Works perfect!

Connecting the host

Connecting the ESXi hosts to the iSCSI targets and the NFS exports is done in the exact same way that you would with any other storage system, so I won’t be including details on that in this post.

We can clearly see the iSCSI target and NFS exports on the ESXi host.

Screenshot of TrueNAS NVMe iSCSI Target on VMware ESXi Host
TrueNAS NVMe iSCSI Target on VMware ESXi Host
Screenshot of NVMe iSCSI and NFS ESXi Datastores
NVMe iSCSI and NFS ESXi Datastores

To access Windows File Shares, we log on and map the network share like you would normally with any file server.

Testing

For testing, I moved (using Storage vMotion) my main VDI desktop to the new NVMe based iSCSI Target LUN on the NVMe Storage Server. After testing iSCSI, I then used Storage vMotion again to move it to the NFS datastore. Please see below for the NVMe storage server speed test results.

Speed Tests

Just to start off, I want to post a screenshot of a few previous benchmarks I compiled when testing and reviewing the Sabrent Rocket 4 NVMe SSD disks installed in my HPE DL360p Gen8 Server and passed through to a VM (Add NVMe capability to an HPE Proliant DL360p Gen8 Server).

Screenshot of CrystalDiskMark testing an IOCREST IO-PEX40152 and Sabrent Rocket 4 NVME SSD for speed
CrystalDiskMark testing an IOCREST IO-PEX40152 and Sabrent Rocket 4 NVME SSD
Screenshot of CrystalDiskMark testing IOPS on an IOCREST IO-PEX40152 and Sabrent Rocket 4 NVME SSD
CrystalDiskMark testing IOPS on an IOCREST IO-PEX40152 and Sabrent Rocket 4 NVME SSD

Note, that when I performed these tests, my CPU was maxed out and limiting the actual throughput. Even then, these are some fairly impressive speeds. Also, these tests were directly testing each NVMe drive individually.

Moving on to the NVMe Storage Server, I decided to test iSCSI NVMe throughput and NFS NVMe throughput.

I opened up CrystalDiskMark and started a generic test, running a 16GB test file a total of 6 times on my VDI VM sitting on the iSCSI NVMe LUN.

Screenshot of NVMe Storage Server iSCSI Benchmark with CrystalDiskMark
NVMe Storage Server iSCSI Benchmark with CrystalDiskMark

You can see some impressive speeds maxing out the 10Gb NIC with crazy performance of the NVME storage:

  • 1196MB/sec READ
  • 1145.28MB/sec WRITE (Maxing out the 10GB NIC)
  • 62,725.10 IOPS READ
  • 42,203.13 IOPS WRITE

Additionally, here’s a screenshot of the ix0 NIC on the TrueNAS system during the speed test benchmark: 1.12 GiB/s.

Screenshot of TrueNAS NVME Maxing out 10Gig NIC
TrueNAS NVME Maxing out 10Gig NIC

And remember this is with compression. I’m really excited to see how I can further tweak and optimize this, and also what increases will come with configuring iSCSI MPIO. I’m also going to try to increase the IOPS to get them closer to what each individual NVMe drive can do.

Now on to NFS, the results were horrible when moving the VM to the NFS Export.

Screenshot of NVMe Storage Server NFS Benchmark with CrystalDiskMark
NVMe Storage Server NFS Benchmark with CrystalDiskMark

You can see that the read speed was impressive, but the write speed was not. This is partly due to how writes are handled with NFS exports.

Clearly iSCSI is the best performing method for ESXi host connectivity to a TrueNAS based NVMe Storage Server. This works perfect because we’ll get the VAAI features (like being able to reclaim space).

iSCSI MPIO Speed Test

This is more of an update… I was finally able to connect, configure, and utilize the 2nd 10Gbe port on the 560SFP+ NIC. In my setup, both hosts and the TrueNAS storage server all have 2 connections to the switch, with 2 VLANs and 2 subnets dedicated to storage. Check out the before/after speed tests with enabling iSCSI MPIO.

As you can see I was able to essentially double my read speeds (again maxing out the networking layer), however you’ll notice that the write speeds maxed out at 1598MB/sec. I believe we’ve reached a limitation of the CPU, PCIe bus, or something else inside of the server. Note, that this is not a limitation of the Sabrent Rocket 4 NVME drives, or the IOCREST NVME PCIe card.

Moving Forward

I’ve had this configuration running for around a week now with absolutely no issues, no crashes, and it’s been very stable.

Using a VDI VM on NVMe backed storage is lightning fast and I love the experience.

I plan on running like this for a little while to continue to test the stability of the environment before making more changes and expanding the configuration and usage.

Future Plans (and Configuration)

  • Drive Bays
    • I plan to populate the 4 hot-swappable drive bays with HPE 4TB MDL drives. Configured with RaidZ1, this should give me around 12TB usable storage. I can use this for file storage, backups, replication, and more.
  • NVMe Replication
    • This design was focused on creating non-redundant extremely fast storage. Because I’m limited to a total of 4 NVMe disks in this design, I chose not to use RaidZ and striped the data. If one NVMe drive is lost, all data is lost.
    • I don’t plan on storing anything important, and at this point the storage is only being used for VDI VMs (which are backed up), and Video editing.
    • If I can populate the front drive bays, I can replicate the NVMe storage to the traditional HDD storage on a frequent basis to protect against failure to some level or degree.
  • Version 3 of the NVMe Storage Server
    • More NVMe and Bigger NVMe – I want more storage! I want to test different levels of RaidZ, and connect to the backbone at even faster speeds.
    • NVME Drives with PLP (Power Loss Prevention) for data security and protection.
    • Dual Power Supply

Let me know your thoughts and ideas on this setup!

Mar 182020
 
vSphere Logo Image

I’ve noticed in a few situations where an ESXi host is marked as “unresponsive” or “disconnected” inside of vCenter due to issues occurring on that host (or connected hardware). This recently happened again with a customer and is why I’m writing this article at this very moment.

In these situations, usually all normal means of managing, connecting, or troubleshooting the host are unavailable. Usually in cases like this ESXi administrators would simply reset the host.

However, I’ve found hosts can often be rescued without requiring an ungraceful restart or reset.

Observations

In these situations, it can be observed that:

  • The ESXi host is in a unresponsive to disconnected state to vCenter Server.
  • Connecting to the ESXi host directly does not work as it either doesn’t acknowledge HTTPS requests, or comes up with an error.
  • Accessing the console of the ESXi host isn’t possible as it appears frozen.
  • While the ESXi host is unresponsive, the virtual machines are still online and available on the network.

Troubleshooting

In the few situations I’ve noticed this occurring, troubleshooting is possible but requires patience. Consider the following:

  • When trying to access the ESXi console, give it time after hitting enter or selecting a value. If there’s issues on the host such as commands pending, tasks pending, or memory issues, the console may actually respond if you give it 30 seconds to 5 minutes after selecting an item.
  • With the above in mind, attempt to enable console access (preferably console and not SSH). The logins may take some time (30 seconds to 5 minutes after typing in the password), but you might be able to gain troubleshooting access.
  • Check the SAN, NAS, and any shared storage… In one instance, there were issues with a SAN and datastore that froze 2 VMs. The Queued commands to the SAN caused the ESXi host to become unresponsive.
  • There may be memory issues with the ESXi instance. The VMs are fine, however an agent, driver, or piece of software may be causing the hypervisor layer to become unresponsive.

If there are storage issues, do what you can. In one of the cases above, we had to access the ESXi console, issue a “kill -9” to the VM, and then restart the SAN. We later found out there was issues with the SAN and corrupted virtual machines. The moment the SAN was restarted, the ESXi host became responsive, connected to the vCenter server and could be managed.

In another instance, on an older version of ESXi there was an HPE agentless management driver/service that was consuming the ESXi hosts memory continuously causing the memory to overflow, the host to fill the swap and become unresponsive. Eventually after gracefully shutting down the VMs, I was able to access the console, kill the service, and the host become responsive.

May 082019
 
vSphere Logo Image

We’ve all been in the situation where we need to install a driver, vib file, or check “esxtop”. Many advanced administration tasks on ESXi need to be performed via shell access, and to do this you either need a console on the physical ESXi host, an SSH session, or use the Remote vCLI.

In this blog post, I’m going to be providing a quick “How to” enable SSH on an ESXi host in your VMware Infrastructure using the vCenter flash-based web administration interface. This will allow you to perform the tasks above, as well as use the “esxcli” command which is frequently needed.

This method should work on all vCenter versions up to 6.7, and ESXi versions up to 6.7.

How to Enable SSH on an ESXi Host Server

  1. Log on to your vCenter server.
    vCenter Server Login Window
  2. On the left hand “Navigator” pane, select the ESXi host.
    ESXi Navigator Pane on vCenter Web Interface
  3. On the right hand pane, select the “Configure” tab, then “Security Profile” under “System.
    ESXi Host Configuration under Configure Tab in Web Interface
  4. Scroll down and look for “Services” further to the right and select “Edit”.
    ESXi Host Services in Host Configuration
  5. In the “Edit Security Profile” window, select and highlight “SSH” and then click “Start”.
    ESXi Services List on vCenter web interface
  6. Click “Ok”.

This method can also be used to stop, restart, and change the startup policy to enable or disable SSH starting on boot.

Congratulations, you can now SSH in to your ESXi host!