Jun 192022
 
VMware vSphere 7 Logo

We all know that vMotion is awesome, but what is even more awesome? Optimizing VMware vMotion to make it redundant and faster!

vMotion allows us to migrate live Virtual Machines from one ESXi host to another without any downtime. This allows us to perform physical maintenance on the ESXi hosts, update and restart the hosts, and also load balance VMs across the hosts. We can even take this a step further use DRS (Distributed Resource Scheduler) automation to intelligently load the hosts on VM boot and to dynamically load balance the VMs as they run.

Picture of VMware vMotion diagram
VMware vMotion

In this post, I’m hoping to provide information on how to fully optimize and use vMotion to it’s full potential.

VMware vMotion

Most of you are probably running vMotion in your environment, whether it’s a homelab, dev environment, or production environment.

I typically see vMotion deployed on the existing data network in smaller environments, I see it deployed on it’s own network in larger environments, and in very highly configured environments I see it being used with the vMotion TCP stack.

While you can preform a vMotion with 1Gb networking, you certainly almost always want at least 10Gb networking for the vMotion network, to avoid any long running VMs. Typically most IT admins are happy with live migration vMotion’s in the seconds, and not the minutes.

VMware vMotion Optimization

So you might ask, if vMotion is working and you’re satisfied, what is there to optimize? There’s actually a few things, but first let’s talk about what we can improve on.

We’re aiming for improvements with:

  • Throughput/Speed
    • Faster vMotion
      • Faster Speed
      • Less Time
    • Migrate more VMs
      • Evacuate hosts faster
      • Enable more aggressive DRS
      • Migrate many VMs at once very quickly
  • Redundancy
    • Redundant vMotion Interfaces (NICs and Uplinks)
  • More Complex vMotion Configurations
    • vMotion over different subnets and VLANs
      • vMotion routed over Layer 3 networks

To achieve the above, we can focus on the following optimizations:

  1. Enable Jumbo Frames
  2. Saturation of NIC/Uplink for vMotion
  3. Multi-NIC/Uplink vMotion
  4. Use of the vMotion TCP Stack

Let’s get to it!

Enable Jumbo Frames

I can’t stress enough how important it is to use Jumbo Frames for specialized network traffic on high speed network links. I highly recommend you enable Jumbo Frames on your vMotion network.

Note, that you’ll need to have a physical switch and NICs that supports Jumbo frames.

In my own high throughput testing on a 10Gb link, without using Jumbo frames I was only able to achieve transfer speeds of ~6.7Gbps, whereas enabling Jumbo Frames allowed me to achieve speeds of ~9.8Gbps.

When enabling this inside of vSphere and/or ESXi, you’ll need to make sure you change and update the applicable vmk adapter, vSwitch/vDS switches, and port groups. Additionally as mentioned above you’ll need to enable it on your physical switches.

You may assume that once you configure a vMotion enabled NIC, that when performing migrations you will be able to fully saturate it. This is not necessarily the case!

When performing a vMotion, the vmk adapter is bound to a single thread (or CPU core). Depending on the power of your processor and the speed of the NIC, you may not actually be able to fully saturate a single 10Gb uplink.

In my own testing in my homelab, I needed to have a total of 2 VMK adapters to saturate a single 10Gb link.

If you’re running 40Gb or even 100Gb, you definitely want to look at adding multiple VMK adapters to your vMotion network to be able to fully saturate a single NIC or Uplink.

You can do this by simply configuring multiple VMK adapters per host with different IP addresses on the same subnet.

One important thing to mention is that if you have multiple physical NICs and Uplinks connected to your vMotion switch, this change will not help you utilize multiple physical interfaces (NICs/Uplinks). See “Multi-NIC/Uplink vMotion”.

Please note: As of VMware vSphere 7 Update 2, the above is not required as vMotion has been optimized to use multiple streams to fully saturate the interface. See VMware’s blog post “Faster vMotion Makes Balancing Workloads Invisible” for more information.

Multi-NIC/Uplink vMotion

Another situation is where we may want to utilize multiple NICs and Uplinks for vMotion. When implemented correctly, this can provide load balancing (additional throughput) as well as redundancy on the vMotion network.

If you were to simply add additional NIC interfaces as Uplinks to your vMotion network, this would add redundancy in the event of a failover but it wouldn’t actually result in increased speed or throughput as special configuration is required.

To take advantage of the additional bandwidth made available by additional Uplinks, we need to specially configure multiple portgroups on the switch (vSwitch or vDS Distributed Switch), and configure each portgroup to only use one of the Uplinks as the “Active Uplink” with the rest of the uplinks under “Standby Uplink”.

Example Configuration

  • vSwitch or vDS Switch
    • Portgroup 1
      • Active Uplink: Uplink 1
      • Standby Uplinks: Uplink 2, Uplink 3, Uplink 4
    • Portgroup 2
      • Active Uplink: Uplink 2
      • Standby Uplinks: Uplink 1, Uplink 3, Uplink 4
    • Portgroup 3
      • Active Uplink: Uplink 3
      • Standby Uplinks: Uplink 1, Uplink 2, Uplink 4
    • Portgroup 4
      • Active Uplink: Uplink 4
      • Standby Uplinks: Uplink 1, Uplink 2, Uplink 3

You would then place a single or multiple vmk adapters on each of the portgroups per host, which would result in essentially mapping the vmk(s) to the specific uplink. This will allow you to utilize multiple NICs for vMotion.

And remember, you may not be able to fully saturate a NIC interface (as stated above) with a single vmk adapter, so I highly recommend creating multiple vmk adapters on each of the Port groups above to make sure that you’re not only using multiple NICs, but that you can also fully saturate each of the NICs.

For more information, see VMware’s KB “Multiple-NIC vMotion in vSphere (2007467)“.

Use of the vMotion TCP Stack

VMware released the vMotion TCP Stack to provided added security to vMotion capabilities, as well as introduce vMotion over multiple subnets (routed vMotion over layer 3).

Using the vMotion TCP Stack, you can isolate and have the vMotion network using it’s own gateway separate from the other vmk adapters using the traditional TCP stack on the ESXi host.

This stack is optimized for vMotion.

Please note, that troubleshooting processes may be different when Troubleshooting vMotion using the vMotion TCP/IP Stack (click the link for my blog post on troubleshooting).

For more information, see VMware’s Documentation on “vMotion TCP/IP Stack“.

Additional resources:

VMware – How to Tune vMotion for Lower Migration Times?

Jun 182022
 
Nvidia GRID Logo

When performing a VMware vMotion on a Virtual Machine with an NVIDIA vGPU attached to it, the VM may freeze during migration. Additionally, when performing a vMotion on a VM without a vGPU, the VM does not freeze during migration.

So why is it that adding a vGPU to a VM causes it to become frozen during vMotion? This is referred to as the VM Stun Time.

I’m going to explain why this happens, and what you can do to reduce these STUN times.

VMware vMotion

First, let’s start with traditional vMotion without a vGPU attached.

VMware vMotion with vSphere and ESXi
VMware vMotion with vSphere

vMotion allows us to live migrate a Virtual Machine instance from one ESXi host, to another, with (visibly) no downtime. You’ll notice that I put “visibly” in brackets…

When performing a vMotion, vSphere will migrate the VM’s memory from the source to destination host and create checkpoints. It will then continue to copy memory deltas including changes blocks after the initial copy.

Essentially vMotion copies the memory of the instance, then initiates more copies to copy over the changes after the original transfer was completed, until the point where it’s all copied and the instance is now running on the destination host.

VMware vMotion with vGPU

For some time, we have had the ability to perform a vMotion with a VM that as a GPU attached to it.

VMware vSphere with NVIDIA vGPU
VMware VMs with vGPU

However, in this situation things work slightly different. When performing a vMotion, it’s not only the system RAM memory that needs to be transferred, but the GPU’s memory (VRAM) as well.

Unfortunately the checkpoint/delta transfer technology that’s used with then system RAM isn’t available to transfer the GPU, which means that the VM has to be stunned (frozen) to stop it so that the video RAM can be transferred and then the instance can be initialized on the destination host.

STUN Time

The STUN time is essentially the time it takes to transfer the video RAM (framebuffer) from one host to another.

When researching this, you may find examples of the time it takes to transfer various sizes of VRAM. An example would be from VMware’s documentation “Using vMotion to Migrate vGPU Virtual Machines“:

NVIDIA vGPU Estimated STUN Times
Expected STUN Times for vMotion with vGPU on 10Gig vMotion NIC

However, it will always vary depending on a number of factors. These factors include:

  • vMotion Network Speed
  • vMotion Network Optimization
    • Multi-NIC vMotion to utilize multiple NICs
    • Multi-vmk vMotion to optimize and saturate single NICs
  • Server Load
  • Network Throughput
  • The number of VM’s that are currently being migrated with vMotion

As you can see, there’s a number of things that play in to this. If you have a single 10Gig link for vMotion and you’re migrating many VMs with a vGPU, it’s obviously going to take longer than if you were just migrating a single VM with a vGPU.

Optimizing and Minimizing vGPU STUN Time

There’s a number of things we can look at to minimize the vGPU STUN times. This includes:

  • Upgrading networking throughput with faster NICs
  • Optimizing vMotion (Configure multiple vMotion VMK adapters to saturate a NIC)
  • Configure Multi-NIC vMotion (Utilize multiple physical NICs to increase vMotion throughput)
  • Reduce DRS aggressiveness
  • Migrate fewer VMs at the same time

All of the above can be implemented together, which I would actually recommend.

In short, the faster we migrate the VM, the less the STUN Time will be. Check out my blog post on Optimizing VMware vMotion which includes how to perform the above recommendations.

Hope this helps!

Mar 062022
 
Azure AD SSO with Horizon

Whether deploying VDI for the first time or troubleshooting existing Azure AD SSO issues for an existing environment, special consideration must be made for Microsoft Azure AD SSO and VDI.

When you implement and use Microsoft 365 and Office 365 in a VDI environment, you should have your environment configured to handle Azure AD SSO for a seamless user experience, and to avoid numerous login prompts when accessing these services.

Microsoft Azure Active Directory has two different methods for handling SSO (Single Sign On), these include SSO via a Primary Refresh Token (PRT) and Azure Seamless SSO. In this post, I’ll explain the differences, and when to use which one.

Microsoft Azure AD SSO and VDI

What does Azure AD SSO do?

Azure AD SSO allows your domain joined Windows workstations (and Windows Servers) to have a Single Sign On experience so that users can have an single sign-on integrated experience when accessing Microsoft 365 and/or Office 365.

When Azure AD SSO is enabled and functioning, your users will not be prompted nor have to log on to Microsoft 365 or Office 365 applications or services (including web services) as all this will be handled transparently in the background with Azure AD SSO.

For VDI environments, especially non-persistent VDI (VMware Instant Clones), this is an important function so that users are not prompted to login every time they launch an Office 365 application.

Persistent VDI is not complex and doesn’t have any special considerations for Azure AD SSO, as it will function the same way as traditional workstations, however non-persistent VDI requires special planning.

Please Note: Organizations often associate the Office 365 login prompts to activation issues when in fact activation is functioning fine, however Azure AD SSO is either not enabled, incorrect configured, or not functioning which is why the users are being prompted for login credentials every time they establish a new session with non-persistent VDI. After reading this guide, it should allow you to resolve the issue of Office 365 login prompts on VDI non-persistent and Instant Clone VMs.

Azure AD SSO methods

There are two different ways to perform Azure AD SSO in an environment that is not using ADFS. These are:

  • Azure AD SSO via Primary Refresh Token
  • Azure AD Seamless SSO

Both accomplish the same task, but were created at different times, have different purposes, and are used for different scenarios. We’ll explore this below so that you can understand how each works.

Screenshot of a Hybrid Azure AD Joined login
Hybrid Azure AD Joined Login

Fun fact: You can have both Azure AD SSO via PRT and Azure AD Seamless SSO configured at the same time to service your Active Directory domain, devices, and users.

Azure SSO via Primary Refresh Token

When using Azure SSO via Primary Refresh Token, SSO requests are performed by Windows Workstations (or Windows Servers), that are Hybrid Azure AD Joined. When a device is Hybrid Azure AD Joined, it is joined both to your on-premise Active Directory domain, as well registered to your Azure Active Directory.

Azure SSO via Primary Refresh token requires the Windows instance to be running Windows 10 (or later), and/or Windows Server 2016 (or later), as well the Windows instance has to be Azure Hybrid AD joined. If you meet these requirements, SSO with PRT will be performed transparently in the background.

If you require your non-persistent VDI VMs to be Hybrid Azure AD joined and require Azure AD SSO with PRT, special considerations and steps are required:

This includes:

  • Scripts to automatically unjoin non-persistent (Instant Clone) VDI VMs from Azure AD on logoff.
  • Scripts to cleanup old entries on Azure AD

If you properly deploy this, it should function. If you don’t require your non-persistent VDI VMs to be Hybrid Azure AD joined, then Azure AD Seamless SSO may be better for your environment.

VMware Horizon 8 2303 now supports Hybrid Azure AD joined non-persistent VDI, using Azure AD Connect, providing Azure AD SSO with PRT. Using Horizon 8 version 2303, no scripts are required to manage Azure AD Devices.

Azure AD Seamless SSO

Microsoft Azure AD Seamless SSO after configured and implemented, handles Azure AD SSO requests without the requirement of the device being Hybrid Azure AD joined.

Seamless SSO works on Windows instances instances running Windows 7 (or later, including Windows 10 and Windows 11), and does NOT require the the device to be Hybrid joined.

Seamless SSO allows your Windows instances to access Azure related services (such as Microsoft 365 and Office 365) and provides a single sign-on experience.

This may be the easier method to use when deploying non-persistent VDI (VMware Instant Clones), if you want to implement SSO with Azure, but do not have the requirement of Hybrid AD joining your devices.

Additionally, by using Seamless SSO, you do not need to implement the require log-off and maintenance scripts mentioned in the above section (for Azure AD SSO via PRT).

To use Azure AD Seamless SSO with non-persistent VDI, you must configure and implement Seamless SSO, as well as perform one of the following to make sure your devices do not attempt to Hybrid AD join:

  • Exclude the non-persistent VDI computer OU containers from Azure AD Connect synchronization to Azure AD
  • Implement a registry key on your non-persistent (Instant Clone) golden image, to disable Hybrid Azure AD joining.

To disable Hybrid Azure AD join on Windows, create the registry key on your Windows image below:

HKLM\SOFTWARE\Policies\Microsoft\Windows\WorkplaceJoin: "BlockAADWorkplaceJoin"=dword:00000001

Conclusion

Different methods can be used to implement SSO with Active Directory and Azure AD as stated above. Use the method that will be the easiest to maintain and provide support for the applications and services you need to access. And remember, you can also implement and use both methods in your environment!

After configuring Azure AD SSO, you’ll still be required to implement the relevant GPOs to configure Microsoft 365 and Office 365 behavior in your environment.

Additional Resources

Please see below for additional information and resources:

Jan 212022
 
sconfig Server Configuration menu

We’re all used to updating our Windows Server operating systems with the Windows Update GUI, but did you know that you can update your server using command prompt and “sconfig”?

The past few years I’ve been managing quite a few Windows Server Core Instances that as we all know, do not have a GUI. In order to update those instances, you need to run Windows Update using the command line, but this method actually also works on normal Windows Server instances with the GUI as well!

Windows Update from CLI (Command Prompt)

Please enjoy this video or read on for why and how!

Why?

Using a GUI is great, however sometimes it’s not needed, and sometimes it even causes problems if it looses the backend connection where it’s pulling the data from. I’ve seen this true on newer Windows operating systems where the Windows Update GUI stops updating and you just sit there thinking the updates are running, when they are actually all complete.

The GUI also creates additional overhead and clutter. If there was an easier alternative to perform this function, wouldn’t it just make sense?

On Windows Server instances that have a GUI, I find it way faster and more responsive to just open an elevated (Administrative) command prompt, and kick off Windows Updates from there.

How

You can use this method on all modern Windows Server versions:

  • Windows Server (with a GUI)
  • Windows Server Core (without a GUI)

This also works with Windows Server Update Services so you can use this method either connecting to Windows Update (Microsoft Update) or Windows Server Update Services (WSUS).

Now lets get started!

  1. Open an Administrative (elevated) command prompt
  2. Run “sconfig” to launch the “Server Configuration” application
    command prompt launch sconfig
  3. Select option “6” to “Download and Install Windows Updates”
    sconfig Server Configuration menu
  4. Choose “A” for all updates, or “R” for recommended updates, and a scan will start
  5. After the available updates are shown, choose “A” for all updates, “N” for no updates, or “S” for single update selection

After performing the above, the updates will download and install.

sconfig Windows Update running
“sconfig” Windows Update downloading and installing

I find it so much easier to use this method when updating many/multiple servers instead of the GUI. Once the updates are complete and you’re back at the “Server Configuration” application, you can use option “13” to restart Windows.

Jan 162022
 

Welcome to Episode 04 of The Tech Journal Vlog at www.StephenWagner.com

The Tech Journal Vlog Episode 04

In this episode

Updates

  • VMware Horizon
    • Apache Log4j Mitigation with VMware Products
  • Homelab Update
    • HPE MSA 2040 vs Synology DS1621+
    • Migrating from MSA 2040 to a Synology DS1621+
    • Synology Benchmarking NVME Cache
  • DST Root CA X3 Expiration
    • End of Life Operating Systems

New Blog/Video Posts

Life Update/Fun Stuff

  • Work
  • Travel
  • Move

Current Projects

  • Synology DS1621+

Don’t forget to like and subscribe!
Leave a comment, feedback, or suggestions!