When attempting to log in to your VMware vCenter using the HPE SimpliVity Upgrade Manager to perform an upgrade on your SimpliVity infrastructure, the login may fail with "Access Denied", "Incorrect Credentials", or "Incorrect Username and Password".
Despite confirming that the credentials are correct (by logging in to the vCenter UI, as well as the vCSA console via SSH), the HPE SimpliVity Upgrade Manager will continue to fail on connection.
During the login process, the HPE SimpliVity Upgrade Manager will not only check the credentials and attempt to log on to the vCenter Server, but it will also attempt to pull and validate the SSL certificates (whether trusted or not) on the vCenter Server.
During the typical login process, after entering the credentials and clicking "Connect", the user is prompted with the SSL certificate information and asked to approve the connection. In this specific circumstance, the SSL window is not presented.
Because the SSL prompt was not being presented, I suspected there was an issue with trusting the connection, and that the HPE SimpliVity Upgrade Manager simply wasn't able to surface the error specific to the SSL check failing.
When clicking on this, I was presented with an HTTP 404 error (File Not Found), meaning the certificates weren't present, which I felt might be contributing to or causing the problem.
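If you want to sanity-check the certificates yourself, here is a rough sketch from any machine with curl and openssl. The "/certs/download.zip" path is the trusted root CA download link normally offered on the vCSA landing page; adjust the hostname for your environment:

# Check that the trusted root CA certificate bundle is actually being served
curl -kI https://vcenter.example.com/certs/download.zip

# Inspect the certificate chain the vCenter web service is presenting
openssl s_client -connect vcenter.example.com:443 -showcerts </dev/null

An HTTP 404 on the first request lines up with the behaviour described above, where the Upgrade Manager can't pull the certificates it needs to validate.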
When using VMware vSAN 7.0 Update 3 (7U3) and the graceful shutdown (and restart) of your entire vSAN cluster, you may experience an issue resulting in all VMs being inaccessible after everything has been powered back on and the hosts taken out of maintenance mode.
If you experience this issue, you will also notice that your vSAN datastore appears to be empty (no files or VMs are visible), however you can see that there is data used on the datastore (via the data usage calculation).
As of vSAN 7.0 Update 3, users can now gracefully shut down and restart their entire vSAN cluster from the GUI instead of having to use the CLI/SSH. While you can still Manually Shut Down and Restart the vSAN Cluster, as one can expect, if there's an easy way to do it via the GUI, it'll get used.
Last night I had a call from a customer who used this feature, and when bringing their cluster back up, all the VMs were marked as inaccessible and the datastore appeared to be empty. What was even odder was that all the vSAN health information pertaining to the disks looked good.
Connecting to troubleshoot this (with my limited experience with vSAN), I attempted the following:
Restart vSAN Management Services on all ESXi Hosts
Restart vSAN Health Services on the vCenter vCSA (then wait 15 minutes and restart the ESXi vSAN Management Services); see the command sketch after this list
Restart one of the ESXi hosts (to troubleshoot quorum)
Troubleshoot Networking (Issues occurred after physical maintenance)
Check LAGs (for vSAN Storage Network)
Check Communication and Traffic
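For reference, a minimal sketch of the restart commands used for the first two items above, assuming direct SSH access to the ESXi hosts and the vCSA appliance shell (service names are the standard vSAN daemons; verify them on your build before running):

# On each ESXi host: restart the vSAN management daemon
/etc/init.d/vsanmgmtd restart

# On the vCenter vCSA appliance shell: restart the vSAN health service
service-control --stop vmware-vsan-health
service-control --start vmware-vsan-health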
After doing all of the above, the VMs still were not accessible.
I had a feeling that this was related to the shutdown and restart (power on) process, so I tried to manually start the vSAN cluster using the following command:
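As a sketch, assuming the cluster was shut down with the built-in workflow, the VMware-documented manual recovery helper (run from one of the ESXi hosts over SSH) looks like this; the path and script name are taken from VMware's manual shutdown/restart procedure, so confirm them on your build:

# Recover the vSAN cluster after a graceful shutdown (run on one ESXi host)
python /usr/lib/vmware/vsan/bin/reboot_helper.py recover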
Today I want to talk about Memory Deduplication on ESXi with Transparent Page Sharing (TPS). This is a technology that isn’t widely known about, even amongst IT professionals with significant experience with VMware products.
And you may ask, "Memory Deduplication, why aren't we using this?!?" as it sounds like a pretty cool piece of technology… Well, I'm about to tell you why you're not using it (Inter-VM), and share a few examples of where you would want to enable this.
I also want to show you how to enable TPS globally (Inter-VM), and also discuss TPS being used with VMware Horizon and VDI.
What is Transparent Page Sharing (TPS)?
Transparent Page Sharing is the process in which ESXi can provide memory deduplication by storing duplicate memory pages as a single page on the physical memory of the host. This process stops the system from storing redundant memory pages, and thus frees up physical memory for other uses.
If my memory serves me right, this was originally enabled by default in ESX/ESXi version 5, but was later globally disabled due to security vulnerabilities and concerns.
Note that TPS is still enabled by default within the same VM, even today with ESXi 8.
I recall two potential scenarios and security concerns which led to VMware changing the original default behavior for TPS.
Scenario 1 included a concern about an attacker gaining access to a VM, and then having the ability to modify the memory contents of another VM.
Scenario 2 included a concern where an attacker may be able to get access to encryption keys used on another system.
In short, you can enable TPS globally (Inter-VM) by setting "Mem.ShareForceSalting" in "Advanced Settings" to a value of "0". You can also use the salting to configure groups of VMs that are allowed to share memory pages.
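As a minimal sketch, you could also do this from the ESXi shell; the option path matches the Advanced Setting named above, and the security caveats still apply:

# Check the current value of Mem.ShareForceSalting
esxcli system settings advanced list -o /Mem/ShareForceSalting

# Set it to 0 to enable inter-VM page sharing globally on this host
esxcli system settings advanced set -o /Mem/ShareForceSalting -i 0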
Additionally, you can tweak the behavior of TPS by modifying some of the settings shown below:
As you can see, you can configure things like the scanning occurrence (Mem.ShareScanTime), which controls how often the system will check for memory pages that can be shared/deduplicated, among other settings.
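For example, here's a hedged sketch of inspecting and tuning the scan interval from the ESXi shell (the option path follows the setting name above; defaults vary by release, so check the current value before changing anything):

# Show the current scan interval (minutes per full memory scan)
esxcli system settings advanced list -o /Mem/ShareScanTime

# Scan more aggressively, e.g. a full pass every 30 minutes
esxcli system settings advanced set -o /Mem/ShareScanTime -i 30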
TPS is enabled, but not working
So, you may have decided to enable TPS in your environment, but you're noticing that either no, or very few, memory pages are being marked as shared.
In the example above, you'll notice that on a loaded host, with TPS enabled globally (Inter-VM, amongst all VMs), the host is only deduplicating 1,052KB of memory.
This is because you will most often only see TPS being heavily utilized on an ESXi host that has over-committed memory. There's also a chance that you simply don't have enough duplicate memory pages for it to make a difference.
Memory Deduplication, TPS, and VMware Horizon VDI
Because VMware Horizon utilizes "vmfork" with "Just-in-Time" desktop delivery, non-persistent VDI using Instant Clones will benefit from some level of memory deduplication by default. This is because non-persistent VDI guests are spawned from a running base image.
Additionally, you can further enable and configure TPS via the Transparent Page Sharing options inside of the VMware Horizon Administration console.
When creating a Desktop Pool, you can set the "Transparent Page Sharing" option to "Virtual Machine" (memory dedupe inside of the VM only), "Pool" (memory dedupe across the Desktop Pool), "Pod" (dedupe across the pod), or "Global" (full Inter-VM memory deduplication across the ESXi host).
If you enabled TPS on the ESXi host globally, these settings are effectively ignored.
TPS Use Cases
So you might be asking: when is it a good time to use TPS?
The Homelab – When is a homelab not a good reason to try something? Looking to save some memory and overcommit memory resources? Implement TPS.
VDI Environments – On highly dense hosts, you may consider implementing TPS at some level to maximize the utilization of resources, however you must be aware of the security consequences and factor this in when configuring TPS.
Environments with no Sensitive Information – It’s hard to imagine, but if you have an environment that doesn’t contain any sensitive information and doesn’t use any security keys, it would be suitable to enable TPS.
I’m sure there’s a number of other use cases, so leave a comment if you can think of one.
In my opinion Transparent Page Sharing is a technology that should not be forgotten and discarded. VMware admins should be aware of it, how to configure it, and what the implications are of using it.
If you are considering enabling TPS in your environment, you must factor in the potential security consequences of doing so.
If you're anything like me, you were excited to get your hands on the latest Windows 11 22H2 Feature Update after it was released on September 20th, 2022. However, while it was released, as with all feature upgrades it is deployed gradually and not immediately available to everyone for download. So you may be asking how to force the Windows 11 22H2 Feature Update.
From what I understand, for most x86 users the Windows 11 22H2 Feature Upgrade made itself available slowly over the months after its release, however there may be some of you who still don't have access to it.
Additionally, there may be some of you who are using special hardware such as ARM64, like me with my Lenovo X13s Windows-on-ARM laptop, who haven't been offered the update, as I believe it's being rolled out more slowly than its x86 counterpart.
However, if you’re using ARM64, you cannot use any of those above as they are designed for x86 systems. I waited some time, but decided I wanted to find a way to force this update.
Inside of WSUS, I tried to approve the Windows 11 22H2 Feature Update, however that had no success, as the system wasn’t checking for that update (it wasn’t “required”). I then tried to modify the local GPOs to force the feature update, which to my surprise worked!
Instructions to force the update
This should work on systems that are not domain joined, as well as systems that are domain joined, even with WSUS.
Please note that this will only force the update if your system is approved for the update. Microsoft has various safeguards in place for certain scenarios and hardware, to block the update. See below on how to disable safeguards for feature updates.
In order to Force Windows 11 22H2 Feature Update, follow the instructions below:
Open the Local Group Policy via the start menu, or run “gpedit.msc”.
Expand “Local Computer Policy\Computer Configuration\Administrative Templates\Windows Components\Windows Update\Manage updates offered from Windows Update”
Open “Select the target Feature Update version”
Set the first field (Which Windows product version would you like to receive feature updates for) to "Windows 11".
Set the second field (Target Version for Feature Updates) to "22H2".
Click Apply, click OK, and close the window.
Either restart the system, or run “gpupdate /force” to force the system to see the settings.
Check for Windows Updates (from Microsoft Update if you're using WSUS); you should now see the update available. You may need to check a few times and/or restart the system again.
Install the Feature Upgrade, and then go back to the setting and set it to "Not Configured" to ensure you receive future feature upgrades.
See below for a screenshot of the setting:
For those with a domain and/or work environment, you could deploy this setting over a wide variety of computers using your Active Directory Domain’s Group Policy Objects.
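If you'd rather script this (or can't easily open the Local Group Policy editor), the same policy can be set directly in the registry. Here is a hedged sketch from an elevated PowerShell or Command Prompt; the value names below are, to my understanding, the registry values that back this Group Policy setting:

# Target the Windows 11 22H2 feature update via the WindowsUpdate policy key
reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate" /v ProductVersion /t REG_SZ /d "Windows 11" /f
reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate" /v TargetReleaseVersion /t REG_DWORD /d 1 /f
reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate" /v TargetReleaseVersionInfo /t REG_SZ /d "22H2" /f

Remember to remove these values (or set the GPO back to "Not Configured") after upgrading, for the same reason noted above.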
Keep in mind that these safeguards are in place to protect you and your system from experiencing issues, possibly even issues that could result in an unrecoverable situation. I do not recommend doing this unless you have a backup and know what you are doing.
To disable safeguards for feature updates:
Make sure you still have the "Local Group Policy" MMC open at "Local Computer Policy\Computer Configuration\Administrative Templates\Windows Components\Windows Update\Manage updates offered from Windows Update".
Open “Disable Safeguards for Feature Updates”
Set this option to “Enabled”, click Apply, and then OK.
See below for a screenshot of the setting:
After applying this, you should now be able to upgrade to Windows 11, version 22H2.
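As with the target version policy, this setting can also be applied from the registry if you prefer scripting it; again a hedged sketch, assuming the same documented WindowsUpdate policy key:

# Disable safeguard holds for feature updates (use with caution)
reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate" /v DisableWUfBSafeguards /t REG_DWORD /d 1 /f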
If you’re like me, you want to make sure that your environment is as optimized as possible. I recently noticed that my NVIDIA A2 vGPU was reporting the vGPU PCIe Link Speed and Generation that the card was using was below what it was supposed to be running at on my VMware vSphere ESXi host.
I needed to find out if this was being reported incorrectly, if there was an issue, or if something else was affecting this. In my case, the specific GPU I was using is supposed to support PCIe Gen4 and has a physical x8 connector; my host has PCIe Gen3 slots, so I should at least be getting Gen3 speeds.
When running the command “nvidia-smi -q”, the GPU was reporting that it was only running at PCIe Gen 1 speeds, as shown below:
GPU Link Info
    PCIe Generation
        Max                 : 3
        Current             : 1
        Device Current      : 1
        Device Max          : 4
        Host Max            : N/A
    Link Width
        Max                 : 16x
        Current             : 8x
Something else worth noting is that the card states it supports a 16x interface, when it actually only physically has an 8x connector. I believe this chip is used on another board that has multiple GPUs on a single card and actually uses a 16x interface.
You could say I was quite puzzled. Why would the card only be running at PCIe Generation 1 speeds? I thought it could be any of the scenarios below:
Dynamic mode that alternates when required (possibly for power savings)
Hardware Limitation (I’m using this in an older server)
Unfortunately, when searching the internet, I couldn't find many references to this metric, however I did find references in other users' copy/pastes of "nvidia-smi -q" showing the same values (running at PCIe Gen1), even with beefier, higher-end cards.
After some more searching, I finally came across an NVIDIA technical document titled "Useful nvidia-smi Queries" that states the current PCIe Generation link speed "may be reduced when the GPU is not in use". This confirms that it's dynamic and adjusts when needed.
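If you want to watch this behaviour yourself, here is a small sketch using nvidia-smi's query mode (field names come from "nvidia-smi --help-query-gpu"; run it while generating load on the GPU):

# Poll just the PCIe link generation and width every 5 seconds
nvidia-smi --query-gpu=name,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,pcie.link.width.max --format=csv -l 5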
Finally, I decided to give some games a shot in a couple of the VMs, and to my surprise, when running a game the "Device Current" and "Current" PCIe Generation changed to PCIe Gen3 (note that my server isn't capable of PCIe Gen4, which is the card's maximum), as shown below:
GPU Link Info
    PCIe Generation
        Max                 : 3
        Current             : 3
        Device Current      : 3
        Device Max          : 4
        Host Max            : N/A
    Link Width
        Max                 : 16x
        Current             : 8x
In conclusion, if you notice this in your environment, do not be alarmed as this is completely normal and expected behavior.
Today I’m going to show you the process to install Horizon Agent for Linux on Ubuntu 22.04 LTS. We’ll be installing the Horizon Agent for Linux from VMware Horizon 8 version 2209.
The official documentation from VMware is helpful, but unfortunately doesn’t provide all the information to get up and running quickly, which is why I’ve put together this guide as a “Quick Start”.
Please note that this is just a guide to get you to the point where you can install the NVIDIA vGPU drivers and the Horizon Agent for Linux on the VM. This will provide you with a persistent VM that you can use with Horizon, and the instructions can be adapted for use in a non-persistent instant clone environment as well.
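As a rough sketch of the agent installation step itself, assuming the Horizon 8 2209 Linux agent bundle downloaded from VMware Customer Connect (the exact filename and installer flags may differ on your build, so check the documentation included with the bundle):

# Extract the Horizon Agent for Linux bundle (filename is an example)
tar -xzf VMware-horizonagent-linux-x86_64-2209-*.tar.gz
cd VMware-horizonagent-linux-x86_64-2209-*

# Run the installer, accepting the EULA non-interactively, then reboot
sudo ./install_viewagent.sh -A yes
sudo reboot

With the agent installed and the VM rebooted, the remaining steps are on the Horizon side: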
Add the Ubuntu 22.04 LTS VM to the manual desktop pool
Entitle the User account to the desktop pool and assign to the VM
Connect to the Ubuntu 22.04 Linux VDI VM from the VMware Horizon Client
You should now be able to connect to the Ubuntu Linux VDI VM using the VMware Horizon client. Additionally, if you installed the vGPU drivers for NVIDIA vGPU, you should have full 3D acceleration and functionality.
When deploying automated desktop pools with NVIDIA vGPU on VMware Horizon with an NVIDIA A2 GPU, you may notice provisioning fails with an error.
Error during Provisioning Cloning of VM VM-NAME-01 has failed: Fault type is UNKNOWN_FAULT_FATAL - No GPU capable host available for provisioning VM-NAME-01 with profile nvidia_a2-4q. Please refer to VMware KB 59271 for more details.
Further, when visiting VMware KB 59271 and performing the instructions, provisioning still continues to fail.
Essentially, at present there is no "supported" way to resolve this issue without applying the fix listed in this post. Additionally, if you're a VMware customer with an active support agreement, I would recommend opening a ticket with VMware Support so that it can be addressed in a future release.
The NVIDIA A2 GPU is fairly new, along with its VMware vSphere support. Even newer is the support for vGPU and VMware Horizon, requiring the latest drivers (vGPU driver version 14.2, released August 2022) to enable vGPU profiles for the card.
After troubleshooting this, I noted that the “graphic-profiles.properties” file in “C:\Program Files\VMware\VMware View\Server\broker\conf” did not contain any NVIDIA A2 vGPU Profiles. Additionally, the file available on the VMware KB was also missing these profiles.
To fix this, I referenced the NVIDIA vGPU User Guide to note the vGPU profiles allowed on the card, and created my own entries for the configuration file.
After adding these entries, restarting the server (or service), I was able to provision NVIDIA A2 enabled vGPU desktop pools.
To resolve this issue, add the following entries to your “graphic-profiles.properties” file in “C:\Program Files\VMware\VMware View\Server\broker\conf” (note, the contents of the file is case-sensitive):
After restarting the server or services, you should now be able to use the NVIDIA A2 vGPU profiles with VMware Horizon automated (vGPU) desktop pools.
You should be able to use this fix for other newly released vGPU cards where the profiles have not yet been configured for Horizon. VMware is likely to fix this in future releases of VMware Horizon.
However, in some rare cases you may have everything configured properly, yet the backup may continue to present these warnings where it's unable to truncate Microsoft SQL Server transaction logs.
I recently deployed an SQL Server in a domain, and of course made sure to set up the proper backup procedures as I've done a million times.
However, when performing a backup, the backup would present a warning with the following message:
Unable to truncate Microsoft SQL Server transaction logs. Details: Failed to call RPC function 'Vss.TruncateSqlLogs': Error code: 0x80004005. Failed to invoke func [TruncateSqlLogs]: Unspecified error. Failed to process 'TruncateSQLLog' command. Failed to logon user [ReallyLongDomainName\Admin-Account]. Win32 error:The user name or password is incorrect. Code: 1326.
This was very odd as I had configured everything properly, and even confirmed it when referring to the Veeam KB listed above in this post.
So I decided to look at this as if it was something different, something with credentials, or a different problem.
I noticed that in this specific customer environment, their FQDN for the domain was so long that the NETBIOS domain name did not match the FQDN domain name.
Due to the length of the domain, they had shortened the NETBIOS domain name to an abbreviated form.
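If you want to confirm whether your environment has this kind of mismatch, here's a quick check from a domain-joined machine with the Active Directory PowerShell module (the example output in the comment is hypothetical):

# Compare the DNS domain name with the NETBIOS domain name
Get-ADDomain | Select-Object DNSRoot, NetBIOSName
# e.g. DNSRoot: longcompanyname.com    NetBIOSName: LCNDOMAIN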
When configuring the Veeam credentials for guest processing, one would assume that when using the “AD Search” function, it would have pulled the “LCNDOMAIN\BackupAdminProcessing” account, however when using the check feature, it actually created an entry for “LongCompanyName\BackupAdminProcessing”, which was technically incorrect as it didn’t match the SAM logon format for the account.
Because of the observation noted above, I created a manual credential entry for “LCNDOMAIN\BackupAdminProcessing”, reconfigured the backup job to use those new credentials, and it worked!
The issue is that when using the AD search function in the credential manager, Veeam doesn't translate and pull the NETBIOS domain name; it builds the SAM logon format on the assumption that the UPN domain matches the NETBIOS domain name.
While this may hold true in most scenarios, there may be rare situations (like above) where the NETBIOS domain name does not match the domain used in the UPN suffix.
When either directly passing through a GPU, or attaching an NVIDIA vGPU to a Virtual Machine on VMware ESXi that has more than 16GB of video memory, you may run into a situation where the VM fails to boot with the error "Module 'DevicePowerOn' power on failed.". Special considerations are required when performing GPU or vGPU passthrough with 16GB+ of video memory.
This issue is specifically caused by memory mapping a GPU or vGPU device that has 16GB of memory or higher, and could involve both the host system (the ESXi host) and/or the Virtual Machine configuration.
In this post, I’ll address the considerations and requirements to passthrough these devices to virtual machines in your environment.
In order of likelihood, it's usually VM-configuration related; however, if the recommendations in the "VM GPU and vGPU Configuration Considerations" section do not resolve the issue, please proceed to reviewing the "ESXi GPU and vGPU Host Considerations" section.
Please note that if the issue is host related, other errors may be present, or the device may not even be visible to ESXi.
VM GPU and vGPU Configuration Considerations
First and foremost, all new VMs should be created using the “EFI” Firmware type. EFI provides numerous advantages in device access and memory mapping versus the older style “BIOS” firmware types.
To do this, create a new virtual machine, navigate to "VM Options", expand "Boot Options", and confirm/change the Firmware to "EFI". I recommend this for all new VMs, not only for VMs accessing GPUs or vGPUs with over 16GB of memory. Please note that you shouldn't change the firmware type on an existing VM; do this on a freshly created VM.
When performing GPU or vGPU passthrough with 16GB+ of video memory, you'll need to create a couple of entries under "Advanced" settings to properly configure access to these PCIe devices and provide the proper environment for memory mapping. The lack of these settings is specifically what causes the "Module 'DevicePowerOn' power on failed." error.
Under the VM settings, head over to “VM Options”, expand “Advanced” and click on “Edit Configuration”, click on “Add Configuration Params”, and add the following entries:
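As an illustration, the entries look like the following (the values assume a single 16GB card or vGPU profile, which is why the size is doubled to 32; the parameter names are the ones VMware documents for 64-bit MMIO mapping):

pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "32"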
You’ll notice that while our GPU or vGPU profile may have 16GB of memory, we need to double that value, and set it for the “pciPassthru.64bitMMIOSizeGB” variable. If your card or vGPU profile had 32GB, you’d set it to “64”.
Additionally, if you were passing through multiple GPUs or vGPU devices, you'd need to factor in all the memory being mapped, and double the combined amount.
ESXi GPU and vGPU Host Considerations
On most new and modern servers, the host level doesn't require any special configuration, as they are already designed to pass such devices through to the hypervisor properly. However, in some special cases, and/or when using older servers, you may need to modify configuration and settings in the UEFI or BIOS.
If setting the VM configuration above still results in the same error (or possibly other errors), then you most likely need to make modifications to the ESXi host's BIOS/UEFI/RBSU to allow the proper memory mapping of the PCIe device, in our case being the GPU.
This is where things get a bit tricky because every server manufacturer has different settings that will need to be configured.
Look for the following settings, or settings with similar terminology:
“Memory Mapping Above 4G”
“Above 4G Decoding”
“PCI Express 64-Bit BAR Support”
“64-Bit IOMMU Mapping”
Once you find the correct setting or settings, enable them.
Every vendor could be using different terminology, and there may be other settings that need to be configured that I don't have listed above. In my case, I had to go into a secret "SERVICE OPTIONS" menu on my HPE Proliant DL360p Gen8, as documented here.
After performing the recommendations in this guide, you should now be able to passthrough devices with over 16GB of memory.
With VMware ESXi 6.5 and 6.7 going End of Life on October 15th, 2022, many of you are looking for options to update hosts in your homelab, especially in my case putting ESXi 7.0 on HP Proliant DL360p Gen8 servers.
As far as support goes, HPE last provided a custom ESXi installer for version 6.5 U3, which was released in December 2019. This was the "last Pre-Gen9 custom image" released, as ESXi 7.0 on the DL360p Gen8 is totally unsupported.
ESXi 6.7 or higher on the Gen8 Servers
The jump from 6.5 to 6.7 was a little easier, as you could use the 6.5 custom installer, and then upgrade to 6.7. For the most part, as long as the hardware itself was supported, you were in pretty good shape.
Additionally, with the HPE vibsdepot loaded in to VMware Update Manager (now known as Lifecycle Manager), you could also keep all the HPE drivers and agents up to date.
ESXi 7.0 on the Gen8 Servers
Some were lucky enough to upgrade their current installs to 7 with no or limited problems, however the general consensus online was to expect problems. There were some major driver changes which, I think, at one point led to an advisory to perform a fresh install when upgrading from older versions, even on fully supported configurations with newer-generation servers such as the ProLiant Gen9 and Gen10.
In my setup, I have the following:
2 x HPE Proliant DL360p Gen8 Servers
Dual Intel Xeon E5-2660v2 Processors in each server
USB and/or SD for booting ESXi
No other internal storage
External SAN iSCSI Storage
Concerns and Considerations
My main concern is not only to have a stable and functioning ESXi 7 instance, but also, if possible, to keep the HPE drivers, agents, and iLO integrations.
You must consider that while this is completely unsupported, you do need to make sure that the components of your current configuration are supported, such as the processor and PCIe cards, even if the server as a whole is not supported.
Boot server, install using the Generic Installer downloaded above.
Mount NFS or iSCSI datastore.
Copy HPE Custom Addon for ESXi zip file to datastore.
Enable SSH on host (or use console).
Place host in to maintenance mode.
Run “esxcli software vib install -d /vmfs/volumes/datastore-name/folder-name/HPE-703.0.0.10.9.1.5-Jul2022-Addon-depot.zip” from the command line.
The install will run and complete successfully.
Restart your server as needed. You'll now notice that not only were the HPE drivers installed, but also agents like the Agentless Management agent, and the iLO integrations.
You’ll now have a functioning instance.
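To confirm the HPE components actually landed, here's a quick check from the ESXi shell (the grep pattern is just an example; adjust to taste):

# List installed VIBs and filter for HPE-provided packages
esxcli software vib list | grep -i hpe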
In my case everything was working, except for the “Smart Array P420i” RAID Controller, which I don’t use anyways.
Additionally, if you have a vCenter instance running, make sure that you add the HPE vibsdepot repo to your Lifecycle Manager. After you add the repo, create a baseline, attach the baseline to the host, and then go ahead and scan, stage, and remediate the server, which will further update all the HPE-specific drivers and software.