Dec 07 2016
 

When upgrading VMware vSphere and your ESXi hosts to version 6.5 or 6.7, you may experience an error similar to: “The upgrade contains the following set of conflicting VIBs: Mellanox_bootbank_net.XXXXversionnumbersXXXX. Remove the conflicting VIBs or use Image Builder to create a custom ISO.” This is due to conflicting VIBs on your ESXi host. This post will go into detail on what causes it and how to resolve it.

The issue

After successfully completing the migration from vCenter 6.0 (on Windows) to the vCenter 6.5 Appliance, all I had remaining was to upgrade my ESXi hosts to ESXi 6.5.

In my test environment, I run 2 x HPe Proliant DL360p Gen8 servers, and I have always used the HPe customized ESXi image for installs and upgrades.

It was easy enough to download the customized HPe installation image from VMware’s website. I then loaded it into VMware Update Manager on the vCenter appliance, created a baseline, and was prepared to upgrade the hosts.

I successfully upgraded one of my hosts without any issues; however, after scanning my second host, it reported the upgrade as incompatible and stated: “The upgrade contains the following set of conflicting VIBs: Mellanox_bootbank_net.XXXXversionnumbersXXXX. Remove the conflicting VIBs or use Image Builder to create a custom ISO.”

The fix

I checked the host to see if I was even using the Mellanox drivers, and thankfully I wasn’t and could safely remove them. If you are using the drivers that are causing the conflict, DO NOT REMOVE them, as doing so could disconnect all network interfaces from your host. In my case, since they were not being used, uninstalling them would not affect the system.

I SSH’ed in to the host and ran the following commands:

esxcli software vib list | grep Mell

(The command above shows which VIB package contains the Mellanox driver. In my case, it returned “net-mst”.)

esxcli network nic list

(The command above verifies which drivers your network interfaces on the host are using.)

esxcli software vib remove -n net-mst

(The command above removes the VIB that contains the problematic driver.)
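
If you want an extra safety check before committing, esxcli can also preview the removal, and you can place the host in maintenance mode first. A minimal sketch (the VIB name net-mst is simply what my host reported; yours may differ):

esxcli system maintenanceMode set --enable true

(Optional: places the host in maintenance mode before making changes)

esxcli software vib remove -n net-mst --dry-run

(Performs a dry run, reporting what would be removed without changing the host)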
After doing this, I restarted the host, scanned for upgrades, and successfully applied the new customized HPe ESXi 6.5 image.

Hope this helps! Leave a comment!

Dec 07 2016
 

During my first migration from VMware vCenter 6.0 to the VMware vCenter 6.5 virtual appliance, the migration failed. The migration installation UI would shut down the source VM, and numerous errors would occur afterwards when the destination vCenter appliance tried to finish its configuration.

If you were monitoring the source vCenter server during the export process, you would notice an error pop up while the source data was being compressed. The error is generated by Windows while creating an archive (zip file) and reads: “The compressed (zipped) folder is invalid or corrupted.” The entire migration process halts until you dismiss this message, and the migration ultimately fails (at first it appears to continue, but it ultimately fails).

If you continued and had the migration fail, you’ll need to power off the failed (new) vCenter appliance (it’s garbage now) and power on the source (original) vCenter server. The Active Directory trust will no longer exist at this point, so you’ll need to log on with a local (non-domain) account on the source server and re-create the computer trust on the domain using the netdom command:

netdom resetpwd /s:SERVERNAMEOFDOMAINCONTROLLER /ud:DOMAIN\ADMINACCOUNT /pd:*
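
To optionally confirm the secure channel is healthy before restarting, nltest can verify it (DOMAIN below is a placeholder for your Active Directory domain name):

nltest /sc_verify:DOMAIN

(Verifies the secure channel between this server and the domain)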

After re-creating the trust, restart the original vCenter server. You have now reverted to your original vCenter instance and can retry the migration.

Now back to the main issue. I tried a bunch of different things and wasted an entire evening (checking character lengths on paths/filenames, trying different settings, pausing processes in case timeouts were being hit, etc.), but I finally noticed that the compression would crash/fail on a file called “vum_registry”.

VUM brings VMware Update Manager to mind, which I do have installed, configured, and running.

I went ahead and uninstalled VMware Update Manager from my source server (it’s easy enough to re-configure from scratch after the migration), then initiated a migration again. To my surprise, the “data to migrate” went from 7.9GB to 2.4GB. This is a huge sign that something was messed up with my VMware Update Manager deployment (even though it appeared to be working fine). I’m assuming there were either filenames that were too long (exceeding the 260 character limit on paths and filenames), special characters being used where they shouldn’t be, or something else was messed up.

After the uninstall of Update Manager, the migration completed successfully. Leave a comment!

Dec 05 2016
 

In the process of prepping my test environment so I can upgrade from vSphere 6.0 to 6.5, one of the prerequisites is to first upgrade your VDP appliances to version 6.1.3 (6.1.3 is the only version of VDP that supports vSphere 6.5). In my environment I’ll be upgrading VDP from 6.1.2 to 6.1.3.

After downloading the ISO, changing my disks to dependent, creating a snapshot, and attaching the ISO to the VM, my VDP appliances would not recognize the ISO image, showing the dreaded: “To upgrade your VDP appliance, please connect a valid upgrade ISO image to the appliance.”

NoISODetected

I tried a few things, including the old “patch” that was issued for 6.1 when it couldn’t detect the ISO, but unfortunately it didn’t help. I also tried to manually mount the virtual CD-ROM to the mountpoint, but had no luck. The mountpoint /mnt/auto/cdrom is locked by the autofs service; if you try to modify these files (delete, create, etc.), you’ll encounter a bunch of errors (permission denied, file and/or directory doesn’t exist, etc.) and get nowhere.

Essentially the autofs service was not auto-mounting the virtual CD drive to the mount point.

To fix this:

  1. SSH in to the VDP appliance
  2. Run command “sudo su” to run commands as root
  3. Use vi to edit the auto.mnt file using command: “vi /etc/auto.mnt”
  4. At the end of the first line in the file, you will see “/dev/cdrom” (without quotation marks); change this to “/dev/sr0” (again, without quotation marks)
  5. Save the file: after editing the text, press Esc to leave insert mode, then type “:wq” and press Enter to write the file and quit vi.
  6. Reload the autofs config using command: “/etc/init.d/autofs reload”
  7. At the shell, run “mount” to show the active mountpoints, you’ll notice the ISO is now mounted after a few seconds.
  8. You can now initiate the upgrade. Start it.
  9. At 71%, autofs is updated via an RPM, and the changes you made to the config are cleared. IMMEDIATELY edit the /etc/auto.mnt file again, change “/dev/cdrom” to “/dev/sr0”, save the file, and issue the command “/etc/init.d/autofs reload”. Do this as fast as possible (the one-liner sketched after this list makes it quicker).
  10. You’re good to go; the install will continue and take some time. The web interface will fail and become unresponsive. Simply wait, and the VDP appliance will eventually shut down (in my case, even in a high performance environment, it took over 30 minutes after the web interface failed for the VDP VM to shut down).
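
For step 9, where speed matters, the edit and the reload can be collapsed into a single command instead of using vi. A minimal sketch, assuming the upgrade has just rewritten /etc/auto.mnt back to “/dev/cdrom”:

sed -i 's|/dev/cdrom|/dev/sr0|' /etc/auto.mnt && /etc/init.d/autofs reload

(This replaces /dev/cdrom with /dev/sr0 in the autofs config and immediately reloads autofs)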

And done! Leave a comment!

 

Nov 05 2016
 

Yesterday, I had a reader (Nicolas) leave a comment on one of my previous blog posts bringing my attention to the MTU for Jumbo Frames on the HPe MSA 2040 SAN.

MSA 2040 MTU Comment

Since I first started working with the MSA 2040, I have looked at numerous HPe documents outlining configuration and best practices, and they did confirm that the unit supports Jumbo Frames. However, the documentation never clearly stated the MTU, which can be confusing. I was under the assumption that the unit supported an MTU of 9000 while reserving 100 bytes for overhead. This is not necessarily the case.

Nicolas chimed in and provided details on his tests, which confirmed the HPe MSA 2040 actually has a working MTU of 8900. I ran the tests Nicolas outlined against my own configuration and confirmed that packets fragment if the MTU is set greater than 8900.

ESXi vmkping usage: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003728
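
Per the KB above, the test is a don’t-fragment ping from the ESXi host, sized to the intended MTU minus 28 bytes of IP/ICMP headers. A sketch; the vmkernel interface name and SAN IP below are placeholders for your environment:

vmkping -d -s 8872 -I vmk1 10.0.1.1

(8900 MTU minus 28 bytes of headers = an 8872-byte payload; this should succeed without fragmentation)

vmkping -d -s 8972 -I vmk1 10.0.1.1

(The full 9000 MTU payload; this should fail if the device tops out at an MTU of 8900)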

This is a big discovery, because fragmentation will not only degrade performance, but also flood the links with fragmented packets.

I went ahead and re-configured my ESXi hosts to use an MTU of 8900 on the network used with my SAN. This immediately created a MASSIVE performance increase (both throughput and IOPS). I highly recommend that users of the MSA 2040 SAN confirm this on their own and update their MTUs as they see fit.
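
For reference, on a standard vSwitch this can also be done from the ESXi shell rather than the client. A minimal sketch, assuming vSwitch1 and vmk1 carry the iSCSI traffic (both placeholders):

esxcli network vswitch standard set -v vSwitch1 -m 8900

(Sets the MTU on the standard vSwitch)

esxcli network ip interface set -i vmk1 -m 8900

(Sets the MTU on the iSCSI vmkernel interface)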

Also, this brings up another consideration. Ideally, on a single network, you want all devices running the same MTU. If your MSA 2040 SAN is on a storage network with other SAN devices (or any other devices), you may want to configure all of them to use an MTU of 8900 if possible (and of course, don’t forget your servers).

A big thank you to Nicolas for pointing this out!

Apr 10 2016
 

For those of you that use HP’s vibsdepot with VMWare Update Manager, you may have noticed that as of late you have not been able to synchronize patch definitions from the HP vibsdepot source.

I suspected this had something to do with the fact that, in the past, the hp.com domain was used to host these files, and with the company split, all enterprise-related hosting has now moved to hpe.com.

To fix this, simply log in to the vSphere client, jump to the “Admin View”, then “Download Settings” on the left. Right-click on the HP-related download sources and update the URLs from hp.com to hpe.com, and the problem is solved. After clicking “Test”, the connectivity status updates to “Connected”.

Old URLs:

http://vibsdepot.hp.com/index.xml

http://vibsdepot.hp.com/index-drv.xml

New URLs:

http://vibsdepot.hpe.com/index.xml

http://vibsdepot.hpe.com/index-drv.xml
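
A quick way to confirm the new URLs respond before pointing Update Manager at them (from any machine with curl):

curl -I http://vibsdepot.hpe.com/index.xml

(An HTTP 200 response indicates the index is reachable)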

VMWare HPe vibsdepot

 

I later noticed this “notice” on HPe’s website (http://vibsdepot.hpe.com/):

HPE vibsdepot notice

Nov 21 2015
 
HP MSA2040 Dual Controller SAN with 10Gb DAC SFP+ cables

I’d say 50% of all the e-mails/comments I’ve received from the blog in the last 12 months or so have been from viewers requesting pictures or proof of the HPe MSA 2040 Dual Controller SAN being connected to servers via 10Gb DAC cables. This should also apply to the newer generation HPe MSA 2050 Dual Controller SAN.

I decided to finally post the pics publicly! Let me know if you have any questions. In the pictures you’ll see the SAN connected to 2 X HPe Proliant DL360p Gen8 servers via 4 X HPe 10Gb DAC (Direct Attach) cables.

Connection of SAN from Servers

Connection of DAC Cables from SAN to Servers

See below for a video with host connectivity:

Nov 17 2015
 

I recently had a reader reach out to me for some assistance with an issue they were having with a VMWare implementation. They were experiencing issues uploading files and performing I/O on Linux-based virtual machines.

Originally it was believed that this was due to networking issues, since the performance issues were only one-way (when uploading/writing to storage) and weren’t experienced with all virtual machines. Other behaviours noticed were slow upload speeds to the vSphere client file browser and slow physical-to-virtual migrations.

After troubleshooting and exploring the issue with them, it was noticed that cache was not enabled on the RAID array that was providing the storage for the vSphere implementation.

Please note that in virtual environments with storage backed by RAID arrays, RAID cache is a must (for performance reasons), and battery-backed RAID cache is a must (for protection and data integrity). Caching allows write operations to be queued and committed to multiple disks at once, sometimes even optimizing the write procedures as they are processed. This dramatically increases observed performance, since the ESXi hosts and virtual machines aren’t waiting for write operations to commit before proceeding to the next.

You’ll notice that under Windows virtual machines this issue won’t be observed on writes, since Windows VMs typically cache file transfers to RAM, which are then written to disk. This can give the impression that there are no storage issues when troubleshooting (making one believe it’s related to the Linux VMs, the ESXi hosts themselves, or some odd networking issue).

 

Again, I cannot stress enough that you should have a battery-backed cache module, or a capacitor-backed flash module, providing cache functions.

If you implement cache without backing it with a battery, corruption can occur on the RAID array if there is a power failure or if the RAID controller freezes. A battery-backed cache allows cached write procedures to be committed to disk on the next restart of the storage unit/storage controller, thus providing protection.
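
If your hosts use HPe Smart Array controllers with the HPe custom ESXi image, you can check the cache and battery/capacitor status from the ESXi shell. A sketch only; the tool name and path vary by image generation (hpssacli on older HPe images, ssacli on newer ones):

/opt/hp/hpssacli/bin/hpssacli ctrl all show detail

(Shows controller details, including cache status and whether a battery/capacitor is present)

/opt/hp/hpssacli/bin/hpssacli ctrl all show config detail

(Shows the arrays and logical drives along with their caching settings)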

Jun 07 2014
 

I’ve had the HPe MSA 2040 setup, configured, and running for about a week now. Thankfully this weekend I had some time to hit some benchmarks. Let’s take a look at the HPe MSA 2040 benchmarks on read, write, and IOPS.

 

First some info on the setup:

-2 X HPe Proliant DL360p Gen8 Servers (2 X 10 Core processors each, 128GB RAM each)

-HPe MSA 2040 Dual Controller – Configured for iSCSI

-HPe MSA 2040 is equipped with 24 X 900GB SAS Dual Port Enterprise Drives

-Each host is directly attached via 2 X 10Gb DAC cables (Each server has 1 DAC cable going to controller A, and Each server has 1 DAC cable going to controller B)

-2 vDisks are configured, each owned by a separate controller

-Disks 1-12 configured as RAID 5 owned by Controller A (512K Chunk Size Set)

-Disks 13-24 configured as RAID 5 owned by Controller B (512K Chunk Size Set)

-While round robin is configured, only one optimized path exists (only one path is being used) for each host to the datastore I tested

-Utilized “VMWare I/O Analyzer” (https://labs.vmware.com/flings/io-analyzer) which uses IOMeter for testing

-Running 2 “VMWare I/O Analyzer” VMs as worker processes. Both workers are testing at the same time, testing the same datastore.

 

Sequential Read Speed:

Max Read: 1480.28MB/sec

 

Sequential Write Speed:

Max Write: 1313.38MB/sec

 

See below for IOPS (Max Throughput) testing:

Please note: the MaxIOPS and MaxWriteIOPS workloads were used. These workloads don’t have any randomness, so I’m assuming the cache module answered all the I/O requests; however, I could be wrong. Tests were run for 120 seconds. What this means is that this is more a test of what the controller itself can handle over a single 10Gb link from the controller to the host.

 

IOPS Read Testing:

Max Read IOPS: 70679.91 IOPS

 

IOPS Write Testing:

Max Write IOPS: 29452.35 IOPS

 

PLEASE NOTE:

-These benchmarks were done by 2 separate worker processes (1 running on each ESXi host) accessing the same datastore.

-I was running a VMWare vDP replication in the background (My bad, I know…).

-Sum is the combined throughput of both hosts; Average is the per-host throughput.

 

Conclusion:

Holy crap this is fast! I’m betting the speed limit I’m hitting is the 10Gb interface. I need to get some more paths set up to the SAN!

Cheers

 

Jun 07 2014
 

While doing some semi-related research on the internet, I’ve come across numerous how-to and informational articles explaining how to configure iSCSI MPIO and advising readers to incorrectly use iSCSI port binding. I felt the need to whip up a post to explain why and when you should use iSCSI port binding on VMware vSphere.

This post applies to all versions of VMware vSphere, including 5, 5.5, 6, 6.5, and 6.7.

What does iSCSI port binding do?

iSCSI port binding binds an iSCSI initiator interface on an ESXi host to a vmknic and configures it to allow multipathing in situations where multiple vmknics reside on the same subnet. In normal circumstances without port binding, if you have multiple vmkernel interfaces on the same subnet, the ESXi host would simply choose one and not use both for transmission of packets, traffic, and data. iSCSI port binding forces the iSCSI initiator to use each bound adapter for both transmitting and receiving iSCSI packets.

In most simple SAN environments, there are two different types of setups/configurations.

  1. Multiple Subnet – Numerous paths to a storage device on a SAN, each path residing on separate subnets. These paths are isolated from each other and usually involve multiple switches.
  2. Single Subnet – Numerous paths to a storage device on a SAN, each path is on the same subnet. These paths usually go through 1-2 switches, with all interfaces on the SAN and the hosts residing on the same subnet.

A lot of you I.T. professionals know the issues that occur when a host is multi-homed, and you know that in typical scenarios with Windows and Linux, if you have multiple adapters residing on the same subnet, you’ll have issues with broadcasts, and in most cases you have virtually no control over which NIC communications are initiated over due to the way the routing table is handled. In most cases, all outbound connections will be initiated through the first NIC installed in the system, or whichever one the primary route in the routing table points to.

When to use iSCSI port binding

This is where iSCSI port binding comes into play. If you have an ESXi host that has vmks sitting on the same subnet, you can bind the iSCSI initiators to the physical NICs. This allows multiple iSCSI connections on multiple NICs residing on the same subnet.

So the general rule of thumb is:

  • One subnet: iSCSI port binding is the way to go (a CLI sketch follows this list)!
  • Two or more subnets: DON’T USE ISCSI PORT BINDING! It’s just not needed, since the vmknics reside on different subnets.
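
If you are in the single-subnet case, port binding can be configured from the CLI as well as through the vSphere Client. A minimal sketch, assuming the software iSCSI adapter is vmhba33 and the iSCSI vmkernel ports are vmk1 and vmk2 (all placeholders):

esxcli iscsi networkportal add -A vmhba33 -n vmk1

esxcli iscsi networkportal add -A vmhba33 -n vmk2

(Binds each vmkernel port to the software iSCSI adapter)

esxcli iscsi networkportal list -A vmhba33

(Verifies the bindings)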

Additional Information

Here are two links to VMWare documentation explaining this in more detail:

http://kb.vmware.com/kb/2010877
http://kb.vmware.com/kb/2038869

 

Jun 07 2014
 

When I first started really getting into multipath iSCSI for vSphere, I had two major hurdles that took a lot of time and research to figure out before deploying. There are so many articles on the internet, and most of them are wrong.

1) How to configure a vSphere Distributed Switch for iSCSI multipath MPIO with each path being on different subnets (a.k.a. each interface on separate subnets)

2) Whether or not I should use iSCSI Port Binding

 

In this article I’ll only be getting into the first point. I’ll be creating another article soon to go into detail on the second, but I will say right now that in this type of configuration (using multiple subnets on multiple isolated networks), you DO NOT use iSCSI port binding.

 

Configuring a standard (non-distributed) vSphere Standard Switch is easy, but we want to do this right, right? A vSphere Distributed Switch allows you to roll the configuration out to multiple hosts, making configuration and provisioning easier. It also allows you to more easily manage and maintain the configuration. In my opinion, in a full vSphere rollout, there’s no reason to use vSphere Standard Switches. Everything should be distributed!

My configuration consists of two hosts connecting to an iSCSI device over 3 different paths, each with its own subnet. Each host has multiple NICs, and the storage device has multiple NICs as well.

As always, I plan the deployment on paper before touching anything. When getting ready for deployment, you should write down:

-Which subnets you will use

-Choose IP addresses for your SAN and hosts

-I always draw a map that explains what’s connecting to what. When you start rolling this out, it’s good to have that image in your mind and on paper. If you lose track, it helps you get back on track and avoid mistakes.

 

For this example, let’s assume that we have 3 connections (I know it’s an odd number):

Subnets to be used:

10.0.1.X

10.0.2.X

10.0.3.X

SAN Device IP Assignment:

10.0.1.1 (NIC 1)

10.0.2.1 (NIC 2)

10.0.3.1 (NIC 3)

Host 1 IP Assignment:

10.0.1.2 (NIC 1)

10.0.2.2 (NIC 2)

10.0.3.2 (NIC 3)

Host 2 IP Assignment:

10.0.1.3 (NIC 1)

10.0.2.3 (NIC 2)

10.0.3.3 (NIC 3)

 

So now we know where everything is going to sit and what its addresses are. It’s now time to configure a vSphere Distributed Switch and roll it out to the hosts.

1) We’ll start off by going into the vSphere client and creating a new vSphere Distributed Switch. You can name this switch whatever you want; I’ll use “iSCSI-vDS” for this example. Going through the wizard you can assign a name. Stop when you get to “Add Hosts and Physical Adapter”; on this page we will choose “Add Later”. Also, when it asks us to create a default port group, we will un-check the box and NOT create one.

2) Now we need to create some “Port Groups”. Essentially we will be creating a port group for each subnet and NIC for the storage configuration. In this example we have 3 subnets and 3 NICs per host, so we will be creating 3 port groups. Go ahead and right-click on the new vSphere Distributed Switch we created (“iSCSI-vDS” in my example) and create a new port group. I’ll be naming my first one “iSCSI-01”, the second will be called “iSCSI-02”, and so on. Go ahead and create one for each subnet. After these are created, we’ll end up with this:

vSphere Distributed Switch Configuration for MPIO multiple subnets

 

3) After we have this set up, we now need to do some VERY important configuration. By default, each port group will have all uplinks configured as Active, which we DO NOT want. Essentially, we will be assigning only one Active Uplink per port group. Each port group will be on its own subnet, so we need to make sure that only the applicable uplink is active and the remainder are thrown into the “Unused Uplinks” section. This can be achieved by right-clicking on each port group and going to “Teaming and Failover” underneath “Policies”. You’ll need to select the applicable uplinks and, using the “Move Down” button, move them down to “Unused Uplinks”. Below you’ll see some screenshots from the iSCSI-02 and iSCSI-03 port groups we’ve created in this example:

PortGroup-iSCSI02

PortGroup-iSCSI03

You’ll notice that the iSCSI-02 port group only has the iSCSI-02 uplink marked as active, and the iSCSI-03 port group only has the iSCSI-03 uplink marked as active. The same applies to iSCSI-01, and to any other links you have (more if you have more links). Please ignore the entry for “iSCSI-04”; I created this for something else, so pretend the entry isn’t there. If you do have 4 subnets and 4 NICs, then you would have a 4th port group.

4) Now we need to add the vSphere Distributed Switch to the hosts. Right-click on the “iSCSI-vDS” distributed switch we created and select “Add Host”. Select ONLY the hosts, and DO NOT select any of the physical adapters. A box will appear mentioning you haven’t selected any physical adapters; simply hit “Yes” to “Do you want to continue adding the hosts”. For the rest of the wizard just keep hitting “Next”; we don’t need to change anything. Example below:

AddVDStoHost

So here we are now, we have a vSphere Distributed Switch created, we have the port groups created, we’ve configured the port groups, and the vDS is attached to our hosts… Now we need to create vmks (vmkernel interfaces) in each port group, and then attach physical adapters to the port groups.

5) Head over to the Configuration tab inside of your ESXi host and go to “Networking”. You’ll notice the newly created vSphere Distributed Switch is now inside the window. Expand it. You’ll need to perform these steps on each of your ESXi hosts. Essentially what we are doing is creating a vmk on each port group, on each host.

Click on “Manage Virtual Adapters” and click “Add”. We’ll select “New Virtual Adapter”, then on the next screen our only option will be “VMKernel”; click Next. In the “Select port group” option, select the applicable port group and click Next. You’ll need to do this multiple times, as we need to create a vmkernel interface for each port group (a vmk on iSCSI-01, a vmk on iSCSI-02, etc.) on each host.

Since this is the vmk we are creating on the first port group (iSCSI-01) on the first host, we’ll assign the IP address 10.0.1.2, fill in the subnet mask, and finish the wizard. Create another vmk for the second port group (iSCSI-02); since it’s the first host, it’ll have an IP of 10.0.2.2. Then do it again for the 3rd port group with an IP of 10.0.3.2.

After you do this for the first host, you’ll need to do it again for the second host, only with different IPs since it’s a different host (in this example the second host would have one vmk on each port group: iSCSI-01 – 10.0.1.3, iSCSI-02 – 10.0.2.3, iSCSI-03 – 10.0.3.3).
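
Once the vmks are created, a quick way to double-check the addressing on each host from the ESXi shell (a sketch; output layout varies by ESXi version):

esxcli network ip interface ipv4 get

(Lists each vmkernel interface with its IPv4 address, netmask, and address type)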

Here’s an example of iSCSI-02 and iSCSI-03 on ESXi host 1. Of course there’s also an iSCSI-01, but I cut it from the screenshot.

Screen1

6) Now we need to “manage the physical adapters” and attach the physical adapters to the individual port groups. Essentially this maps each physical NIC to the separate subnet/port group we’ve created for storage in the vDS. We’ll need to do this on both hosts. Inside of the “Manage Physical Adapters” box, you’ll see each port group on the left-hand side; click on “<Click to Add NIC>”. In everyone’s environment the vmnic you add will be different. You should know which physical adapter you want to map to which subnet/port group. I’ve removed the vmnic numbers from the below screenshot just in case… and to make sure you think about this one…

ManagePhysical

 

As mentioned above, you need to do this on both hosts for the applicable vmnics. You’ll want to assign all 3 (even though I’ve only assigned 2 in the above screenshot).

 

Voila! You’re done! Now all you need to do is go into your iSCSI initiator and add the IPs of the iSCSI target to the Dynamic Discovery tab on each host. Rescan the adapter, add the VMFS datastores, and you’re done.
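
That last step can also be scripted per host with esxcli. A sketch, assuming the software iSCSI adapter is vmhba33 (a placeholder) and using the example SAN IPs from above:

esxcli iscsi adapter discovery sendtarget add -A vmhba33 -a 10.0.1.1:3260

esxcli iscsi adapter discovery sendtarget add -A vmhba33 -a 10.0.2.1:3260

esxcli iscsi adapter discovery sendtarget add -A vmhba33 -a 10.0.3.1:3260

(Adds each SAN interface to the Dynamic Discovery list)

esxcli storage core adapter rescan -A vmhba33

(Rescans the adapter so the new paths and datastores appear)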

If you have any questions or comments, or feel this can be done in a better way, drop a comment on this article. Happy Virtualizing!

 

Additional Note – Jumbo Frames

There is one additional step if you are using jumbo frames. Please note that to use jumbo frames, all NICs, physical switches, and the storage device itself need to be configured to support them. On the VMWare side of things, you need to apply the following settings:

1) Under “Inventory” and “Networking”, Right Click on the newly created Distributed Switch. Under the “Properties” tab, select “Advanced” on the left hand side. Change the MTU to the applicable frame size. In my scenario this is 9000.

2) Under “Inventory” and “Hosts and Clusters”, click on the “Configuration Tab”, then “vSphere Distributed Switch”. Expand the newly created “Distributed Switch”, select “Manage Virtual Adapters”. Select a vmk interface, and click “edit”. Change the MTU to the applicable size, in my case this is 9000. You’ll need to do this for each vmk interface on each physical host.
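
After applying the MTU on the vDS and the vmk interfaces, it’s worth verifying from each host. A sketch; the SAN IP is a placeholder, and the ping payload is the 9000 MTU minus 28 bytes of IP/ICMP headers:

esxcli network ip interface list

(Shows the MTU currently configured on each vmkernel interface)

vmkping -d -s 8972 10.0.1.1

(A don’t-fragment ping at full jumbo payload; it should succeed end-to-end only if every device in the path is set for 9000)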