VMware vSphere iSCSI Port Binding – When to use iSCSI Port Binding, and why!

ESXi, iSCSI, Storage, VMware, vSphere

Jun 07, 2014

Over the years I’ve come across numerous posts, blogs, articles, and how-to guides that provide information on when to use iSCSI port binding, and they’ve all been wrong! Here, I’ll explain when to use iSCSI Port Binding, and why!

This post and the information in it apply to all versions of VMware vSphere, including 5, 5.5, 6, 6.5, 6.7, and 7.0.

What does iSCSI port binding do?

iSCSI port binding binds the software iSCSI initiator (adapter) on an ESXi host to specific VMkernel (vmk) interfaces, each of which uses a single physical NIC (vmnic), and configures it to allow multipathing (MPIO) in situations where those vmknics reside in the same subnet.

Under normal circumstances without port binding, if you have multiple VMkernel interfaces on the same subnet (multihomed), the ESXi host simply chooses one of them and doesn't use the others for transmitting packets, traffic, and data. iSCSI port binding forces the iSCSI initiator to use each bound adapter for both transmitting and receiving iSCSI packets.

Most simple SAN environments use one of two types of setup/configuration:

Multiple Subnet – Numerous paths to a storage device on a SAN, each path residing on separate subnets. These paths are isolated from each other and usually involve multiple switches.
Single Subnet – Numerous paths to a storage device on a SAN, each path is on the same subnet. These paths usually go through 1-2 switches, with all interfaces on the SAN and the hosts residing on the same subnet.

IT professionals should be aware of the issues that occur when a host is multihomed with multiple NICs on the same subnet.

In a typical Windows or Linux scenario, if you have multiple adapters on the same subnet, you’ll have issues with broadcasts and packet transmission, and in most cases you have absolutely no control over which NIC a given communication is initiated on, because of the way the routing table is handled. Usually all outbound connections are initiated through the first NIC installed in the system, or whichever one holds the primary route in the routing table.
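A deliberately simplified Python model of that routing-table behavior may help. It is an illustration only, not how Windows, Linux, or the ESXi network stack is actually implemented: routes are matched by longest prefix, and when two routes tie, the first one in the table wins. The interface names and addresses are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    network: ipaddress.IPv4Network
    interface: str


def pick_interface(routes: list[Route], dest: str) -> str:
    """Simplified lookup: longest prefix wins, ties go to the first route listed."""
    addr = ipaddress.ip_address(dest)
    best = None
    for route in routes:
        if addr in route.network:
            # Strictly greater, so an equal-length match never replaces an earlier one.
            if best is None or route.network.prefixlen > best.network.prefixlen:
                best = route
    if best is None:
        raise LookupError(f"no route to {dest}")
    return best.interface


# Hypothetical host: two vmkernel interfaces on the same /24.
same_subnet = [
    Route(ipaddress.ip_network("192.168.1.0/24"), "vmk1"),
    Route(ipaddress.ip_network("192.168.1.0/24"), "vmk2"),
]
print(pick_interface(same_subnet, "192.168.1.100"))  # vmk1
print(pick_interface(same_subnet, "192.168.1.101"))  # vmk1 again; vmk2 is never chosen

# Hypothetical host: one vmkernel interface per subnet.
multi_subnet = [
    Route(ipaddress.ip_network("10.0.1.0/24"), "vmk1"),
    Route(ipaddress.ip_network("10.0.2.0/24"), "vmk2"),
]
print(pick_interface(multi_subnet, "10.0.1.100"))  # vmk1
print(pick_interface(multi_subnet, "10.0.2.100"))  # vmk2
```

With both interfaces in 192.168.1.0/24, every target portal resolves to vmk1; with one interface per subnet, each target naturally gets its own interface. Port binding exists to get around the first case.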

When to use iSCSI port binding

This is where iSCSI Port Binding comes into play. If you have an ESXi host with multiple vmk adapters on the same subnet, you can bind those VMkernel (vmk) adapters, each backed by a single physical NIC (vmnic), to the software iSCSI initiator. This allows multiple iSCSI connections over multiple NICs on the same subnet to transmit and handle the traffic properly.

So the general rule of thumb is:

One subnet, iSCSI port binding is the way to go!
Two or more subnets (multiple subnets), do not use iSCSI Port Binding! It’s just not needed since all vmknics are residing on different subnets.
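The rule of thumb can be expressed as a small Python helper. This is a sketch, assuming you already know each iSCSI vmk interface's address and prefix length; the function name and its inputs are made up for illustration. It groups the vmks by subnet and returns only the subnets that hold more than one vmk, which are the ones where binding applies.

```python
import ipaddress
from collections import defaultdict


def port_binding_plan(vmk_interfaces: dict[str, str]) -> dict[str, list[str]]:
    """Group iSCSI vmkernel interfaces by subnet (hypothetical helper).

    vmk_interfaces maps a vmk name to its address in CIDR form,
    e.g. {"vmk1": "192.168.1.21/24"}. Only subnets holding two or more
    vmks are returned; an empty result means port binding isn't needed.
    """
    by_subnet: dict[str, list[str]] = defaultdict(list)
    for name, cidr in vmk_interfaces.items():
        subnet = str(ipaddress.ip_interface(cidr).network)
        by_subnet[subnet].append(name)
    return {subnet: sorted(vmks) for subnet, vmks in by_subnet.items() if len(vmks) > 1}


# One subnet: both vmks share 192.168.1.0/24, so bind them.
print(port_binding_plan({"vmk1": "192.168.1.21/24", "vmk2": "192.168.1.22/24"}))
# {'192.168.1.0/24': ['vmk1', 'vmk2']}

# Two subnets, one vmk each: no port binding.
print(port_binding_plan({"vmk1": "10.0.1.21/24", "vmk2": "10.0.2.21/24"}))
# {}
```

On the host itself, each vmk that ends up in the plan would then be bound to the software iSCSI adapter, for example with `esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1`. The adapter name vmhba33 is only an example; `esxcli iscsi adapter list` shows the real one.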

Additional Information

Here are two links to VMware documentation that explain this in more detail:

For more information on configuring a vSphere Distributed Switch for iSCSI MPIO, click here!

And a final troubleshooting note: If you configure iSCSI Port Binding and notice that one of your interfaces is showing as “Not Used” and the other as “Last Used”, this is most likely due to either a physical cabling/switching issue (where one of the bound interfaces can’t connect to the iSCSI target), or you haven’t configured permissions on your SAN to allow a connection from that IP address.

60 Responses to “VMware vSphere iSCSI Port Binding – When to use iSCSI Port Binding, and why!”

Fabio says:

07/06/2014 at 8:46 AM

Hi Stephen,

Do you have a good article that I can follow to configure proper MPIO and iSCSI port binding?
In the past I followed this article: http://www.virtualtothecore.com/en/howto-configure-a-small-redundant-iscsi-infrastructure-for-vmware/ with 2 HP servers and one QNAP with 4 NICs.

Now I have 3 HP Gen8 servers and one MSA 2040… Can I follow the same old article?

P.S.
I have only one subnet 192.168.1.x

Thanks a lot for your support.
Stephen says:

07/08/2014 at 2:32 PM

Hi Fabio,

Sorry for the delayed response! (It’s Stampede week here in Calgary, busy time of the year!)

Do you only have 1 switch between the MSA 2040 and the 3 HP Servers, or multiple switches? Also, are you using standard switches, or vSphere Distributed Switches?

I briefly took a look at that guide and for the most part it looks good; however, I might configure my vSphere switches slightly differently. And as always, I recommend using multiple subnets (and avoiding iSCSI port binding).

Let me know and I’ll see what I can come up with for you, or any advice I may have.

Cheers,
Stephen
Fabio says:

07/08/2014 at 3:11 PM

Hi Stephen,
I have 2 HP 1910 24-port managed Gb switches.
I can only use the standard VMware switch because I have the Essentials Plus license.

If I use multiple subnets, I need to use VLANs, because the two HP switches also carry the normal traffic of the VMs and the other clients on the network, with IPs 192.168.1.x.

Thanks a lot for your support, you are great…

P.s.
For the installation of ESXi on the HP 360p Gen8 I will use an HP 32GB SDHC card. Is it a good choice for the security and stability of the system?
Stephen says:

07/08/2014 at 3:18 PM

Hi Fabio,

I’ll start off with the easiest question: The DL360p Gen8 works great with the SDHC cards for ESXi to be installed on to. I’ve used both the SD card and internal USB thumb drive option, and both work great!

So if you do only use one subnet, you can use the guide you originally posted; however, instead of creating multiple switches and binding them to the same NICs, I would create only one, configure your VLAN, and then create multiple VMkernel (vmk) interfaces on that single switch (each with its own IP on the network). After this you would simply go into the iSCSI initiator settings and enable iSCSI port binding on each vmk interface.

Keep in mind, that if you were to use both switches (with different subnets), then you would have added redundancy to your configuration in case one of the switches ever failed. This is just a consideration.

Hope this helps,

Stephen
Fabio says:

07/09/2014 at 9:51 AM

Thanks for your reply, I’ll do the configuration with one subnet, 192.168.1.x.
Next week all the products arrive in my lab, and then I’ll send you my planned configuration.

Thanks a lot for your support.
See you soon.

bye

Fabio
Fabio says:

07/16/2014 at 11:11 AM

Hi, the MSA2040 has finally arrived in my lab.
Dual controller, 8 iSCSI 1Gb ports, 7 SFF 600GB SAS HDDs.
I did this configuration:
I created 2 vdisks. The first with SAS1-2-3 in RAID5.
The second with SAS4-5-6 in RAID5, and SAS7 as the global spare.
The first vdisk is mapped to controller A and the second vdisk is mapped to controller B.
Each vdisk has one volume of its entire capacity, mapped on each port of the controller (A1, A2, A3, A4 and B1, B2, B3, B4).

Do you think this is a good configuration?

Thanks a lot for your support

Fabio
Stephen says:

07/16/2014 at 11:20 AM

Hi Fabio,

That should work great. When you created the vdisks, did you choose Auto for the owning controller? If not, I would advise changing it.

Other than that you should be good!

Stephen
Fabio says:

07/16/2014 at 1:42 PM

OK, I’ll change it.
I had chosen vdisk1 – controller A,
vdisk2 – controller B.
Stephen says:

07/16/2014 at 1:45 PM

Hi Fabio,

So just to confirm, when you created the volumes, when it asked for a controller ownership, you chose “Auto”, correct?

Thanks,
Stephen
Fabio says:

07/16/2014 at 2:01 PM

No. I manually selected controller A and controller B.
Tomorrow I’ll change the ownership to Auto.
Devin says:

08/23/2014 at 9:39 PM

I’m looking at redesigning our iSCSI network to include 2 switches and put them on 2 /24 subnets.
I understand that I should not do port binding in this case, but I wanted to confirm whether this holds true if your host has 4 NICs for iSCSI. I was looking at putting vmnic7 and vmnic6 on subnet 10.0.1.x and vmnic5 and vmnic4 on subnet 10.0.2.x. Would you do port binding on the NICs within the same subnet?

Thanks for your help.
Stephen says:

08/25/2014 at 6:30 AM

Hi Devin,

Let me see what I can find out, but I’m assuming that you would have to have port binding enabled, however it may result in some erroneous routes/paths which you may have to mark as “inactive” or “disabled” manually. This may or may not be the case, but I’m pretty sure in your case you would need to use iSCSI port binding.

Let me see what I can find out and I’ll get back to you!

Stephen
Stephen says:

08/25/2014 at 6:37 AM

Devin,

Got the information faster than I thought I would! haha

Essentially you WILL use iSCSI port binding. Make sure that each pair of NICs on a single subnet is configured on its own vSwitch (or Distributed Switch). DO NOT use the same vSwitch (or vDS) for different subnets.

When you have the server NICs on their own vSwitch (or vDS), putting only NICs from the same subnet on the same vSwitch, then you can configure iSCSI port binding!
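The rule in this reply (one subnet per vSwitch) can be checked mechanically. A minimal Python sketch: the vSwitch names and individual host addresses are hypothetical, and only the 10.0.1.x / 10.0.2.x subnets and vmnic numbers come from Devin's question.

```python
import ipaddress


def mixed_subnet_vswitches(vswitches: dict[str, list[str]]) -> list[str]:
    """Return names of vSwitches whose iSCSI vmk addresses span more than one subnet.

    vswitches maps a (hypothetical) vSwitch name to the CIDR addresses of
    the iSCSI vmkernel interfaces on it.
    """
    flagged = []
    for name, addresses in vswitches.items():
        subnets = {ipaddress.ip_interface(a).network for a in addresses}
        if len(subnets) > 1:
            flagged.append(name)
    return flagged


# One vSwitch per subnet, as recommended here.
good = {
    "vSwitch1": ["10.0.1.21/24", "10.0.1.22/24"],  # vmks on vmnic6 / vmnic7
    "vSwitch2": ["10.0.2.21/24", "10.0.2.22/24"],  # vmks on vmnic4 / vmnic5
}
# Both subnets on a single vSwitch: the layout to avoid.
bad = {"vSwitch1": ["10.0.1.21/24", "10.0.1.22/24", "10.0.2.21/24", "10.0.2.22/24"]}

print(mixed_subnet_vswitches(good))  # []
print(mixed_subnet_vswitches(bad))   # ['vSwitch1']
```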

Let me know if you have any questions!

Cheers,
Wihan says:

09/20/2015 at 11:03 PM

Hi Stephen,
I have 4 NICs in 1 host, and 8 NICs in the SAN, in a test environment.
The NICs in the host are each in a unique subnet:
172.16.0.1/24
172.16.1.1/24
172.16.2.1/24
172.16.3.1/24

and the SAN IPs:

172.16.0.100/24
172.16.1.100/24
172.16.2.100/24
172.16.3.100/24

With port binding enabled, I get good IO (~414MB/s read); if I disable it, I get 127MB/s read.
I am not sure where I am going wrong here. I just removed the NICs from the software iSCSI adapter; do I also have to split them up into separate vSwitches?

Cheers
Wihan
Stephen says:

09/21/2015 at 5:50 AM

Hi Wihan,

Just curious, how do you have everything wired? Do you have separate physical switches? To confirm, is the host directly attached to the SAN?

If you are using multiple subnets, you should have your vSwitches specially configured.

Stephen
Wihan says:

09/21/2015 at 3:45 PM

Hi Stephen,

I have since found my throughput is fine; I just had to set IOPS=1, because for some reason it changed back to 1000 after I removed the binding. I would still like an answer to my question, though, so I know it is done correctly.
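For readers wondering what the IOPS setting changes: with the Round Robin path selection policy, ESXi moves to the next path after a set number of I/Os, 1000 by default. The Python below is a simplified model of that behavior, not VMware code, and the path names are hypothetical.

```python
from itertools import islice


def round_robin_paths(paths: list[str], iops_limit: int):
    """Yield the path used for each successive I/O.

    Simplified model of Round Robin: issue `iops_limit` I/Os on the
    current path, then move on to the next path in the list.
    """
    if iops_limit < 1:
        raise ValueError("iops_limit must be >= 1")
    index = 0
    while True:
        for _ in range(iops_limit):
            yield paths[index]
        index = (index + 1) % len(paths)


paths = ["pathA", "pathB"]
# IOPS=1: every I/O alternates between the two paths.
print(list(islice(round_robin_paths(paths, 1), 4)))  # ['pathA', 'pathB', 'pathA', 'pathB']
# IOPS=1000: the first 1000 I/Os all go down the same path.
print(set(islice(round_robin_paths(paths, 1000), 1000)))  # {'pathA'}
```

On the host, the per-device value is typically changed with `esxcli storage nmp psp roundrobin deviceconfig set --device=<device ID> --type=iops --iops=1`, where the device ID is a placeholder for the LUN's naa identifier.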

I have 1 host currently, but I will have 3 eventually after testing is complete.
Each host has 4 dedicated NICs for iSCSI.

I have 2 switches stacked with 2×10Gb, with STP (no VLANs).

Each host has 2 of its iSCSI NICs connected to storage switch 1, and 2 connected to storage switch 2.

The SAN has 2 ports from controller 1 connected to storage switch 1 and 2 connected to storage switch 2.

I then connected 2 from controller 2 to storage switch 1, and 2 from controller 2 to storage switch 2.

I gave each NIC on the host an IP in a unique /24 subnet. I have them in one vSwitch, but selected only one adapter for each network by moving the other adapters to “Unavailable” in VMware.

And on the SAN, port 1 from controller 1 and port 1 from controller 2 are in the same subnet,
but ports 1, 2, 3, and 4 on each controller are in different subnets.

I hope that is sorta clear.
Cheers
Wihan
Mattew says:

08/05/2016 at 12:44 PM

Hello, great article. I’m trying to get the best performances out of my QNAP NAS via iSCSI.

I have a vmware cluster with 2 hosts and a QNAP TS-1253 (celeron J1900 CPU, 4 cores 2GHz) with 10 WD RED 3TB (RAID10).

On both nodes I have 2 NIC dedicated to iSCSI on 2 different subnets.
My tests are running from a VM on a different datastore, with a second vmdk attached via iSCSI to my QNAP, no other VMs are using the iSCSI datastore.

After a lot of tweaking this is what I’ve got:

# dd if=/dev/zero of=/dev/sdc bs=1M count=8800
8800+0 records in
8800+0 records out
9227468800 bytes (9,2 GB) copied, 40,2418 s, 229 MB/s

# dd if=/dev/sdc of=/dev/null bs=1M count=4400
4400+0 records in
4400+0 records out
4613734400 bytes (4,6 GB) copied, 40,5963 s, 114 MB/s

As you can see, the writing speed is 229MB/s (excellent), but my reading speed is limited somewhere to 1Gbit (114MB/s). Looking on the QNAP side, both adapters are used equally (55MB/s each), looking on the vmware side both adapters are equally receiving data.
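The dd figures above can be checked against the line rate of a single gigabit link. A small Python sketch; note that GNU dd reports MB/s in decimal units (10^6 bytes), and the 125 MB/s ceiling ignores Ethernet, IP, TCP, and iSCSI overhead, so real single-link throughput lands somewhat lower.

```python
def throughput_mb_s(bytes_copied: int, seconds: float) -> float:
    """Decimal megabytes per second, the unit GNU dd reports as MB/s."""
    return bytes_copied / seconds / 1_000_000


# One gigabit per second in the same unit: 10**9 bits / 8 / 10**6 = 125.0 MB/s (raw).
GIGABIT_LINK_MB_S = 1_000_000_000 / 8 / 1_000_000

# Byte counts and timings taken from the two dd runs above.
write = throughput_mb_s(9_227_468_800, 40.2418)
read = throughput_mb_s(4_613_734_400, 40.5963)
print(f"write {write:.0f} MB/s = {write / GIGABIT_LINK_MB_S:.2f} x 1 GbE")
print(f"read  {read:.0f} MB/s = {read / GIGABIT_LINK_MB_S:.2f} x 1 GbE")
```

The write result works out to roughly 1.8 links' worth of traffic, while the read result sits just under one link, which is why the read side looks capped at a single 1Gb path.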

Where can the problem be? The array is capable of 300MB/s reads. I’d like a read speed at least equal to the write speed.

Many thanks!
Stephen says:

08/05/2016 at 12:52 PM

Hi Mattew,

First and foremost, on the array itself, have you configured all the cache settings and everything?

I’m assuming since you have different subnets that you are NOT using iSCSI port binding (you shouldn’t be).

Also, as for path selection on the vSphere hosts, can you confirm that it is set to “Round Robin”.

Do you have jumbo frames enabled on all devices NICs (NAS as well as the vSphere hosts)?

And finally, on the NAS itself, how do you have the iSCSI target configured? Is the iSCSI volume mapped to the actual array volume itself (block IO), or are you using a virtual file iSCSI target (file IO)? Also, is it using any of the VMware accelerated features?

Almost forgot, when reading, have you viewed the active performance of the device itself to make sure the CPU or any processes on the NAS aren’t overloading it?

Cheers,
Stephen
Mattew says:

08/05/2016 at 1:48 PM

On the QNAP I have an SSD caching on the array, but It doesn’t matter if I have it enabled or disabled (same results on sequential read/write).
Today I changed from 1 subnet with port binding (which gave me horrible write speed (35MB/s) and semi-horrible read speed (60MB/s)) to the new config with 2 subnets. The new setup gave me perfect write speeds but half the read speed. I didn’t remove the port binding, but it should not influence the read speed, am I right? Just one question: to disable it, is it enough to remove the 2 vmk ports from the network tab of the iSCSI initiator on both hosts? Do I need to reboot the hosts?
The datastore is set to round robin and I changed the rr iops to 1, I have jumbo frames enabled on the switch and on the vSwitch/NIC on both hosts. Most important, my array is using Block IO, not virtual disk.
Last question: I looked on the processes on the NAS, cpu & ram are both under 20% of use.

It looks just a 1Gbit bottleneck but it can’t be on one direction only! 🙁

Thanks
Stephen says:

08/05/2016 at 1:58 PM

Hi Mattew,

In order to disable the iSCSI port binding, you’ll need to remove the interfaces from the iSCSI port binding configuration. I wouldn’t leave it active, make sure you remove this.

You mentioned you have jumbo frames enabled on the hosts, are jumbo frames also enabled on the NAS as well?

Furthermore, inside of the vSphere client, when you click on “Manage paths” on the datastores, is it showing correct paths (are they all real and valid, or is it showing any bogus paths in error or paths that are down)?

This is really a bizarre problem you’re experiencing. I’m wondering if it’s something specific to the QNAP unit….

Stephen
Stephen says:

08/05/2016 at 2:07 PM

Mattew,

One more question. I just want to confirm that you don’t have any trunking or network load balancing enabled on either the vSphere hosts, or the QNAP device, correct?
Stephen says:

08/05/2016 at 2:11 PM

Sorry for the 3rd response/comment.

I’m not sold on the write speeds either. Could you try copying an actual file using the dd command (a large file to rule out caching/ram) and let us know the results?

I’m thinking the write speeds are actually slower than being reported… They are probably being queued in the write cache and aren’t actually being performed that fast, also since they are coming from /dev/zero…
Mattew says:

08/05/2016 at 2:17 PM

I have no port trunking on the QNAP nor on the switch, and on VMware I have only one vSwitch with 2 NICs, but the 2 vmk adapters use vmnic1 (vmnic6 unused) and vmnic6 (vmnic1 unused) respectively. So the answer should be no, I hope.
I can’t really understand why it’s perfect one way only… :(
Stephen says:

08/05/2016 at 2:22 PM

Hmm,

My interest is now pointing towards the fact you only have 1 vSwitch… Do you have multiple separate port groups under the vSwitch for each NIC?

If you’re only using 1 vSwitch, you’ll need a dedicated port group under that vSwitch for each NIC. Inside those port groups, you’ll need to configure each port group so that it has only one active adapter and the other is unused (it’s unused because in the other port group, that adapter is the active one, and the first port group’s active adapter is unused there).

Just for the sake of troubleshooting, it might be worthwhile removing that vSwitch, and creating two separate vSwitches and dedicate one to each subnet… If this helps the situation, it confirms the configuration of the single vSwitch was wrong, and you could either keep 2 vSwitches, or create a new single vSwitch with the proper configuration…
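The teaming layout described in this reply can be written as a short validation function. This is a Python sketch assuming the teaming order is available as plain lists; the port group names are hypothetical, and vmnic1/vmnic6 are the uplinks Mattew mentioned. Besides the one-active/one-unused rule, it also flags standby uplinks, since VMware's port binding requirements call for each bound vmk to have a single active uplink and no standby uplinks.

```python
def check_port_groups(port_groups: dict[str, dict[str, list[str]]]) -> list[str]:
    """Return a list of problems with a port-binding teaming layout.

    Each port group maps to {"active": [...], "standby": [...], "unused": [...]}.
    Rules checked: exactly one active uplink per port group, no standby
    uplinks, and a different active uplink in each port group.
    """
    problems = []
    seen_active: dict[str, str] = {}
    for name, teaming in port_groups.items():
        active = teaming.get("active", [])
        if len(active) != 1:
            problems.append(f"{name}: needs exactly one active uplink, has {len(active)}")
            continue
        if teaming.get("standby"):
            problems.append(f"{name}: standby uplinks should be moved to unused")
        uplink = active[0]
        if uplink in seen_active:
            problems.append(f"{name}: active uplink {uplink} already used by {seen_active[uplink]}")
        seen_active[uplink] = name
    return problems


ok = {
    "iSCSI-A": {"active": ["vmnic1"], "unused": ["vmnic6"]},
    "iSCSI-B": {"active": ["vmnic6"], "unused": ["vmnic1"]},
}
print(check_port_groups(ok))  # []

broken = {"iSCSI-A": {"active": ["vmnic1", "vmnic6"]}}
print(check_port_groups(broken))  # ['iSCSI-A: needs exactly one active uplink, has 2']
```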
Mattew says:

08/05/2016 at 2:30 PM

# mkfs.ext4 -m0 /dev/sdc
# mkdir /test
# mount /dev/sdc /test
# tar cvf bigfile.tar /usr/
# du -sm *
4605 bigfile.tar (4605 MB)
# time cp bigfile.tar /test/

real 0m23.094s
user 0m0.032s
sys 0m5.296s

4605 / 23 = 200MB/s. Write speeds are ok.
Mattew says:

08/05/2016 at 2:38 PM

Many thanks for your help

Sure I have 2 separate port groups.
I began with this official VMware guide http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-multipathing-configuration-software-iscsi-port-binding-white-paper.pdf, then I switched to 2 separate subnets, as I wrote before.

Do you think that creating a second vSwitch may really help, given that I demonstrated the write speeds are real?
Stephen says:

08/05/2016 at 2:43 PM

I still have concerns about the iSCSI port binding. I’m also wondering if there are some settings that may be skewed on the QNAP device that are causing this.

I would delete all the iSCSI port binding settings, remove the adapters from the iSCSI port binding configuration window.

For the heck of it, delete the vSwitches, and create two separate ones from scratch, each dedicated to a subnet for testing.

Check the QNAP device for any odd settings as far as array access, networking, etc…
Mattew says:

08/05/2016 at 2:56 PM

Many thanks again. I will try removing the vSwitch and reconfiguring vSphere, but I think something odd is on the QNAP side. I can’t guess what it could be. I will come back to report soon.
Mattew says:

08/06/2016 at 12:28 PM

Hi, I haven’t tried yet; I will do it on Monday. Anyway, is it better to have a separate broadcast domain as well? I mean, having 2 subnets should be enough, but do I need to create a VLAN for the secondary subnet (I already have one for the first subnet)?
I think this will not fix my issue, maybe I will gain some extra bandwidth, right?
Stephen says:

08/06/2016 at 12:34 PM

Hi Mattew,

Well this new info changes things quite a bit!

I thought you had two separate physical switches, one for each subnet. Or are you using only one physical switch? This could be causing or contributing to your issues (or may cause other issues in the future). You should have two physical switches, each one dedicated to one of the subnets, and you should NOT have any VLANs configured if they aren’t needed.

Also, you did not mention you had a VLAN configured before. This could also be causing speed issues (VLANs reduce the possible MTU due to the overhead of the VLAN and tagging in packet sizes). This is probably causing packets to be fragmented on the switch, which will significantly reduce performance (for example, since the packets are fragmented, where only 1 packet should be sent/received, it’s getting split into two, which could be, and probably is, causing unneeded congestion).
Mattew says:

08/06/2016 at 1:09 PM

I know I should have separate switches for storage but I have not 😉
Anyway, on my first try I added a switch connecting 3 ports (2 hosts + 1 QNAP port); with this configuration I had 180MB/s on write. The switch was unmanaged and I was doubtful about its jumbo frame support. After a couple of tests I removed it, reconnecting the 3 cables to the same VLAN used for the first subnet, and the write speed increased to 229MB/s.
Anyway, saying I’m working with a single switch is not correct; in fact I’m working on a stack of two, so if one member fails I’d lose only one iSCSI path. I have redundancy, but I miss decent read speeds :)
Obviously I have the primary subnet on the first member and the secondary subnet on the second member.
I know this is not the best solution, but why should it perform at its best in one direction only? Through the switch I’m sending 229MB/s to the QNAP, so why can’t it do the same the other way?
Stephen says:

08/06/2016 at 1:33 PM

Hi Mattew,

So with your last comment, just to confirm, you’re also using the switches for VM connectivity as well (not just storage)? This is more information that brings in to play other things that can be causing issues and more complexity.

Are the two switches in the stack connected, or are they completely separate?

Going back to the main potential issue:

Since you’re using VLANs, if they are changing MTU sizes, then the mismatch on hosts/storage/network layer will cause packet fragmentation. This is not good on any network (especially a storage/iSCSI network).

Packet fragmentation can cause a whole slew of issues when there are things mis-configured on the network, one of these includes speed issues going in certain ways (as packets in one direction are fragmented).

I’m not going to get into too many details, but let’s say that with jumbo frames, packets of 9,000 bytes are being transmitted; when a VLAN tag is added, it adds 4 bytes (I think 4 is correct), which makes the packet 9,004 bytes. If the devices are configured for jumbo frames of 9,000, when one receives the packet, it will request fragmentation and a re-transmit because it can’t accept the initial packet. This will not only cause a delay from fragmentation, but will also increase network congestion and reduce performance.
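The 4-byte figure is right for an 802.1Q tag. One nuance worth keeping in mind: the tag is added to the Ethernet frame, not to the IP packet, so it is the maximum frame size a switch port accepts that has to leave room for it. A small Python sketch of that arithmetic (frame size here excludes the preamble and inter-frame gap):

```python
ETHERNET_HEADER = 14  # destination MAC + source MAC + EtherType
FCS = 4               # frame check sequence
DOT1Q_TAG = 4         # 802.1Q VLAN tag


def frame_size(mtu: int, vlan_tagged: bool = False) -> int:
    """On-the-wire Ethernet frame size for a packet that fills the MTU."""
    return mtu + ETHERNET_HEADER + FCS + (DOT1Q_TAG if vlan_tagged else 0)


for mtu in (1500, 9000):
    print(mtu, frame_size(mtu), frame_size(mtu, vlan_tagged=True))
# 1500 1518 1522
# 9000 9018 9022
```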

If the MTU and packet size is not constant across the network and all devices, connections and speed will appear to be fine going one way (example: the receiving glove is big enough to hold the small ball), but packets will be dropped and re-transmits will occur going the other direction (example: because the ball is too big for the glove to be caught, so it will ask for two small balls instead of the big ball).

This also brings in a new potential issue that you need to check. There’s a chance that while a switch may support jumbo frames, it may not support jumbo frames over VLANs. You need to check into this added complexity, and you also need to make sure that if it does support jumbo frames over VLANs, there isn’t a separate setting to enable this. You’ll also need to adjust all your MTUs, and make sure the VLAN configuration is consistent on all your network devices so that the standard packet transmission size matches.
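One practical way to confirm the MTU really is consistent end to end is a don't-fragment ping from each iSCSI vmkernel interface, e.g. `vmkping -I vmk1 -d -s <size> <target IP>`. The payload size is the MTU minus a 20-byte IPv4 header and an 8-byte ICMP header; a quick Python sketch of that calculation (the target IP stays a placeholder):

```python
IPV4_HEADER = 20  # bytes, no IP options
ICMP_HEADER = 8   # bytes


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1500, 9000):
    print(f"MTU {mtu}: vmkping -d -s {max_ping_payload(mtu)} <target IP>")
# MTU 1500: vmkping -d -s 1472 <target IP>
# MTU 9000: vmkping -d -s 8972 <target IP>
```

If the 1472-byte ping succeeds but the 8972-byte one fails, something on the path (host, switch port, VLAN, or storage NIC) is not passing jumbo frames.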

And on a final note, if you are using the same switches for VM connectivity, and you’re running different MTUs on a bunch of different devices, you’re going to run into a whole bunch of issues later on as well.
Mattew says:

08/06/2016 at 2:40 PM

I’m using the stack for the VM traffic as well. I’m not routing traffic between the storage VLAN and the others, and I have no VLAN tagging on the ESX hosts nor on the QNAP. From what I know, the 802.1Q VLAN tag is added when a packet needs to traverse the switch going to another VLAN, but this is not my case.
I will try on monday with two separate switches dedicated to iSCSI only. Maybe I will have to disable jumbo frames because probably I can get 2 switches but I do not know if they will have jumbo support.
Anyway, from what I see, there is no general performance loss, but a bottleneck of 1Gbit when the QNAP sends data, and this doesn’t have much in common with possible packet fragmentation. Sorry, but I’m skeptical about the theory that fragmentation is the culprit. If that were the case, it wouldn’t land at exactly 1Gbit…
But, I know you’re right, it could cause issues, so I will try monday with 2 switches dedicated to iSCSI.

I will be back 🙂
Many thanks.
Mattew says:

08/08/2016 at 3:49 AM

Today I took 2 spare switches, both with jumbo frame support, and I connected one iSCSI subnet to each.
I want to be clear: I’m not trusting only the benchmark I ran, I’m also keeping an eye on each NIC to see if and how they are working together.
I also created a new vSwitch with both NICs (they are on separate PCI cards), set MTU 9000, and connected it to my initiator.

Unfortunately, the results haven’t changed. But I have one more detail:

Writing to the iSCSI datastore, I see both NICs receiving at semi-full speed (115MB/s on NIC3, 104MB/s on NIC4, looking at the QNAP side).

Reading from the iSCSI datastore, I see both NICs sending, but at 55MB/s on NIC3 + 55MB/s on NIC4. Here’s the fun part: if I disconnect NIC4 during the transfer, NIC3 starts sending at full speed (more than 110MB/s). If I reconnect NIC4, both NICs go back to 55MB/s.

It seems the QNAP array is limited to 100-110MB/s, but it is not; moreover, how can it write at 220MB/s and read at half that speed?

So: different VMware configs, different switches, different setups, same QNAP.

Really, I don’t know what to try next. Any suggestions?
Mattew says:

08/08/2016 at 5:15 AM

Just to try, because I have 4 NICs on the QNAP, I set up a third NIC with a third subnet on a third switch.

256MB/s writing (about 85MB/s on each NIC)
104MB/s reading (about 35MB/s on each NIC)

This is caused by some bug on the QNAP side. I also tried creating another disk/LUN, allocating a few TB of free space, but the read speeds are ridiculous.

🙁
Stephen says:

08/08/2016 at 5:54 AM

Hi Mattew,

Could you list what subnets you’re using for the configuration?

Also, do me a favour and disable jumbo frames (I’m wondering if this is another issue I’ve heard of). Try using the standard 1500, make sure you set this on the hosts as well as the QNAP system.

And are the servers firmware fully up to date (including NIC firmware)? What type of servers are they? Did you use the normal ESXi install image, or did you use a vendor customized install image (ex. HP, IBM, etc…).
Mattew says:

08/08/2016 at 7:45 AM

10.1.7.0/24 (target .12, inititators .22/.24)
10.1.17.0/24 (target .13, initiators .23/.25)
10.1.27.0/24 (target .14, initiators .26/.27)

Jumbo frames were disabled on the last run, but the behavior was the same.
The QNAP is updated to 4.2.1 (the latest firmware) and the ESXi hosts are updated to 6.0u2; I’m using the custom Lenovo firmware (20150420 was the initial release).

Do you suggest me to try with a different server on the unused iscsi target?
Mattew says:

08/08/2016 at 8:07 AM

Sorry, I missed one answer. IBM/Lenovo System x5550 M5: Xeon E5-2620 v3 2.4GHz (dual socket) + 128GB RAM each. Primary datastore is Fibre Channel; the QNAP is for Data Protector only.
Mattew says:

08/08/2016 at 8:08 AM

x3550 M5, sorry.
Stephen says:

08/08/2016 at 9:47 AM

Hi Mattew,

This is where troubleshooting gets tricky…

Could you check to see if Lenovo has any updated VMWare drivers for the NICs on that system? It’s worth giving a try. This is going to be extremely difficult to troubleshoot.

A few things it could be:
-NIC driver on the ESXi hosts
-Other drivers on the ESXi hosts
-It could be a bug in the QNAP system
-It could be a configuration issue on the QNAP system
-It might be a RAID10 issue on the QNAP system, it might be worthwhile trying a different level of RAID to see what performance is like.
-Could be due to disk cache settings on the QNAP system, can you check these and tell me if disk cache is turned on
-There’s an extremely small chance it may be the NCQ settings on the drive, it might be worthwhile finding out how and if you can disable NCQ on the QNAP device, and see if this makes a difference.
Mattew says:

08/08/2016 at 11:20 AM

Unfortunately I have news =)

I tried with a new initiator (2 subnets because I removed the third one), using this multipath.conf:

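devices {
    device {
        # match LUNs presented by the QNAP target
        vendor               "QNAP"
        product              "iSCSI Storage"
        # rotate I/O across the paths in the active path group
        path_selector        "round-robin 0"
        # with "prio const" every path has the same priority, so all paths share one group
        path_grouping_policy group_by_prio
        prio                 const
        getuid_callout       "/sbin/scsi_id -g -u -s /block/%n"
        path_checker         directio
        failback             immediate
        # number of I/Os sent down a path before moving to the next one
        rr_min_io            1
        no_path_retry        5
    }
}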

Same results, I can write with NICs running both at full speed, but reading is half speed.
Ok, now I know the culprit is not the switch nor the vmware cluster, do you agree?

I’m adding a detail: using dd on the QNAP via SSH, I can get 297MB/s reading the LVM logical partition (I’m working with the SSD cache disabled). I’m just confirming what was clear: the array can perform better than 110MB/s… there would be enough meat for 3 NICs working together….

What do you think? I cannot change the RAID level, but as a last option I can reset the QNAP to factory defaults. Maybe something is going wrong there.

The QNAP runs a kind of Linux OS; do you want me to look for anything from the CLI?
Mattew says:

08/09/2016 at 12:59 AM

First time I’ve seen an upload exceed 60MB/s on each NIC at the same time!!
As in my last post, yesterday I connected a Linux box (2 NICs) to a second target on the QNAP, but the read speed was limited to 110MB/s, as with the ESXi hosts.
Before giving up, I tried reading from both targets at the same time, using the ESXi host AND the Linux box. This way I saw, for the first time, the QNAP NICs sending at over 75MB/s each, which means I exceeded the 1Gbit limitation!! So, it’s possible!

One thing that’s strange to me: in the QNAP GUI, looking at the target’s active connections, I can see only the connection from 1 subnet (10.1.17.x), but looking at the command line of both the QNAP and the ESXi host, both subnets are connected (10.1.7.x and 10.1.17.x on port 3260).
So, in my opinion, there is some kind of limitation “per initiator”: 2 sessions from the same initiator (even over 2 different NICs) cannot exceed 1Gbit… Is that possible?
Maybe this is why the QNAP GUI shows just one connection per host: it “considers” only 1 session when sending, but it must accept packets from both sessions when receiving…

What to do now?
Stephen says:

08/09/2016 at 6:30 AM

Hi Mattew,

I’m really starting to think that it is something on the QNAP device that is causing the speed/performance issues. I’m not sure if it’s the iSCSI target, or configuration values inside of the OS. But if it was me, if it were possible, I’d reset the device and start from scratch, also possibly try a different RAID level (just for the heck of it).

Then ultimately, I would reach out to QNAP support to see if they can comment on the issue you’re experiencing.

On a side note, I’m not saying this holds true for all entry-level NAS and SAN units, but when I was playing around with iSCSI on my Synology device, it was almost un-usable, for me to get any type of real-world production performance, I had to use NFS. This is one of the reasons why I made the investment on my HPe MSA 2040 unit, and haven’t had any issues since (I don’t even know what max reads/writes I’m getting on it, but I see it hit and sustain ~500MB/sec regularly). I still kept my Synology unit as an NFS vSphere Data protection store (vSphere data protection replication works like a dream on it).
Matthew says:

09/14/2016 at 5:02 AM

Hello! Here we are again 🙂
One month later: I reset the QNAP and updated to the latest release, 4.2.2, but nothing changed.
Can you see the screenshot?

Any comment?
Matthew says:

09/20/2016 at 3:16 AM

Hello again, I have updates.

In the final step of the VDP configuration wizard (post-deploy), it asks whether to run a performance test, so I selected the option and ran the test. During the test I saw the NICs uploading at full speed, 110MB/s each.
I was very happy, but after the test finished, the read speed dropped back to 1Gbps.

Not satisfied, I tried another test: I connected 2 vmdks to my test VM and started reading from both at the same time. Success again: over 100MB/s on each NIC. Now it seems like some kind of limitation on each VMDK, just on writes, not on reads… Is there this kind of setting somewhere in a VMware cluster?
Stephen says:

09/20/2016 at 7:20 AM

Hey Matthew,

Good idea on the test! It sounds like something is wrong in the config.

Just curious, have you confirmed you have Round Robin enabled? Have you played with any of the balancing settings?

It sounds like you might have trunking enabled, or something along those lines. Trunking (link aggregation) should never be enabled when using MPIO; with trunking, each connection can only use one link.
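For readers following along, the Round Robin check can also be done from the ESXi shell. This is a minimal sketch using standard `esxcli` namespaces; `naa.xxxx` is a placeholder for your own device ID, and the IOPS=1 setting is an optional tuning that many array vendors recommend, not something confirmed in this thread (check your vendor's guidance before changing it):

```shell
# Show each storage device with its current Path Selection Policy (look for VMW_PSP_RR)
esxcli storage nmp device list

# Switch a device to Round Robin (naa.xxxx is a placeholder for your device ID)
esxcli storage nmp device set --device naa.xxxx --psp VMW_PSP_RR

# Optional: switch paths after every I/O instead of the default 1000
esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxx --type iops --iops 1
```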
Matthew says:

09/30/2016 at 4:08 AM

Hello,

I have no doubts about the physical configuration of the iSCSI links.
Inside my VM I created a RAID0 device using 2 virtual disks (2 VMDKs on the same QNAP iSCSI datastore).

Look here:

root@testvm ~ # mkfs.btrfs /dev/sdc /dev/sdd
btrfs-progs v4.6.1
See http://btrfs.wiki.kernel.org for more information.

Label: (null)
UUID: 63468aab-df86-4033-83fa-658314799dee
Node size: 16384
Sector size: 4096
Filesystem size: 64.00GiB
Block group profiles:
Data: RAID0 2.01GiB
Metadata: RAID1 1.01GiB
System: RAID1 12.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Number of devices: 2
Devices:
ID SIZE PATH
1 32.00GiB /dev/sdc
2 32.00GiB /dev/sdd

root@testvm ~ # mount /dev/sdc /mnt/backup/

root@testvm ~ # dd if=/dev/zero of=/mnt/backup/test bs=1M cou
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 75.9357 s, 226 MB/s

root@testvm ~ # dd if=/mnt/backup/test of=/dev/null bs=1M
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 87.0977 s, 197 MB/s

What should I check? It’s obvious that the QNAP can send over 100MB/s on each NIC when two virtual disks are combined. On a single VMDK I can write at this speed, but I can only read at half speed…

I think the issue is somewhere on the VMware side. I have absolutely NO IOPS limits configured. I never touched these kinds of settings because I never needed to limit anything…

How can we explain these results? 🙂
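The dd figures above can be sanity-checked against line rate: one 1 Gbit/s link carries at most 125 MB/s before Ethernet, TCP and iSCSI overhead, so any sustained rate above that has to be using more than one path. A small bash sketch (the function names are illustrative) that converts dd’s bytes and seconds into MB/s, using dd’s decimal megabytes, and reports the minimum number of 1 GbE links each rate requires:

```bash
#!/usr/bin/env bash
# Sanity-check dd results against 1 GbE line rate.
# dd reports decimal megabytes (1 MB = 10^6 bytes).
# 1 Gbit/s = 125 MB/s raw, before Ethernet/TCP/iSCSI overhead.
LINK_MBPS=125

# Convert bytes and elapsed seconds to whole MB/s (rounded).
mbps() {
  awk -v b="$1" -v s="$2" 'BEGIN { print int(b / s / 1000000 + 0.5) }'
}

# Smallest number of 1 GbE links that could carry a given MB/s rate.
links_needed() {
  awk -v r="$1" -v l="$LINK_MBPS" 'BEGIN { n = int(r / l); if (n * l < r) n++; print n }'
}

# The RAID0 runs above (label, bytes copied, seconds elapsed) taken from the dd output.
for run in "write 17179869184 75.9357" "read 17179869184 87.0977"; do
  read -r label bytes secs <<< "$run"
  rate=$(mbps "$bytes" "$secs")
  echo "$label: ${rate} MB/s, needs at least $(links_needed "$rate") x 1GbE link(s)"
done
```

Both RAID0 runs (226 MB/s write, 197 MB/s read) come out at two links, which is consistent with both NICs carrying traffic during those tests.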
Matthew says:

09/30/2016 at 6:54 AM

Another attempt: I tried disabling delayed ACK as described in https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002598

I rebooted my hosts but… nothing changed.
Stephen says:

09/30/2016 at 6:58 AM

Hi Matthew,

Just curious, have you looked at the logs on the ESXi hosts?
Matthew says:

09/30/2016 at 11:39 AM

Can you suggest which logs to check and what I should search for?

1) Tried with another initiator (a Linux box), with the same results
2) Tried different physical switches, same result
3) Tried with/without jumbo frames, with/without delayed ACK, with different subnets, different vSwitches… same result.

I know that I can get the full speed my array is capable of when:

1) reading from the Linux box and the ESX host at the same time
2) reading from both ESX hosts at the same time
3) reading from 2 virtual disks at the same time (same VM, single host)

I’m exhausted. On Monday I will set up another iSCSI target elsewhere, using a single SSD as the disk and 2 NICs, and connect my cluster to it. I will report back the results. I want to rule out the ESX hosts as the culprit…

Thanks again..
Stephen says:

09/30/2016 at 11:42 AM

Hi Matthew,

In that case, I still believe that it’s something on the array itself, or something that is configured on the networking layer.

I’m sorry I can’t be of more assistance, but I’m not that familiar with QNAP devices (I tend to stay away from these entry-level NAS devices; your results are similar to the ones I had with my Synology device, which is why I don’t use it anymore).
Matthew says:

10/03/2016 at 5:33 AM

Hi, sorry for the many messages, I’m probably annoying you and flooding your blog, pardon 🙂
I repeated the test using a different target configured on a Linux box with an SSD (tgtd as the iSCSI daemon).
Same subnet, same switch, same vmware config…

The results are not different:

# mount /dev/sdb /mnt/
# dd if=/dev/zero of=/mnt/test bs=1M count=8192
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 37.3294 s, 230 MB/s
# dd if=/mnt/test of=/dev/null bs=1M
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 67.4717 s, 127 MB/s

Again, building a RAID0 from 2 virtual disks solves the read issue…
Now it seems my cluster is the culprit… So, do you know of any VMware setting that can limit read speeds from iSCSI? When I attach a VMDK on my FC SAN there is no limitation at all… It seems to affect only iSCSI…
Matthew says:

10/03/2016 at 5:35 AM

PS: I attached the ‘dd’ output, but when benchmarking I always watch the NIC speeds… they match the speeds displayed by ‘dd’, so the speeds are real…
Stephen says:

10/03/2016 at 6:33 AM

Hi Matthew,

I can’t remember if I asked this before, but what type of servers do you have that are running ESXi?

I’m wondering if this might be related to outdated NIC drivers, or something along those lines…
Matthew says:

10/03/2016 at 7:02 AM

Yes, you asked that; we ruled out that kind of issue (NICs & VMware) because I tried with a Linux box (as the initiator, the first time) and the behavior was the same.
I have now repeated the tests with a Xen cluster connected to the Linux target I set up earlier… same behavior, just slightly worse due to slower CPUs (210MB/s write, 105MB/s read)… I also used different switches.

I give up. There must be something weird somewhere, but I can’t figure out where. The only things the two setups have in common are that both the QNAP and the new SSD target run Linux, and probably share the same network stack or iSCSI target software…

I’m not able to solve this 🙂

Thanks again 😀
Matthew says:

10/14/2016 at 12:20 AM

Hello,

Here again, just to link two threads from people having the same issue I have.
I’m posting on your blog because it might be useful to someone reading this article too…

It doesn’t seem to be solved, but I’m not alone.

https://forums.freenas.org/index.php?threads/iscsi-tests-suggest-a-problem-with-read-performance.26180/

https://forums.freenas.org/index.php?threads/low-read-performance-on-multipath-iscsi-with-ubuntu-initiator.25520/

More or less, same setup: writing to the iSCSI target can saturate both NICs, but reading can’t.
alaa says:

11/17/2019 at 9:49 PM

Hi
Can you help me? I couldn’t find the Network Port Binding tab to complete the configuration.
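For anyone with the same question: in recent vSphere Client versions the Network Port Binding tab appears on the host under Configure > Storage Adapters once the software iSCSI adapter has been added and selected. The same binding can also be made from the ESXi shell. This is a sketch assuming the software iSCSI adapter is `vmhba33` and the iSCSI vmkernel ports are `vmk1` and `vmk2` (substitute your own names). Each of those vmks should sit on a port group with exactly one active uplink and the others set to unused, otherwise it will not be eligible for binding:

```shell
# List iSCSI adapters to find the software iSCSI adapter name
esxcli iscsi adapter list

# Bind each iSCSI vmkernel port to that adapter (vmhba33, vmk1, vmk2 are example names)
esxcli iscsi networkportal add --adapter vmhba33 --nic vmk1
esxcli iscsi networkportal add --adapter vmhba33 --nic vmk2

# Confirm which vmknics are now bound
esxcli iscsi networkportal list

# Rescan the adapter so the new paths are discovered
esxcli storage core adapter rescan --adapter vmhba33
```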
vSphere Distributed Switch Configuration for iSCSI MPIO SAN using multiple subnets - The Tech Journal says:

04/04/2020 at 11:40 AM

[…] Whether or not I should use iSCSI Port Binding […]
Brad Wilson says:

09/14/2020 at 6:50 AM

I did find your video and blog very helpful.
Our situation is that I have two SANs, each with two interfaces.
They are on VLANs 24 and 25.
So far so good.
The catch is that they have different MTU sizes and I can’t go back and change them.

Is that a case when I would want to use iSCSI binding?
Stephen Wagner says:

09/14/2020 at 8:02 AM

Hi Brad,

In your example, with each SAN having 2 interfaces, you’ll want to have them on separate VLANs (separate vDS/vSwitch port groups). You’ll create a vmk adapter on each host for each VLAN/portgroup, and configure the port groups and vmk adapters for the applicable MTU sizes.

You only need to use iSCSI port binding if you have multiple vmks/NICs on the actual ESXi host that reside on the same subnet. In your case you don’t mention the number of NICs, but if you had 1 NIC in each host, you would not use iSCSI port binding.

Cheers,
Stephen
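The rule of thumb in the reply above can be written down as a small check. The sketch below is a hypothetical bash helper (the function names and the `vmkN address/prefix` input format are illustrative, not a VMware tool): it reports `yes` only when two or more iSCSI vmkernel ports fall in the same subnet. On a host, the vmk addresses and netmasks can be read with `esxcli network ip interface ipv4 get`; the helper itself needs bash 4+, so run it on a workstation rather than in the ESXi shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a VMware tool) applying the rule of thumb above:
# port binding is only needed when two or more iSCSI vmkernel ports share a subnet.
# Requires bash 4+ (associative arrays).

# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print the network address (as an integer) plus prefix for e.g. "10.0.24.11/24".
network_of() {
  local ip=${1%/*} prefix=${1#*/}
  local mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  echo "$(( $(ip_to_int "$ip") & mask ))/$prefix"
}

# Arguments: one "vmkN address/prefix" string per iSCSI vmkernel port.
# Prints "yes" if any two ports fall in the same subnet, otherwise "no".
needs_port_binding() {
  declare -A seen
  local entry net
  for entry in "$@"; do
    net=$(network_of "${entry#* }")
    if [[ -n ${seen[$net]} ]]; then
      echo yes
      return 0
    fi
    seen[$net]=1
  done
  echo no
}
```

For example, assuming each VLAN carries its own subnet, `needs_port_binding 'vmk1 10.0.24.11/24' 'vmk2 10.0.25.11/24'` prints `no`, while two vmks at `10.0.24.11/24` and `10.0.24.12/24` print `yes`. The addresses are made up for illustration.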


60 Responses to “VMWare vSphere iSCSI Port Binding – When to use iSCSI Port Binding, and why!”
