Jun 07 2014
 

While doing some semi-related research on the internet, I’ve come across numerous how-to and informational articles that explain how to configure iSCSI MPIO and incorrectly advise readers to use iSCSI port binding. I felt the need to whip up a post to explain why and when you should use iSCSI port binding on VMware vSphere.

This post applies to all versions of VMware vSphere, including 5, 5.5, 6, 6.5, and 6.7.

What does iSCSI port binding do

iSCSI port binding binds an iSCSI initiator interface on an ESXi host to a vmknic and configures it to allow multipathing when multiple vmknics reside in the same subnet. Under normal circumstances without port binding, if you have multiple vmkernel interfaces on the same subnet, the ESXi host would simply choose one and not use the others for iSCSI traffic. iSCSI port binding forces the iSCSI initiator to use each bound adapter for both sending and receiving iSCSI packets.
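As a rough illustration of the difference, here is a minimal Python sketch that lists which vmknic and target portal pairs end up carrying iSCSI sessions. The vmknic names and portal addresses are made up, and the choice of "the first vmknic" when nothing is bound is an assumption standing in for whatever the host actually picks. This is a simplified model, not how ESXi implements it.

```python
from itertools import product

def iscsi_sessions(vmknics, target_portals, port_binding):
    """Return the (vmknic, portal) pairs the initiator would use, simplified."""
    if not vmknics or not target_portals:
        return []
    if port_binding:
        # Every bound vmknic logs in to every target portal.
        return list(product(vmknics, target_portals))
    # Without binding, the host picks a single vmknic on the subnet.
    # Which one it picks is modelled here as "the first" (assumption).
    return [(vmknics[0], portal) for portal in target_portals]

if __name__ == "__main__":
    vmks = ["vmk1", "vmk2"]                       # hypothetical names
    portals = ["192.168.1.100", "192.168.1.101"]  # hypothetical addresses
    print(len(iscsi_sessions(vmks, portals, port_binding=False)))
    print(len(iscsi_sessions(vmks, portals, port_binding=True)))
```

With two vmknics and two portals on one subnet, the unbound case uses a single vmknic while the bound case spreads sessions across both.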

In most simple SAN environments, there are two different types of setups/configurations.

  1. Multiple Subnet – Numerous paths to a storage device on a SAN, each path residing on separate subnets. These paths are isolated from each other and usually involve multiple switches.
  2. Single Subnet – Numerous paths to a storage device on a SAN, each path is on the same subnet. These paths usually go through 1-2 switches, with all interfaces on the SAN and the hosts residing on the same subnet.

A lot of you I.T. professionals know the issues that occur when a host is multi-homed. In typical scenarios with Windows and Linux, if you have multiple adapters residing on the same subnet, you’ll have issues with broadcasts, and in most cases you have no control over which communications are initiated over which NIC due to the way the routing table is handled. In most cases all outbound connections will be initiated through the first NIC installed in the system, or whichever one holds the primary route in the routing table.
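To make the multi-homing problem concrete, here is a minimal Python sketch of a simplified routing lookup. The interface names and addresses are made up for illustration, and the tie-break of "earliest entry wins" is an assumption standing in for whatever a real operating system does with equal routes.

```python
import ipaddress

# Hypothetical interfaces: two NICs on the same subnet, in install order.
INTERFACES = [
    ("nic0", ipaddress.ip_interface("192.168.1.10/24")),
    ("nic1", ipaddress.ip_interface("192.168.1.11/24")),
]

def pick_outbound(destination, interfaces=INTERFACES):
    """Return the interface name a simplified routing lookup selects.

    The longest matching prefix wins; ties go to the earliest entry (assumption).
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for name, iface in interfaces:
        if dest in iface.network:
            if best is None or iface.network.prefixlen > best[1].network.prefixlen:
                best = (name, iface)
    return best[0] if best else None

if __name__ == "__main__":
    for target in ("192.168.1.100", "192.168.1.101"):
        print(target, "->", pick_outbound(target))
```

Both destinations resolve to the same NIC, so the second adapter on that subnet never originates traffic in this model.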

When to use iSCSI port binding

This is where iSCSI port binding comes into play. If you have an ESXi host with vmknics sitting on the same subnet, you can bind the iSCSI initiator to those vmknics, each backed by its own physical NIC. This allows multiple iSCSI connections over multiple NICs residing on the same subnet.

So the general rule of thumb is:

  • One subnet, iSCSI port binding is the way to go!
  • Two or more subnets, DON’T USE ISCSI PORT BINDING! It’s just not needed since all vmknics are residing on different subnets.
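The rule of thumb above can be sketched as a small Python check. The function name and the addresses are hypothetical, and the sketch only encodes the two bullets as written; it also assumes that a host with a single iSCSI vmknic has nothing to multipath.

```python
import ipaddress

def should_use_port_binding(vmk_addresses):
    """Apply the rule of thumb: bind only when all iSCSI vmknics share one subnet."""
    networks = {ipaddress.ip_interface(addr).network for addr in vmk_addresses}
    # Assumption: with fewer than two vmknics there is nothing to multipath.
    return len(vmk_addresses) > 1 and len(networks) == 1

if __name__ == "__main__":
    print(should_use_port_binding(["192.168.1.11/24", "192.168.1.12/24"]))
    print(should_use_port_binding(["10.0.1.11/24", "10.0.2.11/24"]))
```

Mixed layouts, such as two pairs of vmknics on two subnets, are outside what this sketch decides.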

Additional Information

Here are two links to VMware documentation explaining this in more detail:

http://kb.vmware.com/kb/2010877
http://kb.vmware.com/kb/2038869

 

  56 Responses to “VMWare vSphere iSCSI Port Binding – When to use iSCSI Port Binding, and why!”

  1. Hi Stephen,

    Do you have a good article that I can follow to configure a proper MPIO and iSCSI port binding?
    In the past I followed this article: http://www.virtualtothecore.com/en/howto-configure-a-small-redundant-iscsi-infrastructure-for-vmware/ with 2 HP servers and one QNAP with 4 NICs.

    Now I have 3 HP Gen8 servers and one MSA 2040. Can I follow the same old article?

    P.S.
    I have only one subnet 192.168.1.x

    Thanks a lot for your support.

  2. Hi Fabio,

    Sorry for the delayed response! (It’s Stampede week here in Calgary, busy time of the year!)

    Do you only have 1 switch between the MSA 2040 and the 3 HP Servers, or multiple switches? Also, are you using standard switches, or vSphere Distributed Switches?

    I briefly took a look at that guide and for the most part it looks good, however I might configure my vSphere switches slightly differently. And as always, I recommend using multiple subnets (and avoiding iSCSI port binding).

    Let me know and I’ll see what I can come up with for you, or any advice I may have.

    Cheers,
    Stephen

  3. Hi Stephen,
    I have 2 managed HP 1910 24-port Gigabit switches.
    I can only use the standard VMware switch because I have the Essentials Plus license.

    If I use multiple subnets, I need to use VLANs, because the two HP switches also carry the normal traffic of the VMs and the other clients of the network with IP 192.168.1.x.

    Thanks a lot for your support, you are great…

    P.S.
    For the installation of ESXi on the HP 360p Gen8 I will use an HP 32GB SDHC card. Is it a good choice for the security and stability of the system?

  4. Hi Fabio,

    I’ll start off with the easiest question: The DL360p Gen8 works great with the SDHC cards for ESXi to be installed on to. I’ve used both the SD card and internal USB thumb drive option, and both work great!

    So if you do only use one subnet, you can use that guide you originally posted. However, instead of creating multiple switches and binding them to the same NICs, I would create only one, configure your VLAN, and then create multiple vmkernel (vmk) interfaces on that single switch (each with its own IP on the network). After this you would simply go into the iSCSI initiator settings and enable iSCSI port binding on each vmk interface.

    Keep in mind, that if you were to use both switches (with different subnets), then you would have added redundancy to your configuration in case one of the switches ever failed. This is just a consideration.

    Hope this helps,

    Stephen

  5. Thanks for your reply, I’ll do the configuration with one subnet, 192.168.1.x.
    Next week all the products arrive in my lab and then I’ll write you my idea of configuration.

    Thanks a lot for your support.
    See you soon.

    bye

    Fabio

  6. Hi, the MSA 2040 has finally arrived in my lab.
    Dual controller, 8 iSCSI 1Gb ports, 7 SFF 600GB SAS HDDs.
    I did this configuration:
    I have created 2 Vdisks. The first with SAS 1-2-3 in RAID5.
    The second with SAS 4-5-6 in RAID5, and SAS 7 is a global spare.
    The first Vdisk is mapped to controller A and the second Vdisk is mapped to controller B.
    Each Vdisk has one volume of the entire capacity, mapped on each port of the controller (A1, A2, A3, A4 and B1, B2, B3, B4).

    Is this a good configuration in your opinion?

    Thanks a lot for your support

    Fabio

  7. Hi Fabio,

    That should work great. When you created the Vdisks, did you choose auto for the owning controller? If not, I would advise changing it.

    Other than that you should be good!

    Stephen

  8. OK, I will change it.
    I chose Vdisk1 – controller A
    Vdisk2 – controller B

  9. Hi Fabio,

    So just to confirm, when you created the volumes, when it asked for a controller ownership, you chose “Auto”, correct?

    Thanks,
    Stephen

  10. No. Right now I have manually selected controller A and controller B.
    Tomorrow I will change the ownership to Auto.

  11. I’m looking at redesigning our iSCSI network to include 2 switches and put them on 2 /24 subnets.
    I understand that I should not do port binding in this case, but I wanted to confirm whether this holds true if your host has 4 NICs for iSCSI. I was looking at putting vmnic7 and vmnic6 on subnet 10.0.1.x and vmnic5 and vmnic4 on subnet 10.0.2.x. Would you do port binding on the NICs within the same subnet?

    Thanks for your help.

  12. Hi Devin,

    Let me see what I can find out, but I’m assuming that you would have to have port binding enabled, however it may result in some erroneous routes/paths which you may have to mark as “inactive” or “disabled” manually. This may or may not be the case, but I’m pretty sure in your case you would need to use iSCSI port binding.

    Let me see what I can find out and I’ll get back to you!

    Stephen

  13. Devin,

    Got the information faster than I thought I would! haha

    Essentially you WILL use iSCSI port binding. Make sure that each pair of NICs on a single subnet is configured on its own vSwitch (or Distributed Switch). DO NOT use the same vSwitch (or vDS) for different subnets.

    Once the server NICs are on their own vSwitch (or vDS), with only NICs on the same subnet sharing a vSwitch, you can configure iSCSI port binding!

    Let me know if you have any questions!

    Cheers,

  14. Hi Stephen,
    I have 4 NICs in 1 host and 8 NICs in the SAN in a test environment.
    The NICs in the host are each in a unique subnet:
    172.16.0.1/24
    172.16.1.1/24
    172.16.2.1/24
    172.16.3.1/24

    and SAN ip

    172.16.0.100/24
    172.16.1.100/24
    172.16.2.100/24
    172.16.3.100/24

    With port binding enabled, I get good IO (~414MB/s read); if I disable it, I get 127MB/s read.
    I am not sure where I am going wrong here. I just removed the NICs from the software iSCSI adapter; do I also have to split them up into separate vSwitches?

    Cheers
    WIHan

  15. Hi Wihan,

    Just curious, how do you have everything wired? Do you have separate physical switches? To confirm, is the host directly attached to the SAN?

    If you are using multiple subnets, you should have your vSwitches specially configured.

    Stephen

  16. Hi Stephen,

    I have since found my throughput is fine; I just had to set IOPS=1. For some reason it changed back to 1000 after removing the binding. I would still like the answer to my question though, so I know it is done correctly.

    I have 1 host currently, but I will have 3 eventually after testing is complete.
    Each host has 4 dedicated NICs for iSCSI.

    I have 2 switches stacked with 2×10 GB with STP (no VLANs).

    Each host has 2 of its iSCSI NICs connected to storage switch 1, and 2 connected to storage switch 2.

    The SAN has 2 ports from controller 1 connected to storage switch 1 and 2 connected to storage switch 2.

    I then connected 2 ports from controller 2 to storage switch 1, and 2 from controller 2 to storage switch 2.

    I gave each NIC on the host an IP in a unique /24 subnet. I have them in one vSwitch, but selected only one adapter for each network by moving the other adapters to “Unavailable” in VMware.

    On the SAN, port 1 from controller 1 and port 1 from controller 2 are in the same subnet,
    but ports 1, 2, 3, 4 on each controller are in different subnets.

    I hope that is sorta clear.
    Cheers
    Wihan

  17. Hello, great article. I’m trying to get the best performances out of my QNAP NAS via iSCSI.

    I have a vmware cluster with 2 hosts and a QNAP TS-1253 (celeron J1900 CPU, 4 cores 2GHz) with 10 WD RED 3TB (RAID10).

    On both nodes I have 2 NIC dedicated to iSCSI on 2 different subnets.
    My tests are running from a VM on a different datastore, with a second vmdk attached via iSCSI to my QNAP, no other VMs are using the iSCSI datastore.

    After a lot of tweaking this is what I’ve got:

    # dd if=/dev/zero of=/dev/sdc bs=1M count=8800
    8800+0 records in
    8800+0 records out
    9227468800 bytes (9,2 GB) copied, 40,2418 s, 229 MB/s

    # dd if=/dev/sdc of=/dev/null bs=1M count=4400
    4400+0 records in
    4400+0 records out
    4613734400 bytes (4,6 GB) copied, 40,5963 s, 114 MB/s

    As you can see, the writing speed is 229MB/s (excellent), but my reading speed is limited somewhere to 1Gbit (114MB/s). Looking on the QNAP side, both adapters are used equally (55MB/s each), looking on the vmware side both adapters are equally receiving data.

    Where can the problem be? The array is capable of 300MB/s read speeds. I’d like to have a read speed at least equal to the write speed.

    Many thanks!

  18. Hi Mattew,

    First and foremost, on the array itself, have you configured all the cache settings and everything?

    I’m assuming since you have different subnets that you are NOT using iSCSI port binding (you shouldn’t be).

    Also, as for path selection on the vSphere hosts, can you confirm that it is set to “Round Robin”.

    Do you have jumbo frames enabled on all devices NICs (NAS as well as the vSphere hosts)?

    And finally, on the NAS itself, how do you have the iSCSI target configured? Is the iSCSI volume mapped to the actual array volume itself (block IO), or are you using a virtual file iSCSI target (file IO)? Also, is it using any of the VMWare accelerated features?

    Almost forgot, when reading, have you viewed the active performance of the device itself to make sure the CPU or any processes on the NAS aren’t overloading it?

    Cheers,
    Stephen

  19. On the QNAP I have SSD caching on the array, but it doesn’t matter if I have it enabled or disabled (same results on sequential read/write).
    Today I changed from 1 subnet with port binding (which gave me horrible write speed (35MB/s) and semi-horrible read speed (60MB/s)) to the new config with 2 subnets. The new setup gave me perfect write speeds but half the read speed. I didn’t remove the port binding, but it should not influence the read speed, am I right? Just one question: to disable it, is it enough to remove the 2 vmk ports from the network tab of the iSCSI initiator on both hosts? Do I need to reboot the hosts?
    The datastore is set to round robin and I changed the RR IOPS to 1. I have jumbo frames enabled on the switch and on the vSwitch/NIC on both hosts. Most importantly, my array is using block IO, not virtual disk.
    Last question: I looked at the processes on the NAS; CPU and RAM are both under 20% usage.

    It looks just like a 1Gbit bottleneck, but it can’t be in one direction only! 🙁

    Thanks

  20. Hi Mattew,

    In order to disable the iSCSI port binding, you’ll need to remove the interfaces from the iSCSI port binding configuration. I wouldn’t leave it active, make sure you remove this.

    You mentioned you have jumbo frames enabled on the hosts, are jumbo frames also enabled on the NAS as well?

    Furthermore, inside of the vSphere client, when you click on “Manage paths” on the datastores, is it showing correct paths (are they all real and valid, or is it showing any bogus paths in error or paths that are down)?

    This is really a bizarre problem you’re experiencing. I’m wondering if it’s something specific to the QNAP unit….

    Stephen

  21. Mattew,

    One more question. I just want to confirm that you don’t have any trunking or network load balancing enabled on either the vSphere hosts, or the QNAP device, correct?

  22. Sorry for the 3rd response/comment.

    I’m not sold on the write speeds either. Could you try copying an actual file using the dd command (a large file to rule out caching/ram) and let us know the results?

    I’m thinking the write speeds are actually slower than being reported… They are probably being queued in the write cache and aren’t actually being performed that fast, also since they are coming from /dev/zero…

  23. I have no port trunking on the QNAP nor on the switch, and on the VMware side I have only one vSwitch with 2 NICs, but the 2 vmk adapters are using vmnic1 (vmnic6 unused) and vmnic6 (vmnic1 unused) respectively. The answer should be no, I hope.
    I can’t really understand why it is perfect but one way only… :(

  24. Hmm,

    My interest is now pointing towards the fact you only have 1 vSwitch… Do you have multiple separate port groups under the vSwitch for each NIC?

    If you’re only using 1 vSwitch, you’ll need to have a dedicated port group under that vSwitch for each NIC. Inside each port group, you’ll need to configure it so that it only has one active adapter and the other is unused (the adapter that is unused in one port group is the active one in the other, and vice versa).

    Just for the sake of troubleshooting, it might be worthwhile removing that vSwitch, and creating two separate vSwitches and dedicate one to each subnet… If this helps the situation, it confirms the configuration of the single vSwitch was wrong, and you could either keep 2 vSwitches, or create a new single vSwitch with the proper configuration…

  25. # mkfs.ext4 -m0 /dev/sdc
    # mkdir /test
    # mount /dev/sdc /test
    # tar cvf bigfile.tar /usr/
    # du -sm *
    4605 bigfile.tar (4605 MB)
    # time cp bigfile.tar /test/

    real 0m23.094s
    user 0m0.032s
    sys 0m5.296s

    4605 / 23 = 200MB/s. Write speeds are ok.

  26. Many thanks for your help

    Sure I have 2 separate port groups.
    I began with this official VMware guide http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-multipathing-configuration-software-iscsi-port-binding-white-paper.pdf and then I switched to 2 separate subnets, as I wrote before.

    Do you think that creating a second vSwitch may really help, given that I demonstrated the write speeds are real?

  27. I still have concerns about the iSCSI port binding. I’m also wondering if there are some settings that may be skewed on the QNAP device that are causing this.

    I would delete all the iSCSI port binding settings, remove the adapters from the iSCSI port binding configuration window.

    For the heck of it, delete the vSwitches, and create two separate ones from scratch, each dedicated to a subnet for testing.

    Check the QNAP device for any odd settings as far as array access, networking, etc…

  28. Many thanks again. I will try to remove the vSwitch and reconfigure vSphere, but I think something odd is on the QNAP side. I can’t guess what it could be. I will come back to report soon.

  29. Hi, I have not tried yet; I will do it on Monday. Anyway, is it better to have separate broadcast domains as well? I mean, having 2 subnets should be enough, but do I need to create a VLAN for the secondary subnet (I already have one for the first subnet)?
    I think this will not fix my issue, but maybe I will gain some extra bandwidth, right?

  30. Hi Mattew,

    Well this new info changes things quite a bit!

    I thought you had two separate physical switches, one for each subnet. Or are you using only one physical switch? This could be causing or contributing to your issues (or may cause other issues in the future). You should have two physical switches, each one dedicated to one of the subnets, and you should NOT have any VLANs configured if they aren’t needed.

    Also, you did not mention you had a VLAN configured before. This could also be causing speed issues (VLANs reduce the possible MTU due to the overhead of the VLAN tagging in packet sizes). This is probably causing packets to be fragmented on the switch, which will significantly reduce performance (for example, since the packets are fragmented, where only 1 packet should be sent/received, it’s getting split into two, which could be and probably is causing unneeded congestion).

  31. I know I should have separate switches for storage, but I don’t 😉
    Anyway, on my first try I added a switch connecting 3 ports (2 hosts + 1 QNAP port); with this configuration I had 180MB/s on write. The switch was unmanaged and I am doubtful about its jumbo frame support. After a couple of tests I removed it, reconnecting the 3 cables to the same VLAN in place for the first subnet, and the write speeds increased to 229MB/s.
    Anyway, saying I’m working with a single switch is not correct; in fact I’m working on a stack of two, so if one member fails I’d lose only one iSCSI path. I have redundancy but I’m missing decent read speeds :)
    Obviously I have the primary subnet on the first member and the secondary subnet on the second member.
    I know this is not the best solution, but why should this perform at its best one way only? Through the switch I’m sending 229MB/s to the QNAP, so why shouldn’t it do the same the other way?

  32. Hi Mattew,

    So with your last comment, just to confirm, you’re also using the switches for VM connectivity as well (not just storage)? This is more information that brings in to play other things that can be causing issues and more complexity.

    Are the two switches in the stack connected, or are they completely separate?

    Going back to the main potential issue:

    Since you’re using VLANs, if they are changing MTU sizes, then the mismatch on hosts/storage/network layer will cause packet fragmentation. This is not good on any network (especially a storage/iSCSI network).

    Packet fragmentation can cause a whole slew of issues when there are things mis-configured on the network, one of these includes speed issues going in certain ways (as packets in one direction are fragmented).

    I’m not going to get in to too many details, but let’s say with jumbo frames, packets of 9,000 are being transmitted, when a VLAN tag is added to it, it adds 4 bytes (I think 4 is correct), which makes the packet 9,004 bytes. If the devices are configured for jumbo frames of 9,000, when it receives the packet, it will request fragmentation and a re-transmit because it can’t accept the initial packet. Which will cause not only a delay in fragmentation, but will also increase network congestion, as well as reduce performance.

    If the MTU and packet size is not constant across the network and all devices, connections and speed will appear to be fine going one way (example: the receiving glove is big enough to hold the small ball), but packets will be dropped and re-transmits will occur going the other direction (example: because the ball is too big for the glove to be caught, so it will ask for two small balls instead of the big ball).

    This also brings in a new potential issue that you need to check. There’s a chance that while a switch may support jumbo frames, it may not support jumbo frames over VLANs. You need to check into this added complexity, and you also need to make sure that if it does support jumbo frames over VLANs, there isn’t a separate setting to enable this. You’ll also need to adjust all your MTUs, and make sure the VLAN configuration is consistent on all your network devices so that the standard packet transmission size matches.

    And on a final note, if you are using the same switches for VM connectivity, and you’re running different MTUs on a bunch of different devices, you’re going to run into a whole bunch of issues later on as well.

  33. I’m using the stack for the VM traffic as well. I’m not routing traffic between the storage VLAN and the others, and I have no VLAN tagging on the ESX host nor on the QNAP. From what I know, the 802.1q VLAN tag is added when a packet needs to traverse the switch going to another VLAN, but this is not my case.
    I will try on Monday with two separate switches dedicated to iSCSI only. Maybe I will have to disable jumbo frames, because I can probably get 2 switches but I do not know if they will have jumbo support.
    Anyway, from what I see, there is no performance loss, but a bottleneck of 1Gbit when the QNAP sends data; this does not have much in common with possible packet fragmentation. Sorry, but I’m skeptical about the theory of fragmentation = culprit. If that were the case, the limit would not sit at about 1Gbit…
    But I know you’re right, it could cause issues, so I will try on Monday with 2 switches dedicated to iSCSI.

    I will be back 🙂
    Many thanks.

  34. Today I took 2 spare switches, both with jumbo frame support, and I connected an iSCSI subnet to each one.
    I want to be clear: I’m not trusting only the benchmark I run, I’m also keeping an eye on each NIC to see if and how they are working together.
    I also created a new vSwitch with both NICs (they are on separate PCI cards), set the MTU to 9000 and connected it to my initiator.

    Unfortunately, the results have not changed. But I have one more detail:

    Writing to the iSCSI datastore, I see both NICs receiving at semi-full speed (115MB/s on NIC3, 104MB/s on NIC4, looking on the QNAP side).

    Reading from the iSCSI datastore, I see both NICs sending, but at 55MB/s on NIC3 + 55MB/s on NIC4. Here is the fun part: if I disconnect NIC4 during the transfer, NIC3 starts sending at full speed (more than 110MB/s). If I reconnect NIC4, both NICs go to 55MB/s again.

    It seems the QNAP array is limited to 100-110MB/s, but it is not; moreover, how can it write at 220MB/s and read at half that speed?

    So, different VMware configs, different switches, different setups, same QNAP.

    I really don’t know what to try next. Any suggestions?

  35. Just to try, because I have 4 NICs on the QNAP, i setup a third NIC with a third subnet on a third switch.

    256MB/s writing (about 85MB/s on each NIC)
    104MB/s reading (about 35MB/s on each NIC)

    This is caused by some bug on the QNAP side. I also tried creating another disk/LUN, allocating a few TB of free space, but the read speeds are ridiculous.

    🙁

  36. Hi Mattew,

    Could you list what subnets you’re using for the configuration?

    Also, do me a favour and disable jumbo frames (I’m wondering if this is another issue I’ve heard of). Try using the standard 1500, make sure you set this on the hosts as well as the QNAP system.

    And are the servers firmware fully up to date (including NIC firmware)? What type of servers are they? Did you use the normal ESXi install image, or did you use a vendor customized install image (ex. HP, IBM, etc…).

  37. 10.1.7.0/24 (target .12, initiators .22/.24)
    10.1.17.0/24 (target .13, initiators .23/.25)
    10.1.27.0/24 (target .14, initiators .26/.27)

    Jumbo frames were disabled on the last run but the behavior was the same.
    The QNAP is updated to 4.2.1 (latest firmware) and the ESXi hosts are updated to 6.0u2. I’m using the custom Lenovo firmware (20150420 was the initial release).

    Do you suggest I try with a different server on the unused iSCSI target?

  38. Sorry, I missed an answer. IBM/Lenovo System x5550 M5: Xeon E5-2620 v3 2.4GHz (dual socket) + 128GB RAM each. Primary DS is Fibre Channel, QNAP is for data protector only.

  39. x3550 M5, sorry.

  40. Hi Mattew,

    This is where troubleshooting gets tricky…

    Could you check to see if Lenovo has any updated VMware drivers for the NICs on that system? It’s worth a try. This is going to be extremely difficult to troubleshoot.

    A few things it could be:
    -NIC driver on the ESXi hosts
    -Other drivers on the ESXi hosts
    -It could be a bug in the QNAP system
    -It could be a configuration issue on the QNAP system
    -It might be a RAID10 issue on the QNAP system, it might be worthwhile trying a different level of RAID to see what performance is like.
    -Could be due to disk cache settings on the QNAP system, can you check these and tell me if disk cache is turned on
    -There’s an extremely small chance it may be the NCQ settings on the drive, it might be worthwhile finding out how and if you can disable NCQ on the QNAP device, and see if this makes a difference.

  41. Unfortunately I have news =)

    I tried with a new initiator (2 subnets because I removed the third one), using this multipath.conf:

    device {
        vendor "QNAP"
        product "iSCSI Storage"
        path_selector "round-robin 0"
        path_grouping_policy group_by_prio
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        path_checker directio
        failback immediate
        prio const
        rr_min_io 1
        no_path_retry 5
    }

    Same results, I can write with NICs running both at full speed, but reading is half speed.
    Ok, now I know the culprit is not the switch nor the vmware cluster, do you agree?

    I’m adding a detail: using dd on the QNAP via ssh, I can get 297MB/s reading the LVM logical partition (I’m working with the SSD cache disabled). I’m just confirming what was clear: the array can perform better than 110MB/s… there would be enough meat for 3 NICs working together….

    What do you think? I cannot change the RAID level, but as a last option I can reset the QNAP to factory defaults. Maybe something is going wrong there.

    The QNAP has a kind of Linux OS; do you want me to search for anything from the CLI?

  42. This is the first time that I saw an upload exceeding 60MB/s on each NIC at the same time!!
    As per my last post, yesterday I connected a Linux box (2 NICs) to a second target on the QNAP, but the read speed was limited to 110MB/s, as with the ESXi hosts.
    Before giving up, I tried to read at the same time from both targets, using the ESXi host AND the Linux box. This way I saw, for the first time, the QNAP NICs sending at over 75MB/s each, which means I exceeded the 1Gbit limitation!! So, it’s possible!

    One thing that is strange to me: in the QNAP GUI, looking at the target’s active connections, I can see only the connection from 1 subnet (10.1.17.x), but looking at the command line of both the QNAP and the ESXi host, both subnets are connected (10.1.7.x and 10.1.17.x on port 3260).
    So, in my opinion, there is a kind of limitation “per initiator”. 2 sessions from the same initiator (even with 2 different NICs) cannot exceed 1Gbit… Is it possible?
    Maybe this is the reason why in the QNAP GUI I can see just one connection per host: it “considers” only 1 session when sending, but it must accept packets from both sessions when receiving…

    What to do now?

  43. Hi Mattew,

    I’m really starting to think that it is something on the QNAP device that is causing the speed/performance issues. I’m not sure if it’s the iSCSI target, or configuration values inside of the OS. But if it was me, if it were possible, I’d reset the device and start from scratch, also possibly try a different RAID level (just for the heck of it).

    Then ultimately, I would reach out to QNAP support to see if they can comment on the issue you’re experiencing.

    On a side note, I’m not saying this holds true for all entry-level NAS and SAN units, but when I was playing around with iSCSI on my Synology device, it was almost unusable. To get any type of real-world production performance, I had to use NFS. This is one of the reasons why I made the investment in my HPE MSA 2040 unit, and I haven’t had any issues since (I don’t even know what max reads/writes I’m getting on it, but I see it hit and sustain ~500MB/sec regularly). I still kept my Synology unit as an NFS vSphere Data Protection store (vSphere Data Protection replication works like a dream on it).

  44. Hello! Here we are again 🙂
    One month later, I reset the QNAP and I updated to the last release 4.2.2, but nothing changed.
    Can you see the screenshot?

    Any comment?

  45. Hello again, I have updates.

    On the final step of the VDP configuration wizard (post-deploy), it asks to do a performance test, so I selected the option and ran the test. During the test I saw the NICs uploading at full speed, 110MB/s each.
    I was very happy, but after the test finished, the read speeds decreased to 1Gbps again.

    Not satisfied, I tried another test: I connected 2 vmdks to my test VM and started reading from both at the same time. It was a success again, over 100MB/s on each NIC. Now it seems like a kind of limitation on each VMDK, just on writes, not on reads… Is there this kind of setting somewhere on a VMware cluster?

  46. Hey Matthew,

    Good idea on the test! It sounds like something is wrong in the config.

    Just curious, have you confirmed you have round robin enabled? Played with any of the settings for balancing?

    It sounds like you might have trunking enabled, or something along those lines. Trunking (Link Aggregation) should never be enabled when using MPIO. Trunking allows the utilization of only one link per connection.

  47. Hello,

    I have no doubts about the physical configuration of the iSCSI lines.
    Inside my VM I made a RAID0 device by using 2 virtual disks (2 vmdks on the same QNAP iSCSI datastore).

    Look here:

    root@testvm ~ # mkfs.btrfs /dev/sdc /dev/sdd
    btrfs-progs v4.6.1
    See http://btrfs.wiki.kernel.org for more information.

    Label: (null)
    UUID: 63468aab-df86-4033-83fa-658314799dee
    Node size: 16384
    Sector size: 4096
    Filesystem size: 64.00GiB
    Block group profiles:
    Data: RAID0 2.01GiB
    Metadata: RAID1 1.01GiB
    System: RAID1 12.00MiB
    SSD detected: no
    Incompat features: extref, skinny-metadata
    Number of devices: 2
    Devices:
    ID SIZE PATH
    1 32.00GiB /dev/sdc
    2 32.00GiB /dev/sdd

    root@testvm ~ # mount /dev/sdc /mnt/backup/

    root@testvm ~ # dd if=/dev/zero of=/mnt/backup/test bs=1M cou
    16384+0 records in
    16384+0 records out
    17179869184 bytes (17 GB) copied, 75.9357 s, 226 MB/s

    root@testvm ~ # dd if=/mnt/backup/test of=/dev/null bs=1M
    16384+0 records in
    16384+0 records out
    17179869184 bytes (17 GB) copied, 87.0977 s, 197 MB/s

    What should I check? It’s obvious that the QNAP can send over 100MB/s on each NIC when 2 virtual disks are combined. On a single vmdk I can write at this speed, but I can read at only half the speed…

    I think the issue is somewhere on the VMware side. I have absolutely NO IOPS limit configured. I have never touched those kinds of settings because I never needed to limit anything…

    How can we explain these results? 🙂

  48. Another try: I disabled delayed ACK as described in https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002598

    I rebooted my hosts but… nothing changed.

  49. Hi Matthew,

    Just curious, have you looked at the logs on the ESXi hosts?

  50. Can you suggest which logs to check and what I should search for?

    1) tried with another initiator (Linux box), with the same results
    2) tried different physical switches, same result
    3) tried with/without jumbo frames, with/without delayed ACK, with different subnets, different vSwitches… same result.

    I know that I can get the speed that my array is capable of by:

    1) reading from the Linux box and the ESX host at the same time
    2) reading from both ESX hosts at the same time
    3) reading from 2 virtual disks at the same time (same VM, single host)

    I’m exhausted. On Monday I will set up another iSCSI target somewhere else, with a single SSD as the disk and 2 NICs, and connect my cluster to it. I will report back the result. I want to rule out the ESX hosts as the culprit…

    Thanks again..

  51. Hi Matthew,

    In that case, I still believe that it’s something on the array itself, or something that is configured on the networking layer.

    I’m sorry I can’t be of more assistance, but I’m not that familiar with the QNAP devices (I tend to stay away from these entry-level NAS devices, as your results are similar to the results I had with my Synology device, which is why I don’t use it anymore).

  52. Hi, sorry for the many messages, I’m probably annoying you and flooding your blog, pardon 🙂
    I repeated the test using a different target configured on a Linux box with an SSD (tgtd as the iSCSI daemon).
    Same subnet, same switch, same vmware config…

    The results are not different:

    # mount /dev/sdb /mnt/
    # dd if=/dev/zero of=/mnt/test bs=1M count=8192
    8192+0 records in
    8192+0 records out
    8589934592 bytes (8.6 GB) copied, 37.3294 s, 230 MB/s
    # dd if=/mnt/test of=/dev/null bs=1M
    8192+0 records in
    8192+0 records out
    8589934592 bytes (8.6 GB) copied, 67.4717 s, 127 MB/s

    Again, doing a RAID0 with 2 virtual disks solves the read issue…
    Now it seems my cluster is the culprit… So, do you know of any VMware setting that can limit read speeds from iSCSI? With a vmdk attached on my FC SAN I have no limitation at all… It seems to happen only on iSCSI.
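
    As a quick sanity check of the dd figures above, the following minimal Python sketch recomputes the throughput and expresses it in units of one gigabit link. It assumes 1GbE links and uses the raw line rate of 125 MB/s, ignoring TCP/iSCSI overhead; the function and constant names are made up for the example.

```python
# Raw 1GbE line rate in MB/s (10^9 bits/s / 8 / 10^6), before protocol overhead.
GIGABIT_LINK_MB_S = 1_000_000_000 / 8 / 1_000_000


def dd_rate_mb_s(bytes_copied, seconds):
    """Throughput in MB/s, using decimal megabytes (10^6 bytes) as dd reports."""
    return bytes_copied / seconds / 1_000_000


def links_worth(rate_mb_s):
    """How many 1GbE links the given rate would fill at raw line rate."""
    return rate_mb_s / GIGABIT_LINK_MB_S


# Byte counts and timings taken from the dd output above.
write_rate = dd_rate_mb_s(8589934592, 37.3294)
read_rate = dd_rate_mb_s(8589934592, 67.4717)

print(f"write: {write_rate:.0f} MB/s, about {links_worth(write_rate):.2f} links")
print(f"read:  {read_rate:.0f} MB/s, about {links_worth(read_rate):.2f} links")
```

    Under these assumptions, the write figure corresponds to nearly two links’ worth of traffic, while the read figure is roughly what a single gigabit link can carry, which matches the symptom described: writes use both paths, reads behave as if only one is in use.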

  53. P.S.: I’m attaching the ‘dd’ output, but when benchmarking I always watch the NIC speeds… They correspond to the speeds displayed by ‘dd’, so the speeds are real.

  54. Hi Matthew,

    I can’t remember if I asked this before, but what type of servers do you have that are running ESXi?

    I’m wondering if this might be related to outdated NIC drivers, or something along those lines…

  55. Yes, you asked that; we ruled out those kinds of issues (NICs & VMware) because I tried with a Linux box (as the initiator, the first time) and the behavior was the same.
    I have now repeated the tests with a Xen cluster connected to the Linux box target I set up before… same behavior, just a little bit worse due to slower CPU performance (210MB/s write, 105MB/s read)… I also used different switches.

    I’ll give up. There must be something weird somewhere, but I can’t guess where. The only thing the two setups have in common is that the QNAP and the new SSD target both run Linux, and probably share the same network stack or iSCSI target software.

    I’m not able to solve this 🙂

    Thanks again 😀

  56. Hello,

    Here again, just to link two threads from people having the same issue I have.
    I’m writing on your blog because it may be useful for somebody reading this article too.

    It doesn’t seem to be solved, but I’m not alone.

    https://forums.freenas.org/index.php?threads/iscsi-tests-suggest-a-problem-with-read-performance.26180/

    https://forums.freenas.org/index.php?threads/low-read-performance-on-multipath-iscsi-with-ubuntu-initiator.25520/

    More or less, same setup: writing to the iSCSI target can saturate both NICs, but reading can’t.
