Feb 07, 2017

With vSphere 6.5 came VMFS 6, and with VMFS 6 came the auto unmap feature. This is a great feature, and very handy for those of you using thin provisioning on your datastores hosted on storage that supports VAAI.

I noticed something interesting when running the manual unmap command for the first time. It isn’t well documented, but I thought I’d share for those of you who are doing a manual LUN unmap for the first time.

Reason:

Automatic unmap (automatic space reclamation) is enabled, but you want to speed it up, or you have a large chunk of blocks you want unmapped immediately and don’t want to wait for the automatic feature.

Problem:

I wasn’t noticing any unmaps occurring automatically, and I wanted to free up some space on the SAN, so I decided to run the familiar command to force an unmap and reclaim some space:

esxcli storage vmfs unmap --volume-label=DATASTORENAME --reclaim-unit=200

After kicking it off, I noticed it wasn’t completing as fast as I thought it should. I enabled SSH on the host and took a look at the /var/log/hostd.log file. To my surprise, it wasn’t stopping after a single 200-block reclaim; it just kept cycling, running over and over (repeatedly unmapping 200 blocks at a time):

2017-02-07T14:12:37.365Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:37.978Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:38.585Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:39.191Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:39.808Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:40.426Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:41.050Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:41.659Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:42.275Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:42.886Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX

That’s just a small segment of the logs, but essentially it just kept repeating the unmap/reclaim over and over in 200-block segments. I waited hours and tried issuing a CTRL+C to stop it, but it kept running.
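Since CTRL+C doesn’t stop it, one thing you can do is watch progress from a second SSH session. Here’s a rough sketch of a helper that counts the “Async Unmapped” lines hostd writes; the name unmap_progress is mine, and it assumes the log line format shown above plus VMFS 6’s 1 MB block size:

```shell
# Sketch of a progress check for a running manual unmap. Assumes the
# hostd.log line format shown above; unmap_progress is a hypothetical
# helper name, and 1 MB per block is the VMFS 6 default block size.
unmap_progress() {
    local log="$1" unit="$2"   # path to hostd.log, reclaim-unit used
    local iters
    iters=$(grep -c 'Unmap: Async Unmapped' "$log")
    # Each logged iteration reclaims $unit blocks of roughly 1 MB each.
    echo "${iters} iterations logged, ~$((iters * unit)) MB reclaimed"
}

# On the ESXi host, while the unmap from above is still running:
#   unmap_progress /var/log/hostd.log 200
```

It’s only an approximation (hostd rotates its logs, and other unmap activity also matches), but it at least shows the command is making forward progress.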

I left it to run overnight, and it did eventually finish while I was sleeping. I’m assuming it attempted to unmap everything it could across the entire datastore. Initially I thought this command would only unmap the specified number of blocks.

When you run this command, it will keep cycling through the datastore in increments of the reclaim unit you specify until it has processed the entire LUN. Be aware of this when you’re planning to run it.

Essentially, I would advise against running the unmap command manually unless you’re prepared to unmap and reclaim ALL of the unused allocated space on your VMFS 6 datastore. In my case I did this because I had 4TB of deleted data that I wanted to unmap immediately, and I didn’t want to wait for the automatic unmap.

I thought this may have been occurring because the automatic unmap function was on, so I tried it again after disabling auto unmap. The behavior was the same and it just kept running.
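For reference, this is roughly how the per-datastore automatic unmap setting can be checked and toggled in vSphere 6.5, using the esxcli storage vmfs reclaim namespace (DATASTORENAME is a placeholder for your datastore label; verify the options with --help on your build):

```shell
# Check the automatic space-reclamation setting on a VMFS 6 datastore
# (vSphere 6.5; DATASTORENAME is a placeholder for your datastore label).
esxcli storage vmfs reclaim config get --volume-label=DATASTORENAME

# Disable automatic unmap by setting the reclaim priority to "none"...
esxcli storage vmfs reclaim config set --volume-label=DATASTORENAME --reclaim-priority=none

# ...and re-enable it afterwards ("low" is the default priority).
esxcli storage vmfs reclaim config set --volume-label=DATASTORENAME --reclaim-priority=low
```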

 

If you are tempted to run the unmap command, keep in mind that it will scan the entire volume regardless of the block count you set. With that said, if you are set on running it, choose a larger block count (200 or higher), since small counts take forever: I tested with a reclaim unit of 1, and based on the logged rate of unmaps, it would have taken over 3 months to complete on a 9TB datastore.
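To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes each async iteration takes about the ~0.6 s interval visible in the hostd.log timestamps above and that VMFS 6 blocks are 1 MB; these are my assumptions, and real runs also pay SAN latency, so actual times (like the ~24 hours I saw with a unit of 200) come out longer than this floor.

```shell
# Rough lower-bound estimate for a manual unmap pass over a 9 TB datastore.
# Assumptions (mine, illustrative only): ~0.6 s per iteration, taken from
# the hostd.log timestamps above, and 1 MB VMFS 6 blocks.
DATASTORE_MB=$((9 * 1024 * 1024))   # 9 TB expressed as 1 MB blocks
ITER_MS=600                         # ~0.6 s between logged iterations

for UNIT in 1 200; do
    ITERS=$(( (DATASTORE_MB + UNIT - 1) / UNIT ))   # iterations needed
    HOURS=$(( ITERS * ITER_MS / 1000 / 3600 ))
    echo "reclaim-unit=$UNIT: $ITERS iterations, ~$HOURS hours minimum"
done
```

With a unit of 1, the floor alone works out to roughly 1,500 hours (about two months), which is consistent with the “over 3 months” I saw once real-world latency is added; with a unit of 200, the floor is a few hours against the ~24 hours observed.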

  6 Responses to “VMFS unmap command on vSphere 6.5 with VMFS 6”

  1. That is by design: you specify a reclaim unit size, and it continually cycles through all unused space in that increment, attempting to unmap space on that volume. My colleague did a good overview of how that works at VMworld 2013: http://www.slideshare.net/VMworld/vmworld-2013-capacity-jail-break-vsphere-5-space-reclamation-nuts-and-bolts

    Also you don’t need to run it manually in vSphere 6.5 unless you are using legacy VMFS5 datastores as it is now automatic again. Here’s an update I did on unmap in vSphere 6.5: http://vsphere-land.com/news/automatic-space-reclamation-unmap-is-back-in-vsphere-6-5.html

  2. I am actually facing the same issue and I am wondering about the efficiency of this command.

    I understand it creates a 200 MB file and then rewrites it all over the datastore to unmap space (probably by writing zeroes; I didn’t look deeper into this).

    I’ve set up vSphere 6.5 and have VMFS 5 and VMFS 6 LUNs I can test on. Just like you, I didn’t realize manual unmapping wasn’t intended for VMFS 6, which already does it automatically, so the command looped from the beginning. The host had huge latencies toward the datastore and used quite some CPU, but the command never finished, even 20 hours after starting (on a 200 GB LUN). Then I tried to kill the process, but the host wouldn’t react at all. Even df -h or ls -alh in the datastore would make the SSH shell hang forever. At some point the only solution was to physically reboot the server.

    Then I saw that it should only be used on VMFS 5, so I tried it on a VMFS 5 LUN, but I got the exact same results.

    I have LUNs with a couple hundred GB of free space that I would like to release back to the storage array, but nothing happens, neither automatically nor manually.

    Did you find a solution to this? VMware support is on vacation at the moment, responding with weird stuff every 3 days, which is quite surprising given that we are a Service Provider customer…

  3. Hi psy,

    While VMFS 6 did bring automatic unmapping, you can still use the manual command if you please.

    Every time I’ve used the command (and I still do), it completes successfully. On a 9TB datastore, it takes around 24 hours with a reclaim unit of 200.

    I’m not sure if it still uses the temporary files to unmap (this was documented in older versions, however I think I may have read something a while ago that this has changed).

    You have the choice to use the auto unmap, or manual unmap, or both!

    It sounds like there’s an issue in your environment that’s causing this not to function. It could be something as simple as the performance of your SAN.

    Could you provide more information on the model of your SAN and the fabric you use to connect to it?

    Stephen

  4. Hi Stephen,

    It is a Huawei 5500 V3, all-SSD, connected with 4×10 Gb. I don’t think it’s the limiting factor, or that there is one; the hardware is new…

    But I still didn’t find what causes the manual unmap command to behave like this. How much more latency do you get on your Datastore when you execute that command?

    Thanks a lot so far 😉

  5. Hi psy,

    I do get some latency, but all my VMs are still usable (and like I said before, the command does finish eventually).

    I’m curious, do you have the firmware and everything up to date on the SAN? I’m wondering if something with the VAAI provider is malfunctioning…

  6. Hi Stephen,

    It looks like the problem was due to the Huawei v3 SAN software version. A couple hotfixes and firmwares later, everything worked just fine.

    Something about iSCSI special caches not emptying, or something like that.

    The command line now frees the space successfully.

    Thanks for your kind advice 😉
