Feb 07 2017
 

With vSphere 6.5 came VMFS 6, and with VMFS 6 came the auto unmap feature. This is a great feature, and very handy for those of you using thin provisioning on your datastores hosted on storage that supports VAAI. However, you still have the ability to perform a manual UNMAP at high priority, even with VMware vSphere 7 and vSphere 8.

A while back, I noticed something interesting the first time I ran the manual unmap command. It isn’t well documented, so I thought I’d share it for those of you performing a manual LUN unmap for the first time. This post will also provide the command to perform a manual unmap on a VMFS datastore.
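As a side note, if you’re not sure whether the array behind your datastore actually supports the SCSI UNMAP (Delete) primitive, you can check the VAAI status of the backing device from the ESXi shell. This is just a quick sketch; the device identifier below is a placeholder for your own device:

esxcli storage core device vaai status get -d naa.XXXXXXXXXXXXXXXX

(The above command reports the VAAI primitive status for the device; on arrays that support UNMAP, the Delete Status should show as supported.)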

Reason:

Automatic unmap (automatic space reclamation) is enabled, but you want to speed it up, or you have a large chunk of blocks you want unmapped immediately and don’t want to wait for the automatic process.
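If you want to confirm how automatic space reclamation is configured on the datastore before forcing anything, the reclaim config can be queried. This is a sketch; DATASTORENAME is a placeholder, the same as in the command further down:

esxcli storage vmfs reclaim config get --volume-label=DATASTORENAME

(The above command should report the reclaim granularity and reclaim priority for the VMFS 6 datastore; a reclaim priority of “none” means automatic unmap is effectively disabled.)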

Problem:

I wasn’t seeing any unmaps occurring automatically, and I wanted to free up some space on the SAN, so I decided to run the old command to force the unmap manually:

esxcli storage vmfs unmap --volume-label=DATASTORENAME --reclaim-unit=200

(The above command runs a manual unmap on the specified datastore, reclaiming free space in passes of 200 blocks at a time)
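If you’d rather target the datastore by UUID instead of its friendly label, the same command should also accept the volume UUID (the UUID below is just a placeholder):

esxcli storage vmfs unmap --volume-uuid=XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXXXXX --reclaim-unit=200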

After kicking it off, I noticed it wasn’t completing as quickly as I thought it should. I enabled SSH on the host and took a look at the /var/log/hostd.log file. To my surprise, it wasn’t stopping after a single 200-block reclaim; it just kept cycling, running the 200-block reclaim over and over:

2017-02-07T14:12:37.365Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:37.978Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:38.585Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:39.191Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:39.808Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:40.426Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:41.050Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:41.659Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:42.275Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-9XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:42.886Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX

That’s just a small segment of the logs, but essentially it just kept repeating the unmap/reclaim over and over in 200-block segments. I waited hours and tried to issue a “CTRL+C” to stop it, but it kept running.
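If you want to watch the progress yourself, a simple approach (assuming SSH is enabled on the host) is to follow hostd.log and filter for the unmap messages:

tail -f /var/log/hostd.log | grep -i unmap

(Each logged line corresponds to one pass of the reclaim unit you specified, so the rate at which these messages appear gives you a rough feel for how quickly the sweep is moving.)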

I left it to run overnight, and it did eventually finish while I was sleeping. I’m assuming it attempted to unmap everything it could across the entire datastore. Initially, I thought this command would only unmap the specified number of blocks.

When you run this command, it will keep cycling through the volume in increments of the reclaim unit you specified until it has processed the entire LUN. Be aware of this when you’re planning to run the command.

Essentially, I would advise against running the manual unmap command unless you’re prepared to unmap and reclaim ALL of the unused allocated space on your VMFS 6 datastore. In my case, I did this because I had 4TB of deleted data that I wanted to unmap immediately and didn’t want to wait for the automatic unmap.

I thought this may have been occurring because the automatic unmap function was on, so I tried it again after disabling auto unmap. The behavior was the same and it just kept running.

If you are tempted to run the unmap command, keep in mind that it will continue to scan the entire volume regardless of the reclaim unit you set. With that being said, if you are set on running it, choose a larger reclaim unit (200 or higher), since smaller values take far longer. I tested with a reclaim unit of 1, and after analyzing the logs and the rate of unmaps, it would have taken over 3 months to complete on a 9TB array (a rough way of estimating this yourself is shown below).
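As a rough, back-of-the-envelope way of gauging this yourself (treat it as a sketch, not an exact method), you can count how many reclaim passes hostd has logged so far:

grep -c "Async Unmapped" /var/log/hostd.log

(Keep in mind hostd.log rotates, so the count may only cover part of the sweep. In the log segment above, each 200-block pass took roughly 0.6 seconds. Assuming the 1MB block size of VMFS 6, a 9TB datastore is roughly 9 million blocks; at 200 blocks per pass that’s around 45,000 passes, or in the ballpark of 8 hours of back-to-back passes, which is in line with the sweep taking around a day in practice. At a reclaim unit of 1, the same math balloons into millions of passes, which is why it works out to months.)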

Update May 11th 2018: When running the manual unmap command with smaller “reclaim-unit” values (such as 1), your host may become unresponsive due to a memory overflow. vMotions will cease to function, and your ESXi host may need a restart to become fully functional again. I’ve experienced this behavior twice. I highly suggest that if you run this command, you do so while the host is in maintenance mode, and that you restart the host after a successful unmap sweep.
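If you prefer to do that from the command line rather than the vSphere client, something along these lines should work; treat it as a sketch and adapt it to your environment (you’ll need to evacuate or power off the VMs on the host before entering maintenance mode):

esxcli system maintenanceMode set --enable true
esxcli storage vmfs unmap --volume-label=DATASTORENAME --reclaim-unit=200
reboot
esxcli system maintenanceMode set --enable false

(The first command puts the host into maintenance mode, the unmap sweep then runs as before, and once it completes you reboot the host and take it back out of maintenance mode after it comes back up.)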

  12 Responses to “Perform manual VMFS UNMAP on VMware vSphere and ESXi”

  1. That is by design: you specify a reclaim unit size, then it continually cycles through all unused space in that increment and attempts to unmap space on that volume. My colleague did a good overview of how that works at VMworld 2013: http://www.slideshare.net/VMworld/vmworld-2013-capacity-jail-break-vsphere-5-space-reclamation-nuts-and-bolts

    Also, you don’t need to run it manually in vSphere 6.5 unless you are using legacy VMFS5 datastores, as it is now automatic again. Here’s an update I did on unmap in vSphere 6.5: http://vsphere-land.com/news/automatic-space-reclamation-unmap-is-back-in-vsphere-6-5.html

  2. I am actually facing the same issue and I am wondering about the efficiency of this command.

    I understand it creates a 200MB file and then rewrites the file everywhere on the datastore to unmap space (probably by writing zeroes, I didn’t look deeper into this).

    I’ve set up vSphere 6.5 and have VMFS5 and VMFS6 LUNs on which I can test things. Now, just like you, I didn’t realize manual unmapping was not intended to be done on VMFS6, which already does it automatically, so the command looped from the beginning. The host had huge latencies towards the datastore and used quite some CPU, but the command never finished, even 20 hours after starting (on a 200GB LUN). Then I tried to kill the process, but the host wouldn’t react at all. Even df -h or ls -alh in the datastore would make the SSH shell hang forever. At some point the only solution was to physically reboot the server.

    Then I saw that it should only be used on VMFS5, so I tried it on a VMFS5 LUN, but I got the exact same results.

    I have LUNs with a couple hundred GBs of free space that I would like to release back to the storage, but nothing happens. Neither automatically nor manually.

    Did you find a solution to this? VMware support is on vacation at the moment, responding with weird stuff every 3 days, which is quite surprising given that we are a Service Provider customer…

  3. Hi psy,

    While VMFS6 did bring automatic unmapping, you can still use the manual command if you please.

    Every time I’ve used the command (and I still do), it completes successfully for me. On a 9TB datastore, it takes around 24 hours using a reclaim unit of 200.

    I’m not sure if it still uses the temporary files to unmap (this was documented in older versions, however I think I may have read something a while ago that this has changed).

    You have the choice to use the auto unmap, or manual unmap, or both!

    It sounds like there’s an issue in your environment that’s causing this not to function. It could be something as simple as the performance of your SAN.

    Could you provide more information on the model of your SAN and the fabric you use to connect to it?

    Stephen

  4. Hi Stephen,

    It is a Huawei 5500 v3, all SSD, connected with 4×10 Gb. I don’t think it is the limiting factor, or that there is one; the hardware is new…

    But I still haven’t found what causes the manual unmap command to behave like this. How much more latency do you get on your datastore when you execute that command?

    Thanks a lot so far 😉

  5. Hi psy,

    I do get some latency, but all my VMs are still usable (and like I said before, the command does finish eventually).

    I’m curious, do you have the firmware and everything up to date on the SAN? I’m wondering if something with the VAAI provider is malfunctioning…

  6. Hi Stephen,

    It looks like the problem was due to the Huawei v3 SAN software version. A couple of hotfixes and firmware updates later, everything worked just fine.

    Something about iSCSI special caches not emptying, or something like that.

    Now freeing the space successfully with the command line.

    Thanks for your kind advice 😉

  7. […] The above completes the first step of releasing the storage back to the host. Now you can either let the automatic unmap occur slowly overtime if you’re using VMFS6, or you can manually kick it off. I decided to manually kick it off using the steps I have listed at: https://www.stephenwagner.com/2017/02/07/vmfs-unmap-command-on-vsphere-6-5-with-vmfs-6-runs-repeated… […]

  8. […] Perform manual VMFS unmap on vSphere 6.5 and 6.7 with VMFS 6 – https://www.stephenwagner.com/2017/02/07/vmfs-unmap-command-vsphere-6-5-with-vmfs-6-auto/ […]

  9. Hi Stephen,

    We are running 3x ESXi 6.5 hosts connected back to an MSA 2040, an EQL 6210, and a Compellent SCV 2020.

    The Compellent is SSD, with an expansion shelf filled with slower 4TB spinning disks. When we check the auto unmap on our hosts, we can see it occurring on the Compellent only, and only for the SSD volume. The Compellent has a default datapage size of 2MB; it appears the SSD volume is running auto unmap fine with a datapage size of 512, but on the other volume with the 4TB spinning disks, where the datapage size is 2MB, this is not occurring. Furthermore, it does not appear to be running at all for the EQL or MSA arrays. I am yet to check the datapage sizes for the volumes on those. What would be our best option to resolve this? Can we set up a cron job to do a manual unmap for the volumes where it’s not working? What’s the impact on the array during prod hours? I take it we would have to re-create the volumes with a 512k datapage size to fully resolve this?

    James.

  10. Hi James,

    Due to the native page size on the MSA array, it doesn’t support auto unmap. The array only supports manual unmap (there’s no way to change this).

    You can run the manual unmap; however, do not do this during production hours, as it’s very intensive on the storage. Also, I would reboot the host you run the command on both before and after running it, as I’ve seen memory issues caused by it (I have a sneaking suspicion that this once caused volume corruption because the host locked up and didn’t complete).

    Hope this helps!

    Cheers,
    Stephen

  11. Hi Stephen,

    Thanks very much for your reply, I really appreciate it. Would the command you have given in this article be the recommended approach:

    esxcli storage vmfs unmap --volume-label=DATASTORENAME --reclaim-unit=200

    Any way to pre-calculate expected run time per TB?

    Cheers.

  12. Hi James,

    Take a look at https://www.stephenwagner.com/2018/11/17/vmfs-auto-unmap-space-reclamation-not-working-vsphere/

    Page size on the MSA 2040 is 4MB, so use 4 instead of 200. You can use larger chunks like 200, which will run much faster, but it’ll only unmap chunks that are 200MB or larger.
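    For example, something along these lines (same placeholder datastore name as in the article):

    esxcli storage vmfs unmap --volume-label=DATASTORENAME --reclaim-unit=4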

    There is no way to estimate time.

    Hope this helps.

    Stephen
