The Tech Journal

Sophos UTM – 9.410-6 services crash and /tmp volume full after firmware upgrade

Feb 06 2017

Had a nasty little surprise with one of my clients this afternoon. Two days ago I updated their Sophos UTM (UTM220) to version 9.410-6 without any issues.

However, today I started to receive notifications that services were crashing (specifically the ACC device agent).

After receiving a few of these, I logged in to check it out. At first glance there were no visible errors on the UTM itself, but after some further digging, I noticed these entries in the “System Messages” log file:

2017:02:06-17:09:32 mail partitioncleaner[7918]: automatic cleaning for partition /tmp started (inodes: 0/100 blocks: 100/85)

2017:02:06-17:09:32 mail partitioncleaner[7918]: stopping deletion: can’t delete more files

Looks like a potential storage problem? Yes it was, but slightly more complicated.
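To pick these entries out of a longer log, a rough shell sketch like the one below could help. It assumes the partitioncleaner line always has the layout shown in the excerpt above. The function name `parse_partitioncleaner` and the log path in the comment are assumptions for illustration, and the sketch only extracts the number pairs without interpreting them.

```shell
# Hypothetical helper (not part of the UTM): reads log lines on stdin and
# prints "<partition> inodes=<a>/<b> blocks=<c>/<d>" for every
# partitioncleaner "automatic cleaning" entry. Other lines are ignored.
# Assumes the line layout shown in the log excerpt above.
parse_partitioncleaner() {
    sed -n 's|.*partitioncleaner.*: automatic cleaning for partition \([^ ]*\) started (inodes: \([0-9]*\)/\([0-9]*\) blocks: \([0-9]*\)/\([0-9]*\)).*|\1 inodes=\2/\3 blocks=\4/\5|p'
}

# Example (the log file path is an assumption, adjust it to your system):
#   parse_partitioncleaner < /var/log/system.log
```

Fed the first log line above, this should print the partition name followed by the two pairs, which makes it easier to spot which partition is affected.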

I enabled SSH on the UTM and issued the “df” command (shows volume usage), and found that the /tmp volume was 100% full.

Running “ls” and “ls -hl”, I found 25+ files of around 235MB each, named “AV-malware-names-XXXX-XXXXXX”.

Restarting the unit clears those files, however they come back shortly afterwards (I noticed a new one being added every 5-10 minutes).
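To keep an eye on whether the files are coming back, something along these lines could be run over SSH. This is only a sketch: `count_av_files` is a made-up helper name, and it assumes the files sit directly in /tmp with the “AV-malware-names-” prefix described above.

```shell
# Hypothetical helper: counts files named AV-malware-names-* directly
# inside a directory (default /tmp, where they showed up in this case).
count_av_files() {
    dir="${1:-/tmp}"
    count=0
    for f in "$dir"/AV-malware-names-*; do
        # If nothing matches, the glob stays literal; -f filters that out.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example: print a timestamped count every 5 minutes until interrupted.
#   while true; do echo "$(date) $(count_av_files /tmp)"; sleep 300; done
```

A count that keeps climbing between checks would match the behaviour described above, while a count that stays at zero suggests the workaround is holding.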

After some further digging (still haven’t heard back from Sophos on the support case), I came across some other users experiencing the same issues. While no one found a permanent resolution, they did mention this had to do with the Avira AV engine or possibly the dual scan engine.

Checking the UTM, I noticed that we had the E-Mail scanning configured for dual scan.

Solution (temporary workaround):

I went ahead and configured the E-Mail scanner (the only scanner I had that was using dual scan) to use single scan only. I then restarted the UTM. In my environment the default setting for single scanning is set to “Sophos”.

I am now sitting here with 30 minutes of uptime and absolutely no “AV-malware-names-XXXX-XXXXXX” files created.

I will post an update when I hear back from Sophos support.

Hope this helps someone else!

Update (after original post):

I heard back from Sophos support: this is a known bug in 9.410. The current official workaround is to change to single scan and use the AVIRA engine instead of the Sophos engine.

Update #2:

Received notification this morning of a new firmware update available (Version: 9.411003 – Maintenance Release). While I haven’t installed it, it appears from the Bugfixes notes that it was released to fix this issue:

Fix [NUTM-6804]: [AWS] Update breaks HVM standalone installations
Fix [NUTM-6747]: [Email] SAVI scanner coredumps permanently in MailProxy after update to 9.410
Fix [NUTM-6802]: [Web] New coredumps from httpproxy after update to v9.410

Update #3:

I noticed that this bug was interrupting some mail flow on my own Sophos UTM, as well as on some of my clients’ units. I went ahead and, as an emergency measure, installed 9.411-3.

Things were fine for around 10 hours, until I started to receive notifications that the HTTP proxy was failing and required a restart. When I logged in to the UTM, it was very sluggish, and at times completely unresponsive for around 10 minutes. Web browsing was not functioning at all on the internal network behind the UTM.

This issue still hasn’t been resolved. Hopefully we see a stable working fix sometime soon.

HPE MSA 2040 – Disk Failure considerations and steps


Jan 27 2017

Greetings everyone!

I had my first predicted disk failure occur on my HPE MSA 2040. As always, it was a breeze contacting HPE support to get the drive replaced (since my unit has a 4 hour response warranty).

However, with this being my first drive swap, I came across something worth mentioning. Typically, when a disk fails in a RAID array, you simply swap out the failed disk and it starts rebuilding. This is NOT the case if you have an HPE MSA 2040 that’s fully loaded with no spares configured.

If you have global spares, the moment a disk fails the array automatically rebuilds onto an available configured spare.

If you don’t have any global spares (my case), the replacement disk is marked as unused and available. You must set this disk as a spare in the SMU for the rebuild to start.

One additional note: if you do have spares and a disk fails, replacing the failed disk will not automatically rebuild the data back from the spare onto it. You must force fail (pull out) the spare disk for the rebuild to start on the freshly replaced disk. Always confirm current redundancy levels and activity before forcefully failing any disks!

This behaviour is covered in HPE’s MSA 1040/2040 Best Practices document:

Source: https://h50146.www5.hpe.com/products/storage/whitepaper/pdfs/4AA4-6892ENW.pdf

After VMware vSphere 6.5 Upgrade/Migration – The VMware enhanced authentication plugin has updated it’s SSL certificate in Firefox

ESXi, vSphere

Dec 08 2016

So you just completed your migration from an earlier version of vSphere up to vSphere 6.5 (particularly vCenter 6.5 Virtual Appliance). When trying to log in to the vSphere web client, you receive numerous messages stating “The VMware enhanced authentication plugin has updated it’s SSL certificate in Firefox. Please restart Firefox.” You’ll usually see 2 of these messages in a row on each page load.

You’ll also note that the “Enhanced Authentication Plugin” doesn’t function after the install (it won’t pull your Active Directory authentication information).

To resolve this:

Uninstall all vSphere plugins from your workstation. I went ahead and uninstalled all vSphere related software on my workstation, including the deprecated vSphere C# client application, all authentication plugins, etc. These are all old.

Open up your web browser and point to your vCenter server (https://vCENTERSERVERNAME), and download the “Trusted root CA certificates” from VMCA (VMware certificate authority).

Download and extract the ZIP file. Navigate through the extracted contents to the Windows certs. These root CA certificates need to be installed to the “Trusted Root Certification Authorities” store on your system. Make sure you skip the “Certificate Revocation List” file, which ends in “.r0”.

To install them, right click, choose “Install Certificate”, choose “Local Machine”, yes to UAC prompt, then choose “Place all certificates in the following store”, browse, and select “Trusted Root Certification Authorities”, and finally finish. Repeat for each of the certificates. Your workstation will now “trust” all certificates issued by your VMware Certificate Authority (VMCA).
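If there are several certificates and a POSIX shell is available on the workstation (Git Bash or WSL, which is an assumption since the steps above use the GUI), a small sketch like this can list the files to import while skipping the “.r0” file. The helper name `list_root_certs` and the folder path in the comment are hypothetical. The `certutil -addstore` command in the comment is the built-in Windows tool for adding a certificate to a store and needs an elevated prompt.

```shell
# Hypothetical helper: prints every regular file in the given directory
# except CRL files ending in .r0, one path per line.
list_root_certs() {
    dir="$1"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case "$f" in
            *.r0) continue ;;   # Certificate Revocation List, skip it
        esac
        printf '%s\n' "$f"
    done
}

# Example (the folder path is an assumption about where you extracted the ZIP):
#   list_root_certs ./certs/win
# Each listed file could then be imported from an elevated Windows prompt with:
#   certutil -addstore -f Root <file>
```

This only automates the file selection. The GUI steps above remain the method described in this post.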

You can now re-open your web browser, download the “Enhanced Authentication Plugin” from your vCenter instance, and install. After restarting your computer, the plugin should function and the messages will no longer appear.

VMware vSphere 6.0 to VMware vSphere 6.5 – Upgrade and Migration

ESXi, vSphere

Dec 07 2016

Well, I’m starting to write this post minutes after completing my first vSphere 6.0 to vSphere 6.5 upgrade, and as always with VMware products it went extremely smoothly (although, as with any upgrade, there were minor hiccups).

Thanks to the evolution of virtualization technology, an upgrade such as the move to vSphere 6.5 is a massive change to your infrastructure, yet the process is extremely simplified, can be easily rolled out, and in the event of problems has simple, clear paths to revert and re-attempt. Failed upgrades usually aren’t catastrophic, and don’t even affect production environments.

Whenever I do these vSphere upgrades, I find it funny how you’re making such massive changes to your infrastructure with each click and step, yet the thought process and understanding behind it is so simple and easy to follow. Essentially, after one of these upgrades you look back and think: “Wow, for the little amount of work I did, I sure did accomplish a lot”. It’s just one of the beauties of virtualization, especially holding true with VMware products.

To top it all off you can complete the entire upgrade/migration without even powering off any of your virtual machines. You could do this live, during business hours, in a production environment… How cool is that!

Just to provide some insight into my environment, here’s a list of the hardware and configuration:

-2 X HPE Proliant DL360p Gen8 Servers (each with dual processors, and each with 128GB RAM, no local storage)

-1 X HPE MSA2040 Dual Controller SAN (each host has multiple connections to the SAN via 10Gb DAC iSCSI, 1 connection to each of the dual controllers)

-VMware vCenter Server 6.0 running on a Windows Virtual Machine (Windows Server 2008 R2)

-VMware Update Manager (Running on the same server as the vCenter Server)

-VMware Data Protection (2 x VMware vDP Appliances, one as a backup server, one as a replication target)

-VMware ESXi 6.0 installed on to SD-cards in the servers (using HPE Customized ESXi installation)

One of the main reasons I was so quick to adopt and migrate to vSphere 6.5 was that I was extremely interested in the prospect of migrating a Windows based vCenter instance to the new vCenter 6.5 appliance. This is handy as it simplifies the environment, reduces licensing costs and requirements, and reduces time/effort on server administration and maintenance.

First and foremost, following the recommended upgrade path (the upgrades and migrations for the separate modules/systems must be done in a specific order), I had to upgrade my vDP appliances first. For vDP to support vCenter 6.5, you must upgrade your vDP appliances to 6.1.3. As with all vDP upgrades, you must shut down the appliance, mark all the data disks as dependent, take a snapshot, mount the upgrade ISO, and then boot and initiate the upgrade from the appliance web interface. After you complete the upgrade and confirm the appliance is functioning, you shut down the appliance, remove the snapshot, and mark the data disks as independent again (not the first virtual disk; only virtual disk 2 and up are marked as independent), and you’re done with your upgrade.

A note on a problem I dealt with during the upgrade process for vDP to version 6.1.3 (appliance does not detect mounted ISO image) can be found here: http://www.stephenwagner.com/?p=1107

Moving on to vCenter! VMware did a great job with this. You load up the VMware Migration Assistant tool on your source vCenter server, load up the migration/installation application on a separate computer (the workstation you’re using), and it does the rest. After prepping the destination vCenter appliance, it exports the data from the source server, copies it to the destination server, shuts down the source VM, and then imports the data to the destination appliance and takes over the role. It’s the coolest thing ever watching this happen live. Upon restart, you’ve completed your vCenter Server migration.

A note on a problem I dealt with during the migration process (which involved exporting VMware Update Manager from the source server) can be found here: http://www.stephenwagner.com/?p=1115

And as for the final step, it’s now time to upgrade your ESXi hosts to version 6.5. As always, this is an easy task with VMware Update Manager, and can be easily and quickly rolled out to multiple ESXi hosts (thanks to vMotion and DRS). After downloading your ESXi installation ISO (in my case I use the HPE customized image), you upload it in to your new VMware Update Manager instance, add it to an upgrade baseline, and then attach the baseline to your hosts. To push this upgrade out, simply select the cluster or specific host (depending on whether you want to roll out to a single host, or multiple at once), and remediate! After a couple restarts the upgrade is done.

A note on a problem I dealt with during ESXi 6.5 upgrade (conflicting VIBs marking image as incompatible when deploying HPE customized image) can be found here: http://www.stephenwagner.com/?p=1120

After all of the above, the entire environment is now running on vSphere 6.5! Don’t forget to take a backup before and after the upgrade, and also upgrade your VM hardware versions to 6.5 (VM compatibility version), and upgrade VMware tools on all your VMs.

Make sure to visit https://YOURVCENTERSERVER to download the VMware Certificate Authority (VMCA) root certificates, and add them to the “Trusted Root Certification Authorities” on your workstation so you can validate all the SSL certs that vCenter uses. Also, note that the vSphere C# client (the Windows application) has been deprecated, and you must now use the vSphere Web Client or the new HTML5 web client.

Happy Virtualizing! Leave a comment!

VMware vSphere 6.5 and 6.7 – Conflicting VIBs when upgrading ESXi to 6.5, 6.7, or 7.0

ESXi, vSphere

Dec 07 2016

When upgrading VMware vSphere and your ESXi hosts to version 6.5, 6.7, or 7.0, you may experience an error similar to:

"The upgrade contains the following set of conflicting VIBs: Mellanox_bootbank_net.XXXXversionnumbersXXXX. Remove the conflicting VIBs or use Image Builder to create a custom ISO."

This is due to conflicting VIBs on your ESXi host. This post will go into detail about what causes it and how to resolve it.

The issue

After successfully completing the migration from vCenter 6.0 (on Windows) to the vCenter 6.5 Appliance, all I had remaining was to upgrade my ESXi hosts to ESXi 6.5.

In my test environment, I run 2 x HPE Proliant DL360p Gen8 servers. I also have always used the HPE customized ESXi image for installs and upgrades.

It was easy enough to download the customized HPE installation image from VMware’s website. I then loaded it in to VMware Update Manager on the vCenter appliance, created a baseline, and was prepared to upgrade the hosts.

I successfully upgraded one of my hosts without any issues, however after scanning on my second host, it reported the upgrade as incompatible and stated: “The upgrade contains the following set of conflicting VIBs: Mellanox_bootbank_net.XXXXversionnumbersXXXX. Remove the conflicting VIBs or use Image Builder to create a custom ISO.”

The fix

I checked the host to see if I was even using the Mellanox drivers, and thankfully I wasn’t and could safely remove them. If you are using the drivers that are causing the conflict, DO NOT REMOVE them as it could disconnect all network interfaces from your host. In my case, since they were not being used, uninstalling them would not affect the system.

I SSH’ed in to the host and ran the following commands:

esxcli software vib list | grep Mell

(This command above shows the VIB package that the Mellanox driver is inside of. In my case, it returned “net-mst”)

esxcli network nic list

(This command above verifies which drivers you are using on your network interfaces on the host)

esxcli software vib remove -n net-mst

(This command above removes the VIB that contains the problematic driver)

After doing this, I restarted the host, scanned for upgrades, and successfully applied the new ESXi 6.5 customized HPE image.
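The “is this driver actually in use?” check before removing a VIB can also be scripted. The sketch below is not from VMware. The helper names are made up, and it assumes that `esxcli network nic list` prints a header line, a dashed separator line, and then one row per NIC with the driver name in the third column. Verify that against the output on your own host before relying on it.

```shell
# Hypothetical helpers. Both read `esxcli network nic list` output on stdin
# and assume the driver name is the third whitespace-separated column,
# after a header line and a dashed separator line.
nic_drivers() {
    # Skip the two header lines, print the driver column, de-duplicate.
    awk 'NR > 2 && NF >= 3 { print $3 }' | sort -u
}

# Returns 0 if any NIC uses the given driver, 1 otherwise.
driver_in_use() {
    nic_drivers | grep -qxF -- "$1"
}

# Example on the host (DRIVER_NAME is a placeholder for the driver you
# are about to remove):
#   esxcli network nic list | driver_in_use DRIVER_NAME && echo "in use, do not remove"
```

If the driver shows up as in use, leave the VIB alone, for the reason given earlier: removing it could disconnect the host’s network interfaces.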

Hope this helps! Leave a comment!
