Feb 06 2017
 

Had a nasty little surprise with one of my clients this afternoon. Two days ago I updated their Sophos UTM (UTM220) to version 9.410-6 without any issues.

However, today I started to receive notifications that services were crashing (specifically the ACC device agent).

After receiving a few of these, I logged in to check it out. At first glance there were no visible errors on the UTM itself, but after some further digging, I noticed these entries in the “System Messages” log file:

2017:02:06-17:09:32 mail partitioncleaner[7918]: automatic cleaning for partition /tmp started (inodes: 0/100 blocks: 100/85)

2017:02:06-17:09:32 mail partitioncleaner[7918]: stopping deletion: can’t delete more files

Looks like a potential storage problem? Yes it was, but slightly more complicated.
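If you want to watch for this condition across several units, the partitioncleaner line quoted above is easy to pick apart. The following is only a rough sketch: the regular expression is written against the single log line shown in this post, and the function name `parse_cleaner_line` is my own. I am also assuming each number pair reads as current percentage / cleaning threshold (so blocks 100/85 would mean 100% used against an 85% threshold); nothing in the log itself confirms that reading.

```python
import re

# Pattern written against the one partitioncleaner line quoted in this post.
# Other firmware versions may word the message differently.
CLEANER_RE = re.compile(
    r"partitioncleaner\[\d+\]: automatic cleaning for partition (?P<partition>\S+) started "
    r"\(inodes: (?P<inodes_a>\d+)/(?P<inodes_b>\d+) "
    r"blocks: (?P<blocks_a>\d+)/(?P<blocks_b>\d+)\)"
)


def parse_cleaner_line(line):
    """Return the partition and both number pairs from a cleaner line, or None."""
    match = CLEANER_RE.search(line)
    if match is None:
        return None
    return {
        "partition": match.group("partition"),
        # Assumption: each pair is (current %, threshold %).
        "inodes": (int(match.group("inodes_a")), int(match.group("inodes_b"))),
        "blocks": (int(match.group("blocks_a")), int(match.group("blocks_b"))),
    }


sample = (
    "2017:02:06-17:09:32 mail partitioncleaner[7918]: automatic cleaning "
    "for partition /tmp started (inodes: 0/100 blocks: 100/85)"
)
print(parse_cleaner_line(sample))
```

Lines that do not match simply return `None`, so the function can be run over a whole copied log file without special handling.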

I enabled SSH on the UTM and issued the “df” command (shows volume usage), and found that the /tmp volume was 100% full.

Running “ls” and “ls -hl” in /tmp, I found 25+ files of around 235MB each, all named “AV-malware-names-XXXX-XXXXXX”.

Restarting the unit clears those files, but they come back shortly afterwards (I noticed a new one being added every 5-10 minutes).
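To keep an eye on how fast the files return without repeating “df” and “ls -hl” by hand, something like the sketch below could be used. It assumes Python 3 is available wherever it runs, which I have not verified on the UTM itself, and the function name `summarize_leftovers` is hypothetical. The directory and the “AV-malware-names-” prefix are taken from what I saw on my unit.

```python
import shutil
from pathlib import Path


def summarize_leftovers(directory="/tmp", prefix="AV-malware-names-"):
    """Report how full `directory`'s volume is and list files starting with `prefix`."""
    usage = shutil.disk_usage(directory)
    # Collect matching regular files, oldest first.
    files = sorted(
        (p for p in Path(directory).iterdir()
         if p.is_file() and p.name.startswith(prefix)),
        key=lambda p: p.stat().st_mtime,
    )
    total_bytes = sum(p.stat().st_size for p in files)
    # Guard against a zero-sized filesystem report.
    percent_used = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
    return {
        "percent_used": percent_used,
        "file_count": len(files),
        "total_bytes": total_bytes,
        "names": [p.name for p in files],
    }
```

Calling `summarize_leftovers()` every few minutes and comparing `file_count` between runs would show whether new files are still being created after a configuration change. It only reports and deletes nothing.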

After some further digging (still haven’t heard back from Sophos on the support case), I came across some other users experiencing the same issues. While no one found a permanent resolution, they did mention this had to do with the Avira AV engine or possibly the dual scan engine.

Checking the UTM, I noticed that we had the E-Mail scanning configured for dual scan.

Solution (temporary workaround):

I went ahead and configured the E-Mail scanner (the only scanner I had that was using dual scan) to use single scan only. I then restarted the UTM. In my environment the default setting for single scanning is set to “Sophos”.

I am now sitting here with 30 minutes of uptime and absolutely no “AV-malware-names-XXXX-XXXXXX” files created.

I will post an update when I hear back from Sophos support.

Hope this helps someone else!

 

Update (after original post):

I heard back from Sophos support: this is a known bug in 9.410. The current official workaround is to change to single scan and use the AVIRA engine instead of the Sophos engine.

Update #2:

Received notification this morning of a new firmware update available (Version: 9.411003 – Maintenance Release). While I haven’t installed it, it appears from the Bugfixes notes that it was released to fix this issue:

Fix [NUTM-6804]: [AWS] Update breaks HVM standalone installations
Fix [NUTM-6747]: [Email] SAVI scanner coredumps permanently in MailProxy after update to 9.410
Fix [NUTM-6802]: [Web] New coredumps from httpproxy after update to v9.410

Update #3:

I noticed that this bug was interrupting some mail flow on my own Sophos UTM, as well as on some of my clients’ units. As an emergency measure, I went ahead and installed 9.411-3.

Things were fine for around 10 hours, until I started to receive notifications that the HTTP proxy was failing and requiring a restart. When I logged in to the UTM, it was very sluggish, and at times completely unresponsive for around 10 minutes. Web browsing was not functioning at all on the internal network behind the UTM.

This issue still hasn’t been resolved. Hopefully we see a stable working fix sometime soon.

  9 Responses to “Sophos UTM – 9.410-6 services crash and /tmp volume full after firmware upgrade”

  1. Did you have any issues with http proxy restarting with this firmware version?

  2. Hi Mauricio,

    I personally did not see it, but I read on Sophos community boards that a couple other users did see the http proxy restarting.

    Cheers,
    Stephen

  3. Updated the post to reflect bugfix IDs.

    Mauricio, it appears the update released today should fix the issue you were reporting with the http proxy restarting.

  4. Something changed with this update and not for the better. Proxied traffic is definitely sluggish overall and CPU has gone from an average of 5% to 30% since the update was applied on my firewall.

  5. Hi Chris,

    Please keep us posted. I’m thinking there’s a 50% chance it may just be downloading and installing AV definitions for all the scanning systems. Let me know if this lasts longer than 30 minutes.

    Cheers

  6. Hi, thanks for your post. I want to confirm this problem and share my experience.
    Initially I made the usual upgrade to version 9.410-6 and had the problem with the data partition filling up quickly. I had an old system (I mean many live upgrades) and decided it was time to reinstall and allocate more space. So I downloaded the latest version, did a clean install and restored the settings. The data partition filling was gone (it grows to a limit and then drops back) but the http(s) proxy server restarts continued. Since then I have applied each new update but the problem persists. No luck with the single AV engine setting. No luck with disabling https AV scanning. I noticed restarts during the night when nobody is working. Also, with the new install I have no unusual CPU usage or anything else out of the ordinary.
    Still waiting for a new patch to resolve the http(s) proxy restarts.

  7. Hi chrysosotmos,

    As a temp fix, check to make sure you’re not using dual scan on any of the other systems (such as SMTP proxy, etc…). I noticed the UTM is behaving when I set every service on the UTM to single scan. So far I’m at 4 days of uptime on 5 units that were experiencing the issue, and they are behaving since making the change.

    Cheers

  8. Hi again,

    I did what you said, but no luck. I even disabled http(s) scanning everywhere and still have proxy restarts. Anyway the dashboard says the antivirus is still active for http/s protocols although I don’t know where exactly I left something enabled.

    Any news about this issue? Anything about a new patch?

  9. Hi chrysosotmos,

    I’ve heard of no changes or any updates to the issue. I’m still running with the modifications I made to make things work.

    Just curious, have you tried changing the default AV agent (from Sophos to Avira, or vice versa)? I’d recommend restarting after making the change.

    Have you logged in to your UTM with SSH and checked the logs to find out what’s crashing and why? Do you know if it’s being caused by memory usage on the HTTP/HTTPS proxy?

    Cheers,
    Stephen
