Oct 28, 2018
 

I have noticed that after upgrading Microsoft Exchange 2016 CU10 to Exchange 2016 CU11, services may fail to start. The issue is intermittent: some restarts bring up more services, others fewer. I have observed this on two separate Exchange upgrades, both from CU10 to CU11.

The Problem

Recently, a customer had an issue where a Microsoft Exchange security update bricked their entire Exchange CU10 installation. Files were missing and services would not start (even after manually re-configuring all system services to their prior settings, and force starting). To fix this, we weighed our options and decided the best course of action would be to attempt the latest CU (CU11). This is because each Microsoft Exchange Cumulative update is actually a full installer that completely removes the old version, and installs the new version cleanly.

After installing CU11 we were able to rescue the Exchange installation (services could now start, and functioned). However, numerous errors and warnings were now present, and we also noticed some new issues with services.

One service in particular, “Net.Tcp Port Sharing Service”, would occasionally not start in time, which prevented all of the Exchange services from starting (Exchange depends on this service). Other times this service would start, but random Exchange services would time out.

Some of the errors and warnings included:

Event ID 7000
Source: Service Control Manager
Description:
The MSComplianceAudit service failed to start due to the following error: 
The service did not respond to the start or control request in a timely fashion.

Event ID 7009
Source: Service Control Manager
Description:
A timeout was reached (30000 milliseconds) while waiting for the MSComplianceAudit service to connect.

Event ID 7000
Source: Service Control Manager
Description:
The MSExchangeRepl service failed to start due to the following error: 
The service did not respond to the start or control request in a timely fashion.

Event ID 7009
Source: Service Control Manager
Description:
A timeout was reached (30000 milliseconds) while waiting for the MSExchangeRepl service to connect.
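
When there are many of these entries, it can help to boil them down to a list of affected services. Below is a minimal Python sketch, assuming you have copied the event descriptions out of Event Viewer as plain text. The function name `summarize_scm_events` and the sample text are my own illustration, and the patterns only cover the two message formats quoted above.

```python
import re

# Patterns modelled on the two Service Control Manager messages quoted above.
FAILED_RE = re.compile(
    r"The (.+?) service failed to start due to the following error"
)
TIMEOUT_RE = re.compile(
    r"A timeout was reached \((\d+) milliseconds\) "
    r"while waiting for the (.+?) service to connect"
)


def summarize_scm_events(text):
    """Return {service_name: {"failed": bool, "timeout_ms": int or None}}."""
    summary = {}
    for name in FAILED_RE.findall(text):
        entry = summary.setdefault(name, {"failed": False, "timeout_ms": None})
        entry["failed"] = True
    for ms, name in TIMEOUT_RE.findall(text):
        entry = summary.setdefault(name, {"failed": False, "timeout_ms": None})
        entry["timeout_ms"] = int(ms)
    return summary


# Sample text assembled from the events listed in this post.
SAMPLE = """\
The MSComplianceAudit service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.
A timeout was reached (30000 milliseconds) while waiting for the MSComplianceAudit service to connect.
The MSExchangeRepl service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.
A timeout was reached (30000 milliseconds) while waiting for the MSExchangeRepl service to connect.
"""

if __name__ == "__main__":
    for service, info in summarize_scm_events(SAMPLE).items():
        print(service, info)
```

Running it against text gathered after each restart makes it easier to compare which services failed from one boot to the next.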

I also observed that on a few restarts, the services that failed would eventually start on their own 10-15 minutes later (this happened on only about half of the restarts).

Originally I was concerned and believed these issues were related to the original issues the customer experienced. However, I then upgraded my own Exchange 2016 server to CU11 and experienced the same problems (my instance was a clean, fully functioning install). I also attempted to upgrade .NET to version 4.7.2 to see if this had any effect, but it did not.

When you go into Services (services.msc) and manually start the services, Exchange functions perfectly and everything works.
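
If you would rather not click through services.msc, the same manual start can be scripted. This is only a rough Python sketch, assuming it runs from an elevated prompt on the Exchange server and that the built-in `sc start` command is available. The list `START_ORDER` contains only the service names mentioned in this post (with the port sharing service first, since Exchange depends on it) and is not a complete list of Exchange services. The helper names are hypothetical.

```python
import subprocess

# Service names taken from this post only; extend the list for your own server.
# NetTcpPortSharing goes first because the Exchange services depend on it.
START_ORDER = ["NetTcpPortSharing", "MSExchangeRepl", "MSComplianceAudit"]


def build_start_commands(names):
    """Build one 'sc start <name>' command per service, preserving order."""
    return [["sc", "start", name] for name in names]


def start_services(names, run=subprocess.run):
    """Run each start command in order and return {service_name: exit_code}.

    The 'run' parameter exists so the function can be exercised without
    touching real services (for example, in a dry run).
    """
    results = {}
    for cmd in build_start_commands(names):
        completed = run(cmd, capture_output=True, text=True)
        results[cmd[2]] = completed.returncode
    return results


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    for cmd in build_start_commands(START_ORDER):
        print(" ".join(cmd))
```

Calling `start_services(START_ORDER)` on the server would actually issue the commands. Check the output of any command that returns a non-zero exit code rather than assuming the service failed.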

The Solution

As of yet, I don’t have a proper solution. I did however notice that with my customer’s environment, after it was left to sit overnight (around 8 hours), that subsequent restarts actually were able to start the majority of the services properly. It almost seemed as if it just needed time to fix itself. I’m not sure if this is because of IO load, or some type of Exchange database maintenance, but I’m waiting to see if it clears up on my instance as well after an amount of time. I’ll be keeping this post updated.

UPDATE – October 29th: I’ve confirmed for the 2nd time that the issue resolves at least 6-8 hours after the upgrade. At the end of the day I restarted my machine and everything was functioning properly.

If you are experiencing this issue, or can make a comment on it, please leave a comment on this post!

  22 Responses to “Exchange 2016 CU11 – Services Fail to Start”

  1. Interesting as I’m just about to apply CU11 to a CU10 environment, (or was) thanks for the warning, 🙂

  2. Glad if the warning helped!

    I wouldn’t say to wait, I’d just recommend doing it on a Friday. After the install, restart the server. Let it sit for 20-30 minutes, do another restart.

    After this, give it 10 minutes to boot, and look at the services. You may need to manually start the Exchange services. Everything should work when you do this.

    After another 24 hours, try restarting the server again, and everything should be working fine!

    I haven’t had any issues with CU11 other than this one (which resolved itself), since I deployed it at the original time of the post.

    Cheers,
    Stephen

  3. Stephen thanks, I have successfully run the setup.exe /PrepareSchema

    But running setup.exe /PrepareAD consistently fails

    It gets to about 44% through the process and bombs out??

  4. Hi David,

    That’s odd. How did you kick off the install?

    I’d recommend opening an elevated administrative command prompt, navigate to the DVD, and then run setup.exe.

    The install will also try to run PrepareAD and PrepareSchema, it might give you more information. Also, what was the error when it failed PrepareAD?

    Finally, what is your Active Directory domain minimum version set to for the forest and domain? I’m wondering if this may be why it’s failing, but I’d need to see more on the error.

    Cheers

  5. I used an elevated command prompt, changed to the drive that setup.exe was situated on, and then ran setup.exe

    I notice in the Event Viewer (MSExchange Management) there are references to “Cmdlet failed” (cmdlet “Install-ExchangeOrganization”): Event ID 6, Task Category General, Level Error.

    The Domain minimum version and forest are both what they should be.

  6. Hi David,

    Can you check the event log for any AD communication errors? Also, how many domain controllers do you have? Can you check their event log to see if they are healthy?

    Maybe do a health check with dcdiag and see if anything odd comes up. Also, try restarting your domain controllers, member servers, and then try the install again?

    I’d be more worried something is either wrong with DNS, AD, the network communication, or something else.

    Stephen

  7. I was able to upgrade 10 Exchange 2016 servers from CU7 to CU11 without any issues. Now I am building 5 brand new ones in another data center because we are moving, and I ran into this issue… I built them with CU11 and services just keep failing… reboots and everything else don’t fix it. I wait overnight and the next day they are all running fine. Further reboots do not re-introduce the issue anymore… very weird..

  8. Hi Chris,

    It is bizarre, usually experiencing this I’d just blast away the server and restart the install process… I have absolutely no idea what happens to self correct over a period of time…

    Most people wouldn’t even wait, lol.

    Cheers
    Stephen

  9. I found the root cause Stephen! Since the issue did not go away and continued after additional reboots, we found a bug with HP hardware, etc that is resolved by adding the following registry key on the affected servers:

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation]
    "RealTimeIsUniversal"=dword:00000001

    I looked into this direction because I saw a lot of events after reboot had the wrong time stamp (few hours into the future). I was validating time settings over and over which were correct. This is the only thing that resolved the issue.
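
For anyone applying this to several servers, the snippet above can be turned into an importable .reg file. The following is a minimal Python sketch that only renders the file text; the function name `render_reg_file` is hypothetical, and whether the key actually helps on your hardware is something to test yourself.

```python
# Key and value exactly as quoted in the comment above.
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation"
VALUE_NAME = "RealTimeIsUniversal"


def render_reg_file(key_path, value_name, dword):
    """Render a .reg file body that sets a single DWORD value."""
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key_path}]\r\n"
        f'"{value_name}"=dword:{dword:08x}\r\n'
    )


if __name__ == "__main__":
    print(render_reg_file(KEY_PATH, VALUE_NAME, 1))
```

Save the output to a file with a .reg extension and import it on each affected server, then reboot and check whether the event timestamps are still off.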

  10. Hi Chris,

    That’s interesting and a great find! I use HPe servers but can also confirm that this issue occurred on a Dell environment with ESXi as well.

    Thanks again,
    Stephen

  11. Would be good to apply to a Dell server and see if it also helps. We don’t use anything other than HPE, therefore I cannot confirm.

  12. I had a variant of this issue. My ManageEngine Desktop Server application tried to reapply the CU11 update which hosed my install. Turned all of the Exchange Services to Disabled. My server is a Windows 2012 R2 Standard running on a VMWare 6.0 environment (Simplivity Storage Device).

  13. Hi Chris,

    In events like yours, if the install gets completely hosed, you can try to rescue it by trying to install an even newer CU version (if you can resolve the root of the problem).

    I’ve rescued many Exchange upgrades that were busted (services disabled, nothing works), by just using an even newer CU than the one that bricked the install.

    Just keep in mind that as you try newer ones, you will eventually run out of newer CUs to try.

    Cheers,
    Stephen

  14. Just to add fuel to the fire. Getting the same error still.

    Exchange 2016, CU 12
    .NET 4.7.2

    Going to just manually start everything for a few days and see if it will go away.

  15. I have 3 Exchange 2016 CU 12 and 2 Exchange 2019 CU 1.
    The NetTcpPortSharing service disables itself and then the Exchange services stop.

    I have to enable the NetTcpPortSharing service and restart. After a while, the service is disabled again and the Exchange services are stopped.

  16. I have resolved it by doing the following. Uninstalled CU12 from “View installed updates”.
    Installed .NET 4.8 and restarted.
    Ran Windows Update, picked up CU12 again and installed it.

    After restart all is good

  17. Hello!
    I am doing an exchange migration from 2013 to 2016. Two new 2016 vm’s have been spun up, exchange installed on both.

    Once we install patches one of the machines takes a nose dive. But we are installing CU13.

    The symptoms are strikingly similar to what you describe in this article. Reboots, services fail to start, loads of errors.

    Our 7000 error is “The account specified for this service is different from the account specified for other services running in the same process”

    7001: “DNS Client service depends on the network store interface service service which failed to start because of the following error: The account specified for this service is different from the account specified for other services running in the same process”

    7023 error: “Windows Time Service terminated with the following error: An attempt was made to logon, but the network logon service was not started”

    We ran into this last night and after finding your article we let the server sit online overnight. Reboot this morning with no change.

    Is this a different issue? Any thoughts would be appreciated.

  18. Hi Joe,

    I’m sorry but I think the issues that you are experiencing are separate from the issues described in this blog post.

    In this post, services that fail to start are the actual Exchange services, not the other unrelated Windows Services.

    In your case, it sounds like something may be occurring at the OS level, if there’s issues with service accounts, as well as the Windows Time Services.

    I can’t comment further without knowing more about the environment, but I’m not so sure it’s Exchange’s fault for the issues, it sounds like something else is going on.

    Stephen

  19. Thank you for the quick reply and sorry I was not more clear. All exchange services fail to start as well as those noted in my error messages.

    It appears to start with the same network service – Windows Firewall services is also stopped. Manually starting Windows Firewall, NetLogOn, WinTime, and then exchange services appears to bring me back.

    We have not moved mailflow over from the old servers to the new ones yet.

    Thanks,
    Joe

  20. Hi Joe,

    I’m still not convinced it’s related to this error. Unrelated Windows services failing to start could cause Exchange services not to start.

    In my case, I believe it’s a bug with Exchange, whereas with your case, something on the OS level is causing non-exchange services not to start, which is then resulting in Exchange services not starting.

    I’d be worried about the operating system health.

    If you can eventually get all the services to start, and everything to function, if it was me, I would decommission the new servers (properly), and then start the new server deployment from scratch.

    Make sure that when you commission new servers, they are fully up to date before installing Exchange. Also make sure you are using the latest CU.

    Stephen

  21. Thank you for your reply.

    Thanks,
    Joe

  22. Quick follow-up.

    After some considerable troubleshooting we were able to determine, with some reasonable level of certainty, that the cause was a Windows update, most likely KB4507459.
    Though not 100% sure.

    This is not exchange related at all – as we have seen this on other 2016 servers.

    What appears to be happening: a CU patch is attempted, fails, and is rolled back; after the following reboot, the “Local Service” account has been removed from “Log on as a Service”. As you can imagine this causes no end of trouble.
    Windows Firewall, netlogon, wintime, and many other services fail to start on boot.
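
One way to check for this condition is to export the user rights with `secedit /export /cfg rights.inf /areas USER_RIGHTS` and look at the `SeServiceLogonRight` line. Here is a small Python sketch that parses such an export, assuming the file has already been read into a string (the export is typically UTF-16 encoded). The function names and both sample strings are hypothetical; `S-1-5-19` is the well-known SID for Local Service.

```python
LOCAL_SERVICE_SID = "S-1-5-19"  # well-known SID for NT AUTHORITY\LOCAL SERVICE


def service_logon_principals(inf_text):
    """Return the principals listed for SeServiceLogonRight in a secedit export."""
    for line in inf_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == "SeServiceLogonRight":
            # Entries are comma separated; SIDs are prefixed with '*'.
            return [p.strip().lstrip("*") for p in value.split(",") if p.strip()]
    return []


def local_service_can_log_on_as_service(inf_text):
    """True if the Local Service SID appears in 'Log on as a service'."""
    return LOCAL_SERVICE_SID in service_logon_principals(inf_text)


# Hypothetical export fragments, for illustration only.
SAMPLE_OK = "[Privilege Rights]\nSeServiceLogonRight = *S-1-5-19,*S-1-5-80-0\n"
SAMPLE_MISSING = "[Privilege Rights]\nSeServiceLogonRight = *S-1-5-80-0\n"

if __name__ == "__main__":
    print(local_service_can_log_on_as_service(SAMPLE_OK))
    print(local_service_can_log_on_as_service(SAMPLE_MISSING))
```

The sketch only matches SIDs, so an export that lists the account by name instead would need an extra comparison.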

    Thank you for taking the time earlier this month to reply to my questions. It was very helpful in narrowing down the possible issues we were seeing.

    Regards,
    Joe
