May 16, 2019
 

There may be a situation where you wish to completely reinstall WSUS from scratch. This can occur for a number of reasons, but the most common are database corruption and performance issues caused by a WSUS database that hasn’t received regular maintenance.

When regular maintenance hasn’t occurred on a WSUS database and an admin finally performs it, it can take days or even weeks to re-index the database, clean it up, and run the cleanup wizards.

Also, due to timeouts on IIS, the cleanup wizard may fail, which could ultimately cause database corruption.

Administrators often want or choose to blast away their WSUS install, and completely start from scratch. I’ve done this numerous times in my own environment as well as numerous customer environments.

In this guide, we are going to assume that you’re running WSUS on a Windows Server that is dedicated to WSUS and is using the WID (Windows Internal Database) which is essentially a built-in version of SQL Express.

PLEASE NOTE: If you are using Microsoft SQL, these instructions will not apply to you and will require modification. Only use these instructions if the above applies to you.

What’s involved

WSUS (Windows Server Update Services) relies on numerous Windows roles and features to function. As part of the instructions we’ll need to completely clear out:

  • WSUS Role, Configuration, and Folders/Files
  • IIS Role, Configuration, and Folders/Files
  • WID Feature, Configuration, and Database Files

Since we are completely removing IIS (Role, Configuration, and Folders/Files), only proceed if the server is dedicated to WSUS. If you are using IIS for anything else, this will completely clear the configuration and files.

Let’s get to it!

Instructions

  1. Open “Server Manager” either on the host, or remotely and connect to the host you’d like to reinstall on.
  2. Open “Remove Roles and Features” wizard.
  3. Click “Next”, and select the Server, and click “Next” again.
  4. On the “Remove server roles” screen, under “Roles”, we want to de-select the following: “Web Server (IIS)” and “Windows Server Update Services”.
  5. Click “Next”
  6. On the “Remove features” screen, under “Features”, we want to de-select the following: “Windows Internal Database” and “Windows Process Activation Service”.
  7. Click “Next” and follow the wizard to completion and remove the roles and features.
  8. Restart the Server.
  9. Open an administrative command prompt on the server, and run the command “powershell” or open powershell directly.
  10. Run the following command in powershell to remove any bits and pieces:
    Remove-WindowsFeature -Name UpdateServices,UpdateServices-DB,UpdateServices-RSAT,UpdateServices-API
  11. Restart the Server.
  12. We now must delete the WSUS folders and files. Delete the following folders:
    C:\WSUS
    C:\Program Files\Update Services

    Note: If you stored the WSUS content directory somewhere else, delete it as well.
  13. We now must delete the IIS folders and files (and configuration, including the WsusPool application pool, bindings, etc.). Delete the following folders:
    C:\inetpub
    C:\Windows\System32\inetsrv

    Note: You may have issues deleting the “inetsrv” directory. If this occurs, simply rename it to “inetsrv.bad”.
  14. We now must delete the WID (Windows Internal Database) folders and files (including the WSUS SQL Express database). Delete the following folder:
    C:\Windows\WID
  15. When we removed the IIS folders and files, we also deleted a needed system file. Run the following command to restore the file:
    sfc /scannow
  16. Restart the Server.
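
Before reinstalling, it can help to confirm that none of the folders from steps 12 to 14 survived. Here is a minimal Python sketch, assuming the default locations listed above (add your own content directory if you moved it). The name `leftover_paths` is just illustrative:

```python
from pathlib import Path

# Default folders removed in steps 12-14; adjust if your content directory lives elsewhere.
DEFAULT_FOLDERS = [
    r"C:\WSUS",
    r"C:\Program Files\Update Services",
    r"C:\inetpub",
    r"C:\Windows\System32\inetsrv",
    r"C:\Windows\WID",
]


def leftover_paths(folders=DEFAULT_FOLDERS):
    """Return the folders from the list that still exist on disk."""
    return [folder for folder in folders if Path(folder).exists()]


if __name__ == "__main__":
    remaining = leftover_paths()
    if remaining:
        print("Still present:")
        for folder in remaining:
            print("  " + folder)
    else:
        print("All WSUS, IIS, and WID folders are gone.")
```

It is probably most useful right after step 14. The `sfc /scannow` run in step 15 may put files back, so a hit after that point is not necessarily a problem.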

WSUS, IIS, and WID have at this point been completely removed. We will now proceed to install, apply a memory fix, and configure WSUS.

For instructions on installing WSUS on Server Core, please click here: https://www.stephenwagner.com/2019/05/15/guide-using-installing-wsus-windows-server-core-2019/

  1. Open “powershell” (by typing powershell) and Install the WSUS Role with the following command:
    Install-WindowsFeature UpdateServices -Restart
  2. Run the post installation task command to configure WSUS (the leading “&” is PowerShell’s call operator, which is needed to run a quoted path):
    & "C:\Program Files\Update Services\Tools\wsusutil.exe" postinstall CONTENT_DIR=C:\WSUS
  3. AT THIS POINT DO NOT CONTINUE CONFIGURING WSUS AS YOU MUST APPLY A MEMORY FIX TO IIS.
  4. Apply the “Private Memory Limit (KB)” fix as provided here: https://www.stephenwagner.com/2019/05/14/wsus-iis-memory-issue-error-connection-error/
  5. Restart the Server.
  6. Open the WSUS MMC on the server or remotely from a workstation on the network and connect it to the WSUS instance on your server.
  7. Run through the wizard as you would normally and perform a synchronization.
  8. WSUS has been re-installed.

And that’s it. You’ve completely reinstalled WSUS from scratch on your Windows Server.

Oct 08, 2018
 
If you are running Microsoft Windows in a domain environment with WSUS configured, you may notice that you’re not able to install some FODs (Features on Demand), or use the “Turn Windows features on or off”. This will stop you from installing things like the RSAT tools, .NET Framework, Language Speech packs, etc…

You may see “failure to download files”, “cannot download”, or errors like “0x800F0954” when running DISM to install packages.

To resolve this, you need to modify your domain’s group policy settings to allow your workstations to query Windows Update servers for additional content. The workstations will still use your WSUS server for approvals, downloads, and updates, however in the event content is not found, it will query Windows Update.

Enable download of “Optional features” directly from Windows Update

  1. Open the group policy editor on your domain
  2. Create a new GPO, or modify an existing one. Make sure it applies to the computers you’d like
  3. Navigate to “Computer Configuration”, “Policies”, “Administrative Templates”, and then “System”.
  4. Double click or open “Specify settings for optional component installation and component repair”
  5. Make sure “Never attempt to download payload from Windows Update” is NOT checked
  6. Make sure “Download repair content and optional features directly from Windows Update instead of Windows Server Update Services (WSUS)” IS checked.
  7. Wait for your GPO to update, or run “gpupdate /force” on the workstations.

Please see an example of the configuration below:

Download repair content and optional features directly from Windows Update instead of Windows Server Update Services (WSUS)

You should now be able to download/install RSAT, .NET, Speech language packs, and more!

Aug 21, 2018
 
You may notice on Windows Server 2012 R2, when applying Windows Updates that one or more .NET updates may fail with error code 0x80092004. This issue may affect all, or only some of your Windows Server 2012 R2 servers.

When troubleshooting this, you may notice numerous specific errors such as “Couldn’t find the hash of component: NetFx4-PenIMC”, or errors with a CAB file. These errors will probably come from update KB4054566 and KB4340558.

The Fix

To resolve this, we are going to download the MSU files for the updates from the Microsoft Update Catalog, fully uninstall the problematic updates, and then re-install them.

Please Note: Always make sure you have a full backup before making modifications to your servers.

Please follow the instructions below:

  1. Create a folder called “updatefix” on the root of your C drive on the server
  2. Navigate to the Windows Update catalog at: https://www.catalog.update.microsoft.com/
  3. Search for KB4054566 and download the file for “Windows Server 2012 R2”, save it to the folder you created above called “updatefix” on the root of your C Drive. There should be one file in the download.
  4. Search for KB4340558 and download the files for “Windows Server 2012 R2”, saving them to the same “updatefix” folder on the root of your C Drive. There should be a total of 3 files in this download.
  5. Create a folder in the “updatefix” folder called “expanded”.
  6. Open an elevated command prompt, and run the following commands to extract the CAB files from the updates:
    expand -f:* "C:\updatefix\windows8.1-kb4338415-x64_cc34d1c48e0cc2a92f3c340ad9a0c927eb3ec2d1.msu" C:\updatefix\expanded\
    expand -f:* "C:\updatefix\windows8.1-kb4338419-x64_4d257a38e38b6b8e3d9e4763dba2ae7506b2754d.msu" C:\updatefix\expanded\
    expand -f:* "C:\updatefix\windows8.1-kb4338424-x64_e3d28f90c6b9dd7e80217b6fb0869e7b6dfe6738.msu" C:\updatefix\expanded\
    expand -f:* "C:\updatefix\windows8.1-kb4054566-x64_e780e6efac612bd0fcaf9cccfe15d6d05c9cc419.msu" C:\updatefix\expanded\
  7. Now let’s uninstall the problematic updates. Some of these commands may fail depending on which updates you have successfully installed. Run the following commands individually to remove the updates:
    dism /online /remove-package /packagepath:C:\updatefix\expanded\Windows8.1-KB4338424-x64.cab
    dism /online /remove-package /packagepath:C:\updatefix\expanded\Windows8.1-KB4338419-x64.cab
    dism /online /remove-package /packagepath:C:\updatefix\expanded\Windows8.1-KB4338415-x64.cab
    dism /online /remove-package /packagepath:C:\updatefix\expanded\Windows8.1-KB4054566-x64.cab
  8. Reboot your server.
  9. Now let’s cleanly install the updates. All of these commands should be successful when running. Run the following commands individually to install the updates:
    dism /online /add-package /packagepath:C:\updatefix\expanded\Windows8.1-KB4054566-x64.cab
    dism /online /add-package /packagepath:C:\updatefix\expanded\Windows8.1-KB4338415-x64.cab
    dism /online /add-package /packagepath:C:\updatefix\expanded\Windows8.1-KB4338419-x64.cab
    dism /online /add-package /packagepath:C:\updatefix\expanded\Windows8.1-KB4338424-x64.cab
  10. Reboot your server.
  11. You have now fixed the issue and all updates should now be cleanly installing via Windows Updates!
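
If you need to repeat this on several servers, the three command sets above can be generated from the downloaded MSU file names. This is only a sketch in Python. It assumes the CAB inside each MSU follows the `Windows8.1-KB<number>-x64.cab` naming seen in the commands above, and `cab_name` and `build_commands` are illustrative names:

```python
import re

# Matches Update Catalog file names such as
# windows8.1-kb4338415-x64_<hash>.msu
MSU_PATTERN = re.compile(r"^windows8\.1-kb(\d+)-(x64|x86)_[0-9a-f]+\.msu$", re.IGNORECASE)


def cab_name(msu_name):
    """Derive the CAB name that 'expand' is assumed to extract from an MSU file."""
    match = MSU_PATTERN.match(msu_name)
    if match is None:
        raise ValueError("unexpected MSU file name: " + msu_name)
    kb, arch = match.groups()
    return "Windows8.1-KB{}-{}.cab".format(kb, arch.lower())


def build_commands(msu_names, work_dir=r"C:\updatefix"):
    """Return (expand, remove, add) command lists for the given MSU files."""
    expanded = work_dir + "\\expanded"
    expand = ['expand -f:* "{}\\{}" {}\\'.format(work_dir, name, expanded) for name in msu_names]
    cabs = [cab_name(name) for name in msu_names]
    remove = ["dism /online /remove-package /packagepath:{}\\{}".format(expanded, cab) for cab in cabs]
    add = ["dism /online /add-package /packagepath:{}\\{}".format(expanded, cab) for cab in cabs]
    return expand, remove, add
```

The commands come back in the order the file names are given, so pass them in the order you want to run them. The sketch only builds the strings; running them is still a manual step from an elevated prompt.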

Leave a comment and let me know if this worked for you!

Apr 29, 2018
 
When running Veeam Backup and Replication, a Microsoft Windows Server Domain Controller may boot in to Safe Mode and Directory Services Restore Mode.

About a week ago, I loaded up Veeam Backup and Replication in to my test environment. It’s a fantastic product, and it’s working great, however today I had a little bit of an issue with a DC running Windows Server 2016 Server Core.

I woke up to a notification that the backup failed due to a VSS snapshot issue. Now I know that VSS can be a little picky at times, so I decided to restart the guest VM. Upon restarting, she came back up, was pingable, and appeared to be running fine, however the backup kept failing with new errors, the event log was looking very strange on the server, and numerous services that were set to automatic were not starting up.

This specific server was installed using Server Core mode, so it has no GUI and is administered via command prompt over RDP, or via remote management utilities. Once I RDP’d in to the server, I noticed the “Safe Mode” branding in each corner of the display, which was very odd. I restarted the server again, this time trying to start Active Directory Services manually via services.msc.

This presented:

Event ID: 16652
Source: Directory-Services-SAM
General Description: The domain controller is booting to directory services restore mode.

 

This surprised me (and scared me for that matter). I immediately started searching the internet to find out what would have caused this…

To my relief, I read numerous sites that advise that when an active backup is running on a guest VM which is a domain controller, Veeam activates directory services restore mode temporarily, so in the event of a restore, it will boot to this mode automatically. In my case, the switch was not changed back during the backup failure.

Running the following command in a command prompt confirms whether the safeboot switch is set to dsrepair:

bcdedit /v

To disable directory services restore mode, type the following in a command prompt:

bcdedit /deletevalue safeboot

Restart the server and the issue should be resolved!
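
If you want to check several domain controllers for a leftover safeboot switch, the `bcdedit` output can be scanned for it. This is a minimal Python sketch. The sample text is an illustrative excerpt rather than real captured output, and it assumes the entry appears as a `safeboot` line followed by its value:

```python
def safeboot_value(bcdedit_output):
    """Return the safeboot value from bcdedit output, or None if it is not set."""
    for line in bcdedit_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].lower() == "safeboot":
            return parts[1]
    return None


# Illustrative excerpt only; real output contains many more entries.
sample = """\
Windows Boot Loader
-------------------
identifier              {current}
safeboot                DsRepair
"""

print(safeboot_value(sample))  # prints DsRepair
```

On the server itself, you could feed it the text captured from `bcdedit /v` in an elevated prompt. A result of `None` means the switch is not set.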

Jan 21, 2018
 
This weekend I configured Azure AD Connect for pass-through authentication for my on-premise Active Directory domain. This was a first for me and extremely easy to do, however there were a few issues with my firewall and its SSL content filtering and scanning rules, which were blocking the connection. I figured I’d create a post providing some information you’ll need to get this set up and running quickly.

In my environment, I have a Sophos UTM firewall which provides firewall services (port blocking), as well as HTTP and HTTPs scanning and filtering (web filtering).

The Problem

After running the Azure AD Connect wizard, all went well, however there was an error at the end of the wizard notifying that synchronization was configured but was not occurring due to the firewall. It provided a link for more information (which didn’t actually contain the information needed).

While this issue is occurring, you’ll notice:

-Azure AD Connect in the Azure portal is reporting that pass-through authentication is Enabled, however after expanding the item, the Authentication Agent reports a status of Inactive on your internal domain controllers.

-In the Event log, under “Applications and Services Logs”, then “Microsoft”, then “AzureADConnect”, then “AuthenticationAgent”, and finally “Admin”, you’ll see the following error event:

Event ID: 12019

Source: Microsoft Azure AD Connect Authentication Agent (Microsoft-AzureADConnect-AuthenticationAgent)

Event:
The Connector stopped working because the client certificate is not valid. Uninstall the Connector and install it again. Request ID: '{WAJAJAJA-OHYA-YAAA-YAAAA-WAKAKAKAKAKAKAK}'

The event above is due to the SSL and HTTPS content filtering.

-Azure Pass-Through authentication won’t work

The Fix

After doing some research, I came up with the following list of ports and hosts that you’ll need to allow through unfiltered.

Ports

The following ports are used by Azure AD Connect:

Port 443 – SSL

Port 5671 – TCP (From the host running the Azure AD Connect to Internet)

Hosts (DNS Hosts)

Here’s the host list:

*blob.core.windows.net
*servicebus.windows.net
*adhybridhealth.azure.com
*management.azure.com
*policykeyservice.dc.ad.msft.net
*login.windows.net
*login.microsoftonline.com
*secure.aadcdn.microsoftonline-p.com
*microsoftonline.com
*windows.net
*msappproxy.net
*mscrl.microsoft.com
*crl.microsoft.com
*ocsp.msocsp.com
*www.microsoft.com

If you’re running a Sophos UTM like I am, you’ll need to create an HTTP(s) scanning exception and then import this list in to a rule “Matching these URLs”:

^https?://([A-Za-z0-9.-]*\.)?blob.core.windows.net/
^https?://([A-Za-z0-9.-]*\.)?servicebus.windows.net/
^https?://([A-Za-z0-9.-]*\.)?adhybridhealth.azure.com/
^https?://([A-Za-z0-9.-]*\.)?management.azure.com/
^https?://([A-Za-z0-9.-]*\.)?policykeyservice.dc.ad.msft.net/
^https?://([A-Za-z0-9.-]*\.)?login.windows.net/
^https?://([A-Za-z0-9.-]*\.)?login.microsoftonline.com/
^https?://([A-Za-z0-9.-]*\.)?secure.aadcdn.microsoftonline-p.com/
^https?://([A-Za-z0-9.-]*\.)?microsoftonline.com/
^https?://([A-Za-z0-9.-]*\.)?windows.net/
^https?://([A-Za-z0-9.-]*\.)?msappproxy.net/
^https?://([A-Za-z0-9.-]*\.)?mscrl.microsoft.com/
^https?://([A-Za-z0-9.-]*\.)?crl.microsoft.com/
^https?://([A-Za-z0-9.-]*\.)?ocsp.msocsp.com/
^https?://([A-Za-z0-9.-]*\.)?www.microsoft.com/
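
The exception patterns above all share the same prefix, so they can be generated from the host list instead of being typed by hand. Here is a small Python sketch that reproduces the list exactly as shown. `utm_exception` is an illustrative name, and the dots in the host names are left unescaped to match the original patterns:

```python
HOSTS = [
    "blob.core.windows.net",
    "servicebus.windows.net",
    "adhybridhealth.azure.com",
    "management.azure.com",
    "policykeyservice.dc.ad.msft.net",
    "login.windows.net",
    "login.microsoftonline.com",
    "secure.aadcdn.microsoftonline-p.com",
    "microsoftonline.com",
    "windows.net",
    "msappproxy.net",
    "mscrl.microsoft.com",
    "crl.microsoft.com",
    "ocsp.msocsp.com",
    "www.microsoft.com",
]

# Optional "anything." subdomain prefix, as used in every pattern above.
PREFIX = r"^https?://([A-Za-z0-9.-]*\.)?"


def utm_exception(host):
    """Build a 'Matching these URLs' pattern in the same form as the list above."""
    return PREFIX + host + "/"


for host in HOSTS:
    print(utm_exception(host))
```

Wrapping `host` in `re.escape` would make the patterns stricter, but that is an untested variation and not what the list above uses.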

The exception I created skips:

  • Authentication
  • Caching
  • Antivirus
  • Extension Blocking
  • MIME type blocking
  • URL Filter
  • Content Removal
  • SSL Scanning
  • Certificate trust check
  • Certificate date check

After creating the exceptions, I restarted the “Microsoft Azure AD Connect Authentication Agent”. The errors stopped and Azure AD Pass-through started to function correctly! Also the status of the Authentication Agent now reports a status of active.

Oct 18, 2017
 

Well, it’s October 18th 2017 and the Fall Creators update (Feature update to Windows 10, version 1709) is now available for download. In my particular environment, I use WSUS to deploy and manage updates.

Update: It’s now May 2018, and this article also applies to Windows 10 April 2018 update version 1803 as well!

Update: It’s now October 2018, and this article also applies to Windows 10 October 2018 update version 1809 as well!

Update: It’s now May 2019, and this article also applies to Windows 10 May 2019 update version 1903 as well!

I went ahead earlier today and approved the updates for deployment, however I noticed an issue on multiple Windows 10 machines, where the Windows Update client would get stuck on Downloading updates 0% status.

I checked a bunch of things, but noticed that it simply couldn’t download the updates from my WSUS server. Further investigation found that the feature updates are packaged in .esd files and IIS may not be able to serve these properly without a minor modification. I remember applying this fix in the past, however I’m assuming it was removed by a prior update on my Windows Server 2012 R2 server.

If you are experiencing this issue, here’s the fix:

  1. On your server running WSUS and IIS, open up the IIS manager.
  2. Expand Sites, and select “WSUS Administration”
  3. On the right side, under IIS, select “MIME Types”
  4. Make sure there is not already a MIME type for .esd. If there is, you’re having a different issue; if not, continue with the instructions.
  5. Click on “Add” on the right Actions pane.
  6. File name extension will be “.esd” (without quotations), and MIME type will be “application/octet-stream” (without quotations).
  7. Reset IIS or restart the WSUS/IIS server.

You’ll notice the clients will now update without a problem! Happy Updating!
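
Since the post suspects the mapping was removed by an earlier update, it may be worth checking for it after patching the server. IIS keeps MIME mappings as `mimeMap` elements in its XML configuration. The Python sketch below looks one up in a config document. Which file actually holds the WSUS site's mapping (the site's web.config or applicationHost.config) depends on your setup and is left as an assumption, and `mime_type_for` is an illustrative name:

```python
import xml.etree.ElementTree as ET


def mime_type_for(config_xml, extension=".esd"):
    """Return the mimeType mapped to the extension in an IIS config, or None."""
    root = ET.fromstring(config_xml)
    for node in root.iter("mimeMap"):
        if node.get("fileExtension", "").lower() == extension.lower():
            return node.get("mimeType")
    return None


# Illustrative config fragment showing the mapping added in step 6.
sample = """\
<configuration>
  <system.webServer>
    <staticContent>
      <mimeMap fileExtension=".esd" mimeType="application/octet-stream" />
    </staticContent>
  </system.webServer>
</configuration>
"""

print(mime_type_for(sample))  # prints application/octet-stream
```

A result of `None` suggests the mapping is missing from that file and the steps above need to be applied again.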

Feb 18, 2017
 

This is an issue that affects quite a few people, and numerous forum threads can be found on the internet from those searching for a solution.

This can occur either when taking manual snapshots of virtual machines with “Quiesce guest filesystem” selected, or when using snapshot-based backup applications such as vSphere Data Protection (vSphere vDP).

 

For the last couple days, one of my test VMs (Windows Server 2012 R2) has been experiencing this issue and the snapshot has been failing with the following errors:

An error occurred while taking a snapshot: Failed to quiesce the virtual machine.
An error occurred while saving the snapshot: Failed to quiesce the virtual machine.

As always with standard troubleshooting, I restarted the VM, checked for VSS provider errors, and ensured that the Windows Services involved with snapshots were in their correct state and configuration. Unfortunately this had no effect, and everything was configured the way it should be.

I also tried to re-install VMware Tools, which had no effect.

PLEASE NOTE: If you experience this issue, you should confirm the services are in their correct state and configuration, as outlined in VMware KB: 1007696. Source: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007696

 

The Surprise Fix:

In the days leading up to the failure when things were running properly, I did notice that the quiesced snapshots for that VM were taking a long time to process, but were still functioning correctly before the failure.

This morning during troubleshooting, I went ahead and deleted all the Windows Volume Shadow Copies which are internal and inside of the Virtual Machine itself. These are the shadow copies that the Windows guest operating system takes on its own filesystem (completely unrelated to VMware).

To my surprise after doing this, not only was I able to create a quiesced snapshot, but the snapshot processed almost instantly (200x faster than previously when it was functioning).

I’m assuming the existing shadow copies were causing a high load for the VMware snapshot to process, and a timeout was being hit on snapshot creation, which caused the issue. While Windows volume shadow copies are unrelated to VMware snapshots, they both utilize the same VSS (Volume Shadow Copy Service) system inside of Windows to function and process. One must also keep in mind that the Windows volume shadow copies will of course be part of a VMware snapshot.

PLEASE NOTE: Deleting your Windows Volume Shadow copies will delete your Windows volume snapshots inside of the virtual machine. You will lose the ability to restore files and folders from previous volume shadow copy snapshots. Be aware of what this means and what you are doing before attempting this fix.

Sep 23, 2016
 

Well, one of the servers I monitor and maintain in a remote oil town recently started throwing a Windows event log warning:

Event ID: 129

Source: HpCISSs2

Description: Reset to device, \Device\RaidPort0, was issued.

 

The server is an HP ML350p Gen8 (Windows Server 2008 R2) running latest firmware and management software. It has 2 RAID Arrays (RAID1, and RAID5), and a total of 6 disks.

Researching this error, I read that most people had this occur when running the latest HP WBEM providers, as well as anti-virus software. In our case, I actually tried to downgrade to an older version, but noticed the warning still occurs. While we do have anti-virus, it’s not actively scanning (only weekly scheduled scans).

In the process of troubleshooting, I noticed that under the HP Systems Management Homepage, one of the drives in the RAID1 array, had the following stats:

Hard Read Errors:  150
Recovery Read Errors:  7
Total Seeks:  0
Seek Errors:  0

I found these numbers to be very high in my experience. None of the other drives had anything close to this (in 4 years of running, only one other disk had a read error, and only a single one), whereas this disk had tons. For some reason the drive is still reporting as operational, when I’d expect it to be marked as a predicted failure, or failed.

While all the online documentation pointed towards locks on the array by software, from my own experience I think the array was actually waiting on a read operation, and this single disk was causing a threshold to be hit in the driver, which caused a retry to recover the read operation.

I called up HPE support and mentioned I’d like to have the drive replaced. The support engineer consulted her senior engineer and reviewed the evidence I presented (along with ADU reports, and Active Monitoring health reports), and the senior engineer concurred that the drive should be replaced.

Replacing the drive resolved the issue. I’m also noticing a performance increase on the array as well.

Make sure to always check the stats on the individual components of your RAID arrays, even if everything appears to be operating soundly.

Sep 10, 2016
 

When initiating manual backups or occasionally when automatic/scheduled backups run, a user may notice that Windows Server Backup may appear to “hang” when the status is reporting: “Preparing media to store backups…”.

In some rare cases, it may actually be in a hang state, however most of the time, it’s actually consolidating and/or checking previous backups on the destination media.

To Confirm this:

Open the Task Manager as Administrator, then click on the “Performance” tab, click on “Open Resource Monitor”. Flip over to the “Disk” tab, expand “Disk Activity”, and sort by name. You should see the read requests on the destination media, you’ll also notice that it is slowly progressing consecutively through each backup set (increments of 1, accessing multiple at a time).

This confirms that the Windows Server Backup services are functioning and it is in fact running. In one case, I had 723 previous backups, and it took around 50 minutes to count from 1 to 723, and then the backup finally proceeded.
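
To get a rough idea of how long the “Preparing media” phase might take on your own system, you can extrapolate from the figures above. This is only back-of-the-envelope arithmetic in Python. It assumes the time per backup set stays roughly constant, which may not hold on different hardware, and the 300-set example is hypothetical:

```python
def seconds_per_set(total_minutes, backup_sets):
    """Average time spent on each previous backup set."""
    return total_minutes * 60 / backup_sets


def estimated_minutes(backup_sets, per_set_seconds):
    """Rough wait before the backup proceeds, assuming the rate stays constant."""
    return backup_sets * per_set_seconds / 60


rate = seconds_per_set(50, 723)              # figures from the case above
print(round(rate, 1))                        # prints 4.1
print(round(estimated_minutes(300, rate)))   # prints 21
```

In other words, the case above works out to roughly four seconds per backup set, so a destination holding a few hundred sets can easily look hung for many minutes.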

I have also seen this occur when a previous backup failed or was cancelled. This occurs with Windows Server Backup on Windows Server 2008, Windows Server 2008 R2, and Windows Server 2012 R2.

Mar 26, 2016
 

An issue that’s been making me tear my hair out for some time… And a fix for those of you experiencing the same.

 

Equipment:

HP Proliant DL360 G6 Server (with a P800 Controller) running Server 2012 R2 and Backup Exec 2014

HP MSL-2024 Tape Library with a single HP SAS LTO-6 Tape Drive

 

Symptoms:

-After a clean restart, a backup job completes successfully. Subsequent jobs fail until the server or services are restarted.

-While the initial backup does complete, errors/warnings can be seen in the adamm.log and the Event Viewer even when successful.

-Subsequent backups failing report that the device is offline. The Windows Device Manager reports everything is fine.

-Windows Server itself does not report any device errors whatsoever.

 

Observations:

[5648] 03/05/16 07:50:46 Adamm Mover Error: DeviceIo: 03:07:00:00 – Device error 1167 on “\\.\Tape0”, SCSI cmd 0a, 1 total errors
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 00, new handle 214, error 0
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 00:00:00:00 – Retry Logic: Retry logic was engaged on device: HP       Ultrium 6-SCSI
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 00:00:00:00 – Retry Logic: Original settings restored on device: HP       Ultrium 6-SCSI

Event ID 58053
Backup Exec Alert: Storage Error
(Server: “WhatsMySRVRname”) The device state has been set to offline because the device attached to the computer is not responding.

Ensure that the drive hardware is turned on and is properly cabled. After you correct the problem, right-click the device, and then click Offline to clear the check mark and bring the device online.

[09968] 03/05/16 01:42:08.426 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 17, new handle ffffffff, error 32
[09968] 03/05/16 01:42:08.426 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 1a, new handle ffffffff, error 32
[09968] 03/05/16 01:22:07.867 PvlSession::DismountMedia( 0, 0, 0 )
Job = {JOBHEXNUMBERZZZZZZ} “ServerBackup-Full”
Changer    = {CHANGERZZZZ} “Robotic library 0001”
Drive      = {MYBACKUPDRVXZZZZZ} “Tape drive 0001”
Slot       = 13
Media      = {MEDIAZIDZZZZ} “BARCODEID”
ERROR = 0xE0008114 (E_PVL_CHANGER_NOT_AVAILABLE)

[19812] 03/05/16 01:42:12.613 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 1a, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.129 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 00, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.645 PvlDrive::DisableAccess() – ReserveDevice failed, offline device
Drive = 1007 “Tape drive 0001”
ERROR = 0x0000001F (ERROR_GEN_FAILURE)

[19812] 03/05/16 01:42:13.645 PvlDrive::UpdateOnlineState()
Drive = 1007 “Tape drive 0001”
ERROR = The device is offline!

Event ID 1000
Faulting application name: wmiprvse.exe, version: 6.3.9600.17415, time stamp: 0x54505614
Faulting module name: MSVCR110.dll, version: 11.0.51106.1, time stamp: 0x5098826e

 

Research:

I spent a ton of time researching this… Old support threads were pointing me in all different directions, most of the threads being old, mentioning drivers, etc… Initially I thought it was hardware related, until through testing I got the gut feeling it was software related. There were absolutely no articles covering Backup Exec 2014 running on Windows Server 2012 R2 with this specific issue.

Tried a bunch of stuff, including swapping the P800 controller, for another HP P212. While it didn’t fix the issue, I gained some backup speed! 🙂

Updating the HP software (agents, providers, HP SMH, WBEM) had no effect.

Disabling the HP providers, and disabling the HP Monitoring, Insight, Management services had no effect whatsoever. Tried different firmware versions, also tried different drivers for the Library and Tape drive, no effect. Tried factory resets, no effect. Tried Library and Tape tools, all tests passed.

Disabled other monitoring software we have in place to monitor software/hardware on clients’ servers, no effect.

 

Resolution:

-Uninstalled the HP WBEM Providers and Agents.

-Added a “BusyRetryCount” 32-bit DWORD value of 250 (decimal) to the “Storport” key under “Device Parameters” in all the Tape Library and Tape Drive Registry entries. Example:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Changer&Ven_HP&Prod_MSL_G3_Series\5&334e8424&0&000500\Device Parameters\Storport]
“BusyRetryCount”=dword:000000fa

This needs to be added to ALL of the entries for the Tape Library and Tape Drives (under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\), and ONLY to those entries. You will probably have to create the “Storport” key under each device’s “Device Parameters” key.
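
With several tape devices, writing each entry by hand gets tedious. The Python sketch below builds the text of a .reg file for a list of device paths, in the same format as the example above. The device path used here is the one from the example; yours will differ and have to be looked up in the registry. `busy_retry_reg` is an illustrative name:

```python
ENUM_ROOT = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI"


def busy_retry_reg(device_paths, retry_count=250):
    """Build .reg text that sets BusyRetryCount under each device's Storport key."""
    # device_paths are relative to Enum\SCSI, for example:
    #   Changer&Ven_HP&Prod_MSL_G3_Series\5&334e8424&0&000500
    lines = ["Windows Registry Editor Version 5.00", ""]
    for path in device_paths:
        lines.append("[{}\\{}\\Device Parameters\\Storport]".format(ENUM_ROOT, path))
        # 250 decimal is written as an 8-digit hex DWORD: 000000fa
        lines.append('"BusyRetryCount"=dword:{:08x}'.format(retry_count))
        lines.append("")
    return "\n".join(lines)


print(busy_retry_reg([r"Changer&Ven_HP&Prod_MSL_G3_Series\5&334e8424&0&000500"]))
```

The sketch only produces text. Review the output against your own device entries before saving it as a .reg file and importing it.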

After doing this, the backups run consecutively with absolutely no issues. The event log is CLEAN, and Adamm.log is clean, and the “Faulting application name: wmiprvse.exe” errors in the event log no longer occur.

Fixed!

 

Additional Notes:

-Both “fixes” were applied at the same time. I believe the WBEM providers/agent caused the Event ID 1000 errors on WMIPRVSE.exe. While the registry keys alone may have possibly resolved the backup issues, I believe there still would have been an underlying issue with WMIPRVSE.exe faulting that could have other consequences.

-I do not believe the original installation of the HP WBEM providers caused the issue, I have a feeling a subsequent Windows Update, Backup Exec update, other module update, or an update to the HP software may have caused the issue to occur at a later time than original install. I do remember we didn’t have an issue with the backups for months, until one day it started occurring.

-I will be re-installing the HP providers and agents at a later time. I will be uninstalling all of them, and re-installing from scratch the latest versions. I will post an update with my results.

-There is a chance the registry key is needed for the HP software to co-exist with Backup Exec backups for this configuration.

-There is a chance that the registry key isn’t needed if you never load the HP software.