Mar 26 2016
 

An issue that’s been making me tear my hair out for some time… and a fix for those of you experiencing the same.

 

Equipment:

HP Proliant DL360 G6 Server (with a P800 Controller) running Server 2012 R2 and Backup Exec 2014

HP MSL-2024 Tape Library with a single HP SAS LTO-6 Tape Drive

 

Symptoms:

-After a clean restart, a backup job completes successfully. Subsequent jobs fail until the server or the services are restarted.

-While the initial backup does complete, errors and warnings still appear in adamm.log and the Event Viewer, even when the job is successful.

-Subsequent failing backups report that the device is offline, yet the Windows Device Manager reports everything is fine.

-Windows Server itself does not report any device errors whatsoever.

 

Observations:

[5648] 03/05/16 07:50:46 Adamm Mover Error: DeviceIo: 03:07:00:00 – Device error 1167 on “\\.\Tape0”, SCSI cmd 0a, 1 total errors
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 00, new handle 214, error 0
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 00:00:00:00 – Retry Logic: Retry logic was engaged on device: HP       Ultrium 6-SCSI
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 00:00:00:00 – Retry Logic: Original settings restored on device: HP       Ultrium 6-SCSI

Event ID 58053
Backup Exec Alert: Storage Error
(Server: “WhatsMySRVRname”) The device state has been set to offline because the device attached to the computer is not responding.

Ensure that the drive hardware is turned on and is properly cabled. After you correct the problem, right-click the device, and then click Offline to clear the check mark and bring the device online.

[09968] 03/05/16 01:42:08.426 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 17, new handle ffffffff, error 32
[09968] 03/05/16 01:42:08.426 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 1a, new handle ffffffff, error 32
[09968] 03/05/16 01:22:07.867 PvlSession::DismountMedia( 0, 0, 0 )
Job = {JOBHEXNUMBERZZZZZZ} “ServerBackup-Full”
Changer    = {CHANGERZZZZ} “Robotic library 0001”
Drive      = {MYBACKUPDRVXZZZZZ} “Tape drive 0001”
Slot       = 13
Media      = {MEDIAZIDZZZZ} “BARCODEID”
ERROR = 0xE0008114 (E_PVL_CHANGER_NOT_AVAILABLE)

[19812] 03/05/16 01:42:12.613 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 1a, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.129 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 00, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.645 PvlDrive::DisableAccess() – ReserveDevice failed, offline device
Drive = 1007 “Tape drive 0001”
ERROR = 0x0000001F (ERROR_GEN_FAILURE)

[19812] 03/05/16 01:42:13.645 PvlDrive::UpdateOnlineState()
Drive = 1007 “Tape drive 0001”
ERROR = The device is offline!
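
To make the excerpts above easier to read, here is a minimal Python sketch that tallies the non-zero error codes on DeviceIo lines and attaches the standard Win32 names for them (1167 is ERROR_DEVICE_NOT_CONNECTED, 32 is ERROR_SHARING_VIOLATION). The function name `summarize` and the regular expressions are my own and only assume the line layout shown in these excerpts; other adamm.log versions may be formatted differently.

```python
import re

# Win32 error codes seen in the excerpts above (names as defined in winerror.h).
WIN32_ERRORS = {
    31: "ERROR_GEN_FAILURE",
    32: "ERROR_SHARING_VIOLATION",
    1167: "ERROR_DEVICE_NOT_CONNECTED",
}

# Matches "Device error 1167 on ..." or a trailing ", error 32".
ERROR_RE = re.compile(r"(?:Device error|, error) (\d+)")
# Matches the device name in a path such as \\.\Tape0.
DEVICE_RE = re.compile(r"\\\\\.\\(\w+)")


def summarize(lines):
    """Count non-zero DeviceIo error codes per (device, code, name)."""
    counts = {}
    for line in lines:
        if "DeviceIo:" not in line:
            continue  # only DeviceIo lines carry the codes we want
        err = ERROR_RE.search(line)
        dev = DEVICE_RE.search(line)
        if not err or not dev:
            continue  # e.g. the "Retry Logic" lines
        code = int(err.group(1))
        if code == 0:
            continue  # "error 0" means the handle refresh succeeded
        key = (dev.group(1), code, WIN32_ERRORS.get(code, "unknown"))
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    sample = [
        r"[09968] 03/05/16 01:42:08.426 DeviceIo: 03:07:00:00 – Refresh handle "
        r"on “\\.\Tape0”, SCSI cmd 17, new handle ffffffff, error 32",
    ]
    for (device, code, name), n in summarize(sample).items():
        print(f"{device}: error {code} ({name}) x{n}")
```

A run of error 32 lines with a handle of ffffffff, as in the excerpts, would show up here as repeated sharing violations on Tape0, which fits the "device is offline" symptom rather than a cabling fault.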

Event ID 1000
Faulting application name: wmiprvse.exe, version: 6.3.9600.17415, time stamp: 0x54505614
Faulting module name: MSVCR110.dll, version: 11.0.51106.1, time stamp: 0x5098826e

 

Research:

I spent a ton of time researching this… Old support threads were pointing me in all different directions; most were old, mentioned drivers, etc. Initially I thought it was hardware related, until testing gave me the gut feeling it was software related. There were absolutely no articles covering Backup Exec 2014 running on Windows Server 2012 R2 with this specific issue.

Tried a bunch of stuff, including swapping the P800 controller for an HP P212. While it didn’t fix the issue, I gained some backup speed! 🙂

Updating the HP software (agents, providers, HP SMH, WBEM) had no effect.

Disabling the HP providers, and disabling the HP Monitoring, Insight, Management services had no effect whatsoever. Tried different firmware versions, also tried different drivers for the Library and Tape drive, no effect. Tried factory resets, no effect. Tried Library and Tape tools, all tests passed.

Disabled the other monitoring software we have in place to monitor software/hardware on clients’ servers, no effect.

 

Resolution:

-Uninstalled the HP WBEM Providers and Agents.

-Added a “BusyRetryCount” 32-bit DWORD value of 250 (decimal) to the “Storport” key under “Device Parameters” in all the Tape Library and Tape Drive Registry entries. Example:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Changer&Ven_HP&Prod_MSL_G3_Series\5&334e8424&0&000500\Device Parameters\Storport]
"BusyRetryCount"=dword:000000fa

This needs to be added to ALL of the tape device entries (under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\) for the Tape Library and Tape Drives, and ONLY to those entries. You will probably have to create the “Storport” key under each device’s “Device Parameters” key. Restart the server afterwards for the change to take effect.
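
If you have several tape devices, typing these entries by hand gets error-prone. Below is a rough Python sketch that builds the text of a .reg file for a list of device entries and skips anything that isn't a tape device. Treat it as an illustration only: the “Changer&” prefix comes from my example above, the “Sequential&” prefix for tape drives is an assumption based on how Windows usually names them, and the tape drive and disk entries in the usage section are made up. Check the real names in regedit first.

```python
# Key-name prefixes under Enum\SCSI that we treat as tape devices.
# "Changer&" is taken from the example above; "Sequential&" (tape drives)
# is an assumption, so verify it against your own registry.
TAPE_PREFIXES = ("Changer&", "Sequential&")
ENUM_SCSI = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI"


def is_tape_device(device_key):
    """True for medium changers and tape drives, False for disks etc."""
    return device_key.startswith(TAPE_PREFIXES)


def build_reg_file(instances, retry_count=250):
    """Return .reg text setting BusyRetryCount for each tape device.

    instances: iterable of (device_key, instance_id) pairs as they appear
    under Enum\\SCSI. Non-tape entries are skipped.
    """
    out = ["Windows Registry Editor Version 5.00", ""]
    for device_key, instance_id in instances:
        if not is_tape_device(device_key):
            continue
        out.append(
            f"[{ENUM_SCSI}\\{device_key}\\{instance_id}"
            "\\Device Parameters\\Storport]"
        )
        # .reg files store DWORDs as 8 hex digits: 250 -> 000000fa
        out.append(f'"BusyRetryCount"=dword:{retry_count:08x}')
        out.append("")
    return "\n".join(out)


if __name__ == "__main__":
    devices = [
        ("Changer&Ven_HP&Prod_MSL_G3_Series", "5&334e8424&0&000500"),
        # Hypothetical entries, for illustration only:
        ("Sequential&Ven_HP&Prod_Ultrium_6-SCSI", "5&334e8424&0&000400"),
        ("Disk&Ven_HP&Prod_LOGICAL_VOLUME", "5&334e8424&0&000000"),
    ]
    print(build_reg_file(devices))
```

Review the generated text before importing it, and export the existing keys first so you can roll back.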

After doing this, the backups run consecutively with absolutely no issues. The event log is CLEAN, and Adamm.log is clean, and the “Faulting application name: wmiprvse.exe” errors in the event log no longer occur.

Fixed!

 

Additional Notes:

-Both “fixes” were applied at the same time. I believe the WBEM providers/agents caused the Event ID 1000 errors on WMIPRVSE.exe. While the registry keys alone might have resolved the backup issues, I believe there would still have been an underlying issue with WMIPRVSE.exe faulting that could have other consequences.

-I do not believe the original installation of the HP WBEM providers caused the issue. I have a feeling that a subsequent Windows Update, Backup Exec update, other module update, or an update to the HP software caused the issue to appear some time after the original install. I do remember we didn’t have an issue with the backups for months, until one day it started occurring.

-I will be re-installing the HP providers and agents at a later time. I will be uninstalling all of them, and re-installing from scratch the latest versions. I will post an update with my results.

-There is a chance the registry key is needed for the HP software to co-exist with Backup Exec backups for this configuration.

-There is a chance that the registry key isn’t needed if you never load the HP software.

  14 Responses to “Backup Exec 2014 – Jobs failing on HP Proliant DL360 G6 with an HP MSL2024 LTO-6 Tape Library”

  1. Hi,

    I once had a similar problem with an MSL2024 and an LTO3 drive – the key is the registry entry!
    Sometimes jobs crashed after a while and only a reboot fixed the issue … then I added the registry key … and the backups worked fine (BE12, then BE14 and BE15 … on an old Proliant ML370G5).

    I also deactivated tape support in the “old” HP management agent …

  2. Hi Robert,

    While I agree that the key “fixes” the issue of the job failing, I’d still be concerned about the consequences of the WBEM providers causing issues with wmiprvse.exe. I could be totally wrong, but I’m thinking it’s probably causing quite a few commands/requests on the SAS bus that could be affecting performance.

    Cheers,
    Stephen

  3. Hi Stephen

    Curious what driver were you running on your tape drives – the HP driver, or the generic MS driver?

    Currently having similar issues with an HP ML350G8 and a MSL 2024 G3, but I’m currently running the HP driver.

    Thanks

    Brian

  4. Hi Brian,

    I’ve tried both the HP driver and the Microsoft driver, and I also tried using the HP driver on the Tape Library. None of the changes had any effect on the issue.

    Please note that on Windows Server 2012 R2 with Backup Exec, it’s best practice to use the HP driver for the tape drive and the generic Microsoft “Unknown Medium Changer” driver for the tape library. (On Server 2012, Backup Exec uses user-mode drivers.)

    If you’re running an earlier version of Windows Server, you’ll want to use the Symantec drivers and install them from Backup Exec using the driver wizard.

    Again, please note that changing drivers had no effect on this issue when I was troubleshooting, the ultimate fix for me is outlined in the above post.

    Feel free to post more about your setup, I’m curious how similar your issue is.

    Cheers,
    Stephen

  5. Hi Stephen

    Thanks for the reply.

    My setup is Arcserve 16 (ugh), ML350G8, MSL2024 and Server 2008 R2. While it’s not that similar to your setup, the symptoms are identical. I found (as I’m sure you did as well) no shortage of similar issues, the common points being an HP server, backup software, and the backup software thinking that the drive has disconnected.

    HP LTT tests all come back OK. I have no events in the windows logs about the device disconnecting, just in the Arcserve logs. Yesterday I updated the firmware on the tape drive and the loader, and disabled some of the HP services to see if it would help. I also went in the HP Management Agents in the control panel and disabled anything to do with SCSI, SAS or Tape. While I did get a good backup last night, I still see the errors in the logs. The backups don’t always fail, but I’m always seeing the errors. I’m currently running the HP drivers, and while Arcserve does want the generic drivers, I have an almost identical setup working 100% in the field using the HP drivers, as well as dozens using a similar setup, sans loader.

    When I was prepping this server, I was restoring using the standalone drive in the server, not the loader. I was seeing the same errors while doing the restore as I do with the loader, and they’re on separate controllers.

    Next steps for me will be to try the generic drivers and using the registry entries you have listed above. I’m thinking that if I’m using the HP driver, the ‘storport’ registry entries will have no effect.

    Thanks for the help,

    Brian

  6. Hi Brian,

    No worries! 🙂

    The registry entries WILL have an effect no matter what driver you use. These entries modify variables in the Windows SCSI driver back end. They aren’t settings of the device drivers themselves, but the device drivers use the Storport system in Windows.

    As I mentioned in the post, disabling the HP Management agents will not have an effect alone. You actually need to uninstall the WBEM providers.

    To resolve this quickly, I would suggest applying the registry fixes, and then uninstalling the HP WBEM providers (actually uninstall them from the system).

    Assuming this fixes the issue, I would suggest re-installing the WBEM providers later, after confirming the resolution by running a few backups.

    I haven’t confirmed it yet, but I believe there may have been a Windows Update, or a 3rd party application install that may have caused some issues with the WBEM providers. I haven’t verified it yet, but I’m thinking there’s a possibility that installing them fresh later, may work and won’t interrupt the “fix”.

    Out of curiosity, is it just backup errors? Or are you also seeing the faulting “wmiprvse.exe” as well?

    Essentially, what causes this issue is too much “action” on the SCSI/SAS bus in the Windows SCSI system. I’m thinking it’s the WBEM providers querying the status of the devices too often (and crashing at the same time), which hits a limit on the commands (this is why the registry fix allows more leeway).

    Seriously, apply the registry fixes, and uninstall the WBEM providers, then restart the system. I’m sure it’ll fix your issues! And I also recommend using the drivers that are suggested for your setup, don’t stray away! 🙂

    Cheers

  7. Hi Stephen

    I am seeing the ‘wmiprvse.exe’ errors as well. My initial thought is it’s something installed by the latest Proliant Support Pack. It’s the only difference from the other servers in the field, also built and configured by myself, using the same image.

    ‘Storport’ entries will be going in immediately then. Uninstalling WBEM will have to wait as there is currently a restart queued on the server.

    Thanks again,

    Brian

  8. You could be 100% bang on with it being associated with the latest support pack!

    In my situation, my setup actually worked with no issues, and then all of a sudden one day it started having the above issue. This is why I suspect it was either an update to the HP agents/WBEM providers (Proliant Support Pack), a Windows Update, or a 3rd party application install that may have affected a shared library or something…

    This is one of the reasons why I suspect there’s a chance that, after uninstalling the WBEM providers, you may be able to re-install them fresh afterwards without re-activating the issue.

    Please note, that after applying the registry entries you’ll need to restart the server for them to take effect. Also, I would suggest applying the registry entry for both the tape drive, and the tape library inside of the registry (as mentioned in my post above).

    Let me know how you make out!

    Cheers

  9. Hi Stephen,

    I believe that I can confirm for you that the WBEM providers are the cause. I am in a similar situation where I am trying to install the HP agents/WBEM providers in order to talk to our HP SIM server for central hardware monitoring/alerts. Installing the HP management agents alone does not affect my backups, but installing the WBEM providers causes my backups to fail with the same errors you indicated in your original post. As soon as I uninstall the WBEM providers, the next backups run successfully without errors.

  10. Hi Phil,

    Glad to hear we have narrowed down the problem. Just curious, did you apply the registry fix? I haven’t confirmed it, but there is a possibility that the registry fix may allow the WBEM providers to co-exist with everything.

    Need confirmation though.

    Cheers

  11. Hi Stephen,

    Sorry for the delayed reply, I just returned back from vacation. No, I did not apply the registry fix. In my case, once the HP Management agent was able to talk to my HP SIM, that was all I wanted so not having the WBEM providers didn’t matter to me at that point.

  12. If you’re using VMs I would skip Backup Exec altogether, and move to software such as Zerto Virtual Replication or Altaro Backup Software. Both of these provide good backup and Disaster Recovery, and you can do granular restores.

    From my personal experience the software is abysmal, it fails randomly and sometimes I was left without a proper backup for a week.
    Luckily I had volume shadow copies implemented and if something went wrong I could roll back the data.

    Symantec Backup Exec is not a very good backup solution.

  13. Unfortunately, in most of my clients’ virtualization environments we run applications (Active Directory, Exchange, and SQL, to name a few) that require an application-aware backup product to remain supported.

    While I know that snapshot backups will work most of the time, and I know there are “application aware” virtualization snapshot based backup applications, it’s still an “unsupported configuration” to do backups using these virtualization based snapshots (with Exchange for example).

    A lot of people using virtual snapshot based backups, still use an application such as Backup Exec to perform proper Exchange/SQL backups.

    In my own environment I use virtualization based snapshots and replication for my own disaster recovery solution; however, for clients and most businesses it’s important to stay in a “supported configuration” in the event issues ever occur and external or 3rd party support needs to get involved. In some cases they may refuse to provide any support if you take the solution out of a supported configuration.

    I do however agree that sometimes it can be a pain in the rear! 🙂

    But with that being said, all backup solutions should be monitored/maintained. We monitor all our clients’ environments, so if a backup job does fail inside of Backup Exec, we immediately get e-mail notifications of the event.

  14. Solution:

    Step 1:

    Cause

    1. If the tape deployed was used extensively and needs a long erase.

    2. If the tape drive used for the job has multiple read/write errors.

    Solution

    1. Ensure that the firmware version of the tape drive is updated

    2. Reinstall Symantec Backup Exec device drivers

    3. Run a long erase on the tape used for the job (applicable only if the data on the tape is expendable)

    4. Clean the tape drive at regular intervals as recommended by the manufacturer or every 8 hrs of continuous usage.

    If backup jobs are still failing, make sure that the tape library and tape drive have no hardware error logs, or have the tape library checked by the vendor.

    Step 2:

    To resolve the media write errors, the following steps were taken to modify the registry in Windows on the media server.

    1. Open Windows regedit.

    2. Open the path HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\[device identifier name for the LTO-3 tape device]\[numeric device instance id for the LTO-3 tape device]\Device Parameters

    3. Add a key named Storport under Device Parameters (if there is not one there already).

    4. Add a value named BusyRetryCount under the Storport key. The value type is REG_DWORD, and it should be set to 1000. (As per Symantec, this value should be 250 or higher, but in our case it worked with 1000.)

    5. Exit Regedit.

    6. Reboot the server.
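
The regedit steps above can also be sketched as a command line. The Python snippet below only builds the argument list for the standard Windows reg.exe tool ("reg add" with /v, /t, /d and /f, which creates the Storport key if it is missing); it does not run anything. The function name and the placeholder device names are hypothetical, and it assumes the 1000 in step 4 is a decimal value.

```python
def reg_add_command(device_key, instance_id, retry_count=1000):
    """Build a 'reg add' argument list that sets BusyRetryCount.

    device_key and instance_id are the two path components under
    Enum\\SCSI described in step 2; look them up in regedit first.
    """
    key = (
        "HKLM\\SYSTEM\\CurrentControlSet\\Enum\\SCSI\\"
        f"{device_key}\\{instance_id}\\Device Parameters\\Storport"
    )
    return [
        "reg", "add", key,
        "/v", "BusyRetryCount",   # value name
        "/t", "REG_DWORD",        # value type
        "/d", str(retry_count),   # data, given in decimal
        "/f",                     # overwrite without prompting
    ]


if __name__ == "__main__":
    # Placeholders only; substitute the names from your own registry.
    cmd = reg_add_command("DEVICE_IDENTIFIER", "DEVICE_INSTANCE_ID")
    print(cmd)
    # On the media server, from an elevated prompt, this list could be
    # passed to subprocess.run(cmd, check=True).
```

Passing the arguments as a list avoids quoting problems with the "&" characters that device names usually contain. A reboot is still needed afterwards, as in step 6.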

    Step 3:

    1. Disable the tape drive in Backup Exec (right click on it). This must be done in Backup Exec! Doing the same in Windows Device Manager and rebooting did not fix the problem.

    2. Delete the tape drive in Backup Exec (right click again; it needs to be disabled first). This must be done in Backup Exec! Doing the same in Windows Device Manager and rebooting did not fix the problem.

    3. Stop all the Backup Exec services (I did this with services manager & confirmed in Windows Services)

    4. Start all the Backup Exec services

    5. Start Backup Exec

    6. The tape drive should be re-detected.
