Mar 262016
 

An issue that’s been making me rip my hair apart for some time… And a fix for you experiencing the same.

 

Equipment:

HP Proliant DL360 G6 Server (with a P800 Controller) running Server 2012 R2 and Backup Exec 2014

HP MSL-2024 Tape Library with a single HP SAS LTO-6 Tape Drive

 

Symptoms:

-After a clean restart, a backup job completes successfully. Subsequent jobs fail until server or services restarted.

-While the initial backup does complete, errors/warnings can be seen in the adamm.log and the Event Viewer even when successful.

-Subsequent backups failing report that the device is offline. The Windows Device Manager reports everything is fine.

-Windows Server itself does not report any device errors whatsoever.

 

Observations:

[5648] 03/05/16 07:50:46 Adamm Mover Error: DeviceIo: 03:07:00:00 – Device error 1167 on “\\.\Tape0”, SCSI cmd 0a, 1 total errors
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 00, new handle 214, error 0
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 00:00:00:00 – Retry Logic: Retry logic was engaged on device: HP       Ultrium 6-SCSI
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 00:00:00:00 – Retry Logic: Original settings restored on device: HP       Ultrium 6-SCSI

Event ID 58053
Backup Exec Alert: Storage Error
(Server: “WhatsMySRVRname”) The device state has been set to offline because the device attached to the computer is not responding.

Ensure that the drive hardware is turned on and is properly cabled. After you correct the problem, right-click the device, and then click Offline to clear the check mark and bring the device online.

[09968] 03/05/16 01:42:08.426 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 17, new handle ffffffff, error 32
[09968] 03/05/16 01:42:08.426 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 1a, new handle ffffffff, error 32
[09968] 03/05/16 01:22:07.867 PvlSession::DismountMedia( 0, 0, 0 )
Job = {JOBHEXNUMBERZZZZZZ} “ServerBackup-Full”
Changer    = {CHANGERZZZZ} “Robotic library 0001”
Drive      = {MYBACKUPDRVXZZZZZ} “Tape drive 0001”
Slot       = 13
Media      = {MEDIAZIDZZZZ} “BARCODEID”
ERROR = 0xE0008114 (E_PVL_CHANGER_NOT_AVAILABLE)

[19812] 03/05/16 01:42:12.613 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 1a, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.129 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 00, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.645 PvlDrive::DisableAccess() – ReserveDevice failed, offline device
Drive = 1007 “Tape drive 0001”
ERROR = 0x0000001F (ERROR_GEN_FAILURE)

[19812] 03/05/16 01:42:13.645 PvlDrive::UpdateOnlineState()
Drive = 1007 “Tape drive 0001”
ERROR = The device is offline!

[19812] 03/05/16 01:42:12.613 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 1a, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.129 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 00, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.645 PvlDrive::DisableAccess() – ReserveDevice failed, offline device
Drive = 1007 “Tape drive 0001”
ERROR = 0x0000001F (ERROR_GEN_FAILURE)

[19812] 03/05/16 01:42:13.645 PvlDrive::UpdateOnlineState()
Drive = 1007 “Tape drive 0001”
ERROR = The device is offline!

Event ID 1000
Faulting application name: wmiprvse.exe, version: 6.3.9600.17415, time stamp: 0x54505614
Faulting module name: MSVCR110.dll, version: 11.0.51106.1, time stamp: 0x5098826e

 

Research:

I spent a ton of time researching this… Old support threads were pointing me in all different directions, most of the threads being old, mentioning drivers, etc… Initially I thought it was hardware related, until through testing I got the gut feeling it was software related. There was absolutely no articles covering Backup Exec 2014 running on Windows Server 2012 R2 with this specific issue.

Tried a bunch of stuff, including swapping the P800 controller, for another HP P212. While it didn’t fix the issue, I gained some backup speed! 🙂

Updating the HP software (agents, providers, HP SMH, WBEM) had no effect.

Disabling the HP providers, and disabling the HP Monitoring, Insight, Management services had no effect whatsoever. Tried different firmware versions, also tried different drivers for the Library and Tape drive, no effect. Tried factory resets, no effect. Tried Library and Tape tools, all tests passed.

Disabled other monitoring software we have in place to monitor software/hardware on clients servers, no effect.

 

Resolution:

-Uninstalled the HP WBEM Providers and Agents.

-Added a “BusyRetryCount” 32-bit DWORD value of 250 (decimal) to the “Storport” key under “Device Parameters” in all the Tape Library and Tape Drive Registry entries. Example:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Changer&Ven_HP&Prod_MSL_G3_Series\5&334e8424&0&000500\Device Parameters\Storport]
“BusyRetryCount”=dword:000000fa

This needs to be added to ONLY and ALL the tape device entries (under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\) for the Tape Library and Tape Drives. You probably will have to create “Storport” key under the devices “Device Parameters” key.

After doing this, the backups run consecutively with absolutely no issues. The event log is CLEAN, and Adamm.log is clean, and the “Faulting application name: wmiprvse.exe” errors in the event log no longer occur.

Fixed!

 

Additional Notes:

-Both “fixes” were applied at the same time. I believe the WBEM providers/agent caused the Event ID 1000 errors on WMIPRVSE.exe. While the registry keys alone may have possibly resolved the backup issues, I believe there still would have been an underlying issue with WMIPRVSE.exe faulting that could have other consequences.

-I do not believe the original installation of the HP WBEM providers caused the issue, I have a feeling a subsequent Windows Update, Backup Exec update, other module update, or an update to the HP software may have caused the issue to occur at a later time than original install. I do remember we didn’t have an issue with the backups for months, until one day it started occurring.

-I will be re-installing the HP providers and agents at a later time. I will be uninstalling all of them, and re-installing from scratch the latest versions. I will post an update with my results.

-There is a chance the registry key is needed for the HP software to co-exist with Backup Exec backups for this configuration.

-There is a chance that the registry key isn’t needed if you never load the HP software.

Feb 202013
 

Recently it was time to refresh a client’s disaster recovery solution. We were getting ready to release our dependance on our 5 year old HP MSL2024 with an LTO-4 tape drive, and implement a new HP MSL2024 library with a SAS LTO-6 tape drive. We need to use tape since the size of the backup requirements for a full back up are over 6TB.

The server that is connected to all this equipment is an HP Proliant DL360 G6 with a HP Smart Array P800 Controller. The P800 already has an HP StorageWorks MSA60 unit attached to it with 12 drive

Documentation for the P800 mentioned tape drive support. While I know that the P800 is only capable of 3Gb/sec, this is more that enough and chances are the hard drive will be maxed out reading anyways.

Anyways, client approved purchase, brought in the hardware and installed it. First we had to install Backup Exec 2012 (since only the 2012 SP1a HCL specifies support for LTO-6), which was messy but we did it. Then we re-configured all of our backup jobs, since the old jobs were migrated horribly.

When trying to run our first backup, the backup failed. I tried again numerous times, only to get these errors:

  • Storage device “HP 07” reported an error on a request to rewind the media.
  • Final error: 0xe00084f0 – The device timed out.
  • Storage device “HP 07” reported an error on a request to write data to media.
  • Storage device “HP 6” reported an error on a request to write data to media.
  • PvlDrive::DisableAccess() – ReserveDevice failed, offline device
  • ERROR = 0x0000001F (ERROR_GEN_FAILURE)

Also, every time the backup would fail, the Library and the Tape drive would disappear from the computers “Device Manager”. Essentially the device would lose it’s connection. Even when logging in to the HP MSL2024 web interface, it would state the SAS port is disconnected after a backup job would fail. To resolve this, you’d have to restart the library and restart the Backup Exec services. One interesting thing, when this occurred, my companies monitoring and management software would report a RAID failure had occured at the customers site, until the MSL was restarted (this was kinda cool).

 

I immediately called HP support. They mentioned the library had a firmware up 5.80 and asked to try to update. We did and it failed since the firmware file didn’t match it’s checksum, I was told that this is not important as 5.90 doesn’t contain any major changes. We continued to spend 6 hours on the phone trying to disable insight agents, check drivers, etc… Finally he decided to replace the tape drive.

Since LTO-6 is brand new technology, even with a 4 hour response contract, it took HP around 2 weeks to replace the drive since none were in Canada. During this time, I called two other separate times. The second tech told me that at the moment, no HP controllers support the HP LTO-6 tape drives (you’re kidding me right?), and the 3rd said he couldn’t provide me any information as there’s nothing in the documentation that specifies what controllers were compatible. All 3 tech’s mentioned that having the P800 controller in the server host both the MSA60 and the MSL2024 is probably causing the issues.

We received the new tape drive, tested, and the backups failed. I sent the drive back (which was a repaired unit, and kept the original brand new one). After this I tried numerous things, google’d for days. Finally I was just about to quote the client a new controller card, when I finally decided to give HP another call.

On this call, he escalated the issue to engineers. Later that night I received an e-mail stating that library firmware 5.90 is required for support for the LTO-6 tape drives. I was shocked, angry, etc… It turns out that library firmware 5.80 was “Recalled” due to major issues a while back.

Since LTT couldn’t load the firmware, I just downloaded it manually and flashed it via the MSL 2024 web interface. After this restarted the Backup Exec services, performed an inventory, and did a minor backup (around 130GB). Keep in mind that when the backups originally failed, it didn’t matter the size, the backup would simply fail just before it completed.

The backup completed! Later on that night I ran a full complete backup of 5TB (2 servers and 2 MSA60s) and it completely 100% successfully. Even with the MSA60 under extreme load maxing out the drives, this did not in any way impede performance of the LTO-6 tape drive/library.

 

So please, if you’re having this issue consider the following:

1. Tape library must be at firmware version 5.90 to support LTO-6 Tape drives. Always always always make sure you have the latest firmware.

2. I have a working configuration of a P800 controlling both an HP MSA60, and a HP MSL 2024 backup library and it’s working 100%

3. Make sure you have Backup Exec 2012 SP1a installed as it’s required for LTO-6 compatibility (make sure you read about the major changes upgrading to 2012 first, I can’t stress this enough!!!)

 

I hope this helps some of you out there as this was consuming my life for numerous weeks.