Feb 20, 2013

Recently it was time to refresh a client’s disaster recovery solution. We were getting ready to retire our dependence on a 5-year-old HP MSL2024 with an LTO-4 tape drive, and implement a new HP MSL2024 library with a SAS LTO-6 tape drive. We need to use tape since a full backup is over 6TB.

The server that is connected to all this equipment is an HP ProLiant DL360 G6 with an HP Smart Array P800 controller. The P800 already has an HP StorageWorks MSA60 unit attached to it with 12 drives.

Documentation for the P800 mentioned tape drive support. While I know that the P800 is only capable of 3Gb/sec, this is more than enough, and chances are the hard drives will be maxed out on reads anyway.

Anyway, the client approved the purchase, and we brought in the hardware and installed it. First we had to install Backup Exec 2012 (since only the 2012 SP1a HCL specifies support for LTO-6), which was messy but we got it done. Then we re-configured all of our backup jobs, since the old jobs were migrated horribly.

When trying to run our first backup, it failed. I tried again numerous times, only to get these errors:

  • Storage device “HP 07” reported an error on a request to rewind the media.
  • Final error: 0xe00084f0 – The device timed out.
  • Storage device “HP 07” reported an error on a request to write data to media.
  • Storage device “HP 6” reported an error on a request to write data to media.
  • PvlDrive::DisableAccess() – ReserveDevice failed, offline device
  • ERROR = 0x0000001F (ERROR_GEN_FAILURE)

Also, every time the backup failed, the library and the tape drive would disappear from the computer’s Device Manager. Essentially the device would lose its connection. Even when logging in to the HP MSL2024 web interface, it would state the SAS port was disconnected after a backup job failed. To resolve this, you’d have to restart the library and restart the Backup Exec services. One interesting thing: when this occurred, my company’s monitoring and management software would report that a RAID failure had occurred at the customer’s site, until the MSL was restarted (this was kinda cool).


I immediately called HP support. They mentioned the library was at firmware 5.80, that an update (5.90) was available, and asked me to try updating. We did, and it failed since the firmware file didn’t match its checksum; I was told this wasn’t important, as 5.90 doesn’t contain any major changes. We continued to spend 6 hours on the phone disabling Insight agents, checking drivers, etc… Finally he decided to replace the tape drive.

Since LTO-6 is brand new technology, even with a 4-hour response contract, it took HP around 2 weeks to replace the drive since none were in Canada. During this time, I called in two more times. The second tech told me that at the moment no HP controllers support the HP LTO-6 tape drives (you’re kidding me, right?), and the third said he couldn’t provide me any information, as there’s nothing in the documentation that specifies which controllers are compatible. All 3 techs mentioned that having the P800 controller in the server host both the MSA60 and the MSL2024 was probably causing the issues.

We received the new tape drive, tested it, and the backups still failed. I sent the replacement drive back (it was a repaired unit) and kept the original brand new one. After this I tried numerous things and googled for days. Finally, just as I was about to quote the client a new controller card, I decided to give HP another call.

On this call, the tech escalated the issue to engineering. Later that night I received an e-mail stating that library firmware 5.90 is required to support the LTO-6 tape drives. I was shocked, angry, etc… It turns out that library firmware 5.80 was “recalled” a while back due to major issues.

Since LTT couldn’t load the firmware, I just downloaded it manually and flashed it via the MSL2024 web interface. After this I restarted the Backup Exec services, performed an inventory, and ran a small backup (around 130GB). Keep in mind that when the backups originally failed, the size didn’t matter; the backup would simply fail just before it completed.

The backup completed! Later that night I ran a complete full backup of 5TB (2 servers and 2 MSA60s) and it completed 100% successfully. Even with the MSA60 under extreme load maxing out the drives, this did not in any way impede performance of the LTO-6 tape drive/library.
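
By the way, if you find yourself bouncing the Backup Exec services over and over while troubleshooting like I did, it’s easy to script. Below is a minimal sketch in Python; the service display names are the Backup Exec 2012 defaults as I recall them, so verify yours in services.msc before using it.

```python
# Minimal sketch: restart the Backup Exec services after power-cycling
# the library. Service display names below are assumed defaults for
# Backup Exec 2012 -- verify against services.msc on your media server.
import subprocess

SERVICES = [
    "Backup Exec Job Engine",
    "Backup Exec Server",
    "Backup Exec Device & Media Service",
    "Backup Exec Agent Browser",
]

def net(action, service):
    # "net stop"/"net start" accept the service display name.
    subprocess.run(["net", action, service], check=False)

for svc in SERVICES:            # stop the job engine first, then the rest
    net("stop", svc)
for svc in reversed(SERVICES):  # start back up in reverse order
    net("start", svc)
```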


So please, if you’re having this issue, consider the following:

1. The tape library must be at firmware version 5.90 to support LTO-6 tape drives. Always, always, always make sure you have the latest firmware.

2. I have a working configuration of a P800 controlling both an HP MSA60 and an HP MSL2024 backup library, and it’s working 100%.

3. Make sure you have Backup Exec 2012 SP1a installed, as it’s required for LTO-6 compatibility (and make sure you read about the major changes before upgrading to 2012; I can’t stress this enough!).


I hope this helps some of you out there, as this was consuming my life for numerous weeks.

Nov 22, 2012

Just something I wanted to share in case anyone else ran into this issue…

At a specific client we have 2 X MSA60 units attached via Smart Array P800 controllers to 2 X DL360 G6 servers. This combination of servers, controllers, and storage units was purchased just after they were originally released by HP.

I’m writing about a specific condition: after a drive fails in RAID 5, during the rebuild, numerous (and I mean over 70,000) event log entries appear in the Event Viewer stating: “Surface analysis has repaired an inconsistent stripe on logical drive 1 connected to array controller P800 located in server slot 2. This repair was conducted by updating the parity data to match the data drive contents.”
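
If you want to see how many of these your server has accumulated, a quick count is easy to script. A rough sketch in Python follows; it assumes the HP agents write these events to the System log (as they did on our servers) and that wevtutil is available (Windows Server 2008 and later).

```python
# Rough sketch: count the "Surface analysis has repaired..." events.
# Assumes the HP Insight agents log them to the System event log and
# that wevtutil is present (Server 2008+).
import subprocess

MESSAGE = "Surface analysis has repaired an inconsistent stripe"

out = subprocess.run(
    ["wevtutil", "qe", "System", "/f:text", "/c:100000"],
    capture_output=True, text=True, check=True,
).stdout

print(f"{out.count(MESSAGE)} surface-analysis repair events found")
```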


On one of these arrays, shortly after a successful rebuild, while the Event Viewer was spitting these errors out, another drive failed. At this point the RAID array went offline, and the entire array and all its contents were unrecoverable. Keep in mind this occurred after the rebuild, while a surface scan was in progress. In this specific case we rebuilt the array, restored from backup, and all was good. When I mentioned this to HP support techs, they said it was safe to ignore these messages as they were fine and informational (I didn’t feel this was the case). After creating the new RAID array on this specific unit, we never saw these messages on that unit again.

On the other MSA60 unit, however, we regularly received these messages (we always keep the firmware of the MSA60 unit and the P800 controller up to date). Again, I asked HP support numerous times, and they said we could safely ignore them. Recently, during a power outage, the P800 controller flagged its cache batteries as failed; at the same time a drive failed, and we were yet again presented with these errors after the rebuild. After getting the drive replaced, I contacted HP again and finally insisted that they investigate the event log errors. This specific time, new errors about parity were presenting themselves in the Event Viewer.

After being put on hold for some time, they came back and mentioned that these errors are probably caused because the RAID array was created with a very early firmware version. They recommended deleting the logical array and re-creating it with the latest firmware to avoid any data loss. I specifically asked if there was a chance the array could fail due to these errors and the fact it was created with an early firmware version, and they confirmed it. I went ahead, created backups, deleted the array, re-created it, restored the backup, and the errors are no longer present.


I just wanted to create this blog post, as I see numerous people searching for the meaning of these errors, and wanted to shed some light and maybe help a few of you avoid future catastrophic problems!

Jun 03, 2012

Well, for the longest time I have been running a vSphere 4.x cluster (1 X ML350 G5, 2 X DL360 G5) off of a pair of HP MSA20s connected to a SuperMicro server running Lio-Target as an iSCSI target on CentOS 6.

This configuration has worked perfectly for me for almost a year, requiring absolutely no maintenance/work at all.

Recently I moved, so I had to move all my servers, storage units, etc… When I got into the new place and went to power everything up, I noticed that a drive had failed upon initializing one of the MSA20 units. I replaced the drive, let it rebuild, and thought that would be the end of the issue, but I was incorrect. (Just so everyone knows, these units had been on continuously for 8+ months before being turned off for the move.)

In the months since that failure and a successful rebuild, at times of high I/O (backing up to an NFS share using ghettoVCB), the logical drive in the array just disappears. I have each MSA20 connected to its own HP Smart Array 6400/6402 controller. When the logical drive disappears, I notice that the “Drive Failure” LED on the SA640x controller illuminates. When this happens, I have to shut off all physical servers, the storage server (running Lio-Target), and the MSAs, and restart everything.

Sometimes it is worse than others (for example, I’ve been dealing with this issue non-stop all weekend with no sleep). Even under low I/O I’ll be starting the VMs and it will just lose the array again, while other times I can run for weeks as long as I keep the I/O to a minimum.

I’ve read numerous other articles and posts of other people having the same issue. These people have had HP replace every single item inside of the MSA20 unit (except the drives) and the issue still occurs. This made me start thinking.

This weekend, I NEEDED to get a backup done. While doing this, the issue occurred, and it got to the point where I couldn’t even start the VMs, let alone back them up. I figured that since other people have had this issue, and since replacing all the hardware hasn’t fixed it (I even moved a RAID array from one MSA20 to the other with no effect), it has to be the drives themselves.

There are two possibilities: either the drives have failed and the MSA20 isn’t reporting them as failed (which I’ve seen happen), or the way the MSA20 creates the RAID array has issues. I ran an ADU report and carefully read the entire thing. Absolutely no issues reading/writing to the drives, and the MSA20 has no events in its log. It HAS to be the way the RAID array is created on the disks.

Please do not try this unless you know exactly what you’re doing. This is very dangerous and you can lose all your data. This is just me reporting on my experience and usage.

In desperation, I thought to myself that this all happened when a drive failed and I put a new disk in the array. Since I couldn’t back any of my data up, let alone even start the VMs, I decided to start behaving dangerously. I kept all my vSphere hosts offline and turned on only the MSA20 units and the SuperMicro server attached to them. I then proceeded to remove a drive, re-insert it, let it rebuild over 3 hours, and then, once healthy and rebuilt, do the same to the next drive. I have a RAID 5 array containing 4 X 500GB disks, so this actually took me a day (remove/re-insert, rebuild, then the next drive).
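
If you try this rolling rebuild yourself, the tedious part is waiting for each rebuild to finish before touching the next disk. Here is a hypothetical helper that polls the controller; it assumes hpacucli is installed on the storage server and that the Smart Array 640x is in slot 0 (adjust for your system).

```python
# Hypothetical helper for the rolling-rebuild procedure above: wait for
# the current rebuild to finish before re-seating the next drive.
# Assumes hpacucli is installed and the controller is in slot 0.
import subprocess
import time

def rebuild_done():
    out = subprocess.run(
        ["hpacucli", "ctrl", "slot=0", "ld", "all", "show", "status"],
        capture_output=True, text=True, check=True,
    ).stdout
    # hpacucli reports "Recovering" while a logical drive is rebuilding.
    return "Recovering" not in out

while not rebuild_done():
    time.sleep(300)  # rebuilds take hours; poll every 5 minutes
print("Rebuild complete - safe to pull the next drive.")
```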

After finally removing and rebuilding each drive in the array, I booted up the vSphere servers and ran a backup of my 12 VMs. Not only did everything seem faster, but I completed the backup without any problems. This shows there’s a very high chance that the RAID metadata on the drives was either corrupt, damaged, or just wasn’t written very nicely, and rebuilding each drive fixed it. I’ll report back in a few weeks to let you know for sure that it’s resolved!


Hope this helps someone out there!

Jan 262012
 

Well, for all you people out there considering extending your MSA20 RAID array or transforming the RAID type, but concerned about how long it will take…

I recently added a 250GB drive to a RAID 5 array consisting of 9 X 250GB disks. Adding the disk took less than 8 hours (it could actually have been WAY less). Extending the logical partition took no time at all.
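
For reference, the same expand-then-extend operation can be driven from the command line instead of the ACU GUI. The sketch below is roughly what it would look like with hpacucli; the slot number, array letter, and drive address are placeholder examples, so check them against your own configuration first.

```python
# Sketch of the expand/extend via hpacucli rather than the ACU GUI.
# Slot number, array letter, and drive address are examples only.
import subprocess

def hpacucli(*args):
    subprocess.run(["hpacucli", "ctrl", "slot=0", *args], check=True)

# Add the new physical drive to the existing array (the expansion)...
hpacucli("array", "A", "add", "drives=1:10")
# ...then grow the logical drive into the new space (the extension).
hpacucli("ld", "1", "modify", "size=max")
```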

One thing I do have to caution you about, though: I did a test transformation converting a RAID 5 array to RAID 6. It started off fast, but once it hit 25% it sat there, only increasing 1% every 1-2 days. After 4 days I finally killed the transformation. PLEASE NOTE: there is a chance this had something to do with a damaged drive; this will need further testing. Also, just so you are aware, you CANNOT cancel a transformation. I stopped mine by simply turning off the unit, and ALL data was destroyed. If you start a transformation, you NEED to let it complete.

ALWAYS ensure you have a COMPLETE backup before doing these types of things to a RAID array!

Dec 14, 2011

Recently I had the task of setting up a site-to-site IPsec tunnel between my office and one of my employees’ home offices. At my main business HQ we have a Sophos UTM running inside a vSphere 4 cluster; however, I had to find the cheapest way to get the employee hooked up.

The main tasks of the VPN endpoint at the employee’s site were:

1) Filter web and POP3 traffic, and provide security for the devices behind the UTM at the home office (1-3 computers, and other random devices)

2) Provide a site-to-site VPN connection to allow the user access to internal resources, along with access to our VoIP PBX (VoIP phone at the employee site)

3) Provide access to other resources such as Exchange, CRM, etc… and reverse management of devices at the home office from HQ

First I needed to find an affordable computer to install the Sophos UTM software appliance onto. My company is an HP Partner, and we love their products, so I decided to purchase a new computer that would be powerful enough to host the UTM software appliance and also be protected under HP’s business warranty. I wanted the system to have enough performance that, if the home office was ever decommissioned, we could still use it as a UTM device for something else (say, a real remote office).

After taking a look at our distributor to find out what was immediately available (as this was a priority), we decided to pick up an HP Compaq 4000 Pro Small Form Factor PC. Below are the specs:

HP Compaq 4000 Pro Small Form Factor PC

Part Number: LA072UT (Or LA072UT#ABA for the English version in Canada)

System features
Processor: Intel® Core™2 Duo Processor E7500 (2.93 GHz, 3 MB L2 cache, 1066 MHz FSB)
Operating system installed: Genuine Windows® 7 Professional 32-bit
Chipset: Intel® B43 Express
Form factor: Small Form Factor
PC management: Available for free download from www.hp.com/go/easydeploy: HP Client Automation Starter; HP SoftPaq Download Manager; HP Client Catalog for Microsoft SMS; HP Systems Software Manager

Memory
Standard memory: 2 GB 1333 MHz DDR3 SDRAM
Memory slots: 2 DIMM

Storage
Internal drive bays: One 3.5″
External drive bays: One 3.5″, one 5.25″
Internal drive: 500 GB 7200 rpm SATA 3.0 Gb/s NCQ, Smart IV
Optical drive: SATA SuperMulti LightScribe DVD writer

Graphics
Graphics card: Integrated Intel Graphics Media Accelerator 4500

Expansion features
I/O ports: 8 USB 2.0; 1 serial (optional 2nd); 1 parallel (optional); 1 PS/2 keyboard; 1 PS/2 mouse; 1 VGA; 1 DVI-D; 1 microphone/headphone jack; 1 audio in; 1 audio line out; 1 RJ-45
Slots: 2 low-profile PCI; 1 low-profile PCIe x16; 1 low-profile PCIe x1

Media devices
Audio: Integrated High Definition audio with Realtek 2-channel ALC261 codec

Communication features
Network interface: 10/100/1000

Power and operating requirements
Power requirements: 240W power supply, active PFC
Operating temperature range: 10 to 35°C

Dimensions and weight
Product weight: Starting at 7.6 kg
Dimensions (W x D x H): 33.8 x 37.85 x 10 cm

Security management
Security management: Stringent Security (via BIOS); SATA port disablement (via BIOS); DriveLock; serial, parallel, and USB enable/disable (via BIOS); optional USB port disable at factory (user configurable via BIOS); removable media write/boot control; power-on password (via BIOS); setup password (via BIOS); HP Chassis Security Kit; support for chassis padlocks and cable lock devices

What’s included
Software included: Microsoft Windows Virtual PC; HP Power Assistant
Warranty features: Protected by HP Services, including a 3-year parts, 3-year labour, and 3-year onsite service (3/3/3) standard warranty. Terms and conditions vary by country. Certain restrictions and exclusions apply.

This system was spec’d very nicely for our requirements. Another huge bonus is that it was covered by a factory 3-year warranty from HP, which means that if anything failed, we would have next-business-day replacement (I love this, and so do my clients, who all purchase HP). The one downside is that the system shipped with a Windows 7 license we wouldn’t be using, but for the price of the system, it didn’t really matter.

The system only came standard with one Gigabit NIC (network card); however, we needed two, since this device acts as a firewall/router. It’s a Small Form Factor system, so we had to find a second network adapter compatible with the computer’s case form factor. The card we purchased was:

HP – Intel Gigabit CT Desktop NIC

Part Number: FH969AA

Although the computer above is not in the compatibility list for the network card, the card still worked perfectly. Once it was received, we simply replaced the case bracket on the card with the small-form-factor one that shipped with it.

We then burned the .ISO image of the UTM software appliance and installed it on the system. It installed (along with the 64-bit kernel) perfectly. After the install completed, we configured it to connect to our central Sophos SUM (Software Update Manager) and shipped the device out to the employee’s home office.

Once it was installed on site, we logged on to the Sophos SUM user interface and created a site-to-site IPsec VPN using the wizard. Within 2-5 seconds the connection was established and everything was working 100%.

After using this for a few days, I checked to make sure the computer was powerful enough to provide the required services, and it was, without any problems.

Just wanted to share my experience in case anyone else is doing something similar. If you were to reproduce this, all the hardware should come in under $700.00 CAD.

Oct 04, 2010

For the first time, I decided to test compatibility of non-HP-branded drives inside the HP MSA20 array.

First I installed a 500GB SATA2 Seagate drive I had lying around. It was detected perfectly and is working.

The ACU on CentOS 5.5 shows as follows:

Logical Drive
Status    OK
Drive Number    1
Drive Unique ID    XXXXXXXXXXXXXXXXXXXXXXXXXXX
Size    476908 MB
Fault Tolerance    RAID 0
Heads    255
Sectors/Track    32
Cylinders    65535
Strip Size    128 KB
Array Accelerator    Enabled
Disk Name    /dev/cciss/c0d0

I also populated another bay with a Seagate 2TB SATA2 drive I had just found, and it works as well. ACU reports as follows:

Physical Drive
Status    OK
Drive Configuration Type    Unassigned
Size    2000.3 GB
Drive Type    SATA
Model    Seagate ST32000542AS
Serial Number    XXXXXXXX
Firmware Version    CC34

Keep in mind that no matter what size of drives you have, there is a 32-bit SCSI LUN limitation of 2TiB (the controller can only address 2^32 sectors of 512 bytes each). Although you can create larger arrays, your logical drives will not be able to exceed this limit.
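
For the curious, the exact ceiling falls out of 32-bit block addressing with 512-byte sectors:

```python
# The 32-bit LUN limit worked out: 2^32 addressable 512-byte sectors.
sectors = 2 ** 32
sector_size = 512
limit_bytes = sectors * sector_size

print(limit_bytes)                                # 2199023255552 bytes
print(limit_bytes / 1024 ** 4, "TiB")             # exactly 2.0 TiB
print(round(limit_bytes / 1000 ** 4, 2), "TB")    # ~2.2 TB (decimal)
```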

Oct 03, 2010

I’ve been messing around with one old array that has had a slew of problems for some time now.

One thing I just figured out today: even if all the drives in an MSA20 have a green light and are NOT marked as failed, the drives can still be in very bad shape.

Example: I have 5 X 250GB SATA drives, and EACH of the 5 drives had failed. The MSA20 said the drives were good, yet multiple I/O errors were showing up in the kernel logs on a CentOS 5 install. Taking the drives out and testing them on a different system, NOT using the MSA20, showed that the drives were physically dead. Furthermore, I popped a random 500GB SATA drive I had sitting around into the MSA20, and now it works beautifully!
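
If you want to sanity-check suspect disks outside the MSA20 the same way, smartmontools makes it quick. A minimal sketch follows; the device paths are examples, so substitute the ones your test system assigns.

```python
# Quick SMART health pass over disks pulled from the MSA20, using
# smartctl from smartmontools. Device paths are examples only.
import subprocess

DISKS = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]

for disk in DISKS:
    result = subprocess.run(
        ["smartctl", "-H", disk],  # overall health self-assessment
        capture_output=True, text=True,
    )
    status = "PASSED" if "PASSED" in result.stdout else "SUSPECT/FAILED"
    print(f"{disk}: {status}")
```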

Sep 22, 2010

Just a note for you guys: I was searching for this, and it was incredibly hard to find any information, so I thought I’d create something that could be easily indexed when searched for.

Symptoms:

You try to upgrade the processor on a DL360 G5 to a Quad-Core processor and you receive the message:

“The revision of the Intel (R) 5000 series chipset on the system board does not support the installed processor type. System halted!”

Cause:

There are two separate motherboards that were shipped with the DL360 G5. The first version of the motherboard only supported 50xx and 51xx dual-core Xeon processors and shipped with the earlier units.

The second version supports 54xx quad-core Xeon processors.

Part Numbers:

412199-001 – Only supports 50xx and 51xx Dual-Core processors

436066-001 – Supports up to 54xx Quad-Core processors

435949-001 – (Not confirmed) Supports up to 54xx Quad-Core processors (updated March 22nd, 2011)

Aug 23, 2010

Well,

Received my FREE (yes, free) HP Mini 210 today from HP BlueCarpet (a rewards program for HP Partners for selling stuff). I have to say this device is SHAWEET. Way better than my Acer One.

Anyways, to cut to the chase: I’ve got info, and I’ve reviewed every service manual and PDF doc there is. I believe I can get 3G running on this puppy using the PROPER equipment (utilizing the built-in SIM slot, all HP).

Two separate things are required: an HP 3G/HSPA/HSDPA/GSM modem, and the WWAN antennas to be mounted in the display. I’m ordering the parts soon; I’ll post a how-to and keep you updated on my progress.

And this device is definitely worth a review (it’s amazing)! Keep posted for that as well.