Jun 07, 2014
 

Well, I've had the HP MSA 2040 set up, configured, and running for about a week now. Thankfully, this weekend I had some time to run some benchmarks.

 

First some info on the setup:

-2 X HP Proliant DL360p Gen8 Servers (2 X 10 Core processors each, 128GB RAM each)

-HP MSA 2040 Dual Controller – Configured for iSCSI

-HP MSA 2040 is equipped with 24 X 900GB SAS Dual Port Enterprise Drives

-Each host is directly attached via 2 X 10Gb DAC cables (Each server has 1 DAC cable going to controller A, and Each server has 1 DAC cable going to controller B)

-2 vDisks are configured, each owned by a separate controller

-Disks 1-12 configured as RAID 5 owned by Controller A (512K Chunk Size Set)

-Disks 13-24 configured as RAID 5 owned by Controller B (512K Chunk Size Set)

-While round robin is configured, only one optimized path exists (so only one path is actively used) from each host to the datastore I tested (see the PowerCLI sketch after this list)

-Utilized “VMWare I/O Analyzer” (https://labs.vmware.com/flings/io-analyzer) which uses IOMeter for testing

-Running 2 “VMWare I/O Analyzer” VMs as worker processes. Both workers are testing at the same time, testing the same datastore.
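
For reference, setting the path selection policy to round robin on the MSA LUNs can be scripted instead of clicking through each LUN in the vSphere client. This is only a rough sketch (the vCenter name is a placeholder and it assumes VMware PowerCLI is installed); since the MSA presents the owning controller's paths as optimized, round robin should still only rotate across those optimized paths.

# Rough PowerCLI sketch - set Round Robin on the HP MSA LUNs for every host
# "vcenter.lab.local" is a placeholder; assumes VMware PowerCLI is installed
Connect-VIServer -Server "vcenter.lab.local"

foreach ($vmhost in Get-VMHost) {
    # Match the MSA LUNs by vendor string so local disks are left alone
    Get-ScsiLun -VmHost $vmhost -LunType disk |
        Where-Object { $_.Vendor -match "HP" } |
        Set-ScsiLun -MultipathPolicy RoundRobin
}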

 

Sequential Read Speed:

Max Read: 1480.28 MB/sec

 

Sequential Write Speed:

Max Write: 1313.38 MB/sec

 

See below for IOPS (Max Throughput) testing:

Please note: The MaxIOPS and MaxWriteIOPS workloads were used. These workloads don't have any randomness, so I'm assuming the cache module answered all the I/O requests; however, I could be wrong. Tests were run for 120 seconds. In effect, this is more a test of what the controller itself can handle over a single 10Gb link from the controller to the host.

 

IOPS Read Testing:

Max Read IOPS: 70679.91 IOPS

 

IOPS Write Testing:

Max Write IOPS: 29452.35 IOPS

 

PLEASE NOTE:

-These benchmarks were done by 2 separate worker processes (1 running on each ESXi host) accessing the same datastore.

-I was running a VMWare vDP replication in the background (My bad, I know…).

-Sum is the combined throughput of both hosts; Average is the per-host throughput.

 

Conclusion:

Holy crap this is fast! I'm betting the speed limit I'm hitting is the 10Gb interface. I need to get some more paths set up to the SAN!

Cheers

 

May 28, 2014
 

In the last few months, my company (Digitally Accurate Inc.) and our sister company (Wagner Consulting Services) have been working on a number of cool new projects. As a result, we needed to purchase more servers and implement an enterprise-grade SAN.

 

For the server, we just purchased another HP Proliant DL360p Gen8 (with 2 X 10 Core processors and 128GB of RAM, exactly the same as our existing server); however, I won't be getting into that in this blog post.

 

Now for storage, we decided to pull the trigger and purchase an HP MSA 2040 Dual Controller SAN. We purchased it as a CTO (Configure to Order) and loaded it up with 4 X 1Gb iSCSI RJ45 SFP+ modules (there's a minimum requirement of one 4-pack of SFPs) and 24 X HP 900GB 2.5-inch 10K RPM SAS Dual Port Enterprise drives. Even though we have the four 1Gb iSCSI modules, we aren't using them to connect to the SAN. We also placed an order for 4 X 10Gb DAC cables.

 

To connect the SAN to the servers, we purchased 2 X HP Dual Port 10Gb Server SFP+ NICs, one for each server. The SAN will connect to each server with 2 X 10Gb DAC cables, one going to Controller A, and one going to Controller B.

 

I must say that configuration was an absolute breeze. As always, using Intelligent Provisioning on the DL360p, we had ESXi up and running in no time, installed to the onboard 8GB microSD card.

 

I'm completely new to the MSA 2040 SAN and have never played with or configured one before. After turning it on, I immediately went to HP's website and downloaded the latest firmware for both the drives and the controllers themselves. It's a well-known fact that to enable iSCSI on the unit, the controllers have to be running the latest firmware version.

 

Turning on the unit, I noticed the management NIC on the controllers quickly grabbed an IP from my DHCP server. Logging in, I found the web interface extremely easy to use. Right away I went to the firmware upgrade section, and uploaded the appropriate firmware file for the 24 X 900GB drives. The firmware took seconds to flash. I went ahead and restarted the entire storage unit to make sure that the drives were restarted with the flashed firmware (a proper shutdown of course).

 

While you can update the controller firmware with the web interface, I chose not to, as HP provides a Windows executable that connects to the management interface and updates both controllers. Even though I didn't have the unit configured yet, it's a very interesting process. You can do live controller firmware updates with a Dual Controller MSA 2040 (as in no downtime). The way it works: the firmware update utility first updates Controller A. If you have a multipath configuration where your hosts are configured to use both controllers, all I/O is passed to the other controller while the firmware update takes place. When it is complete, I/O resumes on that controller and the firmware update then takes place on the other controller. This allows you to do online firmware updates that result in absolutely ZERO downtime. Very neat! PLEASE REMEMBER, this does not apply to drive firmware updates. When you update the hard drive firmware, there can be ZERO I/O occurring. You'd want to make sure all your connected hosts are offline and no software connection exists to the SAN.

 

Anyways, the firmware update completed successfully. Now it was time to configure the unit and start playing. I read through a couple quick documents on where to get started. If I did this right the first time, I wouldn’t have to bother doing it again.

 

I used the available wizards to first configure the actual storage, and then handle provisioning and mapping to the hosts. When deploying a SAN, you should always write down and create a map of your Storage Area Network topology. It helps when it comes time to configure, and really helps reduce mistakes in the configuration. I quickly jotted down the IP configuration for the various ports on each controller and the IPs I was going to assign to the NICs on the servers, and drew out a quick diagram of how everything would connect.

 

Since the MSA 2040 is a Dual Controller SAN, you want to make sure that each host can directly access both controllers. Therefore, in my configuration with a 2-port NIC, port 1 on the NIC connects to a port on controller A of the SAN, while port 2 connects to controller B. When you do this and configure all the software properly (VMWare in my case), you end up with a configuration that allows load balancing and fault tolerance. Keep in mind that in the Active/Active design of the MSA 2040, each controller has ownership of its configured vDisk. Most I/O for a vDisk goes through its owning controller, but in the event that controller goes down, the vDisk fails over to the other controller and I/O proceeds uninterrupted until you resolve the fault.

 

First, I had to run the configuration wizard and set the various environment settings. This includes time, management port settings, unit names, friendly names, and, most importantly, host connection settings. I configured all the host ports for iSCSI and set the applicable IP addresses from the SAN topology document mentioned above. Although the host ports can sit on the same subnet, it is best practice to use multiple subnets.

 

Jumping into the storage provisioning wizard, I decided to create 2 separate RAID 5 arrays. The first array contains disks 1 to 12 (and while I have controller ownership set to auto, it will be assigned to controller A), and the second array contains disks 13 to 24 (again, ownership is set to auto, but it will be assigned to controller B). After this, I assigned the LUN numbers and then mapped the LUNs to all ports on the MSA 2040, ultimately allowing access to both iSCSI targets (and RAID volumes) from any port.

 

I’m now sitting here thinking “This was too easy”. And it turns out it was just that easy! The RAID volumes started to initialize.

 

At this point, I jumped on to my vSphere demo environment and configured the distributed switches for iSCSI. I mapped the various uplinks to the various port groups and confirmed that there was hardware link connectivity. I jumped into the software iSCSI initiator, typed in the discovery IP, and BAM! The iSCSI initiator found all available paths and both RAID disks I had configured. I did this for the other host as well, connected to the iSCSI target, formatted the volumes as VMFS, and I was done!
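
If you prefer scripting it, the same iSCSI setup can be done with PowerCLI. This is only a rough sketch: the host name, discovery IP, and datastore name are placeholders, and it assumes PowerCLI is installed and already connected to vCenter.

# Rough PowerCLI sketch of the software iSCSI setup described above
# Host name, discovery IP, and datastore name are placeholders
$vmhost = Get-VMHost -Name "esxi01.lab.local"

# Enable the software iSCSI initiator on the host
Get-VMHostStorage -VMHost $vmhost | Set-VMHostStorage -SoftwareIScsiEnabled $true

# Point dynamic discovery at one of the MSA 2040 host ports
$hba = Get-VMHostHba -VMHost $vmhost -Type IScsi
New-IScsiHbaTarget -IScsiHba $hba -Address "10.0.50.10" -Port 3260 -Type Send

# Rescan, grab the first HP LUN, and format it as VMFS
Get-VMHostStorage -VMHost $vmhost -RescanAllHba -RescanVmfs | Out-Null
$lun = Get-ScsiLun -VmHost $vmhost -LunType disk |
    Where-Object { $_.Vendor -match "HP" } | Select-Object -First 1
New-Datastore -VMHost $vmhost -Name "MSA2040-vDisk1" -Path $lun.CanonicalName -Vmfs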

 

I'm still shocked that such a high-performance and powerful unit was this easy to configure and get running. I've had it running for 24 hours now and have had no problems. It DESTROYS my old storage configuration in performance; thankfully, I can keep my old setup for a vDP (VMWare Data Protection) instance.

 

I’ve attached some pics below. I have to apologize for how ghetto the images/setup is. Keep in mind this is a test demo environment for showcasing the technologies and their capabilities.

 

HP MSA 2040 SAN – Front Image

HP MSA 2040 – Side Image

HP MSA 2040 SAN with drives – Front Right Image

HP MSA 2040 Rear Power Supply and iSCSI Controllers

HP MSA 2040 Dual Controller – Rear Image

HP MSA 2040 Dual Controller SAN – Rear Image

HP Proliant DL 360p Gen8
HP MSA 2040 Dual Controller SAN

HP MSA 2040 – With Power

HP MSA 2040 – Side shot with power on

HP Proliant DL360p Gen8 – UID LED on

HP Proliant DL360p Gen8
HP MSA 2040 Dual Controller SAN
VMWare vSphere

May 31, 2013
 

Back in February, I was approached by a company that had multiple offices. They wanted my company to come in and implement a system that would let them share information, share files, communicate, and use their line-of-business applications, and that would be easy to manage.

The first thing that always comes to mind is Microsoft Small Business Server 2011. However, what made this environment interesting is that they had two branch offices in addition to their headquarters, all in different cities. One of their branch offices had 8+ users working out of it, one only had a couple, and their main headquarters had 5+ users.

Usually when administrators think of SBS, they think of a single-server (two servers with the premium add-on) solution that provides a small business of up to 75 users with a stable, enterprise-feature-packed IT infrastructure.

SBS 2011 Includes:

Windows Server 2008 R2 Standard

Exchange Server 2010

Microsoft SharePoint Foundation 2010

Microsoft SQL Server 2008 R2 Express

Windows Server Update Services

(And an additional Server 2008 R2 license with Microsoft SQL Server 2008 R2 Standard if the premium add-on is purchased)

 

Essentially this is all a small business typically needs, even if they have powerful line of business applications.

One misconception about Windows Small Business Server is the limitation of having a single domain controller. IT professionals often think that you cannot have any more domain controllers in an SBS environment. This actually isn't true. SBS does allow multiple domain controllers, as long as there is a single forest and not multiple domains. You can have a backup domain controller, and you can have multiple RODCs (Read Only Domain Controllers), as long as the primary Active Directory (FSMO) roles stay with the SBS primary domain controller. You can have as many global catalogs as you'd like, as long as you pay for the proper licenses for all the additional servers :)

This is where this came in handy. While I've known about this for some time, this was the first time I was attempting to put something like this into production.

 

The plan was to set up SBS 2011 Premium at the HQ along with a second server at the HQ hosting their SQL, line-of-business applications, and Remote Desktop Services (formerly Terminal Services) applications. Their HQ would be sitting behind an Astaro Security Gateway 220 (Sophos UTM).

The SBS 2011 Premium (2 Servers) setup at the HQ office will provide:

-Active Directory services

-DHCP and DNS Services

-Printing and file services (to the HQ and all branch offices)

-Microsoft Exchange

-"My Documents" and "Desktop" redirection for client computers/users

-SQL DB services for LoBs

-Remote Desktop Services (Terminal Services) to push applications out in to the field

 

The first branch office will have a Windows Server 2008 R2 server, promoted to a Read Only Domain Controller (RODC), sitting behind an Astaro Security Gateway 110. The Astaro Security Gateways will establish a site-to-site branch VPN between the two offices and route the appropriate subnets. The first branch office has connectivity issues (they're in the middle of nowhere), so they will have two internet connections with two separate ISPs (one line-of-sight long-range wireless backhaul and one simple ADSL connection), for which the ASG 110 will provide load balancing and fault tolerance.

The RODC at the first branch office will provide:

-Active Directory services for (cached) user logon and authentication

-Printing and file services (for both HQ and branch offices)

-DHCP and DNS services

-“My Documents” and “Desktop” redirection for client computers/users.

-WSUS replica server (replicates approvals and updates from WSUS on the SBS server at the main office).

-Exchange access (via the VPN connection)

Users at the first branch office will be accessing file shares located both on their local RODC and on the HQ server in Calgary. The main wireless backhaul has more than enough bandwidth to support SMB shares over the VPN connection. After testing, it turns out the backup ADSL connection also handles this fairly well for the types of files they will be accessing.

 

The second branch office will have an Astaro RED device (Remote Ethernet Device). The Astaro/Sophos RED devices act as a remote ethernet port for your Astaro Security Gateways. Once configured, it's as if the ASG at the HQ has an ethernet cable running to the branch office. It's similar to a VPN, however (I could be wrong) I think it uses EoIP (Ethernet over IP). The second branch doesn't require a domain controller due to the small number of users. As far as this branch office goes, this is the last we'll talk about it, as there's no special configuration required for these guys.

The second branch office will have the following services:

-DHCP (via the ASG 220 in Calgary)

-DNS (via the main HQ SBS server)

-File and print services (via the HQ SBS server and other branch server)

-"My Documents" and "Desktop" redirection (over the WAN via the HQ SBS server)

-Exchange access (via the Astaro RED device)

 

For all the servers, we chose HP hardware as always! The main SBS server and the RODC were brand new HP Proliant ML350p Gen8s. The second server at the HQ (running the premium add-on) is a re-purposed HP ML110 G7. I always configure iLO on all servers (especially remote servers) just so I can troubleshoot issues in the event of an emergency if the OS is down.

 

So now that we've gone through the plan, I'll explain how it was all implemented.

  1. Configure and set up a typical SBS 2011 environment. I'm going to assume you already know how to do this. You'll need to install the OS, run through the SBS configuration wizards, enable all the proper firewall rules, configure users, install applicable server applications, etc…
  2. Configure the premium add-on. Install the Remote Desktop Services role (please note that you'll need to purchase RDS CALs as they aren't included with SBS). You can skip this step if you don't plan on using RDS or the premium server at the main site.
  3. Configure all the Astaro devices. Configure a Router to Router VPN connection. Create the applicable firewall rules to allow traffic. You probably know this, but make sure both networks have their own subnet and are routing the separate subnets properly.
  4. Install Windows Server 2008 R2 on to the target RODC box. (Please note, in my case I had to purchase an additional Server 2008 license since I was already using the premium add-on at the HQ site. If you purchase the premium add-on but aren't using it at your main office, you can use this license at the remote site.)
  5. Make sure the VPN is working and the servers can communicate with each other.
  6. Promote the target RODC to a read only domain controller. You can launch the famous dcpromo. Make sure you check the "Read Only domain controller" option when you promote the server (see the unattended example after this list).
  7. You now have a working environment.
  8. Join computers using the SBS connect wizard. (DO NOT LOG ON AS THE REMOTE USERS UNTIL YOU READ THIS ENTIRE DOCUMENT)
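
For step 6, if you'd rather script the promotion than click through dcpromo, an unattended run looks roughly like this. This is a sketch from memory: the domain, site, account, password, and file path are placeholders, so verify the [DCInstall] keys against the dcpromo unattend documentation before using it.

# Rough sketch of an unattended RODC promotion (run on the branch server)
# Domain, site, account, and password values below are placeholders
$answer = @"
[DCInstall]
ReplicaOrNewDomain=ReadOnlyReplica
ReplicaDomainDNSName=corp.example.local
SiteName=Branch01
InstallDNS=Yes
ConfirmGc=Yes
UserDomain=corp.example.local
UserName=sbsadmin
Password=*
SafeModeAdminPassword=SomeStrongPassword1!
RebootOnCompletion=Yes
"@

Set-Content -Path C:\rodc-unattend.txt -Value $answer
dcpromo /unattend:C:\rodc-unattend.txt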

I did all the above steps at my office and configured the servers before deploying them at the client site.

You essentially have a basic working network now. Time for the tricky stuff! The tricky stuff is enabling folder redirection for the branch site to its own server (instead of the SBS server), and getting the branch its own WSUS replica server.

 

Now to the fancy stuff!

1. Installing WSUS on the RODC using the add role feature in Windows Server: You have to remember that RODCs are exactly what they say: READ ONLY (as far as Active Directory goes)! Installing WSUS on an RODC will fail off the bat. It will report that access is denied when trying to create certain security groups. You'll have to manually create these two groups in Active Directory on your primary SBS server to get it to work:

  • SQLServer2005MSFTEUser$RODCSERVERNAME$Microsoft##SSEE
  • SQLServer2005MSSQLUser$RODCSERVERNAME$Microsoft##SSEE

Replace RODCSERVERNAME with the computer name of your RODC server. You'll notice that two similar groups (with a different server name) already exist for the existing Windows SBS WSUS install; those existing groups are for the main WSUS server. After creating these groups, the install will be able to proceed. After it is complete, follow through the WSUS configuration wizard to configure it as a replica of your primary SBS WSUS server.
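
If you want to script it, creating those two groups from the SBS server looks roughly like this. It's only a sketch: it assumes the Active Directory PowerShell module is available, the RODC name and target container are placeholders, and the domain-local scope is my assumption (adjust it if the WSUS install still complains).

# Rough sketch - pre-create the security groups the WSUS role install expects
# RODC name and target container are placeholders; group scope is an assumption
Import-Module ActiveDirectory

$rodc = "BRANCHDC01"
$groups = @(
    'SQLServer2005MSFTEUser$' + $rodc + '$Microsoft##SSEE',
    'SQLServer2005MSSQLUser$' + $rodc + '$Microsoft##SSEE'
)

foreach ($name in $groups) {
    New-ADGroup -Name $name -GroupScope DomainLocal -GroupCategory Security `
        -Path "CN=Users,DC=corp,DC=example,DC=local"
}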

2. One BIG thing to keep in mind is that with RODCs you need to configure which accounts (both user and computer) are allowed to be "cached". Cached credentials allow the RODC to authenticate computers and users in the event the primary domain controller is down. If you do not configure this, then if the internet goes down or the primary domain controller isn't available, no one will be able to log in to their computers or access network resources at the branch site. Once an RODC exists in the domain, two groups are available in Active Directory for this: "Allowed RODC Password Replication Group" and "Denied RODC Password Replication Group". You can't just select and add users to these groups; you also need to add the computers they use, since computers have their own "computer account" in Active Directory.

To overcome this, create two new security groups: one for the users of the branch office and one for the computers of the branch office. Make sure to add the applicable users and computers as members of these groups. Now add those two new security groups to the built-in "Allowed RODC Password Replication Group". This will allow remote users and remote computers to authenticate using cached credentials. PLEASE NOTE: DO NOT CACHE YOUR ADMINISTRATIVE ACCOUNT!!! Instead, create a separate administrative account for that remote office and cache that.
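
Scripted, that step looks roughly like this. Again a sketch: the group names, OU paths, and member names are placeholders, and it assumes the standard "Allowed RODC Password Replication Group" present in 2008-level domains.

# Rough sketch - branch groups that are allowed to have credentials cached on the RODC
# Group names, OU paths, and member names below are placeholders
Import-Module ActiveDirectory

New-ADGroup -Name "Branch01 Cached Users" -GroupScope Global -GroupCategory Security `
    -Path "OU=SBSUsers,OU=Users,OU=MyBusiness,DC=corp,DC=example,DC=local"
New-ADGroup -Name "Branch01 Cached Computers" -GroupScope Global -GroupCategory Security `
    -Path "OU=SBSComputers,OU=Computers,OU=MyBusiness,DC=corp,DC=example,DC=local"

# Add the branch users and their computer accounts (computer sAMAccountNames end with $)
Add-ADGroupMember -Identity "Branch01 Cached Users" -Members 'jsmith', 'bjones'
Add-ADGroupMember -Identity "Branch01 Cached Computers" -Members 'BR01-PC01$', 'BR01-PC02$'

# Let the RODC cache credentials for both groups
Add-ADGroupMember -Identity "Allowed RODC Password Replication Group" `
    -Members "Branch01 Cached Users", "Branch01 Cached Computers"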

3. One of the sweet things about SBS is all the pre-configured Group Policy objects that enable the automatic configuration of the WSUS server, folder redirection, and a bunch of other great stuff. Keep in mind that with the configuration above, if left alone at this point, the computers in the branch office will use the folder redirection settings and WSUS settings from the main office. Remote users' folder redirection locations (whatever you have selected; in my case, My Documents and Desktop) will be stored on the main HQ server. If you're alright with this and not concerned about the size of the user folders, you can leave it. What I needed to do (for simple disaster recovery reasons) was have folder redirection for the branch office users point to their own local branch server. We also need the branch computers to use the local branch WSUS server (we don't want each computer pulling updates over the VPN connection, as this would use up tons of bandwidth). What's really neat is that when users open applications via RemoteApp (over RDS) and export files to their desktop inside of RemoteApp, the files are immediately available on their computer desktop, since the RDS server uses these same GPOs.

To do this, we’ll need to duplicate and modify a couple of the default GPOs, and also create some OU (Organizational Unit) containers inside of Active Directory so we can apply the new GPOs to them.

First, under “SBSComputers” create an OU called “Branch01Comps” (or call it whatever you want). Then under “SBSUsers” create an OU called “Branch01Users”. Now keep in mind you want to have this fully configured before any users log on for the first time. All of this configuration should be done AFTER the computer is joined (using the SBS connect) to the domain and AFTER the users are configured, but BEFORE the user logs in for the first time. Move the branch office computer accounts to the new Branch office computers OU, and move the Branch office user accounts to the Branch office users OU.

Now open up the Group Policy Management Console. You want to duplicate 2 GPOs: "Update Services Common Settings Policy" (rename the duplicate to "Branch Update Services Common Settings Policy" or something), and "Small Business Server Folder Redirection Policy" (rename the duplicate to "Branch Folder Redirection" or something).

Link the new duplicated Update Services policy to the branch computers OU we just created, and link the new duplicated Folder Redirection policy to the branch users OU we just created.

Modify the duplicated server update policy to reflect the address of the new branch WSUS replica server. Computers at the branch office will now pull updates from that server.
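
The OU and GPO plumbing from the last few paragraphs can also be scripted from the SBS box with the ActiveDirectory and GroupPolicy modules. Again, just a rough sketch: the OU paths, account names, and the branch WSUS URL are placeholders for your own environment.

# Rough sketch - branch OUs, duplicated GPOs, links, and the branch WSUS address
# OU paths, account names, and the WSUS URL are placeholders
Import-Module ActiveDirectory
Import-Module GroupPolicy

$compOU = "OU=Branch01Comps,OU=SBSComputers,OU=Computers,OU=MyBusiness,DC=corp,DC=example,DC=local"
$userOU = "OU=Branch01Users,OU=SBSUsers,OU=Users,OU=MyBusiness,DC=corp,DC=example,DC=local"

New-ADOrganizationalUnit -Name "Branch01Comps" -Path "OU=SBSComputers,OU=Computers,OU=MyBusiness,DC=corp,DC=example,DC=local"
New-ADOrganizationalUnit -Name "Branch01Users" -Path "OU=SBSUsers,OU=Users,OU=MyBusiness,DC=corp,DC=example,DC=local"

# Move the branch accounts into their new OUs
Get-ADComputer "BR01-PC01" | Move-ADObject -TargetPath $compOU
Get-ADUser "jsmith" | Move-ADObject -TargetPath $userOU

# Duplicate the two SBS policies and link the copies to the branch OUs
Copy-GPO -SourceName "Update Services Common Settings Policy" -TargetName "Branch Update Services Common Settings Policy"
Copy-GPO -SourceName "Small Business Server Folder Redirection Policy" -TargetName "Branch Folder Redirection"
New-GPLink -Name "Branch Update Services Common Settings Policy" -Target $compOU
New-GPLink -Name "Branch Folder Redirection" -Target $userOU

# Point the branch computers at the branch WSUS replica
Set-GPRegistryValue -Name "Branch Update Services Common Settings Policy" `
    -Key "HKLM\Software\Policies\Microsoft\Windows\WindowsUpdate" `
    -ValueName "WUServer" -Type String -Value "http://branchwsus"
Set-GPRegistryValue -Name "Branch Update Services Common Settings Policy" `
    -Key "HKLM\Software\Policies\Microsoft\Windows\WindowsUpdate" `
    -ValueName "WUStatusServer" -Type String -Value "http://branchwsus"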

As for folder redirection, it's a bit tricky. You'll need to create a share (with full share access for all users), and then set special file permissions on the folder that you shared (info available at http://technet.microsoft.com/en-us/library/cc736916%28v=ws.10%29.aspx). On top of that, you'll need a way to actually create the child user folders under the share you created. I did this by going into Active Directory, opening each remote user, and pointing their profile/home folder field at the file share. When I hit apply, this would create a folder named after the user, with the applicable permissions, under that share; after this was done, I would undo the setting and the directory would stay. Repeat this for each remote user at that specific branch office. You'll need to do this each time they bring on new staff, and you'll also need to add all new computers and users to the appropriate OUs and security groups we created above.
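
If you'd rather not use the profile-field trick for every user, a small loop can pre-create the per-user folders with NTFS permissions. This is a sketch only: the share path, domain, and user names are placeholders, and you should double-check the exact ACLs against the TechNet article linked above.

# Rough sketch - pre-create per-user redirection folders on the branch server
# Share path, domain, and user list are placeholders; verify ACLs against the TechNet article
$root = "D:\BranchRedirects"
$users = "jsmith", "bjones"

foreach ($user in $users) {
    $folder = Join-Path $root $user
    New-Item -Path $folder -ItemType Directory -Force | Out-Null

    # Give each user full control of their own folder (inherited by subfolders and files)
    icacls $folder /grant "CORP\${user}:(OI)(CI)F" | Out-Null
}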

FINALLY, you can now go into the GPO you duplicated for branch folder redirection. Modify the GPO to reflect the new storage path for the redirection objects you want (just a matter of changing the server name).

4. Configure Active Directory Sites and Services. You'll need to go into Active Directory Sites and Services and configure sites for each subnet you have (your main HQ subnet, the branch 1 subnet, and the branch 2 subnet), and assign the applicable domain controller to each site. In my case, I created 3 sites, configured the HQ subnet and the second branch to authenticate off the main SBS PDC, and configured the first branch (with their own RODC) to authenticate off their own RODC. Essentially, this tells the computers which domain controller they should be authenticating against.
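
For illustration, the equivalent sites and subnets look roughly like this in PowerShell. One caveat: the New-ADReplicationSite/Subnet cmdlets come from the Server 2012-era ActiveDirectory module, so on the SBS 2011/2008 R2 boxes themselves this was done through the GUI; the site names and subnets below are also placeholders.

# Rough sketch of the equivalent sites and subnets (requires the 2012-era AD module;
# on 2008 R2/SBS 2011 this is done in the Sites and Services GUI instead)
Import-Module ActiveDirectory

New-ADReplicationSite -Name "HQ"
New-ADReplicationSite -Name "Branch01"
New-ADReplicationSite -Name "Branch02"

# Placeholder subnets - tie each office's subnet to its site
New-ADReplicationSubnet -Name "192.168.10.0/24" -Site "HQ"
New-ADReplicationSubnet -Name "192.168.20.0/24" -Site "Branch01"
New-ADReplicationSubnet -Name "192.168.30.0/24" -Site "Branch02"

# Move the branch RODC into its site (server name is a placeholder)
Move-ADDirectoryServer -Identity "BRANCHDC01" -Site "Branch01"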

 

And you're done! (I don't think I've forgotten anything.) A few things to remember: whenever adding new users and/or computers to the branch, ALWAYS join using the SBS wizard, add the computer to the branch OU, add the user to the branch OU, create the user's master redirection folder using the profile field on the AD user object, and separately add both the user and computer accounts as members of the security groups we created for credential caching.

And remember, always always always test your configuration before throwing it out into production. In my case, I got it running on the first try without any problems, but I let it run as a test environment for over a month before deploying it to production!

 

We've had this environment running for months now and it's working great. What's even cooler is how well the Astaro Security Gateway (Sophos UTM) has handled the multiple WAN connections during failures; it's super slick!

Feb 20, 2013
 

Recently it was time to refresh a client's disaster recovery solution. We were getting ready to retire our 5-year-old HP MSL2024 with an LTO-4 tape drive and implement a new HP MSL2024 library with a SAS LTO-6 tape drive. We need to use tape since a full backup is over 6TB.

The server connected to all this equipment is an HP Proliant DL360 G6 with an HP Smart Array P800 controller. The P800 already has an HP StorageWorks MSA60 unit attached to it with 12 drives.

Documentation for the P800 mentioned tape drive support. While I know that the P800 is only capable of 3Gb/sec, this is more than enough, and chances are the hard drives will be maxed out reading anyway.

Anyways, the client approved the purchase, we brought in the hardware, and we installed it. First we had to install Backup Exec 2012 (since only the 2012 SP1a HCL specifies support for LTO-6), which was messy, but we did it. Then we re-configured all of our backup jobs, since the old jobs were migrated horribly.

When trying to run our first backup, the backup failed. I tried again numerous times, only to get these errors:

  • Storage device “HP 07″ reported an error on a request to rewind the media.
  • Final error: 0xe00084f0 – The device timed out.
  • Storage device “HP 07″ reported an error on a request to write data to media.
  • Storage device “HP 6″ reported an error on a request to write data to media.
  • PvlDrive::DisableAccess() – ReserveDevice failed, offline device
  • ERROR = 0x0000001F (ERROR_GEN_FAILURE)

Also, every time the backup failed, the library and the tape drive would disappear from the computer's Device Manager. Essentially the device would lose its connection. Even when logging in to the HP MSL2024 web interface, it would state the SAS port was disconnected after a backup job failed. To resolve this, you'd have to restart the library and restart the Backup Exec services. One interesting thing: when this occurred, my company's monitoring and management software would report that a RAID failure had occurred at the customer's site, until the MSL was restarted (this was kinda cool).

 

I immediately called HP support. They mentioned the library was on firmware 5.80 with an update available, and asked us to try updating. We did, and it failed since the firmware file didn't match its checksum; I was told that this wasn't important, as 5.90 didn't contain any major changes. We continued to spend 6 hours on the phone trying to disable Insight agents, check drivers, etc… Finally he decided to replace the tape drive.

Since LTO-6 is brand new technology, even with a 4-hour response contract, it took HP around 2 weeks to replace the drive since none were in Canada. During this time, I called in two more times. The second tech told me that at the moment no HP controllers support the HP LTO-6 tape drives (you're kidding me, right?), and the third said he couldn't provide me any information, as there was nothing in the documentation specifying which controllers were compatible. All 3 techs mentioned that having the P800 controller in the server hosting both the MSA60 and the MSL2024 was probably causing the issues.

We received the new tape drive, tested it, and the backups failed. I sent the replacement drive back (it was a repaired unit) and kept the original brand new one. After this I tried numerous things and googled for days. I was just about to quote the client a new controller card when I finally decided to give HP another call.

On this call, the tech escalated the issue to the engineers. Later that night I received an e-mail stating that library firmware 5.90 is required to support the LTO-6 tape drives. I was shocked, angry, etc… It turns out that library firmware 5.80 was "recalled" due to major issues a while back.

Since LTT couldn't load the firmware, I just downloaded it manually and flashed it via the MSL2024 web interface. After this, I restarted the Backup Exec services, performed an inventory, and did a minor backup (around 130GB). Keep in mind that when the backups originally failed, it didn't matter what the size was; the backup would simply fail just before it completed.
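
As an aside, bouncing all the Backup Exec services after the library comes back is quicker from PowerShell than from the services console. A minimal sketch, assuming only that the service display names all start with "Backup Exec":

# Rough sketch - restart every Backup Exec service after the library/drive reconnects
Get-Service -DisplayName "Backup Exec*" | Restart-Service -Force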

The backup completed! Later that night I ran a full backup of 5TB (2 servers and 2 MSA60s) and it completed 100% successfully. Even with the MSA60 under extreme load, maxing out the drives, this did not in any way impede the performance of the LTO-6 tape drive/library.

 

So please, if you're having this issue, consider the following:

1. The tape library must be at firmware version 5.90 to support LTO-6 tape drives. Always, always, always make sure you have the latest firmware.

2. I have a working configuration with a P800 controlling both an HP MSA60 and an HP MSL2024 backup library, and it's working 100%.

3. Make sure you have Backup Exec 2012 SP1a installed, as it's required for LTO-6 compatibility (make sure you read about the major changes when upgrading to 2012 first; I can't stress this enough!!!)

 

I hope this helps some of you out there as this was consuming my life for numerous weeks.

Nov 22, 2012
 

Just something I wanted to share in case anyone else ran into this issue…

At a specific client we have 2 X MSA60 units attached via Smart Array P800 controllers to 2 X DL360 G6 servers. These combos of server, controller, and storage unit were purchased just after they were originally released by HP.

I’m writing about a specific condition in which after a drive fails in RAID 5, during rebuild, numerous (and I mean over 70,000) event log entries in the event viewer state: “Surface analysis has repaired an inconsistent stripe on logical drive 1 connected to array controller P800 located in server slot 2. This repair was conducted by updating the parity data to match the data drive contents.”

 

On one of these arrays, shortly after a successful rebuild, while the event viewer was spitting these errors out, another drive failed. At this point the RAID array went offline, and the entire RAID array and all its contents were unrecoverable. Keep in mind this occurred after the rebuild, while a surface scan was in progress. In this specific case we rebuilt the array, restored from backup, and all was good. After mentioning this to HP support techs, they said it was safe to ignore these messages as they were informational (I didn't feel this was the case). After creating the new RAID array on this specific unit, we never saw these messages on that unit again.

On the other MSA60 unit, however, we regularly received these messages (we always keep the firmware of the MSA60 unit and the P800 controller up to date). Again, we asked HP support numerous times and they said we could safely ignore them. Recently, during a power outage, the P800 controller flagged its cache batteries as failed; at the same time a drive failed, and we were yet again presented with these errors after the rebuild. After getting the drive replaced, I contacted HP again and finally insisted that they investigate the event log errors. This time, new errors about parity were showing up in the event viewer.

After being put on hold for some time, they came back and mentioned that these errors are probably caused because the RAID array was created with a very early firmware version. They recommended deleting the logical array and re-creating it with the latest firmware to avoid any data loss. I specifically asked if there was a chance that the array could fail due to these errors, and the fact it was created with an early firmware version, and they confirmed it. I went ahead, created backups, deleted the array, re-created it, restored the backup, and the errors are no longer present.

 

I just wanted to create this blog post, as I see numerous people searching for the meaning of these errors, and wanted to shed some light and maybe help a few of you avoid future catastrophic problems!

Jun 03, 2012
 

Well, for the longest time I have been running a vSphere 4.x cluster (1 X ML350 G5, 2 X DL360 G5) off of a pair of HP MSA20s connected to a SuperMicro server running Lio-Target as an iSCSI target on CentOS 6.

This configuration has worked perfectly for me for almost a year, requiring absolutely no maintenance/work at all.

Recently I moved, so I had to move all my servers, storage units, etc… When I got into the new place and went to power everything up, I noticed that the first drive in one of the MSA20 units had failed on initialization. I replaced this drive, let it rebuild, and thought this would be the end of the issue, but I was incorrect. (Just so everyone knows, these units had been on continuously for 8+ months before I turned them off to move.)

For months since this happened and a successful rebuild, at times of high I/O (backing up to an NFS share using GhettoVCB), the logical drive in the array just disappears. I have each MSA20 connected to its own HP SmartArray 6400/6402 controller. When the logical drive disappears, I notice that the "Drive Failure" LED on the SA 640x controller illuminates. When this happens, I have to shut off all physical servers, the storage server (running Lio-Target), and the MSAs, and restart everything.

Sometimes it is worse than others (for example, I've been dealing with this issue non-stop all weekend with no sleep). Even under low I/O, I'll be starting the VMs and it will just lose it again; other times, I can run it for weeks as long as I keep the I/O to a minimum.

I’ve read numerous other articles and posts of other people having the same issue. These people have had HP replace every single item inside of the MSA20 unit (except the drives) and the issue still occurs. This made me start thinking.

This weekend, I NEEDED to get a backup done. While doing this, the issue occurred, and it got to the point where I couldn't even start the VMs, let alone back them up. I figured that since other people have had this issue, and since replacing all the hardware hasn't fixed it (I even moved a RAID array from one MSA20 to the other with no effect), it has to be the drives themselves.

There are two possibilities: either the drives have failed and the MSA20 isn't reporting them as failed (which I've seen happen), or the way the MSA20 creates the RAID array has issues. I ran an ADU report and carefully read the entire thing. Absolutely no issues reading/writing to the drives, and the MSA20 has no events in its log. It HAS to be the way the RAID array is created on the disks.

Please do not try this unless you know exactly what you’re doing. This is very dangerous and you can lose all your data. This is just me reporting on my experience and usage.

In desperation, I thought to myself that this all started when a drive failed and I put a new disk in the array. Since I couldn't back any of my data up, let alone even start the VMs, I decided to start behaving dangerously. I kept all my vSphere hosts offline and just turned on the MSA20 units and the SuperMicro server attached to them. I then proceeded to remove a drive, re-insert it, let it rebuild over 3 hours, and then, once healthy and rebuilt, do the same to the next drive. I have a RAID 5 array containing 4 X 500GB disks, so this actually took me a day to do (remove/re-insert, rebuild, then the next drive).

After finally removing and rebuilding each drive in the array, I decided to boot up the vSphere servers and run a backup of my 12 VMs. Not only did everything seem faster, but I completed the backup without any problems. This suggests there's a very high chance that the RAID metadata on the drives was either corrupt, damaged, or just not written very nicely. Rebuilding each drive seemed to fix this. I'll report back in a few weeks to let you know for sure that it's resolved!

 

Hope this helps someone out there!

Jan 26, 2012
 

Well, for all you people out there considering extending your MSA20 RAID Array or transforming the RAID type, but are concerned about how long it will take…

I recently added a 250GB drive to a RAID 5 array consisting of 9 X 250GB disks. Adding the drive to the array took less than 8 hours (it could have been WAY less). Extending the logical partition took no time at all.

One thing I do have to caution about, though: I did a test transformation converting a RAID 5 array to RAID 6. It started off going fast, but once it hit 25% it sat there, only increasing 1% every 1-2 days. After 4 days I finally killed the transformation. PLEASE NOTE: There is a chance this had to do with a damaged drive, and I think that may have been part of the issue; this will need further testing. Also, just so you are aware, you CANNOT cancel a transformation. I stopped mine by simply turning off the unit, and ALL data was destroyed. If you start a transformation, you NEED to let it complete.

ALWAYS ensure you have a COMPLETE backup before doing these types of things to a RAID array!

Dec 14, 2011
 

Recently I had the task of setting up a site-to-site IPsec tunnel between my office and an employee's home office. At my main business HQ we have an Astaro Security Gateway running inside of a vSphere 4 cluster; however, I had to find the cheapest way to get the employee hooked up.

The main tasks of the VPN endpoint at the employee's site were:

1) Filter web and POP3 traffic, and provide security for the devices behind the ASG at the home office (1-3 computers, and other random devices)

2) Provide a site-to-site VPN connection to allow the user access to internal resources, along with access to our VoIP PBX (VoIP phone at the employee site)

3) Provide access to other resources such as Exchange, CRM, etc…, and allow reverse management of devices at the home office from HQ

First I needed to find an affordable computer to install the Astaro Security Gateway V8 software appliance onto. My company is an HP Partner, and we love their products, so I decided to purchase a new computer that would be powerful enough to host the ASG software and also be protected under HP's business warranty. I wanted the system to have enough performance that in the future, if the home office was decommissioned, we would still be able to use it as an ASG device for something else (say, a real remote office).

After taking a look at our distributor to find out what was immediately available (as this was a priority), we decided to pick up an HP Compaq 4000 Pro Small Form Factor PC. Below are the specs:

HP Compaq 4000 Pro Small Form Factor PC

Part Number: LA072UT (Or LA072UT#ABA for the English version in Canada)

System features
-Processor: Intel® Core™2 Duo Processor E7500 (2.93 GHz, 3 MB L2 cache, 1066 MHz FSB)
-Operating system installed: Genuine Windows® 7 Professional 32-bit
-Chipset: Intel® B43 Express
-Form factor: Small Form Factor
-PC Management: Available for free download from www.hp.com/go/easydeploy: HP Client Automation Starter; HP SoftPaq Download Manager; HP Client Catalog for Microsoft SMS; HP Systems Software Manager

Memory
-Standard memory: 2 GB 1333 MHz DDR3 SDRAM
-Memory slots: 2 DIMM

Storage
-Internal drive bays: One 3.5″
-External drive bays: One 3.5″, one 5.25″
-Internal drive: 500 GB 7200 rpm SATA 3.0 Gb/s NCQ, Smart IV
-Optical drive: SATA SuperMulti LightScribe DVD writer

Graphics
-Graphics card: Integrated Intel Graphics Media Accelerator 4500

Expansion features
-I/O ports: 8 USB 2.0, 1 serial (optional 2nd), 1 parallel (optional), 1 PS/2 keyboard, 1 PS/2 mouse, 1 VGA, 1 DVI-D, 1 microphone/headphone jack, 1 audio in, 1 audio line out, 1 RJ-45
-Slots: 2 low-profile PCI, 1 low-profile PCIe x16, 1 low-profile PCIe x1

Media devices
-Audio: Integrated High Definition audio with Realtek 2-channel ALC261 codec

Communication features
-Network interface: 10/100/1000

Power and operating requirements
-Power requirements: 240W power supply – active PFC
-Operating temperature range: 10 to 35°C

Dimensions and weight
-Product weight: Starting at 7.6 kg
-Dimensions (W x D x H): 33.8 x 37.85 x 10 cm

Security management
-Security management: Stringent Security (via BIOS); SATA Port Disablement (via BIOS); Drive Lock; Serial, Parallel, USB enable/disable (via BIOS); Optional USB Port Disable at factory (user configurable via BIOS); Removable Media Write/Boot Control; Power-On Password (via BIOS); Setup Password (via BIOS); HP Chassis Security Kit; Support for chassis padlocks and cable lock devices

What's included
-Software included: Microsoft Windows Virtual PC; HP Power Assistant
-Warranty features: Protected by HP Services, including a 3 years parts, 3 years labour, and 3 years onsite service (3/3/3) standard warranty. Terms and conditions vary by country. Certain restrictions and exclusions apply.

This system was spec'd very nicely for the requirements we had. Another huge bonus is that it was covered under a factory 3-year warranty from HP, which means that if anything failed, we would have next-business-day replacement (I love this, and so do my clients, who all purchase HP). The one downside is that the system shipped with a Windows 7 license which we wouldn't be using, but for the price of the system, it didn't really matter.

The system only came standard with one Gigabit NIC (network card); however, we needed two since this device is acting as a firewall/router. It's a Small Form Factor system, so we had to find a second network adapter compatible with the computer's case form factor. The card we purchased was:

HP – Intel Gigabit CT Desktop NIC

Part Number: FH969AA

Although the computer above is not on the compatibility list for the network card, the card still worked perfectly. Once received, we simply replaced the bracket on the card with the small-form-factor one that shipped with it.

We then burned the .ISO image of the ASG V8 software appliance, and proceeded to install it on the system. It installed (along with the 64-bit kernel) perfectly on the computer. After the install was completed, we configured it to connect to our main central Astaro Command Center and shipped the device out to the employee’s home office.

Once installed, we logged on to the Astaro Command Center user interface and created a site-to-site IPsec connection using the wizard. Within 2-5 seconds the connection was established and everything was working 100%.

After using this for a few days, I checked to make sure the computer was powerful enough to be providing the services required, and it was without any problems.

Just wanted to share my experience in case anyone else is doing something similar to what I did above. If you were to reproduce this, all the hardware should be under $700.00 CAD.

Oct 04, 2010
 

For the first time, I decided to test compatibility of non-HP-branded drives inside of the HP MSA20 array.

First I installed a 500GB SATA2 Seagate drive I had lying around. The MSA20 detected it perfectly and it is working.

The ACU on CentOS 5.5 shows as follows:

Logical Drive
Status    OK
Drive Number    1
Drive Unique ID    XXXXXXXXXXXXXXXXXXXXXXXXXXX
Size    476908 MB
Fault Tolerance    RAID 0
Heads    255
Sectors/Track    32
Cylinders    65535
Strip Size    128 KB
Array Accelerator    Enabled
Disk Name    /dev/cciss/c0d0

I also just populated another bay with a Seagate 2TB SATA2 drive I just found, and it works as well. ACU reports as follows:

Physical Drive
Status    OK
Drive Configuration Type    Unassigned
Size    2000.3 GB
Drive Type    SATA
Model    Seagate ST32000542AS
Serial Number    XXXXXXXX
Firmware Version    CC34

Keep in mind that no matter what size of drive you have, there is a 32-bit addressing limitation that caps each logical drive at roughly 2TB (2^32 sectors of 512 bytes). Although you can create larger arrays, your logical drives will not be able to go over this limit.

Oct 03, 2010
 

I’ve been messing around with this one old array that has a slew of problems for some time now.

One thing I just figured out today: even if all the drives in an MSA20 have a green light and are NOT marked as failed, the drives can still be in very bad shape.

Example: I have 5 X 250GB SATA drives, and EACH of the 5 drives had failed. The MSA20 said the drives were good; however, multiple I/O errors were showing up in the kernel logs on a CentOS 5 install. Taking the drives out and testing them on a different system, NOT using the MSA20, shows that the drives are physically dead. Furthermore, I popped some random 500GB SATA drive I had sitting around into the MSA20, and now the MSA20 works beautifully!