Jun 09 2012
 

Recently, I’ve started to have some issues with the HP MSA20 units attached to my SAN server at my office. These MSA20 units store all my virtual machines inside of a VMFS filesystem which is presented to my vSphere cluster hosts over iSCSI using Lio-Target. In the last while, the logical drives have just been randomly disappearing, causing my 16+ virtual machines to halt. This always requires me to shut off the physical hosts, shut off the SAN server, shut off the MSA20s, and bring everything all the way back up. This causes huge amounts of downtime, and it’s just a pain in the butt…

I decided it was time to re-do my storage system. Ideally, I would have purchased a couple of HP MSA60s and P800 controllers to hook up to my SAN server, but unfortunately that’s not in the budget right now.

A few years ago, I started using software RAID. In the past I was absolutely scared of it, thought it was complete crap, and would never have touched it, but my opinion changed drastically after playing with it and using it regularly. While I still recommend that businesses use hardware-based RAID, especially for mission-critical applications, I felt I could try out software RAID for the above situation since it’s more of a “hobby” setup.

I read that most storage enthusiasts use either the Super Micro AOC-SASLP-MV8 or the LSI SAS 9211-8i. They are based on different chipsets (both of which are widely used in other well-known cards), and both have their own pros and cons.

During my research, I noticed a lot of people who run Windows Home Server were using the Super Micro AOC card. While running WHS, most reported no issues whatsoever; however, it was a different story in posts/blog articles from people using Linux. I don’t know how accurate this was, but apparently a lot of people had issues with this card under heavy load, and some just couldn’t get it running inside of Linux at all.

Then there is the LSI 9211-8i (which shares its LSI SAS2008 chipset with the extremely popular IBM M1015). This bad boy supports basic RAID operations (0, 1, 10), but most people use it in JBOD mode and simply run Linux MD software RAID on top. While there were numerous complaints from users whose systems wouldn’t even detect the card, others reported issues caused by the card’s boot BIOS (its option ROM taking too much memory for the system to boot). When people did get this card working, though, I read of mostly NO issues under Linux. I spent a few days confirming what I had already read and finally decided to make the purchase.

Both cards support SAS/SATA; however, the LSI card supports 6Gb/s SAS/SATA, while the AOC-SASLP tops out at 3Gb/s. Both also have 2 internal SFF-8087 Mini-SAS connectors to hook up a total of 8 drives directly, or more using a SAS expander. The LSI card uses a PCIe 2.0 x8 slot, vs. the AOC-SASLP which uses a PCIe 1.0 x4 slot.

I went to NCIX.com and ordered the LSI 9211-8i along with 2 breakout cables (Card Part#: LSI00194, Cable Part#: CBL-SFF8087OCF-06M). This would allow me to hook up a total of 8 drives (even though I only plan to use 5). I already have an old computer (currently used with an eSATA connection to a Sans Digital SATA expander for NFS, etc…) that I plan on installing the card into. I also have an old StarTech SATABAY5BK enclosure which will hold the drives and connect to the controller. The finished case:

Server with disk enclosures (StarTech SATABAY5BK)

(At this point I have the enclosure installed along with 5 X 1TB Seagate 7200.12 Barracuda drives)

Finally the controller showed up from NCIX:

LSI SAS 9211-8i

I popped this card into the computer (which unfortunately only has PCIe 1.0) and connected the cables! This is when I ran into a few issues…

- If no drives were connected, the system would boot and I could successfully boot into CentOS 6.

- If I pressed CTRL+C to get into the card’s interface, the system would freeze during BIOS POST.

- If any drives were connected and detected by the card’s BIOS, the system would freeze during BIOS POST.

I went ahead and booted into CentOS 6, downloaded the updated firmware and BIOS, and flashed the card. The flashing manual was insane, but I had to read it all to make sure I didn’t break anything. First I updated both the firmware and BIOS (which went okay); however, I couldn’t convert the card from IR firmware to IT firmware due to errors. I googled this and came up with a bunch of articles, but this one: http://brycv.com/blog/2012/flashing-it-firmware-to-lsi-sas9211-8i/ was the only one that helped and pointed me in the right direction. Essentially, you have to use the DOS flasher, erase the card (MAKING SURE NOT TO REBOOT OR YOU’D BRICK IT), and then flash the IT firmware. This worked for me, check out his post! Thanks Bryan!
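
For reference, the DOS flash sequence looks roughly like this (a rough sketch of what worked for me rather than my exact session; the firmware/BIOS file names depend on the package you download from LSI, and the SAS address is a placeholder for the one printed on your card’s sticker):

REM erase the existing flash (do NOT reboot or power off until the new firmware is written)
sas2flsh -o -e 6
REM write the IT firmware, plus the boot ROM if you want one
sas2flsh -o -f 2118it.bin -b mptsas2.rom
REM re-program the SAS address from the sticker on the card
sas2flsh -o -sasadd 500605bxxxxxxxxx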

Anyways, even after updating the card and converting it to the IT firmware, I still had the BIOS issue. I tried the card in another system and still had a bunch of problems. I finally removed one of that system’s two video cards, installed the controller in the freed-up video card slot, and could at last get into the card’s BIOS. First I enabled staggered spin-up (to make sure I don’t blow the PSU with a bunch of drives starting up at once), changed some other settings to optimize things, and finally disabled the boot BIOS, setting the adapter to be disabled for boot and only available to the OS. After moving the card back into the target computer, everything worked. I also noticed that the staggered spin-up now happens during Linux kernel startup when the card is initialized. Here’s a copy of the kernel log:

mpt2sas version 08.101.00.00 loaded
mpt2sas 0000:06:00.0: PCI INT A -> Link[LNKB] -> GSI 18 (level, low) -> IRQ 18
mpt2sas 0000:06:00.0: setting latency timer to 64
mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (3925416 kB)
mpt2sas 0000:06:00.0: irq 24 for MSI/MSI-X
mpt2sas0: PCI-MSI-X enabled: IRQ 24
mpt2sas0: iomem(0x00000000dfffc000), mapped(0xffffc900110f0000), size(16384)
mpt2sas0: ioport(0x000000000000e000), size(256)
mpt2sas0: sending message unit reset !!
mpt2sas0: message unit reset: SUCCESS
mpt2sas0: Allocated physical memory: size(7441 kB)
mpt2sas0: Current Controller Queue Depth(3305), Max Controller Queue Depth(3432)
mpt2sas0: Scatter Gather Elements per IO(128)
mpt2sas0: LSISAS2008: FWVersion(13.00.57.00), ChipRevision(0x03), BiosVersion(07.25.00.00)
mpt2sas0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
mpt2sas0: sending port enable !!
mpt2sas0: host_add: handle(0x0001), sas_addr(0x5000000080000000), phys(8)
mpt2sas0: port enable: SUCCESS

SUCCESS! Lots of SUCCESS! Just the way I like it! Haha, the card initialized, I had access to the drives, etc…

I configured the RAID 5 array using a 256KB chunk size. I also changed the “stripe_cache_size” to 2048 (the system has 4GB of RAM) to increase the RAID 5 performance:

cd /sys/block/md0/md/

echo 2048 > stripe_cache_size
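
For reference, the array itself was created with mdadm along these lines (a sketch rather than my exact command; the device names assume the five 1TB Seagates came up as /dev/sdb through /dev/sdf):

# 5-disk RAID 5 with a 256KB chunk size (mdadm's --chunk is specified in KB)
mdadm --create /dev/md0 --level=5 --raid-devices=5 --chunk=256 /dev/sd[b-f]

Keep in mind that stripe_cache_size is measured in 4KB pages per device, so a value of 2048 across 5 drives works out to roughly 40MB of RAM dedicated to the stripe cache: no problem on a 4GB box, but worth remembering on smaller systems.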

At this point I simply formatted the array using EXT4, configured some folders and NFS exports, and then used Storage vMotion to migrate the virtual machines from the iSCSI target to the new RAID 5 array (currently over NFS). The main priority right now was to get the VMs off the MSA20 so I could at least create a backup once they had been moved. Next step: I’ll be re-doing the RAID 5 array, configuring the md0 device as an iSCSI target using Lio-Target, and formatting it with VMFS. The performance of this software RAID 5 array is already blowing the MSA20 out of the water!
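
If you’re following along, the format and NFS export steps looked roughly like this (a sketch with assumed names; /mnt/vmstore and 10.0.0.0/24 are placeholders I’m using for illustration):

# RAID-aware ext4: stride = 256KB chunk / 4KB block = 64, stripe-width = 64 x 4 data disks = 256
mkfs.ext4 -E stride=64,stripe-width=256 /dev/md0
mkdir -p /mnt/vmstore
mount /dev/md0 /mnt/vmstore
# export the datastore to the vSphere hosts (ESXi wants no_root_squash)
echo "/mnt/vmstore 10.0.0.0/24(rw,no_root_squash,sync)" >> /etc/exports
exportfs -ra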

So there you have it! Feel free to post a comment if you have any questions or need any specifics. This setup is rocking away now under high I/O with absolutely no problems whatsoever. I think I may go purchase another 1-2 of these cards!

Nov 28 2011
 

Just thought I’d do up a quick little post about an issue I’ve been having for some time, which I’ve finally got fixed.

I’ve been running Astaro Security Gateway inside of a VMware environment for a few years. When version 8.x came out, I went ahead and simply attached the ISO to the VM and re-installed over the old v7 and restored the config. This worked great, and for the longest time I had no real issues.

From time to time I noticed in packet sniffs that there were quite a few retransmissions and lost TCP segments. This didn’t really cause any problems; however, it was odd.

Recently, I had to configure a site-to-site IPsec VPN between my office and one of my employees to provide Exchange, VoIP, etc… With Astaro this is fairly easy; a few clicks and it should just work. However, I started noticing huge issues with file transfers, whether they went over SMB (Windows File Sharing) or SCP/SSH. Transfers would either halt completely as soon as they started, transfer a couple hundred kilobytes, or transfer half of the file before simply halting and becoming unresponsive.

After 3-4 days of troubleshooting, I went ahead and did a packet sniff and noticed there were numerous lost TCP segments, fragmentation, etc… Initially I believed that the MTU configuration might have had something to do with it; however, TCP/IP and the Astaro device should have taken care of properly setting the MTU on the IPsec tunnel automatically.
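
If you ever suspect MTU trouble on a tunnel like this, a quick sanity check is to push non-fragmentable pings of increasing size across it (the remote address below is just a placeholder):

# 1472 bytes of payload + 28 bytes of ICMP/IP headers = a full 1500-byte packet; -M do forbids fragmentation
ping -M do -s 1472 192.168.50.10

If the full-size ping fails but smaller sizes get through, something along the path is dropping large packets. In my case, though, the MTU turned out not to be the culprit.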

After trying fresh installs of ASG, etc… with no change in behaviour, I decided to step away for a few days and give it another shot later. I had troubleshot this from every avenue, and for some reason the issue still existed. I finally figured that the only thing I hadn’t checked was my VMware vSphere environment. I checked the settings and all was good; however, I did notice that the NICs for the ASG VM (which were created by the v7 appliance) were set to “Flexible”, and inside the VM were detected as an AMD network adapter (the emulated AMD PCnet device). I found this odd.

After shutting down the ASG VM, removing the NICs, and configuring new ones as E1000, the issue was suddenly fixed: the IPsec site-to-site VPN functioned properly, and all the network issues seen in the captures were resolved.
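
I made the change through the vSphere Client by removing the NICs and re-adding them as E1000, but for anyone editing the VM by hand, the equivalent setting in the .vmx file looks roughly like this (a sketch; the adapter index will differ per VM, and the rest of the NIC settings are left out):

ethernet0.virtualDev = "e1000"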

I hope this helps some other people who may be frustrated dealing with the same issue.

Apr 15 2011
 

I thought I’d pass this on to all you iSCSI enthusiasts out there!

This morning I received an e-mail forwarded on by someone.

Apparently the Microsoft iSCSI Target Software is now free and runs on Windows Server 2008 R2.

Article:

http://blogs.technet.com/b/canitpro/archive/2011/04/05/the-microsoft-iscsi-software-target-is-now-free.aspx

I plan on getting this installed, set up, and configured sometime in the next couple of weeks to test with my VMware vSphere environment. I’ll post my results 🙂 Happy SANing!

Jan 10 2011
 

I notice quite a bit of traffic coming in (a lot of it is the same people coming back) searching for information on VMware vSphere using iSCSI, specifically Lio-Target (because of its compatibility with SCSI persistent reservations).

In the past I’ve been jumping all over the place testing Lio-Target on different distributions, test scenarios, etc… I’ve now officially implemented it in my production network and just wanted to report that it’s been running solid for a few months.

Current Working Configuration (Stable):

At the moment I have numerous HP servers (ML350s, DL360s) running ESXi off an internal USB key. These ESXi hosts are accessing numerous iSCSI targets over gigabit, hosted on a DL360 G6 with 2 X MSA20 storage units. The server hosting the storage is running Ubuntu 10.10 and has been rock solid with absolutely no issues. I’m fully utilizing vMotion amongst all the hosts, and all hosts have concurrent access to the iSCSI targets. This is running in full production and I’m fully confident in the configuration and setup.

Future Plans:

Lio-Target is going upstream into the Linux kernel in the next release (2.6.38). In the testing I did (and blogged about) over the past months, I have not been able to get the newer versions of Lio-Target running stably on CentOS. Once a new version of CentOS is released, or a kernel upgrade becomes available to bring CentOS to 2.6.38, I will be installing CentOS on the storage server and adding more disk space. Once this change is complete, that will conclude any changes for a while (excluding adding more ESXi hosts).
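
When that kernel does land, a quick way to check whether it ships the upstream target core is to look for the in-tree module (just a sketch; whether the module is actually built depends on the distro’s kernel config):

# confirm the running kernel version
uname -r
# check whether the upstream target core module (merged in 2.6.38) is available
modinfo target_core_mod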

If anyone has any questions on my setup, or configuration with something similar, or have any issues with Lio-Target, please feel free to leave a comment and I’ll see if I can help!

Oct 08 2010
 

Just got my test network setup:

  1. HP Proliant ML350 G5 – Running ESXi
  2. HP Proliant DL360 G5 – Running ESXi
  3. HP Proliant DL360 G5 – Running ESXi
  4. Custom Intel Server – Running vSphere (management server)
  5. Super Micro Server – Running CentOS 5.5 with lio-target (iSCSI Target)
  6. 2 X HP MSA20 – Running Multiple Arrays/Targets connected to Super Micro Server

The HP servers are booting ESXi via USB keys. The Super Micro server has a single internal drive for the OS and is connected to multiple MSA20 arrays, which are presented as iSCSI targets. The custom Intel server is running the vSphere vCenter management software.

I’ve Storage vMotion’ed multiple test VMs to the iSCSI arrays. Currently testing! Speed is AMAZING! And judging from the logs on the Super Micro server, the iSCSI SCSI persistent reservations ARE being handled properly 🙂
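
For anyone wanting to double-check the reservations from the Linux side rather than digging through logs, sg_persist (from sg3_utils) on any box that can see the LUN will dump them (a sketch, with the device name as a placeholder):

# list the keys registered by each ESXi initiator
sg_persist --in --read-keys /dev/sdb
# show the active reservation (scope/type)
sg_persist --in --read-reservation /dev/sdb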

More to come, stay posted!

Update: After testing this for a few days, I noticed that although SCSI reservations were being handled properly (which is great), for some reason the kernel on the storage server would stall, causing the storage system to crash. I’ve only been able to trigger this when two ESXi boxes are concurrently accessing the same datastore. I’ve been trying to reproduce it under different circumstances (one ESXi initiator, using Windows to test the target, etc…) but have been unable to. I might test the 4.0 release candidate soon.