Apr 12, 2014
 

Recently I decided it was time to beef up my storage link between my demonstration vSphere environment and my storage system. My existing setup included a single HP DL360p Gen8, connected to a Synology DS1813+ via NFS.

I went out and purchased the appropriate (and compatible) HP 4 x 1Gb Server NIC (Broadcom based, 4 ports), and connected the Synology device directly to the new server NIC (all 4 ports). I then configured an iSCSI Target using a File LUN with ALUA (Advanced LUN features), configured the NICs on both the vSphere side and the Synology side, and enabled 9000-byte jumbo frames.

I connected to the iSCSI LUN and created a VMFS volume. I then configured Round Robin MPIO on the vSphere side of things (and, as always, made sure to enable “Multiple iSCSI initiators” on the Synology side).
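For anyone reproducing this, here’s a rough sketch of the vSphere side of an MPIO iSCSI setup like this one: iSCSI port binding plus the 9000 MTU, and a vmkping test that only succeeds if jumbo frames make it end to end. The adapter name (vmhba33), VMkernel ports (vmk1 to vmk4), vSwitch name and target IP are hypothetical placeholders, not my actual config. By default it only prints the commands; run it with DRY_RUN=0 in the ESXi shell to execute them.

```shell
#!/bin/sh
# Sketch: iSCSI port binding + jumbo frames on an ESXi 5.x host.
# All names are hypothetical placeholders; adjust to your environment.
DRY_RUN="${DRY_RUN:-1}"     # 1 = print commands only, 0 = execute them
ADAPTER="vmhba33"           # software iSCSI adapter (hypothetical)
VSWITCH="vSwitch1"          # vSwitch carrying the iSCSI uplinks (hypothetical)
TARGET_IP="10.0.0.10"       # one of the Synology's iSCSI IPs (hypothetical)
MTU=9000

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Largest ICMP payload that fits in one frame:
# MTU minus the 20-byte IPv4 header and the 8-byte ICMP header.
ping_payload() {
    echo $(( $1 - 20 - 8 ))
}

run esxcli network vswitch standard set --vswitch-name "$VSWITCH" --mtu "$MTU"
for vmk in vmk1 vmk2 vmk3 vmk4; do
    run esxcli network ip interface set --interface-name "$vmk" --mtu "$MTU"
    run esxcli iscsi networkportal add --adapter "$ADAPTER" --nic "$vmk"
done

# -d sets "don't fragment", so this fails if any hop drops jumbo frames.
run vmkping -d -s "$(ping_payload "$MTU")" "$TARGET_IP"
```

The 8972-byte ping payload is 9000 minus the 20-byte IPv4 header and the 8-byte ICMP header. Keep in mind port binding also expects each VMkernel port to have exactly one active uplink.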

I started to migrate some VMs over to the iSCSI LUN, and right away I noticed it was going extremely slowly. I confirmed that traffic was being passed across all NICs (and verified that all paths were active). After the migration completed, I shut down the VMs and restarted them to compare boot times. Booting from the iSCSI LUN was absolutely horrible; the VMs took forever to boot up. Keep in mind I’m very familiar with vSphere (my company is a VMware partner), so I know how to properly configure Round Robin, iSCSI, and MPIO.

I then decided to tweak some settings on the ESXi side of things. I set the Round Robin policy to IOPS=1, which helped a bit. I then changed the RR policy to bytes=8800, which, after numerous other tweaks, I determined gave the highest iSCSI performance to the storage system.
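The same Round Robin changes can be made per device from the ESXi shell with esxcli. A minimal sketch, assuming ESXi 5.x and a hypothetical device ID (list yours with esxcli storage nmp device list); again it only prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: switch one LUN to Round Robin and tune how often ESXi 5.x
# rotates to the next path. DEVICE is a hypothetical placeholder.
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands only, 0 = execute
DEVICE="${DEVICE:-naa.XXXXXXXXXXXXXXXX}"   # see: esxcli storage nmp device list

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# rr_limit iops N  -> switch paths after every N I/Os
# rr_limit bytes N -> switch paths after every N bytes
rr_limit() {
    case "$1" in
        iops)  run esxcli storage nmp psp roundrobin deviceconfig set \
                   --device "$DEVICE" --type iops --iops "$2" ;;
        bytes) run esxcli storage nmp psp roundrobin deviceconfig set \
                   --device "$DEVICE" --type bytes --bytes "$2" ;;
        *)     echo "usage: rr_limit iops|bytes VALUE" >&2; return 1 ;;
    esac
}

run esxcli storage nmp device set --device "$DEVICE" --psp VMW_PSP_RR
rr_limit iops 1        # first tweak: IOPS=1
rr_limit bytes 8800    # final setting: bytes=8800
run esxcli storage nmp psp roundrobin deviceconfig get --device "$DEVICE"
```

Each deviceconfig set changes the active limit type on that device, so the last call (bytes=8800) is the one that ends up in effect; the final get shows what is actually applied.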

This config was used for a couple of weeks, but ultimately I was very unsatisfied with the performance. I know it’s not very accurate, but according to the Synology resource monitor, each gigabit link was only achieving 10-15MB/sec over iSCSI under high load (single contiguous copies), which should have resulted in 100MB/sec or more per link. The combined LAN throughput reported by the Synology device across all 4 gigabit links never exceeded 80MB/sec. File transfers inside the virtual machines couldn’t get higher than 20MB/sec.
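To put those numbers in perspective: a gigabit link tops out at 1000/8 = 125MB/sec on the wire (a bit less after Ethernet/IP/TCP overhead). The small helper below is just an illustration that expresses an observed rate as a percentage of that raw ceiling.

```shell
#!/bin/sh
# link_util MB_PER_SEC LINKS
# Prints observed throughput as a percentage of the raw gigabit ceiling
# (125 MB/s per link; usable throughput is a little lower in practice).
link_util() {
    awk -v mb="$1" -v links="$2" 'BEGIN { printf "%.1f\n", 100 * mb / (links * 125) }'
}

link_util 15 1   # best observed per-link iSCSI rate
link_util 80 4   # best observed combined rate across all 4 links
```

That works out to roughly 12% of one link, and 16% of what the four links could carry combined.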

I have a VMware vDP (vSphere Data Protection) test VM configured, which includes a performance analyzer in its configuration interface. I decided to use it to run some tests (I’m too lazy to configure a real IO/throughput test, since I know I won’t continue using iSCSI on the Synology with the horrible performance I’m getting). The performance analyzer tests run for 30-60 minutes and measure reads and writes in MB/sec and seeks per second. I tested 3 different datastores.

 

Synology DS1813+, NFS over 1 x Gigabit link (1500 MTU):

Read 81.2 MB/sec, Write 79.8 MB/sec, 961.6 seeks/sec

Synology DS1813+, iSCSI over 4 x Gigabit links with MPIO Round Robin, bytes=8800 (9000 MTU):

Read 36.9 MB/sec, Write 41.1 MB/sec, 399.0 seeks/sec

Custom-built 8-year-old computer running Linux MD RAID 5, NFS over 1 x Gigabit NIC (1500 MTU):

Read 94.2 MB/sec, Write 97.9 MB/sec, 1431.7 seeks/sec

 

Can someone say WTF?!?!?!?! As you can see, there appears to be a major performance hit with the DS1813+ using 4 Gigabit MPIO iSCSI with Round Robin: it’s roughly half the speed of a single 1 x Gigabit NFS link. Keep in mind I purchased the extra memory module for my DS1813+, so it has 4GB of memory.

I’m kind of choked that I spent the money on the extra server NIC (it was over $500.00). I’m also surprised that my custom-built NFS server from 8 years ago (the drives are 4 years old) with 5 drives is performing better than my 8-drive DS1813+. All drives in both the Synology and the custom-built NFS box are Seagate Barracuda 7200RPM drives (the custom box has 5 x 1TB drives in RAID 5; the Synology has 8 x 3TB drives in RAID 5).

I won’t be using iSCSI or iSCSI MPIO with the DS1813+ again, and I actually plan on retiring it as my main datastore for vSphere. I’ve finally decided to bite the bullet and purchase an HP MSA2024 (Dual Controller with 4 x 10Gb SFP+ ports) to provide storage for my vSphere test/demo environment. I’ll keep the Synology DS1813+ online as an NFS vDP backup datastore.

Feel free to comment and let me know how your experience with the Synology devices using iSCSI MPIO is/was. I’m curious to see if others are experiencing the same results.

Apr 11, 2014
 

Earlier today I was doing some work in my demonstration vSphere environment when I had to modify the settings of one of my VMs that uses the latest virtual hardware version (which means its settings can only be edited in the vSphere Web Client).

To my surprise, immediately after logging in I received an error: “ManagedObjectReference: type = Datastore, value = datastore-XXXX, serverGuid = XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXX refers to a managed object that no longer exists or has never existed”. After clicking OK, I also noticed that a lot of the information presented in the vSphere Web Client was inaccurate. Some virtual machines were being reported as sitting on different datastores (they had been, weeks ago, but had since been moved). It was also reporting that some virtual machines were off when in fact they were on and running.

Symptoms:

-Errors about missing datastores when logging on to the vSphere Web Client.

-Virtual machines reported as off even though they were running.

-The vSphere Web Client reporting VMs as stored on a different datastore than they actually are.

-Disconnecting and reconnecting hosts has no effect on the issue.

 

This freaked me out; it was a true “Uhh Ohh” moment. Something was corrupt. Keep in mind that ALL information in the vSphere Client was correct and accurate; only the vSphere Web Client was having issues.

 

Anyways, I tried a bunch of things and spent hours working on the problem. FINALLY I came up with a fix. If you are running into this issue, PLEASE take a snapshot of your vCenter Server before attempting to fix it, so that you can roll back if you screw anything up (which I had to do multiple times, lol).

The Fix:

1) Stop the “VMware vCenter Inventory Service”.

2) Delete the “data” folder inside of “Program Files\VMware\Infrastructure\Inventory Service”.

3) Open a Command Prompt with elevated privileges. Change your working directory to “Program Files\VMware\Infrastructure\Inventory Service\scripts”.

4) Run “createDB.bat”. This will reset and create an Inventory Service database.

5) Run “is-change-sso.bat https://computername.domain.com:7444/lookupservice/sdk "administrator@vSphere.local" "SSO_PASSWORD"”. Change computername.domain.com to the FQDN of your vCenter server, and change SSO_PASSWORD to your Single Sign-On admin password.

6) Start the “VMware vCenter Inventory Service”. At this point, if you try to log on to the vSphere Web Client, it will fail with: “Client is not authenticated to VMware Inventory Service”. We’ve already won half the battle.

7) We now need to register the vCenter Server with the newly reset Inventory Service. In the elevated Command Prompt (that we opened above), change the working directory to: “Program Files\VMware\Infrastructure\VirtualCenter Server\isregtool”.

8) Run “register-is.bat https://computername.domain.com:443/sdk https://computername.domain.com:10443 https://computername.domain.com:7444/lookupservice/sdk”. Change computername.domain.com to your FQDN for your vCenter server.

9) Restart the “VMware VirtualCenter Server” service. This will also restart the Management Web services.

 

BAM, it’s fixed! I went ahead and restarted the entire server that the vCenter server was running on. After this, all was good, and everything looked great inside of the vSphere Web Client. I’m actually noticing it’s running WAY faster, and isn’t as glitchy as it was before.

Happy Virtualizing! :)

Jul 08, 2013
 

Recently I needed to upgrade and replace my storage system, which provides basic SMB dump file services, iSCSI, and NFS to my internal network and vSphere cluster. As most of you know, in the past I have traditionally created and configured my own storage systems. For the most part this has worked fantastically, especially with the NFS and iSCSI target services being provided by and built into the Linux OS (iSCSI thanks to Lio-Target).

A few reasons for the upgrade: 1) I need more storage, and 2) I need a pre-packaged product that comes with warranty. Taking care of the storage size was easy (buy more drives), however I needed to find a pre-packaged product that fits my requirements for performance, capabilities, stability, support, and of course warranty. iSCSI and NFS support was an absolute must!

Some time ago, when I first started working with Lio-Target before it was merged into the Linux kernel, I noticed that the parent company, Rising Tide Systems, mentioned they also provided the target for numerous NAS and SAN devices on the market, Synology being one of them. I never thought anything of this, as back then I wasn’t interested in purchasing a pre-packaged product; that changed with my search for a new storage system.

Upon researching, I found that Synology released their 2013 line of products. These products had a focus on vSphere compatibility, performance, and redundant network connections (either through Trunking/Link aggregation, or MPIO iSCSI connections).

The device that caught my attention for my purpose was the DS1813+.


Synology DS1813+

Synology DS1813+ Specifications:

  • CPU Frequency : Dual Core 2.13GHz
  • Floating Point
  • Memory : DDR3 2GB (Expandable, up to 4GB)
  • Internal HDD/SSD : 3.5″ or 2.5″ SATA(II) X 8 (Hard drive not included)
  • Max Internal Capacity : 32TB (8 X 4TB HDD) (Capacity may vary by RAID types) (See All Supported HDD)
  • Hot Swappable HDD
  • External HDD Interface : USB 3.0 Port X 2, USB 2.0 Port X 4, eSATA Port X 2
  • Size (HxWxD) : 157 X 340 X 233 mm
  • Weight : 5.21kg
  • LAN : Gigabit X 4
  • Link Aggregation
  • Wake on LAN/WAN
  • System Fan : 120x120mm X2
  • Easy Replacement System Fan
  • Wireless Support (dongle)
  • Noise Level : 24.1 dB(A)
  • Power Recovery
  • Power Supply : 250W
  • AC Input Power Voltage : 100V to 240V AC
  • Power Frequency : 50/60 Hz, Single Phase
  • Power Consumption : 75.19W (Access); 34.12W (HDD Hibernation);
  • Operating Temperature : 5°C to 35°C (40°F to 95°F)
  • Storage Temperature : -10°C to 70°C (15°F to 155°F)
  • Relative Humidity : 5% to 95% RH
  • Maximum Operating Altitude : 6,500 feet
  • Certification : FCC Class B, CE Class B, BSMI Class B
  • Warranty : 3 Years

 

This puppy has 4 gigabit LAN ports and 8 SATA bays. There are tons of reviews on the internet praising Synology and their DSM operating system (based on embedded Linux), so I decided to live dangerously and placed an order for this device, along with 8 x Seagate 3TB Barracuda drives.

Unfortunately, it’s extremely difficult to get your hands on a DS1813+ in Canada (I’m not sure why). After numerous orders placed and cancelled with numerous companies, I finally found a distributor who was able to get me one. I’ll just say the wait was totally worth it. I had also purchased the 2GB RAM add-on, so I had it on hand when the DS1813+ arrived.

I was hoping to take a bunch of pictures, and do thorough testing with the unit before throwing it in to production, however right from the get go, it was extremely easy to configure and use, so right away I had it running in production. Sorry for the lack of pics! :)

I did, however, get a chance to set up the 8 drives in RAID 5 and configure an iSCSI block-based target. The performance was fantastic, with no problems whatsoever. Even when maxing out one gigabit connection, the resources of the unit were barely touched.

I’m VERY impressed with the DSM operating system. Everything is clearly spelled out, and you have very detailed control of the device and all services. Configuration of SMB shares, iSCSI targets, and NFS exports is extremely simple, yet allows you to configure advanced features.

After testing out the iSCSI performance, I decided to get the unit ready for production. I created 2 shared folders, and exported these via NFS to my ESXi hosts. It was very simple, quick, and the ESXi hosts had absolutely no problems connecting to the exports.

One thing that really blew me away about this unit is the performance. Immediately after configuring the NFS exports, mounting them, and using Storage vMotion to migrate 14 live virtual machines to the DS1813+, I noticed MASSIVE performance gains. The gains were so large that they put my old custom storage system to shame. This is really interesting, considering my old storage system, while custom, is actually spec’d way higher than the storage unit (CPU, RAM, and the SATA controller). I’m assuming the DS1813+ has numerous kernel optimizations for storage, and at the same time does not have the overhead of a full Linux distribution. This also means it’s more stable, since you don’t have tons of unneeded applications running in the background.

After migrating the VMs, I noticed that the virtual machines were running way faster and were much more responsive. I’m assuming this is due to increased IOPS.

Either way I’m extremely happy with the device and fully recommend it. I’ll be posting more blog articles later detailing configuration of services in detail such as iSCSI, NFS, and some other things. I’m already planning on picking up an additional DS1513+ (5 bay unit) to act as a storage server for VM backups which I perform using GhettoVCB.

Nice job Synology :)

Jun 09, 2012
 

So there I was… Had a custom built vSphere cluster, 3 hosts, iSCSI target (setup with Lio-Target), everything running fine, smoothly, perfect… I’m done right? NOPE!

Most of us who care about our systems are concerned about our Disaster Recovery or Backup solution. With virtualization, things get a little bit interesting: either you have a CRAZY large setup and can use the VMware backup stuff, or you have a smaller environment and want something simple, easy to use, and with a low footprint. Configuring a backup and/or disaster recovery solution for a virtualized environment may be difficult and complicated; however, once it’s fully implemented, management, use, and administration are easy, and don’t forget about the abilities and features you get with virtualization.

Reasons why virtualization rocks when it comes to backup and disaster recovery:

-Unlike traditional backups you do not have to install a bare OS to run the backup software to restore over

-You can restore to hardware that is not like nor similar to the original hardware

-Backups are now simple files that can be easily moved, transported, copied, and saved onto almost any media (you could save an entire system onto a USB key if it was big enough). On a 2TB external USB drive, you could have backups of 16 or more virtual machines!

-Ease of recovery: Copy the backed up VM files to host/datastore, and simply hit play. Restore complete and you’re up and running!

 

So with all that in mind, here we go! (Scroll to bottom of post for a quick conclusion).

For my solution, here are some of the requirements I had:

1) Utilize snapshots to take restorable backups while the VMs are running (no downtime).

2) Move the backups to a different location while running (this could be a drive, NFS export, SMB share, etc…).

3) Have the backups stored somewhere easy to access where I can move them to a removable external USB drive to take off-site. This way, I have fast disk-to-disk access to restore backups in the event my storage system goes down or RAID array is lost (downtime would be minimal), or in the event of something more serious like a fire, I would have the USB drive off-site to restore from. Disk-to-disk backups could happen on a daily basis, and disk-to-USB could be done weekly and taken off-site.

4) In the event of a failure, be able to bring USB drive onsite, transfer VMs back and be up and running in no time.

 

So with all this in mind, I started designing a solution. My existing environment (without backup) consisted of:

2 X HP DL360 G5 Servers (running ESXi)

1 X HP ML350 G5 Server (running ESXi)

1 X Super Micro Intel Xeon Server (Running CentOS 6 & Lio-Target backports: providing iSCSI VMFS)

2 X HP MSA20 Storage Units

 

First, I need a way to create snapshots of my 16+ virtual machines. Then I would need to have the snapshots moved to another location (such as a backup server). There is a free script available called ghettoVCB. ghettoVCB is a “Free alternative for backing up VM’s for ESX(i) 3.5, 4.x+ & 5.x” and is available (along with tons of documentation) at: http://communities.vmware.com/docs/DOC-8760. This script is generally run on the ESXi host; it creates a snapshot and clones it to a separate datastore configured on that host. It does this for all virtual machines named in the VM list you specify, or for all VMs on the host if the -a switch is passed to ghettoVCB.sh.

So now that we have the software, we need a location to back up to. We could either create a new iSCSI target, or set up a new Linux server with a RAID array formatted with ext4 and exported via NFS. The NFS option lets us add the export as a datastore on ESXi (so we can back up to it), and afterwards access the backups natively in Linux to copy/move them to an external drive formatted with ext4.
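A rough sketch of both halves of that setup, assuming CentOS 6 on the backup server and ESXi 5.x on the hosts. The export path, client subnet, server name and datastore name are hypothetical placeholders, and the commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: NFS export on the CentOS 6 backup server, mounted as an ESXi 5.x datastore.
# Every value below is a hypothetical placeholder.
DRY_RUN="${DRY_RUN:-1}"               # 1 = print commands only, 0 = execute
EXPORT_DIR="/backup"                  # directory on the backup server
CLIENT_NET="192.168.1.0/24"           # network the ESXi hosts live on
NFS_SERVER="backup01.example.local"   # backup server hostname
DATASTORE="backup-nfs"                # datastore name shown in vSphere

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Line for /etc/exports; ESXi mounts as root, hence no_root_squash.
exports_line() {
    echo "$EXPORT_DIR $CLIENT_NET(rw,sync,no_root_squash)"
}

# On the backup server: add this line to /etc/exports, then re-export.
exports_line
run exportfs -ra

# On each ESXi host: mount the export as a datastore.
run esxcli storage nfs add --host "$NFS_SERVER" --share "$EXPORT_DIR" --volume-name "$DATASTORE"
```

no_root_squash is there because ESXi mounts NFS as root; without it, the hosts generally can’t write to the export.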

We configure a new server running CentOS 6 with enough storage to back up all VMs. We create NFS exports and mount them on all the ESXi hosts. We copy the ghettoVCB script to a location on the NFS export so it’s easily accessible to all hosts (without having to update the script on each host individually), and we create a list for each physical host containing the names of the virtual machines it hosts. We then edit the ghettoVCB.sh file to specify the new destination datastore (the backup datastore) and how we want it to back up.
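For reference, these are the kind of variables you edit near the top of ghettoVCB.sh (per the ghettoVCB documentation, they can also go in a separate global config file passed with -g). The values below are illustrative assumptions, not my exact settings, and the datastore path is hypothetical; check the docs for the version you download.

```shell
# Sketch of the backup settings in ghettoVCB.sh (or a -g global config file).
# Values are illustrative assumptions; the datastore path is hypothetical.
VM_BACKUP_VOLUME=/vmfs/volumes/backup-nfs/ghettoVCB   # where clones are written (the NFS backup datastore)
DISK_BACKUP_FORMAT=thin                               # thin-provision the cloned disks
VM_BACKUP_ROTATION_COUNT=3                            # keep the 3 most recent backups per VM
POWER_VM_DOWN_BEFORE_BACKUP=0                         # 0 = back up the running VM via snapshot
ENABLE_HARD_POWER_OFF=0                               # never hard power off a VM
VM_SNAPSHOT_MEMORY=0                                  # don't capture memory state in the snapshot
VM_SNAPSHOT_QUIESCE=0                                 # 1 = quiesce the guest via VMware Tools
```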

When executing:

./ghettoVCB.sh -f esxserverlist01

It creates a snapshot of each VM in the list for that host, clones it to the destination datastore (which in my setup is the NFS export on the new backup server), deletes the snapshot when the backup is complete, and then moves on to the next VM, repeating the process until done. The script needs to be run on every host, and a VM list file has to be created for each host.
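Since the script lives on the shared NFS export, one way to avoid keeping track of which list goes with which host is to name each list after the host and pick it with hostname -s. This is a hypothetical wrapper, not part of ghettoVCB; the directory layout is an assumption, and it calls ghettoVCB.sh, the file name the script ships with.

```shell
#!/bin/sh
# Hypothetical per-host wrapper (not part of ghettoVCB), kept on the shared
# NFS export. Each host backs up the VMs listed in a file named after it.
DRY_RUN="${DRY_RUN:-1}"                                          # 1 = print only, 0 = execute
SCRIPT_DIR="${SCRIPT_DIR:-/vmfs/volumes/backup-nfs/ghettoVCB}"   # hypothetical path

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Path of the VM list for a given host (one VM name per line).
list_for_host() {
    echo "$SCRIPT_DIR/lists/$1"
}

# Run ghettoVCB against this host's list, or fail if there isn't one.
backup_this_host() {
    list="$(list_for_host "$(hostname -s)")"
    if [ ! -f "$list" ]; then
        echo "no VM list found at $list" >&2
        return 1
    fi
    run "$SCRIPT_DIR/ghettoVCB.sh" -f "$list"
}
```

On each host you would source the file and call backup_this_host (with DRY_RUN=0 to actually start the backups).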

We now have a backup server and have done a disk-to-disk backup of our virtual machines. We can now plug in a large external USB drive to the backup server, and simply copy over the backups to it.
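Copying to the USB drive can be as simple as a cp, but it’s worth checking the copy afterwards. A minimal sketch, assuming the USB drive is already mounted (paths are hypothetical): it uses rsync when available, falls back to cp -a, then compares both trees with diff -r.

```shell
#!/bin/sh
# Sketch: copy the disk-to-disk backups onto the mounted USB drive and verify.
# copy_to_usb SRC DST, e.g. copy_to_usb /backup/ghettoVCB /mnt/usb/ghettoVCB
# (paths are hypothetical). Returns non-zero if the copy doesn't match.
copy_to_usb() {
    src="$1"
    dst="$2"
    mkdir -p "$dst" || return 1
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$src/" "$dst/" || return 1
    else
        cp -a "$src/." "$dst/" || return 1
    fi
    # diff -r exits non-zero if any file is missing or differs.
    diff -r "$src" "$dst" >/dev/null
}
```

Point it at an empty directory on the USB drive; if the destination already holds older backups, diff -r will report those extra files as differences.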

 

I do everything manually because I like to confirm everything is done and backed up properly, however you can totally create scripts to automate the whole process. After this we have our new backup solution!

 

Quick recap:

1) Set up a new backup server with enough disk space to back up all VMs. Set up an NFS export and mount it on the ESXi hosts.

2) Download and configure the ghettoVCB script. Run the script on each ESXi host to back up your VMs disk-to-disk to your new backup server.

3) Copy the backup files from the backup server to an external USB drive that has enough space. Take it off-site.

 

I have had to restore a couple VMs in the past due to a damaged RAID array, and I did so using a backup from above. It worked great! I will create a post on the restore process sometime in the future (for now feel free to look at the ghettoVCB documentation)!

Jun 09, 2012
 

Recently, I’ve started to have some issues with the HP MSA20 units attached to my SAN server at my office. These MSA20 units stored all my virtual machines inside a VMFS filesystem, which was presented to my vSphere cluster hosts over iSCSI using Lio-Target. Lately, these logical drives have been randomly disappearing, causing my 16+ virtual machines to just halt. Every time, I have to shut off the physical hosts, the SAN server, and the MSA20s, and bring everything all the way back up. This causes huge amounts of downtime, and it’s just a pain in the butt…

I decided it was time for me to re-do my storage system. Preferably, I would have purchased a couple HP MSA60s and P800 controllers to hook it up to my SAN server, but unfortunately right now it’s not in the budget.

A few years ago, I started using software RAID. In the past I was absolutely scared of it, thought it was complete crap, and would never have touched it, but my opinion drastically changed after playing with it, and regularly using it. While I still recommend businesses to use Hardware based RAID systems, especially for mission critical applications, I felt I could try out software RAID for the above situation since it’s more of a “hobby” setup.

 

I read that most storage enthusiasts use either the Super Micro AOC-SASLP-MV8 or the LSI SAS 9211-8i. They are based on different chipsets (both of which are widely used in other well-known cards), and both have their own pros and cons.

During my research, I noticed a lot of people who run Windows Home Server were using the Super Micro AOC card. With WHS, most reported no issues whatsoever; however, it was a different story when reading posts/blog articles from people using Linux. I don’t know how accurate this was, but apparently a lot of people had issues with this card under heavy load, and some just couldn’t get it running in Linux.

Then there is the LSI 9211-8i (which is the same as the extremely popular IBM M1015). This bad boy supports basic RAID operations (1, 0, 10), but most people use it as JBOD and simply use Linux MD software RAID. While there were numerous complaints from users whose systems wouldn’t even detect the card, other users reported issues caused by the card’s BIOS (too much memory for the system to boot). When people did get this card working, though, I read of mostly NO issues under Linux. I spent a few days confirming what I had already read and finally decided to make the purchase.

Both cards support SAS/SATA, however the LSI card supports 6Gb/sec SAS/SATA. Both also have 2 internal SFF8087 Mini-SAS connectors to hook up a total of 8 drives directly, or more using an SAS expander. The LSI card uses a PCIe (V.2) 8x slot, vs the AOC-SASLP which uses PCIe (V.1) 4x slot.

 

I went to NCIX.com and ordered the LSI 9211-8i along with 2 breakout cables (Card Part#: LSI00194, Cable Part#: CBL-SFF8087OCF-06M). This would allow me to hook up a total of 8 drives (even though I only plan to use 5). I plan to install the card in an old computer I already use for NFS, etc., which has an eSATA connection to a Sans Digital SATA expander. I also have an old StarTech SATABAY5BK enclosure that will hold the drives and connect to the controller. Finished case:

Server with disk enclosures (StarTech SATABAY5BK)

(At this point I have the enclosure installed along with 5 X 1TB Seagate 7200.12 Barracuda drives)

Finally the controller showed up from NCIX:

LSI SAS 9211-8i

I popped this card in the computer (which unfortunately only had PCIe V1), and connected the cables! This is when I ran in to a few issues…

-If no drives were connected, the system would boot and I could successfully boot into CentOS 6.

-If I pressed CTRL+C at all to get into the card’s interface, the system would freeze during BIOS POST.

-If any drives were connected and detected by the card’s BIOS, the system would freeze during BIOS POST.

 

I went ahead and booted into CentOS 6, downloaded the updated firmware and BIOS, and flashed the card. The flashing manual was insane, but I had to read it all to make sure I didn’t break anything. First I updated both the firmware and BIOS (which went OK); however, I couldn’t convert the card from IR firmware to IT firmware due to errors. I Googled this and came up with a bunch of articles, but this one: http://brycv.com/blog/2012/flashing-it-firmware-to-lsi-sas9211-8i/ was the only one that helped and pointed me in the right direction. Essentially, you have to use the DOS flasher, erase the card (MAKING SURE NOT TO REBOOT, OR YOU’LL BRICK IT), and then flash the IT firmware. This worked for me, check out his post! Thanks Bryan!

Anyways, even after updating the card and converting it to the IT firmware, I still had the BIOS issue. I tried the card in another system and still had a bunch of issues. I finally removed 1 of 2 video cards, installed the card in a video card slot, and could finally get into the BIOS. First I enabled staggered spin-up (to make sure I don’t blow the PSU on the computer with a bunch of drives starting up at once), changed some other settings to optimize, and finally disabled the boot BIOS, setting the adapter to be disabled for boot and only available to the OS. After moving the card back to the target computer, this worked. I also noticed that the staggered spin-up started during the Linux kernel startup when the card was initialized. Here’s a copy of the kernel log:

mpt2sas version 08.101.00.00 loaded
mpt2sas 0000:06:00.0: PCI INT A -> Link[LNKB] -> GSI 18 (level, low) -> IRQ 18
mpt2sas 0000:06:00.0: setting latency timer to 64
mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (3925416 kB)
mpt2sas 0000:06:00.0: irq 24 for MSI/MSI-X
mpt2sas0: PCI-MSI-X enabled: IRQ 24
mpt2sas0: iomem(0x00000000dfffc000), mapped(0xffffc900110f0000), size(16384)
mpt2sas0: ioport(0x000000000000e000), size(256)
mpt2sas0: sending message unit reset !!
mpt2sas0: message unit reset: SUCCESS
mpt2sas0: Allocated physical memory: size(7441 kB)
mpt2sas0: Current Controller Queue Depth(3305), Max Controller Queue Depth(3432)
mpt2sas0: Scatter Gather Elements per IO(128)
mpt2sas0: LSISAS2008: FWVersion(13.00.57.00), ChipRevision(0x03), BiosVersion(07.25.00.00)
mpt2sas0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
mpt2sas0: sending port enable !!
mpt2sas0: host_add: handle(0x0001), sas_addr(0x5000000080000000), phys(8)
mpt2sas0: port enable: SUCCESS

SUCCESS! Lots of SUCCESS! Just the way I like it! Haha, the card initialized, I had access to the drives, etc…

 

I configured the RAID 5 array using a 256 KB chunk size. I also changed “stripe_cache_size” to 2048 (the system has 4GB of RAM) to increase RAID 5 performance:

cd /sys/block/md0/md/

echo 2048 > stripe_cache_size
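A sketch of the array creation and the tweak above, assuming 5 member disks with hypothetical device names (commands are only printed unless DRY_RUN=0). The md stripe cache uses stripe_cache_size x page size (4 KiB) x number of member disks of RAM, and the value resets at reboot, so it needs to be re-applied at boot.

```shell
#!/bin/sh
# Sketch: 5-disk md RAID 5 with a 256 KiB chunk, plus the stripe cache tweak.
# Device names are hypothetical; check yours (e.g. fdisk -l) before running.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# stripe_cache_kib ENTRIES DISKS [PAGE_KIB]
# RAM used by the md stripe cache: entries x page size x member disks.
stripe_cache_kib() {
    echo $(( $1 * ${3:-4} * $2 ))
}

run mdadm --create /dev/md0 --level=5 --raid-devices=5 --chunk=256 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

# Not persistent across reboots; re-apply at boot (e.g. from /etc/rc.local).
if [ "$DRY_RUN" = "1" ]; then
    echo "echo 2048 > /sys/block/md0/md/stripe_cache_size"
else
    echo 2048 > /sys/block/md0/md/stripe_cache_size
fi
echo "stripe cache RAM: $(stripe_cache_kib 2048 5) KiB"
```

For this array that’s 2048 x 4 KiB x 5 = 40960 KiB (40 MiB), which is small next to the 4GB in the machine.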

 

At this point I simply formatted the array using ext4, configured some folders and NFS exports, and then used Storage vMotion to migrate the virtual machines from the iSCSI target to the new RAID 5 array (currently using NFS). The main priority right now was to get the VMs off the MSA20s so I could at least create a backup after they were moved. Next, I’ll be re-doing the RAID 5 array, configuring the md0 device as an iSCSI target using Lio-Target, and formatting it with VMFS. The performance of this software RAID 5 array is already blowing the MSA20 out of the water!
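When putting ext4 directly on md RAID, mkfs.ext4 can optionally be told the array geometry with -E stride=...,stripe_width=.... This wasn’t part of my setup above; it’s a hedged sketch of the arithmetic, assuming 4 KiB filesystem blocks and the 256 KB chunk, 5-disk RAID 5 from above (4 data disks).

```shell
#!/bin/sh
# ext4_raid_opts CHUNK_KIB TOTAL_DISKS PARITY_DISKS [BLOCK_KIB]
# Prints the -E options describing the md RAID geometry to mkfs.ext4.
ext4_raid_opts() {
    chunk_kib="$1"; disks="$2"; parity="$3"; block_kib="${4:-4}"
    stride=$(( chunk_kib / block_kib ))        # filesystem blocks per chunk
    width=$(( stride * (disks - parity) ))     # blocks per full data stripe
    echo "stride=$stride,stripe_width=$width"
}

# RAID 5 over 5 disks with a 256 KiB chunk (printed, not executed):
echo "mkfs.ext4 -b 4096 -E $(ext4_raid_opts 256 5 1) /dev/md0"
```

stride is the chunk size in filesystem blocks (256/4 = 64), and stripe_width is stride times the number of data disks (64 x 4 = 256).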

Here are some videos of the LEDs on the card in action:

http://www.youtube.com/watch?v=2TbJ8eOWWEE

http://www.youtube.com/watch?v=5Jjf1HmESAc

 

So there you have it! Feel free to post a comment if you have any questions or need any specifics. This setup is rocking away now under high I/O with absolutely no problems whatsoever. I think I may go purchase another 1-2 of these cards!

Nov 28, 2011
 

Just thought I’d do up a quick little post about an issue I’ve been having for some time, and just got it all fixed.

I’ve been running Astaro Security Gateway inside of a VMware environment for a few years. When version 8.x came out, I went ahead and simply attached the ISO to the VM and re-installed over the old v7 and restored the config. This worked great, and for the longest time I had no real issues.

I noticed from time to time in packet sniffs that there were quite a few retransmissions and lost TCP segments. This didn’t really pose or cause any problems; however, it was odd.

Recently, I had to configure a Site to Site IPsec VPN between my office and one of my employees to provide Exchange, VoIP, etc. With Astaro this is fairly easy: a few clicks and it should simply work. However, I started noticing huge issues with file transfers, whether over SMB (Windows File Sharing) or SCP/SSH. Transfers would either completely halt when started, transfer a couple hundred kilobytes, or transfer half of the file and then simply halt and become unresponsive.

After 3-4 days of troubleshooting, I did a packet sniff and noticed numerous lost TCP segments, fragmentation, etc. Initially I believed the MTU configuration might have had something to do with it; however, TCP/IP and the Astaro device should have taken care of properly setting the MTU on the IPsec tunnel automatically.

After trying fresh installs of ASG, etc., with no change in behaviour, I finally decided to take a few days away and give it another shot later. I had troubleshot this from every avenue, and for some reason the issue persisted. I finally figured that the only thing I hadn’t checked was my VMware vSphere environment. I checked the settings and all was good; however, I did notice that the NICs for the ASG VM (which were created by the v7 appliance) were set as Flexible, and inside the VM were detected as some type of AMD network adapter. I found this odd.

After shutting down the ASG VM, removing the NICs and configuring new ones using E1000, all of a sudden the issue was fixed, the IPsec Site to Site VPN functioned properly, and all the network issues seen in network captures were resolved.
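If you want to check other VMs for the same thing, the adapter type is recorded in each VM’s .vmx file as ethernetN.virtualDev. Below is a small sketch that lists each NIC and its type: an E1000 shows as "e1000", while a Flexible NIC typically shows as "vlance" or has no virtualDev line at all (reported here as "default"). It only reads the file; the actual change should still be made through the vSphere Client with the VM powered off, as above.

```shell
#!/bin/sh
# nic_types VMX_FILE
# Lists each present virtual NIC in a .vmx file with its adapter type.
# A NIC with no virtualDev line is reported as "default".
nic_types() {
    vmx="$1"
    grep -i -o '^ethernet[0-9]*\.present = "true"' "$vmx" | cut -d. -f1 |
    while read -r nic; do
        dev="$(sed -n "s/^$nic\.virtualDev = \"\(.*\)\"/\1/p" "$vmx")"
        echo "$nic ${dev:-default}"
    done
}
```

Usage would be something like nic_types on the .vmx in the VM’s folder on its datastore (the exact path depends on your datastore and VM names).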

I hope this helps some other people who may be frustrated dealing with the same issue.

Apr 15, 2011
 

I thought I’d pass this on to all you iSCSI enthusiasts out there!

This morning I received an e-mail forwarded on by someone.

Apparently the Microsoft iSCSI Target Software is now free and runs on Windows Server 2008 R2.

Article:

http://blogs.technet.com/b/canitpro/archive/2011/04/05/the-microsoft-iscsi-software-target-is-now-free.aspx

I plan on getting this installed, setup, and configured sometime in the next couple weeks to test with my VMware vSphere environment. I’ll post my results :) Happy SANing!

Jan 10, 2011
 

I notice quite a bit of traffic coming in (a lot of it is the same people coming back) searching for information on VMware vSphere using iSCSI, specifically Lio-Target (because of its compatibility with SCSI persistent reservations).

In the past I’ve been jumping all over the place testing Lio-Target on different distributions, test scenarios, etc. I’ve officially implemented it in my production network and just wanted to report it’s been running solid for a few months now.

Current Working Configuration (Stable):

Currently I have numerous HP servers (ML350s, DL360s) running ESXi off internal USB keys. These ESXi hosts access numerous iSCSI targets over gigabit, hosted on a DL360 G6 with 2 x MSA20 storage units. The server hosting the storage is running Ubuntu 10.10 and has been rock solid with absolutely no issues. I’m fully utilizing VMotion amongst all the hosts, and all hosts have concurrent access to the iSCSI targets. This is running in full production and I’m fully confident in the configuration and setup.

Future Plans:

Lio-Target is going upstream into the Linux kernel in the next release (2.6.38). With the testing I did (and blogged about) in the past months, I have not been able to get the newer versions of Lio-Target running stably on CentOS. Once a new version of CentOS is released, or a kernel upgrade is available to bring CentOS to 2.6.38, I will be installing CentOS on the storage server and adding more disk space. Once this change is complete, that will conclude any future changes for a while (excluding adding more ESXi hosts).

If anyone has any questions about my setup or a similar configuration, or has any issues with Lio-Target, please feel free to leave a comment and I’ll see if I can help!

Oct 08, 2010

Just got my test network setup:

  1. HP Proliant ML350 G5 – Running ESXi
  2. HP Proliant DL360 G5 – Running ESXi
  3. HP Proliant DL360 G5 – Running ESXi
  4. Custom Intel Server – Running vSphere vCenter (management server)
  5. Super Micro Server – Running CentOS 5.5 with lio-target (iSCSI Target)
  6. 2 X HP MSA20 – Running Multiple Arrays/Targets connected to Super Micro Server

The HP servers boot ESXi from USB keys. The Super Micro server has a single internal drive for the OS and is connected to multiple MSA20 arrays that act as iSCSI targets. The custom Intel server runs the vSphere vCenter management software.

I’ve Storage vMotioned multiple test VMs to the iSCSI arrays and am testing now. Speed is AMAZING! And judging from the logs on the Super Micro server, the SCSI persistent reservations ARE being handled properly :)

More to come, stay posted!

Update: After testing this for a few days, I noticed that although SCSI reservations were being handled properly (which is great), for some reason the kernel on the storage server would stall, causing the storage system to crash. I’ve only been able to trigger this when two ESXi boxes are concurrently accessing the same datastore. I’ve been trying to reproduce it in different circumstances (a single ESXi initiator, using Windows to test the target, etc…) but have been unable to. I might test the 4.0 release candidate soon.

Oct 05, 2010

PLEASE VISIT http://www.stephenwagner.com/?p=300 FOR AN UPDATED TUTORIAL.


Disclaimer: Please note that if you do not know what you are doing, any of the steps mentioned in this post can render your Linux install useless. Please do NOT use this in a production environment; use it only for testing. I’m not liable if you toast your Linux install.

First and foremost, you need to know that, from what I understand, the kernels that ship with CentOS carry numerous patches, modifications, and updates on top of the mainline Linux kernel releases. This is one reason why their versions always lag behind the kernels that are actively being developed. I could be wrong, but I believe the Red Hat kernel patches are applied to CentOS kernels.

This tutorial walks you through the quickest and dirtiest way to get up and running. You may (and probably will) have to modify things, recompile, etc… to resolve any issues you run into.

I wrote this article only to help more people get up and running with one of the only open-source Linux iSCSI targets that has been certified by VMware (when running on certain appliances) for use in a vSphere environment (it supports vMotion, SCSI reservations, etc…).

I’m going to assume you have CentOS 5.5 installed and fully up to date.

Anyways, here’s a breakdown of what we will be doing to get Lio-Target to run on CentOS:

  • Download lio-target modified kernel
  • Download lio-utils
  • Compile a modified kernel using existing kernel .config file
  • Compile and install lio-utils
  • Sample commands to set up a dedicated drive on your system as an iSCSI target (to test)
  • Compose two quick and dirty config files so lio-target can run

Download the lio-target modified kernel

For this step, you will need to have git installed. From what I understand, git is not an option during the CentOS install and cannot be installed from the default yum repos. To get git installed, we will first add the “RPMFusion” and “EPEL” yum repos.

(Info on how to install these can be found at http://rpmfusion.org/ and https://fedoraproject.org/wiki/EPEL)

Once you have installed these, it’s time to install git. This can be easily done by typing:

yum install git
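Before going further, it can help to confirm that the other tools this tutorial leans on are present. The later steps run gcc and make, and make menuconfig also needs the ncurses development headers (ncurses-devel on CentOS), which a command lookup cannot detect. A minimal bash sketch; missing_tools is a hypothetical helper, and the tool list you pass it is up to you:

```shell
#!/bin/bash
# Hypothetical helper: print each named tool that is not found on the PATH.
# Returns 0 if everything was found, 1 if anything is missing.
# It only checks for commands; it cannot tell whether ncurses-devel is installed.
missing_tools() {
    local tool
    local missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
# missing_tools git gcc make || echo "Install the tools listed above first."
```

Any tool it prints still needs to be installed with yum before continuing.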

After git is installed, it’s now time to download the lio-target kernel.

Type in:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/nab/lio-core-2.6.git lio-core-2.6.git

This downloads the source. After it completes, change into the newly created directory and type in:

git checkout origin/lio-3.4

You now have the source.

Download lio-utils

This is a pretty simple step. If you just completed the step above, make sure you cd back into a directory you can use as a workspace, and make sure you do NOT do this inside the directory that was downloaded above. Type in:

git clone git://git.kernel.org/pub/scm/linux/storage/lio/lio-utils.git lio-utils.git

This should create a directory called “lio-utils.git”. This step is done.

At this point

You should have two directories inside of your workspace:

lio-core-2.6.git

lio-utils.git

Compile a modified kernel using existing kernel .config file

It is now time to build a kernel. First, issue uname -a and note the kernel you are running. In my case it’s “2.6.18-194.17.1.el5”. I’m going to take the config file in the /boot directory that matches this version and copy it to the lio-core-2.6.git directory, renaming it to “.config”. This is what I would type in on my end:

cp /boot/config-2.6.18-194.17.1.el5 /root/lio-core-2.6.git/.config

The command above copies the config for the CentOS kernel into the lio-core-2.6.git directory under the new name “.config”. Now change into the lio-core-2.6.git directory:

cd lio-core-2.6.git
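If you’d rather not type the kernel version by hand, the cp step above can be derived from uname -r, which prints just the release string (for example 2.6.18-194.17.1.el5). A rough sketch, assuming the /boot/config-<release> naming and the /root/lio-core-2.6.git location used above; copy_running_config is a hypothetical helper, not part of lio-utils:

```shell
#!/bin/bash
# Hypothetical helper: copy the running kernel's config into the lio-core tree.
# Arguments (all optional): boot dir, source tree, kernel release.
copy_running_config() {
    local boot_dir="${1:-/boot}"
    local src_tree="${2:-/root/lio-core-2.6.git}"
    local release="${3:-$(uname -r)}"
    local cfg="$boot_dir/config-$release"

    # Stop early with a clear message instead of letting cp fail.
    if [ ! -f "$cfg" ]; then
        echo "No kernel config found at $cfg" >&2
        return 1
    fi
    cp "$cfg" "$src_tree/.config"
}

# Example (same effect as the cp command above on a 2.6.18-194.17.1.el5 system):
# copy_running_config
```

Run with no arguments, it uses /boot, /root/lio-core-2.6.git, and the running kernel’s release; the optional arguments only exist so it can be pointed somewhere else.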

At this point we are going to run a command to help the config adjust to the newer kernel version. Type in:

make oldconfig

This will spawn numerous prompts. You can just keep hitting Enter to accept the defaults. It will seem like an endless loop, but it will eventually complete. The next step is important: by default, newer kernels use a different sysfs structure, so we need to turn on the deprecated sysfs support that CentOS’s older userspace tools expect. Type in:

make menuconfig

Navigate to:

“General Setup”, then check (put a star in the box) for “enable deprecated sysfs features to support old userspace tools”.

Now hit Tab to select Exit, then Tab to Exit again at the main menu, and choose Yes when asked whether to save your new kernel configuration. We are now ready to compile! To do a quick compile, type in:

make && make modules && make modules_install && make install

Feel free to disappear for a while; this will take some time depending on the performance of your system. Once it’s done, you have a kernel with lio-target built in, compiled and installed.
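Before rebooting, it’s worth confirming that the deprecated sysfs option from menuconfig actually made it into .config. In 2.6-era kernels that menu entry corresponds to the CONFIG_SYSFS_DEPRECATED symbol; treat that mapping as an assumption to double-check against init/Kconfig in your tree. A small sketch, run from inside lio-core-2.6.git (check_sysfs_deprecated is a hypothetical name):

```shell
#!/bin/bash
# Hypothetical helper: check that the kernel .config has deprecated sysfs
# support switched on (assumed symbol: CONFIG_SYSFS_DEPRECATED).
check_sysfs_deprecated() {
    local config="${1:-.config}"
    # Anchored match, so "# ... is not set" lines and other symbols don't count.
    if grep -q '^CONFIG_SYSFS_DEPRECATED=y$' "$config" 2>/dev/null; then
        echo "CONFIG_SYSFS_DEPRECATED is enabled"
        return 0
    fi
    echo "CONFIG_SYSFS_DEPRECATED is not enabled; re-run make menuconfig" >&2
    return 1
}

# Example, from inside lio-core-2.6.git:
# check_sysfs_deprecated .config
```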

Keep in mind that this is NOT the default kernel, so you will have to select it when booting your system. To make it the default, modify /etc/grub.conf and change the default value to the new kernel’s entry (remember that the first entry is 0, not 1).
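To find the right number for default, you can list the title entries in grub.conf along with their zero-based index. This sketch assumes the legacy GRUB format CentOS 5 uses (one title line per boot entry) and the /etc/grub.conf path mentioned above; list_grub_entries and grub_default are hypothetical helpers that only read the file:

```shell
#!/bin/bash
# Hypothetical helpers for legacy GRUB (grub.conf) files.

# Print each "title" entry with its zero-based index, the number "default=" expects.
list_grub_entries() {
    local conf="${1:-/etc/grub.conf}"
    awk '/^[ \t]*title[ \t]/ {
        sub(/^[ \t]*title[ \t]+/, "")
        printf "%d: %s\n", n, $0
        n++
    }' "$conf"
}

# Print the current value of the "default=" line.
grub_default() {
    local conf="${1:-/etc/grub.conf}"
    sed -n 's/^[[:space:]]*default[[:space:]]*=[[:space:]]*//p' "$conf"
}

# Example:
# list_grub_entries
# grub_default
```

Set default= to the index printed next to your new kernel; the helpers deliberately leave editing grub.conf to you.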

Let’s boot the new kernel. Safely shut down your Linux box, reboot, and when GRUB shows up, boot the new kernel you compiled.

Please note: This guide is a quick and dirty way to get this up and running. Since we skipped customizing the kernel, the kernel you compiled will no doubt come up with errors on boot. I simply ignore this. You can too, just to get this up and running. You can come back at a later time and refine your custom kernel to be used.

Compile and install lio-utils

This is simple. Change to the directory where you downloaded lio-utils.

cd /root/lio-utils.git

And to compile and install:

make

make install

And BAM, you’re done, that was easy!

Sample commands to set up a dedicated drive on your system as an iSCSI target (to test)

It’s show time. I’m still fairly new to configuring and using lio-target, so I’m just posting the commands to get it working on a dedicated disk. lio-utils doesn’t install any manual (man page) entries, so you will need to use lio_node --help and tcm_node --help for more information on proper usage.

Keep in mind: DO NOT use your Linux system disk as an iSCSI target! Let’s pretend we have a second disk in the system, /dev/sdb, that we want to turn into a target. This is what we would type:

tcm_node --block iblock_0/array /dev/sdb

/etc/init.d/target start

lio_node --addlun iqn.2010.com.stephenwagner.iscsi:array 1 0 iscsi00 iblock_0/array

lio_node --listendpoints

lio_node --addnp iqn.2010.com.stephenwagner.iscsi:array 1 192.168.0.10:3260

lio_node --listendpoints

lio_node --disableauth iqn.2010.com.stephenwagner.iscsi:array 1

lio_node --addlunacl iqn.2010.com.stephenwagner.iscsi:array 1 iqn.CLIENTINITIATORHERE.com 0 0

lio_node --enabletpg iqn.2010.com.stephenwagner.iscsi:array 1

The above will create a target and discovery portal for /dev/sdb on 192.168.0.10. It also disables CHAP authentication and allows the initiator specified above to connect.

Please change /dev/sdb to the drive you want to use.

Please change iqn.2010.com.stephenwagner.iscsi:array to the name you want for your iSCSI target.

Please change 192.168.0.10 to the IP of the iSCSI target you’re configuring. Leave the port as 3260.

Please change iqn.CLIENTINITIATORHERE.com to the IQN of your initiator (client). This is set on the client you are using to connect to the iSCSI target.

BAM, your target is up and running! Keep in mind, this configuration is lost upon reboot.
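Since the target and initiator names above are placeholders you’ll be replacing, a quick check against the iqn. naming format from RFC 3720 (iqn.<yyyy-mm>.<reversed domain>[:<string>]) can catch typos. This is a loose sketch, not a full RFC 3720 validator, and is_valid_iqn is a hypothetical helper. Note that the example target name in this post (iqn.2010.com.stephenwagner.iscsi:array) leaves out the month, so this stricter check rejects it; the check says nothing about whether lio-target or your initiator actually enforces the format.

```shell
#!/bin/bash
# Hypothetical helper: loose check of the RFC 3720 "iqn." name format,
#   iqn.<yyyy-mm>.<reversed domain name>[:<optional unique string>]
# It does not implement every rule in the RFC (e.g. stringprep normalization).
is_valid_iqn() {
    local name="$1"
    local re='^iqn\.[[:digit:]]{4}-(0[1-9]|1[0-2])\.[[:lower:][:digit:]]([[:lower:][:digit:].-]*[[:lower:][:digit:]])?(:.+)?$'
    [[ $name =~ $re ]]
}

# Example:
# is_valid_iqn iqn.1998-01.com.vmware:esx01 && echo ok
```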

Compose two quick and dirty config files so lio-target can run

Here’s what you need to put in the config files to make the above config work on boot:

/etc/target/tcm_start.sh should contain:

tcm_node --block iblock_0/array /dev/sdb

/etc/target/lio_start.sh should contain:

lio_node --addlun iqn.2010.com.stephenwagner.iscsi:array 1 0 iscsi00 iblock_0/array

lio_node --addnp iqn.2010.com.stephenwagner.iscsi:array 1 192.168.0.10:3260

lio_node --disableauth iqn.2010.com.stephenwagner.iscsi:array 1

lio_node --addlunacl iqn.2010.com.stephenwagner.iscsi:array 1 iqn.CLIENTINITIATORHERE.com 0 0

lio_node --enabletpg iqn.2010.com.stephenwagner.iscsi:array 1
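Rather than editing the device, IQNs, and IP in several places by hand, the two files can be generated from a handful of parameters. A sketch, assuming the same commands and layout as above (iblock_0/array, TPG 1, LUN 0); write_target_configs and its argument order are hypothetical, and it writes exactly the lines listed above, nothing more:

```shell
#!/bin/bash
# Hypothetical helper: write tcm_start.sh and lio_start.sh from a few parameters.
# Arguments: output dir, block device, target IQN, portal IP, initiator IQN.
# The commands, iblock_0/array name, TPG 1, and LUN 0 mirror the example above.
write_target_configs() {
    if [ "$#" -ne 5 ]; then
        echo "usage: write_target_configs DIR DEVICE TARGET_IQN PORTAL_IP INITIATOR_IQN" >&2
        return 1
    fi
    local out_dir="$1" device="$2" target_iqn="$3" portal_ip="$4" initiator_iqn="$5"

    mkdir -p "$out_dir" || return 1

    # Backing store: expose the block device as iblock_0/array.
    cat > "$out_dir/tcm_start.sh" <<EOF
tcm_node --block iblock_0/array $device
EOF

    # iSCSI side: LUN, network portal on port 3260, no CHAP, initiator ACL, enable TPG.
    cat > "$out_dir/lio_start.sh" <<EOF
lio_node --addlun $target_iqn 1 0 iscsi00 iblock_0/array
lio_node --addnp $target_iqn 1 $portal_ip:3260
lio_node --disableauth $target_iqn 1
lio_node --addlunacl $target_iqn 1 $initiator_iqn 0 0
lio_node --enabletpg $target_iqn 1
EOF
}
```

For the example in this post you would run (as root) write_target_configs /etc/target /dev/sdb iqn.2010.com.stephenwagner.iscsi:array 192.168.0.10 iqn.CLIENTINITIATORHERE.com, substituting your own values.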

After you create these config files, you should be able to start lio-target in a running state by issuing:

/etc/init.d/target start

And remember, you can always view a live feed of what’s going on by issuing:

tail -f /var/log/messages

Hope this helps!