Apr 12 2014
 

Recently I decided it was time to beef up my storage link between my demonstration vSphere environment and my storage system. My existing setup included a single HP DL360p Gen8, connected to a Synology DS1813+ via NFS.

I went out and purchased the appropriate (and compatible) Broadcom-based HP quad-port 1Gb server NIC, and connected the Synology device directly to the new server NIC (all 4 ports). I configured an iSCSI target using a File LUN with ALUA (advanced LUN features), configured the NICs on both the vSphere and Synology sides, and enabled jumbo frames (9000-byte MTU).
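For reference, the ESXi side of the jumbo frame config can also be done from the command line. A minimal sketch, assuming ESXi 5.x with a standard vSwitch named vSwitch1 and a vmkernel port vmk1 (your names will differ):

esxcli network vswitch standard set -v vSwitch1 -m 9000
esxcli network ip interface set -i vmk1 -m 9000

The MTU has to match end to end: the vSwitch, every vmkernel port used for iSCSI, and the Synology interfaces.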

I connected to the iSCSI LUN, and created a VMFS volume. I then configured Round Robin MPIO on the vSphere side of things (as always I made sure to enable “Multiple iSCSI initiators” on the Synology side).
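For anyone replicating this, the MPIO plumbing on ESXi 5.x looks roughly like the following; vmhba33 and the naa identifier here are placeholders, not my actual values:

esxcli iscsi networkportal add -A vmhba33 -n vmk1
esxcli iscsi networkportal add -A vmhba33 -n vmk2
esxcli storage nmp device set -d naa.xxxxxxxxxxxxxxxx -P VMW_PSP_RR

The first commands bind each iSCSI vmkernel port to the software iSCSI adapter (repeat for vmk3 and vmk4); the last sets the path selection policy for the LUN to Round Robin.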

I started to migrate some VMs over to the iSCSI LUN. At first I noticed it was going extremely slow. I confirmed that traffic was being passed across all NICs (and verified that all paths were active). After the migration completed, I decided to shut the VMs down and restart them to compare boot times. Booting from the iSCSI LUN was absolutely horrible; the VMs took forever to boot up. Keep in mind I’m very familiar with vSphere (my company is a VMware partner), so I know how to properly configure Round Robin, iSCSI, and MPIO.

I then decided to tweak some settings on the ESXi side of things. I configured the Round Robin policy to IOPS=1, which helped a bit. I then changed the RR policy to bytes=8800, which, after numerous other tweaks, I determined achieved the highest performance against the storage system over iSCSI.
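Those tweaks were done with esxcli; a sketch, again with a placeholder device identifier:

esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxxxxxxxxxx --type=iops --iops=1
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxxxxxxxxxx --type=bytes --bytes=8800

IOPS=1 switches paths after every single I/O, while bytes=8800 switches after 8800 bytes have gone down a path (roughly one jumbo frame’s worth of payload).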

This config was used for a couple of weeks, but ultimately I was very unsatisfied with the performance. I know it’s not very accurate, but looking at the Synology resource monitor, each gigabit link over iSCSI was only achieving 10-15MB/sec under high load (single contiguous copies), when each link should have been capable of 100MB/sec or more. The combined LAN throughput as reported by the Synology device across all 4 gigabit links never exceeded 80MB/sec. File transfers inside the virtual machines couldn’t get higher than 20MB/sec.

I have a VMware vDP (VMware Data Protection) test VM configured, which includes a performance analyzer inside of the configuration interface. I decided to use this to run some tests (I’m too lazy to configure a real I/O/throughput test since I know I won’t be continuing to use iSCSI on the Synology with the horrible performance I’m getting). The performance analyzer tests run for 30-60 minutes and measure reads and writes in MB/sec, and seeks per second. I tested 3 different datastores.

 

Synology DS1813+ NFS over 1 X Gigabit link (1500MTU):

Read 81.2MB/sec, Write 79.8MB/sec, 961.6 Seeks/sec

Synology DS1813+ iSCSI over 4 x Gigabit links configured in MPIO Round Robin BYTES=8800 (9000MTU):

Read 36.9MB/sec, Write 41.1MB/sec, 399.0 Seeks/sec

Custom-built 8-year-old computer running Linux MD RAID 5, serving NFS over 1 X Gigabit NIC (1500MTU):

Read 94.2MB/sec, Write 97.9MB/sec, 1431.7 Seeks/sec

 

Can someone say WTF?!?!?!?! As you can see, it appears there is a major performance hit with the DS1813+ using 4 Gigabit MPIO iSCSI with Round Robin. It’s half the speed of a single link 1 X Gigabit NFS connection. Keep in mind I purchased the extra memory module for my DS1813+ so it has 4GB of memory.

I’m kind of choked I spent the money on the extra server NIC (it was over $500.00). I’m also surprised that my custom-built NFS server from 8 years ago (with 4-year-old drives) and only 5 drives is performing better than my 8-drive DS1813+. All drives used in both the Synology and the custom-built NFS box are Seagate Barracuda 7200RPM drives (the custom box has 5 X 1TB drives configured in RAID 5, the Synology has 8 X 3TB drives configured in RAID 5).

I won’t be using iSCSI or iSCSI MPIO again with the DS1813+, and I actually plan on retiring it as my main datastore for vSphere. I’ve finally decided to bite the bullet and purchase an HP MSA2024 (dual controller with 4 X 10Gb SFP+ ports) to provide storage for my vSphere test/demo environment. I’ll keep the Synology DS1813+ online as an NFS vDP backup datastore.

Feel free to comment and let me know how your experience with the Synology devices using iSCSI MPIO is/was. I’m curious to see if others are experiencing the same results.

 

UPDATE – June 6th, 2014

The other day, I finally had time to play around and do some testing. I created a new FileIO iSCSI target, connected it to my vSphere test environment, and configured Round Robin. Doing some tests on the newly created datastore, the iSCSI connections kept disconnecting. It got to the point where it wasn’t usable.

I scratched that, and tried something else.

I deleted the existing RAID volume, created a new RAID 5 volume, and dedicated it to a Block I/O iSCSI target. I connected it to my vSphere test environment and configured Round Robin MPIO.

At first all was going smoothly until, again, connection drops were occurring. Logging in to the DSM, absolutely no errors were being reported and everything looked fine. Yet all connections to the ESXi host were down.

I shut down the ESXi host, and then shut down and restarted the DS1813+. I waited for it to come back up, however it wouldn’t. I let it sit there for 2 hours before the IP was finally pingable. I tried to connect to the web interface, however it would only load portions of the page over extended amounts of time (it took 4 hours to load the interface). Once inside, it was EXTREMELY slow. However, it was reporting that all was fine, everything was up, and the disks were fine as well.

I booted the ESXi host and tried to connect, however it couldn’t make the connection to the iSCSI targets. Finally the Synology unit became unresponsive.

Since I only had a few test VMs loaded on the Synology device, I decided to just go ahead and do a factory reset on the unit (I noticed new firmware was available as of that day). I downloaded the firmware, and started the factory reset (which again, took forever since the web interface was crawling along).

After restarting the unit, it was not responsive. I waited a couple hours and again, the web interface finally responded but was extremely slow. It took a couple hours to get through the setup page, and a couple more hours for the unit to boot.

Something was wrong, so I restarted the unit yet again, and again, and again.

This time, the alarm light was illuminated on the unit, and one of the drive lights wouldn’t come on. Again, extreme unresponsiveness. I finally got access to the web interface, and it was reporting the temperature of one of the drives as critical, but said the drive was still functioning and all drives were OK. I shut off the unit, removed the drive, and restarted it; all of a sudden it was extremely responsive.

I hooked the removed drive up to another computer and confirmed that it had indeed failed.

I replaced the drive with a new one (same model) and did three tests: one with NFS, one with FileIO iSCSI, and one with BlockIO iSCSI. All of a sudden the unit was working fine, and there were absolutely NO iSCSI connection drops. I tested the iSCSI targets under load for some time, and noticed considerable performance increases with iSCSI, and no connection drops.

Here are some thoughts:
-Two possible things fixed the connection drops: either the drive was acting up all along, or the new version of DSM fixed the iSCSI connection drops.

-While performance has increased with FileIO to around ~120-160MB/sec from ~50MB/sec, I’m still not even close to maxing out the 4 X 1Gb interfaces.

-I also noticed a significant performance increase with NFS, so I’m leaning towards the drive having been acting up since day one (seeks per second tripled after replacing the drive and testing NFS). I/O wait has been significantly reduced.

-Why did the Synology unit just freeze up once this drive really started dying? The drive should have been marked as failed instead of taking the entire unit down.

-Why didn’t the drive get marked as failed at all? I regularly performed SMART tests and checked drive health, and there were absolutely no errors. Even when the unit was at a standstill, it still reported the drive as working fine.

Either way, the iSCSI connection drops aren’t occurring anymore, and performance with iSCSI is significantly better. However, I wish I could hit 200MB+/sec.

At this point it is usable for iSCSI using FileIO, however I was disappointed with BlockIO performance (BlockIO should be faster, no idea why it isn’t).

For now, I have an NFS datastore configured (using this for vDP backup), although I will be creating another FileIO iSCSI target and will do some more testing.

Jul 23 2012
 

Interesting story:

On the weekend, my Trixbox VoIP PBX (which runs Asterisk) failed. Unfortunately, during the restore process the hard drive also blew up. I temporarily set up an ML350 G5 as a stand-in PBX, however today I had the chance to set up an old Acer Aspire netbook, which I had sitting in a box, as my new permanent VoIP PBX.

I used a USB CD-ROM to install Trixbox, as for some reason I couldn’t get it to load the kickstart file during a GRUB boot off a USB key, and also couldn’t get it to mount the NFS install export (maybe the kernel didn’t have support for NFS?).

 

The netbook had decent specs:

Dual-core 1.5GHz processor

1GB RAM

Battery (this means I don’t have to put it on the UPS with the rest of my server equipment)

 

Got it setup and it’s running great! :)

Jun 29 2012
 

As most of you have read, I received 2 X Raspberry Pi the other day. I’ve been actively hacking and working away on these lovely little devices.

One of the projects I wanted to do was get Lio-Target (an iSCSI target) running on the Pi. I know the Pi doesn’t have gigabit networking, but I thought this would still be an interesting proof of concept. Anyways, I got it running, and from my Windows 7 workstation I have successfully connected to a USB storage device configured as an iSCSI target on the Pi.

This is a brief overview, I will be providing instructions in detail at a later date. Here’s how I did it:

1) Download Fedora 17 for ARM (built for the Raspberry Pi).

2) Put latest Firmware and Kernel from Raspberry Pi github repo on to the boot partition. Resize my 16GB card so I have boot, root, and a 2 GB swap.

3) Download snapshot of Raspberry Pi kernel sources. I built the iSCSI Target as modules (I also threw in some other stuff for future projects but it’s not important right now).

4) Install compilers, libraries, etc. for the kernel build process.

5) Compile the kernel (see the sketch after this list).

6) Build the Raspberry Pi kernel image using the Raspberry Pi image tools from the github repo, and copy it to the boot partition.

7) Boot off the new kernel.

8) Install Target CLI from yum (this was a nice change from compiling on my own), and then build Lio-Utils (this isn’t mandatory, but I like Lio-utils).

9) Configure the target, connect, test.
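The compile and image steps (5 and 6) went roughly like this. This is a sketch from memory, assuming the kernel source tree is in ~/linux, the Raspberry Pi tools repo is cloned to ~/tools, and the running kernel exposes its config via /proc/config.gz:

cd ~/linux
zcat /proc/config.gz > .config   # start from the running kernel's config
make menuconfig                  # enable CONFIG_TARGET_CORE, CONFIG_TCM_IBLOCK, CONFIG_TCM_FILEIO and CONFIG_ISCSI_TARGET as modules
make && make modules_install
python ~/tools/mkimage/imagetool-uncompressed.py arch/arm/boot/Image   # wraps the kernel for the Pi, producing kernel.img
cp kernel.img /boot/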

Here’s a copy/paste of proof I have it running!

[root@fedora-arm lio-utils.git]# uname -a
Linux fedora-arm 3.1.9.001 #1 PREEMPT Thu Jun 28 16:40:46 MDT 2012 armv6l armv6l armv6l GNU/Linux
[root@fedora-arm lio-utils.git]# w
09:12:10 up 34 min,  5 users,  load average: 0.99, 1.51, 1.10
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    host.digitall 31Dec69 22:38   2.27s  0.02s tail -f /var/log/messages
root     pts/1    host.digitall 31Dec69  0.00s  1.19s  0.05s w
root     pts/2    host.digitall 08:58   13:24  13.89s 13.04s top
root     pts/3    host.digitall 31Dec69  0.00s  0.00s   ?    -
[root@fedora-arm lio-utils.git]# /etc/init.d/target status
[---------------------------] TCM/ConfigFS Status [----------------------------]
\------> iblock_0
HBA Index: 1 plugin: iblock version: v4.1.0-rc1-ml
\-------> array0
Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/128/128  SectorSize: 512  MaxSectors: 240
iBlock device: sdb  UDEV PATH: /dev/sdb
Major: 8 Minor: 16  CLAIMED: IBLOCK
udev_path: /dev/sdb

[---------------------------] LIO-Target Status [----------------------------]
\------> iqn.2003-01.org.linux-iscsi.fedora-arm.armv6l:sn.4682cf8cdeec
\-------> tpgt_1  TargetAlias: LIO Target
TPG Status: ENABLED
TPG Network Portals:
\-------> IP-hidden:3260
TPG Logical Units:
\-------> lun_0/30b42bf9f5 -> target/core/iblock_0/array0

Target Engine Core ConfigFS Infrastructure v4.1.0-rc1-ml on Linux/armv6l on 3.1.9.001
RisingTide Systems Linux-iSCSI Target v4.1.0-rc1
[root@fedora-arm lio-utils.git]# cat /proc/cpuinfo
Processor       : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS        : 697.95
Features        : swp half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xb76
CPU revision    : 7

Hardware        : BCM2708
Revision        : 0002
Serial          : 00000000c1ad6318
[root@fedora-arm lio-utils.git]#

Jun 09 2012
 

So there I was… Had a custom built vSphere cluster, 3 hosts, iSCSI target (setup with Lio-Target), everything running fine, smoothly, perfect… I’m done right? NOPE!

Most of us who care are concerned about our disaster recovery or backup solution. With virtualization, things get a little interesting: either you have a CRAZY large setup and can use the VMware backup products, or you have a smaller environment and want something simple, easy to use, and with a low footprint. Configuring a backup and/or disaster recovery solution for a virtualized environment may be difficult and complicated, but once it’s fully implemented, management, use, and administration are easy; and don’t forget about the abilities and features you get with virtualization.

Reasons why virtualization rocks when it comes to backup and disaster recovery:

-Unlike traditional backups, you do not have to install a bare OS and backup software just to have something to restore on to

-You can restore to hardware that is nothing like the original hardware

-Backups are now simple files that can be easily moved, transported, copied, and saved on to normal or unusual media (you could save an entire system on a USB key if it was big enough). A single 2TB external USB drive could hold backups of 16+ virtual machines!

-Ease of recovery: copy the backed-up VM files to a host/datastore and simply hit play. Restore complete, and you’re up and running!

 

So with all that in mind, here we go! (Scroll to bottom of post for a quick conclusion).

For my solution, here are some of the requirements I had:

1) Utilize snapshots to take restorable backups while the VMs are running (no downtime).

2) Move the backups to a different location while running (this could be a drive, NFS export, SMB share, etc…).

3) Have the backups stored somewhere easy to access where I can move them to a removable external USB drive to take off-site. This way, I have fast disk-to-disk access to restore backups in the event my storage system goes down or RAID array is lost (downtime would be minimal), or in the event of something more serious like a fire, I would have the USB drive off-site to restore from. Disk-to-disk backups could happen on a daily basis, and disk-to-USB could be done weekly and taken off-site.

4) In the event of a failure, be able to bring USB drive onsite, transfer VMs back and be up and running in no time.

 

So with this all in mind, I started designing a solution. My existing environment (without backup) composed of:

2 X HP DL360 G5 Servers (running ESXi)

1 X HP ML350 G5 Server (running ESXi)

1 X Super Micro Intel Xeon Server (Running CentOS 6 & Lio-Target backports: providing iSCSI VMFS)

2 X HP MSA20 Storage Units

 

First, I need a way to create snapshots of my 16+ virtual machines and then have the snapshots moved to another location (such as a backup server). There is a free script available called ghettoVCB. ghettoVCB is a “free alternative for backing up VM’s for ESX(i) 3.5, 4.x+ & 5.x” and is available (along with tons of documentation) at: http://communities.vmware.com/docs/DOC-8760. The script is generally run on the ESXi host; it generates a snapshot and clones it to a separate datastore configured on that host. It does this for all virtual machines named in the VM list you specify, or for all VMs on the host if a specific switch is passed to ghettovcb.sh.

So now that we have the software, we need a location to back up to. We could either create a new iSCSI target, or set up a new Linux server with a RAID array formatted with ext4 and exported via NFS. The NFS option lets us mount the export as a datastore on ESXi (so we can back up to it), and afterwards access the backups natively in Linux to copy/move them to an external drive formatted with ext4.

We configure a new server running CentOS 6 with enough storage to back up all VMs. We create NFS exports and mount them on all the ESXi hosts (sketched below). We copy the ghettoVCB script to a location on the NFS export so it’s easily accessible to all hosts (without having to update the script on each host individually), and we create a list for each physical host containing the names of the virtual machines it runs. We then edit the ghettovcb.sh file to specify the new destination datastore (the backup datastore) and how we want it to back up.
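The export and mount side of this is simple; a sketch with hypothetical names (backup01 as the backup server, /exports/vmbackup as the export, VMBackups as the datastore name):

# /etc/exports on the CentOS backup server
/exports/vmbackup       10.0.0.0/24(rw,no_root_squash,sync)

# reload the exports on the server
exportfs -ra

# then, on each ESXi host, mount the export as a datastore
esxcfg-nas -a -o backup01 -s /exports/vmbackup VMBackups

no_root_squash matters here, since the ESXi NFS client connects as root.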

When executing:

./ghettovcb.sh -f esxserverlist01

It creates the snapshot for each VM in the list for that host, clones it to the destination datastore (which in my setup is the NFS export on the new backup server), then deletes the snapshot when the backup is complete, finally moving on to the next VM and repeating the process until done. The script needs to be run on each host, and a VM list file has to be created for each host.

We now have a backup server and have done a disk-to-disk backup of our virtual machines. We can now plug in a large external USB drive to the backup server, and simply copy over the backups to it.

 

I do everything manually because I like to confirm everything is done and backed up properly, however you can totally create scripts to automate the whole process (a sketch follows below). After this we have our new backup solution!
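If you do want to automate it, a cron entry on each host pointing at the script and that host’s VM list is all it takes. A rough sketch with hypothetical paths (note that on ESXi the root crontab at /var/spool/cron/crontabs/root does not survive a reboot unless you re-add it from a startup script):

# weekly disk-to-disk backup, Saturday at 1am
0 1 * * 6 /vmfs/volumes/VMBackups/ghettovcb/ghettovcb.sh -f /vmfs/volumes/VMBackups/ghettovcb/esxserverlist01 > /tmp/ghettovcb.log 2>&1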

 

Quick recap:

1) Setup a new backup server with enough disk space to back up all VMs. Setup an NFS export and mount it to ESXi hosts.

2) Download and configure the ghettoVCB script. Run the script on each ESXi host to disk-to-disk backup your VMs to your new backup server.

3) Copy the backup files from the backup server to an external USB drive that has enough space. Take it off-site.

 

I have had to restore a couple VMs in the past due to a damaged RAID array, and I did so using a backup from above. It worked great! I will create a post on the restore process sometime in the future (for now feel free to look at the ghettoVCB documentation)!

Jun 09 2012
 

Recently, I’ve started to have some issues with the HP MSA20 units attached to my SAN server at my office. These MSA20 units stored all my virtual machines inside of a VMFS filesystem which was presented to my vSphere cluster hosts over iSCSI using Lio-Target. In the last while, the logical drives have just been randomly disappearing, causing my 16+ virtual machines to halt. This always requires me to shut off the physical hosts, shut off the SAN server, shut off the MSA20s, and bring everything all the way back up. This causes huge amounts of downtime, and is just a pain in the butt…

I decided it was time for me to re-do my storage system. Ideally, I would have purchased a couple of HP MSA60s and P800 controllers to hook up to my SAN server, but unfortunately right now it’s not in the budget.

A few years ago, I started using software RAID. In the past I was absolutely scared of it, thought it was complete crap, and would never have touched it, but my opinion changed drastically after playing with it and using it regularly. While I still recommend hardware RAID for businesses, especially for mission-critical applications, I felt I could try out software RAID for the above situation since it’s more of a “hobby” setup.

 

I read that most storage enthusiasts use either the Super Micro AOC-SASLP-MV8 or the LSI SAS 9211-8i. They are based on different chipsets (both of which are widely used in other well-known cards), and both have their own pros and cons.

During my research, I noticed a lot of people who run Windows Home Server were utilizing the Super Micro AOC card, and while using WHS, most reported no issues whatsoever. However, it was a different story when reading posts/blog articles from people using Linux. I don’t know how accurate this was, but apparently a lot of people had issues with this card under heavy load, and some just couldn’t get it running inside of Linux.

Then there is the LSI SAS 9211-8i (which is the same as the extremely popular IBM M1015). This bad boy supports basic RAID operations (1, 0, 10), but most people use it as JBOD with Linux MD software RAID. While there were numerous complaints from users having issues with their systems even detecting the card, other users reported issues caused by the card’s boot BIOS (its option ROM taking up too much memory for the system to boot). When people did get this card working, though, I read of mostly NO issues under Linux. I spent a few days confirming what I had already read and finally decided to make the purchase.

Both cards support SAS/SATA, however the LSI card supports 6Gb/sec SAS/SATA. Both also have 2 internal SFF-8087 Mini-SAS connectors to hook up a total of 8 drives directly, or more using a SAS expander. The LSI card uses a PCIe 2.0 x8 slot, vs the AOC-SASLP which uses a PCIe 1.0 x4 slot.

 

I went to NCIX.com and ordered the LSI 9211-8i along with 2 breakout cables (card part#: LSI00194, cable part#: CBL-SFF8087OCF-06M). This would allow me to hook up a total of 8 drives (even though I only plan to use 5). I have an old computer (already in use with an eSATA connector to a Sans Digital SATA expander for NFS, etc…) that I planned on installing the card into. I also have an old StarTech SATABAY5BK enclosure which will hold the drives and connect to the controller. Finished case:

Server with disk enclosures (StarTech SATABAY5BK)

(At this point I have the enclosure installed along with 5 X 1TB Seagate 7200.12 Barracuda drives)

Finally the controller showed up from NCIX:

LSI SAS 9211-8i

I popped the card into the computer (which unfortunately only had PCIe v1.0) and connected the cables! This is when I ran into a few issues…

-If no drives were connected, the system would boot and I could successfully boot to CentOS 6.

-If I pressed CTRL+C to get into the card’s interface, the system would freeze during BIOS POST.

-If any drives were connected and detected by the card’s BIOS, the system would freeze during BIOS POST.

 

I went ahead and booted into CentOS 6, downloaded the updated firmware and BIOS, and flashed the card. The flashing manual was insane, but I had to read it all to make sure I didn’t break anything. First I updated both the firmware and BIOS (which went OK), however I couldn’t convert the card from IR firmware to IT firmware due to errors. I googled this and came up with a bunch of articles, but this one: http://brycv.com/blog/2012/flashing-it-firmware-to-lsi-sas9211-8i/ was the only one that helped and pointed me in the right direction. Essentially, you have to use the DOS flasher, erase the card (MAKING SURE NOT TO REBOOT OR YOU’D BRICK IT), and then flash the IT firmware. This worked for me, check out his post! Thanks Bryan!

Anyways, after updating the card and converting it to the IT firmware, I still had the BIOS issue. I tried the card in another system and still had a bunch of problems. I finally removed one of the two video cards and put the controller in a video card slot, and I could finally get into the card’s BIOS. First I enabled staggered spin-up (to make sure I don’t blow the PSU on the computer with a bunch of drives starting up at once), changed some other settings to optimize things, and finally disabled the boot BIOS, setting the adapter to be disabled for boot and only available to the OS. After moving the card back to the target computer, everything worked. I also noticed that the staggered spin-up kicked in during Linux kernel startup when the card initialized. Here’s a copy of the kernel log:

mpt2sas version 08.101.00.00 loaded
mpt2sas 0000:06:00.0: PCI INT A -> Link[LNKB] -> GSI 18 (level, low) -> IRQ 18
mpt2sas 0000:06:00.0: setting latency timer to 64
mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (3925416 kB)
mpt2sas 0000:06:00.0: irq 24 for MSI/MSI-X
mpt2sas0: PCI-MSI-X enabled: IRQ 24
mpt2sas0: iomem(0x00000000dfffc000), mapped(0xffffc900110f0000), size(16384)
mpt2sas0: ioport(0x000000000000e000), size(256)
mpt2sas0: sending message unit reset !!
mpt2sas0: message unit reset: SUCCESS
mpt2sas0: Allocated physical memory: size(7441 kB)
mpt2sas0: Current Controller Queue Depth(3305), Max Controller Queue Depth(3432)
mpt2sas0: Scatter Gather Elements per IO(128)
mpt2sas0: LSISAS2008: FWVersion(13.00.57.00), ChipRevision(0x03), BiosVersion(07.25.00.00)
mpt2sas0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
mpt2sas0: sending port enable !!
mpt2sas0: host_add: handle(0x0001), sas_addr(0x5000000080000000), phys(8)
mpt2sas0: port enable: SUCCESS

SUCCESS! Lots of SUCCESS! Just the way I like it! Haha, card initialized, had access to drives, etc…

 

Configured the RAID 5 array using a 256KB chunk size. I also changed the “stripe_cache_size” to 2048 (the system has 4GB of RAM) to increase RAID 5 write performance.
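The array creation itself was stock mdadm; a sketch assuming the five drives came up as /dev/sdb through /dev/sdf (mdadm takes the chunk size in KB):

mdadm --create /dev/md0 --level=5 --raid-devices=5 --chunk=256 /dev/sd[b-f]
mdadm --detail --scan >> /etc/mdadm.conf   # so the array assembles on boot

Then the stripe_cache_size change: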

cd /sys/block/md0/md/

echo 2048 > stripe_cache_size
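Keep in mind stripe_cache_size resets to the default on reboot, so it’s worth re-applying at boot, for example from /etc/rc.local:

echo 2048 > /sys/block/md0/md/stripe_cache_size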

 

At this point I simply formatted the array with ext4, configured some folders and NFS exports, and then used Storage vMotion to migrate the virtual machines from the iSCSI target to the new RAID 5 array (currently over NFS). The main priority right now was to get the VMs off the MSA20 so I could at least create a backup once they had been moved. Next step: I’ll re-do the RAID 5 array, configure the md0 device as an iSCSI target using Lio-Target, and format it with VMFS. The performance of this software RAID 5 array is already blowing the MSA20 out of the water!
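One aside on the ext4 format: when formatting on top of md RAID you can pass the stripe geometry so the filesystem aligns its allocations with the array. With a 256KB chunk and 4KB blocks, stride = 256/4 = 64, and with 4 data disks (5 drives in RAID 5), stripe-width = 64 * 4 = 256:

mkfs.ext4 -E stride=64,stripe-width=256 /dev/md0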

Here’s some videos of the LEDs on the card in action:

http://www.youtube.com/watch?v=2TbJ8eOWWEE

http://www.youtube.com/watch?v=5Jjf1HmESAc

 

So there you have it! Feel free to post a comment if you have any questions or need any specifics. This setup is rocking away now under high I/O with absolutely no problems whatsoever. I think I may go purchase another 1-2 of these cards!

Jun 06 2012
 

So, as most of you know, I have TONS of articles pertaining to getting Lio-Target running on CentOS. In the beginning, things seemed rather “hit or miss” due to weird errors when building either lio-target or lio-utils…

Turns out, most of the issues I’ve had are related to Python and the version currently running. Recently I updated one of my storage boxes using yum, and it completely broke Lio-Target and Lio-Utils when I had to rebuild them for the new kernel. I was in a panic, mounting an old CentOS 6 ISO to get back to one of the original Python versions shipped with CentOS. After downgrading, I was able to build and install both.

 

Just a heads up for you people getting weird python errors.

Mar 06 2012
 

Well, I received a phone call from my father this morning, demanding I go to his blog and check out a video… (I’ve been helping my dad get his first blog up and running the past couple of days).

Check the video out at: http://www.russwagner.com/?p=4, pretty funny :)

Anyways, this made me think of some of my old favorite classics. Here’s a few pertaining to Linux:

Linux is Ready

Linux

Crime of the century

Jan 26 2012
 

In this “how to” we will go over installing Ubuntu 10.04 LTS on a Soekris Net4801 SBC (single board computer).

To accomplish this, we will be network booting the Net4801, since it does not have any installation-type storage (no CD-ROM, and the USB ports are not bootable). Also, since the Net4801 has no video card or keyboard, we will be performing the installation over a serial console.

You can use this guide to perform the same function on other SBCs or other devices (even a standard server). The methods in the guide to both network boot and provide a serial console are not mutually dependent and can be used on their own (for example, you don’t need to network boot to install using a serial console, or vice versa).

In this how to, we are using the Soekris Net4801 since it’s a small, interesting little computer which is designed as a somewhat open platform for router, wireless, and numerous other types of development and production type uses. The Net4801 specifications are available here: http://soekris.com/products/net4801.html.

The instructions I provide are using software and systems I have available to myself. Your environment may be different so remember that Google is your friend. The concepts will be the same.

To get started, we need:

1 X Soekris Net4801

1 X Computer with a serial port

1 X Serial DB-9 Null Modem Cable

1 X Linux or Windows computer running TFTP server and web server

When network booting the Ubuntu installer, you can either install directly off the internet (which requires simply networking booting, and following the installation instructions) or you can provide the installer the installation files which may speed things up if you are on a slow connection. For the purpose of enlightening whoever is reading this, we are going to provide the installer the files.

1) Install a TFTP Server

The first thing we have to do is create the environment necessary to network boot the Net4801. In my case I have a CentOS 6 server. I installed the TFTP server by issuing “yum install tftp-server*”. After this is complete, we open /etc/xinetd.d/tftp and change the disable value to no (full file shown below). Go ahead and restart xinetd by typing “/etc/init.d/xinetd restart”.

We now have a TFTP server providing everything inside of /var/lib/tftpboot.
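For reference, here is roughly what /etc/xinetd.d/tftp ends up looking like on CentOS 6 (the stock file with the disable line flipped; a couple of default rate-limiting lines omitted):

service tftp
{
        socket_type     = dgram
        protocol        = udp
        wait            = yes
        user            = root
        server          = /usr/sbin/in.tftpd
        server_args     = -s /var/lib/tftpboot
        disable         = no
}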

2) Configure the DHCP Server to provide PXE boot info to PXE clients

In my case, I have a Windows Server 2008 R2 box providing DHCP to my network. I simply log on to the server, and open the DHCP Server GUI. I browse to my network scope, right click on “Scope Options” and hit “Configure Options”. We need to specify two options: first is “066 Boot Server Host Name”, which we set to the IP address of the TFTP server, and second “067 Bootfile Name”, which we set to “pxelinux.0”. That’s it! When the PXE client boots it will receive this information.
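If your DHCP server is ISC dhcpd on Linux rather than Windows, the equivalent of those two scope options would look something like this (addresses are examples):

subnet 192.168.1.0 netmask 255.255.255.0 {
        range 192.168.1.100 192.168.1.200;
        next-server 192.168.1.10;       # option 066: the TFTP server
        filename "pxelinux.0";          # option 067: the boot file
}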

3) Configure netboot files

If you have the Ubuntu alternative CD, you can copy over everything inside of the install/netboot directory to /var/lib/tftpboot.

If you don’t have the Ubuntu CD, change your working directory to /var/lib/tftpboot, and type “wget ftp://ubuntu.arcticnetwork.ca/pub/ubuntu/ubuntu/dists/lucid/main/installer-i386/current/images/netboot/netboot.tar.gz”. After this, type “tar zxvf netboot.tar.gz”. This will extract the netboot components to the directory.

Once this is complete, we have the Ubuntu network installer in place. Since we are using a serial terminal to install Ubuntu on the Net4801, we now have to configure the bootloader and installer to use the serial console.

Inside of the /var/lib/tftpboot directory, open pxelinux.cfg/default using your favorite editor. Add these two lines to the top of the file:

console 0
serial 0 19200 0

Save and close. We have just instructed the bootloader to use the first serial port on the system for its console. Next we need to configure the kernel to input/output to the serial port as well.

Open ubuntu-installer/i386/boot-screens/text.cfg in your editor. We are going to remove and add a few things on the “append” line under the “install” label: remove the word quiet, and replace it with console=ttyS0,19200. After completion it should look like this:

default install
label install
menu label ^Install
menu default
kernel ubuntu-installer/i386/linux
append vga=normal initrd=ubuntu-installer/i386/initrd.gz -- console=ttyS0,19200 earlyprint=serial,ttyS0,19200

Now that this is complete, the kernel will now input/output to the serial console.

4) Install sources (you can skip this, but please read)

If you are installing from the internet, you can simply skip this step. If you have the Ubuntu alternative CD, or CD Image, and want to install from those sources, this is what we need to do. In my case, I had the ubuntu .iso file.

On my CentOS server, I have Apache httpd installed. I have the .iso file in /root/. I type “mkdir /var/www/html/mount” to create a directory called mount inside of the web root. I then make sure I’m in /root/ when typing “mount cd-image-name.iso /var/www/html/mount -o loop” which mounts the CD Image to the mount directory.

We have now successfully mounted the CD image to the web server.

5) Network Boot the Soekris Net4801 via PXE

We now have the environment configured; it’s finally time to network boot the Net4801. Keep in mind, with a serial connection, the only real problems you’ll run into are a) configuring software (i.e. Linux, GRUB, bootloaders) to use it, and b) speed settings. We’ve addressed the first issue with the configuration files above; for the second, the speed values have to match in both the Net4801 BIOS and the client (in my case PuTTY). While you can use a whole range, I like to use 19200. It’s friendly, and I never have any issues :)

Hook up the Net4801 to your computer’s serial port using your null modem serial cable. Open up PuTTY, and instead of using SSH, use Serial, and set the speed to 19200 (I believe this is the default for a fresh out-of-the-box Net4801), then start the connection. Power on the Net4801 and you should see the startup text.

So right now, the connection is working, but I thought I’d go over a few things. Hit Ctrl + P while the BIOS is posting, and type show.

These are variables you can configure on the Net4801. A few to remember: a) ConSpeed: the serial port speed, which has to match on both sides; b) Flash: either Primary or Secondary, which specifies whether it is master or slave on the IDE channel, just like traditional older IDE-based computers; c) BootDrive: the standard boot order, where 80 = IDE master, 81 = IDE slave, F0 = PXE network boot.

Anyways, that’s that. So now we want to network boot. While in the ComBIOS command console, type in “Boot F0”. This will initiate the network boot. Things might look a bit weird at first, however eventually it will prompt you for something; simply type “install” and hit enter. After the kernel boots, the Ubuntu text installation should start. From here on it’s easy and normal.

FINAL NOTES:

-The Soekris Net4801’s processor is an i586-class processor. Ubuntu dropped support for i586 as of 10.10 and later. This is why I chose 10.04 LTS.

-There are issues with the installer on the Net4801: after specifying the network configuration, once it starts to download the initial installer components, the installer can freeze. Usually the screen goes blank for up to 45 minutes while it is working, but when this issue occurs, it freezes permanently. Ten months ago I narrowed down what was causing this, but have since forgotten. I think it had something to do with having an IDE drive connected to the Soekris; I believe it started working when I switched to Compact Flash for internal storage.

POST-INSTALL CONFIGURATION:

After installing, I noticed a bunch of weird things like ureadahead and plymouth crashing on startup (due to lack of resources). Also, some things I wanted were not showing up on the serial console (this is because everyone wants GUIs these days).

Few recommendations on cleaning up your install:

1) Disable AppArmor – It’s a waste of resources

Type “sudo update-rc.d -f apparmor remove”

2) Configure GRUB – Get the proper stuff going to the console

Open /etc/default/grub in your favorite editor, remove everything and paste this:

GRUB_DEFAULT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=3
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="noplymouth text nosplash"
GRUB_CMDLINE_LINUX="console=ttyS0,19200 earlyprint=serial,ttyS0,19200"
GRUB_TERMINAL=console
GRUB_TERMINAL=serial
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=19200 --stop=1"

That disables plymouth, configures a text console, and shows GRUB on boot. After this is done, run “update-grub” and “update-initramfs -u”.

3) Set time – I don’t know why, but my system lost its time; maybe the battery is going. For some reason this wasn’t automatic.

Type “ntpdate time.windows.com”

4) Compilers and build tools – If you’re going to compile anything, run this; otherwise skip it.

Type “apt-get install fakeroot build-essential libncurses5 libncurses5-dev libelf-dev asciidoc binutils-dev”

Have any questions, feel free to leave a comment. Also, sorry about the mess of a post.

Oct 14 2011
 

In this tutorial, I will be showing you how to get Lio-Target (an iSCSI target that is compatible with persistent reservations required by both VMware and MS Clustering) running on CentOS 6.

While this tutorial is targeted at CentOS 6 users, I see no reason why it shouldn’t work on other newer distributions.

Please note that while Lio-Target 4.x (and the required tcm_loop and iSCSI pieces) is available in newer/non-stable development kernels, Lio 3.x is stable and currently builds nicely on CentOS 6. I will do up a tutorial for Lio 4.x once I start using it myself.

One more note: in the past I have thrown up a few tutorials on how to get Lio-Target running on various Linux distributions. Those tutorials worked for some and not for others, and I had difficulty replicating my own original success. I’m a technical guy, not a developer; I didn’t understand some developer terms and wasn’t an expert in development cycles, which is one of the reasons I had so many difficulties earlier. Since those earlier tutorials, I have caught up to speed and am familiar with what is required to get Lio-Target running.

Now on to the tutorial:

It is a good idea to start with a fresh install of CentOS 6. Make sure you do not have any of the iSCSI target packages installed that ship with CentOS; in my case I had to remove a package called something like “iSCSI-Target-utils” (it shipped with the CentOS 6 install).

1. Let’s download the software. We need to download both the 3.5 version of Lio-Target and the Lio-utils built for Lio-Target 3.x. (I chose the RisingTide Systems GIT repo since the lio-related projects have been missing from kernel.org’s GIT repos due to the issues kernel.org has been having recently.)

Issue the following commands:

git clone git://risingtidesystems.com/lio-core-backports.git lio-core-backports.git

git clone git://risingtidesystems.com/lio-utils.git lio-utils.git

cd lio-utils.git/

git checkout --track -b lio-3.5 origin/lio-3.5

cd ..

(You have now downloaded both the Lio-Target 3.5 backport and the lio-utils for Lio-Target 3.x)

2. Build the kernel modules for your existing running CentOS kernel.

Change into the lio-core-backports directory, then issue the following commands:

make

make install

(You have now built, and installed the kernel modules for Lio-Target)

3. Build and install lio-utils. This is one of the tasks I had difficulties with: for some reason the install scripts were calling out to the incorrect Python directory. I found a fix for this myself.

Apply the fix first:

Go into the tcm-py and lio-py directories inside of the lio-utils directory. Open the install.sh in both the tcm-py and lio-py directories and change the “SITE_PACKAGES” string to reflect the following:

SITE_PACKAGES=/usr/lib/python2.6/site-packages

Remember to do this in both the install.sh files for lio-py and tcm-py. Now on to building and installing lio-utils.

Issue the following commands from the lio-utils directory:

make

make install

And you are now done!

Lio-Target and Lio-Utils have now successfully been installed. As you can see, this was way easier than my previous tutorials, and doesn’t involve any rebuilding of kernels, etc… One of the pluses is that you actually build the kernel modules against the existing CentOS kernel.

One last thing. Start lio-target by issuing the command:

/etc/init.d/target start

And do a ‘dmesg’ to confirm that it started ok!
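From there, the target itself can be defined with the lio-utils command-line tools. A rough sketch from memory (double-check tcm_node --help and lio_node --help for the exact syntax; the IQN, backing device, and IP below are examples):

# register /dev/sdb as an iblock backstore
tcm_node --iblock iblock_0/array0 /dev/sdb

# export it as LUN 0 on TPG 1, add a network portal, and enable the TPG
lio_node --addlun iqn.2011-10.com.example:target0 1 0 lun0 iblock_0/array0
lio_node --addnp iqn.2011-10.com.example:target0 1 192.168.1.50:3260
lio_node --enabletpg iqn.2011-10.com.example:target0 1

You will also need to sort out initiator ACLs/authentication for your environment before an initiator can log in.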

As always, feel free to post any comments or questions. I’ll do my best to help!

Jan 10 2011
 

I notice quite a bit of traffic coming in (a lot of it the same people coming back) searching for information on VMware vSphere using iSCSI, specifically Lio-Target (because of its compatibility with SCSI persistent reservations).

In the past I’ve been jumping all over the place testing Lio-Target on different distributions, test scenarios, etc… I’ve now officially implemented it in my production network and just wanted to report that it’s been running solid for a few months.

Current Working Configuration (Stable):

At the moment I have numerous HP servers (ML350s, DL360s) running ESXi off an internal USB key. These ESXi hosts are accessing numerous iSCSI targets over gigabit, hosted on a DL360 G6 with 2 X MSA20 storage units. The server hosting the storage is running Ubuntu 10.10 and has been rock solid with absolutely no issues. I’m fully utilizing VMotion amongst all the hosts, and all hosts have concurrent access to the iSCSI targets. This is running in full production and I’m fully confident in the configuration and setup.

Future Plans:

Lio-Target is going upstream into the Linux kernel in the next release (2.6.38). In the testing I did (and blogged about) over the past months, I was not able to get the newer versions of Lio-Target running stable on CentOS. Once a new version of CentOS is released, or a kernel upgrade is available to bring CentOS to 2.6.38, I will install CentOS on the storage server and add more disk space. Once that change is complete, it will conclude any changes for a while (excluding adding more ESXi hosts).

If anyone has any questions on my setup, or configuration with something similar, or have any issues with Lio-Target, please feel free to leave a comment and I’ll see if I can help!