Saturday, 6 April 2013

VMware ESXi iSCSI Performance Issues

Last year I spent some time integrating an IBM DS3524 SAN into our existing iSCSI network. After implementation, I found that performance was not up to specification, so I turned to IOMeter to gather some statistics. Using the unofficial VMware storage performance thread (http://communities.vmware.com/thread/73745), I ran the outlined profiles and compared the results with other admins' findings. The performance I was seeing was much lower in comparison.

Preconfigured profiles can be downloaded from:
http://vmktree.org/iometer/

Moving on, we dug further using the "Max-Throughput 100%Read" test, and these were our findings:


CONFIGURATION:
 

SAN
IBM DS3524
2 controllers each with four 1000Mbps host ports.
24 136GB 15K RPM Disk drives.

Each host port assigned to VLAN 2 or 3 - see above image 

SWITCHES
2 HP ProCurve 2510 10/100/1000 switches
Each switch configured with an isolated VLAN (Switch 1 = VLAN 2, Switch 2 = VLAN 3)
Jumbo Packets Enabled

ESX (I will explain our iSCSI configuration in another blog post)
ESXi 5.1
2 1000Mbps NICs assigned to iSCSI
Added LUNs set to use Round Robin

RESULTS

The results from IOMeter showed that we were getting a maximum of 104-113MBps using the Max-Throughput 100%Read test. This held even after trying jumbo frames (on and off), flow control, different RAID levels (RAID 1/5/10), the turbo license, smaller LUNs, etc.

We had started to exhaust our options and were putting it down to a SAN fault. Even after getting in a replacement SAN (a demo unit), the issue was still apparent.


Quite a bit of research later, I came across a blog post outlining an issue with ESXi, Round Robin path selection and iSCSI. It would seem that, by default, when ESXi is configured to use Round Robin it sends 1000 I/O operations down one interface before switching to the next. Changing the IOPS limit to 1 balances the load more effectively and increases performance. For us, throughput doubled to approx. 207-210MBps, which was within our tolerance, as the maximum throughput of a 1000Mbps interface is 125MBps, totalling 250MBps across the two configured interfaces.
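
To make the reasoning above concrete, here is a minimal sketch in plain shell arithmetic. It is a simplified model of my own, not how ESXi is actually implemented: it assumes the active path simply changes after every "limit" I/Os, and the function names path_for_io and link_ceiling are made up for this illustration. In this model, a burst of up to 1000 consecutive I/Os stays on one interface, which would be consistent with throughput sitting near the ceiling of a single 1000Mbps link.

```shell
#!/bin/sh
# Simplified model (an assumption, not ESXi internals): the path changes
# after every `limit` I/Os.
# Arguments: I/O index (counting from 0), IOPS limit, number of paths.
path_for_io() {
    echo $(( ($1 / $2) % $3 ))
}

# Theoretical ceiling in MBps for a number of links of a given speed in Mbps
# (8 bits per byte, protocol overhead ignored).
# Arguments: link speed in Mbps, number of links.
link_ceiling() {
    echo $(( $1 / 8 * $2 ))
}

# With the default limit of 1000, I/Os 0 to 999 all use path 0 and
# I/O 1000 is the first to use path 1.
path_for_io 0 1000 2
path_for_io 999 1000 2
path_for_io 1000 1000 2

# With a limit of 1, consecutive I/Os alternate between the two paths.
path_for_io 0 1 2
path_for_io 1 1 2

# Two 1000Mbps interfaces: 125MBps each.
link_ceiling 1000 2
```

The last call gives the 250MBps ceiling quoted above, so 207-210MBps is a little over 80% of the theoretical maximum for two interfaces.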

During our testing we also found the following:

If a single RAID array is created across all disks on a DS3524, any performance-intensive server can affect the performance of other servers, as the read/write operations are spread across all disks.
LESSON: CREATE SMALLER RAID ARRAYS OF 4-6 DISKS

While a device or server is accessing a LUN, the LUN is locked. This can affect the performance of other systems, as the other servers are waiting for the LUN to be freed.
LESSON: IF CREATING LUNS FOR ESX VMs, CREATE SMALLER LUNS WITHIN EACH ARRAY.

Of course, the above lessons are for ESX LUNs and will be different for different workloads.

CHANGING THE IOPS

Enable SSH:

Within vSphere > Click the ESXi host > Click the "Configuration" tab > Click the "Security Profile" tab > Locate the "Security Profile" heading > Click "Properties"




Locate and Left Click "SSH" > Click "Options" > Ensure "Start and Stop Manually" is selected > Click "Start"



SETTING IOPS

Using PuTTY, connect to the ESXi host's IP address over SSH > Type the username and password

Once logged in, type the command below into PuTTY. The output lists all iSCSI LUNs that are configured on the ESX host. I usually copy and paste the list into Notepad++. I am then able to search and replace to put together a small script that includes the rest of the commands.

esxcli storage nmp device list | grep naa.600
In the commands below, replace naa.<number> with an naa. entry from the list gathered in the step above.

Set the LUN to use Round Robin
esxcli storage nmp device set -d naa.<number> --psp=VMW_PSP_RR
Check that Round-Robin has been configured
esxcli storage nmp device list -d naa.<number>
Set the IOPS to 1 for a LUN
esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=naa.<number>

Check the IOPS for a LUN
esxcli storage nmp psp roundrobin deviceconfig get --device=naa.<number>
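
As an alternative to the Notepad++ search and replace step, the same small script could be generated in the shell. This is only a sketch under a couple of assumptions: that the device identifiers in the "esxcli storage nmp device list" output start at the beginning of a line while the detail lines are indented, and that your LUNs all start with naa.600 as ours did. The function names extract_naa_ids and generate_rr_commands are made up for this example. The commands are printed rather than run, so they can be checked before anything is changed on the host.

```shell
#!/bin/sh
# Keep only the lines that are bare device identifiers. Assumes identifiers
# start at column 0 and that indented detail lines (such as the display
# name, which repeats the identifier in brackets) should be dropped.
extract_naa_ids() {
    grep '^naa\.600'
}

# Read one device identifier per line on stdin and print (not run) the
# commands from this post for each device.
generate_rr_commands() {
    while read -r dev; do
        # Skip empty lines.
        [ -n "$dev" ] || continue
        echo "esxcli storage nmp device set --device=$dev --psp=VMW_PSP_RR"
        echo "esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=$dev"
        echo "esxcli storage nmp psp roundrobin deviceconfig get --device=$dev"
    done
}
```

With these functions defined, piping "esxcli storage nmp device list" through extract_naa_ids and then generate_rr_commands prints three commands per LUN, ready to review and paste into the SSH session.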